BIO_sendmmsg/BIO_recvmmsg (API only) #18923

hlandau · 2022-08-01T09:43:20Z

To make merging this easier, as a first step I've separated out the API into a separate PR. There are no implementations provided here, though a third party could do so using the BIO APIs.

An updated version of the BIO_dgram implementation PR without the Windows compatibility issues will follow.

crypto/bio/bio_lib.c

doc/man3/BIO_sendmmsg.pod

mattcaswell · 2022-08-03T08:07:28Z

doc/man3/BIO_sendmmsg.pod

+a packed system error code is returned. The macro BIO_IS_ERRNO() evaluates to 1
+if a BIO_sendmmsg() or BIO_recvmmsg() return value represents a system error
+code. BIO_UNPACK_ERRNO() should then be used to obtain the system error code
+(EWOULDBLOCK, etc.) from the return value.


I wonder why we do this. In other BIO calls we just signal error and let the caller examine errno directly.

We would need to add set_last_socket_error but I see no portability issues here.
Or we could add an int *err parameter and avoid depending on thread local state.

If we ever use this for DTLS then we will need to ensure errno is set for a system call failure because that is what existing applications will expect. Plausibly we could do that in libssl rather than in the BIO code. So, I'm in two minds about this.

Why would we need set_last_socket_error?

hlandau · 2022-08-03T08:38:26Z

Updated.

crypto/bio/bio_lib.c

mattcaswell · 2022-08-03T09:11:13Z

doc/man3/BIO_sendmmsg.pod

+a packed system error code is returned. The macro BIO_IS_ERRNO() evaluates to 1
+if a BIO_sendmmsg() or BIO_recvmmsg() return value represents a system error
+code. BIO_UNPACK_ERRNO() should then be used to obtain the system error code
+(EWOULDBLOCK, etc.) from the return value.


If we ever use this for DTLS then we will need to ensure errno is set for a system call failure because that is what existing applications will expect. Plausibly we could do that in libssl rather than in the BIO code. So, I'm in two minds about this.

Why would we need set_last_socket_error?

hlandau · 2022-08-04T13:27:02Z

Alternatives to the current API:

Add an int *err argument to the function
Use errno
Use get/set_last_sys_error (which is either errno or Get/SetLastError)
Use get_last_socket_error (which is either errno or WSAGetLastError) and add set_last_socket_error

I don't see much use of errno in the code. I would prefer (1) by a long mile. I'd prefer to avoid adding more dependencies on thread locals.

Not sure what you mean about DTLS. This is a new API without existing users. If there are semantics of the DTLS APIs we need to preserve those belong in libssl...

hlandau · 2022-08-04T14:27:12Z

I've updated this to have a new int *err parameter. I think this is looking a lot better now. The semantics of the return value are also a lot clearer now and only indicate number of messages processed.

mattcaswell · 2022-08-04T14:37:15Z

Not sure what you mean about DTLS. This is a new API without existing users. If there are semantics of the DTLS APIs we need to preserve those belong in libssl...

Its a new API that could be used by some future alteration to the DTLS code. The libssl DTLS code currently calls the BIO_read function to receive a message. The current API is that the end user calls SSL_read and if a failure occurs at the system call level then the end user expects to receive an error from SSL_read(), with SSL_get_error() returning SSL_ERROR_SYSCALL and errno set to an appropriate error code. Libssl does not explicitly do anything with errno. That's the behaviour libssl in turn inherits from BIO_read(), i.e. if a syscall fails then an error return is signalled and errno is set accordingly. In fact this behaviour only applies to BIOs that cause errno to be set at all (such as BIO_dgram) but wouldn't apply to BIOs that don't make syscalls (such as a mem BIO). Nonetheless, lots of applications rely on it.

If we at some point in the future modify libssl to call BIO_recvmmsg instead of BIO_read then we still need to preserve the previous behaviour back to the caller of SSL_read, i.e. signal an error return and ensure errno is set. It would be easiest if the underlying BIO calls BIO_read vs BIO_recvmmsg worked in the same way with regards to signalling errors. This makes it an easier change in libssl.

Modifying libssl to set errno instead would work, but is kind of weird and not ideal. The errno behaviour is specific to BIOs that actually make syscalls and doesn't apply to (say) a mem BIO. Libssl doesn't actually know what kind of BIO it is dealing with.

I've updated this to have a new int *err parameter. I think this is looking a lot better now. The semantics of the return value are also a lot clearer now and only indicate number of messages processed.

How is err interpreted for BIOs that don't make sys calls (e.g. a mem BIO)?

hlandau · 2022-08-05T06:48:07Z

So basically we have a situation where:

We have accidentally exposed side effects of internal implementation details (errno);
Applications have ended up depending on this abstraction leak, in an instance of Hyrum's law.
We have system specific state (errno) which is leaked to the application sometimes (if a BIO which makes syscalls is used) and which doesn't other times.
The BIO API contract, as well as the libssl API contract, has accidentally incorporated the state of errno.

This is unfortunate. I feel like we should fix this in 4.0, probably by zeroing errno after syscalls. Nonetheless, we have to support the current expectation...

Some things worth noting, however. If applications are depending on errno in this way, can we even be confident that switching from recv(3) to recvmmsg(3) preserves these semantics? What if the OS uses the system-specific error codes differently?

I suspect the safest thing would be to make use of sendmmsg/recvmmsg in libssl opt-in, requiring the application to explicitly enable the new semantics (and forego use of the errno side channel).

int *err would use errno.h values, regardless of whether implemented via syscalls or in memory.

hlandau · 2022-08-05T06:56:46Z

(But leaving all of the above aside, since int *err provides a system error code, if you want to have libssl's calls to BIO_readmmsg work the same way, you would only have to add the line errno = err; after each call site. I don't think that's a problem and avoids introducing this unfortunate side channel into the new BIO API.)

mattcaswell · 2022-08-05T07:45:47Z

We have accidentally exposed side effects of internal implementation details (errno);
Applications have ended up depending on this abstraction leak, in an instance of Hyrum's law.
We have system specific state (errno) which is leaked to the application sometimes (if a BIO which makes syscalls is used) and which doesn't other times.
The BIO API contract, as well as the libssl API contract, has accidentally incorporated the state of errno.

That's a fair summary. Although whether it was accidental or not is lost in the mists of time. I think this behaviour has probably been there since SSLeay times (i.e. before OpenSSL even existed). It is so entrenched that it is now documented. From the man page for SSL_get_error():

SSL_ERROR_SYSCALL
Some non-recoverable, fatal I/O error occurred. The OpenSSL error queue may contain more information on the error. For socket I/O on Unix systems, consult errno for details. If this error occurs then no further I/O operations should be performed on the connection and SSL_shutdown() must not be called.

https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html

hlandau · 2022-08-05T08:15:08Z

That's some interesting history...

So basically the question for now is whether errno = err; is in BIO_sendmmsg or libssl. Since it's just one line, it doesn't make a massive difference either way, but I'd argue it should be libssl doing this:

If we put this in BIO_sendmmsg things like this will probably happen accidentally again in the future.
Callers of BIO_sendmmsg other than our existing SSL code will forget about the errno channel, and then callers of those callers end up depending on it. Will we always remember to put errno = 0; after every call to BIO_sendmmsg or BIO_recvmmsg in our QUIC implementation, etc. If we don't, this kind of unintended dependency is going to happen again, somehow...
The object of this is to preserve the documented semantics of the libssl API, so it's logical for libssl to do this.
Having it done explicitly in libssl also makes it less likely for this behaviour to be accidentally broken later, and serves to document this in code, as currently this is rather subtle. Nor is it burdensome, since it's literally one line.

The only concern I have is making libssl touch errno, which doesn't necessarily exist on all platforms (and needs to be WSASetLastError on Win32). Though it looks like it already does in one case. But this can be readily abstracted by adding a set_last_socket_error to match the existing get_last_socket_error.

So all libssl would have to do, if it decides to preserve these semantics when using sendmmsg/recvmmsg would be:

int err;

num_processed = BIO_sendmmsg(..., &err);
set_last_socket_error(err); /* The OS socket error variable is part of the SSL_read API */

mattcaswell · 2022-08-05T08:23:17Z

I can live with libssl doing this.

However, I have another thought (which may be slightly contradictory to my previous thought, but still needs to be considered). Should we be using OS specific error values at all in a new API? Or should we convert them to an OpenSSL error value of some form. Part of the point of the BIO abstraction is to abstract away system specific details. Applications written to use OpenSSL APIs should be portable. If you write an OpenSSL app on one platform, it should more-or-less work unmodified on a different platform (at least as far as the OpenSSL specific bits are concerned). By incorporating system specific error codes into the API we are breaking that. Users of the API will have to have different code to handle different platforms that they run on.

beldmit · 2022-08-05T08:28:16Z

Matt,

I don't think converting OS specific errors is a good idea. When you have os specific error, you can google the possible reasons more effectively.

hlandau · 2022-08-05T08:59:50Z

Option 1: Use our own error codes

I was thinking about this yesterday. It does seem like it would be better in principle to use our own error codes. My main concern is then we would have e.g. a function with a switch statement mapping from OS error codes to our own; but then what if the OS returns an error code we didn't add support for? Not a massive deal since OSes don't want to break compatibility either and it would be odd for them to add a new code in an existing syscall. But new platforms may have OS-specific error codes we don't handle. These would presumably be translated to some kind of BIO_UNKNOWN value. Then people would have to raise PRs to add handling for these error codes if they wanted to be able to discern them. (But then we would switch from returning BIO_UNKNOWN to returning a specific value, which is potentially a breaking change for any existing users that came to depend on us returning BIO_UNKNOWN.)

Option 2: Define-based abstraction

Another option is we do define our own error codes, but these just forward to OS-specific ones:

#ifdef OPENSSL_SYS_WINDOWS
#define BIO_ERR_WOULDBLOCK WSAEWOULDBLOCK
#else
#define BIO_ERR_WOULDBLOCK EWOULDBLOCK
#endif

The numeric BIO_ERR values needn't be the same on all platforms but still provide an effective abstraction. We may need to include the relevant header in bio.h.

But I don't think this works either; there may be multiple error codes on a given OS corresponding to one of our abstracted error codes (for example, EWOULDBLOCK and EAGAIN). Users would presumably test the error code with x == BIO_ERR_WOULDBLOCK so this creates problems.

Option 3: Predicate and opaque system-specific value

BIO_ERR *err outputs a system specific value, and we offer predicates to test it:

typedef int BIO_ERR;
int BIO_ERR_is_would_block(BIO_ERR err);
int BIO_ERR_is_conn_refused(BIO_ERR err);
int BIO_ERR_is_intr(BIO_ERR err);
int BIO_ERR_to_socket_error(BIO_ERR err);
BIO_ERR BIO_ERR_from_socket_error(int err);

I think this is probably the best option:

Abstracts OS values
The application can still access the OS value if it needs it using BIO_ERR_to_socket_error (to support libssl as discussed above)
We can map multiple OS error codes to one predicate (EAGAIN vs. EWOULDBLOCK, etc.)
We can map multiple predicates to one OS error code if desired.
We can add new calls to test new conditions on an error code without breaking backwards compatibility.

Might be a good idea to make BIO_ERR 64-bit or a structure with a couple of opaque ints in just in case. I'll update this PR soon with a prototype of this.

mattcaswell · 2022-08-05T09:21:03Z

Option 3 sounds plausible, but please consider how it interacts/intersects with numerous other BIO macros/functions in this area. E.g.

# define BIO_get_flags(b) BIO_test_flags(b, ~(0x0))
# define BIO_set_retry_special(b) \
                BIO_set_flags(b, (BIO_FLAGS_IO_SPECIAL|BIO_FLAGS_SHOULD_RETRY))
# define BIO_set_retry_read(b) \
                BIO_set_flags(b, (BIO_FLAGS_READ|BIO_FLAGS_SHOULD_RETRY))
# define BIO_set_retry_write(b) \
                BIO_set_flags(b, (BIO_FLAGS_WRITE|BIO_FLAGS_SHOULD_RETRY))

/* These are normally used internally in BIOs */
# define BIO_clear_retry_flags(b) \
                BIO_clear_flags(b, (BIO_FLAGS_RWS|BIO_FLAGS_SHOULD_RETRY))
# define BIO_get_retry_flags(b) \
                BIO_test_flags(b, (BIO_FLAGS_RWS|BIO_FLAGS_SHOULD_RETRY))

/* These should be used by the application to tell why we should retry */
# define BIO_should_read(a)              BIO_test_flags(a, BIO_FLAGS_READ)
# define BIO_should_write(a)             BIO_test_flags(a, BIO_FLAGS_WRITE)
# define BIO_should_io_special(a)        BIO_test_flags(a, BIO_FLAGS_IO_SPECIAL)
# define BIO_retry_type(a)               BIO_test_flags(a, BIO_FLAGS_RWS)
# define BIO_should_retry(a)             BIO_test_flags(a, BIO_FLAGS_SHOULD_RETRY)

int BIO_get_retry_reason(BIO *bio);
void BIO_set_retry_reason(BIO *bio, int reason);

int BIO_sock_should_retry(int i);
int BIO_sock_non_fatal_error(int error);

In particular codes like EAGAIN are closely related to the BIO retry macros above.

beldmit · 2022-08-05T09:26:34Z

Please, please, leave a plain option to get a system errno/LastError. Otherwise debugging becomes much more difficult.

hlandau · 2022-08-05T09:33:04Z

@beldmit Yes, this will be supported via BIO_ERR_to_socket_error.

@mattcaswell Most of the BIO calls read or write state on the BIO so are incompatible with our objectives of supporting concurrent use for the mmsg methods. BIO_sock_non_fatal_error is more plausible though does hold us to using an int rather than a structure. Anyway, I'll come up with a proposal.

mattcaswell · 2022-08-05T09:38:16Z

Most of the BIO calls read or write state on the BIO so are incompatible with our objectives of supporting concurrent use for the mmsg methods

Yes, good point.

hlandau · 2022-08-05T12:55:43Z

New update. Trying to avoid overkill and reinventing our existing error code infrastructure.

hlandau · 2022-08-15T15:48:15Z

Ping? This is holding a lot up now.

@mattcaswell

mattcaswell · 2022-08-16T11:13:41Z

doc/man3/BIO_sendmmsg.pod

+retrieved using C<ERR_GET_REASON(errcode)>. The packed error code can be
+retrieved by calling L<ERR_peek_last_error(3)> after the call to BIO_sendmmsg()
+or BIO_recvmmsg() returns 0. Whether the error is transient can be determined by
+passing the packed error code to BIO_err_is_non_fatal().


How would a non-socket based custom BIO implementation signal a retry (e.g. a mem BIO)?

Via the same means? This doesn't rely on OS functionality so there's no reason alternative BIO implementations can't signal the same way.

BIO_err_is_non_fatal() checks whether the error is ERR_LIB_SYS and if so what sys error code it is. It seems a bit weird for a mem bio to be indicating a syscall error when it hasn't made any syscalls?

hlandau · 2022-08-17T13:31:40Z

Updated:

There is now a BIO_R_NON_FATAL which non-syscall-based BIOs can use to indicate a non-fatal condition in a way which does not depend on system error codes.
The is_non_fatal function has been updated to test for this code.
Documentation has been updated to advise people to use is_non_fatal rather than testing for any specific error code.

t8m

Just two nits.

doc/man3/BIO_sendmmsg.pod

hlandau · 2022-08-17T15:38:31Z

Updated.

openssl-machine · 2022-08-19T08:00:14Z

This pull request is ready to merge

hlandau · 2022-08-19T08:01:59Z

Merged to master.

Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18923)

Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from openssl#18923)

BIO_sendmmsg/BIO_recvmmsg (API only)

636737e

hlandau added branch: master Merge to master branch approval: review pending This pull request needs review by a committer approval: otc review pending triaged: feature The issue/pr requests/adds a feature labels Aug 1, 2022

hlandau requested review from levitte and mattcaswell August 1, 2022 09:43

hlandau self-assigned this Aug 1, 2022

hlandau requested a review from t8m August 1, 2022 09:44

github-actions bot added the severity: fips change The pull request changes FIPS provider sources label Aug 1, 2022

hlandau mentioned this pull request Aug 1, 2022

BIO_dgram support for BIO_sendmmsg/BIO_recvmmsg #18270

Closed

t8m approved these changes Aug 1, 2022

View reviewed changes

t8m removed the approval: otc review pending label Aug 1, 2022

mattcaswell requested changes Aug 3, 2022

View reviewed changes

fixup! BIO_sendmmsg/BIO_recvmmsg (API only)

d2e9c56

mattcaswell requested changes Aug 3, 2022

View reviewed changes

fixup! BIO_sendmmsg/BIO_recvmmsg (API only)

a6bc7f2

fixup! BIO_sendmmsg/BIO_recvmmsg (API only)

6d198d9

fixup! BIO_sendmmsg/BIO_recvmmsg (API only)

252272e

mattcaswell requested changes Aug 16, 2022

View reviewed changes

fixup! BIO_sendmmsg/BIO_recvmmsg (API only)

3e5012f

t8m requested changes Aug 17, 2022

View reviewed changes

doc/man3/BIO_sendmmsg.pod Outdated Show resolved Hide resolved

doc/man3/BIO_sendmmsg.pod Outdated Show resolved Hide resolved

fixup! BIO_sendmmsg/BIO_recvmmsg (API only)

4407fc9

t8m approved these changes Aug 17, 2022

View reviewed changes

t8m requested a review from mattcaswell August 17, 2022 16:38

mattcaswell approved these changes Aug 18, 2022

View reviewed changes

mattcaswell added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels Aug 18, 2022

openssl-machine removed the approval: done This pull request has the required number of approvals label Aug 19, 2022

openssl-machine added the approval: ready to merge The 24 hour grace period has passed, ready to merge label Aug 19, 2022

hlandau closed this Aug 19, 2022

openssl-machine pushed a commit that referenced this pull request Aug 19, 2022

BIO_sendmmsg/BIO_recvmmsg (API only)

e0c4e43

Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18923)

sftcd pushed a commit to sftcd/openssl that referenced this pull request Sep 24, 2022

BIO_sendmmsg/BIO_recvmmsg (API only)

c21e927

Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from openssl#18923)

Uh oh!

BIO_sendmmsg/BIO_recvmmsg (API only) #18923

BIO_sendmmsg/BIO_recvmmsg (API only) #18923

Uh oh!

Conversation

hlandau commented Aug 1, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mattcaswell Aug 3, 2022

Choose a reason for hiding this comment

Uh oh!

hlandau Aug 3, 2022

Choose a reason for hiding this comment

Uh oh!

mattcaswell Aug 3, 2022

Choose a reason for hiding this comment

Uh oh!

hlandau commented Aug 3, 2022

Uh oh!

Uh oh!

Uh oh!

mattcaswell Aug 3, 2022

Choose a reason for hiding this comment

Uh oh!

hlandau commented Aug 4, 2022

Uh oh!

hlandau commented Aug 4, 2022

Uh oh!

mattcaswell commented Aug 4, 2022

Uh oh!

hlandau commented Aug 5, 2022

Uh oh!

hlandau commented Aug 5, 2022

Uh oh!

mattcaswell commented Aug 5, 2022

Uh oh!

hlandau commented Aug 5, 2022

Uh oh!

mattcaswell commented Aug 5, 2022

Uh oh!

beldmit commented Aug 5, 2022

Uh oh!

hlandau commented Aug 5, 2022

Option 1: Use our own error codes

Option 2: Define-based abstraction

Option 3: Predicate and opaque system-specific value

Uh oh!

mattcaswell commented Aug 5, 2022

Uh oh!

beldmit commented Aug 5, 2022

Uh oh!

hlandau commented Aug 5, 2022

Uh oh!

mattcaswell commented Aug 5, 2022

Uh oh!

hlandau commented Aug 5, 2022

Uh oh!

hlandau commented Aug 15, 2022

Uh oh!

mattcaswell Aug 16, 2022

Choose a reason for hiding this comment

Uh oh!

hlandau Aug 16, 2022

Choose a reason for hiding this comment

Uh oh!

mattcaswell Aug 16, 2022

Choose a reason for hiding this comment

Uh oh!

hlandau commented Aug 17, 2022

Uh oh!

t8m left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!