Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BIO_sendmmsg/BIO_recvmmsg (API only) #18923

Closed
wants to merge 7 commits into from
Closed

Conversation

hlandau
Copy link
Member

@hlandau hlandau commented Aug 1, 2022

To make merging this easier, as a first step I've separated out the API into a separate PR. There are no implementations provided here, though a third party could do so using the BIO APIs.

An updated version of the BIO_dgram implementation PR without the Windows compatibility issues will follow.

@hlandau hlandau added branch: master Merge to master branch approval: review pending This pull request needs review by a committer approval: otc review pending This pull request needs review by an OTC member triaged: feature The issue/pr requests/adds a feature labels Aug 1, 2022
@hlandau hlandau self-assigned this Aug 1, 2022
@hlandau hlandau requested a review from t8m August 1, 2022 09:44
@github-actions github-actions bot added the severity: fips change The pull request changes FIPS provider sources label Aug 1, 2022
@t8m t8m removed the approval: otc review pending This pull request needs review by an OTC member label Aug 1, 2022
crypto/bio/bio_lib.c Outdated Show resolved Hide resolved
crypto/bio/bio_lib.c Show resolved Hide resolved
crypto/bio/bio_lib.c Show resolved Hide resolved
doc/man3/BIO_sendmmsg.pod Outdated Show resolved Hide resolved
doc/man3/BIO_sendmmsg.pod Outdated Show resolved Hide resolved
doc/man3/BIO_sendmmsg.pod Outdated Show resolved Hide resolved
doc/man3/BIO_sendmmsg.pod Outdated Show resolved Hide resolved
a packed system error code is returned. The macro BIO_IS_ERRNO() evaluates to 1
if a BIO_sendmmsg() or BIO_recvmmsg() return value represents a system error
code. BIO_UNPACK_ERRNO() should then be used to obtain the system error code
(EWOULDBLOCK, etc.) from the return value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why we do this. In other BIO calls we just signal error and let the caller examine errno directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to add set_last_socket_error but I see no portability issues here.
Or we could add an int *err parameter and avoid depending on thread local state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we ever use this for DTLS then we will need to ensure errno is set for a system call failure because that is what existing applications will expect. Plausibly we could do that in libssl rather than in the BIO code. So, I'm in two minds about this.

Why would we need set_last_socket_error?

@hlandau
Copy link
Member Author

hlandau commented Aug 3, 2022

Updated.

crypto/bio/bio_lib.c Outdated Show resolved Hide resolved
crypto/bio/bio_lib.c Outdated Show resolved Hide resolved
a packed system error code is returned. The macro BIO_IS_ERRNO() evaluates to 1
if a BIO_sendmmsg() or BIO_recvmmsg() return value represents a system error
code. BIO_UNPACK_ERRNO() should then be used to obtain the system error code
(EWOULDBLOCK, etc.) from the return value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we ever use this for DTLS then we will need to ensure errno is set for a system call failure because that is what existing applications will expect. Plausibly we could do that in libssl rather than in the BIO code. So, I'm in two minds about this.

Why would we need set_last_socket_error?

@hlandau
Copy link
Member Author

hlandau commented Aug 4, 2022

Alternatives to the current API:

  1. Add an int *err argument to the function
  2. Use errno
  3. Use get/set_last_sys_error (which is either errno or Get/SetLastError)
  4. Use get_last_socket_error (which is either errno or WSAGetLastError) and add set_last_socket_error

I don't see much use of errno in the code. I would prefer (1) by a long mile. I'd prefer to avoid adding more dependencies on thread locals.

Not sure what you mean about DTLS. This is a new API without existing users. If there are semantics of the DTLS APIs we need to preserve those belong in libssl...

@hlandau
Copy link
Member Author

hlandau commented Aug 4, 2022

I've updated this to have a new int *err parameter. I think this is looking a lot better now. The semantics of the return value are also a lot clearer now and only indicate number of messages processed.

@mattcaswell
Copy link
Member

Not sure what you mean about DTLS. This is a new API without existing users. If there are semantics of the DTLS APIs we need to preserve those belong in libssl...

Its a new API that could be used by some future alteration to the DTLS code. The libssl DTLS code currently calls the BIO_read function to receive a message. The current API is that the end user calls SSL_read and if a failure occurs at the system call level then the end user expects to receive an error from SSL_read(), with SSL_get_error() returning SSL_ERROR_SYSCALL and errno set to an appropriate error code. Libssl does not explicitly do anything with errno. That's the behaviour libssl in turn inherits from BIO_read(), i.e. if a syscall fails then an error return is signalled and errno is set accordingly. In fact this behaviour only applies to BIOs that cause errno to be set at all (such as BIO_dgram) but wouldn't apply to BIOs that don't make syscalls (such as a mem BIO). Nonetheless, lots of applications rely on it.

If we at some point in the future modify libssl to call BIO_recvmmsg instead of BIO_read then we still need to preserve the previous behaviour back to the caller of SSL_read, i.e. signal an error return and ensure errno is set. It would be easiest if the underlying BIO calls BIO_read vs BIO_recvmmsg worked in the same way with regards to signalling errors. This makes it an easier change in libssl.

Modifying libssl to set errno instead would work, but is kind of weird and not ideal. The errno behaviour is specific to BIOs that actually make syscalls and doesn't apply to (say) a mem BIO. Libssl doesn't actually know what kind of BIO it is dealing with.

I've updated this to have a new int *err parameter. I think this is looking a lot better now. The semantics of the return value are also a lot clearer now and only indicate number of messages processed.

How is err interpreted for BIOs that don't make sys calls (e.g. a mem BIO)?

@hlandau
Copy link
Member Author

hlandau commented Aug 5, 2022

So basically we have a situation where:

  • We have accidentally exposed side effects of internal implementation details (errno);
  • Applications have ended up depending on this abstraction leak, in an instance of Hyrum's law.
  • We have system specific state (errno) which is leaked to the application sometimes (if a BIO which makes syscalls is used) and which doesn't other times.
  • The BIO API contract, as well as the libssl API contract, has accidentally incorporated the state of errno.

This is unfortunate. I feel like we should fix this in 4.0, probably by zeroing errno after syscalls. Nonetheless, we have to support the current expectation...

Some things worth noting, however. If applications are depending on errno in this way, can we even be confident that switching from recv(3) to recvmmsg(3) preserves these semantics? What if the OS uses the system-specific error codes differently?

I suspect the safest thing would be to make use of sendmmsg/recvmmsg in libssl opt-in, requiring the application to explicitly enable the new semantics (and forego use of the errno side channel).

int *err would use errno.h values, regardless of whether implemented via syscalls or in memory.

@hlandau
Copy link
Member Author

hlandau commented Aug 5, 2022

(But leaving all of the above aside, since int *err provides a system error code, if you want to have libssl's calls to BIO_readmmsg work the same way, you would only have to add the line errno = err; after each call site. I don't think that's a problem and avoids introducing this unfortunate side channel into the new BIO API.)

@mattcaswell
Copy link
Member

We have accidentally exposed side effects of internal implementation details (errno);
Applications have ended up depending on this abstraction leak, in an instance of Hyrum's law.
We have system specific state (errno) which is leaked to the application sometimes (if a BIO which makes syscalls is used) and which doesn't other times.
The BIO API contract, as well as the libssl API contract, has accidentally incorporated the state of errno.

That's a fair summary. Although whether it was accidental or not is lost in the mists of time. I think this behaviour has probably been there since SSLeay times (i.e. before OpenSSL even existed). It is so entrenched that it is now documented. From the man page for SSL_get_error():

SSL_ERROR_SYSCALL
Some non-recoverable, fatal I/O error occurred. The OpenSSL error queue may contain more information on the error. For socket I/O on Unix systems, consult errno for details. If this error occurs then no further I/O operations should be performed on the connection and SSL_shutdown() must not be called.

https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html

@hlandau
Copy link
Member Author

hlandau commented Aug 5, 2022

That's some interesting history...

So basically the question for now is whether errno = err; is in BIO_sendmmsg or libssl. Since it's just one line, it doesn't make a massive difference either way, but I'd argue it should be libssl doing this:

  • If we put this in BIO_sendmmsg things like this will probably happen accidentally again in the future.
    Callers of BIO_sendmmsg other than our existing SSL code will forget about the errno channel, and then callers of those callers end up depending on it. Will we always remember to put errno = 0; after every call to BIO_sendmmsg or BIO_recvmmsg in our QUIC implementation, etc. If we don't, this kind of unintended dependency is going to happen again, somehow...
  • The object of this is to preserve the documented semantics of the libssl API, so it's logical for libssl to do this.
    Having it done explicitly in libssl also makes it less likely for this behaviour to be accidentally broken later, and serves to document this in code, as currently this is rather subtle. Nor is it burdensome, since it's literally one line.

The only concern I have is making libssl touch errno, which doesn't necessarily exist on all platforms (and needs to be WSASetLastError on Win32). Though it looks like it already does in one case. But this can be readily abstracted by adding a set_last_socket_error to match the existing get_last_socket_error.

So all libssl would have to do, if it decides to preserve these semantics when using sendmmsg/recvmmsg would be:

int err;

num_processed = BIO_sendmmsg(..., &err);
set_last_socket_error(err); /* The OS socket error variable is part of the SSL_read API */

@mattcaswell
Copy link
Member

I can live with libssl doing this.

However, I have another thought (which may be slightly contradictory to my previous thought, but still needs to be considered). Should we be using OS specific error values at all in a new API? Or should we convert them to an OpenSSL error value of some form. Part of the point of the BIO abstraction is to abstract away system specific details. Applications written to use OpenSSL APIs should be portable. If you write an OpenSSL app on one platform, it should more-or-less work unmodified on a different platform (at least as far as the OpenSSL specific bits are concerned). By incorporating system specific error codes into the API we are breaking that. Users of the API will have to have different code to handle different platforms that they run on.

@beldmit
Copy link
Member

beldmit commented Aug 5, 2022

Matt,

I don't think converting OS specific errors is a good idea. When you have os specific error, you can google the possible reasons more effectively.

@hlandau
Copy link
Member Author

hlandau commented Aug 5, 2022

Option 1: Use our own error codes

I was thinking about this yesterday. It does seem like it would be better in principle to use our own error codes. My main concern is then we would have e.g. a function with a switch statement mapping from OS error codes to our own; but then what if the OS returns an error code we didn't add support for? Not a massive deal since OSes don't want to break compatibility either and it would be odd for them to add a new code in an existing syscall. But new platforms may have OS-specific error codes we don't handle. These would presumably be translated to some kind of BIO_UNKNOWN value. Then people would have to raise PRs to add handling for these error codes if they wanted to be able to discern them. (But then we would switch from returning BIO_UNKNOWN to returning a specific value, which is potentially a breaking change for any existing users that came to depend on us returning BIO_UNKNOWN.)

Option 2: Define-based abstraction

Another option is we do define our own error codes, but these just forward to OS-specific ones:

#ifdef OPENSSL_SYS_WINDOWS
#define BIO_ERR_WOULDBLOCK WSAEWOULDBLOCK
#else
#define BIO_ERR_WOULDBLOCK EWOULDBLOCK
#endif

The numeric BIO_ERR values needn't be the same on all platforms but still provide an effective abstraction. We may need to include the relevant header in bio.h.

But I don't think this works either; there may be multiple error codes on a given OS corresponding to one of our abstracted error codes (for example, EWOULDBLOCK and EAGAIN). Users would presumably test the error code with x == BIO_ERR_WOULDBLOCK so this creates problems.

Option 3: Predicate and opaque system-specific value

BIO_ERR *err outputs a system specific value, and we offer predicates to test it:

typedef int BIO_ERR;
int BIO_ERR_is_would_block(BIO_ERR err);
int BIO_ERR_is_conn_refused(BIO_ERR err);
int BIO_ERR_is_intr(BIO_ERR err);
int BIO_ERR_to_socket_error(BIO_ERR err);
BIO_ERR BIO_ERR_from_socket_error(int err);

I think this is probably the best option:

  • Abstracts OS values
  • The application can still access the OS value if it needs it using BIO_ERR_to_socket_error (to support libssl as discussed above)
  • We can map multiple OS error codes to one predicate (EAGAIN vs. EWOULDBLOCK, etc.)
  • We can map multiple predicates to one OS error code if desired.
  • We can add new calls to test new conditions on an error code without breaking backwards compatibility.

Might be a good idea to make BIO_ERR 64-bit or a structure with a couple of opaque ints in just in case. I'll update this PR soon with a prototype of this.

@mattcaswell
Copy link
Member

Option 3 sounds plausible, but please consider how it interacts/intersects with numerous other BIO macros/functions in this area. E.g.

# define BIO_get_flags(b) BIO_test_flags(b, ~(0x0))
# define BIO_set_retry_special(b) \
                BIO_set_flags(b, (BIO_FLAGS_IO_SPECIAL|BIO_FLAGS_SHOULD_RETRY))
# define BIO_set_retry_read(b) \
                BIO_set_flags(b, (BIO_FLAGS_READ|BIO_FLAGS_SHOULD_RETRY))
# define BIO_set_retry_write(b) \
                BIO_set_flags(b, (BIO_FLAGS_WRITE|BIO_FLAGS_SHOULD_RETRY))

/* These are normally used internally in BIOs */
# define BIO_clear_retry_flags(b) \
                BIO_clear_flags(b, (BIO_FLAGS_RWS|BIO_FLAGS_SHOULD_RETRY))
# define BIO_get_retry_flags(b) \
                BIO_test_flags(b, (BIO_FLAGS_RWS|BIO_FLAGS_SHOULD_RETRY))

/* These should be used by the application to tell why we should retry */
# define BIO_should_read(a)              BIO_test_flags(a, BIO_FLAGS_READ)
# define BIO_should_write(a)             BIO_test_flags(a, BIO_FLAGS_WRITE)
# define BIO_should_io_special(a)        BIO_test_flags(a, BIO_FLAGS_IO_SPECIAL)
# define BIO_retry_type(a)               BIO_test_flags(a, BIO_FLAGS_RWS)
# define BIO_should_retry(a)             BIO_test_flags(a, BIO_FLAGS_SHOULD_RETRY)
int BIO_get_retry_reason(BIO *bio);
void BIO_set_retry_reason(BIO *bio, int reason);
int BIO_sock_should_retry(int i);
int BIO_sock_non_fatal_error(int error);

In particular codes like EAGAIN are closely related to the BIO retry macros above.

@beldmit
Copy link
Member

beldmit commented Aug 5, 2022

Please, please, leave a plain option to get a system errno/LastError. Otherwise debugging becomes much more difficult.

@hlandau
Copy link
Member Author

hlandau commented Aug 5, 2022

@beldmit Yes, this will be supported via BIO_ERR_to_socket_error.

@mattcaswell Most of the BIO calls read or write state on the BIO so are incompatible with our objectives of supporting concurrent use for the mmsg methods. BIO_sock_non_fatal_error is more plausible though does hold us to using an int rather than a structure. Anyway, I'll come up with a proposal.

@mattcaswell
Copy link
Member

Most of the BIO calls read or write state on the BIO so are incompatible with our objectives of supporting concurrent use for the mmsg methods

Yes, good point.

@hlandau
Copy link
Member Author

hlandau commented Aug 5, 2022

New update. Trying to avoid overkill and reinventing our existing error code infrastructure.

@hlandau
Copy link
Member Author

hlandau commented Aug 15, 2022

Ping? This is holding a lot up now.

@mattcaswell

retrieved using C<ERR_GET_REASON(errcode)>. The packed error code can be
retrieved by calling L<ERR_peek_last_error(3)> after the call to BIO_sendmmsg()
or BIO_recvmmsg() returns 0. Whether the error is transient can be determined by
passing the packed error code to BIO_err_is_non_fatal().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would a non-socket based custom BIO implementation signal a retry (e.g. a mem BIO)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Via the same means? This doesn't rely on OS functionality so there's no reason alternative BIO implementations can't signal the same way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BIO_err_is_non_fatal() checks whether the error is ERR_LIB_SYS and if so what sys error code it is. It seems a bit weird for a mem bio to be indicating a syscall error when it hasn't made any syscalls?

@hlandau
Copy link
Member Author

hlandau commented Aug 17, 2022

Updated:

  • There is now a BIO_R_NON_FATAL which non-syscall-based BIOs can use to indicate a non-fatal condition in a way which does not depend on system error codes.
  • The is_non_fatal function has been updated to test for this code.
  • Documentation has been updated to advise people to use is_non_fatal rather than testing for any specific error code.

Copy link
Member

@t8m t8m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two nits.

doc/man3/BIO_sendmmsg.pod Outdated Show resolved Hide resolved
doc/man3/BIO_sendmmsg.pod Outdated Show resolved Hide resolved
@hlandau
Copy link
Member Author

hlandau commented Aug 17, 2022

Updated.

@t8m t8m requested a review from mattcaswell August 17, 2022 16:38
@mattcaswell mattcaswell added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels Aug 18, 2022
@openssl-machine openssl-machine removed the approval: done This pull request has the required number of approvals label Aug 19, 2022
@openssl-machine
Copy link
Collaborator

This pull request is ready to merge

@openssl-machine openssl-machine added the approval: ready to merge The 24 hour grace period has passed, ready to merge label Aug 19, 2022
@hlandau
Copy link
Member Author

hlandau commented Aug 19, 2022

Merged to master.

@hlandau hlandau closed this Aug 19, 2022
openssl-machine pushed a commit that referenced this pull request Aug 19, 2022
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18923)
sftcd pushed a commit to sftcd/openssl that referenced this pull request Sep 24, 2022
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from openssl#18923)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval: ready to merge The 24 hour grace period has passed, ready to merge branch: master Merge to master branch severity: fips change The pull request changes FIPS provider sources triaged: feature The issue/pr requests/adds a feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants