Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-95494: Fix transport EOF handling in OpenSSL 3.0 #95495

Merged
merged 4 commits into from Mar 22, 2023

Conversation

davidben
Copy link
Contributor

@davidben davidben commented Jul 31, 2022

GH-25309 enabled SSL_OP_IGNORE_UNEXPECTED_EOF by default, with a comment
that it restores OpenSSL 1.1.1 behavior, but this wasn't quite right.
That option causes OpenSSL to treat transport EOF as the same as
close_notify (i.e. SSL_ERROR_ZERO_RETURN), whereas Python actually has
distinct SSLEOFError and SSLZeroReturnError exceptions. (The latter is
usually mapped to a zero return from read.) In OpenSSL 1.1.1, the ssl
module would raise them for transport EOF and close_notify,
respectively. In OpenSSL 3.0, both act like close_notify.

Fix this by, instead, just detecting SSL_R_UNEXPECTED_EOF_WHILE_READING
and mapping that to the other exception type.

There doesn't seem to have been any unit test of this error, so fill in
the missing one. This had to be done with the BIO path because it's
actually slightly tricky to simulate a transport EOF with Python's fd
based APIs. (If you instruct the server to close the socket, it gets
confused, probably because the server's SSL object is still referencing
the now dead fd?)

pythonGH-25309 enabled SSL_OP_IGNORE_UNEXPECTED_EOF by default, with a comment
that it restores OpenSSL 1.1.1 behavior, but this wasn't quite right.
That option causes OpenSSL to treat transport EOF as the same as
close_notify (i.e. SSL_ERROR_ZERO_RETURN), whereas Python actually has
distinct SSLEOFError and SSLZeroReturnError exceptions. (The latter is
usually mapped to a zero return from read.) In OpenSSL 1.1.1, the ssl
module would raise them for transport EOF and close_notify,
respectively. In OpenSSL 3.0, both act like close_notify.

Fix this by, instead, just detecting SSL_R_UNEXPECTED_EOF_WHILE_READING
and mapping that to the other exception type.

There doesn't seem to have been any unit test of this error, so fill in
the missing one. This had to be done with the BIO path because it's
actually slightly tricky to simulate a transport EOF with Python's fd
based APIs. (If you instruct the server to close the socket, it gets
confused, probably because the server's SSL object is still referencing
the now dead fd?)
@davidben
Copy link
Contributor Author

EOF handling in OpenSSL has gotten unnecessarily hairy, largely due to upstream getting the design of BIO_read_ex and SSL_read_ex wrong. I ran into this while trying to work through this mess. We can probably simplify the EOF logic elsewhere in that function, but I'll leave that to when I've disentangled it more. This fix seems to be separable.

@davidben
Copy link
Contributor Author

Seem to have broken a few tests. Odd. Will take a look and update this.

@davidben
Copy link
Contributor Author

davidben commented Aug 1, 2022

Seem to have broken a few tests. Odd. Will take a look and update this.

Fixed, hopefully. The issue was I needed the new exception to have PY_SSL_ERROR_EOF.

Copy link
Member

@tiran tiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic! Thank you David.

Could you please extend the blurb and mention that :attr:~ssl.SSLContext.options no longer has OP_IGNORE_UNEXPECTED_EOF?

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@tiran tiran added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 3, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @tiran for commit 38556b9 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 3, 2022
@davidben
Copy link
Contributor Author

davidben commented Aug 3, 2022

I have made the requested changes; please review again

Let me know if I did that right. I went and added the various role markers to the other links too.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@tiran: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from tiran August 3, 2022 17:54
@davidben
Copy link
Contributor Author

davidben commented Dec 20, 2022

Anything left to do on this PR on my end?

@davidben
Copy link
Contributor Author

@tiran Anything left for me to do before this is mergable?

@Yhg1s Yhg1s self-requested a review March 22, 2023 12:09
@Yhg1s
Copy link
Member

Yhg1s commented Mar 22, 2023

Merging, since Christian is taking a break and the only open comment is about the news entry, which was addressed.

@Yhg1s Yhg1s merged commit 420bbb7 into python:main Mar 22, 2023
@Yhg1s
Copy link
Member

Yhg1s commented Mar 22, 2023

The only question now is whether to backport this to 3.11 and 3.10. I think it makes sense to backport, but @pablogsal, @ambv, what do you think? (This changes the exception raised when using OpenSSL 3.0 and a TLS connection is unexpectedly terminated, but changes it to behave more like OpenSSL 1.1.)

@pablogsal
Copy link
Member

The only question now is whether to backport this to 3.11 and 3.10. I think it makes sense to backport, but @pablogsal, @ambv, what do you think? (This changes the exception raised when using OpenSSL 3.0 and a TLS connection is unexpectedly terminated, but changes it to behave more like OpenSSL 1.1.)

I am a bit hesitant to be honest. I would feel more comfortable if we can get sone feedback for some library authors that will potentially be impacted first but I don't see this as a blocker if everyone else agrees

@ambv
Copy link
Contributor

ambv commented Mar 22, 2023

This is entirely in Pablo's hands. He's right that with 3.10 being this far down the line, it's disruptive now.

However! 3.11 has only seen two bugfix releases so far. Maybe that's not too late to course-correct? People are still not using 3.11 very widely in the wild. If Black is any indication, pip installs from 3.11 account for barely 8% of the total (3.7 - 3.11). 3.10 is still 3X more popular than 3.11, and so is 3.9. In fact, 3.8 beats 3.11 by five times.

Numpy presents an even colder image of 3.11 adoption. Downloads from 3.11 constitute barely 2.0% - 2.3% of the total. 3.11 barely beats 2.7 there. 3.10 beats it by almost 4X, 3.9 by over 8X, 3.8 by a whopping 14.4X, and 3.7 by close to 16X.

Anaconda also doesn't use 3.11 yet, it's available in the main repository but they only just released 3.10 support so nobody is using that in the wild unless they opt-in.

What I'm saying is: In my opinion 3.11 can still fix things with relatively little consequence to external users.

@ambv
Copy link
Contributor

ambv commented Mar 22, 2023

Some bug fixes necessarily break what people used before. In my time I recall doing a small number of breaking fixes when the given Python version was the newest maintenance release. The way I thought about this at the time was this (version numbers adapted to the current situation):

  • 3.12 is still being developed
  • 3.11.3 will change behavior
  • therefore, a library maintainer only has to say if sys.version_info[:3] >= (3, 11, 3): to guard the change -- this covers 3.12 as well

Changing older maintenance releases would create annoying version cherry-picking for maintainers. This one in the example above? Not at all different from if sys.version_info[:2] >= (3, 12). It's still just one check.

@Yhg1s
Copy link
Member

Yhg1s commented Mar 23, 2023

Note that while this introduces a change when using OpenSSL 3.0, it actually makes it behave more like OpenSSL 1.1. There's no change when using OpenSSL 1.1. The only way (I can think of) this would affect user code is when they have only been using OpenSSL 3.0, and they're are handling SSLZeroReturnError but not EOFError. The code would already be problematic if run with OpenSSL 1.1 instead. I feel like the adoption of the Python version hardly matters in that equation.

@davidben
Copy link
Contributor Author

Distinguishing EOF (which an attacker can inject) and an authenticated close_notify can even be a security issue in some applications, if they care about secure EOF. E.g. a naive protocol that uses EOF to signal the data is complete rather than an in-protocol framing.

That said, in most applications, like HTTPS, close_notify is so eroded that this idea is fiction, and only in-protocol framing works. In which case this is moot.

@davidben
Copy link
Contributor Author

(To clarify, I have no particular stake in this being classified as a security issue or not. In so far as it is one, I doubt many, if any, applications care. Just realized it might matter to some folks, so thought I'd mention in case you all care.)

@pablogsal
Copy link
Member

pablogsal commented Mar 24, 2023

After reflecting a bit more about this I think I agree with @ambv and I would be ok backporting it to 3.11.

Actually after reflecting a bit more I think we should backport to both 3.11 and 3.10. Technically we don't fully support OpenSSL 3.0 (IIRC) and I think the "behaviour" change is not something that is going to be that common and furthermore, it can be seen as a bug under the appropriate lens.

@ambv ambv added needs backport to 3.10 only security fixes needs backport to 3.11 only security fixes labels Mar 24, 2023
@miss-islington
Copy link
Contributor

Thanks @davidben for the PR, and @Yhg1s for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Thanks @davidben for the PR, and @Yhg1s for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-103006 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Mar 24, 2023
@bedevere-bot
Copy link

GH-103007 is a backport of this pull request to the 3.10 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Mar 24, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 24, 2023
…5495)

pythonGH-25309 enabled SSL_OP_IGNORE_UNEXPECTED_EOF by default, with a comment
that it restores OpenSSL 1.1.1 behavior, but this wasn't quite right.
That option causes OpenSSL to treat transport EOF as the same as
close_notify (i.e. SSL_ERROR_ZERO_RETURN), whereas Python actually has
distinct SSLEOFError and SSLZeroReturnError exceptions. (The latter is
usually mapped to a zero return from read.) In OpenSSL 1.1.1, the ssl
module would raise them for transport EOF and close_notify,
respectively. In OpenSSL 3.0, both act like close_notify.

Fix this by, instead, just detecting SSL_R_UNEXPECTED_EOF_WHILE_READING
and mapping that to the other exception type.

There doesn't seem to have been any unit test of this error, so fill in
the missing one. This had to be done with the BIO path because it's
actually slightly tricky to simulate a transport EOF with Python's fd
based APIs. (If you instruct the server to close the socket, it gets
confused, probably because the server's SSL object is still referencing
the now dead fd?)
(cherry picked from commit 420bbb7)

Co-authored-by: David Benjamin <davidben@google.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 24, 2023
…5495)

pythonGH-25309 enabled SSL_OP_IGNORE_UNEXPECTED_EOF by default, with a comment
that it restores OpenSSL 1.1.1 behavior, but this wasn't quite right.
That option causes OpenSSL to treat transport EOF as the same as
close_notify (i.e. SSL_ERROR_ZERO_RETURN), whereas Python actually has
distinct SSLEOFError and SSLZeroReturnError exceptions. (The latter is
usually mapped to a zero return from read.) In OpenSSL 1.1.1, the ssl
module would raise them for transport EOF and close_notify,
respectively. In OpenSSL 3.0, both act like close_notify.

Fix this by, instead, just detecting SSL_R_UNEXPECTED_EOF_WHILE_READING
and mapping that to the other exception type.

There doesn't seem to have been any unit test of this error, so fill in
the missing one. This had to be done with the BIO path because it's
actually slightly tricky to simulate a transport EOF with Python's fd
based APIs. (If you instruct the server to close the socket, it gets
confused, probably because the server's SSL object is still referencing
the now dead fd?)
(cherry picked from commit 420bbb7)

Co-authored-by: David Benjamin <davidben@google.com>
Fidget-Spinner pushed a commit to Fidget-Spinner/cpython that referenced this pull request Mar 27, 2023
…5495)

pythonGH-25309 enabled SSL_OP_IGNORE_UNEXPECTED_EOF by default, with a comment
that it restores OpenSSL 1.1.1 behavior, but this wasn't quite right.
That option causes OpenSSL to treat transport EOF as the same as
close_notify (i.e. SSL_ERROR_ZERO_RETURN), whereas Python actually has
distinct SSLEOFError and SSLZeroReturnError exceptions. (The latter is
usually mapped to a zero return from read.) In OpenSSL 1.1.1, the ssl
module would raise them for transport EOF and close_notify,
respectively. In OpenSSL 3.0, both act like close_notify.

Fix this by, instead, just detecting SSL_R_UNEXPECTED_EOF_WHILE_READING
and mapping that to the other exception type.

There doesn't seem to have been any unit test of this error, so fill in
the missing one. This had to be done with the BIO path because it's
actually slightly tricky to simulate a transport EOF with Python's fd
based APIs. (If you instruct the server to close the socket, it gets
confused, probably because the server's SSL object is still referencing
the now dead fd?)
ambv pushed a commit that referenced this pull request Mar 27, 2023
…#103006)

GH-25309 enabled SSL_OP_IGNORE_UNEXPECTED_EOF by default, with a comment
that it restores OpenSSL 1.1.1 behavior, but this wasn't quite right.
That option causes OpenSSL to treat transport EOF as the same as
close_notify (i.e. SSL_ERROR_ZERO_RETURN), whereas Python actually has
distinct SSLEOFError and SSLZeroReturnError exceptions. (The latter is
usually mapped to a zero return from read.) In OpenSSL 1.1.1, the ssl
module would raise them for transport EOF and close_notify,
respectively. In OpenSSL 3.0, both act like close_notify.

Fix this by, instead, just detecting SSL_R_UNEXPECTED_EOF_WHILE_READING
and mapping that to the other exception type.

There doesn't seem to have been any unit test of this error, so fill in
the missing one. This had to be done with the BIO path because it's
actually slightly tricky to simulate a transport EOF with Python's fd
based APIs. (If you instruct the server to close the socket, it gets
confused, probably because the server's SSL object is still referencing
the now dead fd?)
(cherry picked from commit 420bbb7)

Co-authored-by: David Benjamin <davidben@google.com>
ambv pushed a commit that referenced this pull request Mar 27, 2023
…#103007)

GH-25309 enabled SSL_OP_IGNORE_UNEXPECTED_EOF by default, with a comment
that it restores OpenSSL 1.1.1 behavior, but this wasn't quite right.
That option causes OpenSSL to treat transport EOF as the same as
close_notify (i.e. SSL_ERROR_ZERO_RETURN), whereas Python actually has
distinct SSLEOFError and SSLZeroReturnError exceptions. (The latter is
usually mapped to a zero return from read.) In OpenSSL 1.1.1, the ssl
module would raise them for transport EOF and close_notify,
respectively. In OpenSSL 3.0, both act like close_notify.

Fix this by, instead, just detecting SSL_R_UNEXPECTED_EOF_WHILE_READING
and mapping that to the other exception type.

There doesn't seem to have been any unit test of this error, so fill in
the missing one. This had to be done with the BIO path because it's
actually slightly tricky to simulate a transport EOF with Python's fd
based APIs. (If you instruct the server to close the socket, it gets
confused, probably because the server's SSL object is still referencing
the now dead fd?)
(cherry picked from commit 420bbb7)

Co-authored-by: David Benjamin <davidben@google.com>
warsaw pushed a commit to warsaw/cpython that referenced this pull request Apr 11, 2023
…5495)

pythonGH-25309 enabled SSL_OP_IGNORE_UNEXPECTED_EOF by default, with a comment
that it restores OpenSSL 1.1.1 behavior, but this wasn't quite right.
That option causes OpenSSL to treat transport EOF as the same as
close_notify (i.e. SSL_ERROR_ZERO_RETURN), whereas Python actually has
distinct SSLEOFError and SSLZeroReturnError exceptions. (The latter is
usually mapped to a zero return from read.) In OpenSSL 1.1.1, the ssl
module would raise them for transport EOF and close_notify,
respectively. In OpenSSL 3.0, both act like close_notify.

Fix this by, instead, just detecting SSL_R_UNEXPECTED_EOF_WHILE_READING
and mapping that to the other exception type.

There doesn't seem to have been any unit test of this error, so fill in
the missing one. This had to be done with the BIO path because it's
actually slightly tricky to simulate a transport EOF with Python's fd
based APIs. (If you instruct the server to close the socket, it gets
confused, probably because the server's SSL object is still referencing
the now dead fd?)
WillChilds-Klein added a commit to aws/aws-lc that referenced this pull request Jan 19, 2024
# Notes

This change implements `SSL_MODE_AUTO_RETRY`, which was previously ignored. When this option (off by default) is set, `SSL_get_error`s behavior hews a little more closely to OpenSSL when processing a return code of 0 (i.e. empty I/O). With this option set, we allow error processing for empty reads if we're in a retryable state. With this option unset, we treat (most) 0-valued return codes as `SSL_ERROR_SYSCALL`, using it as a sort-of alias for EOF in the underlying transport. Internal calls to `SSL_get_error` (such as [those in `bio_ssl.cc`](https://github.com/aws/aws-lc/blob/main/ssl/bio_ssl.cc#L29)) then observe the  `SSL_ERROR_SYSCALL` and fail to process potential retries.

OpenSSL implements this mode differently, only checking its value in its [main read loop](https://github.com/openssl/openssl/blob/398011848468c7e8e481b295f7904afc30934217/ssl/record/rec_layer_s3.c#L992). Neither OpenSSL [1.1.1][6] nor [OpenSSL 3.x][10] return `SSL_ERROR_SYSCALL` early for empty reads, instead allowing them to be processed for retryable state. BoringSSL [appears to have diverged][7] from this behavior in 2015, but that commit was just a revert of a regression committed ~1 month prior. Their [previous implementation][18] closely resembles [that of OpenSSL 1.0.2][13], guarding retryable error processing with `i < 0` checks. `SSL_get_error` in [OpenSSL 1.1.1][12] and [OpenSSL 3.0][11], however, have no such guards. This change resembles the latter two. 

Why is it important that we process 0-valued return codes more fully? The recommended SSL IO functions (`SSL_read`, `SSL_write`, etc.) are, by contract, allowed to return negative numbers to indicate an error or non-success condition. However, their `size_t`-clean counterparts (`SSL_read_ex`, `SSL_write_ex`, etc.) must, by contract, only return 0 or 1. This means that passing the return code from the various `_ex` functions to `SSL_get_error` loses error reporting information, resulting in inconsistent handling of otherwise identical error cases between the two function families.

Another solution to the `_ex` functions' loss of error reporting information would be to change the contract of those functions to allow negative return values, directly returning the return code from internal calls to `SSL_read` or `SSL_write`. I decided not to take this approach as it would break callers who do not account for negative return values in their error handling.

# Relevant Links

- [OpenSSL 1.0.2 documentation for `SSL_get_error`][3]
- [OpenSSL 1.1.1 documentation for `SSL_get_error`][4]
- [OpenSSL 3.0 documentation for `SSL_get_error`][5]
- [OpenSSL 1.0.2 implementation for `SSL_get_error`][13]
- [OpenSSL 1.1.1 implementation for `SSL_get_error`][12]
- [OpenSSL 3.0 implementation for `SSL_get_error`][11]
- [OpenSSL master (3.2) implementation for `SSL_get_error`][10]
- [OpenSSL 1.1.1 commit dropping `ret < 0` guards for hint processing][6]
- [Unresolved OpenSSL issue for distinguishing EOF from failure in `BIO_read_ex`][1]
- [BoringSSL commit introducing `SSL_ERROR_SYSCALL` for `ret == 0` case][7]
- [BoringSSL Issue 503: Stop preserving arbitrary BIO error return values][2]
- [BoringSSL Issue 24: Sanitize SSL and BIO return codes.][15]
- [OpenSSL 3.0 commit returning `SSL_ERROR_SSL`, add reason to err stack on EOF][14]
- [OpenSSL 3.0 commit introducing `SSL_OP_IGNORE_UNEXPECTED_EOF`  option][8]
- [CPython commit from @davidben removing CPython's use of `SSL_OP_IGNORE_UNEXPECTED_EOF`][9]
- [CPython commit switching from `SSL_read` and `SSL_write` to `SSL_read_ex` and `SSL_write_ex`][16]
- [OpenSSL 1.1.1 documentation for `SSL_MODE_AUTO_RETRY`][17]



[1]: openssl/openssl#8208
[2]: https://bugs.chromium.org/p/boringssl/issues/detail?id=503
[3]: https://www.openssl.org/docs/man1.0.2/man3/SSL_get_error.html
[4]: https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html
[5]: https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html
[6]: openssl/openssl@8051ab2
[7]: google/boringssl@9a38e92
[8]: openssl/openssl@09b90e0
[9]: python/cpython#95495
[10]: https://github.com/openssl/openssl/blob/master/ssl/ssl_lib.c#L4601
[11]: https://github.com/openssl/openssl/blob/openssl-3.0/ssl/ssl_lib.c#L3839
[12]: https://github.com/openssl/openssl/blob/OpenSSL_1_1_1-stable/ssl/ssl_lib.c#L3617
[13]: https://github.com/openssl/openssl/blob/OpenSSL_1_0_2-stable/ssl/ssl_lib.c#L2709
[14]: openssl/openssl@d924dbf
[15]: https://bugs.chromium.org/p/boringssl/issues/detail?id=24
[16]: python/cpython@89d1550
[17]: https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_mode.html
[18]: google/boringssl@fcf2583#diff-38e1eecb6013c2fe8bf194e5020ce1afd6e31c371d7849e041464ad822fe745bL2271
WillChilds-Klein added a commit to WillChilds-Klein/aws-lc that referenced this pull request Jan 19, 2024
This change implements `SSL_MODE_AUTO_RETRY`, which was previously ignored. When this option (off by default) is set, `SSL_get_error`s behavior hews a little more closely to OpenSSL when processing a return code of 0 (i.e. empty I/O). With this option set, we allow error processing for empty reads if we're in a retryable state. With this option unset, we treat (most) 0-valued return codes as `SSL_ERROR_SYSCALL`, using it as a sort-of alias for EOF in the underlying transport. Internal calls to `SSL_get_error` (such as [those in `bio_ssl.cc`](https://github.com/aws/aws-lc/blob/main/ssl/bio_ssl.cc#L29)) then observe the  `SSL_ERROR_SYSCALL` and fail to process potential retries.

OpenSSL implements this mode differently, only checking its value in its [main read loop](https://github.com/openssl/openssl/blob/398011848468c7e8e481b295f7904afc30934217/ssl/record/rec_layer_s3.c#L992). Neither OpenSSL [1.1.1][6] nor [OpenSSL 3.x][10] return `SSL_ERROR_SYSCALL` early for empty reads, instead allowing them to be processed for retryable state. BoringSSL [appears to have diverged][7] from this behavior in 2015, but that commit was just a revert of a regression committed ~1 month prior. Their [previous implementation][18] closely resembles [that of OpenSSL 1.0.2][13], guarding retryable error processing with `i < 0` checks. `SSL_get_error` in [OpenSSL 1.1.1][12] and [OpenSSL 3.0][11], however, have no such guards. This change resembles the latter two.

Why is it important that we process 0-valued return codes more fully? The recommended SSL IO functions (`SSL_read`, `SSL_write`, etc.) are, by contract, allowed to return negative numbers to indicate an error or non-success condition. However, their `size_t`-clean counterparts (`SSL_read_ex`, `SSL_write_ex`, etc.) must, by contract, only return 0 or 1. This means that passing the return code from the various `_ex` functions to `SSL_get_error` loses error reporting information, resulting in inconsistent handling of otherwise identical error cases between the two function families.

Another solution to the `_ex` functions' loss of error reporting information would be to change the contract of those functions to allow negative return values, directly returning the return code from internal calls to `SSL_read` or `SSL_write`. I decided not to take this approach as it would break callers who do not account for negative return values in their error handling.

- [OpenSSL 1.0.2 documentation for `SSL_get_error`][3]
- [OpenSSL 1.1.1 documentation for `SSL_get_error`][4]
- [OpenSSL 3.0 documentation for `SSL_get_error`][5]
- [OpenSSL 1.0.2 implementation for `SSL_get_error`][13]
- [OpenSSL 1.1.1 implementation for `SSL_get_error`][12]
- [OpenSSL 3.0 implementation for `SSL_get_error`][11]
- [OpenSSL master (3.2) implementation for `SSL_get_error`][10]
- [OpenSSL 1.1.1 commit dropping `ret < 0` guards for hint processing][6]
- [Unresolved OpenSSL issue for distinguishing EOF from failure in `BIO_read_ex`][1]
- [BoringSSL commit introducing `SSL_ERROR_SYSCALL` for `ret == 0` case][7]
- [BoringSSL Issue 503: Stop preserving arbitrary BIO error return values][2]
- [BoringSSL Issue 24: Sanitize SSL and BIO return codes.][15]
- [OpenSSL 3.0 commit returning `SSL_ERROR_SSL`, add reason to err stack on EOF][14]
- [OpenSSL 3.0 commit introducing `SSL_OP_IGNORE_UNEXPECTED_EOF`  option][8]
- [CPython commit from @davidben removing CPython's use of `SSL_OP_IGNORE_UNEXPECTED_EOF`][9]
- [CPython commit switching from `SSL_read` and `SSL_write` to `SSL_read_ex` and `SSL_write_ex`][16]
- [OpenSSL 1.1.1 documentation for `SSL_MODE_AUTO_RETRY`][17]

[1]: openssl/openssl#8208
[2]: https://bugs.chromium.org/p/boringssl/issues/detail?id=503
[3]: https://www.openssl.org/docs/man1.0.2/man3/SSL_get_error.html
[4]: https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html
[5]: https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html
[6]: openssl/openssl@8051ab2
[7]: google/boringssl@9a38e92
[8]: openssl/openssl@09b90e0
[9]: python/cpython#95495
[10]: https://github.com/openssl/openssl/blob/master/ssl/ssl_lib.c#L4601
[11]: https://github.com/openssl/openssl/blob/openssl-3.0/ssl/ssl_lib.c#L3839
[12]: https://github.com/openssl/openssl/blob/OpenSSL_1_1_1-stable/ssl/ssl_lib.c#L3617
[13]: https://github.com/openssl/openssl/blob/OpenSSL_1_0_2-stable/ssl/ssl_lib.c#L2709
[14]: openssl/openssl@d924dbf
[15]: https://bugs.chromium.org/p/boringssl/issues/detail?id=24
[16]: python/cpython@89d1550
[17]: https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_mode.html
[18]: google/boringssl@fcf2583#diff-38e1eecb6013c2fe8bf194e5020ce1afd6e31c371d7849e041464ad822fe745bL2271
dougch pushed a commit to dougch/aws-lc that referenced this pull request Jan 30, 2024
# Notes

This change implements `SSL_MODE_AUTO_RETRY`, which was previously ignored. When this option (off by default) is set, `SSL_get_error`s behavior hews a little more closely to OpenSSL when processing a return code of 0 (i.e. empty I/O). With this option set, we allow error processing for empty reads if we're in a retryable state. With this option unset, we treat (most) 0-valued return codes as `SSL_ERROR_SYSCALL`, using it as a sort-of alias for EOF in the underlying transport. Internal calls to `SSL_get_error` (such as [those in `bio_ssl.cc`](https://github.com/aws/aws-lc/blob/main/ssl/bio_ssl.cc#L29)) then observe the  `SSL_ERROR_SYSCALL` and fail to process potential retries.

OpenSSL implements this mode differently, only checking its value in its [main read loop](https://github.com/openssl/openssl/blob/398011848468c7e8e481b295f7904afc30934217/ssl/record/rec_layer_s3.c#L992). Neither OpenSSL [1.1.1][6] nor [OpenSSL 3.x][10] return `SSL_ERROR_SYSCALL` early for empty reads, instead allowing them to be processed for retryable state. BoringSSL [appears to have diverged][7] from this behavior in 2015, but that commit was just a revert of a regression committed ~1 month prior. Their [previous implementation][18] closely resembles [that of OpenSSL 1.0.2][13], guarding retryable error processing with `i < 0` checks. `SSL_get_error` in [OpenSSL 1.1.1][12] and [OpenSSL 3.0][11], however, have no such guards. This change resembles the latter two. 

Why is it important that we process 0-valued return codes more fully? The recommended SSL IO functions (`SSL_read`, `SSL_write`, etc.) are, by contract, allowed to return negative numbers to indicate an error or non-success condition. However, their `size_t`-clean counterparts (`SSL_read_ex`, `SSL_write_ex`, etc.) must, by contract, only return 0 or 1. This means that passing the return code from the various `_ex` functions to `SSL_get_error` loses error reporting information, resulting in inconsistent handling of otherwise identical error cases between the two function families.

Another solution to the `_ex` functions' loss of error reporting information would be to change the contract of those functions to allow negative return values, directly returning the return code from internal calls to `SSL_read` or `SSL_write`. I decided not to take this approach as it would break callers who do not account for negative return values in their error handling.

# Relevant Links

- [OpenSSL 1.0.2 documentation for `SSL_get_error`][3]
- [OpenSSL 1.1.1 documentation for `SSL_get_error`][4]
- [OpenSSL 3.0 documentation for `SSL_get_error`][5]
- [OpenSSL 1.0.2 implementation for `SSL_get_error`][13]
- [OpenSSL 1.1.1 implementation for `SSL_get_error`][12]
- [OpenSSL 3.0 implementation for `SSL_get_error`][11]
- [OpenSSL master (3.2) implementation for `SSL_get_error`][10]
- [OpenSSL 1.1.1 commit dropping `ret < 0` guards for hint processing][6]
- [Unresolved OpenSSL issue for distinguishing EOF from failure in `BIO_read_ex`][1]
- [BoringSSL commit introducing `SSL_ERROR_SYSCALL` for `ret == 0` case][7]
- [BoringSSL Issue 503: Stop preserving arbitrary BIO error return values][2]
- [BoringSSL Issue 24: Sanitize SSL and BIO return codes.][15]
- [OpenSSL 3.0 commit returning `SSL_ERROR_SSL`, add reason to err stack on EOF][14]
- [OpenSSL 3.0 commit introducing `SSL_OP_IGNORE_UNEXPECTED_EOF`  option][8]
- [CPython commit from @davidben removing CPython's use of `SSL_OP_IGNORE_UNEXPECTED_EOF`][9]
- [CPython commit switching from `SSL_read` and `SSL_write` to `SSL_read_ex` and `SSL_write_ex`][16]
- [OpenSSL 1.1.1 documentation for `SSL_MODE_AUTO_RETRY`][17]



[1]: openssl/openssl#8208
[2]: https://bugs.chromium.org/p/boringssl/issues/detail?id=503
[3]: https://www.openssl.org/docs/man1.0.2/man3/SSL_get_error.html
[4]: https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html
[5]: https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html
[6]: openssl/openssl@8051ab2
[7]: google/boringssl@9a38e92
[8]: openssl/openssl@09b90e0
[9]: python/cpython#95495
[10]: https://github.com/openssl/openssl/blob/master/ssl/ssl_lib.c#L4601
[11]: https://github.com/openssl/openssl/blob/openssl-3.0/ssl/ssl_lib.c#L3839
[12]: https://github.com/openssl/openssl/blob/OpenSSL_1_1_1-stable/ssl/ssl_lib.c#L3617
[13]: https://github.com/openssl/openssl/blob/OpenSSL_1_0_2-stable/ssl/ssl_lib.c#L2709
[14]: openssl/openssl@d924dbf
[15]: https://bugs.chromium.org/p/boringssl/issues/detail?id=24
[16]: python/cpython@89d1550
[17]: https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_mode.html
[18]: google/boringssl@fcf2583#diff-38e1eecb6013c2fe8bf194e5020ce1afd6e31c371d7849e041464ad822fe745bL2271
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants