Bugfix: Send the TLS alert to the peer upon a ssl.CertificateError. #3413

Open
sasozivanovic wants to merge 8 commits into python-trio:main from sasozivanovic:ssl_errors

Conversation

@sasozivanovic

Disclaimer: I'm new to networking, async stuff, and especially SSL (just a regular programmer who wants to inform the client that their certificate expired), while SSLStream._retry is claimed to be "probably the single trickiest function in Trio" in the code comments. I tried my best, and learned a lot on the way, but this pull request really has to be reviewed, and thoroughly, by someone who knows very well what they are doing. 😁

This commit (partially) addresses python-trio/trio-websocket#199, which I traced to a bug in Trio itself. A corresponding bugfix relying on the present bugfix will be submitted to the Trio Websocket repo.

From the user's perspective, the issue was that a TLS client hung after submitting an invalid (e.g. expired) client certificate.

Before this fix, when SSLStream._retry caught a ssl.CertificateError, the error was immediately re-raised (wrapped in a trio.BrokenResourceError). The TLS alert prepared by the ssl library and waiting in the outgoing MemoryBIO was therefore never sent to the peer.

Now, upon catching a ssl.CertificateError, we first check whether we have any pending outgoing data (self._outgoing.pending). If so, the exception is stashed away and only raised (again, wrapped in a trio.BrokenResourceError) after sending out the pending data (which should contain the alert).

(I had first tried an alternative implementation, where the CertificateError was not stashed away. Rather, the error was raised immediately, but only if there was no pending outgoing data. The idea was to rely on the loop in SSLStream._retry. Upon seeing CertificateError for the first time, there would be pending data, so the exception would not be reraised and the pending data would be sent out, while upon seeing the CertificateError for the second time, there would be no pending data, so the exception would be re-raised. However, it turned out that the loop did not always continue (I'm not sure why), so there was no second time in some situations.)
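The stash-and-flush control flow described above can be illustrated with a minimal, self-contained sketch. To be clear, this is not trio's actual `_retry` (which is far trickier): `FakeBIO`, `retry_sketch`, and the local `BrokenResourceError` are invented stand-ins used only to show the ordering of flush vs. raise.

```python
import ssl

class FakeBIO:
    """Stand-in for ssl.MemoryBIO, tracking pending outgoing bytes."""
    def __init__(self):
        self._buf = b""
    @property
    def pending(self):
        return len(self._buf)
    def write(self, data):
        self._buf += data
    def read(self):
        data, self._buf = self._buf, b""
        return data

class BrokenResourceError(Exception):
    """Stand-in for trio.BrokenResourceError."""

def retry_sketch(op, outgoing, send_all):
    """On CertificateError: if the ssl layer queued outgoing bytes (the
    alert), stash the error, flush the bytes, and only then raise."""
    stashed = None
    try:
        result = op()
    except ssl.CertificateError as exc:
        if outgoing.pending:
            stashed = exc  # defer the raise until the alert is on the wire
        else:
            raise BrokenResourceError from exc
    if outgoing.pending:
        send_all(outgoing.read())
    if stashed is not None:
        raise BrokenResourceError from stashed
    return result

# Demo: the operation fails verification, leaving alert bytes in the BIO.
sent = []
bio = FakeBIO()
def failing_op():
    bio.write(b"\x15\x03\x03")  # pretend the ssl module queued an alert
    raise ssl.CertificateError("certificate has expired")
try:
    retry_sketch(failing_op, bio, sent.append)
except BrokenResourceError:
    pass
assert sent, "the alert bytes were flushed before the error propagated"
```

The key design point is that the exception is held in a local variable rather than re-raised in place, so the existing "send pending data" step runs first.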

Tests in test_ssl.py

test_ssl_client_basics (modified)

Here we test whether the TLS alert sent out by the client reaches the (blocking) server.

The second (no CA file) and the third (wrong host name) subtest of this test were modified to check that the server encounters the correct SSL error. In the old code, the server encountered UNEXPECTED_EOF_WHILE_READING (protocol error) in both subtests. After the fix, it correctly receives TLSV1_ALERT_UNKNOWN_CA and SSLV3_ALERT_BAD_CERTIFICATE, respectively.

To facilitate the modified test, the function ssl_echo_serve_sync (called by ssl_echo_server_raw, which is called by this test) now accepts a special value, "raise", for the keyword argument expect_fail: the error is still expected, but is re-raised so that it can be caught and inspected in the test code.

test_client_certificate (new)

Here we test whether the TLS alert sent out by the server reaches the (blocking) client. The test is modeled on test_ssl_server_basics.

The server is configured to require client authentication (SERVER_CTX_REQ, defined at the top of the file). In the first subtest, the client submits a valid certificate; in the second subtest, an expired one.

There is a complication with the second subtest. If the client does not send out the TLS alert (like before the bugfix), the server hangs, but we don't want the test to hang. I could think of no other way to test whether the server hangs than imposing an arbitrary (small) timeout, and there is a (very) small chance that even the correct code will not finish within the allotted time, resulting in a false negative.
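The hang-detection idea above can be sketched with a plain threading.Event wait. This is a stdlib-only stand-in for the test's actual thread and trio.Event machinery, and `wait_or_fail` is an invented name; it just shows the pattern of turning a potential hang into a test failure, with a generous timeout to reduce the false-negative risk on slow CI runners.

```python
import threading

def wait_or_fail(got_alert: threading.Event, timeout: float = 1.0):
    """Wait for the alert; fail the test instead of hanging forever.
    A too-small timeout risks false negatives on slow machines, so
    err on the generous side."""
    if not got_alert.wait(timeout):
        raise AssertionError("peer never delivered the TLS alert")

# Demo: a background "client" that does deliver the alert in time.
ev = threading.Event()
threading.Timer(0.05, ev.set).start()
wait_or_fail(ev, timeout=2.0)
```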

@codecov

codecov bot commented Mar 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00000%. Comparing base (82af3ab) to head (71fe831).

Additional details and impacted files
@@               Coverage Diff               @@
##                 main        #3413   +/-   ##
===============================================
  Coverage   100.00000%   100.00000%           
===============================================
  Files             128          128           
  Lines           19417        19482   +65     
  Branches         1318         1321    +3     
===============================================
+ Hits            19417        19482   +65     
Files with missing lines    | Coverage               | Δ
src/trio/_ssl.py            | 100.00000% <100.00000%> | (ø)
src/trio/_tests/test_ssl.py | 100.00000% <100.00000%> | (ø)

@sasozivanovic
Author

On the coverage report. The missing code will never be reached assuming that every ssl.CertificateError produces pending outgoing data (a TLS alert). I have no idea whether this is the case.

sasozivanovic added a commit to sasozivanovic/trio-websocket that referenced this pull request Mar 28, 2026
This commit relies on the bugfix proposed in PR python-trio/trio#3413 to
address issue python-trio#199.

From the user's perspective, the issue was that a TLS client hung after
submitting an invalid (e.g. expired) client certificate.

The trio bugfix makes sure that the server (in general, peer) sends out the TLS
alert after catching a `ssl.CertificateError`.  In this commit, we do two
things.

First, we intercept `ssl.SSLError`s created by the alerts when receiving data
from the stream (`WebSocketConnection._reader_task`) and sending data to the
stream (`WebSocketConnection._send`), and wrap them into a `HandshakeError`.

Second, we catch and ignore these `HandshakeError`s in
`WebSocketServer._handle_connection`, so that a certificate error does not
crash the server.  Here, I'm unsure whether we should ignore *all* SSL
errors (currently, this is the case), or maybe only those generated by alerts,
or only those having to do with certificates, and neither do I know how to
compile a comprehensive list of the latter, if that is what should be done.
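The two steps described in this commit message could look roughly like the following stand-alone sketch. `HandshakeError`, `recv_wrapped`, and `handle_connection` are invented stand-ins, not trio-websocket's actual methods; the point is only the wrap-then-swallow shape.

```python
import ssl

class HandshakeError(Exception):
    """Stand-in for a trio-websocket handshake-failure exception."""

def recv_wrapped(recv):
    # Step 1: translate SSL errors raised while talking to the stream
    # into a handshake failure the caller can recognize.
    try:
        return recv()
    except ssl.SSLError as exc:
        raise HandshakeError("TLS handshake failed") from exc

def handle_connection(run_handler):
    # Step 2: on the server side, a client's bad certificate must not
    # crash the server, so the handshake failure is swallowed here.
    try:
        run_handler()
    except HandshakeError:
        pass  # real code would likely log this instead

# Demo: a read that fails with an alert-generated SSL error.
def bad_recv():
    raise ssl.SSLError("TLSV1_ALERT_UNKNOWN_CA")

handle_connection(lambda: recv_wrapped(bad_recv))
print("server still running")
```

Whether to swallow *all* SSL errors or only certificate-related ones is exactly the open question raised above; this sketch swallows everything that was wrapped.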
@A5rocks
Contributor

A5rocks commented Mar 28, 2026

(similar to you, I don't especially know much about networking....)

I could think of no other way to test whether the server hangs than imposing an arbitrary (small) timeout, and there is a (very) small chance that even the correct code will not finish within the allotted time, resulting in a false negative.

You can drop this -- we already have general timeouts for tests. We have other tests which do similar evil things (specifically the REPL, if I remember correctly).

However, it turned out that the loop did not always continue (I'm not sure why), so there was no second time in some situations

Huh I'm surprised the issue isn't instead that CertificateError always adds pending data, meaning an infinite loop! Strange.

On the coverage report. The missing code will never be reached assuming that every ssl.CertificateError produces pending outgoing data (a TLS alert). I have no idea whether this is the case.

IMO add an assert instead of handling the case, even though the behavior seems right.


Looks like CI is failing! Also please rename the newsfragment, I think 3413 instead of NNNN should work. Thanks for the PR!

According to the coverage report, `self._outgoing.pending` is always non-empty.
`datetime.UTC` was only introduced in Python 3.11, so the CIs were failing on
setups with Python 3.10.
The additional semantics for `expect_fail` still seems less invasive than a new
keyword argument like `silent`.
@sasozivanovic
Author

Thanks for the quick reply!

I could think of no other way to test whether the server hangs than imposing an arbitrary (small) timeout, and there is a (very) small chance that even the correct code will not finish within the allotted time, resulting in a false negative.

You can drop this -- we already have general timeouts for tests. We have other tests which do similar evil things (specifically the REPL, if I remember correctly).

I'm not sure about this. The general timeout is not only rather long (1 minute), it also doesn't do the job: the test client works in a thread, but this thread is not abandoned upon the general timeout, so the test hangs.

I followed the rest of your suggestions — thanks for the ideas and reminders! I also looked at the CI reports, and I managed to fix all issues but one (occurring twice):

src/trio/_tests/test_ssl.py:482: error: Incompatible types in assignment (expression has type "ExceptionInfo[BrokenResourceError]", variable has type "ExceptionInfo[ExceptionGroup[SSLError]]") [assignment]

I first thought this was an issue with exception group nesting, so I switched from `with pytest.raises(BrokenResourceError)` to `pytest.RaisesGroup(BrokenResourceError, allow_unwrapped=True, flatten_subgroups=True)` in 02aee02, but this was a wrong move. The real problem is BrokenResourceError vs SSLError. I find this very weird: CI / Ubuntu (3.13, check formatting) is the only check which complains ... and if I switch from BrokenResourceError to SSLError, the test doesn't pass on my machine — which I understand: expecting a BrokenResourceError is what makes sense, because `._retry` (called by `.send_all`) should wrap the SSLError into a BrokenResourceError. I'll look more into this, but help would be greatly appreciated.

@A5rocks
Contributor

A5rocks commented Mar 29, 2026

My first guess about the 3 mypy errors (not looking at the code) is that the first is because you forgot to explicitly provide a type somewhere, and the two you mention I suspect are because the variable name is reused? (mypy is very picky that variables always mean the same type! at least by default, I forget if we changed the right settings...)


The general timeout is not only rather long (1 minute), it also doesn't do the job: the test client works in a thread, but this thread is not abandoned upon the general timeout, so the test hangs.

Hm, yeah, I suppose. There's other tests that hang rather than fail, though that's definitely a worse experience (but... as long as CI fails if this is broken, all's well!). I guess it's up to you, though make sure to size the timeout large enough because the free GitHub Actions runners we use are... pretty bad -- a sleep(1) can sleep 2 seconds!

Contributor

@A5rocks A5rocks left a comment


Hopefully these specific suggestions help? It's possible there's a mypy bug or something weirder going on...

@sasozivanovic
Author

@A5rocks Many thanks, I'll look into this right away!

@sasozivanovic
Author

sasozivanovic commented Mar 29, 2026

@A5rocks Fantastic, your suggestions worked like a charm!

@sasozivanovic
Author

The general timeout is not only rather long (1 minute), it also doesn't do the job: the test client works in a thread, but this thread is not abandoned upon the general timeout, so the test hangs.

Hm, yeah, I suppose. There's other tests that hang rather than fail, though that's definitely a worse experience (but... as long as CI fails if this is broken, all's well!). I guess it's up to you, though make sure to size the timeout large enough because the free GitHub Actions runners we use are... pretty bad -- a sleep(1) can sleep 2 seconds!

I'm thinking of increasing the timeout from 0.1s to 1s, just to be on the safe side. The thing is, this is not a timeout on the entire test, but on a very specific part of it, between the client receiving the TLS alert and setting the trio.Event.
