Reduce danger of accidental exception suppression #30
Consider the following code. It consists of an echo server with a bug, and a unit test for the server:
```python
#!/usr/bin/env python3
import pytest
import pytest_trio
import trio
import logging

log = logging.getLogger(__name__)

PORT = 12346
BUFSIZE = 16384


async def echo_server(server_sock):
    try:
        with server_sock:
            await server_sock.recv(BUFSIZE)
            raise RuntimeError('Damn!')
    except Exception as exc:
        await trio.sleep(0.1)  # make problem deterministic
        log.exception('Things went wrong...')


async def echo_listener(nursery):
    with trio.socket.socket() as listen_sock:
        await listen_sock.bind(("127.0.0.1", PORT))
        listen_sock.listen()
        while True:
            server_sock, _ = await listen_sock.accept()
            nursery.start_soon(echo_server, server_sock)


async def parent():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(echo_listener, nursery)


@pytest.mark.trio
async def test_server(nursery):
    nursery.start_soon(parent)
    await trio.sleep(0.5)  # wait for server to listen
    with trio.socket.socket() as client_sock:
        await client_sock.connect(("127.0.0.1", PORT))
        await client_sock.send(b'hello, world')
        resp = await client_sock.recv(BUFSIZE)
        assert resp == b'hello, world'
```
As expected, the unit test fails, but the root cause (the server's RuntimeError) is nowhere to be seen in the output.
If we force the test to wait a moment before failing...
```python
@pytest.mark.trio
async def test_server(nursery):
    nursery.start_soon(parent)
    await trio.sleep(0.5)  # wait for server to listen
    try:
        with trio.socket.socket() as client_sock:
            await client_sock.connect(("127.0.0.1", PORT))
            await client_sock.send(b'hello, world')
            resp = await client_sock.recv(BUFSIZE)
            assert resp == b'hello, world'
    finally:
        await trio.sleep(0.5)
```
...we get much better results: the server's logged exception now shows up.
I believe the problem is that the failing test causes trio to cancel the server task before it has a chance to log its exception. Note that the call to `trio.sleep(0.1)` in the exception handler is only there to make the problem reproduce deterministically.
I am not sure how best to fix this. Maybe the trio test runner could, if the test fails, give the other nursery tasks a short grace period in which they have a chance to report any errors before cancelling them?
So first, as an FYI, your example code can be simplified a lot by using Trio's high-level networking interface – it's kinda new and not documented as well as it could be yet, so I'm not surprised you missed it, but this is definitely the recommended way to go:
```python
from functools import partial


async def echo_server(server_stream):
    try:
        # Normally, this 'with' isn't really necessary, because serve_tcp will
        # automatically close server_stream when the echo_server function
        # returns. But it is needed to trigger the problem that this issue is
        # about :-).
        async with server_stream:
            await server_stream.receive_some(BUFSIZE)
            raise RuntimeError('Damn!')
    except Exception as exc:
        await trio.sleep(0.1)  # make problem deterministic
        log.exception('Things went wrong...')


@pytest.mark.trio
async def test_server(nursery):
    await nursery.start(partial(trio.serve_tcp, echo_server, PORT, host="127.0.0.1"))
    # nursery.start and trio.serve_tcp coordinate to make sure that we don't
    # continue until the server is ready to accept connections, so no need to
    # sleep here
    async with await trio.open_tcp_stream("127.0.0.1", PORT) as client_stream:
        await client_stream.send_all(b'hello, world')
        resp = await client_stream.receive_some(BUFSIZE)
        assert resp == b'hello, world'
```
This also makes it easy to handle IPv6 and SSL by just changing the serve_tcp call. You could also let the OS automatically pick a free port by making it:
```python
listeners = await nursery.start(partial(trio.serve_tcp, echo_server, 0, host="127.0.0.1"))
async with await trio.testing.open_stream_to_socket_listener(listeners[0]) as client_stream:
    ...
```
I think I'm also required by the ancient laws governing socket programmers to point out that this test is broken in any case, because you can't assume that a single call to `recv` will return the complete message – TCP is a byte stream, and the data can arrive split across several reads.
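For illustration, here's a sketch of the usual fix – a hypothetical `receive_exactly` helper (the name and loop are mine, not part of Trio) that keeps reading until the expected number of bytes has arrived:

```python
async def receive_exactly(stream, nbytes):
    # Hypothetical helper (not a Trio API): works with any object that has an
    # async receive_some(max_bytes) method. Loops until nbytes bytes have
    # arrived, or the stream signals EOF by returning b"".
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = await stream.receive_some(remaining)
        if not chunk:  # EOF before we got everything
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

In the test above, `resp = await receive_exactly(client_stream, len(b'hello, world'))` would then be robust against partial reads.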
Anyway, none of that has to do with your actual problem, I just wanted to point that out :-).
So, regarding your actual issue: this is tricky!
The key features of the server that make this happen seem to be: it raises an error; the error handling ends up sending a message to the client (in this case, by closing the socket); and after that the error handling tries to report something – but in between those two steps there is a checkpoint where the task can be cancelled.
So in your original example, the call to `trio.sleep(0.1)` inside the `except` block is the checkpoint where the cancellation gets delivered, which means `log.exception` never runs.
Though you could fix that case too, by moving the logging call so it happens before the checkpoint.
Really this problem is just inherent in how exceptions work. If you block inside an `except` handler, then a cancellation (or any other exception) delivered at that point wipes out the exception you were in the middle of handling.
Maybe what we really need here is some kind of better tracking of exceptions that are still in the process of being handled.
I agree that the only plausible workaround at the pytest-trio level is to make the test running code look something like:
```python
try:
    await actual_test_fn(**kwargs)
except Exception:
    await trio.sleep(0.1)
    raise
```
I am... really unsure about whether this is a good idea. Running tests fast is really important, and adding a sleep to every failure will slow down everyone, even though I suspect that this particular case is quite rare. Though that's another thing that makes me nervous: we have no idea how often this happens or any way to figure out whether this hack is actually helping. Is 0.1 seconds even long enough? Is it way too long? How do we know?
Yeah, I know it can be simplified and improved. The goal was to quickly get a minimal example that reproduces the issue.
I agree, the core of the problem is actually not related to pytest at all, but is a result of the way that exceptions and nurseries work. If you have a nursery that starts two tasks, and the interaction of these tasks results in an exception being raised in each task, then only one exception is guaranteed to make it out of the nursery. The second one has to race against the cancellation that the first exception triggers.
That said, why wouldn't it be feasible to introduce an optional grace period for task cancellation – at least in the context of testing? In other words, if one task terminates with an exception, wait up to TIMEOUT seconds for other tasks to raise exceptions too, and only afterwards send the cancellation request. There is no good value for TIMEOUT in general, but in a testing context even a rough default would help.
It could be done, but I'd really rather avoid doing that if at all possible. Almost no-one's going to know that this mode exists, and for those who do, the only way to realize you should turn it on is if you've already figured out what's happening :-).
See python-trio/trio#416 for another idea.
I ran into this problem in my own project.
The solution I came up with is to wrap the socket used in the tests to catch the error raised when the connection breaks and re-raise it as a custom BrokenStreamError.
Then the code responsible for starting the server and connecting to it is packaged in an async context manager that catches the custom BrokenStreamError and waits forever instead of re-raising it.
This way we can be sure that the exception coming from the server is the one that bubbles up to the user.