StreamReader.readexactly() raises GeneratorExit on ProactorEventLoop #83297

Closed
twisteroidambassador mannequin opened this issue Dec 21, 2019 · 18 comments
Labels: pending, topic-asyncio, type-bug

Comments

@twisteroidambassador

twisteroidambassador mannequin commented Dec 21, 2019

BPO 39116
Nosy @asvetlov, @gsalgado, @1st1, @twisteroidambassador, @MatthewScholefield
Files
  • prettysocks.py
  • error_log_on_linux_python38.txt
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-12-21.16:04:36.435>
    labels = ['type-bug', '3.8', '3.9', 'expert-asyncio']
    title = 'StreamReader.readexactly() raises GeneratorExit on ProactorEventLoop'
    updated_at = <Date 2020-11-29.00:12:06.904>
    user = 'https://github.com/twisteroidambassador'

    bugs.python.org fields:

    activity = <Date 2020-11-29.00:12:06.904>
    actor = 'matthew'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['asyncio']
    creation = <Date 2019-12-21.16:04:36.435>
    creator = 'twisteroid ambassador'
    dependencies = []
    files = ['49532', '49534']
    hgrepos = []
    issue_num = 39116
    keywords = []
    message_count = 5.0
    messages = ['358774', '378931', '379149', '379205', '382027']
    nosy_count = 5.0
    nosy_names = ['asvetlov', 'salgado', 'yselivanov', 'twisteroid ambassador', 'matthew']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue39116'
    versions = ['Python 3.8', 'Python 3.9']

    @twisteroidambassador

    twisteroidambassador mannequin commented Dec 21, 2019

    I have been getting these strange exceptions since Python 3.8 on my Windows 10 machine. The external symptoms are many errors like "RuntimeError: aclose(): asynchronous generator is already running" and "Task was destroyed but it is pending!".

    By adding try/except logging around my code, I found that my StreamReaders would raise GeneratorExit from readexactly(). Digging deeper, it seems that the following line in StreamReader._wait_for_data():

    await self._waiter

    would raise a GeneratorExit.

    There are only two other methods on StreamReader that actually do anything to _waiter, set_exception() and _wakeup_waiter(), but neither of them was called before GeneratorExit was raised. In fact, both methods set self._waiter to None, so normally after _wait_for_data() does "await self._waiter", self._waiter is None. However, after GeneratorExit is raised, I can see that self._waiter is not None. So the GeneratorExit seems to come from nowhere.

    I have not been able to reproduce this behavior in other code. This is with Python 3.8.1 on the latest Windows 10 1909, using ProactorEventLoop. I don't remember ever seeing this on Python 3.7.
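
    A minimal sketch of the kind of instrumentation described above (the wrapper name and the two-byte read are hypothetical, not taken from the attached script):

    import asyncio
    import logging

    async def read_header(reader: asyncio.StreamReader) -> bytes:
        # Hypothetical wrapper: log any exception escaping readexactly(),
        # including GeneratorExit (a BaseException), then re-raise it.
        try:
            return await reader.readexactly(2)
        except BaseException:
            logging.exception('exception escaped readexactly()')
            raise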

    @twisteroidambassador

    twisteroidambassador mannequin commented Oct 19, 2020

    This problem still exists on Python 3.9 and latest Windows 10.

    I tried to catch the GeneratorExit and turn it into a normal Exception, and things only got weirder from there. Often, several lines later, another await statement would raise another GeneratorExit, such as writer.write() or even asyncio.sleep(). Whether or not I catch the additional GeneratorExit, once control leaves this coroutine a RuntimeError('coroutine ignored GeneratorExit') is raised. And no matter what I do with that RuntimeError, the outermost coroutine's Task always produces an 'asyncio Task was destroyed but it is pending!' error message.

    Taking a step back from this specific problem: does a "casual" user of asyncio need to worry about handling GeneratorExit? Can I assume that I should not see GeneratorExit in user code?
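
    For background, GeneratorExit is delivered at an await when the coroutine object itself is closed, which is what happens to the coroutine of a Task that gets destroyed. A minimal, self-contained illustration (not related to the reproducer):

    class _Suspend:
        # Minimal awaitable that suspends the awaiting coroutine once.
        def __await__(self):
            yield

    async def leaf():
        try:
            await _Suspend()
        except GeneratorExit:
            print('GeneratorExit delivered at the await point')
            # It must propagate; swallowing it turns into
            # "RuntimeError: coroutine ignored GeneratorExit".
            raise

    coro = leaf()
    coro.send(None)  # advance the coroutine to the await, where it suspends
    coro.close()     # close() throws GeneratorExit into the suspended coroutine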

    @twisteroidambassador

    twisteroidambassador mannequin commented Oct 20, 2020

    I have attached a script that should be able to reproduce this problem. It's not a minimal reproduction, but hopefully it is easy enough to trigger.

    The script is a SOCKS5 proxy server listening on localhost:1080. In its current form it does not need any external dependencies. Run it on Windows 10 + Python 3.9, set a browser to use the proxy server, and browse a little bit; it should soon start printing mysterious errors involving GeneratorExit.

    @twisteroidambassador

    twisteroidambassador mannequin commented Oct 21, 2020

    Well, this is unexpected: the same code running on Linux throws mysterious GeneratorExit-related exceptions as well. I'm not sure whether this is the same problem, but this one has a clearer traceback. I will attach the full error log, but the most pertinent part seems to be this:

    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "/usr/lib/python3.8/contextlib.py", line 662, in __aexit__
        cb_suppress = await cb(*exc_details)
      File "/usr/lib/python3.8/contextlib.py", line 189, in __aexit__
        await self.gen.athrow(typ, value, traceback)
      File "/opt/prettysocks/prettysocks.py", line 332, in closing_writer
        await writer.wait_closed()
      File "/usr/lib/python3.8/asyncio/streams.py", line 376, in wait_closed
        await self._protocol._get_close_waiter(self)
    RuntimeError: cannot reuse already awaited coroutine

    closing_writer() is an async context manager that calls close() and awaits wait_closed() on the given StreamWriter. So it looks like wait_closed() can occasionally reuse a coroutine?
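
    For context, such a helper is typically just a few lines built on contextlib.asynccontextmanager. A rough sketch of the shape described above (not the actual prettysocks.py code):

    import asyncio
    import contextlib

    @contextlib.asynccontextmanager
    async def closing_writer(writer: asyncio.StreamWriter):
        # Hand the writer to the caller, then always close it and wait for
        # the close to complete, even if the body raised.
        try:
            yield writer
        finally:
            writer.close()
            await writer.wait_closed()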

    @matthewscholefield

    matthewscholefield mannequin commented Nov 29, 2020

    Let me preface this by declaring that I am very new to Python async, so it is very possible that I am missing something seemingly obvious. That being said, I've been looking at various resources to try to understand the internals of asyncio, and it hasn't led to any insights on this problem thus far.
    -----------------

    This all sounds quite similar to an experience I am dealing with. I'm working with pub/sub in aioredis, which internally uses a StreamReader with a function equivalent to readexactly(). This all started with debugging "Task was destroyed but it is pending!", and my attempted fixes then led to multiple "RuntimeError: aclose(): asynchronous generator is already running" errors.

    I did the same thing, adding try/excepts everywhere in my code to understand what was happening, and this led me to identify that a regular async function would raise GeneratorExit during an await. However, even if I suppress this, the caller awaiting this function would also raise a GeneratorExit. Suppressing this exception at the top level leads to an unexpected (to me) error, "coroutine ignored GeneratorExit".

    I understand that GeneratorExit is raised in unfinished generators when they are garbage collected, to handle cleanup. And I understand that async functions are essentially generators in the sense that they yield when they await. So, if the entire coroutine were garbage collected, this might trigger a GeneratorExit in each nested coroutine. However, from all of my logging I am sure that, prior to the GeneratorExit, nothing returns upwards, so there should still be valid references to every object.
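
    For reference, the plain-generator behavior mentioned here can be shown in a few lines; g.close() below does explicitly what garbage collection of an unfinished generator does implicitly:

    def gen():
        try:
            yield 1
            yield 2
        finally:
            # Runs because closing the unfinished generator raises
            # GeneratorExit at the suspended yield.
            print('cleanup via GeneratorExit')

    g = gen()
    next(g)    # suspend the generator at "yield 1"
    g.close()  # throws GeneratorExit into it, running the finally block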

    I'll include some errors below, in case they are relevant:

    === Exception in await of inner async function ===

    Traceback (most recent call last):
      File ".../site-packages/uvicorn/protocols/http/httptools_impl.py", line 165, in data_received
        self.parser.feed_data(data)
      File "httptools/parser/parser.pyx", line 196, in httptools.parser.parser.HttpParser.feed_data
    httptools.parser.errors.HttpParserUpgrade: 858
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File ".../my_code.py", line 199, in wait_for_update
        return await self.waiter.wait_for_value()
    GeneratorExit

    === Exception when suppressing GeneratorExit on the top level ===

    Exception ignored in: <coroutine object parent_async_function at 0x0b...>
    Traceback (most recent call last):
      File ".../site-packages/websockets/protocol.py", line 229, in __init__
        self.reader = asyncio.StreamReader(limit=read_limit // 2, loop=loop)
    RuntimeError: coroutine ignored GeneratorExit

    @kumaraditya303

    First, the reproducer is very large; you should try to reduce it. I tried it and wasn't able to reproduce the reported errors. It is possible that this is already fixed, as the bug is 3 years old. @twisteroidambassador Can you try it on the latest main?

    @gvanrossum

    I wonder if the problem is solved by GH-96323? There was a task created by connection_made() (only when there was a callback for that event) that wasn't kept alive, so a poorly timed GC run would destroy the task.

    But the reproducer is definitely too big to definitively declare that this was the root cause.

    @twisteroidambassador

    Reproducing this bug has always been difficult. It happens maybe once or twice a day for my program that runs all day long. The only thing I can say for certain is that it still exists on Python 3.9.9 + Windows 10.

    I see that GH-96323 has been merged and backported to 3.10 and 3.11, but it hasn't made it into any release yet. I am going to build 3.10 from git and see whether it still has this bug.

    @twisteroidambassador

    twisteroidambassador commented Oct 11, 2022

    I just caught this bug in 3.10.7 (official installer).

    The traceback is no help at all. It's only one layer deep, like this:

    Traceback (most recent call last):
      File "***", line ***, in ***
        buf = await ureader.readexactly(2)
    GeneratorExit
    

    It is happening when a freshly-connected StreamReader is being read for the first time (although the associated StreamWriter has been written to once already), so GH-96323 does sound plausible.

    EDIT: oh, and a few lines below:

    ERROR    asyncio Task was destroyed but it is pending!
    task: <Task pending name='Task-2850' coro=<*** done, defined at ***> wait_for=<Future pending cb=[Task.task_wakeup()]>>
    

    @gvanrossum

    Could you post the exact code to reproduce? The traceback by itself is (as you say) not helpful.

    @gvanrossum

    FWIW that error about task destroyed but pending could mean you have code that calls create_task() but doesn't hold onto the task object until it's completed. (We added a note about this to the docs recently.)
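
    The pattern from that docs note looks roughly like this (the spawn() helper and the background_tasks set are illustrative names, not an asyncio API):

    import asyncio

    background_tasks = set()

    def spawn(coro) -> asyncio.Task:
        # Keep a strong reference so the Task cannot be garbage-collected
        # before it finishes; drop the reference once it is done.
        task = asyncio.create_task(coro)
        background_tasks.add(task)
        task.add_done_callback(background_tasks.discard)
        return task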

    @gvanrossum

    [me]

    Could you post the exact code to reproduce? The traceback by itself is (as you say) not helpful.

    Never mind, I realize now that you're observing this very occasionally in a large production app. And that the 3.10.7 you're using there does not have the GH-96323 fix. And that the orphan task being reaped is an internal task (which that PR fixes). Also, as we all know by now from earlier comments, GeneratorExit comes from a coroutine (the result of calling an async function) getting closed. So this basically boils down to "no new info yet".

    @gvanrossum

    Also, Python 3.10.8 was just released, for an unrelated security fix, but it has the fix. So you can upgrade your production to that (I know, that's probably a pain) and eventually you can report that it's gone... Or not!

    @twisteroidambassador

    twisteroidambassador commented Oct 12, 2022

    FWIW that error about task destroyed but pending could mean you have code that calls create_task() but doesn't hold onto the task object until it's completed. (We added a note about this to the docs recently.)

    Thanks for pointing this out! I believe this may be my immediate problem. I do have a task that I start up and forget about around the problematic area.

    By having a background thread call gc.collect() in a loop, I was able to trigger a GeneratorExit in a relatively short time. I have added a strong reference to this task, and will see whether it fixes my issue.
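
    The trigger described here amounts to something like the following (the helper name and interval are arbitrary); it forces frequent collections so a Task that is not strongly referenced gets reaped quickly instead of once or twice a day:

    import gc
    import threading
    import time

    def _gc_hammer(interval: float = 0.05) -> None:
        # Stress helper: collect continuously to surface premature Task GC.
        while True:
            gc.collect()
            time.sleep(interval)

    threading.Thread(target=_gc_hammer, daemon=True).start()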

    Side note: I'm having a hard time wrapping my head around the "task being gc'd" concept. Say I write a coroutine that does not explicitly refer to its own task (no calling asyncio.current_task() or anything similar). I create a Task from the coroutine and do not keep a strong reference to it. Now the Task has no strong references at all. When might the Task be garbage collected? Does it matter whether the Task is currently being run by the event loop or not?

    Say I write another coroutine, which keeps a reference to asyncio.current_task(). I create a Task from this new coroutine, but still do not otherwise keep a strong reference to it. Now the Task has a circular reference to itself, so will it be garbage collected by a sufficiently motivated GC?

    In the case of GH-96323, the fix is to keep a reference to <Task _client_connected_cb> in the StreamReaderProtocol instance. This task is called with arguments (StreamReader, StreamWriter), so once the Task starts to run it will hold a circular reference to itself. All of this is in the context of an asyncio.base_events.Server receiving a new connection, so the Task is started by the asyncio machinery, and the programmer only gains access to this Task once it starts running, and only from within the Task itself.

    So, is there a possibility that a) the StreamReaderProtocol gets gc'd before <Task _client_connected_cb> starts running, or b) the StreamReaderProtocol, <Task _client_connected_cb> and related instances all get gc'd after the Task has started but before it finishes, if the programmer does not explicitly save a strong reference to the Task somewhere?

    @twisteroidambassador

    I'll try to trace the creation of <Task _client_connected_cb> with a SelectorEventLoop:

    When an incoming connection is being accepted, the loop calls selector_events.BaseSelectorEventLoop._accept_connection, which calls selector_events.BaseSelectorEventLoop._accept_connection2. It creates an instance of StreamReaderProtocol by calling protocol_factory, which is defined here. (StreamReaderProtocol holds a strong reference to the StreamReader it needs.) Then a Transport is created (I'm only tracing the non-SSL transport here), which holds a strong reference to the Protocol, and saves a strong reference to the Protocol and itself in the event loop via call_soon.

    The loop first runs streams.StreamReaderProtocol.connection_made. It creates a StreamWriter (which holds a strong reference to the Transport), then creates a Task with client_connected_cb, binding the StreamReader and StreamWriter to the Task, and saves a strong reference to the Task (GH-96323). It then removes the Protocol's direct strong reference to the StreamReader.

    The loop then runs selector_events._SelectorTransport._add_reader, binding the Transport to the event loop.

    So at this point, the event loop keeps the Transport alive, the Transport keeps the Protocol alive, the Protocol has a strong reference to <Task _client_connected_cb> which keeps StreamReader and StreamWriter alive. Sounds good. I believe the event loop should keep the Transport alive until the underlying connection is done?

    @gvanrossum

    I think that's correct. I've never heard of problems where transports or protocols are GC'ed prematurely -- AFAICT this is only an issue with Tasks. I don't actually know (because it's so hard to trace) what state the coroutine is in when this is happening, but it must be in a state where closing it will send a GeneratorExit exception into the coroutine (have fun with genobject.c if you want to trace this more precisely).

    I hope that at some point you can confirm that, either by updating to 3.10.8 or by adding a hard ref to your own task(s), the problem is gone for you.

    One final note, probably a red herring. In your attached error log I notice that one of your traceback lines occurs in concurrent/futures/thread.py -- that's a totally different kind of Future (though it has a similar API). If you're not being careful with threads (I can't tell from the error alone if you are or aren't), some very different caveats apply -- you should never do anything from a different thread than the one where the loop is running, except calling loop.call_soon_threadsafe().
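
    For reference, that thread-safety rule boils down to handing work to the loop instead of calling into it directly. A small self-contained example (function names are illustrative):

    import asyncio
    import threading

    def notify_from_thread(loop: asyncio.AbstractEventLoop) -> None:
        # The only safe way to touch a running loop from another thread is to
        # schedule work onto it; asyncio.run_coroutine_threadsafe() is the
        # coroutine counterpart of call_soon_threadsafe().
        loop.call_soon_threadsafe(print, 'hello from the event loop thread')

    async def main() -> None:
        loop = asyncio.get_running_loop()
        threading.Thread(target=notify_from_thread, args=(loop,)).start()
        await asyncio.sleep(0.1)  # give the scheduled callback a chance to run

    asyncio.run(main())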

    @twisteroidambassador

    Since making sure that every task I create has a hard reference, I haven't seen a single GeneratorExit in the past month. I think it's safe to declare the problem solved!

    @gvanrossum

    Thanks for checking!
