Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[document removal of] the deprecated 'loop' parameter asyncio API in 3.10 #86558

Closed
1st1 opened this issue Nov 17, 2020 · 37 comments
Closed

[document removal of] the deprecated 'loop' parameter asyncio API in 3.10 #86558

1st1 opened this issue Nov 17, 2020 · 37 comments
Labels
3.10 only security fixes docs Documentation in the Doc dir topic-asyncio type-feature A feature request or enhancement

Comments

@1st1
Copy link
Member

1st1 commented Nov 17, 2020

BPO 42392
Nosy @terryjreedy, @gpshead, @asvetlov, @1st1, @miss-islington, @aeros, @uriyyo, @Fidget-Spinner
PRs
  • bpo-42392: Remove loop parameter form asyncio locks and Queue #23420
  • bpo-42392: Update code after merge review from 1st1 #23499
  • bpo-42392: Remove loop parameter from asyncio.streams #23517
  • bpo-42392: Remove loop parameter from asyncio.tasks and asyncio.subprocess #23521
  • bpo-42392: Remove deprecated loop parameter from docs #23552
  • bpo-42392: Mention loop removal in whatsnew for 3.10 #24256
  • bpo-42392: [docs] Add deprecated-removed loop labels for asyncio #26357
  • [3.10] bpo-42392: [docs] Add deprecated-removed loop labels for asyncio (GH-26357) #26390
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-05-26.22:29:40.913>
    created_at = <Date 2020-11-17.19:04:05.483>
    labels = ['type-feature', 'docs', '3.10', 'expert-asyncio']
    title = "[document removal of] the deprecated 'loop' parameter asyncio API in 3.10"
    updated_at = <Date 2021-05-26.22:29:40.913>
    user = 'https://github.com/1st1'

    bugs.python.org fields:

    activity = <Date 2021-05-26.22:29:40.913>
    actor = 'gregory.p.smith'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2021-05-26.22:29:40.913>
    closer = 'gregory.p.smith'
    components = ['Documentation', 'asyncio']
    creation = <Date 2020-11-17.19:04:05.483>
    creator = 'yselivanov'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42392
    keywords = ['patch']
    message_count = 36.0
    messages = ['381275', '381276', '381277', '381278', '381304', '381305', '381306', '381310', '381329', '381370', '381372', '381382', '381385', '381389', '381451', '381456', '381766', '381771', '381776', '381780', '381794', '381822', '381823', '381882', '381883', '381979', '381985', '382060', '382079', '382133', '385383', '394270', '394345', '394474', '394477', '394480']
    nosy_count = 9.0
    nosy_names = ['terry.reedy', 'gregory.p.smith', 'asvetlov', 'docs@python', 'yselivanov', 'miss-islington', 'aeros', 'uriyyo', 'kj']
    pr_nums = ['23420', '23499', '23517', '23521', '23552', '24256', '26357', '26390']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue42392'
    versions = ['Python 3.10']

    @1st1
    Copy link
    Member Author

    1st1 commented Nov 17, 2020

    asyncio.Lock and other primitives should no longer accept the loop parameter. They should also stop storing the current loop in the self._loop attribute. Instead, they should use get_running_loop() whenever they need to access the loop.

    @1st1 1st1 added 3.10 only security fixes topic-asyncio labels Nov 17, 2020
    @1st1
    Copy link
    Member Author

    1st1 commented Nov 17, 2020

    (or alternatively they can cache the running loop, but the first loop lookup should be performed with asyncio.get_running_loop())

    @1st1
    Copy link
    Member Author

    1st1 commented Nov 17, 2020

    Kyle, lmk if you want to work on this.

    @asvetlov
    Copy link
    Contributor

    This was on my to-do list but I very much appreciate if somebody champions this issue.
    I should finish sslproto PR first.

    @aeros
    Copy link
    Contributor

    aeros commented Nov 17, 2020

    Sure, I would be interested in helping with this. Although if a newer contributor takes it up before I'm able to, I wouldn't be opposed to that either (my main priority at the moment is helping with PEP-594 since it's a concrete goal of my internship w/ the PSF, but I should have some time to work on this as well).

    As far as I can tell though, there's currently a similar PR open: #18195 . This attempts to deprecate the loop argument and creating primitives such as asyncio.Lock outside of a running event loop with the following approach:

    def __init__(self, *, loop=None):
            self._waiters = None
            self._locked = False
            if loop is None:
                self._loop = events._get_running_loop()
                if self._loop is None:
                    warnings.warn("The creation of asyncio objects without a running "
                                  "event loop is deprecated as of Python 3.9.",
                                  DeprecationWarning, stacklevel=2)
                    self._loop = events.get_event_loop()
            else:
                    warnings.warn("The loop argument is deprecated since Python 3.8, "
                              "and scheduled for removal in Python 3.10.",
                              DeprecationWarning, stacklevel=2) 
    

    So, do we want to add more strictness to that with always using get_running_loop() to access the event loop each time instead of accessing self._loop, and effectively ignore the user-supplied one? Presumably, this would start with a warning for passing a loop arg and then be removed entirely as a parameter ~two versions later.

    (or alternatively they can cache the running loop, but the first loop lookup should be performed with asyncio.get_running_loop())

    AFAIK, at the C-API extension level, get_running_loop() already caches the running loop in cached_running_holder. (

    rl = cached_running_holder; // borrowed
    ). So from a performance perspective, wouldn't it effectively be the same if we repeatedly use get_running_loop() to access the same event loop? I think it also adds a nice integrity check to be certain that the primitive wasn't initially created within a running event loop, and then later accessed outside of one.

    The only concern that I can see with this approach is that users could potentially create a primitive in one running event loop and then access it in a separate loop running in a different thread (without using something like self._loop, the primitive would no longer be associated with a specific event loop and could potentially used within *any* running event loop). I'm not entirely certain if that is a real problem though, and if anything, it seems like it could prove to be useful in some multi-loop environment. I just want to make sure that it's intended.

    @1st1
    Copy link
    Member Author

    1st1 commented Nov 17, 2020

    Oh my. FWIW I think that we need to implement this differently. I don't think it matters where, say, an asyncio.Lock was instantiated. It can be created anywhere. So IMO its init shouldn't try to capture the current loop -- there's no need for that. The loop can be and should be captured when the first await lock.acquire() is called.

    I'm writing a piece of code right now that would need to jump through the hoops to simply create a new asyncio.Lock() in a place where there's no asyncio loop yet.

    @aeros
    Copy link
    Contributor

    aeros commented Nov 18, 2020

    Oh my. FWIW I think that we need to implement this differently. I don't think it matters where, say, an asyncio.Lock was instantiated. It can be created anywhere. So IMO its init shouldn't try to capture the current loop -- there's no need for that. The loop can be and should be captured when the first await lock.acquire() is called.

    That's good to know and I think more convenient to work with, so +1 from me. I guess my remaining question though is whether it's okay to await lock.acquire() on a single lock instance from multiple different running event loops (presumably each in their own separate threads) or if there's a need to restrict it to only one event loop.

    I'm writing a piece of code right now that would need to jump through the hoops to simply create a new asyncio.Lock() in a place where there's no asyncio loop yet.

    From what I've seen of asyncio user code, it seems reasonably common to create async primitives (lock, semaphore, queue, etc.) in the init for some class prior to using the event loop, which would fail with usage of get_running_loop() in the init for the primitives. So, if it's not an issue to wait until accessing the event loop until it's actually needed (e.g. in the lock.acquire() or queue.get/put()), I think we should definitely try to be conscious about when we call get_running_loop() going forward to ensure we're not imposing arbitrary inconveniences on users.

    @1st1
    Copy link
    Member Author

    1st1 commented Nov 18, 2020

    That's good to know and I think more convenient to work with, so +1 from me. I guess my remaining question though is whether it's okay to await lock.acquire() on a single lock instance from multiple different running event loops (presumably each in their own separate threads) or if there's a need to restrict it to only one event loop.

    No, it's not OK to use one lock across multiple loops at the same time. But most asyncio code out there doesn't have protections against that, and we never advertised that having multiple loops run in parallel is a good idea. So while we could build protections against that, I'm not sure its needed.

    Andrew, thoughts?

    From what I've seen of asyncio user code, it seems reasonably common to create async primitives (lock, semaphore, queue, etc.) in the init for some class prior to using the event loop, which would fail with usage of get_running_loop() in the init for the primitives. So, if it's not an issue to wait until accessing the event loop until it's actually needed (e.g. in the lock.acquire() or queue.get/put()), I think we should definitely try to be conscious about when we call get_running_loop() going forward to ensure we're not imposing arbitrary inconveniences on users.

    Yep. This sums up how I think of this now.

    @asvetlov
    Copy link
    Contributor

    My initial thought was protecting the Lock (and other primitives) creation when a loop is not running.

    Yuri insists that Lock can be created without a loop. Technically it is possible, sure.
    But the lock is tightly coupled with a loop instance. In other words, the loop belongs to the loop.
    The lock cannot be used after the loop dies (stopped and closed).

    Thus, the global scoped lock object still looks suspicious in my eyes.
    The lock's lifecycle should be closer than the lifecycle of the loop, isn't it? I know, asyncio.Lock() can safely live after the loop closing but should we encourage this style? There are many other asyncio objects like HTTP clients and DB drivers that should be closed before the loop finishing for graceful closing TCP transports etc.

    Another thing to consider is: whether to cache a loop inside a lock; whether to add a check when the lock is used by two loops?

    I think for the last question the answer is "yes". I recall many questions and bug reports on StackOverflow and aiohttp bug tracker when people use the multithreaded model for some reason and tries to get access to a shared object from different threads that executes each own loop.

    The check becomes extremely important for synchronization primitives like asyncio.Lock class; threading.Lock is supposed to be shared between threads and users can apply the same pattern for asyncio.Lock by oversight.
    Also, supporting global asyncio.Lock instances increase the chance that the lock is used by different loops.

    @1st1
    Copy link
    Member Author

    1st1 commented Nov 18, 2020

    Yeah, I follow the reasoning.

    My use case:

       class Something:
          def __init__(self):
              self._lock = asyncio.Lock()
    
          async def do_something():
              async with self._lock:
                  ...

    And Something won't be created in a coroutine. So now I have to jump through the hoops to implement lazy lock instantiation.

    But the lock is tightly coupled with a loop instance. In other words, the loop belongs to the loop.
    The lock cannot be used after the loop dies (stopped and closed).

    I agree. Maybe the solution should be this then:

    class asyncio.Lock:
    
           def _get_loop(self):
               loop = asyncio.get_running_loop()
               if self._loop is None:
                    self._loop = loop
               if loop is not self._loop: raise
               if not loop.is_running(): raise
    
           async def acquire(self):
               loop = self._get_loop()
               ...

    This is what would guarantee all protections you want and would also allow to instantiate asyncio.Lock in class constructors, simplifying code.

    @asvetlov
    Copy link
    Contributor

    Despite the fact that asyncio.get_running_loop() never returns None but raises RuntimeError (and maybe other tiny cleanups),
    I can live with the proposal.

    It doesn't make a system worse at least and backward compatible.
    We can return to the idea of raising a warning from the constructor later, on collecting more feedback.

    P.S.
    There is a subtle non-deterministic behavior in the proposal: if the lock is accessed from a concurrent thread, the exception about wrong usage will be raised later at an arbitrary code point.
    This is a well-known problem of the lazy initialization pattern and maybe we should do nothing with it. The programming is a compromise always.

    @1st1
    Copy link
    Member Author

    1st1 commented Nov 18, 2020

    Despite the fact that asyncio.get_running_loop() never returns None but raises RuntimeError (and maybe other tiny cleanups),

    It never raises when called from inside a coroutine. And I propose to do that. So it will never fail.

    @aeros
    Copy link
    Contributor

    aeros commented Nov 19, 2020

    Regarding the example _get_loop():

           def _get_loop(self):
               loop = asyncio.get_running_loop()
               if self._loop is None:
                    self._loop = loop
               if loop is not self._loop: raise
               if not loop.is_running(): raise
    

    Would this be added to all asyncio primitives to be called anytime a reference to the loop is needed within a coroutine?

    Also, regarding the last line "if not loop.is_running(): raise" I'm not 100% certain that I understand the purpose. Wouldn't it already raise a RuntimeError from asyncio.get_running_loop() if the event loop wasn't running?

    The only thing I can think of where it would have an effect is if somehow the event loop was running at the start of _get_loop() and then the event loop was stopped (e.g. a loop in an alternative thread was stopped by the main thread while the alternative thread was in the middle of executing _get_loop()). But to me, that seems to be enough of an edge case to simplify it to the following:

           def _get_loop(self):
               loop = asyncio.get_running_loop()
               if self._loop is None:
                    self._loop = loop
               if loop is not self._loop: raise
    

    (Unless you intended for the first line loop = asyncio.get_running_loop() to instead be using the private asyncio._get_running_loop(), which returns None and doesn't raise. In that case, the original would be good to me.)

    Other than that, I think the approach seems solid.

    @asvetlov
    Copy link
    Contributor

    Perhaps Kyle is right, I had a misunderstanding with get_running_loop() vs _get_running_loop().

    The last version seems good except for the rare chance of race condition.

    The safe code can look like:

    global_lock = threading.Lock()  like GIL
    
           def _get_loop(self):
               loop = asyncio.get_running_loop()
               if self._loop is None:
                    # the lock is required because
                    # the thread switch can happen
                    # between `self._loop is None` check
                    # and `self._loop = loop` assignment
                    with global_lock:
                        if self._loop is not None:
                            self._loop = loop
               if loop is not self._loop: raise

    The alternative is using the fast C atomic compare_and_swap function
    which is executed under the hold GIL.
    We need the pure-Python fallback anyway.

    Multithreading is hard...

    @1st1
    Copy link
    Member Author

    1st1 commented Nov 19, 2020

    The safe code can look like:

    global_lock = threading.Lock() like GIL

    SGTM. We can use this strategy for all synchronization primitives and for objects like asyncio.Queue.

    @asvetlov
    Copy link
    Contributor

    Perfect!

    We have a consensus now and waiting for a champion who propose a Pull Request.

    @asvetlov
    Copy link
    Contributor

    New changeset 0ec34ca by Yurii Karabas in branch 'master':
    bpo-42392: Remove loop parameter form asyncio locks and Queue (bpo-23420)
    0ec34ca

    @asvetlov asvetlov added the type-feature A feature request or enhancement label Nov 24, 2020
    @asvetlov asvetlov added the type-feature A feature request or enhancement label Nov 24, 2020
    @1st1
    Copy link
    Member Author

    1st1 commented Nov 24, 2020

    Sorry, there are a few things in the committed patch that should be fixed. See the PR for my comments.

    @1st1 1st1 reopened this Nov 24, 2020
    @1st1 1st1 reopened this Nov 24, 2020
    @uriyyo
    Copy link
    Member

    uriyyo commented Nov 24, 2020

    Is there anyone who is assigned to removing the deprecated loop parameter from asyncio codebase?

    If not I can take this task, I believe I have enough free time and curiosity to do that :)

    @aeros
    Copy link
    Contributor

    aeros commented Nov 24, 2020

    Is there anyone who is assigned to removing the deprecated loop parameter from asyncio codebase?

    If not I can take this task, I believe I have enough free time and curiosity to do that :)

    You can certainly feel free to work on that and it would definitely be appreciated! However, I would recommend splitting it into several PRs, basically as "Remove *loop* parameter from x` rather than doing a massive PR that removes it from everywhere that it was deprecated. This makes the review process easier.

    Also, keep in mind that there are some uses of loop that are still perfectly valid and will remain so, such as in asyncio.run_coroutine_threadsafe(). It should only be removed in locations where there was a deprecation warning from 3.8 or sooner present (as we give two major versions of deprecation notice before most breaking changes are made -- this has been made official policy as of PEP-387).

    @uriyyo
    Copy link
    Member

    uriyyo commented Nov 29, 2020

    Looks like we have done everything, we can close this issue

    @asvetlov
    Copy link
    Contributor

    Thanks for your help!

    @miss-islington
    Copy link
    Contributor

    New changeset dcea78f by Ken Jin in branch 'master':
    bpo-42392: Mention loop removal in whatsnew for 3.10 (GH-24256)
    dcea78f

    @gpshead
    Copy link
    Member

    gpshead commented May 24, 2021

    There appear to be no versionchanged:: 3.10 in the asyncio docs on the APIs that formerly accepted a loop= parameter linking people to information on when that went away (3.10) and why. Specifically I'm talking about https://docs.python.org/3.10/library/asyncio-stream.html.

    The asyncio stack traces people will face when porting code to 3.10 are mystifying (they may not even show use of a loop parameter) when this comes up, so we should really leave more bread crumbs than expecting people to find the What's New doc.

    ...
    E    server = event_loop.run_until_complete(coro)
    E  File "/opt/hostedtoolcache/Python/3.10.0-beta.1/x64/lib/python3.10/asyncio/base_events.py", line 641, in run_until_complete
    E    return future.result()
    E  File "/opt/hostedtoolcache/Python/3.10.0-beta.1/x64/lib/python3.10/asyncio/streams.py", line 113, in start_unix_server
    E    return await loop.create_unix_server(factory, path, **kwds)
    E  TypeError: _UnixSelectorEventLoop.create_unix_server() got an unexpected keyword argument 'loop'
    

    Arguably something similar to that whatsnew text should've been added to the docs in 3.8 with the loop deprecation. Something like this?

    .. versionchanged:: 3.7
    
       This function now implicitly gets the
       current thread's running event loop.
    
    .. versionchanged:: 3.10
       That `loop` parameter has been removed.
    

    including a ReST link to more info in the whats new doc on the last entry would be useful.

    @gpshead gpshead reopened this May 24, 2021
    @gpshead gpshead reopened this May 24, 2021
    @gpshead gpshead added the docs Documentation in the Doc dir label May 24, 2021
    @gpshead gpshead added the docs Documentation in the Doc dir label May 24, 2021
    @gpshead gpshead changed the title remove the deprecated 'loop' parameter asyncio API [document removal of] the deprecated 'loop' parameter asyncio API in 3.10 May 24, 2021
    @gpshead gpshead changed the title remove the deprecated 'loop' parameter asyncio API [document removal of] the deprecated 'loop' parameter asyncio API in 3.10 May 24, 2021
    @Fidget-Spinner
    Copy link
    Member

    There appear to be no versionchanged:: 3.10 in the asyncio docs on the APIs that formerly accepted a loop= parameter

    Sorry, I missed that. Working on it.

    @gpshead
    Copy link
    Member

    gpshead commented May 26, 2021

    New changeset d8fd8c8 by Ken Jin in branch 'main':
    bpo-42392: [docs] Add deprecated-removed loop labels for asyncio (GH-26357)
    d8fd8c8

    @miss-islington
    Copy link
    Contributor

    New changeset 150a8e8 by Miss Islington (bot) in branch '3.10':
    [3.10] bpo-42392: [docs] Add deprecated-removed loop labels for asyncio (GH-26357) (GH-26390)
    150a8e8

    @gpshead
    Copy link
    Member

    gpshead commented May 26, 2021

    thanks!

    @graingert
    Copy link
    Contributor

    graingert commented Jul 28, 2022

    @asvetlov

    The alternative is using the fast C atomic compare_and_swap function
    which is executed under the hold GIL.
    We need the pure-Python fallback anyway.

    I was checking some minor detail in socketpair and noticed this:

    vars(self).setdefault('_threads', _Threads())

    and it looks like it benchmarks a 100ns faster than a global lock:

    #!/usr/bin/env python3
    import pyperf
    from asyncio import events
    
    """Event loop mixins."""
    
    import threading
    
    _global_lock = threading.Lock()
    
    # Used as a sentinel for loop parameter
    _marker = object()
    
    
    class _LoopBoundMixin:
        _loop = None
    
        def __init__(self, *, loop=_marker):
            if loop is not _marker:
                raise TypeError(
                    f'As of 3.10, the *loop* parameter was removed from '
                    f'{type(self).__name__}() since it is no longer necessary'
                )
    
        def _get_loop(self):
            loop = events._get_running_loop()
    
            if self._loop is None:
                with _global_lock:
                    if self._loop is None:
                        self._loop = loop
            if loop is not self._loop:
                raise RuntimeError(f'{self!r} is bound to a different event loop')
            return loop
    
    
    class _LoopBoundMixinSetDefault:
        _loop = None
    
        def __init__(self, *, loop=_marker):
            if loop is not _marker:
                raise TypeError(
                    f'As of 3.10, the *loop* parameter was removed from '
                    f'{type(self).__name__}() since it is no longer necessary'
                )
    
        def _get_loop(self):
            loop = events._get_running_loop()
            if self._loop is None:
                self.__dict__.setdefault("_loop", loop)
    
            if self._loop is not loop:
                raise RuntimeError(f'{self!r} is bound to a different event loop')
    
            return loop
    
    runner = pyperf.Runner()
    runner.timeit(name="get loop with lock",
                  stmt="lbm = _LoopBoundMixin(); lbm._get_loop(); lbm._get_loop()",
                  setup="import asyncio; from __main__ import _LoopBoundMixin; asyncio._set_running_loop(asyncio.new_event_loop())")
    
    runner.timeit(name="get loop with setdefault",
                  stmt="lbm = _LoopBoundMixin(); lbm._get_loop(); lbm._get_loop()",
                  setup="import asyncio; from __main__ import _LoopBoundMixinSetDefault as _LoopBoundMixin; asyncio._set_running_loop(asyncio.new_event_loop())")
    
    runner.timeit(name="get loop already set with lock",
                  stmt="lbm._get_loop()",
                  setup="import asyncio; from __main__ import _LoopBoundMixin; asyncio._set_running_loop(asyncio.new_event_loop()); lbm = _LoopBoundMixin(); lbm._get_loop()")
    
    runner.timeit(name="get loop already set with setdefault",
                  stmt="lbm._get_loop()",
                  setup="import asyncio; from __main__ import _LoopBoundMixinSetDefault as _LoopBoundMixin; asyncio._set_running_loop(asyncio.new_event_loop()); lbm = _LoopBoundMixin(); lbm._get_loop()")

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes docs Documentation in the Doc dir topic-asyncio type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    9 participants