Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dispatcher.wait_for future handling race condition #77

Closed
FasterSpeeding opened this issue Aug 23, 2020 · 2 comments · Fixed by #82
Closed

dispatcher.wait_for future handling race condition #77

FasterSpeeding opened this issue Aug 23, 2020 · 2 comments · Fixed by #82
Labels
bug Something isn't working
Milestone

Comments

@FasterSpeeding
Copy link
Collaborator

Steps to reproduce

Trying to handle the chunk events for a large guild too quickly using the event dispatcher's wait_for method leads to the event manager trying to handle the wait_for listener's generated future at an invalid state leading to a traceback being raised to logging and some events being missed.

import hikari.utilities.date
guild_id = LARGE_GUILD_ID
nonce = hikari.utilities.date.uuid()
await bot.shards[0].request_guild_members(guild_id, nonce=nonce)
results = []

def predicate(event):
    return event.nonce == nonce 

while True:
    try:
        chunk = await bot.dispatcher.wait_for(
            hikari.events.shard_events.MemberChunkEvent, 5, predicate=predicate
        )
        results.append(chunk.chunk_index)
    except asyncio.TimeoutError:
        break

print(results)

Expected Result

the following to be printed

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]

Actual Result

[0, 1, 2, 3, 4, 5, 6, 7, 9, 11, 13, 15, 17, 20, 22, 24, 26, 29, 31, 33, 36, 38, 40, 42, 45, 47, 49, 51, 54, 56, 58, 60, 62]

is printed along with the following traceback making it's way to logging for each missed chunk event.

E 2020-08-23 03:34:04,967 asyncio: Task exception was never retrieved
future: <Task finished name='gateway shard 0 dispatch GUILD_MEMBERS_CHUNK' coro=<EventManagerBase.consume_raw_event() done, defined at C:\Users\Snab\PycharmProjects\hikari\hikari\impl\event_manager_base.py:81> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
  File "C:\Users\Snab\PycharmProjects\hikari\hikari\impl\event_manager_base.py", line 89, in consume_raw_event
    await callback(shard, payload)
  File "C:\Users\Snab\PycharmProjects\hikari\hikari\impl\stateful_event_manager.py", line 252, in on_guild_members_chunk
    await self.dispatch(event)
  File "C:\Users\Snab\PycharmProjects\hikari\hikari\impl\event_manager_base.py", line 247, in _test_waiter
    future.set_result(event)
asyncio.exceptions.InvalidStateError: invalid state

System info

hikari v2.0.0 HEAD
located at C:\Users\Snab\PycharmProjects\hikari\hikari
CPython 3.8.2 MSC v.1916 64 bit (AMD64)

Further info

Checklist

  • [X ] I have made sure to remove ANY sensitive information (bot token, passwords,
    credentials, personal details, etc).
  • [ X] I have searched the issue tracker and have made sure it's not a duplicate.
    If it is a follow up of another issue, I have specified it.
@FasterSpeeding FasterSpeeding added the bug Something isn't working label Aug 23, 2020
@FasterSpeeding FasterSpeeding assigned ghost Aug 23, 2020
@davfsa davfsa assigned davfsa and unassigned ghost Aug 23, 2020
@davfsa
Copy link
Member

davfsa commented Aug 23, 2020

Managed to fix this 👍

@davfsa davfsa mentioned this issue Aug 23, 2020
2 tasks
@davfsa davfsa assigned ghost and unassigned davfsa Aug 23, 2020
ghost pushed a commit that referenced this issue Aug 23, 2020
Looks like, at least on my machine, asyncio immediately invokes anything you await
rather than switching to another task on the queue first unless the call does raw
IO. I have confirmed this with Epoll, Poll and Select selector implementations on
a non-debug asyncio SelectorEventLoop implementation.

This means that the bulk of dispatching an event would currently occur as soon as
the event is dispatched rather than after another task runs, which could lead to
immediate slowdown if other tasks are queued.

Switching to sync dispatching and using create task to invoke the callback
management "later" seems to speed up this implementation significantly and allows
other race conditions we have not accounted for properly as part of #77 to be
detectable with test scripts that saturate the event loop.
@ghost
Copy link

ghost commented Aug 23, 2020

Reproduced with optimisations introduced in b9d2855 and the following test script.

$ python -m hikari
hikari v2.0.0 HEAD
located at /home/guesswho/code/hikari/hikari
CPython 3.8.3 GCC 10.1.0
Linux z170x 5.4.52-1-MANJARO #1 SMP PREEMPT Thu Jul 16 16:07:11 UTC 2020 x86_64
import asyncio
import logging
import selectors

import hikari.impl.event_manager_base


logging.basicConfig(level="DEBUG")


class FooEvent(hikari.Event):

    @property
    def app(self) -> hikari.RESTAware:
        return NotImplemented


class Dispatcher(hikari.impl.event_manager_base.EventManagerBase):
    async def on_foo(self, _, __):
        await self.dispatch(FooEvent())


instance = Dispatcher(NotImplemented, None)


async def producer():
    i = 0
    while True:
        # Do a really short sleep, which puts an event on the selector and 
        # causes a different task to be invoked. Asyncio otherwise appears
        # to immediately invoke the same task, meaning we would keep producing
        # and never get a chance to consume. This might be a CPython bug, I
        # am not sure.

        # Note, setting to anything NON ZERO (even 0.0000000001) seems to hide
        # this bug, annoyingly.
        await asyncio.sleep(0)

        i += 1
        if not (i % 1000):
            print("produced", i)

        instance.consume_raw_event(NotImplemented, "FOO", {})


async def consumer():
    i = 0
    while True:
        i += 1
        if not (i % 1000):
            print("consumed", i)

        def pred(curry):
            def predicate(event):
                nonlocal curry
                if curry:
                    return True
                else:
                    curry = True
                    return False


        await asyncio.gather(
            instance.wait_for(FooEvent, timeout=None, predicate=pred(True)),
            instance.wait_for(FooEvent, timeout=None, predicate=pred(False)),
            instance.wait_for(FooEvent, timeout=None, predicate=pred(True)),
            instance.wait_for(FooEvent, timeout=None, predicate=pred(False)),
        )



async def runner():
    asyncio.create_task(consumer())
    asyncio.create_task(producer())


selector = selectors.SelectSelector()
asyncio.set_event_loop(asyncio.SelectorEventLoop(selector))
asyncio.get_event_loop().set_debug(True)
asyncio.get_event_loop().run_until_complete(runner())
asyncio.get_event_loop().run_forever()

ghost pushed a commit that referenced this issue Aug 23, 2020
@ghost ghost mentioned this issue Aug 23, 2020
ghost pushed a commit that referenced this issue Aug 23, 2020
Looks like, at least on my machine, asyncio immediately invokes anything you await
rather than switching to another task on the queue first unless the call does raw
IO. I have confirmed this with Epoll, Poll and Select selector implementations on
a non-debug asyncio SelectorEventLoop implementation.

This means that the bulk of dispatching an event would currently occur as soon as
the event is dispatched rather than after another task runs, which could lead to
immediate slowdown if other tasks are queued.

Switching to sync dispatching and using create task to invoke the callback
management "later" seems to speed up this implementation significantly and allows
other race conditions we have not accounted for properly as part of #77 to be
detectable with test scripts that saturate the event loop.
ghost pushed a commit that referenced this issue Aug 23, 2020
ghost pushed a commit that referenced this issue Aug 23, 2020
ghost pushed a commit that referenced this issue Aug 24, 2020
ghost pushed a commit that referenced this issue Aug 24, 2020
@ghost ghost closed this as completed in #82 Aug 24, 2020
ghost pushed a commit that referenced this issue Aug 24, 2020
* Optimised raw event dispatching to uncover bug.

Looks like, at least on my machine, asyncio immediately invokes anything you await
rather than switching to another task on the queue first unless the call does raw
IO. I have confirmed this with Epoll, Poll and Select selector implementations on
a non-debug asyncio SelectorEventLoop implementation.

This means that the bulk of dispatching an event would currently occur as soon as
the event is dispatched rather than after another task runs, which could lead to
immediate slowdown if other tasks are queued.

Switching to sync dispatching and using create task to invoke the callback
management "later" seems to speed up this implementation significantly and allows
other race conditions we have not accounted for properly as part of #77 to be
detectable with test scripts that saturate the event loop.

* Updated CLi script to show OS type as well.

* Added code to allow debugging of asyncio loop blocking incidents.

* Fixes #77 dispatcher wait_for race condition.

* Removed async predicates for wait_for, removing last parts of race condition hopefully.

* Fixes #77 dispatcher wait_for race condition.
@ghost ghost added this to the 2.0.0 milestone Sep 4, 2020
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants