dispatcher.wait_for future handling race condition #77

FasterSpeeding · 2020-08-23T02:45:08Z

Steps to reproduce

Trying to handle the chunk events for a large guild too quickly using the event dispatcher's wait_for method leads to the event manager trying to handle the wait_for listener's generated future at an invalid state leading to a traceback being raised to logging and some events being missed.

import hikari.utilities.date
guild_id = LARGE_GUILD_ID
nonce = hikari.utilities.date.uuid()
await bot.shards[0].request_guild_members(guild_id, nonce=nonce)
results = []

def predicate(event):
    return event.nonce == nonce 

while True:
    try:
        chunk = await bot.dispatcher.wait_for(
            hikari.events.shard_events.MemberChunkEvent, 5, predicate=predicate
        )
        results.append(chunk.chunk_index)
    except asyncio.TimeoutError:
        break

print(results)

Expected Result

the following to be printed

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]

Actual Result

[0, 1, 2, 3, 4, 5, 6, 7, 9, 11, 13, 15, 17, 20, 22, 24, 26, 29, 31, 33, 36, 38, 40, 42, 45, 47, 49, 51, 54, 56, 58, 60, 62]

is printed along with the following traceback making it's way to logging for each missed chunk event.

E 2020-08-23 03:34:04,967 asyncio: Task exception was never retrieved
future: <Task finished name='gateway shard 0 dispatch GUILD_MEMBERS_CHUNK' coro=<EventManagerBase.consume_raw_event() done, defined at C:\Users\Snab\PycharmProjects\hikari\hikari\impl\event_manager_base.py:81> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
  File "C:\Users\Snab\PycharmProjects\hikari\hikari\impl\event_manager_base.py", line 89, in consume_raw_event
    await callback(shard, payload)
  File "C:\Users\Snab\PycharmProjects\hikari\hikari\impl\stateful_event_manager.py", line 252, in on_guild_members_chunk
    await self.dispatch(event)
  File "C:\Users\Snab\PycharmProjects\hikari\hikari\impl\event_manager_base.py", line 247, in _test_waiter
    future.set_result(event)
asyncio.exceptions.InvalidStateError: invalid state

System info

hikari v2.0.0 HEAD
located at C:\Users\Snab\PycharmProjects\hikari\hikari
CPython 3.8.2 MSC v.1916 64 bit (AMD64)

Further info

Checklist

[X ] I have made sure to remove ANY sensitive information (bot token, passwords,
credentials, personal details, etc).
[ X] I have searched the issue tracker and have made sure it's not a duplicate.
If it is a follow up of another issue, I have specified it.

davfsa · 2020-08-23T10:14:04Z

Managed to fix this 👍

Looks like, at least on my machine, asyncio immediately invokes anything you await rather than switching to another task on the queue first unless the call does raw IO. I have confirmed this with Epoll, Poll and Select selector implementations on a non-debug asyncio SelectorEventLoop implementation. This means that the bulk of dispatching an event would currently occur as soon as the event is dispatched rather than after another task runs, which could lead to immediate slowdown if other tasks are queued. Switching to sync dispatching and using create task to invoke the callback management "later" seems to speed up this implementation significantly and allows other race conditions we have not accounted for properly as part of #77 to be detectable with test scripts that saturate the event loop.

ghost · 2020-08-23T11:34:29Z

Reproduced with optimisations introduced in b9d2855 and the following test script.

$ python -m hikari
hikari v2.0.0 HEAD
located at /home/guesswho/code/hikari/hikari
CPython 3.8.3 GCC 10.1.0
Linux z170x 5.4.52-1-MANJARO #1 SMP PREEMPT Thu Jul 16 16:07:11 UTC 2020 x86_64

import asyncio
import logging
import selectors

import hikari.impl.event_manager_base


logging.basicConfig(level="DEBUG")


class FooEvent(hikari.Event):

    @property
    def app(self) -> hikari.RESTAware:
        return NotImplemented


class Dispatcher(hikari.impl.event_manager_base.EventManagerBase):
    async def on_foo(self, _, __):
        await self.dispatch(FooEvent())


instance = Dispatcher(NotImplemented, None)


async def producer():
    i = 0
    while True:
        # Do a really short sleep, which puts an event on the selector and 
        # causes a different task to be invoked. Asyncio otherwise appears
        # to immediately invoke the same task, meaning we would keep producing
        # and never get a chance to consume. This might be a CPython bug, I
        # am not sure.

        # Note, setting to anything NON ZERO (even 0.0000000001) seems to hide
        # this bug, annoyingly.
        await asyncio.sleep(0)

        i += 1
        if not (i % 1000):
            print("produced", i)

        instance.consume_raw_event(NotImplemented, "FOO", {})


async def consumer():
    i = 0
    while True:
        i += 1
        if not (i % 1000):
            print("consumed", i)

        def pred(curry):
            def predicate(event):
                nonlocal curry
                if curry:
                    return True
                else:
                    curry = True
                    return False


        await asyncio.gather(
            instance.wait_for(FooEvent, timeout=None, predicate=pred(True)),
            instance.wait_for(FooEvent, timeout=None, predicate=pred(False)),
            instance.wait_for(FooEvent, timeout=None, predicate=pred(True)),
            instance.wait_for(FooEvent, timeout=None, predicate=pred(False)),
        )



async def runner():
    asyncio.create_task(consumer())
    asyncio.create_task(producer())


selector = selectors.SelectSelector()
asyncio.set_event_loop(asyncio.SelectorEventLoop(selector))
asyncio.get_event_loop().set_debug(True)
asyncio.get_event_loop().run_until_complete(runner())
asyncio.get_event_loop().run_forever()

Looks like, at least on my machine, asyncio immediately invokes anything you await rather than switching to another task on the queue first unless the call does raw IO. I have confirmed this with Epoll, Poll and Select selector implementations on a non-debug asyncio SelectorEventLoop implementation. This means that the bulk of dispatching an event would currently occur as soon as the event is dispatched rather than after another task runs, which could lead to immediate slowdown if other tasks are queued. Switching to sync dispatching and using create task to invoke the callback management "later" seems to speed up this implementation significantly and allows other race conditions we have not accounted for properly as part of #77 to be detectable with test scripts that saturate the event loop.

* Optimised raw event dispatching to uncover bug. Looks like, at least on my machine, asyncio immediately invokes anything you await rather than switching to another task on the queue first unless the call does raw IO. I have confirmed this with Epoll, Poll and Select selector implementations on a non-debug asyncio SelectorEventLoop implementation. This means that the bulk of dispatching an event would currently occur as soon as the event is dispatched rather than after another task runs, which could lead to immediate slowdown if other tasks are queued. Switching to sync dispatching and using create task to invoke the callback management "later" seems to speed up this implementation significantly and allows other race conditions we have not accounted for properly as part of #77 to be detectable with test scripts that saturate the event loop. * Updated CLi script to show OS type as well. * Added code to allow debugging of asyncio loop blocking incidents. * Fixes #77 dispatcher wait_for race condition. * Removed async predicates for wait_for, removing last parts of race condition hopefully. * Fixes #77 dispatcher wait_for race condition.

FasterSpeeding added the bug Something isn't working label Aug 23, 2020

FasterSpeeding assigned ghost Aug 23, 2020

davfsa assigned davfsa and unassigned ghost Aug 23, 2020

davfsa mentioned this issue Aug 23, 2020

Fix bug in wait_for #78

Closed

2 tasks

davfsa assigned ghost and unassigned davfsa Aug 23, 2020

ghost pushed a commit that referenced this issue Aug 23, 2020

Fixes #77 dispatcher wait_for race condition.

2ef79ca

ghost mentioned this issue Aug 23, 2020

Bugfix/77 wait for race #82

Merged

ghost pushed a commit that referenced this issue Aug 23, 2020

Fixes #77 dispatcher wait_for race condition.

cbbf2b1

ghost pushed a commit that referenced this issue Aug 23, 2020

Fixes #77 dispatcher wait_for race condition.

44d6113

ghost pushed a commit that referenced this issue Aug 24, 2020

Fixes #77 dispatcher wait_for race condition.

529a6f1

ghost pushed a commit that referenced this issue Aug 24, 2020

Fixes #77 dispatcher wait_for race condition.

6f1b4c2

ghost closed this as completed in #82 Aug 24, 2020

ghost added this to the 2.0.0 milestone Sep 4, 2020

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dispatcher.wait_for future handling race condition #77

dispatcher.wait_for future handling race condition #77

FasterSpeeding commented Aug 23, 2020

davfsa commented Aug 23, 2020

ghost commented Aug 23, 2020 •

edited by ghost

Loading

dispatcher.wait_for future handling race condition #77

dispatcher.wait_for future handling race condition #77

Comments

FasterSpeeding commented Aug 23, 2020

Steps to reproduce

Expected Result

Actual Result

System info

Further info

Checklist

davfsa commented Aug 23, 2020

ghost commented Aug 23, 2020 • edited by ghost Loading

ghost commented Aug 23, 2020 •

edited by ghost

Loading