
Propagate listener task exceptions to the main loop. #3378

Merged: 8 commits into ethereum:main on May 9, 2024

Conversation

@fselmo fselmo (Collaborator) commented May 3, 2024

What was wrong?

Closes #3375 (along with #3387)

How was it fixed?

As I understand it, asyncio runs each task under its own context, and directly communicating with another context (for example, immediately propagating an exception to the main loop) is surprisingly not straightforward. For better or worse, I think that when we poll for messages we should check whether the listener task is done and, if it is, whether an exception was set on it. If so, we can raise that exception in the main loop where we are polling for messages.

  • When we poll for messages, also poll to see if the background task is finished. If it is, handle any exceptions in the main loop. Subscription streams need extra care: we hang awaiting a message to appear in the subscription queue, so if the background task dies, no messages will be coming in and we would hang indefinitely. To signal this, push a new TaskNotRunning exception to the queue, revamp the queue to raise when this item is pulled from it, and handle the exception in the subscription polling (see the sketch after this list).
  • Refactor some of the logic: when disconnecting persistent connection providers, shut down the listener task before closing the connection, to avoid errors from the task still trying to read from the closing connection.
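
For illustration, here is a minimal sketch of the polling pattern described above. This is not web3.py's actual implementation; the names `RaisingQueue`, `poll`, `listener_task`, and `sub_queue` are hypothetical (only `TaskNotRunning` is named in this PR):

```python
import asyncio
from typing import Any


class TaskNotRunning(Exception):
    """Pushed to a subscription queue when its listener task has stopped."""


class RaisingQueue(asyncio.Queue):
    # Hypothetical queue: if the retrieved item signals that the listener
    # task feeding this queue stopped, raise instead of returning it, so
    # consumers blocked on ``get()`` don't hang forever.
    async def get(self) -> Any:
        item = await super().get()
        if isinstance(item, TaskNotRunning):
            raise item
        return item


async def poll(listener_task: asyncio.Task, sub_queue: RaisingQueue) -> Any:
    # Surface listener task failures in the main loop rather than hanging.
    if listener_task.done():
        exc = listener_task.exception()  # note: re-raises CancelledError
        if exc is not None:
            raise exc
    try:
        return await sub_queue.get()
    except TaskNotRunning:
        # The sentinel only arrives via the task's completion callback,
        # so the task is done and it's safe to inspect its exception.
        exc = listener_task.exception()
        if exc is not None:
            raise exc
        raise
```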

I added some nice logging that really shouldn't be too noisy unless the connection is reconnecting all the time (which I forced for the example below).

INFO:web3.providers.WebSocketProvider:Connecting to: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:Successfully connected to: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:WebSocketProvider listener background task started. Storing all messages in appropriate request processor queues / caches to be processed.
newHeads sub_id: 0xad65c6e2aefa417dafda6999d0bd774b
INFO:web3.providers.WebSocketProvider:Subscription response queue synced with websocket message stream.
blocknum: 19829151
blocknum: 19829152
ERROR:web3.providers.WebSocketProvider:Connection interrupted, attempting to reconnect...
INFO:web3.providers.WebSocketProvider:Message listener background task successfully shut down.
INFO:web3.providers.WebSocketProvider:Successfully disconnected from: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:Connecting to: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:Successfully connected to: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:WebSocketProvider listener background task started. Storing all messages in appropriate request processor queues / caches to be processed.
newHeads sub_id: 0x718852a668ed437fbdc86243db929405
blocknum: 19829153
blocknum: 19829154
ERROR:web3.providers.WebSocketProvider:Connection interrupted, attempting to reconnect...
INFO:web3.providers.WebSocketProvider:Message listener background task successfully shut down.
INFO:web3.providers.WebSocketProvider:Successfully disconnected from: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:Connecting to: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:Successfully connected to: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:WebSocketProvider listener background task started. Storing all messages in appropriate request processor queues / caches to be processed.
newHeads sub_id: 0xf46690d2853c4afba527ac71dd1915e0
blocknum: 19829155
...

Bonus:

I've been meaning to do this for some time, and this seems like the right moment: refactor the common listener task logic down to the base PersistentConnectionProvider class and define provider-specific methods that either must be implemented or can be overridden to fine-tune logic specific to each persistent connection provider. This keeps things quite a bit DRYer.
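
A rough sketch of that shape, assuming hypothetical hook names (`_provider_specific_receive` and the other hooks are illustrative, not the real API):

```python
import asyncio
from abc import ABC, abstractmethod


class PersistentConnectionProviderSketch(ABC):
    # The listener loop lives on the base class; subclasses fill in
    # transport-specific behavior through the hooks below.
    async def _message_listener(self) -> None:
        self._on_listener_start()
        while True:
            raw = await self._provider_specific_receive()
            await self._process_raw_message(raw)

    @abstractmethod
    async def _provider_specific_receive(self) -> bytes:
        """Read one raw message from the underlying transport."""

    async def _process_raw_message(self, raw: bytes) -> None:
        ...  # shared logic: decode and route into request processor queues

    def _on_listener_start(self) -> None:
        ...  # overridable hook, e.g. provider-specific logging
```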

Todo:

  • Clean up commit history
  • Add or update documentation related to these changes
  • Add entry to the release notes

Cute Animal Picture


@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 4f8e964 to 8f9a081 on May 3, 2024 22:42
@fselmo fselmo marked this pull request as ready for review May 3, 2024 22:44
@fselmo fselmo requested review from kclowes, pacrob and reedsa May 3, 2024 22:44
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch 3 times, most recently from 600504d to 827b661 on May 3, 2024 23:11
@fselmo fselmo marked this pull request as draft May 3, 2024 23:26
@fselmo fselmo (Collaborator, Author) commented May 3, 2024

I'm actually going to re-structure this a bit. Will ping for review when ready again.

- Raise exceptions in the message listener task if not silenced. Check
  if the task is completed and if an exception was set when polling for
  results in the caches and raise the exception if it was set.

- Refactor some of the logic. When disconnecting persistent connection
  providers, shut down the listener task before closing the connection
  to avoid any sort of errors when the task is still trying to read from
  the closing connection.
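
The second bullet of the commit above is an ordering fix. A minimal sketch of the idea, with a generic `connection` object standing in for the real transport:

```python
import asyncio


async def disconnect(listener_task: asyncio.Task, connection) -> None:
    # Shut the listener down first so nothing is reading from the
    # connection while it closes.
    listener_task.cancel()
    try:
        await listener_task
    except asyncio.CancelledError:
        pass  # expected: the task was cancelled deliberately
    await connection.close()
```

Cancelling and awaiting the task before closing guarantees nothing is mid-read when the transport goes away underneath it.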
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 827b661 to 1296e2b on May 7, 2024 18:21
@fselmo fselmo marked this pull request as ready for review May 7, 2024 18:26
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch 3 times, most recently from 17da8d7 to 781b5d3 on May 7, 2024 19:00
- Define a base pattern for the listener task on the base
  ``PersistentConnectionProvider`` class. Define provider-specific
  methods that can be configured on the implementation classes in order
  to handle the provider-specific logic and error logging.
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 781b5d3 to e4ebe38 on May 7, 2024 19:21
@fselmo fselmo (Collaborator, Author) commented May 7, 2024

Just kidding, this is still in progress 😔

@fselmo fselmo marked this pull request as draft May 7, 2024 22:55
fselmo added 2 commits May 8, 2024 13:47
- Simultaneously check for subscription messages while checking
  if the listener task is done, raising any exceptions if they occurred
  in the listener task.

- Add tests for subscriptions with iterator pattern (would've failed before
  this commit).

- This adds a bit of overhead for the subscription message stream.
  Change INFO logs to DEBUG when the message stream is out of sync with
  the websocket connection. These aren't super useful to the end user
  and can be noisy now.
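
One common way to express the "simultaneously check" from the first bullet above is `asyncio.wait()` with `FIRST_COMPLETED`; a later commit in this PR confirms that was the approach here before it was replaced with a done callback. A sketch of the racing pattern, with hypothetical names:

```python
import asyncio
from typing import Any


async def next_sub_message(
    sub_queue: asyncio.Queue, listener_task: asyncio.Task
) -> Any:
    # Race the queue read against the listener task finishing, so a dead
    # listener can't leave us awaiting a message that never arrives.
    get_task = asyncio.ensure_future(sub_queue.get())
    done, _pending = await asyncio.wait(
        {get_task, listener_task}, return_when=asyncio.FIRST_COMPLETED
    )
    if get_task in done:
        return get_task.result()
    get_task.cancel()  # listener finished first; stop waiting on the queue
    exc = listener_task.exception()
    if exc is not None:
        raise exc
    raise RuntimeError("listener task stopped without an exception")
```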
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 0427c95 to bf56dd2 on May 8, 2024 19:47
@fselmo fselmo marked this pull request as ready for review May 8, 2024 20:04
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from e995b4e to ea812c5 on May 8, 2024 21:07
@fselmo fselmo (Collaborator, Author) commented May 8, 2024

@kclowes @pacrob @reedsa Hey, it's me again. This is ready now 😄. We were missing a good test for subscription polling with the iterator pattern, which would've failed or hung forever. That should be addressed now.

- Instead of using ``asyncio.wait()`` to poll which task finishes
  first, simply push a ``None`` value to the sub queue as part of
  the callback when the listener task finishes. This tells any
  message stream that the listener task has finished, breaks the
  iterator out of its blocking poll on the queue, and allows it to
  address any listener task exceptions that may have occurred.
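
A sketch of that callback wiring; `signal_on_done` and `message_stream` are illustrative names, not the PR's actual functions:

```python
import asyncio
from typing import Any, AsyncIterator


def signal_on_done(listener_task: asyncio.Task, sub_queue: asyncio.Queue) -> None:
    # When the listener finishes for any reason, push a ``None`` sentinel
    # so a consumer blocked on ``sub_queue.get()`` wakes up.
    listener_task.add_done_callback(lambda _task: sub_queue.put_nowait(None))


async def message_stream(
    sub_queue: asyncio.Queue, listener_task: asyncio.Task
) -> AsyncIterator[Any]:
    while True:
        message = await sub_queue.get()
        if message is None:  # sentinel: the listener task finished
            exc = listener_task.exception()
            if exc is not None:
                raise exc
            return  # clean shutdown, end the stream
        yield message
```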
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch 3 times, most recently from 49e07bc to ca4d424 on May 9, 2024 01:13
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from ca4d424 to 8a2cb8f on May 9, 2024 01:23
@kclowes kclowes (Collaborator) left a comment

Nice tests and nice refactor! This looks good to me overall. Pushing None to the queue is the only thing that stands out as something that might be a little foot-gun-y. In my head, it seems like it would be more descriptive and maybe less error prone if we could raise a custom exception there, but I can't quite see the path to how that could work. So I'm good leaving as-is, but wanted to flag in case you saw a nice solution off the bat.

Review threads (resolved): web3/_utils/module_testing/module_testing_utils.py, web3/providers/persistent/websocket.py, web3/main.py
@fselmo fselmo (Collaborator, Author) commented May 9, 2024

Pushing None to the queue is the only thing that stands out as something that might be a little foot-gun-y.

I think you're absolutely right. I don't see None being pushed to the cache in any other way, but I think we can make this case more distinct. If I can't, we can still use None to un-brick the await on a message, as long as we properly check whether the task really did finish; if it didn't, we just move on to the next iteration of the subscription messages, essentially a no-op in that scenario (sketched below).
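
A purely illustrative sketch of that defensive check:

```python
import asyncio
from typing import Any


async def get_next_message(
    sub_queue: asyncio.Queue, listener_task: asyncio.Task
) -> Any:
    while True:
        message = await sub_queue.get()
        if message is not None:
            return message
        # ``None`` sentinel received: verify the task really did finish.
        if not listener_task.done():
            continue  # false alarm; no-op and poll the queue again
        exc = listener_task.exception()
        if exc is not None:
            raise exc
        raise RuntimeError("listener task finished cleanly")
```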

fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
- More gracefully and explicitly handle the case where a queue that
  relies on a task is awaited for a result while that task is not
  running. This introduces a new ``TaskNotRunning`` exception that is
  pushed to the queue in the reliant task's completion callback and
  raised when retrieved.
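
The completion-callback side of that mechanism might look like the following sketch (illustrative only; it reuses the hypothetical TaskNotRunning shown earlier):

```python
import asyncio


class TaskNotRunning(Exception):
    """Signals that the task a queue relies on is no longer running."""


def raise_in_queue_on_done(task: asyncio.Task, queue: asyncio.Queue) -> None:
    # Completion callback: instead of a bare ``None`` sentinel, push the
    # exception itself; the queue then raises it on retrieval.
    task.add_done_callback(
        lambda t: queue.put_nowait(
            TaskNotRunning(f"task {t.get_name()!r} is not running")
        )
    )
```

Pushing the exception itself keeps the "listener stopped" signal distinct from any legitimate None message in the queue.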
fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 4332b03 to 0ee8a22 on May 9, 2024 19:39
fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 0ee8a22 to 5a5c4f3 on May 9, 2024 19:41
fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 5a5c4f3 to 8ed6df9 on May 9, 2024 19:44
@fselmo fselmo (Collaborator, Author) commented May 9, 2024

@kclowes I came up with a more elegant solution for the queue to operate correctly as long as the listener task is running.

In my head, it seems like it would be more descriptive and maybe less error prone if we could raise a custom exception there

Taking this as inspiration; lmk what you think: a4790d6

...other comments addressed in 8ed6df9

@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 8ed6df9 to b4a3cd4 on May 9, 2024 20:05
@kclowes kclowes (Collaborator) commented May 9, 2024

I came up with a more elegant solution for the queue to operate correctly as long as the listener task is running

super elegant 🤩 🚀

@fselmo fselmo merged commit bc5601b into ethereum:main May 9, 2024
71 checks passed
fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
@fselmo fselmo mentioned this pull request May 9, 2024
fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
@fselmo fselmo deleted the persistent-connection-iterator-bugfix branch May 9, 2024 22:19
fselmo added a commit that referenced this pull request May 9, 2024
Successfully merging this pull request may close these issues.

Async iteration of AsyncWeb3 doesn't work as expected