
Propagate listener task exceptions to the main loop. #3378

Merged: 8 commits into ethereum:main on May 9, 2024

Conversation

@fselmo fselmo (Collaborator) commented May 3, 2024

What was wrong?

Closes #3375 (along with #3387)

How was it fixed?

As I understand it, asyncio runs each task under its own context, and directly communicating with another context (for example, immediately propagating an exception to the main loop) is surprisingly not straightforward. For better or worse, I think that when we poll for messages we should check whether the listener task is done and, if it is, whether an exception was set on it. If so, we can raise that exception in the main loop where we are polling for messages.

  • When we poll for messages, also poll to see if the background task is finished. If it is, handle any exceptions in the main loop. Subscription streams need extra care: we hang awaiting a message to appear in the subscription queue, so if the background task dies, no messages will be coming in and we would hang indefinitely. To signal this, push a new TaskNotRunning exception to the queue, revamp the queue to raise when this item is pulled from it, and handle the exception in the subscription polling (see the sketch after this list).
  • Refactor some of the logic: when disconnecting persistent connection providers, shut down the listener task before closing the connection, to avoid errors from the task still trying to read from the closing connection.
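
For illustration, here is a minimal sketch of the polling pattern described above. This is not web3.py's actual implementation; the names `RaisingQueue`, `poll`, `listener_task`, and `sub_queue` are hypothetical (only `TaskNotRunning` is named in this PR):

```python
import asyncio
from typing import Any


class TaskNotRunning(Exception):
    """Pushed to a subscription queue when its listener task has stopped."""


class RaisingQueue(asyncio.Queue):
    # Hypothetical queue: if the retrieved item signals that the listener
    # task feeding this queue stopped, raise instead of returning it, so
    # consumers blocked on ``get()`` don't hang forever.
    async def get(self) -> Any:
        item = await super().get()
        if isinstance(item, TaskNotRunning):
            raise item
        return item


async def poll(listener_task: asyncio.Task, sub_queue: RaisingQueue) -> Any:
    # Surface listener task failures in the main loop rather than hanging.
    if listener_task.done():
        exc = listener_task.exception()  # note: re-raises CancelledError
        if exc is not None:
            raise exc
    try:
        return await sub_queue.get()
    except TaskNotRunning:
        # The sentinel only arrives via the task's completion callback,
        # so the task is done and it's safe to inspect its exception.
        exc = listener_task.exception()
        if exc is not None:
            raise exc
        raise
```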

I added some nice logging that really shouldn't be too noisy unless the connection is reconnecting all the time (which I forced for the example below).

INFO:web3.providers.WebSocketProvider:Connecting to: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:Successfully connected to: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:WebSocketProvider listener background task started. Storing all messages in appropriate request processor queues / caches to be processed.
newHeads sub_id: 0xad65c6e2aefa417dafda6999d0bd774b
INFO:web3.providers.WebSocketProvider:Subscription response queue synced with websocket message stream.
blocknum: 19829151
blocknum: 19829152
ERROR:web3.providers.WebSocketProvider:Connection interrupted, attempting to reconnect...
INFO:web3.providers.WebSocketProvider:Message listener background task successfully shut down.
INFO:web3.providers.WebSocketProvider:Successfully disconnected from: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:Connecting to: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:Successfully connected to: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:WebSocketProvider listener background task started. Storing all messages in appropriate request processor queues / caches to be processed.
newHeads sub_id: 0x718852a668ed437fbdc86243db929405
blocknum: 19829153
blocknum: 19829154
ERROR:web3.providers.WebSocketProvider:Connection interrupted, attempting to reconnect...
INFO:web3.providers.WebSocketProvider:Message listener background task successfully shut down.
INFO:web3.providers.WebSocketProvider:Successfully disconnected from: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:Connecting to: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:Successfully connected to: wss://mainnet.infura.io/ws/v3/c364...
INFO:web3.providers.WebSocketProvider:WebSocketProvider listener background task started. Storing all messages in appropriate request processor queues / caches to be processed.
newHeads sub_id: 0xf46690d2853c4afba527ac71dd1915e0
blocknum: 19829155
...

Bonus:

I've been meaning to do this for some time, and this seems like the right moment: refactor the common listener task logic down to the base PersistentConnectionProvider class and define provider-specific methods that either must be implemented or can be overridden to fine-tune logic specific to each persistent connection provider. This keeps things quite a bit DRYer.
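
A rough sketch of that shape, assuming hypothetical hook names (`_provider_specific_receive` and the other hooks are illustrative, not the real API):

```python
import asyncio
from abc import ABC, abstractmethod


class PersistentConnectionProviderSketch(ABC):
    # The listener loop lives on the base class; subclasses fill in
    # transport-specific behavior through the hooks below.
    async def _message_listener(self) -> None:
        self._on_listener_start()
        while True:
            raw = await self._provider_specific_receive()
            await self._process_raw_message(raw)

    @abstractmethod
    async def _provider_specific_receive(self) -> bytes:
        """Read one raw message from the underlying transport."""

    async def _process_raw_message(self, raw: bytes) -> None:
        ...  # shared logic: decode and route into request processor queues

    def _on_listener_start(self) -> None:
        ...  # overridable hook, e.g. provider-specific logging
```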

Todo:

  • Clean up commit history
  • Add or update documentation related to these changes
  • Add entry to the release notes

Cute Animal Picture


@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 4f8e964 to 8f9a081 on May 3, 2024 22:42
@fselmo fselmo marked this pull request as ready for review May 3, 2024 22:44
@fselmo fselmo requested review from kclowes, pacrob and reedsa May 3, 2024 22:44
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch 3 times, most recently from 600504d to 827b661 on May 3, 2024 23:11
@fselmo fselmo marked this pull request as draft May 3, 2024 23:26
@fselmo fselmo (Collaborator, Author) commented May 3, 2024

I'm actually going to re-structure this a bit. Will ping for review when ready again.

- Raise exceptions in the message listener task if not silenced. Check
  if the task is completed and if an exception was set when polling for
  results in the caches and raise the exception if it was set.

- Refactor some of the logic. When disconnecting persistent connection
  providers, shut down the listener task before closing the connection
  to avoid any sort of errors when the task is still trying to read from
  the closing connection.
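
The second bullet of the commit above is an ordering fix. A minimal sketch of the idea, with a generic `connection` object standing in for the real transport:

```python
import asyncio


async def disconnect(listener_task: asyncio.Task, connection) -> None:
    # Shut the listener down first so nothing is reading from the
    # connection while it closes.
    listener_task.cancel()
    try:
        await listener_task
    except asyncio.CancelledError:
        pass  # expected: the task was cancelled deliberately
    await connection.close()
```

Cancelling and awaiting the task before closing guarantees nothing is mid-read when the transport goes away underneath it.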
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 827b661 to 1296e2b on May 7, 2024 18:21
@fselmo fselmo marked this pull request as ready for review May 7, 2024 18:26
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch 3 times, most recently from 17da8d7 to 781b5d3 on May 7, 2024 19:00
- Define a base pattern for the listener task on the base
  ``PersistentConnectionProvider`` class. Define provider-specific
  methods that can be configured on the implementation classes in order
  to handle the provider-specific logic and error logging.
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 781b5d3 to e4ebe38 on May 7, 2024 19:21
@fselmo fselmo (Collaborator, Author) commented May 7, 2024

Just kidding, this is still in progress 😔

@fselmo fselmo marked this pull request as draft May 7, 2024 22:55
fselmo added 2 commits May 8, 2024 13:47
- Simultaneously check for subscription messages while checking
  if the listener task is done, raising any exceptions if they occurred
  in the listener task.

- Add tests for subscriptions with iterator pattern (would've failed before
  this commit).

- This adds a bit of overhead for the subscription message stream.
  Change INFO logs to DEBUG when the message stream is out of sync with
  the websocket connection. These aren't super useful to the end user
  and can be noisy now.
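
One common way to express the "simultaneously check" from the first bullet above is `asyncio.wait()` with `FIRST_COMPLETED`; a later commit in this PR confirms that was the approach here before it was replaced with a done callback. A sketch of the racing pattern, with hypothetical names:

```python
import asyncio
from typing import Any


async def next_sub_message(
    sub_queue: asyncio.Queue, listener_task: asyncio.Task
) -> Any:
    # Race the queue read against the listener task finishing, so a dead
    # listener can't leave us awaiting a message that never arrives.
    get_task = asyncio.ensure_future(sub_queue.get())
    done, _pending = await asyncio.wait(
        {get_task, listener_task}, return_when=asyncio.FIRST_COMPLETED
    )
    if get_task in done:
        return get_task.result()
    get_task.cancel()  # listener finished first; stop waiting on the queue
    exc = listener_task.exception()
    if exc is not None:
        raise exc
    raise RuntimeError("listener task stopped without an exception")
```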
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 0427c95 to bf56dd2 on May 8, 2024 19:47
@fselmo fselmo marked this pull request as ready for review May 8, 2024 20:04
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from e995b4e to ea812c5 on May 8, 2024 21:07
@fselmo fselmo (Collaborator, Author) commented May 8, 2024

@kclowes @pacrob @reedsa Hey, it's me again. This is ready now 😄. We were missing a good test for subscription polling with the iterator pattern, which would've failed or hung forever. That should be addressed now.

- Instead of using ``asyncio.wait()`` to poll which task finishes
  first, simply push a ``None`` value to the sub queue as part of
  the callback when the listener task finishes. This tells any
  message stream that the listener task has finished, breaks the
  iterator out of its blocking poll on the queue, and allows it to
  address any listener task exceptions that may have occurred.
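
A sketch of that callback wiring; `signal_on_done` and `message_stream` are illustrative names, not the PR's actual functions:

```python
import asyncio
from typing import Any, AsyncIterator


def signal_on_done(listener_task: asyncio.Task, sub_queue: asyncio.Queue) -> None:
    # When the listener finishes for any reason, push a ``None`` sentinel
    # so a consumer blocked on ``sub_queue.get()`` wakes up.
    listener_task.add_done_callback(lambda _task: sub_queue.put_nowait(None))


async def message_stream(
    sub_queue: asyncio.Queue, listener_task: asyncio.Task
) -> AsyncIterator[Any]:
    while True:
        message = await sub_queue.get()
        if message is None:  # sentinel: the listener task finished
            exc = listener_task.exception()
            if exc is not None:
                raise exc
            return  # clean shutdown, end the stream
        yield message
```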
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch 3 times, most recently from 49e07bc to ca4d424 on May 9, 2024 01:13
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from ca4d424 to 8a2cb8f on May 9, 2024 01:23
@kclowes kclowes (Collaborator) left a comment

Nice tests and nice refactor! This looks good to me overall. Pushing None to the queue is the only thing that stands out as something that might be a little foot-gun-y. In my head, it seems like it would be more descriptive and maybe less error prone if we could raise a custom exception there, but I can't quite see the path to how that could work. So I'm good leaving as-is, but wanted to flag in case you saw a nice solution off the bat.

Review threads (resolved): web3/_utils/module_testing/module_testing_utils.py, web3/providers/persistent/websocket.py, web3/main.py
@fselmo fselmo (Collaborator, Author) commented May 9, 2024

Pushing None to the queue is the only thing that stands out as something that might be a little foot-gun-y.

I think you're absolutely right. I don't see None being pushed to the cache in any other way, but I think we can make this case more distinct. If I can't, we can still use None to un-brick the await on a message, as long as we properly check whether the task really did finish; if it didn't, we just move on to the next iteration of the subscription messages, essentially a no-op in that scenario (sketched below).
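
A purely illustrative sketch of that defensive check:

```python
import asyncio
from typing import Any


async def get_next_message(
    sub_queue: asyncio.Queue, listener_task: asyncio.Task
) -> Any:
    while True:
        message = await sub_queue.get()
        if message is not None:
            return message
        # ``None`` sentinel received: verify the task really did finish.
        if not listener_task.done():
            continue  # false alarm; no-op and poll the queue again
        exc = listener_task.exception()
        if exc is not None:
            raise exc
        raise RuntimeError("listener task finished cleanly")
```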

fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
- More gracefully and explicitly handle the case where a queue that
  relies on a task is awaited for a result while that task is not
  running. This introduces a new ``TaskNotRunning`` exception that is
  pushed to the queue in the reliant task's completion callback and
  raised when retrieved.
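
The completion-callback side of that mechanism might look like the following sketch (illustrative only; it reuses the hypothetical TaskNotRunning shown earlier):

```python
import asyncio


class TaskNotRunning(Exception):
    """Signals that the task a queue relies on is no longer running."""


def raise_in_queue_on_done(task: asyncio.Task, queue: asyncio.Queue) -> None:
    # Completion callback: instead of a bare ``None`` sentinel, push the
    # exception itself; the queue then raises it on retrieval.
    task.add_done_callback(
        lambda t: queue.put_nowait(
            TaskNotRunning(f"task {t.get_name()!r} is not running")
        )
    )
```

Pushing the exception itself keeps the "listener stopped" signal distinct from any legitimate None message in the queue.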
fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 4332b03 to 0ee8a22 on May 9, 2024 19:39
fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 0ee8a22 to 5a5c4f3 on May 9, 2024 19:41
fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 5a5c4f3 to 8ed6df9 on May 9, 2024 19:44
@fselmo fselmo (Collaborator, Author) commented May 9, 2024

@kclowes I came up with a more elegant solution for the queue to operate correctly as long as the listener task is running.

In my head, it seems like it would be more descriptive and maybe less error prone if we could raise a custom exception there

Taking this as inspiration; lmk what you think: a4790d6

...other comments addressed in 8ed6df9

@fselmo fselmo force-pushed the persistent-connection-iterator-bugfix branch from 8ed6df9 to b4a3cd4 on May 9, 2024 20:05
@kclowes kclowes (Collaborator) commented May 9, 2024

I came up with a more elegant solution for the queue to operate correctly as long as the listener task is running

super elegant 🤩 🚀

@fselmo fselmo merged commit bc5601b into ethereum:main May 9, 2024
71 checks passed
fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
@fselmo fselmo mentioned this pull request May 9, 2024
fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
fselmo added a commit to fselmo/web3.py that referenced this pull request May 9, 2024
@fselmo fselmo deleted the persistent-connection-iterator-bugfix branch May 9, 2024 22:19
fselmo added a commit that referenced this pull request May 9, 2024
Successfully merging this pull request may close these issues.

Async iteration of AsyncWeb3 doesn't work as expected