
RRR: Reconnection & Request Reissuance #2181

Merged: 17 commits into master from prestwich/ws-reconnect, Mar 1, 2023

Conversation

@prestwich (Collaborator) commented Feb 23, 2023

Adds automatic reconnection and request-reissuance logic to WebSockets

Motivation

Users keep making a fuss about it and also it's generally a good thing to have :)

Solution

WS now consists of 2 tasks:

  1. WsBackend - manages dispatching raw values to the WebSocket
  2. RequestManager - manages caching request bodies until the request has resolved

WsBackend dispatches requests and routes responses and notifications. It also has a simple ping-based keepalive (when not compiled to wasm), to prevent inactivity from triggering server-side closes
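A minimal sketch of that backend shape, assuming a tokio runtime; the channel payload, interval, and names below are placeholders rather than the PR's actual types:

```rust
use std::time::Duration;
use tokio::{select, sync::mpsc, time};

/// Illustrative backend loop: interleaves requests handed down by the
/// manager with a periodic ping so an idle connection is not closed
/// server-side.
async fn backend_loop(mut to_dispatch: mpsc::UnboundedReceiver<String>) {
    let mut keepalive = time::interval(Duration::from_secs(10));
    loop {
        select! {
            // Periodic ping keeps the connection alive during inactivity.
            _ = keepalive.tick() => { /* send a WS ping frame (omitted) */ }
            // Raw request bodies to write to the WebSocket.
            maybe_req = to_dispatch.recv() => match maybe_req {
                Some(_req) => { /* write `_req` to the socket (omitted) */ }
                None => break, // manager dropped: shut the backend down
            },
        }
    }
}
```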

RequestManager holds a BackendDriver, to communicate with the current backend. Reconnection is accomplished by instantiating a new WsBackend and swapping out the manager's BackendDriver

The RequestManager holds copies of all pending requests (as InFlight), and active subscriptions (as ActiveSub). When reconnection occurs, all pending requests are re-dispatched to the new backend, and all active subs are re-subscribed
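A compressed sketch of that reconnect path; the struct shapes and helpers below are placeholders, not the actual ethers-providers types:

```rust
use std::collections::BTreeMap;

// Placeholder shapes; the real BackendDriver, InFlight and ActiveSub
// live in ethers-providers and carry channels/oneshots, not strings.
struct BackendDriver;
struct InFlight { method: String, params: String }
struct ActiveSub { params: String }

struct RequestManager {
    backend: BackendDriver,
    in_flight: BTreeMap<u64, InFlight>,
    subs: BTreeMap<u64, ActiveSub>,
}

impl RequestManager {
    async fn reconnect(&mut self) {
        // 1. Stand up a fresh WsBackend and swap its driver into place.
        self.backend = connect_new_backend().await;

        // 2. Re-dispatch every request that was still pending when the
        //    old connection died.
        for (id, req) in &self.in_flight {
            dispatch(&self.backend, *id, &req.method, &req.params);
        }

        // 3. Re-subscribe every active subscription so notifications
        //    keep flowing into the same client-side channels.
        for (id, sub) in &self.subs {
            dispatch(&self.backend, *id, "eth_subscribe", &sub.params);
        }
    }
}

// Stand-ins for "spawn a new WsBackend" and "send a request to it".
async fn connect_new_backend() -> BackendDriver { BackendDriver }
fn dispatch(_b: &BackendDriver, _id: u64, _method: &str, _params: &str) {}
```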

In order to provide continuity of subscription IDs to the client, the RequestManager also keeps a SubscriptionManager. This struct manages the relationship between the u64 request ID, and U256 server-side subscription ID. It does this by aliasing the server ID to the request ID, and returning the Request ID to the caller (hiding the server ID in the SubscriptionManager internals.) Giving the caller a "fake" subscription id allows the subscription to behave consistently across reconnections
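A sketch of the aliasing idea, assuming `U256` from ethers-core for the server-side ID; the field and method names are illustrative only:

```rust
use std::collections::HashMap;
use ethers_core::types::U256;

/// Illustrative aliasing table: callers only ever see the u64 request
/// ID; the server's U256 subscription ID stays internal.
#[derive(Default)]
struct SubscriptionManager {
    // server-side subscription ID -> client-facing request ID
    by_server_id: HashMap<U256, u64>,
}

impl SubscriptionManager {
    /// On an eth_subscribe response: record the alias and hand the
    /// caller its request ID back as the "subscription ID".
    fn alias(&mut self, request_id: u64, server_id: U256) -> u64 {
        self.by_server_id.insert(server_id, request_id);
        request_id
    }

    /// On a notification: translate the server ID back into the ID the
    /// caller knows, so subscriptions survive reconnection unchanged.
    fn resolve(&self, server_id: &U256) -> Option<u64> {
        self.by_server_id.get(server_id).copied()
    }
}
```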

The behavior is accessed by the WsClient frontend, which implements JsonRpcClient. The WsClient is cloneable (no need for an arc 😄). It communicates to the request manager via a channel, and receives notifications in a shared map for the client to retrieve
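For context, end-user code is unchanged; a typical connection through the new transport looks roughly like this (URL is a placeholder and error handling is simplified):

```rust
use ethers_providers::{Middleware, Provider, Ws};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Connects over WS; after this PR the transport reconnects and
    // reissues requests behind the scenes.
    let provider = Provider::<Ws>::connect("wss://example.com").await?;

    // The WsClient (and thus the Provider) is cheaply cloneable; clones
    // talk to the same RequestManager over a channel.
    let provider2 = provider.clone();

    let block = provider2.get_block_number().await?;
    println!("current block: {block}");
    Ok(())
}
```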

The RequestManager shuts down and drops when all WsClient instances have been dropped (because all UnboundedSender instances will have dropped).

The WsBackend shuts down when instructed to by the RequestManager or when the RequestManager drops (because the inbound channel will close)

Note on use of mutexes

This work does NOT use async mutexes, as we never need to hold past an await point. Because each use of mutex is fast (simple hashmap insert/remove), we're not concerned about blocking on locking
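A sketch of the locking pattern being described (a std Mutex around a map, with the guard always released before any await); the names are illustrative:

```rust
use std::{collections::HashMap, sync::Mutex};

/// The lock is only held for a quick map operation and is always
/// released before any `.await`, so a blocking std mutex is sufficient
/// and an async mutex would add nothing.
struct SharedResponses {
    inner: Mutex<HashMap<u64, String>>,
}

impl SharedResponses {
    fn insert(&self, id: u64, body: String) {
        // Guard drops at the end of the statement; no await point is
        // ever reached while it is held.
        self.inner.lock().unwrap().insert(id, body);
    }

    fn take(&self, id: u64) -> Option<String> {
        self.inner.lock().unwrap().remove(&id)
    }
}
```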

Limitations:

  • reconnection may cause dropped events in subscriptions, as the new sub may resume at a later height. This seems unavoidable? 🤔
  • faking the subscription ID requires an extra deser and reser on each eth_subscribe call so that the manager can inspect the sub id (see the sketch after this list)
  • memory usage will be higher, as we're keeping a copy of in-flight request params while they're in flight, and a permanent copy of the subscription params
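To make the second limitation concrete, here is roughly what that extra round-trip through serde_json looks like (a hypothetical helper, not the PR's code):

```rust
use serde_json::{json, Value};

// Each eth_subscribe response is parsed so the manager can read the
// server's sub ID, then the aliased (request) ID is written back in and
// the body is re-serialized for the caller: one extra deser + reser.
fn alias_sub_response(raw: &str, request_id: u64) -> serde_json::Result<String> {
    let mut resp: Value = serde_json::from_str(raw)?; // extra deserialization
    if let Some(result) = resp.get_mut("result") {
        // The "result" field of an eth_subscribe response is the sub ID.
        *result = json!(format!("{request_id:#x}"));
    }
    serde_json::to_string(&resp) // extra serialization
}
```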

Follow-up work:

Because BackendDriver does not contain any WS-specific behavior, we can add a RequestManager in front of IPC channels to add the same reconnection + reissuance logic. This doesn't seem high priority, as WS is much more prone to ephemeral failures than IPC

PR Checklist

  • Added Tests
  • Added Documentation
  • Updated the changelog
  • Breaking changes

@prestwich added the enhancement (New feature or request) label Feb 23, 2023
@prestwich self-assigned this Feb 23, 2023
@gakonst (Owner) commented Feb 27, 2023

@mattsse PTAL - I'll cut a release after we get this PR in

@prestwich marked this pull request as ready for review February 27, 2023 23:12
@prestwich (Collaborator, Author) commented:

going to move all previous WS behavior under legacy_ws and add a feature flag to use it

@gakonst (Owner) left a comment

LGTM. Much needed feature. Looking for a pass from @mattsse before merging.

Review threads (resolved):
  • ethers-providers/src/rpc/transports/mod.rs
  • ethers-providers/src/rpc/transports/ws/backend.rs
  • ethers-providers/src/rpc/transports/ws/mod.rs
  • ethers-providers/src/rpc/transports/ws/manager.rs
@gakonst changed the title from "[WIP] RRR: Reconnection & Request Reissuance" to "RRR: Reconnection & Request Reissuance" Feb 28, 2023
@mattsse (Collaborator) left a comment

if I understood this right, a WsClient now has two different tasks?
I haven't looked into it much, but that's kind of weird I guess.
manager, backend, backenddriver is a bit confusing.

it would be great to have some docs on the various channel fields because there are like 4 different channels used in this setup.

this now auto-reconnects on any error caused by the ws connection?

Review threads (resolved):
  • ethers-providers/src/rpc/transports/ws/mod.rs (2 threads)
Comment on lines +368 to +379
None => if let Err(e) = self.reconnect().await {
    break Err(e);
}
@mattsse (Collaborator) commented Feb 28, 2023

I could totally be wrong here because it wasn't obvious to which component to_handle is wired here. but it looks like the other half belongs to WsBackend?

I believe there's an issue with how the WsBackend task resolves, which could cause two reconnects if the task exits because of an error?
first the channel will be None, then the self.backend.error receives the error message and reconnects again?

@prestwich (Collaborator, Author) replied:

> I believe there's an issue with how the WsBackend task resolves, which could cause two reconnects if the task exits because of an error?

No, only 1 select branch is reachable in any pass through the loop. So if the to_handle channel is dead, the WS will reconnect. Then by the time the loop restarts, the BackendDriver will be different, and the error channel will be fresh and good
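A stripped-down sketch of that argument, using placeholder channel types (not the PR's code): reconnecting replaces the whole backend handle, error channel included, so the stale error can never fire on a later pass.

```rust
use tokio::{select, sync::mpsc};

struct Backend {
    // Items coming back from the WsBackend task.
    to_handle: mpsc::UnboundedReceiver<String>,
    // Error notifications from the WsBackend task.
    error: mpsc::UnboundedReceiver<()>,
}

async fn manager_loop(mut backend: Backend) {
    loop {
        // Only one branch runs per pass through the loop.
        let needs_reconnect = select! {
            item = backend.to_handle.recv() => match item {
                Some(_msg) => false, // route the response/notification
                None => true,        // backend channel is dead
            },
            Some(_) = backend.error.recv() => true,
        };
        if needs_reconnect {
            // Both channels are replaced together with the new backend,
            // so the old error can never trigger a second reconnect.
            backend = reconnect().await;
        }
    }
}

async fn reconnect() -> Backend { unimplemented!() }
```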

@prestwich (Collaborator, Author) replied:

> I could totally be wrong here because it wasn't obvious to which component to_handle is wired here. but it looks like the other half belongs to WsBackend?

the naming convention is you send a chunk of work to the handler via handler. the handler accepts work that it should handle via to_handle

same convention with dispatcher and to_dispatch. you send to the dispatching task via a dispatcher channel. the dispatcher accepts the requests via to_dispatch
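The convention in channel-pair form, using placeholder payload types (a sketch, not the PR's definitions):

```rust
use tokio::sync::mpsc;

// The sending half is named for the task you send *to*; the receiving
// half is named for what that task does with what it receives.
struct Frontend {
    /// Other tasks send work to the handler through `handler` ...
    handler: mpsc::UnboundedSender<String>,
}

struct Handler {
    /// ... and the handler pulls the work it should handle from here.
    to_handle: mpsc::UnboundedReceiver<String>,
}

// Same pattern for `dispatcher` / `to_dispatch`.
fn channel_pair() -> (Frontend, Handler) {
    let (handler, to_handle) = mpsc::unbounded_channel();
    (Frontend { handler }, Handler { to_handle })
}
```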

@prestwich (Collaborator, Author) commented Feb 28, 2023

you can think of the BackendDriver as control and ownership of the running backend task. It has the ability to shut down that task, send/receive messages from it, and dropping the driver causes the task to shut down automatically

The communication flow is

client ----instructions---> manager
manager -----requests-----> backend
backend <-----WS Protocol-----> server
backend -----PubSubItem----> manager
manager ---sub notifications---> sub channels
manager ---sub channels---> client
manager ---responses----> request future channels

this would go well in a swim-lane diagram but I don't have time to make one rn

@prestwich (Collaborator, Author) replied:

> if I understood this right, a WsClient now has two different tasks?

yeah, because reconnection requires the connection to be disposable, there must be at least two tasks: one for the connection, and one for detecting and fixing connection failures.

> it would be great to have some docs on the various channel fields because there are like 4 different channels used in this setup.

added comments to all struct fields

> this now auto-reconnects on any error caused by the ws connection?

yes, up to a fixed number of retries. It also includes a keepalive loop when not running in wasm

prestwich and others added 2 commits February 28, 2023 16:40
@gakonst (Owner) left a comment

@mattsse merging cautiously; if anybody experiences issues they can use the legacy-ws feature. This is a bit hard to unit test further, open to ideas here.

        #[cfg(not(target_arch = "wasm32"))]
        tokio::spawn(fut);
    }
}
@gakonst (Owner) commented:

nit: could add a reconnects(..) builder method to override the default, but can do in followup
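Hypothetically, such an override could look like the sketch below; the builder type, field, and default value are invented here purely for illustration and may not match whatever a follow-up actually adds:

```rust
/// Invented builder shape; the real ethers-providers API may differ
/// (or take the cap as a plain constructor argument instead).
pub struct WsClientBuilder {
    max_reconnects: usize,
}

impl WsClientBuilder {
    pub fn new() -> Self {
        // Some default retry cap; the actual default lives in the crate.
        Self { max_reconnects: 5 }
    }

    /// The suggested `reconnects(..)` override.
    pub fn reconnects(mut self, n: usize) -> Self {
        self.max_reconnects = n;
        self
    }

    pub fn max_reconnects(&self) -> usize {
        self.max_reconnects
    }
}
```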

@gakonst merged commit 73636a9 into master Mar 1, 2023
@gakonst deleted the prestwich/ws-reconnect branch March 1, 2023 01:26
@shrimpynuts commented:

@prestwich Is it reasonable to try to keep a subscription through WsClient alive indefinitely?

If yes, would the best way be to just instantiate a new WsClient every time the RequestManager reaches the reconnect limit?

@prestwich (Collaborator, Author) replied:

> Is it reasonable to try to keep a subscription through WsClient alive indefinitely?

Yes it's pretty reasonable

> If yes, would the best way be to just instantiate a new WsClient every time the RequestManager reaches the reconnect limit?

You should just set an arbitrarily high reconnection count :) u64::MAX should be fine
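A usage sketch of that suggestion; it assumes a constructor that accepts a reconnect cap, and the exact name, parameter type, and cap value below are assumptions to check against the released ethers-providers API:

```rust
use ethers_providers::{Middleware, Provider, Ws};
use futures_util::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed constructor: connect with an arbitrarily high reconnect
    // cap so the subscription effectively never gives up.
    let ws = Ws::connect_with_reconnects("wss://example.com", usize::MAX).await?;
    let provider = Provider::new(ws);

    // Subscriptions keep their IDs across reconnects, so this stream
    // just keeps yielding blocks while the transport heals itself.
    let mut blocks = provider.subscribe_blocks().await?;
    while let Some(block) = blocks.next().await {
        println!("new block: {:?}", block.number);
    }
    Ok(())
}
```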

jqphu pushed a commit to refract-team/ethers-rs that referenced this pull request Mar 28, 2023
* wip: ws2

* ws2 backend compiles

* refactor: rename PubSubItem and BackendDriver

* feature: dispatch request to end subscription

* refactor: move ws2 to ws, fix reconnection and deser on subs

* chore: improve use of tracing in manager

* refactor: feature legacy_ws to enable backwards compatibility

* nit: mod file ordering

* docs: copy PR description to ws structs

* fixes: remove unused macros file, remove err formats

* docs: add comments to struct fields

* docs: comment client struct fields

* chore: changelog

* fix: unused imports in ws_errors test

* docs: missing comment

Co-authored-by: Georgios Konstantopoulos <me@gakonst.com>

* fix: legacy-ws feature in root crate, hyphen not underscore

* fix: a couple bad imports/exports

---------

Co-authored-by: Georgios Konstantopoulos <me@gakonst.com>
@wminshew (Contributor) commented:

> reconnection may cause dropped events in subscriptions, as the new sub may resume at a later height. This seems unavoidable? 🤔

dropping events seems fine to me but out of curiosity why is it unavoidable? couldn't SubscriptionManager keep note of the most recent block each sub has received an event from & poll from there on re-connect?

Labels: enhancement (New feature or request)
7 participants