refactor(webrtc): rework Substream <-> SubstreamHandle communication by gab8i · Pull Request #600 · paritytech/litep2p

gab8i · 2026-05-27T12:21:22Z

This PR attempts the refactor of the substream module explained here: #593 (comment)

Contrary to what's described in the issue, I ended up using the `Mutex` + `AtomicWaker` pattern, because it turned out to be the cleanest and most effective option.

One thing worth knowing: there are roughly 3 states, and multiple tasks modify and read them, so there can be multiple producers and consumers. I tried data structures that handle the polling internally, so that no task is left behind waiting on some other parallel event. Every such attempt made the code messy. It added many fields and, most importantly, required remembering to notify/update a watcher (or push to several different places) at every relevant point in the codebase.
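The `Mutex` + `AtomicWaker` pattern described above can be sketched roughly as follows. This is a minimal, std-only sketch, not the PR's code. The method names `get`, `set` and `register_and_get` come from the review discussion. The exact shape of `SharedState` and the `poll_until` helper are assumptions. A `Mutex<Option<Waker>>` also stands in for `AtomicWaker` from the `futures` crate. Like `AtomicWaker`, it keeps a single waiting task per cell.

```rust
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};

/// Hypothetical shared-state cell: a value behind a `Mutex` plus the waker
/// of the task waiting for it to change.
/// NOTE: `Mutex<Option<Waker>>` replaces `AtomicWaker` to stay std-only.
struct SharedState<T> {
    inner: Arc<Inner<T>>,
}

struct Inner<T> {
    state: Mutex<T>,
    waker: Mutex<Option<Waker>>,
}

impl<T> Clone for SharedState<T> {
    fn clone(&self) -> Self {
        Self { inner: Arc::clone(&self.inner) }
    }
}

impl<T: Clone> SharedState<T> {
    fn new(initial: T) -> Self {
        Self {
            inner: Arc::new(Inner {
                state: Mutex::new(initial),
                waker: Mutex::new(None),
            }),
        }
    }

    /// Returns a snapshot of the current state.
    fn get(&self) -> T {
        self.inner.state.lock().unwrap().clone()
    }

    /// Stores the new state, then wakes the registered task (if any).
    fn set(&self, value: T) {
        *self.inner.state.lock().unwrap() = value;
        // Take the waker in its own statement so its lock is released before waking.
        let waker = self.inner.waker.lock().unwrap().take();
        if let Some(waker) = waker {
            waker.wake();
        }
    }

    /// Registers the caller's waker *before* reading the state, so a
    /// concurrent `set` is either visible to this read or wakes the new waker.
    fn register_and_get(&self, cx: &mut Context<'_>) -> T {
        *self.inner.waker.lock().unwrap() = Some(cx.waker().clone());
        self.get()
    }
}

/// Example poll function: ready once `done` holds for the shared state.
fn poll_until<T: Clone>(
    shared: &SharedState<T>,
    cx: &mut Context<'_>,
    done: impl Fn(&T) -> bool,
) -> Poll<T> {
    let state = shared.register_and_get(cx);
    if done(&state) {
        Poll::Ready(state)
    } else {
        Poll::Pending
    }
}
```

Registering before reading is what lets a waiting task avoid a lost wake-up. Suppose `set` runs between the registration and the read. The read then sees the new value. If `set` runs after the read, it finds and wakes the freshly registered waker.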

What the PR does is:

Rework the communication mechanism between `Substream` and its `SubstreamHandle`. State is shared through `Mutex` and `AtomicWaker` abstracted behind a small helper, which makes them easy to use and ensures the relevant tasks are woken whenever the shared state changes.

This also decouples the reading half from the writing half: a graceful close of either half no longer implies closing the other. An abrupt RESET_STREAM still tears down both, as required by the spec.
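The decoupling of the two halves can be illustrated with a hypothetical, single-threaded sketch. Each half keeps its own state, and a graceful flag touches only one of them. RESET_STREAM tears down both. The flag names follow the libp2p WebRTC spec. The `Halves` type, `on_flag`, and the `Reset` variants are assumptions made for illustration.

```rust
/// Flags carried by incoming messages (names follow the libp2p WebRTC spec).
#[derive(Clone, Copy, Debug)]
enum Flag {
    Fin,
    StopSending,
    ResetStream,
    FinAck,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ReaderState {
    Open,
    Fin,
    Reset,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum WriterState {
    Open,
    Fin,
    FinAck,
    StopSending,
    Reset,
}

/// Hypothetical: one state per half, updated independently.
struct Halves {
    reader: ReaderState,
    writer: WriterState,
}

impl Halves {
    fn on_flag(&mut self, flag: Flag) {
        match flag {
            // Peer finished writing: only our reading half is closed.
            Flag::Fin => self.reader = ReaderState::Fin,
            // Peer no longer reads: only our writing half is affected.
            Flag::StopSending => self.writer = WriterState::StopSending,
            // Peer acknowledged our FIN: completes the writer half-close.
            Flag::FinAck if self.writer == WriterState::Fin => {
                self.writer = WriterState::FinAck
            }
            // FIN_ACK in any other state is unexpected; ignored in this sketch.
            Flag::FinAck => {}
            // Abrupt reset: both halves are torn down.
            Flag::ResetStream => {
                self.reader = ReaderState::Reset;
                self.writer = WriterState::Reset;
            }
        }
    }
}
```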

lexnv · 2026-05-27T12:26:48Z

+    }
+
+    fn get(&self) -> T {
+        self.state.lock().clone()


nit: Could we run into TOCTOU issues? We lock here and clone, and the lock is released on drop. By the time we use the cloned state, the actual state behind the mutex may already have changed.
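The TOCTOU concern can be made concrete with a small sketch. A cloned snapshot is fine for logging or as a hint. However, a check-then-act sequence such as "if `get()` is `Fin`, then `set(FinAck)`" can race with another task. One common remedy runs the check and the write under a single lock guard. `Shared` and `update` are hypothetical names, not the PR's helper.

```rust
use std::sync::Mutex;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum WriterState {
    Open,
    Fin,
    FinAck,
}

/// Hypothetical stand-in for the PR's shared-state helper.
struct Shared<T> {
    state: Mutex<T>,
}

impl<T: Clone> Shared<T> {
    /// Snapshot: may be stale as soon as the lock is released.
    /// Fine for logs or hints, not for check-then-act logic.
    fn get(&self) -> T {
        self.state.lock().unwrap().clone()
    }

    /// Runs `f` with the lock held, so a check and the write that depends
    /// on it observe the same value (no TOCTOU window in between).
    fn update<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        let mut guard = self.state.lock().unwrap();
        f(&mut *guard)
    }
}
```

For example, `s.update(|w| { let seen = *w; if seen == WriterState::Fin { *w = WriterState::FinAck; } seen })` performs the FIN to FIN_ACK transition atomically. It also returns the state it actually saw, which is the right value to put in a debug log.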

lexnv · 2026-05-28T07:50:39Z

                            self.handles.remove(&channel_id);
                        }
                    }
-                    Some((_, Some(SubstreamEvent::RecvClosed))) => {}


lexnv · 2026-05-28T11:34:22Z

+                // could cause this method to wait and thus stall the entire webrtc
+                // connection. Solution would be to implement reading
+                // backpressure, keeping track of pending incoming messages.
+                let _ = message_tx


nit: Let's add a debug log here on errors.
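A std-only sketch of the suggested debug log follows. In the PR, `message_tx` is an async channel, and the log would presumably go through `tracing`. Here `std::sync::mpsc::SyncSender` and `eprintln!` stand in for both, and `forward` / `channel_id` are hypothetical names. Like the awaited send in the TODO above, `SyncSender::send` blocks while the buffer is full.

```rust
use std::sync::mpsc::SyncSender;

/// Hypothetical: forwards `payload` to the `Substream`, logging instead of
/// silently discarding the error when the receiving side is gone.
fn forward(tx: &SyncSender<Vec<u8>>, channel_id: u16, payload: Vec<u8>) -> bool {
    match tx.send(payload) {
        Ok(()) => true,
        Err(err) => {
            // `eprintln!` stands in for a `tracing::debug!` call.
            // `err.0` hands back the payload that could not be delivered.
            eprintln!(
                "channel {channel_id}: failed to forward {} bytes, receiver dropped",
                err.0.len()
            );
            false
        }
    }
}
```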

lexnv · 2026-05-28T11:36:04Z

    /// TX channel for sending inbound messages from `peer` to the associated `Substream`.
-    inbound_tx: Sender<Event>,
-
+    message_tx: Option<Sender<Message>>,


I would extend the comment to document that we are dropping the channel upon receiving the Fin packet, after which no more messages are sent.

lexnv · 2026-05-28T11:39:18Z

-                && self
-                    .inbound_tx
-                    .send(Event::Message {
+        match (self.message_tx.as_ref(), message.payload) {


`message_tx` is dropped upon receiving the FIN packet.

In practice, we could still receive a message payload here even though we received the FIN and dropped the TX channel.
IIRC, the spec states that no further messages are received after FIN. Maybe we should emit a warning (to be downgraded to debug once we stabilize)?

Yes, this makes sense!! str0m should guarantee ordering, so after FIN no other message should arrive, besides flags!
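The points raised in this thread can be combined into one sketch: the channel is documented as dropped on FIN, payload that arrives after FIN triggers a warning, and a missing channel is flagged instead of silently ignored. `Reader`, `Outcome`, `on_fin` and `on_payload` are hypothetical names, and `eprintln!` stands in for the warning log. The `match (message_tx.as_ref(), payload)` shape mirrors the diff above.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical outcome of handling one incoming message.
#[derive(Debug, PartialEq)]
enum Outcome {
    Forwarded,
    Ignored,
    UnexpectedAfterFin,
}

struct Reader {
    /// TX channel for inbound messages to the associated `Substream`.
    /// Dropped (set to `None`) upon receiving FIN, after which no more
    /// messages are sent.
    message_tx: Option<Sender<Vec<u8>>>,
}

impl Reader {
    fn on_fin(&mut self) {
        // A second FIN would find the channel already gone.
        if self.message_tx.take().is_none() {
            eprintln!("warn: FIN received but the channel was already dropped");
        }
    }

    fn on_payload(&self, payload: Option<Vec<u8>>) -> Outcome {
        match (self.message_tx.as_ref(), payload) {
            (Some(tx), Some(payload)) if !payload.is_empty() => {
                // Send errors are ignored here for brevity.
                let _ = tx.send(payload);
                Outcome::Forwarded
            }
            // str0m is expected to preserve ordering, so data after FIN
            // points to a misbehaving peer.
            (None, Some(payload)) if !payload.is_empty() => {
                eprintln!("warn: received {} bytes after FIN, dropping them", payload.len());
                Outcome::UnexpectedAfterFin
            }
            // Empty or missing payloads (e.g. flag-only messages).
            _ => Outcome::Ignored,
        }
    }
}
```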

lexnv · 2026-05-28T11:40:59Z

-                        // Wake up any task waiting on shutdown
-                        self.substream_shutdown_waker.wake();
-                        self.shutdown_waker.wake();
+                    if matches!(self.writer_state.get(), WriterState::Fin) {


nit: Let's cache the `writer_state` in a variable. By the time we read it again under the mutex it might already have changed, which would make the debug log misleading and investigations harder.

lexnv · 2026-05-28T11:42:28Z

                            target: LOG_TARGET,
-                            ?state,
+                            state = ?self.writer_state.get(),
                            "received FIN_ACK in unexpected state, ignoring"


We expect the state to be exactly Fin to transition to FinAck. How could we make this a bit more robust here just in case we introduced a bug somewhere or the other peer is misbehaving?

Maybe we could clean up the states regardless, send a reset, or return `ConnectionClosed`.

We should reset the connection if we receive FIN_ACK in an unexpected state; this makes perfect sense! Per the spec, FIN_ACK should be received in only one scenario, so I think it is right to expect it only when we are in `WriterState::Fin`.
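The conclusion above can be sketched as hypothetical FIN_ACK handling. Any state other than `WriterState::Fin` counts as a protocol violation and triggers a reset request. The handler reads the state once and reports that snapshot, which matches the earlier nit about caching `writer_state`. `FinAckAction` and `on_fin_ack` are illustrative names. The sketch takes `&mut WriterState` rather than the PR's shared state, so it avoids the concurrency concerns discussed above. In the real code, the check and the write would need to happen under one lock.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum WriterState {
    Open,
    Fin,
    FinAck,
    StopSending,
}

/// Hypothetical result of handling a FIN_ACK flag.
#[derive(Debug, PartialEq, Eq)]
enum FinAckAction {
    /// FIN_ACK answered our FIN: the writing half is fully closed.
    Closed,
    /// Protocol violation (peer misbehaving or a bug on our side): reset.
    Reset { observed: WriterState },
}

fn on_fin_ack(state: &mut WriterState) -> FinAckAction {
    // Snapshot once so the value we act on is the value we report.
    let observed = *state;
    match observed {
        WriterState::Fin => {
            *state = WriterState::FinAck;
            FinAckAction::Closed
        }
        _ => FinAckAction::Reset { observed },
    }
}
```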

lexnv · 2026-05-28T11:44:16Z

-                    // Stream thus FinAck will only be sent once something else awake it.
-                    *self.reading_state.lock() = ReadingState::Fin;
+                    self.reader_state.set(ReaderState::Fin);
+                    let _ = self.message_tx.take();


nit:

Suggested change

-                    let _ = self.message_tx.take();
+if self.message_tx.take().is_none() {
+    warn!("Unexpected to have channel already dropped / similar")
+}

lexnv · 2026-05-28T11:45:59Z

-            }
+    // This function carries forward the writer half close process.
+    //
+    // It is expected to:


nit or similar:

The following behaviors are expected in:

- `WriterState::Open` state: etc.
- `WriterState::Fin` state: etc.
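As an illustration of the suggested comment layout, here is a hypothetical doc comment on a simplified, synchronous stand-in for the writer half-close step. The transitions are assumptions, not the PR's exact behavior. In particular, giving up once the FIN_ACK timeout fires is assumed, and waker registration is omitted.

```rust
use std::task::Poll;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum WriterState {
    Open,
    Fin,
    FinAck,
}

/// Carries forward the writer half-close process.
///
/// The following behaviors are expected in:
///
/// - `WriterState::Open` state: send FIN (via `send_fin`), move to
///   `WriterState::Fin` and wait for the peer's FIN_ACK.
/// - `WriterState::Fin` state: keep waiting for FIN_ACK, unless the
///   FIN_ACK timeout fired (`timed_out`), in which case stop waiting.
/// - `WriterState::FinAck` state: the half-close is complete.
fn poll_writer_close(
    state: &mut WriterState,
    timed_out: bool,
    send_fin: impl FnOnce(),
) -> Poll<()> {
    match *state {
        WriterState::Open => {
            send_fin();
            *state = WriterState::Fin;
            Poll::Pending
        }
        WriterState::Fin if timed_out => Poll::Ready(()),
        WriterState::Fin => Poll::Pending,
        WriterState::FinAck => Poll::Ready(()),
    }
}
```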

lexnv · 2026-05-28T11:47:25Z

                let mut timeout = Box::pin(tokio::time::sleep(FIN_ACK_TIMEOUT));
                // Poll the timeout once to register it with tokio's timer
                // This ensures we'll be woken when it expires
                let _ = timeout.as_mut().poll(cx);


if timeout.as_mut().poll(cx).is_ready() { error!("Misconfigured timer: it is not supposed to be ready"); }
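A self-contained sketch of that suggestion: poll the freshly created timer once to register the waker, and report if it is already ready. It is generic over any future, so it only needs `std`. `arm_timer` is a hypothetical name, and `eprintln!` stands in for `tracing::error!`. A zero or otherwise misconfigured timeout is one way the first poll could already be ready.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::Context;

/// Polls a freshly created timer once so that it registers `cx`'s waker.
/// A new timer should never be ready on its first poll; if it is, report it
/// instead of silently discarding the result. Returns `true` in that case.
fn arm_timer<F: Future<Output = ()>>(timer: Pin<&mut F>, cx: &mut Context<'_>) -> bool {
    if timer.poll(cx).is_ready() {
        // `eprintln!` stands in for `tracing::error!`.
        eprintln!("misconfigured timer: it is not supposed to be ready yet");
        return true;
    }
    false
}
```

With tokio, the call site would look like `arm_timer(timeout.as_mut(), cx)` on the `Box::pin(tokio::time::sleep(FIN_ACK_TIMEOUT))` from the snippet above.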

lexnv · 2026-05-28T11:54:06Z

+        //   incoming messages, there are 2 side effects which connects the two streams:
+        //   1. If FIN arrived then FIN_ACK is expected to be sent back.
+        //   2. If FIN_ACK arrived the writer_state is updated.
        {


I think we can safely drop the `{ }` block, since the locks are now hidden behind `set` / `register_and_get`.

Yup, that's a vestige from when a lock was acquired and dropped at the end of the sub-scope! 👍

lexnv · 2026-05-28T12:02:28Z

-                    .send(Event::Message {
+        match (self.message_tx.as_ref(), message.payload) {
+            (Some(message_tx), Some(payload)) if !payload.is_empty() => {
+                // TODO: awaiting here makes the entire connection


Let's create an issue for this one (if there isn't one already) and reference it in the comment here.
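One possible direction for the backpressure TODO is to avoid awaiting the send entirely. Messages the `Substream` cannot take yet are parked, and a backpressure signal goes out once too many are pending. This is a hypothetical sketch with std channels. `Forwarder`, `Forward` and `max_pending` are illustrative names, and how the connection would react to `Backpressure` (e.g. pausing reads) is left open. A real version would also retry `flush` when the `Substream` drains the channel.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical result of handing one message to the `Substream`.
#[derive(Debug, PartialEq)]
enum Forward {
    /// Delivered or parked; the caller may keep reading.
    Accepted,
    /// Too many parked messages; the caller should pause reading.
    Backpressure,
    /// The receiving side is gone.
    Closed,
}

/// Hypothetical non-blocking forwarder that never waits on a full channel.
struct Forwarder {
    tx: SyncSender<Vec<u8>>,
    pending: VecDeque<Vec<u8>>,
    max_pending: usize,
}

impl Forwarder {
    /// Pushes parked messages into the channel, oldest first, until it is
    /// full. Returns `false` if the receiver has been dropped.
    fn flush(&mut self) -> bool {
        while let Some(msg) = self.pending.pop_front() {
            match self.tx.try_send(msg) {
                Ok(()) => {}
                Err(TrySendError::Full(msg)) => {
                    self.pending.push_front(msg);
                    break;
                }
                Err(TrySendError::Disconnected(_)) => return false,
            }
        }
        true
    }

    fn forward(&mut self, msg: Vec<u8>) -> Forward {
        // Queue behind older parked messages so ordering is preserved.
        self.pending.push_back(msg);
        if !self.flush() {
            return Forward::Closed;
        }
        if self.pending.len() >= self.max_pending {
            Forward::Backpressure
        } else {
            Forward::Accepted
        }
    }
}
```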

lexnv · 2026-05-28T12:07:33Z

+}
+
+struct Inner<T> {
+    state: Mutex<T>,


I would take this one step further and replace the `Mutex` with an `AtomicU8`.

Incidentally, this is what I proposed in the past for the original shutdown impl from #513 (comment).

Since we follow this pattern:

- set: `mutex.lock` (i.e. `.store(Release)`)
- get: `mutex.lock` (i.e. `.load(Acquire)`)

it should be straightforward (and cause no side effects) to replace it with a lock-free atomic:

```rust
enum WriterState {
    /// The writing stream is open.
    Open = 0,
    /// A Fin flag was sent.
    Fin = 1,
    /// FinAck was received.
    FinAck = 2,
    /// StopSending was received.
    StopSending = 3,
}
```

Then we could also close #523.

That's a great idea!! And given the current `SharedState` abstraction, it seems pretty straightforward to implement!!!
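The proposed lock-free replacement might look roughly like this sketch. It mirrors the mutex-based `set`/`get` as `Release` stores and `Acquire` loads on an `AtomicU8`. `AtomicWriterState`, `from_u8` and `transition` are hypothetical names. The waker half of the helper is omitted. `transition` goes slightly beyond the proposal. It shows that `compare_exchange` also provides an atomic check-and-set, which addresses the earlier TOCTOU question.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum WriterState {
    /// The writing stream is open.
    Open = 0,
    /// A Fin flag was sent.
    Fin = 1,
    /// FinAck was received.
    FinAck = 2,
    /// StopSending was received.
    StopSending = 3,
}

impl WriterState {
    fn from_u8(value: u8) -> Self {
        match value {
            0 => Self::Open,
            1 => Self::Fin,
            2 => Self::FinAck,
            3 => Self::StopSending,
            // Only values produced by `as u8` are ever stored.
            _ => unreachable!("invalid WriterState discriminant: {value}"),
        }
    }
}

/// Hypothetical lock-free replacement for the `Mutex<WriterState>` cell.
struct AtomicWriterState(AtomicU8);

impl AtomicWriterState {
    fn new(state: WriterState) -> Self {
        Self(AtomicU8::new(state as u8))
    }

    /// Mirrors the mutex-based `set`: a `Release` store.
    fn set(&self, state: WriterState) {
        self.0.store(state as u8, Ordering::Release);
    }

    /// Mirrors the mutex-based `get`: an `Acquire` load.
    fn get(&self) -> WriterState {
        WriterState::from_u8(self.0.load(Ordering::Acquire))
    }

    /// Atomically moves `from` to `to`; on failure returns the observed state.
    fn transition(&self, from: WriterState, to: WriterState) -> Result<(), WriterState> {
        self.0
            .compare_exchange(from as u8, to as u8, Ordering::AcqRel, Ordering::Acquire)
            .map(|_| ())
            .map_err(WriterState::from_u8)
    }
}
```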

lexnv · 2026-05-28T12:11:18Z

Nice job here @gab8i! This simplifies the state machines quite a lot!

While at it, `AtomicU8` should be a drop-in replacement for `Arc<Mutex<_>>` without side effects. Since everything is placed behind a nice set/get API, I would tackle that replacement here 🙏 Other than that, I left some tiny nits about comments and debuggability (feel free to ignore them where they don't make sense).

lexnv reviewed May 27, 2026


Comment thread src/transport/webrtc/substream.rs Outdated

lexnv reviewed May 28, 2026


gab8i added 2 commits May 28, 2026 10:54

refactor(webrtc): simplify wakers usage

5fa99fc

doc(webrtc): typos and doc corrections

b3af4c3

lexnv reviewed May 28, 2026


gab8i added 2 commits May 28, 2026 15:38

chore(webrtc): improve comments and debuggability

0864da4

chore: trigger CI

2bd1907

gab8i marked this pull request as ready for review May 28, 2026 13:43

gab8i changed the title ~~[WIP] refactor(webrtc): rework Substream <-> SubstreamHandle communication~~ refactor(webrtc): rework Substream <-> SubstreamHandle communication May 28, 2026

gab8i merged commit 90aa132 into gab_webrtc_multiple_fixes_v2 May 28, 2026
3 checks passed

gab8i deleted the gab_webrtc_substream_refactor branch May 28, 2026 13:47

2 participants