refactor(webrtc): rework Substream <-> SubstreamHandle communication #600

Merged
gab8i merged 5 commits into gab_webrtc_multiple_fixes_v2 from gab_webrtc_substream_refactor
May 28, 2026
Contributor

@gab8i gab8i commented May 27, 2026

This PR attempts the refactor of the substream module explained here: #593 (comment)

Contrary to what's described in the issue, I ended up using the Mutex + AtomicWaker pattern, because it turned out to be the cleanest and most effective option.

One thing worth knowing: there are roughly 3 states, each read and modified by multiple tasks, so there can be multiple producers and consumers. I tried several data structures that handle the polling internally, so that no task is left stuck waiting on some other parallel event, but every attempt made the code messy: it added many fields and, most importantly, required remembering to notify or update a watcher (or push to several different places) at every relevant point in the codebase.

What the PR does is:

Rework the communication mechanism between Substream and its SubstreamHandle. State is shared through Mutex and AtomicWaker abstracted behind a small helper, which makes them easy to use and ensures the relevant tasks are woken whenever the shared state changes.

This also decouples the reading half from the writing half: a graceful close of either half no longer implies closing the other. An abrupt RESET_STREAM still tears down both, as required by the spec.
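
A minimal, std-only sketch of the shared-state pattern described above. The names `SharedState`, `Inner`, `set` and `register_and_get` appear in this PR's review thread, but the bodies below are assumptions, not the merged code. To keep the example free of external crates, the waker is stored in a plain `Mutex<Option<Waker>>` instead of an `AtomicWaker`, and `std::sync::Mutex` is used, hence the `unwrap()` calls. `WriterState` is reduced to two variants; `CountingWaker` and `demo` are hypothetical and exist only for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Wake, Waker};

#[derive(Clone, Copy, Debug, PartialEq)]
enum WriterState {
    Open,
    Fin,
}

/// Shared value plus the waker of the task that last polled it.
struct Inner<T> {
    state: Mutex<T>,
    /// Assumption: stands in for the `AtomicWaker` used in the PR.
    waker: Mutex<Option<Waker>>,
}

#[derive(Clone)]
struct SharedState<T> {
    inner: Arc<Inner<T>>,
}

impl<T: Clone> SharedState<T> {
    fn new(initial: T) -> Self {
        let inner = Inner { state: Mutex::new(initial), waker: Mutex::new(None) };
        Self { inner: Arc::new(inner) }
    }

    /// Store a new value, then wake the task that registered interest, if any.
    fn set(&self, value: T) {
        *self.inner.state.lock().unwrap() = value;
        if let Some(waker) = self.inner.waker.lock().unwrap().take() {
            waker.wake();
        }
    }

    /// Register the current task first and read second, so a concurrent
    /// `set` either is visible in the returned value or triggers a wake-up.
    fn register_and_get(&self, cx: &mut Context<'_>) -> T {
        *self.inner.waker.lock().unwrap() = Some(cx.waker().clone());
        self.inner.state.lock().unwrap().clone()
    }
}

/// Demonstration-only waker that counts how often it is woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns the state seen before and after a `set`, plus the wake-up count.
fn demo() -> (WriterState, WriterState, usize) {
    let shared = SharedState::new(WriterState::Open);
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    let before = shared.register_and_get(&mut cx); // reader observes Open
    shared.clone().set(WriterState::Fin); // another handle updates the state
    let after = shared.register_and_get(&mut cx);
    (before, after, counter.0.load(Ordering::SeqCst))
}
```

Because every mutation goes through `set`, call sites do not need to remember to notify anyone, which is the property the description attributes to the helper.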

Comment thread src/transport/webrtc/substream.rs Outdated
}

fn get(&self) -> T {
self.state.lock().clone()
Collaborator

nit: Could we run into TOCTOU issues? We lock here and clone, and the lock is released when the guard is dropped. We then use the cloned state, but by the time we read the clone, the actual state behind the mutex may have changed.
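
To illustrate the concern, here is a small sketch contrasting a snapshot `get` with a check-and-act done under a single lock acquisition. `Shared`, `transition` and `race_to_fin` are hypothetical names that are not part of the PR, and `std::sync::Mutex` is assumed for self-containment.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone, Copy, Debug, PartialEq)]
enum WriterState {
    Open,
    Fin,
}

/// Hypothetical wrapper around the shared state.
struct Shared(Mutex<WriterState>);

impl Shared {
    /// Snapshot read: the lock is released on return, so the value can be
    /// stale by the time the caller acts on it (the TOCTOU window).
    fn get(&self) -> WriterState {
        *self.0.lock().unwrap()
    }

    /// Check and act while holding the lock once. Returns whether the
    /// `from -> to` transition was applied.
    fn transition(&self, from: WriterState, to: WriterState) -> bool {
        let mut guard = self.0.lock().unwrap();
        if *guard == from {
            *guard = to;
            true
        } else {
            false
        }
    }
}

/// Spawns `n` threads that all attempt Open -> Fin and counts the winners.
fn race_to_fin(n: usize) -> usize {
    let shared = Arc::new(Shared(Mutex::new(WriterState::Open)));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.transition(WriterState::Open, WriterState::Fin))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .filter(|won| *won)
        .count()
}
```

With `transition`, exactly one caller can win the Open -> Fin step; a `get` followed by a separate `set` would give no such guarantee.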

Comment thread src/transport/webrtc/substream.rs Outdated
self.handles.remove(&channel_id);
}
}
Some((_, Some(SubstreamEvent::RecvClosed))) => {}
Collaborator

Nice!

Comment thread src/transport/webrtc/substream.rs Outdated
// could cause this method to wait and thus stall the entire webrtc
// connection. Solution would be to implement reading
// backpressure, keeping track of pending incoming messages.
let _ = message_tx
Collaborator

nit: Let's add a debug log here on errors

/// TX channel for sending inbound messages from `peer` to the associated `Substream`.
inbound_tx: Sender<Event>,

message_tx: Option<Sender<Message>>,
Collaborator

I would extend the comment to document that we are dropping the channel upon receiving the Fin packet, after which no more messages are sent.

&& self
.inbound_tx
.send(Event::Message {
match (self.message_tx.as_ref(), message.payload) {
Collaborator

message_tx is dropped upon receiving the Fin packet.

In practice, we could still receive a message payload here even though we received the Fin and dropped the tx channel.
IIRC, the spec states that no further messages are received. Maybe we should emit a warning (which will become a debug log once we stabilize)?

Contributor Author

Yes, this makes sense! str0m should guarantee the ordering, so after FIN no other message should arrive, apart from flags.
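
A simplified sketch of the behaviour discussed in this thread: the sender is dropped on FIN, and a payload arriving afterwards is counted so it could be logged. `Handle`, `on_message`, `late_payloads` and `demo` are hypothetical; the real handle uses async channels and a `Message` type, whereas this assumes `std::sync::mpsc` and raw byte vectors.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for the inbound side of the substream handle.
struct Handle {
    /// Set to `None` once FIN arrives; nothing is forwarded afterwards.
    message_tx: Option<Sender<Vec<u8>>>,
    /// Payloads seen after FIN. A real handler would emit a warning instead.
    late_payloads: usize,
}

impl Handle {
    fn on_message(&mut self, payload: Option<Vec<u8>>, fin: bool) {
        match (self.message_tx.as_ref(), payload) {
            (Some(tx), Some(payload)) if !payload.is_empty() => {
                // The receiver may already be gone; the error is ignored here.
                let _ = tx.send(payload);
            }
            (None, Some(payload)) if !payload.is_empty() => {
                // Ordering violation: data after FIN.
                self.late_payloads += 1;
            }
            _ => {}
        }
        if fin {
            // Dropping the sender lets the reader observe end-of-stream.
            self.message_tx = None;
        }
    }
}

/// Returns the payloads the reader received and the late-payload count.
fn demo() -> (Vec<Vec<u8>>, usize) {
    let (tx, rx): (Sender<Vec<u8>>, Receiver<Vec<u8>>) = channel();
    let mut handle = Handle { message_tx: Some(tx), late_payloads: 0 };
    handle.on_message(Some(b"hello".to_vec()), false);
    handle.on_message(None, true); // FIN flag only
    handle.on_message(Some(b"late".to_vec()), false); // arrives after FIN
    // The iterator ends because the only sender was dropped on FIN.
    (rx.iter().collect(), handle.late_payloads)
}
```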

Comment thread src/transport/webrtc/substream.rs Outdated
// Wake up any task waiting on shutdown
self.substream_shutdown_waker.wake();
self.shutdown_waker.wake();
if matches!(self.writer_state.get(), WriterState::Fin) {
Collaborator

nit: Let's cache the writer_state in a variable. By the time we read it again under the mutex it might have already changed, which would make the debug log misleading and investigations a bit more difficult.

Comment thread src/transport/webrtc/substream.rs Outdated
target: LOG_TARGET,
?state,
state = ?self.writer_state.get(),
"received FIN_ACK in unexpected state, ignoring"
Collaborator

We expect the state to be exactly Fin to transition to FinAck. How could we make this a bit more robust here just in case we introduced a bug somewhere or the other peer is misbehaving?

  • maybe we could clean up the states regardless or send a reset or return ConnectionClosed

Contributor Author

We should reset the connection if we receive FIN_ACK in an unexpected state; this makes perfect sense! Within the spec, FIN_ACK should be received in only one scenario, so I think it is right to expect it only when we are in WriterState::Fin.
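
One possible shape for that handling, as a sketch only: the state is read once so the value reported is the one acted on, and any state other than `Fin` yields a reset outcome. `FinAckOutcome` and `on_fin_ack` are hypothetical names, and treating a duplicate FIN_ACK as a reset is an assumption.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum WriterState {
    Open,
    Fin,
    FinAck,
    StopSending,
}

/// Hypothetical result of processing an incoming FIN_ACK flag.
#[derive(Debug, PartialEq)]
enum FinAckOutcome {
    /// Expected path: our FIN was acknowledged.
    Acknowledged,
    /// Peer misbehaviour or a local bug: the caller should reset.
    Reset { observed: WriterState },
}

fn on_fin_ack(state: &mut WriterState) -> FinAckOutcome {
    // Read once, so the reported value is exactly the one acted upon.
    let observed = *state;
    match observed {
        WriterState::Fin => {
            *state = WriterState::FinAck;
            FinAckOutcome::Acknowledged
        }
        // Assumption: every other state, including a repeated FIN_ACK,
        // is treated as a protocol violation.
        _ => FinAckOutcome::Reset { observed },
    }
}
```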

Comment thread src/transport/webrtc/substream.rs Outdated
// Stream thus FinAck will only be sent once something else awake it.
*self.reading_state.lock() = ReadingState::Fin;
self.reader_state.set(ReaderState::Fin);
let _ = self.message_tx.take();
Collaborator

nit:

Suggested change
let _ = self.message_tx.take();
if self.message_tx.take().is_none() {
warn!("Unexpected to have channel already dropped / similar")
}

Comment thread src/transport/webrtc/substream.rs Outdated
}
// This function carries forward the writer half close process.
//
// It is expected to:
Collaborator

nit or similar: The following behaviors are expected on:

  • WriterState::Open state:

    • etc
  • WriterState::Fin state:

    • etc

Comment thread src/transport/webrtc/substream.rs Outdated
let mut timeout = Box::pin(tokio::time::sleep(FIN_ACK_TIMEOUT));
// Poll the timeout once to register it with tokio's timer
// This ensures we'll be woken when it expires
let _ = timeout.as_mut().poll(cx);
Collaborator

if timeout.as_mut().poll(cx).is_ready() {
    error!("Misconfigured timer is not supposed to be ready");
}

Comment thread src/transport/webrtc/substream.rs Outdated
// incoming messages, there are 2 side effects which connects the two streams:
// 1. If FIN arrived then FIN_ACK is expected to be sent back.
// 2. If FIN_ACK arrived the writer_state is updated.
{
Collaborator

I think we can safely drop the { } block, we are hiding the locks under set / register_and_get

Contributor Author

Yup, that's a vestige from when a lock was acquired and dropped at the end of the sub-scope! 👍

.send(Event::Message {
match (self.message_tx.as_ref(), message.payload) {
(Some(message_tx), Some(payload)) if !payload.is_empty() => {
// TODO: awaiting here makes the entire connection
Collaborator

Let's create an issue for this one, if there isn't one already, and reference it in the comment here.

}

struct Inner<T> {
state: Mutex<T>,
Collaborator

I would take this one step further and replace the Mutex with an AtomicU8.

Incidentally, this is what I proposed in the past for the original shutdown implementation in #513 (comment).

Since we follow this behavior:

  • set: mutex.lock (i.e. .store(Release))
  • get: mutex.lock (i.e. .load(Acquire))

it should be straightforward (and cause no side effects) to replace it with a lock-free atomic:

enum WriterState {
    /// The writing stream is open.
    Open = 0,
    /// A Fin flag was sent.
    Fin = 1,
    /// FinAck was received.
    FinAck = 2,
    /// StopSending was received.
    StopSending = 3,
}

Then we could also close: #523
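
A hedged sketch of what the suggested replacement could look like, using the discriminants from the enum above. `AtomicWriterState`, `from_u8` and `transition` are hypothetical names; the `compare_exchange`-based `transition` goes beyond the plain set/get pair in the suggestion and is included only to show how a check-and-update can stay lock-free.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
enum WriterState {
    Open = 0,
    Fin = 1,
    FinAck = 2,
    StopSending = 3,
}

impl WriterState {
    fn from_u8(raw: u8) -> Self {
        match raw {
            0 => Self::Open,
            1 => Self::Fin,
            2 => Self::FinAck,
            3 => Self::StopSending,
            // Only values written through `AtomicWriterState` are ever loaded.
            other => unreachable!("invalid WriterState discriminant: {}", other),
        }
    }
}

/// Hypothetical lock-free holder for the writer state.
struct AtomicWriterState(AtomicU8);

impl AtomicWriterState {
    fn new(state: WriterState) -> Self {
        Self(AtomicU8::new(state as u8))
    }

    fn get(&self) -> WriterState {
        WriterState::from_u8(self.0.load(Ordering::Acquire))
    }

    fn set(&self, state: WriterState) {
        self.0.store(state as u8, Ordering::Release);
    }

    /// Applies `from -> to` only if the state is still `from`. On failure,
    /// returns the state that was actually observed.
    fn transition(&self, from: WriterState, to: WriterState) -> Result<(), WriterState> {
        self.0
            .compare_exchange(from as u8, to as u8, Ordering::AcqRel, Ordering::Acquire)
            .map(|_| ())
            .map_err(WriterState::from_u8)
    }
}
```

Waking the interested task after a `set` would still need to be handled separately from the atomic store.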

Contributor Author

That's a great idea!! And given the current SharedState abstraction, it seems pretty straightforward to implement!

Collaborator

lexnv commented May 28, 2026

Nice job here @gab8i! This simplifies the state machines quite a lot!

While at it, AtomicU8 should be a drop-in replacement for Arc<Mutex<_>> without side effects. Since everything is placed behind a nice set/get API, I would tackle that replacement here 🙏 Other than that, I left some tiny nits about comments and debuggability (feel free to ignore them where they don't make sense).

@gab8i gab8i marked this pull request as ready for review May 28, 2026 13:43
@gab8i gab8i changed the title from "[WIP] refactor(webrtc): rework Substream <-> SubstreamHandle communication" to "refactor(webrtc): rework Substream <-> SubstreamHandle communication" May 28, 2026
@gab8i gab8i merged commit 90aa132 into gab_webrtc_multiple_fixes_v2 May 28, 2026
3 checks passed
@gab8i gab8i deleted the gab_webrtc_substream_refactor branch May 28, 2026 13:47