Mirrored from upstream 1jehuang/jcode — Pull Request #256 by @Zephyr709
Original state: open
Created: 2026-05-21T14:13:20Z · Updated: 2026-05-21T14:13:20Z
Diff: https://github.com/1jehuang/jcode/pull/256.diff
This issue is an auto-mirrored copy. Comments and edits here are local to quangdang46/jcode — do not expect them to propagate upstream.
Summary
Fixes mass session crashes caused by multi-session protocol corruption on the shared-server when multiple clients (swarm + multi-window) are attached. The symptom is "RemoteConnection::next_event: protocol error=expected value at line 1 column 1", followed by "Remote protocol error is not retryable; stopping reconnect loop", which takes down every attached TUI within ~100ms.
Root cause
Several server code paths used the singular fallback member.event_tx directly instead of fanning out to all member.event_txs. In addition, register_session_event_sender, unregister_session_event_sender, and fanout_session_event silently overwrote member.event_tx to point at whichever connection's writer happened to be touched last. With multiple attached clients, a send intended for one client's writer could land on another client's writer mid-line, splicing event tails into unrelated frames and crashing every session on the shared-server.
Changes
Commit 1 (266e8759) — server fix (root cause)
- register_session_event_sender: only adopt the new sender as the singular fallback when the existing fallback is closed.
- unregister_session_event_sender: do not silently re-point member.event_tx to a surviving connection.
- fanout_session_event: snapshot all live attachments without mutating the singular fallback.
- comm_plan / comm_control / debug_swarm_write / swarm / client_session: route via super::fanout_session_event instead of direct member.event_tx.send, dropping read locks before fanout acquires the write lock.
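The registration semantics described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: EventSender is a hypothetical stand-in for the real channel sender, Member is reduced to the two fields named in this PR, and the function names are shortened. It only shows the intended invariant, namely that the singular fallback is replaced only when it is already closed and that fanout reads a snapshot of live senders without touching the fallback.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for one connection's event sender (assumption).
#[derive(Clone)]
pub struct EventSender {
    closed: Arc<AtomicBool>,
    inbox: Arc<Mutex<Vec<String>>>,
}

impl EventSender {
    pub fn new() -> Self {
        EventSender {
            closed: Arc::new(AtomicBool::new(false)),
            inbox: Arc::new(Mutex::new(Vec::new())),
        }
    }
    pub fn close(&self) {
        self.closed.store(true, Ordering::SeqCst);
    }
    pub fn is_closed(&self) -> bool {
        self.closed.load(Ordering::SeqCst)
    }
    /// Returns false when the receiving side is gone.
    pub fn send(&self, event: &str) -> bool {
        if self.is_closed() {
            return false;
        }
        self.inbox.lock().unwrap().push(event.to_string());
        true
    }
    pub fn received(&self) -> Vec<String> {
        self.inbox.lock().unwrap().clone()
    }
}

/// Reduced to the two fields discussed in this PR (assumption).
pub struct Member {
    pub event_tx: EventSender,                // singular fallback
    pub event_txs: HashMap<u64, EventSender>, // one sender per attached connection
}

/// Adopt the new sender as fallback only if the current fallback is closed.
pub fn register(member: &mut Member, conn_id: u64, tx: EventSender) {
    if member.event_tx.is_closed() {
        member.event_tx = tx.clone();
    }
    member.event_txs.insert(conn_id, tx);
}

/// Drop the connection's sender; the fallback is left untouched.
pub fn unregister(member: &mut Member, conn_id: u64) {
    member.event_txs.remove(&conn_id);
}

/// Snapshot the live senders, then send to each; returns the delivery count.
pub fn fanout(member: &Member, event: &str) -> usize {
    let live: Vec<EventSender> = member
        .event_txs
        .values()
        .filter(|tx| !tx.is_closed())
        .cloned()
        .collect();
    live.iter().filter(|tx| tx.send(event)).count()
}
```

In this sketch a second register call leaves member.event_tx pointing at the first connection, so a send through the fallback cannot silently land on a newer connection's writer.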
Commit 2 (d3e3b753) — protocol resilience (defense in depth)
- encode_event (in jcode-protocol) refuses to emit a JSON frame containing raw newlines. If serialization ever produces one (custom Display impls, hand-built JSON), log the kind and strip the byte instead of shipping a frame that would split into two on the receiver and crash every attached client.
- RemoteConnection::next_event no longer treats a single malformed frame as fatal. It logs a truncated preview and resyncs at the next newline, giving up only after 16 consecutive corrupt frames to avoid busy looping.
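The two resilience behaviours above can be sketched as below, under stated assumptions. encode_frame here takes an already-serialized string rather than an event, next_event takes a caller-supplied parse closure instead of a real JSON parser, and NextEvent, MAX_CONSECUTIVE_MALFORMED and the 80-character preview length are names and values invented for illustration; only the limit of 16 comes from the PR text. The actual functions in jcode-protocol and RemoteConnection may be shaped differently.

```rust
use std::io::BufRead;

/// Limit taken from the PR description; the constant name is an assumption.
pub const MAX_CONSECUTIVE_MALFORMED: usize = 16;

/// Strip raw newlines from a serialized frame, then add the single
/// terminating newline that delimits frames on the wire.
pub fn encode_frame(json: &str) -> String {
    let mut frame: String = json.chars().filter(|c| *c != '\n').collect();
    frame.push('\n');
    frame
}

/// Hypothetical result type for the reader sketch.
#[derive(Debug, PartialEq)]
pub enum NextEvent<T> {
    Event(T),
    Eof,
    TooManyMalformed,
}

/// Read newline-delimited frames, skipping malformed ones and resyncing at
/// the next newline. Gives up after MAX_CONSECUTIVE_MALFORMED bad frames in
/// a row. `parse` returns None for a frame it cannot decode.
pub fn next_event<R: BufRead, T>(
    reader: &mut R,
    parse: impl Fn(&str) -> Option<T>,
) -> std::io::Result<NextEvent<T>> {
    let mut consecutive_malformed = 0usize;
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            return Ok(NextEvent::Eof);
        }
        let frame = line.trim_end_matches('\n');
        if let Some(event) = parse(frame) {
            return Ok(NextEvent::Event(event));
        }
        consecutive_malformed += 1;
        // Log only a truncated preview so a huge corrupt frame cannot flood the log.
        let preview: String = frame.chars().take(80).collect();
        eprintln!(
            "malformed frame (consecutive_malformed={}): {}",
            consecutive_malformed, preview
        );
        if consecutive_malformed >= MAX_CONSECUTIVE_MALFORMED {
            return Ok(NextEvent::TooManyMalformed);
        }
    }
}
```

Because every successful parse returns immediately, the counter in this sketch only ever measures an unbroken run of bad frames, which matches the "consecutive" wording above.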
Tests
- 5 new regression tests in src/server/state.rs::multi_connection_protocol_tests covering register/unregister/fanout semantics that previously caused cross-connection writer corruption.
- Full suite: 188 passed, 0 failed (cargo test --lib -p jcode server::).
Validation in production
Deployed binary at ~/.jcode/builds/versions/d3e3b753-protocol-resilience/jcode (running as shared-server pid 62439 since 11:33:39 today).
- Pre-fix log (old binary): mass session crashes at 10:41, 11:14:51, 11:14:54 with "Remote protocol error is not retryable; stopping reconnect loop".
- Post-fix log (new binary, ~9 min uptime so far): 0 "not retryable" errors, 0 session teardowns. One transient bad frame was logged at 11:40:41 with consecutive_malformed=1, which is exactly what the resilience commit is supposed to do (log + resync, no crash). The session that hit the bad frame (session_bird) is still attached and alive.
Risk / rollback
- The server fix changes registration semantics: the singular event_tx is no longer overwritten on each new connection. Any code that depended on the overwrite behavior would now see stale senders; the audit and 5 regression tests cover all known call sites.
- Rollback: revert the two commits. The prior binary at ~/.jcode/builds/versions/266e8759-fix-multi-session-protocol/jcode is still on disk.