fix(command): harden length-delimited + SCM IPC against panic-OOB#1260
Merged
Conversation
`Channel::try_read_delimited_message` decoded a peer-supplied `usize` length prefix and then sliced `&buffer[delimiter_size()..message_len]`. A peer that wrote a delimiter declaring `message_len < delimiter_size()` (e.g. 5 on a 64-bit host) bypassed the `> max_buffer_size` guard and panicked the receiver with `slice index starts at 8 but ends at 5`. This is reachable across every length-delimited channel pair: the master ↔ worker command IPC, the master ↔ CLI admin socket, and the test harness. A single 8-byte write to the admin unix socket therefore crashes the master command loop and tears down supervision — proxy-wide DoS until the operator restarts. Reject the bad frame with a new `ChannelError::MessageLengthUnderDelimiter` variant, symmetric with the existing `MessageTooLarge` upper-bound guard, and consume the bogus delimiter bytes so the channel re-syncs on the peer's next valid frame instead of looping forever on the same stale header. Regression: `rejects_declared_length_below_delimiter` in `command/src/channel.rs` writes a crafted 8-byte delimiter directly to the underlying mio unix stream and asserts the new error variant instead of the pre-fix panic. CWE-129 / CWE-248 — input validation gap at a trust boundary, reachable as an uncaught panic. Signed-off-by: Florentin Dubois <florentin.dubois@clever.cloud>
`ScmSocket::receive_listeners` zipped the peer-supplied
`ListenersCount.{http,tls,tcp}` lists against a fixed-size
`[RawFd; MAX_FDS_OUT]` (200 slots) without bounds-checking the
declared counts. A manifest declaring more entries than
`MAX_FDS_OUT`, or more entries than the FDs actually arrived over
SCM, panicked the worker on `received_fds[index..index + len]` (or on
the trailing `received_fds[index..file_descriptor_length]` slice when
`index > file_descriptor_length`).
The SCM channel is normally privileged (master → worker), so a panic
here requires either a buggy master, an alternate sidecar that opens
the SCM peer, or memory corruption. Even so, the bug is identical in
class to the channel.rs slice-OOB (CWE-129 / CWE-248) and violates
the project's "no panic on network-facing input" rule, which
explicitly extends to the command-channel surface.
Validate `http + tls + tcp <= min(MAX_FDS_OUT, fds_received)` up front
(with a `checked_add` guard against `usize` overflow), and surface the
mismatch as a new `ScmSocketError::ListenersCountInconsistent`
carrying every input that contributed to the decision. The trailing
`tcp` slice is reworked from `received_fds[index..file_descriptor_length]`
to the symmetric `received_fds[index..index + tcp_len]`; identical
result under the new invariant, no silent excess-FD acceptance.
Regression: `rejects_listeners_count_with_more_entries_than_fds` in
`command/src/scm_socket.rs` ships a forged `ListenersCount { http: 3
entries }` with zero FDs over a real socket pair and asserts the new
error variant instead of the pre-fix panic.
Signed-off-by: Florentin Dubois <florentin.dubois@clever.cloud>
Adds an end-to-end regression module that drives the production code
paths for both IPC framing fixes — not just the unit-level decoder.
`channel_short_delimiter_does_not_panic_worker` spawns a live worker
via the `Worker::start_new_worker` harness, writes a raw 8-byte
delimiter declaring `message_len = 5` directly to the underlying mio
unix stream, asserts the worker thread is still alive after the
malformed frame, then sends a real `SoftStop` and asserts
`wait_for_server_stop()` returns cleanly. The second assertion is the
re-sync proof: without dropping the bogus delimiter from the front
buffer (the channel.rs companion change), the SoftStop would never
decode and the harness would hang.
`scm_inconsistent_listeners_count_does_not_panic_server_constructor`
ships a forged `ListenersCount { http: 3 entries }` with zero file
descriptors over a real SCM socket pair (calling `nix::sendmsg`
directly to bypass `ScmSocket::send_listeners`, which keeps addresses
and FDs in sync by construction), then exercises the full
`Server::try_new_from_config` constructor on the receiving side and
pattern-matches on `ServerError::ScmSocket(ListenersCountInconsistent)`
instead of the pre-fix indexing panic. The match arm references the
new variant, so a future revert of `scm_socket.rs` fails to compile —
compile-time regression gate.
Sensitivity verified by reverting each fix locally:
- without the channel.rs bounds check, the e2e test panics at
`channel.rs:498:49` with the exact original slice OOB message;
- without the scm_socket.rs guard, the file fails to compile because
the new enum variant is gone.
Adds two `[dev-dependencies]` to `e2e/Cargo.toml`:
- `prost` to encode the forged length-delimited `ListenersCount`;
- `nix` with `socket` + `uio` features for the raw `sendmsg` shim.
Signed-off-by: Florentin Dubois <florentin.dubois@clever.cloud>
24bc851 to
f0cd5ab
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two panic-on-malformed-IPC bugs in
sozu-command-libare reachable across the worker/master trust boundary, and one of them — the length-delimited command channel — is externally reachable through the admin unix socket. Both panicked withslice index starts at N but ends at Mon attacker-controlled framing. This branch hardens the receiver and adds unit-level + end-to-end regression tests.command/src/channel.rs:476—&buffer[delimiter_size()..message_len]message_len < delimiter_size()and drop the bogus delimiter so the channel re-syncscommand/src/scm_socket.rs:148-167—received_fds[index..index+len]http+tls+tcp ≤ min(MAX_FDS_OUT, fds_received)before slicingBoth bugs violate the rule stated in
CLAUDE.md: "No panic on network-facing input. In parser, socket, mux, TLS, command-channel, and config paths, convert invalid traffic into … a contextual log."Finding 1 — channel.rs slice-OOB (HIGH)
Channel::try_read_delimited_messagedecoded a peer-suppliedusizelength prefix and then sliced&buffer[delimiter_size()..message_len]. A peer that wrote a delimiter declaringmessage_len < delimiter_size()(e.g. 5 on a 64-bit host) bypassed the existing> max_buffer_sizeguard and panicked the receiver. Reachable from any local user with write perms on the admin unix socket → full-proxy DoS until restart.Patch:
ChannelError::MessageLengthUnderDelimitervariant, symmetric with the existingMessageTooLargeupper-bound rejection;self.front_buf.consume(delimiter_size())) before returning, so the channel re-syncs on the peer's next valid frame instead of looping forever on the stale header.CWE-129 (Improper Validation of Array Index) → reachable as CWE-248 (Uncaught Exception).
Finding 2 — scm_socket.rs slice-OOB (MEDIUM)
ScmSocket::receive_listenerszipped the peer-suppliedListenersCount.{http,tls,tcp}lists against a fixed[RawFd; MAX_FDS_OUT](200 slots) without bounds-checking the declared counts. A manifest declaring more entries thanMAX_FDS_OUTor more entries than the FDs that actually arrived panicked the receiver onreceived_fds[index..index + len], or on the trailingreceived_fds[index..file_descriptor_length]slice whenindex > file_descriptor_length.Patch:
http + tls + tcp ≤ min(MAX_FDS_OUT, fds_received)up front, withchecked_addto guard againstusizeoverflow;ScmSocketError::ListenersCountInconsistentcarrying every input that contributed to the decision;received_fds[index..file_descriptor_length]to the symmetricreceived_fds[index..index + tcp_len](identical result under the new invariant; no silent excess-FD acceptance).Protocol / security impact
ListenersCount(where counts match the FD set, as produced byScmSocket::send_listeners) is accepted identically.doc/configure.mdunchanged.Tests added
Unit regressions (live alongside the patched code):
command/src/channel.rs::tests::rejects_declared_length_below_delimiter— writes a 5-byte-declared delimiter via rawWrite::write_allon the underlying mio unix stream and asserts the new error variant instead of the pre-fix panic.command/src/scm_socket.rs::tests::rejects_listeners_count_with_more_entries_than_fds— ships a forgedListenersCount { http: 3 }with zero FDs and asserts the new error variant.End-to-end regression (new file
e2e/src/tests/command_channel_security_tests.rs, 218 lines):channel_short_delimiter_does_not_panic_workerspawns a real worker viaWorker::start_new_worker, writes the malformed delimiter, asserts the worker thread is still alive, then sends a realSoftStopand assertswait_for_server_stop()returns cleanly — the second assertion is the re-sync proof.scm_inconsistent_listeners_count_does_not_panic_server_constructordrives the realServer::try_new_from_configconstructor with a forged SCM payload and pattern-matches onServerError::ScmSocket(ListenersCountInconsistent). The match arm references the new variant, so a future revert ofscm_socket.rsfails to compile — compile-time regression gate.Sensitivity verified by reverting each fix locally:
channel.rs:498:49with the exact originalslice index starts at 8 but ends at 5message;Commands run
Doc updates
None required — no new config keys, metrics, CLI flags, or behaviour change for well-formed traffic.
CLAUDE.md's "no panic on network-facing input" rule already covers the command-channel path; this PR brings the implementation in line with that rule.Test plan
crypto-ring/crypto-aws-lc-rs/crypto-openssl/fips) all greencargo +nightly fmt --checkgreencargo test -p sozu-e2e -- command_channel_securityshows 2 passing testsfix(command):commit locally to confirm the e2e tests turn red (panic for channel, compile error for SCM)