docs(audits): fuzz coverage gap followups (#2316) #4347

Merged
oferchen merged 1 commit into master from docs/fuzz-protocol-parsing-gaps
May 17, 2026

Conversation

@oferchen
Owner

Summary

  • Decompose the remaining P0/P1 fuzz coverage gaps from docs/audits/fuzz-coverage-matrix.md into individually filable FCV-NN follow-up tasks.
  • New docs/audits/fuzz-coverage-gap-followups.md lists per-gap: parse function file:line, recommended Arbitrary input shape, panic / no-panic invariants, and priority justification.
  • Covers the two open P0 gaps (negotiation_prologue, capability_vstring) plus the P0 fill-in legacy_sniffer and the six highest P1 surfaces (compat_flags, filter_list_wire, ndx_codec, incremental_flist, idlist, compress_decoders).
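
As an illustration of the "raw vs structured input shape" and "panic / no-panic invariants" columns, here is a minimal std-only sketch. Everything in it is hypothetical: `parse_prefixed` is a toy one-byte length-prefixed string parser, not the project's `capability_vstring` code, and `check_raw` / `check_structured` stand in for what a real harness would do with raw fuzzer bytes or an `Arbitrary`-derived value.

```rust
/// Error type for the toy parser (hypothetical, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    Truncated,
    InvalidUtf8,
}

/// Parses one length byte followed by that many UTF-8 bytes.
/// Returns the decoded text and the number of bytes consumed.
pub fn parse_prefixed(input: &[u8]) -> Result<(&str, usize), ParseError> {
    let (&len, rest) = input.split_first().ok_or(ParseError::Truncated)?;
    let body = rest.get(..usize::from(len)).ok_or(ParseError::Truncated)?;
    let text = std::str::from_utf8(body).map_err(|_| ParseError::InvalidUtf8)?;
    Ok((text, 1 + body.len()))
}

/// Raw shape: arbitrary bytes go straight into the parser.
/// Invariants: no panic, and a success never claims more bytes than exist.
pub fn check_raw(data: &[u8]) {
    if let Ok((text, consumed)) = parse_prefixed(data) {
        assert!(consumed <= data.len());
        assert_eq!(text.len() + 1, consumed);
    }
}

/// Structured shape: start from a valid value, encode it, and require
/// the parser to return exactly that value.
pub fn check_structured(text: &str) {
    // Clamp to what one length byte can express, on a char boundary.
    let mut end = text.len().min(255);
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    let text = &text[..end];

    let mut wire = vec![end as u8];
    wire.extend_from_slice(text.as_bytes());
    assert_eq!(parse_prefixed(&wire), Ok((text, wire.len())));
}
```

The raw check exercises error paths cheaply, while the structured check reaches the success path on every iteration instead of relying on the fuzzer to guess a valid length prefix.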

Test plan

  • Doc-only change: no code paths touched.
  • cargo fmt --all --check (no Rust changes; runs clean).
  • Cross-link from fuzz-coverage-matrix.md resolves to the new file.

Decompose the remaining P0/P1 fuzz coverage gaps from the FCV audit
into individually filable follow-up tasks. Each entry cites the parse
function (file:line), recommends a raw vs structured Arbitrary input
shape, and lists the panic / no-panic invariants the harness should
assert.

Covers the two open P0 gaps (negotiation_prologue, capability_vstring)
plus a P0 fill-in (legacy_sniffer reader paths) and the six highest
P1 surfaces (compat_flags, filter_list_wire, ndx_codec,
incremental_flist, idlist, compress_decoders).
@github-actions github-actions bot added the documentation label May 17, 2026
@oferchen oferchen merged commit d681741 into master May 17, 2026
10 checks passed
oferchen added a commit that referenced this pull request May 18, 2026
…ete/context.rs (#2353) (#4365)

* docs: design SpillPolicy public API + env-var surface (#2335) (#4340)

Introduces design for a 5-knob SpillPolicy struct replacing the single
spill_threshold_bytes field on ConcurrentDeltaConfig. Documents defaults,
env-var precedence under OC_RSYNC_SPILL_*, ops-friendly CLI flags
(--spill-dir, --spill-threshold-bytes), validation rules, a one-release
deprecated-forwarding migration story, and a 5-step implementation plan
keyed to subtasks STN-2..STN-14.

Design only; no runtime changes.

* docs(audits): mutex poison recovery classification (#2350, #2351, #2358) (#4341)

Inventory all `.lock().expect()` / `.lock().unwrap()` (and `RwLock` peers)
sites across the workspace. Classify each as RECOVERABLE (safe to
`PoisonError::into_inner()` and proceed), FATAL (mid-operation panic
leaves wire state inconsistent, must propagate), or TEST-ONLY (test
fixtures and `#[cfg(test)]` env mutexes).

Adds tables for the delete module and workspace hotspots, per-crate
counts, and follow-up MPE-3..MPE-17 task list anchored on a shared
`lock_or_recover` / `lock_or_panic` helper. Identifies
`crates/engine/src/local_copy/context_impl/options.rs` as the highest-
value single remediation target (7 FATAL batch-writer sites).

* feat(engine): add lock_or_recover helpers for poison recovery (#4342)

Introduce util::poison with lock_or_recover, read_or_recover, and
write_or_recover helpers that promote PoisonError into the inner guard
for state that remains valid after a panicking thread unwinds.

These helpers will be used by upcoming refactors that replace
.lock().expect(...) call sites in modules whose protected state is
monotonic or otherwise self-consistent (delete emitter recorder,
buffer pool counters, work-queue shards) so a single panicking worker
no longer cascades into a full pool abort.

Includes unit tests that poison a Mutex and RwLock via a panicking
spawned thread and verify the helpers return a usable guard rather
than re-panicking.
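
A minimal sketch of what a `lock_or_recover` style helper can look like, using only `std`. The signature is an assumption; the actual helpers in `util::poison` may differ. The `poison` function is a hypothetical test aid that mirrors the "panicking spawned thread" approach the commit message describes.

```rust
use std::sync::{Arc, Mutex, MutexGuard, PoisonError};
use std::thread;

/// Locks `mutex` and returns the guard even when an earlier holder
/// panicked. Only appropriate when the protected state stays valid
/// after an unwind (assumed signature, not the project's API).
pub fn lock_or_recover<T>(mutex: &Mutex<T>) -> MutexGuard<'_, T> {
    mutex.lock().unwrap_or_else(PoisonError::into_inner)
}

/// Test aid: poisons `mutex` by panicking on a spawned thread while
/// the guard is held. The panic message is printed to stderr.
pub fn poison<T: Send + 'static>(mutex: &Arc<Mutex<T>>) {
    let clone = Arc::clone(mutex);
    let _ = thread::spawn(move || {
        let _guard = clone.lock().unwrap();
        panic!("poisoning the mutex on purpose");
    })
    .join();
}
```

After `poison(&m)`, `m.is_poisoned()` is true and a plain `m.lock().unwrap()` would panic, while `lock_or_recover(&m)` still hands back a usable guard.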

* docs(audits): io_uring data path coverage (#2361) (#4343)

Catalogue every SQE construction site in fast_io/io_uring by opcode
category. Trace each site upward to confirm whether engine, transfer,
or rsync_io route real production data through it. Refute the
"metadata-only" framing - the receiver write path, disk-commit thread,
and sender source-read path all submit file-data SQEs today. Identify
the three production sites where switching to io_uring data SQEs would
yield the largest throughput gain (sockets, sub-1MiB source reads,
sparse/append fallback) for follow-on IUD-2 and IUD-3 work.

* test(fuzz): extend varint decode fuzz target (#2318) (#4344)

Adds structured boundary coverage to the protocol varint fuzz target
so it does not rely on the corpus to reach extreme inputs:

- i32 edges (MIN/MAX, +-1, 0, -0x4000_0000) and i64 edges encoded with
  write_int/write_varint/write_varlong/write_longint/encode_varint_to_vec,
  decoded with the matching readers and asserted to roundtrip.
- Truncated buffers: every strict prefix of each encoded boundary value
  is fed through decode_varint/read_varint/read_varlong/read_longint to
  exercise mid-byte UnexpectedEof paths without panicking.
- Both protocol 31 (legacy 4-byte LE via read_int) and protocol 32
  (varint) paths driven through read_varint30_int with proto 28-32.
- Legacy longint short and long (0xFFFFFFFF-prefixed) forms.

The unstructured arbitrary-bytes coverage from the original target is
preserved and still runs every iteration.
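
The truncated-buffer idea can be sketched as follows. This uses a generic LEB128-style `u64` encoding purely for illustration; it is not rsync's varint wire format, and `encode_leb128`, `decode_leb128`, and `check_boundaries` are hypothetical names rather than functions from the protocol crate.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    UnexpectedEof,
    Overflow,
}

/// Encodes `value` as little-endian base-128 with a continuation bit.
pub fn encode_leb128(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decodes one value, returning it together with the bytes consumed.
pub fn decode_leb128(input: &[u8]) -> Result<(u64, usize), DecodeError> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate() {
        let shift = 7 * i as u32;
        let chunk = u64::from(byte & 0x7f);
        // The tenth byte may only contribute the top bit of a u64.
        if shift >= 64 || (shift == 63 && chunk > 1) {
            return Err(DecodeError::Overflow);
        }
        value |= chunk << shift;
        if byte & 0x80 == 0 {
            return Ok((value, i + 1));
        }
    }
    Err(DecodeError::UnexpectedEof)
}

/// Roundtrips boundary values, then feeds every strict prefix of each
/// encoding to the decoder and requires a clean EOF error, not a panic.
pub fn check_boundaries() {
    for value in [0, 1, 0x7f, 0x80, u64::from(u32::MAX), u64::MAX] {
        let mut buf = Vec::new();
        encode_leb128(value, &mut buf);
        assert_eq!(decode_leb128(&buf), Ok((value, buf.len())));
        for cut in 0..buf.len() {
            assert_eq!(decode_leb128(&buf[..cut]), Err(DecodeError::UnexpectedEof));
        }
    }
}
```

Driving the boundary table from inside the target means the edge cases run on every iteration, regardless of what the corpus happens to contain.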

* docs: document Windows NTFS ACL behaviour and lossy cases (#2313) (#4346)

Expand the --acls reference, README platform table, and operator
migration guide to describe the Tier 1C partial Windows DACL path:
deny ACEs dropped, inherited ACEs skipped, SACL gated on
--audit-acls, non-rwx access bits collapsed, and unresolvable SIDs
dropped. Document --audit-acls, --fail-on-windows-acl-loss, and
--windows-acls as planned flags with SDDL wire encoding notes, and
cross-link the design doc from all three surfaces.

* docs(audits): fuzz coverage gap followups (#2316) (#4347)

Decompose the remaining P0/P1 fuzz coverage gaps from the FCV audit
into individually filable follow-up tasks. Each entry cites the parse
function (file:line), recommends a raw vs structured Arbitrary input
shape, and lists the panic / no-panic invariants the harness should
assert.

Covers the two open P0 gaps (negotiation_prologue, capability_vstring)
plus a P0 fill-in (legacy_sniffer reader paths) and the six highest
P1 surfaces (compat_flags, filter_list_wire, ndx_codec,
incremental_flist, idlist, compress_decoders).

* feat(rsync_io): ssh-socketpair-stderr connection primitive (#2372) (#4348)

Implements task SSE-3 from docs/design/socketpair-stderr-channel.md.

Adds the `socketpair_stderr` module gated behind the new
`ssh-socketpair-stderr` Cargo feature (default off). The primitive
returns a connected pair of `File`-typed byte streams ready to back the
async drain that SSE-4 will layer on top.

Unix path uses `UnixStream::pair` (`socketpair(AF_UNIX, SOCK_STREAM, 0)`
under the hood with `SOCK_CLOEXEC`) and converts both halves through
`OwnedFd` so callers store the parent and child ends behind the same
handle type as the existing pipe fallback. A `set_nonblocking` helper
toggles `O_NONBLOCK` via the safe stdlib API, ready for the SSE-4
tokio reactor registration.

Windows currently returns `io::ErrorKind::Unsupported`; the loopback
shim ships under SSE-5 alongside the safe handle bridge in `fast_io`,
so the crate keeps `#![deny(unsafe_code)]` intact.

Tests gated on Unix + feature flag: 1 KiB round-trip, peer-close EOF,
and `set_nonblocking` mode flip surfaced as `WouldBlock`.
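
A rough Unix-only sketch of the shape described above, built from `std` alone. `stderr_pair` is a hypothetical name and signature, not the module's real API; the sketch only shows the `UnixStream::pair` to `OwnedFd` to `File` conversion and the non-blocking toggle.

```rust
use std::fs::File;
use std::io;
use std::os::fd::OwnedFd;
use std::os::unix::net::UnixStream;

/// Returns (parent, child) ends of a connected AF_UNIX stream pair,
/// both typed as `File`. When `nonblocking_parent` is true, reads on
/// the parent end fail with `WouldBlock` instead of blocking.
pub fn stderr_pair(nonblocking_parent: bool) -> io::Result<(File, File)> {
    let (parent, child) = UnixStream::pair()?;
    // O_NONBLOCK is a property of the open file description, so it
    // survives the conversion to `OwnedFd` and then `File` below.
    parent.set_nonblocking(nonblocking_parent)?;
    Ok((
        File::from(OwnedFd::from(parent)),
        File::from(OwnedFd::from(child)),
    ))
}
```

With the parent end non-blocking, a read before any write reports `io::ErrorKind::WouldBlock`, a read after the child writes returns the bytes, and a read after the child end is dropped returns 0 (EOF).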

* docs(design): io_uring data path - receive + send (#2362, #2363) (#4349)

Two companion design docs specifying how transfer file data will be
routed through io_uring SQEs end-to-end on Linux:

- iouring-receive-data-path.md: route disk-commit chunk writes through
  registered buffers + IORING_OP_WRITE_FIXED, gated by the new
  iouring-data-writes feature (default off). Shares a pool with the
  SMR-3a basis reader.
- iouring-send-data-path.md: basis re-read via IORING_OP_READ_FIXED
  feeding directly into IORING_OP_SEND_ZC against the multiplex
  socket, with the 4-byte MSG_DATA header pre-pended in-slot. Gated
  by the new iouring-data-sends feature (default off).

Both docs spell out ordering, fsync placement, short-write handling,
fallback hierarchy, SQPOLL exclusion, and a 5-step phased rollout
keyed to follow-up tasks IUD-5..IUD-8. No code changes.

* docs(audits): Windows hardlink ACL inheritance (#2311) (#4350)

Audits every ACL apply site for files participating in NTFS hardlink
cohorts, compares oc-rsync behaviour to upstream rsync 3.4.2 on cygwin,
and documents the one divergence: the local-copy --copy-dest Link
branch re-applies the source DACL once per follower instead of relying
on NTFS inode-level sharing. The writes are idempotent (identical
bytes), so correctness holds; the cost is N redundant
SetNamedSecurityInfoW round trips and audit log noise. Includes a
3-link cohort test plan covering wire receive, local copy, --copy-dest
and cross-cohort isolation.

* ci: nightly fuzz coverage report per target (#2322) (#4351)

Add `Fuzz Coverage Report` workflow that runs `cargo fuzz coverage` for
every target across the `fuzz/`, `crates/protocol/fuzz/`, and
`crates/filters/fuzz/` workspaces nightly at 03:30 UTC and uploads the
resulting `.lcov` files as 30-day workflow artifacts. The job is
informational (`continue-on-error: true`) and emits a per-target line
coverage summary to the GitHub step summary, mirroring the SAFETY audit
informational pattern from ci.yml.

Documents the rollout in docs/audits/fuzz-coverage-matrix.md so the
matrix and promotion criteria stay in one place.

* bench(windows): per-hotspot drilldown mode (#2305) (#4352)

Adds three env-gated sub-scenarios to scripts/windows_throughput_bench.sh
that isolate the IOCP hotspots catalogued in the IOCP sync-blocking
audit, so future Windows write/read/network improvements can be
attributed to specific changes instead of moving the aggregate
large_1gib / small_10000 numbers.

Enabled by OC_RSYNC_BENCH_DRILLDOWN=1; the existing scenarios run
unchanged when the flag is absent.

- write_only_iocp: oc-rsync --whole-file --inplace vs cp control to
  pin the IocpWriter per-IO drain (audit rows #1, #4, #13).
- read_only_iocp: oc-rsync --dry-run vs upstream --dry-run over the
  1 GiB fixture to pin the IocpReader drain (audit rows #2, #3).
- network_only_loopback: rsync push between two short-lived loopback
  daemons on 127.0.0.1, cancelling out disk bandwidth so only the
  IocpSocket send/recv path varies (audit rows #8-#11).

Documents the invocation, hotspot mapping, and interpretation rules
in docs/benchmarks/windows-throughput.md. The drilldown scenarios
are not in the required-checks list and have no acceptable-band
thresholds; they exist to attribute movement, not to gate merges.

* docs(architecture): BGID lifecycle (#2299) (#4353)

Add docs/architecture/bgid-lifecycle.md covering the allocation flow,
Drop-driven recycling rules, u16 namespace exhaustion math, pre-sized
free-list rationale, planned high-water-mark stat (BGE-3), and the
planned graceful fallback when exhausted (BGE-6). Cross-references the
audit at docs/audits/bgid-lifecycle.md (PR #4331) and the session
topology overview.

* feat(metadata): Windows DACL/SACL SDDL round-trip (#2307, #2308) (#4354)

Adds opt-in SDDL serialisation helpers in crates/metadata/src/acl_windows.rs:

- read_dacl_sddl(path): GetNamedSecurityInfoW +
  ConvertSecurityDescriptorToStringSecurityDescriptorW for owner/group/DACL.
- read_sddl_with_sacl(path): same plus SACL (requires SE_SECURITY_NAME).
- write_dacl_sddl(path, sddl):
  ConvertStringSecurityDescriptorToSecurityDescriptorW +
  SetNamedSecurityInfoW, applying owner, group, DACL, and SACL components
  present in the parsed descriptor. DACL is written with
  PROTECTED_DACL_SECURITY_INFORMATION per design section 5.2 to avoid
  silently inheriting extra ACEs from the parent directory.

All Win32-allocated buffers are wrapped in RAII guards
(OwnedSecurityDescriptor, OwnedLocalWString) that release via LocalFree
on drop, even on error paths. Unsafe blocks each carry SAFETY notes naming
the invariant they uphold.

Implements the Windows fidelity payload described in
docs/design/windows-ntfs-acl-support.md section 4.2. Re-exports the new
functions from crates/metadata/src/lib.rs under the existing
cfg(all(feature = "acl", windows)) gate so consumers can drive the
round-trip without poking module internals.

Tests on a Windows tempdir:
- read_dacl_sddl_returns_non_empty_for_temp_file
- write_dacl_sddl_round_trips_known_descriptor
- write_dacl_sddl_preserves_owner_and_group
- write_dacl_sddl_rejects_invalid_input

* feat(fast_io): BGID high-water mark + 50%-occupancy warning (#2295) (#4355)

Adds an observable peak counter to the io_uring buffer-group-ID
allocator so operators can see the worst-case namespace pressure
the process has experienced.

- `PEAK_USED: AtomicU16` is updated via `fetch_max` after every
  successful `BgidAllocator::allocate`, including reuses from the
  free-list. Deallocation never lowers the value.
- `bgid_peak_used()` and `bgid_inflight()` expose the stat and the
  live snapshot for dashboards and capacity tests.
- The free-list is pre-sized to 4 096 entries so steady-state
  daemon churn does not trigger `Vec` reallocations under the
  free-list mutex.
- When the in-flight count crosses 50 % of the 16-bit namespace
  (32 768) a throttled `tracing::warn!` fires at most once per
  30 s so a hot allocate loop cannot flood the log.
- Non-Linux stub mirrors the new public accessors with zero
  returns so cross-platform callers compile without cfg-gating.
- Tests cover the 100-allocation peak, the deallocate monotonicity,
  the pre-sized capacity, and the inflight subtraction.
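
The monotonic peak counter can be sketched with `AtomicU16::fetch_max`. `PeakTracker` and its methods are hypothetical; the sketch leaves out the free-list, the throttled warning, and namespace exhaustion handling, and only shows why deallocation never lowers the peak.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

/// Tracks the live count and the highest live count ever observed
/// (illustrative type, not the allocator from the commit).
pub struct PeakTracker {
    inflight: AtomicU16,
    peak: AtomicU16,
}

impl PeakTracker {
    pub const fn new() -> Self {
        Self {
            inflight: AtomicU16::new(0),
            peak: AtomicU16::new(0),
        }
    }

    /// Records one allocation and raises the peak if needed.
    pub fn on_allocate(&self) -> u16 {
        let now = self
            .inflight
            .fetch_add(1, Ordering::Relaxed)
            .wrapping_add(1);
        // fetch_max only ever moves the stored value upward.
        self.peak.fetch_max(now, Ordering::Relaxed);
        now
    }

    /// Records one deallocation; the peak is deliberately untouched.
    pub fn on_release(&self) {
        self.inflight.fetch_sub(1, Ordering::Relaxed);
    }

    pub fn inflight(&self) -> u16 {
        self.inflight.load(Ordering::Relaxed)
    }

    pub fn peak(&self) -> u16 {
        self.peak.load(Ordering::Relaxed)
    }
}
```

After 100 allocations and 40 releases, `inflight()` reads 60 while `peak()` still reads 100.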

* docs(architecture): drain error recovery contract (#2385) (#4356)

* feat(fast_io): IOCP concurrent_ops auto-size based on CPU count (#2302) (#4358)

Replace the static IocpConfig::concurrent_ops default (4) with a
CPU-derived value: (cpus * 4).clamp(8, 64). This mirrors io_uring's
default_sq_entries derivation so wide hosts keep more overlapped
WriteFile operations in flight, while a 1-CPU host still gets a
healthy 8-deep submission window.

The field stays overridable for explicit tuning. The
COMPLETION_DRAIN_BATCH = 64 in disk_batch.rs and DEFAULT_BATCH_SIZE
= 64 in pump.rs remain fixed (intentionally aligned with the new
MAX_CONCURRENT_OPS ceiling) so a single GetQueuedCompletionStatusEx
call can reap an entire in-flight cohort; the pump path already
grows on demand up to MAX_BATCH_SIZE on ERROR_INSUFFICIENT_BUFFER
(#1930).
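
The sizing rule `(cpus * 4).clamp(8, 64)` is small enough to show directly. The function names are hypothetical, and using `std::thread::available_parallelism` as the CPU count source is an assumption; the commit message does not say how the count is obtained.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Applies the documented rule: four operations per CPU, never fewer
/// than 8 and never more than 64.
pub fn concurrent_ops_for(cpus: usize) -> usize {
    cpus.saturating_mul(4).clamp(8, 64)
}

/// Derives the default from the host, falling back to 1 CPU when the
/// parallelism query fails (assumed fallback).
pub fn default_concurrent_ops() -> usize {
    let cpus = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    concurrent_ops_for(cpus)
}
```

Under this rule hosts with 1 or 2 CPUs get the floor of 8, a 4-CPU host gets 16, and anything with 16 or more CPUs is capped at 64.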

* docs: Mutex poison recovery policy (#2360) (#4359)

Capture the engine-wide contract for reacting to a poisoned Mutex /
RwLock. Distils the per-site audit at docs/audits/mutex-poison-policy.md
and the helpers landed in crates/engine/src/util/poison.rs into one
document new code can follow.

Defines the four cases (RECOVERABLE, FATAL, TEST-ONLY, UNAUDITED), the
decision tree for picking between them, code templates for the
RECOVERABLE (lock_or_recover) and FATAL (expect + # Panics rustdoc)
paths, the per-crate audit counts (23 fatal / 28 recoverable / 317
test-only across 368 sites), and the MPE-4..MPE-10 promotion path that
ends with the contract becoming a CI lint.

Cross-references docs/architecture/drain-error-recovery.md for the
ATU-series companion and the rsync_io/fast_io call sites that already
match the helper recipe.

* feat(engine): SpillPolicy struct + ConcurrentDeltaConfig wiring (#2336) (#4360)

Introduce a public SpillPolicy aggregate that groups every spill-layer
knob (threshold_bytes, dir, reclaim_mode, granularity, compression)
under one value, and wire it through ConcurrentDeltaConfig so callers
can stage policy changes without touching the DeltaConsumer API.

The legacy spill_threshold_bytes / spill_dir fields collapse into
spill_policy; deprecated accessors keep existing call sites compiling
while pointing readers at the new field. Defaults match the historical
behaviour (no spill, no extra plumbing).

The new SpillCompression::Zstd variant lives behind the additive
spill-compression feature so default builds gain no dependency.

* test(metadata): Windows source -> Linux destination ACL round-trip (#2312) (#4366)

Pins the cross-platform translation contract from the Windows NTFS ACL
design (docs/design/windows-ntfs-acl-support.md, sections 5.1/5.2/5.3):

- Forward leg: synthetic Windows-side DACL with owner/group/Everyone
  allow ACEs plus one named user lowers to user_obj/group_obj/other_obj
  and a RsyncAcl::names entry, with name preserved.
- Reverse leg: POSIX mode 0o755 plus one named user and one named group
  round-trips through the Windows-side ACE shape and back to the same
  rwx triplet, kinds, RIDs, and account names.
- Lossy path: deny ACEs are dropped on lowering; POSIX bits derive
  exclusively from surviving allow ACEs.
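
The lossy-path rule (deny ACEs dropped, POSIX bits taken only from allow ACEs) can be illustrated with a toy model. All types here are hypothetical stand-ins; they are not the `RsyncAcl` structures or the fixtures used by the test, and named users and groups are left out.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AceKind {
    Allow,
    Deny,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Trustee {
    Owner,
    Group,
    Everyone,
}

/// A simplified ACE whose access mask is already reduced to rwx bits.
#[derive(Clone, Copy, Debug)]
pub struct Ace {
    pub kind: AceKind,
    pub trustee: Trustee,
    pub rwx: u8,
}

/// Lowers a DACL to a POSIX mode. Deny ACEs are skipped entirely, so
/// the result depends only on the allow ACEs that survive.
pub fn lower_to_mode(aces: &[Ace]) -> u32 {
    let mut mode = 0u32;
    for ace in aces.iter().filter(|a| a.kind == AceKind::Allow) {
        let shift = match ace.trustee {
            Trustee::Owner => 6,
            Trustee::Group => 3,
            Trustee::Everyone => 0,
        };
        mode |= u32::from(ace.rwx & 0o7) << shift;
    }
    mode
}
```

A DACL with allow ACEs of 7, 5, and 5 for owner, group, and Everyone lowers to `0o755` whether or not a deny ACE is also present.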

Gate on OC_RSYNC_METADATA_INTEROP=1 to match acl_xattr_roundtrip_linux,
and cfg(unix) so Linux/macOS CI runners can pick it up. Uses synthetic
SDDL-style fixtures so no Windows host is needed; assertions exercise
the documented mapping rules without depending on the in-flight WAS-2,
WAS-3, and WAS-4 helpers.

* docs(audits): Windows IOCP profiling methodology (#2300) (#4370)

WPG-1 deliverable: methodology a Windows operator follows to convert
the wall-clock signal from the throughput bench into a ranked hotspot
table. Picks ETW + WPA as the default stack (free, ships with the
Windows ADK), with VTune and WPR as alternatives. Documents the six
profiling steps (release-with-debug build, drilldown bench run, ETW
CPU + IO captures, classify, append results), defines the three
hotspot buckets (per-IO blocking drain, CQ-depth saturation,
bounce-buffer copy) keyed to rows of the sync-blocking audit and to
WPG-3/WPG-4 mitigations, and lays out a five-step implementation
plan keyed to WPG-4..WPG-6.

* docs: document ssh-socketpair-stderr feature (#2377) (#4368)

Adds the experimental `ssh-socketpair-stderr` feature to the README
cargo features table and a dedicated subsection in the vNEXT operator
migration guide. The flag was introduced by SSE-3 (#2372) on top of
the design in docs/design/socketpair-stderr-channel.md (#2371) and
gates the AF_UNIX socketpair stderr channel that lets the async SSH
transport register the drain with epoll/kqueue/tokio AsyncFd and wake
it via shutdown(2) instead of consuming a per-connection blocking
thread.

Migration guide entry explains when to opt in (Linux endpoints
running with async-ssh, fan-out clients, deployments wanting the
larger socketpair kernel buffer and shutdown-based wake) and when to
leave the default (Windows until SSE-5 lands, sync-only SSH builds,
operators who prefer the simpler pipe semantics for debugging).
Updates the section 4 count, rollback list, and appendix table.
Cross-links to the design doc and the SSE-1 stderr handling audit.

Task SSE-8.

* docs: SpillPolicy user-facing documentation (#2346) (#4378)

Document the receiver-side SpillPolicy surface across the three
operator-facing docs:

- README: add a Receiver memory tuning subsection under the cargo
  features table cross-linking the public-API design and the spillable
  buffer internals.
- docs/oc-rsync.1.md: document the planned --spill-dir and
  --spill-threshold-bytes flags (STN-11) under Performance Options,
  including the OC_RSYNC_SPILL_* env-var bridge until they ship.
- docs/operator-migration-guide-vNEXT.md: add a Receiver spill
  tunability subsection covering all five OC_RSYNC_SPILL_* env vars,
  defaults, precedence, and when to override.

* docs: ssh-socketpair-stderr opt-in feature (SSE-8 #2377) (#4385)

* docs(audit): SPL-9 spill mod.rs re-export audit (#2331) (#4390)

* refactor(engine): document delete/context mutex panic invariants (#2353)

Per the workspace mutex-poison policy
(docs/audits/mutex-poison-policy.md, MPE-1/2/9), every `.lock().expect()`
site in `crates/engine/src/delete/context.rs` is classified FATAL. The
cursor stack encodes upstream's parent-before-child traversal order and
the segment-entries buffer is overwritten in place; recovery from a
poisoned guard would mis-order deletes or unlink stale names. The strict
policy is to keep `.lock().expect("...")` and document the invariant.

Changes
- Add `# Panics` sections to `observe_directory`, `begin_directory`,
  `publish_plan_for` naming the invariant and pointing at the audit doc.
- Refresh the existing `# Panics` block on `observe_segment_for_delete`
  to match the new wording.
- Normalise the three `cursor.lock().unwrap()` test sites to
  `cursor.lock().expect("test cursor poisoned")` so diagnostics match
  the production message.
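
For contrast with the recoverable helpers, the FATAL pattern (keep `.lock().expect("...")` and document the invariant under `# Panics`) might look like the sketch below. `CursorStack` and `push_directory` are hypothetical and only loosely modelled on the description above; the real functions and their rustdoc wording are not reproduced here.

```rust
use std::sync::Mutex;

/// Illustrative stand-in for state whose ordering must not be trusted
/// after a panic.
pub struct CursorStack {
    frames: Mutex<Vec<String>>,
}

impl CursorStack {
    pub fn new() -> Self {
        Self {
            frames: Mutex::new(Vec::new()),
        }
    }

    /// Pushes a directory frame and returns the new stack depth.
    ///
    /// # Panics
    ///
    /// Panics if the cursor mutex is poisoned. The stack encodes a
    /// parent-before-child traversal order, so continuing after a
    /// holder panicked could act on frames in the wrong order.
    pub fn push_directory(&self, name: &str) -> usize {
        let mut frames = self.frames.lock().expect("cursor stack poisoned");
        frames.push(name.to_owned());
        frames.len()
    }
}
```

The point of the pattern is that the panic is a documented part of the function's contract rather than an incidental `unwrap`.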

No production swap to `lock_or_recover` here: the audit lists 0
RECOVERABLE sites in this file. The helper now lives in
`crates/engine/src/util/poison.rs` (PR #4342) for later MPE-* tickets
that target genuinely recoverable state (drain shard buffer, recorder
sinks, bgid free lists).

No public API change. `cargo fmt --all` clean.
oferchen added a commit that referenced this pull request May 18, 2026
oferchen added a commit that referenced this pull request May 18, 2026
@oferchen oferchen deleted the docs/fuzz-protocol-parsing-gaps branch May 19, 2026 15:13