Skip to content

feat(generator): instrument first-byte latency in send_file_list (INC_RECURSE I1)#4103

Merged
oferchen merged 1 commit into
masterfrom
feat/inc-recurse-first-byte-latency-2196
May 15, 2026
Merged

feat(generator): instrument first-byte latency in send_file_list (INC_RECURSE I1)#4103
oferchen merged 1 commit into
masterfrom
feat/inc-recurse-first-byte-latency-2196

Conversation

@oferchen
Copy link
Copy Markdown
Owner

Summary

Adds INC_RECURSE diagnostic counter I1 from the sender-side INC_RECURSE re-enablement plan (#2089): the elapsed time from send_file_list() entry to the first wire byte the receiver can observe.

A lightweight FirstByteWriter wraps the wire writer for the duration of send_file_list. It samples Instant::now() exactly once - on the first non-empty write - and records the resulting Duration. Per-entry overhead is one already-populated Option check; there is no per-entry sampling.

The latency is surfaced through every existing diagnostic channel:

  • TransferTiming.flist_first_byte_latency (internal field on GeneratorContext)
  • GeneratorStats.flist_first_byte_latency (public API for downstream consumers)
  • info_log!(Flist, 1, ...) - visible at -v --info=flist1
  • info_log!(Stats, 3, ...) - visible at -v --info=stats3 per the task spec
  • debug_log!(Flist, 2, ...) with the entry count for context
  • tracing::info!(target = "rsync::flist", latency_us, entries, ...) when the tracing feature is enabled

upstream: flist.c send_file_list / send_dir_name first-byte timing.

Closes #2196. Refs parent #2089.

Test plan

  • Added send_file_list_records_first_byte_latency: pushes one synthesized FileEntry, calls send_file_list, asserts flist_first_byte_latency is Some(d) with d > Duration::ZERO.
  • Added send_file_list_first_byte_latency_recorded_for_empty_list: verifies the probe still fires when only the one-byte end marker is written.
  • git diff --stat reports 160 insertions / 8 deletions across 4 files, under the 200-line budget.
  • CI: fmt + clippy, nextest (stable), Windows, macOS, Linux musl.

Adds INC_RECURSE diagnostic counter I1 (#2196): the elapsed time from
send_file_list() entry to the first wire byte the receiver can observe.

A lightweight FirstByteWriter wraps the wire writer for the duration of
send_file_list. It samples Instant::now() once on the first non-empty
write and records the elapsed Duration. Per-entry overhead is one
already-set Option check; there is no per-entry sampling.

The latency is exposed through:

- TransferTiming.flist_first_byte_latency (internal field)
- GeneratorStats.flist_first_byte_latency (public API)
- info_log!(Flist, 1, ...) and info_log!(Stats, 3, ...) for human output
- debug_log!(Flist, 2, ...) with entry count context
- tracing::info!(target = "rsync::flist", ...) when the tracing feature
  is enabled

Two unit tests cover the populated-list and empty-list paths and assert
that the recorded Duration is Some and non-zero.

upstream: flist.c send_file_list / send_dir_name first-byte timing

Refs #2089
@github-actions github-actions Bot added the enhancement New feature or request label May 15, 2026
@oferchen oferchen merged commit 10650a3 into master May 15, 2026
39 checks passed
oferchen added a commit that referenced this pull request May 16, 2026
…CURSE I4) (#4120)

* feat(transfer): add wire_to_flat_ndx partition_point counters (INC_RECURSE I4)

Adds INC_RECURSE diagnostic counter I4 (#2199): two process-global
AtomicU64 counters per side (generator and receiver) that track every
wire_to_flat_ndx / flat_to_wire_ndx invocation and the cumulative
partition_point comparison depth across the segment table.

The counters are sampled at end-of-transfer in the existing finalize
paths used by PR #4103 for the first-byte latency probe:

- generator: GeneratorContext::run, just before returning GeneratorStats
- receiver: finalize_transfer in receiver/transfer/phases.rs

Both sides emit a debug_log!(Genr, 1, ...) line with the totals and,
when the optional tracing feature is enabled, a tracing::debug! event
at rsync::generator::ndx_convert / rsync::receiver::ndx_convert.

A small partition_point_depth(len) helper approximates the worst-case
comparison count as floor(log2(len)) + 1.

Public API: a new ndx_convert_totals() free function in both modules
returns (call_count, cumulative_partition_point_depth). It is the only
addition to the public surface; no CLI flag, no wire-format change, no
change to the existing wire_to_flat_ndx / flat_to_wire_ndx signatures.

Unit tests in both crates exercise the converters and assert:

- the call counter monotonically grows by at least the number of
  invocations made in the test (>= guards against concurrent test
  runs incrementing the shared counter)
- the cumulative partition_point depth grows by at least
  N * depth(segment_table_len) after N calls against a known-size
  segment table

Closes #2199

* chore: regenerate Cargo.lock

* style: cargo fmt
oferchen added a commit that referenced this pull request May 16, 2026
…4142)

* feat(transfer): instrument prepare_pending_acl cost (INC_RECURSE I5)

Adds INC_RECURSE diagnostic counter I5 (#2200): two process-global
AtomicU64 counters that track every prepare_pending_acl invocation on
the generator file-list path and the cumulative wall time spent inside
the helper across all generator transfers.

The counters fire from a single record_prepare_acl helper bumped from
prepare_pending_acl itself, which wraps the existing body in an
Instant::now() pair. The early-return path (entry.is_symlink() or
!flags.acls) is still counted so per-segment call rate stays comparable
to entry count, but contributes a near-zero ns delta.

Totals are sampled in GeneratorContext::run via prepare_acl_totals() in
the same finalize block used by PR #4103 (I1 first-byte latency),
PR #4120 (I4 NDX partition_point) and PR #4121 (I3 flush rate), and
emitted via:

- debug_log!(Genr, 1, "generator prepare_pending_acl totals: \
  calls=N elapsed_ns=M")
- tracing::debug! event on rsync::generator::prepare_acl when the
  optional tracing feature is enabled

Two unit tests in crates/transfer/src/generator/tests.rs cover the
new counters:

- prepare_acl_call_counter_increments asserts the call counter and the
  cumulative elapsed_ns counter grow by at least N and sum-of-durations
  after N synthetic record_prepare_acl invocations. The assertion uses
  >= because the counter is shared across the process and other tests
  may run concurrently.
- prepare_acl_totals_observable_without_prep asserts the totals
  snapshot is a pure read - constructing a generator does not bump the
  counter on its own, so two adjacent reads with no intervening prep
  call must return identical values.

No new CLI flag, no wire-format change, no change to the public
prepare_pending_acl signature.

Closes #2200

* fix(transfer): rewrap rustdoc list items for prepare_pending_acl counters
oferchen added a commit that referenced this pull request May 16, 2026
…RSE I2) (#4143)

Adds INC_RECURSE diagnostic counter I2 (#2197): two process-global
AtomicU64 counters that track every encode_and_send_segment invocation
on the generator file-list path and the cumulative wall time spent
inside the helper across all generator transfers.

The counters fire from a single record_segment_dispatch helper bumped
from encode_and_send_segment itself, which wraps the existing body in
an Instant::now() pair. The zero-entry early-return path is still
counted so per-transfer dispatch rate stays comparable to segment
scheduler activity, but contributes a near-zero ns delta.

Totals are sampled in GeneratorContext::run via
segment_dispatch_totals() in the same finalize block used by PR #4103
(I1 first-byte latency), PR #4120 (I4 NDX partition_point), PR #4121
(I3 flush rate) and PR #4142 (I5 prepare_pending_acl), and emitted
via:

- debug_log!(Genr, 1, "generator encode_and_send_segment totals: \
  calls=N elapsed_ns=M")
- tracing::debug! event on rsync::generator::segment_dispatch when
  the optional tracing feature is enabled

Two unit tests in crates/transfer/src/generator/tests.rs cover the
new counters:

- segment_dispatch_call_counter_increments asserts the call counter
  and the cumulative elapsed_ns counter grow by at least N and
  sum-of-durations after N synthetic record_segment_dispatch
  invocations. The assertion uses >= because the counter is shared
  across the process and other tests may run concurrently.
- segment_dispatch_totals_observable_without_dispatch asserts the
  totals snapshot is a pure read - constructing a generator does not
  bump the counter on its own, so two adjacent reads with no
  intervening dispatch call must return identical values.

No new CLI flag, no wire-format change, no change to the public
encode_and_send_segment signature.

Closes #2197
@oferchen oferchen deleted the feat/inc-recurse-first-byte-latency-2196 branch May 16, 2026 19:50
oferchen added a commit that referenced this pull request May 18, 2026
Adds INC_RECURSE diagnostic counter I1 (#2196): the elapsed time from
send_file_list() entry to the first wire byte the receiver can observe.

A lightweight FirstByteWriter wraps the wire writer for the duration of
send_file_list. It samples Instant::now() once on the first non-empty
write and records the elapsed Duration. Per-entry overhead is one
already-set Option check; there is no per-entry sampling.

The latency is exposed through:

- TransferTiming.flist_first_byte_latency (internal field)
- GeneratorStats.flist_first_byte_latency (public API)
- info_log!(Flist, 1, ...) and info_log!(Stats, 3, ...) for human output
- debug_log!(Flist, 2, ...) with entry count context
- tracing::info!(target = "rsync::flist", ...) when the tracing feature
  is enabled

Two unit tests cover the populated-list and empty-list paths and assert
that the recorded Duration is Some and non-zero.

upstream: flist.c send_file_list / send_dir_name first-byte timing

Refs #2089
oferchen added a commit that referenced this pull request May 18, 2026
…CURSE I4) (#4120)

* feat(transfer): add wire_to_flat_ndx partition_point counters (INC_RECURSE I4)

Adds INC_RECURSE diagnostic counter I4 (#2199): two process-global
AtomicU64 counters per side (generator and receiver) that track every
wire_to_flat_ndx / flat_to_wire_ndx invocation and the cumulative
partition_point comparison depth across the segment table.

The counters are sampled at end-of-transfer in the existing finalize
paths used by PR #4103 for the first-byte latency probe:

- generator: GeneratorContext::run, just before returning GeneratorStats
- receiver: finalize_transfer in receiver/transfer/phases.rs

Both sides emit a debug_log!(Genr, 1, ...) line with the totals and,
when the optional tracing feature is enabled, a tracing::debug! event
at rsync::generator::ndx_convert / rsync::receiver::ndx_convert.

A small partition_point_depth(len) helper approximates the worst-case
comparison count as floor(log2(len)) + 1.

Public API: a new ndx_convert_totals() free function in both modules
returns (call_count, cumulative_partition_point_depth). It is the only
addition to the public surface; no CLI flag, no wire-format change, no
change to the existing wire_to_flat_ndx / flat_to_wire_ndx signatures.

Unit tests in both crates exercise the converters and assert:

- the call counter monotonically grows by at least the number of
  invocations made in the test (>= guards against concurrent test
  runs incrementing the shared counter)
- the cumulative partition_point depth grows by at least
  N * depth(segment_table_len) after N calls against a known-size
  segment table

Closes #2199

* chore: regenerate Cargo.lock

* style: cargo fmt
oferchen added a commit that referenced this pull request May 18, 2026
…4142)

* feat(transfer): instrument prepare_pending_acl cost (INC_RECURSE I5)

Adds INC_RECURSE diagnostic counter I5 (#2200): two process-global
AtomicU64 counters that track every prepare_pending_acl invocation on
the generator file-list path and the cumulative wall time spent inside
the helper across all generator transfers.

The counters fire from a single record_prepare_acl helper bumped from
prepare_pending_acl itself, which wraps the existing body in an
Instant::now() pair. The early-return path (entry.is_symlink() or
!flags.acls) is still counted so per-segment call rate stays comparable
to entry count, but contributes a near-zero ns delta.

Totals are sampled in GeneratorContext::run via prepare_acl_totals() in
the same finalize block used by PR #4103 (I1 first-byte latency),
PR #4120 (I4 NDX partition_point) and PR #4121 (I3 flush rate), and
emitted via:

- debug_log!(Genr, 1, "generator prepare_pending_acl totals: \
  calls=N elapsed_ns=M")
- tracing::debug! event on rsync::generator::prepare_acl when the
  optional tracing feature is enabled

Two unit tests in crates/transfer/src/generator/tests.rs cover the
new counters:

- prepare_acl_call_counter_increments asserts the call counter and the
  cumulative elapsed_ns counter grow by at least N and sum-of-durations
  after N synthetic record_prepare_acl invocations. The assertion uses
  >= because the counter is shared across the process and other tests
  may run concurrently.
- prepare_acl_totals_observable_without_prep asserts the totals
  snapshot is a pure read - constructing a generator does not bump the
  counter on its own, so two adjacent reads with no intervening prep
  call must return identical values.

No new CLI flag, no wire-format change, no change to the public
prepare_pending_acl signature.

Closes #2200

* fix(transfer): rewrap rustdoc list items for prepare_pending_acl counters
oferchen added a commit that referenced this pull request May 18, 2026
…RSE I2) (#4143)

Adds INC_RECURSE diagnostic counter I2 (#2197): two process-global
AtomicU64 counters that track every encode_and_send_segment invocation
on the generator file-list path and the cumulative wall time spent
inside the helper across all generator transfers.

The counters fire from a single record_segment_dispatch helper bumped
from encode_and_send_segment itself, which wraps the existing body in
an Instant::now() pair. The zero-entry early-return path is still
counted so per-transfer dispatch rate stays comparable to segment
scheduler activity, but contributes a near-zero ns delta.

Totals are sampled in GeneratorContext::run via
segment_dispatch_totals() in the same finalize block used by PR #4103
(I1 first-byte latency), PR #4120 (I4 NDX partition_point), PR #4121
(I3 flush rate) and PR #4142 (I5 prepare_pending_acl), and emitted
via:

- debug_log!(Genr, 1, "generator encode_and_send_segment totals: \
  calls=N elapsed_ns=M")
- tracing::debug! event on rsync::generator::segment_dispatch when
  the optional tracing feature is enabled

Two unit tests in crates/transfer/src/generator/tests.rs cover the
new counters:

- segment_dispatch_call_counter_increments asserts the call counter
  and the cumulative elapsed_ns counter grow by at least N and
  sum-of-durations after N synthetic record_segment_dispatch
  invocations. The assertion uses >= because the counter is shared
  across the process and other tests may run concurrently.
- segment_dispatch_totals_observable_without_dispatch asserts the
  totals snapshot is a pure read - constructing a generator does not
  bump the counter on its own, so two adjacent reads with no
  intervening dispatch call must return identical values.

No new CLI flag, no wire-format change, no change to the public
encode_and_send_segment signature.

Closes #2197
oferchen added a commit that referenced this pull request May 18, 2026
Adds INC_RECURSE diagnostic counter I1 (#2196): the elapsed time from
send_file_list() entry to the first wire byte the receiver can observe.

A lightweight FirstByteWriter wraps the wire writer for the duration of
send_file_list. It samples Instant::now() once on the first non-empty
write and records the elapsed Duration. Per-entry overhead is one
already-set Option check; there is no per-entry sampling.

The latency is exposed through:

- TransferTiming.flist_first_byte_latency (internal field)
- GeneratorStats.flist_first_byte_latency (public API)
- info_log!(Flist, 1, ...) and info_log!(Stats, 3, ...) for human output
- debug_log!(Flist, 2, ...) with entry count context
- tracing::info!(target = "rsync::flist", ...) when the tracing feature
  is enabled

Two unit tests cover the populated-list and empty-list paths and assert
that the recorded Duration is Some and non-zero.

upstream: flist.c send_file_list / send_dir_name first-byte timing

Refs #2089
oferchen added a commit that referenced this pull request May 18, 2026
…CURSE I4) (#4120)

* feat(transfer): add wire_to_flat_ndx partition_point counters (INC_RECURSE I4)

Adds INC_RECURSE diagnostic counter I4 (#2199): two process-global
AtomicU64 counters per side (generator and receiver) that track every
wire_to_flat_ndx / flat_to_wire_ndx invocation and the cumulative
partition_point comparison depth across the segment table.

The counters are sampled at end-of-transfer in the existing finalize
paths used by PR #4103 for the first-byte latency probe:

- generator: GeneratorContext::run, just before returning GeneratorStats
- receiver: finalize_transfer in receiver/transfer/phases.rs

Both sides emit a debug_log!(Genr, 1, ...) line with the totals and,
when the optional tracing feature is enabled, a tracing::debug! event
at rsync::generator::ndx_convert / rsync::receiver::ndx_convert.

A small partition_point_depth(len) helper approximates the worst-case
comparison count as floor(log2(len)) + 1.

Public API: a new ndx_convert_totals() free function in both modules
returns (call_count, cumulative_partition_point_depth). It is the only
addition to the public surface; no CLI flag, no wire-format change, no
change to the existing wire_to_flat_ndx / flat_to_wire_ndx signatures.

Unit tests in both crates exercise the converters and assert:

- the call counter monotonically grows by at least the number of
  invocations made in the test (>= guards against concurrent test
  runs incrementing the shared counter)
- the cumulative partition_point depth grows by at least
  N * depth(segment_table_len) after N calls against a known-size
  segment table

Closes #2199

* chore: regenerate Cargo.lock

* style: cargo fmt
oferchen added a commit that referenced this pull request May 18, 2026
…4142)

* feat(transfer): instrument prepare_pending_acl cost (INC_RECURSE I5)

Adds INC_RECURSE diagnostic counter I5 (#2200): two process-global
AtomicU64 counters that track every prepare_pending_acl invocation on
the generator file-list path and the cumulative wall time spent inside
the helper across all generator transfers.

The counters fire from a single record_prepare_acl helper bumped from
prepare_pending_acl itself, which wraps the existing body in an
Instant::now() pair. The early-return path (entry.is_symlink() or
!flags.acls) is still counted so per-segment call rate stays comparable
to entry count, but contributes a near-zero ns delta.

Totals are sampled in GeneratorContext::run via prepare_acl_totals() in
the same finalize block used by PR #4103 (I1 first-byte latency),
PR #4120 (I4 NDX partition_point) and PR #4121 (I3 flush rate), and
emitted via:

- debug_log!(Genr, 1, "generator prepare_pending_acl totals: \
  calls=N elapsed_ns=M")
- tracing::debug! event on rsync::generator::prepare_acl when the
  optional tracing feature is enabled

Two unit tests in crates/transfer/src/generator/tests.rs cover the
new counters:

- prepare_acl_call_counter_increments asserts the call counter and the
  cumulative elapsed_ns counter grow by at least N and sum-of-durations
  after N synthetic record_prepare_acl invocations. The assertion uses
  >= because the counter is shared across the process and other tests
  may run concurrently.
- prepare_acl_totals_observable_without_prep asserts the totals
  snapshot is a pure read - constructing a generator does not bump the
  counter on its own, so two adjacent reads with no intervening prep
  call must return identical values.

No new CLI flag, no wire-format change, no change to the public
prepare_pending_acl signature.

Closes #2200

* fix(transfer): rewrap rustdoc list items for prepare_pending_acl counters
oferchen added a commit that referenced this pull request May 18, 2026
…RSE I2) (#4143)

Adds INC_RECURSE diagnostic counter I2 (#2197): two process-global
AtomicU64 counters that track every encode_and_send_segment invocation
on the generator file-list path and the cumulative wall time spent
inside the helper across all generator transfers.

The counters fire from a single record_segment_dispatch helper bumped
from encode_and_send_segment itself, which wraps the existing body in
an Instant::now() pair. The zero-entry early-return path is still
counted so per-transfer dispatch rate stays comparable to segment
scheduler activity, but contributes a near-zero ns delta.

Totals are sampled in GeneratorContext::run via
segment_dispatch_totals() in the same finalize block used by PR #4103
(I1 first-byte latency), PR #4120 (I4 NDX partition_point), PR #4121
(I3 flush rate) and PR #4142 (I5 prepare_pending_acl), and emitted
via:

- debug_log!(Genr, 1, "generator encode_and_send_segment totals: \
  calls=N elapsed_ns=M")
- tracing::debug! event on rsync::generator::segment_dispatch when
  the optional tracing feature is enabled

Two unit tests in crates/transfer/src/generator/tests.rs cover the
new counters:

- segment_dispatch_call_counter_increments asserts the call counter
  and the cumulative elapsed_ns counter grow by at least N and
  sum-of-durations after N synthetic record_segment_dispatch
  invocations. The assertion uses >= because the counter is shared
  across the process and other tests may run concurrently.
- segment_dispatch_totals_observable_without_dispatch asserts the
  totals snapshot is a pure read - constructing a generator does not
  bump the counter on its own, so two adjacent reads with no
  intervening dispatch call must return identical values.

No new CLI flag, no wire-format change, no change to the public
encode_and_send_segment signature.

Closes #2197
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant