Skip to content

Fix one_collect panics when reading in-process perf ring buffer lost records#275

Merged
brianrob merged 5 commits into
mainfrom
copilot/fix-one-collect-panics
May 19, 2026
Merged

Fix one_collect panics when reading in-process perf ring buffer lost records#275
brianrob merged 5 commits into
mainfrom
copilot/fix-one-collect-panics

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 19, 2026

Summary

Fixes panics in RingBufDataSource when consuming the in-process perf
ring buffer. The root cause was that InProcessRingBufWriter emitted
PERF_RECORD_LOST records without the trailing sample_id_all fields
that the rest of the writer uses, so the reader decoded a bogus sample
id and then either:

  • panicked on &ring_bufs[&id] for an unknown id, or
  • read past the end of the record / mis-aligned subsequent reads.

This PR fixes the root cause and lightly hardens the reader so similar
malformed records can never panic again.

Changes

  • InProcessRingBufWriter::write_lost_record — emit the
    sample_id_all trailer (pid/tid, time, id) on PERF_RECORD_LOST
    records so they match the layout produced for normal records.
    LOST_RECORD_SIZE bumped from 24 → 48 bytes; the timestamp is filled
    in via perf_timestamp(...), and id/tid are left as 0.
  • RingBufDataSource::read_time — replace the panicking
    ring_bufs[&id] index with ring_bufs.get(&id). On an unknown id,
    log a warning, advance the cursor past just that one record, and
    return None so the caller's outer scan retries on the next pass.
    This preserves the original single-peek semantics — we never silently
    consume more than one record per call, so if the unknown-id path ever
    fires due to a real bug it stays visible rather than draining the
    ring quietly.
  • RingBufDataSource::read_time — add a minimum-size guard
    (header.size < data_offset + 16) to avoid u16 underflow when
    computing the time/id offsets for non-sample records. On a too-small
    record, advance by at least the header size so the ring still drains
    and return None. The cursor parameter is now &mut CpuRingCursor
    to allow advancing; both existing callsites already passed &mut.

Tests

  • Updated three existing InProcessRingBuf lost-record tests to
    assert the new 48-byte record size and read back the
    sample_id_all id field.
  • Added read_time_skips_unknown_ring_buffer_id covering the new
    skip-on-unknown-id path in read_time: the first call returns
    None after advancing past the unknown record, and a follow-up call
    returns the time of the known record.

Risk

Scope is limited to Linux perf_event::rb. The on-wire layout for
PERF_RECORD_LOST now matches what the rest of the in-process writer
already emits for normal records, so external readers that already
handle sample_id_all for sample records will handle the new lost
records the same way.

Copilot AI changed the title [WIP] Fix panics in one-collect during dotnet-trace collect-linux Prevent ring-buffer ID lookup panics in collect-linux perf parsing May 19, 2026
Copilot AI requested a review from brianrob May 19, 2026 19:16
@brianrob
Copy link
Copy Markdown
Member

@copilot, also look at our in-process ring buffer implementation and make sure that this isn't an issue where we're writing a bad value that then panics during the read path. We did make changes recently in this area.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 19, 2026

@copilot, also look at our in-process ring buffer implementation and make sure that this isn't an issue where we're writing a bad value that then panics during the read path. We did make changes rece...

I traced this through the in-process ring buffer path and found we were emitting PERF_RECORD_LOST without sample_id_all trailer fields, which can produce an invalid trailing ID for the read path. I fixed that in commit 891a80f by appending (pid/tid, time, id) to in-process lost records, updated the lost-record tests to validate the new layout, and re-ran targeted + full one_collect tests.

@brianrob brianrob marked this pull request as ready for review May 19, 2026 20:35
@brianrob brianrob requested a review from beaubelgrave May 19, 2026 20:36
Revert read_time back to single-peek semantics on the unknown-id and

invalid-header paths so we don't silently consume an unbounded run of

records in a single call. On either path we now advance past just the

offending record (or the minimum header size if the size field is too

small) and return None, letting the caller's outer scan retry on the

next pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@brianrob brianrob changed the title Prevent ring-buffer ID lookup panics in collect-linux perf parsing Fix one_collect panics when reading in-process perf ring buffer lost records May 19, 2026
@brianrob brianrob merged commit 6ccba44 into main May 19, 2026
18 checks passed
@brianrob brianrob deleted the copilot/fix-one-collect-panics branch May 19, 2026 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

one-collect panics on latest package when running dotnet-trace collect-linux

3 participants