Fix one_collect panics when reading in-process perf ring buffer lost records by Copilot · Pull Request #275 · microsoft/one-collect

Copilot · 2026-05-19T19:03:50Z

Summary

Fixes panics in RingBufDataSource when consuming the in-process perf
ring buffer. The root cause was that InProcessRingBufWriter emitted
PERF_RECORD_LOST records without the trailing sample_id_all fields
that the rest of the writer uses, so the reader decoded a bogus sample
id and then either:

panicked on &ring_bufs[&id] for an unknown id, or
read past the end of the record / mis-aligned subsequent reads.

This PR fixes the root cause and lightly hardens the reader so similar
malformed records can never panic again.

Changes

InProcessRingBufWriter::write_lost_record — emit the
sample_id_all trailer (pid/tid, time, id) on PERF_RECORD_LOST
records so they match the layout produced for normal records.
LOST_RECORD_SIZE bumped from 24 → 48 bytes; the timestamp is filled
in via perf_timestamp(...), and id/tid are left as 0.
RingBufDataSource::read_time — replace the panicking
ring_bufs[&id] index with ring_bufs.get(&id). On an unknown id,
log a warning, advance the cursor past just that one record, and
return None so the caller's outer scan retries on the next pass.
This preserves the original single-peek semantics — we never silently
consume more than one record per call, so if the unknown-id path ever
fires due to a real bug it stays visible rather than draining the
ring quietly.
RingBufDataSource::read_time — add a minimum-size guard
(header.size < data_offset + 16) to avoid u16 underflow when
computing the time/id offsets for non-sample records. On a too-small
record, advance by at least the header size so the ring still drains
and return None. The cursor parameter is now &mut CpuRingCursor
to allow advancing; both existing callsites already passed &mut.

Tests

Updated three existing InProcessRingBuf lost-record tests to
assert the new 48-byte record size and read back the
sample_id_all id field.
Added read_time_skips_unknown_ring_buffer_id covering the new
skip-on-unknown-id path in read_time: the first call returns
None after advancing past the unknown record, and a follow-up call
returns the time of the known record.

Risk

Scope is limited to Linux perf_event::rb. The on-wire layout for
PERF_RECORD_LOST now matches what the rest of the in-process writer
already emits for normal records, so external readers that already
handle sample_id_all for sample records will handle the new lost
records the same way.

brianrob · 2026-05-19T19:17:30Z

@copilot, also look at our in-process ring buffer implementation and make sure that this isn't an issue where we're writing a bad value that then panics during the read path. We did make changes recently in this area.

Copilot · 2026-05-19T19:25:17Z

@copilot, also look at our in-process ring buffer implementation and make sure that this isn't an issue where we're writing a bad value that then panics during the read path. We did make changes rece...

I traced this through the in-process ring buffer path and found we were emitting PERF_RECORD_LOST without sample_id_all trailer fields, which can produce an invalid trailing ID for the read path. I fixed that in commit 891a80f by appending (pid/tid, time, id) to in-process lost records, updated the lost-record tests to validate the new layout, and re-ran targeted + full one_collect tests.

Revert read_time back to single-peek semantics on the unknown-id and invalid-header paths so we don't silently consume an unbounded run of records in a single call. On either path we now advance past just the offending record (or the minimum header size if the size field is too small) and return None, letting the caller's outer scan retry on the next pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Initial plan

13e9ccf

Copilot AI assigned Copilot and brianrob May 19, 2026

Copilot started work on behalf of brianrob May 19, 2026 19:03 View session

Copilot AI linked an issue May 19, 2026 that may be closed by this pull request

one-collect panics on latest package when running dotnet-trace collect-linux #274

Closed

Copilot AI added 2 commits May 19, 2026 19:11

Handle missing ring buffer IDs without panicking

02ce117

Skip unknown ring buffer records in read_time

6b9a0c8

Copilot AI changed the title ~~[WIP] Fix panics in one-collect during dotnet-trace collect-linux~~ Prevent ring-buffer ID lookup panics in collect-linux perf parsing May 19, 2026

Copilot finished work on behalf of brianrob May 19, 2026 19:16

Copilot AI requested a review from brianrob May 19, 2026 19:16

Copilot started work on behalf of brianrob May 19, 2026 19:17 View session

Emit sample_id_all fields in in-process lost records

891a80f

Copilot finished work on behalf of brianrob May 19, 2026 19:25

brianrob marked this pull request as ready for review May 19, 2026 20:35

brianrob requested a review from beaubelgrave May 19, 2026 20:36

brianrob changed the title ~~Prevent ring-buffer ID lookup panics in collect-linux perf parsing~~ Fix one_collect panics when reading in-process perf ring buffer lost records May 19, 2026

beaubelgrave approved these changes May 19, 2026

View reviewed changes

brianrob merged commit 6ccba44 into main May 19, 2026
18 checks passed

brianrob deleted the copilot/fix-one-collect-panics branch May 19, 2026 23:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix one_collect panics when reading in-process perf ring buffer lost records#275

Fix one_collect panics when reading in-process perf ring buffer lost records#275
brianrob merged 5 commits into
mainfrom
copilot/fix-one-collect-panics

Copilot AI commented May 19, 2026 •

edited by brianrob

Loading

Uh oh!

brianrob commented May 19, 2026

Uh oh!

Copilot AI commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Copilot AI commented May 19, 2026 • edited by brianrob Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Tests

Risk

Uh oh!

brianrob commented May 19, 2026

Uh oh!

Copilot AI commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented May 19, 2026 •

edited by brianrob

Loading