-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ringbuf: Add event counters #1621
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license-eye has totally checked 504 files.
Valid | Invalid | Ignored | Fixed |
---|---|---|---|
503 | 1 | 0 | 0 |
Click to see the invalid file list
- lib/ringbuf/macros/src/lib.rs
Ringbufs provide a fixed-size buffer for storing diagnostic events, which may include data. This is very useful for debugging. However, because the ring buffer has a fixed size, only the last `N` events are visible at any given time. If a ringbuf has a capacity of, say, 16 events, and a given event has occurred 10,000 times over the lifespan of the task, there's currently no way to know that the event has happened more than 16 times. Furthermore, there's no way to use the ringbuf to determine whether a given event has _ever_ occurred during the task's lifetime: if a given variant is not _currently_ present in the ringbuf, it may still have occurred previously and been "drowned out" by subsequent events. Finally, some targets disable the ringbuf entirely, for space reasons, and on those targets, it may still be desirable to be able to get some less granular diagnostic data out of the ringbuf events a task records. Therefore, this commit extends the `ringbuf` crate to support generating counts of the number of times a particular event variant has been recorded, in addition to a fixed-size buffer of events. These counters provide less detailed data than inspecting the ringbuf contents, because they don't record the values of any _fields_ on that event. However, they allow recording very large numbers of events in a fixed amount of space, and provide historical data for events that may have occurred in the past and then "fallen off" the end of the ringbuf as new events were recorded. By inspecting these counters, we can determine if a given variant has *ever* been recorded, even if it isn't currently in the buffer, and we can see the total number of times it has been recorded over the task's entire lifetime. Event counters are implemented as an array of `AtomicUsize` counts paired with an `&'static str` name for the variant that was recorded. A new `counted_ringbuf!` macro is provided to create a ringbuf and also generate a `{RINGBUF_NAME}_COUNTS` static containing its counters. In order to produce event counts, a value type must implement a new `ringbuf::Event` trait, which allows enumerating the variant names of an `enum` type and assigning indices into the counter array to each variant. A `#[derive(ringbuf::Event)]` macro allows generating implementations of this trait automatically. Finally, a new `event!` macro in the `ringbuf` crate both records a ringbuf entry _and_ increments the corresponding event count. I chose to add new macros rather than modifying the existing `ringbuf!` macros to allow the event counters to be adopted incrementally; otherwise, it would be necessary to add `#[derive(ringbuf::Event)]` to _every_ type that's ever recorded as a ringbuf evntry as part of this commit. Because recording event counters requires only `8 * <N_VARIANTS>` bytes, counters are currently recorded even when the `ringbuf` crate has the `disabled` feature set. This way, we can still record some diagnostic data on targets that don't have the space to store the ringbuf itself. If it becomes necessary to also disable counters, we could add a separate `disable-counters` feature as well, so that a target can pick and choose between recording ringbuf entries, counts, or both. Future work will include: - [ ] Actually adopting event counters in existing ringbufs. - [ ] Updating `humility ringbuf` to also dump ringbuf event counts.
30f2b37
to
39bbfcb
Compare
This commit changes `gimlet-seq-server` to use the `counted_ringbuf!` macro added in e3b942c
As a note, this design was based on the constraint that we would want to track the event counts per-ringbuf instance. If that's not the case, and we're okay with counters being global for each enum type that derives Is the constraint that we might want to use an event type in multiple ringbufs in the same task worth thinking about? Or are we okay with tracking counts at the level of the actual event type? |
Ringbufs provide a fixed-size buffer for storing diagnostic events, which may include data. This is very useful for debugging. However, because the ring buffer has a fixed size, only the last `N` events are visible at any given time. If a ringbuf has a capacity of, say, 16 events, and a given event has occurred 10,000 times over the lifespan of the task, there's currently no way to know that the event has happened more than 16 times. Furthermore, there's no way to use the ringbuf to determine whether a given event has _ever_ occurred during the task's lifetime: if a given variant is not _currently_ present in the ringbuf, it may still have occurred previously and been "drowned out" by subsequent events. Finally, some targets disable the ringbuf entirely, for space reasons, and on those targets, it may still be desirable to be able to get some less granular diagnostic data out of the ringbuf events a task records. Therefore, this commit extends the `ringbuf` crate to support generating counts of the number of times a particular event variant has been recorded, in addition to a fixed-size buffer of events. These counters provide less detailed data than inspecting the ringbuf contents, because they don't record the values of any _fields_ on that event. However, they allow recording very large numbers of events in a fixed amount of space, and provide historical data for events that may have occurred in the past and then "fallen off" the end of the ringbuf as new events were recorded. By inspecting these counters, we can determine if a given variant has *ever* been recorded, even if it isn't currently in the buffer, and we can see the total number of times it has been recorded over the task's entire lifetime. Event counters are implemented as an array of `AtomicUsize` counts paired with an `&'static str` name for the variant that was recorded. A new `counted_ringbuf!` macro is provided to create a ringbuf and also generate a `{RINGBUF_NAME}_COUNTS` static containing its counters. In order to produce event counts, a value type must implement a new `ringbuf::Event` trait, which allows enumerating the variant names of an `enum` type and assigning indices into the counter array to each variant. A `#[derive(ringbuf::Event)]` macro allows generating implementations of this trait automatically. Finally, a new `event!` macro in the `ringbuf` crate both records a ringbuf entry _and_ increments the corresponding event count. I chose to add new macros rather than modifying the existing `ringbuf!` macros to allow the event counters to be adopted incrementally; otherwise, it would be necessary to add `#[derive(ringbuf::Event)]` to _every_ type that's ever recorded as a ringbuf evntry as part of this commit. Because recording event counters requires only `8 * <N_VARIANTS>` bytes, counters are currently recorded even when the `ringbuf` crate has the `disabled` feature set. This way, we can still record some diagnostic data on targets that don't have the space to store the ringbuf itself. If it becomes necessary to also disable counters, we could add a separate `disable-counters` feature as well, so that a target can pick and choose between recording ringbuf entries, counts, or both. Future work will include: - [ ] Actually adopting event counters in existing ringbufs. - [ ] Updating `humility ringbuf` to also dump ringbuf event counts.
This commit changes `gimlet-seq-server` to use the `counted_ringbuf!` macro added in e3b942c
39bbfcb
to
ddcddac
Compare
argh, i guess a bunch of the devboard targets don't have |
Yeah, this is what I meant in chat when I mentioned the M0 doesn't have atomics. However, the M0 tends to be our most flash-constrained target, and often builds with ringbuf either reduced or disabled -- I wouldn't worry too much about good M0 support in this feature, personally, particularly since using atomics will be a lot smaller on every other platform. |
Initial thoughts: Debugger support would be simplified if we can figure out how to put the counters in either named statics, or a struct with named fields, rather than an array. At first glance I'm not enthusiastic about the generic name (Added in edit: Putting the strings in the binary is probably not necessary because our debuggers in general won't read them.) If it's not a lot of work, could you |
Oh, generating a struct with named fields is a great idea, I like that much more than the array with string pointers in it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license-eye has totally checked 505 files.
Valid | Invalid | Ignored | Fixed |
---|---|---|---|
504 | 1 | 0 | 0 |
Click to see the invalid file list
- lib/ringbuf/examples/counts.rs
As suggested by @cbiffle, this commit changes the ringbuf event counters code to generate a struct with named fields, rather than an array. This presents a nicer API to debuggers, and avoids the need to stick a bunch of string constants in the binary.
23ec66d
to
a43d644
Compare
As suggested by @cbiffle, this commit changes the ringbuf event counters code to generate a struct with named fields, rather than an array. This presents a nicer API to debuggers, and avoids the need to stick a bunch of string constants in the binary.
a43d644
to
730f0af
Compare
lib/ringbuf/src/lib.rs
Outdated
/// If you declared your ringbuffer without a name, you can also use this | ||
/// without a name, and it will default to `__RINGBUF`. | ||
#[macro_export] | ||
macro_rules! count_entry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like very much to not have to have a separate macro for recording the entry, and just always use ringbuf_entry!
, so that it's just declaring a ringbuf that has to use a different macro. However, all the ways of doing that I've been able to come up with so far require changing the static generated by ringbuf!
a bit, which...seems like it would break the debugger-facing API...
I didn't love that there were separate `count_entry!` and `ringbuf_entry!` macros --- it would be nicer if adopting event counts didn't require changing every macro invocation. This change reworks the macros to use a single `ringbuf_entry!` macro, and dispatch whether or not to count events based on whether the ringbuf was *declared* using `counted_ringbuf!` or `ringbuf!`. The way this works is by generating a module with the same name as the ringbuf, and generating a function inside that module that actually records the entry (and may or may not record event counts, depending on the macro that declared the ringbuf). This way, `ringbuf_entry!` can simply call `<RINGBUF_NAME>::record(...)`, and the function will determine whether or not to count events. It occurs to me that we could also use the module to generate the event counter static without having to do a proc-macro-y concatenation of idents.
Since the previous commit changes the `ringbuf!` and `counted_ringbuf!` macros to generate a module for each ringbuf, we can stick the counters static in there. This obviates the need to use proc-macros for incrementing the counters just so we can concatenate the ringbuf's name with `_COUNTS` to determine the name of the counter static. Now, we only use the proc macro for deriving `Count`. This allows us to make the derive macro optional.
Since the previous commit changes the `ringbuf!` and `counted_ringbuf!` macros to generate a module for each ringbuf, we can stick the counters static in there. This obviates the need to use proc-macros for incrementing the counters just so we can concatenate the ringbuf's name with `_COUNTS` to determine the name of the counter static. Now, we only use the proc macro for deriving `Count`. This allows us to make the derive macro optional.
ab5e744
to
6e86018
Compare
Since the previous commit changes the `ringbuf!` and `counted_ringbuf!` macros to generate a module for each ringbuf, we can stick the counters static in there. This obviates the need to use proc-macros for incrementing the counters just so we can concatenate the ringbuf's name with `_COUNTS` to determine the name of the counter static. Now, we only use the proc macro for deriving `Count`. This allows us to make the derive macro optional.
6e86018
to
094f7f4
Compare
Since the previous commit changes the `ringbuf!` and `counted_ringbuf!` macros to generate a module for each ringbuf, we can stick the counters static in there. This obviates the need to use proc-macros for incrementing the counters just so we can concatenate the ringbuf's name with `_COUNTS` to determine the name of the counter static. Now, we only use the proc macro for deriving `Count`. This allows us to make the derive macro optional.
094f7f4
to
95c9d82
Compare
I didn't love that there were separate `count_entry!` and `ringbuf_entry!` macros --- it would be nicer if adopting event counts didn't require changing every macro invocation. This change reworks the macros to use a single `ringbuf_entry!` macro, and dispatch whether or not to count events based on whether the ringbuf was *declared* using `counted_ringbuf!` or `ringbuf!`. The way this works is by generating a module with the same name as the ringbuf, and generating a function inside that module that actually records the entry (and may or may not record event counts, depending on the macro that declared the ringbuf). This way, `ringbuf_entry!` can simply call `<RINGBUF_NAME>::record(...)`, and the function will determine whether or not to count events. It occurs to me that we could also use the module to generate the event counter static without having to do a proc-macro-y concatenation of idents.
Since the previous commit changes the `ringbuf!` and `counted_ringbuf!` macros to generate a module for each ringbuf, we can stick the counters static in there. This obviates the need to use proc-macros for incrementing the counters just so we can concatenate the ringbuf's name with `_COUNTS` to determine the name of the counter static. Now, we only use the proc macro for deriving `Count`. This allows us to make the derive macro optional.
Currently, the `cargo test --workspace` invocation on CI tries to build example code for the host target, which is nice, as ensuring that the macro example builds at all is a useful test of the macros. However, because `ringbuf` depends on crates that currently only build for ARM targets, building the example for x86 breaks CI. To fix this, I've changed the `build_util::expose_m_profile()` function to return a `Result`, which is ignored when building lib crates like `armv6m-atomic-hack` and `static-cell` on non-ARM targets, and `unwrap`ped when building actual Hubris tasks. This way, it's now possible to build the lib crates for the host target, so the example compiles, but trying to compile Hubris for x86 will still fail (correctly).
I think this is my nicest design thus far, as it avoids a lot of the problems with the submodule based approach, and also only requires the proc macro if you want to derive counters.
This should hopefully help shave off a few bytes of `.rodata`?
5a4aa20
to
2379114
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need a full hubris archive version bump? (I can't remember under what circumstances we would do that vs just bumping humility version)
@cbiffle I've updated the docs, as you mentioned offline. Let me know what you think! |
This commit replaces the manually-implemented event counters in `gimlet_inspector` with the `counted_ringbuf!` added in #1621. Since the motivation for using counters here was primarily size, I made the actual ringbuffer quite small; it only retains the last 8 events. This allows some insight into the last few things that happened while tracking the total number of events using counters.
Issue #1616 describes the need to add event counters to the `control-plane-agent` task. In particular, it's desirable to have counters of each IPC event and MGS message received by the `control-plane-agent` task. This commit uses the ringbuf counters added in #1621 and the `#[count(children)]` attribute added in this branch to generate counters from `control-plane-agent`'s MGS message events.
Issue #1616 describes the need to add event counters to the `control-plane-agent` task. In particular, it's desirable to have counters of each IPC event and MGS message received by the `control-plane-agent` task. This commit uses the ringbuf counters added in #1621 and the `#[count(children)]` attribute added in this branch to generate counters from `control-plane-agent`'s MGS message events.
Issue #1616 describes the need to add event counters to the `control-plane-agent` task. In particular, it's desirable to have counters of each IPC event and MGS message received by the `control-plane-agent` task. This commit uses the ringbuf counters added in #1621 and the `#[count(children)]` attribute added in this branch to generate counters from `control-plane-agent`'s MGS message events.
Issue #1616 describes the need to add event counters to the `control-plane-agent` task. In particular, it's desirable to have counters of each IPC event and MGS message received by the `control-plane-agent` task. This commit uses the ringbuf counters added in #1621 and the `#[count(children)]` attribute added in this branch to generate counters from `control-plane-agent`'s MGS message events.
Issue #1616 describes the need to add event counters to the `control-plane-agent` task. In particular, it's desirable to have counters of each IPC event and MGS message received by the `control-plane-agent` task. This commit uses the ringbuf counters added in #1621 and the `#[count(children)]` attribute added in this branch to generate counters from `control-plane-agent`'s MGS message events.
Hubris PR oxidecomputer/hubris#1621 added the ability for ring buffers whose entry type is an `enum` to generate a set of counters that track the total number of times each entry enum variant has been added to the ring buffer. PR oxidecomputer/hubris#1630 generalizes this to also allow counters to be generated for any `enum` without requiring a corresponding ring buffer. These counters are designed to be consumed by Humility. This PR updates Humility to understand these counters. In particular, it: - Changes the `humility ringbuf` command to understand `CountedRingbuf`s as well as un-counted `Ringbuf`s - When displaying a `CountedRingbuf`, the `humility ringbuf` command will now display the total counts for that ringbuf's entry variants (unless explicitly asked not to) - Adds a new `humility counters` subcommand for displaying all counters, whether or not they are attached to a ringbuf. The `humility counters` command behaves similarly to `humility ringbuf`, and allows filtering which counters are displayed based on the name of the variable and task, and can also be used to list all counters in the attached Hubris instance or dump. Both commands support a CLI flag to display counters with zero values. ## Examples <details open> <summary>Listing all counters in a dump:</summary> ```console $ humility -d hubris.c12.bufcounts5.core counters --list humility: attached to dump control_plane_agent: ADDR SIZE VARIABLE 0x2402b938 664 task_control_plane_agent::CRITICAL 0x24029000 1272 task_control_plane_agent::__RINGBUF gimlet_inspector: ADDR SIZE VARIABLE 0x2404ce40 12 task_gimlet_inspector::__COUNTERS gimlet_seq: ADDR SIZE VARIABLE 0x2403f424 48 drv_gimlet_seq_server::IPC_REQUESTS 0x2403e640 3448 drv_gimlet_seq_server::__RINGBUF thermal: ADDR SIZE VARIABLE 0x240039e0 912 task_thermal::__RINGBUF ``` </details> <details> <summary>Displaying all counters in a dump:</summary> ```console $ humility -d hubris.c12.bufcounts5.core counters humility: attached to dump control_plane_agent | +---> task_control_plane_agent::CRITICAL: | <no counts recorded> +---> task_control_plane_agent::__RINGBUF: <no counts recorded> gimlet_inspector | +---> task_gimlet_inspector::__COUNTERS: <no counts recorded> gimlet_seq | +---> drv_gimlet_seq_server::IPC_REQUESTS: | 411 GetState | 1 SetState(A0) +---> drv_gimlet_seq_server::__RINGBUF: 2068 Status 446 ClockConfigWrite 295 A0Status 102 A1Status 15 A0Power 2 V3P3SysA0VOut 1 SMStatus 1 SetState 1 RailsOff 1 Ident 1 A2Status 1 A2 1 SpdDimmsFound 1 Ice40PowerGoodV1P2 1 IdentValid 1 CPUPresent 1 Coretype 1 Programmed 1 Ice40Rails 1 InterruptFlags 1 RailsOn 1 UartEnabled 1 Ice40PowerGoodV3P3 1 UpdateState(A0) 1 A0 1 Reprogram 1 ClockConfigSuccess 1 ChecksumValid 1 PGStatus 1 Programming 1 PowerControl thermal | +---> task_thermal::__RINGBUF: 138 ControlPwm 6 FanAdded 3 AutoState(Boot) 2 AutoState(Running) 2 PowerModeChanged 1 Start 1 ThermalMode(Auto) ``` </details> <details open> <summary>Filtering by task:</summary> ```console $ humility -d hubris.c12.bufcounts5.core counters thermal humility: attached to dump thermal | +---> task_thermal::__RINGBUF: 138 ControlPwm 6 FanAdded 3 AutoState(Boot) 2 AutoState(Running) 2 PowerModeChanged 1 Start 1 ThermalMode(Auto) ``` </details> <details> <summary open>Sorting counters by their value:</summary> ```console $ humility -d hubris.c12.bufcounts5.core counters thermal --sort value humility: attached to dump thermal | +---> task_thermal::__RINGBUF: 138 ControlPwm 6 FanAdded 3 AutoState(Boot) 2 AutoState(Running) 2 PowerModeChanged 1 Start 1 ThermalMode(Auto) ``` </details> <details open> <summary>Sorting counters alphabetically:</summary> ```console $ humility -d hubris.c12.bufcounts5.core counters thermal --sort alpha humility: attached to dump thermal | +---> task_thermal::__RINGBUF: 3 AutoState(Boot) 2 AutoState(Running) 138 ControlPwm 6 FanAdded 2 PowerModeChanged 1 Start 1 ThermalMode(Auto) ``` </details> <details> <summary>Displaying all counters, including those with zero values:</summary> ```console $ humility -d hubris.c12.bufcounts5.core counters gimlet_seq --full humility: attached to dump gimlet_seq | +---> drv_gimlet_seq_server::IPC_REQUESTS: | 411 GetState | 1 SetState(_) | 0 | SetState(A2) | 0 | SetState(A2PlusFans) | 0 | SetState(A1) | 1 +---> SetState(A0) | 0 | SetState(A0PlusHP) | 0 | SetState(A0Thermtrip) | 0 | SetState(A0Reset) | 0 FansOn | 0 FansOff | 0 SendHardwareNmi | 0 ReadFpgaRegs +---> drv_gimlet_seq_server::__RINGBUF: 1 Ice40Rails 1 IdentValid 1 ChecksumValid 1 Reprogram 1 Programmed 1 Programming 1 Ice40PowerGoodV1P2 1 Ice40PowerGoodV3P3 1 RailsOff 1 Ident 1 A2Status 1 A2 0 A0FailureDetails 0 A0Failed(_) 0 | A0Failed(IllegalTransition) 0 | A0Failed(MuxToHostCPUFailed) 0 | A0Failed(MuxToSPFailed) 0 | A0Failed(ReadRegsFailed) 0 | A0Failed(CPUNotPresent) 0 | A0Failed(UnrecognizedCPU) 0 | A0Failed(A1Timeout) 0 | A0Failed(A0TimeoutGroupC) 0 | A0Failed(A0Timeout) 0 | A0Failed(I2cFault) 0 | A0Failed(ServerRestarted) 102 A1Status 1 CPUPresent 1 Coretype 295 A0Status 15 A0Power 0 NICPowerEnableLow 1 RailsOn 1 UartEnabled 1 A0 1 SetState 1 UpdateState(_) 0 | UpdateState(A2) 0 | UpdateState(A2PlusFans) 0 | UpdateState(A1) 1 +---> UpdateState(A0) 0 | UpdateState(A0PlusHP) 0 | UpdateState(A0Thermtrip) 0 | UpdateState(A0Reset) 446 ClockConfigWrite 1 ClockConfigSuccess 2068 Status 1 PGStatus 1 SMStatus 0 ResetCounts 1 PowerControl 1 InterruptFlags 2 V3P3SysA0VOut 0 SpdBankAbsent 0 SpdAbsent 1 SpdDimmsFound 0 I2cFault(_) 0 | I2cFault(BadResponse) 0 | I2cFault(BadArg) 0 | I2cFault(NoDevice) 0 | I2cFault(BadController) 0 | I2cFault(ReservedAddress) 0 | I2cFault(BadPort) 0 | I2cFault(NoRegister) 0 | I2cFault(BadMux) 0 | I2cFault(BadSegment) 0 | I2cFault(MuxNotFound) 0 | I2cFault(SegmentNotFound) 0 | I2cFault(SegmentDisconnected) 0 | I2cFault(MuxDisconnected) 0 | I2cFault(MuxMissing) 0 | I2cFault(BadMuxRegister) 0 | I2cFault(BusReset) 0 | I2cFault(BusResetMux) 0 | I2cFault(BusLocked) 0 | I2cFault(BusLockedMux) 0 | I2cFault(ControllerBusy) 0 | I2cFault(BusError) 0 | I2cFault(BadDeviceState) 0 | I2cFault(OperationNotSupported) 0 | I2cFault(IllegalLeaseCount) 0 | I2cFault(TooMuchData) 0 StartFailed(_) 0 | StartFailed(IllegalTransition) 0 | StartFailed(MuxToHostCPUFailed) 0 | StartFailed(MuxToSPFailed) 0 | StartFailed(ReadRegsFailed) 0 | StartFailed(CPUNotPresent) 0 | StartFailed(UnrecognizedCPU) 0 | StartFailed(A1Timeout) 0 | StartFailed(A0TimeoutGroupC) 0 | StartFailed(A0Timeout) 0 | StartFailed(I2cFault) 0 | StartFailed(ServerRestarted) ``` </details> <details open> <summary>Displaying variant total counts when running `humility ringbuf`:</summary> ```console $ humility -d hubris.c12.bufcounts5.core ringbuf thermal humility: attached to dump humility: ring buffer drv_i2c_devices::max31790::__RINGBUF in thermal: humility: ring buffer task_thermal::__RINGBUF in thermal: TOTAL VARIANT 138 ControlPwm 6 FanAdded 3 AutoState(Boot) 2 AutoState(Running) 2 PowerModeChanged 1 Start 1 ThermalMode(Auto) NDX LINE GEN COUNT PAYLOAD 0 346 1 1 Start 1 133 1 1 ThermalMode(Auto) 2 625 1 1 AutoState(Boot) 3 634 1 1 FanAdded(Fan(0x0)) 4 634 1 1 FanAdded(Fan(0x1)) 5 634 1 1 FanAdded(Fan(0x2)) 6 634 1 1 FanAdded(Fan(0x3)) 7 634 1 1 FanAdded(Fan(0x4)) 8 634 1 1 FanAdded(Fan(0x5)) 9 778 1 1 PowerModeChanged(PowerBitmask(InternalBitFlags(0x1))) 10 625 1 1 AutoState(Boot) 11 876 1 1 AutoState(Running) 12 990 1 117 ControlPwm(0x0) 13 778 1 1 PowerModeChanged(PowerBitmask(InternalBitFlags(0x2))) 14 625 1 1 AutoState(Boot) 15 876 1 1 AutoState(Running) 16 990 1 21 ControlPwm(0x0) ``` </details> <details> <summary>Displaying totals with zero values when running `humility ringbuf`:</summary> ```console $ humility -d hubris.c12.bufcounts5.core ringbuf thermal --full-totals humility: attached to dump humility: ring buffer drv_i2c_devices::max31790::__RINGBUF in thermal: humility: ring buffer task_thermal::__RINGBUF in thermal: TOTAL VARIANT 1 Start 1 ThermalMode(_) 0 | ThermalMode(Off) 0 | ThermalMode(Manual) 1 +---> ThermalMode(Auto) 5 AutoState(_) 3 +---> AutoState(Boot) 2 +---> AutoState(Running) 0 | AutoState(Overheated) 0 | AutoState(Uncontrollable) 0 FanReadFailed 0 MiscReadFailed 0 SensorReadFailed 0 PostFailed 138 ControlPwm 2 PowerModeChanged 0 PowerDownFailed 0 ControlError(_) 0 | ControlError(InvalidFan) 0 | ControlError(InvalidPWM) 0 | ControlError(DeviceError) 0 | ControlError(NotInManualMode) 0 | ControlError(NotInAutoMode) 0 | ControlError(AlreadyInAutoMode) 0 | ControlError(InvalidWatchdogTime) 0 | ControlError(InvalidParameter) 0 | ControlError(InvalidIndex) 0 | ControlError(ServerDeath) 0 FanPresenceUpdateFailed 6 FanAdded 0 FanRemoved 0 PowerDownAt 0 AddedDynamicInput 0 RemovedDynamicInput NDX LINE GEN COUNT PAYLOAD 0 346 1 1 Start 1 133 1 1 ThermalMode(Auto) 2 625 1 1 AutoState(Boot) 3 634 1 1 FanAdded(Fan(0x0)) 4 634 1 1 FanAdded(Fan(0x1)) 5 634 1 1 FanAdded(Fan(0x2)) 6 634 1 1 FanAdded(Fan(0x3)) 7 634 1 1 FanAdded(Fan(0x4)) 8 634 1 1 FanAdded(Fan(0x5)) 9 778 1 1 PowerModeChanged(PowerBitmask(InternalBitFlags(0x1))) 10 625 1 1 AutoState(Boot) 11 876 1 1 AutoState(Running) 12 990 1 117 ControlPwm(0x0) 13 778 1 1 PowerModeChanged(PowerBitmask(InternalBitFlags(0x2))) 14 625 1 1 AutoState(Boot) 15 876 1 1 AutoState(Running) 16 990 1 21 ControlPwm(0x0) ``` </details> <details> <summary>Opting out of totals when displaying a counted ringbuf:</summary> ```console $ humility -d hubris.c12.bufcounts5.core ringbuf thermal --no-totals humility: attached to dump humility: ring buffer drv_i2c_devices::max31790::__RINGBUF in thermal: humility: ring buffer task_thermal::__RINGBUF in thermal: NDX LINE GEN COUNT PAYLOAD 0 346 1 1 Start 1 133 1 1 ThermalMode(Auto) 2 625 1 1 AutoState(Boot) 3 634 1 1 FanAdded(Fan(0x0)) 4 634 1 1 FanAdded(Fan(0x1)) 5 634 1 1 FanAdded(Fan(0x2)) 6 634 1 1 FanAdded(Fan(0x3)) 7 634 1 1 FanAdded(Fan(0x4)) 8 634 1 1 FanAdded(Fan(0x5)) 9 778 1 1 PowerModeChanged(PowerBitmask(InternalBitFlags(0x1))) 10 625 1 1 AutoState(Boot) 11 876 1 1 AutoState(Running) 12 990 1 117 ControlPwm(0x0) 13 778 1 1 PowerModeChanged(PowerBitmask(InternalBitFlags(0x2))) 14 625 1 1 AutoState(Boot) 15 876 1 1 AutoState(Running) 16 990 1 21 ControlPwm(0x0) ``` </details>
Ringbufs provide a fixed-size buffer for storing diagnostic events,
which may include data. This is very useful for debugging. However,
because the ring buffer has a fixed size, only the last
N
events arevisible at any given time. If a ringbuf has a capacity of, say, 16
events, and a given event has occurred 10,000 times over the lifespan of
the task, there's currently no way to know that the event has happened
more than 16 times. Furthermore, there's no way to use the ringbuf to
determine whether a given event has ever occurred during the task's
lifetime: if a given variant is not currently present in the ringbuf,
it may still have occurred previously and been "drowned out" by
subsequent events. Finally, some targets disable the ringbuf entirely,
for space reasons, and on those targets, it may still be desirable to be
able to get some less granular diagnostic data out of the ringbuf events
a task records.
Therefore, this commit extends the
ringbuf
crate to support generatingcounts of the number of times a particular event variant has been
recorded, in addition to a fixed-size buffer of events. These counters
provide less detailed data than inspecting the ringbuf contents, because
they don't record the values of any fields on that event. However,
they allow recording very large numbers of events in a fixed amount of
space, and provide historical data for events that may have occurred in
the past and then "fallen off" the end of the ringbuf as new events were
recorded. By inspecting these counters, we can determine if a given
variant has ever been recorded, even if it isn't currently in the
buffer, and we can see the total number of times it has been recorded
over the task's entire lifetime.
Event counters are implemented using a new
CountedRingbuf
type, whichmarries a ring buffer with a type which can count occurances of entry
variants.
CountedRingbuf
s may be declared using thecounted_ringbuf!
macro, and entries can be recorded using the existing
ringbuf_entry!
macro.
CountedRingbuf
requires that the entry type implement a newCount
trait, which defines the counter type, an initializer for creating new
instances of the counter type, and a method to increment the counter
type with a given instance of the entry type. An implementation of
ringbuf::Count
can be generated for an entry type using the#[derive(ringbuf::Count)]
attribute, which generates astruct
withan
AtomicU32
field for each of the derivingenum
's variants. Thisstruct
can then be loaded by Humility to provide a view of the eventcounters.
Because recording event counters requires only
4 * <N_VARIANTS>
bytes,counters are currently recorded even when the
ringbuf
crate has thedisabled
feature set. This way, we can still record some diagnosticdata on targets that don't have the space to store the ringbuf itself.
If it becomes necessary to also disable counters, we could add a
separate
disable-counters
feature as well, so that a target can pickand choose between recording ringbuf entries, counts, or both.
As a proof of concept, I've also updated
gimlet-seq-server
to use thenew event counters for its
Trace
ringbuf. Subsequent commits will rollit out elsewhere.
Future work will include:
humility ringbuf
to also dump ringbuf event counts (seeadd
humility counters
, show counts inhumility ringbuf
humility#449).Depends on #1624