Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add #[count(children)] + #[count(skip)] attrs, and add counters to c-p-a and gimlet-seq #1630

Merged
merged 13 commits into from
Feb 28, 2024

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Feb 23, 2024

As described in #1616, it would be nice to have event counters in tasks
like control-plane-agent and gimlet-seq recording things like the
total count of each MGS message type (in control-plane-agent) or the
total number of times each IPC request was received. This branch does
that, using my changes from #1621 to generate event counters based on
ringbuf entries. In order to do so, I've made the following changes:

  • Add new helper attributes for the #[derive(ringbuf::Count)]
    attribute to skip counting a particular variant, and to generate a
    nested set of counters for a variant that contains another type
    implementing Count (d118d12)
  • Generate counters for control-plane-agent's Log ringbuf, including
    a counter for each MGS message kind
    (be4f6f1)
  • Count children for some entries in gimlet-seq's Trace ringbuf
    (713f7c4)
  • Add a Log::IpcRequest variant to control-plane-agent's ringbuf
    entry, and generate counters for each IPC request kind
    (602a44e)
  • Similarly, add a Trace::IpcRequest variant to gimlet-seq's ringbuf
    entry type, and generate counters for each IPC request kind
    (ff321b9)

I've structured this branch as a sequence of independent commits, which
should be able to be reviewed in order. Hopefully, these commits form a
narrative showing the changes to the #[derive(ringbuf::Count)] macro,
along with the motivating examples for why we would want to add those
features to the derive macro.

As an example of the type of data we can now get from counters, check
this out (using my changes from oxidecomputer/humility#449):

$ humility -d hubris.c12.bufcounts4.core ringbuf gimlet_seq -t --full-totals
humility: attached to dump
humility: ring buffer drv_gimlet_seq_server::__RINGBUF in gimlet_seq:
   TOTAL VARIANT
       1 Ice40Rails
       1 IdentValid
       1 ChecksumValid
       1 Reprogram
       1 Programmed
       1 Programming
       1 Ice40PowerGoodV1P2
       1 Ice40PowerGoodV3P3
       1 RailsOff
       1 Ident
       1 A2Status
       1 A2
       0 A0FailureDetails
       0 A0Failed(_)
       0 |     A0Failed(IllegalTransition)
       0 |     A0Failed(MuxToHostCPUFailed)
       0 |     A0Failed(MuxToSPFailed)
       0 |     A0Failed(ReadRegsFailed)
       0 |     A0Failed(CPUNotPresent)
       0 |     A0Failed(UnrecognizedCPU)
       0 |     A0Failed(A1Timeout)
       0 |     A0Failed(A0TimeoutGroupC)
       0 |     A0Failed(A0Timeout)
       0 |     A0Failed(I2cFault)
       0 |     A0Failed(ServerRestarted)
     102 A1Status
       1 CPUPresent
       1 Coretype
     292 A0Status
      15 A0Power
       0 NICPowerEnableLow
       1 RailsOn
       1 UartEnabled
       1 A0
       1 SetState
       1 UpdateState(_)
       0 |     UpdateState(A2)
       0 |     UpdateState(A2PlusFans)
       0 |     UpdateState(A1)
       1 +---> UpdateState(A0)
       0 |     UpdateState(A0PlusHP)
       0 |     UpdateState(A0Thermtrip)
       0 |     UpdateState(A0Reset)
     446 ClockConfigWrite
       1 ClockConfigSuccess
    4722 Status
       1 PGStatus
       1 SMStatus
       0 ResetCounts
       1 PowerControl
       1 InterruptFlags
       2 V3P3SysA0VOut
       0 SpdBankAbsent
       0 SpdAbsent
       1 SpdDimmsFound
       0 I2cFault
       0 StartFailed(_)
       0 |     StartFailed(IllegalTransition)
       0 |     StartFailed(MuxToHostCPUFailed)
       0 |     StartFailed(MuxToSPFailed)
       0 |     StartFailed(ReadRegsFailed)
       0 |     StartFailed(CPUNotPresent)
       0 |     StartFailed(UnrecognizedCPU)
       0 |     StartFailed(A1Timeout)
       0 |     StartFailed(A0TimeoutGroupC)
       0 |     StartFailed(A0Timeout)
       0 |     StartFailed(I2cFault)
       0 |     StartFailed(ServerRestarted)
     231 IpcRequest(_)
     230 +---> IpcRequest(GetState)
       1 +---> IpcRequest(SetState(_))
       0 |         SetState(A2)
       0 |         SetState(A2PlusFans)
       0 |         SetState(A1)
       1 +-------> SetState(A0)
       0 |         SetState(A0PlusHP)
       0 |         SetState(A0Thermtrip)
       0 |         SetState(A0Reset)
       0 |     IpcRequest(FansOn)
       0 |     IpcRequest(FansOff)
       0 |     IpcRequest(SendHardwareNmi)
       0 |     IpcRequest(ReadFpgaRegs)

Closes #1629
Closes #1616

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 506 files.

Valid Invalid Ignored Fixed
505 1 0 0
Click to see the invalid file list
  • lib/ringbuf/examples/count_children.rs

lib/ringbuf/examples/count_children.rs Show resolved Hide resolved
hawkw added a commit that referenced this pull request Feb 26, 2024
Issue #1616 describes the need to add event counters to the
`control-plane-agent` task. In particular, it's desirable to have
counters of each IPC event and MGS message received by the
`control-plane-agent` task. This commit uses the ringbuf counters added
in #1621 and the `#[count(children)]` attribute added in this branch to
generate counters from `control-plane-agent`'s MGS message events.
hawkw added a commit that referenced this pull request Feb 26, 2024
Previously, the `#[count(children)]` attribute for
`#[derive(ringbuf::Count)]` only worked on single-variant enums. This is
unfortunate, since most of the enums we would like to count have
multiple variants. This commit changes the derive macro to annotate
_fields_ rather than _variants_ with the `#[count(children)]` attribute,
so that multi-field variants can still count their children based on one
field.
@hawkw hawkw force-pushed the eliza/ringbuf-count-children branch from e8bfa7d to fee69d8 Compare February 26, 2024 20:48
hawkw added a commit that referenced this pull request Feb 26, 2024
This commit updates the `gimlet-seq` task to use `#[count(children)]` to
count variants of some of its ringbuf events.
hawkw added a commit that referenced this pull request Feb 26, 2024
hawkw added a commit that referenced this pull request Feb 26, 2024
As described in #1616, it would be nice to have event counters for all
IPC messages received by the `gimlet-seq` and `control-plane-agent`
tasks. This branch adds a new `IpcRequest` variant to the `Trace` enum
in `gimlet-seq`, and uses it to generate counters of all IPC requests.
IPCs are now also included in the trace ringbuffer.
hawkw added a commit that referenced this pull request Feb 26, 2024
Currently, the `#[derive(ringbuf::Count)]` attribute is pretty simple:
it generates a single counter for every variant of the annotated `enum`
type. For some of the ringbuf events we would like to generate counters
for, this is sufficient. However, there are a few things that would be
nice to have that we can't currently do with the derive attribute:

- Most ringbuf entry enum types have an "Empty"/"None" variant that's
  used for initializing the ringbuffer, but are never actually recorded
  at runtime. We could save a few bytes by not generating a counter for
  these variants.
- Many ringbuf entry enum variants contain a nested enum variant. such
  as a variant representing an error which contains an error kind enum,
  or, the motivating example, `Log::MgsMessage(MgsMessage)` in
  control-plane-agent. In this case, we would like to be able to
  generate a counter of each MGS message variant, which is not currently
  possible with the current `#[derive(ringbuf::Count)]` attribute.

This commit adds new helper attributes to the derive macro:
`#[count(skip)]` and `#[count(children)]`. The `#[count(skip)]`
attribute can be placed on an enum variant to indicate that a counter
*shouldn't* be generated for it. The `#[count(children)]` attribute can
be placed on an enum variant that has a single field, to indicate that a
_nested_ set of counters should be generated for that variant's inner
field. When the `#[count(children)]` attribute is used, the inner field
must also implement the `Count` trait. If it does, the field on the
counter type for that variant will be the child type's `Count::Counters`
type, rather than an `AtomicU32`, and we will increment the nested
counter. My Humility PR #449 adds nice support for interpreting and
displaying these nested counters.
hawkw added a commit that referenced this pull request Feb 26, 2024
Issue #1616 describes the need to add event counters to the
`control-plane-agent` task. In particular, it's desirable to have
counters of each IPC event and MGS message received by the
`control-plane-agent` task. This commit uses the ringbuf counters added
in #1621 and the `#[count(children)]` attribute added in this branch to
generate counters from `control-plane-agent`'s MGS message events.
hawkw added a commit that referenced this pull request Feb 26, 2024
Previously, the `#[count(children)]` attribute for
`#[derive(ringbuf::Count)]` only worked on single-variant enums. This is
unfortunate, since most of the enums we would like to count have
multiple variants. This commit changes the derive macro to annotate
_fields_ rather than _variants_ with the `#[count(children)]` attribute,
so that multi-field variants can still count their children based on one
field.
@hawkw hawkw force-pushed the eliza/ringbuf-count-children branch from fee69d8 to ff321b9 Compare February 26, 2024 20:56
hawkw added a commit that referenced this pull request Feb 26, 2024
This commit updates the `gimlet-seq` task to use `#[count(children)]` to
count variants of some of its ringbuf events.
hawkw added a commit that referenced this pull request Feb 26, 2024
This commit cleans up some stuff in the `#[derive(Count)]` proc-macro
internals, hopefully making it a little easier to parse.
hawkw added a commit that referenced this pull request Feb 26, 2024
hawkw added a commit that referenced this pull request Feb 26, 2024
As described in #1616, it would be nice to have event counters for all
IPC messages received by the `gimlet-seq` and `control-plane-agent`
tasks. This branch adds a new `IpcRequest` variant to the `Trace` enum
in `gimlet-seq`, and uses it to generate counters of all IPC requests.
IPCs are now also included in the trace ringbuffer.
@hawkw hawkw changed the base branch from eliza/ringbuf-counts to master February 26, 2024 20:57
@hawkw hawkw changed the title [WIP] more ringbuf counter features add #[count(children)] + #[count(skip)] attrs, and add counters to c-p-a and gimlet-seq Feb 26, 2024
@hawkw hawkw marked this pull request as ready for review February 26, 2024 21:35
@hawkw
Copy link
Member Author

hawkw commented Feb 26, 2024

Hmm, it looks like adding IPC event entries to the gimlet-seq ringbuffer makes it way less useful, e.g.:

 NDX LINE      GEN    COUNT PAYLOAD
  64  956        2        1 IpcRequest(GetState)
  65  497        2        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  66  956        2        1 IpcRequest(GetState)
  67  497        2       16 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  68  956        2        1 IpcRequest(GetState)
  69  497        2       83 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  70  956        2        1 IpcRequest(GetState)
  71  497        2        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  72  956        2        1 IpcRequest(GetState)
  73  497        2       25 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  74  956        2        1 IpcRequest(GetState)
  75  497        2       74 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  76  956        2        1 IpcRequest(GetState)
  77  497        2        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  78  956        2        1 IpcRequest(GetState)
  79  497        2       35 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  80  956        2        1 IpcRequest(GetState)
  81  497        2       64 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  82  956        2        1 IpcRequest(GetState)
  83  497        2        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  84  956        2        1 IpcRequest(GetState)
  85  497        2       44 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  86  956        2        1 IpcRequest(GetState)
  87  497        2       55 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  88  956        2        1 IpcRequest(GetState)
  89  497        2        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  90  956        2        1 IpcRequest(GetState)
  91  497        2       53 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  92  956        2        1 IpcRequest(GetState)
  93  497        2       46 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  94  956        2        1 IpcRequest(GetState)
  95  497        2        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  96  956        2        1 IpcRequest(GetState)
  97  497        2       62 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  98  956        2        1 IpcRequest(GetState)
  99  497        2       37 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 100  956        2        1 IpcRequest(GetState)
 101  497        2        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 102  956        2        1 IpcRequest(GetState)
 103  497        2       71 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 104  956        2        1 IpcRequest(GetState)
 105  497        2       28 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 106  956        2        1 IpcRequest(GetState)
 107  497        2        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 108  956        2        1 IpcRequest(GetState)
 109  497        2       81 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 110  956        2        1 IpcRequest(GetState)
 111  497        2       18 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 112  956        2        1 IpcRequest(GetState)
 113  497        2        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 114  956        2        1 IpcRequest(GetState)
 115  497        2       90 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 116  956        2        1 IpcRequest(GetState)
 117  497        2        9 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 118  956        2        1 IpcRequest(GetState)
 119  497        2        3 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 120  956        2        1 IpcRequest(GetState)
 121  497        2       97 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 122  956        2        1 IpcRequest(GetState)
 123  497        2        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 124  956        2        1 IpcRequest(GetState)
 125  497        2       19 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
 126  956        2        1 IpcRequest(GetState)
 127  497        2       80 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
   0  956        3        1 IpcRequest(GetState)
   1  497        3        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
   2  956        3        1 IpcRequest(GetState)
   3  497        3       29 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
   4  956        3        1 IpcRequest(GetState)
   5  497        3       70 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
   6  956        3        1 IpcRequest(GetState)
   7  497        3        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
   8  956        3        1 IpcRequest(GetState)
   9  497        3       38 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  10  956        3        1 IpcRequest(GetState)
  11  497        3       61 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  12  956        3        1 IpcRequest(GetState)
  13  497        3        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  14  956        3        1 IpcRequest(GetState)
  15  497        3       47 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  16  956        3        1 IpcRequest(GetState)
  17  497        3       52 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  18  956        3        1 IpcRequest(GetState)
  19  497        3        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  20  956        3        1 IpcRequest(GetState)
  21  497        3       56 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  22  956        3        1 IpcRequest(GetState)
  23  497        3       43 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  24  956        3        1 IpcRequest(GetState)
  25  497        3        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  26  956        3        1 IpcRequest(GetState)
  27  497        3       65 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  28  956        3        1 IpcRequest(GetState)
  29  497        3       34 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  30  956        3        1 IpcRequest(GetState)
  31  497        3        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  32  956        3        1 IpcRequest(GetState)
  33  497        3       75 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  34  956        3        1 IpcRequest(GetState)
  35  497        3       24 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  36  956        3        1 IpcRequest(GetState)
  37  497        3        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  38  956        3        1 IpcRequest(GetState)
  39  497        3       84 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  40  956        3        1 IpcRequest(GetState)
  41  497        3       15 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  42  956        3        1 IpcRequest(GetState)
  43  497        3        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  44  956        3        1 IpcRequest(GetState)
  45  497        3       93 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  46  956        3        1 IpcRequest(GetState)
  47  497        3        6 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  48  956        3        1 IpcRequest(GetState)
  49  497        3        3 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  50  956        3        1 IpcRequest(GetState)
  51  497        3       97 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  52  956        3        1 IpcRequest(GetState)
  53  497        3        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  54  956        3        1 IpcRequest(GetState)
  55  497        3        5 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  56  956        3        1 IpcRequest(GetState)
  57  497        3       94 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  58  956        3        1 IpcRequest(GetState)
  59  497        3        1 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  60  956        3        1 IpcRequest(GetState)
  61  497        3       32 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }
  62  956        3        1 IpcRequest(GetState)
  63  497        3       16 Status { ier: 0x0, ifr: 0x0, amd_status: 0x4, amd_a0: 0x4 }

If we didn't add the IpcRequest(GetState) entries, all the Status entries would be combined, and we would have a lot more interesting history still in the buffer. I'm going to move the IPC requests out of the main ringbuf, and put them in a smaller separate one, just for generating counters.

hawkw added a commit that referenced this pull request Feb 26, 2024
hawkw added a commit that referenced this pull request Feb 27, 2024
hawkw added a commit that referenced this pull request Feb 27, 2024
This commit moves the `Count` trait and `#[derive(Count)]` macro out of
`ringbuf` into a new `counters` crate, and adds a mechanism for
declaring a static set of counters _without_ an associated ring buffer.
This can be used in cases where we _just_ want to count stuff, like
error enums that don't have any additional data, or a total count of IPC
requests. I think @cbiffle wanted this for some stuff, so he'll be happy
to see this.

I've also changed the existing use of a separate ringbuf that's
fundamentally just for counters in `gimlet_seq` into an explicit
just-counters thingy, making it a bit smaller. I'll update Humility to
add a `counters` subcommand for reading counters regardless of whether
they're associated with a ringbuf, as well.
@hawkw hawkw force-pushed the eliza/ringbuf-count-children branch from 20481a1 to 12a882a Compare February 27, 2024 01:30
hawkw added a commit that referenced this pull request Feb 27, 2024
This commit moves the `Count` trait and `#[derive(Count)]` macro out of
`ringbuf` into a new `counters` crate, and adds a mechanism for
declaring a static set of counters _without_ an associated ring buffer.
This can be used in cases where we _just_ want to count stuff, like
error enums that don't have any additional data, or a total count of IPC
requests. I think @cbiffle wanted this for some stuff, so he'll be happy
to see this.

I've also changed the existing use of a separate ringbuf that's
fundamentally just for counters in `gimlet_seq` into an explicit
just-counters thingy, making it a bit smaller. I'll update Humility to
add a `counters` subcommand for reading counters regardless of whether
they're associated with a ringbuf, as well.
@hawkw hawkw force-pushed the eliza/ringbuf-count-children branch from 12a882a to dd276e0 Compare February 27, 2024 01:31
@hawkw hawkw enabled auto-merge (rebase) February 27, 2024 23:49
hawkw and others added 13 commits February 27, 2024 15:50
Currently, the `#[derive(ringbuf::Count)]` attribute is pretty simple:
it generates a single counter for every variant of the annotated `enum`
type. For some of the ringbuf events we would like to generate counters
for, this is sufficient. However, there are a few things that would be
nice to have that we can't currently do with the derive attribute:

- Most ringbuf entry enum types have an "Empty"/"None" variant that's
  used for initializing the ringbuffer, but are never actually recorded
  at runtime. We could save a few bytes by not generating a counter for
  these variants.
- Many ringbuf entry enum variants contain a nested enum variant. such
  as a variant representing an error which contains an error kind enum,
  or, the motivating example, `Log::MgsMessage(MgsMessage)` in
  control-plane-agent. In this case, we would like to be able to
  generate a counter of each MGS message variant, which is not currently
  possible with the current `#[derive(ringbuf::Count)]` attribute.

This commit adds new helper attributes to the derive macro:
`#[count(skip)]` and `#[count(children)]`. The `#[count(skip)]`
attribute can be placed on an enum variant to indicate that a counter
*shouldn't* be generated for it. The `#[count(children)]` attribute can
be placed on an enum variant that has a single field, to indicate that a
_nested_ set of counters should be generated for that variant's inner
field. When the `#[count(children)]` attribute is used, the inner field
must also implement the `Count` trait. If it does, the field on the
counter type for that variant will be the child type's `Count::Counters`
type, rather than an `AtomicU32`, and we will increment the nested
counter. My Humility PR #449 adds nice support for interpreting and
displaying these nested counters.
Issue #1616 describes the need to add event counters to the
`control-plane-agent` task. In particular, it's desirable to have
counters of each IPC event and MGS message received by the
`control-plane-agent` task. This commit uses the ringbuf counters added
in #1621 and the `#[count(children)]` attribute added in this branch to
generate counters from `control-plane-agent`'s MGS message events.
Previously, the `#[count(children)]` attribute for
`#[derive(ringbuf::Count)]` only worked on single-variant enums. This is
unfortunate, since most of the enums we would like to count have
multiple variants. This commit changes the derive macro to annotate
_fields_ rather than _variants_ with the `#[count(children)]` attribute,
so that multi-field variants can still count their children based on one
field.
This commit updates the `gimlet-seq` task to use `#[count(children)]` to
count variants of some of its ringbuf events.
This commit cleans up some stuff in the `#[derive(Count)]` proc-macro
internals, hopefully making it a little easier to parse.
As described in #1616, it would be nice to have event counters for all
IPC messages received by the `gimlet-seq` and `control-plane-agent`
tasks. This branch adds a new `IpcRequest` variant to the `Trace` enum
in `gimlet-seq`, and uses it to generate counters of all IPC requests.
IPCs are now also included in the trace ringbuffer.
This commit moves the `Count` trait and `#[derive(Count)]` macro out of
`ringbuf` into a new `counters` crate, and adds a mechanism for
declaring a static set of counters _without_ an associated ring buffer.
This can be used in cases where we _just_ want to count stuff, like
error enums that don't have any additional data, or a total count of IPC
requests. I think @cbiffle wanted this for some stuff, so he'll be happy
to see this.

I've also changed the existing use of a separate ringbuf that's
fundamentally just for counters in `gimlet_seq` into an explicit
just-counters thingy, making it a bit smaller. I'll update Humility to
add a `counters` subcommand for reading counters regardless of whether
they're associated with a ringbuf, as well.
Co-authored-by: Matt Keeter <matt@oxide.computer>
I'm not sure whether this is *really* pulling its weight, but it seemed
a little nice to have, mainly for cases where the counters static has
the default name.
The `gimlet-inspector` task currently uses a simple hand-written event
counter implementation. I've rewritten it to use `counters`, so that its
counters will show up in `humility counters`.
This commit adds `#[count(children)]` to the `Trace::I2cFault` variant
in `gimlet-seq`.
@hawkw hawkw force-pushed the eliza/ringbuf-count-children branch from 6bb78e6 to 85beaaa Compare February 27, 2024 23:50
@hawkw hawkw merged commit 0c29f28 into master Feb 28, 2024
83 checks passed
hawkw added a commit that referenced this pull request Feb 28, 2024
Currently, the `#[derive(ringbuf::Count)]` attribute is pretty simple:
it generates a single counter for every variant of the annotated `enum`
type. For some of the ringbuf events we would like to generate counters
for, this is sufficient. However, there are a few things that would be
nice to have that we can't currently do with the derive attribute:

- Most ringbuf entry enum types have an "Empty"/"None" variant that's
  used for initializing the ringbuffer, but are never actually recorded
  at runtime. We could save a few bytes by not generating a counter for
  these variants.
- Many ringbuf entry enum variants contain a nested enum variant. such
  as a variant representing an error which contains an error kind enum,
  or, the motivating example, `Log::MgsMessage(MgsMessage)` in
  control-plane-agent. In this case, we would like to be able to
  generate a counter of each MGS message variant, which is not currently
  possible with the current `#[derive(ringbuf::Count)]` attribute.

This commit adds new helper attributes to the derive macro:
`#[count(skip)]` and `#[count(children)]`. The `#[count(skip)]`
attribute can be placed on an enum variant to indicate that a counter
*shouldn't* be generated for it. The `#[count(children)]` attribute can
be placed on an enum variant that has a single field, to indicate that a
_nested_ set of counters should be generated for that variant's inner
field. When the `#[count(children)]` attribute is used, the inner field
must also implement the `Count` trait. If it does, the field on the
counter type for that variant will be the child type's `Count::Counters`
type, rather than an `AtomicU32`, and we will increment the nested
counter. My Humility PR #449 adds nice support for interpreting and
displaying these nested counters.
hawkw added a commit that referenced this pull request Feb 28, 2024
Issue #1616 describes the need to add event counters to the
`control-plane-agent` task. In particular, it's desirable to have
counters of each IPC event and MGS message received by the
`control-plane-agent` task. This commit uses the ringbuf counters added
in #1621 and the `#[count(children)]` attribute added in this branch to
generate counters from `control-plane-agent`'s MGS message events.
hawkw added a commit that referenced this pull request Feb 28, 2024
Previously, the `#[count(children)]` attribute for
`#[derive(ringbuf::Count)]` only worked on single-variant enums. This is
unfortunate, since most of the enums we would like to count have
multiple variants. This commit changes the derive macro to annotate
_fields_ rather than _variants_ with the `#[count(children)]` attribute,
so that multi-field variants can still count their children based on one
field.
hawkw added a commit that referenced this pull request Feb 28, 2024
This commit updates the `gimlet-seq` task to use `#[count(children)]` to
count variants of some of its ringbuf events.
hawkw added a commit that referenced this pull request Feb 28, 2024
This commit cleans up some stuff in the `#[derive(Count)]` proc-macro
internals, hopefully making it a little easier to parse.
hawkw added a commit that referenced this pull request Feb 28, 2024
hawkw added a commit that referenced this pull request Feb 28, 2024
As described in #1616, it would be nice to have event counters for all
IPC messages received by the `gimlet-seq` and `control-plane-agent`
tasks. This branch adds a new `IpcRequest` variant to the `Trace` enum
in `gimlet-seq`, and uses it to generate counters of all IPC requests.
IPCs are now also included in the trace ringbuffer.
hawkw added a commit that referenced this pull request Feb 28, 2024
hawkw added a commit that referenced this pull request Feb 28, 2024
This commit moves the `Count` trait and `#[derive(Count)]` macro out of
`ringbuf` into a new `counters` crate, and adds a mechanism for
declaring a static set of counters _without_ an associated ring buffer.
This can be used in cases where we _just_ want to count stuff, like
error enums that don't have any additional data, or a total count of IPC
requests. I think @cbiffle wanted this for some stuff, so he'll be happy
to see this.

I've also changed the existing use of a separate ringbuf that's
fundamentally just for counters in `gimlet_seq` into an explicit
just-counters thingy, making it a bit smaller. I'll update Humility to
add a `counters` subcommand for reading counters regardless of whether
they're associated with a ringbuf, as well.
hawkw added a commit that referenced this pull request Feb 28, 2024
Co-authored-by: Matt Keeter <matt@oxide.computer>
hawkw added a commit that referenced this pull request Feb 28, 2024
I'm not sure whether this is *really* pulling its weight, but it seemed
a little nice to have, mainly for cases where the counters static has
the default name.
hawkw added a commit that referenced this pull request Feb 28, 2024
The `gimlet-inspector` task currently uses a simple hand-written event
counter implementation. I've rewritten it to use `counters`, so that its
counters will show up in `humility counters`.
hawkw added a commit to oxidecomputer/humility that referenced this pull request Feb 29, 2024
Hubris PR oxidecomputer/hubris#1621 added the ability for ring buffers
whose entry type is an `enum` to generate a set of counters that track
the total number of times each entry enum variant has been added to the
ring buffer. PR oxidecomputer/hubris#1630 generalizes this to also allow
counters to be generated for any `enum` without requiring a
corresponding ring buffer.

These counters are designed to be consumed by Humility. This PR updates
Humility to understand these counters. In particular, it:

- Changes the `humility ringbuf` command to understand `CountedRingbuf`s
  as well as un-counted `Ringbuf`s
- When displaying a `CountedRingbuf`, the `humility ringbuf` command
  will now display the total counts for that ringbuf's entry variants
  (unless explicitly asked not to)
- Adds a new `humility counters` subcommand for displaying all counters,
  whether or not they are attached to a ringbuf.

The `humility counters` command behaves similarly to `humility ringbuf`,
and allows filtering which counters are displayed based on the name of
the variable and task, and can also be used to list all counters in the
attached Hubris instance or dump. Both commands support a CLI flag to
display counters with zero values.

## Examples

<details open>
<summary>Listing all counters in a dump:</summary>

```console
$ humility -d hubris.c12.bufcounts5.core counters --list
humility: attached to dump
control_plane_agent:
    ADDR        SIZE VARIABLE
    0x2402b938   664 task_control_plane_agent::CRITICAL
    0x24029000  1272 task_control_plane_agent::__RINGBUF
gimlet_inspector:
    ADDR        SIZE VARIABLE
    0x2404ce40    12 task_gimlet_inspector::__COUNTERS
gimlet_seq:
    ADDR        SIZE VARIABLE
    0x2403f424    48 drv_gimlet_seq_server::IPC_REQUESTS
    0x2403e640  3448 drv_gimlet_seq_server::__RINGBUF
thermal:
    ADDR        SIZE VARIABLE
    0x240039e0   912 task_thermal::__RINGBUF
```

</details>
<details>
<summary>Displaying all counters in a dump:</summary>

```console
$ humility -d hubris.c12.bufcounts5.core counters
humility: attached to dump
control_plane_agent
 |
 +---> task_control_plane_agent::CRITICAL:
 |     <no counts recorded>
 +---> task_control_plane_agent::__RINGBUF:
       <no counts recorded>
gimlet_inspector
 |
 +---> task_gimlet_inspector::__COUNTERS:
       <no counts recorded>
gimlet_seq
 |
 +---> drv_gimlet_seq_server::IPC_REQUESTS:
 |       411 GetState
 |         1 SetState(A0)
 +---> drv_gimlet_seq_server::__RINGBUF:
        2068 Status
         446 ClockConfigWrite
         295 A0Status
         102 A1Status
          15 A0Power
           2 V3P3SysA0VOut
           1 SMStatus
           1 SetState
           1 RailsOff
           1 Ident
           1 A2Status
           1 A2
           1 SpdDimmsFound
           1 Ice40PowerGoodV1P2
           1 IdentValid
           1 CPUPresent
           1 Coretype
           1 Programmed
           1 Ice40Rails
           1 InterruptFlags
           1 RailsOn
           1 UartEnabled
           1 Ice40PowerGoodV3P3
           1 UpdateState(A0)
           1 A0
           1 Reprogram
           1 ClockConfigSuccess
           1 ChecksumValid
           1 PGStatus
           1 Programming
           1 PowerControl
thermal
 |
 +---> task_thermal::__RINGBUF:
         138 ControlPwm
           6 FanAdded
           3 AutoState(Boot)
           2 AutoState(Running)
           2 PowerModeChanged
           1 Start
           1 ThermalMode(Auto)
```

</details>
<details open>
<summary>Filtering by task:</summary>

```console
$ humility -d hubris.c12.bufcounts5.core counters thermal
humility: attached to dump
thermal
 |
 +---> task_thermal::__RINGBUF:
         138 ControlPwm
           6 FanAdded
           3 AutoState(Boot)
           2 AutoState(Running)
           2 PowerModeChanged
           1 Start
           1 ThermalMode(Auto)
```

</details>
<details>
<summary open>Sorting counters by their value:</summary>

```console
$ humility -d hubris.c12.bufcounts5.core counters thermal --sort value
humility: attached to dump
thermal
 |
 +---> task_thermal::__RINGBUF:
         138 ControlPwm
           6 FanAdded
           3 AutoState(Boot)
           2 AutoState(Running)
           2 PowerModeChanged
           1 Start
           1 ThermalMode(Auto)
```

</details>
<details open>
<summary>Sorting counters alphabetically:</summary>

```console
$ humility -d hubris.c12.bufcounts5.core counters thermal --sort alpha
humility: attached to dump
thermal
 |
 +---> task_thermal::__RINGBUF:
           3 AutoState(Boot)
           2 AutoState(Running)
         138 ControlPwm
           6 FanAdded
           2 PowerModeChanged
           1 Start
           1 ThermalMode(Auto)
```

</details>
<details>
<summary>Displaying all counters, including those with zero values:</summary>

```console
$ humility -d hubris.c12.bufcounts5.core counters gimlet_seq --full
humility: attached to dump
gimlet_seq
 |
 +---> drv_gimlet_seq_server::IPC_REQUESTS:
 |       411 GetState
 |         1 SetState(_)
 |         0 |     SetState(A2)
 |         0 |     SetState(A2PlusFans)
 |         0 |     SetState(A1)
 |         1 +---> SetState(A0)
 |         0 |     SetState(A0PlusHP)
 |         0 |     SetState(A0Thermtrip)
 |         0 |     SetState(A0Reset)
 |         0 FansOn
 |         0 FansOff
 |         0 SendHardwareNmi
 |         0 ReadFpgaRegs
 +---> drv_gimlet_seq_server::__RINGBUF:
           1 Ice40Rails
           1 IdentValid
           1 ChecksumValid
           1 Reprogram
           1 Programmed
           1 Programming
           1 Ice40PowerGoodV1P2
           1 Ice40PowerGoodV3P3
           1 RailsOff
           1 Ident
           1 A2Status
           1 A2
           0 A0FailureDetails
           0 A0Failed(_)
           0 |     A0Failed(IllegalTransition)
           0 |     A0Failed(MuxToHostCPUFailed)
           0 |     A0Failed(MuxToSPFailed)
           0 |     A0Failed(ReadRegsFailed)
           0 |     A0Failed(CPUNotPresent)
           0 |     A0Failed(UnrecognizedCPU)
           0 |     A0Failed(A1Timeout)
           0 |     A0Failed(A0TimeoutGroupC)
           0 |     A0Failed(A0Timeout)
           0 |     A0Failed(I2cFault)
           0 |     A0Failed(ServerRestarted)
         102 A1Status
           1 CPUPresent
           1 Coretype
         295 A0Status
          15 A0Power
           0 NICPowerEnableLow
           1 RailsOn
           1 UartEnabled
           1 A0
           1 SetState
           1 UpdateState(_)
           0 |     UpdateState(A2)
           0 |     UpdateState(A2PlusFans)
           0 |     UpdateState(A1)
           1 +---> UpdateState(A0)
           0 |     UpdateState(A0PlusHP)
           0 |     UpdateState(A0Thermtrip)
           0 |     UpdateState(A0Reset)
         446 ClockConfigWrite
           1 ClockConfigSuccess
        2068 Status
           1 PGStatus
           1 SMStatus
           0 ResetCounts
           1 PowerControl
           1 InterruptFlags
           2 V3P3SysA0VOut
           0 SpdBankAbsent
           0 SpdAbsent
           1 SpdDimmsFound
           0 I2cFault(_)
           0 |     I2cFault(BadResponse)
           0 |     I2cFault(BadArg)
           0 |     I2cFault(NoDevice)
           0 |     I2cFault(BadController)
           0 |     I2cFault(ReservedAddress)
           0 |     I2cFault(BadPort)
           0 |     I2cFault(NoRegister)
           0 |     I2cFault(BadMux)
           0 |     I2cFault(BadSegment)
           0 |     I2cFault(MuxNotFound)
           0 |     I2cFault(SegmentNotFound)
           0 |     I2cFault(SegmentDisconnected)
           0 |     I2cFault(MuxDisconnected)
           0 |     I2cFault(MuxMissing)
           0 |     I2cFault(BadMuxRegister)
           0 |     I2cFault(BusReset)
           0 |     I2cFault(BusResetMux)
           0 |     I2cFault(BusLocked)
           0 |     I2cFault(BusLockedMux)
           0 |     I2cFault(ControllerBusy)
           0 |     I2cFault(BusError)
           0 |     I2cFault(BadDeviceState)
           0 |     I2cFault(OperationNotSupported)
           0 |     I2cFault(IllegalLeaseCount)
           0 |     I2cFault(TooMuchData)
           0 StartFailed(_)
           0 |     StartFailed(IllegalTransition)
           0 |     StartFailed(MuxToHostCPUFailed)
           0 |     StartFailed(MuxToSPFailed)
           0 |     StartFailed(ReadRegsFailed)
           0 |     StartFailed(CPUNotPresent)
           0 |     StartFailed(UnrecognizedCPU)
           0 |     StartFailed(A1Timeout)
           0 |     StartFailed(A0TimeoutGroupC)
           0 |     StartFailed(A0Timeout)
           0 |     StartFailed(I2cFault)
           0 |     StartFailed(ServerRestarted)
```

</details>
<details open>
<summary>Displaying variant total counts when running `humility ringbuf`:</summary>

```console
$ humility -d hubris.c12.bufcounts5.core ringbuf thermal
humility: attached to dump
humility: ring buffer drv_i2c_devices::max31790::__RINGBUF in thermal:
humility: ring buffer task_thermal::__RINGBUF in thermal:
   TOTAL VARIANT
     138 ControlPwm
       6 FanAdded
       3 AutoState(Boot)
       2 AutoState(Running)
       2 PowerModeChanged
       1 Start
       1 ThermalMode(Auto)
 NDX LINE      GEN    COUNT PAYLOAD
   0  346        1        1 Start
   1  133        1        1 ThermalMode(Auto)
   2  625        1        1 AutoState(Boot)
   3  634        1        1 FanAdded(Fan(0x0))
   4  634        1        1 FanAdded(Fan(0x1))
   5  634        1        1 FanAdded(Fan(0x2))
   6  634        1        1 FanAdded(Fan(0x3))
   7  634        1        1 FanAdded(Fan(0x4))
   8  634        1        1 FanAdded(Fan(0x5))
   9  778        1        1 PowerModeChanged(PowerBitmask(InternalBitFlags(0x1)))
  10  625        1        1 AutoState(Boot)
  11  876        1        1 AutoState(Running)
  12  990        1      117 ControlPwm(0x0)
  13  778        1        1 PowerModeChanged(PowerBitmask(InternalBitFlags(0x2)))
  14  625        1        1 AutoState(Boot)
  15  876        1        1 AutoState(Running)
  16  990        1       21 ControlPwm(0x0)
```

</details>
<details>
<summary>Displaying totals with zero values when running `humility ringbuf`:</summary>

```console
$ humility -d hubris.c12.bufcounts5.core ringbuf thermal --full-totals
humility: attached to dump
humility: ring buffer drv_i2c_devices::max31790::__RINGBUF in thermal:
humility: ring buffer task_thermal::__RINGBUF in thermal:
   TOTAL VARIANT
       1 Start
       1 ThermalMode(_)
       0 |     ThermalMode(Off)
       0 |     ThermalMode(Manual)
       1 +---> ThermalMode(Auto)
       5 AutoState(_)
       3 +---> AutoState(Boot)
       2 +---> AutoState(Running)
       0 |     AutoState(Overheated)
       0 |     AutoState(Uncontrollable)
       0 FanReadFailed
       0 MiscReadFailed
       0 SensorReadFailed
       0 PostFailed
     138 ControlPwm
       2 PowerModeChanged
       0 PowerDownFailed
       0 ControlError(_)
       0 |     ControlError(InvalidFan)
       0 |     ControlError(InvalidPWM)
       0 |     ControlError(DeviceError)
       0 |     ControlError(NotInManualMode)
       0 |     ControlError(NotInAutoMode)
       0 |     ControlError(AlreadyInAutoMode)
       0 |     ControlError(InvalidWatchdogTime)
       0 |     ControlError(InvalidParameter)
       0 |     ControlError(InvalidIndex)
       0 |     ControlError(ServerDeath)
       0 FanPresenceUpdateFailed
       6 FanAdded
       0 FanRemoved
       0 PowerDownAt
       0 AddedDynamicInput
       0 RemovedDynamicInput
 NDX LINE      GEN    COUNT PAYLOAD
   0  346        1        1 Start
   1  133        1        1 ThermalMode(Auto)
   2  625        1        1 AutoState(Boot)
   3  634        1        1 FanAdded(Fan(0x0))
   4  634        1        1 FanAdded(Fan(0x1))
   5  634        1        1 FanAdded(Fan(0x2))
   6  634        1        1 FanAdded(Fan(0x3))
   7  634        1        1 FanAdded(Fan(0x4))
   8  634        1        1 FanAdded(Fan(0x5))
   9  778        1        1 PowerModeChanged(PowerBitmask(InternalBitFlags(0x1)))
  10  625        1        1 AutoState(Boot)
  11  876        1        1 AutoState(Running)
  12  990        1      117 ControlPwm(0x0)
  13  778        1        1 PowerModeChanged(PowerBitmask(InternalBitFlags(0x2)))
  14  625        1        1 AutoState(Boot)
  15  876        1        1 AutoState(Running)
  16  990        1       21 ControlPwm(0x0)
```

</details>
<details>
<summary>Opting out of totals when displaying a counted ringbuf:</summary>

```console
$ humility -d hubris.c12.bufcounts5.core ringbuf thermal --no-totals
humility: attached to dump
humility: ring buffer drv_i2c_devices::max31790::__RINGBUF in thermal:
humility: ring buffer task_thermal::__RINGBUF in thermal:
 NDX LINE      GEN    COUNT PAYLOAD
   0  346        1        1 Start
   1  133        1        1 ThermalMode(Auto)
   2  625        1        1 AutoState(Boot)
   3  634        1        1 FanAdded(Fan(0x0))
   4  634        1        1 FanAdded(Fan(0x1))
   5  634        1        1 FanAdded(Fan(0x2))
   6  634        1        1 FanAdded(Fan(0x3))
   7  634        1        1 FanAdded(Fan(0x4))
   8  634        1        1 FanAdded(Fan(0x5))
   9  778        1        1 PowerModeChanged(PowerBitmask(InternalBitFlags(0x1)))
  10  625        1        1 AutoState(Boot)
  11  876        1        1 AutoState(Running)
  12  990        1      117 ControlPwm(0x0)
  13  778        1        1 PowerModeChanged(PowerBitmask(InternalBitFlags(0x2)))
  14  625        1        1 AutoState(Boot)
  15  876        1        1 AutoState(Running)
  16  990        1       21 ControlPwm(0x0)
```

</details>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

control-plane-agent and kin should add event counters
3 participants