Metadata based monitoring for streams and consumers #1062

Merged: 1 commit, Jun 5, 2024


ripienaar (Collaborator)

Demonstrating a model that could be used to enable self-service monitoring for client assets

ripienaar (Collaborator, Author) commented May 27, 2024

Given a stream with metadata:

Metadata:

        io.nats.monitor.lag-critical: 100
         io.nats.monitor.max-sources: 34
         io.nats.monitor.min-sources: 33
       io.nats.monitor.msgs-critical: 3000
           io.nats.monitor.msgs-warn: 4000
         io.nats.monitor.peer-expect: 1
   io.nats.monitor.peer-lag-critical: 100
  io.nats.monitor.peer-seen-critical: 5m
   io.nats.monitor.subjects-critical: 30
       io.nats.monitor.subjects-warn: 33
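All of the monitoring keys share the io.nats.monitor. prefix, so they are easy to separate from any other metadata on the stream. Below is a minimal Go sketch, assuming the metadata is available as a `map[string]string` (as in a stream config) and assuming that the text after the prefix names the matching check threshold. `monitorThresholds` is a hypothetical helper, not part of any NATS library.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// monitorPrefix is the metadata key prefix used in the example stream.
const monitorPrefix = "io.nats.monitor."

// monitorThresholds is a hypothetical helper: it keeps only keys carrying
// the monitoring prefix and strips the prefix, assuming the remainder is
// the name of the matching check threshold (e.g. "msgs-warn").
func monitorThresholds(metadata map[string]string) map[string]string {
	out := make(map[string]string)
	for k, v := range metadata {
		if name, ok := strings.CutPrefix(k, monitorPrefix); ok && name != "" {
			out[name] = v
		}
	}
	return out
}

func main() {
	// Stream metadata as it might appear in a stream config (assumption).
	meta := map[string]string{
		"io.nats.monitor.msgs-warn":     "4000",
		"io.nats.monitor.msgs-critical": "3000",
		"io.nats.monitor.peer-expect":   "1",
		"owner":                         "team-a", // unrelated, hypothetical key
	}
	th := monitorThresholds(meta)

	// Print in a stable order because Go map iteration order is random.
	names := make([]string, 0, len(th))
	for n := range th {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%s = %s\n", n, th[n])
	}
}
```

Unrelated keys such as the hypothetical "owner" are ignored, so monitoring settings can coexist with other metadata on the same stream.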

When run without any threshold arguments, the nats server check stream command health-checks the stream according to these metadata values. The metadata keys correspond to the command's CLI flags:

$ nats server check stream --stream LON --format text
LON: OK

Status Detail

╭────────┬────────────────────╮
│ Status │ Message            │
├────────┼────────────────────┤
│ OK     │ 1 current replicas │
│ OK     │ 34 sources         │
╰────────┴────────────────────╯

Check Metrics

╭──────────────────┬───────┬──────┬────────────────────┬───────────────────┬──────────────────────────────────────────────────────────────────────╮
│ Metric           │ Value │ Unit │ Critical Threshold │ Warning Threshold │ Description                                                          │
├──────────────────┼───────┼──────┼────────────────────┼───────────────────┼──────────────────────────────────────────────────────────────────────┤
│ peers            │ 1     │      │ 1                  │ 1                 │ Configured RAFT peers                                                │
│ peer_offline     │ 0     │      │ 0                  │ 0                 │ Offline RAFT peers                                                   │
│ peer_not_current │ 0     │      │ 0                  │ 0                 │ RAFT peers that are not current                                      │
│ peer_inactive    │ 0     │      │ 0                  │ 0                 │ Inactive RAFT peers                                                  │
│ peer_lagged      │ 0     │      │ 0                  │ 0                 │ RAFT peers that are lagged more than configured threshold            │
│ messages         │ 4,580 │      │ 3,000              │ 4,000             │ Messages stored in the stream                                        │
│ subjects         │ 34    │      │ 30                 │ 33                │ Number of subjects stored in the stream                              │
│ sources          │ 34    │      │ 34                 │ 33                │ Number of sources being consumed by this stream                      │
│ sources_lagged   │ 0     │      │ 0                  │ 0                 │ Number of sources that are behind more than the configured threshold │
│ sources_inactive │ 0     │      │ 0                  │ 0                 │ Number of sources that are inactive                                  │
╰──────────────────┴───────┴──────┴────────────────────┴───────────────────┴──────────────────────────────────────────────────────────────────────╯

Note that the check metrics carry the thresholds taken from the metadata.
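In the metrics table, messages (4,580) and subjects (34) report OK even though their critical thresholds (3,000 and 30) are lower than their warning thresholds (4,000 and 33). One reading consistent with that: when the warning threshold exceeds the critical one, lower values are worse. The Go sketch below implements that interpretation only. `checkThreshold` is hypothetical, the strict comparisons are an assumption (they keep the zero-valued peer rows at OK), and this is not the natscli implementation.

```go
package main

import "fmt"

// status is a hypothetical result type for a single metric check.
type status string

const (
	statusOK       status = "OK"
	statusWarning  status = "WARNING"
	statusCritical status = "CRITICAL"
)

// checkThreshold is a sketch, not the natscli implementation. It assumes:
//   - warn <= crit: higher values are worse (e.g. a lag limit)
//   - warn >  crit: lower values are worse (e.g. a minimum message count)
//
// Comparisons are strict, so a value of 0 against thresholds of 0 is OK.
func checkThreshold(value, warn, crit float64) status {
	if warn <= crit {
		switch {
		case value > crit:
			return statusCritical
		case value > warn:
			return statusWarning
		}
		return statusOK
	}
	switch {
	case value < crit:
		return statusCritical
	case value < warn:
		return statusWarning
	}
	return statusOK
}

func main() {
	// Values from the LON example.
	fmt.Println(checkThreshold(4580, 4000, 3000)) // messages: OK
	fmt.Println(checkThreshold(34, 33, 30))       // subjects: OK
	// A stream that had shrunk to 3,500 messages would warn.
	fmt.Println(checkThreshold(3500, 4000, 3000)) // WARNING
}
```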

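Do this in bulk, for example by gathering stream details with a system-account JSZ request, and full self-service monitoring can be enabled: owners set thresholds on their own streams and consumers, and the monitoring simply applies them.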
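The bulk step boils down to fetching every stream's metadata in one request, keeping the streams that opted in by setting io.nats.monitor.* keys, and checking each of those. Below is a Go sketch of the selection step only. `streamMeta` and `optedIn` are hypothetical, and the simplified struct stands in for whatever a JSZ response actually returns.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// streamMeta is a hypothetical, simplified view of one stream as it might
// be extracted from a bulk JSZ response; the real response shape may differ.
type streamMeta struct {
	Name     string
	Metadata map[string]string
}

// optedIn returns, sorted, the names of streams that carry at least one
// io.nats.monitor.* key, i.e. streams whose owners asked to be monitored.
func optedIn(streams []streamMeta) []string {
	var names []string
	for _, s := range streams {
		for k := range s.Metadata {
			if strings.HasPrefix(k, "io.nats.monitor.") {
				names = append(names, s.Name)
				break
			}
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Hypothetical input; only LON comes from the example above.
	streams := []streamMeta{
		{Name: "LON", Metadata: map[string]string{"io.nats.monitor.msgs-warn": "4000"}},
		{Name: "SCRATCH"}, // no monitoring metadata, so it is skipped
		{Name: "NYC", Metadata: map[string]string{"io.nats.monitor.peer-expect": "3"}},
	}
	for _, name := range optedIn(streams) {
		// Each selected stream could then be checked with, e.g.:
		//   nats server check stream --stream <name>
		fmt.Println(name)
	}
}
```

Streams without monitoring metadata are left alone, which keeps the scheme opt-in.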

/cc @bruth @wallyqs

ripienaar force-pushed the poc_metadata_monitoring branch 2 times, most recently from dd0bb59 to 8d6af80 on June 5, 2024 07:43
Introduce a model where monitoring thresholds can be captured
in Stream and Consumer metadata allowing for bulk self-service
monitoring.

Signed-off-by: R.I.Pienaar <rip@devco.net>
ripienaar changed the title from "poc metadata based monitoring for streams and consumers" to "Metadata based monitoring for streams and consumers" on Jun 5, 2024
ripienaar marked this pull request as ready for review June 5, 2024 07:45
ripienaar merged commit 6417932 into nats-io:main Jun 5, 2024
2 checks passed
ripienaar deleted the poc_metadata_monitoring branch June 5, 2024 07:55