Skip to content

Add INFO COMMANDSTATS tracking with per-command success rates#1702

Merged
Mathos1432 merged 6 commits intomainfrom
users/matrembl/commandstats
Apr 22, 2026
Merged

Add INFO COMMANDSTATS tracking with per-command success rates#1702
Mathos1432 merged 6 commits intomainfrom
users/matrembl/commandstats

Conversation

@Mathos1432
Copy link
Copy Markdown
Contributor

Why is this change being made?

Garnet currently has no way to track per-command usage statistics. Redis provides INFO COMMANDSTATS which reports how many times each command was called, how many failed, and how many were rejected. This is critical for production observability — operators need to understand command usage patterns, identify failing commands, and monitor success rates.

What changed?

Adds Redis-compatible INFO COMMANDSTATS support, gated behind --commandstats-monitor (disabled by default, zero overhead when off). When enabled, tracks per-command calls, failed_calls, and rejected_calls.

Key design decisions:

  • Counter-only tracking — No latency (usec) on the hot path. Reports usec=0,usec_per_call=0.00 for Redis format compatibility. Latency can be added later via sampling.

  • Per-session single-writer — Each session owns its own CommandStats array indexed by RespCommand enum (O(1) access). No locking on the hot path; aggregation happens on-demand when INFO COMMANDSTATS is requested.

  • Error detection — Dual strategy: commandErrorWritten flag set by WriteError/AbortWithErrorMessage, plus RESP - prefix check guarded by a bufferFlushedDuringCommand flag.

  • Excluded from default INFO — Only appears when explicitly requested via INFO COMMANDSTATS.

  • Bug fix: GarnetServerMonitor.Dispose() deadlocked when Start() was never called (the ManualResetEvent was never signaled). This affected EmbeddedRespServer used by BDN benchmarks. Fixed by tracking a started flag.

How was this validated?

Unit tests (17 passing)

  • 8 new RespCommandStatsTests: DisabledByDefault, CallsTracking, FailedCalls, UsecFieldsZero, MultipleCommands, SuccessRate, InfoServerShowsMonitorStatus, FormatMatchesRedisConvention
  • 9 existing metrics/info tests verified unbroken

Performance benchmarks (BenchmarkDotNet)

  • ACL — Server started with an ACL file (authentication enabled)
  • AOF — Server started with Append-Only File (persistence enabled)
  • None — Bare server, no ACL or AOF

PING (100 commands/op, .NET 10, Server GC)

Params Disabled Enabled Overhead
ACL 1.687 μs 1.820 μs +7.9%
AOF 1.680 μs 1.794 μs +6.8%
None 1.690 μs 1.816 μs +7.5%

PING is ~17ns/command, so ~1.3ns tracking overhead is proportionally large. This is the worst case.

SET (100 commands/op, .NET 10, Server GC)

Params Disabled Enabled Overhead
ACL 13.10 μs 13.18 μs +0.6%
AOF 21.13 μs 20.62 μs noise
None 12.62 μs 13.03 μs +3.2%

GET (100 commands/op, .NET 10, Server GC)

Params Disabled Enabled Overhead
ACL 14.93 μs 14.65 μs noise
AOF 14.70 μs 14.68 μs noise
None 14.24 μs 14.82 μs +4.1%

Summary: For real-world commands (SET/GET), overhead is 0-4% and within measurement noise for most configurations. Disabled by default = zero overhead unless opted in.

@Mathos1432 Mathos1432 force-pushed the users/matrembl/commandstats branch 5 times, most recently from 0460835 to 60dc7f0 Compare April 15, 2026 13:35
Implements Redis-compatible INFO COMMANDSTATS support, gated behind
--commandstats-monitor config flag. Tracks per-command calls, failed_calls,
and rejected_calls using lightweight counter increments (no Stopwatch).

Key design decisions:
- Array-indexed by RespCommand enum for O(1) access
- Per-session counters (single-writer, no locking on hot path)
- On-demand aggregation from active sessions + disposed session history
- No latency tracking (usec=0) to avoid Stopwatch overhead (~4.4x perf hit)
- Monitor dispose handles case where Start() was never called (EmbeddedServer)

Benchmark results (100 PINGs, .NET 10):
  Disabled: 1.687 us | Enabled: 1.820 us | Overhead: ~7.8%
  SET/GET overhead: 0-4% (within noise for real workloads)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Mathos1432 Mathos1432 force-pushed the users/matrembl/commandstats branch from 60dc7f0 to f26aea3 Compare April 15, 2026 15:14
…latency references

When MetricsSamplingFrequency > 0, globalCommandStats already includes
history from disposed sessions. PopulateCommandStatsInfo was adding
historyCommandStats on top, double-counting disposed session stats.

Also removed stale 'latency' references from doc comments and help text
since per-command latency tracking was removed to avoid Stopwatch overhead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Mathos1432 Mathos1432 marked this pull request as ready for review April 15, 2026 19:57
Copilot AI review requested due to automatic review settings April 15, 2026 19:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread libs/server/Resp/RespServerSession.cs Outdated
Comment thread libs/server/Metrics/Info/GarnetInfoMetrics.cs Outdated
Comment thread test/Garnet.test/RespCommandStatsTests.cs Outdated
Comment thread test/Garnet.test/RespCommandStatsTests.cs Outdated
Comment thread libs/server/StoreWrapper.cs Outdated
Comment thread libs/server/Resp/RespServerSession.cs Outdated
Comment thread libs/server/Resp/RespServerSession.cs Outdated
Comment thread libs/server/Resp/RespServerSession.cs Outdated
Comment thread libs/server/Metrics/GarnetServerMonitor.cs
Mathos1432 and others added 2 commits April 21, 2026 09:47
…mes, fix tests

- Remove bufferFlushedDuringCommand field and pointer-based error detection;
  rely solely on commandErrorWritten flag set by WriteError/AbortWithErrorMessage
- Use RespCommandsInfo.GetRespCommandName() for canonical Redis command names
  instead of enum ToString with string replacement
- Switch most tests to metricsSamplingFreq: 0 (on-demand aggregation) to remove
  Thread.Sleep delays; keep one test with periodic sampling for coverage
- Fix CommandStatsFailedCallsTest to use SETRANGE with invalid offset (goes
  through AbortWithErrorMessage) instead of WRONGTYPE which bypasses the flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GarnetLatencyMetricsSession dereferences monitor.monitor_iterations, but
StoreWrapper only created the monitor when MetricsSamplingFrequency > 0 or
CommandStatsMonitor was enabled. With --latency-monitor alone and no
sampling frequency, this would cause a NullReferenceException at session
creation. Add LatencyMonitor to the condition so the monitor is always
created when latency tracking is enabled.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Mathos1432 Mathos1432 merged commit 9270dad into main Apr 22, 2026
48 of 49 checks passed
@Mathos1432 Mathos1432 deleted the users/matrembl/commandstats branch April 22, 2026 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants