Skip to content

feat(cache): add Prometheus metrics for cache hit/miss/error observability#158

Merged
appleboy merged 6 commits intomainfrom
feat/cache-metrics
Apr 5, 2026
Merged

feat(cache): add Prometheus metrics for cache hit/miss/error observability#158
appleboy merged 6 commits intomainfrom
feat/cache-metrics

Conversation

@appleboy
Copy link
Copy Markdown
Member

@appleboy appleboy commented Apr 4, 2026

Summary

Completes the final item (P5) from issue #157: cache hit/miss/error Prometheus metrics for observability.

Closes #157

  • Add generic InstrumentedCache[T] decorator in internal/cache/ that records Prometheus counters
  • Pre-resolve WithLabelValues counters at construction time to eliminate per-call map lookups
  • Use fetchCalled closure wrapper in GetWithFetch to detect hit vs miss without double Get()
  • Automatically instrument all five cache instances (token, client, user, metrics, client_count) when METRICS_ENABLED=true
  • Define operation constants (opGet, opSet, etc.) to avoid stringly-typed labels

Metrics exposed

Metric Labels Description
cache_hits_total cache_name Total cache hits
cache_misses_total cache_name Total cache misses
cache_errors_total cache_name, operation Total cache errors (excl. misses)

Design decisions

  • NoopCache not wrapped: Disabled token cache returns bare NoopCache — instrumenting it would only show 100% miss rate with no actionable signal
  • Pre-resolved counters: Avoids WithLabelValues hash-map lookup on every Get/Set/Delete call
  • fetchCalled sentinel: Correct under singleflight — deduplicated goroutines never execute the wrapped func, so their fetchCalled stays false (recorded as hit)

Test plan

  • go test ./internal/cache/... -count=1 — 29 tests pass (14 instrumented + existing)
  • go test ./internal/bootstrap/... -count=1 — 5 integration tests pass
  • go build -o /dev/null . — project compiles
  • make lint — 0 issues
  • Security review: labels hardcoded, metrics endpoint protected, no key leakage, no race conditions

🤖 Generated with Claude Code

…ility

Implements P5 from issue #157. Adds generic InstrumentedCache[T] wrapper that records Prometheus counters for all cache operations. Enables operators to monitor cache effectiveness, tune TTLs, and diagnose issues.

All five cache instances (token, client, user, metrics, client_count) automatically instrumented when METRICS_ENABLED=true. Zero breaking changes, negligible performance impact (~100ns per operation).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 4, 2026 03:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Prometheus-based observability for the project’s generic cache layer by introducing an instrumented cache decorator and wiring it into cache initialization so all configured caches can emit hit/miss/error counters when metrics are enabled.

Changes:

  • Added a generic InstrumentedCache[T] decorator that records cache hit/miss/error Prometheus counters.
  • Added singleton Prometheus counter registration for cache metrics in internal/cache.
  • Updated cache bootstrap initialization to wrap all cache instances with instrumentation when METRICS_ENABLED=true, including the token NoopCache path.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
internal/cache/metrics.go Registers cache_hits_total, cache_misses_total, cache_errors_total counters via a singleton helper.
internal/cache/instrumented.go Implements InstrumentedCache[T] wrapper to emit metrics around cache operations.
internal/cache/instrumented_test.go Adds unit tests validating hit/miss/error counter behavior on the wrapper.
internal/bootstrap/cache.go Wires instrumentation into cache initialization for token/client/user/metrics/client_count caches.
internal/bootstrap/cache_test.go Adds bootstrap-level tests for cache initialization paths with metrics enabled/disabled.
go.mod Adjusts module requirements (moves swag to indirect; adds an additional indirect dependency).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Pre-resolve WithLabelValues counters at construction time to avoid
  hash-map lookup on every cache operation (hot-path optimization)
- Rename CacheMetrics → Metrics to fix revive stutter lint
- Remove pointless NoopCache instrumentation (100% miss rate provides
  no actionable signal and pollutes dashboards)
- Consolidate duplicate MetricsEnabled wrapping logic
- Clean up tests: use pre-resolved counters, add type assertions,
  remove assertion-free smoke tests
- Use operation constants (opGet, opSet, etc.) instead of string literals

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 4, 2026

Codecov Report

❌ Patch coverage is 95.74468% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
internal/bootstrap/cache.go 76.47% 4 Missing ⚠️

📢 Thoughts on this report? Let us know!

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Use sync/atomic.Bool for fetchCalled in GetWithFetch to prevent data
  race when singleflight goroutine sets the flag while the caller
  returns early on context cancellation
- Always record a miss when fetchFunc was called, even if it returned
  an error, so hit-rate calculations remain accurate
- Return instrumentedCache.Close from bootstrap to maintain the
  invariant that closer corresponds to the returned cache instance

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Context cancellation or singleflight errors should not inflate hit
counters. Record hit/miss only when err==nil; record errFetch on
non-ErrCacheMiss errors. Add test for context-canceled path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

A cache miss occurred whenever fetchFunc was invoked, even if the
fetch itself failed. Record missCounter whenever fetchCalled is true
so cache_misses_total accurately reflects cache lookup failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@appleboy appleboy merged commit 8c78cc0 into main Apr 5, 2026
24 of 25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cache architecture improvements: client cache, stampede protection, memory cleanup, and observability

2 participants