Handle Debug service in agent connection metrics middleware#6878

Merged
sorindumitru merged 3 commits into
spiffe:mainfrom
rausingh-rh:fix-debug-api-metrics-misconfiguration
Apr 23, 2026

Conversation

@rausingh-rh
Contributor

@rausingh-rh rausingh-rh commented Apr 15, 2026

Pull Request check list

  • Commit conforms to CONTRIBUTING.md?
  • Proper tests/regressions included?
  • Documentation updated?

Affected functionality

Agent metrics middleware (pkg/agent/endpoints/metrics.go), service name constants (pkg/common/api/middleware/names.go), and Debug API telemetry helpers (pkg/common/telemetry/agent/adminapi/debugapi.go).

Description of change

The agent's connectionMetrics middleware did not have a case for the Debug service (spire.agent.debug.v1.Debug) in its Preprocess/Postprocess switch statements. This caused a misconfiguration error to be logged every time the Debug API was called:

unrecognized service for connection metrics: spire.agent.debug.v1.Debug

Since the Debug API is typically polled on a schedule (e.g., by monitoring tools), this produced recurring error noise every minute.
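The failure mode described above can be sketched in a few lines. This is a simplified illustration, not the actual code in pkg/agent/endpoints/metrics.go: `serviceFromMethod`, `preprocessBefore`, and the `example.v1.Workload` placeholder are hypothetical names, and the `GetInfo` method in the call is used for illustration only. Only the Debug service name and the error text come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceFromMethod extracts the service part of a gRPC full method name,
// which has the form "/package.Service/Method".
func serviceFromMethod(fullMethod string) string {
	trimmed := strings.TrimPrefix(fullMethod, "/")
	if i := strings.LastIndex(trimmed, "/"); i >= 0 {
		return trimmed[:i]
	}
	return trimmed
}

// preprocessBefore is a hypothetical stand-in for the switch before the fix:
// any service without an explicit case falls through to the default branch
// and is reported as a misconfiguration.
func preprocessBefore(service string) error {
	switch service {
	case "example.v1.Workload": // placeholder for services that already had a case
		return nil
	default:
		return fmt.Errorf("unrecognized service for connection metrics: %s", service)
	}
}

func main() {
	// Each poll of the Debug API takes the default branch and yields an error.
	service := serviceFromMethod("/spire.agent.debug.v1.Debug/GetInfo")
	if err := preprocessBefore(service); err != nil {
		fmt.Println(err)
	}
}
```

Because the error comes from the default branch, it repeats on every call, so a monitoring tool that polls the Debug API once a minute produces one log line per minute.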

Changes:

  • Add DebugServiceName and DebugServiceShortName constants in names.go, with a serviceReplacer mapping so logs and metrics use the clean short name "Debug" instead of the full proto path
  • Add IncrDebugAPIConnectionCounter and SetDebugAPIConnectionGauge telemetry helpers in debugapi.go, following the same pattern as the DelegatedIdentity helpers
  • Handle DebugServiceName in the Preprocess/Postprocess switch, emitting connection counter and gauge metrics (consistent with other agent services like DelegatedIdentity)
  • Add a focused regression test (TestDebugServiceConnectionMetrics) that verifies no misconfiguration error is logged and connection metrics are emitted
  • Add a guard-rail test (TestAllAgentServicesHandledByConnectionMetrics) that exercises every agent gRPC service through the middleware to catch similar issues in the future
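The changes listed above might look roughly like the following self-contained sketch. It is an approximation under stated assumptions rather than the merged code: the `recorder` type, the metric keys `debug_api.connection` and `debug_api.connections`, and the shape of `connectionMetrics` are hypothetical stand-ins for the real telemetry helpers (IncrDebugAPIConnectionCounter and SetDebugAPIConnectionGauge), and the `GetInfo` method name is illustrative. The two constants and the replacer idea are taken from the description.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	DebugServiceName      = "spire.agent.debug.v1.Debug"
	DebugServiceShortName = "Debug"
)

// serviceReplacer shortens full proto service names for logs and metrics.
var serviceReplacer = strings.NewReplacer(DebugServiceName, DebugServiceShortName)

// recorder is a hypothetical in-memory stand-in for a metrics sink.
type recorder struct {
	counters map[string]int
	gauges   map[string]int
}

// connectionMetrics is a simplified stand-in for the agent middleware.
type connectionMetrics struct {
	rec         *recorder
	debugActive int
}

func newConnectionMetrics() *connectionMetrics {
	return &connectionMetrics{
		rec: &recorder{counters: map[string]int{}, gauges: map[string]int{}},
	}
}

// Preprocess runs when a call starts: it counts the connection and raises
// the gauge. Unknown services are still reported as a misconfiguration.
func (c *connectionMetrics) Preprocess(service string) error {
	switch service {
	case DebugServiceName:
		c.debugActive++
		c.rec.counters["debug_api.connection"]++ // assumed key format
		c.rec.gauges["debug_api.connections"] = c.debugActive
		return nil
	default:
		return fmt.Errorf("unrecognized service for connection metrics: %s", service)
	}
}

// Postprocess runs when a call ends and lowers the gauge again.
func (c *connectionMetrics) Postprocess(service string) {
	if service == DebugServiceName {
		c.debugActive--
		c.rec.gauges["debug_api.connections"] = c.debugActive
	}
}

func main() {
	m := newConnectionMetrics()

	// Guard rail: every service in the list must be accepted by the middleware.
	for _, svc := range []string{DebugServiceName} {
		if err := m.Preprocess(svc); err != nil {
			fmt.Println("misconfiguration:", err)
			continue
		}
		m.Postprocess(svc)
	}

	fmt.Println(serviceReplacer.Replace("/" + DebugServiceName + "/GetInfo"))
	fmt.Println(m.rec.counters["debug_api.connection"], m.rec.gauges["debug_api.connections"])
}
```

The guard-rail loop is the part that scales: when a new service is added to the list without a matching case, the loop reports the misconfiguration in a test instead of in production logs.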

Which issue this PR fixes

Fixes #5183

Comment thread pkg/agent/endpoints/metrics.go Outdated
- case middleware.HealthServiceName, middleware.ServerReflectionServiceName, middleware.ServerReflectionV1AlphaServiceName:
- // Intentionally not emitting metrics for health and reflection services
+ case middleware.DebugServiceName, middleware.HealthServiceName, middleware.ServerReflectionServiceName, middleware.ServerReflectionV1AlphaServiceName:
+ // Intentionally not emitting metrics for debug, health, and reflection services
Member

We don't emit metrics for the health and reflection services because they're standard gRPC services and probably not that interesting. It might still be nice to have some metrics for the Debug service, though. Would it be possible to also emit the two metrics we emit for the other services (connection counter and connection gauge)?

Also happy to take this as is, since it fixes a real problem.

Contributor Author

Thanks for the suggestion, @sorindumitru! That makes sense. I have updated the PR to emit connection counter and gauge metrics for the Debug service, following the same pattern as DelegatedIdentity. I added the IncrDebugAPIConnectionCounter and SetDebugAPIConnectionGauge helper functions in pkg/common/telemetry/agent/adminapi/debugapi.go, and the regression test now asserts that the metrics are actually emitted.

Member

@sorindumitru sorindumitru left a comment

This is looking good, thanks @rausingh-rh. Could you also add the new metrics to https://github.com/spiffe/spire/blob/main/doc/telemetry/telemetry.md?

@rausingh-rh rausingh-rh reopened this Apr 23, 2026
@rausingh-rh
Contributor Author

Done! Added the debug_api connection and connections metrics to doc/telemetry/telemetry.md.

Member

@sorindumitru sorindumitru left a comment

LGTM, thanks @rausingh-rh!

@sorindumitru sorindumitru enabled auto-merge April 23, 2026 09:02
@sorindumitru sorindumitru added this pull request to the merge queue Apr 23, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 23, 2026
The agent's connectionMetrics middleware did not recognize the Debug
service, causing a misconfiguration error to be logged every minute:
"unrecognized service for connection metrics: spire.agent.debug.v1.Debug"

Add DebugServiceName constant and short name mapping, handle it in the
Preprocess/Postprocess switch alongside Health and Reflection, and add
test coverage for all agent services.

Fixes spiffe#5183

Signed-off-by: Raushan Singh <rausingh@redhat.com>
…r the Debug service

Signed-off-by: Raushan Singh <rausingh@redhat.com>
Signed-off-by: Raushan Singh <rausingh@redhat.com>
@rausingh-rh rausingh-rh force-pushed the fix-debug-api-metrics-misconfiguration branch from 743319c to 51cd97c Compare April 23, 2026 11:42
@rausingh-rh
Contributor Author

Hi @sorindumitru, the PR was failing because of the milestone check. I have rebased the branch onto the latest main.

@sorindumitru sorindumitru modified the milestone: 1.15.0 Apr 23, 2026
@sorindumitru sorindumitru enabled auto-merge April 23, 2026 11:58
@sorindumitru sorindumitru added this pull request to the merge queue Apr 23, 2026
Merged via the queue into spiffe:main with commit 81cb7d9 Apr 23, 2026
50 checks passed

Development

Successfully merging this pull request may close these issues.

debug API logs metrics misconfiguration errors

2 participants