Handle Debug service in agent connection metrics middleware#6878
Conversation
| case middleware.HealthServiceName, middleware.ServerReflectionServiceName, middleware.ServerReflectionV1AlphaServiceName: | ||
| // Intentionally not emitting metrics for health and reflection services | ||
| case middleware.DebugServiceName, middleware.HealthServiceName, middleware.ServerReflectionServiceName, middleware.ServerReflectionV1AlphaServiceName: | ||
| // Intentionally not emitting metrics for debug, health, and reflection services |
There was a problem hiding this comment.
We don't emit metrics about the health and reflection services because they're gRPC services and possibly not that interesting. It might still be nice to have some metrics for it. Would it be possible to also emit the 2 metrics we emit for the other services? (connection counter and connection gauge).
Also happy to take this as is, since it fixes a real problem.
There was a problem hiding this comment.
Thanks for the suggestion @sorindumitru ! That makes sense, I have updated the PR to emit connection counter and gauge metrics for the Debug service, following the same pattern as DelegatedIdentity. Added IncrDebugAPIConnectionCounter and SetDebugAPIConnectionGauge helper functions in pkg/common/telemetry/agent/adminapi/debugapi.go, and the regression test now asserts the metrics are actually emitted.
sorindumitru
left a comment
There was a problem hiding this comment.
This is looking good, thanks @rausingh-rh. Could you also add the new metrics to https://github.com/spiffe/spire/blob/main/doc/telemetry/telemetry.md ?
|
Done! Added the |
sorindumitru
left a comment
There was a problem hiding this comment.
LGTM, thanks @rausingh-rh !
The agent's connectionMetrics middleware did not recognize the Debug service, causing a misconfiguration error to be logged every minute: "unrecognized service for connection metrics: spire.agent.debug.v1.Debug" Add DebugServiceName constant and short name mapping, handle it in the Preprocess/Postprocess switch alongside Health and Reflection, and add test coverage for all agent services. Fixes spiffe#5183 Signed-off-by: Raushan Singh <rausingh@redhat.com>
…r the Debug service Signed-off-by: Raushan Singh <rausingh@redhat.com>
Signed-off-by: Raushan Singh <rausingh@redhat.com>
743319c to
51cd97c
Compare
|
Hi @sorindumitru the PR was failing because of the milestone check. I have rebased the branch with latest main. |
Pull Request check list
Affected functionality
Agent metrics middleware (
pkg/agent/endpoints/metrics.go), service name constants (pkg/common/api/middleware/names.go), and Debug API telemetry helpers (pkg/common/telemetry/agent/adminapi/debugapi.go).Description of change
The agent's
connectionMetricsmiddleware did not have a case for the Debug service (spire.agent.debug.v1.Debug) in itsPreprocess/Postprocessswitch statements. This caused a misconfiguration error to be logged every time the Debug API was called:Since the Debug API is typically polled on a schedule (e.g., by monitoring tools), this produced recurring error noise every minute.
Changes:
DebugServiceNameandDebugServiceShortNameconstants innames.go, with aserviceReplacermapping so logs and metrics use the clean short name"Debug"instead of the full proto pathIncrDebugAPIConnectionCounterandSetDebugAPIConnectionGaugetelemetry helpers indebugapi.go, following the same pattern as the DelegatedIdentity helpersDebugServiceNamein thePreprocess/Postprocessswitch, emitting connection counter and gauge metrics (consistent with other agent services like DelegatedIdentity)TestDebugServiceConnectionMetrics) that verifies no misconfiguration error is logged and connection metrics are emittedTestAllAgentServicesHandledByConnectionMetrics) that exercises every agent gRPC service through the middleware to catch similar issues in the futureWhich issue this PR fixes
Fixes #5183