Add always-on /metrics endpoint with dual pull/push telemetry #138
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Automated Risk Assessment
Risk level: Medium-High
Code review is required for this PR.
Why this risk level (from code diff evidence):
- Large behavioral change set: 34 files, 1010 insertions, 225 deletions.
- Core runtime/server lifecycle changes in cmd/api/main.go add a second always-on listener and integrate its startup/shutdown into process control flow.
- Telemetry stack redesign in lib/otel/otel.go changes provider initialization, exporter failure behavior, runtime instrumentation handling, and introduces Prometheus pull export wiring.
- Shared config and DI surface changed (cmd/api/config/config.go, lib/providers/providers.go) with new validation/defaults and wiring paths.
- Metric model/labels changed in shared VM metrics subsystem (lib/vm_metrics/*, lib/middleware/otel.go) affecting operational observability semantics.
- New skill prompt/instruction files under skills/ are included; prompt/instruction changes are treated as elevated-risk by policy.
Decision:
- Not approved (only Very Low / Low are eligible for auto-approval).
- Requested reviewers: @hiroTamada, @rgarcia.
hiroTamada left a comment
solid design -- always-on prometheus pull with optional OTLP push is the right architecture. cardinality guardrails, denormalized metric removal, log noise reduction, and test consolidation are all good improvements. no blocking concerns.
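
The split described in this review (an authenticated API listener plus a separate, unauthenticated metrics listener) can be sketched roughly as below. This is a minimal stdlib-only illustration, not the code in cmd/api/main.go: requireToken, newAPIHandler, newMetricsHandler, probe, and runServers are hypothetical names, the /health route is invented for the demo, and the /metrics body is a placeholder where a real Prometheus exporter handler would be mounted.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// requireToken is a hypothetical stand-in for the API auth middleware.
func requireToken(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newAPIHandler builds the authenticated API mux. It never registers /metrics.
func newAPIHandler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	return requireToken(mux)
}

// newMetricsHandler builds the metrics mux with no auth wrapper.
// The response body is a placeholder, not real Prometheus output.
func newMetricsHandler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "# placeholder metrics output")
	})
	return mux
}

// probe sends an in-memory GET to h and returns the response status code.
func probe(h http.Handler, path, auth string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if auth != "" {
		req.Header.Set("Authorization", auth)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

// runServers starts every server, waits for ctx cancellation or the first
// listener result, then shuts all servers down gracefully.
func runServers(ctx context.Context, servers ...*http.Server) error {
	errCh := make(chan error, len(servers))
	for _, s := range servers {
		go func(s *http.Server) {
			err := s.ListenAndServe()
			if errors.Is(err, http.ErrServerClosed) {
				err = nil
			}
			errCh <- err
		}(s)
	}
	var first error
	select {
	case <-ctx.Done():
	case first = <-errCh:
	}
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	for _, s := range servers {
		if err := s.Shutdown(shutdownCtx); err != nil && first == nil {
			first = err
		}
	}
	return first
}

func main() {
	fmt.Println(probe(newAPIHandler(), "/health", ""))         // 401: API requires auth
	fmt.Println(probe(newMetricsHandler(), "/metrics", ""))    // 200: metrics bypass auth
	fmt.Println(probe(newAPIHandler(), "/metrics", "Bearer x")) // 404: not on the API mux

	// Port 0 picks free ports so the demo can run anywhere; the PR's
	// default metrics address is 127.0.0.1:9464.
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	api := &http.Server{Addr: "127.0.0.1:0", Handler: newAPIHandler()}
	metrics := &http.Server{Addr: "127.0.0.1:0", Handler: newMetricsHandler()}
	if err := runServers(ctx, api, metrics); err != nil {
		fmt.Println("server error:", err)
	}
}
```

Keeping the two muxes separate means the metrics path can never inherit API middleware by accident, and binding the metrics server to loopback by default limits exposure of the unauthenticated endpoint.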


Summary
- Always-on Prometheus pull /metrics endpoint on a dedicated metrics listener (default 127.0.0.1:9464)
- OTLP push export stays optional, enabled when otel.enabled=true

Config additions
- metrics.listen_address (default 127.0.0.1)
- metrics.port (default 9464)
- otel.metric_export_interval (default 60s)

Env mapping (via __ nesting):
- METRICS__LISTEN_ADDRESS
- METRICS__PORT
- OTEL__METRIC_EXPORT_INTERVAL

Behavior details
- /metrics is served outside API auth/OpenAPI middleware on a separate listener

Tests
- /metrics is available with push disabled, and with push enabled but a bad endpoint

Note
Medium Risk
Touches telemetry initialization and server startup/shutdown by adding a dedicated /metrics listener and changing OTel init behavior, which could impact availability and observability if misconfigured. Also changes exported VM metrics (removing a series and adding guardrails), which may affect dashboards/alerts.

Overview
Adds an always-on Prometheus pull metrics endpoint (/metrics) served from a dedicated HTTP listener (default 127.0.0.1:9464), while keeping OTLP push export optional when otel.enabled=true and configurable via otel.metric_export_interval.

Introduces metrics.listen_address, metrics.port, and metrics.vm_label_budget config/env keys with validation, wires the budget into vm_metrics, and adds guardrail metrics for per-VM label cardinality (while removing the denormalized hypeman_vm_memory_utilization_ratio series and updating the Grafana dashboard query accordingly).

Improves observability hygiene: adds OTEL tracing spans for WebSocket exec/cp sessions with constrained attributes, ensures HTTP metrics use a sentinel path label for unmatched routes, and reduces INFO-level log noise in several background/ingress paths.

Written by Cursor Bugbot for commit 327d96d.
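
To illustrate the __ env nesting and the defaults listed in the PR description, here is a small sketch of how a name such as METRICS__LISTEN_ADDRESS could map onto metrics.listen_address. The repository's actual config loader is not shown in this conversation, so envToConfigKey, resolve, and the defaults map are hypothetical; only the key names, env names, and default values come from the description.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// defaults mirrors the default values stated in the PR description.
var defaults = map[string]string{
	"metrics.listen_address":      "127.0.0.1",
	"metrics.port":                "9464",
	"otel.metric_export_interval": "60s",
}

// envToConfigKey converts an env var name that uses "__" as the nesting
// separator into a dotted, lower-case config key.
func envToConfigKey(name string) string {
	parts := strings.Split(name, "__")
	for i, p := range parts {
		parts[i] = strings.ToLower(p)
	}
	return strings.Join(parts, ".")
}

// resolve overlays KEY=VALUE entries (the shape returned by os.Environ)
// onto the defaults. Entries that do not map to a known key are ignored.
func resolve(environ []string) map[string]string {
	out := make(map[string]string, len(defaults))
	for k, v := range defaults {
		out[k] = v
	}
	for _, kv := range environ {
		name, value, ok := strings.Cut(kv, "=")
		if !ok {
			continue
		}
		key := envToConfigKey(name)
		if _, known := defaults[key]; known {
			out[key] = value
		}
	}
	return out
}

func main() {
	cfg := resolve([]string{"METRICS__PORT=9100", "HOME=/root"})
	addr := net.JoinHostPort(cfg["metrics.listen_address"], cfg["metrics.port"])
	fmt.Println(addr) // 127.0.0.1:9100
}
```

Passing the environment in as a slice, rather than reading os.Environ inside resolve, keeps the mapping easy to test without mutating process state.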