feat(metrics): config-driven bounded request labels (adds service to oneshot_pipeline.duration) #545
Source a bounded `service` label ({tts,stt,other}) from the trusted X-StreamKit-Service request header so TTS, STT, and other oneshot requests are distinguishable without leaking arbitrary user-submitted pipeline names into metric cardinality.
Signed-off-by: streamkit-devin <devin@streamkit.dev>
```diff
     }
     self.recorded = true;
-    let labels = [KeyValue::new("status", status)];
+    let labels = [KeyValue::new("status", status), KeyValue::new("service", self.service)];
```
📝 Info: Existing aggregate PromQL should continue to work with the added label
The change adds a second label, `service`, to every `oneshot_pipeline.duration` observation recorded in this handler. I did not flag this as a bug because the existing dashboard queries aggregate away all non-grouped labels: the error-rate panel uses `sum(rate(oneshot_pipeline_duration_count{status="error"}[5m]))` and the latency panel uses `sum(rate(oneshot_pipeline_duration_bucket{status="ok"}[5m])) by (le)` (samples/grafana-dashboard.json:164 and samples/grafana-dashboard.json:279). Adding `service` therefore increases series cardinality but should not change the values these aggregate panels show.
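The reviewer's argument can be checked with a small model. The sketch below is std-only and hypothetical (`LabeledCounter`, `inc`, `series_count`, and `sum_where` are illustrative names, not the OpenTelemetry or Prometheus API): it keeps one count per distinct label set, so adding `service` multiplies the number of series, while summing by `status` alone returns the same totals as before. That is what a `sum(...)` without `by (service)` does.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a labeled counter such as
/// `oneshot_pipeline_duration_count`: one count per distinct label set (series).
#[derive(Default)]
pub struct LabeledCounter {
    series: HashMap<Vec<(String, String)>, u64>,
}

impl LabeledCounter {
    /// Record one observation for the series identified by `labels`.
    /// Labels are sorted so their order does not create a new series.
    pub fn inc(&mut self, labels: &[(&str, &str)]) {
        let mut key: Vec<(String, String)> = labels
            .iter()
            .map(|&(k, v)| (k.to_string(), v.to_string()))
            .collect();
        key.sort();
        *self.series.entry(key).or_insert(0) += 1;
    }

    /// Number of distinct series, i.e. the cardinality this metric exports.
    pub fn series_count(&self) -> usize {
        self.series.len()
    }

    /// Rough analogue of `sum(counter{name="value"})`: totals every series that
    /// carries `name=value`, summing away all other labels (e.g. `service`).
    pub fn sum_where(&self, name: &str, value: &str) -> u64 {
        self.series
            .iter()
            .filter(|(key, _)| key.iter().any(|(k, v)| k == name && v == value))
            .map(|(_, count)| *count)
            .sum()
    }
}
```

For example, five requests spread across `status` × `service` produce five series instead of two, but the `status="error"` total is identical with or without the extra label.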
```rust
fn classify_service(header: Option<&str>) -> &'static str {
    // Normalize (trim + lowercase); anything other than "tts"/"stt",
    // including a missing header, collapses to "other".
    match header.map(|h| h.trim().to_ascii_lowercase()).as_deref() {
        Some("tts") => "tts",
        Some("stt") => "stt",
        _ => "other",
    }
}
```
📝 Info: Service header is intentionally low-cardinality and advisory
The new header classifier lowercases and trims the value, allows only tts and stt, and maps missing, invalid UTF-8, empty, or unknown values to other. That means clients can choose the service bucket for observability, but cannot create unbounded metric cardinality. I did not treat the unauthenticated/advisory nature of X-StreamKit-Service as a security bug because the label is only used for metrics and is not fed into authorization or pipeline behavior.
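The point about invalid UTF-8 depends on how raw header bytes reach `classify_service`. Below is a minimal sketch, assuming the caller converts the header to `Option<&str>` and treats a non-UTF-8 value as absent. `service_label_from_bytes` is a hypothetical adapter; in the real handler this conversion would come from the HTTP library's header API rather than `std::str::from_utf8`.

```rust
/// Mirrors the reviewed `classify_service`: trim + lowercase, allow only
/// "tts"/"stt", collapse everything else to "other".
pub fn classify_service(header: Option<&str>) -> &'static str {
    match header.map(|h| h.trim().to_ascii_lowercase()).as_deref() {
        Some("tts") => "tts",
        Some("stt") => "stt",
        _ => "other",
    }
}

/// Hypothetical adapter from raw header bytes to the classifier input.
/// Invalid UTF-8 becomes `None`, so it lands in "other" exactly like a
/// missing header; whatever the client sends, only three label values exist.
pub fn service_label_from_bytes(raw: Option<&[u8]>) -> &'static str {
    let header = raw.and_then(|bytes| std::str::from_utf8(bytes).ok());
    classify_service(header)
}
```

So `"  TTS "` maps to `tts`, while `tts-v2`, an empty value, non-UTF-8 bytes, or no header at all map to `other`.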
Codecov Report
❌ Patch coverage is …

Additional details and impacted files

| | main | #545 | +/- |
|---|---:|---:|---:|
| Coverage | 79.96% | 80.00% | +0.03% |
| Files | 234 | 235 | +1 |
| Lines | 68061 | 68234 | +173 |
| Branches | 1846 | 1933 | +87 |
| Hits | 54428 | 54591 | +163 |
| Misses | 13627 | 13637 | +10 |
| Partials | 6 | 6 | |
Replace the hardcoded {tts,stt,other} allowlist with a general, operator-configurable [server.metrics.request_labels] facility: each label is sourced from a trusted request header, bounded by a configured allowlist, with a fallback (default "other"). A reusable resolver applies it to both oneshot_pipeline.duration and the shared http.server.* request metrics, so new dimensions need only a config edit, not a recompile. Defaults preserve the {tts,stt,other} service label.
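The generalized facility can be sketched as data plus one pure function. This is a minimal std-only sketch under assumptions, not the code in `apps/skit/src/metrics_labels.rs`: `RequestLabel`, `default_request_labels`, and `resolve_labels` are hypothetical names, headers are modeled as a `HashMap` keyed by lowercased name, and allowlist entries are assumed to be normalized already at config load.

```rust
use std::collections::HashMap;

/// Hypothetical shape of one `request_labels` entry under `[server.metrics]`;
/// the field names follow the PR summary (`name`, `header`, `allowed`, `fallback`).
#[derive(Debug, Clone, PartialEq)]
pub struct RequestLabel {
    pub name: String,
    pub header: String,
    /// Assumed already trimmed + lowercased when the config was loaded.
    pub allowed: Vec<String>,
    pub fallback: String,
}

/// Sketch of the documented default: a `service` label sourced from
/// `X-StreamKit-Service`, allowlist {tts, stt}, fallback "other".
pub fn default_request_labels() -> Vec<RequestLabel> {
    vec![RequestLabel {
        name: "service".to_string(),
        header: "X-StreamKit-Service".to_string(),
        allowed: vec!["tts".to_string(), "stt".to_string()],
        fallback: "other".to_string(),
    }]
}

/// Resolve every configured label for one request. Each label yields exactly
/// one value, either an allowlisted entry or `fallback`, so cardinality is bounded.
pub fn resolve_labels(
    config: &[RequestLabel],
    headers: &HashMap<String, String>,
) -> Vec<(String, String)> {
    config
        .iter()
        .map(|label| {
            let value = headers
                .get(&label.header.to_ascii_lowercase())
                // Per-request work: normalize only the incoming header value.
                .map(|raw| raw.trim().to_ascii_lowercase())
                .filter(|v| label.allowed.contains(v))
                .unwrap_or_else(|| label.fallback.clone());
            (label.name.clone(), value)
        })
        .collect()
}
```

Because every label resolves to an allowlisted value or `fallback`, each label contributes at most `allowed.len() + 1` values; an empty `allowed` list sends every request to `fallback`.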
Signed-off-by: streamkit-devin <devin@streamkit.dev>
…esolution

Address review on request-metric labels:
- Reject configured label names that collide with built-in metric keys (`status`, `http.method`, `http.route`, `http.status_code`) or duplicate each other, at config load (duplicate Prometheus label keys break scrape).
- Pre-normalize allowlist entries once at config load so the per-request hot path only normalizes the incoming header.
- Resolve labels once in `metrics_middleware` and stash them in request extensions; the oneshot handler reuses them (falls back to resolving when the layer is absent, e.g. unit tests).
- Record `oneshot_pipeline.duration` on the routing-task-aborted error arm for consistent error-path coverage.

Signed-off-by: streamkit-devin <devin@streamkit.dev>
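The review fixes split work between config load and the request path. This is a minimal sketch under assumptions: `LabelConfig`, `validate_and_normalize`, `Extensions`, `ResolvedLabels`, and `labels_for_handler` are hypothetical, and the type-keyed `Extensions` map only stands in for the HTTP framework's request extensions. It shows load-time rejection of reserved or duplicate names, one-time allowlist normalization, and a handler that reuses middleware-resolved labels, resolving again only when none were stashed.

```rust
use std::any::{Any, TypeId};
use std::collections::{HashMap, HashSet};

/// Built-in label keys listed in the commit message; configured labels must not reuse them.
const RESERVED: [&str; 4] = ["status", "http.method", "http.route", "http.status_code"];

/// Hypothetical config entry carrying only the fields these checks need.
pub struct LabelConfig {
    pub name: String,
    pub allowed: Vec<String>,
}

/// Load-time checks: reject reserved or duplicate names first, then normalize
/// every allowlist once so the per-request path only normalizes the header.
pub fn validate_and_normalize(labels: &mut [LabelConfig]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for label in labels.iter() {
        if RESERVED.contains(&label.name.as_str()) {
            return Err(format!("label `{}` collides with a built-in metric key", label.name));
        }
        if !seen.insert(label.name.clone()) {
            return Err(format!("label `{}` is configured more than once", label.name));
        }
    }
    for label in labels.iter_mut() {
        for entry in label.allowed.iter_mut() {
            *entry = entry.trim().to_ascii_lowercase();
        }
    }
    Ok(())
}

/// Minimal stand-in for per-request extensions: a map keyed by type.
#[derive(Default)]
pub struct Extensions {
    map: HashMap<TypeId, Box<dyn Any>>,
}

impl Extensions {
    pub fn insert<T: Any>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(value));
    }

    pub fn get<T: Any>(&self) -> Option<&T> {
        self.map.get(&TypeId::of::<T>()).and_then(|boxed| boxed.downcast_ref::<T>())
    }
}

/// Labels resolved once by the (hypothetical) metrics middleware.
#[derive(Clone, Debug, PartialEq)]
pub struct ResolvedLabels(pub Vec<(String, String)>);

/// Handler side: reuse what the middleware stashed; only call `resolve` when
/// the middleware layer is absent (as in unit tests).
pub fn labels_for_handler(
    ext: &Extensions,
    resolve: impl FnOnce() -> ResolvedLabels,
) -> ResolvedLabels {
    ext.get::<ResolvedLabels>().cloned().unwrap_or_else(resolve)
}
```

Running all checks before normalizing keeps the function from mutating a config it is about to reject.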
Summary
- New request-label dimensions become a `skit.toml` edit, not a recompile.
- Labels are configured under `[server.metrics]` as a list of `request_labels`, each `{ name, header, allowed[], fallback }`. The header value is trimmed, lowercased, and matched against `allowed`; anything outside it, or a missing header, collapses to `fallback` (default `"other"`). This bounds cardinality: clients pick a bucket, they can't create new ones.
- A reusable resolver (`apps/skit/src/metrics_labels.rs`) is applied at two consumers to prove it's general: the shared `http.server.*` request middleware, and `oneshot_pipeline.duration` (all record sites: streaming ok / incomplete-drop, and all three early error paths). The middleware resolves once and stashes the result in request extensions; the oneshot handler reuses it (no double parsing).
- The default config defines a `service` label from `X-StreamKit-Service` with allowlist `{tts, stt}`, so `oneshot_pipeline.duration` gains `service ∈ {tts, stt, other}` out of the box. The gateway should set this header per endpoint (sibling effort).
- Label names that collide with built-in keys (`status`, `http.method`, `http.route`, `http.status_code`) or duplicate each other are rejected (a duplicate label key breaks the Prometheus scrape). Allowlists are normalized once at load so the per-request path only normalizes the incoming header.

Example to add a new dimension without touching Rust:
Review & Validation
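- Resolution: a normalized header value is kept if it is in `allowed`, else `fallback`; empty `allowed` ⇒ always `fallback`.
- Reserved and duplicate label names are rejected at config load (`MetricsConfig::validate`).
- `service` appears at every `oneshot_pipeline.duration` record site and on `http.server.*`.
- The default config (no `[server.metrics]` block) still yields `service ∈ {tts,stt,other}`; see `sample_config_test`.
- `just lint-skit` and `cargo test -p streamkit-server` (metrics_labels + oneshot + config) pass.

Notes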
- The default adds a `service` dimension to `http.server.*` for everyone on upgrade, not just oneshot. Aggregate PromQL is unaffected, but it is a schema change. Discussing with the requester whether to keep default-on or make it opt-in (empty default).
- … `fallback`.
- Changes are scoped to `apps/skit/` only. Dashboard, docs, and the gateway header-setting are owned by sibling sessions; `samples/skit.toml` is unchanged since defaults carry the behavior.

Link to Devin session: https://staging.itsdev.in/sessions/b5750eb705c84d388f3af4f5b6d6940b
Requested by: @streamer45
Devin Review: 3e84014 (HEAD is 010ca67)