feat(observability): plugin/ML dashboard row + standalone speech-gateway dashboard #547
staging-devin-ai-integration[bot] wants to merge 3 commits into
Add Grafana rows for plugin/ML-inference, the speech-gateway metric contract, and a per-service oneshot split, plus documentation for the previously-undocumented plugin metrics and a guide for monitoring the hosted speech services. Signed-off-by: streamkit-devin <devin@streamkit.dev>
📝 Info: Dashboard JSON remains syntactically valid after the large insertion
The dashboard file is a large generated-style JSON artifact, so I validated it mechanically with python3 -m json.tool samples/grafana-dashboard.json; it parsed successfully. I therefore did not flag formatting or syntax issues in the Grafana dashboard changes.
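The same mechanical check can be scripted. A minimal sketch in Python using only the standard library; `validate_dashboard_text` is a hypothetical helper, and the top-level `panels` list is an assumption about the dashboard layout rather than something this review confirms:

```python
import json


def validate_dashboard_text(text):
    """Parse dashboard JSON and return the titles of its top-level panels.

    Raises json.JSONDecodeError on a syntax error, the same class of
    failure that `python3 -m json.tool` reports.
    """
    dashboard = json.loads(text)
    # Assumption: panels live under a top-level "panels" list.
    return [panel.get("title", "") for panel in dashboard.get("panels", [])]


# Tiny stand-in document; the real file would be read from disk instead.
sample = '{"title": "demo", "panels": [{"type": "row", "title": "Plugins / ML inference"}]}'
print(validate_dashboard_text(sample))
```

A parse failure raises before any title is returned, so the helper doubles as a syntax gate in CI.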
    "uid": "${DS_PROMETHEUS}"
  },
  "editorMode": "code",
  "expr": "histogram_quantile(0.95, sum(rate(oneshot_pipeline_duration_bucket{service=~\"tts|stt|other\"}[5m])) by (le, service))",
🟡 Oneshot panels filter on a service label that is never emitted
The new Oneshot Speech Services panels require oneshot_pipeline_duration_* series to have service=tts|stt|other, but the actual oneshot instrumentation only records a status attribute when the response stream completes (apps/skit/src/server/oneshot.rs:450-456) and on error paths (apps/skit/src/server/oneshot.rs:654-667). As a result, the latency query here matches no series at all, and the error-rate panel below cannot split TTS vs STT as advertised; the new dashboard row will be empty/misleading until the server emits a service label or the panels are changed back to status-only metrics.
Prompt for agents
The Oneshot Speech Services dashboard row in samples/grafana-dashboard.json and the corresponding observability docs assume oneshot_pipeline_duration has a service label with values tts/stt/other. The server currently records only a status label in apps/skit/src/server/oneshot.rs. Either add a real service attribute to all oneshot_pipeline.duration histogram recordings, with a well-defined derivation for tts/stt/other, or update the new dashboard/docs to use the existing status-only metric contract so the panels display actual data.
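Why the panel stays empty can be reproduced without a Prometheus server. The sketch below mimics a `label=~"pattern"` matcher in Python; the `status` values and the "proposed" series are hypothetical, chosen only to illustrate the two metric contracts discussed above:

```python
import re


def matches(series_labels, label, pattern):
    """Mimic a PromQL `label=~"pattern"` matcher for one series.

    Prometheus treats a missing label as the empty string and anchors
    the regex at both ends, hence fullmatch against a "" default.
    """
    return re.fullmatch(pattern, series_labels.get(label, "")) is not None


# Series shaped like the current contract: only a status label (values assumed).
current = [{"status": "ok"}, {"status": "error"}]
# Hypothetical series once the server emits a service label.
proposed = [{"status": "ok", "service": "tts"}, {"status": "ok", "service": "stt"}]

pattern = "tts|stt|other"
selected_now = [s for s in current if matches(s, "service", pattern)]
selected_later = [s for s in proposed if matches(s, "service", pattern)]
print(selected_now)          # prints []
print(len(selected_later))   # prints 2
```

Because the empty string does not match `tts|stt|other`, status-only series are filtered out entirely, which is the empty-panel behaviour the comment describes.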
    "uid": "${DS_PROMETHEUS}"
  },
  "editorMode": "code",
  "expr": "sum(rate(gateway_requests_total[1m])) by (endpoint)",
🟡 Gateway panels query metrics that the example gateway never exposes
The new Speech Gateway dashboard row queries gateway_requests_total and related Prometheus metrics, and the docs say the gateway exposes a /metrics endpoint, but the checked-in gateway only registers /stt and /tts handlers (examples/speech-gateway/cmd/gateway/main.go:128-130) and has no Prometheus metrics imports or handler in its imports/setup (examples/speech-gateway/cmd/gateway/main.go:8-25). Importing this dashboard for the provided gateway therefore leaves the entire row empty even when the gateway is running and scraped as instructed.
Prompt for agents
The Speech Gateway row in samples/grafana-dashboard.json and docs/src/content/docs/guides/observability.md assumes examples/speech-gateway exports Prometheus metrics such as gateway_requests_total, gateway_request_duration_seconds, gateway_inflight_requests, gateway_upstream_duration_seconds, and gateway_rejected_total from /metrics. The current Go gateway does not register /metrics or any Prometheus instrumentation. Either implement the advertised gateway metrics in examples/speech-gateway/cmd/gateway/main.go or remove/adjust the dashboard row and documentation so they do not promise panels that can never receive data.
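For reference, this is roughly what a scraper would need to see at `/metrics` for the `gateway_requests_total` panels to receive data. It is a language-neutral illustration written in Python, not the gateway's Go code; the `endpoint` and `code` label names come from the dashboard queries, while the label values and the help text are assumptions:

```python
from collections import Counter

# (endpoint, code) -> request count; label names taken from the dashboard queries.
requests_total = Counter()


def record_request(endpoint, code):
    """Count one handled request (hypothetical helper)."""
    requests_total[(endpoint, str(code))] += 1


def render_metrics():
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP gateway_requests_total Requests handled by the gateway.",
        "# TYPE gateway_requests_total counter",
    ]
    for (endpoint, code), value in sorted(requests_total.items()):
        lines.append(
            f'gateway_requests_total{{endpoint="{endpoint}",code="{code}"}} {value}'
        )
    return "\n".join(lines) + "\n"


# Assumed label values, for illustration only.
record_request("/tts", 200)
record_request("/tts", 200)
record_request("/stt", 500)
print(render_metrics())
```

A real gateway would more likely use a Prometheus client library than hand-rendered text; the point here is only the shape of the series the dashboard expects.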
  "expr": "histogram_quantile(0.50, sum(rate(plugin_call_duration_seconds_bucket[5m])) by (le, plugin_kind))",
  "legendFormat": "p50 {{plugin_kind}}",
  "range": true,
  "refId": "A"
},
{
  "datasource": {
    "type": "prometheus",
    "uid": "${DS_PROMETHEUS}"
  },
  "editorMode": "code",
  "expr": "histogram_quantile(0.95, sum(rate(plugin_call_duration_seconds_bucket[5m])) by (le, plugin_kind))",
  "legendFormat": "p95 {{plugin_kind}}",
  "range": true,
  "refId": "B"
},
{
  "datasource": {
    "type": "prometheus",
    "uid": "${DS_PROMETHEUS}"
  },
  "editorMode": "code",
  "expr": "histogram_quantile(0.99, sum(rate(plugin_call_duration_seconds_bucket[5m])) by (le, plugin_kind))",
📝 Info: Plugin metric queries match the existing OTel instrument contract
I checked the new Plugins / ML inference panels against the native plugin instrumentation. The Rust code records plugin.call.duration with unit s, counters named plugin.calls, plugin.errors, plugin.panics, and plugin.timeouts, and labels plugin.kind plus op (crates/plugin-native/src/metrics.rs:39-61, crates/plugin-native/src/metrics.rs:100-102). That matches the dashboard’s Prometheus names like plugin_call_duration_seconds_bucket, plugin_calls_total, and label plugin_kind after the OTLP→Prometheus rewrite described in the same docs, so I did not flag these plugin panels as a bug.
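The name mapping checked above can be approximated in a few lines. This is a simplified sketch of the OTLP→Prometheus rewrite, not the exporter's actual implementation; the unit table is a small assumed subset, and both function names are hypothetical:

```python
# Assumption: a small subset of the unit suffixes an exporter may append.
UNIT_SUFFIXES = {"s": "seconds", "By": "bytes", "%": "percent"}


def prometheus_name(otel_name, unit="", kind="gauge"):
    """Approximate the OTLP -> Prometheus metric name rewrite."""
    name = otel_name.replace(".", "_")
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not name.endswith("_" + suffix):
        name += "_" + suffix
    # Monotonic counters conventionally gain a _total suffix.
    if kind == "counter" and not name.endswith("_total"):
        name += "_total"
    return name


def prometheus_label(otel_attribute):
    """Dots in attribute keys become underscores in label names."""
    return otel_attribute.replace(".", "_")


print(prometheus_name("plugin.call.duration", unit="s", kind="histogram"))
print(prometheus_name("plugin.calls", kind="counter"))
print(prometheus_label("plugin.kind"))
```

Histogram series additionally carry `_bucket`, `_sum`, and `_count` suffixes, which is why the panels query `plugin_call_duration_seconds_bucket`.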
    "uid": "${DS_PROMETHEUS}"
  },
  "editorMode": "code",
  "expr": "sum(rate(gateway_requests_total[1m])) by (code) / ignoring(code) group_left sum(rate(gateway_requests_total[1m]))",
📝 Info: Gateway status mix aggregates across endpoints by design
The status-mix query divides sum(rate(gateway_requests_total[1m])) by (code) by the global request rate, so it intentionally shows each HTTP code’s share of all gateway traffic rather than per-endpoint status percentages. Given the panel title is just “Gateway Status Mix” and the description says “Share of responses by status code,” this aggregation is consistent and was not treated as a bug.
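The arithmetic behind that query is a per-code sum divided by one global total. A small Python sketch with hypothetical rates; `status_mix` is an illustrative helper, not part of the PR:

```python
from collections import defaultdict


def status_mix(rates):
    """Return each status code's share of the total request rate.

    `rates` maps (endpoint, code) to a per-second rate, standing in for
    rate(gateway_requests_total[1m]).
    """
    by_code = defaultdict(float)
    for (_endpoint, code), rate in rates.items():
        by_code[code] += rate          # sum(...) by (code)
    total = sum(by_code.values())      # sum(...) with no grouping
    if total == 0:
        return {}
    return {code: rate / total for code, rate in by_code.items()}


# Hypothetical rates, for illustration only.
rates = {("/tts", "200"): 6.0, ("/stt", "200"): 3.0, ("/stt", "500"): 1.0}
print(status_mix(rates))
```

With these numbers the `500` share is 10% of all gateway traffic, even though it is 25% of `/stt` traffic alone, which is the endpoint-agnostic behaviour the note describes.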
Per review: keep only the core Plugins / ML inference row in the official samples dashboard, and move the demo-service panels (Speech Gateway + per-service oneshot split) into a self-contained dashboard under examples/speech-gateway/. Revert the official observability docs; document the gateway metrics + dashboard in the service's own README instead. Signed-off-by: streamkit-devin <devin@streamkit.dev>
Summary
Part of the parallel speech-services observability effort. Scoped so the demo service stays decoupled from the official StreamKit dashboard/docs:
- samples/grafana-dashboard.json (official): adds only the Plugins / ML inference row, since it's built purely on core metrics (plugin_call_duration_seconds, plugin_calls_total, plugins_loaded, …) and is valuable on its own. Mirrors the existing styling / ${DS_PROMETHEUS} templating / collapsible-row pattern; the only existing-panel change is a +17 y shift of the 4 collapsed Advanced rows.
- examples/speech-gateway/grafana-dashboard.json (new, self-contained): the demo-service dashboard, with a Speech Gateway row (frozen gateway_* contract from the sibling Go PR), an Oneshot Speech Services row (per-service split of oneshot_pipeline_duration via the service label from the sibling Rust PR), plus a duplicated Plugins / ML inference row so it stands alone.
- examples/speech-gateway/README.md: documents the gateway /metrics contract and how to import the dashboard. Official docs/.../observability.md is intentionally left unchanged.

Metric-name note:
plugin.call.duration (OTel unit s) is queried as plugin_call_duration_seconds_bucket: existing panels confirm the exporter appends the unit (process_memory_usage → _bytes, process_cpu_utilization → _percent); label plugin.kind → plugin_kind.

Review & Validation
- jq passes on both dashboards; official diff is just the Plugins row + the collapsed-row y shift (collapsed Advanced rows remain intact).
- reuse lint passes (.json covered by REUSE.toml).
- gateway_* / service-label names match the sibling PRs.

Notes
The Speech Gateway and Oneshot Speech Services panels (in the standalone dashboard) describe metrics emitted by sibling PRs that may not be merged yet, so they'll show no data until those land. Built against the frozen contract so the dashboard is ready immediately.
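The collapsed-row y shift mentioned in the summary amounts to bumping `gridPos.y` on the affected row panels. A hedged Python sketch of that kind of edit; the row titles below are placeholders rather than the dashboard's real titles, and panels nested inside a collapsed row are deliberately left untouched:

```python
import copy


def shift_rows(panels, titles, dy):
    """Return a copy of `panels` with the named row panels moved down by `dy` grid units."""
    shifted = copy.deepcopy(panels)
    for panel in shifted:
        if panel.get("type") == "row" and panel.get("title") in titles:
            panel["gridPos"]["y"] += dy
    return shifted


# Placeholder panels, for illustration only.
panels = [
    {"type": "row", "title": "Overview", "gridPos": {"h": 1, "w": 24, "x": 0, "y": 0}},
    {"type": "row", "title": "Advanced A", "collapsed": True,
     "gridPos": {"h": 1, "w": 24, "x": 0, "y": 40}},
]
moved = shift_rows(panels, {"Advanced A"}, 17)
print([p["gridPos"]["y"] for p in moved])
```

Working on a deep copy keeps the input intact, so the before and after layouts can be diffed to confirm nothing else moved.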
Link to Devin session: https://staging.itsdev.in/sessions/1c90abbd437a4c3383e43eff412f0c2e
Requested by: @streamer45
Devin Review
f9b96a9 (HEAD is ebbf40f)