
feat(observability): plugin/ML dashboard row + standalone speech-gateway dashboard#547

Open
staging-devin-ai-integration[bot] wants to merge 3 commits into main from devin/1780157741-speech-observability

Conversation

Contributor

staging-devin-ai-integration Bot commented on May 30, 2026

Summary

Part of the parallel speech-services observability effort. Scoped so the demo service stays decoupled from the official StreamKit dashboard/docs:

  • samples/grafana-dashboard.json (official) — adds only the Plugins / ML inference row, since it's built purely on core metrics (plugin_call_duration_seconds, plugin_calls_total, plugins_loaded, …) and is valuable on its own. Mirrors the existing styling / ${DS_PROMETHEUS} templating / collapsible-row pattern; the only existing-panel change is a +17 y shift of the 4 collapsed Advanced rows.
  • examples/speech-gateway/grafana-dashboard.json (new, self-contained) — the demo-service dashboard: Speech Gateway row (frozen gateway_* contract from the sibling Go PR), Oneshot Speech Services row (per-service split of oneshot_pipeline_duration via the service label from the sibling Rust PR), plus a duplicated Plugins / ML inference row so it stands alone.
  • examples/speech-gateway/README.md — documents the gateway /metrics contract and how to import the dashboard. Official docs/.../observability.md is intentionally left unchanged.
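The "+17 y shift" mentioned in the first bullet can be reproduced mechanically rather than by hand. The following is a minimal sketch, assuming the usual Grafana layout in which each panel carries a `gridPos` object and collapsed rows keep their children in a nested `panels` list; the miniature dashboard and the helper name `shift_rows` are hypothetical and not taken from this PR.

```python
import copy

def shift_rows(dashboard, from_y, delta):
    """Return a copy of the dashboard with every panel at or below from_y moved down by delta."""
    shifted = copy.deepcopy(dashboard)

    def visit(panels):
        for panel in panels:
            pos = panel.get("gridPos")
            if pos is not None and pos.get("y", 0) >= from_y:
                pos["y"] = pos.get("y", 0) + delta
            # Collapsed rows keep their child panels in a nested "panels" list.
            visit(panel.get("panels", []))

    visit(shifted.get("panels", []))
    return shifted

# Hypothetical miniature dashboard: one ordinary row and one collapsed row.
dashboard = {"panels": [
    {"type": "row", "title": "Overview", "gridPos": {"h": 1, "w": 24, "x": 0, "y": 0}},
    {"type": "row", "title": "Advanced A", "collapsed": True,
     "gridPos": {"h": 1, "w": 24, "x": 0, "y": 30},
     "panels": [{"type": "timeseries", "gridPos": {"h": 8, "w": 12, "x": 0, "y": 31}}]},
]}

# Make room for a new 17-unit-tall row inserted at y=30.
shifted = shift_rows(dashboard, from_y=30, delta=17)
```

Working on a deep copy keeps the original document untouched, which makes it easy to diff the two and confirm that only the `y` values changed.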

Metric-name note: plugin.call.duration (OTel unit s) is queried as plugin_call_duration_seconds_bucket; existing panels confirm the exporter appends the unit (process_memory_usage_bytes, process_cpu_utilization_percent). The label plugin.kind becomes plugin_kind.
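The renaming described in the note can be sketched as a small function. This is an illustration of the naming convention only, not the exporter's actual code: the unit table covers just the three units that appear in the names quoted above, and the function names are hypothetical.

```python
# Assumed unit-to-suffix table, limited to the units seen in this PR's metric names.
UNIT_SUFFIXES = {"s": "seconds", "By": "bytes", "%": "percent"}

def prometheus_name(otel_name, unit="", kind="gauge"):
    """Approximate the Prometheus base name for an OTel instrument."""
    name = otel_name.replace(".", "_")
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not name.endswith("_" + suffix):
        name += "_" + suffix
    # Monotonic counters conventionally gain a _total suffix.
    if kind == "counter" and not name.endswith("_total"):
        name += "_total"
    return name

def prometheus_label(otel_attribute):
    """Attribute keys have their dots replaced by underscores."""
    return otel_attribute.replace(".", "_")

histogram_series = prometheus_name("plugin.call.duration", unit="s", kind="histogram") + "_bucket"
counter_series = prometheus_name("plugin.calls", kind="counter")
label = prometheus_label("plugin.kind")
```

Histograms additionally fan out into `_bucket`, `_sum`, and `_count` series, which is why the dashboard queries the `_bucket` form.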

Review & Validation

  • jq passes on both dashboards; official diff is just the Plugins row + the collapsed-row y shift (collapsed Advanced rows remain intact).
  • reuse lint passes (.json covered by REUSE.toml).
  • Confirm the gateway_* / service-label names match the sibling PRs.

Notes

The Speech Gateway and Oneshot Speech Services panels (in the standalone dashboard) describe metrics emitted by sibling PRs that may not be merged yet, so they will show no data until those land. The dashboard is built against the frozen contract, so it is ready as soon as the sibling PRs merge.

Link to Devin session: https://staging.itsdev.in/sessions/1c90abbd437a4c3383e43eff412f0c2e
Requested by: @streamer45


Devin Review

Status Commit
🕐 Outdated f9b96a9 (HEAD is ebbf40f)


Add Grafana rows for plugin/ML-inference, the speech-gateway metric contract, and a per-service oneshot split, plus documentation for the previously-undocumented plugin metrics and a guide for monitoring the hosted speech services.

Signed-off-by: streamkit-devin <devin@streamkit.dev>
@staging-devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

Contributor Author

@staging-devin-ai-integration staging-devin-ai-integration Bot left a comment


Devin Review found 5 potential issues.


📝 Info: Dashboard JSON remains syntactically valid after the large insertion

The dashboard file is a large generated-style JSON artifact, so I validated it mechanically with python3 -m json.tool samples/grafana-dashboard.json; it parsed successfully. I therefore did not flag formatting or syntax issues in the Grafana dashboard changes.
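A comparable check can be scripted when it needs to run in CI. This is a minimal sketch, assuming only that a dashboard is a JSON object with a top-level `panels` list; the function name `validate_dashboard` is hypothetical.

```python
import json

def validate_dashboard(text):
    """Return (ok, message) for a dashboard JSON document given as text."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(doc, dict) or not isinstance(doc.get("panels"), list):
        return False, "top level is not an object with a 'panels' list"
    return True, f"{len(doc['panels'])} top-level panels"

ok_result = validate_dashboard('{"panels": []}')
bad_result = validate_dashboard('{"panels": [}')
```

Like `json.tool`, this only proves the file parses; it says nothing about whether the queries inside return data.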


Comment thread samples/grafana-dashboard.json Outdated
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "histogram_quantile(0.95, sum(rate(oneshot_pipeline_duration_bucket{service=~\"tts|stt|other\"}[5m])) by (le, service))",

🟡 Oneshot panels filter on a service label that is never emitted

The new Oneshot Speech Services panels require oneshot_pipeline_duration_* series to have service=tts|stt|other, but the actual oneshot instrumentation only records a status attribute when the response stream completes (apps/skit/src/server/oneshot.rs:450-456) and on error paths (apps/skit/src/server/oneshot.rs:654-667). As a result, the latency query here matches no series at all, and the error-rate panel below cannot split TTS vs STT as advertised; the new dashboard row will be empty/misleading until the server emits a service label or the panels are changed back to status-only metrics.

Prompt for agents
The Oneshot Speech Services dashboard row in samples/grafana-dashboard.json and the corresponding observability docs assume oneshot_pipeline_duration has a service label with values tts/stt/other. The server currently records only a status label in apps/skit/src/server/oneshot.rs. Either add a real service attribute to all oneshot_pipeline.duration histogram recordings, with a well-defined derivation for tts/stt/other, or update the new dashboard/docs to use the existing status-only metric contract so the panels display actual data.
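The empty result follows from how Prometheus evaluates label matchers: an absent label is treated as the empty string, and regex matchers are anchored at both ends, so `service=~"tts|stt|other"` cannot match a series that has no `service` label. A minimal sketch of that rule follows; the sample series and its `status` value are hypothetical.

```python
import re

def regex_matcher(label, pattern):
    """Build a predicate that mimics a PromQL `label=~"pattern"` matcher."""
    compiled = re.compile(pattern)

    def matches(series_labels):
        # An absent label is treated as the empty string, and the regex is fully anchored.
        return compiled.fullmatch(series_labels.get(label, "")) is not None

    return matches

has_service = regex_matcher("service", "tts|stt|other")

# Hypothetical series carrying only the status label described in the comment above.
status_only = {"__name__": "oneshot_pipeline_duration_bucket", "status": "ok", "le": "0.5"}
with_service = dict(status_only, service="tts")

selected = [s for s in (status_only,) if has_service(s)]
```

Once the server attaches a `service` attribute, the same matcher selects the series without any dashboard change.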

Comment thread samples/grafana-dashboard.json Outdated
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "sum(rate(gateway_requests_total[1m])) by (endpoint)",

🟡 Gateway panels query metrics that the example gateway never exposes

The new Speech Gateway dashboard row queries gateway_requests_total and related Prometheus metrics, and the docs say the gateway exposes a /metrics endpoint. However, the checked-in gateway only registers /stt and /tts handlers (examples/speech-gateway/cmd/gateway/main.go:128-130), and it neither imports a Prometheus client nor sets up a metrics handler (examples/speech-gateway/cmd/gateway/main.go:8-25). Importing this dashboard for the provided gateway therefore leaves the entire row empty even when the gateway is running and scraped as instructed.

Prompt for agents
The Speech Gateway row in samples/grafana-dashboard.json and docs/src/content/docs/guides/observability.md assumes examples/speech-gateway exports Prometheus metrics such as gateway_requests_total, gateway_request_duration_seconds, gateway_inflight_requests, gateway_upstream_duration_seconds, and gateway_rejected_total from /metrics. The current Go gateway does not register /metrics or any Prometheus instrumentation. Either implement the advertised gateway metrics in examples/speech-gateway/cmd/gateway/main.go or remove/adjust the dashboard row and documentation so they do not promise panels that can never receive data.
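For reference, this is roughly the scrape output the gateway panels expect to find at /metrics. The sketch below renders one counter in the Prometheus text exposition format; it is written in Python for illustration only (the gateway itself is Go), the `endpoint` and `code` label names come from the dashboard queries, and the label values and counts are hypothetical.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_text = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"

# Hypothetical request counts keyed by (endpoint, code) label pairs.
samples = {
    (("endpoint", "/stt"), ("code", "200")): 12,
    (("endpoint", "/tts"), ("code", "200")): 30,
    (("endpoint", "/tts"), ("code", "503")): 1,
}

text = render_counter("gateway_requests_total", "Requests handled by the gateway.", samples)
```

A real implementation would normally use a Prometheus client library rather than formatting lines by hand; the point here is only the shape of the series the panels query.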

Comment on lines +1752 to +1774
"expr": "histogram_quantile(0.50, sum(rate(plugin_call_duration_seconds_bucket[5m])) by (le, plugin_kind))",
"legendFormat": "p50 {{plugin_kind}}",
"range": true,
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "histogram_quantile(0.95, sum(rate(plugin_call_duration_seconds_bucket[5m])) by (le, plugin_kind))",
"legendFormat": "p95 {{plugin_kind}}",
"range": true,
"refId": "B"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "histogram_quantile(0.99, sum(rate(plugin_call_duration_seconds_bucket[5m])) by (le, plugin_kind))",

📝 Info: Plugin metric queries match the existing OTel instrument contract

I checked the new Plugins / ML inference panels against the native plugin instrumentation. The Rust code records plugin.call.duration with unit s, counters named plugin.calls, plugin.errors, plugin.panics, and plugin.timeouts, and labels plugin.kind plus op (crates/plugin-native/src/metrics.rs:39-61, crates/plugin-native/src/metrics.rs:100-102). That matches the dashboard’s Prometheus names like plugin_call_duration_seconds_bucket, plugin_calls_total, and label plugin_kind after the OTLP→Prometheus rewrite described in the same docs, so I did not flag these plugin panels as a bug.
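As background for the p50/p95/p99 panels, `histogram_quantile` estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The following is a simplified sketch of that estimate, assuming non-negative observations and a final `+Inf` bucket; the bucket bounds and counts are hypothetical, and the real function handles more edge cases.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs ending in +Inf."""
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            if count == prev_count:
                return bound
            # Linear interpolation between the bucket's lower and upper bounds.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan

# Hypothetical plugin call latencies in seconds, as cumulative bucket counts.
buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
p50 = histogram_quantile(0.50, buckets)
p95 = histogram_quantile(0.95, buckets)
```

Here the p95 rank lands in the `+Inf` bucket, so the estimate is capped at the highest finite bound. That is a reminder that bucket boundaries limit how precise the tail panels can be.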


Comment thread samples/grafana-dashboard.json Outdated
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "sum(rate(gateway_requests_total[1m])) by (code) / ignoring(code) group_left sum(rate(gateway_requests_total[1m]))",

📝 Info: Gateway status mix aggregates across endpoints by design

The status-mix query divides sum(rate(gateway_requests_total[1m])) by (code) by the global request rate, so it intentionally shows each HTTP code’s share of all gateway traffic rather than per-endpoint status percentages. Given the panel title is just “Gateway Status Mix” and the description says “Share of responses by status code,” this aggregation is consistent and was not treated as a bug.
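The arithmetic behind the status-mix query can be sketched as follows: rates are summed per status code across all endpoints, then divided by the overall request rate. The per-endpoint rates below are hypothetical.

```python
from collections import defaultdict

def status_mix(rates):
    """rates: {(endpoint, code): requests_per_second}. Returns {code: share of all traffic}."""
    per_code = defaultdict(float)
    for (_endpoint, code), rate in rates.items():
        per_code[code] += rate  # sum(...) by (code)
    total = sum(per_code.values())  # global request rate
    if total == 0:
        return {}
    return {code: rate / total for code, rate in per_code.items()}

# Hypothetical request rates for the two gateway endpoints.
rates = {("/stt", "200"): 3.0, ("/tts", "200"): 5.0, ("/tts", "503"): 2.0}
mix = status_mix(rates)
```

With these numbers the 503 share is 20% of all gateway traffic, even though every 503 comes from /tts. That is the per-endpoint detail the panel deliberately does not show.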


Per review: keep only the core Plugins / ML inference row in the official samples dashboard, and move the demo-service panels (Speech Gateway + per-service oneshot split) into a self-contained dashboard under examples/speech-gateway/. Revert the official observability docs; document the gateway metrics + dashboard in the service's own README instead.

Signed-off-by: streamkit-devin <devin@streamkit.dev>
staging-devin-ai-integration Bot changed the title from "feat(observability): dashboard rows + docs for speech-service metrics" to "feat(observability): plugin/ML dashboard row + standalone speech-gateway dashboard" on May 30, 2026