
feat: embedding metrics, Grafana dashboard overhaul & monitoring stack #137

Merged
XuPeng-SH merged 2 commits into main from feature/embedding-metrics-dashboard
Mar 25, 2026


@gouhongshen
Contributor

Summary

Adds embedding latency/error observability, completely rebuilds the Grafana dashboard, and provides an opt-in monitoring stack via docker-compose.

Metrics Code

  • InstrumentedEmbedder: transparent wrapper around any EmbeddingProvider that records per-provider latency histograms and error counters
  • Metrics module refactor: split monolithic metrics into sub-modules (http, embedding, worker, security, render, types, middleware)
  • Source detection middleware: classifies traffic as api/mcp/sdk/internal via header inspection
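The wrapper idea can be illustrated with a minimal Python sketch. This is not the PR's code: the `embed` signature, the `name` attribute, and the in-memory `EmbeddingMetrics` class (standing in for a real Prometheus histogram and counter) are assumptions made for illustration only.

```python
import time
from collections import defaultdict
from typing import Protocol


class EmbeddingProvider(Protocol):
    """Hypothetical provider interface; the real signature may differ."""

    name: str

    def embed(self, text: str) -> list[float]: ...


class EmbeddingMetrics:
    """In-memory stand-in for a latency histogram and an error counter."""

    def __init__(self) -> None:
        self.latency_seconds: dict[str, list[float]] = defaultdict(list)
        self.errors_total: dict[str, int] = defaultdict(int)


class InstrumentedEmbedder:
    """Wraps any provider and records latency and errors per provider name."""

    def __init__(self, inner: EmbeddingProvider, metrics: EmbeddingMetrics) -> None:
        self.inner = inner
        self.name = inner.name
        self.metrics = metrics

    def embed(self, text: str) -> list[float]:
        start = time.perf_counter()
        try:
            return self.inner.embed(text)
        except Exception:
            # Count the failure, then re-raise so callers see the original error.
            self.metrics.errors_total[self.name] += 1
            raise
        finally:
            # Latency is observed for both successful and failed calls.
            elapsed = time.perf_counter() - start
            self.metrics.latency_seconds[self.name].append(elapsed)
```

Because the wrapper exposes the same `embed` method as the provider it wraps, call sites should not need to change, which is roughly what "transparent" means here.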

Grafana Dashboard

  • Complete rebuild: 5 categorized rows (Overview, HTTP Traffic, HTTP Latency, Embedding, Infrastructure) with 37 panels
  • Split p50/p95/p99 into separate panels for both HTTP and Embedding latency
  • English ⓘ descriptions on all panels with route reference legends
  • New panels: request rate by route, source/status pie charts, entity queue pressure, build info
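For readers unfamiliar with how percentile panels are typically derived from a latency histogram, here is a rough Python sketch of bucket-based quantile estimation with linear interpolation, similar in spirit to what Prometheus does. It assumes the dashboard's p50/p95/p99 panels are fed from cumulative histogram buckets; the bucket bounds and counts below are invented for illustration and do not come from the PR.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with (math.inf, total).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank and count > prev_count:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Illustrative data only: 100 observations, bounds in seconds.
example = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
p50 = histogram_quantile(0.50, example)
p95 = histogram_quantile(0.95, example)
p99 = histogram_quantile(0.99, example)
```

Since each estimate is interpolated within a bucket, its accuracy depends on how finely the bucket bounds cover the latency range of interest.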

Monitoring Stack (opt-in, zero impact on normal workflow)

  • Prometheus + Grafana services under monitoring docker-compose profile
  • Enable with: docker compose --profile monitoring up -d
  • Entity worker env vars (ENTITY_WORKER_COUNT/POOL_SIZE/QUEUE_SIZE) exposed for tuning
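As a rough illustration of how an opt-in profile keeps the default workflow untouched, a compose fragment might look like the following. This is a hypothetical sketch, not the PR's actual file: the service names, images, and ports are assumptions, and only one of the entity worker variables is shown.

```yaml
# Hypothetical docker-compose.yml fragment; names, images, and ports are assumptions.
services:
  app:
    # image/build settings omitted
    environment:
      - ENTITY_WORKER_COUNT   # passed through from the host shell if set

  prometheus:
    image: prom/prometheus
    profiles: ["monitoring"]  # started only when the profile is enabled
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana
    profiles: ["monitoring"]
    ports:
      - "3000:3000"
```

Services without a `profiles` key start on a plain `docker compose up`, while services assigned to the `monitoring` profile start only when that profile is requested.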

Load Test Validation

  • Tested with 1000 users / 150 concurrent / 120s: 0 pool timeouts, 99.4% success rate
  • p50 latency improved ~50% vs pre-optimization baseline (upstream async edit-log + pool isolation)

Metrics:
- InstrumentedEmbedder wrapper: transparent proxy that records
  embedding latency (histogram) and error count (counter) per provider
- Refactored metrics module into sub-modules (http, embedding, worker,
  security, render, types, middleware) for maintainability
- Source detection middleware (api/mcp/sdk/internal) via header inspection
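A minimal Python sketch of header-based source classification follows. The header name `x-client-source` and the User-Agent fallback are hypothetical; the PR does not say which headers the middleware actually inspects.

```python
from typing import Mapping

# Hypothetical header name and values; the real ones are not listed in the PR.
CLIENT_HEADER = "x-client-source"
KNOWN_SOURCES = ("mcp", "sdk", "internal")


def detect_source(headers: Mapping[str, str]) -> str:
    """Classify a request as api/mcp/sdk/internal from its headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    declared = normalized.get(CLIENT_HEADER, "")
    if declared in KNOWN_SOURCES:
        return declared
    # Fall back to a substring check on the User-Agent (an assumption).
    user_agent = normalized.get("user-agent", "")
    for source in ("mcp", "sdk"):
        if source in user_agent:
            return source
    # Anything unrecognized is treated as plain API traffic.
    return "api"
```

Mapping unknown values to a fixed default keeps the set of label values bounded, which matters when the result is used as a metric label.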

Grafana Dashboard:
- Complete rebuild: 5 categorized rows (Overview, HTTP Traffic,
  HTTP Latency, Embedding, Infrastructure) with 37 panels
- Split p50/p95/p99 into separate panels for HTTP and Embedding latency
- English descriptions on all panels with route reference legends
- Added pie charts, entity queue pressure, build info panels

Monitoring Stack (opt-in):
- Prometheus + Grafana services under 'monitoring' docker-compose profile
- Zero impact on normal 'docker compose up' workflow
- Entity worker env vars (ENTITY_WORKER_COUNT/POOL_SIZE/QUEUE_SIZE)
  exposed via docker-compose for tuning

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@gouhongshen force-pushed the feature/embedding-metrics-dashboard branch from 8f2e1b8 to 1d9e95d (March 25, 2026 11:21)
@XuPeng-SH self-requested a review (March 25, 2026 11:44)
@XuPeng-SH enabled auto-merge (squash) (March 25, 2026 11:45)
@XuPeng-SH merged commit 597a431 into main (Mar 25, 2026)
4 checks passed
2 participants