Feature: Add additional Prometheus metrics and Grafana dashboard (#207) #496
Merged
filthyrake merged 2 commits into dev on Jan 3, 2026
Add 5 new Prometheus metrics for enhanced observability:
- HTTP_REQUESTS_IN_PROGRESS gauge (low-cardinality by API name)
- VIDEOS_WATCH_TIME_SECONDS_TOTAL counter
- WORKER_JOBS_COMPLETED_TOTAL counter (by worker_name)
- WORKER_HEARTBEAT_AGE_SECONDS gauge (by worker_name)
- STORAGE_VIDEOS_BYTES gauge with periodic reconciliation

Implementation highlights:
- Pure ASGI HTTPMetricsMiddleware for 6x better performance
- Endpoint path normalization to prevent cardinality explosion
- Background task updates heartbeat ages every 30s (no DB query on /metrics)
- Storage reconciliation scans filesystem every 6 hours
- Instrument existing but unused HTTP and transcoding metrics

Also includes:
- Grafana dashboard JSON with panels for API, transcoding, workers, storage
- Tests for all new metrics and middleware

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
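To illustrate what "pure ASGI" middleware means here, below is a minimal sketch, not the PR's code. Unlike Starlette's `BaseHTTPMiddleware`, it wraps the ASGI app callable directly and observes the status code by intercepting `send`. The `classify_api` prefixes, the `api`/`method`/`status` label set, and the `StubMetric` stand-in, which mimics the `.labels(...).inc()/.dec()` shape of `prometheus_client` metrics so the sketch runs without dependencies, are all assumptions.

```python
import asyncio
from collections import defaultdict


class _StubChild:
    def __init__(self, store, key):
        self._store, self._key = store, key

    def inc(self, amount: float = 1.0) -> None:
        self._store[self._key] += amount

    def dec(self, amount: float = 1.0) -> None:
        self._store[self._key] -= amount


class StubMetric:
    """Hypothetical dict-backed stand-in for a prometheus_client Gauge/Counter."""

    def __init__(self):
        self.values = defaultdict(float)

    def labels(self, **labels):
        return _StubChild(self.values, tuple(sorted(labels.items())))


def classify_api(path: str) -> str:
    # Hypothetical prefixes: the PR only names the admin/worker/public groups.
    if path.startswith("/api/admin"):
        return "admin"
    if path.startswith("/api/worker"):
        return "worker"
    return "public"


class HTTPMetricsMiddleware:
    """Pure ASGI middleware: wraps the app callable directly, no BaseHTTPMiddleware."""

    def __init__(self, app, in_progress, requests_total):
        self.app = app
        self.in_progress = in_progress        # gauge labeled by api
        self.requests_total = requests_total  # counter labeled by api/method/status

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":  # pass lifespan/websocket scopes through untouched
            await self.app(scope, receive, send)
            return

        api = classify_api(scope["path"])
        status = {"code": 500}  # reported if the app raises before sending a response

        async def send_wrapper(message):
            # Capture the status code from the response-start message.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        gauge = self.in_progress.labels(api=api)
        gauge.inc()
        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            gauge.dec()  # always undo the in-progress increment, even on errors
            self.requests_total.labels(
                api=api, method=scope["method"], status=str(status["code"])
            ).inc()


async def demo_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def main():
    in_progress, total = StubMetric(), StubMetric()
    app = HTTPMetricsMiddleware(demo_app, in_progress, total)
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "path": "/api/admin/videos", "method": "GET"}, receive, send)
    return in_progress, total, sent


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a real service, the two metric arguments would be `prometheus_client` `Gauge`/`Counter` objects declared with matching label names. Because only the coarse `api` group is used as a label, the number of series stays fixed no matter how many distinct URLs are requested.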
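The heartbeat-age gauge is refreshed by a background task rather than on each `/metrics` scrape. The sketch below shows one update pass that sets per-worker ages and removes only the series of workers that disappeared, instead of clearing the whole metric. It assumes the task has already fetched a mapping from worker name to last-heartbeat Unix timestamp; the PR's query and its `fetch_all_with_retry` helper are not shown, so they are left out. `update_heartbeat_ages`, `StubGauge`, and the argument shapes are hypothetical. `StubGauge` mimics the `labels(...).set()` and positional `remove(...)` calls of a `prometheus_client` `Gauge`.

```python
import time


class _GaugeChild:
    def __init__(self, values, key):
        self._values, self._key = values, key

    def set(self, value: float) -> None:
        self._values[self._key] = value


class StubGauge:
    """Hypothetical stand-in for a Gauge with a single worker_name label."""

    def __init__(self):
        self.values = {}

    def labels(self, worker_name):
        return _GaugeChild(self.values, worker_name)

    def remove(self, worker_name):
        self.values.pop(worker_name, None)


def update_heartbeat_ages(gauge, last_heartbeats, exported, now=None):
    """One refresh pass; a caller would run this periodically (every 30s in the PR).

    last_heartbeats: worker_name -> last heartbeat Unix timestamp (assumed shape).
    exported: worker names exported on the previous pass. Returns the new set.
    """
    now = time.time() if now is None else now
    for name, ts in last_heartbeats.items():
        gauge.labels(worker_name=name).set(max(0.0, now - ts))  # clamp clock skew
    current = set(last_heartbeats)
    for stale in exported - current:
        gauge.remove(stale)  # selective removal instead of clearing every series
    return current
```

Removing only the departed workers' series means a scrape that lands mid-update still sees every live worker. Clearing first and then repopulating would briefly expose an empty metric, which is one plausible form of the race the PR mentions.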
Critical fixes:
- Add `api` label to HTTP_REQUESTS_TOTAL for Grafana dashboard compatibility
- Fix storage reconciliation with timeout, symlink protection, and partial failure handling
- Add database retry logic (fetch_all_with_retry) to background task
- Fix storage metric for overwritten segments (track net change)

High priority fixes:
- Add LRU cache to normalize_endpoint() for 95%+ allocation reduction
- Replace _metrics.clear() with selective label removal to avoid race conditions
- Add worker name label sanitization to prevent label injection
- Add background task health metrics (errors, last_success, duration)

Medium priority improvements:
- Improve normalize_endpoint with UUID and slug pattern detection
- Make reconciliation interval configurable via VLOG_STORAGE_RECONCILIATION_INTERVAL
- Add VLOG_STORAGE_SCAN_TIMEOUT and VLOG_STORAGE_SCAN_MAX_FILES configs
- Add comprehensive tests for new features

New metrics added:
- BACKGROUND_TASK_ERRORS_TOTAL
- BACKGROUND_TASK_LAST_SUCCESS
- BACKGROUND_TASK_DURATION_SECONDS
- STORAGE_RECONCILIATION_STATUS

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
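The label-hygiene fixes (cached `normalize_endpoint()` with UUID and slug detection, plus worker-name sanitization) can be sketched as below. This is a minimal illustration, not the PR's implementation. The placeholder strings, the slug heuristic (lowercase alphanumeric words joined by hyphens), the cache size, the allowed character set, the 64-character cap, and the `sanitize_worker_name` name are all assumptions. `functools.lru_cache` is the standard-library LRU cache.

```python
import re
from functools import lru_cache

_UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_INT_RE = re.compile(r"\d+")
# Hypothetical slug heuristic: lowercase alphanumeric words joined by hyphens.
_SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)+")
# Hypothetical allow-list for label values.
_UNSAFE_LABEL_CHARS = re.compile(r"[^A-Za-z0-9_.-]")


@lru_cache(maxsize=1024)  # repeated paths return the cached string, no re-splitting
def normalize_endpoint(path: str) -> str:
    """Collapse per-entity path segments into placeholders to bound label cardinality."""
    parts = []
    for segment in path.strip("/").split("/"):
        # UUID check runs first: lowercase UUIDs would also match the slug pattern.
        if _UUID_RE.fullmatch(segment):
            parts.append("{uuid}")
        elif _INT_RE.fullmatch(segment):
            parts.append("{id}")
        elif _SLUG_RE.fullmatch(segment):
            parts.append("{slug}")
        else:
            parts.append(segment)
    return "/" + "/".join(parts)


def sanitize_worker_name(name: str, max_len: int = 64) -> str:
    """Replace characters outside the allow-list and cap the length."""
    cleaned = _UNSAFE_LABEL_CHARS.sub("_", name)[:max_len]
    return cleaned or "unknown"  # never export an empty label value
```

With this heuristic, `/api/videos/123` and `/api/videos/456` both map to `/api/videos/{id}`. The tradeoff is that any static route segment containing a hyphen would also be collapsed to `{slug}`, so a real implementation would need to account for the project's actual route names.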
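The hardened storage reconciliation (timeout, symlink protection, file cap, partial-failure handling) can be sketched as an iterative `os.scandir` walk. This is a sketch under assumptions, not the PR's code. The `scan_storage` and `ScanResult` names, the default limits, and the choice to keep going after per-entry errors while marking the scan incomplete are all hypothetical. The two limits play the role of `VLOG_STORAGE_SCAN_TIMEOUT` and `VLOG_STORAGE_SCAN_MAX_FILES`, whose real defaults are not given in the PR.

```python
import os
import tempfile
import time
from dataclasses import dataclass, field


@dataclass
class ScanResult:
    total_bytes: int = 0
    files: int = 0
    complete: bool = True  # False if the timeout, file cap, or any error was hit
    errors: list = field(default_factory=list)


def scan_storage(root: str, timeout_s: float = 300.0, max_files: int = 1_000_000) -> ScanResult:
    """Sum regular-file sizes under root without following symlinks (defaults are placeholders)."""
    result = ScanResult()
    deadline = time.monotonic() + timeout_s
    stack = [root]  # iterative walk: no recursion-depth limit on deep trees
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if time.monotonic() > deadline or result.files >= max_files:
                        result.complete = False  # partial total: caller should not publish it
                        return result
                    try:
                        if entry.is_symlink():
                            continue  # never count or descend through symlinks
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            result.total_bytes += entry.stat(follow_symlinks=False).st_size
                            result.files += 1
                    except OSError as exc:  # e.g. a file deleted mid-scan
                        result.errors.append(f"{entry.path}: {exc}")
                        result.complete = False
        except OSError as exc:  # unreadable directory: record it and keep walking
            result.errors.append(f"{current}: {exc}")
            result.complete = False
    return result


def demo():
    """Build a small tree (two files, one nested file, one symlink) and scan it twice."""
    with tempfile.TemporaryDirectory() as root:
        for rel, size in [("a.ts", 10), ("b.ts", 20), ("sub/c.ts", 5)]:
            path = os.path.join(root, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as fh:
                fh.write(b"x" * size)
        os.symlink(os.path.join(root, "a.ts"), os.path.join(root, "link.ts"))
        return scan_storage(root), scan_storage(root, max_files=1)
```

A caller could set `STORAGE_VIDEOS_BYTES` only when `result.complete` is true and report the outcome through `STORAGE_RECONCILIATION_STATUS`. The PR does not show how it encodes that status.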
Summary
- `HTTP_REQUESTS_IN_PROGRESS` gauge (low-cardinality, by API name)
- `VIDEOS_WATCH_TIME_SECONDS_TOTAL` counter
- `WORKER_JOBS_COMPLETED_TOTAL` counter (by `worker_name`)
- `WORKER_HEARTBEAT_AGE_SECONDS` gauge (by `worker_name`)
- `STORAGE_VIDEOS_BYTES` gauge with periodic reconciliation
- Pure ASGI `HTTPMetricsMiddleware` for 6x better performance than `BaseHTTPMiddleware`

Technical Details
- `api` label (admin/worker/public) instead of full endpoint paths
- `worker_name` instead of UUID for lower cardinality

Test plan
Closes #207
🤖 Generated with Claude Code