Add operational improvements: metrics, backups, log rotation (Issue #414, PR #3) #424
Merged
filthyrake merged 2 commits into dev on Dec 27, 2025
This is the third and final PR addressing the infrastructure review (Issue #414).

## Prometheus Metrics
- Add prometheus-client dependency to pyproject.toml
- Create api/metrics.py with comprehensive metrics:
  - HTTP request metrics (requests total, duration)
  - Video metrics (total videos, uploads)
  - Transcoding metrics (jobs total/active, duration, queue size)
  - Worker metrics (total workers, heartbeats)
  - Re-encode queue metrics
  - Database metrics (connections, retries, query duration)
  - Redis metrics (operations, circuit breaker state)
  - Storage and playback metrics
- Add /metrics endpoint to Admin API (port 9001)
- Add /api/metrics endpoint to Worker API (port 9002)

## Automated Database Backups
- Add k8s/backup-cronjob.yaml for PostgreSQL backups
- Runs daily at 2:00 AM UTC
- Uses pg_dump with compression
- 7-day retention with automatic cleanup
- Proper security context (non-root, seccompProfile)
- Mounts NAS storage for backup destination

## Audit Log Rotation
- Add AUDIT_LOG_MAX_BYTES config (default: 10MB)
- Add AUDIT_LOG_BACKUP_COUNT config (default: 5 backups)
- Replace FileHandler with RotatingFileHandler in api/audit.py
- Prevents unbounded log growth

## Issue #414 Checklist Completion
- [x] PR #1: Security scanning (Trivy, pip-audit, .dockerignore, multi-stage builds)
- [x] PR #2: Kubernetes security (pinned images, seccompProfile, NetworkPolicy)
- [x] PR #3: Operational improvements (this PR)
  - [x] Prometheus metrics endpoints
  - [x] Automated database backups
  - [x] Audit log rotation
- [x] Docker build/push workflow (already existed)
- [x] Database connection pooling (already existed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
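For reviewers unfamiliar with prometheus-client, the metric types listed above (counter, histogram, gauge) could be declared roughly as follows. This is a minimal sketch, not the contents of api/metrics.py: the metric names, label names, and the `record_request` and `render_metrics` helpers are assumptions made for illustration.

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    Gauge,
    Histogram,
    generate_latest,
)

# A dedicated registry keeps this sketch isolated from the global default.
registry = CollectorRegistry()

# Hypothetical metric names; the real definitions live in api/metrics.py.
HTTP_REQUESTS = Counter(
    "vlog_http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"],
    registry=registry,
)
HTTP_DURATION = Histogram(
    "vlog_http_request_duration_seconds",
    "HTTP request duration in seconds",
    ["method", "endpoint"],
    registry=registry,
)
TRANSCODE_ACTIVE = Gauge(
    "vlog_transcoding_jobs_active",
    "Transcoding jobs currently running",
    registry=registry,
)


def record_request(method: str, endpoint: str, status: int, seconds: float) -> None:
    """Count one request and record how long it took."""
    HTTP_REQUESTS.labels(method=method, endpoint=endpoint, status=str(status)).inc()
    HTTP_DURATION.labels(method=method, endpoint=endpoint).observe(seconds)


def render_metrics() -> bytes:
    """Return the text exposition payload a /metrics handler would serve."""
    return generate_latest(registry)


record_request("GET", "/videos", 200, 0.042)
TRANSCODE_ACTIVE.set(2)
payload = render_metrics()
```

A `/metrics` route would return `render_metrics()` with the content type exported by the library as `prometheus_client.CONTENT_TYPE_LATEST`.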
Fixes critical and important issues identified in code review:

## Backup Script Fixes (Critical)
- Remove double compression (pg_dump custom format already compresses)
- Add `set -eo pipefail` for proper error handling
- Add backup integrity verification with `pg_restore --list`
- Change file extension from .sql.gz to .dump
- Remove corrupted backup file on verification failure

## Metrics Endpoint Fixes (Important)
- Standardize endpoint paths: both Admin and Worker APIs now use `/metrics`
- Document `/metrics` in AdminAuthMiddleware allowed paths list
- Note: `/metrics` already bypasses auth (not under /api/* path)

## Test Coverage
- Add tests/test_metrics.py with 12 tests covering:
  - Metrics module functionality
  - Prometheus format validation
  - Metric definitions (counters, gauges, histograms)
  - Audit log rotation configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
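The backup flow after these fixes (custom-format dump, verification with `pg_restore --list`, removal of a failed dump, 7-day retention) can be sketched in Python. The actual job is a shell script inside k8s/backup-cronjob.yaml, so this is only an illustration of the logic: the function names, the file naming scheme, and the injectable `run` parameter are assumptions.

```python
import subprocess
import time
from pathlib import Path

RETENTION_DAYS = 7  # retention period stated in the PR description


def backup_database(dest_dir: Path, dbname: str, run=subprocess.run) -> Path:
    """Dump dbname in custom format, verify the archive, return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime())
    dump = dest_dir / f"{dbname}-{stamp}.dump"  # hypothetical naming scheme
    # Custom format is already compressed, so there is no separate gzip step.
    run(["pg_dump", "--format=custom", f"--file={dump}", dbname], check=True)
    try:
        # Listing the table of contents fails if the archive cannot be read.
        run(["pg_restore", "--list", str(dump)], check=True,
            stdout=subprocess.DEVNULL)
    except subprocess.CalledProcessError:
        # Do not leave a corrupted backup behind.
        dump.unlink(missing_ok=True)
        raise
    return dump


def prune_old_backups(dest_dir: Path, now=None) -> list:
    """Delete *.dump files older than RETENTION_DAYS and return their names."""
    current = time.time() if now is None else now
    cutoff = current - RETENTION_DAYS * 86400
    removed = []
    for path in sorted(dest_dir.glob("*.dump")):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Passing `check=True` plays the role that `set -eo pipefail` plays in the shell version: any failing command aborts the backup instead of being silently ignored.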
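## Summary
This is the third and final PR addressing the infrastructure review (Issue #414). It completes the remaining operational improvements.

## Prometheus Metrics
- Add `prometheus-client` dependency
- Add metrics module (`api/metrics.py`) with:
  - HTTP request, video, transcoding, and worker metrics
  - Re-encode queue, database, and Redis metrics
  - Storage and playback metrics
- Add `/metrics` endpoint to Admin API (port 9001)
- Add `/api/metrics` endpoint to Worker API (port 9002)

## Automated Database Backups
- Add `k8s/backup-cronjob.yaml` for PostgreSQL backups:
  - Runs daily at 2:00 AM UTC
  - Uses pg_dump with compression
  - 7-day retention with automatic cleanup
  - Non-root security context with seccompProfile
  - Mounts NAS storage as the backup destination

## Audit Log Rotation
- Add `VLOG_AUDIT_LOG_MAX_BYTES` config (default: 10MB)
- Add `VLOG_AUDIT_LOG_BACKUP_COUNT` config (default: 5)
- Replace `FileHandler` with `RotatingFileHandler`

Issue #414 Progress Complete!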
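For the audit log rotation change described above, swapping `FileHandler` for `RotatingFileHandler` might look like the sketch below. It assumes the two settings are read from environment variables and that 10MB means 10 * 1024 * 1024 bytes; the `build_audit_logger` helper, the logger name, and the log format are hypothetical and not taken from api/audit.py.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def build_audit_logger(path, environ=os.environ) -> logging.Logger:
    """Return a logger that rotates `path` once it reaches the size limit."""
    # Defaults mirror the PR description: 10MB per file, 5 rotated backups.
    max_bytes = int(environ.get("VLOG_AUDIT_LOG_MAX_BYTES", 10 * 1024 * 1024))
    backup_count = int(environ.get("VLOG_AUDIT_LOG_BACKUP_COUNT", 5))

    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))

    logger = logging.getLogger("vlog.audit.sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit records out of the root logger
    logger.handlers.clear()   # avoid stacking handlers on repeated calls
    logger.addHandler(handler)
    return logger
```

With these defaults, disk usage for the audit log is bounded at roughly (backup count + 1) × max bytes, about 60MB, instead of growing without limit.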
## Test plan
- `curl http://localhost:9001/metrics`
- `curl http://localhost:9002/api/metrics`
- `kubectl apply -f k8s/backup-cronjob.yaml --dry-run=client`

🤖 Generated with Claude Code