Add monitoring stack: log-viewer, metrics-collector, Python client, CI/CD, and deployment tooling #1
- Add plan.md with full implementation roadmap: shared Zig modules,
  per-service build order, Python client with FastAPI integration, CI/CD
- Expand specs.md with resolved design gaps:
  - Error occurrences table (last 5 per error group)
  - Log Viewer API key auth and multiline log reassembly
  - SSE lifecycle (30min max, 15s heartbeat, 5 max concurrent)
  - Metrics Collector additional indexes and label filtering strategy
  - Cross-cutting: request limits, concurrency model, graceful shutdown,
    structured logging, migration strategy, CORS, backup, deployment
Create top-level directories (error-tracker, log-viewer, metrics-collector, clients/python, shared, deploy) with placeholder files. Add comprehensive .gitignore for Zig, SQLite, Python, and Node artifacts. Set up docker-compose.monitoring.yml with correct port mappings, volume mounts, health checks, and network config. Add secrets.env.example template.
Implement the shared SQLite module (shared/sqlite.zig) with connection wrapper, prepared statements, migration runner, and comprehensive tests. Add error-tracker Zig project with build config, Dockerfile, HTTP server skeleton, and database schema (errors + error_occurrences tables with indexes). Fix Zig 0.13 compatibility issues and Docker build context.
Implement shared/config.zig providing reusable env var parsing (string, int, bool, required/optional) and LOG_LEVEL initialization. Add error-tracker/src/config.zig that loads all service-specific env vars (DATABASE_PATH, API_KEY, POSTMARK_*, ALERT_EMAILS, RETENTION_DAYS, BASE_URL, LOG_LEVEL) with proper defaults and required validation. Integrate config into main.zig replacing inline env var reads.
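The env var parsing described above (string, int, bool, required vs. optional) can be illustrated with a small Python analogue. This is a sketch only, not the Zig implementation: the helper names, the accepted boolean spellings, the `DATABASE_PATH` and `RETENTION_DAYS` defaults, and the choice of `API_KEY` as the required variable are all assumptions made for the example. Only the `info` default for `LOG_LEVEL` comes from the commit messages.

```python
import os
from typing import Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


class ConfigError(ValueError):
    """Raised when a required variable is missing or a value cannot be parsed."""


def get_str(env: Mapping[str, str], name: str, default: Optional[str] = None) -> str:
    # A variable with no default is treated as required.
    value = env.get(name)
    if value is None or value == "":
        if default is None:
            raise ConfigError(f"{name} is required")
        return default
    return value


def get_int(env: Mapping[str, str], name: str, default: Optional[int] = None) -> int:
    raw = get_str(env, name, None if default is None else str(default))
    try:
        return int(raw)
    except ValueError:
        raise ConfigError(f"{name} must be an integer, got {raw!r}") from None


def get_bool(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    raw = env.get(name, "").strip().lower()
    if raw == "":
        return default
    if raw in _TRUE:
        return True
    if raw in _FALSE:
        return False
    raise ConfigError(f"{name} must be a boolean, got {raw!r}")


def load_error_tracker_config(env: Mapping[str, str] = os.environ) -> dict:
    # Defaults below are illustrative placeholders, not the service's real defaults.
    return {
        "database_path": get_str(env, "DATABASE_PATH", "./data/errors.db"),
        "api_key": get_str(env, "API_KEY"),  # assumed required
        "retention_days": get_int(env, "RETENTION_DAYS", 90),
        "log_level": get_str(env, "LOG_LEVEL", "info"),
    }
```

Passing the environment as a mapping rather than reading `os.environ` directly keeps the loader testable with or without variables such as `API_KEY` set.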
…d integration tests
…ication, and email alerting
Implement POST /api/errors with full request lifecycle: parse/validate JSON body (required: project, exception_type, message, traceback; optional: environment, request_url, request_method, request_headers, user_id, extra), compute MD5 fingerprint, and upsert error records with create/increment/reopen semantics. Each ingestion creates an occurrence record with per-request context, trimmed to max 5 per error group. Email alerts via Postmark API fire on new fingerprints only (skipped silently if unconfigured). All 173 tests pass.
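The create/increment/reopen upsert and the trim to five occurrences can be sketched in Python against an in-memory SQLite database. The column names, the `resolved` flag, and the fingerprint input (project, exception type, and a traceback location string) are assumptions for illustration; the real schema and fingerprint rules live in the Zig service.

```python
import hashlib
import sqlite3

# Hypothetical, simplified schema; the real tables have more columns.
SCHEMA = """
CREATE TABLE errors (
    id INTEGER PRIMARY KEY,
    fingerprint TEXT NOT NULL UNIQUE,
    project TEXT NOT NULL,
    exception_type TEXT NOT NULL,
    message TEXT NOT NULL,
    count INTEGER NOT NULL DEFAULT 1,
    resolved INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE error_occurrences (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    error_id INTEGER NOT NULL REFERENCES errors(id),
    request_url TEXT
);
"""

MAX_OCCURRENCES = 5


def fingerprint(project: str, exception_type: str, location: str) -> str:
    # Assumed fingerprint input; only the use of MD5 comes from the PR text.
    return hashlib.md5(f"{project}:{exception_type}:{location}".encode()).hexdigest()


def ingest(db, project, exception_type, message, location, request_url=None) -> str:
    """Upsert an error group and return 'created', 'incremented' or 'reopened'."""
    fp = fingerprint(project, exception_type, location)
    row = db.execute(
        "SELECT id, resolved FROM errors WHERE fingerprint = ?", (fp,)
    ).fetchone()
    if row is None:
        cur = db.execute(
            "INSERT INTO errors (fingerprint, project, exception_type, message) "
            "VALUES (?, ?, ?, ?)",
            (fp, project, exception_type, message),
        )
        error_id, outcome = cur.lastrowid, "created"
    else:
        error_id, resolved = row
        db.execute(
            "UPDATE errors SET count = count + 1, resolved = 0, message = ? WHERE id = ?",
            (message, error_id),
        )
        outcome = "reopened" if resolved else "incremented"
    # Record this occurrence, then keep only the newest MAX_OCCURRENCES rows.
    db.execute(
        "INSERT INTO error_occurrences (error_id, request_url) VALUES (?, ?)",
        (error_id, request_url),
    )
    db.execute(
        "DELETE FROM error_occurrences WHERE error_id = ? AND id NOT IN ("
        "SELECT id FROM error_occurrences WHERE error_id = ? ORDER BY id DESC LIMIT ?)",
        (error_id, error_id, MAX_OCCURRENCES),
    )
    db.commit()
    return outcome
```

In this sketch a caller would send the alert email only when `ingest` returns `"created"`, which matches the "new fingerprints only" rule.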
…otent resolution and tests
…og level extraction, HTTP skeleton
…sor tracking, rotation detection, ring buffer cleanup, and background polling thread
…route wiring
Implements GET /api/logs (container, level, search, since, until, limit, offset), GET /api/containers, GET /api/stats, and enhanced GET /health with log count and last ingest timestamp. All 9 query tests pass.
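How the GET /api/logs filters might map onto a parameterized SQL query is sketched below in Python. The table name `logs`, its columns, and the default limit are hypothetical, and the sketch uses a plain `LIKE` match for `search` rather than the FTS5 index the service uses, so that it runs on any SQLite build.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_logs_query(
    container: Optional[str] = None,
    level: Optional[str] = None,
    search: Optional[str] = None,
    since: Optional[str] = None,
    until: Optional[str] = None,
    limit: int = 100,
    offset: int = 0,
) -> Tuple[str, List[object]]:
    """Build a SELECT with only the filters that were supplied."""
    clauses: List[str] = []
    params: List[object] = []
    if container:
        clauses.append("container = ?")
        params.append(container)
    if level:
        clauses.append("level = ?")
        params.append(level)
    if search:
        # Stand-in for the FTS5 MATCH query used by the real service.
        clauses.append("message LIKE ?")
        params.append(f"%{search}%")
    if since:
        clauses.append("timestamp >= ?")
        params.append(since)
    if until:
        clauses.append("timestamp <= ?")
        params.append(until)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        "SELECT timestamp, container, level, message FROM logs"
        f"{where} ORDER BY timestamp DESC LIMIT ? OFFSET ?"
    )
    params += [limit, offset]
    return sql, params
```

Every user-supplied value travels as a bound parameter; only fixed clause text is interpolated into the SQL string.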
…ing, heartbeat, and filters
Implements Server-Sent Events streaming with database polling for new log entries, optional container/level filters, 30-minute max duration, 15-second heartbeat, and max 5 concurrent SSE connections (503 when exceeded). Each SSE connection runs in its own thread with its own SQLite connection.
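The SSE lifecycle above (30-minute cap, 15-second heartbeat, at most 5 concurrent streams) can be sketched in Python as a generator plus a small connection counter. The names `ConnectionLimiter` and `sse_stream`, the one-second poll interval, and the use of an SSE comment line as the heartbeat are assumptions for the example; the sketch also assumes each log entry is a single line.

```python
import threading
import time
from typing import Callable, Iterator, List

MAX_DURATION_S = 30 * 60  # 30-minute max stream lifetime
HEARTBEAT_S = 15          # heartbeat after 15 s without data
MAX_CONNECTIONS = 5       # further clients would receive 503


class ConnectionLimiter:
    """Counts active SSE streams; callers answer 503 when try_acquire() fails."""

    def __init__(self, limit: int = MAX_CONNECTIONS) -> None:
        self._limit = limit
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            if self._active >= self._limit:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._active -= 1


def sse_stream(
    poll: Callable[[], List[str]],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    poll_interval: float = 1.0,
) -> Iterator[str]:
    """Yield SSE frames until the maximum duration elapses."""
    started = clock()
    last_sent = started
    while clock() - started < MAX_DURATION_S:
        lines = poll()  # new log entries since the previous poll
        for line in lines:
            yield f"data: {line}\n\n"
        now = clock()
        if lines:
            last_sent = now
        elif now - last_sent >= HEARTBEAT_S:
            # A line starting with ':' is an SSE comment that clients ignore.
            yield ": heartbeat\n\n"
            last_sent = now
        sleep(poll_interval)
```

A handler would call `try_acquire()` before starting the stream, respond with 503 if it returns `False`, and call `release()` in a `finally` block when the stream ends.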
…-expand
Embeds HTML page via @embedFile with Tailwind CSS styling, container/level filter dropdowns, full-text search with debounce, time range selector, SSE-based live tail mode, and expandable log entries showing full details. Served at GET / before auth middleware. Marks all log-viewer tasks complete.
…board, web UI
Implements the full metrics-collector service:
- Bootstrap: build.zig, Dockerfile, HTTP server on port 8000
- SQLite database with metrics_raw and metrics_aggregated tables
- POST /api/metrics batch ingestion with validation
- Minute/hour aggregation engine with p50/p95/p99 percentiles
- Data retention cleanup (raw, minute, hourly tiers)
- GET /api/metrics query with period/resolution/label filters
- GET /api/metrics/names, GET /api/dashboard, GET /health
- Web UI dashboard with uPlot charts (metric explorer, latency percentiles)
- Auth middleware, rate limiting (200 req/min), 512KB body limit
- Background aggregation thread with own SQLite connection
- All tests passing (known config test exclusion)
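As an illustration of the minute aggregation step, the Python sketch below buckets raw points by minute and computes count, sum, min, max, and p50/p95/p99. The nearest-rank percentile definition and the `(name, unix_timestamp, value)` input shape are assumptions; the PR text does not say which percentile method or row layout the Zig engine uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[str, int, float]  # (metric name, unix timestamp in seconds, value)


def percentile(sorted_values: List[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100 (assumed method)."""
    n = len(sorted_values)
    rank = max(1, (p * n + 99) // 100)  # integer ceil(p * n / 100)
    return sorted_values[rank - 1]


def aggregate_by_minute(points: Iterable[Point]) -> Dict[Tuple[str, int], dict]:
    """Group points into (name, minute_start) buckets and summarize each one."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for name, ts, value in points:
        buckets[(name, ts - ts % 60)].append(value)

    aggregated = {}
    for key, values in buckets.items():
        values.sort()
        aggregated[key] = {
            "count": len(values),
            "sum": sum(values),
            "min": values[0],
            "max": values[-1],
            "p50": percentile(values, 50),
            "p95": percentile(values, 95),
            "p99": percentile(values, 99),
        }
    return aggregated
```

Hourly aggregation would follow the same shape with `ts - ts % 3600` as the bucket key. Percentiles cannot be merged exactly from minute buckets, so this sketch would compute hourly percentiles from raw values.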
…session 14 learnings
…, and FastAPI integration
Create the monlightstack Python package with full implementations of:
- ErrorClient (async/sync error reporting with PII filtering)
- MetricsClient (buffered metrics with periodic flush)
- FastAPI integration (MonlightMiddleware, MonlightExceptionHandler, setup_monlight)
- 7 scaffolding tests verifying imports and basic behavior
Also mark shared infrastructure tasks as complete in plan.md (implemented inline in each service rather than as separate shared modules).
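The "buffered metrics with periodic flush" pattern can be sketched as follows. This is not the package's `MetricsClient`: the class name `BufferedMetrics`, its constructor parameters, and the injected `send` callable (standing in for the HTTP POST) are hypothetical, chosen so the example runs without a network.

```python
import logging
import threading
from typing import Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


class BufferedMetrics:
    """Collects metric points in memory and flushes them in batches."""

    def __init__(
        self,
        send: Callable[[List[dict]], None],  # hypothetical transport hook
        flush_interval: float = 10.0,
        max_buffer: int = 1000,
    ) -> None:
        self._send = send
        self._interval = flush_interval
        self._max = max_buffer
        self._buffer: List[dict] = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def record(self, name: str, value: float, labels: Optional[Dict[str, str]] = None) -> None:
        with self._lock:
            self._buffer.append({"name": name, "value": value, "labels": labels or {}})
            full = len(self._buffer) >= self._max
        if full:
            self.flush()

    def flush(self) -> None:
        # Swap the buffer under the lock, then send outside it.
        with self._lock:
            batch, self._buffer = self._buffer, []
        if not batch:
            return
        try:
            self._send(batch)
        except Exception as exc:
            # Fire-and-forget: a failed flush drops the batch instead of raising.
            logger.debug("Dropping %d metrics after send failure: %s", len(batch), exc)

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once shutdown() sets the event.
        while not self._stop.wait(self._interval):
            self.flush()

    def shutdown(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
        self.flush()  # final flush so buffered points are not lost
```

Sending outside the lock keeps `record` cheap on the request path even when the collector is slow to respond.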
…g, and fire-and-forget behavior
…er, shutdown, and error handling
…ented in session 15)
…oint normalization, and edge cases
… push
- Create .github/workflows/zig-services.yml with matrix strategy for all 3 services
- Run zig build test for each service with env vars (API_KEY, CONTAINERS)
- Build Docker images, verify size < 20MB, push to ghcr.io on main only
- Fix config tests to be environment-aware (work with or without API_KEY set)
- Fix docker-compose build context for log-viewer and metrics-collector (need repo root context to access shared/ directory)
- Map prefixed API keys from secrets.env (e.g., LOG_VIEWER_API_KEY) to the API_KEY env var each service reads
- Add LOG_LEVEL environment variable with info default to all services
- Add deploy.resources.limits.memory: 30M to all three services
- Remove obsolete docker-compose version field
- Document LOG_LEVEL in secrets.env.example
…erence, and ops documentation
Pull request overview
This PR introduces the full MonlightStack monitoring platform: three Zig microservices (error-tracker, log-viewer, metrics-collector), a Python client with FastAPI integration and tests, CI/CD workflows, and deployment/ops tooling (compose configs, backup/upgrade scripts).
Changes:
- Added shared Zig modules for configuration, API-key auth, and request rate/body-size limiting.
- Implemented/extended services with embedded web UIs, SQLite schemas/migrations, retention cleanup, and test suites.
- Added Python client package (monlightstack) + CI workflows and deployment scripts/configs for running/upgrading the stack.
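The shared auth and limiting helpers listed above can be approximated in Python to show the intended behavior. Everything here is an assumption for illustration: the fixed-window algorithm, the treatment of a missing Content-Length as acceptable, and all function and class names. The default limit mirrors the 200 requests per minute mentioned for metrics-collector.

```python
import hmac
import math
import time
from typing import Callable, Optional, Tuple


def api_key_matches(provided: Optional[str], expected: str) -> bool:
    """Compare the X-API-Key header value in constant time."""
    if provided is None:
        return False
    return hmac.compare_digest(provided.encode(), expected.encode())


class FixedWindowLimiter:
    """In-memory limiter: at most `limit` requests per `window_s` seconds."""

    def __init__(
        self,
        limit: int = 200,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window_s = window_s
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def check(self) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds)."""
        now = self._clock()
        if now - self._window_start >= self._window_s:
            self._window_start = now
            self._count = 0
        if self._count < self._limit:
            self._count += 1
            return True, 0
        # Retry-After is the time left in the current window, rounded up.
        remaining = self._window_start + self._window_s - now
        return False, max(1, math.ceil(remaining))


def content_length_ok(header: Optional[str], max_bytes: int) -> bool:
    """Accept a missing header (assumption); reject invalid or oversized values."""
    if header is None:
        return True
    if not (header.isascii() and header.isdigit()):
        return False
    return int(header) <= max_bytes
```

A request handler in this sketch would answer 401 when `api_key_matches` fails, 429 with a `Retry-After` header when `check()` denies the request, and 413 when `content_length_ok` returns `False`.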
Reviewed changes
Copilot reviewed 72 out of 78 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `shared/rate_limit.zig` | Shared in-memory rate limiter + Content-Length body-size guard + tests (note: retry-after and invalid Content-Length handling need adjustment). |
| `shared/config.zig` | Shared env-var configuration helpers and LOG_LEVEL parsing. |
| `shared/auth.zig` | Shared X-API-Key authentication helper with constant-time compare + tests. |
| `shared/.gitkeep` | Keeps shared/ directory in git. |
| `prompt.md` | Agent runbook / iteration workflow instructions. |
| `metrics-collector/src/web_ui.zig` | Serves embedded metrics dashboard HTML. |
| `metrics-collector/src/retention.zig` | Retention cleanup for raw/minute/hour aggregates + tests. |
| `metrics-collector/src/database.zig` | SQLite schema/migrations + startup/migration tests. |
| `metrics-collector/src/config.zig` | Env-driven service configuration (retention is hours/days; docs should match). |
| `metrics-collector/build.zig.zon` | Zig package metadata for metrics-collector. |
| `metrics-collector/build.zig` | Builds executable, wires shared modules, and defines unit test targets. |
| `metrics-collector/Dockerfile` | Multi-stage Alpine build/runtime image for metrics-collector. |
| `looop.sh` | Wrapper script to run an external “opencode” loop (has a shell quoting issue in --help usage output). |
| `log-viewer/src/web_ui.zig` | Serves embedded log viewer HTML. |
| `log-viewer/src/main.zig` | HTTP server routing + auth/rate-limit/body-size enforcement + SSE tail wiring. |
| `log-viewer/src/log_level.zig` | Log level extraction heuristics + tests. |
| `log-viewer/src/database.zig` | SQLite schema with FTS5 + cursor tracking + tests. |
| `log-viewer/src/config.zig` | Env-driven log-viewer configuration + basic load test. |
| `log-viewer/build.zig.zon` | Zig package metadata for log-viewer. |
| `log-viewer/build.zig` | Builds executable, wires shared modules, and defines unit test targets. |
| `log-viewer/Dockerfile` | Multi-stage Alpine build/runtime image for log-viewer. |
| `error-tracker/src/web_ui.zig` | Serves embedded error listing/detail pages and path matcher tests. |
| `error-tracker/src/static/index.html` | Error list UI (date parsing logic should not append an extra Z). |
| `error-tracker/src/static/error_detail.html` | Error detail UI + resolve button (date parsing logic should not append an extra Z). |
| `error-tracker/src/retention.zig` | Deletes old resolved errors + background thread + tests. |
| `error-tracker/src/rate_limit_test.zig` | Integration tests for rate limiting and body size enforcement. |
| `error-tracker/src/projects_listing.zig` | /api/projects JSON formatter + tests (ensure ArrayList is deinit’d on error paths). |
| `error-tracker/src/fingerprint.zig` | MD5 fingerprinting from traceback location + extensive tests. |
| `error-tracker/src/error_resolve.zig` | Resolve endpoint parsing + idempotent resolve logic + tests. |
| `error-tracker/src/config.zig` | Env-driven error-tracker configuration + basic load test. |
| `error-tracker/src/auth_test.zig` | Integration tests for X-API-Key behavior and health endpoint exclusions. |
| `error-tracker/src/.gitkeep` | Keeps error-tracker/src/ directory in git. |
| `error-tracker/build.zig.zon` | Zig package metadata for error-tracker. |
| `error-tracker/build.zig` | Builds executable, wires shared modules, and defines unit/integration test targets. |
| `error-tracker/Dockerfile` | Multi-stage Alpine build/runtime image for error-tracker. |
| `deploy/upgrade.sh` | Rolling upgrade script with backups, rebuilds, restarts, and health checks. |
| `deploy/secrets.env.example` | Example env file for per-service API keys and shared Postmark config. |
| `deploy/docker-compose.test.yml` | E2E/smoke-test compose stack for all services. |
| `deploy/docker-compose.monitoring.yml` | Production-ish compose config with ports, env wiring, healthchecks, and memory limits. |
| `deploy/data/metrics/.gitkeep` | Keeps metrics data dir in git. |
| `deploy/data/logs/.gitkeep` | Keeps logs data dir in git. |
| `deploy/data/errors/.gitkeep` | Keeps errors data dir in git. |
| `deploy/backup.sh` | SQLite .backup snapshot script with retention policy and optional S3 upload stub. |
| `clients/python/tests/test_setup_monlight.py` | FastAPI integration tests for setup_monlight wiring and behavior. |
| `clients/python/tests/test_scaffolding.py` | Basic import/scaffolding tests for the Python package. |
| `clients/python/pyproject.toml` | Python package metadata, dependencies, and pytest config. |
| `clients/python/monlightstack/metrics_client.py` | Buffered metrics client with periodic background flush. |
| `clients/python/monlightstack/integrations/fastapi.py` | Middleware + exception handler + setup_monlight convenience wiring. |
| `clients/python/monlightstack/integrations/__init__.py` | Integrations package marker. |
| `clients/python/monlightstack/error_client.py` | Async/sync error reporting with sensitive-header filtering. |
| `clients/python/monlightstack/__init__.py` | Top-level exports and package version. |
| `clients/python/AGENTS.md` | Developer notes/patterns for the Python client module. |
| `README.md` | Project documentation, setup guides, API reference, and env var tables (metrics retention units/defaults need to match code). |
| `.gitignore` | Ignores Zig build artifacts, SQLite files, env files, and Python artifacts. |
| `.github/workflows/zig-services.yml` | CI for Zig services: tests + docker build + size check + GHCR push on main. |
| `.github/workflows/python-client.yml` | CI for Python client tests + intended PyPI publish job (publish trigger needs tag events to run). |
    import monlightstack

    assert monlightstack.__version__ == "0.1.0"

Module 'monlightstack' is imported with both 'import' and 'import from'.

Suggested change:

    - import monlightstack
    - assert monlightstack.__version__ == "0.1.0"
    + from monlightstack import __version__
    + assert __version__ == "0.1.0"

    from __future__ import annotations

    import logging
    from unittest.mock import patch

Import of 'patch' is not used.

Suggested change:

    - from unittest.mock import patch

    import json
    import logging
    import time
    from unittest.mock import patch

Import of 'patch' is not used.

Suggested change:

    - from unittest.mock import patch

    from __future__ import annotations

    import time
    from unittest.mock import MagicMock, patch

Import of 'patch' is not used.

Suggested change:

    - from unittest.mock import MagicMock, patch
    + from unittest.mock import MagicMock

    from __future__ import annotations

    import asyncio

Import of 'asyncio' is not used.

Suggested change:

    - import asyncio

    from __future__ import annotations

    import asyncio
    from unittest.mock import MagicMock, patch

Import of 'MagicMock' is not used.

Suggested change:

    - from unittest.mock import MagicMock, patch
    + from unittest.mock import patch

    import asyncio
    from unittest.mock import MagicMock, patch

    import httpx

Import of 'httpx' is not used.

Suggested change:

    - import httpx

    import httpx
    import pytest
    from fastapi import FastAPI, HTTPException

Import of 'HTTPException' is not used.

Suggested change:

    - from fastapi import FastAPI, HTTPException
    + from fastapi import FastAPI

    # Extract headers as a plain dict
    try:
        request_context["request_headers"] = dict(request.headers)
    except Exception:
        pass

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change:

    - # Extract headers as a plain dict
    - try:
    -     request_context["request_headers"] = dict(request.headers)
    - except Exception:
    -     pass
    + # Extract headers as a plain dict (best-effort; failures are non-fatal)
    + try:
    +     request_context["request_headers"] = dict(request.headers)
    + except Exception as e:
    +     logger.debug(
    +         "Failed to capture request headers for Monlight error context: %s",
    +         e,
    +     )

Rename the Python package directory from monlightstack/ to monlight/, update all references across README, deploy scripts, test files, plan, progress notes, AGENTS.md, and the metrics dashboard HTML. All 121 Python tests pass after the rename.
Summary
This PR implements the complete monlightstack monitoring platform, adding three major components and supporting infrastructure:
- Log Viewer (`log-viewer/`): Zig-based service for Docker log ingestion with JSON parsing, multiline reassembly, cursor tracking, FTS5 full-text search, SSE live tail, and an embedded web UI
- Metrics Collector (`metrics-collector/`): Zig-based service for metrics ingestion, minute/hour aggregation with percentile computation, data retention, query API with auto-resolution, dashboard endpoint, and an embedded web UI with uPlot charts
- Python client (`clients/python/`): `monlightstack` package with `ErrorClient`, `MetricsClient`, and FastAPI integration (`MonlightMiddleware`, `MonlightExceptionHandler`, `setup_monlight`), including comprehensive test suites
Changes
How to Review
This is a large PR covering the full platform build-out. Suggested review order:
1. `shared/`: reusable Zig modules (HTTP router, SQLite, auth, rate limiting, logging)
2. `error-tracker/`: enhancements (web UI, retention, tests)
3. `log-viewer/`: new service
4. `metrics-collector/`: new service
5. `clients/python/`: Python client and tests
6. `.github/workflows/`: CI/CD pipelines
7. `deploy/`: operational scripts and config