Add monitoring stack: log-viewer, metrics-collector, Python client, CI/CD, and deployment tooling #1

Merged
mattmezza merged 46 commits into main from looop on Feb 12, 2026

@mattmezza
Owner

Summary

This PR implements the complete monlightstack monitoring platform, adding three major components and supporting infrastructure:

  • Log Viewer service (log-viewer/): Zig-based service for Docker log ingestion with JSON parsing, multiline reassembly, cursor tracking, FTS5 full-text search, SSE live tail, and an embedded web UI
  • Metrics Collector service (metrics-collector/): Zig-based service for metrics ingestion, minute/hour aggregation with percentile computation, data retention, query API with auto-resolution, dashboard endpoint, and an embedded web UI with uPlot charts
  • Python client (clients/python/): monlightstack package with ErrorClient, MetricsClient, and FastAPI integration (MonlightMiddleware, MonlightExceptionHandler, setup_monlight) — includes comprehensive test suites
  • CI/CD workflows: GitHub Actions for Python client (test matrix + PyPI publishing) and Zig services (test, Docker build, GHCR push)
  • Deployment tooling: SQLite backup script with retention and optional S3 upload, upgrade script with rolling restarts and health verification, end-to-end smoke test suite, and deployment config fixes (API key mapping, LOG_LEVEL, memory limits)
  • Error Tracker enhancements: Web UI for error listing/detail, data retention cleanup for resolved errors, additional test coverage (fingerprinting, auth, rate limiting, retention)

Changes

  • 57 files changed, ~12,200 lines added
  • 25 commits covering iterative development from shared infrastructure through services, client, tests, CI/CD, and deployment

How to Review

This is a large PR covering the full platform build-out. Suggested review order:

  1. shared/ — reusable Zig modules (HTTP router, SQLite, auth, rate limiting, logging)
  2. error-tracker/ — enhancements (web UI, retention, tests)
  3. log-viewer/ — new service
  4. metrics-collector/ — new service
  5. clients/python/ — Python client and tests
  6. .github/workflows/ — CI/CD pipelines
  7. deploy/ — operational scripts and config

- Add plan.md with full implementation roadmap: shared Zig modules,
  per-service build order, Python client with FastAPI integration, CI/CD
- Expand specs.md with resolved design gaps:
  - Error occurrences table (last 5 per error group)
  - Log Viewer API key auth and multiline log reassembly
  - SSE lifecycle (30min max, 15s heartbeat, 5 max concurrent)
  - Metrics Collector additional indexes and label filtering strategy
  - Cross-cutting: request limits, concurrency model, graceful shutdown,
    structured logging, migration strategy, CORS, backup, deployment

Create top-level directories (error-tracker, log-viewer, metrics-collector,
clients/python, shared, deploy) with placeholder files. Add comprehensive
.gitignore for Zig, SQLite, Python, and Node artifacts. Set up
docker-compose.monitoring.yml with correct port mappings, volume mounts,
health checks, and network config. Add secrets.env.example template.

Implement the shared SQLite module (shared/sqlite.zig) with connection
wrapper, prepared statements, migration runner, and comprehensive tests.

Add error-tracker Zig project with build config, Dockerfile, HTTP server
skeleton, and database schema (errors + error_occurrences tables with
indexes). Fix Zig 0.13 compatibility issues and Docker build context.

Implement shared/config.zig providing reusable env var parsing (string,
int, bool, required/optional) and LOG_LEVEL initialization. Add
error-tracker/src/config.zig that loads all service-specific env vars
(DATABASE_PATH, API_KEY, POSTMARK_*, ALERT_EMAILS, RETENTION_DAYS,
BASE_URL, LOG_LEVEL) with proper defaults and required validation.
Integrate config into main.zig replacing inline env var reads.

…ication, and email alerting

Implement POST /api/errors with full request lifecycle: parse/validate JSON
body (required: project, exception_type, message, traceback; optional:
environment, request_url, request_method, request_headers, user_id, extra),
compute MD5 fingerprint, and upsert error records with create/increment/reopen
semantics. Each ingestion creates an occurrence record with per-request context,
trimmed to max 5 per error group. Email alerts via Postmark API fire on new
fingerprints only (skipped silently if unconfigured). All 173 tests pass.
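
The create/increment/reopen semantics and the five-occurrence cap described above can be sketched in Python with an in-memory SQLite database. This is only an illustration, not the Zig implementation from this PR: the fingerprint inputs (project, exception type, and a traceback location string), the table columns, and the function names `fingerprint`, `open_db`, and `ingest` are all assumptions made for the example.

```python
import hashlib
import sqlite3
from typing import Optional

MAX_OCCURRENCES = 5  # "trimmed to max 5 per error group" in the commit message


def fingerprint(project: str, exception_type: str, location: str) -> str:
    """MD5 over an assumed set of fields; the real service may hash different inputs."""
    raw = f"{project}:{exception_type}:{location}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def open_db() -> sqlite3.Connection:
    """Create a simplified, hypothetical version of the two tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE errors (
            fingerprint TEXT PRIMARY KEY,
            count INTEGER NOT NULL,
            resolved INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE error_occurrences (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            fingerprint TEXT NOT NULL,
            request_url TEXT
        );
        """
    )
    return db


def ingest(db: sqlite3.Connection, fp: str, request_url: Optional[str] = None) -> str:
    """Upsert the error group, record one occurrence, and trim old occurrences."""
    row = db.execute(
        "SELECT resolved FROM errors WHERE fingerprint = ?", (fp,)
    ).fetchone()
    if row is None:
        db.execute("INSERT INTO errors (fingerprint, count) VALUES (?, 1)", (fp,))
        outcome = "created"  # only this outcome would trigger an email alert
    else:
        db.execute(
            "UPDATE errors SET count = count + 1, resolved = 0 WHERE fingerprint = ?",
            (fp,),
        )
        outcome = "reopened" if row[0] else "incremented"

    db.execute(
        "INSERT INTO error_occurrences (fingerprint, request_url) VALUES (?, ?)",
        (fp, request_url),
    )
    # Keep only the newest MAX_OCCURRENCES rows for this error group.
    db.execute(
        """
        DELETE FROM error_occurrences
        WHERE fingerprint = ?
          AND id NOT IN (
              SELECT id FROM error_occurrences
              WHERE fingerprint = ?
              ORDER BY id DESC
              LIMIT ?
          )
        """,
        (fp, fp, MAX_OCCURRENCES),
    )
    return outcome
```

Returning the outcome as a string keeps the alerting decision outside the storage logic: a caller would send an alert only when `ingest` returns `"created"`.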
…sor tracking, rotation detection, ring buffer cleanup, and background polling thread
…route wiring

Implements GET /api/logs (container, level, search, since, until, limit, offset),
GET /api/containers, GET /api/stats, and enhanced GET /health with log count
and last ingest timestamp. All 9 query tests pass.
…ing, heartbeat, and filters

Implements Server-Sent Events streaming with database polling for new log entries,
optional container/level filters, 30-minute max duration, 15-second heartbeat,
and max 5 concurrent SSE connections (503 when exceeded). Each SSE connection
runs in its own thread with its own SQLite connection.
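
As a rough illustration of the SSE limits listed above (5 concurrent connections, 15-second heartbeat, 30-minute maximum), the following Python sketch models the slot counter and the frame loop. The names `SseSlots` and `sse_frames`, the one-second polling interval, and the injectable `clock`/`sleep` parameters are assumptions for the example; the actual service is written in Zig and may be structured differently.

```python
import threading
import time
from typing import Callable, Iterator, List

MAX_CONNECTIONS = 5       # limits taken from the commit message
MAX_DURATION_S = 30 * 60
HEARTBEAT_S = 15


class SseSlots:
    """Thread-safe counter that caps the number of concurrent SSE streams."""

    def __init__(self, limit: int = MAX_CONNECTIONS) -> None:
        self._limit = limit
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Return False when the cap is reached (the caller would answer 503)."""
        with self._lock:
            if self._active >= self._limit:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._active -= 1


def sse_frames(
    poll: Callable[[], List[str]],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    poll_interval: float = 1.0,  # assumed; the PR does not state the interval
    max_duration: float = MAX_DURATION_S,
    heartbeat: float = HEARTBEAT_S,
) -> Iterator[str]:
    """Yield SSE frames until max_duration elapses.

    `poll` returns new single-line log entries since the previous call.
    """
    started = clock()
    last_sent = started
    while clock() - started < max_duration:
        rows = poll()
        for row in rows:
            yield f"data: {row}\n\n"
            last_sent = clock()
        if not rows and clock() - last_sent >= heartbeat:
            # A line starting with ':' is an SSE comment, ignored by clients.
            yield ": heartbeat\n\n"
            last_sent = clock()
        sleep(poll_interval)
```

A request handler would call `try_acquire()` first, respond with 503 when it returns `False`, and call `release()` in a `finally` block once the stream ends.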
…-expand

Embeds HTML page via @embedFile with Tailwind CSS styling, container/level
filter dropdowns, full-text search with debounce, time range selector,
SSE-based live tail mode, and expandable log entries showing full details.
Served at GET / before auth middleware. Marks all log-viewer tasks complete.
…board, web UI

Implements the full metrics-collector service:
- Bootstrap: build.zig, Dockerfile, HTTP server on port 8000
- SQLite database with metrics_raw and metrics_aggregated tables
- POST /api/metrics batch ingestion with validation
- Minute/hour aggregation engine with p50/p95/p99 percentiles
- Data retention cleanup (raw, minute, hourly tiers)
- GET /api/metrics query with period/resolution/label filters
- GET /api/metrics/names, GET /api/dashboard, GET /health
- Web UI dashboard with uPlot charts (metric explorer, latency percentiles)
- Auth middleware, rate limiting (200 req/min), 512KB body limit
- Background aggregation thread with own SQLite connection
- All tests passing (known config test exclusion)
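
The minute aggregation with p50/p95/p99 percentiles mentioned in the list above could look roughly like the following Python sketch. It assumes the nearest-rank percentile method and Unix-second timestamps; the PR does not say which percentile definition the Zig engine uses, and the names `percentile` and `aggregate_by_minute` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def percentile(sorted_values: List[float], p: int) -> float:
    """Nearest-rank percentile (an assumed method) over pre-sorted values."""
    if not sorted_values:
        raise ValueError("percentile of empty sequence")
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (p * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def aggregate_by_minute(
    points: Iterable[Tuple[int, float]],
) -> Dict[int, Dict[str, float]]:
    """Group (unix_seconds, value) samples into one-minute buckets."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % 60].append(value)  # floor to the start of the minute

    result: Dict[int, Dict[str, float]] = {}
    for minute, values in sorted(buckets.items()):
        values.sort()
        result[minute] = {
            "count": float(len(values)),
            "sum": sum(values),
            "min": values[0],
            "max": values[-1],
            "p50": percentile(values, 50),
            "p95": percentile(values, 95),
            "p99": percentile(values, 99),
        }
    return result
```

Hour-level aggregation would follow the same shape with a 3600-second bucket, though exact percentiles then need the raw samples rather than the minute rows.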
…, and FastAPI integration

Create the monlightstack Python package with full implementations of:
- ErrorClient (async/sync error reporting with PII filtering)
- MetricsClient (buffered metrics with periodic flush)
- FastAPI integration (MonlightMiddleware, MonlightExceptionHandler, setup_monlight)
- 7 scaffolding tests verifying imports and basic behavior
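
To illustrate the "buffered metrics with periodic flush" idea, here is a minimal Python sketch. It is not the `MetricsClient` from this PR: the class name `BufferedMetrics`, the `send` callback (which stands in for the HTTP POST), and the default interval and buffer size are assumptions chosen for the example.

```python
import threading
from typing import Callable, Dict, List, Optional

Metric = Dict[str, object]


class BufferedMetrics:
    """Collects metrics in memory and flushes them in batches."""

    def __init__(
        self,
        send: Callable[[List[Metric]], None],  # hypothetical transport callback
        flush_interval: float = 10.0,          # assumed default
        max_buffer: int = 1000,                # assumed default
    ) -> None:
        self._send = send
        self._interval = flush_interval
        self._max = max_buffer
        self._buffer: List[Metric] = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def record(
        self, name: str, value: float, labels: Optional[Dict[str, str]] = None
    ) -> None:
        with self._lock:
            self._buffer.append({"name": name, "value": value, "labels": labels or {}})
            full = len(self._buffer) >= self._max
        if full:
            self.flush()

    def flush(self) -> None:
        # Swap the buffer under the lock, then send without holding it.
        with self._lock:
            batch, self._buffer = self._buffer, []
        if not batch:
            return
        try:
            self._send(batch)
        except Exception:
            # Monitoring should not break the host application: drop the batch.
            pass

    def _run(self) -> None:
        # Event.wait returns False on timeout, so this flushes once per interval.
        while not self._stop.wait(self._interval):
            self.flush()

    def close(self) -> None:
        """Stop the background thread and flush whatever is left."""
        self._stop.set()
        self._thread.join()
        self.flush()
```

Sending outside the lock means a slow endpoint delays only the flushing thread, not callers of `record`.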

Also mark shared infrastructure tasks as complete in plan.md (implemented
inline in each service rather than as separate shared modules).
… push

- Create .github/workflows/zig-services.yml with matrix strategy for all 3 services
- Run zig build test for each service with env vars (API_KEY, CONTAINERS)
- Build Docker images, verify size < 20MB, push to ghcr.io on main only
- Fix config tests to be environment-aware (work with or without API_KEY set)

- Fix docker-compose build context for log-viewer and metrics-collector
  (need repo root context to access shared/ directory)
- Map prefixed API keys from secrets.env (e.g., LOG_VIEWER_API_KEY) to
  the API_KEY env var each service reads
- Add LOG_LEVEL environment variable with info default to all services
- Add deploy.resources.limits.memory: 30M to all three services
- Remove obsolete docker-compose version field
- Document LOG_LEVEL in secrets.env.example

Copilot AI left a comment

Pull request overview

This PR introduces the full MonlightStack monitoring platform: three Zig microservices (error-tracker, log-viewer, metrics-collector), a Python client with FastAPI integration and tests, CI/CD workflows, and deployment/ops tooling (compose configs, backup/upgrade scripts).

Changes:

  • Added shared Zig modules for configuration, API-key auth, and request rate/body-size limiting.
  • Implemented/extended services with embedded web UIs, SQLite schemas/migrations, retention cleanup, and test suites.
  • Added Python client package (monlightstack) + CI workflows and deployment scripts/configs for running/upgrading the stack.

Reviewed changes

Copilot reviewed 72 out of 78 changed files in this pull request and generated 9 comments.

File Description
shared/rate_limit.zig Shared in-memory rate limiter + Content-Length body-size guard + tests (note: retry-after and invalid Content-Length handling need adjustment).
shared/config.zig Shared env-var configuration helpers and LOG_LEVEL parsing.
shared/auth.zig Shared X-API-Key authentication helper with constant-time compare + tests.
shared/.gitkeep Keeps shared/ directory in git.
prompt.md Agent runbook / iteration workflow instructions.
metrics-collector/src/web_ui.zig Serves embedded metrics dashboard HTML.
metrics-collector/src/retention.zig Retention cleanup for raw/minute/hour aggregates + tests.
metrics-collector/src/database.zig SQLite schema/migrations + startup/migration tests.
metrics-collector/src/config.zig Env-driven service configuration (retention is hours/days; docs should match).
metrics-collector/build.zig.zon Zig package metadata for metrics-collector.
metrics-collector/build.zig Builds executable, wires shared modules, and defines unit test targets.
metrics-collector/Dockerfile Multi-stage Alpine build/runtime image for metrics-collector.
looop.sh Wrapper script to run an external “opencode” loop (has a shell quoting issue in --help usage output).
log-viewer/src/web_ui.zig Serves embedded log viewer HTML.
log-viewer/src/main.zig HTTP server routing + auth/rate-limit/body-size enforcement + SSE tail wiring.
log-viewer/src/log_level.zig Log level extraction heuristics + tests.
log-viewer/src/database.zig SQLite schema with FTS5 + cursor tracking + tests.
log-viewer/src/config.zig Env-driven log-viewer configuration + basic load test.
log-viewer/build.zig.zon Zig package metadata for log-viewer.
log-viewer/build.zig Builds executable, wires shared modules, and defines unit test targets.
log-viewer/Dockerfile Multi-stage Alpine build/runtime image for log-viewer.
error-tracker/src/web_ui.zig Serves embedded error listing/detail pages and path matcher tests.
error-tracker/src/static/index.html Error list UI (date parsing logic should not append an extra Z).
error-tracker/src/static/error_detail.html Error detail UI + resolve button (date parsing logic should not append an extra Z).
error-tracker/src/retention.zig Deletes old resolved errors + background thread + tests.
error-tracker/src/rate_limit_test.zig Integration tests for rate limiting and body size enforcement.
error-tracker/src/projects_listing.zig /api/projects JSON formatter + tests (ensure ArrayList is deinit’d on error paths).
error-tracker/src/fingerprint.zig MD5 fingerprinting from traceback location + extensive tests.
error-tracker/src/error_resolve.zig Resolve endpoint parsing + idempotent resolve logic + tests.
error-tracker/src/config.zig Env-driven error-tracker configuration + basic load test.
error-tracker/src/auth_test.zig Integration tests for X-API-Key behavior and health endpoint exclusions.
error-tracker/src/.gitkeep Keeps error-tracker/src/ directory in git.
error-tracker/build.zig.zon Zig package metadata for error-tracker.
error-tracker/build.zig Builds executable, wires shared modules, and defines unit/integration test targets.
error-tracker/Dockerfile Multi-stage Alpine build/runtime image for error-tracker.
deploy/upgrade.sh Rolling upgrade script with backups, rebuilds, restarts, and health checks.
deploy/secrets.env.example Example env file for per-service API keys and shared Postmark config.
deploy/docker-compose.test.yml E2E/smoke-test compose stack for all services.
deploy/docker-compose.monitoring.yml Production-ish compose config with ports, env wiring, healthchecks, and memory limits.
deploy/data/metrics/.gitkeep Keeps metrics data dir in git.
deploy/data/logs/.gitkeep Keeps logs data dir in git.
deploy/data/errors/.gitkeep Keeps errors data dir in git.
deploy/backup.sh SQLite .backup snapshot script with retention policy and optional S3 upload stub.
clients/python/tests/test_setup_monlight.py FastAPI integration tests for setup_monlight wiring and behavior.
clients/python/tests/test_scaffolding.py Basic import/scaffolding tests for the Python package.
clients/python/pyproject.toml Python package metadata, dependencies, and pytest config.
clients/python/monlightstack/metrics_client.py Buffered metrics client with periodic background flush.
clients/python/monlightstack/integrations/fastapi.py Middleware + exception handler + setup_monlight convenience wiring.
clients/python/monlightstack/integrations/__init__.py Integrations package marker.
clients/python/monlightstack/error_client.py Async/sync error reporting with sensitive-header filtering.
clients/python/monlightstack/__init__.py Top-level exports and package version.
clients/python/AGENTS.md Developer notes/patterns for the Python client module.
README.md Project documentation, setup guides, API reference, and env var tables (metrics retention units/defaults need to match code).
.gitignore Ignores Zig build artifacts, SQLite files, env files, and Python artifacts.
.github/workflows/zig-services.yml CI for Zig services: tests + docker build + size check + GHCR push on main.
.github/workflows/python-client.yml CI for Python client tests + intended PyPI publish job (publish trigger needs tag events to run).
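
The constant-time comparison noted for shared/auth.zig in the table above can be illustrated in Python with the standard library's `hmac.compare_digest`. This is a sketch of the general technique only; the function name `is_authorized` and the dict-based header lookup are assumptions, and the Zig helper may differ in detail.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Check the X-API-Key header without leaking timing information."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    provided = normalized.get("x-api-key")
    if provided is None:
        return False
    # compare_digest takes time independent of where the inputs first differ.
    return hmac.compare_digest(
        provided.encode("utf-8"), expected_key.encode("utf-8")
    )
```

A plain `==` comparison can return as soon as it finds a mismatching byte, which in principle lets an attacker recover a key one character at a time from response timings.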

Comment on lines +6 to +8
import monlightstack

assert monlightstack.__version__ == "0.1.0"

Copilot AI Feb 12, 2026

Module 'monlightstack' is imported with both 'import' and 'import from'.

Suggested change
- import monlightstack
- assert monlightstack.__version__ == "0.1.0"
+ from monlightstack import __version__
+ assert __version__ == "0.1.0"

Copilot uses AI. Check for mistakes.
from __future__ import annotations

import logging
from unittest.mock import patch

Copilot AI Feb 12, 2026

Import of 'patch' is not used.

Suggested change
- from unittest.mock import patch

import json
import logging
import time
from unittest.mock import patch

Copilot AI Feb 12, 2026

Import of 'patch' is not used.

Suggested change
- from unittest.mock import patch

from __future__ import annotations

import time
from unittest.mock import MagicMock, patch

Copilot AI Feb 12, 2026

Import of 'patch' is not used.

Suggested change
- from unittest.mock import MagicMock, patch
+ from unittest.mock import MagicMock


from __future__ import annotations

import asyncio

Copilot AI Feb 12, 2026

Import of 'asyncio' is not used.

Suggested change
- import asyncio

from __future__ import annotations

import asyncio
from unittest.mock import MagicMock, patch

Copilot AI Feb 12, 2026

Import of 'MagicMock' is not used.

Suggested change
- from unittest.mock import MagicMock, patch
+ from unittest.mock import patch

import asyncio
from unittest.mock import MagicMock, patch

import httpx

Copilot AI Feb 12, 2026

Import of 'httpx' is not used.

Suggested change
- import httpx


import httpx
import pytest
from fastapi import FastAPI, HTTPException

Copilot AI Feb 12, 2026

Import of 'HTTPException' is not used.

Suggested change
- from fastapi import FastAPI, HTTPException
+ from fastapi import FastAPI

Comment on lines +99 to +103
# Extract headers as a plain dict
try:
    request_context["request_headers"] = dict(request.headers)
except Exception:
    pass

Copilot AI Feb 12, 2026

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
- # Extract headers as a plain dict
- try:
-     request_context["request_headers"] = dict(request.headers)
- except Exception:
-     pass
+ # Extract headers as a plain dict (best-effort; failures are non-fatal)
+ try:
+     request_context["request_headers"] = dict(request.headers)
+ except Exception as e:
+     logger.debug(
+         "Failed to capture request headers for Monlight error context: %s",
+         e,
+     )
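
The file summary describes error_client.py as filtering sensitive headers before reporting. For headers captured as a plain dict, such filtering might look like the Python sketch below. The deny-list, the `[REDACTED]` placeholder, and the function name `filter_headers` are assumptions; the PR excerpt does not show which headers the real client filters or how.

```python
from typing import Dict, Mapping

# Assumed deny-list; the headers the real client filters are not shown here.
SENSITIVE_HEADERS = frozenset({"authorization", "cookie", "set-cookie", "x-api-key"})
REDACTED = "[REDACTED]"


def filter_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy with sensitive values replaced.

    Header names are matched case-insensitively; original casing is kept.
    """
    return {
        name: (REDACTED if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }
```

Replacing the value rather than dropping the key keeps it visible in the error report that the header was present on the request.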

Rename the Python package directory from monlightstack/ to monlight/,
update all references across README, deploy scripts, test files, plan,
progress notes, AGENTS.md, and the metrics dashboard HTML. All 121
Python tests pass after the rename.
@mattmezza mattmezza merged commit adbdd5f into main Feb 12, 2026
10 checks passed