A self-hosted uptime and synthetic monitoring platform built with Go, Gin, PostgreSQL, Redis, worker pools, Prometheus metrics, and webhook notifications.
UpTime started as a small uptime-check API. This rebuild turns the same idea into a backend-first portfolio project with real persistence, scheduler/worker separation, incident handling, API-key auth, metrics, and Docker Compose.
- Gin REST API with `GET /health`, legacy `GET /health-check`, and legacy `POST /ping-endpoint`
- HTTP, keyword, TCP, DNS, and TLS checks
- HTTP timing details through `httptrace`: DNS, TCP connect, TLS handshake, first byte, total duration
- PostgreSQL tables for monitors, check results, incidents, notification channels, API keys, and audit logs
- Worker process with goroutines, channels, context cancellation, duplicate-check avoidance, and graceful shutdown
- Incident lifecycle: opens after `failureThreshold` consecutive failures and resolves on recovery
- Webhook notification channels for incident open/resolve events
- API key authentication with hashed stored keys and a bootstrap admin key
- Prometheus metrics for API requests, checks, incidents, and worker jobs
- Docker Compose stack with API, worker, Postgres, Redis, Prometheus, and Grafana
These screenshots were captured from the live Docker Compose stack.
flowchart LR
User[User / API Client] --> API[Go Gin API]
API --> Postgres[(PostgreSQL)]
API --> Redis[(Redis)]
Worker[Go Worker Pool] --> Postgres
Worker --> Redis
Worker --> Targets[Websites / TCP / DNS / TLS Targets]
Worker --> Notify[Webhook Notifications]
Prometheus --> API
Prometheus --> Worker
Grafana --> Prometheus
- Go 1.22+
- Gin
- PostgreSQL via `pgx`
- Redis
- Prometheus client library
- Structured logging with `slog`
- Docker Compose
Run the full stack:
make docker-up

API: http://localhost:8008
Prometheus: http://localhost:9090
Grafana: http://localhost:3000 with admin / admin
Run the Go processes without Docker:
export DATABASE_URL='postgres://uptime:uptime@localhost:5432/uptime?sslmode=disable'
export REDIS_URL='redis://localhost:6379/0'
export UPTIME_BOOTSTRAP_API_KEY='dev_admin_key'
make migrate
go run ./cmd/api
go run ./cmd/worker

| Variable | Default | Description |
|---|---|---|
| `APP_ENV` | `development` | Runtime environment (production enforces stricter defaults) |
| `APP_PORT` | `8008` | API port |
| `METRICS_PORT` | `8009` | Worker Prometheus metrics port |
| `DATABASE_URL` | local Postgres | PostgreSQL connection string (`postgres://` or `postgresql://`) |
| `REDIS_URL` | local Redis | Redis connection string (`redis://` or `rediss://`) |
| `UPTIME_BOOTSTRAP_API_KEY` | `dev_admin_key` (dev only) | Bootstrap bearer token. Required in production; must be ≥ 16 chars |
| `ALLOW_PRIVATE_TARGETS` | `false` | Allow localhost/private targets for checks/webhooks (forbidden in production) |
| `CHECK_WORKER_COUNT` | `10` | Worker goroutine count (1–1024) |
| `DEFAULT_CHECK_TIMEOUT_SECONDS` | `10` | Default check timeout (1–300) |
| `SCHEDULER_TICK_SECONDS` | `5` | How often the scheduler polls for due monitors (1–60) |
| `LOG_LEVEL` | `info` | `debug`, `info`, `warn`, or `error` |
| `TLS_EXPIRY_WARN_DAYS` | `14` | Days before expiry that TLS checks report degraded |
| `WEBHOOK_SIGNING_SECRET` | empty | If set, webhook bodies are HMAC-SHA256 signed in `X-UpTime-Signature` |
| `WEBHOOK_TIMEOUT_SECONDS` | `10` | Per-attempt webhook timeout |
| `WEBHOOK_MAX_RETRIES` | `3` | Additional webhook attempts after the first failure (0–10) |
| `SHUTDOWN_TIMEOUT_SECONDS` | `15` | Graceful shutdown deadline |
| `API_READ_HEADER_TIMEOUT_SECONDS` | `5` | API `http.Server` read header timeout |
| `API_WRITE_TIMEOUT_SECONDS` | `30` | API `http.Server` write timeout |
| `MAX_REQUEST_BODY_BYTES` | `1048576` | Maximum accepted request body size in bytes |
| `MIGRATIONS_DIR` | `migrations` | Directory containing `*.up.sql` files |
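
As a rough illustration of the `WEBHOOK_SIGNING_SECRET` row, a webhook receiver could verify the `X-UpTime-Signature` header along these lines. This is a minimal sketch: the README only states that bodies are HMAC-SHA256 signed, so the hex encoding of the header value, the function names `Sign` and `Verify`, and the example payload are all assumptions.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Sign returns the hex-encoded HMAC-SHA256 of body under secret.
// Hex encoding is an assumption; the project may encode the header differently.
func Sign(secret string, body []byte) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// Verify reports whether signature matches body under secret.
// hmac.Equal compares in constant time to avoid leaking timing information.
func Verify(secret string, body []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	// Hypothetical payload; the real event schema is not shown in this README.
	body := []byte(`{"event":"incident.opened"}`)
	sig := Sign("example-secret", body)

	fmt.Println(Verify("example-secret", body, sig)) // true
	fmt.Println(Verify("wrong-secret", body, sig))   // false
}
```

The receiver should compute the MAC over the raw request body bytes, before any JSON decoding, since re-serialising the payload can change the bytes.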
Health:
curl http://localhost:8008/health

Manual legacy check:
curl -X POST http://localhost:8008/ping-endpoint \
-H "Content-Type: application/json" \
-d '{"endpoint":"https://example.com"}'

Create a monitor:
curl -X POST http://localhost:8008/api/v1/monitors \
-H "Authorization: Bearer dev_admin_key" \
-H "Content-Type: application/json" \
-d '{
"name": "Example Website",
"type": "http",
"target": "https://example.com",
"method": "GET",
"expectedStatus": 200,
"timeoutSeconds": 10,
"intervalSeconds": 60,
"failureThreshold": 3,
"enabled": true
}'

Run a monitor now:
curl -X POST http://localhost:8008/api/v1/monitors/00000000-0000-0000-0000-000000000101/check-now \
-H "Authorization: Bearer dev_admin_key"

Create an API key:
curl -X POST http://localhost:8008/api/v1/api-keys \
-H "Authorization: Bearer dev_admin_key" \
-H "Content-Type: application/json" \
-d '{"name":"local dev"}'

`cmd/worker` periodically loads enabled monitors from PostgreSQL. It schedules checks by `intervalSeconds`, skips monitors already in flight, and fans jobs out to a fixed goroutine pool. Each job uses context timeouts, stores a check result, updates monitor status, and applies incident rules.
Redis is part of the local stack and is included in health reporting. The current worker schedules jobs locally, in process; Redis-backed distributed locks and queues are a natural next step for running multiple worker replicas.
- `http`: validates the URL, blocks private targets by default, supports `GET`/`HEAD`, expected status, redirects disabled, body snippets, and timing breakdowns.
- `keyword`: HTTP check plus expected keyword matching.
- `tcp`: checks `host:port` reachability with `net.Dialer`.
- `dns`: resolves a hostname with Go's resolver.
- `tls`: connects to a TLS endpoint and marks certificates near expiry as degraded.
Check results are stored in `check_results`. A monitor opens an incident only after `failureThreshold` consecutive failures, and the next successful check resolves the open incident. Webhook notifications are sent on both transitions, and each attempt is recorded in `notification_events`.
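
The open/resolve rule can be expressed as a small state machine. The sketch below keeps the consecutive-failure count in memory purely for illustration; `Tracker`, `Record`, and the `Transition` values are hypothetical names, and the project itself is described as keeping incidents in PostgreSQL.

```go
package main

import "fmt"

// Transition describes what a single check result did to the incident state.
type Transition string

const (
	None     Transition = "none"
	Opened   Transition = "opened"
	Resolved Transition = "resolved"
)

// Tracker holds the incident state for one monitor (illustrative, in memory only).
type Tracker struct {
	FailureThreshold int
	failures         int
	open             bool
}

// Record applies one check result and returns the resulting transition.
func (t *Tracker) Record(ok bool) Transition {
	if ok {
		// Any success resets the streak and resolves an open incident.
		t.failures = 0
		if t.open {
			t.open = false
			return Resolved
		}
		return None
	}
	t.failures++
	// Open only once, when the streak first reaches the threshold.
	if !t.open && t.failures >= t.FailureThreshold {
		t.open = true
		return Opened
	}
	return None
}

func main() {
	t := &Tracker{FailureThreshold: 3}
	results := []bool{false, false, true, false, false, false, false, true}
	for _, ok := range results {
		fmt.Println(ok, t.Record(ok))
	}
}
```

In this run the success in third position resets the streak, so the incident opens on the sixth result (the third failure in a row) and resolves on the final success. A webhook would be sent only for the `opened` and `resolved` transitions.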
GET /metrics exposes API metrics. The worker exposes metrics on :8009/metrics.
Prometheus scrapes both services, and Grafana is provisioned with a starter UpTime dashboard.
A minimal job UI is served by the API at GET /workers. It polls
GET /api/v1/workers/status every 2 seconds and shows, per worker instance:
host, started/last-seen, active and queued jobs, in-flight monitor IDs, and
the most recent 50 check results. Workers write their state into
worker_heartbeats every 5 seconds, so the same view also reflects crashed
or restarting instances (rows older than ~20 seconds are flagged stale).
The HTML page is unauthenticated; it prompts for an API key client-side and uses it as a Bearer token for the protected status XHR.
- `/api/v1/*` endpoints require `Authorization: Bearer <key>` or `X-API-Key`
- Raw generated API keys are shown once; only SHA-256 hashes are stored
- URLs and webhooks block localhost/private/link-local targets unless `ALLOW_PRIVATE_TARGETS=true`
- Checks use context timeouts and bounded response snippets
- Logs avoid raw API keys and webhook payload secrets
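
Two of the points above, hashed API keys and private-target blocking, lend themselves to a short sketch. The names `NewAPIKey`, `HashAPIKey`, and `IsBlockedAddr`, the `upt_` key prefix, and the exact set of blocked ranges are assumptions; the README only states that SHA-256 hashes are stored and that localhost/private/link-local targets are blocked by default.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/netip"
)

// HashAPIKey returns the hex SHA-256 digest that would be stored instead of the raw key.
func HashAPIKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// NewAPIKey returns a random raw key, shown to the caller once, and its stored hash.
// The "upt_" prefix is hypothetical.
func NewAPIKey() (raw, hash string, err error) {
	buf := make([]byte, 32)
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	raw = "upt_" + hex.EncodeToString(buf)
	return raw, HashAPIKey(raw), nil
}

// IsBlockedAddr reports whether an IP literal is loopback, private, link-local,
// or unspecified. Unparseable input is treated as blocked (fail closed).
func IsBlockedAddr(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return true
	}
	addr = addr.Unmap() // treat ::ffff:127.0.0.1 like 127.0.0.1
	return addr.IsLoopback() || addr.IsPrivate() ||
		addr.IsLinkLocalUnicast() || addr.IsLinkLocalMulticast() ||
		addr.IsUnspecified()
}

func main() {
	raw, hash, err := NewAPIKey()
	if err != nil {
		fmt.Println("key generation failed:", err)
		return
	}
	fmt.Println("show once:", raw)
	fmt.Println("store:    ", hash)

	for _, ip := range []string{"127.0.0.1", "10.0.0.5", "169.254.169.254", "93.184.216.34"} {
		fmt.Println(ip, "blocked:", IsBlockedAddr(ip))
	}
}
```

A check like `IsBlockedAddr` is most useful when applied to the address actually being dialled rather than only to the hostname in the URL, since a hostname can resolve to a private address.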
make test
make check

The test suite covers HTTP checker success, timeout, expected-status mismatch, SSRF blocking, TCP success/failure, DNS success/failure, TLS expiry classification, API key hashing, and incident open/resolve rules.
- Redis-backed distributed queue and locks
- Slack, Discord, and SMTP notification channels
- Public status pages
- Multi-tenant organisations
- Remote monitoring agents
- Optional React/Next.js dashboard
- Elasticsearch/Kibana analytics as a future optional integration




