Standardized Docker container management for a homelab environment. Containers run as systemd services, with unified observability (logs + metrics) and automated lifecycle management.
- Systemd-native lifecycle: Each stack is a systemd service, enabling boot ordering, dependency management, and standard systemctl commands
- Centralized observability: All logs flow to Graylog; all metrics flow to Prometheus; Grafana provides unified dashboards
- Opt-in automation: Watchtower updates only labeled containers on a controlled schedule
- Explicit resource limits: Every container declares memory/CPU caps to prevent runaway usage
- Health-first orchestration: Services use healthchecks to gate dependent startups
- Programmatic validation: Python-based tooling for structured parsing, validation, and reporting
┌─────────────────────────────────────────────────────────────────────────┐
│                                  Host                                   │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                     Application Containers                      │    │
│  │         graylog  pihole  unifi  homeassistant  openclaw         │    │
│  └──────┬──────────────────────────┬───────────────────────────────┘    │
│         │ stdout/stderr            │ metrics                            │
│         ▼                          ▼                                    │
│   ┌────────────┐             ┌───────────┐                              │
│   │ Fluent Bit │             │ cAdvisor  │                              │
│   └─────┬──────┘             └─────┬─────┘                              │
│         │ GELF                     │ scrape                             │
│         ▼                          ▼                                    │
│    ┌──────────┐             ┌────────────┐                              │
│    │ Graylog  │             │ Prometheus │                              │
│    └────┬─────┘             └──────┬─────┘                              │
│         │                          │                                    │
│         └──────────────┬───────────┘                                    │
│                        ▼                                                │
│                  ┌───────────┐                                          │
│                  │  Grafana  │ ← dashboards + alerting                  │
│                  └───────────┘                                          │
│                                                                         │
│  Lifecycle: Systemd (boot) + Watchtower (image updates)                 │
│  Validation: Python scripts → JSON reports                              │
└─────────────────────────────────────────────────────────────────────────┘
| Stack | Purpose | Ports |
|---|---|---|
| graylog | Log aggregation (MongoDB + OpenSearch + Graylog) | 9000, 514, 1514, 12201 |
| monitoring | Metrics pipeline (Prometheus + cAdvisor + Pushgateway + Grafana) | 3000, 9090, 9091 |
| fluentbit | Log shipper: tails Docker logs, ships to Graylog | - |
| watchtower | Automated container image updates | - |
| homeassistant | Home automation platform | 8123 (host) |
| pihole | DNS sinkhole and ad blocker | 53, 8053 (host) |
| unifi | UniFi network controller | 8443, 8080, 3478 |
| openclaw | AI agent gateway | 18789 |
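The ports in the table above can be checked for collisions mechanically. The following is a minimal sketch of that idea, not the implementation behind audit.py: the `STACK_PORTS` mapping is copied by hand from the table, `find_port_conflicts` is a hypothetical helper name, and the check ignores the TCP/UDP distinction.

```python
from collections import defaultdict

# Host ports copied from the stack table above.
STACK_PORTS = {
    "graylog": [9000, 514, 1514, 12201],
    "monitoring": [3000, 9090, 9091],
    "homeassistant": [8123],
    "pihole": [53, 8053],
    "unifi": [8443, 8080, 3478],
    "openclaw": [18789],
}


def find_port_conflicts(stack_ports):
    """Return {port: [stacks]} for every port claimed by more than one stack."""
    owners = defaultdict(list)
    for stack, ports in stack_ports.items():
        for port in ports:
            owners[port].append(stack)
    return {port: sorted(stacks) for port, stacks in owners.items() if len(stacks) > 1}


if __name__ == "__main__":
    # The table as listed has no collisions.
    print(find_port_conflicts(STACK_PORTS))
    # Hypothetical new stack that reuses Grafana's port 3000.
    print(find_port_conflicts({**STACK_PORTS, "myapp": [3000]}))
```

Running a check like this before adding a stack catches a clash earlier than a failed container start would.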
/opt/docker/                  # Production deployment path
├── <stack>/
│   ├── docker-compose.yml    # Stack definition
│   ├── .env                  # Secrets (not in git)
│   ├── .env.example          # Template for .env
│   ├── README.md             # Stack-specific docs
│   └── data/                 # Persistent volumes
└── scripts/                  # Python validation & management tools
    ├── validate.py           # Stack validation
    ├── audit.py              # Full infrastructure audit
    ├── healthcheck.py        # Container health checks
    ├── backup.py             # Backup management
    ├── setup.py              # Prerequisites and installation
    ├── host.py               # Host system information
    ├── lib/                  # Core library modules
    └── templates/            # Systemd & cron templates
This repository mirrors the structure at /opt/docker/ on the target host.
Compose files are validated against a set of standards: a missing healthcheck, restart policy, or resource limit is reported as an ERROR, while a missing container_name or Watchtower label is a WARNING. The full list of rules and severities is in docs/STANDARDS.md. Run ./scripts/validate.py to check stacks.
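To make the rule set concrete, here is a rough sketch of how such checks could look for a single service that has already been parsed into a dict. It is not the code in validate.py. The accepted keys are assumptions: limits are taken from either `deploy.resources.limits` or `mem_limit`/`cpus`, and the Watchtower label is assumed to be `com.centurylinklabs.watchtower.enable`.

```python
# ASSUMPTION: the opt-in label the project checks for; the README does not name it.
WATCHTOWER_LABEL = "com.centurylinklabs.watchtower.enable"


def has_limits(service):
    """True if limits are set via deploy.resources.limits or mem_limit/cpus."""
    limits = service.get("deploy", {}).get("resources", {}).get("limits", {})
    return bool(limits) or "mem_limit" in service or "cpus" in service


def has_label(service, name):
    """Compose allows labels as a mapping or as a list of 'key=value' strings."""
    labels = service.get("labels", {})
    if isinstance(labels, dict):
        return name in labels
    return any(str(item).split("=", 1)[0] == name for item in labels)


def validate_service(name, service):
    """Return (severity, message) findings for one compose service."""
    findings = []
    if "healthcheck" not in service:
        findings.append(("ERROR", f"{name}: no healthcheck"))
    if "restart" not in service:
        findings.append(("ERROR", f"{name}: no restart policy"))
    if not has_limits(service):
        findings.append(("ERROR", f"{name}: no resource limits"))
    if "container_name" not in service:
        findings.append(("WARNING", f"{name}: no container_name"))
    if not has_label(service, WATCHTOWER_LABEL):
        findings.append(("WARNING", f"{name}: no Watchtower label"))
    return findings


if __name__ == "__main__":
    # Hypothetical service definition, as it would look after YAML parsing.
    service = {
        "image": "example/myapp:1.0",
        "restart": "unless-stopped",
        "healthcheck": {"test": ["CMD", "true"]},
        "labels": ["com.centurylinklabs.watchtower.enable=true"],
    }
    for severity, message in validate_service("myapp", service):
        print(severity, message)
```

Returning findings as data rather than printing them directly keeps the same checks usable for both JSON and human-readable output.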
# Validate all stacks (JSON output)
./scripts/validate.py
# Human-readable output
./scripts/validate.py --human
# Validate specific stack
./scripts/validate.py graylog
# Full infrastructure audit
./scripts/audit.py --summary
# Port conflict check
./scripts/audit.py --ports
# Image version audit
./scripts/audit.py --images --human

# Start/stop/restart
sudo systemctl start docker-compose@<stack>
sudo systemctl stop docker-compose@<stack>
sudo systemctl restart docker-compose@<stack>
# Enable at boot
sudo systemctl enable docker-compose@<stack>
# View logs
sudo journalctl -u docker-compose@<stack> -f

# Check all containers (JSON)
./scripts/healthcheck.py
# Human-readable with failures only
./scripts/healthcheck.py --human --quiet
# With metrics push
./scripts/healthcheck.py --push-metrics --send-log

# Backup configurations
./scripts/backup.py
# Backup with data
./scripts/backup.py --data
# List existing backups
./scripts/backup.py --list --human

# Full system report (JSON)
./scripts/host.py
# Human-readable
./scripts/host.py --human
# Specific sections
./scripts/host.py --hardware # CPU, memory, disk
./scripts/host.py --docker # Docker daemon info
./scripts/host.py --services # Systemd compose services

# Check prerequisites
./scripts/setup.py
# Install/fix issues
./scripts/setup.py --install

Reference: docs/OBSERVABILITY.md.
| What | Where |
|---|---|
| Logs | Graylog UI (:9000) or docker logs <container> |
| Metrics | Grafana (:3000) or Prometheus (:9090) |
| Script metrics | Pushgateway (:9091) |
| Container stats | docker stats |
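Script metrics reach Prometheus through the Pushgateway on :9091, which accepts samples in the Prometheus text exposition format. The sketch below shows what such a push could look like using only the standard library. The helper names (`escape_label_value`, `format_sample`, `push`) are hypothetical and are not part of the project's own library.

```python
import urllib.request


def escape_label_value(value):
    """Escape backslash, double quote, and newline as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, value, labels=None):
    """Render one sample line, ending in the newline the format expects."""
    if not labels:
        return f"{name} {value}\n"
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}\n"


def push(body, job, base_url="http://localhost:9091"):
    """POST the body to the Pushgateway group for `job` (performs a network call)."""
    request = urllib.request.Request(
        f"{base_url}/metrics/job/{job}", data=body.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    # Only formats the sample; call push(...) explicitly to send it.
    print(format_sample("myapp_items", 42, {"env": "prod"}), end="")
```

The trailing newline matters: the Pushgateway rejects a body whose last line is not terminated.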
Send logs and metrics from scripts, external apps, or ad-hoc debugging sessions.
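For scripts that cannot import the project library, a GELF 1.1 message can be assembled by hand. This is a sketch under assumptions: `build_gelf` and `send_gelf` are hypothetical names, the default URL is the GELF HTTP input on port 12201 from the stack table, and level 6 is the syslog "informational" severity.

```python
import json
import socket
import time
import urllib.request


def build_gelf(short_message, host=None, level=6, **fields):
    """Build a GELF 1.1 payload; extra fields get the required '_' prefix."""
    payload = {
        "version": "1.1",
        "host": host or socket.gethostname(),
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,
    }
    for key, value in fields.items():
        if key == "id":
            # The GELF spec reserves the additional field name "_id".
            raise ValueError("'_id' is not allowed as an additional field")
        payload[f"_{key}"] = value
    return payload


def send_gelf(payload, url="http://localhost:12201/gelf"):
    """POST the payload as JSON to a GELF HTTP input (performs a network call)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    # Only builds and prints the message; call send_gelf(...) to ship it.
    message = build_gelf("Operation completed", facility="myapp", duration_ms=150)
    print(json.dumps(message, indent=2))
```

Keeping payload construction separate from the HTTP call makes the message shape easy to inspect without a running Graylog.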
# Using Python library
from scripts.lib.observability import log_info
log_info("Operation completed", facility="myapp", duration_ms=150)

# Direct curl to GELF HTTP
curl -X POST -H "Content-Type: application/json" \
  -d '{"version":"1.1","host":"myhost","short_message":"Hello"}' \
  http://localhost:12201/gelf

# Using Python library
from scripts.lib.observability import metric_gauge
metric_gauge("myapp_items", 42, labels={"env": "prod"})

# Direct curl
echo 'myapp_items 42' | curl --data-binary @- http://localhost:9091/metrics/job/myapp

# 1. Check prerequisites
./scripts/setup.py
# 2. Install/configure (run fixes)
./scripts/setup.py --install
# 3. Create shared network
docker network create monitoring_net
# 4. Deploy stacks in dependency order
sudo systemctl enable --now docker-compose@graylog
sudo systemctl enable --now docker-compose@fluentbit
sudo systemctl enable --now docker-compose@monitoring
sudo systemctl enable --now docker-compose@watchtower
# ... then application stacks
# 5. Validate
./scripts/validate.py --human

See each stack's README for specific setup instructions.
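The deployment order in step 4 can also be derived from a dependency map instead of being maintained by hand. The sketch below uses the standard library's `graphlib` (Python 3.9+). The edges in `DEPENDS_ON` are assumptions chosen to reproduce the order listed in the steps; the README states the order but not the dependencies behind it.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# ASSUMPTION: each stack maps to the stacks that must be up before it.
# These edges are hypothetical and only mirror the documented order.
DEPENDS_ON = {
    "graylog": [],
    "fluentbit": ["graylog"],
    "monitoring": ["fluentbit"],
    "watchtower": ["monitoring"],
}


def deploy_order(depends_on):
    """Return the stacks with every dependency placed before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def enable_commands(order):
    """Build one 'systemctl enable --now' argument list per stack."""
    return [
        ["sudo", "systemctl", "enable", "--now", f"docker-compose@{stack}"]
        for stack in order
    ]


if __name__ == "__main__":
    # Prints the commands instead of running them.
    for command in enable_commands(deploy_order(DEPENDS_ON)):
        print(" ".join(command))
```

A cycle in the map raises `graphlib.CycleError`, which surfaces a circular dependency before anything is started.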
- .env files contain secrets; never commit them (see .gitignore)
- .env permissions should be 600
- Containers needing the Docker socket (/var/run/docker.sock) are explicitly documented
- Resource limits prevent denial-of-service from runaway containers
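The 600 rule for .env files is easy to verify in bulk. Here is a small sketch of such a check, assuming the one-directory-per-stack layout under /opt/docker; `env_mode_ok` and `insecure_env_files` are hypothetical helper names rather than existing project functions.

```python
import stat
from pathlib import Path


def env_mode_ok(path):
    """True if the file's permission bits are exactly 0o600."""
    return stat.S_IMODE(Path(path).stat().st_mode) == 0o600


def insecure_env_files(root):
    """Return every <stack>/.env under root whose mode is not 0o600."""
    return sorted(p for p in Path(root).glob("*/.env") if not env_mode_ok(p))


if __name__ == "__main__":
    for path in insecure_env_files("/opt/docker"):
        mode = stat.S_IMODE(path.stat().st_mode)
        print(f"{path}: mode {mode:o}, expected 600")
```

Any file the check reports can be corrected with chmod 600 on that path.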
The project uses a dedicated service account for file ownership and a group-based access model for operators.
| User/Group | Purpose |
|---|---|
| docker-services | Service account that owns /opt/docker. System user (no login shell). |
| docker | Docker daemon group. Required to run docker commands. |
Operator access: Add your user to both groups to manage the project without sudo:
# Add user to required groups
sudo usermod -aG docker-services $USER
sudo usermod -aG docker $USER
# Apply (or log out and back in)
newgrp docker-services

Directory permissions: /opt/docker must have group write and setgid:
| Permission | Purpose |
|---|---|
| g+w | Group members can create/modify files |
| g+s (setgid) | New files inherit docker-services group |
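Both bits in the table can be read straight from a directory's mode. The following sketch reports which of them are missing; `permission_problems` and `check_tree` are hypothetical names, and only directories are inspected because setgid inheritance applies to directories.

```python
import stat
from pathlib import Path


def permission_problems(mode):
    """Return the required bits (g+w, setgid) that are absent from a mode."""
    problems = []
    if not mode & stat.S_IWGRP:
        problems.append("missing group write (g+w)")
    if not mode & stat.S_ISGID:
        problems.append("missing setgid (g+s)")
    return problems


def check_tree(root):
    """Map each directory under root (root included) to its problems."""
    root = Path(root)
    report = {}
    for path in [root, *root.rglob("*")]:
        if path.is_dir():
            found = permission_problems(path.stat().st_mode)
            if found:
                report[str(path)] = found
    return report


if __name__ == "__main__":
    # 0o2775 is rwxrwsr-x: group write plus setgid, so nothing is reported.
    print(permission_problems(0o2775))
    print(permission_problems(0o755))
```

Calling `check_tree("/opt/docker")` on the host lists the directories that still need fixing.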
Fix permissions if needed:
sudo chmod -R g+w /opt/docker
sudo find /opt/docker -type d -exec chmod g+s {} \;

Or use setup.py:

sudo ./scripts/setup.py --fix

Verify access:
# Should show docker-services and docker in groups
id $USER
# Should be able to create files without sudo
touch /opt/docker/test && rm /opt/docker/test

1. Create stack directory with required files:
   mkdir -p myapp
   touch myapp/docker-compose.yml myapp/.env.example myapp/README.md
2. Edit docker-compose.yml with required standards (healthcheck, restart, limits, labels)
3. Create .env from .env.example
4. Create data directories:
   sudo mkdir -p /opt/docker/myapp/data
   sudo chown -R docker-services:docker-services /opt/docker/myapp
5. Enable and start:
   sudo systemctl enable --now docker-compose@myapp
6. Validate:
   ./scripts/validate.py myapp
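The file-creation parts of the steps above could be scripted along these lines. This is a sketch, not an existing project tool: `scaffold_stack` is a hypothetical helper that creates the directory layout and seeds .env from .env.example with mode 600. It leaves ownership, the compose file contents, and the systemd steps to the operator.

```python
from pathlib import Path

# File names taken from the stack layout described in this README.
REQUIRED_FILES = ("docker-compose.yml", ".env.example", "README.md")


def scaffold_stack(root, name):
    """Create <root>/<name> with the required files, data/, and a private .env."""
    stack = Path(root) / name
    (stack / "data").mkdir(parents=True, exist_ok=True)
    for filename in REQUIRED_FILES:
        (stack / filename).touch(exist_ok=True)
    env = stack / ".env"
    if not env.exists():
        # Seed .env from the template and restrict it to the owner.
        env.write_bytes((stack / ".env.example").read_bytes())
        env.chmod(0o600)
    return stack


if __name__ == "__main__":
    import tempfile

    # Demonstrates the layout in a throwaway directory, not in /opt/docker.
    with tempfile.TemporaryDirectory() as root:
        stack = scaffold_stack(root, "myapp")
        print(sorted(p.name for p in stack.iterdir()))
```

The function is idempotent: running it again on an existing stack leaves an already present .env untouched.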
Scheduled maintenance is installed via ./scripts/setup.py --install. Full schedule and commands: docs/CRON.md.
Evergreen reference: docs/*.md (one doc per topic).
- docs/LIFECYCLE.md: stack lifecycle, Watchtower, backups (host-level ops)
- docs/OBSERVABILITY.md: logging, alerting, health endpoints
- docs/STANDARDS.md: validation rules
- docs/SCRIPTS.md: CLI reference
- docs/CRON.md: scheduled jobs
- docs/STACKS.md: stack list and ports
- scripts/: Python tooling
- scripts/templates/: systemd and cron templates
- Stack-specific READMEs in each stack directory