A Docker sidecar watchdog that monitors container health and automatically recovers failed containers without requiring an external API.
[!NOTE] This project is not officially part of the Dockhand project and is maintained independently.
[!TIP] AI-Assisted Development: This project is developed with assistance from GitHub Copilot, leveraging AI to enhance code quality and development efficiency.
Dockhand Guardian is a lightweight Python-based monitoring service that watches over your Docker
containers (by default dockhand-app and dockhand-database) via the Docker socket. When a container
fails its health checks for longer than a configured grace period, the guardian automatically triggers
recovery by pulling the latest images and recreating the containers.
dockhand-guardian/
├── src/                    # Application source code
│   ├── __init__.py
│   └── guardian.py         # Main watchdog application
│
├── tests/                  # Unit tests
│   └── test_guardian.py
│
├── docker/                 # Docker & container configuration
│   ├── Dockerfile          # Container image definition
│   └── docker-compose.yml  # Example deployment setup
│
├── docs/                   # Documentation
│   ├── README.md           # This file
│   ├── CONTRIBUTING.md     # Contribution guidelines
│   ├── WEBHOOKS.md         # Webhook configuration guide
│   └── CHANGELOG.md        # Version history
│
├── .github/                # GitHub configuration
│   ├── workflows/          # CI/CD workflows
│   ├── ISSUE_TEMPLATE/     # Issue templates
│   └── dependabot.yml      # Dependency automation
│
└── Root files              # Config & symlinks
    ├── pyproject.toml      # Python dependencies & project config
    ├── package.json        # npm dev tools
    ├── Makefile            # Development commands
    └── .releaserc.json     # Release automation
Note: Important files (README, Dockerfile, docker-compose.yml, CHANGELOG) are symlinked to the root for convenience and GitHub compatibility.
- Container Health Monitoring: Monitors Docker container state and built-in health checks
- Optional HTTP Checks: Additional HTTP endpoint health verification
- Grace Period: Configurable grace period before triggering recovery
- Auto-Recovery: Automatically pulls latest images and recreates containers
- Maintenance Mode: Support for maintenance flag file to pause monitoring
- Cooldown Period: Prevents recovery loops with configurable cooldown
- Docker Socket Communication: Direct communication with Docker daemon (no external API needed)
- Webhook Notifications: Send alerts via 80+ services using Apprise (Discord, Teams, Slack, Email, etc.)
- Configurable: All parameters configurable via environment variables
# Pull from GitHub Container Registry
docker pull ghcr.io/strausmann/dockhand-guardian:latest
# Or use specific version
docker pull ghcr.io/strausmann/dockhand-guardian:1.4.1 # Full version
docker pull ghcr.io/strausmann/dockhand-guardian:1.4 # Minor version
docker pull ghcr.io/strausmann/dockhand-guardian:1 # Major version
# Or use in docker-compose.yml
services:
  guardian:
    image: ghcr.io/strausmann/dockhand-guardian:latest
    # ... rest of configuration

- Clone the repository:
  git clone https://github.com/strausmann/dockhand-guardian.git
  cd dockhand-guardian
- Build and start the stack:
  docker compose up -d
- View guardian logs:
  docker compose logs -f guardian
[!TIP] Recommended: Run the guardian in a separate stack from the monitored containers. This ensures the guardian remains running during recovery operations and can monitor multiple stacks.
[!NOTE] Alternative: You can run the guardian in the same stack as the monitored containers, but be aware that it will be briefly restarted during recovery operations when docker compose up -d --force-recreate is executed.
Run guardian as a standalone container monitoring another stack:
docker run -d \
  --name dockhand-guardian \
  --restart unless-stopped \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v "/path/to/monitored/stack:/stack:ro" \
  -e MONITORED_CONTAINERS=dockhand-app,dockhand-database \
  -e GRACE_SECONDS=300 \
  -e CHECK_INTERVAL=30 \
  -e COOLDOWN_SECONDS=600 \
  -e HTTP_CHECKS=dockhand-app=http://dockhand-app:80/health \
  -e WEBHOOK_URLS=discord://webhook_id/token \
  ghcr.io/strausmann/dockhand-guardian:latest

Guardian Stack (guardian/docker-compose.yml):
services:
  guardian:
    image: ghcr.io/strausmann/dockhand-guardian:latest
    container_name: dockhand-guardian
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /path/to/monitored/stack:/stack:ro
    environment:
      MONITORED_CONTAINERS: dockhand-app,dockhand-database
      GRACE_SECONDS: 300
      CHECK_INTERVAL: 30
      COOLDOWN_SECONDS: 600

Monitored Stack (app/docker-compose.yml):
services:
  dockhand-app:
    image: nginx:alpine
    container_name: dockhand-app
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3
  dockhand-database:
    image: postgres:16-alpine
    container_name: dockhand-database
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 30s

Single Stack (guardian monitors containers in same compose file):
[!WARNING] When using this approach, the guardian will be restarted during recovery operations. Monitoring will be interrupted for a few seconds while the guardian restarts.
services:
  dockhand-app:
    image: nginx:alpine
    container_name: dockhand-app
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
  dockhand-database:
    image: postgres:16-alpine
    container_name: dockhand-database
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD: example
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 30s
      timeout: 10s
      retries: 3
  guardian:
    image: ghcr.io/strausmann/dockhand-guardian:latest
    container_name: dockhand-guardian
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - .:/stack:ro
    environment:
      MONITORED_CONTAINERS: dockhand-app,dockhand-database
      GRACE_SECONDS: 300
      CHECK_INTERVAL: 30
      COOLDOWN_SECONDS: 600
      HTTP_CHECKS: dockhand-app=http://dockhand-app:80/
      WEBHOOK_URLS: discord://webhook_id/token

Using Docker Compose Secrets:
services:
  guardian:
    image: ghcr.io/strausmann/dockhand-guardian:latest
    container_name: dockhand-guardian
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - .:/stack:ro
    environment:
      MONITORED_CONTAINERS: dockhand-app,dockhand-database
      GRACE_SECONDS: 300
      WEBHOOK_URLS_FILE: /run/secrets/webhook_urls
    secrets:
      - webhook_urls
secrets:
  webhook_urls:
    file: ./secrets/webhook_urls.txt

All configuration is done via environment variables:
| Variable | Description | Default |
|---|---|---|
| MONITORED_CONTAINERS | Comma-separated list of container names to monitor | dockhand-app,dockhand-database |
| GRACE_SECONDS | Time in seconds to wait before triggering recovery | 300 |
| CHECK_INTERVAL | How often to check container health (seconds) | 30 |
| COOLDOWN_SECONDS | Cooldown period after recovery (seconds) | 600 |
| STACK_DIR | Directory containing docker-compose.yml | /stack |
| MAINTENANCE_FILE | Maintenance mode flag file name | .maintenance |
| HTTP_CHECKS | Optional HTTP checks (format: container=url,container2=url2) | (empty) |
| WEBHOOK_URLS | Webhook URLs for notifications (comma-separated Apprise URLs) | (empty) |
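As an illustration of how the variables in the table could be read, here is a minimal sketch in Python. It is not the guardian's actual code: the names GuardianConfig, split_csv, parse_http_checks and load_config are hypothetical, and only the variable names and defaults are taken from the table. One assumption worth noting is that each HTTP_CHECKS entry is split on its first "=" only, so URLs containing "=" in a query string survive; URLs containing commas would not.

```python
import os
from dataclasses import dataclass, field


@dataclass
class GuardianConfig:
    """Hypothetical holder for the settings listed in the table."""
    monitored_containers: list[str]
    grace_seconds: int
    check_interval: int
    cooldown_seconds: int
    stack_dir: str
    maintenance_file: str
    http_checks: dict[str, str] = field(default_factory=dict)
    webhook_urls: list[str] = field(default_factory=list)


def split_csv(value: str) -> list[str]:
    """Split a comma-separated string, dropping empty items and whitespace."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_http_checks(value: str) -> dict[str, str]:
    """Parse 'container=url,container2=url2' into a name -> URL mapping."""
    checks: dict[str, str] = {}
    for entry in split_csv(value):
        # Split on the first "=" only (assumption), so query strings stay intact.
        name, sep, url = entry.partition("=")
        if sep and name.strip() and url.strip():
            checks[name.strip()] = url.strip()
    return checks


def load_config(env=None) -> GuardianConfig:
    """Build a config from a mapping (defaults to os.environ), using the table's defaults."""
    env = os.environ if env is None else env
    return GuardianConfig(
        monitored_containers=split_csv(
            env.get("MONITORED_CONTAINERS", "dockhand-app,dockhand-database")
        ),
        grace_seconds=int(env.get("GRACE_SECONDS", "300")),
        check_interval=int(env.get("CHECK_INTERVAL", "30")),
        cooldown_seconds=int(env.get("COOLDOWN_SECONDS", "600")),
        stack_dir=env.get("STACK_DIR", "/stack"),
        maintenance_file=env.get("MAINTENANCE_FILE", ".maintenance"),
        http_checks=parse_http_checks(env.get("HTTP_CHECKS", "")),
        webhook_urls=split_csv(env.get("WEBHOOK_URLS", "")),
    )
```

Passing a plain dict instead of relying on os.environ keeps the sketch easy to try out, for example load_config({"GRACE_SECONDS": "60"}).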
environment:
  MONITORED_CONTAINERS: dockhand-app,dockhand-database
  GRACE_SECONDS: 300
  CHECK_INTERVAL: 30
  COOLDOWN_SECONDS: 600
  HTTP_CHECKS: dockhand-app=http://dockhand-app:80/health

Guardian uses Apprise for webhook notifications, supporting 80+ notification services including Discord, Microsoft Teams, Slack, Telegram, Email, and many more.
Configure notifications via Apprise URLs:
environment:
  # Single service
  WEBHOOK_URLS: discord://webhook_id/webhook_token
  # Multiple services (comma-separated); use this instead of the line above,
  # since a YAML mapping cannot define WEBHOOK_URLS twice
  # WEBHOOK_URLS: discord://webhook_id/token,mailto://user:pass@gmail.com
- Create webhook in Discord:
  - Server Settings → Integrations → Webhooks → New Webhook
  - Copy webhook URL: https://discord.com/api/webhooks/ID/TOKEN
- Configure guardian:
  WEBHOOK_URLS: discord://webhook_id/webhook_token
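The Discord webhook URL you copy and the Apprise URL you configure carry the same ID and token in a different shape. The following sketch shows that mapping; the helper name discord_webhook_to_apprise is hypothetical and not part of the guardian or of Apprise, and it assumes the copied URL has the /api/webhooks/ID/TOKEN path shown above.

```python
from urllib.parse import urlparse


def discord_webhook_to_apprise(url: str) -> str:
    """Convert https://discord.com/api/webhooks/ID/TOKEN to discord://ID/TOKEN."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Expected path segments: ["api", "webhooks", "<id>", "<token>"]
    if len(segments) != 4 or segments[:2] != ["api", "webhooks"]:
        raise ValueError(f"not a Discord webhook URL: {url!r}")
    webhook_id, token = segments[2], segments[3]
    return f"discord://{webhook_id}/{token}"
```

For example, discord_webhook_to_apprise("https://discord.com/api/webhooks/123/abc") yields a value suitable for WEBHOOK_URLS.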
- Create webhook in Teams:
  - Channel → Connectors → Incoming Webhook
  - Copy webhook URL
- Configure guardian:
  WEBHOOK_URLS: msteams://TokenA/TokenB/TokenC/
- Create a Slack app with an incoming webhook
- Configure guardian:
  WEBHOOK_URLS: slack://TokenA/TokenB/TokenC/

Send notifications to multiple services simultaneously:
WEBHOOK_URLS: discord://ID/TOKEN,msteams://A/B/C/,slack://X/Y/Z/

Apprise supports 80+ services. See Apprise documentation for all supported URLs:
- Email (SMTP, Gmail, etc.)
- Telegram
- Matrix
- Pushover
- IFTTT
- Custom JSON endpoints
- And many more!
To pause monitoring during maintenance:
# Enable maintenance mode
touch .maintenance
# Disable maintenance mode
rm .maintenance

When the maintenance file exists in the stack directory, the guardian will skip all health checks.
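A check of this kind can be very small. The sketch below is an assumption about how such a test might look, not the guardian's actual implementation; the function name maintenance_enabled is hypothetical, and the defaults mirror the STACK_DIR and MAINTENANCE_FILE values documented above.

```python
from pathlib import Path


def maintenance_enabled(stack_dir: str = "/stack", flag_name: str = ".maintenance") -> bool:
    """Return True when the maintenance flag file exists in the stack directory."""
    return (Path(stack_dir) / flag_name).exists()
```

A monitoring loop would call this at the start of every cycle and skip all health checks while it returns True.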
- Monitoring: Guardian checks each monitored container every CHECK_INTERVAL seconds
- Health Checks:
  - Verifies container is running
  - Checks Docker health status (if configured)
  - Optionally checks HTTP endpoints
- Grace Period: If a container fails checks, guardian waits GRACE_SECONDS before taking action
- Recovery: After grace period expires:
  - Executes docker compose pull to get latest images
  - Executes docker compose up -d --force-recreate to recreate containers
- Cooldown: After recovery, waits COOLDOWN_SECONDS before monitoring again
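The interplay of grace period and cooldown described in this list can be sketched as a small state machine. This is an illustrative model only, not the guardian's source: WatchState, next_action and recovery_commands are hypothetical names, and the sketch assumes recovery fires once a failure streak has lasted at least GRACE_SECONDS. The two commands are the ones named in the list; the sketch only builds them and does not run Docker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WatchState:
    first_failure: Optional[float] = None  # start of the current failure streak
    cooldown_until: float = 0.0            # checks are ignored before this time


def next_action(state: WatchState, healthy: bool, now: float,
                grace_seconds: int = 300, cooldown_seconds: int = 600) -> str:
    """Return 'cooldown', 'ok', 'waiting' or 'recover' for one check cycle."""
    if now < state.cooldown_until:
        return "cooldown"
    if healthy:
        state.first_failure = None  # a healthy check resets the streak
        return "ok"
    if state.first_failure is None:
        state.first_failure = now
    if now - state.first_failure < grace_seconds:
        return "waiting"
    # Grace period expired: trigger recovery and start the cooldown.
    state.first_failure = None
    state.cooldown_until = now + cooldown_seconds
    return "recover"


def recovery_commands() -> list[list[str]]:
    """The two recovery steps, as argument lists to run in the stack directory."""
    return [
        ["docker", "compose", "pull"],
        ["docker", "compose", "up", "-d", "--force-recreate"],
    ]
```

Keeping the decision logic free of Docker calls makes it possible to exercise the timing rules with plain timestamps before wiring it to real container checks.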
See docker-compose.yml for a complete example including:
- Sample application container (nginx)
- Sample database container (PostgreSQL)
- Guardian sidecar configuration
- Proper volume mounts and networking
# Local build
docker build -t dockhand-guardian .
# Multi-platform build (amd64 + arm64)
docker buildx build --platform linux/amd64,linux/arm64 -t dockhand-guardian .

Docker images are automatically built and published to GitHub Container Registry on every release with semantic version tags:
- latest: Always points to the newest release
- X.X.X: Full version (e.g., 1.4.1)
- X.X: Minor version, updated with patches (e.g., 1.4)
- X: Major version, updated with minor/patch (e.g., 1)
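To make the tagging scheme concrete, this short sketch derives the tag set for one release. The function name image_tags is hypothetical, and the sketch assumes a plain MAJOR.MINOR.PATCH version without pre-release suffixes; it is not taken from the project's release workflow.

```python
def image_tags(version: str) -> list[str]:
    """Return the tags published for a release such as '1.4.1'."""
    major, minor, patch = version.split(".")
    return ["latest", f"{major}.{minor}.{patch}", f"{major}.{minor}", major]
```

Pinning to the minor tag (for example 1.4) picks up patch releases automatically, while the full version tag never moves.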
- Python 3.11+
- Docker
- Docker Compose
# Install dependencies (includes dev tools); quoted so shells such as zsh do not expand the brackets
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install

This project uses modern Python tooling:
- Ruff: Ultra-fast linter and formatter (10-100x faster than flake8/black/isort)
- mypy: Static type checking
- pre-commit: Automated Git hooks for code quality
- pytest: Testing framework with coverage reporting
# Lint code
make lint # Run ruff checks
# Format code
make format # Auto-fix issues and format
# Type check
make type-check # Run mypy
# Run tests
make test # Run pytest with coverage
# Run all checks
make check # Lint + format-check + type-check + tests
# Git workflow
make commit # Interactive commit with quality checks
make amend # Add changes to last commit
make push # Pull with rebase and push
# CI/Workflow validation
make validate-commit # Validate commit message format
make validate-workflows # Check workflow syntax
make ci-local # Run all CI checks locally
make ci-status # Show GitHub Actions status
make ci-logs # Show logs of latest workflow
make ci-watch # Watch running workflows

# Set environment variables
export MONITORED_CONTAINERS=dockhand-app,dockhand-database
export GRACE_SECONDS=60
export STACK_DIR=/path/to/your/stack
# Run guardian
python src/guardian.py

This project uses semantic versioning and conventional commits:
# Install dependencies
npm install
# Make changes and commit using commitizen
npm run commit
# Or commit manually with proper format
git commit -m "feat(monitoring): add new health check type"

Pre-commit hooks will automatically:
- Run Ruff linting and formatting
- Check type hints with mypy
- Validate YAML files
- Run tests
See SCOPES.md for available commit scopes.
- The guardian requires access to the Docker socket (/var/run/docker.sock)
- Mount the stack directory as read-only (:ro) when possible
- Use Docker secrets for sensitive configuration in production
- The guardian has permission to recreate containers, so protect access appropriately; mounting the socket with :ro does not restrict the Docker API calls that can be made through it
- Verify container names match exactly (check with docker ps)
- Ensure containers are in the same Docker network
- Check guardian logs: docker compose logs guardian
- Check if maintenance mode is enabled (.maintenance file exists)
- Verify grace period has elapsed
- Check if in cooldown period after previous recovery
- Review guardian logs for error messages
- Ensure Docker socket is properly mounted
- Verify guardian has access to stack directory
- Check Docker socket permissions on host
MIT License - see LICENSE file for details.
Contributions are welcome! This project uses:
- Conventional Commits for automated versioning
- Semantic Release for automated releases
- Automatic Docker image publishing to GitHub Container Registry
- Required commit scopes (see SCOPES.md)
Important: Not all commits trigger releases:
- feat, fix, perf, refactor, build → Create releases + Docker images
- docs, ci, test, style, chore → No release (documentation & tooling only)
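The rule above can be expressed as a lookup on the commit type. The following is a hedged sketch rather than the project's release configuration: triggers_release and the HEADER pattern are hypothetical, commit types not listed above are assumed to produce no release, and a "!" breaking-change marker is parsed but not treated specially.

```python
import re

# Commit types that create a release, as listed above.
RELEASE_TYPES = {"feat", "fix", "perf", "refactor", "build"}

# Matches headers such as "feat(monitoring): add new health check type".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?!?: .+")


def triggers_release(message: str) -> bool:
    """Return True when the commit header's type is one that creates a release."""
    first_line = message.splitlines()[0] if message else ""
    match = HEADER.match(first_line)
    if match is None:
        raise ValueError(f"not a conventional commit header: {first_line!r}")
    return match.group("type") in RELEASE_TYPES
```

For instance, the example commit "feat(monitoring): add new health check type" would create a release, while "docs: update readme" would not.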
Dependency Updates:
- Docker base image updates → Automatic patch release + new Docker image
- Python package updates → Automatic patch release + new Docker image
- GitHub Actions updates → No release (CI tooling only)
- npm updates → No release (dev tooling only)
For detailed guidelines, see CONTRIBUTING.md.
Björn Strausmann