A process supervisor written in Python. Define a set of services in a YAML config, and procwatch will start them, watch them, restart them when they crash, and stream their logs to disk. There's a live terminal dashboard and a small HTTP API for controlling things at runtime.
Linux only — uses /proc/{pid}/stat for metrics and POSIX signals for shutdown.
Requires Python 3.11+ and Linux.
pip install -e .
procwatch examples/sample.yaml
From another terminal:
curl localhost:8080/status
Ctrl+C to stop. procwatch will SIGTERM all child processes, wait 5 seconds, then SIGKILL anything still running.
log_dir: /tmp/procwatch/logs
api_port: 8080
services:
  - name: web
    command: "python3 server.py"
    restart: always
    max_restarts: 5
    backoff_base: 1.0
    env:
      PORT: "9000"
    cwd: /opt/myapp
  - name: worker
    command: "python3 worker.py"
    restart: on-failure
    max_restarts: 3
  - name: migrate
    command: "python3 manage.py migrate"
    restart: never

restart options:
always: restart regardless of exit code
on-failure: only restart if exit code != 0
never: run once
Backoff doubles each time: 1s, 2s, 4s... capped at 30s.
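The restart options and the backoff schedule above can be sketched as two small functions. This is only an illustration under stated assumptions, not procwatch's actual code: the names `should_restart` and `backoff_delay` are hypothetical, and it assumes `backoff_base` is the first delay in seconds and that `max_restarts` is checked before the policy.

```python
def should_restart(policy: str, exit_code: int, restart_count: int, max_restarts: int) -> bool:
    """Decide whether a service that just exited should be spawned again."""
    if restart_count >= max_restarts:
        return False  # assumption: the restart limit wins over any policy
    if policy == "always":
        return True
    if policy == "on-failure":
        return exit_code != 0
    return False  # "never": run once


def backoff_delay(restart_count: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before the next restart: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return min(base * (2 ** restart_count), cap)
```

With the defaults, `backoff_delay(0)`, `backoff_delay(1)` and `backoff_delay(2)` give 1, 2 and 4 seconds, and any count from 5 upward is held at the 30 second cap.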
Each service gets its own log files:
/tmp/procwatch/logs/
  web.stdout.log
  web.stderr.log
Internal procwatch logs go to /tmp/procwatch/procwatch.log.
GET   /routes                    list all endpoints
GET   /status                    all services + state, pid, cpu, memory, uptime
POST  /services/{name}/start     start a stopped service
POST  /services/{name}/stop      stop a running service
POST  /services/{name}/restart   restart a service
Example:
curl localhost:8080/status
curl -X POST localhost:8080/services/web/restart
/status response:
[
  {
    "name": "web",
    "state": "running",
    "pid": 12345,
    "restart_count": 1,
    "uptime_seconds": 42.3,
    "memory_kb": 18432
  }
]

States: pending, running, restarting, stopped, failed
failed means the service hit max_restarts and won't be retried.
Each service runs in its own asyncio.Task. The supervisor sits in the same event loop and waits on a stop event that gets set when SIGTERM or SIGINT arrives.
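One way the stop event might be wired to signals is shown below. It is a minimal sketch rather than the real supervisor: `supervise` is a hypothetical name, and the per-service tasks are left out. `loop.add_signal_handler` is the standard asyncio call for this on Unix and has to run in the main thread.

```python
import asyncio
import signal


async def supervise(stop_event: asyncio.Event) -> None:
    """Block until SIGTERM or SIGINT sets the stop event."""
    loop = asyncio.get_running_loop()
    signals = (signal.SIGTERM, signal.SIGINT)
    for sig in signals:
        # The callback runs inside the event loop, so setting the event is safe.
        loop.add_signal_handler(sig, stop_event.set)
    try:
        await stop_event.wait()
    finally:
        # Restore default handling once the supervisor is done waiting.
        for sig in signals:
            loop.remove_signal_handler(sig)
```

Because the handlers only set an event, the shutdown sequence itself can run as ordinary async code after `supervise` returns.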
When a process exits, its task checks the restart policy. If the service should restart, the task waits out the backoff period and spawns the process again. stdout and stderr are drained concurrently with asyncio.gather alongside proc.wait(), so a process writing a lot of output doesn't block anything else.
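The concurrent draining described above could look roughly like this. It is a sketch under assumptions: `drain` and `run_once` are hypothetical names, the command is run through a shell, and log files follow the `{name}.stdout.log` / `{name}.stderr.log` naming used elsewhere in this README.

```python
import asyncio
from pathlib import Path


async def drain(stream: asyncio.StreamReader, path: Path) -> None:
    """Copy a child's output stream to a log file until EOF."""
    with path.open("ab") as log:
        while chunk := await stream.read(65536):
            log.write(chunk)
            log.flush()


async def run_once(name: str, command: str, log_dir: Path) -> int:
    """Run one command, log its stdout and stderr, and return its exit code."""
    log_dir.mkdir(parents=True, exist_ok=True)
    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Both pipes are read while we wait for exit, so neither one fills up.
    await asyncio.gather(
        drain(proc.stdout, log_dir / f"{name}.stdout.log"),
        drain(proc.stderr, log_dir / f"{name}.stderr.log"),
        proc.wait(),
    )
    return proc.returncode
```

A restart loop would call `run_once` repeatedly and consult the restart policy on each returned exit code.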
Shutdown sends SIGTERM to the whole process group (not just the top-level PID) so child-of-child processes get cleaned up too. If they don't exit within 5 seconds they get SIGKILL.
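A sketch of that SIGTERM-then-SIGKILL sequence follows. `stop_group` is a hypothetical name, and the sketch assumes each child was spawned with `start_new_session=True`, so that the child's PID is also its process group ID. How procwatch actually creates the group is not stated here.

```python
import asyncio
import os
import signal


async def stop_group(proc: asyncio.subprocess.Process, grace: float = 5.0) -> int:
    """SIGTERM the child's process group, then SIGKILL it after `grace` seconds."""
    if proc.returncode is not None:
        return proc.returncode  # already exited, nothing to signal
    try:
        # Assumption: pgid == pid because the child leads its own session.
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the group vanished between the check and the signal
    try:
        await asyncio.wait_for(proc.wait(), timeout=grace)
    except asyncio.TimeoutError:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        await proc.wait()
    return proc.returncode
```

For example, a child started with `await asyncio.create_subprocess_exec("sleep", "30", start_new_session=True)` can be passed straight to `stop_group`. A negative return value is the number of the signal that ended the process.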
Memory and CPU come from /proc/{pid}/stat — no psutil. CPU is calculated by diffing the cumulative tick counters between samples, which is the same approach top uses.
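The tick-diffing approach can be illustrated as below. The function names are hypothetical, and the sketch assumes memory is reported as resident set size (field 24 of the stat line, in pages) and CPU as user plus system time (fields 14 and 15, in clock ticks), following the field layout documented in proc(5).

```python
import os

CLK_TCK = os.sysconf("SC_CLK_TCK")          # clock ticks per second
PAGE_KB = os.sysconf("SC_PAGE_SIZE") // 1024


def parse_stat(line: str) -> tuple[int, int]:
    """Return (cpu_ticks, rss_kb) from one /proc/{pid}/stat line."""
    # comm (field 2) may contain spaces or parentheses, so split after the last ')'.
    rest = line.rsplit(")", 1)[1].split()
    utime, stime = int(rest[11]), int(rest[12])   # fields 14 and 15
    rss_pages = int(rest[21])                     # field 24
    return utime + stime, rss_pages * PAGE_KB


def cpu_percent(prev_ticks: int, cur_ticks: int, elapsed_seconds: float) -> float:
    """CPU usage over the sampling interval, as a percentage of one core."""
    if elapsed_seconds <= 0:
        return 0.0
    return 100.0 * (cur_ticks - prev_ticks) / CLK_TCK / elapsed_seconds
```

A sampler would read `/proc/{pid}/stat` at a fixed interval, keep the previous tick count per service, and pass both counts plus the elapsed time to `cpu_percent`.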
