This is a learning project. I am building a small process supervisor to understand how supervisors actually work under the hood: how they start processes in the right order, keep them alive, back off on repeated failures, and expose a sensible operator interface.
It is also an excuse to explore defensive programming practices in Go: bounded channels, goroutine panic recovery, nil and bounds checking, and keeping a long-running process from ever crashing. Later on I would like to see if I can run it on a minimal Linux image as the service manager, instead of reaching for systemd, s6, or runit.
Coffey is not intended to be used in anything real. If you need a production process supervisor, reach for s6, runit, or systemd. The s6 and runit binaries are measured in kilobytes; this will be measured in megabytes, because Go drags a runtime and garbage collector along with it. The gap is real and it is not going away by being clever. Treat this repo as a study of how supervisors work, not as a replacement for the ones that already do the job well.
If it turns out well and I can be bothered, I might rewrite it later in something like Rust to get the binary size closer to the existing tools. That is a long way off and may never happen.
- Define a set of services in YAML and have them supervised from a single static binary
- Start services in dependency order and restart them on failure with exponential backoff
- Run liveness and readiness health checks against each service
- Talk to the running daemon over a unix socket with a small CLI
- Eventually run it as PID 1 on a minimal OS image
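
As a rough illustration of the restart goal above, here is a minimal Go sketch of capped exponential backoff. The `Backoff` type and its `Delay` method are hypothetical names used for illustration, not Coffey's actual code; the field names and values are borrowed from the `backoff` block in the example config further down.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Backoff is a hypothetical type mirroring the backoff block in the
// example config (initial, max, multiplier).
type Backoff struct {
	Initial    time.Duration
	Max        time.Duration
	Multiplier float64
}

// Delay returns the wait before restart attempt n (0-based):
// Initial * Multiplier^n, capped at Max.
func (b Backoff) Delay(attempt int) time.Duration {
	if attempt < 0 {
		attempt = 0
	}
	m := b.Multiplier
	if m < 1 {
		// Defensive: a multiplier below 1 would shrink the delay.
		m = 1
	}
	d := float64(b.Initial) * math.Pow(m, float64(attempt))
	// Cap at Max, and guard against overflow for large attempt counts.
	if d > float64(b.Max) || math.IsInf(d, 0) || math.IsNaN(d) {
		return b.Max
	}
	return time.Duration(d)
}

func main() {
	b := Backoff{Initial: time.Second, Max: 30 * time.Second, Multiplier: 2.0}
	for attempt := 0; attempt < 7; attempt++ {
		fmt.Printf("attempt %d: wait %s\n", attempt, b.Delay(attempt))
	}
}
```

With these values the waits are 1s, 2s, 4s, 8s, 16s, and then 30s for every later attempt. A real supervisor would probably also reset the attempt counter once a service has stayed up for a while; that policy is not sketched here.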
The config format is loosely inspired by bits and pieces from systemd unit files, docker compose, and the drop-in directory style used by s6 and runit. It is not a copy of any of them. This is all very experimental and the structure will almost certainly change as I work through the project.
The rough idea is to have a main file at /etc/coffey/config.yaml for global
settings, and individual service definitions dropped into
/etc/coffey/services.d/{service}.yaml so each service can be managed on its
own. The example below shows both in one file, which should also work.

    coffey:
      socket: /run/coffey.sock
      pid_file: /run/coffey.pid
      log:
        path: /data/logs/coffey.log
        console: true
    services:
      - name: api
        binary: /usr/local/bin/api
        env:
          - PORT=8080
          - DATABASE_PATH=/data/app.db
        restart: always
        healthcheck:
          readiness:
            type: http
            url: http://127.0.0.1:8080/healthz
            interval: 5s
      - name: frontend
        binary: /usr/bin/node
        args:
          - /opt/frontend/server.js
        env:
          - PORT=3000
          - API_URL=http://127.0.0.1:8080
        depends_on:
          - name: api
            condition: healthy
        restart: on-failure
        backoff:
          initial: 1s
          max: 30s
          multiplier: 2.0

There will be a single binary at cmd/coffeyd that does both jobs: running
with no subcommand starts the daemon, and running with a subcommand connects
to the running daemon and acts as a client. Later on I might also add a
separate cmd/coffeyctl binary for strict client-only use, so hosts that only
need the control interface do not have to ship the daemon code. That is
undecided for now.
Once the daemon is running it listens on a unix socket at /run/coffey.sock
(configurable in the main config). Every client subcommand connects to that
socket to send a command and read the response. If the daemon is not running
the client prints a clear error and exits.
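
A minimal sketch of the client side of that exchange, using only the Go standard library. The wire protocol is not defined yet, so the newline-terminated request and single-line reply here are assumptions, and `sendCommand` is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"strings"
	"time"
)

// sendCommand dials the daemon's unix socket, writes one command line,
// and returns the first line of the reply. The line-based protocol is
// an assumption made for this sketch.
func sendCommand(socketPath, command string) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return "", fmt.Errorf("cannot reach coffeyd at %s (is the daemon running?): %w", socketPath, err)
	}
	defer conn.Close()

	// Bound the whole exchange so a stuck daemon cannot hang the client.
	if err := conn.SetDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return "", fmt.Errorf("set deadline: %w", err)
	}
	if _, err := fmt.Fprintln(conn, command); err != nil {
		return "", fmt.Errorf("send command: %w", err)
	}
	reply, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", fmt.Errorf("read reply: %w", err)
	}
	return strings.TrimSpace(reply), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("no subcommand given; the real binary would start the daemon here")
		return
	}
	reply, err := sendCommand("/run/coffey.sock", strings.Join(os.Args[1:], " "))
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(reply)
}
```

Wrapping the dial error with `%w` keeps the underlying cause available to callers while still leading with a message an operator can act on.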
The CLI aims for human-readable output first. Results print as plain text
formatted for a terminal; only errors and fatal messages go to stderr, so
stdout stays clean for piping. A global -j, --json flag switches all
output to JSON for scripting and tooling. Verbosity is controlled with -v,
-vv, and -vvv, though the exact content of each level is still to be
decided.
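
One way the text/JSON switch could look, as a sketch only. `ServiceStatus`, its fields, `render`, and `renderString` are hypothetical; the real status shape is undecided. The standard `flag` package accepts both `-json` and `--json`, but does not support stacked flags like `-vv`, so verbosity is left out.

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"io"
	"os"
	"strings"
)

// ServiceStatus is a hypothetical shape for one row of `coffeyd status`.
type ServiceStatus struct {
	Name  string `json:"name"`
	State string `json:"state"`
	PID   int    `json:"pid"`
}

// render writes statuses to w as JSON when asJSON is true,
// otherwise as aligned plain text, one service per line.
func render(w io.Writer, statuses []ServiceStatus, asJSON bool) error {
	if asJSON {
		return json.NewEncoder(w).Encode(statuses)
	}
	for _, s := range statuses {
		if _, err := fmt.Fprintf(w, "%-12s %-10s pid=%d\n", s.Name, s.State, s.PID); err != nil {
			return err
		}
	}
	return nil
}

// renderString is a small convenience wrapper around render.
func renderString(statuses []ServiceStatus, asJSON bool) (string, error) {
	var sb strings.Builder
	err := render(&sb, statuses, asJSON)
	return sb.String(), err
}

func main() {
	asJSON := false
	flag.BoolVar(&asJSON, "j", false, "emit JSON instead of plain text")
	flag.BoolVar(&asJSON, "json", false, "emit JSON instead of plain text")
	flag.Parse()

	// Sample data for the sketch; the real client would get this from the daemon.
	statuses := []ServiceStatus{
		{Name: "api", State: "running", PID: 101},
		{Name: "frontend", State: "backoff", PID: 0},
	}
	if err := render(os.Stdout, statuses, asJSON); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```

Keeping the formatting decision in one function that takes an `io.Writer` means every subcommand can share it, and stdout never receives anything except the rendered result.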

    # Planned subcommands, though likely to change:
    coffeyd                    # start the daemon
    coffeyd status             # list all services
    coffeyd status <service>   # show details for one service
    coffeyd start <service>    # start a stopped service
    coffeyd stop <service>     # stop a running service
    coffeyd restart <service>  # restart a service
    coffeyd logs <service>     # tail logs (last 50 lines then stream)
    coffeyd reload             # reload config and apply the diff
    coffeyd validate           # validate and lint config files
    coffeyd shutdown           # gracefully stop all services and the daemon
    coffeyd version            # show version and build info

This project uses just as a task runner.

    just build   # build static binaries for linux/amd64 and linux/arm64
    just test    # run all tests
    just lint    # run golangci-lint
    just run     # run coffeyd locally (forwards extra args)
    just clean   # clean the built artifacts

Some of the code in this repo will be generated with the help of large language models. Every change is read, tested, and reviewed before it goes in.
More and more AI-generated code is ending up in production systems. Sometimes it works well, other times it causes absolute disasters. A process supervisor that must never crash is a good test case for that tension: the code generation is non-deterministic, but the system it produces has to be completely reliable. Learning how to bridge that gap, through review, testing, defensive rules, and knowing when to throw the output away, is a big part of what this project is for.
MIT. See LICENSE.