Rollbridge is a Node.js process supervisor and local traffic switcher for zero-downtime deploys.
Nginx points at one stable Rollbridge proxy port. Deploy tooling asks Rollbridge to start a new release, health-check it, switch new traffic to it, and drain old HTTP/WebSocket connections before stopping the previous release.
npm install rollbridge
For local development in this repository:
npm install
npm run all-checks
A Rollbridge config is a JavaScript module whose default export is a config
object. It can also export a function (sync or async) that returns the object,
which is handy for computing values from the environment. Write it in your
project's module system — export default for ESM ("type": "module") or
module.exports for CommonJS.
// rollbridge.js
export default {
application: "ticket-server",
control: {
path: "/tmp/rollbridge-ticket-server.sock"
},
proxy: {
host: "127.0.0.1",
port: 8182,
upstreamHost: "127.0.0.1",
healthPath: "/ping",
healthTimeoutMs: 30000,
drainTimeoutMs: 60000,
forceStopTimeoutMs: 10000
},
processes: [
{
id: "beacon",
policy: "companion",
cwd: "{{releasePath}}",
command: "env VELOCIOUS_BEACON_PORT={{port}} npx velocious beacon",
port: {from: 17330, to: 17399}
},
{
id: "background-jobs-worker",
policy: "companion",
cwd: "{{releasePath}}",
command: "npx velocious background-jobs-worker",
outputLines: 200
},
{
id: "background-jobs-main",
policy: "service",
cwd: "{{releasePath}}",
command: "npx velocious background-jobs-main"
},
{
id: "web",
policy: "proxied",
cwd: "{{releasePath}}",
command: "npx velocious server --host 127.0.0.1 --port {{port}}",
port: {from: 18182, to: 18299},
health: {path: "/ping", timeoutMs: 30000}
}
]
}
Each process retains its most recent stdout/stderr lines and reports them in
status. Set outputLines (a positive integer, default 50) per process to keep
more or fewer lines for chatty or quiet processes.
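Conceptually, outputLines bounds a per-process buffer that keeps only the newest lines. The class below is a hypothetical illustration of that behavior (its name and methods are made up), not Rollbridge's implementation.

```javascript
// Hypothetical bounded line buffer illustrating outputLines: only the newest
// `limit` lines survive, older ones are dropped as new output arrives.
class RecentLines {
  constructor(limit = 50) {
    if (!Number.isInteger(limit) || limit <= 0) {
      throw new Error("outputLines must be a positive integer")
    }
    this.limit = limit
    this.lines = []
    this.partial = "" // text after the last newline, not yet a complete line
  }

  // Feed a raw stdout/stderr chunk; it may hold several lines or a fragment.
  push(chunk) {
    const parts = (this.partial + chunk).split("\n")
    this.partial = parts.pop() // last element is an incomplete line (or "")
    for (const line of parts) {
      this.lines.push(line)
      if (this.lines.length > this.limit) this.lines.shift()
    }
  }

  snapshot() {
    return [...this.lines]
  }
}

const buffer = new RecentLines(3)
buffer.push("booting\nlistening on 18182\n")
buffer.push("GET /ping 200\nGET /ping")
buffer.push(" 200\n")
console.log(buffer.snapshot())
// [ 'listening on 18182', 'GET /ping 200', 'GET /ping 200' ]
```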
Set control.mode to an octal permission string (for example "660") to
chmod the control socket after it binds. This restricts which users can send
control commands — useful when several deploy users share a group. When unset,
the socket keeps the default permissions from the daemon's umask. Pair it with
control.owner and control.group (a numeric id or a user/group name) to
chown the socket to a shared deploy group; names resolve via
/etc/passwd and /etc/group, and the daemon must run as a user allowed to chown it.
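The mode string is read as octal, so "660" means read/write for the socket's owner and group and nothing for others. This sketch uses Node's fs.chmodSync on an ordinary temporary file (standing in for the socket) to show what that string resolves to on Linux or macOS; the "deploy" group name in the comment is hypothetical.

```javascript
// Illustration of control.mode: "660" once applied with chmod.
// Config fragment (group name "deploy" is hypothetical):
//   control: {path: "/tmp/rollbridge-ticket-server.sock", mode: "660", group: "deploy"}
const fs = require("node:fs")
const os = require("node:os")
const path = require("node:path")

const mode = "660"                    // value as written in the config
const numericMode = parseInt(mode, 8) // octal string -> 0o660 (432 decimal)

// A temporary regular file stands in for the control socket.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "rollbridge-mode-"))
const file = path.join(dir, "control.sock")
fs.writeFileSync(file, "")
fs.chmodSync(file, numericMode)

const applied = fs.statSync(file).mode & 0o777
console.log(applied.toString(8)) // 660: rw for owner and group, nothing for others

fs.rmSync(dir, {recursive: true, force: true})
```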
Set the proxied process's health.startDelayMs (default 0) to wait that long
after the process starts before the first health probe — like a readiness
probe's initial delay, useful for apps with a known boot time. The delay runs
before the health.timeoutMs window begins.
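The timing works out as: wait startDelayMs, then probe until a probe succeeds or health.timeoutMs elapses, so the worst case before the check gives up is roughly startDelayMs + timeoutMs. Below is a hypothetical sketch of such a loop; the probe callback and the intervalMs spacing between probes are assumptions, not documented Rollbridge settings.

```javascript
// Hypothetical health-check loop: an initial delay, then a bounded probe window.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function waitHealthy(probe, {startDelayMs = 0, timeoutMs, intervalMs = 250}) {
  // The initial delay runs first and is not counted against timeoutMs.
  await sleep(startDelayMs)

  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    if (await probe()) return true // e.g. GET /ping answered successfully
    await sleep(intervalMs)        // assumed spacing between probes
  }
  return false
}

let attempts = 0
const probe = async () => ++attempts >= 3 // pretend the app is ready on the 3rd probe
waitHealthy(probe, {startDelayMs: 100, timeoutMs: 1000, intervalMs: 50})
  .then((healthy) => console.log(healthy, attempts)) // true 3
```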
Set a process's restart policy to control automatic restarts after a crash.
restart.maxRestarts caps how many restarts are allowed within restart.windowMs
before Rollbridge gives up and leaves the process failed (maxRestarts: 0
disables restarts entirely), while restart.backoffFactor (with an optional
restart.maxDelayMs cap) grows the restartDelayMs delay by that factor on each
successive restart. With no restart block, a crashed process keeps restarting
after restartDelayMs (the default behavior). See docs/config.md.
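For a policy such as restart: {maxRestarts: 5, windowMs: 60000, backoffFactor: 2, maxDelayMs: 30000}, the delays grow geometrically until they hit the cap. This sketch models that arithmetic under stated assumptions: the nth restart waits restartDelayMs × backoffFactor^(n − 1), capped at maxDelayMs, restartDelayMs is taken as 1000 purely for illustration, and restarts are counted in a sliding windowMs window. See docs/config.md for the exact rules.

```javascript
// Hypothetical model of restart backoff and the maxRestarts/windowMs budget.
// Assumes delay(n) = restartDelayMs * backoffFactor^(n - 1), capped at maxDelayMs.
function restartDelay(n, {restartDelayMs, backoffFactor = 1, maxDelayMs = Infinity}) {
  return Math.min(restartDelayMs * backoffFactor ** (n - 1), maxDelayMs)
}

// True if another restart fits in the budget, given earlier restart timestamps.
function mayRestart(restartTimes, now, {maxRestarts, windowMs}) {
  const recent = restartTimes.filter((t) => now - t < windowMs)
  return recent.length < maxRestarts // maxRestarts: 0 never allows a restart
}

const policy = {restartDelayMs: 1000, backoffFactor: 2, maxDelayMs: 30000} // 1000 is illustrative
const delays = [1, 2, 3, 4, 5, 6].map((n) => restartDelay(n, policy))
console.log(delays) // [ 1000, 2000, 4000, 8000, 16000, 30000 ]

// Five restarts within the last minute exhaust maxRestarts: 5, so the process stays failed.
console.log(mayRestart([0, 1000, 3000, 7000, 15000], 20000, {maxRestarts: 5, windowMs: 60000})) // false
```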
restart: {maxRestarts: 5, windowMs: 60000, backoffFactor: 2, maxDelayMs: 30000}
Set a process's memory policy to supervise its resident memory (RSS) and
gracefully restart it when it grows too large. memory.limitBytes is the RSS
limit (measured across the whole process group, not just the wrapper);
memory.warnBytes logs a warning before the limit; memory.checkIntervalMs
(default 5000) sets how often RSS is sampled. A memory restart is reported in
status and recorded in events (a process started with reason: "memory").
See docs/config.md.
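Because the config is a JavaScript module, the byte values can be written as arithmetic: 536870912 is 512 MiB and 402653184 is 384 MiB. The sketch also shows a hypothetical classification of one RSS sample summed over a process group; the function name, its return values, and the use of >= rather than > are assumptions for illustration.

```javascript
// Byte thresholds written as arithmetic (the config is plain JavaScript).
const MiB = 1024 * 1024
const memory = {limitBytes: 512 * MiB, warnBytes: 384 * MiB, checkIntervalMs: 5000}

// Hypothetical decision for one RSS sample, summed over the whole process group.
// Whether Rollbridge compares with >= or > is an assumption here.
function classifyRss(groupRssBytes, {limitBytes, warnBytes}) {
  if (groupRssBytes >= limitBytes) return "restart" // graceful restart, reason: "memory"
  if (warnBytes !== undefined && groupRssBytes >= warnBytes) return "warn"
  return "ok"
}

// RSS of each member of the process group (wrapper plus children), in bytes.
const samples = [120 * MiB, 300 * MiB]
const total = samples.reduce((sum, rss) => sum + rss, 0)

console.log(memory.limitBytes, memory.warnBytes) // 536870912 402653184
console.log(classifyRss(total, memory))          // warn (420 MiB)
```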
memory: {limitBytes: 536870912, warnBytes: 402653184, checkIntervalMs: 5000}
Set a process's stopSignal (default "SIGTERM") to the signal it treats as a
request to wind down, so a worker can finish its in-flight work before exiting.
Rollbridge sends stopSignal to gracefully stop the process and SIGKILLs it only
if it hasn't exited within gracefulStopMs. For example, a job worker that drains
on SIGINT:
{id: "worker", policy: "companion", command: "…", stopSignal: "SIGINT", gracefulStopMs: 60000}
Set replicas on a port-less companion to run a pool of identical workers.
Each instance runs as <id>#<index> (worker#0, worker#1, …) — visible in
status and targetable by rollbridge restart (base id for all, worker#0 for
one) — and gets {{replicaIndex}}/{{replicaCount}} and
ROLLBRIDGE_REPLICA_INDEX/_COUNT so each instance can pick a distinct shard or
queue. See docs/config.md.
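Inside each instance, a worker can use those variables to claim a disjoint share of the work. A minimal sketch, assuming the variables are named ROLLBRIDGE_REPLICA_INDEX and ROLLBRIDGE_REPLICA_COUNT and that your application shards by hashing queue names; the hashing scheme and queue names are entirely hypothetical, since Rollbridge only supplies the index and count.

```javascript
// Hypothetical worker-side sharding using the replica variables Rollbridge injects.
const index = Number(process.env.ROLLBRIDGE_REPLICA_INDEX ?? 0)
const count = Number(process.env.ROLLBRIDGE_REPLICA_COUNT ?? 1)

// Deterministic 32-bit FNV-1a string hash so every replica computes the same value.
function fnv1a(text) {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

// Each queue belongs to exactly one replica: the one whose index equals hash % count.
function ownsQueue(queueName, replicaIndex = index, replicaCount = count) {
  return fnv1a(queueName) % replicaCount === replicaIndex
}

const queues = ["mailers", "exports", "webhooks", "reports"] // illustrative names
console.log(`worker#${index} handles:`, queues.filter((q) => ownsQueue(q)))
```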
{id: "worker", policy: "companion", command: "npx velocious background-jobs-worker", replicas: 4}
For workers that quiesce or drain via a command, set a lifecycle block:
Rollbridge runs quietCommand, then drains (drainCommand/drainTimeoutMs),
then stopCommand/stopSignal, then SIGKILL after gracefulStopMs when
gracefully stopping the process. Each hook is bounded so it can't wedge a stop.
Set nonBlockingDrain: true on a worker companion to start its graceful stop the
moment its release is retired — in parallel with the proxied connection drain,
not after it — so new workers handle new work while the old workers finish theirs.
See docs/workers.md for the full safe background-job worker
deployment pattern — companion policy, replicas, and finishing in-flight jobs
on deploy with stopSignal/lifecycle + gracefulStopMs.
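On the worker side, finishing in-flight jobs usually means: on the stop signal, refuse new jobs, let running ones finish, then exit before gracefulStopMs runs out. A minimal sketch for a worker configured with stopSignal: "SIGINT"; createWorker and its run/stop methods are hypothetical and not part of Rollbridge or Velocious.

```javascript
// Hypothetical job worker that drains on SIGINT (pair with stopSignal: "SIGINT").
// Rollbridge SIGKILLs it if it is still running after gracefulStopMs.
function createWorker({onDrained}) {
  let stopping = false
  let inFlight = 0

  const finishIfIdle = () => {
    if (stopping && inFlight === 0) onDrained()
  }

  return {
    // Runs a job unless a stop was requested; resolves to whether it ran.
    async run(job) {
      if (stopping) return false
      inFlight++
      try {
        await job()
      } finally {
        inFlight--
        finishIfIdle()
      }
      return true
    },
    // Called on the stop signal: refuse new jobs, exit once running ones finish.
    stop() {
      stopping = true
      finishIfIdle()
    }
  }
}

const worker = createWorker({onDrained: () => process.exit(0)})
process.once("SIGINT", () => worker.stop())
```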
Set releaseRetention to bound how many stopped (drained) releases the daemon
keeps in memory and reports in status. keep (default 10) retains the most
recent stopped releases; maxAgeMs (default 0, disabled) also prunes stopped
releases older than that many milliseconds. The active and draining releases are
never pruned. This is Rollbridge's own release records — your deploy tool still
owns cleaning up on-disk release directories.
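One way to picture these rules is as a filter over release records. The sketch below is hypothetical: the record shape ({id, state, stoppedAt}) is an assumption, and it simply applies keep and maxAgeMs to stopped releases while leaving active and draining ones alone.

```javascript
// Hypothetical pruning of in-memory release records under releaseRetention.
// Records look like {id, state: "active" | "draining" | "stopped", stoppedAt}.
function pruneReleases(releases, {keep = 10, maxAgeMs = 0}, now = Date.now()) {
  const stopped = releases
    .filter((r) => r.state === "stopped")
    .sort((a, b) => b.stoppedAt - a.stoppedAt) // newest first

  const retained = new Set(
    stopped
      .slice(0, keep) // the `keep` most recent stopped releases
      .filter((r) => maxAgeMs === 0 || now - r.stoppedAt <= maxAgeMs) // 0 = no age limit
      .map((r) => r.id)
  )

  // Active and draining releases are never pruned.
  return releases.filter((r) => r.state !== "stopped" || retained.has(r.id))
}

const DAY = 86400000
const now = 10 * DAY
const releases = [
  {id: "v6", state: "active"},
  {id: "v5", state: "draining"},
  {id: "v4", state: "stopped", stoppedAt: now - 1000},
  {id: "v3", state: "stopped", stoppedAt: now - 2 * DAY},
  {id: "v2", state: "stopped", stoppedAt: now - 3 * DAY}
]
console.log(pruneReleases(releases, {keep: 5, maxAgeMs: DAY}, now).map((r) => r.id))
// [ 'v6', 'v5', 'v4' ]
```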
releaseRetention: {keep: 5, maxAgeMs: 86400000}
Set statePath to have the daemon persist its state to a file (active/draining
releases, process pids, counters, recent events). On the next startup it reads
any leftover file and reports managed processes still alive from a daemon that
didn't shut down cleanly — advisory orphan detection. After a crash, run
rollbridge recover to list those leftovers and rollbridge recover --force to
stop them before restarting the daemon. A clean shutdown removes the file. See
docs/config.md.
statePath: "/var/lib/rollbridge/ticket-server.state.json"
During the first migration from an old supervisor, set legacyTakeover and run
rollbridge predeploy-cleanup --release-path <path> before rollbridge deploy.
Rollbridge will only stop configured legacy processes when no reusable active
Rollbridge release is running.
legacyTakeover: {
screens: ["ticket-server"],
processes: [
{name: "legacy web", includes: ["/home/dev/ticket-server/", "velocious server", "--port 8082"]}
]
}
A function export receives no arguments and lets you build the config at load time:
// rollbridge.js
export default () => ({
application: process.env.APP_NAME || "ticket-server",
control: {path: `/tmp/rollbridge-${process.env.APP_NAME || "ticket-server"}.sock`},
proxy: {host: "127.0.0.1", port: 8182},
processes: [
{id: "web", policy: "proxied", cwd: "{{releasePath}}", command: "npx velocious server --port {{port}}", port: {from: 18182, to: 18299}}
]
})
A process's command, cwd, and env values support {{...}} placeholders
rendered when the process starts:
- {{releasePath}}, {{releaseId}}, {{revision}}, {{application}}, {{processId}}
- {{port}}: the port allocated to this process; {{ports.<id>}}: another process's allocated port
- {{proxy.host}}, {{proxy.port}}, {{proxy.upstreamHost}}
- {{env.<NAME>}}: a variable from the daemon's own environment, e.g. {{env.HOME}}
Referencing a placeholder with no value (including an unset {{env.<NAME>}})
fails the process start with a clear error, so typos surface immediately.
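To make these rules concrete, here is a hypothetical renderer that resolves the placeholder names listed above from a context object and throws on any missing value. It only models the documented behavior; Rollbridge's own implementation, context shape, and error text may differ, and the releaseId value is illustrative.

```javascript
// Hypothetical {{...}} renderer modelling the documented placeholder rules.
function renderTemplate(template, context) {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (_match, key) => {
    // Dotted keys such as ports.beacon, proxy.port, or env.HOME walk nested objects.
    const value = key.split(".").reduce(
      (obj, part) =>
        obj != null && Object.prototype.hasOwnProperty.call(obj, part) ? obj[part] : undefined,
      context
    )
    if (value === undefined || value === null) {
      throw new Error(`Missing value for placeholder {{${key}}} in: ${template}`)
    }
    return String(value)
  })
}

const context = {
  releasePath: "/home/dev/ticket-server/releases/20260521073000/ticket-server",
  releaseId: "20260521073000", // illustrative value
  port: 18182,
  ports: {beacon: 17330},
  proxy: {host: "127.0.0.1", port: 8182, upstreamHost: "127.0.0.1"},
  env: {HOME: "/home/dev"} // stands in for the daemon's process.env
}

console.log(renderTemplate("npx velocious server --host {{proxy.upstreamHost}} --port {{port}}", context))
// npx velocious server --host 127.0.0.1 --port 18182
```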
Production-ready examples live in examples/, including
examples/tensorbuzz.com.js for the current TensorBuzz backend deployment; see
docs/tensorbuzz-runbook.md for the matching
production runbook (ports, deploy ordering, rollback constraints, and day-to-day
operations).
See docs/velocious.md for a Velocious deployment guide —
how Beacon, background-jobs-main, background-jobs-worker, and the web process map
to Rollbridge policies, with startup ordering and deploy behavior.
See docs/config.md for the full config reference — every
field, its default, validation rules, template variables, and the environment
variables Rollbridge injects.
Every process declares a policy that controls its lifecycle. Pick one per
process:
| You need… | Use |
|---|---|
| The process that receives external HTTP/WebSocket traffic | proxied |
| A per-release helper tied to the release lifecycle | companion |
| Exactly one instance, never overlapping across deploys | singleton |
| A long-lived shared broker that survives deploys | service |
The proxied policy is for the web/API process, exactly one per config. Rollbridge forwards HTTP and
WebSocket traffic to the active release's proxied process and tracks open
connections so they can be drained on the next deploy. It must define a port
range, is health-checked before traffic switches to a new release, and is
auto-restarted while its release is active.
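The proxied process itself only has to listen on its allocated port and answer the health path. A minimal plain-Node sketch, assuming the port arrives as a --port argument (as in the velocious server commands in this README) and that a 200 response to /ping counts as healthy; a real app would use its own framework.

```javascript
// Hypothetical minimal proxied process: listens on --port and answers /ping.
const http = require("node:http")

function createApp() {
  return http.createServer((req, res) => {
    if (req.url === "/ping") {
      res.writeHead(200, {"content-type": "text/plain"})
      res.end("pong")
      return
    }
    res.writeHead(404)
    res.end()
  })
}

// Reads "--port <n>" from argv, e.g. `node server.js --port 18182`.
function portFromArgv(argv) {
  const i = argv.indexOf("--port")
  return i === -1 ? undefined : Number(argv[i + 1])
}

const port = portFromArgv(process.argv)
if (port !== undefined) {
  createApp().listen(port, "127.0.0.1", () => console.log(`listening on ${port}`))
}
```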
{
id: "web",
policy: "proxied",
cwd: "{{releasePath}}",
command: "npx velocious server --host 127.0.0.1 --port {{port}}",
port: {from: 18182, to: 18299},
health: {path: "/ping", timeoutMs: 30000}
}
The companion policy is for a release-scoped helper (for example a background worker bound to one release). It starts before the proxied process in the same release, so release-local dependencies are ready before the health check, and it is auto-restarted while its release is active. Each release gets its own companions; a release's companions stop when that release is drained and retired after a newer release takes over.
{
id: "background-jobs-worker",
policy: "companion",
cwd: "{{releasePath}}",
command: "npx velocious background-jobs-worker",
gracefulStopMs: 60000
}
The singleton policy is for a one-at-a-time helper, such as a duplicate-unsafe scheduler or job dispatcher. After a new release becomes active, Rollbridge stops the old singleton and then starts the new one, so two copies never run at once. Use it when running the old and new copies simultaneously during a deploy would be unsafe.
{
id: "scheduler",
policy: "singleton",
cwd: "{{releasePath}}",
command: "npx velocious scheduler"
}
The service policy is for a daemon-wide broker that should outlive individual releases, for example
Velocious Beacon or background-jobs-main. Rollbridge starts it once (before
release processes that depend on it), keeps it running across deploys, and gives
it a stable port that does not change between releases. After each successful
deploy its restart template is refreshed to the latest release, so if it crashes
it restarts from the newest good release. It keeps restarting until the daemon
shuts down.
{
id: "background-jobs-main",
policy: "service",
cwd: "{{releasePath}}",
command: "npx velocious background-jobs-main",
port: 7331
}
On rollbridge deploy, Rollbridge:
- starts any service that is not already running;
- starts the new release's companions, then its proxied process, and health-checks the proxied process;
- switches new traffic to the new release;
- refreshes each service's restart template to the new release;
- replaces singletons (stops the old one, then starts the new one);
- drains the previous release's connections, then stops its proxied and companion processes.
If the new release fails to start or health-check, the previous release stays active and any service started during this deploy is rolled back.
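The ordering and the rollback rule can be summarized in a short orchestration sketch. Every function on the ops object is a hypothetical stand-in for work Rollbridge does internally, not a Rollbridge API, and details such as port allocation and timeouts are left out.

```javascript
// Hypothetical outline of the deploy sequence; `ops` stands in for Rollbridge internals.
async function deploy(newRelease, state, ops) {
  // Services that were not running yet; a failure here simply fails the deploy.
  const startedServices = await ops.startMissingServices(newRelease)
  try {
    await ops.startCompanions(newRelease)
    await ops.startProxied(newRelease)
    if (!(await ops.healthCheck(newRelease))) throw new Error("health check failed")
  } catch (error) {
    // The previous release keeps serving; undo only what this deploy started.
    await ops.stopRelease(newRelease)
    await ops.stopServices(startedServices)
    return {ok: false, active: state.active, error: error.message}
  }

  const previous = state.active
  state.active = newRelease // switch new traffic to the new release
  await ops.refreshServiceTemplates(newRelease)
  await ops.replaceSingletons(previous, newRelease) // stop old, then start new
  if (previous) {
    await ops.drainConnections(previous)
    await ops.stopRelease(previous) // its proxied and companion processes
  }
  return {ok: true, active: state.active}
}
```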
--config is optional for every command. When omitted, Rollbridge looks for
rollbridge.js in the current directory. The examples below pass --config
explicitly, but rollbridge validate (or any command) works with no flag when a
rollbridge.js is present.
For machine-readable output, deploy, status, stop, shutdown, and
ensure-daemon already print JSON, and validate, doctor, and logs accept
a --json flag that switches their output to JSON (with the same exit codes),
so deploy tooling can parse results.
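A deploy script can shell out to the CLI and parse what it prints. The runRollbridgeJson helper below is hypothetical and built on Node's child_process.spawnSync; it deliberately checks only the exit status and that stdout parses as JSON, since the exact output shapes are documented in docs/cli.md.

```javascript
// Hypothetical helper for deploy tooling: run a Rollbridge command, parse its JSON.
const {spawnSync} = require("node:child_process")

function runRollbridgeJson(args, {bin = "rollbridge", spawn = spawnSync} = {}) {
  const result = spawn(bin, args, {encoding: "utf8"})
  if (result.error) throw result.error // e.g. ENOENT when the CLI is not on PATH

  let output
  try {
    output = JSON.parse(result.stdout)
  } catch {
    throw new Error(`rollbridge ${args.join(" ")} did not print JSON: ${result.stdout}`)
  }
  // Exit codes keep their meaning with --json, so gate on the status as well.
  return {ok: result.status === 0, status: result.status, output}
}

// Example (requires rollbridge on PATH):
// const {ok, output} = runRollbridgeJson(["validate", "--config", "rollbridge.js", "--json"])
// if (!ok) { console.error(output); process.exit(1) }
```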
See docs/cli.md for the full per-command reference (every
option, default, output shape, and exit code).
Validate a config without starting the daemon:
rollbridge validate --config rollbridge.js
validate reports every config error at once with an example fix and exits
non-zero when issues are found, so deploy tooling can gate on it. It checks
required fields and types, duplicate process IDs, port ranges, that exactly one
process is proxied, and that the proxied process defines a port range. Example
output for a misconfigured file:
Found 2 configuration issues in rollbridge.js:
1. Config must define exactly one proxied process; found 0
Fix: Mark exactly one process with policy: proxied so Rollbridge knows where to forward traffic.
2. Duplicate process id: web
Fix: Give each process a unique id; "web" is used more than once.
Check the environment before starting the daemon:
rollbridge doctor --config rollbridge.js
doctor validates the config and then probes the runtime environment, exiting
non-zero if any check fails (so deploy tooling can gate on it):
✓ config: valid: 4 processes, proxy on 127.0.0.1:8182
✓ control socket: no daemon running; /tmp/rollbridge-ticket-server.sock is free to bind
✓ control socket directory: /tmp is writable
✓ proxy port: 127.0.0.1:8182 is available
All checks passed.
A free control socket, a writable socket directory, and a bindable proxy port
pass. Because rollbridge daemon cannot bind a socket or port that is already
taken, doctor fails the relevant check when a Rollbridge daemon (or any other
process) is already listening on the control socket or holding the proxy port —
so a green doctor means a fresh daemon can actually start.
Start the daemon:
rollbridge daemon --config rollbridge.js
Start the daemon only when it is not already running:
rollbridge ensure-daemon --config rollbridge.js --daemon-log-path log/rollbridge.log --daemon-pid-path tmp/pids/rollbridge.pid
Deploy a prepared release:
rollbridge deploy --config rollbridge.js --release-path /home/dev/ticket-server/releases/20260521073000/ticket-server --revision abc123
Deploy and start the daemon first when needed:
rollbridge deploy --ensure-daemon --config rollbridge.js --release-path /home/dev/ticket-server/releases/20260521073000/ticket-server --revision abc123
Inspect state:
rollbridge status --config rollbridge.js
status reports each managed process's state, pid, recent logs, last
exitCode/exitSignal, and — per process — its automatic-restart count
(restarts), last start time (startedAt), current uptimeMs while running,
and why it last started (lastStartReason: deploy, crash, manual, or
memory). The same reason appears on each process started entry in
rollbridge events. For memory-supervised processes it also reports current
rssBytes, memoryRestarts, lastMemoryRestartAt, and children (the sampled
process tree — each group member's pid, command, and rssBytes).
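Those per-process fields make simple post-deploy gates easy to script. The sketch below flags processes that keep crashing or were restarted for memory; the surrounding JSON shape (a processes array whose entries carry an id) and the maxRestarts threshold are assumptions for illustration, so check docs/cli.md for the real layout before relying on it.

```javascript
// Hypothetical post-deploy gate over `rollbridge status` output.
// Assumes a `processes` array; the per-process field names come from this README.
function flagUnhealthy(status, {maxRestarts = 3} = {}) {
  return (status.processes ?? [])
    .filter((p) => p.restarts > maxRestarts || p.lastStartReason === "crash" || p.memoryRestarts > 0)
    .map((p) => ({
      id: p.id,
      restarts: p.restarts,
      lastStartReason: p.lastStartReason,
      memoryRestarts: p.memoryRestarts ?? 0
    }))
}

const sample = {
  processes: [
    {id: "web", restarts: 0, lastStartReason: "deploy", uptimeMs: 120000},
    {id: "background-jobs-worker", restarts: 4, lastStartReason: "crash", uptimeMs: 3000},
    {id: "beacon", restarts: 1, lastStartReason: "memory", memoryRestarts: 1, rssBytes: 104857600}
  ]
}
console.log(flagUnhealthy(sample).map((p) => p.id)) // [ 'background-jobs-worker', 'beacon' ]
```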
Print the recent captured stdout/stderr per process (a one-shot snapshot of the
retained outputLines, not a live stream):
rollbridge logs --config rollbridge.js
rollbridge logs --config rollbridge.js --process web
Print the daemon's recent structured event history: deploys, traffic switches, release stops, process crashes/restarts, and failed commands (the most recent 1000 events, kept in memory):
rollbridge events --config rollbridge.js
rollbridge events --config rollbridge.js --limit 20
Stop the active release:
rollbridge stop --config rollbridge.js
Roll back to a previous release. This restarts it, health-checks it, and switches traffic back (it defaults to the most recently retired release; a failed rollback leaves the current release active). Rollback manages processes only, not database migrations:
rollbridge rollback --config rollbridge.js # the previous release
rollbridge rollback --config rollbridge.js --release-id v3
Restart non-proxied processes in place: all of them, one by id, or a policy
group (the proxied process is never restarted; use deploy for that):
rollbridge restart --config rollbridge.js # all non-proxied processes
rollbridge restart --config rollbridge.js --process background-jobs-worker
rollbridge restart --config rollbridge.js --policy companion
Shut down the daemon and managed processes:
rollbridge shutdown --config rollbridge.js
Prepare a first Rollbridge deploy by recovering Rollbridge-managed orphans and stopping configured legacy processes:
rollbridge predeploy-cleanup --config rollbridge.js --release-path /srv/app/current
Enable shell completion (bash or zsh) for command names and option flags:
source <(rollbridge completion bash) # add to ~/.bashrc
source <(rollbridge completion zsh) # add to ~/.zshrc
Nginx should proxy to Rollbridge, not directly to Velocious:
location / {
proxy_pass http://127.0.0.1:8182;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
See docs/nginx.md for the full guide: WebSocket upgrade
headers, timeouts for long-lived connections, forwarded headers, and common
failure modes (502/503, dropped WebSockets).
Run the long-lived daemon as a systemd service so it starts on boot and is
restarted if it crashes. A ready-to-edit unit lives at
examples/rollbridge.service:
sudo cp examples/rollbridge.service /etc/systemd/system/rollbridge.service
# edit User/Group, WorkingDirectory, the ExecStart path, and --config
sudo systemctl daemon-reload
sudo systemctl enable --now rollbridge
sudo systemctl status rollbridge
The unit runs rollbridge daemon --config <stable-config> in the foreground,
so its output goes to the journal (journalctl -u rollbridge). Key directives:
- KillMode=mixed / KillSignal=SIGTERM: Rollbridge stops its own managed child process groups on SIGTERM, so systemd signals only the daemon and lets it shut down gracefully before escalating to SIGKILL.
- TimeoutStopSec: give the daemon time to stop its managed processes; size it above the largest process gracefulStopMs (the daemon SIGKILLs stragglers after that). Note that systemctl stop/reboot stops processes but does not drain HTTP/WebSocket connections; connection draining happens only during rollbridge deploy release transitions.
The daemon is long-lived and survives deploys. Deploy with
rollbridge deploy (or rollbridge deploy --ensure-daemon), not
systemctl restart — pointing --config at a stable, daemon-wide file while
release paths are passed per deploy. Use command -v rollbridge to find the
absolute CLI path for ExecStart.
See docs/logging.md for where the daemon's JSON logs go
(stdout / journald / the --daemon-log-path file) and how to rotate them — the
daemon holds its log file open, so logrotate needs copytruncate.
Run migrations before rollbridge deploy, and keep migrations backwards-compatible while old and new web releases overlap. For stable local brokers such as Velocious Beacon or background-jobs-main, use service when the process should survive deploys and restart from the latest successful release if it crashes.
See docs/deploy-recipes.md for ready-to-use shell, CI, and Capistrano recipes that drive Rollbridge through its CLI, and docs/troubleshooting.md for diagnosing health-check failures, port conflicts, stale sockets, crash loops, and stuck draining releases.
Maintainers can publish a patch release from the latest default branch:
npm run release:patch
The release script owns the package version bump, lockfile update, default-branch
commit, push, and npm publish. Do not run npm version manually before running
it.
See docs/releasing.md for the maintainer release checklist
— the pre-flight checks before npm run release:patch and what to verify after.
Rollbridge is released under the MIT License.