
kaspernj/rollbridge


Rollbridge

Rollbridge is a Node.js process supervisor and local traffic switcher for zero-downtime deploys.

Nginx points at one stable Rollbridge proxy port. Deploy tooling asks Rollbridge to start a new release, health-check it, switch new traffic to it, and drain old HTTP/WebSocket connections before stopping the previous release.

Install

npm install rollbridge

For local development in this repository:

npm install
npm run all-checks

Config

A Rollbridge config is a JavaScript module whose default export is a config object. The default export can also be a function (sync or async) that returns the object, which is handy for computing values from the environment. Write the file in your project's module system: export default for ESM ("type": "module") or module.exports for CommonJS.

// rollbridge.js
export default {
  application: "ticket-server",

  control: {
    path: "/tmp/rollbridge-ticket-server.sock"
  },

  proxy: {
    host: "127.0.0.1",
    port: 8182,
    upstreamHost: "127.0.0.1",
    healthPath: "/ping",
    healthTimeoutMs: 30000,
    drainTimeoutMs: 60000,
    forceStopTimeoutMs: 10000
  },

  processes: [
    {
      id: "beacon",
      policy: "companion",
      cwd: "{{releasePath}}",
      command: "env VELOCIOUS_BEACON_PORT={{port}} npx velocious beacon",
      port: {from: 17330, to: 17399}
    },
    {
      id: "background-jobs-worker",
      policy: "companion",
      cwd: "{{releasePath}}",
      command: "npx velocious background-jobs-worker",
      outputLines: 200
    },
    {
      id: "background-jobs-main",
      policy: "service",
      cwd: "{{releasePath}}",
      command: "npx velocious background-jobs-main"
    },
    {
      id: "web",
      policy: "proxied",
      cwd: "{{releasePath}}",
      command: "npx velocious server --host 127.0.0.1 --port {{port}}",
      port: {from: 18182, to: 18299},
      health: {path: "/ping", timeoutMs: 30000}
    }
  ]
}

Each process retains its most recent stdout/stderr lines and reports them in status. Set outputLines (a positive integer, default 50) per process to keep more or fewer lines for chatty or quiet processes.
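To picture what outputLines bounds, here is a minimal sketch of a per-process line buffer that drops the oldest line once the limit is reached. The OutputBuffer class is hypothetical and only illustrates the behavior described above; it is not Rollbridge's internal API, and it assumes each output chunk ends on a line boundary.

```javascript
// Hypothetical illustration of an outputLines-style buffer; not Rollbridge's internals.
class OutputBuffer {
  constructor(outputLines = 50) {
    if (!Number.isInteger(outputLines) || outputLines < 1) {
      throw new Error("outputLines must be a positive integer")
    }
    this.limit = outputLines
    this.lines = []
  }

  // Split a chunk into lines and keep only the newest `limit` of them.
  push(chunk) {
    for (const line of String(chunk).split("\n")) {
      if (line === "") continue
      this.lines.push(line)
      if (this.lines.length > this.limit) this.lines.shift() // drop the oldest line
    }
  }

  snapshot() {
    return [...this.lines]
  }
}

const buffer = new OutputBuffer(3)
buffer.push("boot\nlisten 18182\nready\nGET /ping 200\n")
console.log(buffer.snapshot()) // [ 'listen 18182', 'ready', 'GET /ping 200' ]
```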

Set control.mode to an octal permission string (for example "660") to chmod the control socket after it binds. This restricts which users can send control commands, which is useful when several deploy users share a group. When unset, the socket keeps the default permissions from the daemon's umask. Pair it with control.owner and control.group (a numeric id or a user/group name) to chown the socket to a shared deploy group; names resolve via /etc/passwd and /etc/group, and the daemon must run as a user that is allowed to chown the socket.
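For example, a socket shared by a deploy group could be configured as below. The group name "deploy" is a hypothetical placeholder, and the last lines only show how the octal string maps to permission bits.

```javascript
// The control block of rollbridge.js, shown as a standalone constant.
const control = {
  path: "/tmp/rollbridge-ticket-server.sock",
  mode: "660",    // read/write for owner and group; nothing for everyone else
  group: "deploy" // hypothetical shared deploy group; a numeric gid also works
}

// mode is an octal string, so parse it with radix 8.
const modeBits = Number.parseInt(control.mode, 8)
console.log(modeBits === 0o660, modeBits) // true 432
```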

Set the proxied process's health.startDelayMs (default 0) to wait that long after the process starts before the first health probe — like a readiness probe's initial delay, useful for apps with a known boot time. The delay runs before the health.timeoutMs window begins.
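A minimal sketch of that timing, assuming a probe counts as healthy on any 2xx response and is retried on a fixed interval. The waitForHealthy helper, the 250 ms interval, and the 2 s per-probe cap are illustrative assumptions, not Rollbridge's implementation.

```javascript
const http = require("node:http")

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// One probe: true on a 2xx response, false on any error, non-2xx status, or timeout.
function probe(url) {
  return new Promise((resolve) => {
    const req = http.get(url, (res) => {
      res.resume() // discard the body
      resolve(res.statusCode >= 200 && res.statusCode < 300)
    })
    req.on("error", () => resolve(false))
    req.setTimeout(2000, () => req.destroy(new Error("probe timed out"))) // assumed cap
  })
}

// Wait startDelayMs first; the timeoutMs window only starts after that delay.
async function waitForHealthy({host, port, path, startDelayMs = 0, timeoutMs, intervalMs = 250}) {
  await sleep(startDelayMs)
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    if (await probe(`http://${host}:${port}${path}`)) return true
    await sleep(intervalMs)
  }
  return false
}
```

With health: {path: "/ping", startDelayMs: 5000, timeoutMs: 30000}, a release that never becomes healthy is given up on roughly 35 seconds after its process starts.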

Set a process's restart block to control automatic restarts after a crash. restart.maxRestarts caps how many restarts are allowed within restart.windowMs before Rollbridge gives up and leaves the process failed (maxRestarts: 0 disables restarts entirely). restart.backoffFactor multiplies the restartDelayMs delay on each successive restart, and the optional restart.maxDelayMs caps that growing delay. Without a restart block, a crashed process keeps restarting after restartDelayMs with no limit and no backoff. See docs/config.md.

restart: {maxRestarts: 5, windowMs: 60000, backoffFactor: 2, maxDelayMs: 30000}
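A sketch of how those settings interact, assuming the n-th successive restart waits restartDelayMs * backoffFactor^(n-1), capped at maxDelayMs, and that restarts are counted over a sliding windowMs. Both helpers are hypothetical, and restartDelayMs: 2000 is an illustrative value rather than a documented default; docs/config.md has the authoritative rules.

```javascript
// Assumed backoff: the delay grows by backoffFactor per successive restart, capped at maxDelayMs.
function restartDelays({restartDelayMs, backoffFactor = 1, maxDelayMs = Infinity, maxRestarts}) {
  const delays = []
  for (let attempt = 0; attempt < maxRestarts; attempt++) {
    delays.push(Math.min(restartDelayMs * backoffFactor ** attempt, maxDelayMs))
  }
  return delays
}

// Assumed window check: allow another restart only if fewer than maxRestarts
// restarts happened within the last windowMs. maxRestarts: 0 never allows one.
function restartAllowed(restartTimes, now, {maxRestarts, windowMs}) {
  const recent = restartTimes.filter((time) => now - time < windowMs)
  return recent.length < maxRestarts
}

const restart = {maxRestarts: 5, windowMs: 60000, backoffFactor: 2, maxDelayMs: 30000}
console.log(restartDelays({restartDelayMs: 2000, ...restart}))
// [ 2000, 4000, 8000, 16000, 30000 ]
console.log(restartAllowed([1000, 5000, 12000, 25000, 50000], 55000, restart)) // false
console.log(restartAllowed([1000, 5000, 12000, 25000, 50000], 65000, restart)) // true
```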

Set a process's memory block to supervise its resident memory (RSS) and gracefully restart it when it grows too large. memory.limitBytes is the RSS limit, measured across the whole process group rather than just the wrapper; memory.warnBytes logs a warning before the limit is reached; memory.checkIntervalMs (default 5000) sets how often RSS is sampled. A memory restart is reported in status and recorded in events as a process started event with reason: "memory". See docs/config.md.

memory: {limitBytes: 536870912, warnBytes: 402653184, checkIntervalMs: 5000}
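The byte values above are 512 MiB and 384 MiB. The sketch below writes them that way and shows how a group-wide RSS sum could map to a warning or a restart, assuming a sample at or above a threshold triggers it. memoryAction and the sample pids and commands are hypothetical.

```javascript
const MiB = 1024 ** 2

// The same values as the example above, written in MiB.
const memory = {limitBytes: 512 * MiB, warnBytes: 384 * MiB, checkIntervalMs: 5000}

// Hypothetical decision for one RSS sample of the whole process group.
// children mirrors the shape status reports: [{pid, command, rssBytes}, ...].
function memoryAction(children, {limitBytes, warnBytes}) {
  const groupRssBytes = children.reduce((sum, child) => sum + child.rssBytes, 0)
  if (groupRssBytes >= limitBytes) return "restart"
  if (warnBytes !== undefined && groupRssBytes >= warnBytes) return "warn"
  return "ok"
}

console.log(memory.limitBytes, memory.warnBytes) // 536870912 402653184
console.log(memoryAction([
  {pid: 4100, command: "node server.js", rssBytes: 300 * MiB},
  {pid: 4101, command: "node worker.js", rssBytes: 100 * MiB}
], memory)) // warn
```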

Set a process's stopSignal (default "SIGTERM") to the signal that tells it to stop taking new work, so a worker finishes its in-flight work before exiting. Rollbridge sends stopSignal to gracefully stop the process and SIGKILLs it only if it hasn't exited within gracefulStopMs. For example, a job worker that drains on SIGINT:

{id: "worker", policy: "companion", command: "…", stopSignal: "SIGINT", gracefulStopMs: 60000}
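On the worker side, honoring stopSignal means: stop accepting new jobs, let in-flight ones finish, then exit. A minimal Node.js sketch, assuming jobs are async functions; createDrainingWorker is a hypothetical helper, not part of Rollbridge or Velocious. If the worker is still busy when gracefulStopMs elapses, Rollbridge SIGKILLs it regardless.

```javascript
// Hypothetical worker that drains when it receives SIGINT (stopSignal: "SIGINT").
function createDrainingWorker({onDrained = () => process.exit(0)} = {}) {
  let stopping = false
  let drained = false
  let inFlight = 0

  // Call onDrained exactly once, after stop() and with no jobs running.
  const finishIfIdle = () => {
    if (stopping && inFlight === 0 && !drained) {
      drained = true
      onDrained()
    }
  }

  return {
    // Begin a graceful stop: refuse new jobs and exit once in-flight jobs are done.
    stop() {
      stopping = true
      finishIfIdle()
    },
    // Run one job unless stopping; resolves to whether the job was accepted.
    async run(job) {
      if (stopping) return false
      inFlight++
      try {
        await job()
      } finally {
        inFlight--
        finishIfIdle()
      }
      return true
    }
  }
}

const worker = createDrainingWorker()
process.once("SIGINT", () => worker.stop())
```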

Set replicas on a companion without a port to run a pool of identical workers. Each instance runs as <id>#<index> (worker#0, worker#1, …), which is visible in status and targetable by rollbridge restart (the base id targets all instances, worker#0 just one). Each instance also gets the {{replicaIndex}}/{{replicaCount}} template variables and the ROLLBRIDGE_REPLICA_INDEX/ROLLBRIDGE_REPLICA_COUNT environment variables, so it can pick a distinct shard or queue. See docs/config.md.

{id: "worker", policy: "companion", command: "npx velocious background-jobs-worker", replicas: 4}
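For example, a worker can split a fixed list of queues across the pool using the injected environment variables. The queue names and the queuesForReplica helper are hypothetical, and the sketch assumes the index is zero-based, matching worker#0.

```javascript
// Hypothetical: give each replica every Nth queue, based on the injected env vars.
function queuesForReplica(queues, env = process.env) {
  const index = Number.parseInt(env.ROLLBRIDGE_REPLICA_INDEX ?? "0", 10)
  const count = Number.parseInt(env.ROLLBRIDGE_REPLICA_COUNT ?? "1", 10)
  if (!(count > 0 && index >= 0 && index < count)) {
    throw new Error(`Bad replica index/count: ${index}/${count}`)
  }
  return queues.filter((_, position) => position % count === index)
}

const queues = ["mail", "billing", "exports", "webhooks", "search", "reports"]
console.log(queuesForReplica(queues, {ROLLBRIDGE_REPLICA_INDEX: "1", ROLLBRIDGE_REPLICA_COUNT: "4"}))
// [ 'billing', 'reports' ]
```

Because every replica applies the same rule, each queue lands on exactly one instance as long as all replicas see the same queue list.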

For workers that quiesce or drain through a command, set a lifecycle block. When gracefully stopping the process, Rollbridge runs quietCommand, then drains (drainCommand, bounded by drainTimeoutMs), then runs stopCommand or sends stopSignal, and finally SIGKILLs the process after gracefulStopMs. Each hook is bounded so it can't wedge a stop.

Set nonBlockingDrain: true on a worker companion to start its graceful stop the moment its release is retired — in parallel with the proxied connection drain, not after it — so new workers handle new work while the old workers finish theirs.

See docs/workers.md for the full safe background-job worker deployment pattern — companion policy, replicas, and finishing in-flight jobs on deploy with stopSignal/lifecycle + gracefulStopMs.

Set releaseRetention to bound how many stopped (drained) releases the daemon keeps in memory and reports in status. keep (default 10) retains the most recent stopped releases; maxAgeMs (default 0, disabled) also prunes stopped releases older than that many milliseconds. The active and draining releases are never pruned. These are Rollbridge's own in-memory release records; your deploy tool still owns cleaning up on-disk release directories.

releaseRetention: {keep: 5, maxAgeMs: 86400000}
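A sketch of the retention rule as described, using a hypothetical release record shape ({id, state, stoppedAt}). It assumes keep is applied first and maxAgeMs then drops any kept release that is too old; Rollbridge's actual pruning may differ in ordering details.

```javascript
// Hypothetical pruning of in-memory release records.
function pruneReleases(releases, {keep = 10, maxAgeMs = 0} = {}, now = Date.now()) {
  const newestStoppedFirst = releases
    .filter((release) => release.state === "stopped")
    .sort((a, b) => b.stoppedAt - a.stoppedAt)

  const retained = new Set(
    newestStoppedFirst
      .slice(0, keep)
      .filter((release) => maxAgeMs === 0 || now - release.stoppedAt <= maxAgeMs)
  )

  // Active and draining releases are never pruned.
  return releases.filter((release) => release.state !== "stopped" || retained.has(release))
}

const releases = [
  {id: "v5", state: "active"},
  {id: "v4", state: "draining"},
  {id: "v3", state: "stopped", stoppedAt: 300},
  {id: "v2", state: "stopped", stoppedAt: 200},
  {id: "v1", state: "stopped", stoppedAt: 100}
]
console.log(pruneReleases(releases, {keep: 2}, 1000).map((r) => r.id)) // [ 'v5', 'v4', 'v3', 'v2' ]
console.log(pruneReleases(releases, {keep: 2, maxAgeMs: 750}, 1000).map((r) => r.id)) // [ 'v5', 'v4', 'v3' ]
```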

Set statePath to have the daemon persist its state to a file (active/draining releases, process pids, counters, recent events). On the next startup it reads any leftover file and reports managed processes still alive from a daemon that didn't shut down cleanly — advisory orphan detection. After a crash, run rollbridge recover to list those leftovers and rollbridge recover --force to stop them before restarting the daemon. A clean shutdown removes the file. See docs/config.md.

statePath: "/var/lib/rollbridge/ticket-server.state.json"
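To illustrate the advisory check, here is a sketch that tests leftover pids from a state file for liveness with signal 0. The file layout ({processes: [{id, pid}]}) is a hypothetical simplification of whatever Rollbridge actually writes; in practice, use rollbridge recover rather than parsing the file yourself. A live pid may also have been reused by an unrelated process, so a match is a hint, not proof.

```javascript
const fs = require("node:fs")

// Signal 0 checks existence and permissions without delivering a signal.
function isAlive(pid) {
  try {
    process.kill(pid, 0)
    return true
  } catch (error) {
    return error.code === "EPERM" // the process exists but belongs to another user
  }
}

// Hypothetical state layout: {processes: [{id, pid}, ...]}.
function leftoverProcesses(statePath) {
  if (!fs.existsSync(statePath)) return [] // a clean shutdown removes the file
  const state = JSON.parse(fs.readFileSync(statePath, "utf8"))
  return (state.processes ?? []).filter((entry) => isAlive(entry.pid))
}

// Usage: leftoverProcesses("/var/lib/rollbridge/ticket-server.state.json")
```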

During the first migration from an old supervisor, set legacyTakeover and run rollbridge predeploy-cleanup --release-path <path> before rollbridge deploy. Rollbridge will only stop configured legacy processes when no reusable active Rollbridge release is running.

legacyTakeover: {
  screens: ["ticket-server"],
  processes: [
    {name: "legacy web", includes: ["/home/dev/ticket-server/", "velocious server", "--port 8082"]}
  ]
}
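A plausible reading of includes, assumed in the sketch below and worth confirming in docs/config.md, is that a running process matches when its full command line contains every listed fragment. Under that reading, the "--port 8082" fragment is what keeps the rule from also matching a Rollbridge-managed web process on a port such as 18182. The command lines are hypothetical.

```javascript
// Assumed matching rule: every `includes` fragment must appear in the command line.
function matchesLegacyProcess(commandLine, {includes}) {
  return includes.every((fragment) => commandLine.includes(fragment))
}

const legacyWeb = {name: "legacy web", includes: ["/home/dev/ticket-server/", "velocious server", "--port 8082"]}

console.log(matchesLegacyProcess(
  "node /home/dev/ticket-server/node_modules/.bin/velocious server --port 8082", legacyWeb
)) // true
console.log(matchesLegacyProcess(
  "node /home/dev/ticket-server/node_modules/.bin/velocious server --port 18182", legacyWeb
)) // false
```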

A function export receives no arguments and lets you build the config at load time:

// rollbridge.js
export default () => ({
  application: process.env.APP_NAME || "ticket-server",
  control: {path: `/tmp/rollbridge-${process.env.APP_NAME || "ticket-server"}.sock`},
  proxy: {host: "127.0.0.1", port: 8182},
  processes: [
    {id: "web", policy: "proxied", cwd: "{{releasePath}}", command: "npx velocious server --port {{port}}", port: {from: 18182, to: 18299}}
  ]
})

Template variables

A process command, cwd, and env values support {{...}} placeholders rendered when the process starts:

  • {{releasePath}}, {{releaseId}}, {{revision}}, {{application}}, {{processId}}
  • {{port}} — the port allocated to this process; {{ports.<id>}} — another process's allocated port
  • {{proxy.host}}, {{proxy.port}}, {{proxy.upstreamHost}}
  • {{env.<NAME>}} — a variable from the daemon's own environment, e.g. {{env.HOME}}

Referencing a placeholder with no value (including an unset {{env.<NAME>}}) fails the process start with a clear error, so typos surface immediately.
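The rendering rules above can be sketched as a small function: resolve each name (dotted paths such as ports.<id>, and env.<NAME> from the environment) and fail loudly on anything missing. renderTemplate is a hypothetical illustration of the documented behavior, not Rollbridge's implementation; the placeholder grammar is assumed to allow letters, digits, underscores, dots, and hyphens, since process ids like background-jobs-main contain hyphens.

```javascript
// Hypothetical renderer for {{...}} placeholders.
function renderTemplate(template, vars, env = process.env) {
  return template.replace(/\{\{([\w.-]+)\}\}/g, (placeholder, name) => {
    const value = name.startsWith("env.")
      ? env[name.slice("env.".length)]
      : name.split(".").reduce((scope, key) => (scope == null ? undefined : scope[key]), vars)

    if (value === undefined || value === null) {
      throw new Error(`No value for template placeholder ${placeholder}`)
    }
    return String(value)
  })
}

const vars = {
  releasePath: "/home/dev/ticket-server/releases/20260521073000/ticket-server",
  port: 18182,
  ports: {"background-jobs-main": 7331}
}

console.log(renderTemplate("npx velocious server --host 127.0.0.1 --port {{port}}", vars))
// npx velocious server --host 127.0.0.1 --port 18182
console.log(renderTemplate("BROKER_PORT={{ports.background-jobs-main}}", vars))
// BROKER_PORT=7331
```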

Production-ready examples live in examples/, including examples/tensorbuzz.com.js for the current TensorBuzz backend deployment; see docs/tensorbuzz-runbook.md for the matching production runbook (ports, deploy ordering, rollback constraints, and day-to-day operations).

See docs/velocious.md for a Velocious deployment guide — how Beacon, background-jobs-main, background-jobs-worker, and the web process map to Rollbridge policies, with startup ordering and deploy behavior.

See docs/config.md for the full config reference — every field, its default, validation rules, template variables, and the environment variables Rollbridge injects.

Process Policies

Every process declares a policy that controls its lifecycle. Pick one per process:

  • The process that receives external HTTP/WebSocket traffic: proxied
  • A per-release helper tied to the release lifecycle: companion
  • Exactly one instance, never overlapping across deploys: singleton
  • A long-lived shared broker that survives deploys: service

proxied

The web/API process — exactly one per config. Rollbridge forwards HTTP and WebSocket traffic to the active release's proxied process and tracks open connections so they can be drained on the next deploy. It must define a port range, is health-checked before traffic switches to a new release, and is auto-restarted while its release is active.

{
  id: "web",
  policy: "proxied",
  cwd: "{{releasePath}}",
  command: "npx velocious server --host 127.0.0.1 --port {{port}}",
  port: {from: 18182, to: 18299},
  health: {path: "/ping", timeoutMs: 30000}
}

companion

A release-scoped helper (for example a background worker bound to one release). It starts before the proxied process in the same release, so release-local dependencies are ready before the health check, and it is auto-restarted while its release is active. Each release gets its own companions; a release's companions stop when that release is drained and retired after a newer release takes over.

{
  id: "background-jobs-worker",
  policy: "companion",
  cwd: "{{releasePath}}",
  command: "npx velocious background-jobs-worker",
  gracefulStopMs: 60000
}

singleton

A one-at-a-time helper for duplicate-unsafe schedulers or job dispatchers. After a new release becomes active, Rollbridge stops the old singleton and then starts the new one, so two copies never run at once. Use it when running the old and new copies simultaneously during a deploy would be unsafe.

{
  id: "scheduler",
  policy: "singleton",
  cwd: "{{releasePath}}",
  command: "npx velocious scheduler"
}

service

A daemon-wide broker that should outlive individual releases — for example Velocious Beacon or background-jobs-main. Rollbridge starts it once (before release processes that depend on it), keeps it running across deploys, and gives it a stable port that does not change between releases. After each successful deploy its restart template is refreshed to the latest release, so if it crashes it restarts from the newest good release. It keeps restarting until the daemon shuts down.

{
  id: "background-jobs-main",
  policy: "service",
  cwd: "{{releasePath}}",
  command: "npx velocious background-jobs-main",
  port: 7331
}

Deploy ordering

On rollbridge deploy, Rollbridge:

  1. starts any service that is not already running;
  2. starts the new release's companions, then its proxied process, and health-checks the proxied process;
  3. switches new traffic to the new release;
  4. refreshes each service's restart template to the new release;
  5. replaces singletons (stops the old one, then starts the new one);
  6. drains the previous release's connections, then stops its proxied and companion processes.

If the new release fails to start or health-check, the previous release stays active and any service started during this deploy is rolled back.
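The ordering and the failure rule can be modelled as one async function. This is a sketch under assumptions: the hooks object and its method names are hypothetical stand-ins, not Rollbridge's internals, and the model encodes only what the list above states.

```javascript
// Illustrative model of the documented deploy order; hook names are hypothetical.
async function deploy(release, hooks) {
  const startedServices = await hooks.startMissingServices() // services not already running

  try {
    await hooks.startCompanions(release) // companions first, so release-local deps are ready
    await hooks.startProxied(release)
    await hooks.healthCheck(release)
  } catch (error) {
    // Traffic never moved, so the previous release is still active.
    await hooks.stopServices(startedServices) // roll back services this deploy started
    throw error
  }

  await hooks.switchTraffic(release)
  await hooks.refreshServiceTemplates(release)
  await hooks.replaceSingletons(release) // stop the old singleton, then start the new one
  await hooks.drainAndStopPrevious() // drain connections, then stop proxied and companions
}
```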

Commands

--config is optional for every command. When omitted, Rollbridge looks for rollbridge.js in the current directory. The examples below pass --config explicitly, but rollbridge validate (or any command) works with no flag when a rollbridge.js is present.

For machine-readable output, deploy, status, stop, shutdown, and ensure-daemon already print JSON, and validate, doctor, and logs accept a --json flag that switches their output to JSON (with the same exit codes), so deploy tooling can parse results.

See docs/cli.md for the full per-command reference (every option, default, output shape, and exit code).
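A deploy script can shell out to the CLI and parse the JSON. A minimal sketch, assuming the command prints a single JSON document on stdout; runRollbridgeJson is a hypothetical helper, and the commented example reuses the validate --json flag described above.

```javascript
const {spawnSync} = require("node:child_process")

// Hypothetical helper: run a rollbridge command and parse its JSON stdout.
function runRollbridgeJson(args, {bin = "rollbridge"} = {}) {
  const result = spawnSync(bin, args, {encoding: "utf8"})
  if (result.error) throw result.error // e.g. the CLI is not on PATH

  let data
  try {
    data = JSON.parse(result.stdout)
  } catch {
    data = undefined // stdout was not JSON
  }
  // Exit codes still carry pass/fail, so gate on them rather than on the payload.
  return {ok: result.status === 0, exitCode: result.status, data}
}

// Example: gate a deploy on a valid config.
// const check = runRollbridgeJson(["validate", "--config", "rollbridge.js", "--json"])
// if (!check.ok) process.exit(1)
```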

Validate a config without starting the daemon:

rollbridge validate --config rollbridge.js

validate reports every config error at once with an example fix and exits non-zero when issues are found, so deploy tooling can gate on it. It checks required fields and types, duplicate process IDs, port ranges, that exactly one process is proxied, and that the proxied process defines a port range. Example output for a misconfigured file:

Found 2 configuration issues in rollbridge.js:

1. Config must define exactly one proxied process; found 0
   Fix: Mark exactly one process with policy: proxied so Rollbridge knows where to forward traffic.

2. Duplicate process id: web
   Fix: Give each process a unique id; "web" is used more than once.

Check the environment before starting the daemon:

rollbridge doctor --config rollbridge.js

doctor validates the config and then probes the runtime environment, exiting non-zero if any check fails (so deploy tooling can gate on it):

✓ config: valid: 4 processes, proxy on 127.0.0.1:8182
✓ control socket: no daemon running; /tmp/rollbridge-ticket-server.sock is free to bind
✓ control socket directory: /tmp is writable
✓ proxy port: 127.0.0.1:8182 is available

All checks passed.

The checks pass when the control socket is free, its directory is writable, and the proxy port can be bound. Because rollbridge daemon cannot bind a socket or port that is already taken, doctor fails the relevant check when a Rollbridge daemon (or any other process) is already listening on the control socket or holding the proxy port, so a green doctor means a fresh daemon can actually start.
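The proxy-port check comes down to trying to bind the port. A sketch of that idea in Node.js, not doctor's actual implementation; portIsFree is a hypothetical name.

```javascript
const net = require("node:net")

// Try to bind host:port; report whether it worked, then release the port again.
function portIsFree(port, host = "127.0.0.1") {
  return new Promise((resolve) => {
    const server = net.createServer()
    server.once("error", () => resolve(false)) // EADDRINUSE, EACCES, ...
    server.once("listening", () => server.close(() => resolve(true)))
    server.listen(port, host)
  })
}

// Usage: portIsFree(8182).then((free) => console.log(free ? "available" : "taken"))
```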

Start the daemon:

rollbridge daemon --config rollbridge.js

Start the daemon only when it is not already running:

rollbridge ensure-daemon --config rollbridge.js --daemon-log-path log/rollbridge.log --daemon-pid-path tmp/pids/rollbridge.pid

Deploy a prepared release:

rollbridge deploy --config rollbridge.js --release-path /home/dev/ticket-server/releases/20260521073000/ticket-server --revision abc123

Deploy and start the daemon first when needed:

rollbridge deploy --ensure-daemon --config rollbridge.js --release-path /home/dev/ticket-server/releases/20260521073000/ticket-server --revision abc123

Inspect state:

rollbridge status --config rollbridge.js

status reports each managed process's state, pid, recent logs, last exitCode/exitSignal, and — per process — its automatic-restart count (restarts), last start time (startedAt), current uptimeMs while running, and why it last started (lastStartReason: deploy, crash, manual, or memory). The same reason appears on each process started entry in rollbridge events. For memory-supervised processes it also reports current rssBytes, memoryRestarts, lastMemoryRestartAt, and children (the sampled process tree — each group member's pid, command, and rssBytes).

Print the recent captured stdout/stderr per process (a one-shot snapshot of the retained outputLines, not a live stream):

rollbridge logs --config rollbridge.js
rollbridge logs --config rollbridge.js --process web

Print the daemon's recent structured event history — deploys, traffic switches, release stops, process crashes/restarts, and failed commands (the most recent 1000 events, in memory):

rollbridge events --config rollbridge.js
rollbridge events --config rollbridge.js --limit 20

Stop the active release:

rollbridge stop --config rollbridge.js

Roll back to a previous release. Rollbridge restarts it, health-checks it, and switches traffic back; it defaults to the most recently retired release, and a failed rollback leaves the current release active. Rollback manages processes only, not database migrations:

rollbridge rollback --config rollbridge.js                  # the previous release
rollbridge rollback --config rollbridge.js --release-id v3

Restart non-proxied processes in place — all of them, one by id, or a policy group (the proxied process is never restarted; use deploy for that):

rollbridge restart --config rollbridge.js                      # all non-proxied processes
rollbridge restart --config rollbridge.js --process background-jobs-worker
rollbridge restart --config rollbridge.js --policy companion

Shut down the daemon and managed processes:

rollbridge shutdown --config rollbridge.js

Prepare a first Rollbridge deploy by recovering Rollbridge-managed orphans and stopping configured legacy processes:

rollbridge predeploy-cleanup --config rollbridge.js --release-path /srv/app/current

Enable shell completion (bash or zsh) for command names and option flags:

source <(rollbridge completion bash)   # add to ~/.bashrc
source <(rollbridge completion zsh)    # add to ~/.zshrc

Nginx

Nginx should proxy to Rollbridge, not directly to Velocious:

location / {
  proxy_pass http://127.0.0.1:8182;
  proxy_http_version 1.1;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection "upgrade";
  proxy_set_header Host $host;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  proxy_set_header X-Forwarded-Proto $scheme;
}

See docs/nginx.md for the full guide — WebSocket upgrade headers, timeouts for long-lived connections, forwarded headers, and common failure modes (502/503, dropped WebSockets).

Running under systemd

Run the long-lived daemon as a systemd service so it starts on boot and is restarted if it crashes. A ready-to-edit unit lives at examples/rollbridge.service:

sudo cp examples/rollbridge.service /etc/systemd/system/rollbridge.service
# edit User/Group, WorkingDirectory, the ExecStart path, and --config
sudo systemctl daemon-reload
sudo systemctl enable --now rollbridge
sudo systemctl status rollbridge

The unit runs rollbridge daemon --config <stable-config> in the foreground, so its output goes to the journal (journalctl -u rollbridge). Key directives:

  • KillMode=mixed / KillSignal=SIGTERM: Rollbridge stops its own managed child process groups on SIGTERM, so systemd signals only the daemon and lets it shut down gracefully before escalating to SIGKILL.
  • TimeoutStopSec: give the daemon time to stop its managed processes; size it above the largest process gracefulStopMs (the daemon SIGKILLs stragglers after that). Note that systemctl stop/reboot stops processes but does not drain HTTP/WebSocket connections — connection draining happens only during rollbridge deploy release transitions.
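Since TimeoutStopSec must exceed the longest gracefulStopMs, it can be derived from the config. A sketch; suggestedTimeoutStopSec, its 10-second margin, and treating an unset gracefulStopMs as a default you pass in (take it from docs/config.md) are assumptions for illustration.

```javascript
// Hypothetical helper: derive a TimeoutStopSec value from a Rollbridge config.
// defaultGracefulStopMs should be Rollbridge's documented default; marginSec is an
// arbitrary safety buffer on top of the longest graceful stop.
function suggestedTimeoutStopSec(config, {defaultGracefulStopMs = 0, marginSec = 10} = {}) {
  const longestMs = Math.max(
    0,
    ...config.processes.map((entry) => entry.gracefulStopMs ?? defaultGracefulStopMs)
  )
  return Math.ceil(longestMs / 1000) + marginSec
}

const config = {processes: [
  {id: "web", policy: "proxied"},
  {id: "background-jobs-worker", policy: "companion", gracefulStopMs: 60000}
]}
console.log(`TimeoutStopSec=${suggestedTimeoutStopSec(config)}`) // TimeoutStopSec=70
```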

The daemon is long-lived and survives deploys. Deploy with rollbridge deploy (or rollbridge deploy --ensure-daemon), not systemctl restart, and point --config at a stable, daemon-wide file while passing release paths per deploy. Use command -v rollbridge to find the absolute CLI path for ExecStart.

See docs/logging.md for where the daemon's JSON logs go (stdout / journald / the --daemon-log-path file) and how to rotate them — the daemon holds its log file open, so logrotate needs copytruncate.

Deployment Notes

Run migrations before rollbridge deploy, and keep migrations backwards-compatible while old and new web releases overlap. For stable local brokers such as Velocious Beacon or background-jobs-main, use service when the process should survive deploys and restart from the latest successful release if it crashes.

See docs/deploy-recipes.md for ready-to-use shell, CI, and Capistrano recipes that drive Rollbridge through its CLI, and docs/troubleshooting.md for diagnosing health-check failures, port conflicts, stale sockets, crash loops, and stuck draining releases.

Releasing

Maintainers can publish a patch release from the latest default branch:

npm run release:patch

The release script owns the package version bump, lockfile update, default-branch commit, push, and npm publish. Do not run npm version manually before running it.

See docs/releasing.md for the maintainer release checklist — the pre-flight checks before npm run release:patch and what to verify after.

License

Rollbridge is released under the MIT License.
