
Retry and Dead Letter

sarmakska edited this page Jun 7, 2026 · 2 revisions

Retry and Dead-Letter

Delivery is decoupled from the request. POST /hooks/:source enqueues the rendered message and returns 202 immediately. A background worker drains the queue, retrying failed deliveries with exponential backoff, and writes anything that fails every attempt to a durable dead-letter inbox.

Retry queue

The queue (src/queue.js) attempts each job up to RETRY_MAX_ATTEMPTS times. Between attempts it waits an exponentially growing delay with full jitter:

delay(attempt) = random(0, min(baseDelayMs * 2^attempt, maxDelayMs))

| Variable | Default | Meaning |
| --- | --- | --- |
| RETRY_MAX_ATTEMPTS | 5 | Attempts before dead-lettering |
| RETRY_BASE_DELAY_MS | 500 | Base delay, doubled each attempt |
| RETRY_MAX_DELAY_MS | 30000 | Cap on a single backoff delay |
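
The formula above can be sketched in a few lines of JavaScript. This is an illustration rather than the code in src/queue.js: the function name `backoffDelay` and the injectable `random` option are hypothetical, and it assumes `attempt` is zero-based, which the formula does not state.

```javascript
// Full-jitter backoff, following the delay(attempt) formula.
// Assumption: `attempt` starts at 0; names here are illustrative only.
function backoffDelay(attempt, options = {}) {
  const { baseDelayMs = 500, maxDelayMs = 30000, random = Math.random } = options;
  // The upper bound of the window doubles each attempt until it hits the cap.
  const ceiling = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
  // Full jitter: pick uniformly from [0, ceiling) instead of waiting the full ceiling.
  return Math.floor(random() * ceiling);
}

// With the defaults, the window ceilings for attempts 0..4 are
// 500, 1000, 2000, 4000 and 8000 ms; the actual delays are random within them.
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(`attempt ${attempt}: wait ${backoffDelay(attempt)} ms`);
}
```

Under the zero-based assumption the 30000 ms cap only takes effect from attempt 6 onwards (500 * 2^6 = 32000), so it matters mainly if RETRY_MAX_ATTEMPTS or RETRY_BASE_DELAY_MS is raised.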

Full jitter spreads retries randomly within the window, so a provider recovering from an outage is not hit by a synchronised burst. Only an email send failure triggers a retry; Slack and Telegram fan-out failures are logged and ignored.
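
A worker loop built around that policy might look like the sketch below. It is a sketch under stated assumptions, not the project's implementation: `deliverWithRetry`, `send`, `onDeadLetter` and the `sleep` option are hypothetical names, `send` stands in for the email delivery (the only step that triggers a retry), and `attempt` is again assumed to be zero-based.

```javascript
// Hypothetical retry loop: try `send` up to maxAttempts times, sleeping a
// full-jitter delay between failures, then hand the job to `onDeadLetter`.
async function deliverWithRetry(job, send, onDeadLetter, options = {}) {
  const {
    maxAttempts = 5,
    baseDelayMs = 500,
    maxDelayMs = 30000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await send(job);
      return { delivered: true, attempts: attempt + 1 };
    } catch (err) {
      lastError = err;
      // No point sleeping after the final attempt.
      if (attempt < maxAttempts - 1) {
        const ceiling = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
        await sleep(Math.random() * ceiling);
      }
    }
  }

  // Every attempt failed: record the job together with the last error seen.
  const error = lastError ? String(lastError.message ?? lastError) : "no attempts made";
  onDeadLetter({ ...job, attempts: maxAttempts, error });
  return { delivered: false, attempts: maxAttempts };
}
```

Passing `sleep` as an option keeps the loop easy to exercise in tests without real waiting.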

The GET / endpoint reports the current queue depth, and GET /dead-letter reports the count of recorded failures.

Dead-letter inbox

When a job exhausts every attempt, it is recorded in the dead-letter inbox (src/deadletter.js):

  • Appended to a JSON Lines file at DEAD_LETTER_FILE (default ./data/dead-letter.jsonl) so it survives a restart.
  • Held in a bounded in-memory ring (the most recent 100 by default) so the listing endpoint is fast.
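
The two storage layers described above can be sketched as follows. This is a minimal illustration, not the contents of src/deadletter.js: `createDeadLetterStore` and its `record` and `list` methods are hypothetical names, and the synchronous file append is chosen only to keep the example short.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical dead-letter store: a JSON Lines file for durability plus a
// bounded in-memory ring for fast listing.
function createDeadLetterStore(file, ringSize = 100) {
  const ring = []; // oldest entry first, newest last
  fs.mkdirSync(path.dirname(file), { recursive: true });

  return {
    record(entry) {
      // One JSON object per line; appending never rewrites earlier lines.
      fs.appendFileSync(file, JSON.stringify(entry) + "\n");
      ring.push(entry);
      if (ring.length > ringSize) ring.shift(); // drop the oldest entry
    },
    list(limit = 50) {
      // Most recent first, capped at `limit` and at the ring size.
      return limit > 0 ? ring.slice(-limit).reverse() : [];
    },
  };
}
```

Entries that age out of the ring remain in the file, which is why the file rather than the listing is the complete record.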

Each entry looks like:

{
  "id": "ltm3k9-a1b2c3",
  "ts": "2026-05-31T10:00:00.000Z",
  "source": "stripe",
  "subject": "Invoice paid: 99.00 GBP",
  "attempts": 5,
  "error": "resend rejected: invalid api key",
  "payload": { "type": "invoice.paid", "data": { "object": { "amount_paid": 9900 } } }
}

Browse failures

curl http://localhost:3000/dead-letter         # most recent 50
curl 'http://localhost:3000/dead-letter?limit=200'

The endpoint returns failures most recent first. Because entries contain the original payload, keep this endpoint behind your platform auth or a private network if your payloads are sensitive.

Replay failures (built-in endpoint)

Set WEBHOOK_REPLAY_TOKEN to enable an authenticated replay endpoint. It re-renders a stored failure from its saved payload and re-enqueues it for delivery. This skips the verifier and the source entirely, so it is the right tool once you have fixed a template or a flaky provider has recovered.

# find the id of the failure you want to replay
curl http://localhost:3000/dead-letter | jq -r '.items[0].id'

# replay it
curl -X POST http://localhost:3000/dead-letter/<id>/replay \
  -H "Authorization: Bearer $WEBHOOK_REPLAY_TOKEN"

A success returns 202 {"ok":true,"replayed":true} and removes the entry from the in-memory inbox. The original line stays in the JSONL audit log on disk, so the record of the failure is never lost. Behaviour by case:

| Condition | Response |
| --- | --- |
| WEBHOOK_REPLAY_TOKEN unset | 404 (endpoint disabled) |
| Missing or wrong bearer token | 401 |
| Unknown id (aged out of the ring, or never existed) | 404 |
| Template now returns { skip: true } | 200 {"replayed":false,"skipped":true}, entry removed |
| Valid token and id | 202, entry re-enqueued and removed |
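
Once the token check has passed, the remaining cases reduce to a small decision function. The sketch below is an assumption-laden illustration: `replayEntry`, `render` and `enqueue` are hypothetical names, the inbox is modelled as a plain array, and the 404 response body is left out because the page does not specify it.

```javascript
// Hypothetical replay logic for an already-authorised request.
// `inbox` is the in-memory ring, `render` re-runs the template for a source,
// and `enqueue` puts the rendered message back on the delivery queue.
function replayEntry(id, inbox, render, enqueue) {
  const index = inbox.findIndex((entry) => entry.id === id);
  if (index === -1) return { status: 404 }; // aged out of the ring, or never existed

  const entry = inbox[index];
  const message = render(entry.source, entry.payload);
  inbox.splice(index, 1); // the entry leaves the inbox whether or not it is sent

  if (message && message.skip) {
    return { status: 200, body: { replayed: false, skipped: true } };
  }
  enqueue(message);
  return { status: 202, body: { ok: true, replayed: true } };
}
```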

The token is compared in constant time. Because the in-memory ring holds only the most recent 100 failures by default, the replay endpoint can target only those; older entries live only in the JSONL file.
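
One common way to do a constant-time comparison in Node.js is `crypto.timingSafeEqual`. The sketch below shows that approach and is not necessarily how the service implements it: `tokenMatches` and `replayAuthStatus` are hypothetical helpers, and hashing both values first is a design choice made here because `timingSafeEqual` throws when its inputs differ in length.

```javascript
const crypto = require("node:crypto");

// Compare two tokens without leaking where they first differ.
// Hashing both sides gives equal-length buffers, as timingSafeEqual requires.
function tokenMatches(presented, expected) {
  const a = crypto.createHash("sha256").update(String(presented)).digest();
  const b = crypto.createHash("sha256").update(String(expected)).digest();
  return crypto.timingSafeEqual(a, b);
}

// Returns the error status for a replay request, or null when the caller
// may proceed. `replayToken` stands in for the WEBHOOK_REPLAY_TOKEN value.
function replayAuthStatus(authorizationHeader, replayToken) {
  if (!replayToken) return 404; // endpoint disabled
  const match = /^Bearer (.+)$/.exec(authorizationHeader || "");
  if (!match || !tokenMatches(match[1], replayToken)) return 401;
  return null;
}
```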

Bulk replay from the file

The stored payload is the original webhook body, so a full file replay is a short script that re-POSTs each line through the public hooks endpoint:

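# Read each JSONL entry verbatim (IFS= and -r keep whitespace and backslashes intact)
while IFS= read -r line; do
  src=$(printf '%s\n' "$line" | jq -r .source)
  # Re-POST only the original webhook body to the matching source endpoint
  printf '%s\n' "$line" | jq -c .payload | \
    curl -sS -X POST "http://localhost:3000/hooks/$src" \
      -H "Content-Type: application/json" --data-binary @-
  echo
done < data/dead-letter.jsonl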

If WEBHOOK_SECRET is set you will need to sign each replayed request, since the verifier runs on the hooks endpoint. The built-in /dead-letter/:id/replay endpoint avoids that because it re-enqueues directly.

Graceful shutdown

On SIGTERM or SIGINT the service flushes any undelivered jobs still in the queue to the dead-letter inbox before exiting, so a planned restart or redeploy never silently drops queued work. A hard crash can still lose a job that is mid-retry; if you need durability across crashes, put a real broker in front.
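
The shutdown behaviour can be sketched as a drain function plus signal handlers. This is an illustration under assumptions: `flushQueueToDeadLetter` and `installShutdownHandlers` are hypothetical names, the queue is modelled as a plain array, and the error text written to each flushed entry is invented for the example.

```javascript
// Hypothetical drain step: move every pending job into the dead-letter inbox.
function flushQueueToDeadLetter(queue, recordDeadLetter, reason = "shutdown before delivery") {
  let flushed = 0;
  while (queue.length > 0) {
    const job = queue.shift();
    recordDeadLetter({ ...job, error: reason });
    flushed++;
  }
  return flushed;
}

// Register one-shot handlers so a planned stop flushes before exiting.
// `exit` is injectable so the handler can be exercised without ending the process.
function installShutdownHandlers(queue, recordDeadLetter, exit = process.exit) {
  for (const signal of ["SIGTERM", "SIGINT"]) {
    process.once(signal, () => {
      flushQueueToDeadLetter(queue, recordDeadLetter, `flushed on ${signal}`);
      exit(0);
    });
  }
}
```

A handler like this only runs on signals the process can catch, which is why a hard crash or SIGKILL can still lose queued work.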

Persistence in containers

The Docker image creates /app/data, and the docker-compose file mounts a named volume there. Mount a volume at /app/data (or point DEAD_LETTER_FILE elsewhere) on any platform so the inbox survives redeploys.
