refactor!: collapse state column + release_stuck into lease-only delivery #2

Merged

lesnik512 merged 1 commit into main from refactor/lease-only-delivery on May 7, 2026

@lesnik512
Member

The `state` column duplicated information already carried by `acquired_token`: `state='processing'` was equivalent to `acquired_token IS NOT NULL`. The redundancy required a third async loop (`_release_stuck_loop`) to flip stuck rows back to `pending`, plus a Postgres advisory lock to coordinate that loop across replicas.

This PR replaces the `state` column and the release loop (with its advisory lock) with a single lease check inside the fetch CTE: a row is available iff its lease is unset *or* expired (`acquired_at < now() - lease_ttl_seconds`). This deletes ~50 lines, drops one async task per subscriber, and shrinks worst-case crash recovery from ~7.5 min to ~60s with the new default TTL.
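The lease rule can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the library's fetch CTE: `OutboxRow`, `is_available` and `fetch` are hypothetical names, rows are assumed to be due once `next_attempt_at <= now` and taken in that order, and the real query's row locking across concurrent workers is not modelled. It walks through the crash case: a leased row stays invisible until its lease is older than `lease_ttl_seconds`, and then any worker can re-claim it without a separate release loop.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class OutboxRow:
    # Hypothetical in-memory stand-in for one outbox row; only the
    # columns this PR talks about are modelled.
    id: int
    queue: str
    next_attempt_at: datetime
    acquired_token: Optional[str] = None
    acquired_at: Optional[datetime] = None


def is_available(row: OutboxRow, now: datetime, lease_ttl_seconds: float) -> bool:
    """Lease check: a row is available iff its lease is unset or expired."""
    if row.acquired_token is None:
        return True  # lease unset
    # Mirrors `acquired_at < now() - lease_ttl_seconds` from the fetch CTE.
    return row.acquired_at < now - timedelta(seconds=lease_ttl_seconds)


def fetch(rows: List[OutboxRow], queue: str, now: datetime,
          lease_ttl_seconds: float, limit: int) -> List[OutboxRow]:
    """Claim up to `limit` due, available rows by stamping a fresh lease.

    Assumption: a row is due once next_attempt_at <= now, and due rows are
    taken in next_attempt_at order. Row locking / concurrency is not modelled.
    """
    token = uuid.uuid4().hex
    due = sorted(
        (r for r in rows
         if r.queue == queue
         and r.next_attempt_at <= now
         and is_available(r, now, lease_ttl_seconds)),
        key=lambda r: r.next_attempt_at,
    )
    claimed = due[:limit]
    for row in claimed:
        row.acquired_token = token  # take (or re-take) the lease
        row.acquired_at = now
    return claimed


# Crash-recovery walk-through with the new 60 s default TTL.
t0 = datetime(2026, 5, 7, 10, 0, 0)
rows = [OutboxRow(id=1, queue="orders", next_attempt_at=t0)]

first = fetch(rows, "orders", t0, lease_ttl_seconds=60.0, limit=10)
first_token = first[0].acquired_token
# The worker "crashes" without finishing; no sweeper loop runs.

# 30 s later the lease is still live, so no other worker can claim the row.
assert fetch(rows, "orders", t0 + timedelta(seconds=30), 60.0, 10) == []

# 61 s later the lease has expired and the row is claimable again.
retry = fetch(rows, "orders", t0 + timedelta(seconds=61), 60.0, 10)
assert [r.id for r in retry] == [1]
assert retry[0].acquired_token != first_token
```

One consequence worth keeping in mind under this model: a handler that runs longer than `lease_ttl_seconds` can have its row re-claimed by another worker, so the TTL should comfortably exceed the expected handling time.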

Breaking changes:

- `OutboxState` enum removed from public exports.
- `state` column dropped from `make_outbox_table`.
- `OutboxClient.release_stuck` removed.
- `release_stuck_timeout` + `release_stuck_interval` kwargs replaced with a single `lease_ttl_seconds: float = 60.0` (was `release_stuck_timeout=300.0`) on `subscriber()`, `OutboxRoute`, and `create_subscriber`.
- `OutboxClient.fetch` requires a new `lease_ttl_seconds` kwarg.
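For call sites that build these kwargs programmatically, the rename can be sketched as a small translation step. `migrate_lease_kwargs` is a hypothetical helper, not part of the library; it assumes you want to keep the old recovery threshold rather than adopt the new 60 s default, and the interval value in the example is arbitrary. Dropping `release_stuck_interval` outright is deliberate: with no sweeper loop left, there is nothing for it to configure.

```python
from typing import Any, Dict


def migrate_lease_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate pre-lease subscriber kwargs to the lease-only API.

    Hypothetical helper (not part of the library). Keys it does not
    recognise are passed through unchanged.
    """
    new_kwargs = dict(old_kwargs)
    # The release loop is gone, so its polling interval has no replacement.
    new_kwargs.pop("release_stuck_interval", None)
    timeout = new_kwargs.pop("release_stuck_timeout", None)
    if timeout is not None:
        # Keep the old recovery threshold instead of the new 60.0 default.
        new_kwargs["lease_ttl_seconds"] = float(timeout)
    return new_kwargs


print(migrate_lease_kwargs({"release_stuck_timeout": 300, "release_stuck_interval": 30}))
# -> {'lease_ttl_seconds': 300.0}
```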

Migration:

    ALTER TABLE outbox DROP CONSTRAINT outbox_state_check;
    ALTER TABLE outbox DROP COLUMN state;
    DROP INDEX IF EXISTS outbox_pending_idx;
    CREATE INDEX outbox_pending_idx ON outbox (queue, next_attempt_at)
      WHERE acquired_token IS NULL;

103 tests pass with 100% coverage.

lesnik512 merged commit 8157336 into main on May 7, 2026
3 checks passed
lesnik512 deleted the refactor/lease-only-delivery branch on May 7, 2026, 10:24