feat!: durable SQLx lease lock for QueuedRepository (postgres + sqlite)#50

Merged
patrickleet merged 2 commits into main from feat/persistent-lock-sqlx
May 31, 2026

@patrickleet patrickleet commented May 31, 2026

What & why

QueuedRepository serializes per-aggregate read/modify/write through an AsyncLockManager, but the only implementation was InMemoryAsyncLockManager — process-local and lost on restart. This adds the first durable, cross-process lock manager, the direction already sanctioned by docs/postgres-event-store.md §Locking Model and specs/persistent-repository-plan.

Implements [[tasks/persistent-lock-sqlx]].

What shipped

PostgresLockManager / SqliteLockManager — durable per-stream leases in a new aggregate_locks table, feature-gated like the existing SQLx repos. Drop-in:

let repo  = PostgresRepository::connect_and_migrate(&url).await?;
let locks = PostgresLockManager::new(repo.pool().clone());
let todos = repo.queued_async_with(locks).async_aggregate::<Todo>();

  • Lease table, not advisory locks: portable across both backends, doesn't pin a pooled connection, and fits the get-to-commit hold (which can't live inside one DB transaction).
  • Each per-key lock layers an in-process gate (InMemoryAsyncLock) over the DB lease: same-process tasks serialize with true wakeups (no DB polling); only the local winner contends cross-process.
  • Acquire is a single atomic conditional upsert (INSERT … ON CONFLICT … WHERE expired OR own-token … RETURNING) using the database clock (extract(epoch from now()) / unixepoch('now','subsec')) — one authoritative clock, no cross-process skew. Release is owner-token scoped, so it never frees a holder that reclaimed an expired lease. SQLite treats SQLITE_BUSY as contention.
  • Tunables: with_lease_ttl / with_retry_interval / with_max_wait / with_owner_id; migrate() for standalone use; sweep_expired() for cold-row GC.
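
The acquire and release rules in the bullets above can be modeled without a database. The following is a minimal sketch, not the crate's implementation: `LeaseTable`, `Lease`, the key format, and passing `now` explicitly are assumptions made for illustration, and a `HashMap` stands in for the `aggregate_locks` table. It shows the conditional-upsert predicate (free, expired, or own token) and the owner-token scoped release.

```rust
use std::collections::HashMap;

/// One row of a hypothetical in-memory stand-in for the `aggregate_locks` table.
#[derive(Debug, Clone, PartialEq)]
struct Lease {
    owner_token: String,
    expires_at: f64, // seconds on the single authoritative clock
}

/// Hypothetical model of the lease table; `now` plays the role of the database clock.
#[derive(Default)]
struct LeaseTable {
    rows: HashMap<String, Lease>,
}

impl LeaseTable {
    /// Conditional upsert: succeeds if the key is free, expired, or already ours.
    /// Treating `expires_at <= now` as expired is an assumption of this sketch.
    fn acquire(&mut self, key: &str, token: &str, now: f64, ttl: f64) -> bool {
        let can_take = match self.rows.get(key) {
            None => true,
            Some(lease) => lease.expires_at <= now || lease.owner_token == token,
        };
        if can_take {
            let lease = Lease { owner_token: token.to_string(), expires_at: now + ttl };
            self.rows.insert(key.to_string(), lease);
        }
        can_take
    }

    /// Owner-token scoped release: removes the row only if `token` still owns it.
    fn release(&mut self, key: &str, token: &str) -> bool {
        let owned = self.rows.get(key).map_or(false, |lease| lease.owner_token == token);
        if owned {
            self.rows.remove(key);
        }
        owned
    }
}

fn main() {
    let mut table = LeaseTable::default();
    assert!(table.acquire("Todo:1", "a", 0.0, 30.0));
    assert!(!table.acquire("Todo:1", "b", 10.0, 30.0)); // live lease held by "a"
    assert!(table.acquire("Todo:1", "b", 31.0, 30.0)); // expired, so "b" reclaims it
    assert!(!table.release("Todo:1", "a")); // stale holder cannot free "b"'s lease
    assert!(table.release("Todo:1", "b"));
}
```

In the real backends the check and the write happen in one SQL statement, which is what makes the acquire atomic; the two-step `get` then `insert` here is only safe because the model is single-threaded.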

It is a mutual-exclusion optimization, not a fencing guarantee — the event store's (aggregate_type, aggregate_id, sequence) primary key remains the authoritative concurrency boundary (a stale holder fails its optimistic commit rather than corrupting data). No lease renewal in v1 — set lease_ttl above the worst-case critical section.
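
To illustrate why the primary key, not the lease, is the correctness boundary, here is a small sketch under stated assumptions: `EventStore` and `SequenceConflict` are hypothetical names, and a `BTreeSet` of `(aggregate_type, aggregate_id, sequence)` tuples stands in for the unique constraint on the event table.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the event table; each set entry models one
/// (aggregate_type, aggregate_id, sequence) primary-key value.
#[derive(Default)]
struct EventStore {
    keys: BTreeSet<(String, String, u64)>,
}

#[derive(Debug, PartialEq)]
struct SequenceConflict;

impl EventStore {
    /// Append one event; a duplicate key is rejected, like a unique-constraint violation.
    fn append(&mut self, ty: &str, id: &str, sequence: u64) -> Result<(), SequenceConflict> {
        if self.keys.insert((ty.to_string(), id.to_string(), sequence)) {
            Ok(())
        } else {
            Err(SequenceConflict)
        }
    }
}

fn main() {
    let mut store = EventStore::default();
    assert_eq!(store.append("Todo", "1", 1), Ok(()));

    // Writers A and B both loaded the aggregate at sequence 1 and plan to write
    // sequence 2. A's lease expired mid-section, so B took the lock and committed.
    assert_eq!(store.append("Todo", "1", 2), Ok(())); // writer B
    assert_eq!(store.append("Todo", "1", 2), Err(SequenceConflict)); // stale writer A
}
```

The stale writer gets an error it can retry from, and the stream is left intact, which is why an over-long critical section costs a retry rather than data.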

Breaking change

AsyncLock::{try_lock,unlock} are now async (a durable lock releases/acquires via DB I/O), which makes AsyncUnlockableRepository and AsyncAggregateRepository::{abort,unlock} async too. Callers must .await them. The lock surface is now async-only, consistent with the framework-wide async-only migration. InMemoryAsyncLock keeps its sync mechanics as private cores wrapped in ready futures (the Waker reentrancy/poison regression tests are preserved).
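
Wrapping a sync core in a ready future has one observable consequence, which the review further down this PR picks up: the side effect runs when the method is called, not when the future is awaited. The sketch below contrasts the two shapes using only the standard library. `DemoLock`, `try_lock_ready`, and the tiny `block_on` helper are hypothetical and exist only for this illustration.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor for futures that complete without real I/O; demo only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Hypothetical lock with a synchronous core.
struct DemoLock {
    held: AtomicBool,
}

impl DemoLock {
    /// Returns true if this call flipped the flag from free to held.
    fn try_lock_core(&self) -> bool {
        self.held
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Eager: the core runs at call time, before any poll.
    fn try_lock_ready(&self) -> impl Future<Output = bool> {
        std::future::ready(self.try_lock_core())
    }

    /// Lazy: the core runs only when the returned future is polled.
    async fn try_lock(&self) -> bool {
        self.try_lock_core()
    }
}

fn main() {
    let eager = DemoLock { held: AtomicBool::new(false) };
    let fut = eager.try_lock_ready();
    assert!(eager.held.load(Ordering::Acquire)); // already taken, never awaited
    drop(fut);

    let lazy = DemoLock { held: AtomicBool::new(false) };
    let fut = lazy.try_lock();
    assert!(!lazy.held.load(Ordering::Acquire)); // nothing has happened yet
    drop(fut); // a dropped, never-awaited future is a no-op

    assert!(block_on(lazy.try_lock())); // callers must await for the lock to be taken
    assert!(lazy.held.load(Ordering::Acquire));
}
```

For call sites the migration is the same in both shapes: a call that used to return a `Result` directly now returns a future, and forgetting `.await` on the lazy form means the lock is never touched.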

Review

Ran a multi-agent adversarial review of the implementation (4 lenses + verification); 13 findings confirmed, 12 fixed + 1 nit deferred. Notable fixes: commit_batch_async now best-effort on release (a committed write must not fail on lock cleanup); store_token made infallible to remove a gate-leak edge; added free-key DB-race, max_wait timeout, and sweep_expired tests on both backends.
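
The best-effort release rule can be sketched as follows. This is an illustration under assumptions, not the code of `commit_batch_async`: `commit_then_release` and `CommitError` are hypothetical, the steps are synchronous closures rather than async calls, and only the success path described above is modeled.

```rust
#[derive(Debug, PartialEq)]
enum CommitError {
    Store(String),
}

/// Hypothetical commit path: write first, then release the lock.
/// A release failure after a successful write is logged and swallowed.
fn commit_then_release<W, R>(write: W, release: R) -> Result<(), CommitError>
where
    W: FnOnce() -> Result<(), String>,
    R: FnOnce() -> Result<(), String>,
{
    // A failed write is a real failure and is reported to the caller.
    write().map_err(CommitError::Store)?;

    // The write is durable at this point, so a cleanup error must not turn
    // the commit into a failure. An unreleased lease still ends at its TTL.
    if let Err(e) = release() {
        eprintln!("lock release failed, lease will expire on its own: {e}");
    }
    Ok(())
}

fn main() {
    let ok = commit_then_release(|| Ok(()), || Err("connection reset".to_string()));
    assert_eq!(ok, Ok(()));

    let failed = commit_then_release(|| Err("sequence conflict".to_string()), || Ok(()));
    assert_eq!(failed, Err(CommitError::Store("sequence conflict".to_string())));
}
```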

Verification

  • lock:: unit 7/7 · queued_repo_async 4/4 · todos 13/13
  • SQLite sql_lock_manager 10/10 (unconditional) · Postgres sql_lock_manager 10/10 (against a live PG) · skips cleanly without DATABASE_URL
  • full --all-features suite 584/0
  • cargo clippy --all-features --all-targets -- -D warnings clean · cargo fmt --check clean

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Durable SQL-backed lease locks for Postgres and SQLite with configurable TTL, retry, max-wait, owner scoping, and expired-lease cleanup
    • In-process cached lock handles and shared lease helpers for consistent cross-process coordination
  • Breaking Changes

    • Lock and repository release APIs are now asynchronous (call sites must await)
  • Documentation

    • README and docs updated with SQL-backed lock usage examples and migration guidance
  • Tests

    • Expanded SQL lock conformance and repository tests for both Postgres and SQLite

Add PostgresLockManager / SqliteLockManager — durable, cross-process
AsyncLockManager implementations backed by an `aggregate_locks` lease
table — so a QueuedRepository can serialize per-aggregate access across
processes, not just within one. Drop-in via `.queued_async_with(...)`.

Each per-key lock layers an in-process gate (InMemoryAsyncLock) over the
DB lease: same-process tasks serialize with true wakeups (no DB polling),
and only the local winner contends cross-process. Acquire is a single
atomic conditional upsert using the database clock (no cross-process
skew); release is owner-token scoped so it never frees a holder that
reclaimed an expired lease. It is a mutual-exclusion optimization, not a
fence — the event-store sequence PK remains the authoritative boundary.
v1 has no lease renewal; `sweep_expired` reclaims cold rows.

BREAKING CHANGE: `AsyncLock::{try_lock,unlock}` are now async (a durable
lock releases/acquires via I/O), which makes `AsyncUnlockableRepository`
and `AsyncAggregateRepository::{abort,unlock}` async as well. Callers must
`.await` these. The lock surface is now async-only.

Implements [[tasks/persistent-lock-sqlx]]

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 31, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: adfdf648-8b71-4f11-b236-6e0bbe912ba6

📥 Commits

Reviewing files that changed from the base of the PR and between 21a8cef and 455efd0.

📒 Files selected for processing (4)
  • src/aggregate/async_aggregate.rs
  • src/lock/async_in_memory.rs
  • src/lock/sqlx_common.rs
  • tests/sql_lock_manager/main.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/sql_lock_manager/main.rs
  • src/lock/sqlx_common.rs
  • src/lock/async_in_memory.rs

Walkthrough

This PR introduces durable SQL-backed lease-table locking for cross-process event sourcing, converts AsyncLock operations to async futures, implements Postgres/SQLite lock managers with shared lease helpers, adds migrations, updates repository APIs/tests, and expands documentation.

Changes

Durable lease-table locking integration

| Layer | File(s) | Summary |
|---|---|---|
| AsyncLock trait conversion and InMemory core helpers | src/lock/async_lock.rs, src/lock/async_in_memory.rs | try_lock and unlock converted from synchronous Result to asynchronous Send futures. InMemoryAsyncLock adds try_lock_core/unlock_core and moves waker wake-after-unlock out of mutex. Tests updated to async form. |
| Shared SQLx lease infrastructure | src/lock/sqlx_common.rs | Adds LeaseConfig, LockShared, LeaseBackend, token minting, deterministic jitter, and lease_lock/lease_try_lock/lease_unlock implementing in-process gating plus DB lease polling/release. |
| Database schema for durable leases | migrations/postgres/0001_initial.sql, migrations/sqlite/0001_initial.sql | Add aggregate_locks table (lock_key, owner_token, acquired_at, expires_at) with CHECKs and an expires_at index to support lease acquisition/stealing and expiry sweeps. |
| PostgreSQL lock manager | src/lock/postgres_lock.rs | PostgresLockManager caches per-key locks, offers configuration and migration/sweep APIs. PostgresLock implements LeaseBackend using atomic INSERT...ON CONFLICT upsert that conditionally steals expired leases and token-scoped release. |
| SQLite lock manager | src/lock/sqlite_lock.rs | SqliteLockManager mirrors Postgres manager. SqliteLock implements LeaseBackend with conditional SQLite upsert/RETURNING, treats SQLite busy as contention, and performs token-scoped releases. |
| Repository async lock release API | src/queued_repo/repository.rs, src/aggregate/async_aggregate.rs | AsyncUnlockableRepository::unlock/abort converted to return futures; QueuedRepository updated to async releases and performs best-effort async unlock on commit success (ignores DB release errors). AsyncAggregateRepository methods converted to async. |
| Public API wiring and exports | src/lock/mod.rs, src/lib.rs, Cargo.toml | Feature-gated modules for SQL lock managers and shared SQLx helpers; crate-root re-exports for PostgresLock/PostgresLockManager and SqliteLock/SqliteLockManager; tokio dependency enables time feature. |
| Comprehensive test suite | tests/sql_lock_manager/main.rs, tests/queued_repo_async/main.rs, tests/todos/main.rs | New parameterized async scenarios validating contention, distinct-key behavior, cached-handle identity, cross-manager serialization, expiry reclaim, owner-scoped release, queued-repo abort releasing locks, free-key race, max_wait timeout, and cancellation-safety. Existing tests updated for async abort/lock APIs. |
| Locking model documentation | README.md, docs/postgres-event-store.md | README updated to list durable SQL lease-based lock managers and add repo.abort(...).await examples and tuning guidance; docs/postgres-event-store.md Locking Model replaced with sequence-primary-key correctness plus lease-table optimization explanation. |
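
The "caches per-key locks" and "cached-handle identity" entries above matter because the in-process gate lives inside the handle: two tasks in one process only serialize locally if they receive the same handle for the same key. A minimal sketch of that caching, with hypothetical names (`LockCache`, `KeyLock`, `get_lock`) that are not taken from the crate:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical per-key lock handle; the real one would also carry the gate and lease state.
#[derive(Debug)]
struct KeyLock {
    key: String,
}

/// Hypothetical manager-side cache: one shared handle per lock key.
#[derive(Default)]
struct LockCache {
    locks: Mutex<HashMap<String, Arc<KeyLock>>>,
}

impl LockCache {
    /// Return the cached handle for `key`, creating it on first use.
    fn get_lock(&self, key: &str) -> Arc<KeyLock> {
        let mut map = self.locks.lock().unwrap();
        let handle = map
            .entry(key.to_string())
            .or_insert_with(|| Arc::new(KeyLock { key: key.to_string() }));
        Arc::clone(handle)
    }
}

fn main() {
    let cache = LockCache::default();
    let a = cache.get_lock("Todo:1");
    let b = cache.get_lock("Todo:1");
    let c = cache.get_lock("Todo:2");

    assert!(Arc::ptr_eq(&a, &b)); // same key, same handle, same gate
    assert!(!Arc::ptr_eq(&a, &c)); // distinct keys do not contend locally
    println!("{} and {} are independent", a.key, c.key);
}
```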

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

  • patrickleet/sourced_rust#49: Broader async-only API consolidation that removed sync repository/lock methods; this PR converts lock-release APIs to async futures on that foundation.
  • patrickleet/sourced_rust#22: Earlier work on QueuedRepository lock lifecycle and abort semantics; this PR async-ifies and reworks those contracts.

"🧺
I nibble tokens, mint and bake,
I hop through leases for your sake,
Postgres, SQLite — gentle thump!
Locks released with soft async jump,
A rabbit cheers, then sips some tea."

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main feature addition: durable SQLx lease locks for QueuedRepository with support for both postgres and sqlite backends, with a breaking change indicator. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@migrations/postgres/0001_initial.sql`:
- Around line 131-149: The new aggregate_locks table and its index were added
into the existing 0001_initial migration, which must not be edited for
already-applied databases; instead create a new forward-only migration that
contains the DDL for the aggregate_locks table (CREATE TABLE IF NOT EXISTS
aggregate_locks ... with the CHECKs and the index
aggregate_locks_expires_at_idx) and remove those lines from 0001_initial.sql so
the original migration remains unchanged.

In `@src/aggregate/async_aggregate.rs`:
- Around line 180-188: The abort() implementation in AsyncAggregateRepository
currently calls self.repo.unlock(&identity) so repository implementations never
see their abort() hook; change abort(&self, aggregate: &A) to forward to the
repository abort method instead—compute identity with
stream_identity_for::<A>(aggregate.entity().id())? as before and call
self.repo.abort(&identity). Leave unlock(&self, id: &str) unchanged so it still
calls repo.unlock(&identity). Ensure you reference the abort and unlock methods
on the wrapped repository (repo.abort and repo.unlock) when making the change.

In `@src/lock/async_in_memory.rs`:
- Around line 128-135: The in-memory lock futures run their side effects
immediately because they return std::future::ready(self.try_lock_core()) /
ready(self.unlock_core()); change try_lock and unlock on InMemoryAsyncLock to
return lazy futures that only execute on poll by replacing the ready(...)
wrapper with an async block that calls the core methods (e.g. use an async move
{ self.try_lock_core() } and async move { self.unlock_core() } so the core
methods run when the future is awaited/polled), keeping the existing signatures
of try_lock and unlock.

In `@src/lock/sqlx_common.rs`:
- Around line 99-125: The timer in lease_lock is started after awaiting
backend.shared().gate.lock().await which means backend.config().max_wait doesn't
limit time spent waiting for the same-process gate; move started =
Instant::now() to the start of lease_lock (before gate.lock().await), then in
the retry loop compute remaining time = max - started.elapsed() (using
checked_sub) and use that remaining budget to decide timeout and to bound
tokio::time::sleep/jittered calls; keep the existing unlock and error paths
(unlock before returning Err) and continue using backend.db_acquire,
backend.mint_token, store_token, and jittered as before.
- Around line 99-160: The gate is not cancellation-safe: lease_lock and
lease_try_lock hold backend.shared().gate across await points (db_acquire,
sleep) so a dropped future can never release it; also max_wait timer starts
after acquiring the gate so it doesn't bound local contention. Add a
synchronous-drop guard (e.g., a struct Guard that calls the gate core unlock
synchronously or use a new unlock_now API on InMemoryAsyncLock) that is created
immediately after acquiring the gate and releases the gate in its Drop impl to
guarantee unlock on cancellation; use that guard in lease_lock and
lease_try_lock around the db_acquire/sleep paths and remove manual unlocks where
the guard covers them. Move starting of the max_wait timer (started =
Instant::now()) to before attempting gate.lock().await (or document/align
behavior) so the timeout includes time waiting for the in-process gate. In
lease_unlock, call backend.shared().gate.unlock (synchronous/unlock_now) before
awaiting backend.db_release(&token).await to avoid holding the gate across the
remote call.
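
The `max_wait` finding above comes down to starting the clock at entry and spending from one shared budget. A rough sketch of that bookkeeping, assuming hypothetical helpers `remaining_budget` and `acquire_with_budget`, and using `std::thread::sleep` where the real code would use `tokio::time::sleep`:

```rust
use std::time::{Duration, Instant};

/// Time left in the budget, or `None` once `max_wait` is used up.
fn remaining_budget(max_wait: Duration, elapsed: Duration) -> Option<Duration> {
    max_wait.checked_sub(elapsed).filter(|left| !left.is_zero())
}

/// Hypothetical retry loop: every wait, whatever its cause, draws on one budget.
fn acquire_with_budget(
    max_wait: Duration,
    retry_interval: Duration,
    mut try_acquire: impl FnMut() -> bool,
) -> bool {
    // Started at entry, so time spent before the first attempt (for example
    // waiting on an in-process gate) would count against `max_wait` too.
    let started = Instant::now();
    loop {
        if try_acquire() {
            return true;
        }
        match remaining_budget(max_wait, started.elapsed()) {
            // Never sleep past the deadline: cap the interval by what is left.
            Some(left) => std::thread::sleep(retry_interval.min(left)),
            None => return false,
        }
    }
}

fn main() {
    let five = Duration::from_secs(5);
    assert_eq!(remaining_budget(five, Duration::from_secs(2)), Some(Duration::from_secs(3)));
    assert_eq!(remaining_budget(five, Duration::from_secs(7)), None);

    let mut attempts = 0;
    let got = acquire_with_budget(Duration::from_secs(1), Duration::from_millis(5), || {
        attempts += 1;
        attempts == 3
    });
    assert!(got);

    // A lock that never frees up gives a timeout instead of an unbounded wait.
    assert!(!acquire_with_budget(Duration::from_millis(30), Duration::from_millis(5), || false));
}
```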
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2b8a036d-7321-43e4-9f14-6b7674a834c7

📥 Commits

Reviewing files that changed from the base of the PR and between 7fbd065 and 21a8cef.

📒 Files selected for processing (18)
  • Cargo.toml
  • README.md
  • docs/postgres-event-store.md
  • migrations/postgres/0001_initial.sql
  • migrations/sqlite/0001_initial.sql
  • src/aggregate/async_aggregate.rs
  • src/lib.rs
  • src/lock/async_in_memory.rs
  • src/lock/async_lock.rs
  • src/lock/mod.rs
  • src/lock/postgres_lock.rs
  • src/lock/sqlite_lock.rs
  • src/lock/sqlx_common.rs
  • src/queued_repo/repository.rs
  • src/sqlx_repo/mod.rs
  • tests/queued_repo_async/main.rs
  • tests/sql_lock_manager/main.rs
  • tests/todos/main.rs

Comment thread migrations/postgres/0001_initial.sql
Comment thread src/aggregate/async_aggregate.rs
Comment thread src/lock/async_in_memory.rs Outdated
Comment thread src/lock/sqlx_common.rs
Comment thread src/lock/sqlx_common.rs Outdated
…_wait

- Cancellation-safe in-process gate: a `GateGuard` releases the gate from
  `Drop`, so a `lease_lock`/`lease_try_lock`/`lease_unlock` future dropped
  mid-`await` (cancellation/timeout) no longer wedges the key. Replaces the
  explicit-error-path-only gate release.
- `max_wait` now measured from entry and bounds the in-process gate wait too
  (was only applied to DB polling, after the gate was acquired).
- In-memory `try_lock`/`unlock` are now lazy `async fn` (side effect runs on
  poll, not at call time) — consistent with the I/O-backed locks; a dropped,
  never-awaited future is a no-op. `unlock_core` is `pub(crate)` for the guard.
- `AsyncAggregateRepository::abort` forwards to the repo's `abort` hook instead
  of `unlock`, so an `AsyncUnlockableRepository` overriding `abort` is honored.
- Tests: add cancellation-safety regression (cancelled acquire releases the
  gate) on both backends.
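
The cancellation-safety fix in the first bullet can be illustrated with a drop guard. This is a sketch under assumptions, not the crate's `GateGuard`: `Gate`, `unlock_now`, `acquire`, and `poll_once` are hypothetical, the DB round-trip is an arbitrary future passed in by the caller, and keeping the gate on success via `mem::forget` is one possible design rather than the confirmed one.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical in-process gate with a synchronous release.
struct Gate {
    held: AtomicBool,
}

impl Gate {
    fn try_lock(&self) -> bool {
        self.held
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
    fn unlock_now(&self) {
        self.held.store(false, Ordering::Release);
    }
}

/// Releases the gate when dropped, including when the owning future is cancelled.
struct GateGuard<'a> {
    gate: &'a Gate,
}

impl Drop for GateGuard<'_> {
    fn drop(&mut self) {
        self.gate.unlock_now();
    }
}

/// Hypothetical acquire path: take the gate, then await a DB attempt.
async fn acquire(gate: &Gate, db_attempt: impl Future<Output = bool>) -> bool {
    if !gate.try_lock() {
        return false;
    }
    let guard = GateGuard { gate };
    if db_attempt.await {
        std::mem::forget(guard); // success: the gate stays held until an explicit unlock
        true
    } else {
        false // guard drops here and releases the gate
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll a pinned future exactly once with a waker that does nothing.
fn poll_once<F: Future>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    fut.poll(&mut Context::from_waker(&waker))
}

fn main() {
    let gate = Gate { held: AtomicBool::new(false) };
    {
        // The DB attempt never completes, so the future parks mid-await.
        let mut fut = pin!(acquire(&gate, std::future::pending::<bool>()));
        assert!(poll_once(fut.as_mut()).is_pending());
        assert!(gate.held.load(Ordering::Acquire));
    } // the future is dropped here, which is what cancellation looks like
    assert!(!gate.held.load(Ordering::Acquire)); // the guard released the gate
}
```

Without the guard, the only release would sit on explicit error paths after the `await`, and a future dropped while parked would leave the key wedged for every later caller in the process.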

Migration-split suggestion intentionally not taken (owner decision): this crate
re-runs the full idempotent `0001_initial.sql` on every `migrate()` — there is
no applied-migration tracking — so adding `CREATE TABLE IF NOT EXISTS` to the
baseline is the correct pattern here.

Refs [[tasks/persistent-lock-sqlx]]

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@patrickleet patrickleet merged commit a1021c9 into main May 31, 2026
7 checks passed