From 001d36a524857a1d3b102eba9707c30fb080989c Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Fri, 26 Jun 2026 22:03:28 +0300 Subject: [PATCH 1/8] docs: plan retriable-error seam (Full-lane bundle) Co-Authored-By: Claude Opus 4.8 (1M context) --- .../design.md | 135 ++++++++++++++ .../plan.md | 176 ++++++++++++++++++ 2 files changed, 311 insertions(+) create mode 100644 planning/changes/2026-06-26.01-retriable-error-seam/design.md create mode 100644 planning/changes/2026-06-26.01-retriable-error-seam/plan.md diff --git a/planning/changes/2026-06-26.01-retriable-error-seam/design.md b/planning/changes/2026-06-26.01-retriable-error-seam/design.md new file mode 100644 index 0000000..9025e12 --- /dev/null +++ b/planning/changes/2026-06-26.01-retriable-error-seam/design.md @@ -0,0 +1,135 @@ +--- +summary: Give the retriable-error predicate its own seam — a pure is_retriable(exc) -> bool in db_retry/retriable.py — so classification is tested in memory instead of through a live Postgres round-trip. +--- + +# Design: Give the retriable-error predicate its own seam + +## Summary + +The decision "is this exception worth retrying?" is the deepest logic in the +package — it unwraps a SQLAlchemy `DBAPIError`, classifies the underlying +asyncpg error, and walks the `__cause__`/`__context__` chain to find a retriable +link even when re-wrapped. Today it lives in two private functions in +`retry.py` (`_is_retriable_dbapi_error`, `_retry_handler`) reachable only +through `@postgres_retry`, so its only test surface is a live Postgres that +raises a chosen SQLSTATE. This change moves the seam: a pure +`is_retriable(exc) -> bool` in a new `db_retry/retriable.py`, tested directly +with in-memory exception chains built from real types. `postgres_retry` becomes +a thin consumer of the seam; the integration suite shrinks to proving the +wiring. + +## Motivation + +- The retriability logic is deep but sits behind the wrong seam — **the + interface is not the test surface**. 
The classification matrix in + `tests/test_retry.py` (which SQLSTATEs retry, re-wrapped chains, attempt + counts) round-trips a real database via a `CREATE FUNCTION raise_error()` + stored proc to exercise a pure predicate. +- The SQLSTATE/asyncpg taxonomy ("serialization failures and lost connections + are transient; `40002` is not") is an inline `isinstance` tuple buried in a + boolean expression — no name, no single place to change. +- Verified the in-memory approach works: a real `sqlalchemy.exc.DBAPIError` + whose `.orig.__cause__` is a real `asyncpg.SerializationError`, optionally + re-wrapped in another exception via `__cause__`, drives the current predicate + to the correct verdict with no database. + +## Non-goals + +- No behaviour change to what counts as retriable — same SQLSTATE classes in, + same out. +- No change to the public surface: `is_retriable` is an **internal seam**, not + added to `__init__.py`'s `__all__`. The package keeps its five public symbols. +- No change to the retry loop's logging output — the two debug lines are + preserved, only relocated. +- Not modelling the taxonomy as richer data (SQLSTATE codes / rationale as a + structure) — the *why* lives in `architecture/retriable.md`; the *what* is a + named tuple of asyncpg classes. + +## Design + +### 1. New module `db_retry/retriable.py` + +A named constant and one pure public function (plus a private per-link helper): + +```python +RETRIABLE_ASYNCPG_ERRORS = (asyncpg.SerializationError, asyncpg.PostgresConnectionError) + +def is_retriable(exception: BaseException) -> bool: + """Walk __cause__/__context__; True if any link is a retriable DBAPIError.""" + current: BaseException | None = exception + seen: set[int] = set() + while current is not None and id(current) not in seen: + seen.add(id(current)) + if _is_retriable_link(current): + return True + current = current.__cause__ or current.__context__ + return False +``` + +`is_retriable` is **pure** — no logging, no side effects. 
`_is_retriable_link` +is the current `_is_retriable_dbapi_error` body, checking `DBAPIError` → +`.orig is not None` → `isinstance(.orig.__cause__, RETRIABLE_ASYNCPG_ERRORS)`. +The cycle guard (`seen` set of `id()`s) and the cause-first/context-second walk +order are preserved exactly. + +### 2. `retry.py` consumes the seam + +`postgres_retry` imports `is_retriable` and wraps it in a thin local predicate +that carries the relocated logging: + +```python +def _log_and_decide(exception: BaseException) -> bool: + if is_retriable(exception): + logger.debug("postgres_retry, retrying") + return True + logger.debug("postgres_retry, giving up on retry") + return False + +retry=tenacity.retry_if_exception(_log_and_decide) +``` + +`_is_retriable_dbapi_error` and `_retry_handler` are deleted from `retry.py`. +The `retriable.py` module keeps no logger. + +### 3. Architecture promotion + +- **New** `architecture/retriable.md` — documents `is_retriable`, the + `RETRIABLE_ASYNCPG_ERRORS` taxonomy, the cause-chain walk + cycle guard, and + **why `40002` (`StatementCompletionUnknownError`) is excluded** (outcome + unknown, blind retry unsafe). The cause-chain prose moves here from + `retry.md`. +- **Edit** `architecture/retry.md` — drop the "What counts as retriable" / + "Cause-chain walk" sections; leave a one-line pointer to `retriable.md` and + note the predicate is wired in via `_log_and_decide`. +- **Edit** `architecture/README.md` — add the `retriable.md` capability row. +- **New** `architecture/glossary.md` — authored lazily (first term): + **Retriable error**. + +## Testing + +- **New** `tests/test_retriable.py` — in-memory, no database. A small helper + builds a `DBAPIError` whose `.orig.__cause__` is a given asyncpg error. 
+ Parametrized matrix covering: `SerializationError` (40001) → retriable; + `PostgresConnectionError` and a subclass (08000/08003) → retriable; a + non-retriable `PostgresError` → not; a re-wrapped chain (advanced-alchemy + style, real `DBAPIError` hung off a `RepositoryError.__cause__`) → retriable; + a `__context__`-only link → retriable; a `__cause__` cycle → terminates and + returns the right verdict; a bare non-DBAPI exception → not. +- **Edit** `tests/test_retry.py` — keep exactly two integration cases (one + retriable `40001` → retries, asserts attempt count; one non-retriable + `40002` → no retry) proving the decorator wires the predicate into tenacity + and `reraise=True` surfaces the original error. The exhaustive matrix and the + advanced-alchemy case move to `test_retriable.py`. +- `just lint-ci` passes; `just test` (Docker Postgres) green for the trimmed + integration cases. + +## Risk + +- **Low. Behaviour-preserving refactor.** The predicate body, walk order, and + cycle guard are copied verbatim; the named constant is the same tuple. Risk is + an accidental semantic drift during the move — mitigated by writing + `test_retriable.py` first (TDD) against the documented matrix, and by keeping + the two integration cases as a live-Postgres backstop on the wiring. +- **Logging relocation** could drop or duplicate a debug line — mitigated by + `_log_and_decide` reproducing both lines verbatim and `retriable.py` carrying + no logger. diff --git a/planning/changes/2026-06-26.01-retriable-error-seam/plan.md b/planning/changes/2026-06-26.01-retriable-error-seam/plan.md new file mode 100644 index 0000000..81bb0a3 --- /dev/null +++ b/planning/changes/2026-06-26.01-retriable-error-seam/plan.md @@ -0,0 +1,176 @@ +# retriable-error-seam — implementation plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use +> superpowers:subagent-driven-development (recommended) or +> superpowers:executing-plans to implement this plan task-by-task. 
Steps +> use checkbox (`- [ ]`) syntax for tracking. Each bug/behaviour is TDD — +> failing test first. + +**Goal:** Move the retriable-error decision behind a pure +`is_retriable(exc) -> bool` seam in `db_retry/retriable.py`, tested in memory, +with `postgres_retry` as a thin consumer. + +**Spec:** [`design.md`](./design.md) + +**Branch:** `feat/retriable-error-seam` + +**Commit strategy:** Per-task commits. + +--- + +### Task 1: Pin the domain term (glossary) + +**Files:** +- Create: `architecture/glossary.md` + +Author `architecture/glossary.md` lazily with its first term, **Retriable +error**, using the `planning/_templates/glossary.md` shape. + +- [ ] **Step 1: Write the term** + + Entry: **Retriable error** — a PostgreSQL failure transient enough to retry + the operation unchanged: a serialization failure (`40001`) or a lost + connection (class `08`). `_Avoid_:` transient error, recoverable error. + No implementation detail. + +- [ ] **Step 2: Commit** + + ```bash + git add architecture/glossary.md + git commit -m "docs: add Retriable error to glossary" + ``` + +--- + +### Task 2: New `retriable.py` seam, TDD + +**Files:** +- Create: `tests/test_retriable.py` +- Create: `db_retry/retriable.py` + +Build the pure predicate test-first. + +- [ ] **Step 1: Write the failing matrix** + + `tests/test_retriable.py`: a `_make_dbapi_error(cause)` helper building a + `DBAPIError("...", None, orig)` whose `orig.__cause__ = cause`. Parametrized + cases per the spec's Testing section — 40001, 08000, an 08-subclass + (`ConnectionDoesNotExistError`), non-retriable `PostgresError`, re-wrapped + `RepositoryError.__cause__` chain, `__context__`-only link, a `__cause__` + cycle, and a bare non-DBAPI exception. Import `is_retriable` and + `RETRIABLE_ASYNCPG_ERRORS` from `db_retry.retriable`. + + Run: `uv run pytest tests/test_retriable.py` → fails (ModuleNotFound). 
+ +- [ ] **Step 2: Write the module to green** + + `db_retry/retriable.py`: `RETRIABLE_ASYNCPG_ERRORS` tuple, private + `_is_retriable_link`, pure `is_retriable` with the cause/context walk + cycle + guard. No logger. + + Run: `uv run pytest tests/test_retriable.py` → all pass. No DB needed. + +- [ ] **Step 3: Commit** + + ```bash + git add db_retry/retriable.py tests/test_retriable.py + git commit -m "feat: extract is_retriable predicate into retriable.py" + ``` + +--- + +### Task 3: Rewire `retry.py` onto the seam + +**Files:** +- Modify: `db_retry/retry.py` + +Replace the private predicate with a thin logging wrapper over `is_retriable`. + +- [ ] **Step 1: Edit** + + Import `is_retriable` from `db_retry.retriable`. Add `_log_and_decide` + (preserves both debug lines verbatim). Point `retry=tenacity.retry_if_exception` + at it. Delete `_is_retriable_dbapi_error` and `_retry_handler`. Remove the now + unused `asyncpg` / `DBAPIError` imports if no longer referenced. + +- [ ] **Step 2: Verify lint + types** + + Run: `just lint-ci` → passes (ruff, ty, eof, planning). + +- [ ] **Step 3: Commit** + + ```bash + git add db_retry/retry.py + git commit -m "refactor: wire postgres_retry through is_retriable seam" + ``` + +--- + +### Task 4: Trim the integration suite + +**Files:** +- Modify: `tests/test_retry.py` + +Move the classification matrix to Task 2; keep two wiring proofs. + +- [ ] **Step 1: Edit** + + Reduce to two integration cases against the live engine: `40001` (retriable → + retries, assert attempt count) and `40002` (non-retriable → single call, + original `DBAPIError` reraised). Keep `test_postgres_retry_with_retries` + (per-callsite `retries=` override). Delete the full parametrized matrix and + the advanced-alchemy case (now covered in `test_retriable.py`). + +- [ ] **Step 2: Verify** + + Run: `just test` (Docker Postgres) → green. `just lint-ci` → passes. 
+ +- [ ] **Step 3: Commit** + + ```bash + git add tests/test_retry.py + git commit -m "test: trim retry integration suite to wiring proofs" + ``` + +--- + +### Task 5: Promote architecture docs + finalize bundle + +**Files:** +- Create: `architecture/retriable.md` +- Modify: `architecture/retry.md`, `architecture/README.md` +- Modify: `planning/changes/2026-06-26.01-retriable-error-seam/design.md` (summary) + +- [ ] **Step 1: Write `architecture/retriable.md`** + + Document `is_retriable`, `RETRIABLE_ASYNCPG_ERRORS`, the cause-chain walk + + cycle guard, and why `40002` is excluded. No frontmatter. + +- [ ] **Step 2: Edit `architecture/retry.md`** + + Drop "What counts as retriable" / "Cause-chain walk"; replace with a one-line + pointer to `retriable.md` and note the `_log_and_decide` wiring. + +- [ ] **Step 3: Edit `architecture/README.md`** + + Add the `retriable.md` capability row; update the `retry.md` row so it no + longer claims the predicate. + +- [ ] **Step 4: Finalize `summary:` and verify planning** + + Set `design.md` `summary:` to the realized result. Run `just check-planning` + → `planning: OK`. + +- [ ] **Step 5: Commit** + + ```bash + git add architecture/ planning/ + git commit -m "docs: promote retriable capability into architecture/" + ``` + +--- + +### Task 6: Ship + +- [ ] **Step 1:** `just lint-ci` and `just test` both green. +- [ ] **Step 2:** Push `feat/retriable-error-seam`, open PR, watch CI. 
From f16ec654d377e76d3f5d128fea7dd9dc810cb91a Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Fri, 26 Jun 2026 22:06:37 +0300 Subject: [PATCH 2/8] docs: add Retriable error to glossary --- architecture/glossary.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) create mode 100644 architecture/glossary.md diff --git a/architecture/glossary.md b/architecture/glossary.md new file mode 100644 index 0000000..ce772bf --- /dev/null +++ b/architecture/glossary.md @@ -0,0 +1,13 @@ +# Glossary + +The project's ubiquitous language — the domain terms that code, specs, and +capability pages share. Living prose, no frontmatter, dated by git. Each entry is +a term, what it *is* (not what it does), and the synonyms to avoid. No +implementation detail; this is a glossary, not a spec. + +**Retriable error**: +A PostgreSQL failure transient enough to retry the operation unchanged: a +serialization failure or a lost connection. The operation may succeed if retried +without modification, because the failure does not reflect a logical error in the +request itself. +_Avoid_: transient error, recoverable error From a213488155497560f882625a4e76f40bb04c40ae Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Fri, 26 Jun 2026 22:15:20 +0300 Subject: [PATCH 3/8] feat: extract is_retriable predicate into retriable.py Pure is_retriable(exc) -> bool in db_retry/retriable.py; walks __cause__/__context__ chain with cycle guard to classify retriable DBAPIErrors without a live database. 
--- db_retry/retriable.py | 25 ++++++++++++++++++ tests/test_retriable.py | 57 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 82 insertions(+) create mode 100644 db_retry/retriable.py create mode 100644 tests/test_retriable.py diff --git a/db_retry/retriable.py b/db_retry/retriable.py new file mode 100644 index 0000000..c7e154f --- /dev/null +++ b/db_retry/retriable.py @@ -0,0 +1,25 @@ +import asyncpg +from sqlalchemy.exc import DBAPIError + + +RETRIABLE_ASYNCPG_ERRORS = (asyncpg.SerializationError, asyncpg.PostgresConnectionError) + + +def _is_retriable_link(exception: BaseException) -> bool: + return ( + isinstance(exception, DBAPIError) + and exception.orig is not None + and isinstance(exception.orig.__cause__, RETRIABLE_ASYNCPG_ERRORS) + ) + + +def is_retriable(exception: BaseException) -> bool: + """Walk __cause__/__context__; True if any link is a retriable DBAPIError.""" + current: BaseException | None = exception + seen: set[int] = set() + while current is not None and id(current) not in seen: + seen.add(id(current)) + if _is_retriable_link(current): + return True + current = current.__cause__ or current.__context__ + return False diff --git a/tests/test_retriable.py b/tests/test_retriable.py new file mode 100644 index 0000000..afbc0d2 --- /dev/null +++ b/tests/test_retriable.py @@ -0,0 +1,57 @@ +import asyncpg +import pytest +from sqlalchemy.exc import DBAPIError + +from db_retry.retriable import RETRIABLE_ASYNCPG_ERRORS, is_retriable + + +def _make_dbapi_error(cause: BaseException) -> DBAPIError: + orig = Exception("db error") + orig.__cause__ = cause + return DBAPIError("SELECT 1", None, orig) + + +def test_retriable_asyncpg_errors_contains_expected_classes() -> None: + assert asyncpg.SerializationError in RETRIABLE_ASYNCPG_ERRORS + assert asyncpg.PostgresConnectionError in RETRIABLE_ASYNCPG_ERRORS + + +@pytest.mark.parametrize( + ("exception", "expected"), + [ + pytest.param(_make_dbapi_error(asyncpg.SerializationError()), True, 
id="serialization_error_40001"), + pytest.param(_make_dbapi_error(asyncpg.PostgresConnectionError()), True, id="postgres_connection_error_08000"), + pytest.param( + _make_dbapi_error(asyncpg.ConnectionDoesNotExistError()), True, id="connection_does_not_exist_08003" + ), + pytest.param(_make_dbapi_error(asyncpg.PostgresError()), False, id="non_retriable_postgres_error"), + pytest.param(ValueError("not a db error"), False, id="bare_non_dbapi_exception"), + ], +) +def test_is_retriable(exception: BaseException, expected: bool) -> None: + assert is_retriable(exception) == expected + + +def test_is_retriable_rewrapped_cause() -> None: + class RepositoryError(Exception): + pass + + dbapi_err = _make_dbapi_error(asyncpg.SerializationError()) + repo_err = RepositoryError("wrapped") + repo_err.__cause__ = dbapi_err + assert is_retriable(repo_err) is True + + +def test_is_retriable_context_only() -> None: + dbapi_err = _make_dbapi_error(asyncpg.SerializationError()) + wrapper = ValueError("wrapper") + wrapper.__context__ = dbapi_err + assert is_retriable(wrapper) is True + + +def test_is_retriable_cause_cycle_terminates() -> None: + a = ValueError("a") + b = ValueError("b") + a.__cause__ = b + b.__cause__ = a + assert is_retriable(a) is False From 0c27f6fee76391c144d65ea801fdfee547a839b4 Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Fri, 26 Jun 2026 22:20:50 +0300 Subject: [PATCH 4/8] refactor: wire postgres_retry through is_retriable seam --- db_retry/retry.py | 27 ++++++--------------------- 1 file changed, 6 insertions(+), 21 deletions(-) diff --git a/db_retry/retry.py b/db_retry/retry.py index fb640b4..f449fea 100644 --- a/db_retry/retry.py +++ b/db_retry/retry.py @@ -2,34 +2,19 @@ import logging import typing -import asyncpg import tenacity -from sqlalchemy.exc import DBAPIError from db_retry import settings +from db_retry.retriable import is_retriable logger = logging.getLogger(__name__) -def _is_retriable_dbapi_error(exception: BaseException) -> bool: - return ( 
- isinstance(exception, DBAPIError) - and exception.orig is not None - and isinstance(exception.orig.__cause__, (asyncpg.SerializationError, asyncpg.PostgresConnectionError)) - ) - - -def _retry_handler(exception: BaseException) -> bool: - current: BaseException | None = exception - seen: set[int] = set() - while current is not None and id(current) not in seen: - seen.add(id(current)) - if _is_retriable_dbapi_error(current): - logger.debug("postgres_retry, retrying") - return True - current = current.__cause__ or current.__context__ - +def _log_and_decide(exception: BaseException) -> bool: + if is_retriable(exception): + logger.debug("postgres_retry, retrying") + return True logger.debug("postgres_retry, giving up on retry") return False @@ -57,7 +42,7 @@ async def wrapped_method(*args: P.args, **kwargs: P.kwargs) -> T: retryer = tenacity.AsyncRetrying( stop=tenacity.stop_after_attempt(retries if retries is not None else settings.get_retries_number()), wait=tenacity.wait_exponential_jitter(), - retry=tenacity.retry_if_exception(_retry_handler), + retry=tenacity.retry_if_exception(_log_and_decide), reraise=True, before=tenacity.before_log(logger, logging.DEBUG), ) From 3d13b94bd62bed96ab0bed9909da39fcd823e27a Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Fri, 26 Jun 2026 22:25:00 +0300 Subject: [PATCH 5/8] test: trim retry integration suite to wiring proofs Reduce test_postgres_retry to two rows (40001 retriable, 40002 non-retriable). Delete test_postgres_retry_advanced_alchemy (covered by test_retriable.py). Remove unused advanced_alchemy imports. 
--- tests/test_retry.py | 46 --------------------------------------------- 1 file changed, 46 deletions(-) diff --git a/tests/test_retry.py b/tests/test_retry.py index ff6185a..01ad5ba 100644 --- a/tests/test_retry.py +++ b/tests/test_retry.py @@ -1,6 +1,5 @@ import pytest import sqlalchemy -from advanced_alchemy.exceptions import RepositoryError, wrap_sqlalchemy_exception from sqlalchemy.exc import DBAPIError from sqlalchemy.ext import asyncio as sa_async @@ -10,8 +9,6 @@ @pytest.mark.parametrize( ("error_code", "expected_calls"), [ - ("08000", 2), # PostgresConnectionError - backoff triggered, 1 retry - ("08003", 2), # subclass of PostgresConnectionError - backoff triggered, 1 retry ("40001", 2), # SerializationError - backoff triggered, 1 retry ("40002", 1), # StatementCompletionUnknownError - backoff not triggered ], @@ -45,49 +42,6 @@ async def raise_error() -> None: assert call_count == expected_calls -@pytest.mark.parametrize( - ("error_code", "expected_calls"), - [ - ("08000", 2), - ("08003", 2), - ("40001", 2), - ("40002", 1), - ], -) -async def test_postgres_retry_advanced_alchemy( - async_engine: sa_async.AsyncEngine, - error_code: str, - expected_calls: int, -) -> None: - async with async_engine.connect() as connection: - await connection.execute( - sqlalchemy.text( - f""" - CREATE OR REPLACE FUNCTION raise_error() - RETURNS VOID AS $$ - BEGIN - RAISE SQLSTATE '{error_code}'; - END; - $$ LANGUAGE plpgsql; - """, - ), - ) - - call_count = 0 - - @postgres_retry - async def raise_error() -> None: - nonlocal call_count - call_count += 1 - with wrap_sqlalchemy_exception(): - await connection.execute(sqlalchemy.text("SELECT raise_error()")) - - with pytest.raises(RepositoryError): - await raise_error() - - assert call_count == expected_calls - - async def test_postgres_retry_with_retries(async_engine: sa_async.AsyncEngine) -> None: async with async_engine.connect() as connection: await connection.execute( From 4235c29f99fbdf57324df7391534d8b8caa552b6 Mon Sep 
17 00:00:00 2001 From: Artur Shiriev Date: Fri, 26 Jun 2026 22:31:57 +0300 Subject: [PATCH 6/8] docs: promote retriable capability into architecture/ --- architecture/README.md | 5 +- architecture/retriable.md | 56 +++++++++++++++++++ architecture/retry.md | 29 ++-------- .../design.md | 2 +- 4 files changed, 66 insertions(+), 26 deletions(-) create mode 100644 architecture/retriable.md diff --git a/architecture/README.md b/architecture/README.md index 5c1c631..4d0e73d 100644 --- a/architecture/README.md +++ b/architecture/README.md @@ -10,8 +10,9 @@ These files carry **no frontmatter** — they are prose, dated by git. ## Capabilities -- [retry.md](retry.md) — `postgres_retry`, the async tenacity decorator and its - cause-chain retry predicate. +- [retry.md](retry.md) — `postgres_retry`, the async tenacity decorator. +- [retriable.md](retriable.md) — `is_retriable` / `RETRIABLE_ASYNCPG_ERRORS`, + the pure retriable-error predicate and cause-chain walk. - [connections.md](connections.md) — `build_connection_factory`, multi-host load balancing and failover. - [dsn.md](dsn.md) — `build_db_dsn` / `is_dsn_multihost`, DSN parsing and diff --git a/architecture/retriable.md b/architecture/retriable.md new file mode 100644 index 0000000..76f3a60 --- /dev/null +++ b/architecture/retriable.md @@ -0,0 +1,56 @@ +# Retriable + +`db_retry/retriable.py` classifies a PostgreSQL exception as retriable — a pure +predicate that can be tested in memory without a live database. + +## `is_retriable` + +```python +def is_retriable(exception: BaseException) -> bool: ... +``` + +Returns `True` if `exception` or any exception in its `__cause__`/`__context__` +chain is a `sqlalchemy.exc.DBAPIError` whose `.orig.__cause__` is one of the +[retriable asyncpg errors](glossary.md). + +The predicate is **pure**: no logging, no side effects. `postgres_retry` wraps it +in `_log_and_decide`, which adds the two debug lines and feeds the result to +tenacity. 
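A minimal sketch of the in-memory classification described above, written with stand-in classes so it runs without SQLAlchemy or asyncpg installed. `StandInDBAPIError`, `StandInSerializationError`, `StandInConnectionError`, `STAND_IN_RETRIABLE`, and `make_dbapi_error` are hypothetical names used only for illustration; the two predicate bodies mirror `db_retry/retriable.py` as added in this series.

```python
from typing import Optional


class StandInDBAPIError(Exception):
    """Hypothetical stand-in for sqlalchemy.exc.DBAPIError; only `.orig` matters here."""

    def __init__(self, orig: Optional[BaseException]) -> None:
        super().__init__("stand-in DBAPIError")
        self.orig = orig


class StandInSerializationError(Exception):
    """Hypothetical stand-in for asyncpg.SerializationError (SQLSTATE 40001)."""


class StandInConnectionError(Exception):
    """Hypothetical stand-in for asyncpg.PostgresConnectionError (class 08)."""


STAND_IN_RETRIABLE = (StandInSerializationError, StandInConnectionError)


def _is_retriable_link(exception: BaseException) -> bool:
    # Same three checks as the real helper: wrapper type, .orig set, driver cause.
    return (
        isinstance(exception, StandInDBAPIError)
        and exception.orig is not None
        and isinstance(exception.orig.__cause__, STAND_IN_RETRIABLE)
    )


def is_retriable(exception: BaseException) -> bool:
    # Walk __cause__ first, then __context__, stopping on a repeated object.
    current: Optional[BaseException] = exception
    seen: set = set()
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if _is_retriable_link(current):
            return True
        current = current.__cause__ or current.__context__
    return False


def make_dbapi_error(cause: BaseException) -> StandInDBAPIError:
    # Mirrors the test helper: the driver error hangs off orig.__cause__.
    orig = Exception("db error")
    orig.__cause__ = cause
    return StandInDBAPIError(orig)


if __name__ == "__main__":
    wrapped = RuntimeError("repository layer")
    wrapped.__cause__ = make_dbapi_error(StandInSerializationError())
    print(is_retriable(wrapped))  # True
    print(is_retriable(make_dbapi_error(ValueError("other"))))  # False
```

Because the predicate takes no logger and no connection, each verdict is a plain function call on an exception object, which is what lets the classification matrix live outside the Docker-backed suite.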
+ +## `RETRIABLE_ASYNCPG_ERRORS` + +```python +RETRIABLE_ASYNCPG_ERRORS = (asyncpg.SerializationError, asyncpg.PostgresConnectionError) +``` + +The two asyncpg error classes that make a `DBAPIError` retriable: + +| Class | SQLSTATE | Meaning | +|---|---|---| +| `asyncpg.SerializationError` | `40001` | Serialization failure — transaction conflicted with a concurrent write; retry may succeed. | +| `asyncpg.PostgresConnectionError` | class `08` (e.g. `08000`, `08003`) | Lost or refused connection — transient network or server state; retry may reconnect. | + +`StatementCompletionUnknownError` (`40002`) is **not** included: when the +statement's outcome is unknown, a blind retry risks duplicating a write that +already committed. Classification stops at the unknown boundary. + +## Cause-chain walk + +`is_retriable` does not inspect only the top exception — it walks the chain: + +1. Follow `__cause__` first (an explicit `raise … from …`), then `__context__` + (an implicit exception chain). +2. Guard against cycles with a `seen` set of `id()`s — if the same exception + object appears twice, the walk terminates. +3. Return `True` at the **first** link that is a retriable `DBAPIError`. + +The walk matters because `DBAPIError` is often re-wrapped before it surfaces. +For example, advanced-alchemy's `wrap_sqlalchemy_exception()` raises a +`RepositoryError` (or `IntegrityError`) with the real `DBAPIError` attached as +`__cause__`; without the walk, the decorator would see only the outer wrapper and +give up. + +## Related + +- [retry.md](retry.md) — `postgres_retry`, which consumes `is_retriable` via `_log_and_decide`. +- [glossary.md](glossary.md) — **Retriable error** definition. 
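The walk order and cycle guard from the "Cause-chain walk" section can be looked at in isolation with a small generator. `walk_chain` is a hypothetical helper, not part of `db_retry`; it reuses the loop from `is_retriable` but yields each visited link instead of classifying it, so the traversal itself becomes observable.

```python
from typing import Iterator, Optional


def walk_chain(exception: BaseException) -> Iterator[BaseException]:
    """Yield each link in visit order: __cause__ first, else __context__."""
    current: Optional[BaseException] = exception
    seen: set = set()
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        yield current
        # Same step as is_retriable: prefer the explicit cause over the context.
        current = current.__cause__ or current.__context__


if __name__ == "__main__":
    # A two-element __cause__ cycle: the guard stops after visiting each once.
    a = ValueError("a")
    b = ValueError("b")
    a.__cause__ = b
    b.__cause__ = a
    print([str(link) for link in walk_chain(a)])  # ['a', 'b']

    # When both attributes are set, only __cause__ is followed.
    outer = RuntimeError("outer")
    outer.__cause__ = KeyError("explicit")
    outer.__context__ = OSError("implicit")
    print([type(link).__name__ for link in walk_chain(outer)])  # ['RuntimeError', 'KeyError']
```

One consequence worth keeping in mind, assuming the loop is as shown in this series: when a link has both `__cause__` and `__context__` set, the walk follows only `__cause__`, so a retriable `DBAPIError` reachable solely through that link's `__context__` would not be visited.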
diff --git a/architecture/retry.md b/architecture/retry.md index 5969954..998d85b 100644 --- a/architecture/retry.md +++ b/architecture/retry.md @@ -27,33 +27,16 @@ Each call builds a `tenacity.AsyncRetrying` with: - `stop=stop_after_attempt(retries or get_retries_number())` - `wait=wait_exponential_jitter()` — exponential backoff with jitter -- `retry=retry_if_exception(_retry_handler)` — the predicate below +- `retry=retry_if_exception(_log_and_decide)` — delegates to + [`is_retriable`](retriable.md); `_log_and_decide` adds the two debug log lines + (`"postgres_retry, retrying"` / `"postgres_retry, giving up on retry"`) around + the pure predicate - `reraise=True` — the **original** exception propagates after the last attempt, not tenacity's `RetryError` - `before=before_log(logger, DEBUG)` — debug log before each attempt -## What counts as retriable - -`_is_retriable_dbapi_error` returns `True` only for a `sqlalchemy.exc.DBAPIError` -whose `.orig` is set and whose `.orig.__cause__` is an -`asyncpg.SerializationError` (SQLSTATE `40001`) or -`asyncpg.PostgresConnectionError` (class `08`, e.g. `08000`/`08003`). This -deliberately excludes lookalikes such as `StatementCompletionUnknownError` -(`40002`), where the statement's outcome is unknown and a blind retry is unsafe. - -## Cause-chain walk - -`_retry_handler` does not inspect only the raised exception — it walks the -`__cause__`/`__context__` chain (following `__cause__` first, then -`__context__`), guarding against cycles with a `seen` set of `id()`s, and -returns `True` as soon as any link is a retriable `DBAPIError`. - -The walk matters because the `DBAPIError` is often re-raised inside another -exception. For example advanced-alchemy's `wrap_sqlalchemy_exception()` surfaces -it as `RepositoryError`/`IntegrityError` with the real `DBAPIError` hanging off -`__cause__`; the walk lets the retry still fire. Both retry and give-up paths -emit a debug log. 
- ## Related - [settings.md](settings.md) — where the default attempt count comes from. +- [retriable.md](retriable.md) — the retriable-error predicate: error taxonomy, + cause-chain walk, and cycle guard. diff --git a/planning/changes/2026-06-26.01-retriable-error-seam/design.md b/planning/changes/2026-06-26.01-retriable-error-seam/design.md index 9025e12..5482dfa 100644 --- a/planning/changes/2026-06-26.01-retriable-error-seam/design.md +++ b/planning/changes/2026-06-26.01-retriable-error-seam/design.md @@ -1,5 +1,5 @@ --- -summary: Give the retriable-error predicate its own seam — a pure is_retriable(exc) -> bool in db_retry/retriable.py — so classification is tested in memory instead of through a live Postgres round-trip. +summary: Extracted the retriable-error predicate into a pure is_retriable(exc) -> bool in db_retry/retriable.py, enabling in-memory classification tests; postgres_retry now consumes it via _log_and_decide. --- # Design: Give the retriable-error predicate its own seam From 7a85624a9891eeea61132686bd613994e30eaa14 Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Fri, 26 Jun 2026 22:37:38 +0300 Subject: [PATCH 7/8] docs: correct retry stop-condition and orig guard in architecture --- architecture/retriable.md | 2 +- architecture/retry.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/architecture/retriable.md b/architecture/retriable.md index 76f3a60..8d71dd0 100644 --- a/architecture/retriable.md +++ b/architecture/retriable.md @@ -10,7 +10,7 @@ def is_retriable(exception: BaseException) -> bool: ... ``` Returns `True` if `exception` or any exception in its `__cause__`/`__context__` -chain is a `sqlalchemy.exc.DBAPIError` whose `.orig.__cause__` is one of the +chain is a `sqlalchemy.exc.DBAPIError` whose `.orig` is set and whose `.orig.__cause__` is one of the [retriable asyncpg errors](glossary.md). The predicate is **pure**: no logging, no side effects. 
`postgres_retry` wraps it diff --git a/architecture/retry.md b/architecture/retry.md index 998d85b..dbb5561 100644 --- a/architecture/retry.md +++ b/architecture/retry.md @@ -25,7 +25,7 @@ is read per invocation, not frozen at decoration. Each call builds a `tenacity.AsyncRetrying` with: -- `stop=stop_after_attempt(retries or get_retries_number())` +- `stop=stop_after_attempt(retries if retries is not None else get_retries_number())` - `wait=wait_exponential_jitter()` — exponential backoff with jitter - `retry=retry_if_exception(_log_and_decide)` — delegates to [`is_retriable`](retriable.md); `_log_and_decide` adds the two debug log lines From e2236a1393dcecca16d6b28534dc1cba4a3ea84c Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Fri, 26 Jun 2026 22:41:31 +0300 Subject: [PATCH 8/8] docs: point retriable link at the asyncpg-errors section Co-Authored-By: Claude Opus 4.8 (1M context) --- architecture/retriable.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/architecture/retriable.md b/architecture/retriable.md index 8d71dd0..bebf166 100644 --- a/architecture/retriable.md +++ b/architecture/retriable.md @@ -11,7 +11,7 @@ def is_retriable(exception: BaseException) -> bool: ... Returns `True` if `exception` or any exception in its `__cause__`/`__context__` chain is a `sqlalchemy.exc.DBAPIError` whose `.orig` is set and whose `.orig.__cause__` is one of the -[retriable asyncpg errors](glossary.md). +[`RETRIABLE_ASYNCPG_ERRORS`](#retriable_asyncpg_errors). The predicate is **pure**: no logging, no side effects. `postgres_retry` wraps it in `_log_and_decide`, which adds the two debug lines and feeds the result to
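On the stop-condition wording corrected in PATCH 7/8: the two spellings differ only when a caller passes `retries=0`, which a short sketch makes concrete. `get_retries_number` below is a hypothetical stand-in returning an arbitrary value of 3 (the real default is not shown in this series), and both helper functions are illustrative names, not code from `db_retry`.

```python
from typing import Optional


def get_retries_number() -> int:
    # Hypothetical stand-in for settings.get_retries_number(); 3 is an assumed value.
    return 3


def attempts_with_or(retries: Optional[int]) -> int:
    # The wording the docs used before the fix: any falsy value falls through.
    return retries or get_retries_number()


def attempts_with_none_check(retries: Optional[int]) -> int:
    # The expression retry.py actually uses: only None falls through.
    return retries if retries is not None else get_retries_number()


if __name__ == "__main__":
    for value in (None, 0, 5):
        print(value, attempts_with_or(value), attempts_with_none_check(value))
```

With `None` or a positive integer the two agree; with `0` the `or` form silently substitutes the default, which is why the documentation now quotes the explicit `is not None` check.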