Severity: Medium
Affected repos: `middleware-python` (this), `middleware-node` (reference)
Component boundary: middleware cloud transport / 401 lifecycle parity
Context
`recost-dev/middleware-node#16` (closed by PR #32) added the full 401-auth-failure lifecycle to the Node SDK: typed `RecostAuthError` / `RecostFatalAuthError` through `onError`, configurable threshold (`maxConsecutiveAuthFailures`, default 5), one-time stderr warning on first 401, second distinct stderr line at fatal-suspend, suspended-state silent no-op, and counter reset on every non-401 outcome (success, non-401 4xx, 5xx-after-retries, network throw).
Python's `recost/_types.py` (lines 189–209) already declares `RecostError` / `RecostAuthError` / `RecostFatalAuthError` with the same constructor shape. `recost/_transport.py` already wires partial escalation: it increments `_consecutive_auth_failures`, fires a one-time stderr warning, dispatches the typed errors through `on_error`, sets `_suspended` at threshold, and short-circuits `send()` when suspended. So the classes are wired — the remaining work is bringing the lifecycle to full Node parity.
Gaps vs Node (the actual work)
1. Threshold is hardcoded to 5 (`recost/_transport.py:437`). Node exposes a configurable `maxConsecutiveAuthFailures` (default 5). Mirror that by adding `max_consecutive_auth_failures: Optional[int] = 5` to `RecostConfig`, threaded through `_resolve_config`.
2. No second stderr line at fatal-suspend. Node emits a distinct `[recost] cloud transport suspended after N consecutive auth failures. Restart the process after rotating apiKey.` line at the threshold; Python emits only the first-401 line. Add the second line.
3. Counter does not reset on every non-401 outcome. It currently resets only on 2xx success (`_transport.py:349`). It must also reset on:
   - non-401 4xx (403/404/422): the fall-through path after `_handle_cloud_result` returns False
   - 5xx after retries are exhausted: `_post_cloud` may return an error result, or the catch path may run
   - network throw: the `except` block (`_transport.py:374-375`)

   The literal reading of "consecutive 401s" requires that any non-401 outcome resets the counter, so that 401s separated by transient outages or other errors do not accumulate toward the threshold.
4. stderr text format diverges from Node. Cross-SDK log-grep parity matters: Python says `"Recost: API rejected key (401). Telemetry will be dropped."` while Node says `"[recost] HTTP 401 — API key rejected. Telemetry will stop after N consecutive failures."` Align Python to the Node format: `[recost] HTTP 401 — API key rejected. Telemetry will stop after {N} consecutive failures. Check your api_key at https://recost.dev/dashboard/account.`
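To make the target behaviour concrete, here is a rough, self-contained sketch of the counter lifecycle the gaps above describe. It is not the code in `recost/_transport.py`: `AuthFailureTracker`, `record_outcome`, the `stream` parameter, and the string-valued `on_error` events are all hypothetical stand-ins (the real transport dispatches `RecostAuthError` / `RecostFatalAuthError`, whose constructor shape is not reproduced here). It also assumes that `None` falls back to the default of 5 and that the threshold-reaching 401 reports only the fatal event; both should be checked against the Node spec.

```python
import sys
from typing import Callable, Optional, TextIO


class AuthFailureTracker:
    """Hypothetical stand-in for the transport's 401 bookkeeping."""

    def __init__(
        self,
        max_consecutive_auth_failures: Optional[int] = 5,
        on_error: Optional[Callable[[str], None]] = None,
        stream: Optional[TextIO] = None,
    ) -> None:
        # Assumption: None means "use the default of 5".
        self.threshold = (
            5 if max_consecutive_auth_failures is None
            else max_consecutive_auth_failures
        )
        self.on_error = on_error
        self.stream = stream if stream is not None else sys.stderr
        self.consecutive = 0
        self.warned = False
        self.suspended = False

    def _emit(self, kind: str) -> None:
        # Stand-in for dispatching a typed error through on_error.
        if self.on_error is not None:
            self.on_error(kind)

    def record_outcome(self, status: Optional[int]) -> None:
        """Record a final send outcome; status is None for a network throw."""
        if self.suspended:
            return  # suspended state is a silent no-op
        if status != 401:
            # 2xx, non-401 4xx, 5xx after retries, and network throws all reset.
            self.consecutive = 0
            return
        self.consecutive += 1
        if not self.warned:
            # One-time warning on the first 401, in the Node-aligned format.
            self.warned = True
            print(
                "[recost] HTTP 401 — API key rejected. Telemetry will stop "
                f"after {self.threshold} consecutive failures. Check your "
                "api_key at https://recost.dev/dashboard/account.",
                file=self.stream,
            )
        if self.consecutive >= self.threshold:
            self.suspended = True
            # Second, distinct line at fatal-suspend (text quoted from Node).
            print(
                f"[recost] cloud transport suspended after {self.threshold} "
                "consecutive auth failures. Restart the process after "
                "rotating apiKey.",
                file=self.stream,
            )
            self._emit("fatal")
        else:
            self._emit("auth")
```

With a threshold of 3, the outcome sequence 401, 401, 503, 401, 401 leaves the tracker unsuspended with a count of 2, because the 503 resets the counter; one more 401 then suspends it.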
Reference
The Node spec's "Decisions and rationale" table and "Lifecycle table" carry over directly; the Python work is a translation, not a redesign.
Out of scope (file separately if relevant)
`handle.reconfigure` / `handle.reinit_after_fork` for in-process recovery. Python's `RecostFatalAuthError` docstring already mentions `reinit_after_fork()`, but the actual surface isn't designed yet, so recovery stays restart-only, mirroring Node.