fix(billing): catch FK violations on snapshot writes and retry, closing the race left over from PR #168 #169
Merged
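PR #168 introduced reconcile (re-reading api_key_id / upstream_id from request_logs), which only narrowed the race window without eliminating it: between the reconcile SELECT and the INSERT, the cascade SET NULL can still fire on an id the caller is holding. The smoke test after deploying v0.3.0-alpha.3 reproduced the failure, and the timeline confirms it:

T+0      SSE stream request starts
T+5ms    upstream SSE response arrives
T+139ms  smoke test deletes the api_key (the cascade sets request_logs.api_key_id to NULL)
T+142ms  snapshot INSERT uses the stale id obtained by reconcile and hits the FK

Fix: add catch-FK-and-retry-with-NULL at the INSERT. Catch PG error code 23503, parse constraint_name, and if it matches either the api_key_id or the upstream_id constraint, set the violating column to NULL and retry once. reconcile stays as a clearing layer for the common path, so that path produces no needless retry logs.

Three details that cannot be separated from the fix:
1. The helper returns the apiKeyId / upstreamId actually written to the database, and applyQuotaDeltaAfterSnapshotUpsert uses those values. This avoids phantom quota, where the DB column is already NULL but the in-memory quota keeps accumulating under the old id.
2. Programmatic decisions use constraint_name rather than detail, and an unrecognized constraint is thrown as-is, with no blanket fallback.
3. Retry only once, with no backoff: the goal is to correct the referenced value, not to wait for external state to recover.

Five new unit tests cover: api_key_id FK retry, upstream_id FK retry, no quota accumulation after a retry, pass-through of non-FK errors, and pass-through of unrecognized constraints. This verifies that the same FK violation that reappeared in production after PR #168 shipped in v0.3.0-alpha.3 is now closed off.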
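The catch-and-retry step described above could look roughly like the sketch below. It is an illustration under assumptions, not the PR's actual code: SnapshotRow, FK_COLUMN_BY_CONSTRAINT, violatedFkColumn, insertSnapshotWithFkRetry, the two constraint names, and the injected insert callback are all hypothetical. Only the error code 23503 and the constraint_name field come from the description.

```typescript
// Hypothetical row shape; the real snapshot row has more columns.
interface SnapshotRow {
  requestId: string;
  apiKeyId: string | null;
  upstreamId: string | null;
}

type FkColumn = "apiKeyId" | "upstreamId";

// Hypothetical constraint names; the real ones are not spelled out in the PR text.
const FK_COLUMN_BY_CONSTRAINT: Record<string, FkColumn> = {
  snapshots_api_key_id_fk: "apiKeyId",
  snapshots_upstream_id_fk: "upstreamId",
};

// PostgreSQL SQLSTATE for foreign_key_violation.
const PG_FOREIGN_KEY_VIOLATION = "23503";

// Returns the column to null out, or null when the error is not a recognized FK violation.
function violatedFkColumn(err: unknown): FkColumn | null {
  if (typeof err !== "object" || err === null) return null;
  const e = err as { code?: unknown; constraint_name?: unknown };
  if (e.code !== PG_FOREIGN_KEY_VIOLATION) return null;
  if (typeof e.constraint_name !== "string") return null;
  return FK_COLUMN_BY_CONSTRAINT[e.constraint_name] ?? null;
}

// Inserts the row; on a recognized FK violation, nulls the violating column and
// retries exactly once. Resolves to the row that was actually written.
async function insertSnapshotWithFkRetry(
  row: SnapshotRow,
  insert: (row: SnapshotRow) => Promise<void>,
): Promise<SnapshotRow> {
  try {
    await insert(row);
    return row;
  } catch (err) {
    const column = violatedFkColumn(err);
    // Non-FK errors and unrecognized constraints propagate unchanged.
    if (column === null) throw err;
    const retried: SnapshotRow = { ...row };
    retried[column] = null;
    // Single retry, no backoff; a second failure propagates to the caller.
    await insert(retried);
    return retried;
  }
}
```

Because the helper resolves to the row that was written, a caller can feed those ids into the quota update instead of the ids it started with, which is the point of detail 1.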
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## master #169 +/- ##
==========================================
+ Coverage 78.50% 78.58% +0.08%
==========================================
Files 145 145
Lines 11484 11504 +20
Branches 3968 3977 +9
==========================================
+ Hits 9015 9040 +25
+ Misses 1632 1630 -2
+ Partials 837 834 -3
g1331 added a commit that referenced this pull request on May 23, 2026: …zzle-wrap fix(billing): recognize FK violations after drizzle wrapping, covering the real error shape that PR #169 missed
Summary
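After PR #168 went live (v0.3.0-alpha.3), the deployment smoke test again reproduced the FK violation on `request_billing_snapshots`. Cause: the `reconcileFkColumnsWithRequestLog` introduced last time only shrank the race window from 198ms to a few milliseconds without eliminating it. Between the reconcile SELECT and the INSERT, the cascade `SET NULL` can still fire on an id the caller is holding.

Production timeline as evidence

For the same `requestId: 1ab55a08`: reconcile must have run before T+139ms (otherwise it would have read NULL and taken the correct path), and the INSERT ran after T+139ms. The TOCTOU window objectively exists, and a plain "read, then write" cannot eliminate it.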
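To make the TOCTOU argument concrete, here is a small in-memory model of the interleaving. It is only a sketch: the sets, maps, and function names (`apiKeys`, `requestLogs`, `deleteApiKey`, `reconcile`, `insertSnapshot`) are stand-ins invented for illustration, not the project's code, and the model checks the FK by hand instead of talking to PostgreSQL.

```typescript
// In-memory stand-ins for api_keys and request_logs (illustrative only).
const apiKeys = new Set<string>(["key-1"]);
const requestLogs = new Map<string, { apiKeyId: string | null }>([
  ["req-1", { apiKeyId: "key-1" }],
]);

// Deleting a key cascades SET NULL onto request_logs.api_key_id.
function deleteApiKey(id: string): void {
  apiKeys.delete(id);
  requestLogs.forEach((log) => {
    if (log.apiKeyId === id) log.apiKeyId = null;
  });
}

// Check step: re-read the id from request_logs, as reconcile does.
function reconcile(requestId: string): string | null {
  return requestLogs.get(requestId)?.apiKeyId ?? null;
}

// Use step: the INSERT, where the FK is enforced at write time.
function insertSnapshot(apiKeyId: string | null): "ok" | "fk_violation" {
  return apiKeyId === null || apiKeys.has(apiKeyId) ? "ok" : "fk_violation";
}

// Interleaving from the timeline: reconcile reads, the delete lands, then the INSERT runs.
const idSeenByReconcile = reconcile("req-1"); // read before the delete: still "key-1"
deleteApiKey("key-1"); // the delete at T+139ms
const racedResult = insertSnapshot(idSeenByReconcile); // stale id reaches the INSERT

// Had reconcile run after the delete, it would have read NULL and been safe.
const lateReconcileResult = insertSnapshot(reconcile("req-1"));
```

In this model `racedResult` is `"fk_violation"` while `lateReconcileResult` is `"ok"`. No amount of re-reading before the write removes the first case, which is why the failure has to be handled at the INSERT itself.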
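Fix
Add catch-FK-and-retry-with-NULL at the INSERT:
- Catch PG error code `23503`.
- Parse `err.constraint_name`; if it matches either the `..._api_key_id_..._fk` or the `..._upstream_id_..._fk` constraint, set the violating column to NULL and retry once.

`reconcileFkColumnsWithRequestLog` stays: it still lets the common "already deleted beforehand" case go straight down the NULL path without producing retry-log noise. The two layers stack: reconcile clears the common path, and catch-retry is the backstop for races that happen mid-flight.

Three inseparable details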
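1. The helper returns the `apiKeyId` / `upstreamId` actually written to the database, and `applyQuotaDeltaAfterSnapshotUpsert` uses those values. Otherwise the result is phantom quota: the DB column is already NULL, but the in-memory quota keeps accumulating under the old id.
2. Decisions are made on `constraint_name` (a reliable, programmatic field) rather than `detail` (human-readable text); an unrecognized constraint is thrown as-is, with no blanket fallback.
3. Retry only once, with no backoff: the goal is to correct the referenced value, not to wait for external state to recover.

Test plan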
Five new test cases, covering every path of the fix:
- retries with null api_key_id when INSERT hits api_keys FK violation
- retries with null upstream_id when INSERT hits upstreams FK violation
- skips quota delta for FK-retried column to avoid db/memory state drift
- rethrows non-FK errors without retry
- rethrows FK violation when constraint name is not recognized

Automated checks:
Design rationale
The choice of approach in this PR was settled with the help of a second review by Codex. Key points of disagreement:
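Known follow-ups
The deployment logs also show a separate `request_logs.duration_ms` integer overflow error (value 9461868115 ms, about 109 days), coming from the background stale reconciler. It is an independent bug and will be fixed in a separate PR.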
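The arithmetic behind that overflow can be checked with a few lines. This is only a worked calculation, not the planned fix; the constant and variable names are made up for illustration. The one outside fact is that PostgreSQL's `integer` type is a 4-byte signed value with a maximum of 2147483647.

```typescript
// Upper bound of PostgreSQL's 4-byte signed integer type.
const PG_INTEGER_MAX = 2_147_483_647;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Value reported in the deployment log for request_logs.duration_ms.
const loggedDurationMs = 9_461_868_115;

// The logged value is several times larger than the column can hold.
const overflowsInteger = loggedDurationMs > PG_INTEGER_MAX;

// Converting to days: roughly 109.5 for the logged value, and just under 25
// for the largest duration an integer column of milliseconds can store.
const durationDays = loggedDurationMs / MS_PER_DAY;
const maxRepresentableDays = PG_INTEGER_MAX / MS_PER_DAY;
```

So any duration longer than about 24.9 days cannot be stored in an `integer` column of milliseconds, and a stale record of about 109 days is well past that limit.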