fix(billing): prevent snapshot write failures after api_key/upstream deletion #168

Merged
g1331 merged 1 commit into master from fix/billing-snapshot-fk-race
May 23, 2026

@g1331 g1331 commented May 23, 2026

Summary

Fixes a foreign key (FK) violation seen in production when writing to request_billing_snapshots:

insert or update on table "request_billing_snapshots" violates foreign key constraint
"request_billing_snapshots_api_key_id_api_keys_id_fk"

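Root cause

Persisting the snapshot is the "last mile" of the main request flow; for SSE streaming responses in particular, it is triggered asynchronously in a .then(...) only after the stream ends. If the in-memory validApiKey.id held by the caller (the 8 persistBillingSnapshotSafely call sites in route.ts) has been deleted externally in the meantime (typical case: a deployment smoke test creates a temporary key, sends a request, and deletes the key as soon as the response arrives), the UUID no longer exists in the api_keys table at INSERT time and the FK constraint fails.

onDelete: SET NULL is only applied by the DB when a referencing row already exists and the referenced row is deleted afterwards; it cannot help an INSERT that references a row which is already gone.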
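To make the distinction concrete, here is a toy in-memory model of a parent table and a child table whose foreign key is declared ON DELETE SET NULL. It is only an illustration of the semantics described above: `ToyDb` and its methods are hypothetical and are not the project's schema or database code.

```typescript
// Toy model: `apiKeys` plays the parent table, `snapshots` the child table.
type ChildRow = { id: number; apiKeyId: string | null };

class ToyDb {
  private apiKeys = new Set<string>();
  readonly snapshots: ChildRow[] = [];
  private nextId = 1;

  createKey(id: string): void {
    this.apiKeys.add(id);
  }

  // ON DELETE SET NULL: rows that already reference the key are nulled.
  deleteKey(id: string): void {
    this.apiKeys.delete(id);
    for (const row of this.snapshots) {
      if (row.apiKeyId === id) row.apiKeyId = null;
    }
  }

  // The FK check runs at INSERT time against the current parent table.
  insertSnapshot(apiKeyId: string | null): ChildRow {
    if (apiKeyId !== null && !this.apiKeys.has(apiKeyId)) {
      throw new Error(`foreign key violation: api_key_id=${apiKeyId}`);
    }
    const row: ChildRow = { id: this.nextId++, apiKeyId };
    this.snapshots.push(row);
    return row;
  }
}

const db = new ToyDb();
db.createKey("key-1");
const early = db.insertSnapshot("key-1"); // inserted while the key exists
db.deleteKey("key-1"); // SET NULL rewrites the existing child row

let lateInsertError: string | null = null;
try {
  db.insertSnapshot("key-1"); // stale in-memory id, parent row is gone
} catch (err) {
  lateInsertError = (err as Error).message;
}
```

The row inserted before the delete ends up with a null reference, while the insert attempted after the delete is rejected, which mirrors the production failure.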

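Production evidence

Querying the database for the request_logs row of the same request:

| Field | Value |
| --- | --- |
| api_key_id | NULL (cascade SET NULL already triggered) |
| upstream_id | NULL (same as above) |
| api_key_name | deploy-smoke-26296857443-key |
| model | gpt-4.1-smoke |
| created_at → snapshot billed_at gap | 198 ms |

The smoke test deleted the key and the upstream within that 198 ms window, so the asynchronous snapshot write hit the FK constraint.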

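Fix

  • Added reconcileFkColumnsWithRequestLog at the entry of calculateAndPersistRequestBillingSnapshot: it looks up the request_logs row by requestLogId and overwrites the input with that row's api_key_id / upstream_id.
  • Both request_logs columns are declared ON DELETE SET NULL, so the values in the row already reflect the DB's final state and are inherently FK-safe.
  • The change is confined to the billing service entry point; all 8 call sites benefit automatically with no changes.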
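The reconcile step described in the list above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code merged in this PR: the `SnapshotFkInput` and `RequestLogFkRow` shapes, the `reconcileFkColumns` and `reconcileWithLookup` names, and the injected `findRequestLog` lookup are all hypothetical stand-ins for the real types and the primary-key SELECT.

```typescript
// Hypothetical shapes; the project's real types are not shown in this PR.
interface SnapshotFkInput {
  requestLogId: string;
  apiKeyId: string | null;
  upstreamId: string | null;
}

interface RequestLogFkRow {
  apiKeyId: string | null;
  upstreamId: string | null;
}

// Pure step: take the FK values from the request_logs row and ignore
// whatever the caller passed. A missing row falls back to null.
function reconcileFkColumns(
  input: SnapshotFkInput,
  logRow: RequestLogFkRow | undefined,
): SnapshotFkInput {
  return {
    ...input,
    apiKeyId: logRow?.apiKeyId ?? null,
    upstreamId: logRow?.upstreamId ?? null,
  };
}

// Async wrapper: `findRequestLog` stands in for the lookup by requestLogId.
async function reconcileWithLookup(
  input: SnapshotFkInput,
  findRequestLog: (id: string) => Promise<RequestLogFkRow | undefined>,
): Promise<SnapshotFkInput> {
  const logRow = await findRequestLog(input.requestLogId);
  return reconcileFkColumns(input, logRow);
}

// Caller still holds ids that were deleted after the response was sent.
const staleInput: SnapshotFkInput = {
  requestLogId: "log-1",
  apiKeyId: "deleted-key",
  upstreamId: "deleted-upstream",
};
const reconciled = reconcileFkColumns(staleInput, {
  apiKeyId: null,
  upstreamId: null,
});
```

Splitting the pure merge from the lookup keeps the overriding rule easy to unit test without a database, which is one plausible way to cover both the cascade-nulled and the missing-row branches.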

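Impact

  • Behavior change: the snapshot's api_key_id / upstream_id now always match the request_logs row, even if the caller passes a stale id. This was the intended semantics all along; the FK failures occurred only because there was no explicit reconcile step.
  • Performance: each snapshot write adds one SELECT via the primary key index. The snapshot is not on the request's critical path (it is written asynchronously after the response has been sent), so the overhead is acceptable.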

Test plan

  • pnpm test:run tests/unit/services/billing-cost-service.test.ts: 13/13 passed (11 existing + 2 new)
  • pnpm test:run tests/unit/services/: 1112/1112 unit tests passed, no regressions
  • pnpm exec tsc --noEmit passed
  • pnpm lint passed
  • pnpm build passed
  • New test case "writes null FK columns when request_logs row was cascade-nulled after key/upstream deletion" directly reproduces and verifies the production race
  • New test case "falls back to null FK columns when request_logs row is missing" covers the fallback branch where the request_logs row does not exist
  • After release, watch for the frequency of "failed to persist billing snapshot" log entries to drop back to 0

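Optional follow-ups

  • Add a short wait to the smoke test teardown so that later asynchronous persistence logic (not only billing, but also the quota tracker and others) does not collide with the deletion window. This is a systemic problem; the billing snapshot is just the first symptom to be discovered.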
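A teardown with a grace period might be sketched as below. Everything here is an assumption for illustration: `teardownAfterGrace`, the `Cleanup` type, and the injectable `sleep` parameter are hypothetical, and the right wait time is not stated in this PR (the only data point is the 198 ms gap observed in production).

```typescript
type Cleanup = () => Promise<void>;

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Waits `graceMs` before deleting anything, so that writes scheduled
// after the response was sent have a chance to land first.
async function teardownAfterGrace(
  cleanups: Cleanup[],
  graceMs: number,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<void> {
  await sleep(graceMs);
  // Run deletions in order, e.g. key first, then upstream.
  for (const cleanup of cleanups) {
    await cleanup();
  }
}
```

A fixed wait only narrows the window rather than closing it, so it complements the server-side reconcile step instead of replacing it. Injecting `sleep` lets a test verify the ordering without actually waiting.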

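If the api_key_id / upstream_id passed by the caller no longer exists in
api_keys / upstreams when the snapshot is written (typical case: a smoke test
or external cleanup deletes temporary resources right after the SSE response
ends), the INSERT hits the FK constraint.

Fix: at the entry of calculateAndPersistRequestBillingSnapshot, look up
apiKeyId / upstreamId from request_logs and use them as the snapshot's final
values. The request_logs row has already been processed by the DB's
ON DELETE SET NULL, so it is inherently FK-safe.
All 8 call sites benefit automatically; no caller changes are needed.

Adds two unit tests covering the race branch and the fallback branch for a
missing request_logs row.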

codecov Bot commented May 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.50%. Comparing base (31afbf8) to head (1b37795).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #168      +/-   ##
==========================================
+ Coverage   78.46%   78.50%   +0.03%     
==========================================
  Files         145      145              
  Lines       11479    11484       +5     
  Branches     3967     3968       +1     
==========================================
+ Hits         9007     9015       +8     
  Misses       1632     1632              
+ Partials      840      837       -3     
Flag Coverage Δ *Carryforward flag
typescript 76.04% <ø> (ø) Carriedforward from 31afbf8
verify 74.15% <100.00%> (+0.01%) ⬆️

*This pull request uses carry forward flags.

@g1331 g1331 merged commit 13a8803 into master May 23, 2026
12 checks passed
@g1331 g1331 deleted the fix/billing-snapshot-fk-race branch May 23, 2026 03:53
g1331 added a commit that referenced this pull request May 23, 2026
fix(billing): catch FK violations on snapshot write and retry, closing the race left over from PR #168