Skip to content

fix(observability): capsule.request.durationMs spans full request lifecycle#44

Merged
arapov merged 1 commit into
masterfrom
fix/capsule-request-duration-includes-body
May 22, 2026
Merged

fix(observability): capsule.request.durationMs spans full request lifecycle#44
arapov merged 1 commit into
masterfrom
fix/capsule-request-duration-includes-body

Conversation

@arapov
Copy link
Copy Markdown
Collaborator

@arapov arapov commented May 22, 2026

Summary

  • v1.6.0's capsule.request.durationMs measured only TTFB. 48 h of production data made the gap visible: list_entries reported 542 ms in the dashboard while tool.call reported 2720 ms wall-clock — the 2178 ms gap was body stream + JSON parse, invisible to the metric.
  • Refactor: doFetch() now returns a RequestStart bookkeeping object instead of emitting; a new consumeBody() helper wraps body consumption in try/finally so the emit fires after the body has been read (on both success and error paths).
  • All 9 callers in src/capsule/client.ts updated. One new test pins the contract via a slow-body ReadableStream. 470 tests (was 469). Bundle +0.72 KB.

Background — why this surfaced now

The 48 h v1.6.0 traffic sample (post-deploy verification window) had 62 tool calls across 7 distinct tools. Six of them showed tool.call.durationMs ≈ capsule.request.durationMs with <10 ms overhead — looked fine. list_entries was the outlier: 2720 ms vs 542 ms. After tracing, the gap mapped entirely to res.json() time after fetch() returned headers. Other endpoints had small bodies so the gap was hidden; the global-entries feed surfaces the worst case.

What this is not

  • Not a fix for the underlying slow list_entries call (that's just Capsule sending a big body — out of our control).
  • Not a new event shape — durationMs is still a single number, just a more accurate one.
  • Not an N+1 fix for list_party_entries, which was the OTHER finding from the 48 h sample (65 % of all calls). Adding batch_list_party_entries is minor-bump territory; filed in IDEAS.md with the new evidence.

Test plan

  • npm run typecheck clean
  • npm run lint + npm run format:check clean
  • npm run build clean (148.15 KB stdio / 174.62 KB HTTP, +~0.7 KB)
  • npm test 470/470 pass (was 469; +1 new test)
  • New test verifies the contract: a 60 ms body-stream delay must show up in durationMs
  • Existing capsule.request test still passes (the duration was typed as number and the test didn't pin a specific value, so the refactor is contract-compatible)
  • Post-merge: deploy to production, watch tool.call.durationMs - capsule.request.durationMs for list_entries — should drop near zero (overhead → ~0 ms), confirming the metric now captures what was previously hidden.

🤖 Generated with Claude Code

…ecycle

48h of v1.6.0 production traffic surfaced a metric-fidelity bug: the
`capsule.request.durationMs` event measured only TTFB, not body
stream + JSON parse. Visible on `list_entries` (tenant-wide feed,
large body): dashboard showed 542 ms while wall-clock was 2720 ms.
Other endpoints were too small to expose the gap (<10 ms body read).

Refactor: doFetch() now returns a RequestStart bookkeeping object and
no longer emits. A new consumeBody() helper wraps body consumption
in a try/finally that emits at scope exit, so durationMs spans the
full request (issued → body fully read), and the emit fires on
both success and body-read errors (4xx/5xx, malformed JSON, etc).

Touches all 9 callers in src/capsule/client.ts. One new test pins
the new contract by mocking a slow body-read stream and asserting
durationMs reflects the delay. 470 tests (was 469); bundle +0.72 KB.

Also: bumped batch_list_party_entries to top of the IDEAS.md
"child-list fan-out" ranking — the same 48h data showed 40/62 tool
calls (65 %) were list_party_entries with 1:1 Capsule fan-out, the
canonical N+1 the entry was speculating about. Out of scope for the
patch (new tool = minor bump).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@arapov arapov merged commit 6686622 into master May 22, 2026
1 check passed
@arapov arapov deleted the fix/capsule-request-duration-includes-body branch May 22, 2026 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant