
chore(release): 0.24.0 — payload-shape telemetry #135

Closed

klappy wants to merge 12 commits into main from chore/release-0.24.0

Conversation

klappy (Owner) commented Apr 23, 2026

Release: 0.24.0 (minor)

Carries the payload-shape telemetry feature from feat/telemetry-tokenization plus the version bump. Branch is based on feat/telemetry-tokenization HEAD so all 9 commits ride along — when this lands, the feature branch can be closed.

Bumps

| File | Before | After |
| --- | --- | --- |
| package.json | 0.23.1 | 0.24.0 |
| workers/package.json | 0.23.1 | 0.24.0 |
| package-lock.json | 0.23.0 ⚠ | 0.24.0 |
| workers/package-lock.json | 0.23.1 | 0.24.0 |

⚠ Root package-lock.json had drifted one release behind (0.23.0 while workers was at 0.23.1) — back-filled here. Both lockfiles still require manual sync per current tooling; the pre-commit hook only enforces sync between the two package.json files.

What's in this release

The full CHANGELOG entry is on the diff. Headline items:

  • Added: bytes_in, bytes_out, tokens_in, tokens_out telemetry doubles via gpt-tokenizer/encoding/cl100k_base. Module-level lazy singleton, ~432 KB gzipped, ~6× faster than @anthropic-ai/tokenizer per the in-tree bench. All measurement happens in ctx.waitUntil so user-facing latency is unchanged. (A sketch of the measurement follows this list.)
  • Changed: No Content-Type filter on the response body — MCP's Streamable HTTP transport returns text/event-stream, not application/json, and the original filter caused 100% of tool_call responses to record bytes_out=0.
  • Removed: tokenize_ms (formerly double7). Cloudflare Workers freezes both performance.now() and Date.now() between network I/O events as a timing-side-channel mitigation, making sub-request timing of pure-CPU tokenization structurally unmeasurable. The bench at workers/test/tokenize.test.mjs characterized the cost curve; future per-call cost is predictable from observed bytes_out / tokens_out against that curve.
  • Fixed: Root package-lock.json drift back-fill.
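
For concreteness, here is a minimal sketch of what the shape measurement amounts to (simplified; it assumes gpt-tokenizer's per-encoding entry point exports countTokens, and the shipped helper is measurePayloadShape in workers/src/tokenize.ts):

```ts
// Minimal sketch, not the shipped code: UTF-8 byte lengths plus
// cl100k_base token counts for both sides of a request.
import { countTokens } from "gpt-tokenizer/encoding/cl100k_base";

function payloadShape(requestText: string, responseText: string) {
  const utf8 = new TextEncoder();
  return {
    bytes_in: utf8.encode(requestText).length,   // wire bytes, not JS string length
    bytes_out: utf8.encode(responseText).length,
    tokens_in: countTokens(requestText),
    tokens_out: countTokens(responseText),
  };
}
```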

Full schema after this release:

```text
blobs:   [event_type, method, tool_name, consumer_label, consumer_source,
          knowledge_base_url, document_uri, worker_version, cache_tier]    // 9
doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out]  // 6
indexes: [consumer_label]                                                  // 1
```
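
Assuming the standard Analytics Engine writeDataPoint call and the ODDKIT_TELEMETRY binding name the integration tests mock, a data point under this schema would look roughly like this (blob and double values illustrative, loosely echoing the smoke run):

```ts
// Illustrative only: one data point in the 0.24.0 shape.
interface Env {
  ODDKIT_TELEMETRY: AnalyticsEngineDataset; // binding name from the tests
}

function recordExample(env: Env): void {
  env.ODDKIT_TELEMETRY.writeDataPoint({
    blobs: ["tool_call", "POST", "oddkit_time", "agent", "header",
            "", "", "0.24.0", "kv"],       // 9 blobs, order per the schema above
    doubles: [1, 42, 512, 178, 120, 71],   // count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out
    indexes: ["agent"],                    // consumer_label
  });
}
```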

Validation

  • 7/7 unit tests pass (workers/test/tokenize.test.mjs)
  • 6/6 integration tests pass (workers/test/telemetry-integration.test.mjs)
  • Typecheck clean (reports as oddkit-mcp-worker@0.24.0)
  • Live preview smoke PASS — fifth Managed Agent run (sesn_011CaMNujMg9pymcz18JFPp8) confirmed all four shape fields populate with realistic varied values across distinct tools (oddkit_catalog: 21,437 bytes_out / 5,856 tokens_out; oddkit_time: 178 bytes_out / 71 tokens_out). MAX(double7) = 0 confirms tokenize_ms cleanly absent.

Workers-runtime forensics

Four distinct Workers ≠ Node behavioral diffs surfaced and resolved on this branch, each caught by live smoke (none by unit tests). Listed in the CHANGELOG Refs trailer with the corresponding agent session IDs.

Companion PR (canon)

klappy/klappy.dev#134 — telemetry-governance schema update + two new constraints (measure-before-you-object, performed-prudence-anti-pattern). Suggested merge order: that one first (governance lands, telemetry_policy reflects new schema immediately), then this one.

Sequencing options

Either works.


Note

Medium Risk
Adds new telemetry measurement and schema fields (bytes/tokens) to the production Workers MCP handler and introduces a new tokenizer dependency, which could affect runtime performance/memory and Analytics Engine dashboards despite being deferred via waitUntil. Risk is mitigated by defensive try/catch, synchronous response cloning, and new unit/integration tests covering the write path.

Overview
Adds payload-shape telemetry to the Workers MCP server: each request now records bytes_in, bytes_out, tokens_in, and tokens_out as new doubles (double3–double6) using a lazily loaded gpt-tokenizer cl100k encoder (workers/src/tokenize.ts), executed in ctx.waitUntil to avoid impacting response latency.

Updates the telemetry write path to accept the already-read request body string, attach the new payload metrics to every written data point (including batch JSON-RPC), and drops the previously attempted tokenize_ms field; response measurement no longer gates on Content-Type so SSE/tool-call responses are included.

Bumps versions to 0.24.0 (root + workers) and syncs both lockfiles; adds unit + integration tests validating tokenizer behavior and end-to-end telemetry schema/writes.


Claude (drafting for klappy) and others added 12 commits April 23, 2026 19:01
…-tokenizer

Adds payload-shape instrumentation to MCP telemetry. New doubles 3-7
capture wire size and cl100k_base token counts for every request and
response, plus the wall-clock cost of tokenization itself.

Implementation:

- New module workers/src/tokenize.ts wraps gpt-tokenizer/encoding/cl100k_base
  with a lazy-loaded singleton encoder and a safe-failure surface
  (countTokensSafe, measurePayloadShape). Module-level promise caches the
  encoder across requests within a worker isolate; cold path pays parse
  once, all subsequent calls are warm. (A sketch follows this list.)

- Refactors workers/src/telemetry.ts recordTelemetry signature to accept
  a pre-read body string + optional PayloadShape rather than reading the
  request body itself. Schema doc comment expanded to describe doubles
  3-7. Synchronous now (no longer returns a Promise) since the caller's
  measurement work happens in waitUntil.

- Updates workers/src/index.ts call site: clones the response (when
  Content-Type is application/json), reads request and response bodies in
  the waitUntil background task, calls measurePayloadShape, then
  recordTelemetry. Zero user-facing latency added — measurement happens
  after the response is sent. SSE responses skip body measurement.
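
A sketch of the lazy-singleton surface described above (simplified from the real tokenize.ts; the cached dynamic import and the null-on-failure contract are the load-bearing parts):

```ts
// Simplified sketch of tokenize.ts. The import promise is cached at
// module level, so each worker isolate parses the encoder once and
// every later call within that isolate hits the warm path.
let encoderPromise:
  | Promise<typeof import("gpt-tokenizer/encoding/cl100k_base")>
  | null = null;

function getEncoder() {
  encoderPromise ??= import("gpt-tokenizer/encoding/cl100k_base");
  return encoderPromise;
}

export async function countTokensSafe(text: string): Promise<number | null> {
  if (!text) return 0; // trivial short-circuit: the encoder never runs
  try {
    const { countTokens } = await getEncoder();
    return countTokens(text);
  } catch {
    return null; // load or encode failure; caller records 0, request unaffected
  }
}
```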

Tokenizer choice:

- gpt-tokenizer/encoding/cl100k_base over @anthropic-ai/tokenizer.
  Empirical bench (Node v22, same V8 as Workers): cl100k median 0.05-1.3ms
  across 200B-50KB payloads vs 0.30-7.4ms for Anthropic WASM. p95 dramatically
  better (no WASM memory-grow spikes). A bench sketch follows this list.
- Token count diverges ~3-4% from Claude tokenizer on English prose;
  acceptable noise floor for shape analysis (we are not billing).
- Bundle delta measured empirically via esbuild: 432KB gzipped
  (993KB minified). Comfortably within paid-tier Workers limits.
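
The bench itself is in-tree; this is a hedged sketch of its shape, not the file verbatim (payload sizes and run counts illustrative):

```ts
// Hedged sketch of the comparison bench, run under Node.
import { performance } from "node:perf_hooks";
import { countTokens } from "gpt-tokenizer/encoding/cl100k_base";

function medianMs(payload: string, runs = 50): number {
  const times: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    countTokens(payload);
    times.push(performance.now() - t0);
  }
  return times.sort((a, b) => a - b)[Math.floor(runs / 2)];
}

for (const size of [200, 2_000, 8_000, 50_000]) {
  console.log(`${size}B:`, medianMs("x".repeat(size)).toFixed(2), "ms");
}
```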

Failure handling:

- Any tokenizer load or encode failure → countTokensSafe returns null,
  treated as 0 in telemetry. tokenize_ms = 0 alongside non-zero bytes
  signals a measurement skip in the data.
- Telemetry must never break MCP requests — all measurement code wrapped
  in try/catch within the waitUntil block (call-site sketch below).
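
The call-site shape that describes, as a sketch (withTelemetry is a hypothetical wrapper; measurePayloadShape and recordTelemetry are the real names; the Content-Type filter is elided here, and a later commit removes it anyway):

```ts
// Sketch of the call-site invariant: clone synchronously before the
// response is returned, defer every measurement to waitUntil, and
// never let a telemetry error escape into the request path.
function withTelemetry(
  ctx: ExecutionContext,
  env: Env,
  requestText: string,
  response: Response,
): Response {
  const responseClone = response.clone();
  ctx.waitUntil(
    (async () => {
      try {
        const responseText = await responseClone.text();
        const shape = await measurePayloadShape(requestText, responseText);
        recordTelemetry(env, requestText, shape);
      } catch {
        // Telemetry must never break MCP requests.
      }
    })(),
  );
  return response;
}
```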

Tests:

- New workers/test/tokenize.test.mjs (8 cases, all pass): empty input,
  positive integer output, scaling with length, full PayloadShape contract,
  UTF-8 byte length correctness, JSON-RPC payload tokenization, tokenize_ms
  finiteness, empty-response (SSE) skip path.
- Compiles tokenize.ts via tsc into a temp dir, then dynamic-imports;
  exercises the same TypeScript surface that ships in the worker bundle
  (harness sketch below).
- npm run typecheck clean.
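
The compile-then-import harness, sketched (tsc flags illustrative, not the in-tree invocation):

```ts
// Hedged sketch of the harness: compile the shipped TypeScript into a
// temp dir, then dynamic-import the emitted JS so the tests exercise
// the same surface that goes into the worker bundle.
import { execSync } from "node:child_process";
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const outDir = mkdtempSync(join(tmpdir(), "tokenize-test-"));
execSync(
  `npx tsc workers/src/tokenize.ts --outDir ${outDir} --target es2022 --module nodenext`,
);
const { countTokensSafe, measurePayloadShape } = await import(
  join(outDir, "tokenize.js")
);
```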

Methodology note:

- This change exists because three theoretical objections (bundle bloat,
  vodka violation, tokenizer-choice domain opinion) were falsified by a
  five-minute bench. See klappy://canon/constraints/measure-before-you-object
  and klappy://canon/observations/performed-prudence-anti-pattern (drafts
  pending merge into klappy.dev).
Mocks env.ODDKIT_TELEMETRY with a writeDataPoint capture, then exercises
recordTelemetry + measurePayloadShape with realistic JSON-RPC payloads.

Verifies end-to-end that the full PayloadShape lands in doubles 3-7,
that bytes match TextEncoder UTF-8 length, that batch JSON-RPC produces
one point per message, and that malformed input is silently dropped.

7/7 cases pass. Notable: the realistic ~8KB response measured
tokenize_ms=0.948ms — within 14% of the bench prediction (~1.1ms median
for 8KB on Node). The dream-home walkthrough was accurate; real prod
will differ but the order of magnitude is locked.

Compiles tokenize.ts + telemetry.ts via tsc into a temp dir, post-patches
the JSON import to add Node 22's required attribute syntax, then
dynamic-imports. Same code path that ships in the worker bundle.

This is the verification that wrangler dev would have done if workerd
ran in this nested sandbox (it doesn't — workerd dies after declaring
ready, likely a Linux capability issue with the container).
Two assertions that would have failed against the pre-fix code:

1. SSE response now asserts tokenize_ms=0 (was: only checked
   bytes_out/tokens_out, missed the spurious non-zero tokenize_ms that
   the original logic would record on every SSE response).

2. New test 'Bugbot invariant: tokenize_ms is 0 only when encoder did
   not actually run' explicitly covers the both-empty case (must be 0)
   and the request-only case (must be valid finite number).

Both new assertions verify Bugbot's distinction: a 0 from countTokensSafe
on empty input is a trivial short-circuit, not a real tokenization. Only
non-null results on non-empty input prove the encoder ran. The pre-fix
code conflated these and would have polluted the bench-vs-prod A/B
comparison with spurious tokenize_ms readings on SSE traffic.

Real-world tokenize_ms on the realistic 8KB integration test:
1.016ms (bench predicted 1.1ms — within 8%).

8/8 cases passing.
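
Stated as assertions (measurePayloadShape signature assumed; its import omitted):

```ts
// Illustrative restatement of the Bugbot invariant:
// tokenize_ms is 0 only when the encoder did not actually run.
import assert from "node:assert";

const bothEmpty = await measurePayloadShape("", "");
assert.equal(bothEmpty.tokenize_ms, 0); // trivial short-circuit, encoder never ran

const requestOnly = await measurePayloadShape('{"jsonrpc":"2.0","method":"x"}', "");
assert.ok(Number.isFinite(requestOnly.tokenize_ms)); // encoder ran on the request side
```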
… JSON

CRITICAL FIX. A managed-agent smoke test against the preview deployment
caught that doubles 4 (bytes_out), 6 (tokens_out), and 7 (tokenize_ms)
were all zero across every recorded data point. Six telemetry rows
queried, six rows with bytes_out=0.

Root cause: the call site in workers/src/index.ts filtered the response
clone by Content-Type, only cloning when the type included
'application/json'. MCP's Streamable HTTP transport returns
'text/event-stream' (SSE) for tool calls, not JSON. The filter was
silently dropping almost every response, leaving responseClone null and
recording zeros for the entire response side.

This was the same performed-prudence pattern the new canon docs warn
about, applied in micro: I assumed MCP responses would be JSON without
measuring what the SDK actually returns. The smoke test caught it
because canon also prescribes verification before declaring done.

Fix:

1. New helper measureResponseShape(requestText, response) in tokenize.ts.
   Clones the response, reads the body, runs measurePayloadShape. No
   Content-Type filter — read everything. SSE protocol overhead (~10
   bytes per event) is negligible against the actual payload size, and
   oddkit's responses are bounded (no long-lived streams). (Sketch below.)

2. Call site in index.ts simplified to use the helper. Drops the
   filter, drops the separate clone, drops the responseClone variable.
   Cleaner code AND correct behavior.

3. Four new unit tests for measureResponseShape:
   - measures application/json responses
   - measures text/event-stream responses (this would have caught the
     bug pre-merge)
   - leaves the original response body intact (clone correctness)
   - handles already-consumed body without throwing

12/12 unit tests pass, typecheck clean.
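
The helper's shape, sketched (simplified; PayloadShape and measurePayloadShape are the real names from tokenize.ts):

```ts
// Sketch of measureResponseShape: no Content-Type filter, so SSE and
// JSON bodies are both measured; failures degrade to null, never throw.
export async function measureResponseShape(
  requestText: string,
  response: Response,
): Promise<PayloadShape | null> {
  try {
    const responseText = await response.clone().text();
    return await measurePayloadShape(requestText, responseText);
  } catch {
    return null; // e.g. body already consumed; request path unaffected
  }
}
```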

Methodology note: this fix exists because the smoke test (live MCP
calls + telemetry_public SQL) caught what unit tests missed. The
canon-prescribed verification gate worked exactly as designed —
release-validation-gate (E0008.3) at klappy://canon/constraints/release-validation-gate
mandates independent live smoke for load-bearing surface changes
before merge. The agent dispatch is that smoke.
…Workers

Third smoke confirmed bytes_in/out and tokens_in/out now populate
correctly (357-21319 bytes_out, 142-5398 tokens_out across varied
payloads). But double7 (tokenize_ms) is still 0 across every row.

Root cause: Cloudflare Workers' performance.now() is a deterministic
timer — it does NOT advance during synchronous CPU work. The mitigation
prevents timing-side-channel attacks. The timer only ticks on I/O.

Tokenization (countTokensSafe) is pure CPU work. The encoder runs
between two reads of performance.now() with no I/O in between, so both
reads return the same value and tokenize_ms is always 0. Tests passed
in Node because Node's performance.now() is a real high-resolution
timer.

Fix: switch to Date.now(). Always advances, at 1ms resolution. The
bench-vs-prod comparison loses sub-millisecond precision (sub-ms
tokenizations round to 0) but gains a working signal for any payload
above ~5KB where bench timing exceeded 1ms.

Updated the telemetry.ts schema doc comment to document the 1ms
resolution and the Workers-specific reason.

Methodology: this is the third Cloudflare Workers gotcha caught in
prod that unit tests can't catch — Workers Runtime != Node:

  1. b94aaa6 (mine): assumed MCP responses are application/json (they're SSE)
  2. 1a555df (mine): assumed clone() inside waitUntil works (body already drained)
  3. THIS: assumed performance.now() advances in synchronous code (it doesn't)

Each was caught by the live Managed Agent smoke + telemetry_public SQL,
not by typecheck or unit tests. The release-validation-gate is the
only thing standing between this branch and a quietly broken prod
telemetry pipeline.

8 unit tests still pass. Typecheck clean.
Fourth smoke confirmed bytes_in/out and tokens_in/out work in production
(357-21319 bytes_out, 142-5398 tokens_out across varied payload sizes).
But tokenize_ms remained 0 across every row even with the Date.now()
fix from 279f761.

Root cause discovered by the agent: Cloudflare Workers freezes BOTH
performance.now() AND Date.now() during synchronous CPU work. Both
timers only advance on network I/O events as a side-channel mitigation
(documented at developers.cloudflare.com/workers/runtime-apis/web-standards/).
Tokenization is pure CPU work, so any sub-request timing of it always
reads 0 in production. This is a structural runtime constraint, not a
bug we can patch.
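
The constraint, stated as code (illustrative; in Node both deltas are non-zero):

```ts
// Inside a Workers request, with no network I/O between the reads,
// pure-CPU work leaves both clocks frozen:
const p0 = performance.now();
const d0 = Date.now();
countTokens(largeJsonPayload); // pure CPU, no I/O
console.log(performance.now() - p0); // 0 in Workers, >0 in Node
console.log(Date.now() - d0);        // 0 in Workers, >0 in Node
```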

Workarounds considered and rejected:
- Force artificial I/O between reads (KV.list, fetch) — adds real
  latency to telemetry-only paths, grotesque
- Two writeDataPoint calls with start/end timestamps — over-engineered,
  doubles write count, complicates queries
- Keep the column as always-0 — actively misleading

Decision: drop tokenize_ms entirely from PayloadShape, the doubles
array, schema doc, and tests. The bench at workers/test/tokenize.test.mjs
already characterized the cost curve (cl100k handles 50 KB in ~1.3 ms
on Node v22). Bytes_out + tokens_out are sufficient signal — a future
maintainer can predict tokenize_ms from the bench curve given the
observed payload sizes.

Schema before:
  doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in,
            tokens_out, tokenize_ms]  // 7 fields

Schema after:
  doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in,
            tokens_out]  // 6 fields

Companion canon update at klappy/klappy.dev coming in next commit on
that branch — drops tokenize_ms row from the doubles table and removes
the tokenize_ms mention in 'What This Enables'.

Methodology: this is the fourth Workers Runtime != Node behavioral diff
caught by live smoke on this branch. Each was unmeasurable from unit
tests because Node behaves differently:
  1. b94aaa6 (mine, broken): Content-Type filter (MCP returns SSE)
  2. 1a555df (mine, broken): clone in waitUntil (body already drained)
  3. 279f761 (mine, broken): Date.now() in Workers (frozen too)
  4. THIS: drop the unmeasurable column entirely

The release-validation-gate canon doc is the only thing that surfaced
each of these — the live preview smoke + telemetry_public SQL caught
what no test setup I could ship would have caught. The Workers-runtime
gap was real and the gate worked.

Tests:
- 7/7 unit tests pass (workers/test/tokenize.test.mjs)
- 6/6 integration tests pass (workers/test/telemetry-integration.test.mjs)
- typecheck clean
Minor bump for payload-shape telemetry (PR #134).

Bumps:
  package.json              0.23.1 -> 0.24.0
  workers/package.json      0.23.1 -> 0.24.0
  package-lock.json         0.23.0 -> 0.24.0  (root drifted one release behind)
  workers/package-lock.json 0.23.1 -> 0.24.0

CHANGELOG.md gains the [0.24.0] entry above [0.23.1] documenting:
  - Added: bytes_in/out, tokens_in/out telemetry doubles + helpers
  - Changed: drop the Content-Type filter (MCP responses are SSE)
  - Removed: tokenize_ms — Workers freezes both perf.now and Date.now
  - Fixed: root package-lock.json version drift back-fill

The four Workers Runtime != Node behavioral diffs caught by the five
Managed Agent smoke sessions on this branch are listed in the Refs
trailer for forensic record.

Tests: 7/7 unit + 6/6 integration pass on bumped state. Typecheck clean
(reports as oddkit-mcp-worker@0.24.0).

Per workflow: dedicated chore/release-x.y.z PR. Branch is off
feat/telemetry-tokenization HEAD, so it carries the feature commits +
the bump together. After merge, feat/telemetry-tokenization can be
closed (its commits are already in main via this release branch).
cloudflare-workers-and-pages bot commented Apr 23, 2026

Deploying with Cloudflare Workers

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful | oddkit | d023ad6 | Apr 23 2026, 09:30 PM |

@klappy klappy closed this Apr 23, 2026
klappy (Owner, Author) commented Apr 23, 2026

Closing — bump consolidated onto #134 (commit d023ad6 is now on feat/telemetry-tokenization HEAD). One PR per feature, version bump rides along. Sorry for the duplication.

@klappy klappy deleted the chore/release-0.24.0 branch April 23, 2026 21:29
cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.


Comment thread: workers/src/index.ts

```ts
const cacheTier = tracer.indexSource;
// Clone the response synchronously before returning so the body is
// still available to read inside the deferred waitUntil callback.
const responseClone = response.clone();
```

Unprotected response.clone() can break MCP responses

Medium Severity

The response.clone() call sits outside any try/catch, while the ctx.waitUntil callback's catch block (line 991–993) explicitly upholds the invariant "Telemetry must never break MCP requests." If clone() throws (e.g., the SDK returns a response with an already-disturbed or locked body), the exception prevents return response from ever executing, turning a telemetry-only code path into a user-facing 500 error. The old code had no response.clone() at all, so this is a new risk. Moving the clone inside the existing try/catch (or wrapping it in its own) would preserve the stated safety guarantee.
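
A minimal version of the suggested guard, sketched (measureAndRecord is a hypothetical stand-in for the waitUntil body):

```ts
// Illustrative fix: keep clone() inside telemetry's own guard so a
// throwing clone can never become a user-facing 500.
let responseClone: Response | null = null;
try {
  responseClone = response.clone();
} catch {
  // body already disturbed or locked: skip response-side measurement
}
ctx.waitUntil(measureAndRecord(env, requestText, responseClone));
return response;
```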


