feat(telemetry): drop tokenize_ms — Workers timer is unmeasurable
The fourth smoke run confirmed that bytes_in/out and tokens_in/out work in
production (357-21319 bytes_out, 142-5398 tokens_out across varied payload
sizes). But tokenize_ms remained 0 in every row, even with the Date.now()
fix from 279f761.
Root cause discovered by the agent: Cloudflare Workers freezes BOTH
performance.now() AND Date.now() during synchronous CPU work. Both
timers only advance on network I/O events as a side-channel mitigation
(documented at developers.cloudflare.com/workers/runtime-apis/web-standards/).
Tokenization is pure CPU work, so any sub-request timing of it always
reads 0 in production. This is a structural runtime constraint, not a
bug we can patch.
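
A minimal sketch of why the reading is always 0. The helper names
(measureSync, makeFrozenClock, busyWork) are hypothetical and not from this
repo. The frozen clock only stands in for the Workers behavior described
above, where the clock advances on I/O and not during CPU work:

```typescript
// Hypothetical helper: times a synchronous function with an injectable clock.
type Clock = () => number;

function measureSync<T>(fn: () => T, clock: Clock): { result: T; elapsedMs: number } {
  const start = clock();
  const result = fn();
  const elapsedMs = clock() - start;
  return { result, elapsedMs };
}

// Stand-in for the Workers runtime: the clock value moves only when onIo is called.
function makeFrozenClock(initialMs: number): { now: Clock; onIo: (ms: number) => void } {
  let current = initialMs;
  return {
    now: () => current,
    onIo: (ms: number) => {
      current += ms;
    },
  };
}

// Pure CPU work, standing in for tokenization.
function busyWork(iterations: number): number {
  let acc = 0;
  for (let i = 0; i < iterations; i++) {
    acc = (acc + i * 31) % 1000003;
  }
  return acc;
}

const frozen = makeFrozenClock(1000);
// No I/O happens between the two clock reads, so elapsedMs is 0.
const measured = measureSync(() => busyWork(100000), frozen.now);
```

Passing Date.now as the clock under Node would report a real elapsed time,
which is why unit tests could not surface the problem.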
Workarounds considered and rejected:
- Force artificial I/O between reads (KV.list, fetch) — adds real
latency to telemetry-only paths, grotesque
- Two writeDataPoint calls with start/end timestamps — over-engineered,
doubles write count, complicates queries
- Keep the column as always-0 — actively misleading
Decision: drop tokenize_ms entirely from PayloadShape, the doubles
array, schema doc, and tests. The bench at workers/test/tokenize.test.mjs
already characterized the cost curve (cl100k handles 50 KB in ~1.3 ms
on Node v22). bytes_out and tokens_out are sufficient signal: a future
maintainer can estimate tokenize_ms from the bench curve given the
observed payload sizes.
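
A rough sketch of that estimate, using only the single bench point quoted
above. Linear scaling through the origin and reading "50 KB" as 51200 bytes
are both assumptions; estimateTokenizeMs is a hypothetical name, and the
real bench curve should replace the constant if it is not linear:

```typescript
// Single bench point from the message above: cl100k, ~50 KB in ~1.3 ms on Node v22.
const BENCH_BYTES = 50 * 1024; // assumption: "50 KB" read as 51200 bytes
const BENCH_MS = 1.3;

// Assumption: tokenization cost scales linearly with payload size.
function estimateTokenizeMs(bytesOut: number): number {
  if (!Number.isFinite(bytesOut) || bytesOut < 0) {
    throw new RangeError("bytesOut must be a non-negative finite number");
  }
  return (bytesOut / BENCH_BYTES) * BENCH_MS;
}
```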
Schema before:
doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in,
tokens_out, tokenize_ms] // 7 fields
Schema after:
doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in,
tokens_out] // 6 fields
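
A sketch of the 6-field layout in code. The PayloadShape fields are inferred
from the doubles list above and buildDoubles is a hypothetical helper; the
actual definitions in the repo may differ. The sample values are
illustrative only:

```typescript
// Hypothetical shape; field names mirror the doubles list above.
interface PayloadShape {
  bytes_in: number;
  bytes_out: number;
  tokens_in: number;
  tokens_out: number;
}

// Order matters: each position is a fixed column in the stored row.
function buildDoubles(count: number, durationMs: number, shape: PayloadShape): number[] {
  return [
    count,
    durationMs,
    shape.bytes_in,
    shape.bytes_out,
    shape.tokens_in,
    shape.tokens_out,
  ];
}

// Illustrative values, not real telemetry rows.
const doubles = buildDoubles(1, 42, {
  bytes_in: 120,
  bytes_out: 357,
  tokens_in: 30,
  tokens_out: 142,
});
```

Because the columns are positional, dropping the last field leaves the
indexes of the remaining six unchanged.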
Companion canon update at klappy/klappy.dev is coming in the next commit
on that branch. It drops the tokenize_ms row from the doubles table and
removes the tokenize_ms mention in 'What This Enables'.
Methodology: this is the fourth Workers Runtime != Node behavioral diff
caught by live smoke on this branch. Each was unmeasurable from unit
tests because Node behaves differently:
1. b94aaa6 (mine, broken): Content-Type filter (MCP returns SSE)
2. 1a555df (mine, broken): clone in waitUntil (body already drained)
3. 279f761 (mine, broken): Date.now() in Workers (frozen too)
4. THIS: drop the unmeasurable column entirely
The release-validation-gate canon doc is the only thing that surfaced
each of these — the live preview smoke + telemetry_public SQL caught
what no test setup I could ship would have caught. The Workers-runtime
gap was real and the gate worked.
Tests:
- 7/7 unit tests pass (workers/test/tokenize.test.mjs)
- 6/6 integration tests pass (workers/test/telemetry-integration.test.mjs)
- typecheck clean