
Commit 8153745

Claude (drafting for klappy) committed
feat(telemetry): drop tokenize_ms — Workers timer is unmeasurable
Fourth smoke confirmed bytes_in/out and tokens_in/out work in production (357-21319 bytes_out, 142-5398 tokens_out across varied payload sizes). But tokenize_ms remained 0 across every row, even with the Date.now() fix from 279f761.

Root cause discovered by the agent: Cloudflare Workers freezes BOTH performance.now() AND Date.now() during synchronous CPU work. Both timers only advance on network I/O events, as a side-channel mitigation (documented at developers.cloudflare.com/workers/runtime-apis/web-standards/). Tokenization is pure CPU work, so any sub-request timing of it always reads 0 in production. This is a structural runtime constraint, not a bug we can patch.

Workarounds considered and rejected:

- Force artificial I/O between timer reads (KV.list, fetch): adds real latency to telemetry-only paths, grotesque
- Two writeDataPoint calls with start/end timestamps: over-engineered, doubles the write count, complicates queries
- Keep the column as always-0: actively misleading

Decision: drop tokenize_ms entirely from PayloadShape, the doubles array, the schema doc, and the tests. The bench at workers/test/tokenize.test.mjs already characterized the cost curve (cl100k handles 50 KB in ~1.3 ms on Node v22). bytes_out + tokens_out are sufficient signal: a future maintainer can predict tokenize_ms from the bench curve given the observed payload sizes.

Schema before:
  doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out, tokenize_ms] // 7 fields

Schema after:
  doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out] // 6 fields

Companion canon update at klappy/klappy.dev is coming in the next commit on that branch: it drops the tokenize_ms row from the doubles table and removes the tokenize_ms mention in "What This Enables".

Methodology: this is the fourth Workers Runtime != Node behavioral diff caught by live smoke on this branch. Each was unmeasurable from unit tests because Node behaves differently:

1. b94aaa6 (mine, broken): Content-Type filter (MCP returns SSE)
2. 1a555df (mine, broken): clone in waitUntil (body already drained)
3. 279f761 (mine, broken): Date.now() in Workers (frozen too)
4. THIS: drop the unmeasurable column entirely

The release-validation-gate canon doc is the only thing that surfaced each of these: the live preview smoke plus telemetry_public SQL caught what no test setup I could ship would have caught. The Workers-runtime gap was real and the gate worked.

Tests:
- 7/7 unit tests pass (workers/test/tokenize.test.mjs)
- 6/6 integration tests pass (workers/test/telemetry-integration.test.mjs)
- typecheck clean
1 parent 279f761 commit 8153745

5 files changed

Lines changed: 35 additions & 108 deletions
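
Before the diffs, a minimal sketch of the root cause (a hypothetical Worker, not code from this repo): on Node both deltas below are positive, while in a deployed Worker neither performance.now() nor Date.now() advances across the synchronous loop, so both read 0.

// Hypothetical Worker demonstrating the frozen-timer constraint.
// In Node this logs positive deltas; in production Workers both timers
// are frozen during synchronous CPU work and only advance on I/O events.
export default {
  async fetch(_req: Request): Promise<Response> {
    const t0 = Date.now();
    const p0 = performance.now();

    // Pure CPU work standing in for tokenization: no awaits, no I/O.
    let acc = 0;
    for (let i = 0; i < 5_000_000; i++) acc = (acc + i) % 9973;

    const dateDelta = Date.now() - t0;        // reads 0 in production
    const perfDelta = performance.now() - p0; // reads 0 in production
    return new Response(JSON.stringify({ acc, dateDelta, perfDelta }));
  },
};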


workers/src/index.ts

Lines changed: 4 additions & 3 deletions
@@ -958,9 +958,10 @@ export default {
 
     // Phase 1 telemetry — non-blocking, fire-and-forget (E0008)
     // Phase 1.5: cache_tier from tracer feeds blob9 (E0008.1)
-    // Phase 2: payload shape (bytes_in/out, tokens_in/out, tokenize_ms) feeds
-    // doubles 3–7. All measurement happens inside waitUntil so the response
-    // returns to the caller with zero added latency. Response body is
+    // Phase 2: payload shape (bytes_in/out, tokens_in/out) feeds doubles
+    // 3-6. tokenize_ms was tried and dropped — Workers freezes both
+    // performance.now() and Date.now() during synchronous CPU work, making
+    // sub-request timing of pure-CPU tokenization unmeasurable. Response body is
     // measured universally — MCP's Streamable HTTP transport returns SSE,
     // not JSON, so a Content-Type filter would (and did) drop almost every
     // response. The helper handles clone failures safely.
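
The ordering that comment encodes, sketched under stated assumptions: the handler wiring and the names durationMs and cacheTier are illustrative; only measurePayloadShape, recordTelemetry, and the clone-before-waitUntil lesson from 1a555df come from this branch.

// Sketch of the fire-and-forget shape described in the comment above.
// Clone BEFORE returning: cloning inside waitUntil (1a555df) found the
// body already drained. Reading the clone's text happens in background.
let clone: Response | null = null;
try {
  clone = response.clone();
} catch {
  clone = null; // body already used; telemetry records zeros for the response side
}

ctx.waitUntil(
  (async () => {
    // All measurement happens after the response is on the wire.
    const responseText = clone ? await clone.text().catch(() => "") : "";
    const shape = await measurePayloadShape(requestText, responseText);
    recordTelemetry(request, requestText, env, durationMs, cacheTier, shape);
  })(),
);

return response; // caller sees zero added latency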

workers/src/telemetry.ts

Lines changed: 10 additions & 13 deletions
@@ -39,17 +39,16 @@
  *          0 when tokenization was skipped or failed.
  * double6: tokens_out — cl100k_base token count of the response body. 0 for
  *          streamed responses or tokenizer failure.
- * double7: tokenize_ms — Total wall-clock time spent tokenizing both payloads
- *          in the waitUntil() background task. Distinct from
- *          the response trace — tokenization happens after the
- *          response is sent so it never adds user-facing latency.
- *          A value of 0 alongside non-zero bytes indicates the
- *          tokenizer was skipped (load failure or empty payload).
- *          Resolution is 1ms (Date.now), not sub-ms. Cloudflare
- *          Workers' performance.now() does not advance during
- *          synchronous CPU work, so it cannot measure pure-CPU
- *          tokenization. Sub-ms tokenizations round to 0; the
- *          bench-vs-prod comparison is therefore lower-bounded.
+ *
+ * NOTE: a previous iteration shipped a `double7: tokenize_ms` field intended
+ * to capture the wall-clock cost of tokenization for bench-vs-prod
+ * comparison. It is gone. Cloudflare Workers freezes both
+ * `performance.now()` and `Date.now()` between network I/O events as a
+ * timing-side-channel mitigation, so any timing of pure CPU work always
+ * reads 0 in production. The cost was characterized in the bench (workers/
+ * test/tokenize.test.mjs) and bytes_in/out + tokens_in/out are sufficient
+ * to predict per-call cost from that bench curve.
+ *
  * index1: sampling_key — consumer label (for sampling consistency)
  *
  * See: klappy://canon/constraints/telemetry-governance
@@ -258,7 +257,6 @@ export function recordTelemetry(
   const bytesOut = shape?.bytes_out ?? 0;
   const tokensIn = shape?.tokens_in ?? 0;
   const tokensOut = shape?.tokens_out ?? 0;
-  const tokenizeMs = shape?.tokenize_ms ?? 0;
 
   for (const payload of messages) {
     const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
@@ -296,7 +294,6 @@
         bytesOut, // double4: bytes_out
         tokensIn, // double5: tokens_in
         tokensOut, // double6: tokens_out
-        tokenizeMs, // double7: tokenize_ms
       ],
      indexes: [consumerLabel],
    });
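
Put together, a post-change data point carries the 6-wide doubles array. A sketch for orientation (blob contents other than blob3 tool_name and blob9 cache_tier are elided; this is not the repo's literal call site):

// Illustrative shape of one data point after this commit.
env.ODDKIT_TELEMETRY.writeDataPoint({
  blobs: [/* 9 entries; blob3 = tool_name, blob9 = cache_tier */],
  doubles: [
    1,          // double1: count
    durationMs, // double2: duration_ms
    bytesIn,    // double3: bytes_in
    bytesOut,   // double4: bytes_out
    tokensIn,   // double5: tokens_in
    tokensOut,  // double6: tokens_out
  ],
  indexes: [consumerLabel], // index1: sampling_key
});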

workers/src/tokenize.ts

Lines changed: 14 additions & 26 deletions
@@ -21,7 +21,7 @@
  * Failure mode: if the tokenizer fails to load or throws on a payload,
  * `countTokensSafe` returns null. Telemetry treats null as "not measured"
  * and writes `0` to keep the schema dense; the absence is visible in the
- * tokenize_ms column being 0 alongside non-zero bytes.
+ * tokens columns being 0 alongside non-zero bytes.
  *
  * See: klappy://canon/constraints/telemetry-governance
  */
@@ -67,32 +67,31 @@ export async function countTokensSafe(text: string): Promise<number | null> {
 
 /**
  * Result of measuring a payload pair. All fields default to 0 on failure
- * so the telemetry schema stays dense; the `tokenize_ms` field carries
- * the signal — a value of 0 alongside non-zero bytes indicates the
- * tokenizer was skipped or failed.
+ * so the telemetry schema stays dense; the absence of a real value is
+ * encoded by tokens_in / tokens_out being 0 alongside non-zero bytes
+ * (encoder skipped or failed).
+ *
+ * Note: this struct does NOT carry a tokenize_ms field. Cloudflare Workers
+ * freezes both `performance.now()` and `Date.now()` during synchronous
+ * CPU work as a timing-side-channel mitigation — neither timer advances
+ * unless a network I/O event occurs between reads. Tokenization is pure
+ * CPU work, so any sub-request timing of it would always read 0 in
+ * production. The cost was already characterized in the bench (bench
+ * file at workers/test/tokenize.test.mjs and integration test). We keep
+ * the bytes/tokens shape and trust the bench for the per-payload cost
+ * curve.
  */
 export interface PayloadShape {
   bytes_in: number;
   bytes_out: number;
   tokens_in: number;
   tokens_out: number;
-  tokenize_ms: number;
 }
 
 /**
  * Measure the byte and token shape of a request/response pair. Tokenization
  * is performed once per payload using the lazy-loaded cl100k_base encoder.
  * Bytes are measured via TextEncoder (UTF-8 byte length, the wire size).
- *
- * Timing note: uses `Date.now()` rather than `performance.now()`. Cloudflare
- * Workers' `performance.now()` does not advance during synchronous CPU work
- * (a deterministic-timing mitigation against timing-side-channel attacks —
- * the timer only ticks when the worker performs I/O). Tokenization is pure
- * CPU work, so `performance.now()` returns the same value before and after
- * the encode and `tokenize_ms` always reads 0 in production. `Date.now()`
- * always advances, at 1ms resolution. The bench-vs-prod comparison loses
- * sub-millisecond precision but gains a working signal — payloads that take
- * ≥1ms (8KB and up per the bench) show up as 1ms and above.
 */
 export async function measurePayloadShape(
   requestText: string,
@@ -102,26 +101,15 @@
   const bytes_in = requestText ? encoder.encode(requestText).length : 0;
   const bytes_out = responseText ? encoder.encode(responseText).length : 0;
 
-  const start = Date.now();
   const [tIn, tOut] = await Promise.all([
     countTokensSafe(requestText),
     countTokensSafe(responseText),
   ]);
-  const tokenize_ms = Date.now() - start;
-
-  // A `0` from countTokensSafe on empty text is a trivial short-circuit, not
-  // a real tokenization — only a non-null result on non-empty text proves the
-  // encoder ran. If neither payload was actually tokenized, zero out
-  // tokenize_ms to preserve the documented "skipped/failed" signal.
-  const tokenizerRan =
-    (requestText !== "" && tIn !== null) ||
-    (responseText !== "" && tOut !== null);
 
   return {
     bytes_in,
     bytes_out,
     tokens_in: tIn ?? 0,
     tokens_out: tOut ?? 0,
-    tokenize_ms: tokenizerRan ? tokenize_ms : 0,
   };
 }
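
With the field gone, per-call cost is inferred rather than measured. A sketch of that inference, assuming roughly linear scaling through the single bench point quoted in the commit message (~1.3 ms per 50 KB on Node v22); estimateTokenizeMs is a hypothetical helper, and the real curve in workers/test/tokenize.test.mjs should be preferred.

// Assumes linear scaling through one bench point: ~1.3 ms per 50 KB
// (cl100k_base on Node v22). Hypothetical helper, not a repo function.
const BENCH_MS_PER_BYTE = 1.3 / (50 * 1024);

function estimateTokenizeMs(bytesIn: number, bytesOut: number): number {
  return (bytesIn + bytesOut) * BENCH_MS_PER_BYTE;
}

// The largest smoke payload (21319 bytes_out) works out to ~0.54 ms,
// sub-millisecond even before the frozen-timer constraint, which is
// consistent with every production row reading 0.
estimateTokenizeMs(0, 21319); // ≈ 0.54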

workers/test/telemetry-integration.test.mjs

Lines changed: 6 additions & 59 deletions
@@ -6,14 +6,13 @@
  * recordTelemetry + measurePayloadShape with realistic JSON-RPC payloads.
  *
  * Verifies end-to-end:
- * - The full PayloadShape lands in doubles 3-7
+ * - The full PayloadShape lands in doubles 3-6
  * - bytes_in/out match TextEncoder UTF-8 byte length on the actual payloads
  * - tokens_in/out are positive integers when payloads are non-empty
- * - tokenize_ms is non-negative and finite
  * - Batch JSON-RPC produces one data point per message
  * - SSE simulation (responseText="") records zeros for the response side
  * - Tool-call payloads correctly populate blob3 (tool_name)
- * - The blob array is exactly 9 entries and the doubles array is exactly 7
+ * - The blob array is exactly 9 entries and the doubles array is exactly 6
  *
  * This is the verification that wrangler dev would have done — same code
  * path, same schema, real tokenizer.
@@ -171,7 +170,7 @@ await test("oddkit_time tool call lands a complete telemetry record", async () =
 
   // Schema shape
   assert.equal(point.blobs.length, 9, `blobs should be 9, got ${point.blobs.length}`);
-  assert.equal(point.doubles.length, 7, `doubles should be 7, got ${point.doubles.length}`);
+  assert.equal(point.doubles.length, 6, `doubles should be 6, got ${point.doubles.length}`);
   assert.equal(point.indexes.length, 1, "indexes should be 1");
 
   // Blobs
@@ -190,11 +189,9 @@ await test("oddkit_time tool call lands a complete telemetry record", async () =
   assert.equal(point.doubles[3], shape.bytes_out, "double4 = bytes_out");
   assert.equal(point.doubles[4], shape.tokens_in, "double5 = tokens_in");
   assert.equal(point.doubles[5], shape.tokens_out, "double6 = tokens_out");
-  assert.equal(point.doubles[6], shape.tokenize_ms, "double7 = tokenize_ms");
 
   console.log(` bytes_in=${shape.bytes_in} bytes_out=${shape.bytes_out} ` +
-    `tokens_in=${shape.tokens_in} tokens_out=${shape.tokens_out} ` +
-    `tokenize_ms=${shape.tokenize_ms.toFixed(3)}`);
+    `tokens_in=${shape.tokens_in} tokens_out=${shape.tokens_out}`);
 });
 
 // ─── Test 2: oddkit_search with realistic large response ───────────────────
@@ -227,18 +224,13 @@ await test("oddkit_search with realistic ~8KB response — measurements are sane
   assert.ok(shape.bytes_out > 5000, `bytes_out should be > 5000, got ${shape.bytes_out}`);
   assert.ok(shape.tokens_out > 1000, `tokens_out should be > 1000, got ${shape.tokens_out}`);
 
-  // Tokenization cost should be in the bench-predicted range (1-5ms for ~8KB)
-  assert.ok(shape.tokenize_ms < 100,
-    `tokenize_ms should be < 100ms for ~8KB payload (bench predicted ~1ms), got ${shape.tokenize_ms}`);
-
   console.log(` bytes_out=${shape.bytes_out} (~${(shape.bytes_out/1024).toFixed(1)}KB) ` +
-    `tokens_out=${shape.tokens_out} ` +
-    `tokenize_ms=${shape.tokenize_ms.toFixed(3)} (bench predicted ~1ms for 8KB)`);
+    `tokens_out=${shape.tokens_out}`);
 });
 
 // ─── Test 3: SSE response (empty body) records zeros ───────────────────────
 
-await test("SSE response (empty body) records bytes_out=0, tokens_out=0, tokenize_ms=0", async () => {
+await test("SSE response (empty body) records bytes_out=0 and tokens_out=0", async () => {
   const env = mockEnv();
   const requestBody = JSON.stringify({
     jsonrpc: "2.0",
@@ -256,28 +248,6 @@
   assert.ok(point.doubles[2] > 0, "bytes_in should still be > 0");
 });
 
-// Bugbot's fix (commit c4f5752) — distinguish "encoder ran" from
-// "encoder short-circuited on empty input." If the response is empty (SSE)
-// AND the encoder only ran on the request, that still counts as "ran" and
-// tokenize_ms must reflect the real cost. But if BOTH sides are empty,
-// tokenize_ms must be 0. This case locks both halves of that invariant in.
-await test("Bugbot invariant: tokenize_ms is 0 only when encoder did not actually run", async () => {
-  // Case A: both empty → tokenize_ms must be 0 (no encoder call did meaningful work)
-  const bothEmpty = await measurePayloadShape("", "");
-  assert.equal(bothEmpty.tokenize_ms, 0,
-    `both empty: tokenize_ms must be 0, got ${bothEmpty.tokenize_ms}`);
-
-  // Case B: request only → tokenize_ms can be non-zero (encoder ran on request)
-  const requestOnly = await measurePayloadShape("hello world payload", "");
-  assert.ok(requestOnly.tokenize_ms >= 0, "tokenize_ms must be >= 0");
-  assert.ok(requestOnly.tokens_in > 0, "tokens_in should be > 0 when request has content");
-  // tokenize_ms may be 0 if the call was extremely fast, but it must NOT be
-  // forced to zero just because responseText is empty. Confirming only that
-  // the field is present and finite — the prior bug was a non-zero value
-  // being recorded when nothing ran, not the inverse.
-  assert.ok(Number.isFinite(requestOnly.tokenize_ms), "tokenize_ms must be finite");
-});
-
 // ─── Test 4: Batch JSON-RPC writes one point per message ───────────────────
 
 await test("batch JSON-RPC produces one data point per message", async () => {
@@ -329,28 +299,5 @@ await test("missing env.ODDKIT_TELEMETRY is a graceful no-op", async () => {
   recordTelemetry(mockRequest(), requestBody, env, 5, "memory", shape);
 });
 
-// ─── Test 7: The tokenize_ms warm-vs-cold pattern ──────────────────────────
-
-await test("tokenize_ms cold-call > warm-call (encoder caches across calls)", async () => {
-  const reqA = JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/call",
-    params: { name: "oddkit_time", arguments: {} } });
-  const resA = JSON.stringify({ jsonrpc: "2.0", id: 1, result: { x: 1 } });
-
-  const cold = await measurePayloadShape(reqA, resA);
-  const warm = await measurePayloadShape(reqA, resA);
-  const warmer = await measurePayloadShape(reqA, resA);
-
-  console.log(` cold=${cold.tokenize_ms.toFixed(3)}ms ` +
-    `warm=${warm.tokenize_ms.toFixed(3)}ms ` +
-    `warmer=${warmer.tokenize_ms.toFixed(3)}ms`);
-
-  // The warm calls should be bounded — not asserting strict ordering
-  // because timing jitter can flip them, but the median should be tiny.
-  assert.ok(warm.tokenize_ms < 50,
-    `warm tokenize_ms should be < 50ms, got ${warm.tokenize_ms}`);
-  assert.ok(warmer.tokenize_ms < 50,
-    `warmer tokenize_ms should be < 50ms, got ${warmer.tokenize_ms}`);
-});
-
 console.log(`\n${pass} passed, ${fail} failed`);
 process.exit(fail > 0 ? 1 : 0);

workers/test/tokenize.test.mjs

Lines changed: 1 addition & 7 deletions
@@ -94,7 +94,7 @@ await test("countTokensSafe scales with text length", async () => {
 
 await test("measurePayloadShape returns all required fields as numbers", async () => {
   const s = await measurePayloadShape("request", "response");
-  for (const field of ["bytes_in", "bytes_out", "tokens_in", "tokens_out", "tokenize_ms"]) {
+  for (const field of ["bytes_in", "bytes_out", "tokens_in", "tokens_out"]) {
     assert.ok(field in s, `missing field: ${field}`);
     assert.equal(typeof s[field], "number", `${field} must be number, got ${typeof s[field]}`);
   }
@@ -117,12 +117,6 @@ await test("measurePayloadShape produces positive token counts for non-empty inp
   assert.ok(s.tokens_out > 0, "tokens_out should be > 0");
 });
 
-await test("measurePayloadShape tokenize_ms is non-negative and finite", async () => {
-  const s = await measurePayloadShape("a", "b");
-  assert.ok(s.tokenize_ms >= 0, "tokenize_ms must be >= 0");
-  assert.ok(Number.isFinite(s.tokenize_ms), "tokenize_ms must be finite");
-});
-
 await test("measurePayloadShape handles empty response (SSE skipped)", async () => {
   const s = await measurePayloadShape("hello", "");
   assert.equal(s.bytes_out, 0);
