feat(telemetry): add bytes_in/out, tokens_in/out, tokenize_ms via gpt-tokenizer#134
Adds payload-shape instrumentation to MCP telemetry. New doubles 3–7 capture wire size and cl100k_base token counts for every request and response, plus the wall-clock cost of tokenization itself.

Implementation:
- New module `workers/src/tokenize.ts` wraps `gpt-tokenizer/encoding/cl100k_base` with a lazy-loaded singleton encoder and a safe-failure surface (`countTokensSafe`, `measurePayloadShape`). A module-level promise caches the encoder across requests within a worker isolate; the cold path pays the parse cost once, and all subsequent calls are warm.
- Refactors the `recordTelemetry` signature in `workers/src/telemetry.ts` to accept a pre-read body string plus an optional `PayloadShape` rather than reading the request body itself. The schema doc comment is expanded to describe doubles 3–7. The function is now synchronous (it no longer returns a Promise) since the caller's measurement work happens in `waitUntil`.
- Updates the `workers/src/index.ts` call site: clones the response (when Content-Type is `application/json`), reads the request and response bodies in the `waitUntil` background task, calls `measurePayloadShape`, then `recordTelemetry`. Zero user-facing latency is added; measurement happens after the response is sent. SSE responses skip body measurement.

Tokenizer choice:
- `gpt-tokenizer/encoding/cl100k_base` over `@anthropic-ai/tokenizer`. Empirical bench (Node v22, same V8 as Workers): cl100k median 0.05–1.3 ms across 200 B–50 KB payloads vs 0.30–7.4 ms for the Anthropic WASM tokenizer. p95 is dramatically better (no WASM memory-grow spikes).
- Token counts diverge ~3–4% from the Claude tokenizer on English prose; an acceptable noise floor for shape analysis (we are not billing).
- Bundle delta measured empirically via esbuild: 432 KB gzipped (993 KB minified). Comfortably within paid-tier Workers limits.

Failure handling:
- Any tokenizer load or encode failure makes `countTokensSafe` return null, which telemetry treats as 0. `tokenize_ms = 0` alongside non-zero bytes signals a measurement skip in the data.
- Telemetry must never break MCP requests; all measurement code is wrapped in try/catch within the `waitUntil` block.

Tests:
- New `workers/test/tokenize.test.mjs` (8 cases, all pass): empty input, positive integer output, scaling with length, the full `PayloadShape` contract, UTF-8 byte-length correctness, JSON-RPC payload tokenization, `tokenize_ms` finiteness, and the empty-response (SSE) skip path.
- Compiles `tokenize.ts` via tsc into a temp dir, then dynamic-imports it; this exercises the same TypeScript surface that ships in the worker bundle.
- `npm run typecheck` is clean.

Methodology note:
- This change exists because three theoretical objections (bundle bloat, vodka violation, tokenizer-choice domain opinion) were falsified by a five-minute bench. See klappy://canon/constraints/measure-before-you-object and klappy://canon/observations/performed-prudence-anti-pattern (drafts pending merge into klappy.dev).
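The skip signal described under failure handling (`tokenize_ms = 0` alongside non-zero bytes) can be expressed as a small classifier over a telemetry row. This is an illustrative sketch, not code from the PR; `ShapeRow` and `classifyShape` are hypothetical names mirroring the doubles 3–7 schema:

```typescript
// Hypothetical helper illustrating the documented telemetry semantics.
// Fields mirror doubles 3–7: bytes_in, bytes_out, tokens_in, tokens_out, tokenize_ms.
interface ShapeRow {
  bytes_in: number;
  bytes_out: number;
  tokens_in: number;
  tokens_out: number;
  tokenize_ms: number;
}

function classifyShape(row: ShapeRow): "measured" | "skipped" | "empty" {
  const hasBytes = row.bytes_in > 0 || row.bytes_out > 0;
  if (!hasBytes) return "empty"; // nothing measurable on the wire (e.g. SSE with empty request read)
  if (row.tokenize_ms === 0) return "skipped"; // tokenizer load/encode failed or was skipped
  return "measured";
}

console.log(classifyShape({ bytes_in: 512, bytes_out: 2048, tokens_in: 120, tokens_out: 480, tokenize_ms: 0.4 })); // "measured"
console.log(classifyShape({ bytes_in: 512, bytes_out: 0, tokens_in: 0, tokens_out: 0, tokenize_ms: 0 })); // "skipped"
```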
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | oddkit | d023ad6 | Commit Preview URL Branch Preview URL | Apr 23 2026, 09:30 PM |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: `tokenize_ms` zeroing logic broken for encoder-failure + empty-text
- Replaced the `tIn === null && tOut === null` guard with a check that requires at least one non-empty text to have returned a non-null token count, so the empty-string short-circuit (which returns 0, not null) no longer masks encoder failure and `tokenize_ms` is correctly zeroed.
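The distinction matters because `countTokensSafe` short-circuits empty input to 0 (not null). A stubbed sketch shows how the old guard masks an encoder failure when one payload is empty; `oldGuard`/`newGuard` are hypothetical names for illustration, not the PR's code:

```typescript
// tIn/tOut stand in for countTokensSafe results: null = encoder failure,
// 0 = empty-string short-circuit (no tokenization actually ran).
function oldGuard(tIn: number | null, tOut: number | null): boolean {
  // Old logic: assume the tokenizer ran unless BOTH results are null.
  return !(tIn === null && tOut === null);
}

function newGuard(reqText: string, resText: string, tIn: number | null, tOut: number | null): boolean {
  // New logic: the tokenizer ran only if some non-empty text produced a non-null count.
  return (reqText !== "" && tIn !== null) || (resText !== "" && tOut !== null);
}

// Case that triggered the fix: encoder failed on a non-empty request,
// response was empty (SSE skip), so tOut is the trivial 0.
const tIn = null; // failure
const tOut = 0;   // empty-string short-circuit
console.log(oldGuard(tIn, tOut));                // true: failure masked, tokenize_ms not zeroed
console.log(newGuard("payload", "", tIn, tOut)); // false: tokenize_ms correctly zeroed
```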
Preview (c4f5752ecf)
diff --git a/workers/package-lock.json b/workers/package-lock.json
--- a/workers/package-lock.json
+++ b/workers/package-lock.json
@@ -1,15 +1,16 @@
{
"name": "oddkit-mcp-worker",
- "version": "0.23.0",
+ "version": "0.23.1",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "oddkit-mcp-worker",
- "version": "0.23.0",
+ "version": "0.23.1",
"dependencies": {
"agents": "^0.4.1",
"fflate": "^0.8.2",
+ "gpt-tokenizer": "^3.0.0",
"zod": "^4.3.6"
},
"devDependencies": {
@@ -2149,6 +2150,12 @@
"url": "https://github.com/sponsors/ljharb"
}
},
+ "node_modules/gpt-tokenizer": {
+ "version": "3.4.0",
+ "resolved": "https://registry.npmjs.org/gpt-tokenizer/-/gpt-tokenizer-3.4.0.tgz",
+ "integrity": "sha512-wxFLnhIXTDjYebd9A9pGl3e31ZpSypbpIJSOswbgop5jLte/AsZVDvjlbEuVFlsqZixVKqbcoNmRlFDf6pz/UQ==",
+ "license": "MIT"
+ },
"node_modules/has-symbols": {
"version": "1.1.0",
"resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz",
diff --git a/workers/package.json b/workers/package.json
--- a/workers/package.json
+++ b/workers/package.json
@@ -12,7 +12,8 @@
"dependencies": {
"agents": "^0.4.1",
"fflate": "^0.8.2",
- "zod": "^4.3.6"
+ "zod": "^4.3.6",
+ "gpt-tokenizer": "^3.0.0"
},
"devDependencies": {
"@cloudflare/workers-types": "^4.20250124.0",
diff --git a/workers/src/index.ts b/workers/src/index.ts
--- a/workers/src/index.ts
+++ b/workers/src/index.ts
@@ -958,14 +958,33 @@
// Phase 1 telemetry — non-blocking, fire-and-forget (E0008)
// Phase 1.5: cache_tier from tracer feeds blob9 (E0008.1)
+ // Phase 2: payload shape (bytes_in/out, tokens_in/out, tokenize_ms) feeds
+ // doubles 3–7. All measurement happens inside waitUntil so the response
+ // returns to the caller with zero added latency. SSE responses are
+ // recognized by content-type and skip body measurement (zeros recorded).
if (telemetryClone) {
const durationMs = Date.now() - startTime;
const cacheTier = tracer.indexSource;
+
+ // Clone the response NOW (before it's consumed by the network) so we
+ // can read its body in the background. The original `response` flows
+ // back to the caller untouched.
+ const responseContentType = response.headers.get("content-type") ?? "";
+ const responseClone = responseContentType.includes("application/json")
+ ? response.clone()
+ : null;
+
ctx.waitUntil(
(async () => {
try {
+ const requestText = await telemetryClone.text();
+ const responseText = responseClone ? await responseClone.text() : "";
+
+ const { measurePayloadShape } = await import("./tokenize");
const { recordTelemetry } = await import("./telemetry");
- await recordTelemetry(telemetryClone, env, durationMs, cacheTier);
+
+ const shape = await measurePayloadShape(requestText, responseText);
+ recordTelemetry(request, requestText, env, durationMs, cacheTier, shape);
} catch {
// Telemetry must never break MCP requests
}
diff --git a/workers/src/telemetry.ts b/workers/src/telemetry.ts
--- a/workers/src/telemetry.ts
+++ b/workers/src/telemetry.ts
@@ -28,12 +28,30 @@
* handler's internal compute. Expect a long tail on
* cache-miss requests even for trivial actions like
* oddkit_time.
+ * double3: bytes_in — UTF-8 byte length of the JSON-RPC request body.
+ * 0 when telemetry was unable to read the body.
+ * Tokenizer-agnostic; exact wire size.
+ * double4: bytes_out — UTF-8 byte length of the response body. 0 for
+ * streamed responses (SSE) where the body cannot be
+ * measured without consuming the stream.
+ * double5: tokens_in — cl100k_base token count of the request body.
+ * See `tokenize.ts` for the tokenizer-choice rationale.
+ * 0 when tokenization was skipped or failed.
+ * double6: tokens_out — cl100k_base token count of the response body. 0 for
+ * streamed responses or tokenizer failure.
+ * double7: tokenize_ms — Total wall-clock time spent tokenizing both payloads
+ * in the waitUntil() background task. Distinct from
+ * the response trace — tokenization happens after the
+ * response is sent so it never adds user-facing latency.
+ * A value of 0 alongside non-zero bytes indicates the
+ * tokenizer was skipped (load failure or empty payload).
* index1: sampling_key — consumer label (for sampling consistency)
*
* See: klappy://canon/constraints/telemetry-governance
*/
import type { Env } from "./zip-baseline-fetcher";
+import type { PayloadShape } from "./tokenize";
import pkg from "../package.json";
// Build-time fallback for blob8 (worker_version). env.ODDKIT_VERSION is
@@ -198,55 +216,86 @@
* Record one telemetry data point per JSON-RPC message.
* Non-blocking — uses env.ODDKIT_TELEMETRY.writeDataPoint() which requires
* no await (fire-and-forget via Analytics Engine).
- * Called with a cloned request to avoid consuming the original body.
+ *
+ * Caller responsibilities:
+ * - Pass the raw request body as `requestBody` (string). Already-cloned and
+ * read; this function will parse it as JSON-RPC.
+ * - Pass the original `request` so consumer-label resolution can read URL
+ * params and headers.
+ * - Pass `shape` describing the payload byte and token shape, or null to
+ * write zeros for the shape doubles (e.g. when the response could not be
+ * measured because it was an SSE stream).
*/
-export function recordTelemetry(request: Request, env: Env, durationMs: number, cacheTier?: string): Promise<void> {
- if (!env.ODDKIT_TELEMETRY) return Promise.resolve();
+export function recordTelemetry(
+ request: Request,
+ requestBody: string,
+ env: Env,
+ durationMs: number,
+ cacheTier?: string,
+ shape?: PayloadShape | null,
+): void {
+ if (!env.ODDKIT_TELEMETRY) return;
- // Parse the request body to extract JSON-RPC details
- return request
- .json()
- .then((body: unknown) => {
- // Handle batch requests — process each message
- const messages = Array.isArray(body) ? body : [body];
+ let body: unknown;
+ try {
+ body = JSON.parse(requestBody);
+ } catch {
+ // Malformed JSON-RPC — silently drop, telemetry must never break MCP requests
+ return;
+ }
- for (const payload of messages) {
- const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
- request,
- payload,
- );
- const toolCall = parseToolCall(payload);
+ // Handle batch requests — process each message
+ const messages = Array.isArray(body) ? body : [body];
- const msg =
- typeof payload === "object" && payload !== null
- ? (payload as Record<string, unknown>)
- : {};
- const method = typeof msg.method === "string" ? msg.method : "unknown";
+ // Bytes/tokens are per-request (not per-message); for batches we attribute
+ // the full payload shape to each message rather than fabricating a split.
+ const bytesIn = shape?.bytes_in ?? 0;
+ const bytesOut = shape?.bytes_out ?? 0;
+ const tokensIn = shape?.tokens_in ?? 0;
+ const tokensOut = shape?.tokens_out ?? 0;
+ const tokenizeMs = shape?.tokenize_ms ?? 0;
- const eventType = toolCall ? "tool_call" : "mcp_request";
- const toolName = toolCall?.toolName ?? "";
- const documentUri = toolCall?.documentUri ?? "";
+ for (const payload of messages) {
+ const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
+ request,
+ payload,
+ );
+ const toolCall = parseToolCall(payload);
- env.ODDKIT_TELEMETRY!.writeDataPoint({
- blobs: [
- eventType,
- method,
- toolName,
- consumerLabel,
- consumerSource,
- toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
- documentUri,
- env.ODDKIT_VERSION || BUILD_VERSION,
- cacheTier || "none", // blob9: E0008.1 x-ray cache tier
- ],
- doubles: [1, durationMs],
- indexes: [consumerLabel],
- });
- }
- })
- .catch(() => {
- // Telemetry must never break MCP requests — silently drop parse failures
+ const msg =
+ typeof payload === "object" && payload !== null
+ ? (payload as Record<string, unknown>)
+ : {};
+ const method = typeof msg.method === "string" ? msg.method : "unknown";
+
+ const eventType = toolCall ? "tool_call" : "mcp_request";
+ const toolName = toolCall?.toolName ?? "";
+ const documentUri = toolCall?.documentUri ?? "";
+
+ env.ODDKIT_TELEMETRY!.writeDataPoint({
+ blobs: [
+ eventType,
+ method,
+ toolName,
+ consumerLabel,
+ consumerSource,
+ toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
+ documentUri,
+ env.ODDKIT_VERSION || BUILD_VERSION,
+ cacheTier || "none", // blob9: E0008.1 x-ray cache tier
+ ],
+ doubles: [
+ 1, // double1: count
+ durationMs, // double2: duration_ms
+ bytesIn, // double3: bytes_in
+ bytesOut, // double4: bytes_out
+ tokensIn, // double5: tokens_in
+ tokensOut, // double6: tokens_out
+ tokenizeMs, // double7: tokenize_ms
+ ],
+ indexes: [consumerLabel],
});
+ }
}
// ──────────────────────────────────────────────────────────────────────────────
diff --git a/workers/src/tokenize.ts b/workers/src/tokenize.ts
new file mode 100644
--- /dev/null
+++ b/workers/src/tokenize.ts
@@ -1,0 +1,117 @@
+/**
+ * Tokenizer module for oddkit MCP Worker telemetry (E0008).
+ *
+ * Provides cl100k_base token counts for request and response payloads.
+ * cl100k_base is GPT-4's tokenizer; we use it as a tokenizer-agnostic
+ * proxy for "payload token shape," not as a billing-accurate measure
+ * for any specific consumer model.
+ *
+ * Choice of cl100k_base over @anthropic-ai/tokenizer: the cl100k bundle
+ * benchmarks ~6x faster (median 0.05–1.3ms across 200B–50KB payloads on
+ * Node v22's V8, the same engine as Cloudflare Workers) and has dramatically
+ * better p95 (no WASM memory-grow spikes). Token counts diverge from the
+ * Claude tokenizer by ~3–4% on English prose — acceptable noise floor
+ * for shape analysis. See `klappy://canon/constraints/measure-before-you-object`
+ * for the bench methodology that drove this choice.
+ *
+ * Bundle impact: ~432 KB gzipped via the `gpt-tokenizer/encoding/cl100k_base`
+ * subpath import. Loaded via dynamic import so cold paths that don't
+ * tokenize don't pay the parse cost.
+ *
+ * Failure mode: if the tokenizer fails to load or throws on a payload,
+ * `countTokensSafe` returns null. Telemetry treats null as "not measured"
+ * and writes `0` to keep the schema dense; the absence is visible in the
+ * tokenize_ms column being 0 alongside non-zero bytes.
+ *
+ * See: klappy://canon/constraints/telemetry-governance
+ */
+
+type CountTokensFn = (text: string) => number;
+
+let encoderPromise: Promise<CountTokensFn | null> | null = null;
+
+/**
+ * Lazily import gpt-tokenizer's cl100k_base encoder. Cached across requests
+ * via the module-level promise; the first call within a worker isolate pays
+ * the parse cost, all subsequent calls are warm.
+ */
+function getEncoder(): Promise<CountTokensFn | null> {
+ if (encoderPromise) return encoderPromise;
+
+ encoderPromise = import("gpt-tokenizer/encoding/cl100k_base")
+ .then((mod) => {
+ const fn = (mod as { countTokens?: CountTokensFn }).countTokens;
+ if (typeof fn !== "function") return null;
+ return fn;
+ })
+ .catch(() => null);
+
+ return encoderPromise;
+}
+
+/**
+ * Count cl100k_base tokens in `text`. Returns null on any failure
+ * (load failure, encoder throw, etc). Telemetry must never break MCP
+ * requests — this function never throws.
+ */
+export async function countTokensSafe(text: string): Promise<number | null> {
+ if (!text) return 0;
+ try {
+ const fn = await getEncoder();
+ if (!fn) return null;
+ return fn(text);
+ } catch {
+ return null;
+ }
+}
+
+/**
+ * Result of measuring a payload pair. All fields default to 0 on failure
+ * so the telemetry schema stays dense; the `tokenize_ms` field carries
+ * the signal — a value of 0 alongside non-zero bytes indicates the
+ * tokenizer was skipped or failed.
+ */
+export interface PayloadShape {
+ bytes_in: number;
+ bytes_out: number;
+ tokens_in: number;
+ tokens_out: number;
+ tokenize_ms: number;
+}
+
+/**
+ * Measure the byte and token shape of a request/response pair. Tokenization
+ * is performed once per payload using the lazy-loaded cl100k_base encoder.
+ * Bytes are measured via TextEncoder (UTF-8 byte length, the wire size).
+ */
+export async function measurePayloadShape(
+ requestText: string,
+ responseText: string,
+): Promise<PayloadShape> {
+ const encoder = new TextEncoder();
+ const bytes_in = requestText ? encoder.encode(requestText).length : 0;
+ const bytes_out = responseText ? encoder.encode(responseText).length : 0;
+
+ const start = performance.now();
+ const [tIn, tOut] = await Promise.all([
+ countTokensSafe(requestText),
+ countTokensSafe(responseText),
+ ]);
+ const tokenize_ms = Math.round((performance.now() - start) * 1000) / 1000;
+
+ // A `0` from countTokensSafe on empty text is a trivial short-circuit, not
+ // a real tokenization — only a non-null result on non-empty text proves the
+ // encoder ran. If neither payload was actually tokenized, zero out
+ // tokenize_ms to preserve the documented "skipped/failed" signal.
+ const tokenizerRan =
+ (requestText !== "" && tIn !== null) ||
+ (responseText !== "" && tOut !== null);
+
+ return {
+ bytes_in,
+ bytes_out,
+ tokens_in: tIn ?? 0,
+ tokens_out: tOut ?? 0,
+ tokenize_ms: tokenizerRan ? tokenize_ms : 0,
+ };
+}
diff --git a/workers/test/tokenize.test.mjs b/workers/test/tokenize.test.mjs
new file mode 100644
--- /dev/null
+++ b/workers/test/tokenize.test.mjs
@@ -1,0 +1,134 @@
+#!/usr/bin/env node
+/**
+ * Unit test for workers/src/tokenize.ts.
+ *
+ * Compiles tokenize.ts via tsc into a temp dir, then dynamic-imports the
+ * compiled .js. The compile step exercises the same TypeScript surface
+ * that ships in the worker bundle.
+ */
+
+import assert from "node:assert/strict";
+import { spawnSync } from "node:child_process";
+import { mkdtempSync, writeFileSync, symlinkSync, existsSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const WORKERS_ROOT = join(__dirname, "..");
+const TOKENIZE_TS = join(WORKERS_ROOT, "src", "tokenize.ts");
+
+const tmp = mkdtempSync(join(tmpdir(), "oddkit-tokenize-test-"));
+const tsconfig = {
+ compilerOptions: {
+ target: "ES2022",
+ module: "ES2022",
+ moduleResolution: "bundler",
+ lib: ["ES2022", "DOM"],
+ types: [],
+ strict: false,
+ skipLibCheck: true,
+ resolveJsonModule: true,
+ allowSyntheticDefaultImports: true,
+ esModuleInterop: true,
+ rootDir: join(WORKERS_ROOT, "src"),
+ outDir: tmp,
+ },
+ include: [TOKENIZE_TS],
+};
+const tsconfigPath = join(tmp, "tsconfig.json");
+writeFileSync(tsconfigPath, JSON.stringify(tsconfig, null, 2));
+
+const tmpNodeModules = join(tmp, "node_modules");
+if (!existsSync(tmpNodeModules)) {
+ symlinkSync(join(WORKERS_ROOT, "node_modules"), tmpNodeModules);
+}
+
+const compile = spawnSync("npx", ["--yes", "tsc", "-p", tsconfigPath], {
+ encoding: "utf8",
+});
+if (compile.status !== 0) {
+ console.error("TypeScript compile failed:");
+ console.error(compile.stdout);
+ console.error(compile.stderr);
+ process.exit(1);
+}
+
+const compiledPath = join(tmp, "tokenize.js");
+const { countTokensSafe, measurePayloadShape } = await import(compiledPath);
+
+let pass = 0;
+let fail = 0;
+
+async function test(name, fn) {
+ try {
+ await fn();
+ console.log(` \u2713 ${name}`);
+ pass++;
+ } catch (err) {
+ console.log(` \u2717 ${name}`);
+ console.log(` ${err.message}`);
+ fail++;
+ }
+}
+
+console.log("tokenize.ts unit tests");
+
+await test("countTokensSafe returns 0 for empty string", async () => {
+ const n = await countTokensSafe("");
+ assert.equal(n, 0);
+});
+
+await test("countTokensSafe returns a positive integer for normal text", async () => {
+ const n = await countTokensSafe("hello world this is a test");
+ assert.equal(typeof n, "number");
+ assert.ok(n > 0, `expected > 0, got ${n}`);
+ assert.equal(n, Math.floor(n), "must be an integer");
+});
+
+await test("countTokensSafe scales with text length", async () => {
+ const small = await countTokensSafe("hello world");
+ const big = await countTokensSafe("hello world ".repeat(100));
+ assert.ok(big > small * 50, `big (${big}) should be much larger than small (${small})`);
+});
+
+await test("measurePayloadShape returns all required fields as numbers", async () => {
+ const s = await measurePayloadShape("request", "response");
+ for (const field of ["bytes_in", "bytes_out", "tokens_in", "tokens_out", "tokenize_ms"]) {
+ assert.ok(field in s, `missing field: ${field}`);
+ assert.equal(typeof s[field], "number", `${field} must be number, got ${typeof s[field]}`);
+ }
+});
+
+await test("measurePayloadShape bytes match UTF-8 byte length", async () => {
+ const req = "hello"; // 5 bytes
+ const res = "caf\u00e9"; // 4 chars, 5 UTF-8 bytes (\u00e9 = 2 bytes)
+ const s = await measurePayloadShape(req, res);
+ assert.equal(s.bytes_in, 5, `bytes_in: expected 5, got ${s.bytes_in}`);
+ assert.equal(s.bytes_out, 5, `bytes_out: expected 5, got ${s.bytes_out}`);
+});
+
+await test("measurePayloadShape produces positive token counts for non-empty input", async () => {
+ const s = await measurePayloadShape(
+ JSON.stringify({ jsonrpc: "2.0", method: "tools/call", id: 1 }),
+ JSON.stringify({ jsonrpc: "2.0", id: 1, result: { ok: true } }),
+ );
+ assert.ok(s.tokens_in > 0, "tokens_in should be > 0");
+ assert.ok(s.tokens_out > 0, "tokens_out should be > 0");
+});
+
+await test("measurePayloadShape tokenize_ms is non-negative and finite", async () => {
+ const s = await measurePayloadShape("a", "b");
+ assert.ok(s.tokenize_ms >= 0, "tokenize_ms must be >= 0");
+ assert.ok(Number.isFinite(s.tokenize_ms), "tokenize_ms must be finite");
+});
+
+await test("measurePayloadShape handles empty response (SSE skipped)", async () => {
+ const s = await measurePayloadShape("hello", "");
+ assert.equal(s.bytes_out, 0);
+ assert.equal(s.tokens_out, 0);
+ assert.ok(s.bytes_in > 0);
+});
+
+console.log(`\n${pass} passed, ${fail} failed`);
process.exit(fail > 0 ? 1 : 0);
Mocks `env.ODDKIT_TELEMETRY` with a `writeDataPoint` capture, then exercises `recordTelemetry` + `measurePayloadShape` with realistic JSON-RPC payloads. Verifies end-to-end that:
- the full `PayloadShape` lands in doubles 3–7,
- bytes match `TextEncoder` UTF-8 length,
- batch JSON-RPC produces one data point per message, and
- malformed input is silently dropped.

7/7 cases pass. Notable: the realistic ~8 KB response measured `tokenize_ms = 0.948ms`, within 14% of the bench prediction (~1.1 ms median for 8 KB on Node). The dream-home walkthrough was accurate; real prod will differ, but the order of magnitude is locked.

Compiles `tokenize.ts` + `telemetry.ts` via tsc into a temp dir, post-patches the JSON import to add Node 22's required attribute syntax, then dynamic-imports. Same code path that ships in the worker bundle. This is the verification that `wrangler dev` would have done if workerd ran in this nested sandbox (it doesn't; workerd dies after declaring ready, likely a Linux capability issue with the container).
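The `writeDataPoint` capture described above can be sketched as follows. This is a minimal illustration of the mocking pattern; the `DataPoint` shape and `makeTelemetryMock` helper are assumptions based on the diff, not the actual test file:

```typescript
// Capture every writeDataPoint call so a test can assert on doubles 3–7.
interface DataPoint {
  blobs: string[];
  doubles: number[];
  indexes: string[];
}

function makeTelemetryMock() {
  const points: DataPoint[] = [];
  const env = {
    ODDKIT_TELEMETRY: {
      writeDataPoint(p: DataPoint) {
        points.push(p); // record instead of shipping to Analytics Engine
      },
    },
  };
  return { env, points };
}

// Usage: pass `env` into recordTelemetry, then assert on the captured points.
const { env, points } = makeTelemetryMock();
env.ODDKIT_TELEMETRY.writeDataPoint({
  blobs: ["tool_call"],
  doubles: [1, 12, 256, 1024, 60, 240, 0.9], // count, duration_ms, bytes/tokens in/out, tokenize_ms
  indexes: ["test-consumer"],
});
console.log(points.length);        // 1
console.log(points[0].doubles[2]); // 256 (bytes_in lands in double3)
```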
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Response clone outside try/catch can break requests
- Wrapped the response.clone() call in a try/catch that falls back to null, preserving the invariant that telemetry never breaks MCP requests.
Preview (6b8dac410e)
diff --git a/workers/package-lock.json b/workers/package-lock.json
--- a/workers/package-lock.json
+++ b/workers/package-lock.json
@@ -1,15 +1,16 @@
{
"name": "oddkit-mcp-worker",
- "version": "0.23.0",
+ "version": "0.23.1",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "oddkit-mcp-worker",
- "version": "0.23.0",
+ "version": "0.23.1",
"dependencies": {
"agents": "^0.4.1",
"fflate": "^0.8.2",
+ "gpt-tokenizer": "^3.0.0",
"zod": "^4.3.6"
},
"devDependencies": {
@@ -2149,6 +2150,12 @@
"url": "https://github.com/sponsors/ljharb"
}
},
+ "node_modules/gpt-tokenizer": {
+ "version": "3.4.0",
+ "resolved": "https://registry.npmjs.org/gpt-tokenizer/-/gpt-tokenizer-3.4.0.tgz",
+ "integrity": "sha512-wxFLnhIXTDjYebd9A9pGl3e31ZpSypbpIJSOswbgop5jLte/AsZVDvjlbEuVFlsqZixVKqbcoNmRlFDf6pz/UQ==",
+ "license": "MIT"
+ },
"node_modules/has-symbols": {
"version": "1.1.0",
"resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz",
diff --git a/workers/package.json b/workers/package.json
--- a/workers/package.json
+++ b/workers/package.json
@@ -12,7 +12,8 @@
"dependencies": {
"agents": "^0.4.1",
"fflate": "^0.8.2",
- "zod": "^4.3.6"
+ "zod": "^4.3.6",
+ "gpt-tokenizer": "^3.0.0"
},
"devDependencies": {
"@cloudflare/workers-types": "^4.20250124.0",
diff --git a/workers/src/index.ts b/workers/src/index.ts
--- a/workers/src/index.ts
+++ b/workers/src/index.ts
@@ -958,14 +958,39 @@
// Phase 1 telemetry — non-blocking, fire-and-forget (E0008)
// Phase 1.5: cache_tier from tracer feeds blob9 (E0008.1)
+ // Phase 2: payload shape (bytes_in/out, tokens_in/out, tokenize_ms) feeds
+ // doubles 3–7. All measurement happens inside waitUntil so the response
+ // returns to the caller with zero added latency. SSE responses are
+ // recognized by content-type and skip body measurement (zeros recorded).
if (telemetryClone) {
const durationMs = Date.now() - startTime;
const cacheTier = tracer.indexSource;
+
+ // Clone the response NOW (before it's consumed by the network) so we
+ // can read its body in the background. The original `response` flows
+ // back to the caller untouched.
+ const responseContentType = response.headers.get("content-type") ?? "";
+ let responseClone: Response | null = null;
+ try {
+ responseClone = responseContentType.includes("application/json")
+ ? response.clone()
+ : null;
+ } catch {
+ // Telemetry must never break MCP requests
+ responseClone = null;
+ }
+
ctx.waitUntil(
(async () => {
try {
+ const requestText = await telemetryClone.text();
+ const responseText = responseClone ? await responseClone.text() : "";
+
+ const { measurePayloadShape } = await import("./tokenize");
const { recordTelemetry } = await import("./telemetry");
- await recordTelemetry(telemetryClone, env, durationMs, cacheTier);
+
+ const shape = await measurePayloadShape(requestText, responseText);
+ recordTelemetry(request, requestText, env, durationMs, cacheTier, shape);
} catch {
// Telemetry must never break MCP requests
}
diff --git a/workers/src/telemetry.ts b/workers/src/telemetry.ts
--- a/workers/src/telemetry.ts
+++ b/workers/src/telemetry.ts
@@ -28,12 +28,30 @@
* handler's internal compute. Expect a long tail on
* cache-miss requests even for trivial actions like
* oddkit_time.
+ * double3: bytes_in — UTF-8 byte length of the JSON-RPC request body.
+ * 0 when telemetry was unable to read the body.
+ * Tokenizer-agnostic; exact wire size.
+ * double4: bytes_out — UTF-8 byte length of the response body. 0 for
+ * streamed responses (SSE) where the body cannot be
+ * measured without consuming the stream.
+ * double5: tokens_in — cl100k_base token count of the request body.
+ * See `tokenize.ts` for the tokenizer-choice rationale.
+ * 0 when tokenization was skipped or failed.
+ * double6: tokens_out — cl100k_base token count of the response body. 0 for
+ * streamed responses or tokenizer failure.
+ * double7: tokenize_ms — Total wall-clock time spent tokenizing both payloads
+ * in the waitUntil() background task. Distinct from
+ * the response trace — tokenization happens after the
+ * response is sent so it never adds user-facing latency.
+ * A value of 0 alongside non-zero bytes indicates the
+ * tokenizer was skipped (load failure or empty payload).
* index1: sampling_key — consumer label (for sampling consistency)
*
* See: klappy://canon/constraints/telemetry-governance
*/
import type { Env } from "./zip-baseline-fetcher";
+import type { PayloadShape } from "./tokenize";
import pkg from "../package.json";
// Build-time fallback for blob8 (worker_version). env.ODDKIT_VERSION is
@@ -198,55 +216,86 @@
* Record one telemetry data point per JSON-RPC message.
* Non-blocking — uses env.ODDKIT_TELEMETRY.writeDataPoint() which requires
* no await (fire-and-forget via Analytics Engine).
- * Called with a cloned request to avoid consuming the original body.
+ *
+ * Caller responsibilities:
+ * - Pass the raw request body as `requestBody` (string). Already-cloned and
+ * read; this function will parse it as JSON-RPC.
+ * - Pass the original `request` so consumer-label resolution can read URL
+ * params and headers.
+ * - Pass `shape` describing the payload byte and token shape, or null to
+ * write zeros for the shape doubles (e.g. when the response could not be
+ * measured because it was an SSE stream).
*/
-export function recordTelemetry(request: Request, env: Env, durationMs: number, cacheTier?: string): Promise<void> {
- if (!env.ODDKIT_TELEMETRY) return Promise.resolve();
+export function recordTelemetry(
+ request: Request,
+ requestBody: string,
+ env: Env,
+ durationMs: number,
+ cacheTier?: string,
+ shape?: PayloadShape | null,
+): void {
+ if (!env.ODDKIT_TELEMETRY) return;
- // Parse the request body to extract JSON-RPC details
- return request
- .json()
- .then((body: unknown) => {
- // Handle batch requests — process each message
- const messages = Array.isArray(body) ? body : [body];
+ let body: unknown;
+ try {
+ body = JSON.parse(requestBody);
+ } catch {
+ // Malformed JSON-RPC — silently drop, telemetry must never break MCP requests
+ return;
+ }
- for (const payload of messages) {
- const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
- request,
- payload,
- );
- const toolCall = parseToolCall(payload);
+ // Handle batch requests — process each message
+ const messages = Array.isArray(body) ? body : [body];
- const msg =
- typeof payload === "object" && payload !== null
- ? (payload as Record<string, unknown>)
- : {};
- const method = typeof msg.method === "string" ? msg.method : "unknown";
+ // Bytes/tokens are per-request (not per-message); for batches we attribute
+ // the full payload shape to each message rather than fabricating a split.
+ const bytesIn = shape?.bytes_in ?? 0;
+ const bytesOut = shape?.bytes_out ?? 0;
+ const tokensIn = shape?.tokens_in ?? 0;
+ const tokensOut = shape?.tokens_out ?? 0;
+ const tokenizeMs = shape?.tokenize_ms ?? 0;
- const eventType = toolCall ? "tool_call" : "mcp_request";
- const toolName = toolCall?.toolName ?? "";
- const documentUri = toolCall?.documentUri ?? "";
+ for (const payload of messages) {
+ const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
+ request,
+ payload,
+ );
+ const toolCall = parseToolCall(payload);
- env.ODDKIT_TELEMETRY!.writeDataPoint({
- blobs: [
- eventType,
- method,
- toolName,
- consumerLabel,
- consumerSource,
- toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
- documentUri,
- env.ODDKIT_VERSION || BUILD_VERSION,
- cacheTier || "none", // blob9: E0008.1 x-ray cache tier
- ],
- doubles: [1, durationMs],
- indexes: [consumerLabel],
- });
- }
- })
- .catch(() => {
- // Telemetry must never break MCP requests — silently drop parse failures
+ const msg =
+ typeof payload === "object" && payload !== null
+ ? (payload as Record<string, unknown>)
+ : {};
+ const method = typeof msg.method === "string" ? msg.method : "unknown";
+
+ const eventType = toolCall ? "tool_call" : "mcp_request";
+ const toolName = toolCall?.toolName ?? "";
+ const documentUri = toolCall?.documentUri ?? "";
+
+ env.ODDKIT_TELEMETRY!.writeDataPoint({
+ blobs: [
+ eventType,
+ method,
+ toolName,
+ consumerLabel,
+ consumerSource,
+ toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
+ documentUri,
+ env.ODDKIT_VERSION || BUILD_VERSION,
+ cacheTier || "none", // blob9: E0008.1 x-ray cache tier
+ ],
+ doubles: [
+ 1, // double1: count
+ durationMs, // double2: duration_ms
+ bytesIn, // double3: bytes_in
+ bytesOut, // double4: bytes_out
+ tokensIn, // double5: tokens_in
+ tokensOut, // double6: tokens_out
+ tokenizeMs, // double7: tokenize_ms
+ ],
+ indexes: [consumerLabel],
});
+ }
}
// ──────────────────────────────────────────────────────────────────────────────
diff --git a/workers/src/tokenize.ts b/workers/src/tokenize.ts
new file mode 100644
--- /dev/null
+++ b/workers/src/tokenize.ts
@@ -1,0 +1,117 @@
+/**
+ * Tokenizer module for oddkit MCP Worker telemetry (E0008).
+ *
+ * Provides cl100k_base token counts for request and response payloads.
+ * cl100k_base is GPT-4's tokenizer; we use it as a tokenizer-agnostic
+ * proxy for "payload token shape," not as a billing-accurate measure
+ * for any specific consumer model.
+ *
+ * Choice of cl100k_base over @anthropic-ai/tokenizer: the cl100k bundle
+ * benchmarks ~6x faster (median 0.05–1.3ms across 200B–50KB payloads on
+ * Node v22, the same V8 engine as Cloudflare Workers) and has dramatically
+ * better p95 (no WASM memory-grow spikes). Token counts diverge from the
+ * Claude tokenizer by ~3–4% on English prose — acceptable noise floor
+ * for shape analysis. See `klappy://canon/constraints/measure-before-you-object`
+ * for the bench methodology that drove this choice.
+ *
+ * Bundle impact: ~432 KB gzipped via the `gpt-tokenizer/encoding/cl100k_base`
+ * subpath import. Loaded via dynamic import so cold paths that don't
+ * tokenize don't pay the parse cost.
+ *
+ * Failure mode: if the tokenizer fails to load or throws on a payload,
+ * `countTokensSafe` returns null. Telemetry treats null as "not measured"
+ * and writes `0` to keep the schema dense; the absence is visible in the
+ * tokenize_ms column being 0 alongside non-zero bytes.
+ *
+ * See: klappy://canon/constraints/telemetry-governance
+ */
+
+type CountTokensFn = (text: string) => number;
+
+let encoderPromise: Promise<CountTokensFn | null> | null = null;
+
+/**
+ * Lazily import gpt-tokenizer's cl100k_base encoder. Cached across requests
+ * via the module-level promise; the first call within a worker isolate pays
+ * the parse cost, all subsequent calls are warm.
+ */
+function getEncoder(): Promise<CountTokensFn | null> {
+ if (encoderPromise) return encoderPromise;
+
+ encoderPromise = import("gpt-tokenizer/encoding/cl100k_base")
+ .then((mod) => {
+ const fn = (mod as { countTokens?: CountTokensFn }).countTokens;
+ if (typeof fn !== "function") return null;
+ return fn;
+ })
+ .catch(() => null);
+
+ return encoderPromise;
+}
+
+/**
+ * Count cl100k_base tokens in `text`. Returns null on any failure
+ * (load failure, encoder throw, etc). Telemetry must never break MCP
+ * requests — this function never throws.
+ */
+export async function countTokensSafe(text: string): Promise<number | null> {
+ if (!text) return 0;
+ try {
+ const fn = await getEncoder();
+ if (!fn) return null;
+ return fn(text);
+ } catch {
+ return null;
+ }
+}
+
+/**
+ * Result of measuring a payload pair. All fields default to 0 on failure
+ * so the telemetry schema stays dense; the `tokenize_ms` field carries
+ * the signal — a value of 0 alongside non-zero bytes indicates the
+ * tokenizer was skipped or failed.
+ */
+export interface PayloadShape {
+ bytes_in: number;
+ bytes_out: number;
+ tokens_in: number;
+ tokens_out: number;
+ tokenize_ms: number;
+}
+
+/**
+ * Measure the byte and token shape of a request/response pair. Tokenization
+ * is performed once per payload using the lazy-loaded cl100k_base encoder.
+ * Bytes are measured via TextEncoder (UTF-8 byte length, the wire size).
+ */
+export async function measurePayloadShape(
+ requestText: string,
+ responseText: string,
+): Promise<PayloadShape> {
+ const encoder = new TextEncoder();
+ const bytes_in = requestText ? encoder.encode(requestText).length : 0;
+ const bytes_out = responseText ? encoder.encode(responseText).length : 0;
+
+ const start = performance.now();
+ const [tIn, tOut] = await Promise.all([
+ countTokensSafe(requestText),
+ countTokensSafe(responseText),
+ ]);
+ const tokenize_ms = Math.round((performance.now() - start) * 1000) / 1000;
+
+ // A `0` from countTokensSafe on empty text is a trivial short-circuit, not
+ // a real tokenization — only a non-null result on non-empty text proves the
+ // encoder ran. If neither payload was actually tokenized, zero out
+ // tokenize_ms to preserve the documented "skipped/failed" signal.
+ const tokenizerRan =
+ (requestText !== "" && tIn !== null) ||
+ (responseText !== "" && tOut !== null);
+
+ return {
+ bytes_in,
+ bytes_out,
+ tokens_in: tIn ?? 0,
+ tokens_out: tOut ?? 0,
+ tokenize_ms: tokenizerRan ? tokenize_ms : 0,
+ };
+}
diff --git a/workers/test/telemetry-integration.test.mjs b/workers/test/telemetry-integration.test.mjs
new file mode 100644
--- /dev/null
+++ b/workers/test/telemetry-integration.test.mjs
@@ -1,0 +1,334 @@
+#!/usr/bin/env node
+/**
+ * Integration test for the telemetry write path.
+ *
+ * Mocks env.ODDKIT_TELEMETRY with a writeDataPoint capture, then exercises
+ * recordTelemetry + measurePayloadShape with realistic JSON-RPC payloads.
+ *
+ * Verifies end-to-end:
+ * - The full PayloadShape lands in doubles 3-7
+ * - bytes_in/out match TextEncoder UTF-8 byte length on the actual payloads
+ * - tokens_in/out are positive integers when payloads are non-empty
+ * - tokenize_ms is non-negative and finite
+ * - Batch JSON-RPC produces one data point per message
+ * - SSE simulation (responseText="") records zeros for the response side
+ * - Tool-call payloads correctly populate blob3 (tool_name)
+ * - The blob array is exactly 9 entries and the doubles array is exactly 7
+ *
+ * This is the verification that wrangler dev would have done — same code
+ * path, same schema, real tokenizer.
+ */
+
+import assert from "node:assert/strict";
+import { spawnSync } from "node:child_process";
+import { mkdtempSync, writeFileSync, symlinkSync, existsSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const WORKERS_ROOT = join(__dirname, "..");
+
+// Compile both telemetry.ts and tokenize.ts to a temp dir so we can import them
+const tmp = mkdtempSync(join(tmpdir(), "oddkit-telemetry-int-"));
+const tsconfig = {
+ compilerOptions: {
+ target: "ES2022",
+ module: "ES2022",
+ moduleResolution: "bundler",
+ lib: ["ES2022", "DOM"],
+ types: ["@cloudflare/workers-types"],
+ noEmitOnError: false,
+ strict: false,
+ skipLibCheck: true,
+ resolveJsonModule: true,
+ allowSyntheticDefaultImports: true,
+ esModuleInterop: true,
+ rootDir: join(WORKERS_ROOT, "src"),
+ outDir: join(tmp, "build"),
+ },
+ include: [
+ join(WORKERS_ROOT, "src", "tokenize.ts"),
+ join(WORKERS_ROOT, "src", "telemetry.ts"),
+ ],
+};
+const tsconfigPath = join(tmp, "tsconfig.json");
+writeFileSync(tsconfigPath, JSON.stringify(tsconfig, null, 2));
+
+const tmpNodeModules = join(tmp, "node_modules");
+if (!existsSync(tmpNodeModules)) {
+ symlinkSync(join(WORKERS_ROOT, "node_modules"), tmpNodeModules);
+}
+
+// telemetry.ts imports `../package.json` — symlink that too
+if (!existsSync(join(tmp, "package.json"))) {
+ symlinkSync(join(WORKERS_ROOT, "package.json"), join(tmp, "package.json"));
+}
+
+const compile = spawnSync("npx", ["--yes", "tsc", "-p", tsconfigPath], {
+ encoding: "utf8",
+});
+
+// With noEmitOnError: false, tsc may exit non-zero on type errors elsewhere
+// in the dep graph (zip-baseline-fetcher.ts has some workers-types friction)
+// while still producing the .js files we need. Only bail if the files we
+// actually need weren't emitted.
+const tokenizeJs = join(tmp, "build", "tokenize.js");
+const telemetryJs = join(tmp, "build", "telemetry.js");
+if (!existsSync(tokenizeJs) || !existsSync(telemetryJs)) {
+ console.error("TypeScript compile failed (target files not emitted):");
+ console.error(compile.stdout);
+ console.error(compile.stderr);
+ process.exit(1);
+}
+if (compile.status !== 0 && process.env.DEBUG) {
+ console.error("Note: tsc reported errors but target .js files were emitted:");
+ console.error(compile.stdout);
+}
+
+// Newer Node requires `with { type: "json" }` on JSON imports in ESM.
+// TypeScript doesn't add this — patch it in.
+const { readFileSync } = await import("node:fs");
+let telemetrySrc = readFileSync(telemetryJs, "utf8");
+telemetrySrc = telemetrySrc.replace(
+  /from ["']\.\.\/package\.json["'];/g,
+  'from "../package.json" with { type: "json" };',
+);
+writeFileSync(telemetryJs, telemetrySrc);
+
+const { measurePayloadShape } = await import(tokenizeJs);
+const { recordTelemetry } = await import(telemetryJs);
+
+// ─── Mock env with writeDataPoint capture ──────────────────────────────────
+
+class MockAnalyticsEngine {
+ constructor() {
+ this.writes = [];
+ }
+ writeDataPoint(point) {
+ this.writes.push(point);
+ }
+}
+
+function mockEnv() {
+ return {
+ ODDKIT_TELEMETRY: new MockAnalyticsEngine(),
+ DEFAULT_KNOWLEDGE_BASE_URL: "https://raw.githubusercontent.com/klappy/klappy.dev/main",
+ ODDKIT_VERSION: "0.23.1-test",
+ };
+}
+
+function mockRequest(consumerLabel = "integration-test") {
+ return new Request(`https://oddkit.klappy.dev/mcp?consumer=${consumerLabel}`, {
+ method: "POST",
+ headers: { "Content-Type": "application/json" },
+ });
+}
+
+let pass = 0;
+let fail = 0;
+
+async function test(name, fn) {
+ try {
+ await fn();
+ console.log(` \u2713 ${name}`);
+ pass++;
+ } catch (err) {
+ console.log(` \u2717 ${name}`);
+ console.log(` ${err.message}`);
+ if (err.stack && process.env.DEBUG) console.log(err.stack);
+ fail++;
+ }
+}
+
+console.log("telemetry integration tests (full write path)\n");
+
+// ─── Test 1: oddkit_time tool call ─────────────────────────────────────────
+
+await test("oddkit_time tool call lands a complete telemetry record", async () => {
+ const env = mockEnv();
+ const requestBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 1,
+ method: "tools/call",
+ params: { name: "oddkit_time", arguments: {} },
+ });
+ const responseBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 1,
+ result: {
+ content: [
+ { type: "text", text: "Current UTC time: 2026-04-23T19:30:00.000Z" },
+ ],
+ },
+ });
+
+ const shape = await measurePayloadShape(requestBody, responseBody);
+ recordTelemetry(mockRequest(), requestBody, env, 42, "memory", shape);
+
+ assert.equal(env.ODDKIT_TELEMETRY.writes.length, 1, "should write 1 data point");
+ const point = env.ODDKIT_TELEMETRY.writes[0];
+
+ // Schema shape
+ assert.equal(point.blobs.length, 9, `blobs should be 9, got ${point.blobs.length}`);
+ assert.equal(point.doubles.length, 7, `doubles should be 7, got ${point.doubles.length}`);
+ assert.equal(point.indexes.length, 1, "indexes should be 1");
+
+ // Blobs
+ assert.equal(point.blobs[0], "tool_call", "blob1 = event_type");
+ assert.equal(point.blobs[1], "tools/call", "blob2 = method");
+ assert.equal(point.blobs[2], "oddkit_time", "blob3 = tool_name");
+ assert.equal(point.blobs[3], "integration-test", "blob4 = consumer_label");
+ assert.equal(point.blobs[4], "query-param", "blob5 = consumer_source");
+ assert.equal(point.blobs[7], "0.23.1-test", "blob8 = worker_version");
+ assert.equal(point.blobs[8], "memory", "blob9 = cache_tier");
+
+ // Doubles
+ assert.equal(point.doubles[0], 1, "double1 = count");
+ assert.equal(point.doubles[1], 42, "double2 = duration_ms");
+ assert.equal(point.doubles[2], shape.bytes_in, "double3 = bytes_in");
+ assert.equal(point.doubles[3], shape.bytes_out, "double4 = bytes_out");
+ assert.equal(point.doubles[4], shape.tokens_in, "double5 = tokens_in");
+ assert.equal(point.doubles[5], shape.tokens_out, "double6 = tokens_out");
+ assert.equal(point.doubles[6], shape.tokenize_ms, "double7 = tokenize_ms");
+
+ console.log(` bytes_in=${shape.bytes_in} bytes_out=${shape.bytes_out} ` +
+ `tokens_in=${shape.tokens_in} tokens_out=${shape.tokens_out} ` +
+ `tokenize_ms=${shape.tokenize_ms.toFixed(3)}`);
+});
+
+// ─── Test 2: oddkit_search with realistic large response ───────────────────
+
+await test("oddkit_search with realistic ~8KB response — measurements are sane", async () => {
+ const env = mockEnv();
+ const requestBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 2,
+ method: "tools/call",
+ params: { name: "oddkit", arguments: { action: "search", input: "telemetry tokens payload" } },
+ });
+ const snippet = "Telemetry exists to make decisions informed instead of blind. " +
+ "Not to profile users, not to feed a roadmap. ";
+ const responseBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 2,
+ result: {
+ content: [{ type: "text", text: snippet.repeat(80) }],
+ },
+ });
+
+ const shape = await measurePayloadShape(requestBody, responseBody);
+ recordTelemetry(mockRequest("realistic-test"), requestBody, env, 215, "r2", shape);
+
+ const point = env.ODDKIT_TELEMETRY.writes[0];
+ assert.equal(point.blobs[2], "oddkit", "tool_name = oddkit (router)");
+
+ // Realistic-sized response should be measurable
+ assert.ok(shape.bytes_out > 5000, `bytes_out should be > 5000, got ${shape.bytes_out}`);
+ assert.ok(shape.tokens_out > 1000, `tokens_out should be > 1000, got ${shape.tokens_out}`);
+
+ // Tokenization cost should be in the bench-predicted range (1-5ms for ~8KB)
+ assert.ok(shape.tokenize_ms < 100,
+ `tokenize_ms should be < 100ms for ~8KB payload (bench predicted ~1ms), got ${shape.tokenize_ms}`);
+
+ console.log(` bytes_out=${shape.bytes_out} (~${(shape.bytes_out/1024).toFixed(1)}KB) ` +
+ `tokens_out=${shape.tokens_out} ` +
+ `tokenize_ms=${shape.tokenize_ms.toFixed(3)} (bench predicted ~1ms for 8KB)`);
+});
+
+// ─── Test 3: SSE response (empty body) records zeros ───────────────────────
+
+await test("SSE response (empty body) records bytes_out=0, tokens_out=0", async () => {
+ const env = mockEnv();
+ const requestBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 3,
+ method: "tools/call",
+ params: { name: "oddkit_orient", arguments: { input: "exploring telemetry" } },
+ });
+ // Simulating the call site path where Content-Type was not application/json
+ const shape = await measurePayloadShape(requestBody, "");
+ recordTelemetry(mockRequest(), requestBody, env, 50, "memory", shape);
+
+ const point = env.ODDKIT_TELEMETRY.writes[0];
+ assert.equal(point.doubles[3], 0, "bytes_out should be 0 for empty response");
+ assert.equal(point.doubles[5], 0, "tokens_out should be 0 for empty response");
+ assert.ok(point.doubles[2] > 0, "bytes_in should still be > 0");
+});
+
+// ─── Test 4: Batch JSON-RPC writes one point per message ───────────────────
+
+await test("batch JSON-RPC produces one data point per message", async () => {
+ const env = mockEnv();
+ const batch = [
+ { jsonrpc: "2.0", id: 1, method: "tools/call", params: { name: "oddkit_time", arguments: {} } },
+ { jsonrpc: "2.0", id: 2, method: "tools/call", params: { name: "oddkit_orient", arguments: { input: "x" } } },
+ { jsonrpc: "2.0", id: 3, method: "tools/list" },
+ ];
+ const requestBody = JSON.stringify(batch);
+ const responseBody = JSON.stringify(batch.map(m => ({ jsonrpc: "2.0", id: m.id, result: { ok: true } })));
+
+ const shape = await measurePayloadShape(requestBody, responseBody);
+ recordTelemetry(mockRequest(), requestBody, env, 30, "cache", shape);
+
+ assert.equal(env.ODDKIT_TELEMETRY.writes.length, 3, `should write 3 data points, got ${env.ODDKIT_TELEMETRY.writes.length}`);
+ assert.equal(env.ODDKIT_TELEMETRY.writes[0].blobs[2], "oddkit_time");
+ assert.equal(env.ODDKIT_TELEMETRY.writes[1].blobs[2], "oddkit_orient");
+ assert.equal(env.ODDKIT_TELEMETRY.writes[2].blobs[1], "tools/list");
+ assert.equal(env.ODDKIT_TELEMETRY.writes[2].blobs[2], "", "tools/list has no tool_name");
+
+ // All 3 messages get the same payload-shape attribution (per-request, not per-message)
+ for (const w of env.ODDKIT_TELEMETRY.writes) {
+ assert.equal(w.doubles[2], shape.bytes_in);
+ assert.equal(w.doubles[3], shape.bytes_out);
+ }
+});
+
+// ─── Test 5: Malformed JSON-RPC gets dropped silently ──────────────────────
+
+await test("malformed JSON-RPC is silently dropped (telemetry never throws)", async () => {
+ const env = mockEnv();
+ // Pass garbage as the "body" — recordTelemetry should swallow the parse error
+ const requestBody = "not valid json {{{";
+ const shape = await measurePayloadShape(requestBody, "ok");
+
+ // Should not throw
+ recordTelemetry(mockRequest(), requestBody, env, 10, "none", shape);
+ assert.equal(env.ODDKIT_TELEMETRY.writes.length, 0, "should not write anything for malformed input");
+});
+
+// ─── Test 6: No env.ODDKIT_TELEMETRY → graceful no-op ──────────────────────
+
+await test("missing env.ODDKIT_TELEMETRY is a graceful no-op", async () => {
+ const env = {}; // no ODDKIT_TELEMETRY
+ const requestBody = JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list" });
+ const shape = await measurePayloadShape(requestBody, "{}");
+ // Should not throw
+ recordTelemetry(mockRequest(), requestBody, env, 5, "memory", shape);
+});
+
+// ─── Test 7: The tokenize_ms warm-vs-cold pattern ──────────────────────────
+
+await test("tokenize_ms cold-call > warm-call (encoder caches across calls)", async () => {
+ const reqA = JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/call",
+ params: { name: "oddkit_time", arguments: {} } });
+ const resA = JSON.stringify({ jsonrpc: "2.0", id: 1, result: { x: 1 } });
+
+ const cold = await measurePayloadShape(reqA, resA);
+ const warm = await measurePayloadShape(reqA, resA);
+ const warmer = await measurePayloadShape(reqA, resA);
+
+ console.log(` cold=${cold.tokenize_ms.toFixed(3)}ms ` +
+ `warm=${warm.tokenize_ms.toFixed(3)}ms ` +
+ `warmer=${warmer.tokenize_ms.toFixed(3)}ms`);
+
+ // The warm calls should be bounded — not asserting strict ordering
+ // because timing jitter can flip them, but the median should be tiny.
+ assert.ok(warm.tokenize_ms < 50,
+ `warm tokenize_ms should be < 50ms, got ${warm.tokenize_ms}`);
+ assert.ok(warmer.tokenize_ms < 50,
+ `warmer tokenize_ms should be < 50ms, got ${warmer.tokenize_ms}`);
+});
+
+console.log(`\n${pass} passed, ${fail} failed`);
+process.exit(fail > 0 ? 1 : 0);
diff --git a/workers/test/tokenize.test.mjs b/workers/test/tokenize.test.mjs
new file mode 100644
--- /dev/null
+++ b/workers/test/tokenize.test.mjs
@@ -1,0 +1,134 @@
+#!/usr/bin/env node
+/**
+ * Unit test for workers/src/tokenize.ts.
+ *
+ * Compiles tokenize.ts via tsc into a temp dir, then dynamic-imports the
+ * compiled .js. The compile step exercises the same TypeScript surface
+ * that ships in the worker bundle.
+ */
+
+import assert from "node:assert/strict";
+import { spawnSync } from "node:child_process";
+import { mkdtempSync, writeFileSync, symlinkSync, existsSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const WORKERS_ROOT = join(__dirname, "..");
+const TOKENIZE_TS = join(WORKERS_ROOT, "src", "tokenize.ts");
+
+const tmp = mkdtempSync(join(tmpdir(), "oddkit-tokenize-test-"));
+const tsconfig = {
+ compilerOptions: {
+ target: "ES2022",
+ module: "ES2022",
+ moduleResolution: "bundler",
+ lib: ["ES2022", "DOM"],
+ types: [],
+ strict: false,
+ skipLibCheck: true,
+ resolveJsonModule: true,
+ allowSyntheticDefaultImports: true,
+ esModuleInterop: true,
+ rootDir: join(WORKERS_ROOT, "src"),
+ outDir: tmp,
+ },
+ include: [TOKENIZE_TS],
+};
+const tsconfigPath = join(tmp, "tsconfig.json");
+writeFileSync(tsconfigPath, JSON.stringify(tsconfig, null, 2));
+
+const tmpNodeModules = join(tmp, "node_modules");
+if (!existsSync(tmpNodeModules)) {
+ symlinkSync(join(WORKERS_ROOT, "node_modules"), tmpNodeModules);
+}
+
+const compile = spawnSync("npx", ["--yes", "tsc", "-p", tsconfigPath], {
+ encoding: "utf8",
+});
+if (compile.status !== 0) {
+ console.error("TypeScript compile failed:");
+ console.error(compile.stdout);
+ console.error(compile.stderr);
+ process.exit(1);
+}
+
+const compiledPath = join(tmp, "tokenize.js");
+const { countTokensSafe, measurePayloadShape } = await import(compiledPath);
+
+let pass = 0;
+let fail = 0;
+
+async function test(name, fn) {
+ try {
+ await fn();
+ console.log(` \u2713 ${name}`);
+ pass++;
+ } catch (err) {
+ console.log(` \u2717 ${name}`);
+ console.log(` ${err.message}`);
+ fail++;
+ }
+}
... diff truncated: showing 800 of 861 lines
Two assertions that would have failed against the pre-fix code:

1. SSE response now asserts tokenize_ms=0 (was: only checked bytes_out/tokens_out, missed the spurious non-zero tokenize_ms that the original logic would record on every SSE response).
2. New test 'Bugbot invariant: tokenize_ms is 0 only when encoder did not actually run' explicitly covers the both-empty case (must be 0) and the request-only case (must be a valid finite number).

Both new assertions verify Bugbot's distinction: a 0 from countTokensSafe on empty input is a trivial short-circuit, not a real tokenization. Only non-null results on non-empty input prove the encoder ran. The pre-fix code conflated these and would have polluted the bench-vs-prod A/B comparison with spurious tokenize_ms readings on SSE traffic.

Real-world tokenize_ms on the realistic 8KB integration test: 1.016ms (bench predicted 1.1ms — within 8%). 8/8 cases passing.
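The distinction the new assertions pin down can be stated as a pure predicate, mirroring the `tokenizerRan` check inside `measurePayloadShape` (the token counts below are illustrative values, not real encoder output):

```typescript
// tokenize_ms may only be non-zero when the encoder demonstrably ran.
// A null token count means the encoder failed to load or threw; a 0 on
// empty text is a trivial short-circuit and proves nothing.
function tokenizerRan(
  requestText: string,
  tokensIn: number | null,
  responseText: string,
  tokensOut: number | null,
): boolean {
  return (
    (requestText !== "" && tokensIn !== null) ||
    (responseText !== "" && tokensOut !== null)
  );
}

// SSE case: request tokenized, response empty -> the encoder did run,
// so a measured tokenize_ms is legitimate.
console.log(tokenizerRan('{"id":1}', 3, "", 0)); // true
// Both payloads empty: nothing was tokenized, tokenize_ms must be 0.
console.log(tokenizerRan("", 0, "", 0)); // false
// Encoder load failure on a non-empty request: null count, not measured.
console.log(tokenizerRan('{"id":1}', null, "", 0)); // false
```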
CRITICAL FIX. A managed-agent smoke test against the preview deployment
caught that doubles 4 (bytes_out), 6 (tokens_out), and 7 (tokenize_ms)
were all zero across every recorded data point. Six telemetry rows
queried, six rows with bytes_out=0.
Root cause: the call site in workers/src/index.ts filtered the response
clone by Content-Type, only cloning when the type included
'application/json'. MCP's Streamable HTTP transport returns
'text/event-stream' (SSE) for tool calls, not JSON. The filter was
silently dropping almost every response, leaving responseClone null and
recording zeros for the entire response side.
This was the same performed-prudence pattern the new canon docs warn
about, applied in micro: I assumed MCP responses would be JSON without
measuring what the SDK actually returns. The smoke test caught it
because canon also prescribes verification before declaring done.
Fix:
1. New helper measureResponseShape(requestText, response) in tokenize.ts.
Clones the response, reads the body, runs measurePayloadShape. No
Content-Type filter — read everything. SSE protocol overhead (~10
bytes per event) is negligible against the actual payload size, and
oddkit's responses are bounded (no long-lived streams).
2. Call site in index.ts simplified to use the helper. Drops the
filter, drops the separate clone, drops the responseClone variable.
Cleaner code AND correct behavior.
3. Four new unit tests for measureResponseShape:
- measures application/json responses
- measures text/event-stream responses (this would have caught the
bug pre-merge)
- leaves the original response body intact (clone correctness)
- handles already-consumed body without throwing
12/12 unit tests pass, typecheck clean.
Methodology note: this fix exists because the smoke test (live MCP
calls + telemetry_public SQL) caught what unit tests missed. The
canon-prescribed verification gate worked exactly as designed —
release-validation-gate (E0008.3) at klappy://canon/constraints/release-validation-gate
mandates independent live smoke for load-bearing surface changes
before merge. The agent dispatch is that smoke test.
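A minimal sketch of the helper described in the fix notes above. `measurePayloadShape` is stubbed here with byte counts only so the snippet runs standalone; the real module in `workers/src/tokenize.ts` also produces cl100k_base token counts and a tokenize_ms timing:

```typescript
interface PayloadShape {
  bytes_in: number;
  bytes_out: number;
  tokens_in: number;
  tokens_out: number;
  tokenize_ms: number;
}

// Stub: bytes only. The real implementation also tokenizes both payloads.
async function measurePayloadShape(
  requestText: string,
  responseText: string,
): Promise<PayloadShape> {
  const enc = new TextEncoder();
  return {
    bytes_in: requestText ? enc.encode(requestText).length : 0,
    bytes_out: responseText ? enc.encode(responseText).length : 0,
    tokens_in: 0,
    tokens_out: 0,
    tokenize_ms: 0,
  };
}

// No Content-Type filter: JSON and SSE bodies are both read. Cloning keeps
// the original response body consumable by the worker's actual caller.
async function measureResponseShape(
  requestText: string,
  response: Response,
): Promise<PayloadShape> {
  let responseText = "";
  try {
    responseText = await response.clone().text();
  } catch {
    // Already-consumed or unreadable body: response side records zeros.
  }
  return measurePayloadShape(requestText, responseText);
}

// An SSE response is now measured instead of silently dropped.
const sse = new Response('data: {"ok":true}\n\n', {
  headers: { "Content-Type": "text/event-stream" },
});
const shape = await measureResponseShape('{"id":1}', sse);
console.log(shape.bytes_in, shape.bytes_out);
```

This is the shape the pre-merge tests would have exercised: a `text/event-stream` response yields non-zero bytes_out, which is exactly what the Content-Type filter was silently zeroing.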
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Exported `measureResponseShape` is unused in production code
  - Removed the unused `measureResponseShape` helper from `workers/src/tokenize.ts` and its corresponding tests; the sole call site in `index.ts` already inlines the synchronous clone and body-read logic, and typecheck plus remaining unit tests pass.
Preview (8c91cebf85)
diff --git a/workers/package-lock.json b/workers/package-lock.json
--- a/workers/package-lock.json
+++ b/workers/package-lock.json
@@ -1,15 +1,16 @@
{
"name": "oddkit-mcp-worker",
- "version": "0.23.0",
+ "version": "0.23.1",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "oddkit-mcp-worker",
- "version": "0.23.0",
+ "version": "0.23.1",
"dependencies": {
"agents": "^0.4.1",
"fflate": "^0.8.2",
+ "gpt-tokenizer": "^3.0.0",
"zod": "^4.3.6"
},
"devDependencies": {
@@ -2149,6 +2150,12 @@
"url": "https://github.com/sponsors/ljharb"
}
},
+ "node_modules/gpt-tokenizer": {
+ "version": "3.4.0",
+ "resolved": "https://registry.npmjs.org/gpt-tokenizer/-/gpt-tokenizer-3.4.0.tgz",
+ "integrity": "sha512-wxFLnhIXTDjYebd9A9pGl3e31ZpSypbpIJSOswbgop5jLte/AsZVDvjlbEuVFlsqZixVKqbcoNmRlFDf6pz/UQ==",
+ "license": "MIT"
+ },
"node_modules/has-symbols": {
"version": "1.1.0",
"resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz",
diff --git a/workers/package.json b/workers/package.json
--- a/workers/package.json
+++ b/workers/package.json
@@ -12,7 +12,8 @@
"dependencies": {
"agents": "^0.4.1",
"fflate": "^0.8.2",
- "zod": "^4.3.6"
+ "zod": "^4.3.6",
+ "gpt-tokenizer": "^3.0.0"
},
"devDependencies": {
"@cloudflare/workers-types": "^4.20250124.0",
diff --git a/workers/src/index.ts b/workers/src/index.ts
--- a/workers/src/index.ts
+++ b/workers/src/index.ts
@@ -958,14 +958,35 @@
// Phase 1 telemetry — non-blocking, fire-and-forget (E0008)
// Phase 1.5: cache_tier from tracer feeds blob9 (E0008.1)
+ // Phase 2: payload shape (bytes_in/out, tokens_in/out, tokenize_ms) feeds
+ // doubles 3–7. All measurement happens inside waitUntil so the response
+ // returns to the caller with zero added latency. Response body is
+ // measured universally — MCP's Streamable HTTP transport returns SSE,
+ // not JSON, so a Content-Type filter would (and did) drop almost every
+ // response. The helper handles clone failures safely.
if (telemetryClone) {
const durationMs = Date.now() - startTime;
const cacheTier = tracer.indexSource;
+ // Clone the response synchronously before returning so the body is
+ // still available to read inside the deferred waitUntil callback.
+ const responseClone = response.clone();
+
ctx.waitUntil(
(async () => {
try {
+ const requestText = await telemetryClone.text();
+
+ const { measurePayloadShape } = await import("./tokenize");
const { recordTelemetry } = await import("./telemetry");
- await recordTelemetry(telemetryClone, env, durationMs, cacheTier);
+
+ let responseText = "";
+ try {
+ responseText = await responseClone.text();
+ } catch {
+ // Fall through with empty string; bytes_out / tokens_out will be 0.
+ }
+ const shape = await measurePayloadShape(requestText, responseText);
+ recordTelemetry(request, requestText, env, durationMs, cacheTier, shape);
} catch {
// Telemetry must never break MCP requests
}
diff --git a/workers/src/telemetry.ts b/workers/src/telemetry.ts
--- a/workers/src/telemetry.ts
+++ b/workers/src/telemetry.ts
@@ -28,12 +28,30 @@
* handler's internal compute. Expect a long tail on
* cache-miss requests even for trivial actions like
* oddkit_time.
+ * double3: bytes_in — UTF-8 byte length of the JSON-RPC request body.
+ * 0 when telemetry was unable to read the body.
+ * Tokenizer-agnostic; exact wire size.
+ * double4: bytes_out — UTF-8 byte length of the response body. 0 for
+ * streamed responses (SSE) where the body cannot be
+ * measured without consuming the stream.
+ * double5: tokens_in — cl100k_base token count of the request body.
+ * See `tokenize.ts` for the tokenizer-choice rationale.
+ * 0 when tokenization was skipped or failed.
+ * double6: tokens_out — cl100k_base token count of the response body. 0 for
+ * streamed responses or tokenizer failure.
+ * double7: tokenize_ms — Total wall-clock time spent tokenizing both payloads
+ * in the waitUntil() background task. Distinct from
+ * the response trace — tokenization happens after the
+ * response is sent so it never adds user-facing latency.
+ * A value of 0 alongside non-zero bytes indicates the
+ * tokenizer was skipped (load failure or empty payload).
* index1: sampling_key — consumer label (for sampling consistency)
*
* See: klappy://canon/constraints/telemetry-governance
*/
import type { Env } from "./zip-baseline-fetcher";
+import type { PayloadShape } from "./tokenize";
import pkg from "../package.json";
// Build-time fallback for blob8 (worker_version). env.ODDKIT_VERSION is
@@ -198,55 +216,86 @@
* Record one telemetry data point per JSON-RPC message.
* Non-blocking — uses env.ODDKIT_TELEMETRY.writeDataPoint() which requires
* no await (fire-and-forget via Analytics Engine).
- * Called with a cloned request to avoid consuming the original body.
+ *
+ * Caller responsibilities:
+ * - Pass the raw request body as `requestBody` (string). Already-cloned and
+ * read; this function will parse it as JSON-RPC.
+ * - Pass the original `request` so consumer-label resolution can read URL
+ * params and headers.
+ * - Pass `shape` describing the payload byte and token shape, or null to
+ * write zeros for the shape doubles (e.g. when the response could not be
+ * measured because it was an SSE stream).
*/
-export function recordTelemetry(request: Request, env: Env, durationMs: number, cacheTier?: string): Promise<void> {
- if (!env.ODDKIT_TELEMETRY) return Promise.resolve();
+export function recordTelemetry(
+ request: Request,
+ requestBody: string,
+ env: Env,
+ durationMs: number,
+ cacheTier?: string,
+ shape?: PayloadShape | null,
+): void {
+ if (!env.ODDKIT_TELEMETRY) return;
- // Parse the request body to extract JSON-RPC details
- return request
- .json()
- .then((body: unknown) => {
- // Handle batch requests — process each message
- const messages = Array.isArray(body) ? body : [body];
+ let body: unknown;
+ try {
+ body = JSON.parse(requestBody);
+ } catch {
+ // Malformed JSON-RPC — silently drop, telemetry must never break MCP requests
+ return;
+ }
- for (const payload of messages) {
- const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
- request,
- payload,
- );
- const toolCall = parseToolCall(payload);
+ // Handle batch requests — process each message
+ const messages = Array.isArray(body) ? body : [body];
- const msg =
- typeof payload === "object" && payload !== null
- ? (payload as Record<string, unknown>)
- : {};
- const method = typeof msg.method === "string" ? msg.method : "unknown";
+ // Bytes/tokens are per-request (not per-message); for batches we attribute
+ // the full payload shape to each message rather than fabricating a split.
+ const bytesIn = shape?.bytes_in ?? 0;
+ const bytesOut = shape?.bytes_out ?? 0;
+ const tokensIn = shape?.tokens_in ?? 0;
+ const tokensOut = shape?.tokens_out ?? 0;
+ const tokenizeMs = shape?.tokenize_ms ?? 0;
- const eventType = toolCall ? "tool_call" : "mcp_request";
- const toolName = toolCall?.toolName ?? "";
- const documentUri = toolCall?.documentUri ?? "";
+ for (const payload of messages) {
+ const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
+ request,
+ payload,
+ );
+ const toolCall = parseToolCall(payload);
- env.ODDKIT_TELEMETRY!.writeDataPoint({
- blobs: [
- eventType,
- method,
- toolName,
- consumerLabel,
- consumerSource,
- toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
- documentUri,
- env.ODDKIT_VERSION || BUILD_VERSION,
- cacheTier || "none", // blob9: E0008.1 x-ray cache tier
- ],
- doubles: [1, durationMs],
- indexes: [consumerLabel],
- });
- }
- })
- .catch(() => {
- // Telemetry must never break MCP requests — silently drop parse failures
+ const msg =
+ typeof payload === "object" && payload !== null
+ ? (payload as Record<string, unknown>)
+ : {};
+ const method = typeof msg.method === "string" ? msg.method : "unknown";
+
+ const eventType = toolCall ? "tool_call" : "mcp_request";
+ const toolName = toolCall?.toolName ?? "";
+ const documentUri = toolCall?.documentUri ?? "";
+
+ env.ODDKIT_TELEMETRY!.writeDataPoint({
+ blobs: [
+ eventType,
+ method,
+ toolName,
+ consumerLabel,
+ consumerSource,
+ toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
+ documentUri,
+ env.ODDKIT_VERSION || BUILD_VERSION,
+ cacheTier || "none", // blob9: E0008.1 x-ray cache tier
+ ],
+ doubles: [
+ 1, // double1: count
+ durationMs, // double2: duration_ms
+ bytesIn, // double3: bytes_in
+ bytesOut, // double4: bytes_out
+ tokensIn, // double5: tokens_in
+ tokensOut, // double6: tokens_out
+ tokenizeMs, // double7: tokenize_ms
+ ],
+ indexes: [consumerLabel],
});
+ }
}
// ──────────────────────────────────────────────────────────────────────────────
diff --git a/workers/src/tokenize.ts b/workers/src/tokenize.ts
new file mode 100644
--- /dev/null
+++ b/workers/src/tokenize.ts
@@ -1,0 +1,117 @@
+/**
+ * Tokenizer module for oddkit MCP Worker telemetry (E0008).
+ *
+ * Provides cl100k_base token counts for request and response payloads.
+ * cl100k_base is GPT-4's tokenizer; we use it as a tokenizer-agnostic
+ * proxy for "payload token shape," not as a billing-accurate measure
+ * for any specific consumer model.
+ *
+ * Choice of cl100k_base over @anthropic-ai/tokenizer: the cl100k bundle
+ * benchmarks ~6x faster (median 0.05–1.3ms across 200B–50KB payloads on
+ * Node v22, the same V8 engine as Cloudflare Workers) and has dramatically
+ * better p95 (no WASM memory-grow spikes). Token counts diverge from the
+ * Claude tokenizer by ~3–4% on English prose — acceptable noise floor
+ * for shape analysis. See `klappy://canon/constraints/measure-before-you-object`
+ * for the bench methodology that drove this choice.
+ *
+ * Bundle impact: ~432 KB gzipped via the `gpt-tokenizer/encoding/cl100k_base`
+ * subpath import. Loaded via dynamic import so cold paths that don't
+ * tokenize don't pay the parse cost.
+ *
+ * Failure mode: if the tokenizer fails to load or throws on a payload,
+ * `countTokensSafe` returns null. Telemetry treats null as "not measured"
+ * and writes `0` to keep the schema dense; the absence is visible in the
+ * tokenize_ms column being 0 alongside non-zero bytes.
+ *
+ * See: klappy://canon/constraints/telemetry-governance
+ */
+
+type CountTokensFn = (text: string) => number;
+
+let encoderPromise: Promise<CountTokensFn | null> | null = null;
+
+/**
+ * Lazily import gpt-tokenizer's cl100k_base encoder. Cached across requests
+ * via the module-level promise; the first call within a worker isolate pays
+ * the parse cost, all subsequent calls are warm.
+ */
+function getEncoder(): Promise<CountTokensFn | null> {
+ if (encoderPromise) return encoderPromise;
+
+ encoderPromise = import("gpt-tokenizer/encoding/cl100k_base")
+ .then((mod) => {
+ const fn = (mod as { countTokens?: CountTokensFn }).countTokens;
+ if (typeof fn !== "function") return null;
+ return fn;
+ })
+ .catch(() => null);
+
+ return encoderPromise;
+}
+
+/**
+ * Count cl100k_base tokens in `text`. Returns null on any failure
+ * (load failure, encoder throw, etc). Telemetry must never break MCP
+ * requests — this function never throws.
+ */
+export async function countTokensSafe(text: string): Promise<number | null> {
+ if (!text) return 0;
+ try {
+ const fn = await getEncoder();
+ if (!fn) return null;
+ return fn(text);
+ } catch {
+ return null;
+ }
+}
+
+/**
+ * Result of measuring a payload pair. All fields default to 0 on failure
+ * so the telemetry schema stays dense; the `tokenize_ms` field carries
+ * the signal — a value of 0 alongside non-zero bytes indicates the
+ * tokenizer was skipped or failed.
+ */
+export interface PayloadShape {
+ bytes_in: number;
+ bytes_out: number;
+ tokens_in: number;
+ tokens_out: number;
+ tokenize_ms: number;
+}
+
+/**
+ * Measure the byte and token shape of a request/response pair. Tokenization
+ * is performed once per payload using the lazy-loaded cl100k_base encoder.
+ * Bytes are measured via TextEncoder (UTF-8 byte length, the wire size).
+ */
+export async function measurePayloadShape(
+ requestText: string,
+ responseText: string,
+): Promise<PayloadShape> {
+ const encoder = new TextEncoder();
+ const bytes_in = requestText ? encoder.encode(requestText).length : 0;
+ const bytes_out = responseText ? encoder.encode(responseText).length : 0;
+
+ const start = performance.now();
+ const [tIn, tOut] = await Promise.all([
+ countTokensSafe(requestText),
+ countTokensSafe(responseText),
+ ]);
+ const tokenize_ms = Math.round((performance.now() - start) * 1000) / 1000;
+
+ // A `0` from countTokensSafe on empty text is a trivial short-circuit, not
+ // a real tokenization — only a non-null result on non-empty text proves the
+ // encoder ran. If neither payload was actually tokenized, zero out
+ // tokenize_ms to preserve the documented "skipped/failed" signal.
+ const tokenizerRan =
+ (requestText !== "" && tIn !== null) ||
+ (responseText !== "" && tOut !== null);
+
+ return {
+ bytes_in,
+ bytes_out,
+ tokens_in: tIn ?? 0,
+ tokens_out: tOut ?? 0,
+ tokenize_ms: tokenizerRan ? tokenize_ms : 0,
+ };
+}
diff --git a/workers/test/telemetry-integration.test.mjs b/workers/test/telemetry-integration.test.mjs
new file mode 100644
--- /dev/null
+++ b/workers/test/telemetry-integration.test.mjs
@@ -1,0 +1,356 @@
+#!/usr/bin/env node
+/**
+ * Integration test for the telemetry write path.
+ *
+ * Mocks env.ODDKIT_TELEMETRY with a writeDataPoint capture, then exercises
+ * recordTelemetry + measurePayloadShape with realistic JSON-RPC payloads.
+ *
+ * Verifies end-to-end:
+ * - The full PayloadShape lands in doubles 3-7
+ * - bytes_in/out match TextEncoder UTF-8 byte length on the actual payloads
+ * - tokens_in/out are positive integers when payloads are non-empty
+ * - tokenize_ms is non-negative and finite
+ * - Batch JSON-RPC produces one data point per message
+ * - SSE simulation (responseText="") records zeros for the response side
+ * - Tool-call payloads correctly populate blob3 (tool_name)
+ * - The blob array is exactly 9 entries and the doubles array is exactly 7
+ *
+ * This is the verification that wrangler dev would have done — same code
+ * path, same schema, real tokenizer.
+ */
+
+import assert from "node:assert/strict";
+import { spawnSync } from "node:child_process";
+import { mkdtempSync, writeFileSync, symlinkSync, existsSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const WORKERS_ROOT = join(__dirname, "..");
+
+// Compile both telemetry.ts and tokenize.ts to a temp dir so we can import them
+const tmp = mkdtempSync(join(tmpdir(), "oddkit-telemetry-int-"));
+const tsconfig = {
+ compilerOptions: {
+ target: "ES2022",
+ module: "ES2022",
+ moduleResolution: "bundler",
+ lib: ["ES2022", "DOM"],
+ types: ["@cloudflare/workers-types"],
+ noEmitOnError: false,
+ strict: false,
+ skipLibCheck: true,
+ resolveJsonModule: true,
+ allowSyntheticDefaultImports: true,
+ esModuleInterop: true,
+ rootDir: join(WORKERS_ROOT, "src"),
+ outDir: join(tmp, "build"),
+ },
+ include: [
+ join(WORKERS_ROOT, "src", "tokenize.ts"),
+ join(WORKERS_ROOT, "src", "telemetry.ts"),
+ ],
+};
+const tsconfigPath = join(tmp, "tsconfig.json");
+writeFileSync(tsconfigPath, JSON.stringify(tsconfig, null, 2));
+
+const tmpNodeModules = join(tmp, "node_modules");
+if (!existsSync(tmpNodeModules)) {
+ symlinkSync(join(WORKERS_ROOT, "node_modules"), tmpNodeModules);
+}
+
+// telemetry.ts imports `../package.json` — symlink that too
+if (!existsSync(join(tmp, "package.json"))) {
+ symlinkSync(join(WORKERS_ROOT, "package.json"), join(tmp, "package.json"));
+}
+
+const compile = spawnSync("npx", ["--yes", "tsc", "-p", tsconfigPath], {
+ encoding: "utf8",
+});
+
+// With noEmitOnError: false, tsc may exit non-zero on type errors elsewhere
+// in the dep graph (zip-baseline-fetcher.ts has some workers-types friction)
+// while still producing the .js files we need. Only bail if the files we
+// actually need weren't emitted.
+const tokenizeJs = join(tmp, "build", "tokenize.js");
+const telemetryJs = join(tmp, "build", "telemetry.js");
+if (!existsSync(tokenizeJs) || !existsSync(telemetryJs)) {
+ console.error("TypeScript compile failed (target files not emitted):");
+ console.error(compile.stdout);
+ console.error(compile.stderr);
+ process.exit(1);
+}
+if (compile.status !== 0 && process.env.DEBUG) {
+ console.error("Note: tsc reported errors but target .js files were emitted:");
+ console.error(compile.stdout);
+}
+
+// Newer Node requires `with { type: "json" }` on JSON imports in ESM.
+// TypeScript doesn't add this — patch it in.
+const { readFileSync, writeFileSync: wf } = await import("node:fs");
+let telemetrySrc = readFileSync(telemetryJs, "utf8");
+telemetrySrc = telemetrySrc.replace(
+ /from ["']\.\.\/package\.json["'];/g,
+ 'from "../package.json" with { type: "json" };',
+);
+wf(telemetryJs, telemetrySrc);
+
+const { measurePayloadShape } = await import(tokenizeJs);
+const { recordTelemetry } = await import(telemetryJs);
+
+// ─── Mock env with writeDataPoint capture ──────────────────────────────────
+
+class MockAnalyticsEngine {
+ constructor() {
+ this.writes = [];
+ }
+ writeDataPoint(point) {
+ this.writes.push(point);
+ }
+}
+
+function mockEnv() {
+ return {
+ ODDKIT_TELEMETRY: new MockAnalyticsEngine(),
+ DEFAULT_KNOWLEDGE_BASE_URL: "https://raw.githubusercontent.com/klappy/klappy.dev/main",
+ ODDKIT_VERSION: "0.23.1-test",
+ };
+}
+
+function mockRequest(consumerLabel = "integration-test") {
+ return new Request(`https://oddkit.klappy.dev/mcp?consumer=${consumerLabel}`, {
+ method: "POST",
+ headers: { "Content-Type": "application/json" },
+ });
+}
+
+let pass = 0;
+let fail = 0;
+
+async function test(name, fn) {
+ try {
+ await fn();
+ console.log(` \u2713 ${name}`);
+ pass++;
+ } catch (err) {
+ console.log(` \u2717 ${name}`);
+ console.log(` ${err.message}`);
+ if (err.stack && process.env.DEBUG) console.log(err.stack);
+ fail++;
+ }
+}
+
+console.log("telemetry integration tests (full write path)\n");
+
+// ─── Test 1: oddkit_time tool call ─────────────────────────────────────────
+
+await test("oddkit_time tool call lands a complete telemetry record", async () => {
+ const env = mockEnv();
+ const requestBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 1,
+ method: "tools/call",
+ params: { name: "oddkit_time", arguments: {} },
+ });
+ const responseBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 1,
+ result: {
+ content: [
+ { type: "text", text: "Current UTC time: 2026-04-23T19:30:00.000Z" },
+ ],
+ },
+ });
+
+ const shape = await measurePayloadShape(requestBody, responseBody);
+ recordTelemetry(mockRequest(), requestBody, env, 42, "memory", shape);
+
+ assert.equal(env.ODDKIT_TELEMETRY.writes.length, 1, "should write 1 data point");
+ const point = env.ODDKIT_TELEMETRY.writes[0];
+
+ // Schema shape
+ assert.equal(point.blobs.length, 9, `blobs should be 9, got ${point.blobs.length}`);
+ assert.equal(point.doubles.length, 7, `doubles should be 7, got ${point.doubles.length}`);
+ assert.equal(point.indexes.length, 1, "indexes should be 1");
+
+ // Blobs
+ assert.equal(point.blobs[0], "tool_call", "blob1 = event_type");
+ assert.equal(point.blobs[1], "tools/call", "blob2 = method");
+ assert.equal(point.blobs[2], "oddkit_time", "blob3 = tool_name");
+ assert.equal(point.blobs[3], "integration-test", "blob4 = consumer_label");
+ assert.equal(point.blobs[4], "query-param", "blob5 = consumer_source");
+ assert.equal(point.blobs[7], "0.23.1-test", "blob8 = worker_version");
+ assert.equal(point.blobs[8], "memory", "blob9 = cache_tier");
+
+ // Doubles
+ assert.equal(point.doubles[0], 1, "double1 = count");
+ assert.equal(point.doubles[1], 42, "double2 = duration_ms");
+ assert.equal(point.doubles[2], shape.bytes_in, "double3 = bytes_in");
+ assert.equal(point.doubles[3], shape.bytes_out, "double4 = bytes_out");
+ assert.equal(point.doubles[4], shape.tokens_in, "double5 = tokens_in");
+ assert.equal(point.doubles[5], shape.tokens_out, "double6 = tokens_out");
+ assert.equal(point.doubles[6], shape.tokenize_ms, "double7 = tokenize_ms");
+
+ console.log(` bytes_in=${shape.bytes_in} bytes_out=${shape.bytes_out} ` +
+ `tokens_in=${shape.tokens_in} tokens_out=${shape.tokens_out} ` +
+ `tokenize_ms=${shape.tokenize_ms.toFixed(3)}`);
+});
+
+// ─── Test 2: oddkit_search with realistic large response ───────────────────
+
+await test("oddkit_search with realistic ~8KB response — measurements are sane", async () => {
+ const env = mockEnv();
+ const requestBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 2,
+ method: "tools/call",
+ params: { name: "oddkit", arguments: { action: "search", input: "telemetry tokens payload" } },
+ });
+ const snippet = "Telemetry exists to make decisions informed instead of blind. " +
+ "Not to profile users, not to feed a roadmap. ";
+ const responseBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 2,
+ result: {
+ content: [{ type: "text", text: snippet.repeat(80) }],
+ },
+ });
+
+ const shape = await measurePayloadShape(requestBody, responseBody);
+ recordTelemetry(mockRequest("realistic-test"), requestBody, env, 215, "r2", shape);
+
+ const point = env.ODDKIT_TELEMETRY.writes[0];
+ assert.equal(point.blobs[2], "oddkit", "tool_name = oddkit (router)");
+
+ // Realistic-sized response should be measurable
+ assert.ok(shape.bytes_out > 5000, `bytes_out should be > 5000, got ${shape.bytes_out}`);
+ assert.ok(shape.tokens_out > 1000, `tokens_out should be > 1000, got ${shape.tokens_out}`);
+
+ // Tokenization cost should be in the bench-predicted range (1-5ms for ~8KB)
+ assert.ok(shape.tokenize_ms < 100,
+ `tokenize_ms should be < 100ms for ~8KB payload (bench predicted ~1ms), got ${shape.tokenize_ms}`);
+
+ console.log(` bytes_out=${shape.bytes_out} (~${(shape.bytes_out/1024).toFixed(1)}KB) ` +
+ `tokens_out=${shape.tokens_out} ` +
+ `tokenize_ms=${shape.tokenize_ms.toFixed(3)} (bench predicted ~1ms for 8KB)`);
+});
+
+// ─── Test 3: SSE response (empty body) records zeros ───────────────────────
+
+await test("SSE response (empty body) records bytes_out=0, tokens_out=0, tokenize_ms=0", async () => {
+ const env = mockEnv();
+ const requestBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 3,
+ method: "tools/call",
+ params: { name: "oddkit_orient", arguments: { input: "exploring telemetry" } },
+ });
+ // Simulating the call site path where Content-Type was not application/json
+ const shape = await measurePayloadShape(requestBody, "");
+ recordTelemetry(mockRequest(), requestBody, env, 50, "memory", shape);
+
+ const point = env.ODDKIT_TELEMETRY.writes[0];
+ assert.equal(point.doubles[3], 0, "bytes_out should be 0 for empty response");
+ assert.equal(point.doubles[5], 0, "tokens_out should be 0 for empty response");
+ assert.ok(point.doubles[2] > 0, "bytes_in should still be > 0");
+});
+
+// Bugbot's fix (commit c4f5752) — distinguish "encoder ran" from
+// "encoder short-circuited on empty input." If the response is empty (SSE)
+// AND the encoder only ran on the request, that still counts as "ran" and
+// tokenize_ms must reflect the real cost. But if BOTH sides are empty,
+// tokenize_ms must be 0. This case locks both halves of that invariant in.
+await test("Bugbot invariant: tokenize_ms is 0 only when encoder did not actually run", async () => {
+ // Case A: both empty → tokenize_ms must be 0 (no encoder call did meaningful work)
+ const bothEmpty = await measurePayloadShape("", "");
+ assert.equal(bothEmpty.tokenize_ms, 0,
+ `both empty: tokenize_ms must be 0, got ${bothEmpty.tokenize_ms}`);
+
+ // Case B: request only → tokenize_ms can be non-zero (encoder ran on request)
+ const requestOnly = await measurePayloadShape("hello world payload", "");
+ assert.ok(requestOnly.tokenize_ms >= 0, "tokenize_ms must be >= 0");
+ assert.ok(requestOnly.tokens_in > 0, "tokens_in should be > 0 when request has content");
+ // tokenize_ms may be 0 if the call was extremely fast, but it must NOT be
+ // forced to zero just because responseText is empty. Confirming only that
+ // the field is present and finite — the prior bug was a non-zero value
+ // being recorded when nothing ran, not the inverse.
+ assert.ok(Number.isFinite(requestOnly.tokenize_ms), "tokenize_ms must be finite");
+});
+
+// ─── Test 4: Batch JSON-RPC writes one point per message ───────────────────
+
+await test("batch JSON-RPC produces one data point per message", async () => {
+ const env = mockEnv();
+ const batch = [
+ { jsonrpc: "2.0", id: 1, method: "tools/call", params: { name: "oddkit_time", arguments: {} } },
+ { jsonrpc: "2.0", id: 2, method: "tools/call", params: { name: "oddkit_orient", arguments: { input: "x" } } },
+ { jsonrpc: "2.0", id: 3, method: "tools/list" },
+ ];
+ const requestBody = JSON.stringify(batch);
+ const responseBody = JSON.stringify(batch.map(m => ({ jsonrpc: "2.0", id: m.id, result: { ok: true } })));
+
+ const shape = await measurePayloadShape(requestBody, responseBody);
+ recordTelemetry(mockRequest(), requestBody, env, 30, "cache", shape);
+
+ assert.equal(env.ODDKIT_TELEMETRY.writes.length, 3, `should write 3 data points, got ${env.ODDKIT_TELEMETRY.writes.length}`);
+ assert.equal(env.ODDKIT_TELEMETRY.writes[0].blobs[2], "oddkit_time");
+ assert.equal(env.ODDKIT_TELEMETRY.writes[1].blobs[2], "oddkit_orient");
+ assert.equal(env.ODDKIT_TELEMETRY.writes[2].blobs[1], "tools/list");
+ assert.equal(env.ODDKIT_TELEMETRY.writes[2].blobs[2], "", "tools/list has no tool_name");
+
+ // All 3 messages get the same payload-shape attribution (per-request, not per-message)
+ for (const w of env.ODDKIT_TELEMETRY.writes) {
+ assert.equal(w.doubles[2], shape.bytes_in);
+ assert.equal(w.doubles[3], shape.bytes_out);
+ }
+});
+
+// ─── Test 5: Malformed JSON-RPC gets dropped silently ──────────────────────
+
+await test("malformed JSON-RPC is silently dropped (telemetry never throws)", async () => {
+ const env = mockEnv();
+ // Pass garbage as the "body" — recordTelemetry should swallow the parse error
+ const requestBody = "not valid json {{{";
+ const shape = await measurePayloadShape(requestBody, "ok");
+
+ // Should not throw
+ recordTelemetry(mockRequest(), requestBody, env, 10, "none", shape);
+ assert.equal(env.ODDKIT_TELEMETRY.writes.length, 0, "should not write anything for malformed input");
+});
+
+// ─── Test 6: No env.ODDKIT_TELEMETRY → graceful no-op ──────────────────────
+
+await test("missing env.ODDKIT_TELEMETRY is a graceful no-op", async () => {
+ const env = {}; // no ODDKIT_TELEMETRY
+ const requestBody = JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list" });
+ const shape = await measurePayloadShape(requestBody, "{}");
+ // Should not throw
+ recordTelemetry(mockRequest(), requestBody, env, 5, "memory", shape);
+});
+
+// ─── Test 7: The tokenize_ms warm-vs-cold pattern ──────────────────────────
+
+await test("tokenize_ms cold-call > warm-call (encoder caches across calls)", async () => {
+ const reqA = JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/call",
+ params: { name: "oddkit_time", arguments: {} } });
+ const resA = JSON.stringify({ jsonrpc: "2.0", id: 1, result: { x: 1 } });
+
+ const cold = await measurePayloadShape(reqA, resA);
+ const warm = await measurePayloadShape(reqA, resA);
+ const warmer = await measurePayloadShape(reqA, resA);
+
+ console.log(` cold=${cold.tokenize_ms.toFixed(3)}ms ` +
+ `warm=${warm.tokenize_ms.toFixed(3)}ms ` +
+ `warmer=${warmer.tokenize_ms.toFixed(3)}ms`);
+
+ // The warm calls should be bounded — not asserting strict ordering
+ // because timing jitter can flip them, but the median should be tiny.
+ assert.ok(warm.tokenize_ms < 50,
+ `warm tokenize_ms should be < 50ms, got ${warm.tokenize_ms}`);
+ assert.ok(warmer.tokenize_ms < 50,
+ `warmer tokenize_ms should be < 50ms, got ${warmer.tokenize_ms}`);
+});
+
+console.log(`\n${pass} passed, ${fail} failed`);
+process.exit(fail > 0 ? 1 : 0);
diff --git a/workers/test/tokenize.test.mjs b/workers/test/tokenize.test.mjs
new file mode 100644
--- /dev/null
+++ b/workers/test/tokenize.test.mjs
@@ -1,0 +1,134 @@
+#!/usr/bin/env node
+/**
+ * Unit test for workers/src/tokenize.ts.
+ *
+ * Compiles tokenize.ts via tsc into a temp dir, then dynamic-imports the
+ * compiled .js. The compile step exercises the same TypeScript surface
+ * that ships in the worker bundle.
+ */
+
+import assert from "node:assert/strict";
+import { spawnSync } from "node:child_process";
+import { mkdtempSync, writeFileSync, symlinkSync, existsSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const WORKERS_ROOT = join(__dirname, "..");
+const TOKENIZE_TS = join(WORKERS_ROOT, "src", "tokenize.ts");
+
+const tmp = mkdtempSync(join(tmpdir(), "oddkit-tokenize-test-"));
+const tsconfig = {
+ compilerOptions: {
+ target: "ES2022",
+ module: "ES2022",
+ moduleResolution: "bundler",
+ lib: ["ES2022", "DOM"],
+ types: [],
+ strict: false,
+ skipLibCheck: true,
+ resolveJsonModule: true,
+ allowSyntheticDefaultImports: true,
+ esModuleInterop: true,
+ rootDir: join(WORKERS_ROOT, "src"),
+ outDir: tmp,
+ },
+ include: [TOKENIZE_TS],
+};
+const tsconfigPath = join(tmp, "tsconfig.json");
+writeFileSync(tsconfigPath, JSON.stringify(tsconfig, null, 2));
+
+const tmpNodeModules = join(tmp, "node_modules");
+if (!existsSync(tmpNodeModules)) {
+ symlinkSync(join(WORKERS_ROOT, "node_modules"), tmpNodeModules);
+}
+
+const compile = spawnSync("npx", ["--yes", "tsc", "-p", tsconfigPath], {
+ encoding: "utf8",
+});
+if (compile.status !== 0) {
+ console.error("TypeScript compile failed:");
+ console.error(compile.stdout);
+ console.error(compile.stderr);
+ process.exit(1);
+}
... diff truncated: showing 800 of 879 lines
…Workers

Third smoke confirmed bytes_in/out and tokens_in/out now populate correctly (357-21319 bytes_out, 142-5398 tokens_out across varied payloads). But double7 (tokenize_ms) is still 0 across every row.

Root cause: Cloudflare Workers' performance.now() is a deterministic timer — it does NOT advance during synchronous CPU work. The mitigation prevents timing-side-channel attacks; the timer only ticks on I/O. Tokenization (countTokensSafe) is pure CPU work: the encoder runs between two reads of performance.now() with no I/O in between, so both reads return the same value and tokenize_ms is always 0. Tests passed in Node because Node's performance.now() is a real high-resolution timer.

Fix: switch to Date.now(), which always advances, at 1ms resolution. The bench-vs-prod comparison loses sub-millisecond precision (sub-ms tokenizations round to 0) but gains a working signal for any payload above ~5KB, where bench timing exceeded 1ms. Updated the telemetry.ts schema doc comment to document the 1ms resolution and the Workers-specific reason.

Methodology: this is the third Cloudflare Workers gotcha caught in prod that unit tests can't catch — Workers Runtime != Node:

1. b94aaa6 (mine): assumed MCP responses are application/json (they're SSE)
2. 1a555df (mine): assumed clone() inside waitUntil works (body already drained)
3. THIS: assumed performance.now() advances in synchronous code (it doesn't)

Each was caught by the live Managed Agent smoke + telemetry_public SQL, not by typecheck or unit tests. The release-validation-gate is the only thing standing between this branch and a quietly broken prod telemetry pipeline.

8 unit tests still pass. Typecheck clean.
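The frozen-timer failure above can be shown in miniature. This is an illustrative sketch of the Date.now()-based timing pattern the fix adopts — the helper name and the word-split "tokenizer" are stand-ins, not code from this PR. In Node, Date.now() is a real clock, which is exactly why the unit tests could not surface the Workers behavior:

```typescript
// Illustrative only — timeTokenize and the dummy tokenizer are assumptions,
// not the shipped measurePayloadShape implementation.
function timeTokenize(
  tokenize: (text: string) => number,
  text: string,
): { tokens: number; tokenizeMs: number } {
  const start = Date.now(); // 1ms resolution; in Workers the clock only advances on I/O
  const tokens = tokenize(text);
  // Sub-millisecond work rounds to 0 even in Node; in a Worker, ANY amount of
  // synchronous CPU work between these two reads reports 0.
  const tokenizeMs = Date.now() - start;
  return { tokens, tokenizeMs };
}

// In Node this yields a real (if coarse) duration; the same code deployed to
// a Worker would report 0 for pure-CPU tokenization.
const shape = timeTokenize((t) => t.split(/\s+/).filter(Boolean).length, "a b c");
```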
Cloudflare Workers' performance.now() does not advance during synchronous CPU work (deterministic-timing mitigation against side-channel attacks). Tokenization is pure CPU work, so it cannot be measured at sub-millisecond resolution in Workers. The implementation uses Date.now(), which always advances at 1ms granularity, so sub-millisecond tokenizations round to 0 in production.

The bench-vs-prod comparison is therefore lower-bounded at 1ms: payloads where the bench predicted >=1ms (8KB and up per the original bench) will show real values; smaller payloads will show 0 even when tokenization ran successfully (which can be confirmed by tokens_in/tokens_out being non-zero).

This caveat was caught by live smoke against the preview deployment after three earlier fixes addressed the bytes_out / tokens_out path. The release-validation-gate (klappy://canon/constraints/release-validation-gate) caught what unit tests cannot: Workers Runtime != Node behavioral diffs.

Companion code commit: klappy/oddkit#134279f761.
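Given that 1ms floor, downstream analysis has to treat tokenize_ms = 0 as ambiguous, using tokens_in/tokens_out as the tiebreaker described above. A hedged sketch of that disambiguation rule — the row type and function are illustration-only assumptions, not part of the telemetry code (and, as the Bugbot resolution below notes, the column was ultimately dropped):

```typescript
// Hypothetical row shape for illustration; field names mirror doubles 3-7.
interface ShapeRow {
  bytes_in: number;
  bytes_out: number;
  tokens_in: number;
  tokens_out: number;
  tokenize_ms: number;
}

// tokenize_ms > 0                     → timed for real (>= 1ms of CPU work)
// tokenize_ms = 0, tokens present     → encoder ran but finished inside one clock tick
// tokenize_ms = 0, no tokens at all   → tokenizer skipped/failed, or empty payloads
function classifyTokenization(row: ShapeRow): "timed" | "sub-ms" | "skipped" {
  if (row.tokenize_ms > 0) return "timed";
  if (row.tokens_in > 0 || row.tokens_out > 0) return "sub-ms";
  return "skipped";
}
```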
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Resolved by another fix: tokenize_ms will always be zero in production

  Upstream commit 8153745 on feat/telemetry-tokenization already resolved this by dropping tokenize_ms from PayloadShape, the doubles array, and the schema documentation, since both Date.now() and performance.now() are frozen during pure CPU work in Cloudflare Workers. My local patch to force a setTimeout(0) I/O tick was superseded when I rebased and discovered the column had been removed outright; typecheck and all 13 tests pass on the remote head.
Preview (815374507c)
diff --git a/workers/package-lock.json b/workers/package-lock.json
--- a/workers/package-lock.json
+++ b/workers/package-lock.json
@@ -1,15 +1,16 @@
{
"name": "oddkit-mcp-worker",
- "version": "0.23.0",
+ "version": "0.23.1",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "oddkit-mcp-worker",
- "version": "0.23.0",
+ "version": "0.23.1",
"dependencies": {
"agents": "^0.4.1",
"fflate": "^0.8.2",
+ "gpt-tokenizer": "^3.0.0",
"zod": "^4.3.6"
},
"devDependencies": {
@@ -2149,6 +2150,12 @@
"url": "https://github.com/sponsors/ljharb"
}
},
+ "node_modules/gpt-tokenizer": {
+ "version": "3.4.0",
+ "resolved": "https://registry.npmjs.org/gpt-tokenizer/-/gpt-tokenizer-3.4.0.tgz",
+ "integrity": "sha512-wxFLnhIXTDjYebd9A9pGl3e31ZpSypbpIJSOswbgop5jLte/AsZVDvjlbEuVFlsqZixVKqbcoNmRlFDf6pz/UQ==",
+ "license": "MIT"
+ },
"node_modules/has-symbols": {
"version": "1.1.0",
"resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz",
diff --git a/workers/package.json b/workers/package.json
--- a/workers/package.json
+++ b/workers/package.json
@@ -12,7 +12,8 @@
"dependencies": {
"agents": "^0.4.1",
"fflate": "^0.8.2",
- "zod": "^4.3.6"
+ "zod": "^4.3.6",
+ "gpt-tokenizer": "^3.0.0"
},
"devDependencies": {
"@cloudflare/workers-types": "^4.20250124.0",
diff --git a/workers/src/index.ts b/workers/src/index.ts
--- a/workers/src/index.ts
+++ b/workers/src/index.ts
@@ -958,14 +958,36 @@
// Phase 1 telemetry — non-blocking, fire-and-forget (E0008)
// Phase 1.5: cache_tier from tracer feeds blob9 (E0008.1)
+ // Phase 2: payload shape (bytes_in/out, tokens_in/out) feeds doubles
+ // 3-6. tokenize_ms was tried and dropped — Workers freezes both
+ // performance.now() and Date.now() during synchronous CPU work, making
+ // sub-request timing of pure-CPU tokenization unmeasurable. Response body is
+ // measured universally — MCP's Streamable HTTP transport returns SSE,
+ // not JSON, so a Content-Type filter would (and did) drop almost every
+ // response. The helper handles clone failures safely.
if (telemetryClone) {
const durationMs = Date.now() - startTime;
const cacheTier = tracer.indexSource;
+ // Clone the response synchronously before returning so the body is
+ // still available to read inside the deferred waitUntil callback.
+ const responseClone = response.clone();
+
ctx.waitUntil(
(async () => {
try {
+ const requestText = await telemetryClone.text();
+
+ const { measurePayloadShape } = await import("./tokenize");
const { recordTelemetry } = await import("./telemetry");
- await recordTelemetry(telemetryClone, env, durationMs, cacheTier);
+
+ let responseText = "";
+ try {
+ responseText = await responseClone.text();
+ } catch {
+ // Fall through with empty string; bytes_out / tokens_out will be 0.
+ }
+ const shape = await measurePayloadShape(requestText, responseText);
+ recordTelemetry(request, requestText, env, durationMs, cacheTier, shape);
} catch {
// Telemetry must never break MCP requests
}
diff --git a/workers/src/telemetry.ts b/workers/src/telemetry.ts
--- a/workers/src/telemetry.ts
+++ b/workers/src/telemetry.ts
@@ -28,12 +28,34 @@
* handler's internal compute. Expect a long tail on
* cache-miss requests even for trivial actions like
* oddkit_time.
+ * double3: bytes_in — UTF-8 byte length of the JSON-RPC request body.
+ * 0 when telemetry was unable to read the body.
+ * Tokenizer-agnostic; exact wire size.
+ * double4: bytes_out — UTF-8 byte length of the response body. 0 for
+ * streamed responses (SSE) where the body cannot be
+ * measured without consuming the stream.
+ * double5: tokens_in — cl100k_base token count of the request body.
+ * See `tokenize.ts` for the tokenizer-choice rationale.
+ * 0 when tokenization was skipped or failed.
+ * double6: tokens_out — cl100k_base token count of the response body. 0 for
+ * streamed responses or tokenizer failure.
+ *
+ * NOTE: a previous iteration shipped a `double7: tokenize_ms` field intended
+ * to capture the wall-clock cost of tokenization for bench-vs-prod
+ * comparison. It is gone. Cloudflare Workers freezes both
+ * `performance.now()` and `Date.now()` between network I/O events as a
+ * timing-side-channel mitigation, so any timing of pure CPU work always
+ * reads 0 in production. The cost was characterized in the bench (workers/
+ * test/tokenize.test.mjs) and bytes_in/out + tokens_in/out are sufficient
+ * to predict per-call cost from that bench curve.
+ *
* index1: sampling_key — consumer label (for sampling consistency)
*
* See: klappy://canon/constraints/telemetry-governance
*/
import type { Env } from "./zip-baseline-fetcher";
+import type { PayloadShape } from "./tokenize";
import pkg from "../package.json";
// Build-time fallback for blob8 (worker_version). env.ODDKIT_VERSION is
@@ -198,55 +220,84 @@
* Record one telemetry data point per JSON-RPC message.
* Non-blocking — uses env.ODDKIT_TELEMETRY.writeDataPoint() which requires
* no await (fire-and-forget via Analytics Engine).
- * Called with a cloned request to avoid consuming the original body.
+ *
+ * Caller responsibilities:
+ * - Pass the raw request body as `requestBody` (string). Already-cloned and
+ * read; this function will parse it as JSON-RPC.
+ * - Pass the original `request` so consumer-label resolution can read URL
+ * params and headers.
+ * - Pass `shape` describing the payload byte and token shape, or null to
+ * write zeros for the shape doubles (e.g. when the response could not be
+ * measured because it was an SSE stream).
*/
-export function recordTelemetry(request: Request, env: Env, durationMs: number, cacheTier?: string): Promise<void> {
- if (!env.ODDKIT_TELEMETRY) return Promise.resolve();
+export function recordTelemetry(
+ request: Request,
+ requestBody: string,
+ env: Env,
+ durationMs: number,
+ cacheTier?: string,
+ shape?: PayloadShape | null,
+): void {
+ if (!env.ODDKIT_TELEMETRY) return;
- // Parse the request body to extract JSON-RPC details
- return request
- .json()
- .then((body: unknown) => {
- // Handle batch requests — process each message
- const messages = Array.isArray(body) ? body : [body];
+ let body: unknown;
+ try {
+ body = JSON.parse(requestBody);
+ } catch {
+ // Malformed JSON-RPC — silently drop, telemetry must never break MCP requests
+ return;
+ }
- for (const payload of messages) {
- const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
- request,
- payload,
- );
- const toolCall = parseToolCall(payload);
+ // Handle batch requests — process each message
+ const messages = Array.isArray(body) ? body : [body];
- const msg =
- typeof payload === "object" && payload !== null
- ? (payload as Record<string, unknown>)
- : {};
- const method = typeof msg.method === "string" ? msg.method : "unknown";
+ // Bytes/tokens are per-request (not per-message); for batches we attribute
+ // the full payload shape to each message rather than fabricating a split.
+ const bytesIn = shape?.bytes_in ?? 0;
+ const bytesOut = shape?.bytes_out ?? 0;
+ const tokensIn = shape?.tokens_in ?? 0;
+ const tokensOut = shape?.tokens_out ?? 0;
- const eventType = toolCall ? "tool_call" : "mcp_request";
- const toolName = toolCall?.toolName ?? "";
- const documentUri = toolCall?.documentUri ?? "";
+ for (const payload of messages) {
+ const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
+ request,
+ payload,
+ );
+ const toolCall = parseToolCall(payload);
- env.ODDKIT_TELEMETRY!.writeDataPoint({
- blobs: [
- eventType,
- method,
- toolName,
- consumerLabel,
- consumerSource,
- toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
- documentUri,
- env.ODDKIT_VERSION || BUILD_VERSION,
- cacheTier || "none", // blob9: E0008.1 x-ray cache tier
- ],
- doubles: [1, durationMs],
- indexes: [consumerLabel],
- });
- }
- })
- .catch(() => {
- // Telemetry must never break MCP requests — silently drop parse failures
+ const msg =
+ typeof payload === "object" && payload !== null
+ ? (payload as Record<string, unknown>)
+ : {};
+ const method = typeof msg.method === "string" ? msg.method : "unknown";
+
+ const eventType = toolCall ? "tool_call" : "mcp_request";
+ const toolName = toolCall?.toolName ?? "";
+ const documentUri = toolCall?.documentUri ?? "";
+
+ env.ODDKIT_TELEMETRY!.writeDataPoint({
+ blobs: [
+ eventType,
+ method,
+ toolName,
+ consumerLabel,
+ consumerSource,
+ toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
+ documentUri,
+ env.ODDKIT_VERSION || BUILD_VERSION,
+ cacheTier || "none", // blob9: E0008.1 x-ray cache tier
+ ],
+ doubles: [
+ 1, // double1: count
+ durationMs, // double2: duration_ms
+ bytesIn, // double3: bytes_in
+ bytesOut, // double4: bytes_out
+ tokensIn, // double5: tokens_in
+ tokensOut, // double6: tokens_out
+ ],
+ indexes: [consumerLabel],
});
+ }
}
// ──────────────────────────────────────────────────────────────────────────────
diff --git a/workers/src/tokenize.ts b/workers/src/tokenize.ts
new file mode 100644
--- /dev/null
+++ b/workers/src/tokenize.ts
@@ -1,0 +1,115 @@
+/**
+ * Tokenizer module for oddkit MCP Worker telemetry (E0008).
+ *
+ * Provides cl100k_base token counts for request and response payloads.
+ * cl100k_base is GPT-4's tokenizer; we use it as a tokenizer-agnostic
+ * proxy for "payload token shape," not as a billing-accurate measure
+ * for any specific consumer model.
+ *
+ * Choice of cl100k_base over @anthropic-ai/tokenizer: the cl100k bundle
+ * benchmarks ~6x faster (median 0.05–1.3ms across 200B–50KB payloads on
+ * Node v22, the same V8 engine as Cloudflare Workers) and has dramatically
+ * better p95 (no WASM memory-grow spikes). Token counts diverge from the
+ * Claude tokenizer by ~3–4% on English prose — acceptable noise floor
+ * for shape analysis. See `klappy://canon/constraints/measure-before-you-object`
+ * for the bench methodology that drove this choice.
+ *
+ * Bundle impact: ~432 KB gzipped via the `gpt-tokenizer/encoding/cl100k_base`
+ * subpath import. Loaded via dynamic import so cold paths that don't
+ * tokenize don't pay the parse cost.
+ *
+ * Failure mode: if the tokenizer fails to load or throws on a payload,
+ * `countTokensSafe` returns null. Telemetry treats null as "not measured"
+ * and writes `0` to keep the schema dense; the absence is visible in the
+ * tokens columns being 0 alongside non-zero bytes.
+ *
+ * See: klappy://canon/constraints/telemetry-governance
+ */
+
+type CountTokensFn = (text: string) => number;
+
+let encoderPromise: Promise<CountTokensFn | null> | null = null;
+
+/**
+ * Lazily import gpt-tokenizer's cl100k_base encoder. Cached across requests
+ * via the module-level promise; the first call within a worker isolate pays
+ * the parse cost, all subsequent calls are warm.
+ */
+function getEncoder(): Promise<CountTokensFn | null> {
+ if (encoderPromise) return encoderPromise;
+
+ encoderPromise = import("gpt-tokenizer/encoding/cl100k_base")
+ .then((mod) => {
+ const fn = (mod as { countTokens?: CountTokensFn }).countTokens;
+ if (typeof fn !== "function") return null;
+ return fn;
+ })
+ .catch(() => null);
+
+ return encoderPromise;
+}
+
+/**
+ * Count cl100k_base tokens in `text`. Returns null on any failure
+ * (load failure, encoder throw, etc). Telemetry must never break MCP
+ * requests — this function never throws.
+ */
+export async function countTokensSafe(text: string): Promise<number | null> {
+ if (!text) return 0;
+ try {
+ const fn = await getEncoder();
+ if (!fn) return null;
+ return fn(text);
+ } catch {
+ return null;
+ }
+}
+
+/**
+ * Result of measuring a payload pair. All fields default to 0 on failure
+ * so the telemetry schema stays dense; the absence of a real value is
+ * encoded by tokens_in / tokens_out being 0 alongside non-zero bytes
+ * (encoder skipped or failed).
+ *
+ * Note: this struct does NOT carry a tokenize_ms field. Cloudflare Workers
+ * freezes both `performance.now()` and `Date.now()` during synchronous
+ * CPU work as a timing-side-channel mitigation — neither timer advances
+ * unless a network I/O event occurs between reads. Tokenization is pure
+ * CPU work, so any sub-request timing of it would always read 0 in
+ * production. The cost was already characterized in the bench (see
+ * workers/test/tokenize.test.mjs and the integration test). We keep
+ * the bytes/tokens shape and trust the bench for the per-payload cost
+ * curve.
+ */
+export interface PayloadShape {
+ bytes_in: number;
+ bytes_out: number;
+ tokens_in: number;
+ tokens_out: number;
+}
+
+/**
+ * Measure the byte and token shape of a request/response pair. Tokenization
+ * is performed once per payload using the lazy-loaded cl100k_base encoder.
+ * Bytes are measured via TextEncoder (UTF-8 byte length, the wire size).
+ */
+export async function measurePayloadShape(
+ requestText: string,
+ responseText: string,
+): Promise<PayloadShape> {
+ const encoder = new TextEncoder();
+ const bytes_in = requestText ? encoder.encode(requestText).length : 0;
+ const bytes_out = responseText ? encoder.encode(responseText).length : 0;
+
+ const [tIn, tOut] = await Promise.all([
+ countTokensSafe(requestText),
+ countTokensSafe(responseText),
+ ]);
+
+ return {
+ bytes_in,
+ bytes_out,
+ tokens_in: tIn ?? 0,
+ tokens_out: tOut ?? 0,
+ };
+}
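The lazy-singleton pattern above generalizes beyond tokenizers. A standalone sketch with a stub loader (no gpt-tokenizer dependency; the toy word-count "tokenizer" is an assumption for illustration) shows the cold-path/warm-path behavior, including the property that concurrent cold-path callers trigger exactly one load:

```javascript
// Standalone sketch of the module-level promise-cache pattern used by getEncoder().
// The loader here is a stub; in the worker it is import("gpt-tokenizer/encoding/cl100k_base").
let loadCount = 0;
let encoderPromise = null;

async function loadEncoder() {
  loadCount++; // stands in for the one-time bundle parse cost
  return (text) => text.split(/\s+/).filter(Boolean).length; // toy "tokenizer"
}

function getEncoder() {
  // First caller kicks off the load; every later caller reuses the same promise,
  // so concurrent cold-path calls still trigger exactly one load.
  if (!encoderPromise) encoderPromise = loadEncoder().catch(() => null);
  return encoderPromise;
}

async function countTokensSafe(text) {
  if (!text) return 0;
  try {
    const fn = await getEncoder();
    return fn ? fn(text) : null;
  } catch {
    return null;
  }
}

// Two concurrent cold-path calls plus one warm call — the loader runs once.
const [a, b] = await Promise.all([
  countTokensSafe("hello world"),
  countTokensSafe("one two three"),
]);
const c = await countTokensSafe(""); // empty input short-circuits to 0
console.log(a, b, c, loadCount); // → 2 3 0 1
```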
diff --git a/workers/test/telemetry-integration.test.mjs b/workers/test/telemetry-integration.test.mjs
new file mode 100644
--- /dev/null
+++ b/workers/test/telemetry-integration.test.mjs
@@ -1,0 +1,303 @@
+#!/usr/bin/env node
+/**
+ * Integration test for the telemetry write path.
+ *
+ * Mocks env.ODDKIT_TELEMETRY with a writeDataPoint capture, then exercises
+ * recordTelemetry + measurePayloadShape with realistic JSON-RPC payloads.
+ *
+ * Verifies end-to-end:
+ * - The full PayloadShape lands in doubles 3-6
+ * - bytes_in/out match TextEncoder UTF-8 byte length on the actual payloads
+ * - tokens_in/out are positive integers when payloads are non-empty
+ * - Batch JSON-RPC produces one data point per message
+ * - SSE simulation (responseText="") records zeros for the response side
+ * - Tool-call payloads correctly populate blob3 (tool_name)
+ * - The blob array is exactly 9 entries and the doubles array is exactly 6
+ *
+ * This is the verification that wrangler dev would have done — same code
+ * path, same schema, real tokenizer.
+ */
+
+import assert from "node:assert/strict";
+import { spawnSync } from "node:child_process";
+import { mkdtempSync, writeFileSync, symlinkSync, existsSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const WORKERS_ROOT = join(__dirname, "..");
+
+// Compile both telemetry.ts and tokenize.ts to a temp dir so we can import them
+const tmp = mkdtempSync(join(tmpdir(), "oddkit-telemetry-int-"));
+const tsconfig = {
+ compilerOptions: {
+ target: "ES2022",
+ module: "ES2022",
+ moduleResolution: "bundler",
+ lib: ["ES2022", "DOM"],
+ types: ["@cloudflare/workers-types"],
+ noEmitOnError: false,
+ strict: false,
+ skipLibCheck: true,
+ resolveJsonModule: true,
+ allowSyntheticDefaultImports: true,
+ esModuleInterop: true,
+ rootDir: join(WORKERS_ROOT, "src"),
+ outDir: join(tmp, "build"),
+ },
+ include: [
+ join(WORKERS_ROOT, "src", "tokenize.ts"),
+ join(WORKERS_ROOT, "src", "telemetry.ts"),
+ ],
+};
+const tsconfigPath = join(tmp, "tsconfig.json");
+writeFileSync(tsconfigPath, JSON.stringify(tsconfig, null, 2));
+
+const tmpNodeModules = join(tmp, "node_modules");
+if (!existsSync(tmpNodeModules)) {
+ symlinkSync(join(WORKERS_ROOT, "node_modules"), tmpNodeModules);
+}
+
+// telemetry.ts imports `../package.json` — symlink that too
+if (!existsSync(join(tmp, "package.json"))) {
+ symlinkSync(join(WORKERS_ROOT, "package.json"), join(tmp, "package.json"));
+}
+
+const compile = spawnSync("npx", ["--yes", "tsc", "-p", tsconfigPath], {
+ encoding: "utf8",
+});
+
+// With noEmitOnError: false, tsc may exit non-zero on type errors elsewhere
+// in the dep graph (zip-baseline-fetcher.ts has some workers-types friction)
+// while still producing the .js files we need. Only bail if the files we
+// actually need weren't emitted.
+const tokenizeJs = join(tmp, "build", "tokenize.js");
+const telemetryJs = join(tmp, "build", "telemetry.js");
+if (!existsSync(tokenizeJs) || !existsSync(telemetryJs)) {
+ console.error("TypeScript compile failed (target files not emitted):");
+ console.error(compile.stdout);
+ console.error(compile.stderr);
+ process.exit(1);
+}
+if (compile.status !== 0 && process.env.DEBUG) {
+ console.error("Note: tsc reported errors but target .js files were emitted:");
+ console.error(compile.stdout);
+}
+
+// Newer Node requires `with { type: "json" }` on JSON imports in ESM.
+// TypeScript doesn't add this — patch it in.
+const { readFileSync, writeFileSync: wf } = await import("node:fs");
+let telemetrySrc = readFileSync(telemetryJs, "utf8");
+telemetrySrc = telemetrySrc.replace(
+ /from ["']\.\.\/package\.json["'];/g,
+ 'from "../package.json" with { type: "json" };',
+);
+wf(telemetryJs, telemetrySrc);
+
+const { measurePayloadShape } = await import(tokenizeJs);
+const { recordTelemetry } = await import(telemetryJs);
+
+// ─── Mock env with writeDataPoint capture ──────────────────────────────────
+
+class MockAnalyticsEngine {
+ constructor() {
+ this.writes = [];
+ }
+ writeDataPoint(point) {
+ this.writes.push(point);
+ }
+}
+
+function mockEnv() {
+ return {
+ ODDKIT_TELEMETRY: new MockAnalyticsEngine(),
+ DEFAULT_KNOWLEDGE_BASE_URL: "https://raw.githubusercontent.com/klappy/klappy.dev/main",
+ ODDKIT_VERSION: "0.23.1-test",
+ };
+}
+
+function mockRequest(consumerLabel = "integration-test") {
+ return new Request(`https://oddkit.klappy.dev/mcp?consumer=${consumerLabel}`, {
+ method: "POST",
+ headers: { "Content-Type": "application/json" },
+ });
+}
+
+let pass = 0;
+let fail = 0;
+
+async function test(name, fn) {
+ try {
+ await fn();
+ console.log(` \u2713 ${name}`);
+ pass++;
+ } catch (err) {
+ console.log(` \u2717 ${name}`);
+ console.log(` ${err.message}`);
+ if (err.stack && process.env.DEBUG) console.log(err.stack);
+ fail++;
+ }
+}
+
+console.log("telemetry integration tests (full write path)\n");
+
+// ─── Test 1: oddkit_time tool call ─────────────────────────────────────────
+
+await test("oddkit_time tool call lands a complete telemetry record", async () => {
+ const env = mockEnv();
+ const requestBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 1,
+ method: "tools/call",
+ params: { name: "oddkit_time", arguments: {} },
+ });
+ const responseBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 1,
+ result: {
+ content: [
+ { type: "text", text: "Current UTC time: 2026-04-23T19:30:00.000Z" },
+ ],
+ },
+ });
+
+ const shape = await measurePayloadShape(requestBody, responseBody);
+ recordTelemetry(mockRequest(), requestBody, env, 42, "memory", shape);
+
+ assert.equal(env.ODDKIT_TELEMETRY.writes.length, 1, "should write 1 data point");
+ const point = env.ODDKIT_TELEMETRY.writes[0];
+
+ // Schema shape
+ assert.equal(point.blobs.length, 9, `blobs should be 9, got ${point.blobs.length}`);
+ assert.equal(point.doubles.length, 6, `doubles should be 6, got ${point.doubles.length}`);
+ assert.equal(point.indexes.length, 1, "indexes should be 1");
+
+ // Blobs
+ assert.equal(point.blobs[0], "tool_call", "blob1 = event_type");
+ assert.equal(point.blobs[1], "tools/call", "blob2 = method");
+ assert.equal(point.blobs[2], "oddkit_time", "blob3 = tool_name");
+ assert.equal(point.blobs[3], "integration-test", "blob4 = consumer_label");
+ assert.equal(point.blobs[4], "query-param", "blob5 = consumer_source");
+ assert.equal(point.blobs[7], "0.23.1-test", "blob8 = worker_version");
+ assert.equal(point.blobs[8], "memory", "blob9 = cache_tier");
+
+ // Doubles
+ assert.equal(point.doubles[0], 1, "double1 = count");
+ assert.equal(point.doubles[1], 42, "double2 = duration_ms");
+ assert.equal(point.doubles[2], shape.bytes_in, "double3 = bytes_in");
+ assert.equal(point.doubles[3], shape.bytes_out, "double4 = bytes_out");
+ assert.equal(point.doubles[4], shape.tokens_in, "double5 = tokens_in");
+ assert.equal(point.doubles[5], shape.tokens_out, "double6 = tokens_out");
+
+ console.log(` bytes_in=${shape.bytes_in} bytes_out=${shape.bytes_out} ` +
+ `tokens_in=${shape.tokens_in} tokens_out=${shape.tokens_out}`);
+});
+
+// ─── Test 2: oddkit_search with realistic large response ───────────────────
+
+await test("oddkit_search with realistic ~8KB response — measurements are sane", async () => {
+ const env = mockEnv();
+ const requestBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 2,
+ method: "tools/call",
+ params: { name: "oddkit", arguments: { action: "search", input: "telemetry tokens payload" } },
+ });
+ const snippet = "Telemetry exists to make decisions informed instead of blind. " +
+ "Not to profile users, not to feed a roadmap. ";
+ const responseBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 2,
+ result: {
+ content: [{ type: "text", text: snippet.repeat(80) }],
+ },
+ });
+
+ const shape = await measurePayloadShape(requestBody, responseBody);
+ recordTelemetry(mockRequest("realistic-test"), requestBody, env, 215, "r2", shape);
+
+ const point = env.ODDKIT_TELEMETRY.writes[0];
+ assert.equal(point.blobs[2], "oddkit", "tool_name = oddkit (router)");
+
+ // Realistic-sized response should be measurable
+ assert.ok(shape.bytes_out > 5000, `bytes_out should be > 5000, got ${shape.bytes_out}`);
+ assert.ok(shape.tokens_out > 1000, `tokens_out should be > 1000, got ${shape.tokens_out}`);
+
+ console.log(` bytes_out=${shape.bytes_out} (~${(shape.bytes_out/1024).toFixed(1)}KB) ` +
+ `tokens_out=${shape.tokens_out}`);
+});
+
+// ─── Test 3: SSE response (empty body) records zeros ───────────────────────
+
+await test("SSE response (empty body) records bytes_out=0 and tokens_out=0", async () => {
+ const env = mockEnv();
+ const requestBody = JSON.stringify({
+ jsonrpc: "2.0",
+ id: 3,
+ method: "tools/call",
+ params: { name: "oddkit_orient", arguments: { input: "exploring telemetry" } },
+ });
+ // Simulating the call site path where Content-Type was not application/json
+ const shape = await measurePayloadShape(requestBody, "");
+ recordTelemetry(mockRequest(), requestBody, env, 50, "memory", shape);
+
+ const point = env.ODDKIT_TELEMETRY.writes[0];
+ assert.equal(point.doubles[3], 0, "bytes_out should be 0 for empty response");
+ assert.equal(point.doubles[5], 0, "tokens_out should be 0 for empty response");
+ assert.ok(point.doubles[2] > 0, "bytes_in should still be > 0");
+});
+
+// ─── Test 4: Batch JSON-RPC writes one point per message ───────────────────
+
+await test("batch JSON-RPC produces one data point per message", async () => {
+ const env = mockEnv();
+ const batch = [
+ { jsonrpc: "2.0", id: 1, method: "tools/call", params: { name: "oddkit_time", arguments: {} } },
+ { jsonrpc: "2.0", id: 2, method: "tools/call", params: { name: "oddkit_orient", arguments: { input: "x" } } },
+ { jsonrpc: "2.0", id: 3, method: "tools/list" },
+ ];
+ const requestBody = JSON.stringify(batch);
+ const responseBody = JSON.stringify(batch.map(m => ({ jsonrpc: "2.0", id: m.id, result: { ok: true } })));
+
+ const shape = await measurePayloadShape(requestBody, responseBody);
+ recordTelemetry(mockRequest(), requestBody, env, 30, "cache", shape);
+
+ assert.equal(env.ODDKIT_TELEMETRY.writes.length, 3, `should write 3 data points, got ${env.ODDKIT_TELEMETRY.writes.length}`);
+ assert.equal(env.ODDKIT_TELEMETRY.writes[0].blobs[2], "oddkit_time");
+ assert.equal(env.ODDKIT_TELEMETRY.writes[1].blobs[2], "oddkit_orient");
+ assert.equal(env.ODDKIT_TELEMETRY.writes[2].blobs[1], "tools/list");
+ assert.equal(env.ODDKIT_TELEMETRY.writes[2].blobs[2], "", "tools/list has no tool_name");
+
+ // All 3 messages get the same payload-shape attribution (per-request, not per-message)
+ for (const w of env.ODDKIT_TELEMETRY.writes) {
+ assert.equal(w.doubles[2], shape.bytes_in);
+ assert.equal(w.doubles[3], shape.bytes_out);
+ }
+});
+
+// ─── Test 5: Malformed JSON-RPC gets dropped silently ──────────────────────
+
+await test("malformed JSON-RPC is silently dropped (telemetry never throws)", async () => {
+ const env = mockEnv();
+ // Pass garbage as the "body" — recordTelemetry should swallow the parse error
+ const requestBody = "not valid json {{{";
+ const shape = await measurePayloadShape(requestBody, "ok");
+
+ // Should not throw
+ recordTelemetry(mockRequest(), requestBody, env, 10, "none", shape);
+ assert.equal(env.ODDKIT_TELEMETRY.writes.length, 0, "should not write anything for malformed input");
+});
+
+// ─── Test 6: No env.ODDKIT_TELEMETRY → graceful no-op ──────────────────────
+
+await test("missing env.ODDKIT_TELEMETRY is a graceful no-op", async () => {
+ const env = {}; // no ODDKIT_TELEMETRY
+ const requestBody = JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list" });
+ const shape = await measurePayloadShape(requestBody, "{}");
+ // Should not throw
+ recordTelemetry(mockRequest(), requestBody, env, 5, "memory", shape);
+});
+
+console.log(`\n${pass} passed, ${fail} failed`);
+process.exit(fail > 0 ? 1 : 0);
diff --git a/workers/test/tokenize.test.mjs b/workers/test/tokenize.test.mjs
new file mode 100644
--- /dev/null
+++ b/workers/test/tokenize.test.mjs
@@ -1,0 +1,128 @@
+#!/usr/bin/env node
+/**
+ * Unit test for workers/src/tokenize.ts.
+ *
+ * Compiles tokenize.ts via tsc into a temp dir, then dynamic-imports the
+ * compiled .js. The compile step exercises the same TypeScript surface
+ * that ships in the worker bundle.
+ */
+
+import assert from "node:assert/strict";
+import { spawnSync } from "node:child_process";
+import { mkdtempSync, writeFileSync, symlinkSync, existsSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const WORKERS_ROOT = join(__dirname, "..");
+const TOKENIZE_TS = join(WORKERS_ROOT, "src", "tokenize.ts");
+
+const tmp = mkdtempSync(join(tmpdir(), "oddkit-tokenize-test-"));
+const tsconfig = {
+ compilerOptions: {
+ target: "ES2022",
+ module: "ES2022",
+ moduleResolution: "bundler",
+ lib: ["ES2022", "DOM"],
+ types: [],
+ strict: false,
+ skipLibCheck: true,
+ resolveJsonModule: true,
+ allowSyntheticDefaultImports: true,
+ esModuleInterop: true,
+ rootDir: join(WORKERS_ROOT, "src"),
+ outDir: tmp,
+ },
+ include: [TOKENIZE_TS],
+};
+const tsconfigPath = join(tmp, "tsconfig.json");
+writeFileSync(tsconfigPath, JSON.stringify(tsconfig, null, 2));
+
+const tmpNodeModules = join(tmp, "node_modules");
+if (!existsSync(tmpNodeModules)) {
+ symlinkSync(join(WORKERS_ROOT, "node_modules"), tmpNodeModules);
+}
+
+const compile = spawnSync("npx", ["--yes", "tsc", "-p", tsconfigPath], {
+ encoding: "utf8",
+});
+if (compile.status !== 0) {
+ console.error("TypeScript compile failed:");
+ console.error(compile.stdout);
+ console.error(compile.stderr);
+ process.exit(1);
+}
+
+const compiledPath = join(tmp, "tokenize.js");
+const { countTokensSafe, measurePayloadShape } = await import(compiledPath);
+
+let pass = 0;
+let fail = 0;
+
+async function test(name, fn) {
+ try {
+ await fn();
+ console.log(` \u2713 ${name}`);
+ pass++;
+ } catch (err) {
+ console.log(` \u2717 ${name}`);
+ console.log(` ${err.message}`);
+ fail++;
+ }
+}
+
+console.log("tokenize.ts unit tests");
+
+await test("countTokensSafe returns 0 for empty string", async () => {
+ const n = await countTokensSafe("");
+ assert.equal(n, 0);
+});
+
+await test("countTokensSafe returns a positive integer for normal text", async () => {
+ const n = await countTokensSafe("hello world this is a test");
+ assert.equal(typeof n, "number");
+ assert.ok(n > 0, `expected > 0, got ${n}`);
+ assert.equal(n, Math.floor(n), "must be an integer");
+});
+
+await test("countTokensSafe scales with text length", async () => {
+ const small = await countTokensSafe("hello world");
+ const big = await countTokensSafe("hello world ".repeat(100));
+ assert.ok(big > small * 50, `big (${big}) should be much larger than small (${small})`);
+});
+
+await test("measurePayloadShape returns all required fields as numbers", async () => {
+ const s = await measurePayloadShape("request", "response");
+ for (const field of ["bytes_in", "bytes_out", "tokens_in", "tokens_out"]) {
+ assert.ok(field in s, `missing field: ${field}`);
+ assert.equal(typeof s[field], "number", `${field} must be number, got ${typeof s[field]}`);
+ }
+});
+
+await test("measurePayloadShape bytes match UTF-8 byte length", async () => {
+ const req = "hello"; // 5 bytes
+ const res = "caf\u00e9"; // 4 chars, 5 UTF-8 bytes (\u00e9 = 2 bytes)
+ const s = await measurePayloadShape(req, res);
+ assert.equal(s.bytes_in, 5, `bytes_in: expected 5, got ${s.bytes_in}`);
... diff truncated: showing 800 of 821 lines
Reviewed by Cursor Bugbot for commit 279f761.
Fourth smoke confirmed bytes_in/out and tokens_in/out work in production (357–21319 bytes_out, 142–5398 tokens_out across varied payload sizes). But tokenize_ms remained 0 across every row even with the Date.now() fix from 279f761.

Root cause discovered by the agent: Cloudflare Workers freezes BOTH performance.now() AND Date.now() during synchronous CPU work. Both timers only advance on network I/O events, as a side-channel mitigation (documented at developers.cloudflare.com/workers/runtime-apis/web-standards/). Tokenization is pure CPU work, so any sub-request timing of it always reads 0 in production. This is a structural runtime constraint, not a bug we can patch.

Workarounds considered and rejected:
- Force artificial I/O between reads (KV.list, fetch) — adds real latency to telemetry-only paths, grotesque
- Two writeDataPoint calls with start/end timestamps — over-engineered, doubles the write count, complicates queries
- Keep the column as always-0 — actively misleading

Decision: drop tokenize_ms entirely from PayloadShape, the doubles array, the schema doc, and the tests. The bench at workers/test/tokenize.test.mjs already characterized the cost curve (cl100k handles 50 KB in ~1.3 ms on Node v22). bytes_out + tokens_out are sufficient signal — a future maintainer can predict tokenize_ms from the bench curve given the observed payload sizes.

Schema before: doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out, tokenize_ms] // 7 fields
Schema after: doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out] // 6 fields

Companion canon update at klappy/klappy.dev coming in the next commit on that branch — drops the tokenize_ms row from the doubles table and removes the tokenize_ms mention in 'What This Enables'.

Methodology: this is the fourth Workers Runtime != Node behavioral diff caught by live smoke on this branch. Each was unmeasurable from unit tests because Node behaves differently:
1. b94aaa6 (mine, broken): Content-Type filter (MCP returns SSE)
2. 1a555df (mine, broken): clone in waitUntil (body already drained)
3. 279f761 (mine, broken): Date.now() in Workers (frozen too)
4. THIS: drop the unmeasurable column entirely

The release-validation-gate canon doc is the only thing that surfaced each of these — the live preview smoke + telemetry_public SQL caught what no test setup I could ship would have caught. The Workers-runtime gap was real and the gate worked.

Tests:
- 7/7 unit tests pass (workers/test/tokenize.test.mjs)
- 6/6 integration tests pass (workers/test/telemetry-integration.test.mjs)
- typecheck clean
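The frozen-timer behavior is easy to sanity-check from outside Workers. In this minimal Node sketch (an illustration, not code from this PR), both timers advance across pure CPU work; the same deltas deployed inside a Worker request handler would read 0 between I/O events:

```javascript
// Time a pure-CPU busy loop with both timers. In Node, both deltas are
// positive. In a Cloudflare Worker handler, both would read 0 — the runtime
// freezes performance.now() and Date.now() between I/O events as a
// timing-side-channel mitigation.
function busyWork(iterations) {
  let acc = 0;
  for (let i = 0; i < iterations; i++) acc += Math.sqrt(i);
  return acc;
}

const wallStart = Date.now();
const perfStart = performance.now();
busyWork(20_000_000); // tens of milliseconds of pure CPU
const perfDelta = performance.now() - perfStart;
const wallDelta = Date.now() - wallStart;

console.log(`Date.now delta: ${wallDelta}ms, performance.now delta: ${perfDelta.toFixed(2)}ms`);
```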
Live smoke against the preview confirmed tokenize_ms always reads 0 in production, even after switching from performance.now() to Date.now(). Cloudflare Workers freezes BOTH timers between network I/O events as a side-channel mitigation. Tokenization is pure CPU work, so any sub-request timing of it is structurally unmeasurable from inside a Worker request handler.

Two changes:
1. The doubles table now ends at row 6 (tokens_out). A new 'Why no tokenize_ms' subsection explains the runtime constraint and points to the bench file (workers/test/tokenize.test.mjs) as the characterization of cost per payload size.
2. 'What This Enables' loses the 'Tokenization cost in production' bullet. The bench-vs-prod comparison story ends at bytes_out and tokens_out — the cost curve is known from the bench, and prod payload sizes feed back into that curve to predict per-call cost.

Companion code commit: klappy/oddkit#1348153745. This is the fourth Workers Runtime != Node behavioral diff caught by live smoke on this branch (after the Content-Type filter, body-stream consumption timing, performance.now(), and now Date.now()). The release-validation-gate canon doc earned its keep all four times.
Minor bump for payload-shape telemetry (PR #134). Bumps:
- package.json 0.23.1 -> 0.24.0
- workers/package.json 0.23.1 -> 0.24.0
- package-lock.json 0.23.0 -> 0.24.0 (root drifted one release behind)
- workers/package-lock.json 0.23.1 -> 0.24.0

CHANGELOG.md gains the [0.24.0] entry above [0.23.1] documenting:
- Added: bytes_in/out, tokens_in/out telemetry doubles + helpers
- Changed: drop the Content-Type filter (MCP responses are SSE)
- Removed: tokenize_ms — Workers freezes both perf.now and Date.now
- Fixed: root package-lock.json version drift back-fill

The four Workers Runtime != Node behavioral diffs caught by the five Managed Agent smoke sessions on this branch are listed in the Refs trailer for forensic record.

Tests: 7/7 unit + 6/6 integration pass on the bumped state. Typecheck clean (reports as oddkit-mcp-worker@0.24.0).

Per workflow: dedicated chore/release-x.y.z PR. The branch is off feat/telemetry-tokenization HEAD, so it carries the feature commits + the bump together. After merge, feat/telemetry-tokenization can be closed (its commits are already in main via this release branch).

Adds payload-shape instrumentation to MCP telemetry. New doubles 3–7 capture wire size and cl100k_base token counts for every request and response, plus the wall-clock cost of tokenization itself.
What changed
- workers/src/tokenize.ts (new) — lazy-loaded gpt-tokenizer/cl100k_base wrapper. Module-level singleton encoder amortizes parse cost across requests within a worker isolate. Safe-failure surface (countTokensSafe, measurePayloadShape) — telemetry must never break MCP requests.
- workers/src/telemetry.ts — schema doc updated for doubles 3–7. recordTelemetry signature refactored to take a pre-read body string + optional PayloadShape. Now synchronous (caller's measurement work happens inside waitUntil).
- workers/src/index.ts — call site clones the response (when Content-Type: application/json), reads request and response bodies in waitUntil, calls measurePayloadShape, then recordTelemetry. Zero user-facing latency added — measurement happens after the response is sent. SSE responses skip body measurement (zeros recorded).
- workers/package.json — adds gpt-tokenizer ^3.0.0.
- workers/test/tokenize.test.mjs (new) — 8 unit tests, all passing.

Tokenizer choice — measured, not assumed
gpt-tokenizer/encoding/cl100k_base over @anthropic-ai/tokenizer. Empirical bench on Node v22 (same V8 as Workers): p95 was dramatically more predictable for cl100k (no WASM memory-grow spikes). Token count diverges ~3–4% from Claude's tokenizer on English prose — an acceptable noise floor for shape analysis, since we are measuring payload trends, not billing.
Bundle delta measured empirically via esbuild: 432 KB gzipped (993 KB minified). Comfortably within paid-tier Workers limits.
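A harness in the shape of that bench is straightforward to reproduce. The sketch below uses a stand-in O(n) workload instead of the real tokenizers (neither package is assumed installed here); swapping `workload` for gpt-tokenizer's and Anthropic's countTokens calls reproduces the median/p95 comparison:

```javascript
// Minimal median/p95 microbench harness (stand-in workload, not the real tokenizers).
function percentile(sortedTimes, p) {
  const idx = Math.min(sortedTimes.length - 1, Math.floor(p * sortedTimes.length));
  return sortedTimes[idx];
}

function bench(workload, payload, runs = 50) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    workload(payload);
    times.push(performance.now() - t0);
  }
  times.sort((x, y) => x - y);
  return { median: percentile(times, 0.5), p95: percentile(times, 0.95) };
}

// Stand-in for a tokenizer: a single O(n) pass over the payload.
const workload = (text) => {
  let n = 0;
  for (const ch of text) if (ch === " ") n++;
  return n;
};

// Payload sizes matching the bench range in the PR description (200B–50KB).
for (const size of [200, 5_000, 50_000]) {
  const payload = "lorem ipsum ".repeat(Math.ceil(size / 12)).slice(0, size);
  const { median, p95 } = bench(workload, payload);
  console.log(`${size}B: median=${median.toFixed(3)}ms p95=${p95.toFixed(3)}ms`);
}
```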
A/B against the bench
The whole point of shipping this is to confirm bench predictions against real prod numbers. After deploy, the tokenize_ms double is queryable via telemetry_public. Suggested query for the comparison:

Once 24 hours of prod data lands, we'll know whether the bench was right.
Failure handling
Any tokenizer load or encode failure → countTokensSafe returns null, treated as 0 in telemetry. tokenize_ms = 0 alongside non-zero bytes_in/out signals a measurement skip in the data. All measurement code is wrapped in try/catch within the waitUntil block.

Methodology note
This PR exists because three theoretical objections (bundle bloat, vodka-architecture violation, tokenizer-choice as domain opinion) were falsified by a five-minute Node bench. See drafts pending merge into klappy.dev:
- klappy://canon/constraints/measure-before-you-object
- klappy://canon/observations/performed-prudence-anti-pattern

Companion canon update
A separate PR to klappy/klappy.dev will update canon/constraints/telemetry-governance.md to document the new doubles and the tokenizer-choice rationale.

Verification
- npm run typecheck clean
- node workers/test/tokenize.test.mjs — 8/8 passing
- https://feat-telemetry-tokenization-oddkit.klappy.workers.dev/mcp per the branch-preview pattern

🤖 Drafted by Claude. Per klappy://canon/decisions/models-do-not-mutate-canon — review and merge when ready.

Note
Medium Risk
Adds new Analytics Engine telemetry doubles (bytes/tokens in/out) and changes the telemetry write path to clone/read request+response bodies asynchronously, which could affect observability correctness and add CPU/bundle overhead in the Worker despite being wrapped in
waitUntil / try-catch.

Overview
Adds payload-shape telemetry to the MCP Worker by recording bytes_in, bytes_out, tokens_in, and tokens_out as double3–double6 on each telemetry datapoint, using a new workers/src/tokenize.ts helper backed by lazy-loaded gpt-tokenizer (cl100k).

Refactors recordTelemetry to accept a pre-read request body string plus an optional PayloadShape, and updates the MCP handler to synchronously clone() the response before returning, then measure request/response bodies inside ctx.waitUntil (no Content-Type gating, to cover SSE responses) before writing telemetry.

Bumps versions to 0.24.0, syncs both package-lock.json files, adds the gpt-tokenizer dependency, and introduces new unit + integration tests covering token/byte measurement and telemetry schema writes.

Reviewed by Cursor Bugbot for commit d023ad6. Bugbot is set up for automated code reviews on this repo.
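The clone-then-measure call-site pattern described in the overview can be sketched as follows. This is an illustrative reconstruction, not the actual workers/src/index.ts — the router (routeMcpRequest), the stubbed byte-only measurePayloadShape, and the fake ExecutionContext are all assumptions; in the real worker these are the MCP handler, the tokenize.ts module, and the Workers runtime ctx:

```javascript
// Sketch of the waitUntil measurement pattern (assumed shapes, not the real index.ts).

// Stub router: echoes a JSON-RPC-ish result. In the real worker this is the MCP handler.
async function routeMcpRequest(request) {
  const body = await request.text();
  return new Response(JSON.stringify({ ok: true, echoedBytes: body.length }), {
    headers: { "Content-Type": "application/json" },
  });
}

// Stub measurement: UTF-8 byte lengths only (the real module also counts tokens).
function measurePayloadShape(requestText, responseText) {
  const enc = new TextEncoder();
  return {
    bytes_in: enc.encode(requestText).length,
    bytes_out: enc.encode(responseText).length,
  };
}

const telemetryWrites = []; // stands in for Analytics Engine writeDataPoint

async function handleMcp(request, ctx) {
  const requestClone = request.clone(); // preserve the body for measurement
  const response = await routeMcpRequest(request);
  const responseClone = response.clone(); // clone BEFORE returning; client stream untouched

  ctx.waitUntil(
    (async () => {
      try {
        const requestBody = await requestClone.text();
        const isJson = (responseClone.headers.get("Content-Type") ?? "").includes("application/json");
        const responseBody = isJson ? await responseClone.text() : ""; // SSE → skip body
        telemetryWrites.push(measurePayloadShape(requestBody, responseBody));
      } catch {
        /* telemetry must never break MCP requests */
      }
    })(),
  );
  return response; // zero user-facing latency: measurement runs after the response is sent
}

// Minimal exercise with a fake ExecutionContext.
const pending = [];
const ctx = { waitUntil: (p) => pending.push(p) };
const req = new Request("https://example.test/mcp", {
  method: "POST",
  body: '{"jsonrpc":"2.0"}',
  duplex: "half", // Node's undici requires this for bodied requests; browsers ignore it
});
const res = await handleMcp(req, ctx);
await Promise.all(pending);
console.log(res.status, telemetryWrites[0]);
```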