34 changes: 34 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,40 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.24.0] - 2026-04-23

### Added

- **Payload-shape telemetry: `bytes_in`, `bytes_out`, `tokens_in`, `tokens_out` (PR #134)**. Four new doubles in the `oddkit_telemetry` Analytics Engine schema (`double3`–`double6`), measured per MCP request and written from a fire-and-forget `waitUntil` callback so user-facing latency is unchanged. Bytes are UTF-8 wire length via `TextEncoder`; tokens are cl100k_base counts via `gpt-tokenizer/encoding/cl100k_base` (chosen over `@anthropic-ai/tokenizer` after a 5-minute Node bench: ~6× faster median, dramatically better p95, ~432 KB gzipped via subpath import — see `workers/test/tokenize.test.mjs`). The doubles array lands at 6, down from 7 in a pre-release iteration that briefly carried `tokenize_ms` (full array: `[count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out]`). The tokenizer is a module-level singleton, lazy-loaded via dynamic import and cached across requests within an isolate: the cold call parses the encoder once; warm calls cost sub-milliseconds on Node, the same V8 the Workers runtime uses. Bench-vs-prod comparison was validated via the fifth Managed Agent smoke at session `sesn_011CaMNujMg9pymcz18JFPp8` (`tokenization-smoke-managed` consumer label): `oddkit_catalog` → 21,437 bytes_out / 5,856 tokens_out; `oddkit_time` → 178 bytes_out / 71 tokens_out; the observed bytes-per-token ratios (~3.7 for the catalog payload, ~2.5 for the tiny time payload) are consistent with the bench's curve.
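  The smoke figures above can be sanity-checked with quick arithmetic (the ratio computed here is bytes per token, which for ASCII-heavy JSON tracks chars per token closely):

  ```typescript
  // Bytes-per-token from the fifth smoke session's observed values.
  const catalog = { bytesOut: 21_437, tokensOut: 5_856 }; // oddkit_catalog
  const time = { bytesOut: 178, tokensOut: 71 };          // oddkit_time

  const ratio = (o: { bytesOut: number; tokensOut: number }): number =>
    o.bytesOut / o.tokensOut;

  console.log(ratio(catalog).toFixed(2)); // "3.66"
  console.log(ratio(time).toFixed(2));    // "2.51"
  ```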

- **Telemetry write helpers in `workers/src/tokenize.ts` (PR #134)**. New `measurePayloadShape(requestText, responseText)` returns `PayloadShape` (the 4-field struct above) given two body strings. `countTokensSafe(text)` wraps the encoder in a try/catch and returns `null` on failure so the telemetry path never throws. The call site in `workers/src/index.ts` clones the response synchronously before `return response`, then reads + measures inside `ctx.waitUntil` — clone must be synchronous because the body is a one-shot stream that the runtime drains as soon as the handler returns.
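  A minimal sketch of what these two helpers might look like. The encoder is stubbed with a naive whitespace tokenizer here so the snippet is self-contained; the real `workers/src/tokenize.ts` lazy-loads `gpt-tokenizer/encoding/cl100k_base`, and everything else in this sketch is an assumption about shape, not the actual implementation:

  ```typescript
  interface PayloadShape {
    bytes_in: number;
    bytes_out: number;
    tokens_in: number;
    tokens_out: number;
  }

  // Stand-in for the lazy-loaded cl100k encoder: the real one returns an
  // array of token ids whose length is the count.
  const encode = (text: string): number[] =>
    text.length === 0 ? [] : text.split(/\s+/).map((_, i) => i);

  const utf8 = new TextEncoder();

  function countTokensSafe(text: string): number | null {
    try {
      return encode(text).length;
    } catch {
      return null; // the telemetry path must never throw
    }
  }

  function measurePayloadShape(requestText: string, responseText: string): PayloadShape {
    return {
      bytes_in: utf8.encode(requestText).length, // UTF-8 wire length, not char count
      bytes_out: utf8.encode(responseText).length,
      tokens_in: countTokensSafe(requestText) ?? 0,
      tokens_out: countTokensSafe(responseText) ?? 0,
    };
  }

  const shape = measurePayloadShape('{"method":"tools/call"}', "héllo");
  console.log(shape.bytes_out); // 6 — "é" is two bytes in UTF-8
  ```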

### Changed

- **No Content-Type filter on the response body (PR #134)**. The first iteration of payload-shape telemetry skipped any response whose Content-Type was not `application/json`, on the assumption that MCP responses would always be JSON. They are not — MCP's Streamable HTTP transport returns `text/event-stream` for tool calls, and the filter caused 100% of tool_call responses to record `bytes_out=0, tokens_out=0`. The filter was removed; the response body is now read regardless of Content-Type. SSE protocol overhead (~10 bytes per event) is negligible against the actual payload size, and oddkit's responses are bounded single-event streams that drain quickly. Telemetry is wrapped in a try/catch to preserve the non-breaking invariant for any future response that might fail to clone.
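  The overhead claim is easy to verify. A single SSE event wraps the JSON payload in a few bytes of framing; the sketch below assumes the minimal `data: …` frame (the actual transport may add `event:` or `id:` fields, each costing a few more bytes):

  ```typescript
  // UTF-8 wire size of a payload with and without minimal SSE framing.
  const utf8Len = (s: string): number => new TextEncoder().encode(s).length;

  const payload = JSON.stringify({ jsonrpc: "2.0", id: 1, result: { ok: true } });
  const framed = `data: ${payload}\n\n`; // "data: " prefix + blank-line terminator

  console.log(utf8Len(framed) - utf8Len(payload)); // 8 bytes of framing
  ```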

### Removed

- **`tokenize_ms` (formerly `double7`) — Workers runtime cannot measure it (PR #134)**. A previous iteration of the schema shipped a `tokenize_ms` field intended to capture the wall-clock cost of tokenization for bench-vs-prod comparison. Live smoke against the preview confirmed it always reads `0` in production. Cause is structural, not a bug: Cloudflare Workers freezes both `performance.now()` and `Date.now()` between network I/O events as a timing-side-channel mitigation (documented at `developers.cloudflare.com/workers/runtime-apis/web-standards/`). Tokenization is pure CPU work, so any sub-request timing of it from inside a Worker request handler is unmeasurable. The field was dropped from `PayloadShape`, the `writeDataPoint` doubles array, and the `telemetry-governance` canon doc. The bench at `workers/test/tokenize.test.mjs` characterized the cost curve once (cl100k handles 50 KB in ~1.3 ms on Node v22, the same V8 the Workers runtime uses); future per-call cost is predictable from observed `bytes_out` / `tokens_out` against that curve. See `klappy://canon/constraints/telemetry-governance` § "Why no tokenize_ms" for the published rationale.
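  Given the bench figure quoted above (~1.3 ms for 50 KB on Node v22) and assuming roughly linear scaling across the 200 B–50 KB sweep, per-call cost can be estimated from observed `bytes_out`. This is an extrapolation from the bench, not a measurement:

  ```typescript
  // ~1.3 ms per 50 KB, per the bench in workers/test/tokenize.test.mjs.
  const MS_PER_BYTE = 1.3 / 50_000;

  const predictTokenizeMs = (bytes: number): number => bytes * MS_PER_BYTE;

  // The oddkit_catalog smoke payload (21,437 bytes) predicts well under a millisecond.
  console.log(predictTokenizeMs(21_437).toFixed(3)); // "0.557"
  ```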

### Fixed

- **Root `package-lock.json` version drift back-fill (this PR)**. Pre-bump state showed root `package-lock.json` at `0.23.0` while `workers/package-lock.json` was at `0.23.1` — root drifted one release behind. Both lockfiles are now bumped to `0.24.0` (top-level `version` and `packages[""].version`). The pre-commit hook enforces sync between `package.json` and `workers/package.json`; both `package-lock.json` files still require manual sync per current tooling.
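  The invariant being back-filled can be sketched as a check over npm's lockfile shape (a top-level `version` plus a `packages[""]` entry). The lockfile objects below are hypothetical in-memory values, not reads from disk:

  ```typescript
  interface Lockfile {
    version: string;
    packages: Record<string, { version?: string }>;
  }

  // True when both of the lockfile's version fields match the expected release.
  const lockInSync = (lock: Lockfile, expected: string): boolean =>
    lock.version === expected && lock.packages[""]?.version === expected;

  const rootLock: Lockfile = { version: "0.24.0", packages: { "": { version: "0.24.0" } } };
  console.log(lockInSync(rootLock, "0.24.0")); // true
  ```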

### Refs

- PR (code): [klappy/oddkit#134](https://github.com/klappy/oddkit/pull/134)
- PR (canon): [klappy/klappy.dev#134](https://github.com/klappy/klappy.dev/pull/134) — telemetry-governance schema update, two new constraints (`measure-before-you-object`, `performed-prudence-anti-pattern`)
- Five Managed Agent smoke sessions (forensic record):
- `sesn_011CaMJdyWpUAm8n7YgRyLLG` — caught Content-Type filter dropping all SSE responses
- `sesn_011CaMKDLhT5zvUAUJ2HUvfW` — caught `clone()` inside `waitUntil` producing empty reader
- `sesn_011CaMLronGtL22J6R7fAPMs` — caught `performance.now()` frozen during synchronous CPU work
- `sesn_011CaMMf7tirAh2v5YoZHkxA` — caught `Date.now()` frozen too (both timers under deterministic-timing mitigation)
- `sesn_011CaMNujMg9pymcz18JFPp8` — **PASS** after dropping `tokenize_ms`; verified `bytes_in`/`bytes_out`/`tokens_in`/`tokens_out` populate with realistic varied values across tools
- Agent: `agent_011CaMJd8jvMj5CJMiQ11TdM`. Environment: `env_016RffZyqSdHeb5s3Z6UABw8`. Sonnet 4.6 throughout per `klappy://canon/constraints/release-validation-gate`.
- Canon basis: `klappy://canon/constraints/release-validation-gate`, `klappy://canon/constraints/telemetry-governance`, `klappy://canon/constraints/measure-before-you-object`, `klappy://canon/observations/performed-prudence-anti-pattern`.
- Tests: 7/7 unit (`workers/test/tokenize.test.mjs`), 6/6 integration (`workers/test/telemetry-integration.test.mjs`). Typecheck clean. Bench artifact at `workers/test/tokenize.test.mjs` (cl100k vs anthropic comparison, 200B–50KB sweep).
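The clone-timing failure caught in the second smoke session can be reproduced outside Workers with the standard fetch `Response` (global in Node 18+): once the body stream has been drained, `clone()` throws, but a clone taken synchronously beforehand still reads fine.

```typescript
// Clone before the body is disturbed, then drain the original.
const response = new Response("payload");
const clone = response.clone(); // must happen while the body is untouched

await response.text(); // simulates the runtime draining the one-shot stream
const clonedText = await clone.text();

console.log(response.bodyUsed); // true — calling clone() now would throw
console.log(clonedText);        // "payload" — the early clone is intact
```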

## [0.23.1] - 2026-04-21

### Fixed
4 changes: 2 additions & 2 deletions package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "oddkit",
"version": "0.23.1",
"version": "0.24.0",
"description": "Agent-first CLI for ODD-governed repos. Epistemic terrain rendering with portable baseline.",
"type": "module",
"bin": {
11 changes: 9 additions & 2 deletions workers/package-lock.json

Some generated files are not rendered by default.

5 changes: 3 additions & 2 deletions workers/package.json
@@ -1,6 +1,6 @@
{
"name": "oddkit-mcp-worker",
"version": "0.23.1",
"version": "0.24.0",
"private": true,
"type": "module",
"scripts": {
@@ -12,7 +12,8 @@
"dependencies": {
"agents": "^0.4.1",
"fflate": "^0.8.2",
"zod": "^4.3.6"
"zod": "^4.3.6",
"gpt-tokenizer": "^3.0.0"
},
"devDependencies": {
"@cloudflare/workers-types": "^4.20250124.0",
24 changes: 23 additions & 1 deletion workers/src/index.ts
@@ -958,14 +958,36 @@ export default {

// Phase 1 telemetry — non-blocking, fire-and-forget (E0008)
// Phase 1.5: cache_tier from tracer feeds blob9 (E0008.1)
// Phase 2: payload shape (bytes_in/out, tokens_in/out) feeds doubles 3–6.
// tokenize_ms was tried and dropped — Workers freezes both
// performance.now() and Date.now() during synchronous CPU work, making
// sub-request timing of pure-CPU tokenization unmeasurable. Response body is
// measured universally — MCP's Streamable HTTP transport returns SSE,
// not JSON, so a Content-Type filter would (and did) drop almost every
// response. The helper handles clone failures safely.
if (telemetryClone) {
const durationMs = Date.now() - startTime;
const cacheTier = tracer.indexSource;
// Clone the response synchronously before returning so the body is
// still available to read inside the deferred waitUntil callback.
const responseClone = response.clone();

Unprotected response.clone() can break MCP responses

Medium Severity

The response.clone() call sits outside any try/catch, while the ctx.waitUntil callback's catch block (line 991–993) explicitly upholds the invariant "Telemetry must never break MCP requests." If clone() throws (e.g., the SDK returns a response with an already-disturbed or locked body), the exception prevents return response from ever executing, turning a telemetry-only code path into a user-facing 500 error. The old code had no response.clone() at all, so this is a new risk. Moving the clone inside the existing try/catch (or wrapping it in its own) would preserve the stated safety guarantee.


Reviewed by Cursor Bugbot for commit d023ad6.


ctx.waitUntil(
(async () => {
try {
const requestText = await telemetryClone.text();

const { measurePayloadShape } = await import("./tokenize");
const { recordTelemetry } = await import("./telemetry");
await recordTelemetry(telemetryClone, env, durationMs, cacheTier);

let responseText = "";
try {
responseText = await responseClone.text();
} catch {
// Fall through with empty string; bytes_out / tokens_out will be 0.
}
const shape = await measurePayloadShape(requestText, responseText);
recordTelemetry(request, requestText, env, durationMs, cacheTier, shape);
} catch {
// Telemetry must never break MCP requests
}
145 changes: 98 additions & 47 deletions workers/src/telemetry.ts
Expand Up @@ -28,12 +28,34 @@
* handler's internal compute. Expect a long tail on
* cache-miss requests even for trivial actions like
* oddkit_time.
* double3: bytes_in — UTF-8 byte length of the JSON-RPC request body.
* 0 when telemetry was unable to read the body.
* Tokenizer-agnostic; exact wire size.
* double4: bytes_out — UTF-8 byte length of the response body, read from a
*                      synchronous clone (SSE responses included). 0 when
*                      the clone's body could not be read.
* double5: tokens_in — cl100k_base token count of the request body.
* See `tokenize.ts` for the tokenizer-choice rationale.
* 0 when tokenization was skipped or failed.
* double6: tokens_out — cl100k_base token count of the response body. 0 when
*                      the body could not be read or tokenization failed.
*
* NOTE: a previous iteration shipped a `double7: tokenize_ms` field intended
* to capture the wall-clock cost of tokenization for bench-vs-prod
* comparison. It is gone. Cloudflare Workers freezes both
* `performance.now()` and `Date.now()` between network I/O events as a
* timing-side-channel mitigation, so any timing of pure CPU work always
* reads 0 in production. The cost was characterized in the bench (workers/
* test/tokenize.test.mjs) and bytes_in/out + tokens_in/out are sufficient
* to predict per-call cost from that bench curve.
*
* index1: sampling_key — consumer label (for sampling consistency)
*
* See: klappy://canon/constraints/telemetry-governance
*/

import type { Env } from "./zip-baseline-fetcher";
import type { PayloadShape } from "./tokenize";
import pkg from "../package.json";

// Build-time fallback for blob8 (worker_version). env.ODDKIT_VERSION is
@@ -198,55 +220,84 @@ export function parseToolCall(payload: unknown): {
* Record one telemetry data point per JSON-RPC message.
* Non-blocking — uses env.ODDKIT_TELEMETRY.writeDataPoint() which requires
* no await (fire-and-forget via Analytics Engine).
* Called with a cloned request to avoid consuming the original body.
*
* Caller responsibilities:
* - Pass the raw request body as `requestBody` (string). Already-cloned and
* read; this function will parse it as JSON-RPC.
* - Pass the original `request` so consumer-label resolution can read URL
* params and headers.
* - Pass `shape` describing the payload byte and token shape, or null to
*   write zeros for the shape doubles (e.g. when the response body could
*   not be read).
*/
export function recordTelemetry(request: Request, env: Env, durationMs: number, cacheTier?: string): Promise<void> {
if (!env.ODDKIT_TELEMETRY) return Promise.resolve();

// Parse the request body to extract JSON-RPC details
return request
.json()
.then((body: unknown) => {
// Handle batch requests — process each message
const messages = Array.isArray(body) ? body : [body];

for (const payload of messages) {
const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
request,
payload,
);
const toolCall = parseToolCall(payload);

const msg =
typeof payload === "object" && payload !== null
? (payload as Record<string, unknown>)
: {};
const method = typeof msg.method === "string" ? msg.method : "unknown";

const eventType = toolCall ? "tool_call" : "mcp_request";
const toolName = toolCall?.toolName ?? "";
const documentUri = toolCall?.documentUri ?? "";

env.ODDKIT_TELEMETRY!.writeDataPoint({
blobs: [
eventType,
method,
toolName,
consumerLabel,
consumerSource,
toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
documentUri,
env.ODDKIT_VERSION || BUILD_VERSION,
cacheTier || "none", // blob9: E0008.1 x-ray cache tier
],
doubles: [1, durationMs],
indexes: [consumerLabel],
});
}
})
.catch(() => {
// Telemetry must never break MCP requests — silently drop parse failures
export function recordTelemetry(
request: Request,
requestBody: string,
env: Env,
durationMs: number,
cacheTier?: string,
shape?: PayloadShape | null,
): void {
if (!env.ODDKIT_TELEMETRY) return;

let body: unknown;
try {
body = JSON.parse(requestBody);
} catch {
// Malformed JSON-RPC — silently drop, telemetry must never break MCP requests
return;
}

// Handle batch requests — process each message
const messages = Array.isArray(body) ? body : [body];

// Bytes/tokens are per-request (not per-message); for batches we attribute
// the full payload shape to each message rather than fabricating a split.
const bytesIn = shape?.bytes_in ?? 0;
const bytesOut = shape?.bytes_out ?? 0;
const tokensIn = shape?.tokens_in ?? 0;
const tokensOut = shape?.tokens_out ?? 0;

for (const payload of messages) {
const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
request,
payload,
);
const toolCall = parseToolCall(payload);

const msg =
typeof payload === "object" && payload !== null
? (payload as Record<string, unknown>)
: {};
const method = typeof msg.method === "string" ? msg.method : "unknown";

const eventType = toolCall ? "tool_call" : "mcp_request";
const toolName = toolCall?.toolName ?? "";
const documentUri = toolCall?.documentUri ?? "";

env.ODDKIT_TELEMETRY!.writeDataPoint({
blobs: [
eventType,
method,
toolName,
consumerLabel,
consumerSource,
toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
documentUri,
env.ODDKIT_VERSION || BUILD_VERSION,
cacheTier || "none", // blob9: E0008.1 x-ray cache tier
],
doubles: [
1, // double1: count
durationMs, // double2: duration_ms
bytesIn, // double3: bytes_in
bytesOut, // double4: bytes_out
tokensIn, // double5: tokens_in
tokensOut, // double6: tokens_out
],
indexes: [consumerLabel],
});
}
}

// ──────────────────────────────────────────────────────────────────────────────