Conversation
…-tokenizer Adds payload-shape instrumentation to MCP telemetry. New doubles 3-7 capture wire size and cl100k_base token counts for every request and response, plus the wall-clock cost of tokenization itself.

Implementation:
- New module workers/src/tokenize.ts wraps gpt-tokenizer/encoding/cl100k_base with a lazy-loaded singleton encoder and a safe-failure surface (countTokensSafe, measurePayloadShape). A module-level promise caches the encoder across requests within a worker isolate; the cold path pays the parse once, and all subsequent calls are warm.
- Refactors the workers/src/telemetry.ts recordTelemetry signature to accept a pre-read body string plus an optional PayloadShape rather than reading the request body itself. The schema doc comment is expanded to describe doubles 3-7. Now synchronous (no longer returns a Promise), since the caller's measurement work happens in waitUntil.
- Updates the workers/src/index.ts call site: clones the response (when Content-Type is application/json), reads request and response bodies in the waitUntil background task, calls measurePayloadShape, then recordTelemetry. Zero user-facing latency added — measurement happens after the response is sent. SSE responses skip body measurement.

Tokenizer choice:
- gpt-tokenizer/encoding/cl100k_base over @anthropic-ai/tokenizer. Empirical bench (Node v22, same V8 as Workers): cl100k median 0.05-1.3ms across 200B-50KB payloads vs 0.30-7.4ms for the Anthropic WASM tokenizer. p95 is dramatically better (no WASM memory-grow spikes).
- Token counts diverge ~3-4% from the Claude tokenizer on English prose; an acceptable noise floor for shape analysis (we are not billing).
- Bundle delta measured empirically via esbuild: 432KB gzipped (993KB minified). Comfortably within paid-tier Workers limits.

Failure handling:
- Any tokenizer load or encode failure → countTokensSafe returns null, treated as 0 in telemetry. tokenize_ms = 0 alongside non-zero bytes signals a measurement skip in the data.
- Telemetry must never break MCP requests — all measurement code is wrapped in try/catch within the waitUntil block.

Tests:
- New workers/test/tokenize.test.mjs (8 cases, all pass): empty input, positive integer output, scaling with length, full PayloadShape contract, UTF-8 byte-length correctness, JSON-RPC payload tokenization, tokenize_ms finiteness, empty-response (SSE) skip path.
- Compiles tokenize.ts via tsc into a temp dir, then dynamic-imports it; this exercises the same TypeScript surface that ships in the worker bundle.
- npm run typecheck clean.

Methodology note:
- This change exists because three theoretical objections (bundle bloat, vodka violation, tokenizer-choice domain opinion) were falsified by a five-minute bench. See klappy://canon/constraints/measure-before-you-object and klappy://canon/observations/performed-prudence-anti-pattern (drafts pending merge into klappy.dev).
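The lazy-singleton and safe-failure behavior described above can be sketched as follows. This is a hypothetical illustration, not the shipped tokenize.ts: the real module loads gpt-tokenizer/encoding/cl100k_base, which is stubbed here with a whitespace splitter so the sketch stays self-contained.

```typescript
type Encode = (text: string) => number[];

// Module-level promise: the cold path pays the encoder load exactly once
// per worker isolate; every later call awaits the already-resolved promise.
let encoderPromise: Promise<Encode> | null = null;

function getEncoder(): Promise<Encode> {
  if (!encoderPromise) {
    // Stand-in for loading gpt-tokenizer/encoding/cl100k_base.
    encoderPromise = Promise.resolve((text: string) =>
      text.split(/\s+/).filter(Boolean).map((_, i) => i),
    );
  }
  return encoderPromise;
}

// Safe-failure surface: returns null (never throws) on any load/encode
// failure, and short-circuits to 0 on empty input without running the encoder.
async function countTokensSafe(text: string): Promise<number | null> {
  if (text === "") return 0;
  try {
    const encode = await getEncoder();
    return encode(text).length;
  } catch {
    return null;
  }
}
```

A null result maps to 0 in telemetry, which is what makes tokenize_ms = 0 alongside non-zero bytes a recognizable "measurement skipped" signature in the data.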
Mocks env.ODDKIT_TELEMETRY with a writeDataPoint capture, then exercises recordTelemetry + measurePayloadShape with realistic JSON-RPC payloads. Verifies end-to-end that the full PayloadShape lands in doubles 3-7, that bytes match the TextEncoder UTF-8 length, that batch JSON-RPC produces one point per message, and that malformed input is silently dropped. 7/7 cases pass.

Notable: the realistic ~8KB response measured tokenize_ms=0.948ms — within 14% of the bench prediction (~1.1ms median for 8KB on Node). The dream-home walkthrough was accurate; real prod will differ, but the order of magnitude is locked.

Compiles tokenize.ts + telemetry.ts via tsc into a temp dir, post-patches the JSON import to add Node 22's required import-attribute syntax, then dynamic-imports. Same code path that ships in the worker bundle. This is the verification that wrangler dev would have done if workerd ran in this nested sandbox (it doesn't — workerd dies after declaring ready, likely a Linux capability issue with the container).
Two assertions that would have failed against the pre-fix code:
1. The SSE-response test now asserts tokenize_ms=0 (previously it only checked bytes_out/tokens_out, missing the spurious non-zero tokenize_ms that the original logic would record on every SSE response).
2. A new test, 'Bugbot invariant: tokenize_ms is 0 only when encoder did not actually run', explicitly covers the both-empty case (must be 0) and the request-only case (must be a valid finite number).

Both new assertions verify Bugbot's distinction: a 0 from countTokensSafe on empty input is a trivial short-circuit, not a real tokenization. Only non-null results on non-empty input prove the encoder ran. The pre-fix code conflated these and would have polluted the bench-vs-prod A/B comparison with spurious tokenize_ms readings on SSE traffic.

Real-world tokenize_ms on the realistic 8KB integration test: 1.016ms (bench predicted 1.1ms — within 8%). 8/8 cases passing.
… JSON
CRITICAL FIX. A managed-agent smoke test against the preview deployment
caught that doubles 4 (bytes_out), 6 (tokens_out), and 7 (tokenize_ms)
were all zero across every recorded data point. Six telemetry rows
queried, six rows with bytes_out=0.
Root cause: the call site in workers/src/index.ts filtered the response
clone by Content-Type, only cloning when the type included
'application/json'. MCP's Streamable HTTP transport returns
'text/event-stream' (SSE) for tool calls, not JSON. The filter was
silently dropping almost every response, leaving responseClone null and
recording zeros for the entire response side.
This was the same performed-prudence pattern the new canon docs warn
about, applied in miniature: I assumed MCP responses would be JSON without
measuring what the SDK actually returns. The smoke test caught it
because canon also prescribes verification before declaring done.
Fix:
1. New helper measureResponseShape(requestText, response) in tokenize.ts.
Clones the response, reads the body, runs measurePayloadShape. No
Content-Type filter — read everything. SSE protocol overhead (~10
bytes per event) is negligible against the actual payload size, and
oddkit's responses are bounded (no long-lived streams).
2. Call site in index.ts simplified to use the helper. Drops the
filter, drops the separate clone, drops the responseClone variable.
Cleaner code AND correct behavior.
3. Four new unit tests for measureResponseShape:
- measures application/json responses
- measures text/event-stream responses (this would have caught the
bug pre-merge)
- leaves the original response body intact (clone correctness)
- handles already-consumed body without throwing
12/12 unit tests pass, typecheck clean.
Methodology note: this fix exists because the smoke test (live MCP
calls + telemetry_public SQL) caught what unit tests missed. The
canon-prescribed verification gate worked exactly as designed —
release-validation-gate (E0008.3) at klappy://canon/constraints/release-validation-gate
mandates independent live smoke for load-bearing surface changes
before merge. The agent dispatch is that smoke.
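The helper described in the fix can be sketched as below. This is a hedged sketch, not the shipped code: PayloadShape's exact fields and measurePayloadShape's signature are assumptions from this log, and the tokenizer is stubbed with a whitespace count so the sketch is self-contained.

```typescript
interface PayloadShape {
  bytes_in: number;
  bytes_out: number;
  tokens_in: number;
  tokens_out: number;
}

// Stub for the real cl100k token count; a whitespace split keeps this runnable.
function countTokens(text: string): number {
  return text ? text.split(/\s+/).filter(Boolean).length : 0;
}

function measurePayloadShape(requestText: string, responseText: string): PayloadShape {
  const enc = new TextEncoder(); // UTF-8 byte lengths, not JS string lengths
  return {
    bytes_in: enc.encode(requestText).length,
    bytes_out: enc.encode(responseText).length,
    tokens_in: countTokens(requestText),
    tokens_out: countTokens(responseText),
  };
}

async function measureResponseShape(
  requestText: string,
  response: Response,
): Promise<PayloadShape | null> {
  try {
    // No Content-Type filter: SSE framing overhead is negligible and
    // responses are bounded, so read everything via a clone.
    const responseText = await response.clone().text();
    return measurePayloadShape(requestText, responseText);
  } catch {
    // Unclonable or already-consumed body: skip measurement, never throw.
    return null;
  }
}
```

Cloning before reading is what keeps the original response body intact for the client, which is exactly what the clone-correctness unit test asserts.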
…Workers Third smoke confirmed bytes_in/out and tokens_in/out now populate correctly (357-21319 bytes_out, 142-5398 tokens_out across varied payloads). But double7 (tokenize_ms) is still 0 across every row.

Root cause: Cloudflare Workers' performance.now() is a deterministic timer — it does NOT advance during synchronous CPU work. The mitigation prevents timing-side-channel attacks; the timer only ticks on I/O. Tokenization (countTokensSafe) is pure CPU work. The encoder runs between two reads of performance.now() with no I/O in between, so both reads return the same value and tokenize_ms is always 0. Tests passed in Node because Node's performance.now() is a real high-resolution timer.

Fix: switch to Date.now(). It always advances, at 1ms resolution. The bench-vs-prod comparison loses sub-millisecond precision (sub-ms tokenizations round to 0) but gains a working signal for any payload above ~5KB, where bench timing exceeded 1ms. Updated the telemetry.ts schema doc comment to document the 1ms resolution and the Workers-specific reason.

Methodology: this is the third Cloudflare Workers gotcha caught in prod that unit tests can't catch — Workers Runtime != Node:
1. b94aaa6 (mine): assumed MCP responses are application/json (they're SSE)
2. 1a555df (mine): assumed clone() inside waitUntil works (body already drained)
3. THIS: assumed performance.now() advances in synchronous code (it doesn't)

Each was caught by the live Managed Agent smoke + telemetry_public SQL, not by typecheck or unit tests. The release-validation-gate is the only thing standing between this branch and a quietly broken prod telemetry pipeline.

8 unit tests still pass. Typecheck clean.
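The failure mode reduces to the timing pattern below. This is a hypothetical helper (timeSyncWork is not code from this branch) used to illustrate the runtime difference: in Node the second performance.now() read advances, so elapsed is positive; in the Workers runtime both reads return the same value for pure CPU work, so elapsed is always 0.

```typescript
// Two timer reads around synchronous CPU work, no I/O in between.
function timeSyncWork<T>(work: () => T): { result: T; elapsed: number } {
  const start = performance.now();
  const result = work(); // pure CPU: in Workers, the clock does not tick here
  const elapsed = performance.now() - start;
  return { result, elapsed };
}

// Spin some CPU so Node shows a measurable (non-zero) duration.
const { result, elapsed } = timeSyncWork(() => {
  let acc = 0;
  for (let i = 0; i < 5_000_000; i++) acc += i % 7;
  return acc;
});
console.log(result, `${elapsed.toFixed(3)}ms`);
```

The same shape of code, deployed to Workers, records elapsed = 0 on every request, which is exactly what the smoke's telemetry rows showed.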
Fourth smoke confirmed bytes_in/out and tokens_in/out work in production (357-21319 bytes_out, 142-5398 tokens_out across varied payload sizes). But tokenize_ms remained 0 across every row even with the Date.now() fix from 279f761.

Root cause discovered by the agent: Cloudflare Workers freezes BOTH performance.now() AND Date.now() during synchronous CPU work. Both timers only advance on network I/O events, as a side-channel mitigation (documented at developers.cloudflare.com/workers/runtime-apis/web-standards/). Tokenization is pure CPU work, so any sub-request timing of it always reads 0 in production. This is a structural runtime constraint, not a bug we can patch.

Workarounds considered and rejected:
- Force artificial I/O between reads (KV.list, fetch) — adds real latency to telemetry-only paths, grotesque
- Two writeDataPoint calls with start/end timestamps — over-engineered, doubles the write count, complicates queries
- Keep the column as always-0 — actively misleading

Decision: drop tokenize_ms entirely from PayloadShape, the doubles array, the schema doc, and the tests. The bench at workers/test/tokenize.test.mjs already characterized the cost curve (cl100k handles 50KB in ~1.3ms on Node v22). bytes_out + tokens_out are sufficient signal — a future maintainer can predict tokenize_ms from the bench curve given the observed payload sizes.

Schema before:
doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out, tokenize_ms] // 7 fields
Schema after:
doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out] // 6 fields

A companion canon update at klappy/klappy.dev is coming in the next commit on that branch — it drops the tokenize_ms row from the doubles table and removes the tokenize_ms mention in 'What This Enables'.

Methodology: this is the fourth Workers Runtime != Node behavioral diff caught by live smoke on this branch. Each was unmeasurable from unit tests because Node behaves differently:
1. b94aaa6 (mine, broken): Content-Type filter (MCP returns SSE)
2. 1a555df (mine, broken): clone in waitUntil (body already drained)
3. 279f761 (mine, broken): Date.now() in Workers (frozen too)
4. THIS: drop the unmeasurable column entirely

The release-validation-gate canon doc is the only thing that surfaced each of these — the live preview smoke + telemetry_public SQL caught what no test setup I could ship would have caught. The Workers-runtime gap was real and the gate worked.

Tests:
- 7/7 unit tests pass (workers/test/tokenize.test.mjs)
- 6/6 integration tests pass (workers/test/telemetry-integration.test.mjs)
- typecheck clean
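The post-decision doubles layout can be sketched as a small builder. This is a hedged sketch: buildDoubles is a hypothetical helper name, and the field order is taken from the "Schema after" line in this log.

```typescript
interface PayloadShape {
  bytes_in: number;
  bytes_out: number;
  tokens_in: number;
  tokens_out: number;
}

// doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out]
// tokenize_ms (former double7) is gone: unmeasurable under Workers' frozen clocks.
function buildDoubles(durationMs: number, shape: PayloadShape): number[] {
  return [
    1, // count: one data point per message
    durationMs,
    shape.bytes_in,
    shape.bytes_out,
    shape.tokens_in,
    shape.tokens_out,
  ];
}
```

A consumer predicting tokenization cost would combine bytes_out from this array with the bench curve rather than reading a stored timing.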
Minor version bump for payload-shape telemetry (PR #134).

Bumps:
- package.json 0.23.1 -> 0.24.0
- workers/package.json 0.23.1 -> 0.24.0
- package-lock.json 0.23.0 -> 0.24.0 (root had drifted one release behind)
- workers/package-lock.json 0.23.1 -> 0.24.0

CHANGELOG.md gains the [0.24.0] entry above [0.23.1], documenting:
- Added: bytes_in/out, tokens_in/out telemetry doubles + helpers
- Changed: drop the Content-Type filter (MCP responses are SSE)
- Removed: tokenize_ms — Workers freezes both performance.now() and Date.now()
- Fixed: root package-lock.json version-drift back-fill

The four Workers Runtime != Node behavioral diffs caught by the five Managed Agent smoke sessions on this branch are listed in the Refs trailer for the forensic record.

Tests: 7/7 unit + 6/6 integration pass on the bumped state. Typecheck clean (reports as oddkit-mcp-worker@0.24.0).

Per workflow: a dedicated chore/release-x.y.z PR. The branch is off feat/telemetry-tokenization HEAD, so it carries the feature commits + the bump together. After merge, feat/telemetry-tokenization can be closed (its commits are already in main via this release branch).
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! (View logs) | oddkit | e2912f3 | Commit Preview URL · Branch Preview URL | Apr 26 2026, 03:04 AM |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Unprotected `response.clone()` can crash MCP handler
- Wrapped the synchronous `response.clone()` call in a try/catch so a clone failure falls back to skipping telemetry and lets `return response` execute, upholding the invariant that telemetry must never break MCP requests.
Preview (be41e0c37f)
diff --git a/workers/src/index.ts b/workers/src/index.ts
--- a/workers/src/index.ts
+++ b/workers/src/index.ts
@@ -970,29 +970,37 @@
const cacheTier = tracer.indexSource;
// Clone the response synchronously before returning so the body is
// still available to read inside the deferred waitUntil callback.
- const responseClone = response.clone();
+ // Guard the clone itself — telemetry must never break MCP requests.
+ let responseClone: Response | null = null;
+ try {
+ responseClone = response.clone();
+ } catch {
+ // If cloning fails, skip telemetry for this request.
+ }
- ctx.waitUntil(
- (async () => {
- try {
- const requestText = await telemetryClone.text();
+ if (responseClone) {
+ ctx.waitUntil(
+ (async () => {
+ try {
+ const requestText = await telemetryClone.text();
- const { measurePayloadShape } = await import("./tokenize");
- const { recordTelemetry } = await import("./telemetry");
+ const { measurePayloadShape } = await import("./tokenize");
+ const { recordTelemetry } = await import("./telemetry");
- let responseText = "";
- try {
- responseText = await responseClone.text();
+ let responseText = "";
+ try {
+ responseText = await responseClone!.text();
+ } catch {
+ // Fall through with empty string; bytes_out / tokens_out will be 0.
+ }
+ const shape = await measurePayloadShape(requestText, responseText);
+ recordTelemetry(request, requestText, env, durationMs, cacheTier, shape);
} catch {
- // Fall through with empty string; bytes_out / tokens_out will be 0.
+ // Telemetry must never break MCP requests
}
- const shape = await measurePayloadShape(requestText, responseText);
- recordTelemetry(request, requestText, env, durationMs, cacheTier, shape);
- } catch {
- // Telemetry must never break MCP requests
- }
- })(),
- );
+ })(),
+ );
+ }
}
return response;
…ble* slots (#137) Consumers of telemetry_public now write SQL using semantic names only. Raw slot names (blob1-blob9, double1-double6) are hidden from the consumer interface entirely.

- Raw slot names in queries are rejected with a helpful error naming the semantic equivalent
- Semantic names are transparently rewritten to raw slots before forwarding to Cloudflare Analytics Engine
- Result columns are rewritten back from raw slots to semantic names before returning
- The tool docstring shows semantic names only

The schema mapping is sourced from canon at runtime via KnowledgeBaseFetcher.getFile(canon/constraints/telemetry-governance.md), with a hardcoded baseline as a safety net per the vodka architecture. Per maintainer directive: no deprecation period; nobody uses the raw names yet.

Phase 1 audit findings (telemetry-governance compliance):
- bytes_in: populated (sum=41,545 over 7 days)
- bytes_out: populated (sum=1,316,480)
- tokens_in: populated (sum=12,602)
- tokens_out: populated (sum=325,627)
- cache_tier: always "none" — surfaced as a separate open issue

Bug fixes from Cursor Bugbot review:
- count() SQL aggregate collision (negative-lookahead guard)
- String-literal corruption in rewrite (literal-skip splitting)
- Detection inconsistency vs rewrite (literal strip in detection)

Test coverage: 17/17 passing, including 3 dedicated regression tests, one per Bugbot fix family.

Open issue surfaced for follow-up: cache_tier always reads "none" in production — recordTelemetry receives the parameter correctly; something upstream is always passing "none".
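The reject/rewrite pair described above can be sketched as below. This is a hypothetical illustration: the mapping shown is a minimal stand-in for the canon-sourced schema (only the four payload-shape doubles), and the function names mirror but are not guaranteed to match the shipped telemetry.ts.

```typescript
// Minimal stand-in for the canon-derived semantic -> raw slot mapping.
const SEMANTIC_TO_RAW: Record<string, string> = {
  bytes_in: "double3",
  bytes_out: "double4",
  tokens_in: "double5",
  tokens_out: "double6",
};

const RAW_SLOT_PATTERN = /\b(blob[1-9]|double[1-9])\b/i;

// Reject raw slot names, pointing the consumer at the semantic equivalent.
function rejectRawSlots(sql: string): string | null {
  const m = sql.match(RAW_SLOT_PATTERN);
  if (!m) return null;
  const hit = m[1].toLowerCase();
  const semantic = Object.entries(SEMANTIC_TO_RAW).find(([, raw]) => raw === hit)?.[0];
  return `raw slot ${m[1]} is not part of the public schema` +
    (semantic ? `; use ${semantic} instead` : "");
}

// Rewrite semantic names to raw slots before forwarding to Analytics Engine.
function rewriteSqlToRaw(sql: string): string {
  let out = sql;
  for (const [semantic, raw] of Object.entries(SEMANTIC_TO_RAW)) {
    // \b anchors avoid partial matches; the negative lookahead (?!\s*\()
    // skips function-call positions such as count(...).
    out = out.replace(new RegExp(`\\b${semantic}\\b(?!\\s*\\()`, "g"), raw);
  }
  return out;
}
```

The inverse direction (raw result columns back to semantic names) is the same loop run over the mapping's entries in reverse.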
…nsumption (#138) The MCP handler from agents/mcp returns a streaming Response. `await handler(request, env, ctx)` resolves with the Response object before the tool handler closure has finished populating the tracer. Reading `tracer.indexSource` immediately after that await yields "none" because the "index" / "index-build" span has not been recorded yet.

This caused EVERY tool call in production telemetry to record cache_tier="none" across the entire E0008.1 lifetime of the field — including oddkit_search and oddkit_orient, which definitely call getIndex().

Fix: move the cacheTier read INSIDE the ctx.waitUntil callback, AFTER `await responseClone.text()` resolves. Reading the response body to completion forces the streaming tool handler to have finished, which means the tracer is fully populated.

Diagnostic evidence (same session that found the bug):
- oddkit_search → response.debug.trace.index_source = "cache" (correct — read inside handleUnifiedAction at orchestrate.ts:2806, AFTER the action ran)
- telemetry blob9 = "none" (incorrect — read at index.ts:981, too early)

Same tracer instance, two read sites, two different values. Proves the race.

Regression test added: uses setImmediate to model a streaming tool handler that hasn't yet recorded its index access at the moment the outer handler's await resolves. Asserts both that the OLD-pattern bug reproduces and that the FIXED pattern recovers the value. Test count: 17 → 18 passing.

Open issue (separate PR): getFile emits `file:${path}` spans, not `index` / `index-build`. The tracer's _indexSource setter only recognizes the latter two labels. Whether oddkit_get's blob9 should track the index tier (current behavior post-fix), the file tier (new span label needed), or both is a follow-up scope decision.
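The race can be modeled in a few lines. This is a hedged sketch, not the shipped code: the tracer and the streaming handler are stubs (the real handler comes from agents/mcp), but the two read sites and their differing values reproduce the bug's shape.

```typescript
class Tracer {
  indexSource = "none";
}

// Model a streaming tool handler: the Response is handed back immediately,
// but the tracer is only populated when the body is actually produced.
function makeStreamingResponse(tracer: Tracer): Response {
  const body = new ReadableStream<Uint8Array>({
    start(controller) {
      setTimeout(() => {
        tracer.indexSource = "cache"; // span recorded after the outer await
        controller.enqueue(new TextEncoder().encode("data: {}\n\n"));
        controller.close();
      }, 0);
    },
  });
  return new Response(body);
}

const tracer = new Tracer();
const response = makeStreamingResponse(tracer);

// OLD pattern: read right after "await handler(...)" resolves — races.
const tooEarly = tracer.indexSource;

// FIXED pattern: drain the (cloned) body first; this forces the streaming
// handler to finish, so the tracer is fully populated before the read.
await response.clone().text();
const afterDrain = tracer.indexSource;
```

Same tracer instance, two read sites: tooEarly is "none" while afterDrain is "cache", mirroring the diagnostic evidence above.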
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Canon-derived names used in regex without escaping
- Added an escapeRegExp helper and used it on the semantic name when building the rewrite regex in rewriteSqlToRaw, so canon-derived names containing regex metacharacters can no longer corrupt the pattern.
Preview (838d97cd27)
diff --git a/workers/src/telemetry.ts b/workers/src/telemetry.ts
--- a/workers/src/telemetry.ts
+++ b/workers/src/telemetry.ts
@@ -456,6 +456,16 @@
const RAW_SLOT_PATTERN = /\b(blob[1-9]|double[1-9])\b/gi;
/**
+ * Escape regex metacharacters in a string so it can be safely interpolated
+ * into a `new RegExp(...)` pattern. Used for canon-derived semantic names,
+ * which are sourced from a maintainer-controlled markdown table but could
+ * in principle contain characters like `.` that have special regex meaning.
+ */
+function escapeRegExp(s: string): string {
+ return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+}
+
+/**
* Reject queries that contain raw slot names (blob1..9 / double1..6).
* Returns an error message string, or null if the query is clean.
*
@@ -522,7 +532,10 @@
for (const [semantic, raw] of entries) {
// \b word-boundary anchors prevent partial matches inside longer identifiers.
// Negative lookahead (?!\s*\() skips function-call positions (e.g. count(*)).
- const pattern = new RegExp(`\\b${semantic}\\b(?!\\s*\\()`, "g");
+ const pattern = new RegExp(
+ `\\b${escapeRegExp(semantic)}\\b(?!\\s*\\()`,
+ "g",
+ );
out = out.replace(pattern, raw);
}
return out;
Reviewed by Cursor Bugbot for commit 838d97c. Configure here.
// \b word-boundary anchors prevent partial matches inside longer identifiers.
// Negative lookahead (?!\s*\() skips function-call positions (e.g. count(*)).
const pattern = new RegExp(`\\b${semantic}\\b(?!\\s*\\()`, "g");
out = out.replace(pattern, raw);
Canon-derived names used in regex without escaping
Low Severity
In rewriteSqlToRaw, semantic names are interpolated directly into new RegExp() without escaping regex metacharacters. The baseline names are safe (only [a-z_]), but parseDoublesFromCanon accepts backtick-quoted identifiers containing arbitrary characters (e.g., ., (, +). If canon ever provides such a name, it would corrupt the regex and produce incorrect or broken SQL rewriting.
…th (#139) Follow-up to #138. The streaming-race fix corrected the timing bug, but oddkit_get for klappy:// URIs still recorded cache_tier="none" because of a separate defect: the tracer's _indexSource setter only recognized the 'index' and 'index-build' labels. runGet for klappy:// URIs takes a fast path that skips getIndex entirely and calls getFile directly. The fetcher emits 'file:${path}' spans, which the setter ignored. ~95% of oddkit_get calls hit this path, so even after #138, ~95% of get calls had cache_tier='none'.

Fix: extend the setter to also recognize 'file:*' labels alongside 'index' / 'index-build'. The first-wins guard is preserved:
- runSearch: 'index' fires first → index tier wins
- runGet (klappy://): only 'file:*' fires → file tier wins
- runGet (kb://, odd://): 'index' fires first → index tier wins (~5%)
- 'file-r2:*' (R2 miss with source='miss') excluded by the guard

Internal field/getter names stay unchanged. The public envelope key 'index_source' in tracer.toJSON() is part of the response contract. Doc comment updated to reflect the broader semantics.

4 regression tests added, covering: file:* recognition, index-wins preserved, miss-spans excluded, original behavior unchanged. Test count: 18 → 22 passing.

This completes the cache_tier dashboard: every tool with a real data fetch now records its actual tier.
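The extended first-wins setter can be sketched as follows. This is a hypothetical reconstruction from the bullets above; internal names (recordSpan, _indexSource) are assumptions, and only the label-recognition logic is taken from this log.

```typescript
class Tracer {
  private _indexSource: string | null = null;

  recordSpan(label: string, source: string): void {
    // First-wins guard: once a tier is recorded, later spans cannot overwrite it.
    if (this._indexSource !== null) return;
    const isIndex = label === "index" || label === "index-build";
    // 'file:*' now counts as a primary fetch. Note 'file-r2:*' does not match
    // the 'file:' prefix, and miss-spans are additionally excluded by source.
    const isFile = label.startsWith("file:") && source !== "miss";
    if (isIndex || isFile) this._indexSource = source;
  }

  get indexSource(): string {
    return this._indexSource ?? "none";
  }
}
```

Under this sketch, runSearch's early 'index' span still wins over any later 'file:*' span, while a klappy:// fast-path get (which emits only 'file:*') records the file tier instead of "none".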
Note
Medium Risk
Adds new telemetry fields and rewrites the telemetry query path (semantic-to-raw SQL translation plus result rewriting), which could affect observability accuracy and query behavior if parsing/rewrite edge cases slip through.
Overview
Adds payload-shape telemetry to the Worker by measuring request/response `bytes_*` and `tokens_*` (via lazy-loaded `gpt-tokenizer` cl100k) and writing them as `double3`–`double6` in `oddkit_telemetry`, with response cloning + measurement deferred in `ctx.waitUntil` (and no Content-Type filtering).

Refactors telemetry recording to accept the already-read request body and payload shape, fixes a streaming race by reading `tracer.indexSource` only after response-body consumption, and expands cache-tier detection to treat `file:*` spans as primary fetches.

Upgrades `telemetry_public` querying to a semantic schema contract: loads a canon-derived mapping (with a baseline fallback), rejects raw `blobN`/`doubleN` references, rewrites consumer SQL to raw slots for Cloudflare, and rewrites result column names back to semantic. Adds new unit/integration tests for tokenization, the telemetry write path, and SQL rewriting, and bumps root/worker versions to `0.24.0` (plus adds the `gpt-tokenizer` dependency).

Reviewed by Cursor Bugbot for commit e2912f3. Bugbot is set up for automated code reviews on this repo.