34 changes: 34 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,40 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.24.0] - 2026-04-23

### Added

- **Payload-shape telemetry: `bytes_in`, `bytes_out`, `tokens_in`, `tokens_out` (PR #134)**. Four new doubles in the `oddkit_telemetry` Analytics Engine schema (`double3`–`double6`), measured per MCP request and written from a fire-and-forget `waitUntil` callback so user-facing latency is unchanged. Bytes are UTF-8 wire length via `TextEncoder`; tokens are cl100k_base counts via `gpt-tokenizer/encoding/cl100k_base` (chosen over `@anthropic-ai/tokenizer` after a 5-minute Node bench: ~6× faster median, dramatically better p95, ~432 KB gzipped via subpath import — see `workers/test/tokenize.test.mjs`). The doubles array lands at 6, down from 7 in a pre-release iteration that briefly carried `tokenize_ms` (full array: `[count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out]`). The tokenizer is a module-level singleton, lazy-loaded via dynamic import and cached across requests within an isolate: the cold call parses the encoder once; warm calls cost sub-milliseconds on Node, the same V8 the Workers runtime uses. Bench-vs-prod comparison was validated via the fifth Managed Agent smoke at session `sesn_011CaMNujMg9pymcz18JFPp8` (`tokenization-smoke-managed` consumer label): `oddkit_catalog` → 21,437 bytes_out / 5,856 tokens_out; `oddkit_time` → 178 bytes_out / 71 tokens_out; the observed bytes-per-token ratios (~3.7 for the catalog payload, ~2.5 for the tiny time payload) are consistent with the bench's curve.
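  The smoke figures above can be sanity-checked with quick arithmetic (the ratio computed here is bytes per token, which for ASCII-heavy JSON tracks chars per token closely):

  ```typescript
  // Bytes-per-token from the fifth smoke session's observed values.
  const catalog = { bytesOut: 21_437, tokensOut: 5_856 }; // oddkit_catalog
  const time = { bytesOut: 178, tokensOut: 71 };          // oddkit_time

  const ratio = (o: { bytesOut: number; tokensOut: number }): number =>
    o.bytesOut / o.tokensOut;

  console.log(ratio(catalog).toFixed(2)); // "3.66"
  console.log(ratio(time).toFixed(2));    // "2.51"
  ```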

- **Telemetry write helpers in `workers/src/tokenize.ts` (PR #134)**. New `measurePayloadShape(requestText, responseText)` returns `PayloadShape` (the 4-field struct above) given two body strings. `countTokensSafe(text)` wraps the encoder in a try/catch and returns `null` on failure so the telemetry path never throws. The call site in `workers/src/index.ts` clones the response synchronously before `return response`, then reads + measures inside `ctx.waitUntil` — clone must be synchronous because the body is a one-shot stream that the runtime drains as soon as the handler returns.
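  A minimal sketch of what these two helpers might look like. The encoder is stubbed with a naive whitespace tokenizer here so the snippet is self-contained; the real `workers/src/tokenize.ts` lazy-loads `gpt-tokenizer/encoding/cl100k_base`, and everything else in this sketch is an assumption about shape, not the actual implementation:

  ```typescript
  interface PayloadShape {
    bytes_in: number;
    bytes_out: number;
    tokens_in: number;
    tokens_out: number;
  }

  // Stand-in for the lazy-loaded cl100k encoder: the real one returns an
  // array of token ids whose length is the count.
  const encode = (text: string): number[] =>
    text.length === 0 ? [] : text.split(/\s+/).map((_, i) => i);

  const utf8 = new TextEncoder();

  function countTokensSafe(text: string): number | null {
    try {
      return encode(text).length;
    } catch {
      return null; // the telemetry path must never throw
    }
  }

  function measurePayloadShape(requestText: string, responseText: string): PayloadShape {
    return {
      bytes_in: utf8.encode(requestText).length, // UTF-8 wire length, not char count
      bytes_out: utf8.encode(responseText).length,
      tokens_in: countTokensSafe(requestText) ?? 0,
      tokens_out: countTokensSafe(responseText) ?? 0,
    };
  }

  const shape = measurePayloadShape('{"method":"tools/call"}', "héllo");
  console.log(shape.bytes_out); // 6 — "é" is two bytes in UTF-8
  ```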

### Changed

- **No Content-Type filter on the response body (PR #134)**. The first iteration of payload-shape telemetry skipped any response whose Content-Type was not `application/json`, on the assumption that MCP responses would always be JSON. They are not — MCP's Streamable HTTP transport returns `text/event-stream` for tool calls, and the filter caused 100% of tool_call responses to record `bytes_out=0, tokens_out=0`. The filter was removed; the response body is now read regardless of Content-Type. SSE protocol overhead (~10 bytes per event) is negligible against the actual payload size, and oddkit's responses are bounded single-event streams that drain quickly. Telemetry is wrapped in a try/catch to preserve the non-breaking invariant for any future response that might fail to clone.
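  The overhead claim is easy to verify. A single SSE event wraps the JSON payload in a few bytes of framing; the sketch below assumes the minimal `data: …` frame (the actual transport may add `event:` or `id:` fields, each costing a few more bytes):

  ```typescript
  // UTF-8 wire size of a payload with and without minimal SSE framing.
  const utf8Len = (s: string): number => new TextEncoder().encode(s).length;

  const payload = JSON.stringify({ jsonrpc: "2.0", id: 1, result: { ok: true } });
  const framed = `data: ${payload}\n\n`; // "data: " prefix + blank-line terminator

  console.log(utf8Len(framed) - utf8Len(payload)); // 8 bytes of framing
  ```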

### Removed

- **`tokenize_ms` (formerly `double7`) — Workers runtime cannot measure it (PR #134)**. A previous iteration of the schema shipped a `tokenize_ms` field intended to capture the wall-clock cost of tokenization for bench-vs-prod comparison. Live smoke against the preview confirmed it always reads `0` in production. Cause is structural, not a bug: Cloudflare Workers freezes both `performance.now()` and `Date.now()` between network I/O events as a timing-side-channel mitigation (documented at `developers.cloudflare.com/workers/runtime-apis/web-standards/`). Tokenization is pure CPU work, so any sub-request timing of it from inside a Worker request handler is unmeasurable. The field was dropped from `PayloadShape`, the `writeDataPoint` doubles array, and the `telemetry-governance` canon doc. The bench at `workers/test/tokenize.test.mjs` characterized the cost curve once (cl100k handles 50 KB in ~1.3 ms on Node v22, the same V8 the Workers runtime uses); future per-call cost is predictable from observed `bytes_out` / `tokens_out` against that curve. See `klappy://canon/constraints/telemetry-governance` § "Why no tokenize_ms" for the published rationale.
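  Given the bench figure quoted above (~1.3 ms for 50 KB on Node v22) and assuming roughly linear scaling across the 200 B–50 KB sweep, per-call cost can be estimated from observed `bytes_out`. This is an extrapolation from the bench, not a measurement:

  ```typescript
  // ~1.3 ms per 50 KB, per the bench in workers/test/tokenize.test.mjs.
  const MS_PER_BYTE = 1.3 / 50_000;

  const predictTokenizeMs = (bytes: number): number => bytes * MS_PER_BYTE;

  // The oddkit_catalog smoke payload (21,437 bytes) predicts well under a millisecond.
  console.log(predictTokenizeMs(21_437).toFixed(3)); // "0.557"
  ```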

### Fixed

- **Root `package-lock.json` version drift back-fill (this PR)**. Pre-bump state showed root `package-lock.json` at `0.23.0` while `workers/package-lock.json` was at `0.23.1` — root drifted one release behind. Both lockfiles are now bumped to `0.24.0` (top-level `version` and `packages[""].version`). The pre-commit hook enforces sync between `package.json` and `workers/package.json`; both `package-lock.json` files still require manual sync per current tooling.
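  The invariant being back-filled can be sketched as a check over npm's lockfile shape (a top-level `version` plus a `packages[""]` entry). The lockfile objects below are hypothetical in-memory values, not reads from disk:

  ```typescript
  interface Lockfile {
    version: string;
    packages: Record<string, { version?: string }>;
  }

  // True when both of the lockfile's version fields match the expected release.
  const lockInSync = (lock: Lockfile, expected: string): boolean =>
    lock.version === expected && lock.packages[""]?.version === expected;

  const rootLock: Lockfile = { version: "0.24.0", packages: { "": { version: "0.24.0" } } };
  console.log(lockInSync(rootLock, "0.24.0")); // true
  ```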

### Refs

- PR (code): [klappy/oddkit#134](https://github.com/klappy/oddkit/pull/134)
- PR (canon): [klappy/klappy.dev#134](https://github.com/klappy/klappy.dev/pull/134) — telemetry-governance schema update, two new constraints (`measure-before-you-object`, `performed-prudence-anti-pattern`)
- Five Managed Agent smoke sessions (forensic record):
- `sesn_011CaMJdyWpUAm8n7YgRyLLG` — caught Content-Type filter dropping all SSE responses
- `sesn_011CaMKDLhT5zvUAUJ2HUvfW` — caught `clone()` inside `waitUntil` producing empty reader
- `sesn_011CaMLronGtL22J6R7fAPMs` — caught `performance.now()` frozen during synchronous CPU work
- `sesn_011CaMMf7tirAh2v5YoZHkxA` — caught `Date.now()` frozen too (both timers under deterministic-timing mitigation)
- `sesn_011CaMNujMg9pymcz18JFPp8` — **PASS** after dropping `tokenize_ms`; verified `bytes_in`/`bytes_out`/`tokens_in`/`tokens_out` populate with realistic varied values across tools
- Agent: `agent_011CaMJd8jvMj5CJMiQ11TdM`. Environment: `env_016RffZyqSdHeb5s3Z6UABw8`. Sonnet 4.6 throughout per `klappy://canon/constraints/release-validation-gate`.
- Canon basis: `klappy://canon/constraints/release-validation-gate`, `klappy://canon/constraints/telemetry-governance`, `klappy://canon/constraints/measure-before-you-object`, `klappy://canon/observations/performed-prudence-anti-pattern`.
- Tests: 7/7 unit (`workers/test/tokenize.test.mjs`), 6/6 integration (`workers/test/telemetry-integration.test.mjs`). Typecheck clean. Bench artifact at `workers/test/tokenize.test.mjs` (cl100k vs anthropic comparison, 200B–50KB sweep).
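The clone-timing failure caught in the second smoke session can be reproduced outside Workers with the standard fetch `Response` (global in Node 18+): once the body stream has been drained, `clone()` throws, but a clone taken synchronously beforehand still reads fine.

```typescript
// Clone before the body is disturbed, then drain the original.
const response = new Response("payload");
const clone = response.clone(); // must happen while the body is untouched

await response.text(); // simulates the runtime draining the one-shot stream
const clonedText = await clone.text();

console.log(response.bodyUsed); // true — calling clone() now would throw
console.log(clonedText);        // "payload" — the early clone is intact
```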

## [0.23.1] - 2026-04-21

### Fixed
4 changes: 2 additions & 2 deletions package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "oddkit",
"version": "0.23.1",
"version": "0.24.0",
"description": "Agent-first CLI for ODD-governed repos. Epistemic terrain rendering with portable baseline.",
"type": "module",
"bin": {
11 changes: 9 additions & 2 deletions workers/package-lock.json

Some generated files are not rendered by default.

5 changes: 3 additions & 2 deletions workers/package.json
@@ -1,6 +1,6 @@
{
"name": "oddkit-mcp-worker",
"version": "0.23.1",
"version": "0.24.0",
"private": true,
"type": "module",
"scripts": {
@@ -12,7 +12,8 @@
"dependencies": {
"agents": "^0.4.1",
"fflate": "^0.8.2",
"zod": "^4.3.6"
"zod": "^4.3.6",
"gpt-tokenizer": "^3.0.0"
},
"devDependencies": {
"@cloudflare/workers-types": "^4.20250124.0",
24 changes: 23 additions & 1 deletion workers/src/index.ts
@@ -958,14 +958,36 @@ export default {

// Phase 1 telemetry — non-blocking, fire-and-forget (E0008)
// Phase 1.5: cache_tier from tracer feeds blob9 (E0008.1)
// Phase 2: payload shape (bytes_in/out, tokens_in/out) feeds doubles 3–6.
// tokenize_ms was tried and dropped — Workers freezes both
// performance.now() and Date.now() during synchronous CPU work, making
// sub-request timing of pure-CPU tokenization unmeasurable. Response body is
// measured universally — MCP's Streamable HTTP transport returns SSE,
// not JSON, so a Content-Type filter would (and did) drop almost every
// response. The helper handles clone failures safely.
if (telemetryClone) {
const durationMs = Date.now() - startTime;
const cacheTier = tracer.indexSource;
// Clone the response synchronously before returning so the body is
// still available to read inside the deferred waitUntil callback.
const responseClone = response.clone();

Unprotected response.clone() can break MCP responses

Medium Severity

The response.clone() call sits outside any try/catch, while the ctx.waitUntil callback's catch block (line 991–993) explicitly upholds the invariant "Telemetry must never break MCP requests." If clone() throws (e.g., the SDK returns a response with an already-disturbed or locked body), the exception prevents return response from ever executing, turning a telemetry-only code path into a user-facing 500 error. The old code had no response.clone() at all, so this is a new risk. Moving the clone inside the existing try/catch (or wrapping it in its own) would preserve the stated safety guarantee.


Reviewed by Cursor Bugbot for commit d023ad6.


ctx.waitUntil(
(async () => {
try {
const requestText = await telemetryClone.text();

const { measurePayloadShape } = await import("./tokenize");
const { recordTelemetry } = await import("./telemetry");
await recordTelemetry(telemetryClone, env, durationMs, cacheTier);

let responseText = "";
try {
responseText = await responseClone.text();
} catch {
// Fall through with empty string; bytes_out / tokens_out will be 0.
}
const shape = await measurePayloadShape(requestText, responseText);
recordTelemetry(request, requestText, env, durationMs, cacheTier, shape);
} catch {
// Telemetry must never break MCP requests
}
145 changes: 98 additions & 47 deletions workers/src/telemetry.ts
Expand Up @@ -28,12 +28,34 @@
* handler's internal compute. Expect a long tail on
* cache-miss requests even for trivial actions like
* oddkit_time.
* double3: bytes_in — UTF-8 byte length of the JSON-RPC request body.
* 0 when telemetry was unable to read the body.
* Tokenizer-agnostic; exact wire size.
* double4: bytes_out — UTF-8 byte length of the response body, read from a
*                      synchronous clone (SSE responses included). 0 when
*                      the clone's body could not be read.
* double5: tokens_in — cl100k_base token count of the request body.
* See `tokenize.ts` for the tokenizer-choice rationale.
* 0 when tokenization was skipped or failed.
* double6: tokens_out — cl100k_base token count of the response body. 0 when
*                      the body could not be read or tokenization failed.
*
* NOTE: a previous iteration shipped a `double7: tokenize_ms` field intended
* to capture the wall-clock cost of tokenization for bench-vs-prod
* comparison. It is gone. Cloudflare Workers freezes both
* `performance.now()` and `Date.now()` between network I/O events as a
* timing-side-channel mitigation, so any timing of pure CPU work always
* reads 0 in production. The cost was characterized in the bench (workers/
* test/tokenize.test.mjs) and bytes_in/out + tokens_in/out are sufficient
* to predict per-call cost from that bench curve.
*
* index1: sampling_key — consumer label (for sampling consistency)
*
* See: klappy://canon/constraints/telemetry-governance
*/

import type { Env } from "./zip-baseline-fetcher";
import type { PayloadShape } from "./tokenize";
import pkg from "../package.json";

// Build-time fallback for blob8 (worker_version). env.ODDKIT_VERSION is
@@ -198,55 +220,84 @@ export function parseToolCall(payload: unknown): {
* Record one telemetry data point per JSON-RPC message.
* Non-blocking — uses env.ODDKIT_TELEMETRY.writeDataPoint() which requires
* no await (fire-and-forget via Analytics Engine).
* Called with a cloned request to avoid consuming the original body.
*
* Caller responsibilities:
* - Pass the raw request body as `requestBody` (string). Already-cloned and
* read; this function will parse it as JSON-RPC.
* - Pass the original `request` so consumer-label resolution can read URL
* params and headers.
* - Pass `shape` describing the payload byte and token shape, or null to
*   write zeros for the shape doubles (e.g. when the response body could
*   not be read).
*/
export function recordTelemetry(request: Request, env: Env, durationMs: number, cacheTier?: string): Promise<void> {
if (!env.ODDKIT_TELEMETRY) return Promise.resolve();

// Parse the request body to extract JSON-RPC details
return request
.json()
.then((body: unknown) => {
// Handle batch requests — process each message
const messages = Array.isArray(body) ? body : [body];

for (const payload of messages) {
const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
request,
payload,
);
const toolCall = parseToolCall(payload);

const msg =
typeof payload === "object" && payload !== null
? (payload as Record<string, unknown>)
: {};
const method = typeof msg.method === "string" ? msg.method : "unknown";

const eventType = toolCall ? "tool_call" : "mcp_request";
const toolName = toolCall?.toolName ?? "";
const documentUri = toolCall?.documentUri ?? "";

env.ODDKIT_TELEMETRY!.writeDataPoint({
blobs: [
eventType,
method,
toolName,
consumerLabel,
consumerSource,
toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
documentUri,
env.ODDKIT_VERSION || BUILD_VERSION,
cacheTier || "none", // blob9: E0008.1 x-ray cache tier
],
doubles: [1, durationMs],
indexes: [consumerLabel],
});
}
})
.catch(() => {
// Telemetry must never break MCP requests — silently drop parse failures
export function recordTelemetry(
request: Request,
requestBody: string,
env: Env,
durationMs: number,
cacheTier?: string,
shape?: PayloadShape | null,
): void {
if (!env.ODDKIT_TELEMETRY) return;

let body: unknown;
try {
body = JSON.parse(requestBody);
} catch {
// Malformed JSON-RPC — silently drop, telemetry must never break MCP requests
return;
}

// Handle batch requests — process each message
const messages = Array.isArray(body) ? body : [body];

// Bytes/tokens are per-request (not per-message); for batches we attribute
// the full payload shape to each message rather than fabricating a split.
const bytesIn = shape?.bytes_in ?? 0;
const bytesOut = shape?.bytes_out ?? 0;
const tokensIn = shape?.tokens_in ?? 0;
const tokensOut = shape?.tokens_out ?? 0;

for (const payload of messages) {
const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
request,
payload,
);
const toolCall = parseToolCall(payload);

const msg =
typeof payload === "object" && payload !== null
? (payload as Record<string, unknown>)
: {};
const method = typeof msg.method === "string" ? msg.method : "unknown";

const eventType = toolCall ? "tool_call" : "mcp_request";
const toolName = toolCall?.toolName ?? "";
const documentUri = toolCall?.documentUri ?? "";

env.ODDKIT_TELEMETRY!.writeDataPoint({
blobs: [
eventType,
method,
toolName,
consumerLabel,
consumerSource,
toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
documentUri,
env.ODDKIT_VERSION || BUILD_VERSION,
cacheTier || "none", // blob9: E0008.1 x-ray cache tier
],
doubles: [
1, // double1: count
durationMs, // double2: duration_ms
bytesIn, // double3: bytes_in
bytesOut, // double4: bytes_out
tokensIn, // double5: tokens_in
tokensOut, // double6: tokens_out
],
indexes: [consumerLabel],
});
}
}

// ──────────────────────────────────────────────────────────────────────────────