diff --git a/CHANGELOG.md b/CHANGELOG.md index ffdea70..7468502 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,40 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [0.24.0] - 2026-04-23 + +### Added + +- **Payload-shape telemetry: `bytes_in`, `bytes_out`, `tokens_in`, `tokens_out` (PR #134)**. Four new doubles in the `oddkit_telemetry` Analytics Engine schema (`double3`–`double6`), measured per MCP request and written from a fire-and-forget `waitUntil` callback so user-facing latency is unchanged. Bytes are UTF-8 wire-length via `TextEncoder`; tokens are cl100k_base counts via `gpt-tokenizer/encoding/cl100k_base` (chosen over `@anthropic-ai/tokenizer` after a 5-minute Node bench: ~6× faster median, dramatically better p95, ~432 KB gzipped via subpath import — see `workers/test/tokenize.test.mjs`). The schema grows from 2 doubles to 6 (full doubles array: `[count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out]`). The tokenizer is a module-level singleton, lazy-loaded via dynamic import and cached across requests within an isolate. The cold call parses the encoder once; warm calls cost under a millisecond on Node, the same V8 the Workers runtime uses. Bench-vs-prod comparison validated via fifth Managed Agent smoke at session `sesn_011CaMNujMg9pymcz18JFPp8` (`tokenization-smoke-managed` consumer label): `oddkit_catalog` → 21,437 bytes_out / 5,856 tokens_out; `oddkit_time` → 178 bytes_out / 71 tokens_out; chars-per-token ratio (~3.7–4.5) consistent with the bench's prediction across all observed payload sizes. + +- **Telemetry write helpers in `workers/src/tokenize.ts` (PR #134)**. New `measurePayloadShape(requestText, responseText)` returns `PayloadShape` (the 4-field struct above) given two body strings. `countTokensSafe(text)` wraps the encoder in a try/catch and returns `null` on failure so the telemetry path never throws.
The call site in `workers/src/index.ts` clones the response synchronously before `return response`, then reads + measures inside `ctx.waitUntil` — clone must be synchronous because the body is a one-shot stream that the runtime drains as soon as the handler returns. + +### Changed + +- **No Content-Type filter on the response body (PR #134)**. The first iteration of payload-shape telemetry skipped any response whose Content-Type was not `application/json`, on the assumption that MCP responses would always be JSON. They are not — MCP's Streamable HTTP transport returns `text/event-stream` for tool calls, and the filter caused 100% of tool_call responses to record `bytes_out=0, tokens_out=0`. The filter was removed; the response body is now read regardless of Content-Type. SSE protocol overhead (~10 bytes per event) is negligible against the actual payload size, and oddkit's responses are bounded single-event streams that drain quickly. Telemetry is wrapped in a try/catch to preserve the non-breaking invariant for any future response that might fail to clone. + +### Removed + +- **`tokenize_ms` (formerly `double7`) — Workers runtime cannot measure it (PR #134)**. A previous iteration of the schema shipped a `tokenize_ms` field intended to capture the wall-clock cost of tokenization for bench-vs-prod comparison. Live smoke against the preview confirmed it always reads `0` in production. Cause is structural, not a bug: Cloudflare Workers freezes both `performance.now()` and `Date.now()` between network I/O events as a timing-side-channel mitigation (documented at `developers.cloudflare.com/workers/runtime-apis/web-standards/`). Tokenization is pure CPU work, so any sub-request timing of it from inside a Worker request handler is unmeasurable. The field was dropped from `PayloadShape`, the `writeDataPoint` doubles array, and the `telemetry-governance` canon doc. 
The bench at `workers/test/tokenize.test.mjs` characterized the cost curve once (cl100k handles 50 KB in ~1.3 ms on Node v22, the same V8 the Workers runtime uses); future per-call cost is predictable from observed `bytes_out` / `tokens_out` against that curve. See `klappy://canon/constraints/telemetry-governance` § "Why no tokenize_ms" for the published rationale. + +### Fixed + +- **`package-lock.json` version drift back-fill (this PR)**. Pre-bump state showed both `package-lock.json` files at `0.23.0` while both `package.json` manifests were at `0.23.1` — the lockfiles drifted one release behind. Both lockfiles are now bumped to `0.24.0` (top-level `version` and `packages[""].version`). The pre-commit hook enforces sync between `package.json` and `workers/package.json`; both `package-lock.json` files still require manual sync per current tooling. + +### Refs + +- PR (code): [klappy/oddkit#134](https://github.com/klappy/oddkit/pull/134) +- PR (canon): [klappy/klappy.dev#134](https://github.com/klappy/klappy.dev/pull/134) — telemetry-governance schema update, two new constraints (`measure-before-you-object`, `performed-prudence-anti-pattern`) +- Five Managed Agent smoke sessions (forensic record): + - `sesn_011CaMJdyWpUAm8n7YgRyLLG` — caught Content-Type filter dropping all SSE responses + - `sesn_011CaMKDLhT5zvUAUJ2HUvfW` — caught `clone()` inside `waitUntil` producing empty reader + - `sesn_011CaMLronGtL22J6R7fAPMs` — caught `performance.now()` frozen during synchronous CPU work + - `sesn_011CaMMf7tirAh2v5YoZHkxA` — caught `Date.now()` frozen too (both timers under deterministic-timing mitigation) + - `sesn_011CaMNujMg9pymcz18JFPp8` — **PASS** after dropping `tokenize_ms`; verified `bytes_in`/`bytes_out`/`tokens_in`/`tokens_out` populate with realistic varied values across tools +- Agent: `agent_011CaMJd8jvMj5CJMiQ11TdM`. Environment: `env_016RffZyqSdHeb5s3Z6UABw8`. Sonnet 4.6 throughout per `klappy://canon/constraints/release-validation-gate`.
+- Canon basis: `klappy://canon/constraints/release-validation-gate`, `klappy://canon/constraints/telemetry-governance`, `klappy://canon/constraints/measure-before-you-object`, `klappy://canon/observations/performed-prudence-anti-pattern`. +- Tests: 7/7 unit (`workers/test/tokenize.test.mjs`), 6/6 integration (`workers/test/telemetry-integration.test.mjs`). Typecheck clean. Bench artifact at `workers/test/tokenize.test.mjs` (cl100k vs anthropic comparison, 200B–50KB sweep). + ## [0.23.1] - 2026-04-21 ### Fixed diff --git a/package-lock.json b/package-lock.json index b476280..91cdcd5 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "oddkit", - "version": "0.23.0", + "version": "0.24.0", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "oddkit", - "version": "0.23.0", + "version": "0.24.0", "license": "MIT", "dependencies": { "@modelcontextprotocol/sdk": "^1.0.0", diff --git a/package.json b/package.json index 151a50b..17d1946 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "oddkit", - "version": "0.23.1", + "version": "0.24.0", "description": "Agent-first CLI for ODD-governed repos. 
Epistemic terrain rendering with portable baseline.", "type": "module", "bin": { diff --git a/workers/package-lock.json b/workers/package-lock.json index 6a50a31..1c3e76a 100644 --- a/workers/package-lock.json +++ b/workers/package-lock.json @@ -1,15 +1,16 @@ { "name": "oddkit-mcp-worker", - "version": "0.23.0", + "version": "0.24.0", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "oddkit-mcp-worker", - "version": "0.23.0", + "version": "0.24.0", "dependencies": { "agents": "^0.4.1", "fflate": "^0.8.2", + "gpt-tokenizer": "^3.0.0", "zod": "^4.3.6" }, "devDependencies": { @@ -2149,6 +2150,12 @@ "url": "https://github.com/sponsors/ljharb" } }, + "node_modules/gpt-tokenizer": { + "version": "3.4.0", + "resolved": "https://registry.npmjs.org/gpt-tokenizer/-/gpt-tokenizer-3.4.0.tgz", + "integrity": "sha512-wxFLnhIXTDjYebd9A9pGl3e31ZpSypbpIJSOswbgop5jLte/AsZVDvjlbEuVFlsqZixVKqbcoNmRlFDf6pz/UQ==", + "license": "MIT" + }, "node_modules/has-symbols": { "version": "1.1.0", "resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz", diff --git a/workers/package.json b/workers/package.json index 5d4b310..f2e7021 100644 --- a/workers/package.json +++ b/workers/package.json @@ -1,6 +1,6 @@ { "name": "oddkit-mcp-worker", - "version": "0.23.1", + "version": "0.24.0", "private": true, "type": "module", "scripts": { @@ -12,7 +12,8 @@ "dependencies": { "agents": "^0.4.1", "fflate": "^0.8.2", - "zod": "^4.3.6" + "zod": "^4.3.6", + "gpt-tokenizer": "^3.0.0" }, "devDependencies": { "@cloudflare/workers-types": "^4.20250124.0", diff --git a/workers/src/index.ts b/workers/src/index.ts index 045fd92..5c646c2 100644 --- a/workers/src/index.ts +++ b/workers/src/index.ts @@ -958,14 +958,36 @@ export default { // Phase 1 telemetry — non-blocking, fire-and-forget (E0008) // Phase 1.5: cache_tier from tracer feeds blob9 (E0008.1) + // Phase 2: payload shape (bytes_in/out, tokens_in/out) feeds doubles + // 3-6. 
tokenize_ms was tried and dropped — Workers freezes both + // performance.now() and Date.now() during synchronous CPU work, making + // sub-request timing of pure-CPU tokenization unmeasurable. Response body is + // measured universally — MCP's Streamable HTTP transport returns SSE, + // not JSON, so a Content-Type filter would (and did) drop almost every + // response. The helper handles clone failures safely. if (telemetryClone) { const durationMs = Date.now() - startTime; const cacheTier = tracer.indexSource; + // Clone the response synchronously before returning so the body is + // still available to read inside the deferred waitUntil callback. + const responseClone = response.clone(); + ctx.waitUntil( (async () => { try { + const requestText = await telemetryClone.text(); + + const { measurePayloadShape } = await import("./tokenize"); const { recordTelemetry } = await import("./telemetry"); - await recordTelemetry(telemetryClone, env, durationMs, cacheTier); + + let responseText = ""; + try { + responseText = await responseClone.text(); + } catch { + // Fall through with empty string; bytes_out / tokens_out will be 0. + } + const shape = await measurePayloadShape(requestText, responseText); + recordTelemetry(request, requestText, env, durationMs, cacheTier, shape); } catch { // Telemetry must never break MCP requests } diff --git a/workers/src/telemetry.ts b/workers/src/telemetry.ts index 6a77881..e6baf02 100644 --- a/workers/src/telemetry.ts +++ b/workers/src/telemetry.ts @@ -28,12 +28,34 @@ * handler's internal compute. Expect a long tail on * cache-miss requests even for trivial actions like * oddkit_time. + * double3: bytes_in — UTF-8 byte length of the JSON-RPC request body. + * 0 when telemetry was unable to read the body. + * Tokenizer-agnostic; exact wire size. + * double4: bytes_out — UTF-8 byte length of the response body. 0 when + * the cloned response body could not be read (the + * call site falls back to an empty string).
+ * double5: tokens_in — cl100k_base token count of the request body. + * See `tokenize.ts` for the tokenizer-choice rationale. + * 0 when tokenization was skipped or failed. + * double6: tokens_out — cl100k_base token count of the response body. 0 on + * body-read or tokenizer failure. + * + * NOTE: a previous iteration shipped a `double7: tokenize_ms` field intended + * to capture the wall-clock cost of tokenization for bench-vs-prod + * comparison. It is gone. Cloudflare Workers freezes both + * `performance.now()` and `Date.now()` between network I/O events as a + * timing-side-channel mitigation, so any timing of pure CPU work always + * reads 0 in production. The cost was characterized in the bench (workers/ + * test/tokenize.test.mjs) and bytes_in/out + tokens_in/out are sufficient + * to predict per-call cost from that bench curve. + * * index1: sampling_key — consumer label (for sampling consistency) * * See: klappy://canon/constraints/telemetry-governance */ import type { Env } from "./zip-baseline-fetcher"; +import type { PayloadShape } from "./tokenize"; import pkg from "../package.json"; // Build-time fallback for blob8 (worker_version). env.ODDKIT_VERSION is @@ -198,55 +220,84 @@ * Record one telemetry data point per JSON-RPC message. * Non-blocking — uses env.ODDKIT_TELEMETRY.writeDataPoint() which requires * no await (fire-and-forget via Analytics Engine). - * Called with a cloned request to avoid consuming the original body. + * + * Caller responsibilities: + * - Pass the raw request body as `requestBody` (string). Already-cloned and + * read; this function will parse it as JSON-RPC. + * - Pass the original `request` so consumer-label resolution can read URL + * params and headers. + * - Pass `shape` describing the payload byte and token shape, or null to + * write zeros for the shape doubles (e.g. when the response body could + * not be read).
+ */ -export function recordTelemetry(request: Request, env: Env, durationMs: number, cacheTier?: string): Promise<void> { - if (!env.ODDKIT_TELEMETRY) return Promise.resolve(); - - // Parse the request body to extract JSON-RPC details - return request - .json() - .then((body: unknown) => { - // Handle batch requests — process each message - const messages = Array.isArray(body) ? body : [body]; - - for (const payload of messages) { - const { label: consumerLabel, source: consumerSource } = parseConsumerLabel( - request, - payload, - ); - const toolCall = parseToolCall(payload); - - const msg = - typeof payload === "object" && payload !== null - ? (payload as Record<string, unknown>) - : {}; - const method = typeof msg.method === "string" ? msg.method : "unknown"; - - const eventType = toolCall ? "tool_call" : "mcp_request"; - const toolName = toolCall?.toolName ?? ""; - const documentUri = toolCall?.documentUri ?? ""; - - env.ODDKIT_TELEMETRY!.writeDataPoint({ - blobs: [ - eventType, - method, - toolName, - consumerLabel, - consumerSource, - toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "", - documentUri, - env.ODDKIT_VERSION || BUILD_VERSION, - cacheTier || "none", // blob9: E0008.1 x-ray cache tier - ], - doubles: [1, durationMs], - indexes: [consumerLabel], - }); - } - }) - .catch(() => { - // Telemetry must never break MCP requests — silently drop parse failures +export function recordTelemetry( + request: Request, + requestBody: string, + env: Env, + durationMs: number, + cacheTier?: string, + shape?: PayloadShape | null, +): void { + if (!env.ODDKIT_TELEMETRY) return; + + let body: unknown; + try { + body = JSON.parse(requestBody); + } catch { + // Malformed JSON-RPC — silently drop, telemetry must never break MCP requests + return; + } + + // Handle batch requests — process each message + const messages = Array.isArray(body) ?
body : [body]; + + // Bytes/tokens are per-request (not per-message); for batches we attribute + // the full payload shape to each message rather than fabricating a split. + const bytesIn = shape?.bytes_in ?? 0; + const bytesOut = shape?.bytes_out ?? 0; + const tokensIn = shape?.tokens_in ?? 0; + const tokensOut = shape?.tokens_out ?? 0; + + for (const payload of messages) { + const { label: consumerLabel, source: consumerSource } = parseConsumerLabel( + request, + payload, + ); + const toolCall = parseToolCall(payload); + + const msg = + typeof payload === "object" && payload !== null + ? (payload as Record<string, unknown>) + : {}; + const method = typeof msg.method === "string" ? msg.method : "unknown"; + + const eventType = toolCall ? "tool_call" : "mcp_request"; + const toolName = toolCall?.toolName ?? ""; + const documentUri = toolCall?.documentUri ?? ""; + + env.ODDKIT_TELEMETRY!.writeDataPoint({ + blobs: [ + eventType, + method, + toolName, + consumerLabel, + consumerSource, + toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "", + documentUri, + env.ODDKIT_VERSION || BUILD_VERSION, + cacheTier || "none", // blob9: E0008.1 x-ray cache tier + ], + doubles: [ + 1, // double1: count + durationMs, // double2: duration_ms + bytesIn, // double3: bytes_in + bytesOut, // double4: bytes_out + tokensIn, // double5: tokens_in + tokensOut, // double6: tokens_out + ], + indexes: [consumerLabel], }); + } } // ────────────────────────────────────────────────────────────────────────────── diff --git a/workers/src/tokenize.ts b/workers/src/tokenize.ts new file mode 100644 index 0000000..56392dd --- /dev/null +++ b/workers/src/tokenize.ts @@ -0,0 +1,115 @@ +/** + * Tokenizer module for oddkit MCP Worker telemetry (E0008). + * + * Provides cl100k_base token counts for request and response payloads. + * cl100k_base is GPT-4's tokenizer; we use it as a tokenizer-agnostic + * proxy for "payload token shape," not as a billing-accurate measure + * for any specific consumer model.
+ * + * Choice of cl100k_base over @anthropic-ai/tokenizer: the cl100k bundle + * benchmarks ~6x faster (median 0.05–1.3ms across 200B–50KB payloads on + * Node v22, the same V8 engine as Cloudflare Workers) and has dramatically + * better p95 (no WASM memory-grow spikes). Token counts diverge from the + * Claude tokenizer by ~3–4% on English prose — acceptable noise floor + * for shape analysis. See `klappy://canon/constraints/measure-before-you-object` + * for the bench methodology that drove this choice. + * + * Bundle impact: ~432 KB gzipped via the `gpt-tokenizer/encoding/cl100k_base` + * subpath import. Loaded via dynamic import so cold paths that don't + * tokenize don't pay the parse cost. + * + * Failure mode: if the tokenizer fails to load or throws on a payload, + * `countTokensSafe` returns null. Telemetry treats null as "not measured" + * and writes `0` to keep the schema dense; the absence is visible in the + * tokens columns being 0 alongside non-zero bytes. + * + * See: klappy://canon/constraints/telemetry-governance + */ + +type CountTokensFn = (text: string) => number; + +let encoderPromise: Promise<CountTokensFn | null> | null = null; + +/** + * Lazily import gpt-tokenizer's cl100k_base encoder. Cached across requests + * via the module-level promise; the first call within a worker isolate pays + * the parse cost, all subsequent calls are warm. + */ +function getEncoder(): Promise<CountTokensFn | null> { + if (encoderPromise) return encoderPromise; + + encoderPromise = import("gpt-tokenizer/encoding/cl100k_base") + .then((mod) => { + const fn = (mod as { countTokens?: CountTokensFn }).countTokens; + if (typeof fn !== "function") return null; + return fn; + }) + .catch(() => null); + + return encoderPromise; +} + +/** + * Count cl100k_base tokens in `text`. Returns null on any failure + * (load failure, encoder throw, etc). Telemetry must never break MCP + * requests — this function never throws.
+ */ +export async function countTokensSafe(text: string): Promise<number | null> { + if (!text) return 0; + try { + const fn = await getEncoder(); + if (!fn) return null; + return fn(text); + } catch { + return null; + } +} + +/** + * Result of measuring a payload pair. All fields default to 0 on failure + * so the telemetry schema stays dense; the absence of a real value is + * encoded by tokens_in / tokens_out being 0 alongside non-zero bytes + * (encoder skipped or failed). + * + * Note: this struct does NOT carry a tokenize_ms field. Cloudflare Workers + * freezes both `performance.now()` and `Date.now()` during synchronous + * CPU work as a timing-side-channel mitigation — neither timer advances + * unless a network I/O event occurs between reads. Tokenization is pure + * CPU work, so any sub-request timing of it would always read 0 in + * production. The cost was already characterized in the bench (bench + * file at workers/test/tokenize.test.mjs and integration test). We keep + * the bytes/tokens shape and trust the bench for the per-payload cost + * curve. + */ +export interface PayloadShape { + bytes_in: number; + bytes_out: number; + tokens_in: number; + tokens_out: number; +} + +/** + * Measure the byte and token shape of a request/response pair. Tokenization + * is performed once per payload using the lazy-loaded cl100k_base encoder. + * Bytes are measured via TextEncoder (UTF-8 byte length, the wire size). + */ +export async function measurePayloadShape( + requestText: string, + responseText: string, +): Promise<PayloadShape> { + const encoder = new TextEncoder(); + const bytes_in = requestText ? encoder.encode(requestText).length : 0; + const bytes_out = responseText ? encoder.encode(responseText).length : 0; + + const [tIn, tOut] = await Promise.all([ + countTokensSafe(requestText), + countTokensSafe(responseText), + ]); + + return { + bytes_in, + bytes_out, + tokens_in: tIn ?? 0, + tokens_out: tOut ??
0, + }; +} diff --git a/workers/test/telemetry-integration.test.mjs b/workers/test/telemetry-integration.test.mjs new file mode 100644 index 0000000..03a9155 --- /dev/null +++ b/workers/test/telemetry-integration.test.mjs @@ -0,0 +1,303 @@ +#!/usr/bin/env node +/** + * Integration test for the telemetry write path. + * + * Mocks env.ODDKIT_TELEMETRY with a writeDataPoint capture, then exercises + * recordTelemetry + measurePayloadShape with realistic JSON-RPC payloads. + * + * Verifies end-to-end: + * - The full PayloadShape lands in doubles 3-6 + * - bytes_in/out match TextEncoder UTF-8 byte length on the actual payloads + * - tokens_in/out are positive integers when payloads are non-empty + * - Batch JSON-RPC produces one data point per message + * - SSE simulation (responseText="") records zeros for the response side + * - Tool-call payloads correctly populate blob3 (tool_name) + * - The blob array is exactly 9 entries and the doubles array is exactly 6 + * + * This is the verification that wrangler dev would have done — same code + * path, same schema, real tokenizer. 
+ */ + +import assert from "node:assert/strict"; +import { spawnSync } from "node:child_process"; +import { mkdtempSync, writeFileSync, symlinkSync, existsSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join, dirname } from "node:path"; +import { fileURLToPath } from "node:url"; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const WORKERS_ROOT = join(__dirname, ".."); + +// Compile both telemetry.ts and tokenize.ts to a temp dir so we can import them +const tmp = mkdtempSync(join(tmpdir(), "oddkit-telemetry-int-")); +const tsconfig = { + compilerOptions: { + target: "ES2022", + module: "ES2022", + moduleResolution: "bundler", + lib: ["ES2022", "DOM"], + types: ["@cloudflare/workers-types"], + noEmitOnError: false, + strict: false, + skipLibCheck: true, + resolveJsonModule: true, + allowSyntheticDefaultImports: true, + esModuleInterop: true, + rootDir: join(WORKERS_ROOT, "src"), + outDir: join(tmp, "build"), + }, + include: [ + join(WORKERS_ROOT, "src", "tokenize.ts"), + join(WORKERS_ROOT, "src", "telemetry.ts"), + ], +}; +const tsconfigPath = join(tmp, "tsconfig.json"); +writeFileSync(tsconfigPath, JSON.stringify(tsconfig, null, 2)); + +const tmpNodeModules = join(tmp, "node_modules"); +if (!existsSync(tmpNodeModules)) { + symlinkSync(join(WORKERS_ROOT, "node_modules"), tmpNodeModules); +} + +// telemetry.ts imports `../package.json` — symlink that too +if (!existsSync(join(tmp, "package.json"))) { + symlinkSync(join(WORKERS_ROOT, "package.json"), join(tmp, "package.json")); +} + +const compile = spawnSync("npx", ["--yes", "tsc", "-p", tsconfigPath], { + encoding: "utf8", +}); + +// With noEmitOnError: false, tsc may exit non-zero on type errors elsewhere +// in the dep graph (zip-baseline-fetcher.ts has some workers-types friction) +// while still producing the .js files we need. Only bail if the files we +// actually need weren't emitted. 
+const tokenizeJs = join(tmp, "build", "tokenize.js"); +const telemetryJs = join(tmp, "build", "telemetry.js"); +if (!existsSync(tokenizeJs) || !existsSync(telemetryJs)) { + console.error("TypeScript compile failed (target files not emitted):"); + console.error(compile.stdout); + console.error(compile.stderr); + process.exit(1); +} +if (compile.status !== 0 && process.env.DEBUG) { + console.error("Note: tsc reported errors but target .js files were emitted:"); + console.error(compile.stdout); +} + +// Newer Node requires `with { type: "json" }` on JSON imports in ESM. +// TypeScript doesn't add this — patch it in. +const { readFileSync, writeFileSync: wf } = await import("node:fs"); +let telemetrySrc = readFileSync(telemetryJs, "utf8"); +telemetrySrc = telemetrySrc.replace( + /from ["']\.\.\/package\.json["'];/g, + 'from "../package.json" with { type: "json" };', +); +wf(telemetryJs, telemetrySrc); + +const { measurePayloadShape } = await import(tokenizeJs); +const { recordTelemetry } = await import(telemetryJs); + +// ─── Mock env with writeDataPoint capture ────────────────────────────────── + +class MockAnalyticsEngine { + constructor() { + this.writes = []; + } + writeDataPoint(point) { + this.writes.push(point); + } +} + +function mockEnv() { + return { + ODDKIT_TELEMETRY: new MockAnalyticsEngine(), + DEFAULT_KNOWLEDGE_BASE_URL: "https://raw.githubusercontent.com/klappy/klappy.dev/main", + ODDKIT_VERSION: "0.23.1-test", + }; +} + +function mockRequest(consumerLabel = "integration-test") { + return new Request(`https://oddkit.klappy.dev/mcp?consumer=${consumerLabel}`, { + method: "POST", + headers: { "Content-Type": "application/json" }, + }); +} + +let pass = 0; +let fail = 0; + +async function test(name, fn) { + try { + await fn(); + console.log(` \u2713 ${name}`); + pass++; + } catch (err) { + console.log(` \u2717 ${name}`); + console.log(` ${err.message}`); + if (err.stack && process.env.DEBUG) console.log(err.stack); + fail++; + } +} + 
+console.log("telemetry integration tests (full write path)\n"); + +// ─── Test 1: oddkit_time tool call ───────────────────────────────────────── + +await test("oddkit_time tool call lands a complete telemetry record", async () => { + const env = mockEnv(); + const requestBody = JSON.stringify({ + jsonrpc: "2.0", + id: 1, + method: "tools/call", + params: { name: "oddkit_time", arguments: {} }, + }); + const responseBody = JSON.stringify({ + jsonrpc: "2.0", + id: 1, + result: { + content: [ + { type: "text", text: "Current UTC time: 2026-04-23T19:30:00.000Z" }, + ], + }, + }); + + const shape = await measurePayloadShape(requestBody, responseBody); + recordTelemetry(mockRequest(), requestBody, env, 42, "memory", shape); + + assert.equal(env.ODDKIT_TELEMETRY.writes.length, 1, "should write 1 data point"); + const point = env.ODDKIT_TELEMETRY.writes[0]; + + // Schema shape + assert.equal(point.blobs.length, 9, `blobs should be 9, got ${point.blobs.length}`); + assert.equal(point.doubles.length, 6, `doubles should be 6, got ${point.doubles.length}`); + assert.equal(point.indexes.length, 1, "indexes should be 1"); + + // Blobs + assert.equal(point.blobs[0], "tool_call", "blob1 = event_type"); + assert.equal(point.blobs[1], "tools/call", "blob2 = method"); + assert.equal(point.blobs[2], "oddkit_time", "blob3 = tool_name"); + assert.equal(point.blobs[3], "integration-test", "blob4 = consumer_label"); + assert.equal(point.blobs[4], "query-param", "blob5 = consumer_source"); + assert.equal(point.blobs[7], "0.23.1-test", "blob8 = worker_version"); + assert.equal(point.blobs[8], "memory", "blob9 = cache_tier"); + + // Doubles + assert.equal(point.doubles[0], 1, "double1 = count"); + assert.equal(point.doubles[1], 42, "double2 = duration_ms"); + assert.equal(point.doubles[2], shape.bytes_in, "double3 = bytes_in"); + assert.equal(point.doubles[3], shape.bytes_out, "double4 = bytes_out"); + assert.equal(point.doubles[4], shape.tokens_in, "double5 = tokens_in"); + 
assert.equal(point.doubles[5], shape.tokens_out, "double6 = tokens_out"); + + console.log(` bytes_in=${shape.bytes_in} bytes_out=${shape.bytes_out} ` + + `tokens_in=${shape.tokens_in} tokens_out=${shape.tokens_out}`); +}); + +// ─── Test 2: oddkit_search with realistic large response ─────────────────── + +await test("oddkit_search with realistic ~8KB response — measurements are sane", async () => { + const env = mockEnv(); + const requestBody = JSON.stringify({ + jsonrpc: "2.0", + id: 2, + method: "tools/call", + params: { name: "oddkit", arguments: { action: "search", input: "telemetry tokens payload" } }, + }); + const snippet = "Telemetry exists to make decisions informed instead of blind. " + + "Not to profile users, not to feed a roadmap. "; + const responseBody = JSON.stringify({ + jsonrpc: "2.0", + id: 2, + result: { + content: [{ type: "text", text: snippet.repeat(80) }], + }, + }); + + const shape = await measurePayloadShape(requestBody, responseBody); + recordTelemetry(mockRequest("realistic-test"), requestBody, env, 215, "r2", shape); + + const point = env.ODDKIT_TELEMETRY.writes[0]; + assert.equal(point.blobs[2], "oddkit", "tool_name = oddkit (router)"); + + // Realistic-sized response should be measurable + assert.ok(shape.bytes_out > 5000, `bytes_out should be > 5000, got ${shape.bytes_out}`); + assert.ok(shape.tokens_out > 1000, `tokens_out should be > 1000, got ${shape.tokens_out}`); + + console.log(` bytes_out=${shape.bytes_out} (~${(shape.bytes_out/1024).toFixed(1)}KB) ` + + `tokens_out=${shape.tokens_out}`); +}); + +// ─── Test 3: SSE response (empty body) records zeros ─────────────────────── + +await test("SSE response (empty body) records bytes_out=0 and tokens_out=0", async () => { + const env = mockEnv(); + const requestBody = JSON.stringify({ + jsonrpc: "2.0", + id: 3, + method: "tools/call", + params: { name: "oddkit_orient", arguments: { input: "exploring telemetry" } }, + }); + // Simulating the call site path where Content-Type was not 
application/json
+  const shape = await measurePayloadShape(requestBody, "");
+  recordTelemetry(mockRequest(), requestBody, env, 50, "memory", shape);
+
+  const point = env.ODDKIT_TELEMETRY.writes[0];
+  assert.equal(point.doubles[3], 0, "bytes_out should be 0 for empty response");
+  assert.equal(point.doubles[5], 0, "tokens_out should be 0 for empty response");
+  assert.ok(point.doubles[2] > 0, "bytes_in should still be > 0");
+});
+
+// ─── Test 4: Batch JSON-RPC writes one point per message ───────────────────
+
+await test("batch JSON-RPC produces one data point per message", async () => {
+  const env = mockEnv();
+  const batch = [
+    { jsonrpc: "2.0", id: 1, method: "tools/call", params: { name: "oddkit_time", arguments: {} } },
+    { jsonrpc: "2.0", id: 2, method: "tools/call", params: { name: "oddkit_orient", arguments: { input: "x" } } },
+    { jsonrpc: "2.0", id: 3, method: "tools/list" },
+  ];
+  const requestBody = JSON.stringify(batch);
+  const responseBody = JSON.stringify(batch.map(m => ({ jsonrpc: "2.0", id: m.id, result: { ok: true } })));
+
+  const shape = await measurePayloadShape(requestBody, responseBody);
+  recordTelemetry(mockRequest(), requestBody, env, 30, "cache", shape);
+
+  assert.equal(env.ODDKIT_TELEMETRY.writes.length, 3, `should write 3 data points, got ${env.ODDKIT_TELEMETRY.writes.length}`);
+  assert.equal(env.ODDKIT_TELEMETRY.writes[0].blobs[2], "oddkit_time");
+  assert.equal(env.ODDKIT_TELEMETRY.writes[1].blobs[2], "oddkit_orient");
+  assert.equal(env.ODDKIT_TELEMETRY.writes[2].blobs[1], "tools/list");
+  assert.equal(env.ODDKIT_TELEMETRY.writes[2].blobs[2], "", "tools/list has no tool_name");
+
+  // All 3 messages get the same payload-shape attribution (per-request, not per-message)
+  for (const w of env.ODDKIT_TELEMETRY.writes) {
+    assert.equal(w.doubles[2], shape.bytes_in);
+    assert.equal(w.doubles[3], shape.bytes_out);
+  }
+});
+
+// ─── Test 5: Malformed JSON-RPC gets dropped silently ──────────────────────
+
+await test("malformed JSON-RPC is silently dropped (telemetry never throws)", async () => {
+  const env = mockEnv();
+  // Pass garbage as the "body" — recordTelemetry should swallow the parse error
+  const requestBody = "not valid json {{{";
+  const shape = await measurePayloadShape(requestBody, "ok");
+
+  // Should not throw
+  recordTelemetry(mockRequest(), requestBody, env, 10, "none", shape);
+  assert.equal(env.ODDKIT_TELEMETRY.writes.length, 0, "should not write anything for malformed input");
+});
+
+// ─── Test 6: No env.ODDKIT_TELEMETRY → graceful no-op ──────────────────────
+
+await test("missing env.ODDKIT_TELEMETRY is a graceful no-op", async () => {
+  const env = {}; // no ODDKIT_TELEMETRY
+  const requestBody = JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list" });
+  const shape = await measurePayloadShape(requestBody, "{}");
+  // Should not throw
+  recordTelemetry(mockRequest(), requestBody, env, 5, "memory", shape);
+});
+
+console.log(`\n${pass} passed, ${fail} failed`);
+process.exit(fail > 0 ? 1 : 0);
diff --git a/workers/test/tokenize.test.mjs b/workers/test/tokenize.test.mjs
new file mode 100644
index 0000000..4186084
--- /dev/null
+++ b/workers/test/tokenize.test.mjs
@@ -0,0 +1,128 @@
+#!/usr/bin/env node
+/**
+ * Unit test for workers/src/tokenize.ts.
+ *
+ * Compiles tokenize.ts via tsc into a temp dir, then dynamic-imports the
+ * compiled .js. The compile step exercises the same TypeScript surface
+ * that ships in the worker bundle.
+ */
+
+import assert from "node:assert/strict";
+import { spawnSync } from "node:child_process";
+import { mkdtempSync, writeFileSync, symlinkSync, existsSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const WORKERS_ROOT = join(__dirname, "..");
+const TOKENIZE_TS = join(WORKERS_ROOT, "src", "tokenize.ts");
+
+const tmp = mkdtempSync(join(tmpdir(), "oddkit-tokenize-test-"));
+const tsconfig = {
+  compilerOptions: {
+    target: "ES2022",
+    module: "ES2022",
+    moduleResolution: "bundler",
+    lib: ["ES2022", "DOM"],
+    types: [],
+    strict: false,
+    skipLibCheck: true,
+    resolveJsonModule: true,
+    allowSyntheticDefaultImports: true,
+    esModuleInterop: true,
+    rootDir: join(WORKERS_ROOT, "src"),
+    outDir: tmp,
+  },
+  include: [TOKENIZE_TS],
+};
+const tsconfigPath = join(tmp, "tsconfig.json");
+writeFileSync(tsconfigPath, JSON.stringify(tsconfig, null, 2));
+
+const tmpNodeModules = join(tmp, "node_modules");
+if (!existsSync(tmpNodeModules)) {
+  symlinkSync(join(WORKERS_ROOT, "node_modules"), tmpNodeModules);
+}
+
+const compile = spawnSync("npx", ["--yes", "tsc", "-p", tsconfigPath], {
+  encoding: "utf8",
+});
+if (compile.status !== 0) {
+  console.error("TypeScript compile failed:");
+  console.error(compile.stdout);
+  console.error(compile.stderr);
+  process.exit(1);
+}
+
+const compiledPath = join(tmp, "tokenize.js");
+const { countTokensSafe, measurePayloadShape } = await import(compiledPath);
+
+let pass = 0;
+let fail = 0;
+
+async function test(name, fn) {
+  try {
+    await fn();
+    console.log(`  \u2713 ${name}`);
+    pass++;
+  } catch (err) {
+    console.log(`  \u2717 ${name}`);
+    console.log(`    ${err.message}`);
+    fail++;
+  }
+}
+
+console.log("tokenize.ts unit tests");
+
+await test("countTokensSafe returns 0 for empty string", async () => {
+  const n = await countTokensSafe("");
+  assert.equal(n, 0);
+});
+
+await test("countTokensSafe returns a positive integer for normal text", async () => {
+  const n = await countTokensSafe("hello world this is a test");
+  assert.equal(typeof n, "number");
+  assert.ok(n > 0, `expected > 0, got ${n}`);
+  assert.equal(n, Math.floor(n), "must be an integer");
+});
+
+await test("countTokensSafe scales with text length", async () => {
+  const small = await countTokensSafe("hello world");
+  const big = await countTokensSafe("hello world ".repeat(100));
+  assert.ok(big > small * 50, `big (${big}) should be much larger than small (${small})`);
+});
+
+await test("measurePayloadShape returns all required fields as numbers", async () => {
+  const s = await measurePayloadShape("request", "response");
+  for (const field of ["bytes_in", "bytes_out", "tokens_in", "tokens_out"]) {
+    assert.ok(field in s, `missing field: ${field}`);
+    assert.equal(typeof s[field], "number", `${field} must be number, got ${typeof s[field]}`);
+  }
+});
+
+await test("measurePayloadShape bytes match UTF-8 byte length", async () => {
+  const req = "hello"; // 5 bytes
+  const res = "caf\u00e9"; // 4 chars, 5 UTF-8 bytes (\u00e9 = 2 bytes)
+  const s = await measurePayloadShape(req, res);
+  assert.equal(s.bytes_in, 5, `bytes_in: expected 5, got ${s.bytes_in}`);
+  assert.equal(s.bytes_out, 5, `bytes_out: expected 5, got ${s.bytes_out}`);
+});
+
+await test("measurePayloadShape produces positive token counts for non-empty input", async () => {
+  const s = await measurePayloadShape(
+    JSON.stringify({ jsonrpc: "2.0", method: "tools/call", id: 1 }),
+    JSON.stringify({ jsonrpc: "2.0", id: 1, result: { ok: true } }),
+  );
+  assert.ok(s.tokens_in > 0, "tokens_in should be > 0");
+  assert.ok(s.tokens_out > 0, "tokens_out should be > 0");
+});
+
+await test("measurePayloadShape handles empty response (SSE skipped)", async () => {
+  const s = await measurePayloadShape("hello", "");
+  assert.equal(s.bytes_out, 0);
+  assert.equal(s.tokens_out, 0);
+  assert.ok(s.bytes_in > 0);
+});
+
+console.log(`\n${pass} passed, ${fail} failed`);
+process.exit(fail > 0 ? 1 : 0);
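For reviewers without the repo checked out, the measurement contract these tests pin down can be sketched as follows. This is a minimal illustration, not the shipped `workers/src/tokenize.ts`: the real module counts tokens with gpt-tokenizer's cl100k_base encoder, while here a trivial stand-in counter is injected so the byte/shape logic runs with no dependencies. `measurePayloadShapeWith`, `utf8Bytes`, and `fakeCount` are hypothetical names used only for this sketch.

```javascript
// Bytes are UTF-8 wire length, matching what TextEncoder would put on the wire.
const utf8Bytes = (text) => new TextEncoder().encode(text).length;

// Sketch of the shape struct the telemetry path records. The token counter is
// injected: in production it is cl100k_base via gpt-tokenizer, and a null
// return (countTokensSafe's failure signal) is coerced to 0.
function measurePayloadShapeWith(countTokens, requestText, responseText) {
  return {
    bytes_in: utf8Bytes(requestText),
    bytes_out: utf8Bytes(responseText),
    tokens_in: countTokens(requestText) ?? 0,
    tokens_out: countTokens(responseText) ?? 0,
  };
}

// Stand-in tokenizer: whitespace word count (NOT cl100k_base).
const fakeCount = (t) => (t.length === 0 ? 0 : t.trim().split(/\s+/).length);

const shape = measurePayloadShapeWith(fakeCount, "hello", "café");
console.log(shape.bytes_in, shape.bytes_out); // 5 5 ("é" is 2 UTF-8 bytes)
```

An empty response string yields `bytes_out: 0, tokens_out: 0`, which is exactly the case Test 3 and the "empty response (SSE skipped)" test assert on.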