feat(challenge): governance-driven runChallengeAction (E0008) by klappy · Pull Request #100 · klappy/oddkit

klappy · 2026-04-17T05:33:58Z

feat(challenge): governance-driven runChallengeAction (E0008)

Sequence

PR feat: governance-driven encode architecture #96 (merged, bef6c83) — governance-driven encode architecture (precedent)
PR promote: main → prod (E0008 encode architecture + context handling) #99 (merged, 2026-04-17) — 11 governance articles for challenge landed in klappy.dev canon
This PR — code refactor: runChallengeAction now reads those articles at runtime

What Changed

New functions in workers/src/orchestrate.ts:

Function	Purpose
`extractKeywordsFromCheck`	Extracts quoted strings from prerequisite check descriptions
`extractPrereqTable`	Shared helper — parses `## Prerequisite Overlays` table from any article
`discoverChallengeTypes`	Per-canonUrl cached discovery of `challenge-type`-tagged articles
`fetchBasePrerequisites`	Reads `odd/challenge/base-prerequisites.md` prerequisite table
`fetchNormativeVocabulary`	Reads `odd/challenge/normative-vocabulary.md` RFC 2119 + arch terms
`fetchStakesCalibration`	Reads `odd/challenge/stakes-calibration.md` mode→depth table

Modified:

runChallengeAction — body fully replaced with governance-driven implementation (multi-match, calibration-filtered output, voice-dump suppression invariant)
runCleanupStorage — extended to clear all four new caches

New types/caches:

ChallengeTypeDef, PrereqOverlay, NormativeVocabulary, StakesCalibration interfaces
8 new cache variables (4 data caches + 4 canonUrl guards), mirroring the encode pattern

Voice-Dump Suppression Invariant

This is load-bearing. When calibration.questionTiers.length === 0 (e.g., a voice-dump mode row in the stakes-calibration table sets tiers to "none"), runChallengeAction returns status: "SUPPRESSED" immediately — before aggregating types, testing prerequisites, or surfacing reframings.

Some modes exist for raw thought capture. Pressure-testing at that stage damages the mode's function. The suppression is not advisory and does not surface a "reduced" set.

if (calibration.questionTiers.length === 0) {
  // short-circuit — returns SUPPRESSED, no further processing
}

Cache Invalidation

runCleanupStorage now clears all four new caches:

cachedChallengeTypes = null;
cachedChallengeTypesCanonUrl = undefined;
cachedBasePrerequisites = null;
cachedBasePrerequisitesCanonUrl = undefined;
cachedNormativeVocabulary = null;
cachedNormativeVocabularyCanonUrl = undefined;
cachedStakesCalibration = null;
cachedStakesCalibrationCanonUrl = undefined;

Graceful Degradation

All four fetch functions degrade gracefully if governance articles are missing or malformed:

fetchBasePrerequisites → empty array
fetchNormativeVocabulary → minimal built-in set (MUST/MUST NOT/SHOULD/SHOULD NOT)
fetchStakesCalibration → "surface everything" fallback for exploration/planning/execution
discoverChallengeTypes → empty array (challenge then produces empty output until articles are authored)

Backward Compatibility

Response envelope gains matched_types, mode_used, and governance fields. The legacy claim_type field is preserved as a backward-compat alias returning the first matched type slug.

Build & Type-Check Evidence

cd workers && npx tsc --noEmit
EXIT:0

TypeScript compilation clean. No errors. Full output in evidence doc.

Root-level npm test has a pre-existing failure (missing commander dep for src/cli.js) unrelated to this change. The workers package has no test script; correctness validated via tsc + preview deploy.

Evidence doc: docs/oddkit/evidence/challenge-governance-code-refactor.md

Open Risks

Detection-pattern overlap noise — if multiple challenge-type articles share trigger words, matchedTypes.length > 1 may fire frequently in practice. Design handles this correctly via aggregation; governance authors manage scope via specific trigger words.
Descriptive-only prerequisite checks skipped — prereqs with no quoted keywords (prose-only check descriptions) cannot be mechanically tested and are silently skipped. By design per spec.
claim_type alias string-value drift — old detectClaimType returned "strong_claim", "proposal", "assumption", "observation". New claim_type returns the governance article slug (e.g., "strong-claim" with a hyphen). Callers doing string equality checks on this field will need to update.
Cold-start regex compilation — discoverChallengeTypes compiles regexes for all challenge types on first call. Mitigated by per-canonUrl caching across requests within the same isolate.

Note

Medium Risk
Replaces core oddkit_challenge worker behavior and response shape based on runtime-parsed governance markdown, so parsing drift or heuristic prerequisite evaluation could change outputs silently. BM25 stop-word overrides and new caches add moderate operational risk but are scoped to the challenge path.

Overview
oddkit_challenge in workers/src/orchestrate.ts is refactored from hardcoded claim-type logic into a governance-driven runtime that discovers challenge-type articles, base prerequisites, normative vocabulary, and stakes calibration from canon and applies them per-request (with per-canonUrl caching, multi-match aggregation, fallback type resolution, and voice-dump suppression).

Challenge-type detection pivots to BM25 + stemming over per-type detection text; workers/src/bm25.ts is extended (backward-compatibly) to support custom stop-word sets stored on the index so modal verbs/negation can remain signal. Tension detection is switched to governance-defined vocabulary regexes, and the action response now includes mode + matched-type governance details (plus block_until_addressed) while keeping claim_type as a compatibility alias.

Adds a new workers/test/governance-parser.test.mjs parser-fidelity test against live governance articles, and updates runCleanupStorage to clear the new in-memory caches; also adds evidence/journal/ledger documentation for the refactor.

^{Reviewed by Cursor Bugbot for commit 80416ac. Bugbot is set up for automated code reviews on this repo. Configure here.}

Mirrors the PR #96 encode pattern. Extracts challenge behavior from live governance articles (landed in klappy.dev canon via PR #99) rather than hardcoded source logic. New functions in workers/src/orchestrate.ts: - discoverChallengeTypes — per-canonUrl cached type discovery - fetchBasePrerequisites — universal prerequisite checks - fetchNormativeVocabulary — RFC 2119 + architectural load-bearing terms - fetchStakesCalibration — mode-to-depth filter - extractPrereqTable / extractKeywordsFromCheck — shared helpers Refactored: - runChallengeAction — replaces hardcoded detectClaimType / generateChallenges / findTensions / findMissingPrerequisites with governance extraction. Supports multi-match. Filters output by stakes calibration based on mode parameter. - runCleanupStorage — clears all four new caches on invalidation Invariant: voice-dump mode suppresses all challenge output regardless of matched types. Load-bearing per stakes-calibration governance — some modes exist for raw capture and pressure-testing at that stage damages the mode. Graceful degradation: missing governance articles fall back to minimal built-in behavior with warnings, rather than failing. Co-authored-by: Claude <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-04-17T05:34:03Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	oddkit	`80416ac`	Commit Preview URL Branch Preview URL	Apr 17 2026, 03:09 PM

…ength

…sponse

Refactor runChallengeAction in workers/src/orchestrate.ts to extract challenge-type behavior from canon governance articles at runtime rather than hardcoding claim-type detection, questions, prerequisites, and tension rules in source. Structural mirror of PR #96 (encode). Detection upgraded mid-implementation from regex-OR to BM25 + stemming after the gauntlet revealed that regex-based matching was morphologically brittle ("coin" doesn't match trigger "coining"). The pivot removed an entire class of bug and seeded a reusable pattern for future governance-driven tools. Changes in workers/src/orchestrate.ts: - New: ChallengeTypeDef, BasePrerequisite, NormativeVocabulary, StakesModeConfig, StakesCalibration - New: discoverChallengeTypes (builds per-canonUrl BM25 index over detection text), fetchBasePrerequisites, fetchNormativeVocabulary, fetchStakesCalibration — each with per-canonUrl cache and graceful degradation on missing articles - New: evaluatePrerequisiteCheck — interprets natural-language check strings from prerequisite overlay tables - Refactored runChallengeAction: multi-match via BM25 score > 0, base + overlay prerequisite aggregation, stakes calibration filtering, voice-dump suppression invariant, governance-driven tension detection - Extended runCleanupStorage with five new cache clears (types, type-index, base prerequisites, vocabulary, calibration) - Removed dead detectClaimType (legacy src/tasks/challenge.js retains its copy for CLI backward-compat) - Added CHALLENGE_STOP_WORDS set preserving modal verbs as signal Changes in workers/src/bm25.ts (backward-compatible extension): - tokenize(), buildBM25Index() accept optional stopWords: Set<string> - BM25Index gains optional stopWords field so searchBM25 tokenizes queries consistently with the index - Default behavior unchanged — existing callers unaffected - Motivation: default STOP_WORDS filters modals (must, should, shall, may, not) which are signal for challenge-type detection New tests: workers/test/governance-parser.test.mjs — 94 assertions against live governance articles fetched from klappy.dev raw. Covers type parsing, fallback resolution, BM25 detection, stemming regression cases (coin/coining, propose/proposed, principle/principles), multi- match, and the voice-dump suppression invariant. 94/94 pass. Bugs the gauntlet caught on this PR: 1. Voice-dump suppression invariant would have shipped broken — the calibration cell reads "none (suppress all challenge)" not bare "none". Strict-equality parser would have produced a single-element array, voice-dump mode would have surfaced all challenges in prod. 2. Morphological brittleness in regex detection (coin vs coining) — triggered the pivot to BM25 + stemming. 3. Default BM25 STOP_WORDS silently breaks strong-claim and proposal detection by filtering modal verbs. Fixed via custom stop word set. Verification: - npm run typecheck: clean - tests/smoke.sh: 6/6 pass (legacy CLI path — backward compat preserved) - workers/test/governance-parser.test.mjs: 94/94 pass - AI voice clichés audit on new comments: clean - oddkit_preflight, challenge, gate, validate: all run; gate NOT_READY due to same hardcoded-logic gap as challenge pre-refactor (flagged as follow-up) Response shape change: adds mode, matched_types, type_definitions, block_until_addressed; removes claim_type. Consumed programmatically, not rendered. Follow-ups flagged: - Encode parity PR — same regex-OR brittleness in runEncodeAction; pattern proven here, port will be near-mechanical - klappy.dev meta governance PR — "compiles into a case-insensitive word-boundary regex" is now stale language - Gate refactor candidate — same hardcoded-logic shape as challenge pre-refactor Refs: - Depends on: klappy/klappy.dev#99 (governance articles this code reads) - Structural mirror: #96 (governance-driven encode) - Evidence: docs/oddkit/evidence/challenge-governance-code-refactor.md

Re-applies the four review fixes from sibling commits (31f8134, e9ef2f9, 84932f0) and the dead-code removal that the bugbot review also flagged, on top of the BM25 + stemming detection swap. - Vocabulary regex sorted by length descending so 'MUST NOT' matches before 'MUST' (closes bugbot 'Regex alternation order') - Stakes calibration mode column lowercased at parse time AND mode normalized to lowercase at lookup time (closes bugbot 'Mode column not lowercased breaks voice-dump suppression') - first_1 reframings policy now surfaces a single reframing total across all matched types, not one per type (closes bugbot 'first_1 reframings surfaces multiple instead of one') - Detection runs BEFORE voice-dump suppression check, and SUPPRESSED response includes the governance field for shape parity with CHALLENGED (closes bugbot 'SUPPRESSED response missing governance') - Renames type_definitions to governance in CHALLENGED response so both statuses return the same shape under the same key - Dead detectClaimType already removed by the BM25 commit (closes bugbot 'Dead code: detectClaimType has zero callers') Verification: - npm run typecheck: clean - workers/test/governance-parser.test.mjs: 94/94 pass - tests/smoke.sh: 6/6 pass

…ctor evidence Captures the fork-resolution and bugbot-review-driven fixes as a sixth layer of catch alongside the gauntlet bugs. Records the lesson: read PR review comments before treating divergent remote as unknown work.

klappy · 2026-04-17T07:13:57Z

Combine update — BM25 + stemming pivot ported on top of the regex version

Three new commits on top of 84932f0:

726e5ed feat(workers): governance-driven oddkit_challenge with BM25 + stemming — replaces regex-OR detection with BM25 + stemming over per-type detection text (trigger words + blockquote). Mid-implementation pivot triggered by a morphological-brittleness signal in the gauntlet (coining matched, coin didn't). Adds workers/test/governance-parser.test.mjs (94 assertions against live klappy.dev governance articles).
e82164b fix(challenge): port bugbot review fixes onto BM25 detection layer — re-applies the four polish fixes from 31f8134, e9ef2f9, 84932f0 onto the BM25 base. They touch unrelated regions so the port was mechanical.
fd14a60 docs(evidence): add bugbot review + combine section to challenge refactor evidence

All five bugbot items closed

Severity	Item	Closed by
High	Mode column not lowercased breaks voice-dump suppression	`e82164b` (also lowercased at lookup site)
Medium	Regex alternation order breaks multi-word directive matching	`e82164b` (sort by length descending)
Medium	`first_1` reframings surfaces multiple instead of one	`e82164b` (slice from aggregated list)
Medium	SUPPRESSED response missing `governance` field	`e82164b` (detection runs before suppression; both responses share `governance` key)
Low	Dead code: `detectClaimType` has zero callers	`726e5ed` (removed by BM25 commit)

What BM25 + stemming buys

Morphological resilience. coin ~ coining ~ coined ~ coinage all stem to the same root. The brittleness that made bugbot's first review meaningful is gone.
Phrase boost for free. bm25.ts already has exact-phrase and bigram boosts; multi-word triggers ("the principle is", "forcing function") get bonus scores on top of the BM25 base.
Resilience to insertions. "I'm trying to coin a brand new term" now matches pattern-coinage even though no exact regex would have caught it.
Score-based confidence signal available. matched_types is currently a list of slugs; trivial to upgrade to [{slug, score}] if a downstream consumer wants relative confidence.
Reusable pattern. Encode follow-up PR will be near-mechanical now; future gate refactor inherits the same approach.

bm25.ts extension (backward-compatible)

Added optional stopWords: Set<string> to tokenize, buildBM25Index, BM25Index interface, and searchBM25. Default behavior unchanged — existing callers (including oddkit_search) get the same filtering they always did. Challenge-type detection passes a custom CHALLENGE_STOP_WORDS set that preserves modal verbs (must, should, shall, may, not) since modals ARE the signal for strong-claim and proposal types. Discovered as bug #5 in the gauntlet — default STOP_WORDS would have silently broken those two type detections.

Verification

npm run typecheck — clean
node workers/test/governance-parser.test.mjs — 94/94 pass (covers parser fidelity, BM25 detection, stemming regression cases including coin/coining, propose/proposed, principle/principles, multi-match, voice-dump suppression invariant, and irrelevant-input zero-match)
bash tests/smoke.sh — 6/6 pass (legacy CLI path — backward compat preserved)
Cloudflare Workers preview deploys automatically on push

Bugs caught across this PR — six total

Voice-dump suppression invariant nearly shipped broken (parser strict-equality on "none" missed "none (suppress all challenge)")
Morphological brittleness in regex detection (the pivot trigger)
Default BM25 STOP_WORDS filters modal verbs that ARE signal
4–8. Bugbot's five review items (4 fixed by polish commits, 1 by BM25 commit's dead-code removal)

Follow-ups flagged

Encode parity PR — same regex-OR brittleness in runEncodeAction; pattern proven here, port will be near-mechanical
klappy.dev meta governance — how-to-write-challenge-types.md says "compiles into a case-insensitive word-boundary regex" which is now stale; small PR to update
Gate refactor candidate — same hardcoded-logic shape as challenge pre-refactor

…atch

…e cross-contamination

cursor · 2026-04-17T07:46:06Z

+          const dtype =
+            vocab.directiveTypes.get(phrase) ||
+            vocab.directiveTypes.get(phrase.toLowerCase()) ||
+            "load-bearing-claim";


Case-insensitive directive lookup misses non-lowercase map keys

Low Severity

The directiveTypes map stores keys in their original case from the article (cols[0]). The case-insensitive regex match captures text in the excerpt's case. The lookup tries phrase (excerpt case) then phrase.toLowerCase(), but this only succeeds if the article stored the term in lowercase. If the article uses title case (e.g., "Vodka Architecture"), and the excerpt has a different case, both lookups miss and silently fall through to the "load-bearing-claim" default, producing an incorrect tension type.

Additional Locations (1)

workers/src/orchestrate.ts#L625-L628

^{Reviewed by Cursor Bugbot for commit 6358d65. Configure here.}

Caught in PR #100 review by Klappy: the CHALLENGE_STOP_WORDS Set added mid-PR to fix a BM25 over-match was itself a Vodka Architecture violation in a refactor explicitly about removing such violations. The constant carried a domain opinion ('modals are signal, articles are filler in challenge detection') that belonged in canon, not in worker source. Anti-pattern fixed: - Drop the hardcoded CHALLENGE_STOP_WORDS Set from workers/src/orchestrate.ts - Drop the duplicate hardcoded copy from workers/test/governance-parser.test.mjs - Extend NormativeVocabulary interface with stopWords: Set<string> - Extend fetchNormativeVocabulary to extract '## Detection Noise' code block from odd/challenge/normative-vocabulary.md (lands in klappy.dev#100) - Move BM25 index build out of discoverChallengeTypes into a new lazy builder getOrBuildChallengeTypeIndex(types, vocab, canonUrl) so the index can use governance-sourced stop words rather than a constant - Update parser test to fetch Detection Noise the same way the worker does — no hardcoded duplicate, no drift risk. Test gains 3 new assertions: Detection Noise parses non-empty, excludes modal verbs, includes common filler Net hardcoded-constants delta: this PR removes ~6 classes of hardcoded domain opinion (claim type detection, questions, prereqs, tension regex, reframings, stop words) and adds zero. The remaining minimal RFC 2119 fallback ('MUST', 'MUST NOT', 'SHOULD', 'SHOULD NOT') and 'planning' default mode are server-availability fallbacks for when canon is unreachable, not domain governance. Test currently runs against the feature branch via KLAPPYDEV_RAW env override. After klappy.dev#100 merges, the override comes off and the test reads from main with no further changes. Verification: - npm run typecheck: clean - workers/test/governance-parser.test.mjs (vs feature branch): 97/97 pass - tests/smoke.sh: 6/6 pass - grep CHALLENGE_STOP_WORDS in workers/ and src/: zero matches Refs: - Caught in: this PR review by Klappy - Depends on: klappy/klappy.dev#100 (Detection Noise section) - Lesson: 'is this the right architectural shape' is a category the current gauntlet does not catch — the tools verify governance content, not whether new code is creating new ungoverned content. Possible future tool: a vodka-audit that flags non-trivial Sets/Maps/lists in worker source and asks 'should this be in canon?'

klappy · 2026-04-17T13:49:27Z

Anti-pattern fix — `CHALLENGE_STOP_WORDS` moved out of source into governance

Caught in review: the CHALLENGE_STOP_WORDS constant added mid-PR to fix a BM25 over-match was itself a Vodka Architecture violation in a refactor explicitly about removing such violations. The list carried a domain opinion ("modals are signal, articles are filler in challenge detection") that belongs in canon, not in worker source.

Companion klappy.dev PR

klappy/klappy.dev#100 adds a new ## Detection Noise section to odd/challenge/normative-vocabulary.md. The article scope broadened to "two surfaces, one governance article" — signal in canon quotes (existing) plus noise in user input (new).

oddkit changes (commit `ce41b39`)

Drop the hardcoded CHALLENGE_STOP_WORDS Set from workers/src/orchestrate.ts
Drop the duplicate hardcoded copy from workers/test/governance-parser.test.mjs
Extend NormativeVocabulary interface with stopWords: Set<string>
Extend fetchNormativeVocabulary to extract the ## Detection Noise code block
Move BM25 index build out of discoverChallengeTypes into a new lazy builder getOrBuildChallengeTypeIndex(types, vocab, canonUrl) so the index can use governance-sourced stop words
Update parser test to fetch Detection Noise the same way the worker does — no hardcoded duplicate, no drift risk. Adds 3 new assertions: section parses non-empty, excludes modal verbs, includes common filler

Net hardcoded-constants delta vs main

Constant	Before	After
`detectClaimType` regex literals	hardcoded	governance
Per-claim-type `challenges[]` arrays	hardcoded	governance
Per-claim-type `reframings[]` arrays	hardcoded	governance
Missing-prereq regex checks	hardcoded	governance
Tension regex (`MUST`, `MUST NOT`)	hardcoded	governance
`CHALLENGE_STOP_WORDS` Set	added mid-PR	governance
RFC 2119 fallback	already present	unchanged (server-availability fallback)
`"planning"` default mode	already present	unchanged (server-availability fallback)

Net: ~6 classes of hardcoded domain opinion deleted, 0 added.

Verification

npm run typecheck: clean
KLAPPYDEV_RAW=...feat/challenge-detection-noise-vocabulary node workers/test/governance-parser.test.mjs: 97/97 pass
tests/smoke.sh: 6/6 pass
grep CHALLENGE_STOP_WORDS in workers/ and src/: zero matches

Test currently runs against the feature branch via KLAPPYDEV_RAW env override. After klappy.dev#100 merges, the override comes off and the test reads from main with no further changes.

Lesson recorded

"Is this the right architectural shape" is a category the current gauntlet doesn't catch. The tools verify governance content; they don't audit whether new code is creating new ungoverned content. Possible future tool: a vodka-audit that flags non-trivial Sets/Maps/lists in worker source and asks "should this be in canon?"

Bug count for this PR — now seven across two review surfaces

Voice-dump suppression invariant nearly shipped broken (parser strict-equality)
Morphological brittleness in regex detection (BM25 pivot trigger)
Default BM25 STOP_WORDS filters modal verbs that are signal
Vocabulary regex alternation order (bugbot — High)
Mode column case mismatch breaks calibration (bugbot — High)
first_1 reframings counted per-type (bugbot — Medium)
SUPPRESSED response missing governance (bugbot — Medium)
Dead detectClaimType (bugbot — Low; closed by BM25 commit)
BM25 index cache canonUrl guard (bugbot — Medium; landed as 997a50d)
Question tier case mismatch (bugbot — Medium; landed as 94990b5)
Stop words hardcoded in source — Vodka violation (this fix; not flagged by gauntlet or bugbot, caught by you)

The discipline is load-bearing across multiple review surfaces. None is sufficient alone.

…arser - Use matchAll and prefer prohibition directive type over leftmost requirement match so excerpts like 'You MUST X and MUST NOT Y' surface the prohibition. Regex switched to global flag to support matchAll. - Port fetchStakesCalibration toLowerCase fix to the fidelity test parser so byMode keys stay lowercase even if governance introduces capitalized mode names.

…st pickStrongest Two fixes: 1. Table row parser (6 call sites) was using .split('|').map(trim).filter(c => c.length > 0) which also drops legitimately-empty interior cells, silently collapsing the column count. In fetchStakesCalibration this would silently drop a voice-dump row with an empty tiers cell, breaking the suppression invariant with no error signal. Introduce parseTableRow helper that only strips the leading/trailing empties produced by surrounding pipes, preserving empty middle cells. 2. Hoist pickStrongest (now pickStrongestDirective) out of the per-entry loop in runChallengeAction. It captures no loop-scoped state, so defining it inside the loop needlessly re-allocated the closure on each iteration and misled readers about its scope. Matches the placement of evaluatePrerequisiteCheck.

…mpty cells

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

^{Reviewed by Cursor Bugbot for commit 477a213. Configure here.}

…and dead branch Two issues from bugbot's 14:29 review: 1. Reframing 'none' check applies same defensive pattern as the tiersRaw fix in fetchStakesCalibration. The cell may be 'none' or 'none (parenthetical reason)' — strict equality would silently surface all reframings via the 'all' fallback when authors include explanatory text. Same defect class as bug #3 in the evidence note; sweep applied. 2. Remove unreachable questionTiers.length === 0 branch in the question- surfacing condition. The SUPPRESSED early-return at line 1635 already handles that case, so the branch was dead code that misleadingly suggested 'surface all questions for empty tiers' semantics — the actual semantic is full suppression. Verified: typecheck clean, parser test 97/97 against main, smoke 6/6. Defect-class sweep on governance cell strict-equality checks: only two sites (tiersRaw, surfacing), both now defensive.

Production preview test caught a real show-stopper: the MCP tool's mode parameter Zod schema accepted only exploration/planning/execution but the calibration governance article (odd/challenge/stakes-calibration) defines 9 modes. The 6 writing-lifecycle modes (voice-dump, drafting, peer-review-ready, canon-tier-2, canon-tier-1, published-essay) were unreachable from the public API — schema validation rejected them before runtime ever saw them. Net effect of the bug: the voice-dump suppression invariant — the load-bearing feature of PR #100, named in evidence as load-bearing — could not be exercised through the public MCP tool. Internal tests worked because they bypassed the schema. The CI 'Test CF Preview' job failed on cold-start timeout (red herring); the real failure was discovered by manual curl against the preview. Two sites in workers/src/index.ts: - Line 170: unified oddkit tool - Line 235: dedicated oddkit_challenge tool Both expanded to the full 9-mode enum. Description text updated to explain the two mode families and call out voice-dump suppression. Longer-term direction (not this commit): drop the enum entirely and let canon be the validator. The runtime already validates against the calibration table at fetchStakesCalibration time — having the schema also enforce vocabulary is the same Vodka anti-pattern shape that PR #100 fixed for stop words. Tracked as a follow-up. Verification: - npm run typecheck: clean - workers/test/governance-parser.test.mjs: 97/97 against main - Manual preview curl with mode=voice-dump pending after this deploys Lesson: testing the running preview is not optional. PR #100 had typecheck pass, parser test 97/97, smoke 6/6, AND production preview deploy succeed — and still shipped a feature the schema blocked from being called. Three layers of verification missed it because none of them exercised the public API contract.

Captures DOLCHE for the session that delivered PR #100 (governance-driven challenge refactor with BM25 + stemming) and the unresolved schema bug that made the voice-dump suppression invariant unreachable from the public API. Critical for next model picking up: - PR #101 (prod promotion) is BLOCKED — schema fix not yet merged - fix/challenge-mode-schema-includes-writing-modes has the fix - After fix lands, manually curl preview with mode=voice-dump before promoting Also records lessons for the next session: defect-class sweep discipline, public API contract verification, parser test flakiness, and three follow-up refactors carrying the same anti-pattern as challenge pre-refactor (encode, gate, orient).

Catalogs the Vodka anti-pattern — canon defines vocabulary, code hardcodes interpretation — across all 11 oddkit tools. PR #100's voice-dump schema bug (1h 39m prod breakage) was one instance of this class. Audit identifies 5 tools carrying the same shape. Findings: SEVERE: orient, gate, validate PARTIAL: encode, preflight CROSS-CUTTING: mode enum declared in 4 places CLEAN: challenge (post-refactor), search, get, catalog, version, time, cleanup_storage, telemetry_public Refactor priority ranked by impact × tractability. validate named as the most surprising: it gates 'done' but never reads canon/constraints/definition-of-done.md — which in fact does not exist in the repo despite three user-facing docs claiming it does. Companion PR on klappy/klappy.dev establishes canon/constraints/core-governance-baseline.md as the contract every sweep refactor will conform to.

Replaces the hardcoded self_report_headers dictionary with a runtime parse of canon/constraints/telemetry-governance.md #### Self-Report Fields table. Response envelope now declares governance_source: 'canon' when the fetch succeeds and the table parses, 'minimal' when it falls back to the shipped baseline. This is the canary refactor for the governance anti-pattern sweep (docs/oddkit/audit/governance-anti-pattern-sweep-2026-04-17). It conforms to the three-tier resolution contract drafted in klappy/klappy.dev#101 (canon/constraints/core-governance-baseline), exercising tiers 1 (live canon) and 3 (minimal baseline in code). Tier 2 (bundled baseline directory with manifest) and the build-time schema check arrive in follow-up work once the contract graduates from status:draft to status:active. Implementation: - New helper parseSelfReportHeadersTable in index.ts parses the '### Self-Report Fields' table section from the canon doc. - Parser is permissive (whitespace + backticks) and fails closed to null so the caller falls back to the minimal baseline rather than hiding the degradation. - Minimal baseline remains the 8 stable headers; canon controls the descriptions once live. Verified: - npm run typecheck: clean - Parser unit-tested against live canon content: 8/8 headers parsed - Parser degradation paths (no section, empty table) return null Refactor discipline this commit follows (from PR #100 post-mortem): - Single feature PR, single site touched - Public contract (MCP tool response) changes are additive (governance_source field added; self_report_headers keys unchanged) - Preview smoke against live prod will verify canon-tier response before promotion

…me (#106) * feat(telemetry_policy): canary refactor — headers from canon at runtime Replaces the hardcoded self_report_headers dictionary with a runtime parse of canon/constraints/telemetry-governance.md #### Self-Report Fields table. Response envelope now declares governance_source: 'canon' when the fetch succeeds and the table parses, 'minimal' when it falls back to the shipped baseline. This is the canary refactor for the governance anti-pattern sweep (docs/oddkit/audit/governance-anti-pattern-sweep-2026-04-17). It conforms to the three-tier resolution contract drafted in klappy/klappy.dev#101 (canon/constraints/core-governance-baseline), exercising tiers 1 (live canon) and 3 (minimal baseline in code). Tier 2 (bundled baseline directory with manifest) and the build-time schema check arrive in follow-up work once the contract graduates from status:draft to status:active. Implementation: - New helper parseSelfReportHeadersTable in index.ts parses the '### Self-Report Fields' table section from the canon doc. - Parser is permissive (whitespace + backticks) and fails closed to null so the caller falls back to the minimal baseline rather than hiding the degradation. - Minimal baseline remains the 8 stable headers; canon controls the descriptions once live. Verified: - npm run typecheck: clean - Parser unit-tested against live canon content: 8/8 headers parsed - Parser degradation paths (no section, empty table) return null Refactor discipline this commit follows (from PR #100 post-mortem): - Single feature PR, single site touched - Public contract (MCP tool response) changes are additive (governance_source field added; self_report_headers keys unchanged) - Preview smoke against live prod will verify canon-tier response before promotion * refactor(workers): extract parseTableRow to shared markdown-utils * fix(telemetry_policy canary): read Description column + add parser tests Addresses execution-mode challenge gaps on PR #106: 1. Information regression fixed: parser now reads canon's column 4 (Description) instead of column 0 (Field label). Canon was extended with richer per-header descriptions in klappy/klappy.dev#102; this commit updates the parser to consume that new column. 2. Tests committed: Test 8 added to workers/test/governance-parser.test.mjs covering 8/8 header extraction, non-trivial description lengths, and degradation paths (no section, empty table). All 105 tests pass against the unmerged klappy.dev branch via KLAPPYDEV_RAW override. Still outstanding (follow-up work, not blocking the canary): - parseTableRow duplicated across workers/src/index.ts and workers/src/orchestrate.ts. Accepted duplication for now, flagged in both sites; export-and-share refactor lands when the sweep surfaces more duplication candidates. - Preview smoke against Cloudflare preview with the extended canon loaded but no worker redeploy — run manually after this PR deploys. Companion PR: klappy/klappy.dev#102 (canon extension). This worker change is backward-compatible with the old 3-column table: the parser requires 4 cols, so against the old canon it falls through to the minimal baseline tier. Once klappy.dev#102 merges, canon tier takes over. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>

…moke Addresses three gaps found in live validation of canary PR #106: 1. Response envelope was missing server_time, assistant_text, debug. Every other oddkit tool returns {action, result, server_time, assistant_text, debug}; telemetry_policy returned only {action, result}. This breaks the time-discipline contract — project instructions require every oddkit response to carry server_time so models have a clock reading on every call. 2. canon_url parameter was silently ignored. The Zod schema was {}, so MCP stripped canon_url before the handler saw it, and the handler hardcoded the default baseline. The three-tier resolution contract in canon/constraints/core-governance-baseline assumes every canon-driven tool accepts canon_url for overrides — this is load-bearing for TruthKit / custom-canon consumers. 3. No live-smoke test for the envelope shape. Parser tests in governance-parser.test.mjs exercised parser logic only. The canary shipped with partial contract conformance because no test invoked the MCP tool end-to-end and asserted the envelope shape. Changes: - Add canon_url to the tool's Zod schema; thread through to fetcher.getFile(path, canon_url). - Expand response envelope to match convention: server_time, assistant_text (human-readable summary naming the tier), debug with duration_ms and canon_url echo. - New workers/test/canon-tool-envelope.smoke.mjs — live smoke script that curls the MCP endpoint and verifies envelope shape for oddkit_time (convention baseline), telemetry_policy default (canon tier), and telemetry_policy with canon_url override (minimal fallback). Verified: - npm run typecheck: clean - Smoke script structure matches PR #100's governance-parser test style and exits non-zero on any envelope violation. Lesson for the sweep: every canon-driven refactor must verify both the new governance_source signal AND full envelope conformance. The canary's partial completion was caught by live validation but should have been caught by pre-merge smoke. Follow-up to update the refactor template in docs/oddkit/audit/... separately.

Two orphaned artifacts recovered: 1. odd/handoffs/2026-04-19-fresh-session-continuation.md Tonight's fresh-session continuation handoff. Created during the E0008.3 session and pushed to canon/dolcheo-ledger-open-items AFTER #107 was squash-merged, leaving it orphaned on the branch. Content: self-contained doc naming priority 1–6 work, the governing prompt-over-code refactor arc, and recommended first actions. Companion to the E0008.3 session ledger. 2. odd/ledger/journal/2026-04-17-pr100-rage-quit-handoff.md Historical ledger from 2026-04-17 session that ended with Klappy handing off to another model after a 12-hour bugbot ping-pong cycle on klappy/oddkit#100. Lived on oddkit branch chore/handoff-journal-pr100, never merged. The branch itself is unsafe to merge (also deletes the audit doc and smoke test that are load-bearing on main), so the ledger file was salvaged in isolation. Frontmatter added for canon consistency; status: archival; salvaged_from field records the original location. Content worth preserving: 15+ bugs caught across three review surfaces (gauntlet, bugbot, Klappy), defect-class blindness as the consistent failure pattern, the voice-dump invariant story that directly motivated the canary refactor and live-smoke ship-blocker contract in canon/constraints/core-governance-baseline. This is the session that retrospectively justified E0008.3. Its loss would have removed the empirical anchor for several canon decisions that now live in main. Both files are now tracked and durably retrievable via klappy:// URIs. Related cleanup to follow: - oddkit branches chore/handoff-journal-pr100 and fix/challenge-mode-schema-includes-writing-modes can be deleted once this PR merges (their content is either superseded on main or salvaged here). - oddkit/docs/drafts/core-governance-baseline.md is a stale pre-rename draft; content lives in klappy.dev canon now. Separate tiny cleanup in oddkit repo.

….19.0) P1.3.1 canary retrofit mirroring P1.2's encode pattern. Challenge's response envelope now declares which tier served its governance vocabulary and exposes the four peer canon URIs it reads. Changes: - Four governance helpers return {<domainNoun>, source} tuples: * discoverChallengeTypes → {types, source} * fetchBasePrerequisites → {prerequisites, source} * fetchNormativeVocabulary → {vocabulary, source} * fetchStakesCalibration → {calibration, source} - runChallengeAction aggregates the four sources strictly: any helper falling through to hardcoded defaults → aggregate = "minimal". All four from canon → aggregate = "knowledge_base". - Response envelope (both SUPPRESSED and CHALLENGED paths): + result.governance_source: "knowledge_base" | "minimal" + result.governance_uris: array of 4 peer URIs (alphabetical) + debug.knowledge_base_url echoes override - Smoke test adds challenge envelope assertions + 9-mode parse integrity guard (PR #100 regression defense). Design decision: governance_uris (plural array) intentionally diverges from encode's singular governance_uri. Challenge reads four peer governance files with no hierarchy among them; a single anchor would misrepresent where base-prerequisites and normative-vocabulary live. Consumers that want a singular anchor can read governance_uris[0] — alphabetical ordering makes this stable. See CHANGELOG and PRD for full rationale. Two-tier cascade today ("knowledge_base" | "minimal") because workers/baseline/ is not yet shipped; the bundled tier from canon/constraints/core-governance-baseline is a contract aspiration. Union expands additively when baseline ships. Verification: - npm run typecheck: clean - node workers/test/governance-parser.test.mjs: 105/105 pass - Live smoke against preview URL: pending (next step) PRD: /home/claude/work/prd-p1-3-1-challenge-canary.md (local planning artifact) Closes: P1.3.1 (per klappy://odd/handoffs/2026-04-20-p1-3-challenge-canary)

chore: record E0008 challenge governance refactor decision in ledger

a88abf7