feat(challenge): governance-driven runChallengeAction (E0008)#100
feat(challenge): governance-driven runChallengeAction (E0008)#100
Conversation
Mirrors the PR #96 encode pattern. Extracts challenge behavior from live governance articles (landed in klappy.dev canon via PR #99) rather than hardcoded source logic. New functions in workers/src/orchestrate.ts: - discoverChallengeTypes — per-canonUrl cached type discovery - fetchBasePrerequisites — universal prerequisite checks - fetchNormativeVocabulary — RFC 2119 + architectural load-bearing terms - fetchStakesCalibration — mode-to-depth filter - extractPrereqTable / extractKeywordsFromCheck — shared helpers Refactored: - runChallengeAction — replaces hardcoded detectClaimType / generateChallenges / findTensions / findMissingPrerequisites with governance extraction. Supports multi-match. Filters output by stakes calibration based on mode parameter. - runCleanupStorage — clears all four new caches on invalidation Invariant: voice-dump mode suppresses all challenge output regardless of matched types. Load-bearing per stakes-calibration governance — some modes exist for raw capture and pressure-testing at that stage damages the mode. Graceful degradation: missing governance articles fall back to minimal built-in behavior with warnings, rather than failing. Co-authored-by: Claude <noreply@anthropic.com>
Deploying with
|
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs |
oddkit | 80416ac | Commit Preview URL Branch Preview URL |
Apr 17 2026, 03:09 PM |
Refactor runChallengeAction in workers/src/orchestrate.ts to extract challenge-type behavior from canon governance articles at runtime rather than hardcoding claim-type detection, questions, prerequisites, and tension rules in source. Structural mirror of PR #96 (encode). Detection upgraded mid-implementation from regex-OR to BM25 + stemming after the gauntlet revealed that regex-based matching was morphologically brittle ("coin" doesn't match trigger "coining"). The pivot removed an entire class of bug and seeded a reusable pattern for future governance-driven tools. Changes in workers/src/orchestrate.ts: - New: ChallengeTypeDef, BasePrerequisite, NormativeVocabulary, StakesModeConfig, StakesCalibration - New: discoverChallengeTypes (builds per-canonUrl BM25 index over detection text), fetchBasePrerequisites, fetchNormativeVocabulary, fetchStakesCalibration — each with per-canonUrl cache and graceful degradation on missing articles - New: evaluatePrerequisiteCheck — interprets natural-language check strings from prerequisite overlay tables - Refactored runChallengeAction: multi-match via BM25 score > 0, base + overlay prerequisite aggregation, stakes calibration filtering, voice-dump suppression invariant, governance-driven tension detection - Extended runCleanupStorage with five new cache clears (types, type-index, base prerequisites, vocabulary, calibration) - Removed dead detectClaimType (legacy src/tasks/challenge.js retains its copy for CLI backward-compat) - Added CHALLENGE_STOP_WORDS set preserving modal verbs as signal Changes in workers/src/bm25.ts (backward-compatible extension): - tokenize(), buildBM25Index() accept optional stopWords: Set<string> - BM25Index gains optional stopWords field so searchBM25 tokenizes queries consistently with the index - Default behavior unchanged — existing callers unaffected - Motivation: default STOP_WORDS filters modals (must, should, shall, may, not) which are signal for challenge-type detection New tests: workers/test/governance-parser.test.mjs — 94 assertions against live governance articles fetched from klappy.dev raw. Covers type parsing, fallback resolution, BM25 detection, stemming regression cases (coin/coining, propose/proposed, principle/principles), multi- match, and the voice-dump suppression invariant. 94/94 pass. Bugs the gauntlet caught on this PR: 1. Voice-dump suppression invariant would have shipped broken — the calibration cell reads "none (suppress all challenge)" not bare "none". Strict-equality parser would have produced a single-element array, voice-dump mode would have surfaced all challenges in prod. 2. Morphological brittleness in regex detection (coin vs coining) — triggered the pivot to BM25 + stemming. 3. Default BM25 STOP_WORDS silently breaks strong-claim and proposal detection by filtering modal verbs. Fixed via custom stop word set. Verification: - npm run typecheck: clean - tests/smoke.sh: 6/6 pass (legacy CLI path — backward compat preserved) - workers/test/governance-parser.test.mjs: 94/94 pass - AI voice clichés audit on new comments: clean - oddkit_preflight, challenge, gate, validate: all run; gate NOT_READY due to same hardcoded-logic gap as challenge pre-refactor (flagged as follow-up) Response shape change: adds mode, matched_types, type_definitions, block_until_addressed; removes claim_type. Consumed programmatically, not rendered. Follow-ups flagged: - Encode parity PR — same regex-OR brittleness in runEncodeAction; pattern proven here, port will be near-mechanical - klappy.dev meta governance PR — "compiles into a case-insensitive word-boundary regex" is now stale language - Gate refactor candidate — same hardcoded-logic shape as challenge pre-refactor Refs: - Depends on: klappy/klappy.dev#99 (governance articles this code reads) - Structural mirror: #96 (governance-driven encode) - Evidence: docs/oddkit/evidence/challenge-governance-code-refactor.md
Re-applies the four review fixes from sibling commits (31f8134, e9ef2f9, 84932f0) and the dead-code removal that the bugbot review also flagged, on top of the BM25 + stemming detection swap. - Vocabulary regex sorted by length descending so 'MUST NOT' matches before 'MUST' (closes bugbot 'Regex alternation order') - Stakes calibration mode column lowercased at parse time AND mode normalized to lowercase at lookup time (closes bugbot 'Mode column not lowercased breaks voice-dump suppression') - first_1 reframings policy now surfaces a single reframing total across all matched types, not one per type (closes bugbot 'first_1 reframings surfaces multiple instead of one') - Detection runs BEFORE voice-dump suppression check, and SUPPRESSED response includes the governance field for shape parity with CHALLENGED (closes bugbot 'SUPPRESSED response missing governance') - Renames type_definitions to governance in CHALLENGED response so both statuses return the same shape under the same key - Dead detectClaimType already removed by the BM25 commit (closes bugbot 'Dead code: detectClaimType has zero callers') Verification: - npm run typecheck: clean - workers/test/governance-parser.test.mjs: 94/94 pass - tests/smoke.sh: 6/6 pass
…ctor evidence Captures the fork-resolution and bugbot-review-driven fixes as a sixth layer of catch alongside the gauntlet bugs. Records the lesson: read PR review comments before treating divergent remote as unknown work.
Combine update — BM25 + stemming pivot ported on top of the regex versionThree new commits on top of
All five bugbot items closed
What BM25 + stemming buys
bm25.ts extension (backward-compatible)Added optional Verification
Bugs caught across this PR — six total
Follow-ups flagged
|
…e cross-contamination
| const dtype = | ||
| vocab.directiveTypes.get(phrase) || | ||
| vocab.directiveTypes.get(phrase.toLowerCase()) || | ||
| "load-bearing-claim"; |
There was a problem hiding this comment.
Case-insensitive directive lookup misses non-lowercase map keys
Low Severity
The directiveTypes map stores keys in their original case from the article (cols[0]). The case-insensitive regex match captures text in the excerpt's case. The lookup tries phrase (excerpt case) then phrase.toLowerCase(), but this only succeeds if the article stored the term in lowercase. If the article uses title case (e.g., "Vodka Architecture"), and the excerpt has a different case, both lookups miss and silently fall through to the "load-bearing-claim" default, producing an incorrect tension type.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 6358d65. Configure here.
Caught in PR #100 review by Klappy: the CHALLENGE_STOP_WORDS Set added mid-PR to fix a BM25 over-match was itself a Vodka Architecture violation in a refactor explicitly about removing such violations. The constant carried a domain opinion ('modals are signal, articles are filler in challenge detection') that belonged in canon, not in worker source. Anti-pattern fixed: - Drop the hardcoded CHALLENGE_STOP_WORDS Set from workers/src/orchestrate.ts - Drop the duplicate hardcoded copy from workers/test/governance-parser.test.mjs - Extend NormativeVocabulary interface with stopWords: Set<string> - Extend fetchNormativeVocabulary to extract '## Detection Noise' code block from odd/challenge/normative-vocabulary.md (lands in klappy.dev#100) - Move BM25 index build out of discoverChallengeTypes into a new lazy builder getOrBuildChallengeTypeIndex(types, vocab, canonUrl) so the index can use governance-sourced stop words rather than a constant - Update parser test to fetch Detection Noise the same way the worker does — no hardcoded duplicate, no drift risk. Test gains 3 new assertions: Detection Noise parses non-empty, excludes modal verbs, includes common filler Net hardcoded-constants delta: this PR removes ~6 classes of hardcoded domain opinion (claim type detection, questions, prereqs, tension regex, reframings, stop words) and adds zero. The remaining minimal RFC 2119 fallback ('MUST', 'MUST NOT', 'SHOULD', 'SHOULD NOT') and 'planning' default mode are server-availability fallbacks for when canon is unreachable, not domain governance. Test currently runs against the feature branch via KLAPPYDEV_RAW env override. After klappy.dev#100 merges, the override comes off and the test reads from main with no further changes. Verification: - npm run typecheck: clean - workers/test/governance-parser.test.mjs (vs feature branch): 97/97 pass - tests/smoke.sh: 6/6 pass - grep CHALLENGE_STOP_WORDS in workers/ and src/: zero matches Refs: - Caught in: this PR review by Klappy - Depends on: klappy/klappy.dev#100 (Detection Noise section) - Lesson: 'is this the right architectural shape' is a category the current gauntlet does not catch — the tools verify governance content, not whether new code is creating new ungoverned content. Possible future tool: a vodka-audit that flags non-trivial Sets/Maps/lists in worker source and asks 'should this be in canon?'
Anti-pattern fix —
|
| Constant | Before | After |
|---|---|---|
detectClaimType regex literals |
hardcoded | governance |
Per-claim-type challenges[] arrays |
hardcoded | governance |
Per-claim-type reframings[] arrays |
hardcoded | governance |
| Missing-prereq regex checks | hardcoded | governance |
Tension regex (MUST, MUST NOT) |
hardcoded | governance |
CHALLENGE_STOP_WORDS Set |
added mid-PR | governance |
| RFC 2119 fallback | already present | unchanged (server-availability fallback) |
"planning" default mode |
already present | unchanged (server-availability fallback) |
Net: ~6 classes of hardcoded domain opinion deleted, 0 added.
Verification
npm run typecheck: cleanKLAPPYDEV_RAW=...feat/challenge-detection-noise-vocabulary node workers/test/governance-parser.test.mjs: 97/97 passtests/smoke.sh: 6/6 passgrep CHALLENGE_STOP_WORDS in workers/ and src/: zero matches
Test currently runs against the feature branch via KLAPPYDEV_RAW env override. After klappy.dev#100 merges, the override comes off and the test reads from main with no further changes.
Lesson recorded
"Is this the right architectural shape" is a category the current gauntlet doesn't catch. The tools verify governance content; they don't audit whether new code is creating new ungoverned content. Possible future tool: a vodka-audit that flags non-trivial Sets/Maps/lists in worker source and asks "should this be in canon?"
Bug count for this PR — now seven across two review surfaces
- Voice-dump suppression invariant nearly shipped broken (parser strict-equality)
- Morphological brittleness in regex detection (BM25 pivot trigger)
- Default BM25
STOP_WORDSfilters modal verbs that are signal - Vocabulary regex alternation order (bugbot — High)
- Mode column case mismatch breaks calibration (bugbot — High)
first_1reframings counted per-type (bugbot — Medium)- SUPPRESSED response missing
governance(bugbot — Medium) - Dead
detectClaimType(bugbot — Low; closed by BM25 commit) - BM25 index cache canonUrl guard (bugbot — Medium; landed as
997a50d) - Question tier case mismatch (bugbot — Medium; landed as
94990b5) - Stop words hardcoded in source — Vodka violation (this fix; not flagged by gauntlet or bugbot, caught by you)
The discipline is load-bearing across multiple review surfaces. None is sufficient alone.
…arser - Use matchAll and prefer prohibition directive type over leftmost requirement match so excerpts like 'You MUST X and MUST NOT Y' surface the prohibition. Regex switched to global flag to support matchAll. - Port fetchStakesCalibration toLowerCase fix to the fidelity test parser so byMode keys stay lowercase even if governance introduces capitalized mode names.
…st pickStrongest
Two fixes:
1. Table row parser (6 call sites) was using
.split('|').map(trim).filter(c => c.length > 0) which also drops
legitimately-empty interior cells, silently collapsing the column
count. In fetchStakesCalibration this would silently drop a
voice-dump row with an empty tiers cell, breaking the suppression
invariant with no error signal. Introduce parseTableRow helper that
only strips the leading/trailing empties produced by surrounding
pipes, preserving empty middle cells.
2. Hoist pickStrongest (now pickStrongestDirective) out of the
per-entry loop in runChallengeAction. It captures no loop-scoped
state, so defining it inside the loop needlessly re-allocated the
closure on each iteration and misled readers about its scope.
Matches the placement of evaluatePrerequisiteCheck.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 477a213. Configure here.
…and dead branch Two issues from bugbot's 14:29 review: 1. Reframing 'none' check applies same defensive pattern as the tiersRaw fix in fetchStakesCalibration. The cell may be 'none' or 'none (parenthetical reason)' — strict equality would silently surface all reframings via the 'all' fallback when authors include explanatory text. Same defect class as bug #3 in the evidence note; sweep applied. 2. Remove unreachable questionTiers.length === 0 branch in the question- surfacing condition. The SUPPRESSED early-return at line 1635 already handles that case, so the branch was dead code that misleadingly suggested 'surface all questions for empty tiers' semantics — the actual semantic is full suppression. Verified: typecheck clean, parser test 97/97 against main, smoke 6/6. Defect-class sweep on governance cell strict-equality checks: only two sites (tiersRaw, surfacing), both now defensive.
Production preview test caught a real show-stopper: the MCP tool's mode parameter Zod schema accepted only exploration/planning/execution but the calibration governance article (odd/challenge/stakes-calibration) defines 9 modes. The 6 writing-lifecycle modes (voice-dump, drafting, peer-review-ready, canon-tier-2, canon-tier-1, published-essay) were unreachable from the public API — schema validation rejected them before runtime ever saw them. Net effect of the bug: the voice-dump suppression invariant — the load-bearing feature of PR #100, named in evidence as load-bearing — could not be exercised through the public MCP tool. Internal tests worked because they bypassed the schema. The CI 'Test CF Preview' job failed on cold-start timeout (red herring); the real failure was discovered by manual curl against the preview. Two sites in workers/src/index.ts: - Line 170: unified oddkit tool - Line 235: dedicated oddkit_challenge tool Both expanded to the full 9-mode enum. Description text updated to explain the two mode families and call out voice-dump suppression. Longer-term direction (not this commit): drop the enum entirely and let canon be the validator. The runtime already validates against the calibration table at fetchStakesCalibration time — having the schema also enforce vocabulary is the same Vodka anti-pattern shape that PR #100 fixed for stop words. Tracked as a follow-up. Verification: - npm run typecheck: clean - workers/test/governance-parser.test.mjs: 97/97 against main - Manual preview curl with mode=voice-dump pending after this deploys Lesson: testing the running preview is not optional. PR #100 had typecheck pass, parser test 97/97, smoke 6/6, AND production preview deploy succeed — and still shipped a feature the schema blocked from being called. Three layers of verification missed it because none of them exercised the public API contract.
Captures DOLCHE for the session that delivered PR #100 (governance-driven challenge refactor with BM25 + stemming) and the unresolved schema bug that made the voice-dump suppression invariant unreachable from the public API. Critical for next model picking up: - PR #101 (prod promotion) is BLOCKED — schema fix not yet merged - fix/challenge-mode-schema-includes-writing-modes has the fix - After fix lands, manually curl preview with mode=voice-dump before promoting Also records lessons for the next session: defect-class sweep discipline, public API contract verification, parser test flakiness, and three follow-up refactors carrying the same anti-pattern as challenge pre-refactor (encode, gate, orient).
Catalogs the Vodka anti-pattern — canon defines vocabulary, code hardcodes interpretation — across all 11 oddkit tools. PR #100's voice-dump schema bug (1h 39m prod breakage) was one instance of this class. Audit identifies 5 tools carrying the same shape. Findings: SEVERE: orient, gate, validate PARTIAL: encode, preflight CROSS-CUTTING: mode enum declared in 4 places CLEAN: challenge (post-refactor), search, get, catalog, version, time, cleanup_storage, telemetry_public Refactor priority ranked by impact × tractability. validate named as the most surprising: it gates 'done' but never reads canon/constraints/definition-of-done.md — which in fact does not exist in the repo despite three user-facing docs claiming it does. Companion PR on klappy/klappy.dev establishes canon/constraints/core-governance-baseline.md as the contract every sweep refactor will conform to.
Replaces the hardcoded self_report_headers dictionary with a runtime parse of canon/constraints/telemetry-governance.md #### Self-Report Fields table. Response envelope now declares governance_source: 'canon' when the fetch succeeds and the table parses, 'minimal' when it falls back to the shipped baseline. This is the canary refactor for the governance anti-pattern sweep (docs/oddkit/audit/governance-anti-pattern-sweep-2026-04-17). It conforms to the three-tier resolution contract drafted in klappy/klappy.dev#101 (canon/constraints/core-governance-baseline), exercising tiers 1 (live canon) and 3 (minimal baseline in code). Tier 2 (bundled baseline directory with manifest) and the build-time schema check arrive in follow-up work once the contract graduates from status:draft to status:active. Implementation: - New helper parseSelfReportHeadersTable in index.ts parses the '### Self-Report Fields' table section from the canon doc. - Parser is permissive (whitespace + backticks) and fails closed to null so the caller falls back to the minimal baseline rather than hiding the degradation. - Minimal baseline remains the 8 stable headers; canon controls the descriptions once live. Verified: - npm run typecheck: clean - Parser unit-tested against live canon content: 8/8 headers parsed - Parser degradation paths (no section, empty table) return null Refactor discipline this commit follows (from PR #100 post-mortem): - Single feature PR, single site touched - Public contract (MCP tool response) changes are additive (governance_source field added; self_report_headers keys unchanged) - Preview smoke against live prod will verify canon-tier response before promotion
…me (#106) * feat(telemetry_policy): canary refactor — headers from canon at runtime Replaces the hardcoded self_report_headers dictionary with a runtime parse of canon/constraints/telemetry-governance.md #### Self-Report Fields table. Response envelope now declares governance_source: 'canon' when the fetch succeeds and the table parses, 'minimal' when it falls back to the shipped baseline. This is the canary refactor for the governance anti-pattern sweep (docs/oddkit/audit/governance-anti-pattern-sweep-2026-04-17). It conforms to the three-tier resolution contract drafted in klappy/klappy.dev#101 (canon/constraints/core-governance-baseline), exercising tiers 1 (live canon) and 3 (minimal baseline in code). Tier 2 (bundled baseline directory with manifest) and the build-time schema check arrive in follow-up work once the contract graduates from status:draft to status:active. Implementation: - New helper parseSelfReportHeadersTable in index.ts parses the '### Self-Report Fields' table section from the canon doc. - Parser is permissive (whitespace + backticks) and fails closed to null so the caller falls back to the minimal baseline rather than hiding the degradation. - Minimal baseline remains the 8 stable headers; canon controls the descriptions once live. Verified: - npm run typecheck: clean - Parser unit-tested against live canon content: 8/8 headers parsed - Parser degradation paths (no section, empty table) return null Refactor discipline this commit follows (from PR #100 post-mortem): - Single feature PR, single site touched - Public contract (MCP tool response) changes are additive (governance_source field added; self_report_headers keys unchanged) - Preview smoke against live prod will verify canon-tier response before promotion * refactor(workers): extract parseTableRow to shared markdown-utils * fix(telemetry_policy canary): read Description column + add parser tests Addresses execution-mode challenge gaps on PR #106: 1. Information regression fixed: parser now reads canon's column 4 (Description) instead of column 0 (Field label). Canon was extended with richer per-header descriptions in klappy/klappy.dev#102; this commit updates the parser to consume that new column. 2. Tests committed: Test 8 added to workers/test/governance-parser.test.mjs covering 8/8 header extraction, non-trivial description lengths, and degradation paths (no section, empty table). All 105 tests pass against the unmerged klappy.dev branch via KLAPPYDEV_RAW override. Still outstanding (follow-up work, not blocking the canary): - parseTableRow duplicated across workers/src/index.ts and workers/src/orchestrate.ts. Accepted duplication for now, flagged in both sites; export-and-share refactor lands when the sweep surfaces more duplication candidates. - Preview smoke against Cloudflare preview with the extended canon loaded but no worker redeploy — run manually after this PR deploys. Companion PR: klappy/klappy.dev#102 (canon extension). This worker change is backward-compatible with the old 3-column table: the parser requires 4 cols, so against the old canon it falls through to the minimal baseline tier. Once klappy.dev#102 merges, canon tier takes over. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>
…moke Addresses three gaps found in live validation of canary PR #106: 1. Response envelope was missing server_time, assistant_text, debug. Every other oddkit tool returns {action, result, server_time, assistant_text, debug}; telemetry_policy returned only {action, result}. This breaks the time-discipline contract — project instructions require every oddkit response to carry server_time so models have a clock reading on every call. 2. canon_url parameter was silently ignored. The Zod schema was {}, so MCP stripped canon_url before the handler saw it, and the handler hardcoded the default baseline. The three-tier resolution contract in canon/constraints/core-governance-baseline assumes every canon-driven tool accepts canon_url for overrides — this is load-bearing for TruthKit / custom-canon consumers. 3. No live-smoke test for the envelope shape. Parser tests in governance-parser.test.mjs exercised parser logic only. The canary shipped with partial contract conformance because no test invoked the MCP tool end-to-end and asserted the envelope shape. Changes: - Add canon_url to the tool's Zod schema; thread through to fetcher.getFile(path, canon_url). - Expand response envelope to match convention: server_time, assistant_text (human-readable summary naming the tier), debug with duration_ms and canon_url echo. - New workers/test/canon-tool-envelope.smoke.mjs — live smoke script that curls the MCP endpoint and verifies envelope shape for oddkit_time (convention baseline), telemetry_policy default (canon tier), and telemetry_policy with canon_url override (minimal fallback). Verified: - npm run typecheck: clean - Smoke script structure matches PR #100's governance-parser test style and exits non-zero on any envelope violation. Lesson for the sweep: every canon-driven refactor must verify both the new governance_source signal AND full envelope conformance. The canary's partial completion was caught by live validation but should have been caught by pre-merge smoke. Follow-up to update the refactor template in docs/oddkit/audit/... separately.
Two orphaned artifacts recovered: 1. odd/handoffs/2026-04-19-fresh-session-continuation.md Tonight's fresh-session continuation handoff. Created during the E0008.3 session and pushed to canon/dolcheo-ledger-open-items AFTER #107 was squash-merged, leaving it orphaned on the branch. Content: self-contained doc naming priority 1–6 work, the governing prompt-over-code refactor arc, and recommended first actions. Companion to the E0008.3 session ledger. 2. odd/ledger/journal/2026-04-17-pr100-rage-quit-handoff.md Historical ledger from 2026-04-17 session that ended with Klappy handing off to another model after a 12-hour bugbot ping-pong cycle on klappy/oddkit#100. Lived on oddkit branch chore/handoff-journal-pr100, never merged. The branch itself is unsafe to merge (also deletes the audit doc and smoke test that are load-bearing on main), so the ledger file was salvaged in isolation. Frontmatter added for canon consistency; status: archival; salvaged_from field records the original location. Content worth preserving: 15+ bugs caught across three review surfaces (gauntlet, bugbot, Klappy), defect-class blindness as the consistent failure pattern, the voice-dump invariant story that directly motivated the canary refactor and live-smoke ship-blocker contract in canon/constraints/core-governance-baseline. This is the session that retrospectively justified E0008.3. Its loss would have removed the empirical anchor for several canon decisions that now live in main. Both files are now tracked and durably retrievable via klappy:// URIs. Related cleanup to follow: - oddkit branches chore/handoff-journal-pr100 and fix/challenge-mode-schema-includes-writing-modes can be deleted once this PR merges (their content is either superseded on main or salvaged here). - oddkit/docs/drafts/core-governance-baseline.md is a stale pre-rename draft; content lives in klappy.dev canon now. Separate tiny cleanup in oddkit repo.
….19.0)
P1.3.1 canary retrofit mirroring P1.2's encode pattern. Challenge's response
envelope now declares which tier served its governance vocabulary and exposes
the four peer canon URIs it reads.
Changes:
- Four governance helpers return {<domainNoun>, source} tuples:
* discoverChallengeTypes → {types, source}
* fetchBasePrerequisites → {prerequisites, source}
* fetchNormativeVocabulary → {vocabulary, source}
* fetchStakesCalibration → {calibration, source}
- runChallengeAction aggregates the four sources strictly: any helper
falling through to hardcoded defaults → aggregate = "minimal".
All four from canon → aggregate = "knowledge_base".
- Response envelope (both SUPPRESSED and CHALLENGED paths):
+ result.governance_source: "knowledge_base" | "minimal"
+ result.governance_uris: array of 4 peer URIs (alphabetical)
+ debug.knowledge_base_url echoes override
- Smoke test adds challenge envelope assertions + 9-mode parse integrity
guard (PR #100 regression defense).
Design decision: governance_uris (plural array) intentionally diverges from
encode's singular governance_uri. Challenge reads four peer governance
files with no hierarchy among them; a single anchor would misrepresent
where base-prerequisites and normative-vocabulary live. Consumers that
want a singular anchor can read governance_uris[0] — alphabetical
ordering makes this stable. See CHANGELOG and PRD for full rationale.
Two-tier cascade today ("knowledge_base" | "minimal") because
workers/baseline/ is not yet shipped; the bundled tier from
canon/constraints/core-governance-baseline is a contract aspiration.
Union expands additively when baseline ships.
Verification:
- npm run typecheck: clean
- node workers/test/governance-parser.test.mjs: 105/105 pass
- Live smoke against preview URL: pending (next step)
PRD: /home/claude/work/prd-p1-3-1-challenge-canary.md (local planning artifact)
Closes: P1.3.1 (per klappy://odd/handoffs/2026-04-20-p1-3-challenge-canary)


feat(challenge): governance-driven runChallengeAction (E0008)
Sequence
bef6c83) — governance-driven encode architecture (precedent)runChallengeActionnow reads those articles at runtimeWhat Changed
New functions in
workers/src/orchestrate.ts:extractKeywordsFromCheckextractPrereqTable## Prerequisite Overlaystable from any articlediscoverChallengeTypeschallenge-type-tagged articlesfetchBasePrerequisitesodd/challenge/base-prerequisites.mdprerequisite tablefetchNormativeVocabularyodd/challenge/normative-vocabulary.mdRFC 2119 + arch termsfetchStakesCalibrationodd/challenge/stakes-calibration.mdmode→depth tableModified:
runChallengeAction— body fully replaced with governance-driven implementation (multi-match, calibration-filtered output, voice-dump suppression invariant)runCleanupStorage— extended to clear all four new cachesNew types/caches:
ChallengeTypeDef,PrereqOverlay,NormativeVocabulary,StakesCalibrationinterfacesVoice-Dump Suppression Invariant
This is load-bearing. When
calibration.questionTiers.length === 0(e.g., avoice-dumpmode row in the stakes-calibration table sets tiers to "none"),runChallengeActionreturnsstatus: "SUPPRESSED"immediately — before aggregating types, testing prerequisites, or surfacing reframings.Some modes exist for raw thought capture. Pressure-testing at that stage damages the mode's function. The suppression is not advisory and does not surface a "reduced" set.
Cache Invalidation
runCleanupStoragenow clears all four new caches:Graceful Degradation
All four fetch functions degrade gracefully if governance articles are missing or malformed:
fetchBasePrerequisites→ empty arrayfetchNormativeVocabulary→ minimal built-in set (MUST/MUST NOT/SHOULD/SHOULD NOT)fetchStakesCalibration→ "surface everything" fallback for exploration/planning/executiondiscoverChallengeTypes→ empty array (challenge then produces empty output until articles are authored)Backward Compatibility
Response envelope gains
matched_types,mode_used, andgovernancefields. The legacyclaim_typefield is preserved as a backward-compat alias returning the first matched type slug.Build & Type-Check Evidence
TypeScript compilation clean. No errors. Full output in evidence doc.
Root-level
npm testhas a pre-existing failure (missingcommanderdep forsrc/cli.js) unrelated to this change. The workers package has no test script; correctness validated via tsc + preview deploy.Evidence doc:
docs/oddkit/evidence/challenge-governance-code-refactor.mdOpen Risks
matchedTypes.length > 1may fire frequently in practice. Design handles this correctly via aggregation; governance authors manage scope via specific trigger words.claim_typealias string-value drift — olddetectClaimTypereturned"strong_claim","proposal","assumption","observation". Newclaim_typereturns the governance article slug (e.g.,"strong-claim"with a hyphen). Callers doing string equality checks on this field will need to update.discoverChallengeTypescompiles regexes for all challenge types on first call. Mitigated by per-canonUrl caching across requests within the same isolate.Note
Medium Risk
Replaces core
oddkit_challengeworker behavior and response shape based on runtime-parsed governance markdown, so parsing drift or heuristic prerequisite evaluation could change outputs silently. BM25 stop-word overrides and new caches add moderate operational risk but are scoped to the challenge path.Overview
oddkit_challengeinworkers/src/orchestrate.tsis refactored from hardcoded claim-type logic into a governance-driven runtime that discoverschallenge-typearticles, base prerequisites, normative vocabulary, and stakes calibration from canon and applies them per-request (with per-canonUrlcaching, multi-match aggregation, fallback type resolution, and voice-dump suppression).Challenge-type detection pivots to BM25 + stemming over per-type detection text;
workers/src/bm25.tsis extended (backward-compatibly) to support custom stop-word sets stored on the index so modal verbs/negation can remain signal. Tension detection is switched to governance-defined vocabulary regexes, and the action response now includes mode + matched-type governance details (plusblock_until_addressed) while keepingclaim_typeas a compatibility alias.Adds a new
workers/test/governance-parser.test.mjsparser-fidelity test against live governance articles, and updatesrunCleanupStorageto clear the new in-memory caches; also adds evidence/journal/ledger documentation for the refactor.Reviewed by Cursor Bugbot for commit 80416ac. Bugbot is set up for automated code reviews on this repo. Configure here.