Skip to content

feat(challenge): governance-driven runChallengeAction (E0008)#100

Merged
klappy merged 17 commits intomainfrom
feat/e0008-challenge-governance-driven
Apr 17, 2026
Merged

feat(challenge): governance-driven runChallengeAction (E0008)#100
klappy merged 17 commits intomainfrom
feat/e0008-challenge-governance-driven

Conversation

@klappy
Copy link
Copy Markdown
Owner

@klappy klappy commented Apr 17, 2026

feat(challenge): governance-driven runChallengeAction (E0008)

Sequence


What Changed

New functions in workers/src/orchestrate.ts:

Function Purpose
extractKeywordsFromCheck Extracts quoted strings from prerequisite check descriptions
extractPrereqTable Shared helper — parses ## Prerequisite Overlays table from any article
discoverChallengeTypes Per-canonUrl cached discovery of challenge-type-tagged articles
fetchBasePrerequisites Reads odd/challenge/base-prerequisites.md prerequisite table
fetchNormativeVocabulary Reads odd/challenge/normative-vocabulary.md RFC 2119 + arch terms
fetchStakesCalibration Reads odd/challenge/stakes-calibration.md mode→depth table

Modified:

  • runChallengeAction — body fully replaced with governance-driven implementation (multi-match, calibration-filtered output, voice-dump suppression invariant)
  • runCleanupStorage — extended to clear all four new caches

New types/caches:

  • ChallengeTypeDef, PrereqOverlay, NormativeVocabulary, StakesCalibration interfaces
  • 8 new cache variables (4 data caches + 4 canonUrl guards), mirroring the encode pattern

Voice-Dump Suppression Invariant

This is load-bearing. When calibration.questionTiers.length === 0 (e.g., a voice-dump mode row in the stakes-calibration table sets tiers to "none"), runChallengeAction returns status: "SUPPRESSED" immediately — before aggregating types, testing prerequisites, or surfacing reframings.

Some modes exist for raw thought capture. Pressure-testing at that stage damages the mode's function. The suppression is not advisory and does not surface a "reduced" set.

if (calibration.questionTiers.length === 0) {
  // short-circuit — returns SUPPRESSED, no further processing
}

Cache Invalidation

runCleanupStorage now clears all four new caches:

cachedChallengeTypes = null;
cachedChallengeTypesCanonUrl = undefined;
cachedBasePrerequisites = null;
cachedBasePrerequisitesCanonUrl = undefined;
cachedNormativeVocabulary = null;
cachedNormativeVocabularyCanonUrl = undefined;
cachedStakesCalibration = null;
cachedStakesCalibrationCanonUrl = undefined;

Graceful Degradation

All four fetch functions degrade gracefully if governance articles are missing or malformed:

  • fetchBasePrerequisites → empty array
  • fetchNormativeVocabulary → minimal built-in set (MUST/MUST NOT/SHOULD/SHOULD NOT)
  • fetchStakesCalibration → "surface everything" fallback for exploration/planning/execution
  • discoverChallengeTypes → empty array (challenge then produces empty output until articles are authored)

Backward Compatibility

Response envelope gains matched_types, mode_used, and governance fields. The legacy claim_type field is preserved as a backward-compat alias returning the first matched type slug.


Build & Type-Check Evidence

cd workers && npx tsc --noEmit
EXIT:0

TypeScript compilation clean. No errors. Full output in evidence doc.

Root-level npm test has a pre-existing failure (missing commander dep for src/cli.js) unrelated to this change. The workers package has no test script; correctness validated via tsc + preview deploy.

Evidence doc: docs/oddkit/evidence/challenge-governance-code-refactor.md


Open Risks

  1. Detection-pattern overlap noise — if multiple challenge-type articles share trigger words, matchedTypes.length > 1 may fire frequently in practice. Design handles this correctly via aggregation; governance authors manage scope via specific trigger words.
  2. Descriptive-only prerequisite checks skipped — prereqs with no quoted keywords (prose-only check descriptions) cannot be mechanically tested and are silently skipped. By design per spec.
  3. claim_type alias string-value drift — old detectClaimType returned "strong_claim", "proposal", "assumption", "observation". New claim_type returns the governance article slug (e.g., "strong-claim" with a hyphen). Callers doing string equality checks on this field will need to update.
  4. Cold-start regex compilationdiscoverChallengeTypes compiles regexes for all challenge types on first call. Mitigated by per-canonUrl caching across requests within the same isolate.

Note

Medium Risk
Replaces core oddkit_challenge worker behavior and response shape based on runtime-parsed governance markdown, so parsing drift or heuristic prerequisite evaluation could change outputs silently. BM25 stop-word overrides and new caches add moderate operational risk but are scoped to the challenge path.

Overview
oddkit_challenge in workers/src/orchestrate.ts is refactored from hardcoded claim-type logic into a governance-driven runtime that discovers challenge-type articles, base prerequisites, normative vocabulary, and stakes calibration from canon and applies them per-request (with per-canonUrl caching, multi-match aggregation, fallback type resolution, and voice-dump suppression).

Challenge-type detection pivots to BM25 + stemming over per-type detection text; workers/src/bm25.ts is extended (backward-compatibly) to support custom stop-word sets stored on the index so modal verbs/negation can remain signal. Tension detection is switched to governance-defined vocabulary regexes, and the action response now includes mode + matched-type governance details (plus block_until_addressed) while keeping claim_type as a compatibility alias.

Adds a new workers/test/governance-parser.test.mjs parser-fidelity test against live governance articles, and updates runCleanupStorage to clear the new in-memory caches; also adds evidence/journal/ledger documentation for the refactor.

Reviewed by Cursor Bugbot for commit 80416ac. Bugbot is set up for automated code reviews on this repo. Configure here.

Mirrors the PR #96 encode pattern. Extracts challenge behavior from
live governance articles (landed in klappy.dev canon via PR #99)
rather than hardcoded source logic.

New functions in workers/src/orchestrate.ts:
- discoverChallengeTypes — per-canonUrl cached type discovery
- fetchBasePrerequisites — universal prerequisite checks
- fetchNormativeVocabulary — RFC 2119 + architectural load-bearing terms
- fetchStakesCalibration — mode-to-depth filter
- extractPrereqTable / extractKeywordsFromCheck — shared helpers

Refactored:
- runChallengeAction — replaces hardcoded detectClaimType /
  generateChallenges / findTensions / findMissingPrerequisites
  with governance extraction. Supports multi-match. Filters output
  by stakes calibration based on mode parameter.
- runCleanupStorage — clears all four new caches on invalidation

Invariant: voice-dump mode suppresses all challenge output
regardless of matched types. Load-bearing per stakes-calibration
governance — some modes exist for raw capture and pressure-testing
at that stage damages the mode.

Graceful degradation: missing governance articles fall back to
minimal built-in behavior with warnings, rather than failing.

Co-authored-by: Claude <noreply@anthropic.com>
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Apr 17, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
oddkit 80416ac Commit Preview URL

Branch Preview URL
Apr 17 2026, 03:09 PM

Comment thread workers/src/orchestrate.ts Outdated
Comment thread workers/src/orchestrate.ts Outdated
Comment thread workers/src/orchestrate.ts Outdated
Comment thread workers/src/orchestrate.ts
Comment thread workers/src/orchestrate.ts Outdated
klappy added 3 commits April 17, 2026 07:11
Refactor runChallengeAction in workers/src/orchestrate.ts to extract
challenge-type behavior from canon governance articles at runtime rather
than hardcoding claim-type detection, questions, prerequisites, and
tension rules in source. Structural mirror of PR #96 (encode).

Detection upgraded mid-implementation from regex-OR to BM25 + stemming
after the gauntlet revealed that regex-based matching was morphologically
brittle ("coin" doesn't match trigger "coining"). The pivot removed an
entire class of bug and seeded a reusable pattern for future
governance-driven tools.

Changes in workers/src/orchestrate.ts:
- New: ChallengeTypeDef, BasePrerequisite, NormativeVocabulary,
  StakesModeConfig, StakesCalibration
- New: discoverChallengeTypes (builds per-canonUrl BM25 index over
  detection text), fetchBasePrerequisites, fetchNormativeVocabulary,
  fetchStakesCalibration — each with per-canonUrl cache and graceful
  degradation on missing articles
- New: evaluatePrerequisiteCheck — interprets natural-language check
  strings from prerequisite overlay tables
- Refactored runChallengeAction: multi-match via BM25 score > 0, base
  + overlay prerequisite aggregation, stakes calibration filtering,
  voice-dump suppression invariant, governance-driven tension detection
- Extended runCleanupStorage with five new cache clears (types,
  type-index, base prerequisites, vocabulary, calibration)
- Removed dead detectClaimType (legacy src/tasks/challenge.js retains
  its copy for CLI backward-compat)
- Added CHALLENGE_STOP_WORDS set preserving modal verbs as signal

Changes in workers/src/bm25.ts (backward-compatible extension):
- tokenize(), buildBM25Index() accept optional stopWords: Set<string>
- BM25Index gains optional stopWords field so searchBM25 tokenizes
  queries consistently with the index
- Default behavior unchanged — existing callers unaffected
- Motivation: default STOP_WORDS filters modals (must, should, shall,
  may, not) which are signal for challenge-type detection

New tests: workers/test/governance-parser.test.mjs — 94 assertions
against live governance articles fetched from klappy.dev raw. Covers
type parsing, fallback resolution, BM25 detection, stemming regression
cases (coin/coining, propose/proposed, principle/principles), multi-
match, and the voice-dump suppression invariant. 94/94 pass.

Bugs the gauntlet caught on this PR:
1. Voice-dump suppression invariant would have shipped broken — the
   calibration cell reads "none (suppress all challenge)" not bare
   "none". Strict-equality parser would have produced a single-element
   array, voice-dump mode would have surfaced all challenges in prod.
2. Morphological brittleness in regex detection (coin vs coining) —
   triggered the pivot to BM25 + stemming.
3. Default BM25 STOP_WORDS silently breaks strong-claim and proposal
   detection by filtering modal verbs. Fixed via custom stop word set.

Verification:
- npm run typecheck: clean
- tests/smoke.sh: 6/6 pass (legacy CLI path — backward compat preserved)
- workers/test/governance-parser.test.mjs: 94/94 pass
- AI voice clichés audit on new comments: clean
- oddkit_preflight, challenge, gate, validate: all run; gate NOT_READY
  due to same hardcoded-logic gap as challenge pre-refactor (flagged as
  follow-up)

Response shape change: adds mode, matched_types, type_definitions,
block_until_addressed; removes claim_type. Consumed programmatically,
not rendered.

Follow-ups flagged:
- Encode parity PR — same regex-OR brittleness in runEncodeAction;
  pattern proven here, port will be near-mechanical
- klappy.dev meta governance PR — "compiles into a case-insensitive
  word-boundary regex" is now stale language
- Gate refactor candidate — same hardcoded-logic shape as challenge pre-refactor

Refs:
- Depends on: klappy/klappy.dev#99 (governance articles this code reads)
- Structural mirror: #96 (governance-driven encode)
- Evidence: docs/oddkit/evidence/challenge-governance-code-refactor.md
Re-applies the four review fixes from sibling commits (31f8134, e9ef2f9,
84932f0) and the dead-code removal that the bugbot review also flagged,
on top of the BM25 + stemming detection swap.

- Vocabulary regex sorted by length descending so 'MUST NOT' matches
  before 'MUST' (closes bugbot 'Regex alternation order')
- Stakes calibration mode column lowercased at parse time AND mode
  normalized to lowercase at lookup time (closes bugbot 'Mode column
  not lowercased breaks voice-dump suppression')
- first_1 reframings policy now surfaces a single reframing total
  across all matched types, not one per type (closes bugbot
  'first_1 reframings surfaces multiple instead of one')
- Detection runs BEFORE voice-dump suppression check, and SUPPRESSED
  response includes the governance field for shape parity with
  CHALLENGED (closes bugbot 'SUPPRESSED response missing governance')
- Renames type_definitions to governance in CHALLENGED response so
  both statuses return the same shape under the same key
- Dead detectClaimType already removed by the BM25 commit (closes
  bugbot 'Dead code: detectClaimType has zero callers')

Verification:
- npm run typecheck: clean
- workers/test/governance-parser.test.mjs: 94/94 pass
- tests/smoke.sh: 6/6 pass
…ctor evidence

Captures the fork-resolution and bugbot-review-driven fixes as a sixth
layer of catch alongside the gauntlet bugs. Records the lesson:
read PR review comments before treating divergent remote as unknown work.
@klappy
Copy link
Copy Markdown
Owner Author

klappy commented Apr 17, 2026

Combine update — BM25 + stemming pivot ported on top of the regex version

Three new commits on top of 84932f0:

  • 726e5ed feat(workers): governance-driven oddkit_challenge with BM25 + stemming — replaces regex-OR detection with BM25 + stemming over per-type detection text (trigger words + blockquote). Mid-implementation pivot triggered by a morphological-brittleness signal in the gauntlet (coining matched, coin didn't). Adds workers/test/governance-parser.test.mjs (94 assertions against live klappy.dev governance articles).
  • e82164b fix(challenge): port bugbot review fixes onto BM25 detection layer — re-applies the four polish fixes from 31f8134, e9ef2f9, 84932f0 onto the BM25 base. They touch unrelated regions so the port was mechanical.
  • fd14a60 docs(evidence): add bugbot review + combine section to challenge refactor evidence

All five bugbot items closed

Severity Item Closed by
High Mode column not lowercased breaks voice-dump suppression e82164b (also lowercased at lookup site)
Medium Regex alternation order breaks multi-word directive matching e82164b (sort by length descending)
Medium first_1 reframings surfaces multiple instead of one e82164b (slice from aggregated list)
Medium SUPPRESSED response missing governance field e82164b (detection runs before suppression; both responses share governance key)
Low Dead code: detectClaimType has zero callers 726e5ed (removed by BM25 commit)

What BM25 + stemming buys

  • Morphological resilience. coin ~ coining ~ coined ~ coinage all stem to the same root. The brittleness that made bugbot's first review meaningful is gone.
  • Phrase boost for free. bm25.ts already has exact-phrase and bigram boosts; multi-word triggers ("the principle is", "forcing function") get bonus scores on top of the BM25 base.
  • Resilience to insertions. "I'm trying to coin a brand new term" now matches pattern-coinage even though no exact regex would have caught it.
  • Score-based confidence signal available. matched_types is currently a list of slugs; trivial to upgrade to [{slug, score}] if a downstream consumer wants relative confidence.
  • Reusable pattern. Encode follow-up PR will be near-mechanical now; future gate refactor inherits the same approach.

bm25.ts extension (backward-compatible)

Added optional stopWords: Set<string> to tokenize, buildBM25Index, BM25Index interface, and searchBM25. Default behavior unchanged — existing callers (including oddkit_search) get the same filtering they always did. Challenge-type detection passes a custom CHALLENGE_STOP_WORDS set that preserves modal verbs (must, should, shall, may, not) since modals ARE the signal for strong-claim and proposal types. Discovered as bug #5 in the gauntlet — default STOP_WORDS would have silently broken those two type detections.

Verification

  • npm run typecheck — clean
  • node workers/test/governance-parser.test.mjs — 94/94 pass (covers parser fidelity, BM25 detection, stemming regression cases including coin/coining, propose/proposed, principle/principles, multi-match, voice-dump suppression invariant, and irrelevant-input zero-match)
  • bash tests/smoke.sh — 6/6 pass (legacy CLI path — backward compat preserved)
  • Cloudflare Workers preview deploys automatically on push

Bugs caught across this PR — six total

  1. Voice-dump suppression invariant nearly shipped broken (parser strict-equality on "none" missed "none (suppress all challenge)")
  2. Morphological brittleness in regex detection (the pivot trigger)
  3. Default BM25 STOP_WORDS filters modal verbs that ARE signal
    4–8. Bugbot's five review items (4 fixed by polish commits, 1 by BM25 commit's dead-code removal)

Follow-ups flagged

  • Encode parity PR — same regex-OR brittleness in runEncodeAction; pattern proven here, port will be near-mechanical
  • klappy.dev meta governancehow-to-write-challenge-types.md says "compiles into a case-insensitive word-boundary regex" which is now stale; small PR to update
  • Gate refactor candidate — same hardcoded-logic shape as challenge pre-refactor

Comment thread workers/src/orchestrate.ts Outdated
Comment thread workers/src/orchestrate.ts
Comment thread workers/src/orchestrate.ts
Comment thread workers/src/orchestrate.ts Outdated
const dtype =
vocab.directiveTypes.get(phrase) ||
vocab.directiveTypes.get(phrase.toLowerCase()) ||
"load-bearing-claim";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case-insensitive directive lookup misses non-lowercase map keys

Low Severity

The directiveTypes map stores keys in their original case from the article (cols[0]). The case-insensitive regex match captures text in the excerpt's case. The lookup tries phrase (excerpt case) then phrase.toLowerCase(), but this only succeeds if the article stored the term in lowercase. If the article uses title case (e.g., "Vodka Architecture"), and the excerpt has a different case, both lookups miss and silently fall through to the "load-bearing-claim" default, producing an incorrect tension type.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 6358d65. Configure here.

Comment thread workers/src/orchestrate.ts
Caught in PR #100 review by Klappy: the CHALLENGE_STOP_WORDS Set added
mid-PR to fix a BM25 over-match was itself a Vodka Architecture violation
in a refactor explicitly about removing such violations. The constant
carried a domain opinion ('modals are signal, articles are filler in
challenge detection') that belonged in canon, not in worker source.

Anti-pattern fixed:
- Drop the hardcoded CHALLENGE_STOP_WORDS Set from workers/src/orchestrate.ts
- Drop the duplicate hardcoded copy from workers/test/governance-parser.test.mjs
- Extend NormativeVocabulary interface with stopWords: Set<string>
- Extend fetchNormativeVocabulary to extract '## Detection Noise' code block
  from odd/challenge/normative-vocabulary.md (lands in klappy.dev#100)
- Move BM25 index build out of discoverChallengeTypes into a new lazy
  builder getOrBuildChallengeTypeIndex(types, vocab, canonUrl) so the
  index can use governance-sourced stop words rather than a constant
- Update parser test to fetch Detection Noise the same way the worker
  does — no hardcoded duplicate, no drift risk. Test gains 3 new
  assertions: Detection Noise parses non-empty, excludes modal verbs,
  includes common filler

Net hardcoded-constants delta: this PR removes ~6 classes of hardcoded
domain opinion (claim type detection, questions, prereqs, tension regex,
reframings, stop words) and adds zero. The remaining minimal RFC 2119
fallback ('MUST', 'MUST NOT', 'SHOULD', 'SHOULD NOT') and 'planning'
default mode are server-availability fallbacks for when canon is
unreachable, not domain governance.

Test currently runs against the feature branch via KLAPPYDEV_RAW env
override. After klappy.dev#100 merges, the override comes off and the
test reads from main with no further changes.

Verification:
- npm run typecheck: clean
- workers/test/governance-parser.test.mjs (vs feature branch): 97/97 pass
- tests/smoke.sh: 6/6 pass
- grep CHALLENGE_STOP_WORDS in workers/ and src/: zero matches

Refs:
- Caught in: this PR review by Klappy
- Depends on: klappy/klappy.dev#100 (Detection Noise section)
- Lesson: 'is this the right architectural shape' is a category the
  current gauntlet does not catch — the tools verify governance content,
  not whether new code is creating new ungoverned content. Possible
  future tool: a vodka-audit that flags non-trivial Sets/Maps/lists in
  worker source and asks 'should this be in canon?'
@klappy
Copy link
Copy Markdown
Owner Author

klappy commented Apr 17, 2026

Anti-pattern fix — CHALLENGE_STOP_WORDS moved out of source into governance

Caught in review: the CHALLENGE_STOP_WORDS constant added mid-PR to fix a BM25 over-match was itself a Vodka Architecture violation in a refactor explicitly about removing such violations. The list carried a domain opinion ("modals are signal, articles are filler in challenge detection") that belongs in canon, not in worker source.

Companion klappy.dev PR

klappy/klappy.dev#100 adds a new ## Detection Noise section to odd/challenge/normative-vocabulary.md. The article scope broadened to "two surfaces, one governance article" — signal in canon quotes (existing) plus noise in user input (new).

oddkit changes (commit ce41b39)

  • Drop the hardcoded CHALLENGE_STOP_WORDS Set from workers/src/orchestrate.ts
  • Drop the duplicate hardcoded copy from workers/test/governance-parser.test.mjs
  • Extend NormativeVocabulary interface with stopWords: Set<string>
  • Extend fetchNormativeVocabulary to extract the ## Detection Noise code block
  • Move BM25 index build out of discoverChallengeTypes into a new lazy builder getOrBuildChallengeTypeIndex(types, vocab, canonUrl) so the index can use governance-sourced stop words
  • Update parser test to fetch Detection Noise the same way the worker does — no hardcoded duplicate, no drift risk. Adds 3 new assertions: section parses non-empty, excludes modal verbs, includes common filler

Net hardcoded-constants delta vs main

Constant Before After
detectClaimType regex literals hardcoded governance
Per-claim-type challenges[] arrays hardcoded governance
Per-claim-type reframings[] arrays hardcoded governance
Missing-prereq regex checks hardcoded governance
Tension regex (MUST, MUST NOT) hardcoded governance
CHALLENGE_STOP_WORDS Set added mid-PR governance
RFC 2119 fallback already present unchanged (server-availability fallback)
"planning" default mode already present unchanged (server-availability fallback)

Net: ~6 classes of hardcoded domain opinion deleted, 0 added.

Verification

  • npm run typecheck: clean
  • KLAPPYDEV_RAW=...feat/challenge-detection-noise-vocabulary node workers/test/governance-parser.test.mjs: 97/97 pass
  • tests/smoke.sh: 6/6 pass
  • grep CHALLENGE_STOP_WORDS in workers/ and src/: zero matches

Test currently runs against the feature branch via KLAPPYDEV_RAW env override. After klappy.dev#100 merges, the override comes off and the test reads from main with no further changes.

Lesson recorded

"Is this the right architectural shape" is a category the current gauntlet doesn't catch. The tools verify governance content; they don't audit whether new code is creating new ungoverned content. Possible future tool: a vodka-audit that flags non-trivial Sets/Maps/lists in worker source and asks "should this be in canon?"

Bug count for this PR — now seven across two review surfaces

  1. Voice-dump suppression invariant nearly shipped broken (parser strict-equality)
  2. Morphological brittleness in regex detection (BM25 pivot trigger)
  3. Default BM25 STOP_WORDS filters modal verbs that are signal
  4. Vocabulary regex alternation order (bugbot — High)
  5. Mode column case mismatch breaks calibration (bugbot — High)
  6. first_1 reframings counted per-type (bugbot — Medium)
  7. SUPPRESSED response missing governance (bugbot — Medium)
  8. Dead detectClaimType (bugbot — Low; closed by BM25 commit)
  9. BM25 index cache canonUrl guard (bugbot — Medium; landed as 997a50d)
  10. Question tier case mismatch (bugbot — Medium; landed as 94990b5)
  11. Stop words hardcoded in source — Vodka violation (this fix; not flagged by gauntlet or bugbot, caught by you)

The discipline is load-bearing across multiple review surfaces. None is sufficient alone.

Comment thread workers/src/orchestrate.ts
Comment thread workers/test/governance-parser.test.mjs Outdated
…arser

- Use matchAll and prefer prohibition directive type over leftmost
  requirement match so excerpts like 'You MUST X and MUST NOT Y'
  surface the prohibition. Regex switched to global flag to support
  matchAll.
- Port fetchStakesCalibration toLowerCase fix to the fidelity test
  parser so byMode keys stay lowercase even if governance introduces
  capitalized mode names.
Comment thread workers/src/orchestrate.ts
Comment thread workers/src/orchestrate.ts Outdated
…st pickStrongest

Two fixes:

1. Table row parser (6 call sites) was using
   .split('|').map(trim).filter(c => c.length > 0) which also drops
   legitimately-empty interior cells, silently collapsing the column
   count. In fetchStakesCalibration this would silently drop a
   voice-dump row with an empty tiers cell, breaking the suppression
   invariant with no error signal. Introduce parseTableRow helper that
   only strips the leading/trailing empties produced by surrounding
   pipes, preserving empty middle cells.

2. Hoist pickStrongest (now pickStrongestDirective) out of the
   per-entry loop in runChallengeAction. It captures no loop-scoped
   state, so defining it inside the loop needlessly re-allocated the
   closure on each iteration and misled readers about its scope.
   Matches the placement of evaluatePrerequisiteCheck.
Comment thread workers/test/governance-parser.test.mjs Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 477a213. Configure here.

Comment thread workers/src/orchestrate.ts Outdated
Comment thread workers/src/orchestrate.ts Outdated
…and dead branch

Two issues from bugbot's 14:29 review:

1. Reframing 'none' check applies same defensive pattern as the tiersRaw
   fix in fetchStakesCalibration. The cell may be 'none' or
   'none (parenthetical reason)' — strict equality would silently surface
   all reframings via the 'all' fallback when authors include explanatory
   text. Same defect class as bug #3 in the evidence note; sweep applied.

2. Remove unreachable questionTiers.length === 0 branch in the question-
   surfacing condition. The SUPPRESSED early-return at line 1635 already
   handles that case, so the branch was dead code that misleadingly
   suggested 'surface all questions for empty tiers' semantics — the
   actual semantic is full suppression.

Verified: typecheck clean, parser test 97/97 against main, smoke 6/6.
Defect-class sweep on governance cell strict-equality checks: only two
sites (tiersRaw, surfacing), both now defensive.
@klappy klappy merged commit 6e01a00 into main Apr 17, 2026
5 checks passed
klappy added a commit that referenced this pull request Apr 17, 2026
Production preview test caught a real show-stopper: the MCP tool's mode
parameter Zod schema accepted only exploration/planning/execution but
the calibration governance article (odd/challenge/stakes-calibration)
defines 9 modes. The 6 writing-lifecycle modes (voice-dump, drafting,
peer-review-ready, canon-tier-2, canon-tier-1, published-essay) were
unreachable from the public API — schema validation rejected them
before runtime ever saw them.

Net effect of the bug: the voice-dump suppression invariant — the
load-bearing feature of PR #100, named in evidence as load-bearing —
could not be exercised through the public MCP tool. Internal tests
worked because they bypassed the schema. The CI 'Test CF Preview' job
failed on cold-start timeout (red herring); the real failure was
discovered by manual curl against the preview.

Two sites in workers/src/index.ts:
- Line 170: unified oddkit tool
- Line 235: dedicated oddkit_challenge tool

Both expanded to the full 9-mode enum. Description text updated to
explain the two mode families and call out voice-dump suppression.

Longer-term direction (not this commit): drop the enum entirely and
let canon be the validator. The runtime already validates against the
calibration table at fetchStakesCalibration time — having the schema
also enforce vocabulary is the same Vodka anti-pattern shape that
PR #100 fixed for stop words. Tracked as a follow-up.

Verification:
- npm run typecheck: clean
- workers/test/governance-parser.test.mjs: 97/97 against main
- Manual preview curl with mode=voice-dump pending after this deploys

Lesson: testing the running preview is not optional. PR #100 had
typecheck pass, parser test 97/97, smoke 6/6, AND production preview
deploy succeed — and still shipped a feature the schema blocked from
being called. Three layers of verification missed it because none of
them exercised the public API contract.
klappy added a commit that referenced this pull request Apr 17, 2026
Captures DOLCHE for the session that delivered PR #100 (governance-driven
challenge refactor with BM25 + stemming) and the unresolved schema bug
that made the voice-dump suppression invariant unreachable from the
public API.

Critical for next model picking up:
- PR #101 (prod promotion) is BLOCKED — schema fix not yet merged
- fix/challenge-mode-schema-includes-writing-modes has the fix
- After fix lands, manually curl preview with mode=voice-dump before promoting

Also records lessons for the next session: defect-class sweep discipline,
public API contract verification, parser test flakiness, and three
follow-up refactors carrying the same anti-pattern as challenge
pre-refactor (encode, gate, orient).
klappy added a commit that referenced this pull request Apr 18, 2026
Catalogs the Vodka anti-pattern — canon defines vocabulary, code hardcodes
interpretation — across all 11 oddkit tools. PR #100's
voice-dump schema bug (1h 39m prod breakage) was one instance of this
class. Audit identifies 5 tools carrying the same shape.

Findings:
  SEVERE: orient, gate, validate
  PARTIAL: encode, preflight
  CROSS-CUTTING: mode enum declared in 4 places
  CLEAN: challenge (post-refactor), search, get, catalog, version,
         time, cleanup_storage, telemetry_public

Refactor priority ranked by impact × tractability. validate named as
the most surprising: it gates 'done' but never reads
canon/constraints/definition-of-done.md — which in fact does not exist
in the repo despite three user-facing docs claiming it does.

Companion PR on klappy/klappy.dev establishes
canon/constraints/core-governance-baseline.md as the contract every
sweep refactor will conform to.
klappy added a commit that referenced this pull request Apr 18, 2026
Replaces the hardcoded self_report_headers dictionary with a runtime
parse of canon/constraints/telemetry-governance.md #### Self-Report
Fields table. Response envelope now declares governance_source:
'canon' when the fetch succeeds and the table parses, 'minimal' when
it falls back to the shipped baseline.

This is the canary refactor for the governance anti-pattern sweep
(docs/oddkit/audit/governance-anti-pattern-sweep-2026-04-17). It
conforms to the three-tier resolution contract drafted in
klappy/klappy.dev#101 (canon/constraints/core-governance-baseline),
exercising tiers 1 (live canon) and 3 (minimal baseline in code).
Tier 2 (bundled baseline directory with manifest) and the build-time
schema check arrive in follow-up work once the contract graduates
from status:draft to status:active.

Implementation:
- New helper parseSelfReportHeadersTable in index.ts parses the
  '### Self-Report Fields' table section from the canon doc.
- Parser is permissive (whitespace + backticks) and fails closed to
  null so the caller falls back to the minimal baseline rather than
  hiding the degradation.
- Minimal baseline remains the 8 stable headers; canon controls the
  descriptions once live.

Verified:
- npm run typecheck: clean
- Parser unit-tested against live canon content: 8/8 headers parsed
- Parser degradation paths (no section, empty table) return null

Refactor discipline this commit follows (from PR #100 post-mortem):
- Single feature PR, single site touched
- Public contract (MCP tool response) changes are additive
  (governance_source field added; self_report_headers keys unchanged)
- Preview smoke against live prod will verify canon-tier response
  before promotion
klappy added a commit that referenced this pull request Apr 18, 2026
…me (#106)

* feat(telemetry_policy): canary refactor — headers from canon at runtime

Replaces the hardcoded self_report_headers dictionary with a runtime
parse of canon/constraints/telemetry-governance.md #### Self-Report
Fields table. Response envelope now declares governance_source:
'canon' when the fetch succeeds and the table parses, 'minimal' when
it falls back to the shipped baseline.

This is the canary refactor for the governance anti-pattern sweep
(docs/oddkit/audit/governance-anti-pattern-sweep-2026-04-17). It
conforms to the three-tier resolution contract drafted in
klappy/klappy.dev#101 (canon/constraints/core-governance-baseline),
exercising tiers 1 (live canon) and 3 (minimal baseline in code).
Tier 2 (bundled baseline directory with manifest) and the build-time
schema check arrive in follow-up work once the contract graduates
from status:draft to status:active.

Implementation:
- New helper parseSelfReportHeadersTable in index.ts parses the
  '### Self-Report Fields' table section from the canon doc.
- Parser is permissive (whitespace + backticks) and fails closed to
  null so the caller falls back to the minimal baseline rather than
  hiding the degradation.
- Minimal baseline remains the 8 stable headers; canon controls the
  descriptions once live.

Verified:
- npm run typecheck: clean
- Parser unit-tested against live canon content: 8/8 headers parsed
- Parser degradation paths (no section, empty table) return null

Refactor discipline this commit follows (from PR #100 post-mortem):
- Single feature PR, single site touched
- Public contract (MCP tool response) changes are additive
  (governance_source field added; self_report_headers keys unchanged)
- Preview smoke against live prod will verify canon-tier response
  before promotion

* refactor(workers): extract parseTableRow to shared markdown-utils

* fix(telemetry_policy canary): read Description column + add parser tests

Addresses execution-mode challenge gaps on PR #106:

1. Information regression fixed: parser now reads canon's column 4
   (Description) instead of column 0 (Field label). Canon was extended
   with richer per-header descriptions in klappy/klappy.dev#102; this
   commit updates the parser to consume that new column.
2. Tests committed: Test 8 added to
   workers/test/governance-parser.test.mjs covering 8/8 header extraction,
   non-trivial description lengths, and degradation paths (no section,
   empty table). All 105 tests pass against the unmerged klappy.dev
   branch via KLAPPYDEV_RAW override.

Still outstanding (follow-up work, not blocking the canary):
- parseTableRow duplicated across workers/src/index.ts and
  workers/src/orchestrate.ts. Accepted duplication for now, flagged in
  both sites; export-and-share refactor lands when the sweep surfaces
  more duplication candidates.
- Preview smoke against Cloudflare preview with the extended canon
  loaded but no worker redeploy — run manually after this PR deploys.

Companion PR: klappy/klappy.dev#102 (canon extension). This worker
change is backward-compatible with the old 3-column table: the parser
requires 4 cols, so against the old canon it falls through to the
minimal baseline tier. Once klappy.dev#102 merges, canon tier takes
over.

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
klappy added a commit that referenced this pull request Apr 19, 2026
…moke

Addresses three gaps found in live validation of canary PR #106:

1. Response envelope was missing server_time, assistant_text, debug.
   Every other oddkit tool returns {action, result, server_time,
   assistant_text, debug}; telemetry_policy returned only {action,
   result}. This breaks the time-discipline contract — project
   instructions require every oddkit response to carry server_time so
   models have a clock reading on every call.

2. canon_url parameter was silently ignored. The Zod schema was {},
   so MCP stripped canon_url before the handler saw it, and the
   handler hardcoded the default baseline. The three-tier resolution
   contract in canon/constraints/core-governance-baseline assumes
   every canon-driven tool accepts canon_url for overrides — this is
   load-bearing for TruthKit / custom-canon consumers.

3. No live-smoke test for the envelope shape. Parser tests in
   governance-parser.test.mjs exercised parser logic only. The
   canary shipped with partial contract conformance because no test
   invoked the MCP tool end-to-end and asserted the envelope shape.

Changes:
- Add canon_url to the tool's Zod schema; thread through to
  fetcher.getFile(path, canon_url).
- Expand response envelope to match convention: server_time,
  assistant_text (human-readable summary naming the tier), debug
  with duration_ms and canon_url echo.
- New workers/test/canon-tool-envelope.smoke.mjs — live smoke
  script that curls the MCP endpoint and verifies envelope shape
  for oddkit_time (convention baseline), telemetry_policy default
  (canon tier), and telemetry_policy with canon_url override
  (minimal fallback).

Verified:
- npm run typecheck: clean
- Smoke script structure matches PR #100's governance-parser test
  style and exits non-zero on any envelope violation.

Lesson for the sweep: every canon-driven refactor must verify both
the new governance_source signal AND full envelope conformance.
The canary's partial completion was caught by live validation but
should have been caught by pre-merge smoke. Follow-up to update
the refactor template in docs/oddkit/audit/... separately.
klappy added a commit to klappy/klappy.dev that referenced this pull request Apr 19, 2026
Two orphaned artifacts recovered:

1. odd/handoffs/2026-04-19-fresh-session-continuation.md
   Tonight's fresh-session continuation handoff. Created during the
   E0008.3 session and pushed to canon/dolcheo-ledger-open-items
   AFTER #107 was squash-merged, leaving it orphaned on the branch.
   Content: self-contained doc naming priority 1–6 work, the
   governing prompt-over-code refactor arc, and recommended first
   actions. Companion to the E0008.3 session ledger.

2. odd/ledger/journal/2026-04-17-pr100-rage-quit-handoff.md
   Historical ledger from 2026-04-17 session that ended with Klappy
   handing off to another model after a 12-hour bugbot ping-pong
   cycle on klappy/oddkit#100. Lived on oddkit branch
   chore/handoff-journal-pr100, never merged. The branch itself is
   unsafe to merge (also deletes the audit doc and smoke test that
   are load-bearing on main), so the ledger file was salvaged in
   isolation. Frontmatter added for canon consistency; status:
   archival; salvaged_from field records the original location.

   Content worth preserving: 15+ bugs caught across three review
   surfaces (gauntlet, bugbot, Klappy), defect-class blindness as
   the consistent failure pattern, the voice-dump invariant story
   that directly motivated the canary refactor and live-smoke
   ship-blocker contract in canon/constraints/core-governance-baseline.

   This is the session that retrospectively justified E0008.3. Its
   loss would have removed the empirical anchor for several canon
   decisions that now live in main.

Both files are now tracked and durably retrievable via klappy:// URIs.

Related cleanup to follow:
- oddkit branches chore/handoff-journal-pr100 and
  fix/challenge-mode-schema-includes-writing-modes can be deleted
  once this PR merges (their content is either superseded on main
  or salvaged here).
- oddkit/docs/drafts/core-governance-baseline.md is a stale
  pre-rename draft; content lives in klappy.dev canon now. Separate
  tiny cleanup in oddkit repo.
klappy added a commit that referenced this pull request Apr 20, 2026
….19.0)

P1.3.1 canary retrofit mirroring P1.2's encode pattern. Challenge's response
envelope now declares which tier served its governance vocabulary and exposes
the four peer canon URIs it reads.

Changes:
- Four governance helpers return {<domainNoun>, source} tuples:
  * discoverChallengeTypes → {types, source}
  * fetchBasePrerequisites → {prerequisites, source}
  * fetchNormativeVocabulary → {vocabulary, source}
  * fetchStakesCalibration → {calibration, source}
- runChallengeAction aggregates the four sources strictly: any helper
  falling through to hardcoded defaults → aggregate = "minimal".
  All four from canon → aggregate = "knowledge_base".
- Response envelope (both SUPPRESSED and CHALLENGED paths):
  + result.governance_source: "knowledge_base" | "minimal"
  + result.governance_uris: array of 4 peer URIs (alphabetical)
  + debug.knowledge_base_url echoes override
- Smoke test adds challenge envelope assertions + 9-mode parse integrity
  guard (PR #100 regression defense).

Design decision: governance_uris (plural array) intentionally diverges from
encode's singular governance_uri. Challenge reads four peer governance
files with no hierarchy among them; a single anchor would misrepresent
where base-prerequisites and normative-vocabulary live. Consumers that
want a singular anchor can read governance_uris[0] — alphabetical
ordering makes this stable. See CHANGELOG and PRD for full rationale.

Two-tier cascade today ("knowledge_base" | "minimal") because
workers/baseline/ is not yet shipped; the bundled tier from
canon/constraints/core-governance-baseline is a contract aspiration.
Union expands additively when baseline ships.

Verification:
- npm run typecheck: clean
- node workers/test/governance-parser.test.mjs: 105/105 pass
- Live smoke against preview URL: pending (next step)

PRD: /home/claude/work/prd-p1-3-1-challenge-canary.md (local planning artifact)
Closes: P1.3.1 (per klappy://odd/handoffs/2026-04-20-p1-3-challenge-canary)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants