
Commit 33ca5bf

feat(challenge): D5 stemmed prereq matcher + D9 cache removal (0.21.0) (#120)
Closes the last two vodka anti-pattern remnants in oddkit_challenge per P1.3.3, mirroring the matcher and no-microsecond-caching disciplines that gate shipped in 0.20.0.

Item 1: D5 split-by-fit applied to challenge
- evaluatePrerequisiteCheck migrated from regex-per-check to stemmed set intersection over PrereqMatchVocab (parsed once at canon-fetch).
- New parseCheckColumn helper extracts quoted vocabulary into a stemmedTokens Set and detects the four structural-test hints (URL, numeric, proper-noun, citation).
- BasePrerequisite and ChallengeTypeDef.prerequisiteOverlays both carry the PrereqMatchVocab fields (intersection mixin on the overlay element type; fields declared inline on BasePrerequisite).
- Runtime: tokenize(input) hoisted out of the per-prereq loop; per-prereq cost is now Set lookups, not a regex compile.
- Strictly additive: every input that matched the prior regex still matches; stemmed variations newly match; structural side-tests preserved verbatim from pre-refactor.

Item 2: D9 applied to cachedChallengeTypeIndex
- Module-level cachedChallengeTypeIndex and its URL companion deleted.
- getOrBuildChallengeTypeIndex function deleted.
- cleanup_storage resets deleted.
- runChallengeAction call site rebuilds the BM25 type index inline per request (microsecond derivation; plumbing tax removed). Same pattern that gate shipped in 0.20.0.

Item 3: graduates a new canon principle (separate canon PR, merged first)
- klappy://canon/principles/cache-fetches-and-parses is live at klappy.dev 3726073 (PR #125). Third deciding-argument recurrence satisfied.

Verification:
- typecheck clean
- governance-parser.test.mjs 105/105 pass
- ~9 new smoke assertions covering stemmed base + per-type matches, structural-test preservation (URL, proper-noun, citation), rebuild stability, and pre-refactor backward compat.
- Lockfile re-synced to 0.21.0 (was stale at 0.18.0 since the 0.19.0 release).

PRD: /home/claude/work/prd-p1-3-3.md (working dir, not committed).
Handoff: klappy://odd/handoffs/2026-04-20-p1-3-3-challenge-revisit.
1 parent 260492c commit 33ca5bf

6 files changed

Lines changed: 284 additions & 69 deletions


CHANGELOG.md

Lines changed: 18 additions & 0 deletions
```diff
@@ -7,6 +7,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.21.0] - 2026-04-20
+
+### Changed
+
+- **`oddkit_challenge` prerequisite evaluation migrated from regex-per-check to stemmed set intersection** (per PRD D5 from P1.3.2 — split-by-fit). Each prereq now evaluates via `Array.from(prereq.stemmedTokens).some(s => inputStems.has(s))` over a Set computed once at canon-fetch time, with `tokenize(input)` hoisted out of the per-prereq loop. **Strictly additive**: every input that matched the prior regex still matches, plus stemmed variations now do too — `problems identified` satisfies `evidence-cited` (stems `problem` + `identif`), `considered alternatives` satisfies `alternatives-considered` (stems `consid` + `altern`), `acknowledged the risks` satisfies `risk-acknowledged` (stems `acknowledg` + `risk`). The four structural side-tests (URL / numeric / proper-noun / citation) preserved verbatim from the pre-refactor evaluator because they cover cases the keyword vocabulary cannot — `source-named` inputs like `"here's the URL: https://..."` have no stemmed overlap with the vocab `per / according to / from / source: / who said / where i read` but the URL structural test catches them. The conservative no-keyword-no-flag fallback (pass on `input.trim().length >= 20`) also preserved. Same matcher gate shipped in 0.20.0.
+
+- **`oddkit_challenge` type-detection BM25 index cache removed** (per PRD D9 from P1.3.2 — don't cache microsecond derivations). `cachedChallengeTypeIndex` and `cachedChallengeTypeIndexKnowledgeBaseUrl` module-level fields deleted; `getOrBuildChallengeTypeIndex` function deleted; `cleanup_storage` resets deleted; the call site in `runChallengeAction` rebuilds the BM25 index inline per request via `buildBM25Index(types.map(t => ({id: t.slug, text: t.detectionText})), vocab.stopWords)`. Same pattern gate shipped in 0.20.0. Removes module-level cache state, URL-keyed invalidation logic, cleanup_storage wiring, and drift risk when source data changes — the four hidden costs enumerated in the new canon principle. Parse-product caches (`cachedChallengeTypes`, `cachedBasePrerequisites`, `cachedNormativeVocabulary`, `cachedStakesCalibration`) remain — those are actual parse work.
+
+### Added
+
+- **New canon principle:** `klappy://canon/principles/cache-fetches-and-parses` (klappy.dev#125, merged `3726073`). Graduates the "cache fetches and parses, not microsecond derivations" pattern to canon as a tier-2 principle after its third deciding-argument recurrence across the tool sweep: 0.18.0 encode parse-product caching (implicit), 0.20.0 gate D9 (first explicit), 0.21.0 challenge `cachedChallengeTypeIndex` removal (second explicit). Names the two halves of the principle, enumerates the four-cost plumbing tax, and anchors the threshold to current corpus sizes (6–9 challenge types, 4 gate transitions, 8 base prereqs).
+
+- **New shared interface `PrereqMatchVocab`** in `workers/src/orchestrate.ts` capturing `stemmedTokens: Set<string>` plus four boolean structural-test flags (`hasURLCheck`, `hasNumericCheck`, `hasProperNounCheck`, `hasCitationCheck`). Mixed into both `BasePrerequisite` and the inline type on `ChallengeTypeDef.prerequisiteOverlays[]` to keep per-type and base-prereq structs in sync. Populated by the new `parseCheckColumn(check: string)` helper at canon-fetch time in both `discoverChallengeTypes` and `fetchBasePrerequisites`.
+
+### Known limitations
+
+- Same as 0.20.0 — Porter-style stemmer does not reverse consonant gemination (`shipping` → `shipp`, not `ship`); affected vocabulary is fixed at canon tier per `klappy.dev#122` precedent. `getIndex` strict-mode (`skipBaselineFallback`) still pending across encode/challenge/gate (carry-forward O-open P2).
+
 ## [0.20.0] - 2026-04-20
 
 ### Added
```
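The matcher change in the 0.21.0 entry (stem the canon vocabulary once, stem the input once, then test set membership) can be illustrated with a small sketch. oddkit's real `tokenize` (Porter-style stemmer plus governance stop words) is not part of this diff, so `toyTokenize` and `toyStem` below are hypothetical stand-ins: they lowercase, split on non-letters, drop a tiny stop-word list, and strip one of `-ing`/`-ed`/`-s` plus a trailing `e`. Their stems will not always agree with the ones quoted in the changelog (`consid`, `identif`); the prereq names and keyword lists are also illustrative, not copied from canon.

```typescript
// Hypothetical stand-in for oddkit's tokenize(); NOT the real stemmer or stop-word list.
const TOY_STOP_WORDS = new Set(["a", "an", "and", "the", "to", "of", "i"]);

function toyStem(word: string): string {
  let w = word;
  // Strip at most one common suffix, keeping a stem of at least 3 characters.
  for (const suffix of ["ing", "ed", "s"]) {
    if (w.length - suffix.length >= 3 && w.endsWith(suffix)) {
      w = w.slice(0, -suffix.length);
      break;
    }
  }
  // Drop a trailing "e" so "acknowledge" and "acknowledged" converge.
  if (w.length > 3 && w.endsWith("e")) w = w.slice(0, -1);
  return w;
}

function toyTokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((w) => w.length > 0 && !TOY_STOP_WORDS.has(w))
    .map(toyStem);
}

// Parse time (once per canon fetch): quoted keywords -> stemmed Set.
function stemVocabulary(keywords: string[]): Set<string> {
  const stems = new Set<string>();
  for (const kw of keywords) {
    for (const s of toyTokenize(kw)) stems.add(s);
  }
  return stems;
}

// Request time: Set lookups only, no per-check regex compilation.
function matchesAny(inputStems: Set<string>, vocab: Set<string>): boolean {
  for (const s of vocab) {
    if (inputStems.has(s)) return true;
  }
  return false;
}

const prereqVocab: Record<string, Set<string>> = {
  "alternatives-considered": stemVocabulary(["alternative", "considered"]),
  "risk-acknowledged": stemVocabulary(["risk", "acknowledge"]),
  "evidence-cited": stemVocabulary(["evidence", "observed"]),
};

const input = "We considered alternatives and acknowledged the risks.";
const inputStems = new Set(toyTokenize(input)); // tokenized once, reused for every prereq

for (const [name, vocab] of Object.entries(prereqVocab)) {
  console.log(name, matchesAny(inputStems, vocab));
}

// The pre-0.21.0 single-word pattern misses the plural form.
console.log(/\balternative\b/i.test(input)); // false
```

The first two prereqs match through stemmed variants ("alternatives", "acknowledged") that a word-boundary regex on the bare keyword would not catch; `evidence-cited` stays a gap because none of its stems appear in the input.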

package.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 {
   "name": "oddkit",
-  "version": "0.20.0",
+  "version": "0.21.0",
   "description": "Agent-first CLI for ODD-governed repos. Epistemic terrain rendering with portable baseline.",
   "type": "module",
   "bin": {
```

workers/package-lock.json

Lines changed: 2 additions & 2 deletions
Generated file; diff not rendered by default.

workers/package.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 {
   "name": "oddkit-mcp-worker",
-  "version": "0.20.0",
+  "version": "0.21.0",
   "private": true,
   "type": "module",
   "scripts": {
```

workers/src/orchestrate.ts

Lines changed: 129 additions & 65 deletions
```diff
@@ -91,7 +91,13 @@ interface ChallengeTypeDef {
   triggerWords: string[];
   detectionText: string; // triggerWords + blockquote, fed to BM25 indexer
   questions: Array<{ question: string; tier: string }>;
-  prerequisiteOverlays: Array<{ prerequisite: string; check: string; gapMessage: string }>;
+  prerequisiteOverlays: Array<
+    {
+      prerequisite: string;
+      check: string;
+      gapMessage: string;
+    } & PrereqMatchVocab
+  >;
   reframings: string[];
   fallback: boolean;
 }
```
```diff
@@ -100,6 +106,26 @@ interface BasePrerequisite {
   prerequisite: string;
   check: string;
   gapMessage: string;
+  // Per PRD D2 (P1.3.3): parse products populated at canon-fetch time.
+  // stemmedTokens is the stemmed form of quoted keywords in `check`;
+  // the four has*Check booleans flag structural-test hints detected in
+  // the check description. See parseCheckColumn below. These are parse
+  // products per klappy://canon/principles/cache-fetches-and-parses.
+  stemmedTokens: Set<string>;
+  hasURLCheck: boolean;
+  hasNumericCheck: boolean;
+  hasProperNounCheck: boolean;
+  hasCitationCheck: boolean;
+}
+
+/** Shared shape for the runtime match vocabulary attached to challenge
+ * prereqs. Keeps the per-type and base-prereq structs in sync (DRY). */
+interface PrereqMatchVocab {
+  stemmedTokens: Set<string>;
+  hasURLCheck: boolean;
+  hasNumericCheck: boolean;
+  hasProperNounCheck: boolean;
+  hasCitationCheck: boolean;
 }
 
 // Gate governance types — P1.3.2 (0.20.0). Consumed by runGateAction via
```
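In the two hunks above, the overlay element type attaches `PrereqMatchVocab` through an intersection (`{ ... } & PrereqMatchVocab`), while `BasePrerequisite` declares the same five fields inline; the two are structurally compatible either way. A minimal sketch of both forms follows. `BasePrerequisiteSketch`, `OverlaySketch`, `hintCount`, and the sample field values are hypothetical names used only here. With `extends`, adding a sixth flag to `PrereqMatchVocab` would propagate to the base struct automatically instead of relying on a second hand-edited copy.

```typescript
interface PrereqMatchVocab {
  stemmedTokens: Set<string>;
  hasURLCheck: boolean;
  hasNumericCheck: boolean;
  hasProperNounCheck: boolean;
  hasCitationCheck: boolean;
}

// Form 1: interface inheritance; new PrereqMatchVocab fields flow in automatically.
interface BasePrerequisiteSketch extends PrereqMatchVocab {
  prerequisite: string;
  check: string;
  gapMessage: string;
}

// Form 2: intersection type, the form used for ChallengeTypeDef.prerequisiteOverlays.
type OverlaySketch = {
  prerequisite: string;
  check: string;
  gapMessage: string;
} & PrereqMatchVocab;

// Hypothetical parse product, as parseCheckColumn might return it.
const vocab: PrereqMatchVocab = {
  stemmedTokens: new Set(["sourc"]),
  hasURLCheck: true,
  hasNumericCheck: false,
  hasProperNounCheck: false,
  hasCitationCheck: false,
};

// Both forms accept the same spread-in parse product.
const base: BasePrerequisiteSketch = {
  prerequisite: "source-named",
  check: 'Mentions "source:" or a URL',
  gapMessage: "Name the source.",
  ...vocab,
};
const overlay: OverlaySketch = { ...base };

// Code typed against the shared shape accepts either struct.
function hintCount(v: PrereqMatchVocab): number {
  return [v.hasURLCheck, v.hasNumericCheck, v.hasProperNounCheck, v.hasCitationCheck].filter(Boolean)
    .length;
}

console.log(hintCount(base), hintCount(overlay));
```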
```diff
@@ -160,8 +186,12 @@ interface StakesCalibration {
 let cachedChallengeTypes: ChallengeTypeDef[] | null = null;
 let cachedChallengeTypesKnowledgeBaseUrl: string | undefined = undefined;
 let cachedChallengeTypesSource: "knowledge_base" | "minimal" = "minimal";
-let cachedChallengeTypeIndex: BM25Index | null = null;
-let cachedChallengeTypeIndexKnowledgeBaseUrl: string | undefined = undefined;
+// Note: challenge's BM25 type-detection index is NOT cached — per
+// klappy://canon/principles/cache-fetches-and-parses, rebuilding a BM25
+// index over challenge's 6–9-type corpus is a microsecond derivation and
+// the plumbing tax (URL-keyed invalidation + cleanup_storage wiring +
+// drift risk) costs more than the rebuild. Inline-built at the call site
+// in runChallengeAction, same pattern as gate's transition index (0.20.0).
 let cachedBasePrerequisites: BasePrerequisite[] | null = null;
 let cachedBasePrerequisitesKnowledgeBaseUrl: string | undefined = undefined;
 let cachedBasePrerequisitesSource: "knowledge_base" | "minimal" = "minimal";
```
```diff
@@ -550,15 +580,19 @@ async function discoverChallengeTypes(
       }
     }
 
-    // Prerequisite Overlays table — rows of (Prerequisite, Check, Gap message)
+    // Prerequisite Overlays table — rows of (Prerequisite, Check, Gap message).
+    // Per P1.3.3 PRD D2: each row is enriched with PrereqMatchVocab (stemmed
+    // tokens + structural-test flags) at parse time; see parseCheckColumn.
     const prereqSection = content.match(
       /## Prerequisite Overlays[\s\S]*?\| Prerequisite[\s\S]*?\|[-|\s]+\|\n([\s\S]*?)(?=\n\n|\n##|$)/,
     );
-    const prerequisiteOverlays: Array<{
-      prerequisite: string;
-      check: string;
-      gapMessage: string;
-    }> = [];
+    const prerequisiteOverlays: Array<
+      {
+        prerequisite: string;
+        check: string;
+        gapMessage: string;
+      } & PrereqMatchVocab
+    > = [];
     if (prereqSection) {
       for (const row of prereqSection[1].split("\n").filter((r: string) => r.includes("|"))) {
         const cols = parseTableRow(row);
```
```diff
@@ -569,6 +603,7 @@ async function discoverChallengeTypes(
             prerequisite: cols[0],
             check: cols[1],
             gapMessage: gap,
+            ...parseCheckColumn(cols[1]),
           });
         }
       }
```
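The two `discoverChallengeTypes` hunks above capture the `## Prerequisite Overlays` table with a regex, split it into rows, and spread the parse products into each row object at fetch time. A minimal sketch of that path, reusing the section-capture regex from the diff. `parseTableRow` is not shown in the diff, so the cell splitter here is a hypothetical assumption, as is `parseCheckColumnStub`, which collects lowercased quoted keywords without stemming and detects only the URL hint. The canon excerpt is invented for illustration.

```typescript
// Hypothetical cell splitter; oddkit's parseTableRow is not shown in this diff.
function parseTableRow(row: string): string[] {
  return row.split("|").slice(1, -1).map((c) => c.trim());
}

// Hypothetical, simplified stand-in for parseCheckColumn: quoted keywords
// (lowercased, NOT stemmed) plus a single structural hint.
interface CheckVocabStub {
  keywords: Set<string>;
  hasURLCheck: boolean;
}

function parseCheckColumnStub(check: string): CheckVocabStub {
  const keywords = new Set<string>();
  const quoted = /"([^"]+)"/g;
  let m: RegExpExecArray | null;
  while ((m = quoted.exec(check)) !== null) keywords.add(m[1].toLowerCase());
  return { keywords, hasURLCheck: /\bURL\b/i.test(check) };
}

// Section-capture regex copied from discoverChallengeTypes in the hunk above.
const PREREQ_SECTION =
  /## Prerequisite Overlays[\s\S]*?\| Prerequisite[\s\S]*?\|[-|\s]+\|\n([\s\S]*?)(?=\n\n|\n##|$)/;

// Hypothetical canon excerpt.
const content = [
  "## Prerequisite Overlays",
  "",
  "| Prerequisite | Check | Gap message |",
  "|---|---|---|",
  '| evidence-cited | Mentions "evidence" or "observed" | "What did you observe?" |',
  '| source-named | Mentions "source:" or a URL | "Where is this from?" |',
  "",
  "## Reframings",
].join("\n");

type Overlay = { prerequisite: string; check: string; gapMessage: string } & CheckVocabStub;

const overlays: Overlay[] = [];
const section = content.match(PREREQ_SECTION);
if (section) {
  for (const row of section[1].split("\n").filter((r) => r.includes("|"))) {
    const [prerequisite = "", check = "", gapCell = ""] = parseTableRow(row);
    if (!prerequisite || !check) continue;
    overlays.push({
      prerequisite,
      check,
      gapMessage: gapCell.replace(/^"|"$/g, ""),
      // Parse products are computed once, here at fetch time, and spread in.
      ...parseCheckColumnStub(check),
    });
  }
}

for (const o of overlays) {
  console.log(o.prerequisite, [...o.keywords], o.hasURLCheck, o.gapMessage);
}
```

Because the enrichment happens inside the fetch-and-parse step, the resulting structs can sit in the existing parse-product caches; nothing per-request has to re-read the `check` column.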
```diff
@@ -620,37 +655,14 @@ async function discoverChallengeTypes(
   // rather than inventing a built-in fallback registry — see PRD D7).
   const source: "knowledge_base" | "minimal" = types.length > 0 ? "knowledge_base" : "minimal";
   cachedChallengeTypesSource = source;
-  // Index build deferred — needs vocab.stopWords from fetchNormativeVocabulary,
-  // assembled lazily by getOrBuildChallengeTypeIndex below. Both types and the
-  // index are deterministic functions of knowledgeBaseUrl, so caching by knowledgeBaseUrl
-  // remains safe.
+  // Note: the BM25 type-detection index over per-type detection text is
+  // NOT cached — it's a microsecond derivation over already-cached parse
+  // products, rebuilt inline per request in runChallengeAction. See
+  // klappy://canon/principles/cache-fetches-and-parses for the principle
+  // and the plumbing-tax argument.
   return { types, source };
 }
 
-/** Lazily build (or return cached) per-knowledgeBaseUrl BM25 index over the per-type
- * detection text, using governance-sourced stop words from normative-vocabulary.md.
- * The cache is keyed on knowledgeBaseUrl so different canon sources do not contaminate
- * each other's indexes. */
-function getOrBuildChallengeTypeIndex(
-  types: ChallengeTypeDef[],
-  vocab: NormativeVocabulary,
-  knowledgeBaseUrl?: string,
-): BM25Index {
-  if (cachedChallengeTypeIndex && cachedChallengeTypeIndexKnowledgeBaseUrl === knowledgeBaseUrl) {
-    return cachedChallengeTypeIndex;
-  }
-  // Build BM25 index over per-type detection text (triggers + blockquote).
-  // Stemming handles morphology; IDF weights distinctive trigger terms above filler.
-  // vocab.stopWords comes from `## Detection Noise` in normative-vocabulary.md;
-  // it deliberately preserves modal verbs and negation as signal. An empty
-  // Set means no filtering (governance opted into IDF-only scoring).
-  const bm25Docs = types.map((t) => ({ id: t.slug, text: t.detectionText }));
-  const bm25Index = buildBM25Index(bm25Docs, vocab.stopWords);
-  cachedChallengeTypeIndex = bm25Index;
-  cachedChallengeTypeIndexKnowledgeBaseUrl = knowledgeBaseUrl;
-  return bm25Index;
-}
-
 // Gate minimal-tier vocabulary — P1.3.2 D6. Used when canon is unreachable
 // or missing required sections. Vocabulary mirrors the pre-0.20.0 hardcoded
 // detectTransition regexes (L306–L324 pre-refactor) and checkPatterns map
```
```diff
@@ -847,6 +859,7 @@ async function fetchBasePrerequisites(
         prerequisite: cols[0],
         check: cols[1],
         gapMessage: cols[2].replace(/^"|"$/g, ""),
+        ...parseCheckColumn(cols[1]),
       });
     }
   }
```
```diff
@@ -1515,8 +1528,6 @@ async function runCleanupStorage(
   cachedChallengeTypes = null;
   cachedChallengeTypesKnowledgeBaseUrl = undefined;
   cachedChallengeTypesSource = "minimal";
-  cachedChallengeTypeIndex = null;
-  cachedChallengeTypeIndexKnowledgeBaseUrl = undefined;
   cachedBasePrerequisites = null;
   cachedBasePrerequisitesKnowledgeBaseUrl = undefined;
   cachedBasePrerequisitesSource = "minimal";
```
```diff
@@ -2023,9 +2034,15 @@ async function runChallengeAction(
   // Detection runs BEFORE the voice-dump suppression check so the SUPPRESSED
   // response can still expose `governance` — the model sees what would have
   // fired without surfacing the pressure-test questions.
+  // Build BM25 type-detection index inline per request (not cached) —
+  // per klappy://canon/principles/cache-fetches-and-parses, a BM25 index
+  // over challenge's 6–9-type corpus is a microsecond derivation and the
+  // plumbing tax is not worth the rebuild cost. Parse products (types,
+  // vocab) are cached upstream; the index is just a reshape.
   // Stop words come from `## Detection Noise` in normative-vocabulary.md
   // (governance), not a hardcoded constant in this file.
-  const typeIndex = getOrBuildChallengeTypeIndex(types, vocab, knowledgeBaseUrl);
+  const bm25Docs = types.map((t) => ({ id: t.slug, text: t.detectionText }));
+  const typeIndex = buildBM25Index(bm25Docs, vocab.stopWords);
   const matchedTypes: ChallengeTypeDef[] = [];
   const hits = searchBM25(typeIndex, input, types.length);
   const typeBySlug = new Map(types.map((t) => [t.slug, t]));
```
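The call site above now rebuilds the type-detection index on every request. To make "microsecond derivation" concrete, here is a minimal, self-contained BM25 sketch with the same call shape: documents with `id`/`text`, a stop-word Set, then a top-k search. `buildBM25IndexSketch`, `searchBM25Sketch`, `detect`, the three challenge-type slugs, and their detection text are all hypothetical; oddkit's own `buildBM25Index` and `searchBM25` are not shown in this diff and also stem, which this sketch does not. The constants k1 = 1.2 and b = 0.75 are common BM25 defaults, not values taken from oddkit.

```typescript
interface Doc { id: string; text: string }
interface BM25IndexSketch {
  docs: Array<{ id: string; tf: Map<string, number>; len: number }>;
  df: Map<string, number>; // document frequency per term
  avgLen: number;
  stopWords: Set<string>;
}

function terms(text: string, stopWords: Set<string>): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0 && !stopWords.has(t));
}

// One pass over the documents: term frequencies, document frequencies, average length.
function buildBM25IndexSketch(docs: Doc[], stopWords: Set<string>): BM25IndexSketch {
  const df = new Map<string, number>();
  const built = docs.map((d) => {
    const ts = terms(d.text, stopWords);
    const tf = new Map<string, number>();
    for (const t of ts) tf.set(t, (tf.get(t) ?? 0) + 1);
    for (const t of tf.keys()) df.set(t, (df.get(t) ?? 0) + 1);
    return { id: d.id, tf, len: ts.length };
  });
  const avgLen = built.reduce((sum, d) => sum + d.len, 0) / Math.max(built.length, 1);
  return { docs: built, df, avgLen, stopWords };
}

// Standard BM25 scoring; returns up to k documents with a positive score.
function searchBM25Sketch(index: BM25IndexSketch, query: string, k: number, k1 = 1.2, b = 0.75) {
  const n = index.docs.length;
  const q = terms(query, index.stopWords);
  return index.docs
    .map((d) => {
      let score = 0;
      for (const t of q) {
        const f = d.tf.get(t) ?? 0;
        if (f === 0) continue;
        const dfT = index.df.get(t) ?? 0;
        const idf = Math.log(1 + (n - dfT + 0.5) / (dfT + 0.5));
        score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + (b * d.len) / (index.avgLen || 1)));
      }
      return { id: d.id, score };
    })
    .filter((h) => h.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Per-request path: types and stop words would come from parse-product caches;
// the index itself is rebuilt inline, with no cache key or reset to maintain.
const types = [
  { slug: "strong-claim", detectionText: "always never proven definitely guaranteed" },
  { slug: "proposal", detectionText: "should we propose plan adopt switch" },
  { slug: "assumption", detectionText: "assume obviously everyone knows" },
];
const stopWords = new Set(["the", "a", "we"]);

function detect(input: string) {
  const index = buildBM25IndexSketch(types.map((t) => ({ id: t.slug, text: t.detectionText })), stopWords);
  return searchBM25Sketch(index, input, types.length);
}

console.log(detect("We should switch to the new plan"));
```

Building the index is a single pass over a handful of short documents, which is the cost the commit weighs against a URL-keyed module cache, its invalidation check, and the matching `cleanup_storage` reset.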
```diff
@@ -2124,9 +2141,14 @@ async function runChallengeAction(
   }
 
   const strictness = modeConfig?.prerequisiteStrictness?.toLowerCase() || "required";
+  // Hoist tokenize(input) out of the per-prereq loop — input is constant across
+  // the loop, stemmedTokens differ per prereq. Per PRD D3 (P1.3.3): stemmed
+  // set intersection at runtime, structural tests preserved, no regex compile
+  // per check. This is the fit-to-problem matcher per D5.
+  const inputStems = new Set(tokenize(input));
   const missing: string[] = [];
   for (const p of prereqMap.values()) {
-    const passed = evaluatePrerequisiteCheck(input, p.check);
+    const passed = evaluatePrerequisiteCheck(inputStems, input, p);
     if (!passed) {
       // source-named check is escalated to blocking when strictness says so
       if (strictness.includes("optional") && !p.prerequisite.includes("source-named")) {
```
```diff
@@ -2288,36 +2310,78 @@ async function runChallengeAction(
   };
 }
 
-// Governance-driven check evaluator — interprets natural-language `check` strings
-// from ## Prerequisite Overlays tables. Uses cheap heuristics: substring matching
-// against quoted keywords in the check description, plus a few special-case patterns.
-function evaluatePrerequisiteCheck(input: string, check: string): boolean {
-  // Extract quoted keywords like "evidence", "observed", "alternative"
-  const quotedKeywords: string[] = [];
+// Parse-time helper: extract quoted keywords from a `check` description and
+// detect the four structural-test hints. Called at canon-fetch time from
+// both discoverChallengeTypes (per-type prereqs) and fetchBasePrerequisites
+// (universal prereqs). Produces a PrereqMatchVocab that the runtime consumes
+// via evaluatePrerequisiteCheck. Per klappy://canon/principles/cache-fetches-
+// and-parses, this is a parse product: the Set is the stemmed form of the
+// canon's vocabulary and is cached alongside the rest of the prereq struct.
+function parseCheckColumn(check: string): PrereqMatchVocab {
   const quotedRegex = /"([^"]+)"/g;
+  const stemmedTokens = new Set<string>();
   let m: RegExpExecArray | null;
   while ((m = quotedRegex.exec(check)) !== null) {
-    quotedKeywords.push(m[1]);
-  }
-
-  if (quotedKeywords.length > 0) {
-    // Pass if ANY quoted keyword appears in input (case-insensitive, word-boundary where possible)
-    for (const kw of quotedKeywords) {
-      const escaped = kw.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
-      // Use word-boundary for single words, substring for phrases
-      const pattern = /^\w+$/.test(kw) ? new RegExp("\\b" + escaped + "\\b", "i") : new RegExp(escaped, "i");
-      if (pattern.test(input)) return true;
+    // Tokenize each quoted keyword or phrase — multi-word phrases like
+    // "according to" contribute multiple stems; stop-words are dropped
+    // by tokenize(). This preserves semantic coverage while normalizing
+    // morphology (problems → problem, considered → consid, etc.).
+    for (const stem of tokenize(m[1])) {
+      stemmedTokens.add(stem);
     }
-    // Special-case check descriptions that mention URLs, citations, numeric markers
-    if (/\bURL\b/i.test(check) && /https?:\/\//.test(input)) return true;
-    if (/numeric/i.test(check) && /\d/.test(input)) return true;
-    if (/proper-?noun/i.test(check) && /\b[A-Z][a-z]+\s+[A-Z]/.test(input)) return true;
-    if (/citation/i.test(check) && /\[\d+\]|\bper\s+[A-Z]|\baccording to\b/i.test(input)) return true;
-    return false;
   }
+  return {
+    stemmedTokens,
+    hasURLCheck: /\bURL\b/i.test(check),
+    hasNumericCheck: /\bnumeric\b/i.test(check),
+    hasProperNounCheck: /\bproper-?noun\b/i.test(check),
+    hasCitationCheck: /\bcitation\b/i.test(check),
+  };
+}
 
-  // No quoted keywords: conservative fallback — passes if input is non-trivial
-  return input.trim().length >= 20;
+// Governance-driven check evaluator — runtime pairing for parseCheckColumn.
+// Per PRD D5 (split-by-fit): prereq evaluation is independent gap-or-not per
+// prereq, not ranked. Stemmed set intersection is the fit-to-problem matcher
+// and catches morphological variations that the prior regex cascade missed
+// (e.g. "problems identified" now stems to `problem` + `identif` and matches
+// a prereq whose vocab includes `problem`). Structural side-tests (URL,
+// numeric, proper-noun, citation) preserved from the pre-refactor evaluator
+// because they cover cases the keyword vocabulary can't — `source-named`
+// inputs like "here's the URL: https://..." have no stemmed overlap with the
+// vocab `per / according to / from / source: / who said / where i read` but
+// the URL structural test catches them. Strictly additive over the prior
+// regex: every input that matched pre-refactor still matches post-refactor.
+function evaluatePrerequisiteCheck(
+  inputStems: Set<string>,
+  rawInput: string,
+  prereq: PrereqMatchVocab,
+): boolean {
+  // Token match — stemmed set intersection.
+  for (const s of prereq.stemmedTokens) {
+    if (inputStems.has(s)) return true;
+  }
+  // Structural tests — preserved from pre-refactor evaluator. Check against
+  // the raw input because these patterns are inherently case- and shape-
+  // sensitive (URLs, proper-noun capitalization, bracketed citations).
+  if (prereq.hasURLCheck && /https?:\/\//.test(rawInput)) return true;
+  if (prereq.hasNumericCheck && /\d/.test(rawInput)) return true;
+  if (prereq.hasProperNounCheck && /\b[A-Z][a-z]+\s+[A-Z]/.test(rawInput)) return true;
+  if (prereq.hasCitationCheck && /\[\d+\]|\bper\s+[A-Z]|\baccording to\b/i.test(rawInput)) {
+    return true;
+  }
+  // Conservative fallback: prereqs whose check description had NO quoted
+  // keywords AND NO structural hints pass on any non-trivial input. This
+  // preserves the pre-refactor fallback behavior (`input.trim().length >= 20`).
+  if (
+    prereq.stemmedTokens.size === 0 &&
+    !prereq.hasURLCheck &&
+    !prereq.hasNumericCheck &&
+    !prereq.hasProperNounCheck &&
+    !prereq.hasCitationCheck
+  ) {
+    return rawInput.trim().length >= 20;
+  }
+  return false;
 }
 
 async function runGateAction(
```
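The commit describes the refactor as strictly additive. Because both evaluators are fully visible in this hunk, a small differential harness can spot-check that claim on chosen inputs. The sketch below copies the removed and added logic (comments trimmed) and substitutes a hypothetical `tokenize` that only lowercases and splits on non-alphanumerics; oddkit's real `tokenize` also stems and drops stop words, so the stemming gains are not exercised here. `evaluateOld`, `evaluateNew`, `compare`, and the three check descriptions are hypothetical, chosen to probe the edges of the new code paths rather than taken from canon.

```typescript
// Hypothetical tokenize(): lowercase word split only, no stemming or stop words.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

interface PrereqMatchVocab {
  stemmedTokens: Set<string>;
  hasURLCheck: boolean;
  hasNumericCheck: boolean;
  hasProperNounCheck: boolean;
  hasCitationCheck: boolean;
}

// Pre-0.21.0 evaluator, copied from the removed lines.
function evaluateOld(input: string, check: string): boolean {
  const quotedKeywords: string[] = [];
  const quotedRegex = /"([^"]+)"/g;
  let m: RegExpExecArray | null;
  while ((m = quotedRegex.exec(check)) !== null) quotedKeywords.push(m[1]);
  if (quotedKeywords.length > 0) {
    for (const kw of quotedKeywords) {
      const escaped = kw.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      const pattern = /^\w+$/.test(kw) ? new RegExp("\\b" + escaped + "\\b", "i") : new RegExp(escaped, "i");
      if (pattern.test(input)) return true;
    }
    if (/\bURL\b/i.test(check) && /https?:\/\//.test(input)) return true;
    if (/numeric/i.test(check) && /\d/.test(input)) return true;
    if (/proper-?noun/i.test(check) && /\b[A-Z][a-z]+\s+[A-Z]/.test(input)) return true;
    if (/citation/i.test(check) && /\[\d+\]|\bper\s+[A-Z]|\baccording to\b/i.test(input)) return true;
    return false;
  }
  return input.trim().length >= 20;
}

// 0.21.0 parse step and evaluator, copied from the added lines.
function parseCheckColumn(check: string): PrereqMatchVocab {
  const stemmedTokens = new Set<string>();
  const quotedRegex = /"([^"]+)"/g;
  let m: RegExpExecArray | null;
  while ((m = quotedRegex.exec(check)) !== null) {
    for (const stem of tokenize(m[1])) stemmedTokens.add(stem);
  }
  return {
    stemmedTokens,
    hasURLCheck: /\bURL\b/i.test(check),
    hasNumericCheck: /\bnumeric\b/i.test(check),
    hasProperNounCheck: /\bproper-?noun\b/i.test(check),
    hasCitationCheck: /\bcitation\b/i.test(check),
  };
}

function evaluateNew(inputStems: Set<string>, rawInput: string, p: PrereqMatchVocab): boolean {
  for (const s of p.stemmedTokens) if (inputStems.has(s)) return true;
  if (p.hasURLCheck && /https?:\/\//.test(rawInput)) return true;
  if (p.hasNumericCheck && /\d/.test(rawInput)) return true;
  if (p.hasProperNounCheck && /\b[A-Z][a-z]+\s+[A-Z]/.test(rawInput)) return true;
  if (p.hasCitationCheck && /\[\d+\]|\bper\s+[A-Z]|\baccording to\b/i.test(rawInput)) return true;
  const noHints = !p.hasURLCheck && !p.hasNumericCheck && !p.hasProperNounCheck && !p.hasCitationCheck;
  if (p.stemmedTokens.size === 0 && noHints) return rawInput.trim().length >= 20;
  return false;
}

// Run both evaluators on the same (check, input) pair.
function compare(check: string, input: string): { old: boolean; next: boolean } {
  const next = evaluateNew(new Set(tokenize(input)), input, parseCheckColumn(check));
  return { old: evaluateOld(input, check), next };
}

console.log(compare("Includes a URL to the source", "I read about this somewhere last week."));
console.log(compare('Mentions "evidence" or gives citations', "See [1] for details."));
console.log(compare('Mentions "evidence"', "The evidence is in the logs."));
```

On these inputs the evaluators agree on the third pair and disagree on the first two, and neither disagreement involves stemming. The length-based fallback now also requires that the check description carry no structural hint, and the hint detectors gained `\b` anchors, so a plural such as "citations" no longer sets `hasCitationCheck`. Whether any canon check column is phrased this way is not visible in the diff; if none is, the additive claim holds for the shipped vocabulary.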
