…vocabulary

Brings the second half of challenge detection vocabulary into governance. Previously the oddkit worker hardcoded a CHALLENGE_STOP_WORDS Set in workers/src/orchestrate.ts. That was a Vodka Architecture violation in a refactor explicitly about removing such violations from source code.

Adds a new "## Detection Noise" section to normative-vocabulary.md as a code block of common filler words to filter from user input before BM25 scoring. The section deliberately excludes modal verbs (must, should, shall, may), negation (not, no, never, always), and auxiliary verbs (do, does, did, have, has, had), because those are signal for strong-claim, proposal, and assumption type detection. Filtering them would silently break those type matches.

Article now governs both surfaces of challenge detection vocabulary:
- Signal in retrieved canon quotes (existing two tables under ## Normative Vocabulary)
- Noise in user input matched against per-type detection text (new ## Detection Noise section)

Blockquote, summary, and notes updated to reflect the dual-surface scope. Other domains (legal, theological, narrative) extend or prune both surfaces together as a single canon edit.

Companion oddkit PR will land next: drop the hardcoded CHALLENGE_STOP_WORDS constant, fetch this section via the existing fetchNormativeVocabulary helper, and fall back to an empty filter when the section is absent.
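A minimal TypeScript sketch of the parsing side, assuming the section is a "## Detection Noise" heading followed by one fenced code block of whitespace- or comma-separated words. The names parseDetectionNoise, RESERVED_SIGNAL, and findSignalCollisions are hypothetical and are not the actual fetchNormativeVocabulary internals. The parser returns an empty Set when the section is missing (the fallback described above), and the collision check confirms that none of the signal words the PR keeps out of the list have leaked into it.

```typescript
// Hypothetical sketch: names and parsing rules are assumptions, not oddkit's code.
const FENCE = "`".repeat(3); // avoids a literal fence inside this example

// Extract words from the first fenced block under "## Detection Noise".
// Returns an empty Set when the section is absent (no filtering).
function parseDetectionNoise(markdown: string): Set<string> {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((l) => l.trim() === "## Detection Noise");
  if (start === -1) return new Set<string>();

  const words = new Set<string>();
  let inFence = false;
  for (let i = start + 1; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line.startsWith(FENCE)) {
      if (inFence) break; // closing fence ends the list
      inFence = true; // opening fence starts it
      continue;
    }
    // A new level-1/2 heading before any fence means the section has no list.
    if (!inFence && /^#{1,2}\s/.test(line)) break;
    if (inFence) {
      for (const w of line.split(/[\s,]+/)) {
        if (w !== "") words.add(w.toLowerCase());
      }
    }
  }
  return words;
}

// Signal words the PR description lists as deliberately absent from Detection Noise.
const RESERVED_SIGNAL = [
  "must", "should", "shall", "may", "might", "can", "could", "will", "would",
  "not", "no", "never", "always", "do", "does", "did", "have", "has", "had",
] as const;

// Returns any reserved signal word that leaked into the noise list.
function findSignalCollisions(noise: ReadonlySet<string>): string[] {
  return RESERVED_SIGNAL.filter((w) => noise.has(w));
}
```

A collision check like this mirrors the parser test's "excludes modal verbs" assertion: if a future canon edit adds "must" to the noise list, the check flags it instead of letting strong-claim detection quietly stop matching.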
klappy added a commit to klappy/oddkit that referenced this pull request on Apr 17, 2026
Caught in PR #100 review by Klappy: the CHALLENGE_STOP_WORDS Set added mid-PR to fix a BM25 over-match was itself a Vodka Architecture violation in a refactor explicitly about removing such violations. The constant carried a domain opinion ('modals are signal, articles are filler in challenge detection') that belonged in canon, not in worker source.

Anti-pattern fixed:
- Drop the hardcoded CHALLENGE_STOP_WORDS Set from workers/src/orchestrate.ts
- Drop the duplicate hardcoded copy from workers/test/governance-parser.test.mjs
- Extend the NormativeVocabulary interface with stopWords: Set<string>
- Extend fetchNormativeVocabulary to extract the '## Detection Noise' code block from odd/challenge/normative-vocabulary.md (lands in klappy.dev#100)
- Move the BM25 index build out of discoverChallengeTypes into a new lazy builder getOrBuildChallengeTypeIndex(types, vocab, canonUrl) so the index can use governance-sourced stop words rather than a constant
- Update the parser test to fetch Detection Noise the same way the worker does: no hardcoded duplicate, no drift risk. The test gains 3 new assertions: Detection Noise parses non-empty, excludes modal verbs, includes common filler

Net hardcoded-constants delta: this PR removes ~6 classes of hardcoded domain opinion (claim type detection, questions, prereqs, tension regex, reframings, stop words) and adds zero. The remaining minimal RFC 2119 fallback ('MUST', 'MUST NOT', 'SHOULD', 'SHOULD NOT') and the 'planning' default mode are server-availability fallbacks for when canon is unreachable, not domain governance.

The test currently runs against the feature branch via the KLAPPYDEV_RAW env override. After klappy.dev#100 merges, the override comes off and the test reads from main with no further changes.
Verification:
- npm run typecheck: clean
- workers/test/governance-parser.test.mjs (vs feature branch): 97/97 pass
- tests/smoke.sh: 6/6 pass
- grep CHALLENGE_STOP_WORDS in workers/ and src/: zero matches

Refs:
- Caught in: this PR review by Klappy
- Depends on: klappy/klappy.dev#100 (Detection Noise section)
- Lesson: 'is this the right architectural shape' is a category the current gauntlet does not catch. The tools verify governance content, not whether new code is creating new ungoverned content. Possible future tool: a vodka-audit that flags non-trivial Sets/Maps/lists in worker source and asks 'should this be in canon?'
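The lazy builder described in this commit can be sketched roughly as follows. This is illustrative, not the oddkit implementation: only the builder's name and parameters and stopWords: Set<string> come from the commit message, while the ChallengeType fields (slug, detectionText), the module-level cache keyed by canonUrl, the tokenizer, and the scoreTypes helper are assumptions. Scoring is standard Okapi BM25 with a non-negative IDF and the common defaults k1 = 1.2, b = 0.75.

```typescript
// Hypothetical shapes: only stopWords: Set<string> is named in the commit message.
interface ChallengeType { slug: string; detectionText: string; }
interface NormativeVocabulary { stopWords: Set<string>; }

interface Bm25Index {
  docs: { slug: string; tf: Map<string, number>; len: number }[];
  df: Map<string, number>; // number of type docs containing each term
  avgLen: number;
}

// Lowercase, split on non-word characters, drop governance-sourced stop words.
function tokenize(text: string, stopWords: Set<string>): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9'-]+/)
    .filter((t) => t !== "" && !stopWords.has(t));
}

// Assumed per-isolate memo, keyed by canon URL.
const indexCache = new Map<string, Bm25Index>();

function getOrBuildChallengeTypeIndex(
  types: ChallengeType[],
  vocab: NormativeVocabulary,
  canonUrl: string,
): Bm25Index {
  const cached = indexCache.get(canonUrl);
  if (cached) return cached;

  const df = new Map<string, number>();
  const docs = types.map((t) => {
    const tokens = tokenize(t.detectionText, vocab.stopWords);
    const tf = new Map<string, number>();
    for (const tok of tokens) tf.set(tok, (tf.get(tok) ?? 0) + 1);
    for (const term of tf.keys()) df.set(term, (df.get(term) ?? 0) + 1);
    return { slug: t.slug, tf, len: tokens.length };
  });
  const avgLen = docs.reduce((sum, d) => sum + d.len, 0) / Math.max(docs.length, 1);
  const index: Bm25Index = { docs, df, avgLen };
  indexCache.set(canonUrl, index);
  return index;
}

// Okapi BM25 with a non-negative IDF; k1 and b are common defaults, assumed here.
function scoreTypes(
  index: Bm25Index,
  query: string,
  vocab: NormativeVocabulary,
  k1 = 1.2,
  b = 0.75,
): { slug: string; score: number }[] {
  const n = index.docs.length;
  const terms = tokenize(query, vocab.stopWords);
  return index.docs
    .map((d) => {
      let score = 0;
      for (const term of terms) {
        const f = d.tf.get(term) ?? 0;
        if (f === 0) continue;
        const nq = index.df.get(term) ?? 0;
        const idf = Math.log(1 + (n - nq + 0.5) / (nq + 0.5));
        const norm = 1 - b + (b * d.len) / (index.avgLen || 1);
        score += (idf * f * (k1 + 1)) / (f + k1 * norm);
      }
      return { slug: d.slug, score };
    })
    .sort((x, y) => y.score - x.score);
}
```

Because the same vocab.stopWords filters both the indexed detection text and the query, an empty Set simply disables filtering and leaves IDF alone to down-weight common terms. Keying the cache on canonUrl is one plausible choice; how oddkit invalidates it after a canon edit is not described here.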
klappy added a commit that referenced this pull request on Apr 18, 2026
Adds canon/constraints/core-governance-baseline.md establishing the three-tier resolution stack (live canon → bundled baseline → fail-loud) that every oddkit canon-driven tool must conform to.

Context: PR #100's voice-dump suppression bug was a canon/code drift (schema said 3 modes, canon defined 9) that shipped to prod for 1h 39m because no contract governed how tools reconcile canon vs shipped code. The governance anti-pattern sweep audit identified 5 of 11 tools with the same shape of bug. This contract is the architectural answer the sweep refactors conform to.

Key provisions:
- Three-tier resolution per governance file: canon (preferred) → bundled baseline (fallback) → fail-loud error envelope
- Response envelope declares governance_source on every call
- Six required-baseline files enumerated
- Baseline regenerated from canon at build; a build-time schema check fails the deploy if baseline and canon diverge
- Fail-loud envelope includes an actionable resolution block with reference_content_url pointing at oddkit-hosted canon (reference, not mandatory)
- New tool oddkit_baseline_check probes canon completeness pre-deploy

Passes Writing Canon (5 tiers verified). Converged after 5 challenge rounds (no blocking objections). Ships tier:1 status:draft; graduates to status:active after the canary refactor (telemetry_policy) lands following this contract.

Companion PR: klappy/oddkit audit/governance-anti-pattern-sweep
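One way a tool could walk the three-tier stack, sketched under assumptions: fetchCanon, bundledBaseline, referenceBaseUrl, and the error and steps fields are hypothetical, while governance_source, resolution, and reference_content_url are the names the contract uses. Live canon wins, the build-time baseline is the fallback, and anything else returns a fail-loud envelope rather than silently using defaults.

```typescript
// Hypothetical sketch of the contract's three tiers; helper names are assumptions.
type GovernanceSource = "canon" | "baseline";

interface Resolved {
  ok: true;
  governance_source: GovernanceSource; // declared on every call
  content: string;
}

interface FailLoud {
  ok: false;
  error: string;
  resolution: { steps: string[]; reference_content_url: string };
}

async function resolveGovernanceFile(
  path: string,
  fetchCanon: (path: string) => Promise<string | null>,
  bundledBaseline: ReadonlyMap<string, string>,
  referenceBaseUrl: string,
): Promise<Resolved | FailLoud> {
  // Tier 1: live canon (preferred). Errors or empty content fall through.
  try {
    const live = await fetchCanon(path);
    if (live !== null && live.trim() !== "") {
      return { ok: true, governance_source: "canon", content: live };
    }
  } catch {
    // canon unreachable: try the bundled baseline
  }

  // Tier 2: baseline regenerated from canon at build time.
  const baseline = bundledBaseline.get(path);
  if (baseline !== undefined) {
    return { ok: true, governance_source: "baseline", content: baseline };
  }

  // Tier 3: fail loud with an actionable resolution block.
  return {
    ok: false,
    error: `governance file ${path} is missing from canon and bundled baseline`,
    resolution: {
      steps: [
        `add ${path} to the configured canon`,
        "or redeploy with a baseline that includes it",
      ],
      reference_content_url: `${referenceBaseUrl}/${path}`,
    },
  };
}
```

Returning governance_source on success lets a caller see when it is running on the baseline, which is the kind of silent canon/code drift the contract is meant to surface.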
This was referenced Apr 18, 2026
Move challenge detection stop words from oddkit code into canon governance
What
Adds a new ## Detection Noise section to odd/challenge/normative-vocabulary.md as a code block of common filler words. Updates the blockquote, Summary, and Notes to acknowledge the article's now-dual scope: signal in retrieved canon quotes (existing) plus noise in user input matched against per-type detection text (new).
Why
The oddkit worker currently hardcodes a CHALLENGE_STOP_WORDS Set in workers/src/orchestrate.ts, a Vodka Architecture violation in a refactor (oddkit#100) that was explicitly about removing such violations from source code. The hardcoded constant carries a domain opinion ("modals are signal, articles are filler in challenge detection") that belongs in canon, not in worker source.

Caught in PR #100 review by Klappy. The gauntlet didn't surface it: the category "is this the right architectural shape" requires a different lens than the current tools provide.
Scope decision
Single article, two surfaces — chose Option A from the discussion. Pros: one fetcher pattern, drift-free domain vocabulary, atomic edits when extending or pruning to a new domain. The two surfaces are two roles of the same domain opinion ("what counts as content vs filler in this domain").
Modal verbs are deliberately absent from the filler list
must, should, shall, may, might, can, could, will, would, not, no, never, always, do, does, did, have, has, had are all NOT in the Detection Noise list. They are the load-bearing trigger words for the strong-claim, proposal, and assumption challenge types. Filtering them would silently break those type detections, which is exactly the bug the BM25 pivot in oddkit#100 caught.
Companion PR
oddkit#101 (will follow): drop the hardcoded CHALLENGE_STOP_WORDS constant, extend fetchNormativeVocabulary to extract the new section into a Set, return it on the vocab object, and consume it in discoverChallengeTypes when building the per-type BM25 index. Backward-compatible: an empty Set when the section is absent (the server falls back to no filter, IDF only).
Verification
- governs field broadened to reflect dual scope
- date bumped to 2026-04-17
Refs
Note
Low Risk
Documentation-only canon governance changes; no runtime code changes in this PR, with low risk aside from downstream tooling interpreting the new ## Detection Noise section incorrectly.
Overview
Updates odd/challenge/normative-vocabulary.md to explicitly cover two detection surfaces: signal words/phrases in retrieved canon quotes (tension detection) and a new ## Detection Noise stop-word list for filtering user input before BM25 scoring.

Adds the noise vocabulary section (code block of common filler words) and revises frontmatter, summary, and notes to describe how the server should parse/apply both vocabularies, including fallback behavior when the new section is missing/empty.
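A rough sketch of how a server might apply the two surfaces with their fallbacks. The DualVocabulary shape and the function names are hypothetical; the minimal RFC 2119 list ('MUST', 'MUST NOT', 'SHOULD', 'SHOULD NOT') and the empty-Set noise fallback come from the oddkit commit messages on this PR. Signal phrases are matched case-sensitively as whole words against canon quotes, while user input is lowercased and filtered against the noise list.

```typescript
// Hypothetical shape covering both surfaces; field names are assumptions.
interface DualVocabulary {
  signalPhrases: readonly string[]; // signal in retrieved canon quotes
  stopWords: Set<string>; // noise in user input
}

// Minimal RFC 2119 fallback for when canon is unreachable.
const RFC2119_FALLBACK: readonly string[] = ["MUST", "MUST NOT", "SHOULD", "SHOULD NOT"];

function withFallbacks(vocab: Partial<DualVocabulary> | null): DualVocabulary {
  const phrases = vocab?.signalPhrases;
  return {
    signalPhrases: phrases && phrases.length > 0 ? phrases : RFC2119_FALLBACK,
    stopWords: vocab?.stopWords ?? new Set<string>(), // empty Set: no filter, IDF only
  };
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Surface 1: signal phrases present in a canon quote (case-sensitive whole words).
function findSignals(quote: string, vocab: DualVocabulary): string[] {
  return vocab.signalPhrases.filter((p) => new RegExp(`\\b${escapeRegExp(p)}\\b`).test(quote));
}

// Surface 2: user-input tokens left after removing Detection Noise words.
function filterUserInput(input: string, vocab: DualVocabulary): string[] {
  return input
    .toLowerCase()
    .split(/\W+/)
    .filter((t) => t !== "" && !vocab.stopWords.has(t));
}
```

Keeping both lists on one vocabulary object reflects the single-article scope decision: extending the article to a new domain changes what counts as signal and what counts as filler in one edit.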
Reviewed by Cursor Bugbot for commit 7c0e65f. Bugbot is set up for automated code reviews on this repo.