feat(challenge): add Detection Noise vocabulary to normative-vocabulary by klappy · Pull Request #100 · klappy/klappy.dev

klappy · 2026-04-17T07:31:21Z

Move challenge detection stop words from oddkit code into canon governance

What

Adds a new ## Detection Noise section to odd/challenge/normative-vocabulary.md as a code block of common filler words. Updates the blockquote, Summary, and Notes to acknowledge the article's now-dual scope: signal in retrieved canon quotes (existing) plus noise in user input matched against per-type detection text (new).

Why

The oddkit worker currently hardcodes a CHALLENGE_STOP_WORDS Set in workers/src/orchestrate.ts — a Vodka Architecture violation in a refactor (oddkit#100) that was explicitly about removing such violations from source code. The hardcoded constant carries a domain opinion ("modals are signal, articles are filler in challenge detection") that belongs in canon, not in worker source.

Caught in PR #100 review by Klappy. The gauntlet didn't surface it — the category "is this the right architectural shape" requires a different lens than the current tools provide.

Scope decision

Single article, two surfaces: chose Option A from the discussion. Pros: one fetcher pattern, drift-free domain vocabulary, and atomic edits when extending or pruning the vocabulary for a new domain. The two surfaces are two roles of the same domain opinion ("what counts as content vs filler in this domain").

Modal verbs are deliberately absent from the filler list

must, should, shall, may, might, can, could, will, would, not, no, never, always, do, does, did, have, has, had are all NOT in the Detection Noise list. They are the load-bearing trigger words for the strong-claim, proposal, and assumption challenge types. Filtering them would silently break those type detections — exactly the bug the BM25 pivot in oddkit#100 caught.
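A minimal sketch of why this matters, assuming a simple lowercase-and-split tokenizer. The word lists and the `contentTokens` helper are hypothetical illustrations, not the worker's actual code or the canon list.

```typescript
// Hypothetical excerpt of a noise list; the real list lives in canon.
const detectionNoise: ReadonlySet<string> = new Set(["the", "a", "an", "of", "to"]);

// Lowercase the input, split on non-letters, and drop noise words.
function contentTokens(input: string, noise: ReadonlySet<string>): string[] {
  return input
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((token) => token.length > 0 && !noise.has(token));
}

const input = "The API must never return a null";

// Modals and negation survive, so a strong-claim matcher still sees its triggers.
const kept = contentTokens(input, detectionNoise);

// Counter-example: a generic stop-word list that also drops modals and negation.
const overEager: ReadonlySet<string> = new Set([
  "the", "a", "an", "of", "to", "must", "never",
]);
const stripped = contentTokens(input, overEager);

console.log(kept.join(" "));     // api must never return null
console.log(stripped.join(" ")); // api return null
```

With the over-eager list, nothing in the remaining tokens distinguishes a strong claim from a neutral statement, which is the silent breakage described above.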

Companion PR

oddkit#101 (will follow): drop the hardcoded CHALLENGE_STOP_WORDS constant, extend fetchNormativeVocabulary to extract the new section into a Set, return it on the vocab object, consume it in discoverChallengeTypes when building the per-type BM25 index. Backward-compatible — empty Set when section is absent (server falls back to no filter, IDF only).
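The extraction step could look roughly like the sketch below. It is an illustration only: `parseDetectionNoise` is a hypothetical name, and the layout of the code block (words separated by whitespace or commas) is an assumption, since the PR text does not show the section's contents.

```typescript
// Built at runtime so this example does not contain a literal fence line.
const FENCE = "`".repeat(3);

// Extract the first fenced code block under "## Detection Noise" as a Set.
// Returns an empty Set when the section (or its code block) is absent,
// which mirrors the backward-compatible fallback described above.
function parseDetectionNoise(markdown: string): Set<string> {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((line) => /^##\s+Detection Noise\s*$/.test(line));
  const words = new Set<string>();
  if (start === -1) return words;

  let inFence = false;
  for (const line of lines.slice(start + 1)) {
    // Another top-level or second-level heading ends the section.
    if (!inFence && /^#{1,2}\s/.test(line)) break;
    if (line.trimStart().startsWith(FENCE)) {
      if (inFence) break; // closing fence: stop after the first block
      inFence = true;
      continue;
    }
    if (inFence) {
      // Assumed layout: words separated by whitespace and/or commas.
      for (const word of line.toLowerCase().split(/[\s,]+/)) {
        if (word.length > 0) words.add(word);
      }
    }
  }
  return words;
}
```

Returning an empty Set rather than `null` keeps the consuming code free of special cases: filtering against an empty Set is simply a no-op.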

Verification

AI voice clichés audit on new prose: clean
Summary section preserved (Writing Canon tier 2 requirement)
Frontmatter governs field broadened to reflect dual scope
Frontmatter date bumped to 2026-04-17

Refs

Caught in: feat(challenge): governance-driven runChallengeAction (E0008) oddkit#100 review feedback
Will unblock: oddkit follow-up PR (drop hardcoded constant, read from this section)

Note

Low Risk
Documentation-only canon governance changes; no runtime code changes in this PR. The only risk is downstream tooling misinterpreting the new ## Detection Noise section.

Overview
Updates odd/challenge/normative-vocabulary.md to explicitly cover two detection surfaces: signal words/phrases in retrieved canon quotes (tension detection) and a new ## Detection Noise stop-word list for filtering user input before BM25 scoring.

Adds the noise vocabulary section (code block of common filler words) and revises frontmatter, summary, and notes to describe how the server should parse/apply both vocabularies, including fallback behavior when the new section is missing/empty.
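To illustrate what falling back to IDF alone means when the noise list is missing, the sketch below computes a BM25-style inverse document frequency over three made-up detection texts. The formula is the common Lucene-style variant; which variant oddkit actually uses is not stated in this PR, and the sample documents are hypothetical.

```typescript
// Lucene-style BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5)),
// where N is the number of documents and n the number containing the term.
function bm25Idf(term: string, docs: string[][]): number {
  const n = docs.filter((doc) => doc.includes(term)).length;
  return Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
}

// Hypothetical per-type detection texts, already tokenized.
const typeDocs: string[][] = [
  ["the", "claim", "must", "always", "hold"],
  ["the", "proposal", "should", "change", "the", "design"],
  ["the", "assumption", "that", "users", "will", "agree"],
];

const idfThe = bm25Idf("the", typeDocs);   // in all 3 docs: ln(8/7), about 0.134
const idfMust = bm25Idf("must", typeDocs); // in 1 doc: ln(8/3), about 0.981

console.log(idfThe < idfMust); // true
```

IDF down-weights a word that appears in every detection text but does not remove it, so an explicit noise list is the stronger of the two filters.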

Reviewed by Cursor Bugbot for commit 7c0e65f. Bugbot is set up for automated code reviews on this repo.

…vocabulary

Brings the second half of challenge detection vocabulary into governance. Previously the oddkit worker hardcoded a CHALLENGE_STOP_WORDS Set in workers/src/orchestrate.ts, a Vodka Architecture violation in a refactor explicitly about removing such violations from source code.

Adds a new "## Detection Noise" section to normative-vocabulary.md as a code block of common filler words to filter from user input before BM25 scoring. The section deliberately excludes modal verbs (must, should, shall, may), negation (not, no, never, always), and auxiliary verbs (do, does, did, have, has, had): those are signal for strong-claim, proposal, and assumption type detection. Filtering them would silently break those type matches.

Article now governs both surfaces of challenge detection vocabulary:

- Signal in retrieved canon quotes (existing two tables under ## Normative Vocabulary)
- Noise in user input matched against per-type detection text (new ## Detection Noise section)

Blockquote, summary, and notes updated to reflect the dual-surface scope. Other domains (legal, theological, narrative) extend or prune both surfaces together as a single canon edit.

Companion oddkit PR will land next: drop the hardcoded CHALLENGE_STOP_WORDS constant, fetch this section via the existing fetchNormativeVocabulary helper, fall back to empty filter when the section is absent.

Caught in PR #100 review by Klappy: the CHALLENGE_STOP_WORDS Set added mid-PR to fix a BM25 over-match was itself a Vodka Architecture violation in a refactor explicitly about removing such violations. The constant carried a domain opinion ('modals are signal, articles are filler in challenge detection') that belonged in canon, not in worker source.

Anti-pattern fixed:

- Drop the hardcoded CHALLENGE_STOP_WORDS Set from workers/src/orchestrate.ts
- Drop the duplicate hardcoded copy from workers/test/governance-parser.test.mjs
- Extend NormativeVocabulary interface with stopWords: Set<string>
- Extend fetchNormativeVocabulary to extract '## Detection Noise' code block from odd/challenge/normative-vocabulary.md (lands in klappy.dev#100)
- Move BM25 index build out of discoverChallengeTypes into a new lazy builder getOrBuildChallengeTypeIndex(types, vocab, canonUrl) so the index can use governance-sourced stop words rather than a constant
- Update parser test to fetch Detection Noise the same way the worker does: no hardcoded duplicate, no drift risk. Test gains 3 new assertions: Detection Noise parses non-empty, excludes modal verbs, includes common filler

Net hardcoded-constants delta: this PR removes ~6 classes of hardcoded domain opinion (claim type detection, questions, prereqs, tension regex, reframings, stop words) and adds zero. The remaining minimal RFC 2119 fallback ('MUST', 'MUST NOT', 'SHOULD', 'SHOULD NOT') and 'planning' default mode are server-availability fallbacks for when canon is unreachable, not domain governance.

Test currently runs against the feature branch via KLAPPYDEV_RAW env override. After klappy.dev#100 merges, the override comes off and the test reads from main with no further changes.

Verification:

- npm run typecheck: clean
- workers/test/governance-parser.test.mjs (vs feature branch): 97/97 pass
- tests/smoke.sh: 6/6 pass
- grep CHALLENGE_STOP_WORDS in workers/ and src/: zero matches

Refs:

- Caught in: this PR review by Klappy
- Depends on: klappy/klappy.dev#100 (Detection Noise section)
- Lesson: 'is this the right architectural shape' is a category the current gauntlet does not catch. The tools verify governance content, not whether new code is creating new ungoverned content. Possible future tool: a vodka-audit that flags non-trivial Sets/Maps/lists in worker source and asks 'should this be in canon?'
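The lazy builder mentioned in the commit message above might be shaped like the following sketch. Everything here is assumed for illustration: the `ChallengeType` shape, caching by `canonUrl`, and the token-list stand-in for a real BM25 index. Only the `stopWords: Set<string>` field and the `(types, vocab, canonUrl)` parameter order come from the PR text, and the function is deliberately given a different name to avoid implying it is the worker's implementation.

```typescript
// Hypothetical shapes; only stopWords is named in the PR.
interface ChallengeType {
  id: string;
  detectionText: string;
}
interface NormativeVocabulary {
  stopWords: Set<string>;
}

// Stand-in for a BM25 index: type id -> filtered detection tokens.
type TypeIndex = Map<string, string[]>;

// Assumed cache key: one index per canon URL.
const indexCache = new Map<string, TypeIndex>();
let buildCount = 0; // exposed only so the laziness is observable

function getOrBuildTypeIndexSketch(
  types: ChallengeType[],
  vocab: NormativeVocabulary,
  canonUrl: string,
): TypeIndex {
  const cached = indexCache.get(canonUrl);
  if (cached !== undefined) return cached;

  buildCount += 1;
  const index: TypeIndex = new Map();
  for (const type of types) {
    // Governance-sourced stop words are applied at build time,
    // which is why the build has to wait until vocab is fetched.
    const tokens = type.detectionText
      .toLowerCase()
      .split(/[^a-z]+/)
      .filter((word) => word.length > 0 && !vocab.stopWords.has(word));
    index.set(type.id, tokens);
  }
  indexCache.set(canonUrl, index);
  return index;
}
```

Building lazily means the index is constructed after the vocabulary is available rather than at module load, so no constant has to stand in for the noise list.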

Adds canon/constraints/core-governance-baseline.md establishing the three-tier resolution stack (live canon → bundled baseline → fail-loud) that every oddkit canon-driven tool must conform to.

Context: PR #100's voice-dump suppression bug was a canon/code drift (schema said 3 modes, canon defined 9) that shipped to prod for 1h 39m because no contract governed how tools reconcile canon vs shipped code. The governance anti-pattern sweep audit identified 5 of 11 tools with the same shape of bug. This contract is the architectural answer the sweep refactors conform to.

Key provisions:

- Three-tier resolution per governance file: canon (preferred) → bundled baseline (fallback) → fail-loud error envelope
- Response envelope declares governance_source on every call
- Six required-baseline files enumerated
- Baseline regenerated from canon at build; build-time schema check fails deploy if baseline and canon diverge
- Fail-loud envelope includes actionable resolution block with reference_content_url pointing at oddkit-hosted canon (reference, not mandatory)
- New tool oddkit_baseline_check probes canon completeness pre-deploy

Passes Writing Canon (5 tiers verified). Converged after 5 challenge rounds (no blocking objections). Ships tier:1 status:draft; graduates to status:active after the canary refactor (telemetry_policy) lands following this contract.

Companion PR: klappy/oddkit audit/governance-anti-pattern-sweep
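The three-tier order can be sketched as a pure function. This is a hedged illustration, not the contract's actual schema: `governance_source` and `reference_content_url` are field names taken from the text above, while `resolveGovernance`, the `ok`/`error`/`resolution` fields, and the URL layout are assumptions. The live fetch is left to the caller (passing `null` on failure) so that the tier selection itself stays synchronous and easy to test.

```typescript
type GovernanceEnvelope =
  | { ok: true; governance_source: "canon" | "baseline"; content: string }
  | { ok: false; error: string; resolution: { reference_content_url: string } };

function resolveGovernance(
  file: string,
  liveCanon: string | null,              // result of the live fetch, null if unreachable
  baseline: ReadonlyMap<string, string>, // bundled at build time (assumed shape)
  referenceBaseUrl: string,              // hypothetical base for reference content
): GovernanceEnvelope {
  // Tier 1: live canon is preferred whenever it was fetched.
  if (liveCanon !== null) {
    return { ok: true, governance_source: "canon", content: liveCanon };
  }
  // Tier 2: fall back to the bundled baseline for this file.
  const bundled = baseline.get(file);
  if (bundled !== undefined) {
    return { ok: true, governance_source: "baseline", content: bundled };
  }
  // Tier 3: fail loud, with a pointer the operator can act on.
  return {
    ok: false,
    error: `No canon or baseline available for ${file}`,
    resolution: { reference_content_url: `${referenceBaseUrl}/${file}` },
  };
}
```

Declaring the source on every successful envelope is what lets a caller notice that a response was served from the baseline rather than live canon.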

klappy mentioned this pull request Apr 17, 2026

feat(challenge): governance-driven runChallengeAction (E0008) klappy/oddkit#100

Merged

klappy merged commit 52f2492 into main Apr 17, 2026
1 check passed

klappy mentioned this pull request Apr 17, 2026

Promote PR #100: governance-driven challenge with BM25 + stemming to prod klappy/oddkit#101

Merged

This was referenced Apr 18, 2026

canon: add core-governance-baseline contract (draft) #101

Merged

salvage: orphaned handoff + PR #100 rage-quit ledger #108

Merged

klappy merged 1 commit into main from feat/challenge-detection-noise-vocabulary
