feat(challenge): add Detection Noise vocabulary to normative-vocabulary #100

Merged
klappy merged 1 commit into main from feat/challenge-detection-noise-vocabulary
Apr 17, 2026

klappy commented Apr 17, 2026

Move challenge detection stop words from oddkit code into canon governance

What

Adds a new ## Detection Noise section to odd/challenge/normative-vocabulary.md as a code block of common filler words. Updates the blockquote, Summary, and Notes to acknowledge the article's now-dual scope: signal in retrieved canon quotes (existing) plus noise in user input matched against per-type detection text (new).

Why

The oddkit worker currently hardcodes a CHALLENGE_STOP_WORDS Set in workers/src/orchestrate.ts — a Vodka Architecture violation in a refactor (oddkit#100) that was explicitly about removing such violations from source code. The hardcoded constant carries a domain opinion ("modals are signal, articles are filler in challenge detection") that belongs in canon, not in worker source.

Caught in the oddkit#100 review by Klappy. The gauntlet didn't surface it: the question "is this the right architectural shape" needs a different lens than the current tools provide.

Scope decision

Single article, two surfaces: Option A from the discussion. Pros: one fetcher pattern, drift-free domain vocabulary, and atomic edits when extending or pruning the vocabulary for a new domain. The two surfaces are two roles of the same domain opinion ("what counts as content vs filler in this domain").

Modal verbs are deliberately absent from the filler list

The modal verbs (must, should, shall, may, might, can, could, will, would), negation and absolutes (not, no, never, always), and auxiliaries (do, does, did, have, has, had) are all absent from the Detection Noise list. They are the load-bearing trigger words for the strong-claim, proposal, and assumption challenge types. Filtering them would silently break those type detections, which is exactly the bug the BM25 pivot in oddkit#100 caught.
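
A minimal sketch of why the omission matters, assuming a simple lowercase tokenizer. The `detectionNoise` excerpt and the `contentTokens` helper are hypothetical and are not the worker's actual code or the real list:

```typescript
// Hypothetical excerpt of a Detection Noise list. The real list lives in
// odd/challenge/normative-vocabulary.md and is not reproduced here.
const detectionNoise: ReadonlySet<string> = new Set([
  "the", "a", "an", "of", "to", "and",
]);

// Lowercase the input, split on anything that is not a letter or apostrophe,
// and drop empty tokens and tokens found in the noise set.
function contentTokens(input: string, noise: ReadonlySet<string>): string[] {
  return input
    .toLowerCase()
    .split(/[^a-z']+/)
    .filter((token) => token.length > 0 && !noise.has(token));
}

const tokens = contentTokens("The cache must never expire", detectionNoise);
// tokens is ["cache", "must", "never", "expire"]: the article is gone,
// while the modal and the negation survive for type detection.
```

If "must" or "never" were added to the noise set, the same input would reduce to "cache expire" and lose the words a strong-claim match depends on.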

Companion PR

oddkit#101 (will follow): drop the hardcoded CHALLENGE_STOP_WORDS constant, extend fetchNormativeVocabulary to extract the new section into a Set, return it on the vocab object, consume it in discoverChallengeTypes when building the per-type BM25 index. Backward-compatible — empty Set when section is absent (server falls back to no filter, IDF only).
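
For illustration, the extraction step could look roughly like the sketch below. It assumes the section holds one fenced code block of whitespace- or comma-separated words; `extractWordSet` is a hypothetical helper, not the real `fetchNormativeVocabulary`:

```typescript
// Built at runtime so this example never contains a literal fence line.
const FENCE = "`".repeat(3);

// Return the words in the first fenced code block under "## <heading>" as a
// lowercase Set. Returns an empty Set when the heading or block is absent,
// which matches the backward-compatible "no filter" fallback.
function extractWordSet(markdown: string, heading: string): Set<string> {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((line) => line.trim() === `## ${heading}`);
  if (start === -1) return new Set<string>();

  const words = new Set<string>();
  let inFence = false;
  for (const line of lines.slice(start + 1)) {
    if (!inFence && /^##\s/.test(line)) break; // next section, no block found
    if (line.trim().startsWith(FENCE)) {
      if (inFence) break; // closing fence, stop reading
      inFence = true; // opening fence, start reading
      continue;
    }
    if (inFence) {
      for (const word of line.split(/[\s,]+/)) {
        if (word) words.add(word.toLowerCase());
      }
    }
  }
  return words;
}

const sample = [
  "## Detection Noise", "", FENCE, "the a an", "of to", FENCE, "", "## Notes",
].join("\n");
const noiseWords = extractWordSet(sample, "Detection Noise"); // five words
const missing = extractWordSet("# No such section", "Detection Noise"); // empty
```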

Verification

  • AI voice clichés audit on new prose: clean
  • Summary section preserved (Writing Canon tier 2 requirement)
  • Frontmatter governs field broadened to reflect dual scope
  • Frontmatter date bumped to 2026-04-17

Refs


Note

Low Risk
Documentation-only canon governance change with no runtime code in this PR. The only risk is downstream tooling misparsing the new ## Detection Noise section.

Overview
Updates odd/challenge/normative-vocabulary.md to explicitly cover two detection surfaces: signal words/phrases in retrieved canon quotes (tension detection) and a new ## Detection Noise stop-word list for filtering user input before BM25 scoring.

Adds the noise vocabulary section (code block of common filler words) and revises frontmatter, summary, and notes to describe how the server should parse/apply both vocabularies, including fallback behavior when the new section is missing/empty.

Reviewed by Cursor Bugbot for commit 7c0e65f.

…vocabulary

Brings the second half of challenge detection vocabulary into governance.
Previously the oddkit worker hardcoded a CHALLENGE_STOP_WORDS Set in
workers/src/orchestrate.ts — a Vodka Architecture violation in a refactor
explicitly about removing such violations from source code.

Adds a new "## Detection Noise" section to normative-vocabulary.md as a
code block of common filler words to filter from user input before BM25
scoring. The section deliberately excludes modal verbs (must, should,
shall, may), negation and absolutes (not, no, never, always), and
auxiliary verbs (do, does, did, have, has, had). Those are signal for
strong-claim, proposal, and assumption type detection; filtering them
would silently break those type matches.

Article now governs both surfaces of challenge detection vocabulary:
- Signal in retrieved canon quotes (existing two tables under
  ## Normative Vocabulary)
- Noise in user input matched against per-type detection text (new
  ## Detection Noise section)

Blockquote, summary, and notes updated to reflect the dual-surface scope.
Other domains (legal, theological, narrative) extend or prune both
surfaces together as a single canon edit.

Companion oddkit PR will land next: drop the hardcoded CHALLENGE_STOP_WORDS
constant, fetch this section via the existing fetchNormativeVocabulary
helper, fall back to empty filter when the section is absent.
klappy added a commit to klappy/oddkit that referenced this pull request Apr 17, 2026
Caught in PR #100 review by Klappy: the CHALLENGE_STOP_WORDS Set added
mid-PR to fix a BM25 over-match was itself a Vodka Architecture violation
in a refactor explicitly about removing such violations. The constant
carried a domain opinion ('modals are signal, articles are filler in
challenge detection') that belonged in canon, not in worker source.

Anti-pattern fixed:
- Drop the hardcoded CHALLENGE_STOP_WORDS Set from workers/src/orchestrate.ts
- Drop the duplicate hardcoded copy from workers/test/governance-parser.test.mjs
- Extend NormativeVocabulary interface with stopWords: Set<string>
- Extend fetchNormativeVocabulary to extract '## Detection Noise' code block
  from odd/challenge/normative-vocabulary.md (lands in klappy.dev#100)
- Move BM25 index build out of discoverChallengeTypes into a new lazy
  builder getOrBuildChallengeTypeIndex(types, vocab, canonUrl) so the
  index can use governance-sourced stop words rather than a constant
- Update parser test to fetch Detection Noise the same way the worker
  does — no hardcoded duplicate, no drift risk. Test gains 3 new
  assertions: Detection Noise parses non-empty, excludes modal verbs,
  includes common filler
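
The lazy-builder idea above can be sketched as follows. This is a simplified
illustration, not the real getOrBuildChallengeTypeIndex: the types, the
token-list "index", and the helper name getOrBuildIndex are all hypothetical,
and caching by canonUrl alone is an assumption made to keep the example short.

```typescript
// Hypothetical shapes; the worker's real interfaces are not shown in this thread.
interface ChallengeType {
  name: string;
  detectionText: string;
}
interface TypeIndex {
  docs: Map<string, string[]>; // type name -> filtered detection tokens
}

// One cached index per canon URL, built on first use.
const indexCache = new Map<string, TypeIndex>();

function getOrBuildIndex(
  types: ChallengeType[],
  stopWords: ReadonlySet<string>,
  canonUrl: string,
): TypeIndex {
  const cached = indexCache.get(canonUrl);
  if (cached) return cached; // already built for this canon source

  const docs = new Map<string, string[]>();
  for (const type of types) {
    const tokens = type.detectionText
      .toLowerCase()
      .split(/[^a-z]+/)
      .filter((token) => token.length > 0 && !stopWords.has(token));
    docs.set(type.name, tokens);
  }
  const index: TypeIndex = { docs };
  indexCache.set(canonUrl, index);
  return index;
}
```

Because the stop words arrive as an argument, the index can only be built
after the vocabulary has been fetched, which is why the build has to move
out of module scope and into a lazy path.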

Net hardcoded-constants delta: this PR removes ~6 classes of hardcoded
domain opinion (claim type detection, questions, prereqs, tension regex,
reframings, stop words) and adds zero. The remaining minimal RFC 2119
fallback ('MUST', 'MUST NOT', 'SHOULD', 'SHOULD NOT') and 'planning'
default mode are server-availability fallbacks for when canon is
unreachable, not domain governance.

Test currently runs against the feature branch via KLAPPYDEV_RAW env
override. After klappy.dev#100 merges, the override comes off and the
test reads from main with no further changes.
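
A rough sketch of such an override, assuming KLAPPYDEV_RAW holds a raw-content
base URL (the thread does not say what the variable contains). The default
base and both helper names are hypothetical:

```typescript
// Hypothetical default; the real base URL is not given in this thread.
const DEFAULT_RAW_BASE = "https://example.invalid/klappy.dev/main";

// Use the override when it is set and non-blank, otherwise the default.
// Trailing slashes are stripped so the path can be appended safely.
function resolveRawBase(env: Record<string, string | undefined>): string {
  const override = env["KLAPPYDEV_RAW"]?.trim();
  return override ? override.replace(/\/+$/, "") : DEFAULT_RAW_BASE;
}

function vocabularyUrl(env: Record<string, string | undefined>): string {
  return `${resolveRawBase(env)}/odd/challenge/normative-vocabulary.md`;
}
```

Passing the environment in as a plain object keeps the helper testable; a
caller in Node would pass process.env.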

Verification:
- npm run typecheck: clean
- workers/test/governance-parser.test.mjs (vs feature branch): 97/97 pass
- tests/smoke.sh: 6/6 pass
- grep CHALLENGE_STOP_WORDS in workers/ and src/: zero matches

Refs:
- Caught in: this PR review by Klappy
- Depends on: klappy/klappy.dev#100 (Detection Noise section)
- Lesson: 'is this the right architectural shape' is a category the
  current gauntlet does not catch — the tools verify governance content,
  not whether new code is creating new ungoverned content. Possible
  future tool: a vodka-audit that flags non-trivial Sets/Maps/lists in
  worker source and asks 'should this be in canon?'
klappy merged commit 52f2492 into main on Apr 17, 2026
1 check passed
klappy added a commit that referenced this pull request Apr 18, 2026
Adds canon/constraints/core-governance-baseline.md establishing the
three-tier resolution stack (live canon → bundled baseline → fail-loud)
that every oddkit canon-driven tool must conform to.

Context: PR #100's voice-dump suppression bug was a canon/code drift
(schema said 3 modes, canon defined 9) that shipped to prod for 1h 39m
because no contract governed how tools reconcile canon vs shipped code.
The governance anti-pattern sweep audit identified 5 of 11 tools with
the same shape of bug. This contract is the architectural answer the
sweep refactors conform to.

Key provisions:
- Three-tier resolution per governance file: canon (preferred) →
  bundled baseline (fallback) → fail-loud error envelope
- Response envelope declares governance_source on every call
- Six required-baseline files enumerated
- Baseline regenerated from canon at build; build-time schema check
  fails deploy if baseline and canon diverge
- Fail-loud envelope includes actionable resolution block with
  reference_content_url pointing at oddkit-hosted canon (reference,
  not mandatory)
- New tool oddkit_baseline_check probes canon completeness pre-deploy
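
The three-tier resolution can be sketched as a pure function. This is an
illustration under assumptions, not the contract's actual schema: only
governance_source and reference_content_url come from the text above, and
every other field and name here is hypothetical. It also assumes the caller
has already attempted the live fetch and passes null on failure.

```typescript
// Hypothetical envelope shape for illustration.
type Envelope =
  | { ok: true; governance_source: "canon" | "baseline"; content: string }
  | { ok: false; error: string; resolution: { reference_content_url: string } };

function resolveGovernance(
  path: string,
  liveCanon: string | null, // result of the live canon fetch, null if it failed
  baseline: ReadonlyMap<string, string>, // bundled baseline files by path
  referenceBaseUrl: string,
): Envelope {
  // Tier 1: live canon is preferred whenever it is available.
  if (liveCanon !== null) {
    return { ok: true, governance_source: "canon", content: liveCanon };
  }
  // Tier 2: fall back to the baseline bundled at build time.
  const bundled = baseline.get(path);
  if (bundled !== undefined) {
    return { ok: true, governance_source: "baseline", content: bundled };
  }
  // Tier 3: fail loud with an actionable pointer instead of guessing.
  return {
    ok: false,
    error: `No canon or baseline available for ${path}`,
    resolution: { reference_content_url: `${referenceBaseUrl}/${path}` },
  };
}
```

Declaring the source on every successful envelope is what lets a caller tell
a live answer from a baseline one after the fact.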

Passes Writing Canon (5 tiers verified). Converged after 5 challenge
rounds (no blocking objections). Ships tier:1 status:draft; graduates
to status:active after the canary refactor (telemetry_policy) lands
following this contract.

Companion PR: klappy/oddkit audit/governance-anti-pattern-sweep