Fix acronym search: add acronym-aware indexing, scoring, routing, and auto-rebuild by klappy · Pull Request #17 · klappy/klappy.dev

klappy · 2026-02-06T04:26:25Z

Three interacting bugs prevented acronym queries (CST, ODD, ESE) from returning results:

1. Stale index: new definition files (e.g. canon/definitions/cognitive-saturation-threshold.md) were not in docs.json because it was never rebuilt after they were added.
2. No acronym matching: librarian scored tokens by substring against titles/tags/paths but had no dedicated acronym scoring. Adds acronym extraction at index time (from parenthetical and title initials) and acronym_match scoring (weight: 25).
3. Router gap: definition query pattern required an article ("What is the X?") so "What is CST?" never routed to the librarian. Made article optional.
4. Build pipeline: smart-build.js now runs docs:index so every deploy has fresh data.
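
The router gap described above can be illustrated with a small sketch. The regexes and the `extractDefinitionTerm` helper are hypothetical and are not taken from the repository's router; they only show, under that assumption, how an optional article group lets "What is CST?" match a definition pattern.

```javascript
// Hypothetical patterns; the router's real regex is not shown in this PR.
// Before: the article is mandatory, so "What is CST?" does not match.
const STRICT_DEFINITION = /^what\s+is\s+(?:the|a|an)\s+(.+?)\??$/i;

// After: the article (plus its trailing whitespace) is optional.
const LOOSE_DEFINITION = /^what\s+is\s+(?:(?:the|a|an)\s+)?(.+?)\??$/i;

// Returns the term being asked about, or null if the query is not
// shaped like a definition question.
function extractDefinitionTerm(query, pattern) {
  const match = pattern.exec(query.trim());
  return match ? match[1] : null;
}
```

With these assumed patterns, `extractDefinitionTerm("What is CST?", STRICT_DEFINITION)` returns `null`, while the same call with `LOOSE_DEFINITION` returns `"CST"`.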

Also fixes a pre-existing bug where non-JSON bracket tags (e.g. [agent, guide]) were
stored as strings instead of arrays, causing .map() crashes in the librarian.
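
A minimal sketch of the kind of fallback this describes, assuming a helper named `parseBracketList` (a hypothetical name, not the function in build-docs-index.js): try JSON first, and if the bracketed value is not valid JSON, split it on commas so the result is still an array.

```javascript
// Hypothetical helper: parse a frontmatter value such as `[agent, guide]`.
// Values that are not bracketed are returned unchanged as strings.
function parseBracketList(raw) {
  const value = String(raw).trim();
  if (!value.startsWith("[") || !value.endsWith("]")) return value;

  try {
    const parsed = JSON.parse(value);
    if (Array.isArray(parsed)) return parsed;
  } catch {
    // Not valid JSON (e.g. unquoted items); fall through to the manual split.
  }

  return value
    .slice(1, -1) // drop the surrounding brackets
    .split(",")
    .map((item) => item.trim().replace(/^["']|["']$/g, ""))
    .filter((item) => item.length > 0);
}
```

Under this sketch, `parseBracketList("[agent, guide]")` yields `["agent", "guide"]`, so a later `.map()` over the tags has an array to work with.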

Tests: 14/15 pass (the one failure, a metric-laundering quote overlap, is pre-existing).

https://claude.ai/code/session_01Ht8m1Nd7dMgaHNf4qswqgY

Note

Medium Risk
Touches retrieval routing/scoring and the docs indexing/build pipeline; mistakes could degrade search relevance or break builds if the generated index schema changes unexpectedly.

Overview
Improves Librarian’s ability to answer acronym/definition lookups by loosening the router’s definition pattern (article optional) and adding acronym-aware scoring in librarian.js (new acronym_match weight using doc.acronyms).
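
As a rough illustration of acronym-aware scoring, the sketch below adds a fixed weight when a query token exactly equals one of a document's indexed acronyms. Only the weight of 25 and the `doc.acronyms` field come from this PR; the function name, the token format, and the choice of exact (rather than substring) matching are assumptions.

```javascript
// Weight taken from the PR description; the rest is an assumed shape.
const ACRONYM_MATCH_WEIGHT = 25;

// Hypothetical scorer: returns the acronym contribution to a doc's score.
// `queryTokens` is assumed to be an array of already-tokenized query words.
function scoreAcronymMatch(queryTokens, doc) {
  const acronyms = new Set((doc.acronyms || []).map((a) => a.toLowerCase()));
  let score = 0;
  for (const token of queryTokens) {
    // Exact comparison, so "cst" does not match inside longer words.
    if (acronyms.has(token.toLowerCase())) score += ACRONYM_MATCH_WEIGHT;
  }
  return score;
}
```

For example, `scoreAcronymMatch(["what", "is", "CST"], { acronyms: ["cst"] })` returns `25`, and a document with no `acronyms` field scores `0`.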

Extends build-docs-index.js to extract acronyms from titles (parenthetical + initials) and to robustly parse bracketed frontmatter arrays (fallback for non-JSON lists), preventing .map() crashes when tags are malformed.

Updates the build pipeline (smart-build.js) to always run docs:index so public/_compiled/index/docs.json stays fresh, and adds/updates Librarian tests to cover acronym queries like CST/ESE/ODD.

Written by Cursor Bugbot for commit aa31344. This will update automatically on new commits.


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


cursor · 2026-02-06T04:50:43Z

+  // Strategy 2: Generate acronym from title initials
+  // Remove any parenthetical content first
+  const cleaned = title.replace(/\s*\([^)]*\)\s*/g, " ").trim();
+  const words = cleaned.split(/[\s\-]+/).filter((w) => w.length > 0);


Acronym extraction splits on limited separators, producing garbage

Low Severity

The word-split regex [\s\-]+ in extractAcronyms only handles whitespace and ASCII hyphens, missing em-dashes (—), ampersands (&), and emoji. Titles like "Fragments of the Canon — Reconstructions" produce acronyms containing literal special characters (e.g., "fc—r", "dd&ep", "v&e", "☁cp—bd", "\ud83dot&d"). These can never match real queries because normalizeQuery strips the same characters, so they're dead entries in the index. Expanding the split to include common Unicode separators and punctuation would eliminate the noise.
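
One possible shape for the suggested fix, offered as a sketch rather than the repository's code: split on any run of characters that are not Unicode letters or digits, so em-dashes, ampersands, and emoji act as separators instead of becoming "words". The stop-word list and the parenthetical regex are assumptions added to make the example self-contained.

```javascript
// Hypothetical stop words skipped when building initials.
const STOP_WORDS = new Set(["a", "an", "and", "of", "the", "in", "for", "to"]);

function extractAcronyms(title) {
  const acronyms = new Set();

  // Strategy 1: explicit parenthetical, e.g. "Cognitive Saturation Threshold (CST)".
  const paren = title.match(/\(([A-Z][A-Z0-9]{1,9})\)/);
  if (paren) acronyms.add(paren[1].toLowerCase());

  // Strategy 2: initials, after removing any parenthetical content.
  const cleaned = title.replace(/\s*\([^)]*\)\s*/g, " ").trim();
  const words = cleaned
    .split(/[^\p{L}\p{N}]+/u) // any non-letter, non-digit run separates words
    .filter((w) => w.length > 0 && !STOP_WORDS.has(w.toLowerCase()));

  if (words.length >= 2) {
    // Spread by code point so a first character is never half a surrogate pair.
    acronyms.add(words.map((w) => [...w][0]).join("").toLowerCase());
  }
  return [...acronyms];
}
```

With this split, `extractAcronyms("Fragments of the Canon — Reconstructions")` returns `["fcr"]` rather than an entry containing the em-dash.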

klappy merged commit 6616ad7 into main Feb 6, 2026
2 of 3 checks passed

klappy deleted the claude/fix-acronym-search-BLaLH branch February 6, 2026 04:27

cursor Bot reviewed Feb 6, 2026


klappy mentioned this pull request Apr 24, 2026

canon(principles): add partial-data-with-transparency-and-background-warm + graduation ledger #137

Merged

8 tasks
