… auto-rebuild
Three interacting bugs prevented acronym queries (CST, ODD, ESE) from returning results; a fourth change hardens the build pipeline:
1. Stale index: new definition files (e.g. canon/definitions/cognitive-saturation-threshold.md)
were not in docs.json because it was never rebuilt after they were added.
2. No acronym matching: librarian scored tokens by substring against titles/tags/paths
but had no dedicated acronym scoring. Adds acronym extraction at index time
(from parenthetical and title initials) and acronym_match scoring (weight: 25).
3. Router gap: definition query pattern required an article ("What is the X?")
so "What is CST?" never routed to the librarian. Made article optional.
4. Build pipeline: smart-build.js now runs docs:index so every deploy has fresh data.
Also fixes a pre-existing bug where non-JSON bracket tags (e.g. [agent, guide]) were
stored as strings instead of arrays, causing .map() crashes in the librarian.
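To illustrate the bracket-tag fallback described above, here is a minimal sketch. The function name parseBracketList is hypothetical, and wrapping a non-bracketed value in a one-element array is an assumption; the actual parsing code in build-docs-index.js is not shown in this conversation and may differ.

```javascript
// Hypothetical helper (not the PR's actual code): turn a frontmatter value
// such as `[agent, guide]` into an array, whether or not it is valid JSON.
function parseBracketList(raw) {
  const value = String(raw).trim();
  // Assumption: a plain scalar becomes a one-element array.
  if (!value.startsWith("[") || !value.endsWith("]")) {
    return value.length > 0 ? [value] : [];
  }
  try {
    const parsed = JSON.parse(value);
    if (Array.isArray(parsed)) return parsed.map(String);
  } catch {
    // Not valid JSON (e.g. unquoted items); fall through to a manual split.
  }
  return value
    .slice(1, -1)
    .split(",")
    .map((item) => item.trim().replace(/^["']|["']$/g, ""))
    .filter((item) => item.length > 0);
}
```

Because the result is always an array, a later call such as tags.map(...) no longer depends on the frontmatter being well-formed JSON.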
Tests: 14/15 pass (the one failure, metric-laundering quote overlap, is pre-existing).
https://claude.ai/code/session_01Ht8m1Nd7dMgaHNf4qswqgY
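As a rough illustration of the router change (item 3), a definition pattern with an optional article might look like the following. The regex and the names DEFINITION_QUERY and matchDefinitionQuery are assumptions for this sketch; the router's real pattern does not appear in this conversation.

```javascript
// Hypothetical pattern: the article after "what is/are" is optional,
// so both "What is the X?" and "What is CST?" match.
const DEFINITION_QUERY = /^what\s+(?:is|are)\s+(?:(?:a|an|the)\s+)?(.+?)\??$/i;

// Returns the term being asked about, or null if the query is not
// a definition-style question.
function matchDefinitionQuery(query) {
  const match = DEFINITION_QUERY.exec(String(query).trim());
  return match ? match[1].trim() : null;
}
```

The article group requires trailing whitespace, so a term that merely starts with "a" or "the" (for example "another") is not split by mistake.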
Cursor Bugbot has reviewed your changes and found 1 potential issue.
// Strategy 2: Generate acronym from title initials
// Remove any parenthetical content first
const cleaned = title.replace(/\s*\([^)]*\)\s*/g, " ").trim();
const words = cleaned.split(/[\s\-]+/).filter((w) => w.length > 0);
Acronym extraction splits on limited separators, producing garbage
Low Severity
The word-split regex [\s\-]+ in extractAcronyms only handles whitespace and ASCII hyphens, missing em-dashes (—), ampersands (&), and emoji. Titles like "Fragments of the Canon — Reconstructions" produce acronyms containing literal special characters (e.g., "fc—r", "dd&ep", "v&e", "☁cp—bd", "\ud83dot&d"). These can never match real queries because normalizeQuery strips the same characters, so they're dead entries in the index. Expanding the split to include common Unicode separators and punctuation would eliminate the noise.
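One possible way to widen the split, sketched under assumptions: treat any run of characters that are neither letters nor digits as a separator, using Unicode property escapes. The function name titleInitials and the STOP_WORDS list are hypothetical; the stop-word step is only inferred from the "fc—r" example above, and the real extractAcronyms may work differently.

```javascript
// Assumed stop-word list; inferred from the examples, not taken from the PR.
const STOP_WORDS = new Set(["a", "an", "and", "of", "the"]);

// Hypothetical variant of the title-initials strategy.
function titleInitials(title) {
  // Drop parenthetical content first, as the reviewed snippet does.
  const cleaned = title.replace(/\s*\([^)]*\)\s*/g, " ").trim();
  // Split on any run of characters that are neither letters nor digits,
  // which covers em-dashes, ampersands, emoji, and other punctuation.
  const words = cleaned
    .split(/[^\p{L}\p{N}]+/u)
    .filter((w) => w.length > 0 && !STOP_WORDS.has(w.toLowerCase()));
  // Array.from keeps a leading non-BMP letter intact as one character.
  return words.map((w) => Array.from(w)[0].toLowerCase()).join("");
}
```

A trade-off of this approach is that apostrophes also act as separators, so a word like "Don't" contributes two initials; whether that matters depends on the titles in the index.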


Note
Medium Risk
Touches retrieval routing/scoring and the docs indexing/build pipeline; mistakes could degrade search relevance or break builds if the generated index schema changes unexpectedly.
Overview
Improves Librarian’s ability to answer acronym/definition lookups by loosening the router’s definition pattern (article optional) and adding acronym-aware scoring in librarian.js (new acronym_match weight using doc.acronyms).
Extends build-docs-index.js to extract acronyms from titles (parenthetical + initials) and to robustly parse bracketed frontmatter arrays (fallback for non-JSON lists), preventing .map() crashes when tags are malformed.
Updates the build pipeline (smart-build.js) to always run docs:index so public/_compiled/index/docs.json stays fresh, and adds/updates Librarian tests to cover acronym queries like CST/ESE/ODD.
Written by Cursor Bugbot for commit aa31344. This will update automatically on new commits.
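For readers who want a concrete picture of the acronym_match scoring mentioned in the overview, a minimal sketch follows. The weight of 25 and the doc.acronyms field come from the PR description; the function name acronymScore, the case-insensitive comparison, and applying the weight once per document are assumptions, not the code in librarian.js.

```javascript
const ACRONYM_MATCH_WEIGHT = 25; // weight stated in the PR description

// Hypothetical scorer: adds the weight once when any query token equals
// one of the acronyms stored on the document at index time.
function acronymScore(queryTokens, doc) {
  // Tolerate documents indexed before the acronyms field existed.
  const acronyms = Array.isArray(doc.acronyms) ? doc.acronyms : [];
  const known = new Set(acronyms.map((a) => String(a).toLowerCase()));
  const hit = queryTokens.some((t) => known.has(String(t).toLowerCase()));
  return hit ? ACRONYM_MATCH_WEIGHT : 0;
}
```

An exact-match check like this avoids the substring problem noted in the description, where a short token such as "cst" could otherwise match only by accident inside a longer title or path.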