feat(search): field-qualified queries (kind:/lang:/path:/name:) + fuzzy typo fallback by mschreib28 · Pull Request #6 · mschreib28/codegraph

mschreib28 · 2026-05-06T00:04:27Z

## Summary

Two UX improvements that turn free-text search into something a user can drive precisely.

### 1. Field-qualified queries

A new query parser splits the raw query into structured filters and a free-text remainder:

```
kind:function name:auth path:src/api authenticate
```

becomes:

```js
{ kinds: ['function'], nameFilters: ['auth'],
  pathFilters: ['src/api'], text: 'authenticate' }
```

Filters compose with the `SearchOptions` arg (intersection). Unknown prefixes pass through as plain text so `query "TODO:"` keeps working. Quoted values (`path:"my dir"`) handle whitespace. When the user supplies only filters with no text, the search uses a filter-only candidate scan instead of bailing out.

Recognised fields:

| Prefix | Value |
|--------|-------|
| `kind:` | any `NodeKind` value (`function`, `method`, `class`, ...) |
| `lang:` (alias `language:`) | any `Language` value |
| `path:` | case-insensitive substring of `file_path` |
| `name:` | case-insensitive substring of `node.name` |

### 2. Fuzzy typo fallback

When both FTS and LIKE return nothing AND the text is at least 3 chars, scan the distinct-name set with a bounded edit distance (≤2 for ≥5-char queries, ≤1 for 4-char). Bounded edit distance early-exits once the row min exceeds `maxDist`, so the per-query cost stays O(distinct-names × avg-name-length) with a very low constant.

## Test plan

Verified live against ollama/ollama@v0.22.0:

| Query | Result |
|-------|--------|
| `kind:function auth` | only function-kind hits |
| `lang:go path:server route` | Go files under `server/` |
| `getUssr` (typo) | finds `getUser`, `SetUser` |
| `confg` (typo) | finds `Config` |

- [x] `npx vitest run`: 380 passed
- [x] `npx tsc --noEmit` clean
- [x] `npm run build` succeeds

🤖 Generated with Claude Code
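To make the filter/text split concrete, here is a minimal sketch of how such a parser could classify tokens. It is not the code from this PR: `parseQuerySketch`, the `languages` field, and the small `KNOWN_KINDS` / `KNOWN_LANGS` sets are hypothetical stand-ins, and the sketch splits on plain whitespace, so it does not handle quoted values.

```typescript
// Hypothetical stand-ins for the real NodeKind and Language value sets.
const KNOWN_KINDS = new Set<string>(["function", "method", "class"]);
const KNOWN_LANGS = new Set<string>(["go", "typescript", "python"]);

// Shape modelled on the example output in the PR description;
// the `languages` field name is an assumption.
interface ParsedQuerySketch {
  kinds: string[];
  languages: string[];
  pathFilters: string[];
  nameFilters: string[];
  text: string;
}

function parseQuerySketch(raw: string): ParsedQuerySketch {
  const parsed: ParsedQuerySketch = {
    kinds: [],
    languages: [],
    pathFilters: [],
    nameFilters: [],
    text: "",
  };
  const textParts: string[] = [];

  // Plain whitespace split: quoted values are out of scope for this sketch.
  for (const token of raw.split(/\s+/).filter((t) => t.length > 0)) {
    const colon = token.indexOf(":");
    const prefix = colon > 0 ? token.slice(0, colon).toLowerCase() : "";
    const value = colon > 0 ? token.slice(colon + 1) : "";

    if (prefix === "kind" && KNOWN_KINDS.has(value)) {
      parsed.kinds.push(value);
    } else if ((prefix === "lang" || prefix === "language") && KNOWN_LANGS.has(value)) {
      parsed.languages.push(value);
    } else if (prefix === "path" && value !== "") {
      parsed.pathFilters.push(value);
    } else if (prefix === "name" && value !== "") {
      parsed.nameFilters.push(value);
    } else {
      // Unknown prefixes (e.g. "TODO:") and unrecognised values stay free text.
      textParts.push(token);
    }
  }

  parsed.text = textParts.join(" ");
  return parsed;
}
```

With this sketch, `parseQuerySketch("TODO: fix")` leaves every filter list empty and keeps the whole input as `text`, which is the pass-through behaviour the description asks for.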

Copied from colbymchenry/codegraph#131

…zy typo fallback

Two UX improvements that turn a free-text search into something a real user can drive precisely.

1) Field-qualified queries. A new query parser (src/search/query-parser.ts) splits the raw query into structured filters and a free-text remainder:

    kind:function name:auth path:src/api authenticate

becomes

    { kinds: ['function'], nameFilters: ['auth'],
      pathFilters: ['src/api'], text: 'authenticate' }

Filters compose with the SearchOptions arg (intersection). Unknown prefixes pass through as plain text so `query "TODO:"` keeps working. Quoted values (`path:"my dir"`) handle whitespace. When the user specifies only filters with no text, the search uses a filter-only candidate scan instead of bailing out.

Recognised today:

    kind:  any NodeKind value
    lang:  any Language value (alias: language:)
    path:  case-insensitive substring of file_path
    name:  case-insensitive substring of node.name

2) Fuzzy fallback. When BOTH FTS and LIKE return nothing AND the text is at least 3 chars, the resolver scans the distinct-name set with a bounded Damerau-Levenshtein-style edit distance (≤2 for ≥5 chars, ≤1 for 4-char queries, off for shorter). Bounded edit-distance early-exits once the row min exceeds maxDist, so this stays O(distinct-names * avg-name-length) with a very low constant.

Verified live against ollama/ollama@v0.22.0:

    query "kind:function auth"         → only function-kind hits
    query "lang:go path:server route"  → Go files under server/
    query "getUssr" (typo)             → finds getUser, SetUser
    query "confg" (typo)               → finds Config

Full test suite: 380 passed.
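The early-exit idea can be illustrated with a small sketch. This is an assumption-laden illustration, not the PR's implementation: it uses plain Levenshtein distance (no transposition handling, so it is simpler than the "Damerau-Levenshtein-style" variant the commit mentions), and the names `boundedEditDistanceSketch` and `maxDistForQuery` are hypothetical. Any result above `maxDist` is reported as `maxDist + 1`.

```typescript
// Threshold rule as described in the commit message: ≤2 for ≥5 chars,
// ≤1 for 4-char queries, off for shorter. Returning 0 to mean "off" is
// an assumption of this sketch.
function maxDistForQuery(queryLength: number): number {
  if (queryLength >= 5) return 2;
  if (queryLength === 4) return 1;
  return 0;
}

// Plain Levenshtein distance with two shortcuts:
//  1. if the lengths differ by more than maxDist, give up immediately;
//  2. if every cell in a DP row exceeds maxDist, later rows cannot
//     recover, so stop early.
function boundedEditDistanceSketch(a: string, b: string, maxDist: number): number {
  if (Math.abs(a.length - b.length) > maxDist) return maxDist + 1;

  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    let rowMin = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      const d = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
      curr.push(d);
      if (d < rowMin) rowMin = d;
    }
    if (rowMin > maxDist) return maxDist + 1; // early exit
    prev = curr;
  }

  const result = prev[b.length];
  return result > maxDist ? maxDist + 1 : result;
}
```

Under this sketch the comparison is case-sensitive, so `getUssr` is one edit from `getUser` and two from `SetUser`, and `confg` is two edits from `Config`; all three fall within the ≤2 bound for queries of five or more characters.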

…fuzzy fan-out cap, larger filter-only over-fetch, unit tests

Five fixes from independent review:

- parseQuery tokenizer: quotes that appear MID-token (path:"my dir/ file") were not being recognised; only quotes at the start of a token were treated as quoted spans. The fixture path:"my dir" parsed as ['path:"my', 'dir"'] instead of ['path:"my dir"']. Tokeniser is now a single state machine that scans into a token until whitespace OR a quote, and recognises quotes anywhere within the token (skips to the matching close quote).

- searchNodesFuzzy: cap the per-name follow-up SQL queries at Math.max(limit*2, 50) AFTER edit-distance filtering. Without this, a project with many similar names (getUser1, getUser2...) could fan out far beyond limit queries before the inner-loop break kicks in.

- searchAllByFilters (filter-only no-text path): bumped over-fetch multiplier from 2× to 5× so a selective post-filter (e.g. path:src/very/specific/file.ts) doesn't return fewer than limit results despite the DB having matches.

- 23 new unit tests in __tests__/search-query-parser.test.ts:
  - parseQuery covers known-field filter, lang/language alias, multiple kind: ORs, quoted spans (incl. mid-token), URL passthrough, empty-value passthrough, unknown prefix passthrough, unknown value passthrough, all-filters-no-text, empty input, 20k-char input.
  - boundedEditDistance covers identity, single insertion/deletion/substitution, length-difference shortcut, empty inputs, case-sensitivity, early-exit correctness.

Full test suite: 853 passed (up from 830).
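The mid-token quote fix can be pictured with a small single-pass tokenizer. This is only a sketch of the technique described above, not the code from the commit: `tokenizeSketch` is a hypothetical name, quotes are kept inside the token (matching the expected fixture output), and treating an unterminated quote as running to the end of the input is an assumption.

```typescript
// Single pass over the input: accumulate characters into the current token
// until whitespace; when a quote is seen anywhere in a token, jump to the
// matching close quote so that whitespace inside the quotes does not split.
function tokenizeSketch(input: string): string[] {
  const tokens: string[] = [];
  let current = "";
  let i = 0;

  while (i < input.length) {
    const ch = input.charAt(i);
    if (ch === '"') {
      const close = input.indexOf('"', i + 1);
      // Assumption: an unterminated quote swallows the rest of the input.
      const end = close === -1 ? input.length - 1 : close;
      current += input.slice(i, end + 1);
      i = end + 1;
    } else if (/\s/.test(ch)) {
      if (current !== "") {
        tokens.push(current);
        current = "";
      }
      i++;
    } else {
      current += ch;
      i++;
    }
  }

  if (current !== "") tokens.push(current);
  return tokens;
}
```

For the fixture mentioned in the commit, `tokenizeSketch('path:"my dir" auth')` yields two tokens, `path:"my dir"` and `auth`, rather than splitting inside the quotes.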

andreinknv added 2 commits April 28, 2026 18:23

mschreib28 wants to merge 2 commits into main from upstream/feat/search-fields-and-fuzzy


2 participants
