feat(search): field-qualified queries (kind:/lang:/path:/name:) + fuzzy typo fallback #6
Open
mschreib28 wants to merge 2 commits into
…zy typo fallback
Two UX improvements that turn a free-text search into something a
real user can drive precisely.
1) Field-qualified queries.
A new query parser (src/search/query-parser.ts) splits the raw query
into structured filters and a free-text remainder:
    kind:function name:auth path:src/api authenticate

becomes

    { kinds: ['function'], nameFilters: ['auth'],
      pathFilters: ['src/api'], text: 'authenticate' }

Filters compose with the SearchOptions arg (intersection). Unknown
prefixes pass through as plain text so `query "TODO:"` keeps working.
Quoted values (`path:"my dir"`) handle whitespace. When the user
specifies only filters with no text, the search uses a filter-only
candidate scan instead of bailing out.
Recognised today:
    kind:  any NodeKind value
    lang:  any Language value (alias: language:)
    path:  case-insensitive substring of file_path
    name:  case-insensitive substring of node.name
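
To make the parsing rules above concrete, here is a minimal sketch of how such a parser could work. It is not the code in src/search/query-parser.ts: the `KNOWN_KINDS` and `KNOWN_LANGS` sets, the `languages` field name, and the `tokenize` and `stripQuotes` helpers are assumptions made for illustration, and the real `NodeKind` and `Language` value sets are larger.

```typescript
// Hypothetical value sets; the project's NodeKind and Language unions differ.
const KNOWN_KINDS = new Set(["function", "method", "class"]);
const KNOWN_LANGS = new Set(["go", "typescript", "python"]);

interface ParsedQuery {
  kinds: string[];
  languages: string[]; // field name is an assumption
  pathFilters: string[];
  nameFilters: string[];
  text: string;
}

// Split on whitespace, but keep a double-quoted span inside the current
// token wherever the quote appears, so path:"my dir" stays one token.
function tokenize(raw: string): string[] {
  const tokens: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < raw.length; i++) {
    const ch = raw[i];
    if (ch === '"') {
      inQuotes = !inQuotes;
      current += ch;
    } else if (!inQuotes && /\s/.test(ch)) {
      if (current) tokens.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  if (current) tokens.push(current);
  return tokens;
}

// Remove one pair of surrounding double quotes, if present.
function stripQuotes(value: string): string {
  const quoted = value.length >= 2 && value.startsWith('"') && value.endsWith('"');
  return quoted ? value.slice(1, -1) : value;
}

function parseQuery(raw: string): ParsedQuery {
  const out: ParsedQuery = {
    kinds: [], languages: [], pathFilters: [], nameFilters: [], text: "",
  };
  const textParts: string[] = [];
  for (const token of tokenize(raw)) {
    const colon = token.indexOf(":");
    const prefix = colon > 0 ? token.slice(0, colon).toLowerCase() : "";
    const value = colon > 0 ? stripQuotes(token.slice(colon + 1)) : "";
    if (value && prefix === "kind" && KNOWN_KINDS.has(value)) {
      out.kinds.push(value);
    } else if (value && (prefix === "lang" || prefix === "language") && KNOWN_LANGS.has(value)) {
      out.languages.push(value);
    } else if (value && prefix === "path") {
      out.pathFilters.push(value);
    } else if (value && prefix === "name") {
      out.nameFilters.push(value);
    } else {
      // Unknown prefix, unknown value, empty value, or a plain word.
      textParts.push(token);
    }
  }
  out.text = textParts.join(" ");
  return out;
}
```

With this sketch, `parseQuery('path:"my dir" TODO:')` yields `pathFilters: ['my dir']` and `text: 'TODO:'`, since an empty value after the colon falls through to plain text.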
2) Fuzzy fallback.
When BOTH FTS and LIKE return nothing AND the text is at least 3
chars, the resolver scans the distinct-name set with a bounded
Damerau-Levenshtein-style edit distance (≤2 for ≥5 chars, ≤1 for
4-char queries, off for shorter). Bounded edit-distance early-exits
once the row min exceeds maxDist, so this stays O(distinct-names *
avg-name-length) with a very low constant.
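
The thresholds and the early exit can be sketched as follows. This is an illustration under assumptions, not the resolver's code: it uses the restricted (optimal string alignment) form of Damerau-Levenshtein, the helper names `maxDistFor` and `fuzzyCandidates` are hypothetical, and it fills full rows rather than a narrow band, so it does somewhat more work per name than a banded version would.

```typescript
// Returns the edit distance between a and b, or maxDist + 1 as soon as the
// distance is known to exceed maxDist. Comparison is case-sensitive.
function boundedEditDistance(a: string, b: string, maxDist: number): number {
  // Length-difference shortcut: at least |len(a) - len(b)| edits are needed.
  if (Math.abs(a.length - b.length) > maxDist) return maxDist + 1;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  let prevPrev: number[] = [];
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    let rowMin = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      let d = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
      // Adjacent transposition (restricted Damerau-Levenshtein).
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d = Math.min(d, prevPrev[j - 2] + 1);
      }
      curr.push(d);
      if (d < rowMin) rowMin = d;
    }
    // Early exit: no later row can drop back to maxDist or below.
    if (rowMin > maxDist) return maxDist + 1;
    prevPrev = prev;
    prev = curr;
  }
  return Math.min(prev[b.length], maxDist + 1);
}

// Thresholds from the description: 2 for five or more characters,
// 1 for four characters, 0 (fuzzy off) for anything shorter.
function maxDistFor(queryLength: number): number {
  if (queryLength >= 5) return 2;
  if (queryLength === 4) return 1;
  return 0;
}

// Hypothetical helper: filter a list of distinct names by the bounded distance.
function fuzzyCandidates(query: string, names: string[]): string[] {
  const maxDist = maxDistFor(query.length);
  if (maxDist === 0) return [];
  return names.filter((n) => boundedEditDistance(query, n, maxDist) <= maxDist);
}
```

For example, `fuzzyCandidates("getUssr", ["getUser", "SetUser", "deleteAll"])` keeps `getUser` (distance 1) and `SetUser` (distance 2) and drops `deleteAll`.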
Verified live against ollama/ollama@v0.22.0:
    query "kind:function auth"         → only function-kind hits
    query "lang:go path:server route"  → Go files under server/
    query "getUssr" (typo)             → finds getUser, SetUser
    query "confg" (typo)               → finds Config
Full test suite: 380 passed.
…fuzzy fan-out cap, larger filter-only over-fetch, unit tests

Five fixes from independent review:

- parseQuery tokenizer: quotes that appear MID-token (path:"my dir/ file") were not being recognised; only quotes at the start of a token were treated as quoted spans. The fixture path:"my dir" parsed as ['path:"my', 'dir"'] instead of ['path:"my dir"']. Tokeniser is now a single state machine that scans into a token until whitespace OR a quote, and recognises quotes anywhere within the token (skips to the matching close quote).

- searchNodesFuzzy: cap the per-name follow-up SQL queries at Math.max(limit*2, 50) AFTER edit-distance filtering. Without this, a project with many similar names (getUser1, getUser2...) could fan out far beyond limit queries before the inner-loop break kicks in.

- searchAllByFilters (filter-only no-text path): bumped over-fetch multiplier from 2× to 5× so a selective post-filter (e.g. path:src/very/specific/file.ts) doesn't return fewer than limit results despite the DB having matches.

- 23 new unit tests in __tests__/search-query-parser.test.ts:
  parseQuery covers known-field filter, lang/language alias, multiple kind: ORs, quoted spans (incl. mid-token), URL passthrough, empty-value passthrough, unknown prefix passthrough, unknown value passthrough, all-filters-no-text, empty input, 20k-char input.
  boundedEditDistance covers identity, single insertion/deletion/substitution, length-difference shortcut, empty inputs, case-sensitivity, early-exit correctness.

Full test suite: 853 passed (up from 830).
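
The two sizing changes can be illustrated with a short sketch. Only the expression Math.max(limit*2, 50) and the 5× multiplier come from the commit message; the function names, the `Candidate` shape, and the choice to keep the closest names first are assumptions for illustration, not the code in searchNodesFuzzy or searchAllByFilters.

```typescript
// Upper bound on per-name follow-up queries after edit-distance filtering.
function fuzzyFanOutCap(limit: number): number {
  return Math.max(limit * 2, 50);
}

// Rows to fetch before post-filtering on the filter-only (no text) path.
function filterOnlyFetchSize(limit: number): number {
  return limit * 5;
}

// Hypothetical candidate shape: a distinct name and its edit distance.
interface Candidate {
  name: string;
  distance: number;
}

// Assumption: prefer the closest names, then truncate to the cap so that
// many near-identical names cannot trigger an unbounded number of queries.
function capCandidates(candidates: Candidate[], limit: number): Candidate[] {
  return [...candidates]
    .sort((x, y) => x.distance - y.distance || x.name.localeCompare(y.name))
    .slice(0, fuzzyFanOutCap(limit));
}
```

With a limit of 10, a project containing 120 names within the distance bound would issue at most 50 follow-up queries under this sketch; the floor of 50 keeps small limits from starving the result set.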
## Summary

Two UX improvements that turn free-text search into something a user can drive precisely.

### 1. Field-qualified queries

A new query parser splits the raw query into structured filters and a free-text remainder:

```
kind:function name:auth path:src/api authenticate
```

becomes:

```js
{ kinds: ['function'], nameFilters: ['auth'],
  pathFilters: ['src/api'], text: 'authenticate' }
```

Filters compose with the `SearchOptions` arg (intersection). Unknown prefixes pass through as plain text so `query "TODO:"` keeps working. Quoted values (`path:"my dir"`) handle whitespace. When the user supplies only filters with no text, the search uses a filter-only candidate scan instead of bailing out.

Recognised fields:

| Prefix | Value |
|--------|-------|
| `kind:` | any `NodeKind` value (`function`, `method`, `class`, ...) |
| `lang:` (alias `language:`) | any `Language` value |
| `path:` | case-insensitive substring of `file_path` |
| `name:` | case-insensitive substring of `node.name` |

### 2. Fuzzy typo fallback

When both FTS and LIKE return nothing AND the text is at least 3 chars, scan the distinct-name set with a bounded edit distance (≤2 for ≥5-char queries, ≤1 for 4-char). Bounded edit distance early-exits once the row min exceeds `maxDist`, so the per-query cost stays O(distinct-names × avg-name-length) with a very low constant.

## Test plan

Verified live against ollama/ollama@v0.22.0:

| Query | Result |
|-------|--------|
| `kind:function auth` | only function-kind hits |
| `lang:go path:server route` | Go files under `server/` |
| `getUssr` (typo) | finds `getUser`, `SetUser` |
| `confg` (typo) | finds `Config` |

- [x] `npx vitest run`: 380 passed
- [x] `npx tsc --noEmit` clean
- [x] `npm run build` succeeds

🤖 Generated with Claude Code

Copied from colbymchenry/codegraph#131