feat(search): field-qualified queries (kind:/lang:/path:/name:) + fuzzy typo fallback #6

Open
mschreib28 wants to merge 2 commits into main from upstream/feat/search-fields-and-fuzzy

Conversation

@mschreib28
Owner

## Summary

Two UX improvements that turn free-text search into something a user can drive precisely.

### 1. Field-qualified queries

A new query parser splits the raw query into structured filters and a free-text remainder:

    kind:function name:auth path:src/api authenticate

becomes:

    { kinds: ['function'], nameFilters: ['auth'],
      pathFilters: ['src/api'], text: 'authenticate' }

Filters compose with the `SearchOptions` arg (intersection). Unknown prefixes pass through as plain text so `query "TODO:"` keeps working. Quoted values (`path:"my dir"`) handle whitespace. When the user supplies only filters with no text, the search uses a filter-only candidate scan instead of bailing out.

Recognised fields:

| Prefix | Value |
|--------|-------|
| `kind:` | any `NodeKind` value (function, method, class, ...) |
| `lang:` (alias `language:`) | any `Language` value |
| `path:` | case-insensitive substring of `file_path` |
| `name:` | case-insensitive substring of `node.name` |

### 2. Fuzzy typo fallback

When both FTS and LIKE return nothing AND the text is at least 3 chars, scan the distinct-name set with a bounded edit distance (≤2 for ≥5-char queries, ≤1 for 4-char). Bounded edit distance early-exits once the row min exceeds `maxDist`, so the per-query cost stays O(distinct-names × avg-name-length) with a very low constant.

## Test plan

Verified live against ollama/ollama@v0.22.0:

| Query | Result |
|-------|--------|
| `kind:function auth` | only function-kind hits |
| `lang:go path:server route` | Go files under server/ |
| `getUssr` (typo) | finds getUser, SetUser |
| `confg` (typo) | finds Config |

- [x] `npx vitest run`: 380 passed
- [x] `npx tsc --noEmit` clean
- [x] `npm run build` succeeds

🤖 Generated with Claude Code


Copied from colbymchenry/codegraph#131

…zy typo fallback

Two UX improvements that turn a free-text search into something a
real user can drive precisely.

1) Field-qualified queries.

A new query parser (src/search/query-parser.ts) splits the raw query
into structured filters and a free-text remainder:

  kind:function name:auth path:src/api authenticate

becomes
  { kinds: ['function'], nameFilters: ['auth'],
    pathFilters: ['src/api'], text: 'authenticate' }

Filters compose with the SearchOptions arg (intersection). Unknown
prefixes pass through as plain text so `query "TODO:"` keeps working.
Quoted values (`path:"my dir"`) handle whitespace. When the user
specifies only filters with no text, the search uses a filter-only
candidate scan instead of bailing out.

Recognised today:
  kind:        any NodeKind value
  lang:        any Language value (alias: language:)
  path:        case-insensitive substring of file_path
  name:        case-insensitive substring of node.name
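
To make the parsing rules above concrete, here is a minimal sketch of how such a parser could behave. It is not the code in src/search/query-parser.ts: the `NODE_KINDS` and `LANGUAGES` lists, the `ParsedQuery` shape, the `languages` field name, and the regex-based token split are all assumptions made for illustration.

```typescript
// Hypothetical value lists; the real NodeKind / Language types live in the project.
const NODE_KINDS: string[] = ['function', 'method', 'class'];
const LANGUAGES: string[] = ['go', 'typescript', 'python'];

interface ParsedQuery {
  kinds: string[];
  languages: string[];
  pathFilters: string[];
  nameFilters: string[];
  text: string;
}

// Strip one pair of surrounding double quotes, if present.
function unquote(value: string): string {
  const quoted =
    value.length >= 2 &&
    value.charAt(0) === '"' &&
    value.charAt(value.length - 1) === '"';
  return quoted ? value.slice(1, -1) : value;
}

function parseQuery(raw: string): ParsedQuery {
  const parsed: ParsedQuery = {
    kinds: [], languages: [], pathFilters: [], nameFilters: [], text: '',
  };
  const rest: string[] = [];
  // A token is a run of non-space characters and/or double-quoted spans.
  const tokens: string[] = raw.match(/(?:[^\s"]+|"[^"]*")+/g) || [];
  for (const token of tokens) {
    const colon = token.indexOf(':');
    const field = colon > 0 ? token.slice(0, colon).toLowerCase() : '';
    const value = colon > 0 ? unquote(token.slice(colon + 1)) : '';
    if (field === 'kind' && NODE_KINDS.indexOf(value) !== -1) {
      parsed.kinds.push(value);
    } else if ((field === 'lang' || field === 'language') && LANGUAGES.indexOf(value) !== -1) {
      parsed.languages.push(value);
    } else if (field === 'path' && value !== '') {
      parsed.pathFilters.push(value);
    } else if (field === 'name' && value !== '') {
      parsed.nameFilters.push(value);
    } else {
      // Unknown prefix, unknown value, empty value, or plain text.
      rest.push(token);
    }
  }
  parsed.text = rest.join(' ');
  return parsed;
}
```

In this sketch a token such as `TODO:` or a URL falls into the final branch and stays in `text`, which matches the passthrough behaviour described above.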

2) Fuzzy fallback.

When BOTH FTS and LIKE return nothing AND the text is at least 3
chars, the resolver scans the distinct-name set with a bounded
Damerau-Levenshtein-style edit distance (≤2 for ≥5 chars, ≤1 for
4-char queries, off for shorter). Bounded edit-distance early-exits
once the row min exceeds maxDist, so this stays O(distinct-names *
avg-name-length) with a very low constant.
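
The early exit described above can be sketched as follows. This is an illustrative version only: it computes plain Levenshtein distance (no transposition step, so it is simpler than a Damerau-Levenshtein-style variant), and the convention of returning `maxDist + 1` for "too far" is an assumption, not something the commit states.

```typescript
// Returns the Levenshtein distance between a and b, or maxDist + 1 as soon as
// the distance is known to exceed maxDist.
function boundedEditDistance(a: string, b: string, maxDist: number): number {
  // Length-difference shortcut: each extra character costs at least one edit.
  if (Math.abs(a.length - b.length) > maxDist) return maxDist + 1;

  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    let rowMin = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      const d = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
      curr.push(d);
      if (d < rowMin) rowMin = d;
    }
    // Row minima never decrease, so once a whole row exceeds maxDist
    // the final distance must exceed it too.
    if (rowMin > maxDist) return maxDist + 1;
    prev = curr;
  }
  return Math.min(prev[b.length], maxDist + 1);
}
```

The comparison here is case-sensitive; a caller that wants `confg` to match `Config` would presumably lowercase both sides first.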

Verified live against ollama/ollama@v0.22.0:
  query "kind:function auth"          → only function-kind hits
  query "lang:go path:server route"   → Go files under server/
  query "getUssr"   (typo)            → finds getUser, SetUser
  query "confg"     (typo)            → finds Config

Full test suite: 380 passed.
…fuzzy fan-out cap, larger filter-only over-fetch, unit tests

Five fixes from independent review:

- parseQuery tokenizer: quotes that appear MID-token
  (path:"my dir/file") were not being recognised; only quotes at the
  start of a token were treated as quoted spans. The fixture
  path:"my dir" parsed as ['path:"my', 'dir"'] instead of
  ['path:"my dir"']. The tokenizer is now a single state machine
  that scans into a token until whitespace OR a quote, and recognises
  quotes anywhere within the token (skipping to the matching close
  quote).

- searchNodesFuzzy: cap the per-name follow-up SQL queries at
  Math.max(limit*2, 50) AFTER edit-distance filtering. Without
  this, a project with many similar names (getUser1, getUser2...)
  could fan out far beyond limit queries before the inner-loop
  break kicks in.

- searchAllByFilters (filter-only no-text path): bumped over-fetch
  multiplier from 2× to 5× so a selective post-filter (e.g.
  path:src/very/specific/file.ts) doesn't return fewer than limit
  results despite the DB having matches.

- 23 new unit tests in __tests__/search-query-parser.test.ts:
  parseQuery covers known-field filter, lang/language alias,
  multiple kind: ORs, quoted spans (incl. mid-token), URL
  passthrough, empty-value passthrough, unknown prefix passthrough,
  unknown value passthrough, all-filters-no-text, empty input,
  20k-char input. boundedEditDistance covers identity, single
  insertion/deletion/substitution, length-difference shortcut,
  empty inputs, case-sensitivity, early-exit correctness.
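
The tokenizer fix in the first item of the list above can be illustrated with a small state machine. This is a sketch under assumptions, not the project's tokenizer: the `tokenize` name is hypothetical, quotes are kept in the token for a later stage to strip, and an unterminated quote simply runs to the end of the input.

```typescript
// Split a raw query on whitespace, treating double-quoted spans as part of
// the current token wherever the opening quote appears.
function tokenize(raw: string): string[] {
  const tokens: string[] = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < raw.length; i++) {
    const ch = raw.charAt(i);
    if (ch === '"') {
      // Toggle quoted state; keep the quote so the field parser can strip it.
      inQuotes = !inQuotes;
      current += ch;
    } else if (!inQuotes && /\s/.test(ch)) {
      // Unquoted whitespace ends the current token, if there is one.
      if (current !== '') tokens.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  if (current !== '') tokens.push(current);
  return tokens;
}
```

With this approach `path:"my dir" foo` yields the two tokens `path:"my dir"` and `foo`, rather than splitting inside the quoted span.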

Full test suite: 853 passed (up from 830).
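
The fan-out cap and the larger over-fetch from the fix list above are both small bounds on how much work a query does. A rough sketch of each, with hypothetical helper names (`capFuzzyCandidates`, `filterOnlySearch`) and a hypothetical `fetchRows` callback standing in for the database query:

```typescript
// Fuzzy path: after edit-distance filtering, keep at most max(limit * 2, 50)
// candidate names so the per-name follow-up queries stay bounded.
function capFuzzyCandidates(names: string[], limit: number): string[] {
  return names.slice(0, Math.max(limit * 2, 50));
}

// Filter-only path: fetch more rows than requested, apply the post-filter,
// then trim to the requested limit.
const OVER_FETCH_MULTIPLIER = 5;

function filterOnlySearch<T>(
  fetchRows: (count: number) => T[],
  keep: (row: T) => boolean,
  limit: number
): T[] {
  return fetchRows(limit * OVER_FETCH_MULTIPLIER).filter(keep).slice(0, limit);
}
```

A fixed multiplier is still a heuristic: a sufficiently selective filter can return fewer than `limit` rows even when more matches exist beyond the fetched window.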