feat: search UX — auto-retry, per-file truncation, query WAL, skip dirs#194
- Auto-retry broadening: `codedb_find` strips delimiters (`_`, `-`, `.`) and retries when the initial query returns 0 results.
- Per-file match truncation: search output is limited to 5 matches per file, with a truncation indicator. The header shows an "(N shown)" count.
- Query tracking WAL: search/find/word queries are logged to `queries.log` as NDJSON with JSON-escaped query strings. This is the foundation for combo-boost.
- Skip dirs: added `.swc`, `.terraform`, `.serverless`, `elm-stuff`, `.stack-work`, `.cabal-sandbox`, `.cargo`, `bower_components`.
- Codex review fixes: removed an unused variable, added JSON escaping for special characters, made the shown-count header accurate.

Generated with Claude Code

Co-Authored-By: Claude <claude@anthropic.com>
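The auto-retry broadening described above can be sketched in Zig roughly as follows. This is a minimal illustration, not the code from this PR: `broadenQuery`, `findWithRetry`, and the `SearchFn` callback type are hypothetical names, and the search backend is abstracted as a function that returns a result count.

```zig
const std = @import("std");

/// Hypothetical helper: copy `query` into `buf`, dropping the delimiter
/// characters `_`, `-` and `.`. Returns the broadened slice.
fn broadenQuery(query: []const u8, buf: []u8) []const u8 {
    var n: usize = 0;
    for (query) |c| {
        switch (c) {
            '_', '-', '.' => {}, // skip delimiters
            else => {
                if (n == buf.len) break; // truncate rather than overflow `buf`
                buf[n] = c;
                n += 1;
            },
        }
    }
    return buf[0..n];
}

/// Hypothetical search callback: returns the number of results for `q`.
const SearchFn = *const fn (q: []const u8) usize;

/// Run `search` once; if it finds nothing and broadening actually changes
/// the query, retry exactly once with the broadened form.
fn findWithRetry(search: SearchFn, query: []const u8, buf: []u8) usize {
    const first = search(query);
    if (first > 0) return first;
    const broadened = broadenQuery(query, buf);
    if (broadened.len == 0 or std.mem.eql(u8, broadened, query)) return 0;
    return search(broadened);
}
```

Retrying only when broadening changes the query avoids running an identical search twice for queries that contain no delimiters.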
Benchmark Regression Report (threshold: 10.00%)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d6ec0503e9
```zig
var file_counts = std.StringHashMap(u8).init(alloc);
defer file_counts.deinit();
const max_per_file: u8 = 5;
```
Replace per-file hit counter with usize
In `handleSearch`, the per-file match count is stored as a `u8` and incremented for every hit. Queries that return more than 255 matches from one file will overflow at `gop.value_ptr.* += 1;`. In safety-checked builds this can panic the MCP request; in non-safe builds it can wrap and break the truncation behavior. This makes high-hit searches unstable, so the counter should use a wider integer type (`usize`/`u16`).
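A minimal sketch of the suggested fix, assuming a hypothetical `countShown` helper that receives the file path of each hit in result order. The per-file counter is a `usize`, so a file with hundreds of matches cannot overflow it. The cap of 5 mirrors `max_per_file` above; the `Counts` struct and the way hits are passed in are assumptions for illustration.

```zig
const std = @import("std");

/// Cap per file, mirroring `max_per_file` in the snippet above.
const max_per_file: usize = 5;

/// Hypothetical summary backing a "(N shown, M truncated)" footer.
const Counts = struct { shown: usize, truncated: usize };

/// `paths` holds the file path of each hit, in result order.
/// The per-file counter is `usize`, so a file with more than 255 hits
/// cannot overflow it the way a `u8` counter would.
fn countShown(alloc: std.mem.Allocator, paths: []const []const u8) !Counts {
    var file_counts = std.StringHashMap(usize).init(alloc);
    defer file_counts.deinit();

    var shown: usize = 0;
    for (paths) |path| {
        const gop = try file_counts.getOrPut(path);
        if (!gop.found_existing) gop.value_ptr.* = 0;
        gop.value_ptr.* += 1;
        // Only the first `max_per_file` hits of each file are printed.
        if (gop.value_ptr.* <= max_per_file) shown += 1;
    }
    return .{ .shown = shown, .truncated = paths.len - shown };
}
```

Keeping a `u8` and simply not incrementing past the cap would also avoid the overflow, but a `usize` counter removes the edge case without extra branching.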
```zig
if (c == '"') { escaped[elen] = '\''; elen += 1; }
else if (c == '\\') { if (elen + 1 < escaped.len) { escaped[elen] = '\\'; escaped[elen + 1] = '\\'; elen += 2; } }
else if (c == '\n' or c == '\r' or c == '\t') { escaped[elen] = ' '; elen += 1; }
else { escaped[elen] = c; elen += 1; }
```
Escape all control bytes in query log JSON
The query WAL writer only sanitizes `"`, `\`, `\n`, `\r`, and `\t`, but leaves other ASCII control bytes untouched. A client can send a query containing escaped control characters (for example `\u0001`), which are written raw into `queries.log`, producing invalid JSON/NDJSON lines and breaking downstream parsing for ranking features. Use a full JSON string escaper for all 0x00..0x1F characters.
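A sketch of a full JSON string escaper along the lines this review suggests, assuming the query is written into a fixed-size buffer. `jsonEscape` and `formatQueryLine` are hypothetical names, and the `{"tool":...,"q":...}` record shape is an assumption; the PR does not show the actual `queries.log` schema. Every byte in 0x00..0x1F is emitted as a JSON escape, so no raw control byte reaches the log.

```zig
const std = @import("std");

/// Escape `s` as the contents of a JSON string into `out`.
/// `"` and `\` get backslash escapes, \n \r \t use their short forms, and
/// every other byte in 0x00..0x1F becomes \u00XX.
fn jsonEscape(s: []const u8, out: []u8) error{NoSpaceLeft}![]const u8 {
    const hex = "0123456789abcdef";
    var n: usize = 0;
    for (s) |c| {
        var tmp: [6]u8 = undefined;
        const piece: []const u8 = switch (c) {
            '"' => "\\\"",
            '\\' => "\\\\",
            '\n' => "\\n",
            '\r' => "\\r",
            '\t' => "\\t",
            0x00...0x08, 0x0B, 0x0C, 0x0E...0x1F => blk: {
                tmp = .{ '\\', 'u', '0', '0', hex[c >> 4], hex[c & 0x0F] };
                break :blk tmp[0..6];
            },
            else => blk: {
                tmp[0] = c;
                break :blk tmp[0..1];
            },
        };
        if (n + piece.len > out.len) return error.NoSpaceLeft;
        @memcpy(out[n..][0..piece.len], piece);
        n += piece.len;
    }
    return out[0..n];
}

/// Hypothetical NDJSON record: {"tool":"<tool>","q":"<query>"} plus '\n'.
/// `tool` is assumed to be a fixed identifier (e.g. "search") and is not escaped.
fn formatQueryLine(out: []u8, tool: []const u8, query: []const u8) ![]u8 {
    var esc_buf: [512]u8 = undefined;
    const esc = try jsonEscape(query, &esc_buf);
    return std.fmt.bufPrint(out, "{{\"tool\":\"{s}\",\"q\":\"{s}\"}}\n", .{ tool, esc });
}
```

Returning `error.NoSpaceLeft` instead of silently dropping bytes (as the reviewed snippet does for `\` near the end of the buffer) keeps a partially escaped query from ever producing a malformed line.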
Summary
Auto-retry with query broadening
When `codedb_find` returns 0 results, it automatically strips delimiters (`_`, `-`, `.`) and retries, so `auth_middleware` → `authmiddleware` still finds the file.

Per-file match truncation
Search output is limited to 5 matches per file, with a `... (more matches truncated)` indicator. The footer shows `(N shown, M truncated)` when results are cut. The header stays as the first line (MCP parser compatible).

Query tracking WAL
Search/find/word queries are logged to `~/.codedb/projects/<hash>/queries.log` as NDJSON with JSON-escaped query strings. This is the foundation for future combo-boost ranking.

Additional skip dirs
Added `.swc`, `.terraform`, `.serverless`, `elm-stuff`, `.stack-work`, `.cabal-sandbox`, `.cargo`, `bower_components`.

Memory check
100-query burst test: 73.6MB → 74.4MB (+0.8MB). No leaks.
`zig build test`: zero leaks.

Test plan