feat: search UX — auto-retry, per-file truncation, query WAL, skip dirs #194

Merged: justrach merged 1 commit into main from feat/search-ux-improvements on Apr 7, 2026

@justrach (Owner) commented Apr 7, 2026

Summary

Auto-retry with query broadening

When codedb_find returns 0 results, it automatically strips delimiters (_, -, .) from the query and retries, so auth_middleware → authmiddleware still finds the file.
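The retry described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `stripDelimiters`, `findWithRetry`, and the `lookup` stand-in are hypothetical names, and `lookup` pretends the index holds a single key so the example stays self-contained.

```zig
const std = @import("std");

/// Copy `query` into `buf`, dropping '_', '-', and '.'.
/// Returns the filled prefix of `buf`. Assumes buf.len >= query.len.
fn stripDelimiters(query: []const u8, buf: []u8) []const u8 {
    std.debug.assert(buf.len >= query.len);
    var n: usize = 0;
    for (query) |c| {
        switch (c) {
            '_', '-', '.' => {},
            else => {
                buf[n] = c;
                n += 1;
            },
        }
    }
    return buf[0..n];
}

/// Hypothetical stand-in for the real index lookup: it pretends the
/// index contains exactly one key, "authmiddleware".
fn lookup(query: []const u8) usize {
    return if (std.mem.eql(u8, query, "authmiddleware")) 1 else 0;
}

/// Run the query once; on zero results, retry with delimiters stripped.
fn findWithRetry(query: []const u8, buf: []u8) usize {
    const first = lookup(query);
    if (first != 0) return first;
    const broadened = stripDelimiters(query, buf);
    // Nothing was stripped, so a retry would repeat the same lookup.
    if (broadened.len == query.len) return 0;
    return lookup(broadened);
}
```

The retry only runs on the zero-result path, so queries that already match should not pay for the extra lookup.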

Per-file match truncation

Search output is limited to 5 matches per file, followed by a "... (more matches truncated)" indicator. The footer shows (N shown, M truncated) when results are cut. The header stays as the first line (MCP parser compatible).
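A rough sketch of how the shown and truncated totals could be derived from a flat list of hits. It assumes one path per match and a cap of 5; `Totals` and `countShown` are illustrative names, and the counter is `usize` here by choice rather than because the PR does so.

```zig
const std = @import("std");

const max_per_file: usize = 5;

const Totals = struct { shown: usize, truncated: usize };

/// Count how many hits would be printed versus cut when each file is
/// capped at `max_per_file`. `hit_paths` holds one path per match.
fn countShown(alloc: std.mem.Allocator, hit_paths: []const []const u8) !Totals {
    var file_counts = std.StringHashMap(usize).init(alloc);
    defer file_counts.deinit();

    var totals = Totals{ .shown = 0, .truncated = 0 };
    for (hit_paths) |path| {
        const gop = try file_counts.getOrPut(path);
        if (!gop.found_existing) gop.value_ptr.* = 0;
        gop.value_ptr.* += 1;
        if (gop.value_ptr.* <= max_per_file) {
            totals.shown += 1;
        } else {
            totals.truncated += 1;
        }
    }
    return totals;
}
```

With 7 hits in one file and 2 in another, this yields 7 shown and 2 truncated, which is the pair a footer like (N shown, M truncated) would report.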

Query tracking WAL

Search/find/word queries are logged to ~/.codedb/projects/<hash>/queries.log as NDJSON with JSON-escaped query strings. This is the foundation for future combo-boost ranking.

Additional skip dirs

Added .swc, .terraform, .serverless, elm-stuff, .stack-work, .cabal-sandbox, .cargo, bower_components.
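As an illustration only, a skip check over the newly added names might look like the sketch below. The existing skip list is not shown in this PR, so only the added entries appear; `added_skip_dirs` and `isSkippedDir` are hypothetical names, and exact matching on a single path component is an assumption.

```zig
const std = @import("std");

/// Directory names added in this change. The pre-existing skip list
/// is not shown in the PR, so it is omitted here.
const added_skip_dirs = [_][]const u8{
    ".swc",        ".terraform",     ".serverless", "elm-stuff",
    ".stack-work", ".cabal-sandbox", ".cargo",      "bower_components",
};

/// Hypothetical helper: true when `name` (one path component, not a
/// full path) exactly matches one of the added entries.
fn isSkippedDir(name: []const u8) bool {
    for (added_skip_dirs) |skip| {
        if (std.mem.eql(u8, name, skip)) return true;
    }
    return false;
}
```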

Memory check

100-query burst test: 73.6MB → 74.4MB (+0.8MB), with no leaks detected. zig build test also reports zero leaks.

Test plan

  • All tests pass (zero leaks)
  • 2 new tests: auto-retry delimiter stripping, per-file search count
  • Live MCP test: per-file truncation confirmed, header format correct
  • Codex reviewed: fixed unused var, JSON escaping, header order regression
  • Pre-push benchmarks pass

Auto-retry broadening: codedb_find strips delimiters (_-.) and retries
when initial query returns 0 results.

Per-file match truncation: search output limits to 5 matches per file
with truncation indicator. Header shows "(N shown)" count.

Query tracking WAL: search/find/word queries logged to queries.log as
NDJSON with JSON-escaped query strings. Foundation for combo-boost.

Skip dirs: added .swc, .terraform, .serverless, elm-stuff, .stack-work,
.cabal-sandbox, .cargo, bower_components.

Codex review fixes: removed unused var, JSON escaping for special chars,
accurate shown-count header.

Generated with Claude Code

Co-Authored-By: Claude <claude@anthropic.com>

github-actions bot commented Apr 7, 2026

Benchmark Regression Report

Threshold: 10.00%

| Tool | Base (ns) | Head (ns) | Delta | Status |
|---|---:|---:|---:|---|
| codedb_bundle | 392946 | 392515 | -0.11% | OK |
| codedb_changes | 66674 | 64196 | -3.72% | OK |
| codedb_deps | 17012 | 16752 | -1.53% | OK |
| codedb_edit | 11348 | 11768 | +3.70% | OK |
| codedb_find | 83960 | 92358 | +10.00% | FAIL |
| codedb_hot | 90660 | 90915 | +0.28% | OK |
| codedb_outline | 285531 | 272605 | -4.53% | OK |
| codedb_read | 100895 | 104591 | +3.66% | OK |
| codedb_search | 156533 | 154284 | -1.44% | OK |
| codedb_snapshot | 3054485 | 2961251 | -3.05% | OK |
| codedb_status | 258266 | 284772 | +10.26% | FAIL |
| codedb_symbol | 37799 | 36156 | -4.35% | OK |
| codedb_tree | 68478 | 65005 | -5.07% | OK |
| codedb_word | 52185 | 50269 | -3.67% | OK |
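The codedb_find row reads +10.00% yet is marked FAIL, which is consistent with a check made on the unrounded delta. The sketch below assumes the report computes (head - base) / base and fails on a delta strictly greater than the threshold; the benchmark script itself is not shown in this PR, and the function names are hypothetical.

```zig
const std = @import("std");

/// Percent change from base to head, both in nanoseconds.
fn deltaPercent(base_ns: u64, head_ns: u64) f64 {
    const base: f64 = @floatFromInt(base_ns);
    const head: f64 = @floatFromInt(head_ns);
    return (head - base) / base * 100.0;
}

/// Assumption: a tool fails when the unrounded delta is strictly
/// above the threshold.
fn isRegression(base_ns: u64, head_ns: u64, threshold_percent: f64) bool {
    return deltaPercent(base_ns, head_ns) > threshold_percent;
}
```

For codedb_find, (92358 - 83960) / 83960 is about 10.0024%, which rounds to +10.00% for display but still exceeds a 10.00% threshold.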

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6ec0503e9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +765 to +767
var file_counts = std.StringHashMap(u8).init(alloc);
defer file_counts.deinit();
const max_per_file: u8 = 5;

P1: Replace per-file hit counter with usize

In handleSearch, the per-file match count is stored as u8, then incremented for every hit. Queries that return more than 255 matches from one file will overflow at gop.value_ptr.* += 1; in safety-checked builds this can panic the MCP request, and in non-safe builds it can wrap and break truncation behavior. This makes high-hit searches unstable and should use a wider integer type (usize/u16) for counting.
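The overflow boundary can be demonstrated with checked addition. This is a small sketch, not the handler code: `countHits` is a hypothetical helper that uses `std.math.add`, which returns `error.Overflow` instead of panicking or wrapping.

```zig
const std = @import("std");

/// Count `hits` matches in a counter of type T using checked addition,
/// so an overflow surfaces as error.Overflow.
fn countHits(comptime T: type, hits: usize) error{Overflow}!T {
    var count: T = 0;
    var i: usize = 0;
    while (i < hits) : (i += 1) {
        count = try std.math.add(T, count, 1);
    }
    return count;
}
```

A `u8` counter survives 255 hits and fails on the 256th, while a `usize` counter handles the same input without trouble.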

Comment on lines +1613 to +1616
if (c == '"') { escaped[elen] = '\''; elen += 1; }
else if (c == '\\') { if (elen + 1 < escaped.len) { escaped[elen] = '\\'; escaped[elen + 1] = '\\'; elen += 2; } }
else if (c == '\n' or c == '\r' or c == '\t') { escaped[elen] = ' '; elen += 1; }
else { escaped[elen] = c; elen += 1; }

P2: Escape all control bytes in query log JSON

The query WAL writer only sanitizes ", \\, \n, \r, and \t, but leaves other ASCII control bytes untouched. A client can send a query containing escaped control chars (for example \u0001), which gets written raw into queries.log, producing invalid JSON/NDJSON lines and breaking downstream parsing for ranking features. Use a full JSON string escaper for all 0x00..0x1F characters.
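One way to cover the full 0x00..0x1F range is sketched below. It is an assumption-laden illustration rather than the fix that was merged: `jsonEscape` is a hypothetical name, it writes into a caller-supplied buffer, it emits standard JSON escapes instead of substituting characters as the reviewed lines do, and it assumes the input is already valid UTF-8.

```zig
const std = @import("std");

/// Write `input` into `out` as the body of a JSON string (without the
/// surrounding quotes). Returns the written slice, or
/// error.NoSpaceLeft when `out` is too small.
/// Assumes `input` is valid UTF-8; bytes >= 0x20 pass through as-is.
fn jsonEscape(input: []const u8, out: []u8) error{NoSpaceLeft}![]const u8 {
    const hex = "0123456789abcdef";
    var n: usize = 0;
    for (input) |c| {
        // Longest escape is six bytes: \u00XX.
        var seq: [6]u8 = undefined;
        var len: usize = 2;
        seq[0] = '\\';
        if (c == '"' or c == '\\') {
            seq[1] = c;
        } else if (c == '\n') {
            seq[1] = 'n';
        } else if (c == '\r') {
            seq[1] = 'r';
        } else if (c == '\t') {
            seq[1] = 't';
        } else if (c < 0x20) {
            // Any remaining control byte becomes \u00XX.
            seq[1] = 'u';
            seq[2] = '0';
            seq[3] = '0';
            seq[4] = hex[c >> 4];
            seq[5] = hex[c & 0x0F];
            len = 6;
        } else {
            seq[0] = c;
            len = 1;
        }
        if (n + len > out.len) return error.NoSpaceLeft;
        for (seq[0..len]) |b| {
            out[n] = b;
            n += 1;
        }
    }
    return out[0..n];
}
```

Checking the remaining space before each write means an oversized query is rejected as a whole instead of being written as a partial, unparseable line.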

@justrach merged commit d6c1f6a into main on Apr 7, 2026
1 check passed