fix(explore): keep Tier 0 code-first diversity for popular identifiers (#449) #457

Merged: justrach merged 2 commits into main from fix/449-tier0-code-first on May 11, 2026

@justrach (Owner)

Summary

Fixes #449. Tier 0 of `searchContent` had a `word_hits.len <= max_results * 2` gate that skipped the entire code-first/doc-second diversity pass whenever a posting list grew large. For popular identifiers like `fooBar`, markdown files with many incidental mentions could then fill `max_results` before any code file was scanned.

Approach

Replace the total-hit-count gate with a code-language-only gate. The new check counts only hits in code-language files; when that count stays within `max_results * 2`, Tier 0's two-pass scan (code, then doc) runs even if the total hit count is large. When a large posting list is all code (the #427 scenario), Tier 1's existing hit-count sort takes over as before.
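
A minimal sketch of the difference between the two gates, for illustration only. It works on plain path slices rather than the real posting list, and `isDocPath`, `oldGate`, and `newGate` are hypothetical stand-ins (the actual change uses `isDocLanguage(detectLanguage(hp))` in `src/explore.zig`):

```zig
const std = @import("std");

// Hypothetical stand-in for the repository's language detection:
// treats Markdown and plain-text paths as docs, everything else as code.
fn isDocPath(path: []const u8) bool {
    return std.mem.endsWith(u8, path, ".md") or std.mem.endsWith(u8, path, ".txt");
}

// Old gate: the total posting-list length decides whether Tier 0 runs.
fn oldGate(hit_paths: []const []const u8, max_results: usize) bool {
    return hit_paths.len <= max_results * 2;
}

// New gate (sketch): only hits in code files count toward the bound.
fn newGate(hit_paths: []const []const u8, max_results: usize) bool {
    var code_hit_count: usize = 0;
    for (hit_paths) |path| {
        if (path.len > 0 and !isDocPath(path)) code_hit_count += 1;
    }
    return code_hit_count <= max_results * 2;
}
```

With four markdown hits, one code hit, and `max_results = 2`, the old gate skips Tier 0 (5 > 4) while the new gate keeps it (1 <= 4).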

Test plan

  • zig build test passes (519/519 including the new issue-449 test).
  • issue-427 regression scenario still passes (verified manually).

Commits

  1. test: failing test for #449 (Tier 0 gate bypass)
  2. fix(explore): keep Tier 0 code/doc diversity for popular identifiers (#449)

🤖 Generated with Claude Code

justrach and others added 2 commits May 12, 2026 00:41
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…449)

Gate Tier 0 on code-language hit count instead of total posting-list
length so queries where doc files dominate the word index still get the
code-first pass, while all-code popular queries (issue-427) still fall
through to Tier 1's hit-count sort.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d1ea27adf4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/explore.zig
Comment on lines +1577 to +1581
    var code_hit_count: usize = 0;
    for (word_hits) |hit| {
        const hp = self.word_index.hitPath(hit);
        if (hp.len > 0 and !isDocLanguage(detectLanguage(hp))) code_hit_count += 1;
    }

[P2] Short-circuit code-hit counting once over the Tier 0 gate

For popular identifiers that have more than max_results * 2 code hits, this loop still walks the entire posting list just to decide Tier 0 should be skipped, and Tier 1 immediately walks the same word_hits again to build hits_per_file. The old total-hit gate was an O(1) length check in this path, so large all-code or mostly-code queries now pay an extra full posting-list traversal before taking the same Tier 1 path; break as soon as code_hit_count exceeds the threshold to avoid regressing common searchContent calls.

Useful? React with 👍 / 👎.
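
One way the suggested early exit could look, as a sketch under stated assumptions rather than the change that was merged. It operates on a slice of paths instead of the real `word_hits` posting list, and `isDocPath` and `codeHitsWithin` are hypothetical names introduced for illustration:

```zig
const std = @import("std");

// Hypothetical stand-in for isDocLanguage(detectLanguage(path)).
fn isDocPath(path: []const u8) bool {
    return std.mem.endsWith(u8, path, ".md") or std.mem.endsWith(u8, path, ".txt");
}

// Returns true when the number of code hits is at most `limit`.
// Stops scanning as soon as the count exceeds `limit`, so an all-code
// posting list costs at most `limit + 1` code-hit checks here.
fn codeHitsWithin(hit_paths: []const []const u8, limit: usize) bool {
    var code_hit_count: usize = 0;
    for (hit_paths) |path| {
        if (path.len == 0 or isDocPath(path)) continue;
        code_hit_count += 1;
        if (code_hit_count > limit) return false;
    }
    return true;
}
```

A caller would pass `max_results * 2` as `limit`. Doc-heavy lists are still scanned in full, since doc hits never advance the counter.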

@github-actions

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.
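
The two-threshold rule can be sketched as follows. This is an illustration, not the CI script: the `Status` enum, the `classify` function, the `regression` label, and the exact comparison operators are assumptions. The percentage threshold is passed in basis points (1000 = 10.00%) so the check stays in integer arithmetic.

```zig
const std = @import("std");

// Hypothetical status labels; the report itself only shows OK and NOISE.
const Status = enum { ok, noise, regression };

// Classifies one benchmark row. A row fails only when both the percentage
// threshold and the absolute threshold are exceeded (assumed semantics).
fn classify(base_ns: u64, head_ns: u64, pct_threshold_bp: u64, abs_threshold_ns: u64) Status {
    if (head_ns <= base_ns) return Status.ok; // faster or unchanged
    const abs_delta = head_ns - base_ns;
    // abs_delta / base_ns > pct_threshold_bp / 10_000, without floating point.
    const pct_exceeded = abs_delta * 10_000 > base_ns * pct_threshold_bp;
    if (!pct_exceeded) return Status.ok;
    return if (abs_delta >= abs_threshold_ns) Status.regression else Status.noise;
}
```

For `codedb_search`, the delta of 41,756 ns is over 10% of the 198,851 ns base but under the 50,000 ns absolute threshold, so the row is classified as NOISE.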

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---|---|---|---|---|
| codedb_bundle | 559550 | 544898 | -2.62% | -14652 | OK |
| codedb_changes | 54771 | 53701 | -1.95% | -1070 | OK |
| codedb_deps | 8980 | 10636 | +18.44% | +1656 | NOISE |
| codedb_edit | 6213 | 5961 | -4.06% | -252 | OK |
| codedb_find | 60989 | 59310 | -2.75% | -1679 | OK |
| codedb_hot | 100374 | 100264 | -0.11% | -110 | OK |
| codedb_outline | 287492 | 305259 | +6.18% | +17767 | OK |
| codedb_read | 94288 | 99367 | +5.39% | +5079 | OK |
| codedb_search | 198851 | 240607 | +21.00% | +41756 | NOISE |
| codedb_snapshot | 295711 | 293050 | -0.90% | -2661 | OK |
| codedb_status | 213582 | 209390 | -1.96% | -4192 | OK |
| codedb_symbol | 61131 | 63391 | +3.70% | +2260 | OK |
| codedb_tree | 65134 | 67785 | +4.07% | +2651 | OK |
| codedb_word | 69601 | 70622 | +1.47% | +1021 | OK |

@justrach justrach merged commit 26e29c5 into main May 11, 2026
1 check passed
@justrach justrach deleted the fix/449-tier0-code-first branch May 11, 2026 17:38
justrach added a commit that referenced this pull request May 11, 2026

Development

Successfully merging this pull request may close these issues.

explore: popular identifiers bypass Tier 0 code/doc diversity
