index: add BM25 ranking for content search #400

@justrach

Description

Problem

Explorer.searchContent (src/explore.zig:1504) returns matches in scan order through a 5-tier pipeline (word, prefix, trigram, sparse, full). The only post-hoc ordering is a per-line occurrence-count tiebreaker at src/explore.zig:1674-1688. This means:

  • A file mentioning every query term many times ranks identically to a file with a single tangential mention.
  • Multi-term queries ("parse Token", "render component state") dump every match without surfacing the most relevant file first.
  • The filename fuzzy-finder (fuzzyScore at src/explore.zig:4881) is the only path with a real relevance score; content search has none.

BM25 over the existing inverted indices would rank documents by a length-normalized tf × idf sum, so the densest, term-rich files surface above noise.
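For reference, the Okapi BM25 score with the k1=1.2, b=0.75 defaults proposed below can be sketched as follows. Python is used for brevity; the dict-based tf/df inputs are illustrative stand-ins, not the Zig structures:

```python
import math

def bm25_score(query_terms, doc_id, tf, df, N, doc_len, avgdl, k1=1.2, b=0.75):
    """Okapi BM25: sum over query terms of idf(t) * saturated, length-normalized tf.

    tf maps (term, doc_id) -> occurrence count; df maps term -> distinct doc count;
    N is the corpus size; avgdl the average document length in tokens.
    """
    score = 0.0
    for t in query_terms:
        if df.get(t, 0) == 0:
            continue  # term absent from the corpus contributes nothing
        idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
        f = tf.get((t, doc_id), 0)
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * doc_len / avgdl))
    return score
```

With these defaults, a file hitting every query term many times scores strictly above a file with a single tangential mention, which is exactly the ordering the failing test asserts.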

Failing test

Branch: issue-393-failing-test (HEAD: 214b3c3). See test "issue-393: BM25 ranking surfaces high-density file before single-mention file" in src/tests.zig. Fails on main with:

src/tests.zig:9376:33: error: no field or member function named 'searchContentRanked' in 'explore.Explorer'

The test indexes a dense file (many parse/Token hits), a sparse file (one parse mention), and three noise files, calls searchContentRanked("parse Token", ...), and asserts that the top-ranked result is src/dense.zig with score > 0.

Proposed API

Additive, opt-in. Existing searchContent keeps its scan-order semantics so callers that just want hit listings are unaffected.

pub const RankedResult = SearchResult;  // already has score: f32

pub fn searchContentRanked(
    self: *Explorer,
    query: []const u8,
    allocator: std.mem.Allocator,
    max_results: usize,
) ![]const SearchResult;

Behavior:

  • Tokenizes the query the same way WordIndex tokenizes documents (lowercase + identifier split). Single-token queries fall through to the existing pipeline (with a single term, idf is a constant factor, so BM25 adds little beyond the existing occurrence-count tiebreaker).
  • Multi-term queries: collect candidate documents from WordIndex.search for each term plus TrigramIndex.candidates for the joined literal. Score each candidate doc with BM25 (k1=1.2, b=0.75). Emit one SearchResult per top-N document — line is the highest-tf line for any query term in that doc.
  • MCP wiring (out of scope for this issue but sketched): add rank: "bm25" to codedb_search args; default stays unranked.
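A minimal sketch of the multi-term path described above, again in Python for brevity. Here `postings` stands in for the WordIndex posting lists (one (doc_id, line) entry per occurrence, so repeats encode per-doc tf); trigram candidate merging and the real tokenizer are elided:

```python
import math
import re
from collections import Counter

def search_ranked(query, postings, doc_lengths, max_results=10, k1=1.2, b=0.75):
    """Rank candidate docs by BM25; emit (doc_id, best_line, score) per doc."""
    # Stand-in tokenizer: lowercase + split on non-identifier characters.
    terms = [t for t in re.split(r"[^a-z0-9_]+", query.lower()) if t]
    if len(terms) < 2:
        return []  # single-token queries would fall through to the existing pipeline
    N = len(doc_lengths)
    avgdl = sum(doc_lengths.values()) / N
    candidates = {doc for t in terms for doc, _line in postings.get(t, [])}
    results = []
    for doc in candidates:
        score, best = 0.0, None
        for t in terms:
            hits = [line for d, line in postings.get(t, []) if d == doc]
            if not hits:
                continue
            df = len({d for d, _line in postings[t]})
            idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
            f = len(hits)  # repeated entries give per-doc tf for free
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * doc_lengths[doc] / avgdl))
            line, n = Counter(hits).most_common(1)[0]
            if best is None or n > best[1]:
                best = (line, n)  # highest-tf line for any query term
        results.append((doc, best[0], score))
    results.sort(key=lambda r: -r[2])
    return results[:max_results]
```

The real implementation would union in TrigramIndex.candidates for the joined literal before scoring; the scoring loop itself is unchanged by that.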

Implementation sketch

State (in WordIndex)

The WordIndex.index map already stores (doc_id, line_num) per term — repeated entries for the same (term, doc_id) give us per-doc TF for free. Add three lazy structures:

// All optional — built lazily on the first searchContentRanked call,
// invalidated by indexFile / removeFile.
doc_lengths: ?std.AutoHashMap(u32, u32) = null, // doc_id -> token count
total_tokens: u64 = 0,                          // sum of doc_lengths values
df_cache: ?std.StringHashMap(u32) = null,       // term -> # distinct docs (computed on demand)
ranking_dirty: bool = false,

  • N (doc count): already available via WordIndex.fileCount.
  • avgdl: total_tokens / N.
  • df(term): count distinct doc_ids in index.get(term).items — O(hits) once, cached.
  • tf(term, doc): count entries in posting list with matching doc_id — O(hits/N) on average via a single linear pass; for low-df hot terms this is bounded by the existing 512 MAX_POSTINGS pressure.
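The df/tf derivations above amount to counting passes over the posting list. A Python sketch, with a dict-of-lists standing in for the WordIndex map:

```python
def doc_frequency(postings, term):
    """df: number of distinct doc_ids in the term's posting list.

    One O(hits) pass; the result is what df_cache would memoize per query.
    """
    return len({doc for doc, _line in postings.get(term, [])})

def term_frequency(postings, term, doc_id):
    """tf: repeated (doc_id, line) entries in the posting list encode occurrences,
    so tf is a single linear pass with no extra stored state."""
    return sum(1 for doc, _line in postings.get(term, []) if doc == doc_id)
```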

Persistence

  • doc_lengths adds 4 + 8 * N bytes — for N=14k files that's ~110 KB. Persist alongside the WordIndex on-disk format (DiskHeader bumps to v3, adds an optional length-table block; readers without v3 ignore it and rebuild lazily).
  • df_cache is not persisted — it's a per-query memoization; cold start does one O(hits) pass per term.
  • No per-(term, doc) tf table on disk: tf is recomputed from the existing index posting list, which already encodes occurrence count by repeated entries.
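One hypothetical encoding consistent with the 4 + 8 × N budget above: a u32 entry count followed by (u32 doc_id, u32 token_count) pairs. The actual v3 DiskHeader block layout is up to the implementation; this Python sketch only illustrates the size math:

```python
import struct

def encode_doc_lengths(doc_lengths):
    """Pack doc_lengths as: u32 count, then (u32 doc_id, u32 len) pairs.

    Little-endian, 4 + 8*N bytes total — matching the stated budget.
    """
    out = struct.pack("<I", len(doc_lengths))
    for doc_id, n in sorted(doc_lengths.items()):
        out += struct.pack("<II", doc_id, n)
    return out

def decode_doc_lengths(buf):
    """Inverse of encode_doc_lengths; a pre-v3 reader would skip this block."""
    (count,) = struct.unpack_from("<I", buf, 0)
    return {doc: n for doc, n in struct.iter_unpack("<II", buf[4:4 + 8 * count])}
```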

Perf and memory budget

  • Extra heap state: ~110 KB for doc_lengths on a 14k-file repo, ~8 KB on a 1k-file repo — negligible vs. the 11M codedb-linux-x86_64 binary.
  • df_cache: bounded by distinct query terms across a session — typically <1k entries, ~32 KB.
  • Trigram MAX_POSTINGS=512 stays untouched. We do not need full per-doc tf for trigrams; word-level TF on the WordIndex side is sufficient and already exists.
  • Per-query cost: O(Σ_t hits(t)) — same as the existing word-index path. Sorting by score adds O(C log C) where C ≤ max_results × #terms.
  • Cold-start rebuild of doc_lengths: one pass over file_words map. <50ms on 14k files.

Out of scope

  • Reranking the trigram-only Tier 1 (no doc-level term info there). Trigrams stay as a candidate filter; ranking happens after candidates are resolved through word-level lookups.
  • Cross-language stemming / synonyms.
  • Phrase / proximity scoring (BM25F, BM25 with positions) — possible follow-up using the existing loc_mask byte in PostingMask, but not in this issue.
  • Persisting df_cache to disk.
  • Replacing the default searchContent ranking — keep the additive searchContentRanked entry point.

Fix

Implement searchContentRanked per sketch above, wire WordIndex.docLength / WordIndex.docFrequency / WordIndex.termFrequency helpers, and turn the failing test green.

Metadata

Labels: enhancement (New feature or request)