Pre-size trigram posting list allocations to reduce GC pressure during indexing #1017

## Context

[seek](https://github.com/dualeai/seek) wraps zoekt as a local CLI for AI coding agents. Every search is a blocking tool call — the agent waits for results. On the cold-index path, shard building is the dominant cost, and over half of that cost is Go runtime memory management rather than useful work.

## Problem

`postingsBuilder.newSearchableString` (`index/shard_builder.go:90`) iterates rune-by-rune through each file, extracting trigrams and appending to posting lists:

```go
s.postings[ng] = append(s.postings[ng], buf[:m]...)
```

Each trigram's posting list (`[]byte`) grows via repeated `append`. Profiled on kubernetes (29k files, 23 shards, Apple M1 Max):

| Flat CPU | Function | What |
|---|---|---|
| 9.7s (38%) | `runtime.memclrNoHeapPointers` | Clearing newly allocated pages |
| 1.5s (6%) | `runtime.memmove` | Copying on slice growth |
| 1.8s (7%) | `runtime.madvise` | Kernel memory management |
| 0.8s (3%) | `runtime.mapassign_fast64` | Map insertions |

Together these four functions account for **54% of CPU** (38% + 6% + 7% + 3%), all of it runtime memory management. `newSearchableString` is 11.1s cumulative (44%) but only 0.54s flat (2%), so nearly all of its time is spent in the runtime calls it triggers.
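To make the growth pattern concrete, here is a minimal sketch that counts how often a `[]byte` is reallocated when it is filled one byte at a time, with and without an initial capacity. `countGrowths` is a hypothetical helper written for this illustration, not zoekt code, and the exact number of reallocations depends on the Go version's growth policy.

```go
package main

import "fmt"

// countGrowths appends n bytes one at a time and reports how many times the
// backing array was reallocated and how many bytes had to be copied as a result.
func countGrowths(n, initialCap int) (growths, copied int) {
	list := make([]byte, 0, initialCap)
	for i := 0; i < n; i++ {
		before := cap(list)
		oldLen := len(list)
		list = append(list, byte(i))
		if cap(list) != before {
			// append moved the data to a new, larger backing array.
			growths++
			copied += oldLen
		}
	}
	return growths, copied
}

func main() {
	g, c := countGrowths(1<<20, 0)
	fmt.Printf("unsized:   %d reallocations, %d bytes copied\n", g, c)

	g, c = countGrowths(1<<20, 1<<20)
	fmt.Printf("pre-sized: %d reallocations, %d bytes copied\n", g, c)
}
```

Each reallocation in the unsized case is a fresh allocation plus a copy, which is the pattern repeated for every trigram's posting list during shard building.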

Related PRs: #430 (skip trigram check for small files, 10% speedup), #522 (B+-tree for posting lists on search side), #680 (faster newLinesIndices), #838/#839 (reduce prepareNormalBuild allocations; #839 body states "we need a true rewrite"). None address `newSearchableString`.

## Possible approaches

**1. Pre-size maps** (lowest effort): `postings` and `lastOffsets` are initialized as empty maps (`map[ngram][]byte{}` at line 82). Pre-sizing them, e.g. `make(map[ngram][]byte, 100_000)`, avoids repeated map growth and rehashing as trigrams are inserted. The `DocChecker` already does this at line 596.
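A minimal sketch of approach 1, assuming a simplified builder with only the two maps. The `ngram` type, the `newPostingsBuilder` constructor, and the `expectedTrigrams` constant are stand-ins for illustration; the real struct has more fields and the right hint would need to be measured.

```go
package main

import "fmt"

// ngram is assumed here to be a packed 64-bit trigram key.
type ngram uint64

// expectedTrigrams is a hypothetical capacity hint, not a measured value.
const expectedTrigrams = 100_000

// postingsBuilder is a stripped-down stand-in for the real builder.
type postingsBuilder struct {
	postings    map[ngram][]byte
	lastOffsets map[ngram]uint32
}

// newPostingsBuilder allocates both maps with a capacity hint so the runtime
// does not have to grow and rehash them repeatedly while trigrams are added.
func newPostingsBuilder(hint int) *postingsBuilder {
	return &postingsBuilder{
		postings:    make(map[ngram][]byte, hint),
		lastOffsets: make(map[ngram]uint32, hint),
	}
}

func main() {
	b := newPostingsBuilder(expectedTrigrams)
	b.postings[ngram(42)] = append(b.postings[ngram(42)], 1, 2, 3)
	b.lastOffsets[ngram(42)] = 3
	fmt.Println(len(b.postings), len(b.postings[ngram(42)]))
}
```

This only addresses the `mapassign` share of the profile (3%); the posting list slices themselves still grow via `append`.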

**2. Single backing buffer**: Instead of an independent `[]byte` per trigram, allocate one large `[]byte` and sub-slice it. This replaces many small per-trigram allocations, and much of the associated memclr/memmove, with a few large ones. Go's experimental `arena` package (`GOEXPERIMENT=arenas`) is on hold and not something to depend on, but a manual arena (pre-allocated slice + offset tracking) works.
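One way approach 2 could look, as a rough sketch rather than a proposed implementation. `byteArena`, `alloc`, and `appendTo` are hypothetical names. Lists that outgrow their region are moved to a larger region in the arena, so copying on growth is not eliminated and the old region is simply abandoned; what changes is that the Go allocator sees a few large chunks instead of many small slices.

```go
package main

import "fmt"

// byteArena hands out sub-slices of large pre-allocated chunks.
type byteArena struct {
	chunk     []byte // current chunk; len(chunk) is the used portion
	chunkSize int
}

func newByteArena(chunkSize int) *byteArena {
	return &byteArena{chunk: make([]byte, 0, chunkSize), chunkSize: chunkSize}
}

// alloc returns an empty slice with capacity n carved out of the current
// chunk, starting a new chunk when the current one is too full.
func (a *byteArena) alloc(n int) []byte {
	if cap(a.chunk)-len(a.chunk) < n {
		size := a.chunkSize
		if n > size {
			size = n
		}
		a.chunk = make([]byte, 0, size)
	}
	start := len(a.chunk)
	a.chunk = a.chunk[:start+n]
	// The full slice expression caps the result so that append on it can
	// never write into a neighbouring region.
	return a.chunk[start:start:start+n]
}

// appendTo appends data to list, moving list to a larger arena region
// (doubling its capacity) when the current region is full.
func (a *byteArena) appendTo(list, data []byte) []byte {
	need := len(list) + len(data)
	if need > cap(list) {
		newCap := 2 * cap(list)
		if newCap < need {
			newCap = need
		}
		if newCap < 16 {
			newCap = 16
		}
		list = append(a.alloc(newCap), list...)
	}
	return append(list, data...)
}

func main() {
	arena := newByteArena(1 << 20)
	postings := map[uint64][]byte{}
	for i := 0; i < 100; i++ {
		postings[1] = arena.appendTo(postings[1], []byte{byte(i)})
		postings[2] = arena.appendTo(postings[2], []byte{byte(i), byte(i)})
	}
	fmt.Println(len(postings[1]), len(postings[2]))
}
```

The trade-off is wasted space from abandoned regions; whether that is acceptable would need to be checked against shard memory budgets.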

**3. Pre-size posting list slices**: Estimate per-trigram sizes from the corpus size and pre-allocate each list, so that fewer growth events (reallocate + copy) occur.
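A sketch of what approach 3 might look like. The heuristic in `estimateInitialCap` (average bytes per distinct trigram, clamped to a range) and the helper `appendPosting` are assumptions made for illustration. Trigram frequencies are heavily skewed, so a flat average over-allocates for rare trigrams and under-allocates for common ones; a real estimate would need to be tuned against measurements.

```go
package main

import "fmt"

// estimateInitialCap is a hypothetical heuristic: spread the corpus size
// evenly over the expected number of distinct trigrams, then clamp the result.
func estimateInitialCap(corpusBytes, expectedTrigrams int) int {
	const minCap, maxCap = 16, 4096
	if expectedTrigrams <= 0 {
		return minCap
	}
	c := corpusBytes / expectedTrigrams
	if c < minCap {
		return minCap
	}
	if c > maxCap {
		return maxCap
	}
	return c
}

// appendPosting appends delta to the list for ng, allocating the list with
// initialCap the first time the trigram is seen.
func appendPosting(postings map[uint64][]byte, ng uint64, delta []byte, initialCap int) {
	list, ok := postings[ng]
	if !ok {
		list = make([]byte, 0, initialCap)
	}
	postings[ng] = append(list, delta...)
}

func main() {
	// Assumed inputs: a 64 MiB corpus and 100k distinct trigrams.
	initialCap := estimateInitialCap(64<<20, 100_000)
	postings := make(map[uint64][]byte)
	appendPosting(postings, 1, []byte{0x05}, initialCap)
	fmt.Println(initialCap, cap(postings[1]))
}
```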

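**4. GC tuning** (immediate, no changes to the indexing code): `GOGC=off` + `GOMEMLIMIT=6GiB` reduces GC overhead by 80-95% for batch workloads. `GOMEMLIMIT` has been available since Go 1.19, and the Go GC guide describes combining it with `GOGC=off`. The same settings could be applied in the indexer entry point. The limit has to sit comfortably above the live heap, otherwise the collector runs continuously as the heap approaches it.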

## Impact

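Reducing runtime memory management from 54% of shard-building CPU to ~20% of the current total leaves 0.46 + 0.20 = 0.66 of the original time, i.e. a ~1.5x speedup in shard building. On kubernetes, that takes the cold index from ~8s to ~5-6s.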
