perf: worker-local initial indexing with deterministic merge #218

@justrach

Goal

Move more of the initial indexing work into worker-local buffers, then merge the results deterministically on the main thread.

Why

The current startup path still does too much shared mutation during ingest. That increases lock contention and allocator churn, and it makes perf work harder to reason about.

Scope

  • Parse/index file-local state in worker-local arenas or buffers.
  • Avoid shared map mutation inside the hot worker loop wherever possible.
  • Merge results in a deterministic order so output is stable regardless of worker count.
  • Preserve current indexing correctness and search behavior.
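
The scope above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual implementation: `LocalIndex`, `index_file`, and `build_index` are hypothetical names, and a whitespace word index stands in for the real parse/index work. Each worker fills only its own buffer, and the calling thread merges the buffers in sorted path order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class LocalIndex:
    """Worker-local result for one file; never shared while it is being built."""
    path: str
    words: dict[str, list[int]] = field(default_factory=dict)  # word -> line numbers


def index_file(path: str, text: str) -> LocalIndex:
    # Hot loop: writes only to the local buffer, so no lock is needed here.
    local = LocalIndex(path)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in line.split():
            local.words.setdefault(word, []).append(lineno)
    return local


def build_index(files: dict[str, str], workers: int) -> dict[str, list[tuple[str, int]]]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        locals_ = list(pool.map(lambda item: index_file(*item), files.items()))
    # Deterministic merge on the calling thread: iterate in sorted path order
    # so the result does not depend on worker count or completion order.
    merged: dict[str, list[tuple[str, int]]] = {}
    for local in sorted(locals_, key=lambda entry: entry.path):
        for word in sorted(local.words):
            merged.setdefault(word, []).extend((local.path, n) for n in local.words[word])
    return merged
```

With this shape, the shared map is written by one thread only, and `build_index(files, 1)` and `build_index(files, 4)` return the same mapping with the same key order.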

Acceptance Criteria

  • Initial indexing with N=1 and N=4 workers produces identical results for:
    • tree
    • outline
    • find
    • word
    • search
  • No change in visible ordering between runs, and results stay stable from run to run.
  • No correctness regressions on incremental update paths.
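
One way to check the N=1 vs N=4 criterion is a small parity helper like the one below. It is a sketch under assumptions: `run_query(query, workers)` is a hypothetical callable that builds a fresh index with the given worker count and returns JSON-serializable output for that command. Results are compared as serialized strings so that ordering differences also count as mismatches.

```python
import json
from typing import Callable, Sequence

# Command names taken from the acceptance criteria.
QUERIES = ["tree", "outline", "find", "word", "search"]


def parity_diff(
    run_query: Callable[[str, int], object],
    worker_counts: Sequence[int] = (1, 4),
    queries: Sequence[str] = QUERIES,
) -> list[str]:
    """Return a label for every query whose output differs from the baseline."""
    baseline, *others = worker_counts
    mismatches = []
    for query in queries:
        # No sort_keys: key and list order are part of what must match.
        expected = json.dumps(run_query(query, baseline))
        for workers in others:
            if json.dumps(run_query(query, workers)) != expected:
                mismatches.append(f"{query} (workers={workers})")
    return mismatches
```

A test would then assert that `parity_diff(run_query)` is an empty list for the fixture corpus.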

Regression Checks

Add explicit benchmark and parity checks that must pass before merge:

  • Sequential vs parallel parity test over a multi-language fixture corpus.
  • Cold tree benchmark on a large repo with a fresh HOME.
  • Warm codedb_search MCP benchmark on a large repo.
  • Memory check on the cold index path for a large repo.

Merge Gate

Do not merge if any of the following regresses beyond its threshold versus main:

  • Cold full-index wall time: > 5%
  • Warm search latency: > 5%
  • Peak RSS on large repo: > 10%
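
The gate could be automated along these lines. This is only a sketch: the metric keys are hypothetical names, and the measurements are assumed to come from whatever benchmark harness produces one number per metric for main and for the branch.

```python
# Maximum allowed regression versus main, as a fraction (values from the list above).
# The metric names are hypothetical.
THRESHOLDS = {
    "cold_index_wall_s": 0.05,
    "warm_search_latency_ms": 0.05,
    "peak_rss_mb": 0.10,
}


def gate_failures(main: dict[str, float], branch: dict[str, float]) -> list[str]:
    """Return one message per metric that regressed beyond its threshold."""
    failures = []
    for metric, limit in THRESHOLDS.items():
        # Relative change; positive means the branch is slower or larger.
        change = (branch[metric] - main[metric]) / main[metric]
        if change > limit:
            failures.append(f"{metric}: +{change:.1%} exceeds {limit:.0%}")
    return failures
```

For example, with made-up numbers, a branch that moves warm search latency from 20.0 ms to 21.5 ms (+7.5%) would fail the gate, while a move from 10.0 s to 10.4 s on cold indexing (+4%) would pass.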

Suggested Benchmark Corpus

  • openclaw
  • one medium repo
  • one small repo for variance sanity checks
