
perf: avoid cold-start embedding load on BM25-only sharded indexes#652

Merged
laynepenney merged 2 commits into sprint-17 from atlas/recall-435-cold-start on Apr 10, 2026

Conversation

@laynepenney
Collaborator

Summary

  • keep sharded indexes with no chunk embeddings on the fast BM25-only path
  • compute sharded content hashes from shard data instead of empty index.db
  • broaden Hugging Face cache detection for the local MiniLM model
  • add regressions for sharded content-hash correctness and the no-provider cold-start path
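The first and last bullets describe gating the embedding-provider load on whether the index actually has chunk embeddings. A minimal sketch of that gate, assuming hypothetical stand-ins for `TranscriptIndex`, `has_chunk_embeddings`, and `get_embedding_provider` (the names echo the review comments below, but the bodies here are illustrative, not the real synapt implementation):

```python
class FakeProvider:
    """Stand-in for the expensive MiniLM embedding provider."""
    loads = 0  # counts how many times the "model load" actually ran

    def __init__(self):
        FakeProvider.loads += 1


def get_embedding_provider():
    # In the real code this triggers the slow model load; here it just
    # bumps a counter so the gate's effect is observable.
    return FakeProvider()


class TranscriptIndex:
    def __init__(self, has_chunk_embeddings: bool):
        # Only pay the model-load cost when the index actually has chunk
        # embeddings; otherwise stay on the fast BM25-only path.
        self._embed_provider = (
            get_embedding_provider() if has_chunk_embeddings else None
        )

    @property
    def bm25_only(self) -> bool:
        return self._embed_provider is None


bm25 = TranscriptIndex(has_chunk_embeddings=False)   # no provider load
hybrid = TranscriptIndex(has_chunk_embeddings=True)  # provider loads once
```

The point of the sketch is that constructing a BM25-only index never touches the provider at all, which is what removes the cold-start penalty.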

Verification

  • PYTHONPATH=src pytest tests/recall/test_sharded_db.py tests/recall/test_lazy_embeddings.py tests/recall/test_content_hash.py -q
  • PYTHONPATH=src python3 -m synapt.recall.cli benchmark --iterations 1 --queries "gr2 apply" --index /Users/layne/Development/synapt-codex/.synapt/recall/index
  • local benchmark moved from roughly 10.9s cold start / 2.06s query to 3.45s cold start / 149ms query on this 60K-chunk no-chunk-embeddings index

Boundary

OSS-only recall infrastructure and local search performance. No identity/org/premium behavior.
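The "broaden Hugging Face cache detection" bullet above can be sketched as follows. The HF hub caches models under directories named `models--{org}--{name}`, so a shorthand like `all-MiniLM-L6-v2` must also match its fully qualified form `sentence-transformers/all-MiniLM-L6-v2`. The function name `find_cached_model` is hypothetical; this is a sketch of the lookup shape, not the PR's actual code:

```python
import tempfile
from pathlib import Path


def find_cached_model(cache_dir: Path, model_name: str):
    """Return the cache entry for model_name, accepting shorthand names.

    Hypothetical helper: matches either "org/name" exactly, or a bare
    "name" under any org prefix in the cache.
    """
    for entry in cache_dir.iterdir():
        if not entry.name.startswith("models--"):
            continue
        # "models--sentence-transformers--all-MiniLM-L6-v2"
        #   -> "sentence-transformers/all-MiniLM-L6-v2"
        org_name = entry.name[len("models--"):].replace("--", "/", 1)
        if "/" in model_name:
            if org_name == model_name:          # fully qualified match
                return entry
        elif org_name.split("/", 1)[-1] == model_name:  # shorthand match
            return entry
    return None


cache = Path(tempfile.mkdtemp())
(cache / "models--sentence-transformers--all-MiniLM-L6-v2").mkdir()
hit_short = find_cached_model(cache, "all-MiniLM-L6-v2")
hit_full = find_cached_model(cache, "sentence-transformers/all-MiniLM-L6-v2")
```

Both lookups resolve to the same cache entry, which is the behavior the "broaden" change is after.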

@github-actions
Contributor

github-actions bot commented Apr 10, 2026

All contributors have signed the CLA. Thank you!
Posted by the CLA Assistant Lite bot.

@laynepenney
Collaborator Author

Review requested. This is the first Sprint 17 recall#435 slice: sharded content-hash correctness plus the cold-start BM25 fast path for indexes with no chunk embeddings. Local benchmark on the 60K-chunk workspace index moved from roughly 10.9s cold start / 2.06s query to 3.45s cold start / 149ms query.

@laynepenney
Collaborator Author

QA review — LGTM with one observation.

Core fix is correct. Gating provider load on has_chunk_embeddings is the right call — no reason to pay the model load cost for BM25-only indexes. Performance numbers are compelling (10.9s → 3.45s cold start, 2.06s → 149ms query). Boundary declaration present and correct.

Observation (not blocking): After _build_embeddings_background completes and writes to DB, it resets self._embeddings_loaded / _all_embeddings / _emb_matrix — but self._embed_provider is still None (set at init time when has_chunk_embeddings was False). So the same TranscriptIndex object stays on BM25 for its lifetime even after background embeddings land. The embeddings only take effect on the next fresh load. If that's intentional (comment says 'the next explicit build can populate embeddings'), the state resets in the background worker are dead code. Worth a clarifying comment or removing the dead resets to avoid confusion.

Tests cover the BM25-gate case and content hash correctness. No test for provider activation after background build — consistent with the 'next session' design, just worth documenting explicitly.

Approve to merge.

@laynepenney
Collaborator Author

Addressed the non-blocking review note in 7d99df0.

I removed the dead in-memory reset path from _build_embeddings_background and clarified the BM25-only comment so it explicitly says this TranscriptIndex instance stays BM25-only until a fresh load. Focused regression slice still passes:

  • PYTHONPATH=src pytest tests/recall/test_sharded_db.py tests/recall/test_lazy_embeddings.py tests/recall/test_content_hash.py -q
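The fix described above — write embeddings from the background worker without touching this instance's in-memory state — can be sketched as below. This is a hedged stand-in for `_build_embeddings_background` (schema, function name, and the `embed` callable are all illustrative), showing the two properties the review discussed: a fresh thread-local SQLite connection, and no in-memory resets afterward.

```python
import os
import sqlite3
import tempfile
import threading


def build_embeddings_background(db_path, chunks, embed):
    """Write chunk embeddings to db_path on a background thread.

    Illustrative sketch: opens its own connection because sqlite3
    connections must not be shared across threads, and deliberately does
    NOT reset any in-memory index state -- the current instance stays
    BM25-only until a fresh load sees has_chunk_embeddings=True.
    """
    con = sqlite3.connect(db_path)  # fresh handle, never the main one
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS chunk_embeddings "
            "(id INTEGER PRIMARY KEY, vec BLOB)"
        )
        con.executemany(
            "INSERT OR REPLACE INTO chunk_embeddings (id, vec) VALUES (?, ?)",
            [(i, embed(text)) for i, text in enumerate(chunks)],
        )
        con.commit()
    finally:
        con.close()
    # Intentionally no _embeddings_loaded / _emb_matrix resets here.


path = os.path.join(tempfile.mkdtemp(), "index.db")
t = threading.Thread(
    target=build_embeddings_background,
    args=(path, ["alpha", "beta"], lambda s: s.encode()),
)
t.start()
t.join()

con = sqlite3.connect(path)
count = con.execute("SELECT COUNT(*) FROM chunk_embeddings").fetchone()[0]
con.close()
```

After the worker finishes, the embeddings are durable in the DB for the next session's load, but nothing about the running instance has changed.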

@laynepenney
Collaborator Author

I have read the CLA Document and I hereby sign the CLA

@laynepenney
Collaborator Author

Second QA pass after Atlas's follow-up commit (7d99df0): dead state resets removed, BM25-gate lifetime behavior documented in comments. Both concerns from first review resolved. LGTM — ready to merge. Needs one more review to satisfy two-reviewer policy.


@laynepenney left a comment


Opus review of recall#652 (cold-start perf)

Solid fix. The core insight is correct: if the DB has no chunk embeddings yet, skip the expensive model load entirely and stay on the BM25-only path. The 10.9s to 3.45s cold-start improvement confirms the bottleneck was the embedding provider initialization.

Strengths:

  • has_embeddings() check gates the entire provider import; no model load means no cold-start penalty
  • Background build uses a fresh DB handle (_open_background_db) instead of sharing the main connection across threads
  • Content-hash invalidation fix in sharded_db prevents stale embedding cache from blocking rebuilds
  • Good test coverage for the lazy-load path

One note:

  • The _build_embeddings_background method now opens its own provider via get_embedding_provider() instead of using self._embed_provider. This is correct for the BM25-gated case (where _embed_provider is None), but means the background thread pays the provider init cost. That's fine since it's off the hot path.

Premium boundary: OSS (recall core search primitives). Correct.

LGTM. Second review complete.

@laynepenney
Collaborator Author

Apollo review: LGTM

Solid cold-start optimization. Key points verified:

  • Defers embedding model load until the index has chunk embeddings, avoiding ~2-3s cold-start on BM25-only sharded indexes
  • Background build opens its own DB connection (prevents SQLite locking)
  • HF cache lookup handles both shorthand and fully-qualified model names
  • Sharded content_hash() uses heapq.merge for correct global ordering
  • Test confirms get_embedding_provider is never called on the BM25-only path
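The `heapq.merge` bullet above can be sketched as follows. Assuming each shard yields its `(path, chunk_hash)` pairs already sorted, merging the streams before hashing makes the global digest independent of how chunks happen to be split across shards. The function name and tuple layout here are illustrative; the real sharded `content_hash()` may differ in detail:

```python
import hashlib
import heapq


def shard_content_hash(shards):
    """Hash sorted (path, chunk_hash) streams from all shards.

    heapq.merge lazily interleaves the pre-sorted per-shard streams into
    one globally ordered stream, so the digest only depends on content.
    """
    h = hashlib.sha256()
    for path, chunk_hash in heapq.merge(*shards):
        h.update(path.encode())
        h.update(chunk_hash.encode())
    return h.hexdigest()


# The same chunks split differently across shards hash identically.
a = shard_content_hash([[("a.md", "h1"), ("c.md", "h3")], [("b.md", "h2")]])
b = shard_content_hash([[("a.md", "h1")], [("b.md", "h2"), ("c.md", "h3")]])
```

Hashing shard files in arbitrary shard order would tie the digest to the sharding layout; merging first is what makes the content hash stable across rebuilds.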

Boundary: OSS recall infrastructure.

@laynepenney laynepenney merged commit b062dfa into sprint-17 Apr 10, 2026
9 of 10 checks passed
@laynepenney laynepenney deleted the atlas/recall-435-cold-start branch April 10, 2026 23:19
@github-actions github-actions bot locked and limited conversation to collaborators Apr 10, 2026