
feat: batch embedding with local filesystem cache #81

Merged
samcm merged 1 commit into master from bold-octopus-392 on Mar 17, 2026
Conversation


@samcm samcm commented Mar 17, 2026

Summary

Embedding was doing one HTTP call per item — about 1800 sequential requests for EIPs alone, which took ~20 minutes and blocked the server from starting. Now all index builders (examples, runbooks, EIPs) use EmbedBatch, split into sub-batches of 50 on the server side and 100 on the proxy side. A new filesystem cache (pkg/cache/filesystem.go) stores vectors locally keyed by {model}:{textHash}, so warm restarts skip the proxy entirely. The proxy /embed endpoint is capped at 500 items and the old EIP vector cache (which didn't track model identity) is gone. Also added actual info-level logging so you can tell what's happening during startup.
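To make the cache shape concrete, here is a minimal sketch of a filesystem-backed vector cache keyed by {model}:{textHash}. The type name, directory layout, JSON encoding, and SHA-256 hashing are illustrative assumptions, not the actual contents of pkg/cache/filesystem.go.

```go
package cache

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"os"
	"path/filepath"
)

// FilesystemCache persists embedding vectors on disk so warm restarts
// can skip the proxy entirely. Names and layout are illustrative only.
type FilesystemCache struct {
	baseDir string
}

func NewFilesystemCache(baseDir string) (*FilesystemCache, error) {
	if err := os.MkdirAll(baseDir, 0o755); err != nil {
		return nil, err
	}
	return &FilesystemCache{baseDir: baseDir}, nil
}

// key builds the {model}:{textHash} identifier, so switching models
// naturally misses old entries instead of serving stale vectors.
func (c *FilesystemCache) key(model, text string) string {
	sum := sha256.Sum256([]byte(text))
	return model + ":" + hex.EncodeToString(sum[:])
}

// path hashes the composite key again so the filename is always
// filesystem-safe regardless of the model string.
func (c *FilesystemCache) path(model, text string) string {
	sum := sha256.Sum256([]byte(c.key(model, text)))
	return filepath.Join(c.baseDir, hex.EncodeToString(sum[:])+".json")
}

func (c *FilesystemCache) Get(model, text string) ([]float32, bool) {
	data, err := os.ReadFile(c.path(model, text))
	if err != nil {
		return nil, false
	}
	var vec []float32
	if err := json.Unmarshal(data, &vec); err != nil {
		return nil, false
	}
	return vec, true
}

func (c *FilesystemCache) Put(model, text string, vec []float32) error {
	data, err := json.Marshal(vec)
	if err != nil {
		return err
	}
	return os.WriteFile(c.path(model, text), data, 0o644)
}
```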


Server startup was blocked for ~20 minutes embedding ~1800 EIP chunks
one HTTP request at a time. This change:

- Batch embeds all index items (examples, runbooks, EIPs) via EmbedBatch
- Splits large batches into sub-batches of 50 (server) / 100 (proxy); see the sketch after this list
- Adds a local filesystem cache (pkg/cache/filesystem.go) so warm
  restarts avoid proxy round-trips entirely (~55MB of vectors)
- Cache keys include model name for automatic invalidation on model change
- Adds 500-item hard cap on proxy /embed endpoint
- Removes the broken EIP-local vector cache (no model awareness)
- Improves info-level logging throughout the embedding pipeline
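As referenced in the sub-batching item above, the following is a hedged sketch of how an index builder could chunk its texts into fixed-size sub-batches before calling EmbedBatch. The package name, Embedder interface, and embedAll helper are hypothetical; only the batch size of 50 on the server side comes from this PR.

```go
// Package indexer is a hypothetical home for this sketch.
package indexer

import (
	"context"
	"fmt"
)

// serverBatchSize matches the PR's server-side sub-batch size; the proxy
// uses 100 and caps incoming /embed requests at 500 items.
const serverBatchSize = 50

// Embedder is a placeholder for whatever client exposes EmbedBatch.
type Embedder interface {
	EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
}

// embedAll embeds every text with one EmbedBatch call per sub-batch,
// instead of one HTTP request per item.
func embedAll(ctx context.Context, e Embedder, texts []string) ([][]float32, error) {
	vectors := make([][]float32, 0, len(texts))
	for start := 0; start < len(texts); start += serverBatchSize {
		end := start + serverBatchSize
		if end > len(texts) {
			end = len(texts)
		}
		batch, err := e.EmbedBatch(ctx, texts[start:end])
		if err != nil {
			return nil, fmt.Errorf("embedding batch %d-%d: %w", start, end, err)
		}
		vectors = append(vectors, batch...)
	}
	return vectors, nil
}
```

The proxy side would apply the same loop with a batch size of 100, and its /embed handler rejects requests above the 500-item cap with a simple length check before doing any work.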
@samcm samcm merged commit 11dc69a into master Mar 17, 2026
4 checks passed
