fix: persist reranker ONNX cache to ~/.cache/fastembed#19

Closed
joyson-fernandes wants to merge 1 commit into lyonzin:master from joyson-fernandes:fix/reranker-offline-cache

@joyson-fernandes

Summary

  • Pass cache_dir=~/.cache/fastembed to TextCrossEncoder so the reranker ONNX model is cached in a durable location instead of the default system temp directory.
  • Matches the existing fix on the embedder (TextEmbedding, line 145).
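A minimal sketch of the change described above, not the project's actual code. The helper names (`durable_cache_dir`, `reranker_kwargs`, `load_reranker`) are hypothetical; only the `TextCrossEncoder` class, its `cache_dir` keyword, the model name, and the `~/.cache/fastembed` location come from this PR.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Model name taken from the error message quoted in this PR.
RERANKER_MODEL = "Xenova/ms-marco-MiniLM-L-6-v2"


def durable_cache_dir() -> Path:
    """Hypothetical helper: the durable location this PR proposes."""
    return Path.home() / ".cache" / "fastembed"


def reranker_kwargs(cache_dir: Optional[str] = None) -> Dict[str, Any]:
    """Build constructor arguments so the cache location is always explicit."""
    cache = Path(cache_dir).expanduser() if cache_dir else durable_cache_dir()
    return {"model_name": RERANKER_MODEL, "cache_dir": str(cache)}


def load_reranker(cache_dir: Optional[str] = None):
    """Instantiate the reranker with an explicit cache_dir.

    fastembed is imported lazily so the helpers above can be used
    (and tested) without the package or the model weights present.
    """
    from fastembed.rerank.cross_encoder import TextCrossEncoder

    kwargs = reranker_kwargs(cache_dir)
    Path(kwargs["cache_dir"]).mkdir(parents=True, exist_ok=True)
    return TextCrossEncoder(**kwargs)
```

Building the path with `Path.home()` rather than passing the literal string `~/.cache/fastembed` avoids depending on whether the library expands `~` itself.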

Why

On macOS the default FastEmbed cache dir resolves to $TMPDIR, which the OS periodically purges. Combined with HF_HUB_OFFLINE=1 (recommended for startup stability), any purge between runs causes:

Could not load model Xenova/ms-marco-MiniLM-L-6-v2 from any source.

When the reranker was first added it missed the cache_dir kwarg that TextEmbedding already uses. This PR makes both consistent so both models survive tmp cleaning.
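To illustrate why the default is fragile, here is a rough sketch of how the cache location might be resolved. It assumes FastEmbed falls back to a `fastembed_cache` folder under `tempfile.gettempdir()` and honours a `FASTEMBED_CACHE_PATH` environment variable; both details should be verified against the installed FastEmbed version. `resolve_cache_dir` is a hypothetical function, not part of the library.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Approximate the lookup order: explicit argument, env var, temp dir."""
    env = os.environ if env is None else env
    if explicit:
        # An explicit cache_dir always wins; expand "~" ourselves.
        return Path(explicit).expanduser()
    if env.get("FASTEMBED_CACHE_PATH"):
        # Assumed environment override (check your FastEmbed version).
        return Path(env["FASTEMBED_CACHE_PATH"]).expanduser()
    # Assumed default: on macOS gettempdir() follows $TMPDIR,
    # which the OS may purge between runs.
    return Path(tempfile.gettempdir()) / "fastembed_cache"
```

Under these assumptions, a caller that passes neither an explicit directory nor the environment variable ends up in the temp directory, which is the situation this PR describes for the reranker.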

Test plan

  • rm -rf ~/.cache/fastembed/models--Xenova--ms-marco-MiniLM-L-6-v2
  • Start MCP server with HF_HUB_OFFLINE=1 → reranker now loads (previously failed)
  • search_knowledge returns reranked results as before
  • Embedder behaviour unchanged
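A small check along these lines could complement the manual steps above by reporting whether the reranker weights are present before starting with HF_HUB_OFFLINE=1. It assumes the Hugging Face hub folder layout implied by the path in the first step (`models--<org>--<name>`); the function names are hypothetical.

```python
from pathlib import Path


def hub_folder_name(repo_id: str) -> str:
    """Map 'org/name' to the 'models--org--name' folder used in the cache."""
    return "models--" + repo_id.replace("/", "--")


def is_model_cached(cache_dir: str, repo_id: str) -> bool:
    """Return True if the model folder exists and contains an ONNX file."""
    model_dir = Path(cache_dir).expanduser() / hub_folder_name(repo_id)
    if not model_dir.is_dir():
        return False
    # Look anywhere below the model folder for at least one .onnx file.
    return any(model_dir.rglob("*.onnx"))


if __name__ == "__main__":
    repo = "Xenova/ms-marco-MiniLM-L-6-v2"
    print(repo, "cached:", is_model_cached("~/.cache/fastembed", repo))
```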

🤖 Generated with Claude Code

Before this change TextCrossEncoder was instantiated without a cache_dir,
so FastEmbed defaulted to a temp directory. macOS periodically purges it,
which meant that any run with HF_HUB_OFFLINE=1 couldn't find the ONNX
weights and failed with "Could not load model Xenova/ms-marco-MiniLM-L-6-v2
from any source."

TextEmbedding already had this fix applied on line 145; the reranker just
never got the same treatment when it was added. Match the embedder's
cache_dir so both live at ~/.cache/fastembed/ and survive macOS tmp cleaning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lyonzin
Owner

lyonzin commented Apr 20, 2026

Hey @joyson-fernandes, thanks for taking the time to report this!

You're right that the macOS $TMPDIR purge combined with HF_HUB_OFFLINE=1 can break model loading — valid catch.

That said, we'll handle the fix internally since we want to apply it a bit differently. Closing this one out, but appreciate the heads-up.

@lyonzin lyonzin closed this Apr 20, 2026
lyonzin pushed a commit that referenced this pull request Apr 20, 2026
CrossEncoderReranker was not passing cache_dir to TextCrossEncoder,
causing model re-download on macOS where $TMPDIR is periodically purged.
Aligns with TextEmbedding which already uses config.models_cache_dir.

Closes #19
lyonzin added a commit that referenced this pull request Apr 20, 2026
CrossEncoderReranker was not passing cache_dir to TextCrossEncoder,
causing model re-download on macOS where $TMPDIR is periodically purged.
Aligns with TextEmbedding which already uses config.models_cache_dir.

Closes #19