What's New in v1.11.0
Added
- Bring Your Own Model (BYOM): embedding model is now configurable via the [embedding] model = "..." setting. Four models validated: BAAI/bge-small-en-v1.5 (default), BAAI/bge-large-en-v1.5, mixedbread-ai/mxbai-embed-large-v1, and nomic-ai/nomic-embed-text-v1.5. See docs/embedding-models.md for MTEB scores and trade-offs.
- model_name threaded through pdf_search and pdf_cache_stats responses so agents can verify which embedding model produced a given result.
- model column on page_embeddings cache table: switching models evicts stale rows automatically (no manual cache clear needed).
- Embedding-model benchmark script (scripts/bench_embedding_models.py) with MRR + latency gate, summary tables, and markdown export.
- embedding_model property on Config (renamed MODEL_NAME → DEFAULT_MODEL).
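To illustrate how a model column can make cache eviction automatic, here is a minimal sketch using Python's built-in sqlite3 module. Only the page_embeddings table name and the model column come from these notes; the other columns (doc_id, page, vector) and the function names open_cache and put_embedding are assumptions made for the example, not the actual pdf-mcp schema or API.

```python
import sqlite3


def open_cache(conn: sqlite3.Connection, active_model: str) -> int:
    """Prepare the cache and drop rows produced by any other model.

    Returns the number of stale rows that were evicted.
    """
    # Hypothetical schema: only the table name and `model` column are
    # taken from the release notes.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_embeddings ("
        " doc_id TEXT NOT NULL,"
        " page INTEGER NOT NULL,"
        " model TEXT NOT NULL,"
        " vector BLOB NOT NULL,"
        " PRIMARY KEY (doc_id, page))"
    )
    # Vectors from different models are not comparable, so anything not
    # written by the active model is treated as stale.
    cursor = conn.execute(
        "DELETE FROM page_embeddings WHERE model != ?", (active_model,)
    )
    conn.commit()
    return cursor.rowcount


def put_embedding(conn, doc_id: str, page: int, model: str, vector: bytes) -> None:
    """Insert or overwrite the cached vector for one page."""
    conn.execute(
        "INSERT OR REPLACE INTO page_embeddings VALUES (?, ?, ?, ?)",
        (doc_id, page, model, vector),
    )
    conn.commit()
```

With this shape, changing the configured model and reopening the cache is enough to clear out old vectors; rows written by the active model are left in place.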
Fixed
- embedder batch_size lowered to 8 (fastembed default is 256) to prevent OOM and hangs on long-context models such as nomic-embed-text-v1.5 (8192-token window) when processing 75-page PDFs.
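The effect of a smaller batch size can be sketched as follows. This is an illustration only: embed_batch is a hypothetical stand-in for the real embedder call, and batched and embed_pages are names invented for the example. The idea is that only batch_size texts are handed to the model at once, so peak memory is bounded by the batch rather than by the whole document.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def embed_pages(
    pages: Iterable[str],
    embed_batch: Callable[[List[str]], Sequence[Sequence[float]]],
    batch_size: int = 8,  # value from these notes; the embedder itself is assumed
) -> List[Sequence[float]]:
    """Embed page texts one small batch at a time, preserving page order."""
    vectors: List[Sequence[float]] = []
    for batch in batched(pages, batch_size):
        # Only `batch_size` page texts are in flight per call.
        vectors.extend(embed_batch(batch))
    return vectors
```

For a 75-page PDF this makes ten calls to the embedder: nine batches of 8 pages and one of 3.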
Changed
- Bumped pip to 26.1.1 and python-multipart to 0.0.27 (transitive dep updates).
Installation
pip install pdf-mcp==1.11.0
Links