Add Support for XProvence Sentence-Level Context Pruning (naver/xprovence-reranker-bgem3-v1) #770
Summary
This PR integrates XProvence (naver/xprovence-reranker-bgem3-v1), a zero-cost context pruning model for RAG. The model scores sentences by query relevance and removes irrelevant ones, returning both reranking scores and `pruned_text` (the pruned context).

Motivation

In RAG pipelines, retrieved documents often include distracting content that confuses LLMs and wastes tokens. XProvence mitigates this by pruning sentences that are irrelevant to the query before the context reaches the LLM.
Changes
Python Backend (backends/python/)
- New `XProvenceModel` class with `process()` for sentence-level pruning
- Added `pruned_text` field to the `Score` type
- Made `flash_attn` imports optional for environments without flash attention
- `bfloat16` → `float32` conversion (XProvence `process()` requires `float32`)

Core (core/)
- Pass `raw_query` and `raw_text` through the tokenization pipeline for pruning
- Handle `pruned_text` in inference results

Router (router/)
- Include `pruned_text` in the HTTP rerank response

gRPC (backends/grpc-client/, backends/proto/)
- Add `pruned_text` field to the protobuf definitions

Files Changed
- `backends/python/.../xprovence_model.py`: New XProvence model implementation
- `backends/python/.../models/__init__.py`: Model detection and optional `flash_attn` import
- `backends/python/.../models/types.py`: Add `pruned_text` to `Score`
- `backends/proto/embed.proto`: Add `pruned_text` to protobuf
- `core/src/tokenization.rs`: Pass raw text for pruning
- `core/src/infer.rs`: Handle `pruned_text` in results
- `core/src/queue.rs`: Store raw text in queue entries
- `router/src/http/types.rs`: Add `pruned_text` to response type
- `router/src/http/server.rs`: Include `pruned_text` in rerank response

Configuration
- `XPROVENCE_THRESHOLD`: Pruning threshold, `0.0`–`1.0` (default: `0.3`)
- `XPROVENCE_ALWAYS_SELECT_TITLE`: Keep the first sentence as the title (default: `true`)

Usage
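The interaction of the two settings above can be sketched in plain Python. This is an illustrative stand-in, not the backend's actual code: `prune_context` and the hard-coded scores are hypothetical, while the real model derives per-sentence relevance scores from the query.

```python
def prune_context(sentences, scores, threshold=0.3, always_select_title=True):
    """Keep sentences whose relevance score passes the threshold.

    threshold           -> mirrors XPROVENCE_THRESHOLD (default 0.3)
    always_select_title -> mirrors XPROVENCE_ALWAYS_SELECT_TITLE (default true):
                           the first sentence is kept even if it scores low.
    """
    kept = []
    for i, (sentence, score) in enumerate(zip(sentences, scores)):
        if score >= threshold or (always_select_title and i == 0):
            kept.append(sentence)
    return " ".join(kept)


context = [
    "Deep Learning Basics.",                # title sentence
    "Deep learning uses neural networks.",
    "The weather is nice.",
    "I like pizza.",
]
fake_scores = [0.1, 0.95, 0.02, 0.01]       # stand-in relevance scores

print(prune_context(context, fake_scores))
# -> Deep Learning Basics. Deep learning uses neural networks.
```

Note that with `always_select_title=True`, the title survives despite scoring below the threshold; disabling it prunes any sentence under the threshold, first or not.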
API Example
Request
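The request body was not preserved here. A hypothetical request consistent with the response below, assuming the standard text-embeddings-inference `/rerank` schema (`query`, `texts`, and `return_text`), might look like:

```python
import json

# Hypothetical rerank request matching the example response below.
# Field names assume the standard TEI /rerank schema; return_text=True
# asks the server to echo each input text back in the results.
payload = {
    "query": "What is deep learning?",
    "texts": [
        "Deep learning uses neural networks. The weather is nice. I like pizza."
    ],
    "return_text": True,
}

print(json.dumps(payload, indent=2))
```

Sent as `POST /rerank` with `Content-Type: application/json`; with an XProvence model loaded, each result additionally carries `pruned_text`.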
Response
```json
[
  {
    "index": 0,
    "text": "Deep learning uses neural networks. The weather is nice. I like pizza.",
    "score": 0.9997,
    "pruned_text": "Deep learning uses neural networks."
  }
]
```

Test Plan
- Verify `pruned_text` contains only relevant sentences

References