# NVIDIA NeMo Retriever Reranking

Reranking is a critical piece of high-accuracy, efficient retrieval pipelines.

Two important use cases:
- Combining results from multiple data sources
- Enhancing accuracy for single data sources

## Working with NVIDIA NIMs

[ai.nvidia.com](http://ai.nvidia.com) hosts a variety of AI models accessible with an API key and the `langchain-nvidia-ai-endpoints` library. The use cases below operate in this mode by default.

### Combining results from multiple sources

Consider a pipeline with data from a semantic store, such as FAISS, as well as a BM25 store.

Each store is queried independently and returns results that the individual store considers to be highly relevant. Figuring out the overall relevance of the results is where reranking comes into play.
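To see why the stores' own scores cannot simply be merged, here is a minimal sketch. The score values are invented purely for illustration, and it assumes the semantic store reports a bounded similarity (such as cosine) while BM25 reports an unbounded score.

```python
# Hypothetical hits and scores, made up to illustrate the scale mismatch.
bm25_hits = [("doc about life", 12.4), ("doc about taxes", 7.1)]  # unbounded BM25 scores
semantic_hits = [("doc about meaning", 0.83), ("doc about purpose", 0.79)]  # bounded similarity

# Sorting the union by raw score places every BM25 hit ahead of every
# semantic hit, whatever their actual relevance to the query.
naive = sorted(bm25_hits + semantic_hits, key=lambda hit: hit[1], reverse=True)
naive_order = [text for text, _ in naive]
print(naive_order)
```

A reranker avoids this by scoring every candidate against the query with a single model, so the results share one scale.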

We will search for information about the query `What is the meaning of life?` across a BM25 store and a semantic store.

In [1]:
query = "What is the meaning of life?"

#### BM25 relevant documents

Below we assume you have Elasticsearch running with documents stored in an index named `langchain-index`.

In [2]:
%pip install --upgrade --quiet langchain-community elasticsearch

Note: you may need to restart the kernel to use updated packages.


In [3]:
import elasticsearch
from langchain_community.retrievers import ElasticSearchBM25Retriever

bm25_retriever = ElasticSearchBM25Retriever(
    client=elasticsearch.Elasticsearch("http://localhost:9200"),
    index_name="langchain-index"
)

In [4]:
# `invoke` replaces the deprecated `get_relevant_documents`
bm25_docs = bm25_retriever.invoke(query)


#### Semantic documents

Below we assume you have a saved FAISS index.

In [None]:
%pip install --upgrade --quiet langchain-community langchain-nvidia-ai-endpoints faiss-gpu

In [None]:
from langchain_community.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embedder = NVIDIAEmbeddings()

# De-serialization relies on loading a pickle file.
# Pickle files can be modified to deliver a malicious payload that
# results in execution of arbitrary code on your machine.
# Only perform this with a pickle file you have created and no one
# else has modified.
allow_dangerous_deserialization = True

# Pass the `embedder` defined above so queries are embedded the same way as the index.
sem_retriever = FAISS.load_local(
    "langchain_index",
    embeddings=embedder,
    allow_dangerous_deserialization=allow_dangerous_deserialization,
).as_retriever()

In [None]:
# `invoke` replaces the deprecated `get_relevant_documents`
sem_docs = sem_retriever.invoke(query)

#### Combine and rank documents

The resulting `docs` will be ordered by their relevance to the query.

In [None]:
from langchain_nvidia_ai_endpoints import NVIDIARerank

ranker = NVIDIARerank()

all_docs = bm25_docs + sem_docs

docs = ranker.compress_documents(query=query, documents=all_docs)
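When two stores index the same corpus, the same passage can come back from both. The source does not say how the reranker treats duplicates, so one option is to drop them before the rerank call. The sketch below uses a hypothetical `Doc` dataclass as a stand-in for a LangChain document, and `dedupe_by_content` is an illustrative helper, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_by_content(docs):
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique


# Toy results: "42" is returned by both stores.
bm25_sample = [Doc("Life is what you make it."), Doc("42")]
sem_sample = [Doc("42"), Doc("Meaning comes from purpose.")]

unique_docs = dedupe_by_content(bm25_sample + sem_sample)
print([d.page_content for d in unique_docs])
```

In the pipeline above, the equivalent step would be to pass `dedupe_by_content(all_docs)` to the ranker instead of `all_docs`. This assumes exact text matches are the only duplicates worth removing.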

### Enhancing accuracy for single data sources

Semantic search with vector embeddings efficiently narrows a large corpus to a smaller set of relevant documents, but it trades accuracy for that efficiency. Reranking adds accuracy back by post-processing the smaller set. Reranking the full corpus directly is typically too slow for applications.

In [None]:
%pip install --upgrade --quiet langchain langchain-nvidia-ai-endpoints pgvector psycopg langchain-postgres

Below we assume you have PostgreSQL running with documents stored in a collection named `langchain-index`.

We will narrow the collection to 1,000 results and further narrow it to 10 with the reranker.

In [None]:
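from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, NVIDIARerank
# The `embeddings=` and `connection=` arguments below match the PGVector
# class from the `langchain-postgres` package installed above.
from langchain_postgres import PGVector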

ranker = NVIDIARerank(top_n=10)
embedder = NVIDIAEmbeddings()

store = PGVector(embeddings=embedder,
                 collection_name="langchain-index",
                 connection="postgresql+psycopg://langchain:langchain@localhost:6024/langchain")

subset_docs = store.similarity_search(query, k=1_000)

docs = ranker.compress_documents(query=query, documents=subset_docs)
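The narrow-then-rerank pattern above can be sketched without any external services. In this toy version, `word_overlap` stands in for the fast vector search and `phrase_bonus` stands in for the slower, more accurate reranker. Both scorers and the `retrieve_then_rerank` helper are hypothetical names for illustration, not part of any library.

```python
def retrieve_then_rerank(query, corpus, cheap_score, precise_score, k, top_n):
    """Narrow `corpus` to `k` candidates with a cheap scorer, then order those
    candidates with a more expensive scorer and keep the best `top_n`."""
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:k]
    reranked = sorted(candidates, key=lambda d: precise_score(query, d), reverse=True)
    return reranked[:top_n]


def word_overlap(query, doc):
    """Cheap scorer: count of distinct query words that appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def phrase_bonus(query, doc):
    """Toy 'expensive' scorer: word overlap plus a bonus for the exact query phrase."""
    return word_overlap(query, doc) + (10 if query.lower() in doc.lower() else 0)


corpus = [
    "life has a meaning of sorts",
    "the meaning of a word",
    "on the meaning of life",
    "tax forms",
]

# The cheap scorer ties the first and third documents; the second stage
# promotes the one that contains the exact phrase.
top = retrieve_then_rerank("meaning of life", corpus, word_overlap, phrase_bonus, k=3, top_n=2)
print(top)
```

The expensive scorer only ever sees `k` documents, which is what keeps the second stage affordable. In the cell above, `k=1_000` and `top_n=10` play those roles.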

## Working with a local NIM

[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)

The `NVIDIAEmbeddings` and `NVIDIARerank` classes give you a way to work with local NIMs through `mode` switching.

In [None]:
# connect to an embedding NIM running at localhost:2016
embedder = NVIDIAEmbeddings().mode("nim", base_url="http://localhost:2016/v1")

# connect to a reranking NIM running at localhost:1976
ranker = NVIDIARerank().mode("nim", base_url="http://localhost:1976/v1")

You can rerun the examples above with this new `embedder` and `ranker`.