# NvidiaRetriever: NVIDIA RAG Blueprint Integration

The `NvidiaRetriever` connects LangChain to the [NVIDIA RAG Blueprint](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/RAG) `/v1/search` endpoint. Use it to retrieve relevant documents from a containerized RAG deployment.

**Prerequisites:**
- A running NVIDIA RAG server (typically at `http://localhost:8081`). To set one up, follow the [self-hosted Docker deployment guide](https://docs.nvidia.com/rag/latest/deploy-docker-self-hosted.html).
- At least one ingested collection in the vector database (for example, `test_multimodal_query`, which this notebook uses).

**Features:**
- Sync and async retrieval
- Full support for DocumentSearch parameters (reranker, query rewriting, filters, etc.)
- Clear exceptions when the server is unreachable or returns errors

## Install the Package

In [None]:
%pip install --upgrade --quiet langchain-nvidia-ai-endpoints

## Basic Usage

Create a retriever pointing at your RAG server. The default base URL is `http://localhost:8081`. Specify the collection(s) to search and the number of documents to return (`k`).

In [None]:
from langchain_nvidia_ai_endpoints import NvidiaRetriever

retriever = NvidiaRetriever(
    base_url="http://localhost:8081",
    collection_names=["test_multimodal_query"],  # Provide your collection names here
    k=5,
)

docs = retriever.invoke("What is RAG?")
print(f"Retrieved {len(docs)} documents")
print(f"Retrieved citations: {docs}")
for i, doc in enumerate(docs):
    print(f"\n--- Document {i + 1} ---")
    print(f"Content: {doc.page_content[:200]}...")  # show only the first 200 characters
    print(f"Score: {doc.metadata.get('score', 'N/A')}")
    print(f"Source: {doc.metadata.get('document_name', 'N/A')}")

## Customize Parameters

`NvidiaRetriever` supports all DocumentSearch API parameters. Common options:

In [None]:
retriever = NvidiaRetriever(
    base_url="http://localhost:8081",
    collection_names=["test_multimodal_query"],
    k=4,                    # reranker_top_k: number of docs to return
    vdb_top_k=50,           # candidates from vector DB before reranking
    enable_reranker=False,
    enable_query_rewriting=False,
)

docs = retriever.invoke("Explain approach 1")
print(f"Retrieved {len(docs)} documents")
for i, doc in enumerate(docs):
    print(f"\n--- Document {i + 1} ---")
    print(f"Content: {doc.page_content[:200]}...")  # show only the first 200 characters
    print(f"Score: {doc.metadata.get('score', 'N/A')}")
    print(f"Source: {doc.metadata.get('document_name', 'N/A')}")
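
The comments on `vdb_top_k` and `k` above describe a two-stage funnel: a wider set of candidates is pulled from the vector database, then narrowed to the final `k` results. The sketch below illustrates that idea in plain Python only. It does not call the RAG server, and `two_stage_retrieve`, `word_overlap`, and `overlap_ratio` are hypothetical stand-ins for the real vector search and reranker, not part of `langchain-nvidia-ai-endpoints`.

```python
from typing import Callable, List, Tuple


def two_stage_retrieve(
    query: str,
    corpus: List[str],
    vector_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    vdb_top_k: int,
    k: int,
) -> List[Tuple[str, float]]:
    """Illustrative funnel: keep vdb_top_k candidates, rescore them, return the best k."""
    # Stage 1: cheap first-pass scoring over the whole corpus (stand-in for vector search).
    candidates = sorted(
        corpus, key=lambda doc: vector_score(query, doc), reverse=True
    )[:vdb_top_k]
    # Stage 2: rescore only the surviving candidates (stand-in for a reranker).
    rescored = [(doc, rerank_score(query, doc)) for doc in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:k]


def word_overlap(query: str, doc: str) -> float:
    """Hypothetical first-pass score: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def overlap_ratio(query: str, doc: str) -> float:
    """Hypothetical rerank score: shared words divided by document length."""
    words = doc.lower().split()
    return word_overlap(query, doc) / len(words) if words else 0.0


corpus = [
    "RAG combines retrieval with generation",
    "Retrieval augmented generation grounds answers in documents",
    "Bananas are yellow",
    "A reranker reorders retrieved candidates",
]
top = two_stage_retrieve(
    "what is retrieval generation",
    corpus,
    word_overlap,
    overlap_ratio,
    vdb_top_k=3,
    k=2,
)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

A larger `vdb_top_k` gives the second stage more candidates to choose from at the cost of more rescoring work, while `k` controls how many documents are finally returned.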

## Async Retrieval

Use `ainvoke` for async workflows:

In [None]:
# In Jupyter, use top-level await (don't use asyncio.run() - the kernel already has an event loop).
retriever = NvidiaRetriever(
    base_url="http://localhost:8081",
    collection_names=["test_multimodal_query"],
    k=3,
)
docs = await retriever.ainvoke("What is RAG?")
print(f"Retrieved {len(docs)} documents")
for i, doc in enumerate(docs):
    print(f"\n--- Document {i + 1} ---")
    print(f"Content: {doc.page_content[:200]}...")  # show only the first 200 characters
    print(f"Score: {doc.metadata.get('score', 'N/A')}")
    print(f"Source: {doc.metadata.get('document_name', 'N/A')}")

## Use in a RAG Chain

Combine `NvidiaRetriever` with a chat model for end-to-end RAG. If `ChatNVIDIA` calls the cloud API, set `NVIDIA_API_KEY` in your environment first.

In [None]:
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NvidiaRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

retriever = NvidiaRetriever(
    base_url="http://localhost:8081",
    collection_names=["test_multimodal_query"],
    k=4,
)

llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct")

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based only on the following context:\n\n{context}"),
    ("human", "{question}"),
])

def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

response = chain.invoke("What is RAG?")
print(response)
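
The `format_docs` function above passes only the raw text to the prompt. A possible variation, sketched below, numbers each passage and labels it with the `document_name` metadata key used earlier in this notebook, so the model can refer to sources. `format_docs_with_sources`, its `max_chars` parameter, and the `FakeDoc` class are hypothetical names for illustration. `FakeDoc` only mimics the `page_content` and `metadata` attributes of a retrieved document so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing the two attributes the formatter reads."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def format_docs_with_sources(docs: Iterable[Any], max_chars: int = 500) -> str:
    """Join documents into numbered blocks, each labelled with its source name."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        # Fall back to "unknown" when the metadata has no document_name key.
        source = doc.metadata.get("document_name", "unknown")
        text = doc.page_content[:max_chars]  # cap each passage to keep the prompt short
        blocks.append(f"[{i}] (source: {source})\n{text}")
    return "\n\n".join(blocks)


sample = [
    FakeDoc("RAG pairs a retriever with a generator.", {"document_name": "intro.pdf"}),
    FakeDoc("No metadata here."),
]
formatted = format_docs_with_sources(sample)
print(formatted)
```

To try it in the chain, replace `retriever | format_docs` with `retriever | format_docs_with_sources`.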

## Error Handling

The retriever raises specific exceptions when the RAG server is unreachable or returns errors:

In [None]:
from langchain_nvidia_ai_endpoints import NvidiaRetriever
from langchain_nvidia_ai_endpoints.retrievers import (
    NvidiaRAGConnectionError,
    NvidiaRAGServerError,
    NvidiaRAGValidationError,
)

try:
    retriever = NvidiaRetriever(
        base_url="http://localhost:8081",
        collection_names=["test_multimodal_query"],
    )
    docs = retriever.invoke("test query")
    print(f"Success: {len(docs)} documents")
except NvidiaRAGConnectionError as e:
    print(f"Connection failed: {e}")
except NvidiaRAGValidationError as e:
    print(f"Validation error (422): {e}")
except NvidiaRAGServerError as e:
    print(f"Server error ({e.status_code}): {e}")
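
Connection failures are sometimes transient, for example while the RAG server's containers are still starting. The sketch below shows a generic retry loop with exponential backoff that could wrap a retriever call. `call_with_retry` and its parameters are hypothetical and not part of the package, and whether retrying is appropriate for your deployment is an assumption. The demo uses Python's built-in `ConnectionError` and a fake function so it runs without a server.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call fn, retrying on the given exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Demo with a fake call that fails twice before succeeding.
calls = {"n": 0}


def flaky() -> List[str]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server not ready")
    return ["doc"]


delays: List[float] = []  # record the requested delays instead of actually sleeping
result = call_with_retry(flaky, retry_on=(ConnectionError,), sleep=delays.append)
print(result, delays)
```

With the real retriever, the call might look like `call_with_retry(lambda: retriever.invoke("test query"), retry_on=(NvidiaRAGConnectionError,))`. Validation errors are left out of `retry_on` here on the assumption that resending the same invalid request would fail again.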

## Related Topics

- [langchain-nvidia-ai-endpoints README](https://github.com/langchain-ai/langchain-nvidia/blob/main/libs/ai-endpoints/README.md)
- [NVIDIA RAG Blueprint](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/RAG)