# NVIDIA NIMs

The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for building applications with models on 
NVIDIA NIM inference microservices. NIM supports models across domains such as chat, embedding, and re-ranking, 
from the community as well as NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA 
accelerated infrastructure and are deployed as NIMs: easy-to-use, prebuilt containers that deploy anywhere using a single 
command on NVIDIA accelerated infrastructure.

NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, 
NIMs can be exported from NVIDIA’s API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud, 
giving enterprises ownership and full control of their IP and AI application.

NIMs are packaged as container images on a per-model basis and are distributed through the NVIDIA NGC Catalog. 
At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.

This example goes over how to use LangChain to interact with a NIM for a re-ranking model as well as a NIM for embeddings via LangChain's `NVIDIARerank` and `NVIDIAEmbeddings` classes. The example demonstrates how a re-ranking model can be used to combine retrieval results and improve accuracy during retrieval of documents.

For more information on accessing the chat models through this API, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation.

# NVIDIA NeMo Retriever Reranking

Reranking is a critical piece of high-accuracy, efficient retrieval pipelines.

Two important use cases:
- Combining results from multiple data sources
- Enhancing accuracy for single data sources

We'll demonstrate the former below.

## Installation

In [1]:
%pip install --upgrade --quiet langchain-nvidia-ai-endpoints

Note: you may need to restart the kernel to use updated packages.


## Setup

**To get started:**

1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models.

2. Select the `Retrieval` tab, then select your model of choice.

3. Under `Input` select the `Python` tab, and click `Get API Key`. Then click `Generate Key`.

4. Copy and save the generated key as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.

In [2]:
import getpass
import os

# del os.environ['NVIDIA_API_KEY']  ## delete key and reset
if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

## Working with NVIDIA NIMs

When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.

[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)


In [3]:
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, NVIDIARerank

# connect to an embedding NIM running at localhost:8080
embedder = NVIDIAEmbeddings(base_url="http://localhost:8080/v1")

# connect to a reranking NIM running at localhost:2016
reranker = NVIDIARerank(base_url="http://localhost:2016/v1")

### Combining results from multiple sources

Consider a pipeline with data from a BM25 store as well as a semantic store, such as FAISS.  

Each store is queried independently and returns results that the individual store considers to be highly relevant. Figuring out the overall relevance of the results is where re-ranking comes into play.
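
To make the problem concrete, here is a minimal sketch of why raw scores from different stores cannot simply be merged and sorted. It is not the NIM's method: `Hit`, `toy_relevance`, and `rerank` are hypothetical names, the scores are made up, and the word-overlap scorer is only a stand-in for a real reranking model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    text: str
    source: str
    score: float  # store-specific score; the scale differs between stores


def toy_relevance(query: str, text: str) -> float:
    """Hypothetical stand-in for a reranking model: fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)


def rerank(query: str, hits: List[Hit], top_n: int) -> List[Hit]:
    """Score every candidate with the same function, then keep the top_n."""
    rescored = [Hit(h.text, h.source, toy_relevance(query, h.text)) for h in hits]
    return sorted(rescored, key=lambda h: h.score, reverse=True)[:top_n]


# Made-up scores: BM25 scores are unbounded, similarity scores are small.
bm25_hits = [
    Hit("life insurance quotes", "bm25", 12.4),
    Hit("the meaning of life is debated", "bm25", 9.1),
]
semantic_hits = [
    Hit("what is the purpose of existence", "semantic", 0.83),
    Hit("what is a word", "semantic", 0.61),
]

query = "what is the meaning of life"
merged = bm25_hits + semantic_hits

# Sorting on raw scores puts every BM25 hit first, regardless of content.
naive = sorted(merged, key=lambda h: h.score, reverse=True)
# Rescoring all candidates with one function makes them comparable.
reranked = rerank(query, merged, top_n=2)

print([h.source for h in naive[:2]])
print([(h.source, h.text) for h in reranked])
```

In this toy setup the naive merge returns only BM25 hits, while the rescored list draws its top two from both stores. A reranking model plays the role of `toy_relevance`, with a far better notion of relevance.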

We will search for information about the query `What is the meaning of life?` across both a BM25 store and a semantic store.

In [4]:
query = "What is the meaning of life?"

#### BM25 relevant documents

Let's create a BM25 index that we can query. We'll use the [`BM25Retriever`](https://python.langchain.com/v0.2/docs/integrations/retrievers/bm25/) and web search results from [DuckDuckGo](https://duckduckgo.com/).

In [5]:
%pip install --upgrade --quiet langchain-community duckduckgo-search beautifulsoup4 rank_bm25

Note: you may need to restart the kernel to use updated packages.


In [6]:
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langchain_community.retrievers import BM25Retriever
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
import requests
from bs4 import BeautifulSoup

In [7]:
from typing import List

def build_documents(query, search_util, text_splitter, source) -> List[Document]:
    documents = []
    print(f"Building documents for {query}")
    for result in search_util(query):
        print(f"Processing {result['title']} - {result['link']}")
        try:
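            # Fetch the page (with a timeout so a slow site cannot hang the loop) and strip the HTML.
            page_text = BeautifulSoup(requests.get(result["link"], timeout=10).text, "html.parser").get_text()
            for chunk in text_splitter.split_text(page_text):
                documents.append(
                    Document(
                        page_content=chunk,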
                        metadata={
                            "title": result["title"],
                            "url": result["link"],
                            "source": source,
                        },
                    )
                )
        except Exception as e:
            print(f"Skipping due to connection error: {e}")
    print(f"Done building {len(documents)} documents")
    return documents        

In [8]:
bm25_tool = lambda query: DuckDuckGoSearchAPIWrapper().results(query, max_results=100, source="text")
bm25_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200, length_function=len)
bm25_retriever = BM25Retriever.from_documents(build_documents(query, bm25_tool, bm25_splitter, "DuckDuckGo Text"))

Building documents for What is the meaning of life?
Processing Meaning of life - Wikipedia - https://en.wikipedia.org/wiki/Meaning_of_life
Processing Life | Definition, Origin, Evolution, Diversity, & Facts | Britannica - https://www.britannica.com/science/life
Processing 4 philosophical answers to the meaning of life - Big Think - https://bigthink.com/thinking/four-philosophical-answers-meaning-of-life/
Processing 5 Philosophical Answers to the Meaning of Life - WorldAtlas - https://www.worldatlas.com/philosophy/5-philosophical-answers-to-the-meaning-of-life.html
Processing The Ultimate Meaning of Life | Psychology Today - https://www.psychologytoday.com/us/blog/mindbloggling/202310/the-ultimate-meaning-of-life
Processing The Meaning of Life: Exploring Different Perspectives and ... - Medium - https://theinstituteofintellect.medium.com/the-meaning-of-life-8ff69decb5c6
Processing 'What Is the Point of Life?': Why You Might Feel This Way - Verywell Mind - https://www.verywellmind.com/wh

Return the relevant documents from the query `"What is the meaning of life?"` with the BM25 retriever.

In [9]:
bm25_retriever.k = 500
bm25_docs = bm25_retriever.invoke(query)
len(bm25_docs), bm25_docs[:5]

(500,
 [Document(page_content='Origin of the question\nPhilosopher in Meditation (detail) by RembrandtArthur Schopenhauer was the first to explicitly ask the question,[1] in an essay entitled "Character".Since a man does not alter, and his moral character remains absolutely the same all through his life; since he must play out the part which he has received, without the least deviation from the character; since neither experience, nor philosophy, nor religion can effect any improvement in him, the question arises, What is the meaning of life at all? To what purpose is it played, this farce in which everything that is essential is irrevocably fixed and determined?[5]Questions about the meaning of life, and similar, have been expressed in a broad variety of other ways, including:\nWhat is the meaning of life? What\'s it all about? Who are we?[6][7][8]\nWhy are we here? What are we here for?[9][10][11]\nWhat is the origin of life?[12]\nWhat is the nature of life? What is the nature of rea
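
For intuition about how a BM25 index ranks documents, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is an illustration under stated assumptions, not the implementation behind `BM25Retriever`: the function name `bm25_scores` is hypothetical, tokenization is plain whitespace splitting, the IDF uses a non-negative variant, and `k1=1.5`, `b=0.75` are commonly used values rather than values taken from this notebook.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, corpus: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document in corpus against query with an Okapi BM25 variant."""
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Non-negative IDF variant: rarer terms weigh more.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # k1 saturates repeated terms; b normalizes for document length.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avg_len))
        scores.append(score)
    return scores


corpus = [
    "the meaning of life is a philosophical question",
    "life insurance protects your family",
    "a recipe for tomato soup",
]
scores = bm25_scores("meaning of life", corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
print(best, scores)
```

The scores are purely lexical: a document that shares no terms with the query scores zero, however related it is in meaning. That gap is what the semantic store below is meant to cover.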

#### Semantic documents

Below we build a FAISS index from DuckDuckGo news results. If you already have a saved FAISS index, you can load it instead.

In [10]:
%pip install --upgrade --quiet faiss-gpu

Note: you may need to restart the kernel to use updated packages.


In [11]:
from langchain_community.vectorstores import FAISS

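# De-serialization relies on loading a pickle file.
# Pickle files can be modified to deliver a malicious payload that
# results in execution of arbitrary code on your machine.
# Only perform this with a pickle file you have created and no one
# else has modified.
# This flag is only needed when loading a saved index (for example with
# FAISS.load_local); the index below is built from documents, so it is unused.
allow_dangerous_deserialization=True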

sem_tool = lambda query: DuckDuckGoSearchAPIWrapper().results(query, max_results=100, source="news")
sem_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100, length_function=len)
sem_store = FAISS.from_documents(build_documents(query, sem_tool, sem_splitter, "DuckDuckGo News"), embedding=NVIDIAEmbeddings(truncate="END"))

Building documents for What is the meaning of life?
Processing 25 Philosophical Quotes on the Meaning of Life - https://www.msn.com/en-us/lifestyle/mind-and-soul/25-philosophical-quotes-on-the-meaning-of-life/ss-AA1nLKXP
Processing Monty Python's 'The Meaning Of Life' In Cannes: In 1983, The World's Most Serious Film Festival Went For Something Completely Different… - https://www.msn.com/en-us/movies/other/monty-python-s-the-meaning-of-life-in-cannes-in-1983-the-world-s-most-serious-film-festival-went-for-something-completely-different/ar-BB1mHt40
Processing Religion and the Meaning of Life - https://www.cambridge.org/core/books/religion-and-the-meaning-of-life/2C118CBF40B68F288B9010457F78571E
Processing Jeremy Fink and the Meaning of Life Streaming: Watch & Stream Online via Amazon Prime Video - https://www.yahoo.com/entertainment/jeremy-fink-meaning-life-streaming-042636701.html
Processing An Eastern perspective on the meaning of life - https://www.bbc.com/reel/video/p0b373r6/an-east

In [12]:
sem_retriever = sem_store.as_retriever(
    search_kwargs = {"k": 500},
)
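
For intuition, the `k`-nearest lookup that a vector store retriever performs can be sketched as a brute-force search over embedding vectors. This is a simplified illustration, assuming Euclidean distance and toy 2-D vectors; the function name `top_k` is hypothetical, and the metric and index structure actually used depend on how the FAISS store is configured.

```python
import math
from typing import List, Sequence, Tuple


def top_k(query_vec: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors closest to query_vec."""
    distances = [(i, math.dist(query_vec, v)) for i, v in enumerate(vectors)]
    # Smaller distance means more similar; slicing also handles k > len(vectors).
    return sorted(distances, key=lambda pair: pair[1])[:k]


# Toy 2-D "embeddings"; real embedding vectors have far more dimensions.
vectors = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.3)]
nearest = top_k((1.0, 0.0), vectors, k=2)
print(nearest)
```

Unlike BM25, nothing here depends on shared words: closeness is decided entirely by the embedding model that produced the vectors.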

Return the relevant documents from the query `"What is the meaning of life?"` with the FAISS semantic store.

In [13]:
sem_docs = sem_retriever.invoke(query)
len(sem_docs), sem_docs[:5]

(500,
 [Document(page_content='Definition of Life | Science News\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSkip to content\n\n\n\n\n\n\n\n\n\n\n\n\n\n\t\tSubscribe today\t\n\nEvery print subscription comes with full digital access\n\n\nSubscribe now\n\n\n\n\nMenu\n\n\n\n\nAll Topics\nHealth\nHumans\nLife\nEarth\nPhysics\nSpace\n \n\nMagazine\nMenu\n\nAll Stories\nMultimedia\nReviews\nCollections\nCentury of Science\nFor Educators\nCoronavirus Outbreak\nNewsletters\n\n\nAbout\nFor Students\nOur Store\n\n\n\nSIGN IN\n\n\n\t\t\tDonate\t\t\n\n\n\nScience News\n\n\n\t\t\tINDEPENDENT JOURNALISM SINCE 1921\t\t\n\n\nSIGN IN\n\n\n\n\n\n\n\t\t\t\tSearch\t\t\t\n\n\n\n\n\n\nOpen search\n\n\nClose search\n\n\n\n\n\n\n\nScience News\n\n\n\t\t\tINDEPENDENT JOURNALISM SINCE 1921\t\t\n\n\n\n\n\n\t\t\t\tAll Topics\t\t\t\n\n\n\n\t\tEarth\t\n\n\n\n\t\t\t\t\t\tAgriculture\t\t\t\t\t\n\n\n\n\t\t\t\t\t\t

#### Combine and rank documents

Let's combine the BM25 and semantic search results. The reranking NIM orders the resulting `docs` by their relevance to the query.

#### Note on truncation

Reranking models typically have a fixed context window that determines the maximum number of input tokens that can be processed. This limit could be a hard limit, equal to the model's maximum input token length, or an effective limit, beyond which the accuracy of the ranking decreases.

Since models operate on tokens and applications usually work with text, it can be challenging for an application to ensure that its input stays within the model's token limits. By default, an exception is thrown if the input is too large.

To assist with this, NVIDIA's NIMs (API Catalog or local) provide a `truncate` parameter that truncates the input on the server side if it's too large.

The `truncate` parameter has two options:
 - "NONE": The default option. An exception is thrown if the input is too large.
 - "END": The server truncates the input from the end (right), discarding tokens as necessary.
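
The behavior of the modes listed above can be sketched on the client side as follows. This is only an illustration: the real truncation happens on the server with the model's own tokenizer, `apply_truncate` is a hypothetical helper, and whitespace splitting stands in for tokenization.

```python
from typing import List


def apply_truncate(tokens: List[str], max_tokens: int, truncate: str = "NONE") -> List[str]:
    """Hypothetical illustration of the truncate modes on a list of tokens."""
    if len(tokens) <= max_tokens:
        return tokens  # input fits, nothing to do in either mode
    if truncate == "END":
        # Keep the first max_tokens tokens and discard the rest.
        return tokens[:max_tokens]
    if truncate == "NONE":
        raise ValueError(f"input has {len(tokens)} tokens, limit is {max_tokens}")
    raise ValueError(f"unknown truncate mode: {truncate!r}")


tokens = "what is the meaning of life".split()
kept = apply_truncate(tokens, max_tokens=4, truncate="END")

try:
    apply_truncate(tokens, max_tokens=4, truncate="NONE")
    rejected = False
except ValueError:
    rejected = True

print(kept, rejected)
```

With `"END"`, long passages are still ranked, but only on their leading portion, so relevant text near the end of a chunk may be ignored. Keeping chunks small enough to fit the model's limit avoids that trade-off.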

In [14]:
ranker = NVIDIARerank(truncate="END")

all_docs = bm25_docs + sem_docs

ranker.top_n = 5
docs = ranker.compress_documents(query=query, documents=all_docs)
docs

[Document(page_content='The meaning of life can be derived from philosophical and religious contemplation of, and scientific inquiries about, existence, social ties, consciousness, and happiness. Many other issues are also involved, such as symbolic meaning, ontology, value, purpose, ethics, good and evil, free will, the existence of one or multiple gods, conceptions of God, the soul, and the afterlife. Scientific contributions focus primarily on describing related empirical facts about the universe, exploring the context and parameters concerning the "how" of life. Science also studies and can provide recommendations for the pursuit of well-being and a related conception of morality. An alternative, humanistic approach poses the question, "What is the meaning of my life?"', metadata={'title': 'Meaning of life - Wikipedia', 'url': 'https://en.wikipedia.org/wiki/Meaning_of_life', 'source': 'DuckDuckGo Text', 'relevance_score': 3.875}),
 Document(page_content='What is the meaning of life