# NVIDIA AI Foundation Endpoints 

> [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA-hosted API endpoints for NVIDIA AI Foundation Models like Mixtral 8x7B, Llama 2, Stable Diffusion, etc. These models, hosted on the [NVIDIA NGC catalog](https://catalog.ngc.nvidia.com/ai-foundation-models), are optimized, tested, and hosted on the NVIDIA AI platform, making them fast and easy to evaluate, further customize, and seamlessly run at peak performance on any accelerated stack.
> 
> With [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), you can get quick results from a fully accelerated stack running on [NVIDIA DGX Cloud](https://www.nvidia.com/en-us/data-center/dgx-cloud/). Once customized, these models can be deployed anywhere with enterprise-grade security, stability, and support using [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/).
> 
> These models can be easily accessed via the [`langchain-nvidia-ai-endpoints`](https://pypi.org/project/langchain-nvidia-ai-endpoints/) package, as shown below.

This example goes over how to use LangChain to interact with the supported [NVIDIA Reranker Model](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/nvolve-40k) for [retrieval-augmented generation](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/) via the `NVIDIARerank` class.

For more information on accessing the chat models through this API, see the [ChatNVIDIA](../chat/nvidia_ai_endpoints) documentation.

## Installation

```python
# %pip install --upgrade --quiet langchain-nvidia-ai-endpoints
# %pip install --upgrade --quiet langchain langchain-community langchain-text-splitters
# %pip install --upgrade --quiet faiss-cpu
```

## Setup

**To get started:**

1. Create a free account with the [NVIDIA NGC](https://catalog.ngc.nvidia.com/) service, which hosts AI solution catalogs, containers, models, etc.

2. Navigate to `Catalog > AI Foundation Models > (Model with API endpoint)`.

3. Select the `API` option and click `Generate Key`.

4. Save the generated key as `NVIDIA_API_KEY` (the cell below reads it from the environment, or prompts for it). From there, you should have access to the endpoints.

```python
import getpass
import os

# Prompt for the key unless NVIDIA_API_KEY is already set to a key with the expected prefix
if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvapi_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key
```

## Initialization

Let's list out some of the models we will be using for this notebook:

```python
from langchain_nvidia_ai_endpoints import (
    ChatNVIDIA,
    NVIDIAEmbeddings,
    NVIDIARerank,
)

NVIDIARerank.get_available_models()
```

```
[Model(id='nvidia/rerank-qa-mistral-4b', model_type='ranking')]
```

```python
NVIDIAEmbeddings.get_available_models()
```

```
[Model(id='nvidia/embed-qa-4', model_type='embedding'),
 Model(id='snowflake/arctic-embed-l', model_type='embedding')]
```

```python
ChatNVIDIA.get_available_models(filter="mistralai/")
```

```
[Model(id='mistralai/mistral-7b-instruct-v0.2', model_type='chat'),
 Model(id='mistralai/mistral-large', model_type='chat'),
 Model(id='mistralai/mixtral-8x22b-instruct-v0.1', model_type='chat'),
 Model(id='mistralai/mixtral-8x7b-instruct-v0.1', model_type='chat')]
```

Among the models listed above, we will use the following:
- `mistralai/mixtral-8x7b-instruct-v0.1`: A NIM-containerized Mixtral 8x7B model, which we will use as our LLM backbone via `ChatNVIDIA`.
- `nvidia/embed-qa-4`: A NIM-containerized question-answering embedding model based on the e5-large architecture, which we will use to generate embeddings via `NVIDIAEmbeddings`.
- `nvidia/rerank-qa-mistral-4b`: A NIM-containerized, Mistral-based question-answering reranking model, which we will use to score query-passage pairs via `NVIDIARerank`.

In this notebook, we will focus on the **Reranking Model**, which scores how relevant each passage is to a query. Reranking models are a common component of retrieval-augmented generation pipelines: they provide fast relevance scores that you can use to rank, reorder, filter, or otherwise post-process the passages your retriever returns.
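
The rank-and-filter step described above can be sketched in plain Python. This is a minimal illustration, not how `NVIDIARerank` works internally: the hypothetical `toy_score` function (simple word overlap) stands in for the reranking model, and `top_n` and `threshold` are illustrative parameter names for this sketch only.

```python
from typing import Callable, List, Tuple


def toy_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a reranking model: fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank(
    query: str,
    passages: List[str],
    score_fn: Callable[[str, str], float] = toy_score,
    top_n: int = 2,
    threshold: float = 0.0,
) -> List[Tuple[float, str]]:
    """Score every passage, sort best-first, keep the top_n, then drop passages at or below threshold."""
    scored = [(score_fn(query, p), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, p) for s, p in scored[:top_n] if s > threshold]


passages = [
    "The weather was cold and rainy.",
    "The president nominated Ketanji Brown Jackson to the Supreme Court.",
    "Jackson is a former public defender.",
]
for score, passage in rerank("Ketanji Brown Jackson nomination", passages):
    print(f"{score:.2f}  {passage}")
```

A reranker can only reorder or drop passages the retriever already returned, so retrievers are often set to fetch more candidates than the final `top_n`. In this notebook the FAISS retriever returns `k=3` chunks and the reranker is called with `top_n=5`, so reranking scores and reorders those three chunks without discarding any.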

Let's initialize these models for use later:

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank

# Older model names, kept commented out for reference:
# llm = ChatNVIDIA(model="ai-mixtral-8x7b-instruct")
# embedder = NVIDIAEmbeddings(model="ai-embed-qa-4")
# reranker = NVIDIARerank(model="ai-rerank-qa-mistral-4b")

# Model IDs as returned by get_available_models() above
llm = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")
embedder = NVIDIAEmbeddings(model="nvidia/embed-qa-4")
reranker = NVIDIARerank(model="nvidia/rerank-qa-mistral-4b")
```

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the sample document and split it into overlapping chunks
documents = TextLoader("../../modules/state_of_the_union.txt").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
# Tag each chunk with its position so results can be traced back later
for idx, text in enumerate(texts):
    text.metadata["id"] = idx

# Embed the chunks, index them in FAISS, and retrieve the 3 nearest chunks per query
retriever = FAISS.from_documents(texts, embedder).as_retriever(search_kwargs={"k": 3})

query = "What did the president say about Ketanji Brown Jackson"
docs = retriever.invoke(query)

print("\nDoc Snippets:")
for doc in docs:
    print(repr(doc.page_content[:200]) + "...")
    print({k: v for k, v in doc.dict().items() if k != "page_content"})
```

```
Doc Snippets:
'One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court o'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 73}, 'type': 'Document'}
'As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \n\nWhile it often appear'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 79}, 'type': 'Document'}
'And I know you’re tired, frustrated, and exhausted. \n\nBut I also know this. \n\nBecause of the progress we’ve made, because of your resilience and the tools we have, tonight I can say  \nwe are moving fo'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 55}, 'type': 'Document'}
```


```python
# Score each retrieved chunk against the query with the reranking model
top_docs = reranker.compress_documents(docs, query, top_n=5)

print("Query:", query)

print("\nMost Relevant Chunks:")
for doc in top_docs:
    print(repr(doc.page_content[:100]) + "...")
    print({k: v for k, v in doc.dict().items() if k != "page_content"})

# Keep only chunks whose relevance_score is above 0 (treat a missing score as 0)
print("\n'Relevant' Documents:")
for doc in top_docs:
    if doc.metadata.get("relevance_score", 0) > 0:
        print(doc.page_content)
```

```
Query: What did the president say about Ketanji Brown Jackson

Most Relevant Chunks:
'One of the most serious constitutional responsibilities a President has is nominating someone to ser'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 73, 'relevance_score': 0.1844482421875}, 'type': 'Document'}
'As I said last year, especially to our younger transgender Americans, I will always have your back a'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 79, 'relevance_score': -16.078125}, 'type': 'Document'}
'And I know you’re tired, frustrated, and exhausted. \n\nBut I also know this. \n\nBecause of the progres'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 55, 'relevance_score': -18.046875}, 'type': 'Document'}

'Relevant' Documents:
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Ci
```
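
The `relevance_score` values above are the raw `logit` values the endpoint returns (they match the raw response in the next cell), so they are unbounded and the `> 0` check is a cutoff on the logit. Assuming these are ordinary logits (log-odds), as the field name suggests, a logistic sigmoid maps them into the range (0, 1), with a logit of 0 mapping to 0.5. A minimal sketch using the three scores printed above; the `sigmoid` helper is written for this example and is not part of `langchain-nvidia-ai-endpoints`:

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit (log-odds) to a value in [0, 1]."""
    # Branch on the sign so math.exp never gets a large positive argument (avoids OverflowError)
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# relevance_score values printed by the reranker above, keyed by chunk id
scores = {73: 0.1844482421875, 79: -16.078125, 55: -18.046875}

for chunk_id, logit in scores.items():
    print(f"chunk {chunk_id}: logit={logit:+.4f}  sigmoid={sigmoid(logit):.6f}")

# A cutoff of 0 on the logit is equivalent to a cutoff of 0.5 on the sigmoid
relevant = [cid for cid, logit in scores.items() if sigmoid(logit) > 0.5]
print(relevant)  # → [73]
```

Because the sigmoid is monotonic, converting the scores does not change the ranking; it only makes a threshold easier to read as a probability-like value, under the log-odds assumption above.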

```python
# Inspect the raw JSON response from the last reranking call
reranker.client.last_response.json()
```

```
{'rankings': [{'index': 0, 'logit': 0.1844482421875},
  {'index': 1, 'logit': -16.078125},
  {'index': 2, 'logit': -18.046875}]}
```
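
The raw response identifies each passage only by its position in the request (`index`) and its score (`logit`). The sketch below shows one way such a response could be joined back onto the input passages. It is an illustration under the assumption that `index` refers to the order in which the `docs` were sent, not the package's own code; `attach_scores` is a hypothetical helper, the response dict is copied from the output above, and the passage strings are shortened placeholders.

```python
from typing import Dict, List


def attach_scores(passages: List[str], response: Dict) -> List[Dict]:
    """Pair each ranking entry with the passage at its `index`, best score first."""
    results = [
        {
            "index": item["index"],
            "text": passages[item["index"]],
            "relevance_score": item["logit"],
        }
        for item in response["rankings"]
    ]
    # Sort by score so the result does not depend on the order of entries in the response
    return sorted(results, key=lambda r: r["relevance_score"], reverse=True)


response = {
    "rankings": [
        {"index": 0, "logit": 0.1844482421875},
        {"index": 1, "logit": -16.078125},
        {"index": 2, "logit": -18.046875},
    ]
}
# Shortened stand-ins for the three retrieved chunks, in the order they were sent
passages = [
    "One of the most serious constitutional responsibilities a President has ...",
    "As I said last year, especially to our younger transgender Americans ...",
    "And I know you're tired, frustrated, and exhausted. ...",
]

for r in attach_scores(passages, response):
    print(f"{r['relevance_score']:+.3f}  {r['text'][:40]}")
```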

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.chains import RetrievalQA

# Wrap the FAISS retriever so that its results are reranked by NVIDIARerank
compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranker, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke(query)
print("Most Relevant Documents:", [doc.metadata["id"] for doc in compressed_docs])

# Answer the query with the LLM, using the reranked chunks as context
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
chain.invoke(query)
```

```
Most Relevant Documents: [73, 79, 55]
```

```
{'query': 'What did the president say about Ketanji Brown Jackson',
 'result': " The president, Joe Biden, nominated Ketanji Brown Jackson to serve on the United States Supreme Court four days ago. Judge Jackson is one of the nation's top legal minds and will continue Justice Breyer's legacy of excellence."}
```