# Accelerating RAG with NVIDIA and Redis

[NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA-hosted API endpoints for NVIDIA AI Foundation Models such as Mixtral 8x7B.

These models are listed in the NVIDIA API catalog and are optimized, tested, and hosted on the NVIDIA AI platform, which makes them fast and easy to evaluate, customize, and run on any accelerated stack.

This demo showcases a RAG stack built with LangChain, NVIDIA-hosted model serving, a Mistral model *(Mixtral 8x22B, per the model ID used in the code)*, and Redis as the vector database.

**Get Started Here:**

<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/RAG/nvidia_ai_rag_redis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## 1. Setup
Before we begin, we must install some required libraries, authenticate with NVIDIA, create a Redis database, and initialize other required components.


### Install required libraries

In [None]:
%pip install --upgrade -q langchain-core langchain-community langchain-nvidia-ai-endpoints
%pip install -q "unstructured[pdf]" sentence-transformers
%pip install -q "redisvl>=0.3.0"

In [None]:
import IPython

# Restart the kernel so the freshly installed packages are picked up
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

### Connect to NVIDIA hosted LLM and embedding models

In [None]:
import getpass
import os

if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvapi_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

Enter your NVIDIA API key: ··········


In [None]:
from langchain_core.messages import ChatMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Create LLM instance with the Mistral model
llm = ChatNVIDIA(
    model="mistralai/mixtral-8x22b-instruct-v0.1",
    temperature=0.1,
    top_p=1.0,
)

In [None]:
llm.invoke("Write me a one sentence encouragement.").content

' "Remember, every step you take, no matter how small, is a step towards your goal; keep going, you\'re doing great!"'

In [None]:
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

emb = NVIDIAEmbeddings()

In [None]:
len(emb.embed_documents(["test"])[0])

1024

### Fetch sample dataset

For this demo, we will use a 2022 Chevy Colorado truck brochure as the source document.


In [None]:
!mkdir -p 'data/'
!wget 'https://raw.githubusercontent.com/redis-developer/LLM-Document-Chat/main/docs/2022-chevrolet-colorado-ebrochure.pdf' -O 'data/2022-chevrolet-colorado-ebrochure.pdf'

--2024-05-23 16:48:12--  https://raw.githubusercontent.com/redis-developer/LLM-Document-Chat/main/docs/2022-chevrolet-colorado-ebrochure.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3566101 (3.4M) [application/octet-stream]
Saving to: ‘data/2022-chevrolet-colorado-ebrochure.pdf’


2024-05-23 16:48:12 (29.8 MB/s) - ‘data/2022-chevrolet-colorado-ebrochure.pdf’ saved [3566101/3566101]



### Install Redis locally
If you already have a Redis database running elsewhere with [Redis Stack](https://redis.io/docs/about/about-stack/) installed, you can skip this step and point the `redis_url` used later at that instance instead.

In [None]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install -y redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb jammy main
Starting redis-stack-server, database path /var/lib/redis-stack
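
Before loading any data, it can be worth confirming that the server actually answers. The cell below is a minimal sketch using only the standard library: it opens a TCP connection and sends an inline `PING`. The helper name `redis_is_reachable` is ours, and the default host and port assume the local, unauthenticated install from the previous cell; a server that requires a password will reply with an error and be reported as unreachable.

```python
import socket


def redis_is_reachable(host="localhost", port=6379, timeout=1.0):
    """Return True if a server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # inline command form of the Redis protocol
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timeout, unknown host, and similar failures.
        return False
    return reply.startswith(b"+PONG")


print("Redis reachable:", redis_is_reachable())
```

If this prints `False`, rerun the install cell without the `/dev/null` redirects to see what went wrong.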





## 2. RAG with NVIDIA-hosted LLM and Redis




### Prepare document for RAG

In [None]:
import os

from langchain.text_splitter import SentenceTransformersTokenTextSplitter
from langchain_community.document_loaders import UnstructuredFileLoader

# Collect the paths of all files in the data folder
data_path = "data/"
docs = [os.path.join(data_path, file) for file in os.listdir(data_path)]

print("Listing available documents ...", docs)

Listing available documents ... ['data/2022-chevrolet-colorado-ebrochure.pdf']


In [None]:
text_splitter = SentenceTransformersTokenTextSplitter()

loader = UnstructuredFileLoader(
    docs[0], mode="single", strategy="fast"
)

# Extract the text, load it as a single document, and split it into chunks
chunks = loader.load_and_split(text_splitter)

print("Done preprocessing. Created", len(chunks), "chunks")



Done preprocessing. Created 35 chunks
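
To build some intuition for what the splitter does, here is a rough sketch of overlapping-window chunking. It is not the library's implementation: whitespace-separated words stand in for model tokens, and the `tokens_per_chunk` and `chunk_overlap` values are illustrative rather than the splitter's defaults.

```python
def split_into_windows(text, tokens_per_chunk=8, chunk_overlap=2):
    """Split text into overlapping windows of whitespace-delimited tokens."""
    if chunk_overlap >= tokens_per_chunk:
        raise ValueError("chunk_overlap must be smaller than tokens_per_chunk")
    tokens = text.split()
    step = tokens_per_chunk - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + tokens_per_chunk]
        chunks.append(" ".join(window))
        if start + tokens_per_chunk >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


sample = "the 2022 colorado delivers everything you could ask for in a midsize pickup"
for i, chunk in enumerate(split_into_windows(sample, tokens_per_chunk=6, chunk_overlap=2)):
    print(i, chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.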


### Load documents to Redis vector database

In [None]:
from langchain_community.vectorstores.redis import Redis

vectordb = Redis.from_documents(
    documents=chunks,
    embedding=emb,
    redis_url="redis://localhost:6379",
)

In [None]:
retriever = vectordb.as_retriever(
    search_type="similarity_distance_threshold",
    search_kwargs={"distance_threshold": 0.4},
)
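
The idea behind a distance-threshold retriever can be sketched in a few lines. This is an illustration under assumptions, not how Redis executes the query: we assume the index uses cosine distance (1 minus cosine similarity) and that the threshold is inclusive, and the helper names and toy 2-D vectors are made up for the example.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def retrieve_within_threshold(query_vec, indexed, distance_threshold=0.4):
    """Return (text, distance) pairs with distance <= threshold, closest first."""
    scored = [(text, cosine_distance(query_vec, vec)) for text, vec in indexed]
    hits = [pair for pair in scored if pair[1] <= distance_threshold]
    return sorted(hits, key=lambda pair: pair[1])


# Toy 2-D vectors standing in for the 1024-dimensional embeddings.
toy_index = [
    ("trim levels", [1.0, 0.1]),
    ("towing capacity", [0.7, 0.7]),
    ("paint colors", [0.0, 1.0]),
]
for text, dist in retrieve_within_threshold([1.0, 0.0], toy_index):
    print(f"{text}: {dist:.3f}")
```

Unlike a fixed top-k search, a threshold can return no documents at all when nothing is close enough to the query, so the prompt should tolerate an empty context.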

### Design RAG prompt


In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    """You are an intelligent assistant specializing in the Chevy Colorado
    2022. You have access to the car manual and production information and should help
    answer users' questions based on provided context below. Be truthful and honest --
    do not make anything up if it's not clearly provided in the context below.

    User Question: {question}

    Context:\n{context}

    Answer:"""
)


### Build RAG chain

In [None]:
# Build the RAG chain using LCEL
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_core.pydantic_v1 import BaseModel


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = (
    RunnableParallel({"context": retriever | format_docs, "question": RunnablePassthrough()})
    | prompt
    | llm
    | StrOutputParser()
)


# Add typing for input
class Question(BaseModel):
    __root__: str


chain = chain.with_types(input_type=Question)

Let's test this out.

In [None]:
chain.invoke("What models are available for the chevy colorado?")

' The 2022 Chevy Colorado is available in four models: WT, LT, Z71, and ZR2. These models can come in either an extended cab or crew cab configuration. The engines available are a 2.5L 4-cylinder, a 3.6L V6, and a Duramax 2.8L turbo-diesel engine. The Duramax engine provides up to 7,700 lbs. of towing capacity and can deliver up to 30 mpg on the highway. The ZR2 model is an off-road beast with the capability to conquer tough trails. Additionally, there is an available ZR2 Bison Edition which includes 17-inch AEV.'
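
To make the data flow through the chain explicit, here is the same pipeline written as plain Python with stand-ins. `Doc`, `fake_retriever`, `fake_llm`, and the shortened `TEMPLATE` are hypothetical placeholders for the Redis retriever, `ChatNVIDIA`, and the prompt defined earlier; the sketch only shows the order of the steps, not how LCEL runs them.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(question):
    # Stand-in for the Redis retriever: always returns the same two chunks.
    return [
        Doc("The Colorado is offered as WT, LT, Z71 and ZR2."),
        Doc("Extended cab and crew cab configurations are available."),
    ]


def fake_llm(prompt_text):
    # Stand-in for ChatNVIDIA: reports the prompt size instead of answering.
    return f"(model reply to a prompt of {len(prompt_text)} characters)"


TEMPLATE = "User Question: {question}\n\nContext:\n{context}\n\nAnswer:"


def run_chain(question):
    # Step 1: the parallel step builds both prompt variables from one input.
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    # Step 2: fill the template. Step 3: call the model and return its text.
    prompt_text = TEMPLATE.format(**inputs)
    return fake_llm(prompt_text)


print(run_chain("What models are available?"))
```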

Next, we extend the chain so that it also returns the retrieved chunks alongside the answer.

### Add sources to the response

In [None]:
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)
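
The chain with sources returns a dictionary with `context`, `question`, and `answer` keys. As a sketch of how that result might be condensed into a short source list, the helper below works on a hand-built example. `Doc` and `summarize_sources` are hypothetical names, and the `source` metadata key is an assumption about what the loader records, so the code falls back to `"unknown"` when it is missing.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(result, preview_chars=60):
    """Return one short line per retrieved chunk in a chain result."""
    lines = []
    for i, doc in enumerate(result["context"], start=1):
        source = doc.metadata.get("source", "unknown")  # "source" key is an assumption
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"[{i}] {source}: {preview}")
    return lines


# Hand-built result mimicking the {"context", "question", "answer"} shape.
example = {
    "context": [
        Doc(
            "2022 colorado choose your adventure.",
            {"source": "data/2022-chevrolet-colorado-ebrochure.pdf"},
        )
    ],
    "question": "What models are available for the chevy colorado?",
    "answer": "(model answer)",
}
print(example["answer"])
for line in summarize_sources(example):
    print(line)
```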

In [None]:
rag_chain_with_source.invoke("What models are available for the chevy colorado?")

{'context': [Document(page_content='2022 colorado choose your adventure. the 2022 colorado delivers everything you could ask for in a midsize pickup. engine choices that are powerful and efficient, including an available gm - exclusive duramax® 2. 8l turbo - diesel engine that provides up to 7, 700 lbs. of towing1, 2 muscle. a zr2 off - road beast with the capability to conquer tough trails. and a comfortable interior filled with convenience and technology features. so go ahead. choose your best life in colorado. colorado crew cab zr2 in sand dune metallic with available zr2 dusk special edition. vehicle shown can tow up to 5, 000 lbs. 2, 3 1 requires colorado crew cab short box lt 2wd with available trailering package, lt convenience package and safety package. 2 maximum trailering ratings are intended for comparison purposes only. before you buy a vehicle or use it for trailering, carefully review the trailering section of the owner ’ s manual. the trailering capacity of your specifi