# 2.5 RAG Question & Answering

![RAG - query pipeline](https://python.langchain.com/assets/images/rag_retrieval_generation-1046a4668d6bb08786ef73c56d4f228a.png)

## Setup

### Install dependencies

In [None]:
%pip install python-dotenv~=1.0 docarray~=0.40.0 pypdf~=5.1 --upgrade --quiet
%pip install chromadb~=0.5.18 sentence-transformers~=3.3 --upgrade --quiet 
%pip install langchain~=0.3.7 langchain_openai~=0.2.6 langchain_community~=0.3.5 --upgrade --quiet

# If running locally, you can do this instead:
#%pip install -r ../requirements.txt

### Load environment variables

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())

# If running in Google Colab, you can use this code instead:
# from google.colab import userdata
# os.environ["AZURE_OPENAI_API_KEY"] = userdata.get("AZURE_OPENAI_API_KEY")
# os.environ["AZURE_OPENAI_ENDPOINT"] = userdata.get("AZURE_OPENAI_ENDPOINT")
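If you want the notebook to stop early when the Azure credentials did not load, you could check for them first. This is a minimal sketch that assumes your `.env` defines the same two variable names as the Colab snippet. `missing_env_vars` is a hypothetical helper and is not part of python-dotenv:

```python
import os
from typing import Iterable, List, Mapping

# Assumed variable names (taken from the Colab snippet); adjust if your .env differs.
REQUIRED_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_env_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Because `env` accepts any mapping, you can pass a plain dict to try the helper without touching the real environment.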

### Setup Chat and Embedding Models

In [None]:
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
api_version = "2024-10-01-preview"
llm = AzureChatOpenAI(deployment_name="gpt-4o", temperature=0.0, api_version=api_version)
embedding_model = AzureOpenAIEmbeddings(model="text-embedding-3-large", api_version=api_version)

### Setup LangSmith tracing for this notebook

In [None]:
import os

# API keys etc. are set in the .env file
# my_name = "Totoro"
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_PROJECT"] = f"tokyo24-test-{my_name}"

### Setup path to data 

In [None]:
data_path = "../data"

## Initialize VectorDB

We've discussed `Document Loading` and `Splitting` as well as `Indexing` and `Retrieval` already.

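Let's load the documents and build our vector DB as in chapter 2.3. _If you already have a persisted vector DB, you can skip to "Vector DB - Indexing / Store" below. In that case, load the DB from `persist_directory` instead of re-running `Chroma.from_documents`, because that call needs `splits` from the earlier cells._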

### Load docs

In [None]:
from langchain_community.document_loaders import PyPDFLoader

# Load PDFs
# Note: Lecture01 appears twice, so every one of its pages is loaded (and later indexed) twice.
loaders = [
    PyPDFLoader(f"{data_path}/MachineLearning-Lecture01.pdf"),
    PyPDFLoader(f"{data_path}/MachineLearning-Lecture01.pdf"),
    PyPDFLoader(f"{data_path}/MachineLearning-Lecture03.pdf")
]
docs = []
for loader in loaders:
    docs.extend(loader.load())
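`MachineLearning-Lecture01.pdf` appears twice in `loaders`, so `docs` holds two identical copies of each of its pages. If you do not want those duplicates, you can drop them before splitting. The following is a minimal sketch that uses a content hash. `Page` is a hypothetical stand-in exposing the same `page_content` and `metadata` attributes as LangChain's `Document`, and the pages are toy data:

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    """Hypothetical stand-in for langchain_core.documents.Document (same two attributes)."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def dedupe_by_content(pages: List[Page]) -> List[Page]:
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for page in pages:
        digest = hashlib.sha256(page.page_content.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique


# Toy pages: the first two simulate the same PDF page loaded twice.
pages = [
    Page("toy text of page A", {"source": "Lecture01.pdf", "page": 0}),
    Page("toy text of page A", {"source": "Lecture01.pdf", "page": 0}),
    Page("toy text of page B", {"source": "Lecture03.pdf", "page": 0}),
]
print(len(dedupe_by_content(pages)))  # 2
```

Real `Document` objects expose the same attribute, so `dedupe_by_content(docs)` should work on them too. Whether to deduplicate here or keep the duplicates as "messy data" is up to you.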

### Split docs

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=150
)
splits = text_splitter.split_documents(docs)
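`chunk_size` caps the length of each chunk in characters. `chunk_overlap` controls how much text neighbouring chunks share, so a sentence cut at a boundary still appears whole in one of them. The real `RecursiveCharacterTextSplitter` also tries to break on separators (by default `"\n\n"`, `"\n"`, `" "`, then `""`) so that paragraphs stay intact. The simplified fixed-window sketch below ignores separators and only shows how the two parameters interact:

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows overlapping by chunk_overlap.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter,
    it never looks for paragraph or word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this many chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With the notebook's settings, each window in this simplified version would start 1,350 characters after the previous one and share 150 characters with it.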

### Vector DB - Indexing / Store


In [None]:
from langchain_community.vectorstores import Chroma

# Optional persist_directory to save the database
persist_directory = './db/chroma-ML-docs/'

vectordb = Chroma.from_documents(
    collection_name="ml_docs",
    documents=splits,
    embedding=embedding_model,
    #persist_directory=persist_directory # Optionally persist the database
)

In [None]:
print(vectordb._collection.count())

In [None]:
question = "What are major topics for this class?"
retrieved_docs = vectordb.similarity_search(question, k=3)
len(retrieved_docs)
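`similarity_search` embeds the question and returns the `k` stored chunks whose vectors lie closest to it. The following toy version of that ranking step uses cosine similarity on hand-made 2-D vectors. Chroma's own default distance is L2, but for unit-length vectors the two rank results identically, since $\lVert a-b\rVert^2 = 2 - 2\cos(a,b)$. The helpers are hypothetical and do not reproduce Chroma's implementation:

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar vectors, best first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; real text-embedding-3-large vectors have 3072 dimensions.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query_vec = [0.9, 0.1]
print([i for i, _ in top_k(query_vec, doc_vecs, k=2)])  # [0, 2]
```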

## Create a RAG chain

### Simple RAG chain

In [None]:
from typing import List
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser

def format_docs(docs: List[Document]) -> str:
    return "\n\n".join(doc.page_content for doc in docs)

# Setup chain using LCEL
# Note: only the formatted context reaches the LLM here; the question itself is not passed on.
qa_chain = (
    vectordb.as_retriever()
    | format_docs
    | llm
    | StrOutputParser()
)

In [None]:
qa_chain.invoke(question)
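In LCEL, `a | b` builds a new runnable that feeds the output of `a` into `b`, and plain functions such as `format_docs` are wrapped automatically. Here is a toy model of that composition. `Step`, `fake_retriever` and `fake_llm` are hypothetical stand-ins and are not LangChain's `Runnable` classes:

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a LangChain Runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: Any) -> "Step":
        # Wrap plain callables, similar to how LCEL coerces functions in a pipe.
        nxt = other if isinstance(other, Step) else Step(other)
        # The composed step runs self first, then passes its output to nxt.
        return Step(lambda x: nxt.invoke(self.invoke(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


fake_retriever = Step(lambda q: ["chunk one", "chunk two"])  # hypothetical; ignores the query
fake_llm = lambda prompt: f"LLM saw: {prompt!r}"            # hypothetical; echoes its input

chain = fake_retriever | (lambda chunks: "\n\n".join(chunks)) | fake_llm
print(chain.invoke("What are major topics for this class?"))
# LLM saw: 'chunk one\n\nchunk two'
```

The printed string contains only the joined chunks, which mirrors what the simple chain above sends to the model.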

### Using a prompt

In [None]:
from langchain_core.prompts import ChatPromptTemplate

# Build prompt
system_template = """Use the following pieces of context to answer the question. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
<context>
{context}
</context>
"""
q_and_a_prompt = ChatPromptTemplate([
    ("system", system_template),
    ("human", "{input}"),
])
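By default, `ChatPromptTemplate` fills `{context}` and `{input}` using Python f-string-style formatting. As a result, any literal `{` or `}` in the template text would have to be doubled (`{{`, `}}`). The following sketch approximates the role/text pairs the prompt produces. It uses a shortened, hypothetical template (`demo_system`) so it does not overwrite `system_template`:

```python
from typing import List, Tuple

# Hypothetical, shortened template with the same {context} slot as system_template.
demo_system = "Answer from the context only.\n<context>\n{context}\n</context>\n"


def render_messages(context: str, user_input: str) -> List[Tuple[str, str]]:
    """Approximate the (system, human) messages after placeholder filling."""
    return [
        ("system", demo_system.format(context=context)),
        ("human", "{input}".format(input=user_input)),
    ]


for role, text in render_messages("Chunk A\n\nChunk B", "Is probability a class topic?"):
    print(f"[{role}]\n{text}")
```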


#### Build a chain with the prompt, injecting the context and question

In [None]:
from langchain_core.runnables import RunnablePassthrough

# Setup chain using LCEL
qa_chain = (
    { # This is a shorthand for a RunnableMap / RunnableParallel
        "context": vectordb.as_retriever() | format_docs,
        "input": RunnablePassthrough(),
    }
    | q_and_a_prompt
    | llm
    | StrOutputParser()
)
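The dict at the head of this chain gives the same input (the question) to every branch. It then collects the results under the dict's keys, which here are exactly the `context` and `input` variables that `q_and_a_prompt` expects. Here is a sequential toy version. `run_parallel` and `fake_context` are hypothetical, and LangChain itself may run the branches concurrently:

```python
from typing import Any, Callable, Dict


def run_parallel(steps: Dict[str, Callable[[Any], Any]], x: Any) -> Dict[str, Any]:
    """Toy RunnableParallel: feed the same input to every branch, collect results by key."""
    return {key: fn(x) for key, fn in steps.items()}


passthrough = lambda x: x  # what RunnablePassthrough() does with its input
fake_context = lambda q: f"<chunks retrieved for {q!r}>"  # hypothetical retriever | format_docs

print(run_parallel({"context": fake_context, "input": passthrough}, "Is probability a class topic?"))
```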

In [None]:
question = "Is probability a class topic?"

In [None]:
qa_chain.invoke(question)

### Alternative - using helper functions to create the chain

#### Create the alternative chain

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

combine_docs_chain = create_stuff_documents_chain(llm, q_and_a_prompt)
alt_rag_chain = create_retrieval_chain(vectordb.as_retriever(), combine_docs_chain)
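`create_retrieval_chain` takes a dict with an `input` key, adds the retrieved documents under `context`, and then adds the combine-docs chain's output under `answer`. That flow explains why the cells below read `alt_result["answer"]` and `alt_result["context"]`. The following toy imitation shows only this dict flow. `toy_retrieval_chain` and its lambdas are hypothetical stand-ins, not the library code:

```python
from typing import Any, Callable, Dict, List


def toy_retrieval_chain(
    retrieve: Callable[[str], List[str]],
    answer: Callable[[Dict[str, Any]], str],
) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Imitate the dict flow of create_retrieval_chain: input -> +context -> +answer."""
    def invoke(inputs: Dict[str, Any]) -> Dict[str, Any]:
        state = dict(inputs)                          # keep the caller's keys, e.g. "input"
        state["context"] = retrieve(state["input"])   # retrieved documents
        state["answer"] = answer(state)               # answer step sees input and context
        return state
    return invoke


# Hypothetical stand-ins for the retriever and the stuff-documents chain.
toy_rag = toy_retrieval_chain(
    retrieve=lambda q: ["doc about probability", "doc about linear algebra"],
    answer=lambda s: f"{len(s['context'])} docs used to answer {s['input']!r}",
)
toy_result = toy_rag({"input": "Is probability a class topic?"})
print(sorted(toy_result))  # ['answer', 'context', 'input']
```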

In [None]:
alt_result = alt_rag_chain.invoke({"input": question})

In [None]:
alt_result["answer"]

In [None]:
# Get first source document
alt_result["context"][0]
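Each returned document keeps the metadata set by the loader, so you can list the sources behind an answer. The following sketch assumes the `source` and `page` metadata keys that `PyPDFLoader` sets, with `page` assumed to count from 0. `cite_sources` is a hypothetical helper, and the metadata shown is toy data:

```python
from typing import Dict, Iterable, List


def cite_sources(metadatas: Iterable[Dict[str, object]]) -> List[str]:
    """Build unique 'file, p. N' labels in first-seen order (page shown 1-based)."""
    labels: List[str] = []
    for meta in metadatas:
        label = str(meta.get("source", "unknown source"))
        page = meta.get("page")
        if isinstance(page, int):
            label += f", p. {page + 1}"  # assumes 0-based page numbers in metadata
        if label not in labels:
            labels.append(label)
    return labels


# Toy metadata shaped like PyPDFLoader output.
toy_metadata = [
    {"source": "../data/MachineLearning-Lecture03.pdf", "page": 0},
    {"source": "../data/MachineLearning-Lecture03.pdf", "page": 0},
    {"source": "../data/MachineLearning-Lecture01.pdf", "page": 4},
]
print(cite_sources(toy_metadata))
```

For the notebook's result, you could call `cite_sources(doc.metadata for doc in alt_result["context"])`.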

#### Have a look at the trace in LangSmith
Example: https://smith.langchain.com/public/6d3ebe1f-fc1e-434d-90b5-f60e2fe1d286/r

### Next step will add chat memory!