# Building a RAG (Retrieval Augmented Generation) Application

This notebook demonstrates how to build a Retrieval Augmented Generation (RAG) system for question answering over text data sources. We'll use LangChain, LangGraph, and other libraries to create a simple but powerful RAG application.

## 1. Setup Environment

First, let's set up the environment variables for LangSmith tracing and API keys for required services.

In [1]:
import os
from dotenv import load_dotenv

# Set TOKENIZERS_PARALLELISM to False to avoid deadlocks and warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Load environment variables from .env file
load_dotenv()

# Set up LangSmith tracing. LANGCHAIN_API_KEY is read from the .env file
# loaded above; never hardcode API keys in a notebook.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "RAG Application"
assert os.getenv("LANGCHAIN_API_KEY"), "LANGCHAIN_API_KEY not found in environment variables"

# Verify the Groq API key is available
assert os.getenv("GROQ_API_KEY"), "GROQ_API_KEY not found in environment variables"
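
When several keys are required, it can help to report every missing variable at once instead of failing on the first assertion. The following is a minimal sketch using only the standard library; `missing_env_vars` and the `fake_env` values are hypothetical names introduced for illustration, not part of any library used in this notebook.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names from `names` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Hypothetical stand-in environment, used here instead of the real os.environ
fake_env = {"GROQ_API_KEY": "example-value", "LANGCHAIN_API_KEY": ""}
print(missing_env_vars(["GROQ_API_KEY", "LANGCHAIN_API_KEY"], fake_env))
# prints ['LANGCHAIN_API_KEY']
```

Calling `missing_env_vars([...])` without the second argument checks the real process environment, so it can run right after `load_dotenv()`.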

## 2. Install Required Libraries

Install the libraries this notebook imports: LangChain, LangGraph, the Groq and Hugging Face integrations, and the PDF and dotenv helpers.

In [4]:
# Run this cell once to install the required packages
%pip install langchain langchain-community langchain-huggingface langchain-groq langgraph sentence-transformers pypdf python-dotenv

Note: you may need to restart the kernel to use updated packages.


## 3. Load and Chunk Data

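We'll use PyPDFLoader to load a local PDF, which yields one Document per page. Each page serves as a chunk in this notebook; for long pages, RecursiveCharacterTextSplitter can split them further into smaller, overlapping chunks.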

In [None]:

from langchain_community.document_loaders import PyPDFLoader

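loader = PyPDFLoader("./final_review.pdf")
pages = []
# Top-level `async for` works in Jupyter, which runs cells inside an event loop;
# in a plain script, use `pages = loader.load()` instead.
async for page in loader.alazy_load():
    pages.append(page)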

Ignoring wrong pointing object 7 0 (offset 0)
Ignoring wrong pointing object 9 0 (offset 0)
Ignoring wrong pointing object 56 0 (offset 0)
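
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping chunking in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers natural separators such as paragraph and line breaks); `split_with_overlap` is a hypothetical helper that cuts fixed-width character windows.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some text twice.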


## 4. Index Data

Embed the document chunks using an embeddings model and store them in a vector store.

In [157]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_groq import ChatGroq

# Initialize embeddings model
# thenlper/gte-large produces 1024-dimensional embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="thenlper/gte-large",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True, "batch_size": 16}
)

# Create vector store from documents
vector_store = InMemoryVectorStore.from_documents(pages, embeddings)

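# Optional: split the pages into smaller chunks and index those as well
# from langchain_text_splitters import RecursiveCharacterTextSplitter
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=600)
# all_splits = text_splitter.split_documents(pages)
# _ = vector_store.add_documents(documents=all_splits)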

# Sanity check: print the pages most similar to a sample query
docs = vector_store.similarity_search("Problem statement", k=4)
for doc in docs:
    print(f'Page {doc.metadata["page"]}: {doc.page_content}\n')

# Create a retriever
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}
)

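
Conceptually, a similarity search embeds the query and ranks the stored vectors by how close they are to it. The sketch below illustrates that idea with cosine similarity over toy 3-dimensional vectors; the vectors, texts, and the `top_k` helper are made up for illustration, and a real vector store is free to implement the ranking differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k):
    """Return the k (text, score) pairs most similar to query_vector."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (assumed values)
indexed = [
    ("problem statement", [1.0, 0.0, 0.0]),
    ("domain overview", [0.0, 1.0, 0.0]),
    ("processing module", [0.6, 0.0, 0.8]),
]
for text, score in top_k([1.0, 0.0, 0.1], indexed, k=2):
    print(f"{score:.3f}  {text}")
```

Because the embeddings above are created with `normalize_embeddings=True`, cosine similarity and the plain dot product give the same ranking for them.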

## 5. Define Application State

Define a TypedDict to represent the state of the application, including question, context, and answer.

In [144]:
from typing import TypedDict, List, Optional
from langchain_core.documents import Document

class RAGState(TypedDict):
    """State for the RAG application."""
    question: str
    retrieved_documents: Optional[List[Document]]
    context: Optional[str]
    answer: Optional[str]

## 6. Implement Retrieval and Generation Steps

Define the retrieval step to search for relevant documents and the generation step to produce answers using a chat model.

In [147]:
from langchain_core.prompts import ChatPromptTemplate

def retrieve(state: RAGState):
    """Retrieve relevant documents based on the question."""
    question = state["question"]
    retrieved_docs = retriever.invoke(question)
    sorted_results = sorted(
        retrieved_docs,
        key=lambda doc: doc.metadata.get('page', float('inf')) # Use .get() with a default for safety
    )
    context = "\n\n".join([doc.page_content for doc in sorted_results])
    
    return {
        "question": question,
        "retrieved_documents": retrieved_docs,
        "context": context,
        "answer": None
    }

def generate_answer(state: RAGState):
    """Generate an answer based on the question and retrieved context."""
    # Initialize Groq LLM
    llm = ChatGroq(
        model="meta-llama/llama-4-maverick-17b-128e-instruct",  # Llama 4 Maverick served by Groq
        api_key=os.getenv("GROQ_API_KEY"),
        temperature=0.2  # Lower temperature for more factual responses
    )
    
    prompt_template = """
    You are a helpful AI assistant that answers questions based on the provided context.
    
    Context:
    {context}
    
    Question:
    {question}
    
    Answer the question based only on the provided context. Be concise, accurate, and helpful.
    If the answer cannot be determined from the context, say so.
    """
    
    prompt = ChatPromptTemplate.from_template(prompt_template)
    
    # Generate answer
    answer = llm.invoke(
        prompt.format(
            context=state["context"],
            question=state["question"]
        )
    ).content
    
    return {
        "question": state["question"],
        "retrieved_documents": state["retrieved_documents"],
        "context": state["context"],
        "answer": answer
    }
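
The context-building part of the retrieval step can be tried in isolation, without a retriever or an LLM. In the sketch below, `FakeDoc` is a hypothetical stand-in for LangChain's `Document`, `build_context` mirrors the sort-and-join logic above, and `PROMPT` is a shortened version of the prompt template.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document (assumption for this sketch)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


PROMPT = (
    "Context:\n{context}\n\n"
    "Question:\n{question}\n\n"
    "Answer the question based only on the provided context."
)


def build_context(docs):
    """Order documents by page number (missing pages last) and join their text."""
    ordered = sorted(docs, key=lambda d: d.metadata.get("page", float("inf")))
    return "\n\n".join(d.page_content for d in ordered)


docs = [
    FakeDoc("Processing module details.", {"page": 7}),
    FakeDoc("Unpaged appendix."),
    FakeDoc("Problem statement.", {"page": 2}),
]
context = build_context(docs)
print(PROMPT.format(context=context, question="What does the processing module do?"))
```

Sorting by page puts the retrieved passages back in reading order, which can help the model when an explanation spans consecutive pages.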

## 7. Build and Compile Application Graph

Use LangGraph to define the control flow of the application and compile it into a graph object.

In [150]:
from langgraph.graph import START, StateGraph

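# Build the graph: add_sequence registers both functions as nodes (named after
# the functions) and connects retrieve -> generate_answer
graph_builder = StateGraph(RAGState).add_sequence([retrieve, generate_answer])

# Make "retrieve" the entry point of the graph
graph_builder.add_edge(START, "retrieve")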

# Compile the graph
rag_app = graph_builder.compile()
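
For a two-step linear graph, the control flow amounts to "run each step in order and merge what it returns into the state". The plain-Python sketch below illustrates that idea; it is not LangGraph's actual execution engine, and `run_sequence`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins.

```python
def run_sequence(steps, state):
    """Run each step in order, merging the dict it returns into the state."""
    current = dict(state)  # copy so the caller's dict is not mutated
    for step in steps:
        update = step(current)
        current = {**current, **update}  # returned keys overwrite existing ones
    return current


def fake_retrieve(state):
    # Stand-in for the real retriever call
    return {"context": f"(documents matching '{state['question']}')"}


def fake_generate(state):
    # Stand-in for the real LLM call
    return {"answer": f"Answer drawn from {state['context']}"}


result = run_sequence([fake_retrieve, fake_generate], {"question": "Processing Module"})
print(result["answer"])
# prints: Answer drawn from (documents matching 'Processing Module')
```

The stand-in steps return only the keys they change. As far as I understand LangGraph's default handling of state keys without a reducer, its nodes can do the same, so returning the full state from `retrieve` and `generate_answer` is optional.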

## 8. Test Application

Invoke the application with a sample question and display the retrieved context and generated answer.

In [155]:
# Test the RAG application with a sample question
question = "Processing Module"

# Initialize the state with the question
initial_state = {"question": question, "retrieved_documents": None, "context": None, "answer": None}

# Run the application
result = rag_app.invoke(initial_state)

print("Question:")
print(question)
print("\n" + "-"*50 + "\n")

print("Retrieved Context (sample):")
print(result["context"][:1000] + "...\n")  # show only the first 1000 characters
print("-"*50 + "\n")

print("Generated Answer:")
print(result["answer"])

Question:
Processing Module

--------------------------------------------------

Retrieved Context (sample):
3. Domain : 
The domain of the Price Pulse project encompasses a broad spectrum of e-commerce, consumer 
technology, price optimization tools, and web application development, each contributing to the project's 
mission of simplifying the online shopping experience for users. 
E-commerce 
E-commerce is the backbone of online retail, where consumers access digital marketplaces to purchase goods 
and services. These platforms feature dynamic pricing, influenced by factors such as demand, competition, 
and promotions, which often result in fluctuating prices. In this environment, consumers face the challenge 
of monitoring prices across various platforms and taking advantage of the best deals in real-time. Price Pulse 
addresses this issue by focusing on real-time price tracking, allowing users to monitor product prices across 
multiple e-commerce sites. The solution helps users id

In [25]:
# Try another question
question = "What are the challenges in building reliable AI agents?"

# Initialize the state with the question
initial_state = {"question": question, "retrieved_documents": None, "context": None, "answer": None}

# Run the application
result = rag_app.invoke(initial_state)

print("Question:")
print(question)
print("\n" + "-"*50 + "\n")

print("Generated Answer:")
print(result["answer"])

Question:
What are the challenges in building reliable AI agents?

--------------------------------------------------

Generated Answer:
The challenges in building reliable AI agents include the reliability of the natural language interface, as LLMs may make formatting errors and exhibit rebellious behavior, such as refusing to follow instructions.


## Conclusion

In this notebook, we've built a complete RAG application that:
1. Loads document data from a PDF file, using each page as a chunk
2. Creates embeddings and a vector store for semantic search
3. Implements a retrieval step to find relevant context
4. Generates answers using an LLM with the retrieved context
5. Structures the application flow using LangGraph

This pattern can be extended with additional features like:
- Adding a self-critique step to review and improve answers
- Implementing human feedback
- Adding memory for conversation history
- Supporting multiple data sources