# Building a RAG System: Step by Step Tutorial

This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using LlamaIndex with Google's Gemini models. We'll break down each component to understand how RAG works.

## What is RAG?
RAG (Retrieval Augmented Generation) is a technique that enhances Large Language Models by:
1. Storing domain-specific knowledge in a vector database
2. Retrieving relevant information when needed
3. Using that information to generate more accurate and contextual responses

Let's build one step by step!

## Step 1: Setup and Installation

First, let's install the required packages:

In [1]:
!pip install llama-index-core google-ai-generativelanguage llama-index-readers-file llama-index-llms-langchain llama-index-embeddings-langchain langchain_community langchain-google-genai python-dotenv requests



## Step 2: Configure Environment

You'll need a Gemini API key to proceed. Let's set up our environment:

In [2]:
import os
from getpass import getpass

# Safely input your API key
# See https://ai.google.dev/gemini-api/docs/api-key

os.environ["GOOGLE_API_KEY"] = getpass("Enter your Gemini API key: ")
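When a notebook is re-run often, prompting for the key every time gets tedious. The following is a minimal sketch, not part of the original notebook, of a helper that prompts only when the variable is missing. The name `ensure_api_key` and its `env` and `prompt` parameters are hypothetical and exist only for illustration.

```python
import os
from getpass import getpass


def ensure_api_key(name="GOOGLE_API_KEY", env=None, prompt=getpass):
    """Return the key stored under `name`, prompting only when it is missing.

    `env` defaults to os.environ; `prompt` defaults to getpass so the key
    is not echoed. Both are parameters only to keep the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        key = prompt(f"Enter a value for {name}: ").strip()
        if not key:
            raise ValueError(f"{name} must not be empty")
        env[name] = key  # remember it for the rest of the session
    return key
```

In the notebook, calling `ensure_api_key()` once would play the same role as the assignment in the cell above.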

## Step 3: Initialize Settings

Let's set up our LLM and embedding models. We'll use Google's Gemini models through the LangChain wrappers:

In [3]:
from llama_index.core import Settings
from langchain_google_genai import ChatGoogleGenerativeAI
from llama_index.llms.langchain import LangChainLLM
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from llama_index.embeddings.langchain import LangchainEmbedding
import logging
import sys

# basicConfig already attaches a stdout handler to the root logger;
# adding a second StreamHandler would print every log line twice.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

# Initialize settings with specific models
# See https://ai.google.dev/gemini-api/docs/models#text-embedding
Settings.llm = LangChainLLM(ChatGoogleGenerativeAI(model="gemini-2.0-flash"))
Settings.embed_model = LangchainEmbedding(GoogleGenerativeAIEmbeddings(model="models/text-embedding-004"))

# Let's see what an embedding looks like
info = "Settings initialized!"
print(Settings.embed_model.get_text_embedding(info))
print(info)

[0.01196148619055748, -0.01046975888311863, -0.06205957755446434, -0.022057529538869858, 0.02292407676577568, 0.005410132464021444, 0.03177236020565033, 0.015446889214217663, -0.026118792593479156, 0.0399305559694767, -0.007852270267903805, 0.03176525607705116, 0.06875719130039215, -0.0050914837047457695, -0.03309789299964905, -0.024360012263059616, 0.021406004205346107, 0.004017258062958717, -0.09305911511182785, 0.025644810870289803, 0.014212378300726414, -0.0319548100233078, 0.016463130712509155, 0.0061324238777160645, -0.03154753893613815, -0.029837684705853462, -0.009131282567977905, -0.029029587283730507, -9.827438771026209e-05, -0.011392872780561447, 0.05288812518119812, 0.06295164674520493, 0.010872931219637394, -0.05070597305893898, 0.03317609429359436, 0.006293145008385181, -0.0031746113672852516, 0.016111960634589195, 0.03397845849394798, -0.049869079142808914, -0.10596867650747299, 0.021181650459766388, 0.002663138322532177, 0.04371045157313347, -0.01763414405286312, -0.012
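An embedding is just a list of floats, and retrieval works by comparing such lists. A common comparison is cosine similarity; the sketch below implements it in plain Python to show the idea. The three-dimensional vectors are made-up stand-ins for real embeddings, and the similarity measure a given vector store actually uses is an assumption here, not something this notebook confirms.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (made-up values).
query = [1.0, 0.0, 1.0]
close = [2.0, 0.0, 2.0]  # same direction as `query`, so similarity is about 1
far = [0.0, 1.0, 0.0]    # orthogonal to `query`, so similarity is 0

print(cosine_similarity(query, close))
print(cosine_similarity(query, far))
```

To try it on real embeddings, pass two results of `Settings.embed_model.get_text_embedding(...)` to `cosine_similarity`.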

## Step 4: Create Sample Documents

Let's gather some sample documents to demonstrate RAG. We'll create a data directory and download a text essay and a PDF into it:

In [4]:
import os
import requests

# Create a data directory
os.makedirs("data", exist_ok=True)
# Download Paul Graham's essay
worked_on_url = "https://gist.githubusercontent.com/gardner/8181cd9c74dcf310b6e440e3bc01c2ff/raw/f739fa60ff00eca8a3a15632e0c06b054c97c7c1/what_i_worked_on.txt"
worked_on_path = "data/what_i_worked_on.txt"

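# Fail loudly on HTTP errors instead of saving an error page
response = requests.get(worked_on_url, timeout=30)
response.raise_for_status()
with open(worked_on_path, "w", encoding="utf-8") as f:
    f.write(response.text)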

pdf_url = "https://www.mpi.govt.nz/dmsdocument/68256-Proposals-to-Amend-the-New-Zealand-Food-Notice-Maximum-Residue-Levels-for-Agricultural-Compounds"
pdf_path = "data/nz_food_notice.pdf"

# Download the PDF (binary mode, since a PDF is not text)
response = requests.get(pdf_url, timeout=30)
response.raise_for_status()
with open(pdf_path, "wb") as f:
    f.write(response.content)


print("Sample documents created!")

Sample documents created!
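A successful HTTP response does not guarantee that the body is the file you expected: a server may answer with an HTML page instead, which is what the `invalid pdf header: b'<!DOC'` message in the Step 5 output suggests happened here. As a hedged sketch, a download can be checked against the `%PDF-` signature that PDF files start with before it is saved. The helper name `looks_like_pdf` is hypothetical.

```python
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if `data` begins with the PDF file signature."""
    return data.startswith(PDF_SIGNATURE)


# Made-up payloads for illustration.
print(looks_like_pdf(b"%PDF-1.7\n..."))           # True
print(looks_like_pdf(b"<!DOCTYPE html><html>"))   # False
```

In the download cell, `looks_like_pdf(response.content)` could gate the write so that an HTML page is never saved under a `.pdf` name.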


## Step 5: Create and Store Embeddings

Now, let's create embeddings for our documents and store them in a vector index:

In [5]:
from llama_index.core.indices import VectorStoreIndex
from llama_index.core.readers import SimpleDirectoryReader

# Load documents
# See https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/
reader = SimpleDirectoryReader("data")
documents = reader.load_data()

# Create the index
index = VectorStoreIndex.from_documents(
    documents,
    show_progress=True
)

# Store the index
index.storage_context.persist("storage")

print("Index created and stored!")



invalid pdf header: b'<!DOC'
EOF marker not found
invalid pdf header: b'<!DOC'
EOF marker not found
invalid pdf header: b'<!DOC'
EOF marker not found
Failed to load file /content/data/nz_food_notice.pdf with error: RetryError[<Future at 0x7ccba71e3f10 state=finished raised PdfStreamError>]. Skipping...


Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/21 [00:00<?, ?it/s]

Index created and stored!
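The progress bars above show 1 document parsed but 21 embeddings generated: the index embeds chunks of a document rather than the whole file. The sketch below illustrates the general idea with a simple character-based splitter that uses overlapping windows. It is not LlamaIndex's splitter, and `chunk_text`, `chunk_size`, and `overlap` are names chosen here for illustration only.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `overlap` characters, so a sentence cut at
    one boundary still appears intact in a neighboring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Smaller chunks make retrieval more precise but give the LLM less surrounding context per hit; overlap trades some storage for fewer ideas split across a boundary.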


## Step 6: Query the RAG System

Let's try querying our RAG system to see how it retrieves and generates responses:

In [None]:
# Create a query engine
query_engine = index.as_query_engine()

# Try some queries
questions = [
    "What did the author work on?",
    "What was the most recent thing the author worked on?",
]

for question in questions:
    print(f"\nQuestion: {question}")
    response = query_engine.query(question)
    print(f"Answer: {response}")

## Understanding What Just Happened

Let's break down the RAG process we just implemented:

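1. **Document Loading**: We loaded the files in the data directory using SimpleDirectoryReader. In the run shown above, only the essay was loaded; the PDF was skipped because the downloaded file was not a valid PDF
2. **Embedding Creation**: The loaded document was split into chunks (nodes), and each chunk was converted into an embedding using Google's text-embedding-004 model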
3. **Vector Storage**: The embeddings were stored in a vector index for efficient retrieval
4. **Query Processing**: When we ask a question:
   - The question is converted to an embedding
   - The most similar chunks are retrieved from the vector store
   - The LLM uses the retrieved context to generate an answer

This is the essence of RAG: combining retrieval with generation to provide accurate, contextual responses!
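The query steps listed above can be sketched end to end in plain Python. This is a toy illustration under loud assumptions: `embed` uses word counts instead of a real embedding model, the chunk texts are made up, and the final prompt is only printed rather than sent to an LLM. None of the function names come from LlamaIndex.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system calls an embedding model."""
    return Counter(text.lower().split())


def similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2):
    """Return the `top_k` chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Assemble the text that would be handed to the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Made-up chunks standing in for the indexed document.
chunks = [
    "the author worked on writing and programming",
    "bananas are rich in potassium",
    "the index stores one vector per chunk",
]
question = "what did the author work on"
top = retrieve(question, chunks, top_k=1)
print(build_prompt(question, top))
```

Here the first chunk ranks highest because it shares the most words with the question; with real embeddings, the ranking reflects semantic closeness rather than word overlap.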