# RAG Tutorial

This tutorial is modified from Krish Naik's YouTube [video](https://www.youtube.com/watch?v=hH4WkgILUD4) for Harvard's GenEd 1188 on Generative AI. 

It is intended as a potential technical section activity for future students of the class.

### Import Libraries

In [33]:
# Note: LlamaIndex is under active development and its package layout changes often.
#       If an import below fails, check the current LlamaIndex documentation for the
#       up-to-date import path of that class or function.

import os
from dotenv import load_dotenv

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.response.pprint_utils import pprint_response
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

### Set Up


In [2]:
# load the variables defined in the .env file; returns True if at least one variable was set
load_dotenv()

True

In [3]:
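# recall that we stored our OpenAI API key in the .env file under the name "OPENAI_API_KEY"
# load_dotenv() has already placed it in os.environ, so we only verify that it is present
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY not found; check your .env file"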

### Process Our PDFs

In [11]:
# load every file in the "data" folder into Document objects (text plus metadata)
documents = SimpleDirectoryReader("data").load_data()

In [16]:
# build a vector index from the documents: they are split into nodes and each node is embedded
# once the index is built, we can query it directly
index=VectorStoreIndex.from_documents(documents, show_progress=True)

Parsing nodes:   0%|          | 0/25 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/37 [00:00<?, ?it/s]
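
The progress bars above count 25 items while parsing and 37 while embedding, which suggests that the loaded documents are split into smaller nodes before they are embedded. The sketch below illustrates that idea with a toy word-based splitter. The function name `split_into_chunks` and the tiny `chunk_size` and `overlap` values are assumptions made for illustration; this is not LlamaIndex's own splitter, which works on tokens and sentences.

```python
def split_into_chunks(text, chunk_size=8, overlap=2):
    """Split text into chunks of `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears in full context in one of the two chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# toy "page" of text (hypothetical input, not read from the PDFs)
page = ("The Transformer is a model architecture that relies entirely on "
        "an attention mechanism to draw global dependencies between input and output")

for i, chunk in enumerate(split_into_chunks(page)):
    print(i, chunk)
```

Each chunk would then be embedded separately, which is why the number of embeddings can be larger than the number of parsed documents.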

### Create A Query Engine to Retrieve Information

In [18]:
# the query engine retrieves the most relevant nodes from the index and passes them to the LLM to generate an answer
query_engine=index.as_query_engine()

In [28]:
response1=query_engine.query("What is a transformer")
print(response1)

The Transformer is a model architecture that relies entirely on an attention mechanism to draw global dependencies between input and output. It eschews recurrence and instead uses self-attention to compute representations of its input and output without relying on sequence-aligned recurrent neural networks or convolutional layers.
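
To make the "retrieve, then generate" step more concrete, here is a minimal sketch of how retrieved text might be combined with the question into a single prompt for the LLM. The function `build_prompt`, the template wording, and the example chunks are all hypothetical; the query engine uses its own internal template, which may differ.

```python
def build_prompt(question, retrieved_chunks):
    """Combine retrieved text chunks and the user's question into one prompt string."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


# hypothetical chunks standing in for the nodes returned by the retriever
retrieved = [
    "The Transformer relies entirely on an attention mechanism.",
    "It eschews recurrence and convolutions.",
]

prompt = build_prompt("What is a transformer", retrieved)
print(prompt)
```

The key point is that the LLM never sees the whole document collection, only the few chunks that the retriever judged most relevant to the question.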


In [29]:
response2=query_engine.query("What is YOLO")
print(response2)

YOLO is a new approach to object detection that frames the task as a regression problem to spatially separated bounding boxes and associated class probabilities. It uses a single neural network to predict bounding boxes and class probabilities directly from full images in one evaluation. YOLO is known for its speed, processing images in real-time and achieving high mean average precision compared to other real-time detection systems.


In [31]:
# use `pprint_response` for a more readable presentation of the results
# with show_source=True, it prints the final response followed by each source node and its similarity score
pprint_response(response1, show_source=True)
print("====================")
pprint_response(response2, show_source=True)

Final Response: The Transformer is a model architecture that relies
entirely on an attention mechanism to draw global dependencies between
input and output. It eschews recurrence and instead uses self-
attention to compute representations of its input and output without
relying on sequence-aligned recurrent neural networks or convolutional
layers.
______________________________________________________________________
Source Node 1/2
Node ID: 2e2aac14-82ff-4f91-8a8f-2c499a8379b1
Similarity: 0.7885361023202706
Text: Table 4: The Transformer generalizes well to English
constituency parsing (Results are on Section 23 of WSJ) Parser
Training WSJ 23 F1 Vinyals & Kaiser el al. (2014) [37] WSJ only,
discriminative 88.3 Petrov et al. (2006) [29] WSJ only, discriminative
90.4 Zhu et al. (2013) [40] WSJ only, discriminative 90.4 Dyer et al.
(2016) [8] WSJ only, disc...
______________________________________________________________________
Source Node 2/2
Node ID: 200522ec-d3f8-425f-badc-156f381c6
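
The `Similarity` value printed for each source node measures how close the node's embedding is to the embedding of the query. Assuming the score is a cosine similarity, which is a common choice for embedding retrieval, it can be computed as in the sketch below. The two-dimensional vectors are toy values chosen for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention used here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


query_vec = [3.0, 4.0]
same_direction = [6.0, 8.0]   # points the same way as the query, only longer
orthogonal = [4.0, -3.0]      # at a right angle to the query

print(cosine_similarity(query_vec, same_direction))  # 1.0
print(cosine_similarity(query_vec, orthogonal))      # 0.0
```

A score close to 1 means the node and the query point in nearly the same direction in embedding space, so a score around 0.79 indicates a fairly strong but not perfect match.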

### More Advanced Usage

In [35]:
# we can retrieve more source nodes per query by setting similarity_top_k (the outputs above show 2 source nodes by default)
retriever=VectorIndexRetriever(index=index, similarity_top_k=4)
query_engine_advanced=RetrieverQueryEngine(retriever=retriever)
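
Conceptually, `similarity_top_k` controls how many of the best-scoring nodes are kept for each query. The sketch below shows that selection step on its own. The helper `top_k` and the chunk labels and scores are made up for illustration; this is not how the retriever is implemented internally.

```python
import heapq


def top_k(scored_chunks, k):
    """Return the k (chunk, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scored_chunks, key=lambda pair: pair[1])


# made-up similarity scores for five chunks
scored = [
    ("chunk A", 0.62),
    ("chunk B", 0.81),
    ("chunk C", 0.79),
    ("chunk D", 0.45),
    ("chunk E", 0.70),
]

print(top_k(scored, 2))  # the two best matches
print(top_k(scored, 4))  # the four best matches
```

A larger k gives the LLM more context to work with, at the cost of a longer prompt and a higher chance of including weakly related text.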

In [36]:
response1_advanced = query_engine_advanced.query("What is a transformer")
pprint_response(response1_advanced, show_source=True)

Final Response: A transformer is a model architecture that relies
entirely on attention mechanisms to draw global dependencies between
input and output, without using recurrent layers or convolutions. It
allows for more parallelization and has been shown to achieve state-
of-the-art results in tasks like machine translation.
______________________________________________________________________
Source Node 1/4
Node ID: 2e2aac14-82ff-4f91-8a8f-2c499a8379b1
Similarity: 0.7885361023202706
Text: Table 4: The Transformer generalizes well to English
constituency parsing (Results are on Section 23 of WSJ) Parser
Training WSJ 23 F1 Vinyals & Kaiser el al. (2014) [37] WSJ only,
discriminative 88.3 Petrov et al. (2006) [29] WSJ only, discriminative
90.4 Zhu et al. (2013) [40] WSJ only, discriminative 90.4 Dyer et al.
(2016) [8] WSJ only, disc...
______________________________________________________________________
Source Node 2/4
Node ID: 200522ec-d3f8-425f-badc-156f381c6227
Similarity: 0.78378

In [37]:
response2_advanced = query_engine_advanced.query("What is YOLO")
pprint_response(response2_advanced, show_source=True)

Final Response: YOLO is a new approach to object detection that frames
the task as a regression problem to spatially separated bounding boxes
and associated class probabilities. It uses a single neural network to
predict bounding boxes and class probabilities directly from full
images in one evaluation. YOLO is known for its speed, processing
images in real-time at high frame rates while achieving good mean
average precision compared to other real-time detectors. Additionally,
YOLO is designed to reason globally about the image and can be trained
directly on full images, offering a simple and unified model for
object detection.
______________________________________________________________________
Source Node 1/4
Node ID: dc7c7f93-ed1b-45c6-be03-0d3056cf24d6
Similarity: 0.8137150587504083
Text: You Only Look Once: Uniﬁed, Real-Time Object Detection Joseph
Redmon∗, Santosh Divvala∗†, Ross Girshick¶, Ali Farhadi∗† University
of Washington∗, Allen Institute for AI†, Facebook AI Research¶


In [43]:
# we can also filter the retrieved nodes by similarity score before the answer is generated
# (e.g. keep only nodes with a similarity score of at least 0.8)
postprocessor=SimilarityPostprocessor(similarity_cutoff=0.8)
query_engine_advanced2=RetrieverQueryEngine(retriever=retriever,
                                            node_postprocessors=[postprocessor])

In [44]:
response1_advanced2 = query_engine_advanced2.query("What is a transformer")
pprint_response(response1_advanced2, show_source=True)

Final Response: Empty Response
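
The empty response has a simple explanation. In the earlier outputs, the best source node for the transformer question scored about 0.79, which is below the 0.8 cutoff, so no node is left to answer from. The sketch below reproduces that filtering step using the top scores printed in this tutorial. The helper `apply_similarity_cutoff` is hypothetical, and it assumes that nodes scoring below the cutoff are dropped and that source nodes are listed from highest to lowest score.

```python
def apply_similarity_cutoff(scored_nodes, cutoff):
    """Keep only the (label, score) pairs whose score is at least `cutoff`."""
    return [(label, score) for label, score in scored_nodes if score >= cutoff]


# best-scoring source node for each query, copied from the outputs in this tutorial
transformer_top = [("transformer query, source node 1", 0.7885361023202706)]
yolo_top = [("YOLO query, source node 1", 0.8137150587504083)]

print(apply_similarity_cutoff(transformer_top, 0.8))  # [] -> nothing to answer from
print(apply_similarity_cutoff(yolo_top, 0.8))         # the YOLO node survives
```

If a cutoff removes every node, lowering it slightly (or inspecting the scores first, as we did with `pprint_response`) is usually the first thing to try.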


In [45]:
response2_advanced2 = query_engine_advanced2.query("What is YOLO")
pprint_response(response2_advanced2, show_source=True)

Final Response: YOLO is a new approach to object detection that frames
the task as a regression problem to spatially separated bounding boxes
and associated class probabilities. It uses a single neural network to
predict bounding boxes and class probabilities directly from full
images in one evaluation, optimizing end-to-end directly on detection
performance. YOLO is known for its speed, processing images in real-
time and achieving high mean average precision compared to other real-
time systems.
______________________________________________________________________
Source Node 1/3
Node ID: dc7c7f93-ed1b-45c6-be03-0d3056cf24d6
Similarity: 0.8137150587504083
Text: You Only Look Once: Uniﬁed, Real-Time Object Detection Joseph
Redmon∗, Santosh Divvala∗†, Ross Girshick¶, Ali Farhadi∗† University
of Washington∗, Allen Institute for AI†, Facebook AI Research¶
http://pjreddie.com/yolo/ Abstract We present YOLO, a new approach to
object detection. Prior work on object detection repurposes cla

### Store Results for Future Use

After running the following cell, you should see a new folder called `storage` that contains the persisted index, so later runs can load it from disk instead of rebuilding it.

In [47]:
import os.path
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

# check if storage already exists
PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

# either way we can now query the index
query_engine = index.as_query_engine()
response = query_engine.query("What are transformers?")
print(response)

Transformers are a model architecture that relies entirely on an attention mechanism to establish global dependencies between input and output, without utilizing recurrent layers commonly found in encoder-decoder structures. This approach allows for increased parallelization during training and has shown significant improvements in translation quality, surpassing previous state-of-the-art models in certain tasks.
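
The cell above follows a general "load if it exists, otherwise build and save" pattern. The sketch below shows the same pattern with only the standard library, using a JSON file in a temporary folder in place of the index. The names `load_or_build` and `expensive_build` are hypothetical, and the toy dictionary merely stands in for the real index.

```python
import json
import os
import tempfile


def load_or_build(path, build):
    """Return the result cached at `path`, building and saving it on the first call."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    result = build()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    return result


build_calls = []  # records how many times the expensive step actually runs


def expensive_build():
    """Stand-in for embedding all documents, which costs time and API calls."""
    build_calls.append(1)
    return {"chunks": ["chunk A", "chunk B"]}


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "toy_index.json")
    first = load_or_build(cache_path, expensive_build)   # builds and saves
    second = load_or_build(cache_path, expensive_build)  # loads from disk

print(first == second, len(build_calls))  # True 1
```

For the real index this matters because building it calls the embedding model for every node, whereas loading it from `storage` does not.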


### Conclusion

We have just built a basic RAG system! To be honest, what it does is very similar to what ChatGPT-4 can achieve today. Still, working with LlamaIndex and understanding how RAG works is worthwhile: it shows how people are trying to solve the problems surrounding LLMs, and it gives you a framework for tackling similar problems yourself.