# RAG Tutorial

Guest lecture for DS 3891 Generative AI

*By Myranda Uselton Shirk, Senior Data Scientist, Data Science Institute*

## What is RAG?

RAG (Retrieval-Augmented Generation) is a method of grounding a large language model (LLM) in a corpus of documents: text relevant to a query is retrieved from the corpus and passed to the model along with the query. In code, this process has two main steps, each with its own sub-steps:

**1. Retrieve information relevant to the query**

    a) Load the documents

    b) Split/Chunk text

    c) Embed each split

    d) Semantic Search over text embeddings

**2. Generate a response based on the retrieved information**

    a) Give prompt + retrieved information to model

    b) Generate response

This tutorial will walk through how to implement RAG using the Python library [LangChain](https://python.langchain.com/docs/get_started/introduction).
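Before bringing in any libraries, the two steps above can be sketched in plain Python. This is only a toy illustration: the word-count "embedding", the overlap score, and the three-sentence corpus are all assumptions made up for this example, and no model is called. The helper names `embed`, `retrieve`, and `build_prompt` are hypothetical.

```python
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def score(query_vec, chunk_vec):
    """Dot product of two count vectors (higher means more shared words)."""
    return sum(query_vec[w] * chunk_vec[w] for w in query_vec.keys() & chunk_vec.keys())

def retrieve(query, chunks, k=1):
    """Step 1: rank chunks by similarity to the query and keep the top k."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: score(query_vec, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, context_chunks):
    """Step 2a: combine the retrieved text and the query into one prompt."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# A made-up corpus that is already split into chunks
chunks = [
    "The library opens at nine in the morning.",
    "Parking permits are sold at the front office.",
    "The cafeteria serves lunch from noon until two.",
]
query = "When does the library open?"
top_chunks = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)  # Step 2b would send this prompt to an LLM
```

The rest of the tutorial replaces each toy piece with a real component: a document loader, a text splitter, learned embeddings, a vector store, and a chat model.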

## Libraries

### Google Colab 

If you are using this notebook in Google Colab, uncomment and run this code cell:

In [None]:
#!pip install -q langchain langchain-community langchain-core langchain-openai faiss-cpu pypdf

In [1]:
# import libraries

from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader, PyPDFLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter

### Set up OpenAI API Key

You will need an OpenAI API key to run this notebook, unless you swap in a different model for embedding and inference. If you have an OpenAI API key, run this cell and enter it when prompted.

In [2]:
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()
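If you rerun the notebook often, you may not want to be prompted every time. The sketch below is one possible variant, not part of the original notebook: it asks for the key only when the environment variable is not already set. The function name `ensure_api_key` and its `prompt_fn` parameter are hypothetical.

```python
import getpass
import os

def ensure_api_key(var_name="OPENAI_API_KEY", prompt_fn=getpass.getpass):
    """Prompt for the key only when the environment variable is missing or empty."""
    if not os.environ.get(var_name):
        os.environ[var_name] = prompt_fn(f"{var_name}: ")
    return os.environ[var_name]

# Usage (uncomment to run; prompts only if the key is not set yet):
# ensure_api_key()
```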




## Walkthrough: Text Files

First we will walk through the simplest case for information storage: a text file. The text file "state_of_the_union.txt" should be in your working directory (if it is not, find it in the GitHub repo and move it to where you can access it).

### Load Files

The first step in RAG is to load our information corpus: in this case, "state_of_the_union.txt", which contains the State of the Union address given by President Biden.

LangChain has many different types of document loaders. For text files we can use `TextLoader`.

In [3]:
# Load a text file

text_loader = TextLoader("./state_of_the_union.txt")
docs = text_loader.load()

In [None]:
# view the docs
docs
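The loader returns a list of document objects, each holding the text (`page_content`) and a `metadata` dictionary that records where the text came from. As a rough picture of that structure, here is a plain-Python stand-in. `SimpleDocument` and `load_text` are hypothetical names for illustration, not LangChain classes, and the sample file is created on the fly.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class SimpleDocument:
    """Stand-in for a loaded document: the text plus metadata about its origin."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_text(path):
    """Read one file and wrap it in a single-element list of documents."""
    text = Path(path).read_text(encoding="utf-8")
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]

# Demo with a throwaway file in a temporary directory
with tempfile.TemporaryDirectory() as folder:
    sample_path = Path(folder) / "sample.txt"
    sample_path.write_text("Example text.", encoding="utf-8")
    sample_docs = load_text(sample_path)

print(len(sample_docs))
print(sample_docs[0].page_content)
print(sample_docs[0].metadata)
```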

### Split Text

We need to split our text into individual chunks of information. There are several ways to do this. We will start with the simplest one, `CharacterTextSplitter`, which splits the text on a separator (a blank line by default) and merges the pieces into chunks of at most roughly `chunk_size` characters. Then we embed each chunk with OpenAI embeddings and store the vectors in a vector store (we're using [FAISS](https://faiss.ai/), a similarity-search library).

In [4]:
#chunk/split text
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(docs)

# save vector embeddings in a database
db = FAISS.from_documents(documents, OpenAIEmbeddings())
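To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified sketch of a fixed-width sliding window over characters. It is an assumption-laden illustration rather than what `CharacterTextSplitter` does internally: the real splitter first breaks the text on a separator, so its chunks tend to end at paragraph boundaries. The helper name `split_by_characters` is hypothetical.

```python
def split_by_characters(text, chunk_size, chunk_overlap=0):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghijklmnopqrstuvwxyz"
no_overlap = split_by_characters(text, chunk_size=10)
with_overlap = split_by_characters(text, chunk_size=10, chunk_overlap=3)
print(no_overlap)    # ['abcdefghij', 'klmnopqrst', 'uvwxyz']
print(with_overlap)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

With overlap, the last few characters of each chunk reappear at the start of the next one, which helps when a relevant sentence straddles a chunk boundary.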

### Retrieve information

Next, we need to set up a retriever to retrieve relevant information for us. Again, there are several different methods (see the Resources tab in the README for those), but we will be using a vector similarity search. This embeds the query and compares the resulting vector to the embedding of each chunk of text, returning the closest one(s).

The cell below shows how to search the vector store directly. In the cell after it, we set up our retriever.

In [5]:
# find relevant documents
query = "What did the president say about Ketanji Brown Jackson"
relevant_docs = db.similarity_search(query)
print(relevant_docs[0].page_content)

# Note this is NOT querying a chat model - only finding the relevant info

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
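Under the hood, a similarity search is a nearest-neighbor lookup over vectors. The sketch below shows the idea with Euclidean (L2) distance, which, as far as I know, is what LangChain's FAISS wrapper uses by default. The three-dimensional vectors and chunk labels are invented for illustration (real embeddings have hundreds or thousands of dimensions), and `nearest_chunks` is a hypothetical helper.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_chunks(query_vec, index, k=4):
    """Return the k (distance, text) pairs closest to query_vec, nearest first."""
    scored = sorted((l2_distance(query_vec, vec), text) for text, vec in index)
    return scored[:k]

# Made-up "embeddings" for three chunks
index = [
    ("chunk about the Supreme Court", [0.9, 0.1, 0.0]),
    ("chunk about voting rights", [0.6, 0.8, 0.0]),
    ("chunk about the economy", [0.0, 0.2, 0.98]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend this is the embedded query

results = nearest_chunks(query_vec, index, k=2)
for distance, text in results:
    print(f"{distance:.3f}  {text}")
```

A smaller distance means a closer match. For vectors of unit length, ranking by L2 distance gives the same order as ranking by cosine similarity.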


In [None]:
# set up retriever

retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 6})

### Prompting the LLM

Now that we have set up our information corpus and a retriever, we can set up a prompt template for our model to use. This template lets us organize our instructions to the model and insert the retrieved information and the query in the appropriate places.

In [15]:
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)
custom_rag_prompt

PromptTemplate(input_variables=['context', 'question'], template='Use the following pieces of context to answer the question at the end.\nIf you don\'t know the answer, just say that you don\'t know, don\'t try to make up an answer.\nUse three sentences maximum and keep the answer as concise as possible.\nAlways say "thanks for asking!" at the end of the answer.\n\n{context}\n\nQuestion: {question}\n\nHelpful Answer:')
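Conceptually, filling the template is string substitution: `{context}` and `{question}` are placeholders that get replaced at call time. Here is a minimal sketch using Python's built-in `str.format` on a shortened copy of the template. The context and question values are made up for illustration.

```python
short_template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}

Helpful Answer:"""

# Substitute the placeholders with a made-up context and question
filled = short_template.format(
    context="The library opens at nine in the morning.",
    question="When does the library open?",
)
print(filled)
```

The model receives only the filled-in string, so everything it knows about the corpus comes from whatever is placed in `{context}`.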

Finally, we set up our RAG chain and pass in our model, retriever, prompt template, and query.

In [None]:
# helper function to format our documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [None]:
# load model
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [None]:
# set up RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke(query)
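The `|` operator in the chain above pipes the output of one component into the next, and the leading dictionary runs its entries on the same input to build the prompt variables. To show the mechanics, here is a toy re-creation in plain Python. It is not LangChain's implementation: `Step`, `parallel`, and every `fake_*` component are hypothetical stand-ins, and no model is called.

```python
class Step:
    """Tiny stand-in for a chain component: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # self | other: run self first, then feed its output to other
        return Step(lambda value: other.invoke(self.invoke(value)))

def parallel(**steps):
    """Run every step on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in steps.items()})

fake_retriever = Step(lambda question: ["chunk one", "chunk two"])
join_chunks = Step(lambda chunks: "\n\n".join(chunks))
passthrough = Step(lambda value: value)
fill_prompt = Step(lambda d: f"{d['context']}\n\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt: f"(model reply to a prompt of {len(prompt)} characters)")

prompt_chain = parallel(context=fake_retriever | join_chunks, question=passthrough) | fill_prompt
toy_chain = prompt_chain | fake_llm

print(prompt_chain.invoke("What is RAG?"))
print(toy_chain.invoke("What is RAG?"))
```

The query string enters once and flows down two paths: through the retriever to become the context, and through the passthrough to remain the question.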

## Extension: QA over a PDF

Often, our text corpus will not be stored in .txt files. This section shows how to build a corpus from a PDF using `PyPDFLoader`. You might recognize the paper I am using for our data!

In [19]:
# Load a PDF

pdf_loader = PyPDFLoader("./prompt_engineering.pdf")
pdf_pages = pdf_loader.load_and_split()

In [20]:
# retrieve info from documents

faiss_index = FAISS.from_documents(pdf_pages, OpenAIEmbeddings())
docs = faiss_index.similarity_search("What are the different categories of prompts?", k=2)
for doc in docs:
    print(str(doc.metadata["page"]) + ":", doc.page_content[:300])

3: TABLE I
CLASSIFYING PROMPT PATTERNS
Pattern Category Prompt Pattern
Input Semantics Meta Language Creation
Output Output Automater
Customization Persona
Visualization Generator
Recipe
Template
Error Identiﬁcation Fact Check List
Reﬂection
Prompt Question Reﬁnement
Improvement Alternative Approaches

2: textual statements approach is that it is intentionally int uitive
to users. In particular, we expect users will understand how to
express and adapt the statements in a contextually appropri ate
way for their domain. Moreover, since the underlying ideas o f
the prompt are captured, these same ideas 
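The extracted text above contains typographic ligatures, such as the single character "ﬁ" in "Identiﬁcation", which is common in PDF extraction. One possible cleanup step before embedding, offered here as a suggestion rather than part of the original notebook, is Unicode NFKC normalization. It does not repair the stray spaces inside words. The helper name `normalize_pdf_text` is hypothetical.

```python
import unicodedata

def normalize_pdf_text(text):
    """Replace compatibility characters such as the 'fi' ligature with plain letters."""
    return unicodedata.normalize("NFKC", text)

# \ufb01 is the "fi" ligature and \ufb02 is the "fl" ligature
raw = "Error Identi\ufb01cation Fact Check List\nRe\ufb02ection"
cleaned = normalize_pdf_text(raw)
print(cleaned)
```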


In [21]:
query = "What are the different categories of prompts?"

retriever = faiss_index.as_retriever(search_type="similarity", search_kwargs={"k": 6})

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke(query)

'The different categories of prompts are Input Semantics, Output Customization, Error Identification, Prompt Improvement, Interaction, and Context Control. Each category focuses on different aspects of prompt patterns in the context of conversational LLMs. Thanks for asking!'

## Playground

Now that you know how to use RAG, try it out on your own documents below. Experiment with different document loaders, load multiple documents, or change the vector store/prompt/model you use. 