# RAG (Retrieval-Augmented Generation)

We will follow along with the [RAG Quickstart Guide](https://python.langchain.com/docs/use_cases/question_answering/quickstart/) from the LangChain documentation.

Building a RAG application generally involves two phases: indexing, then retrieval and generation.

## Indexing

1. Load: Load the data
2. Split: Split the data into manageable chunks (so that each chunk fits in the model's context window and we avoid passing too much irrelevant context)
3. Store: Save the data in a searchable format (usually, we will generate embeddings and store them in a vector store)
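
To make the Split step more concrete, here is a minimal sketch that cuts a string into fixed-size character windows that overlap. It is a simplification for illustration only: `split_with_overlap` is a hypothetical helper, and LangChain's `RecursiveCharacterTextSplitter` (used later in this notebook) additionally tries to break on natural separators rather than at arbitrary character positions.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: cut text into windows of at most chunk_size
    characters, where each window repeats the last chunk_overlap characters
    of the previous one."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means that text cut at a window boundary (for example `hij`) still appears intact at the start of the next chunk.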

## Retrieval and generation

1. Retrieve: given a user input, pull relevant data from our datastore
2. Generate: pass the relevant data to the model along with the user's question, generate a response
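
The two steps above can be sketched without any libraries. In this toy version, relevance is scored by counting shared words (a crude stand-in for embedding similarity), and `fake_model` is a hypothetical placeholder for a real LLM call. All function names here are assumptions made for illustration.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased set of words; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks that share the most words with the question."""
    q = words(question)
    ranked = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved context and the question into one prompt."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def fake_model(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM call."""
    return f"(model response to a {len(prompt)}-character prompt)"


chunks = [
    "Task decomposition breaks a complex task into smaller steps.",
    "Chroma is a vector store.",
    "Chain of Thought prompts the model to think step by step.",
]
question = "What is task decomposition?"

top = retrieve(question, chunks, k=1)   # Retrieve
prompt = build_prompt(question, top)    # Generate: assemble the prompt...
print(prompt)
print(fake_model(prompt))               # ...and pass it to the model
```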

## Initial Setup

In [1]:
# Set up dependencies
%pip install --upgrade --quiet pip
%pip install --upgrade --quiet langchain langchain-community langchainhub langchain-openai langchain-chroma bs4

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Prompt for the API key at runtime rather than hard-coding it in the notebook
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

In [3]:
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

## 1. Speed Run

In [4]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

In [5]:
# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [6]:
rag_chain.invoke("What is Task Decomposition?")

'Task Decomposition is a technique that involves breaking down complex tasks into smaller, more manageable steps. This allows agents or models to tackle difficult tasks by focusing on individual components. Task decomposition can be achieved through prompting techniques like Chain of Thought and Tree of Thoughts, as well as task-specific instructions or human inputs.'

In [7]:
# cleanup
vectorstore.delete_collection()

## 2. Detailed Walkthrough

In [8]:
# Load blog content from Lilian Weng's GitHub site
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

In [9]:
len(docs[0].page_content)

43131

In [10]:
print(docs[0].page_content[:500])



      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In


In [11]:
# - chunk_size caps the length of each chunk so that chunks fit in a model's context window
# - chunk_overlap repeats some text between adjacent chunks, reducing the chance of cutting a statement off from related context
# - add_start_index records where each chunk starts in the original document as a metadata attribute ("start_index")
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

In [12]:
len(all_splits)

66

In [13]:
len(all_splits[0].page_content)

969

In [14]:
all_splits[10].metadata

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'start_index': 7056}
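
One use for `start_index` is mapping a chunk back to its position in the source text, for example to show some surrounding context. The sketch below assumes `start_index` is a character offset into the original string; `chunk_with_margin` is a hypothetical helper, and the sample strings are made up for illustration.

```python
def chunk_with_margin(original: str, chunk: str, start_index: int, margin: int = 20) -> str:
    """Hypothetical helper: return the chunk plus up to `margin` characters
    of surrounding text on each side."""
    end_index = start_index + len(chunk)
    # Sanity check: the offset really points at this chunk.
    if original[start_index:end_index] != chunk:
        raise ValueError("start_index does not point at this chunk")
    return original[max(0, start_index - margin):end_index + margin]


original = "Intro sentence. The chunk of interest lives here. Closing sentence."
chunk = "The chunk of interest lives here."
start_index = original.index(chunk)  # stand-in for metadata["start_index"]

print(start_index)
print(chunk_with_margin(original, chunk, start_index, margin=6))
```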

In [15]:
# Generate embeddings for each chunk and store in a vector store (Chroma)
# This way, we can search for chunks which are most similar to the user's input, and use those
# as context when responding to the user
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())

## Retrieval

In [16]:
# Wrap the vector store in a LangChain retriever that returns the 6 chunks most similar to the query
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

In [17]:
# Retrieve some docs, count them, and display an example (page content from doc 0)
retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")
print(f'== Number of docs retrieved: {len(retrieved_docs)} ==')
print('== Content of first doc ==')
print(retrieved_docs[0].page_content)

== Number of docs retrieved: 6 ==
== Content of first doc ==
Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
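
Conceptually, a similarity search with `k` results ranks every stored chunk against the query and keeps the top `k`. The sketch below illustrates that idea with word counts as toy vectors and cosine similarity as the metric. Both are assumptions for illustration: a real embedding model returns dense float vectors, and the distance metric actually used depends on how the vector store is configured.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (an assumption, not a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, documents: list[str], k: int) -> list[tuple[float, str]]:
    """Score every document against the query and keep the k best."""
    q = embed(query)
    scored = [(cosine_similarity(q, embed(d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


documents = [
    "task decomposition splits a task into subgoals",
    "memory stores past observations",
    "tree of thoughts explores task decomposition steps",
]
results = top_k("approaches to task decomposition", documents, k=2)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

A production vector store avoids scoring every chunk by using an index, but the ranking idea is the same.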


## Generation

In [18]:
# Pull a RAG prompt template from the LangChain hub
prompt = hub.pull("rlm/rag-prompt")

In [19]:
# Generate example messages (generic)
example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()
example_messages

[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: filler question \nContext: filler context \nAnswer:")]

In [20]:
# Examine the example message's content
print(example_messages[0].content)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question 
Context: filler context 
Answer:


In [21]:
# Build the RAG chain (according to the LangChain docs, this gives us streaming, async, and batched calls out of the box)
# Note: this repeats the chain from the Speed Run section and reuses the `llm` defined there
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [22]:
# Call the chain and stream the response, printing each chunk as it arrives
for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)

Task decomposition is a process of breaking down complex tasks into smaller, more manageable steps. This technique allows models to think step by step and utilize computation more efficiently. It can be done through simple prompting, task-specific instructions, or with human inputs.
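
To make the data flow through the chain easier to follow, here is a plain-Python analogy. It is not how LangChain implements the `|` operator; `fake_retriever`, `fill_prompt`, and `fake_llm` are hypothetical stand-ins. The point is that the leading dict runs both branches on the same input (the question) and passes the resulting dict to the prompt.

```python
def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for the retriever: always returns fixed chunks."""
    return ["chunk one", "chunk two"]


def format_docs(chunks: list[str]) -> str:
    """Join chunks with blank lines, as in the notebook's format_docs."""
    return "\n\n".join(chunks)


def build_inputs(question: str) -> dict[str, str]:
    """Analogue of the dict step: both branches receive the same question."""
    return {
        "context": format_docs(fake_retriever(question)),  # retriever | format_docs
        "question": question,                               # RunnablePassthrough()
    }


def fill_prompt(inputs: dict[str, str]) -> str:
    """Analogue of `| prompt`: fill a template from the dict."""
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}\nAnswer:"


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for `| llm | StrOutputParser()`."""
    return f"(stub answer based on a {len(prompt)}-character prompt)"


def run_chain(question: str) -> str:
    return fake_llm(fill_prompt(build_inputs(question)))


print(build_inputs("What is Task Decomposition?"))
print(run_chain("What is Task Decomposition?"))
```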

In [23]:
# We do not have to use the prompt from the LangChain Hub; we can write our own template
from langchain_core.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")

'Task decomposition is a technique where complex tasks are broken down into smaller and simpler steps to be completed more easily. This process involves transforming big tasks into manageable tasks through step-by-step thinking or exploration of multiple reasoning possibilities. Task decomposition can be done through simple prompting, task-specific instructions, or human inputs. Thanks for asking!'