# [RAG - Part 1](https://python.langchain.com/docs/tutorials/rag/)

* `Part 1` : **simple Q&A application** over **unstructured data**
  * typical Q&A architecture
  * additional resources for more advanced Q&A techniques
* `Part 2` : extends to accommodate **conversation-style interactions** & **multi-step retrieval** processes

In [None]:
###############
### ~Setup~ ###
###############

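import os

# Load KEY=VALUE secrets (e.g. OPENAI_API_KEY) from a local file into the environment,
# skipping blank lines and comment lines
with open('secrets.txt') as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, value = line.split('=', 1)
        os.environ[key.strip()] = value.strip()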

# Initialize LangChain
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

A typical RAG application has 2 parts:

1. **Indexing**: pipeline for ingesting data from a source & indexing it (usually happens offline)

2. **Retrieval & generation**: the actual **RAG chain**, which:
   * takes the user query at run time
   * retrieves relevant data from the index
   * passes that data to the `model`

> ℹ **Note**: the indexing portion of this tutorial largely follows `3_retreivers.ipynb`

Most common full sequence from raw data to answer:

#### 1️⃣ Indexing
---
1. **Load**: Done with [Document Loaders](https://python.langchain.com/docs/concepts/document_loaders/).
2. **Split**: [Text splitters](https://python.langchain.com/docs/concepts/text_splitters/) break large Documents into smaller chunks. Useful both for indexing data & passing it into a model: **large chunks are harder to search over & won't fit in a model's finite context window**
3. **Store**: Need somewhere to store & index our splits, so that they can be searched over later. Often done using a [VectorStore](https://python.langchain.com/docs/concepts/vectorstores/) & an [Embeddings model](https://python.langchain.com/docs/concepts/embedding_models/).

![Image URL](https://python.langchain.com/assets/images/rag_indexing-8160f90a90a33253d0154659cf7d453f.png)
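
A dependency-free toy sketch of the three indexing steps. The helpers (`toy_load`, `toy_split`, `toy_embed`, `toy_store`) are hypothetical, and word counts stand in for a real embedding model; it shows the data flow only, not LangChain's API:

```python
from collections import Counter

# Toy stand-ins for each indexing step (hypothetical names, not LangChain APIs)


def toy_load(raw_sources):
    """Load: wrap each (source, text) pair as a document-like dict."""
    return [{"page_content": text, "metadata": {"source": src}} for src, text in raw_sources]


def toy_split(documents, chunk_size):
    """Split: cut each document into fixed-size character chunks (no overlap, for brevity)."""
    chunks = []
    for doc in documents:
        text = doc["page_content"]
        for start in range(0, len(text), chunk_size):
            chunks.append({
                "page_content": text[start:start + chunk_size],
                "metadata": {**doc["metadata"], "start_index": start},
            })
    return chunks


def toy_embed(text):
    """Embed: lowercase word counts, standing in for a real embedding model."""
    return Counter(text.lower().split())


def toy_store(chunks):
    """Store: keep (vector, chunk) pairs in a list, standing in for a vector store."""
    return [(toy_embed(chunk["page_content"]), chunk) for chunk in chunks]


toy_index = toy_store(toy_split(toy_load([("blog", "Agents plan tasks. Agents use tools.")]), chunk_size=18))
print(len(toy_index), [chunk["page_content"] for _, chunk in toy_index])
```

Each stage only consumes the previous stage's output, which is why the real pipeline can run offline, ahead of any user query.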

#### 2️⃣ Retrieval & Generation
---
4. **Retrieve**: Given a user input, relevant splits are retrieved from storage using a [Retriever](https://python.langchain.com/docs/concepts/retrievers/).
5. **Generate**: A [ChatModel](https://python.langchain.com/docs/concepts/chat_models/) / [LLM](https://python.langchain.com/docs/concepts/text_llms/) **produces an answer** using a prompt that includes **both the question and the retrieved data**.
![Image URL](https://python.langchain.com/assets/images/rag_retrieval_generation-1046a4668d6bb08786ef73c56d4f228a.png)
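
A toy sketch of retrieve → generate. Assumptions: a keyword-overlap score stands in for embedding similarity, `TOY_PROMPT` is a hypothetical template (not the text of `rlm/rag-prompt`), and any Python callable plays the "model":

```python
# Hypothetical prompt template, NOT the text of the "rlm/rag-prompt" hub prompt
TOY_PROMPT = (
    "Use the following context to answer the question.\n"
    "Context:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)


def keyword_score(query, text):
    """Count distinct query words that also appear in the text (toy relevance score)."""
    text_words = set(text.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in text_words)


def toy_retrieve(query, chunks, k=2):
    """Retrieve: return the k highest-scoring chunks (ties keep their original order)."""
    return sorted(chunks, key=lambda chunk: keyword_score(query, chunk), reverse=True)[:k]


def toy_generate(query, context, llm):
    """Generate: stuff the retrieved chunks into the prompt and pass it to a model callable."""
    filled = TOY_PROMPT.format(context="\n\n".join(context), question=query)
    return llm(filled)


kb = [
    "Task decomposition splits work into steps.",
    "Memory stores past observations.",
    "Tools let agents call APIs.",
]
top = toy_retrieve("what is task decomposition", kb, k=1)
# An "echo" model that returns its prompt unchanged, so we can inspect what it received
echoed = toy_generate("what is task decomposition", top, llm=lambda text: text)
print(echoed)
```

The model never sees the whole knowledge base, only the question plus the top-k chunks placed into the prompt.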

> Once the *data is indexed*, we will use `LangGraph` as the **orchestration framework** to implement the retrieval & generation steps.

> #### `InMemoryVectorStore`

In [2]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)
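
Roughly what an in-memory vector store does, as a dependency-free sketch. `ToyInMemoryVectorStore` and `bow_embed` are hypothetical: bag-of-words counts stand in for the `embeddings` model and cosine similarity ranks the results. Only the method names loosely mirror the real `add_texts` / `similarity_search`:

```python
import math
from collections import Counter


def bow_embed(text):
    """Hypothetical embedding: bag-of-words counts instead of a learned dense vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyInMemoryVectorStore:
    """Keeps (vector, text) pairs in a Python list and ranks them by cosine similarity."""

    def __init__(self, embed):
        self.embed = embed
        self.records = []

    def add_texts(self, texts):
        # Embed once at indexing time and keep the vector next to its text
        for text in texts:
            self.records.append((self.embed(text), text))

    def similarity_search(self, query, k=4):
        # Embed the query with the same function, then return the k closest texts
        query_vec = self.embed(query)
        ranked = sorted(self.records, key=lambda rec: cosine(query_vec, rec[0]), reverse=True)
        return [text for _, text in ranked[:k]]


toy_store_demo = ToyInMemoryVectorStore(bow_embed)
toy_store_demo.add_texts(["agents use tools", "memory stores context", "tools extend agents"])
print(toy_store_demo.similarity_search("tools extend", k=2))
```

A linear scan like this is fine for one blog post; larger corpora are usually served by dedicated vector databases.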

## Preview

Build an app that answers questions about a website's content.
Specific website: [**LLM Powered Autonomous Agents blog post by Lilian Weng**](https://lilianweng.github.io/posts/2023-06-23-agent/) → ask questions about its contents

> **Simple indexing pipeline & RAG chain in ~50 lines of code**

> **API Reference**: `hub` | `WebBaseLoader` | `Document` | `RecursiveCharacterTextSplitter` | `StateGraph`

In [3]:
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200)

all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# Define application steps


def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke(
        {"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [4]:
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])

Task Decomposition is the process of breaking down a complicated task into smaller, manageable steps to facilitate better planning and execution. Techniques like Chain of Thought (CoT) and Tree of Thoughts (ToT) are used to enhance model performance by encouraging step-by-step reasoning. This can be achieved through simple prompting, task-specific instructions, or human inputs.
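
In the graph above, `retrieve` and `generate` each return only the keys they produce, and those updates accumulate in the shared `State`. A toy sequential runner illustrating that idea; `run_sequence` and the step functions are hypothetical (not LangGraph's implementation), assuming each step's returned keys simply overwrite earlier values:

```python
from typing import Any, Callable, Dict, List

# A hypothetical stand-in for StateGraph(...).add_sequence(...), not LangGraph's implementation
ToyStep = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_sequence(initial_state: Dict[str, Any], steps: List[ToyStep]) -> Dict[str, Any]:
    """Run steps in order; each returns a partial update that is merged into the state."""
    state = dict(initial_state)
    for step in steps:
        state.update(step(state))  # keys a step returns overwrite earlier values
    return state


def toy_retrieve_step(state: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical fixed "retrieval" so the sketch needs no vector store
    return {"context": ["Task decomposition breaks a task into smaller steps."]}


def toy_generate_step(state: Dict[str, Any]) -> Dict[str, Any]:
    # Reads the question and context written earlier, writes only "answer"
    return {"answer": f"Q: {state['question']} | based on {len(state['context'])} doc(s)"}


final_state = run_sequence({"question": "What is Task Decomposition?"},
                           [toy_retrieve_step, toy_generate_step])
print(final_state["answer"])
```

This is also why `graph.invoke(...)` above returns a dict containing `question`, `context` and `answer`, not just the final answer.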


## Detailed walkthrough

### Indexing

> Abbreviated version of `3_retreivers.ipynb`

#### Loading documents
* Load the blog post contents
* `DocumentLoaders` are objects that load data from a source and return a list of `Document` objects
  * In this example, `WebBaseLoader` uses `urllib` to load HTML from web URLs & `BeautifulSoup` to parse it to text
  * Customize HTML parsing by passing parameters to the `BeautifulSoup` parser via `bs_kwargs`
  * In this example, only HTML tags with class `post-content`, `post-title`, or `post-header` are relevant, so we remove all others

> **API Reference**: `WebBaseLoader`
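
What `bs_kwargs={"parse_only": bs4_strainer}` achieves can be approximated with the standard library's `html.parser`: keep text only from elements carrying one of the listed classes. A sketch assuming reasonably well-formed HTML; `ClassFilterParser` and `extract_text` are hypothetical, and BeautifulSoup is far more forgiving of messy markup:

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassFilterParser(HTMLParser):
    """Collect text only from inside elements whose class matches one of `keep`."""

    def __init__(self, keep):
        super().__init__()
        self.keep = set(keep)
        self.depth = 0  # > 0 while inside a kept element (counts nested tags)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.depth > 0 or classes & self.keep:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html, keep):
    parser = ClassFilterParser(keep)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


sample_html = (
    '<html><nav class="menu">Home</nav>'
    '<h1 class="post-title">Agents</h1>'
    '<div class="post-content"><p>Planning<br>Memory</p></div>'
    '<footer>c</footer></html>'
)
print(extract_text(sample_html, ("post-title", "post-content")))
```

Navigation and footer text are dropped before the content ever reaches the splitter, which is the point of filtering at parse time.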

In [5]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(
    class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

assert len(docs) == 1
print(f"Total characters: {len(docs[0].page_content)}")

Total characters: 43130


In [6]:
print(docs[0].page_content[:1000])



      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:

Planning

Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.


Memory

Short-term memory: I 

`DocumentLoader`: Object that loads data from a source as a list of `Document` objects.

* [Docs](https://python.langchain.com/docs/how_to/#document-loaders): Detailed documentation on how to use DocumentLoaders.
* [Integrations](https://python.langchain.com/docs/integrations/document_loaders/): 160+ integrations to choose from.
* [Interface](https://python.langchain.com/docs/tutorials/rag/#:~:text=160%2B%20integrations%20to%20choose%20from.-,Interface,-%3A%20API%20reference%20for%20the%20base): API reference for the base interface.
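
A minimal sketch of the loader idea. `DirectoryTextLoader` is hypothetical and turns each `.txt` file into one document; `ToyDocument` stands in for LangChain's `Document`, and the `load` / `lazy_load` pair only loosely mirrors the base interface:

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator, List


@dataclass
class ToyDocument:
    """Stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class DirectoryTextLoader:
    """Hypothetical loader: one ToyDocument per .txt file in a directory."""

    def __init__(self, directory: str) -> None:
        self.directory = Path(directory)

    def lazy_load(self) -> Iterator[ToyDocument]:
        # Yield documents one at a time so a large corpus need not fit in memory
        for path in sorted(self.directory.glob("*.txt")):
            yield ToyDocument(page_content=path.read_text(encoding="utf-8"),
                              metadata={"source": str(path)})

    def load(self) -> List[ToyDocument]:
        # Eagerly materialize the lazy iterator into a list
        return list(self.lazy_load())


# Usage: write two small files to a temporary directory and load them
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("Agents plan.", encoding="utf-8")
    Path(tmp, "b.txt").write_text("Agents remember.", encoding="utf-8")
    loaded = DirectoryTextLoader(tmp).load()

print(len(loaded), [d.page_content for d in loaded])
```

Recording the `source` in metadata is what lets later steps (and answers) point back to where a chunk came from.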

#### Splitting documents

* The loaded document is over 42k characters long → too long to fit in the context window of many models
* Even for models whose context window can fit the full post, **models struggle to find information in long inputs**
* → split the `Document` into chunks for embedding & vector storage
  * This should help retrieve only the most relevant parts of the blog post at run time

As in `3_retreivers.ipynb`, use `RecursiveCharacterTextSplitter`:
* → Recursively splits the document using common separators (e.g. new lines) until each chunk is the appropriate size (the recommended text splitter for generic text use cases)
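
A simplified sketch of the recursive idea: try separators from coarse (`\n\n`) to fine (`""`), merge neighboring pieces up to `chunk_size`, and recurse only into pieces that are still too long. `recursive_split` is hypothetical and is not LangChain's implementation: it ignores `chunk_overlap` and drops the separator at chunk boundaries:

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters, preferring coarse separators.

    Assumes chunk_size >= 1 and that separators ends with "" (split into characters).
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    # Pick the first separator that occurs in the text ("" always matches)
    sep = next(s for s in separators if s == "" or s in text)
    finer = separators[separators.index(sep) + 1:]
    pieces = list(text) if sep == "" else text.split(sep)

    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: recurse with the finer-grained separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


paragraphs = "Planning breaks tasks down.\n\nMemory stores context.\n\nTools extend the model."
print(recursive_split(paragraphs, 30))
```

Preferring paragraph and line breaks keeps semantically related sentences together; character-level cuts only happen for text with no better break point.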

In [7]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
all_splits = text_splitter.split_documents(docs)

print(f"Split blog post into {len(all_splits)} sub-documents.")

Split blog post into 66 sub-documents.
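
`chunk_overlap` and `add_start_index` are easiest to see with plain fixed-width windows. `sliding_window_chunks` is a hypothetical helper and, unlike `RecursiveCharacterTextSplitter`, ignores separators entirely:

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Fixed-width chunks overlapping by chunk_overlap characters, each tagged with its start offset."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"page_content": text[start:start + chunk_size],
                       "metadata": {"start_index": start}})
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


for chunk in sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(chunk["metadata"]["start_index"], chunk["page_content"])
```

For 43,130 characters with `chunk_size=1000` and `chunk_overlap=200`, these fixed windows give 54 chunks; the notebook reports 66, likely because the recursive splitter cuts at separators and so produces many chunks shorter than 1000 characters.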


Go deeper
`TextSplitter`: Object that splits a list of `Document`s into smaller chunks. Subclass of `DocumentTransformer`.

Learn more about splitting text using different methods by reading the how-to docs:
* [Code (`py` or `js`)](https://python.langchain.com/docs/integrations/document_loaders/source_code/)
* [Scientific papers](https://python.langchain.com/docs/integrations/document_loaders/grobid/)
* [Interface](https://python.langchain.com/api_reference/text_splitters/base/langchain_text_splitters.base.TextSplitter.html): API reference for the base interface

`DocumentTransformer`: Object that performs a **transformation on a list of Document objects**.

  * [Docs](https://python.langchain.com/docs/how_to/#text-splitters): Detailed documentation on how to use `DocumentTransformers`
  * [Integrations](https://python.langchain.com/docs/integrations/document_transformers/)
  * [Interface](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.transformers.BaseDocumentTransformer.html): API reference for the base interface.
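
A minimal sketch of a document transformation. `WhitespaceDedupTransformer` is hypothetical: it collapses whitespace and drops exact duplicates. The `transform_documents` method name mirrors the base interface, but everything else (including `ToyDocument` as a stand-in for `Document`) is illustrative:

```python
import re
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ToyDocument:
    """Stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class WhitespaceDedupTransformer:
    """Hypothetical transformer: normalize whitespace, then drop exact-duplicate documents."""

    def transform_documents(self, documents: Sequence[ToyDocument]) -> List[ToyDocument]:
        seen = set()
        out = []
        for doc in documents:
            text = re.sub(r"\s+", " ", doc.page_content).strip()
            if text in seen:
                continue  # skip a document whose normalized text was already emitted
            seen.add(text)
            # Build new objects rather than mutating the inputs
            out.append(ToyDocument(page_content=text, metadata=dict(doc.metadata)))
        return out


cleaned = WhitespaceDedupTransformer().transform_documents([
    ToyDocument("Task  decomposition\n", {"i": 0}),
    ToyDocument("Task decomposition", {"i": 1}),
    ToyDocument("Self-reflection", {"i": 2}),
])
print([d.page_content for d in cleaned])
```

The list-in, list-out shape is what lets a text splitter be treated as just another transformer in the pipeline.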