<a href="https://colab.research.google.com/github/micah-shull/RAG-LangChain/blob/main/LC_19_RAG_Langsmith.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## ✅ **`%pip` vs. `%%pip` vs. `!pip`**

1️⃣ **`!pip install ...`**

* This runs `pip` in a **shell subprocess** (like typing in a terminal).
* In Colab/Jupyter, this can sometimes install packages into a different Python environment than the one the notebook kernel uses — especially if multiple Python versions or virtual environments are involved.
* You might install a package but then `import` still fails → annoying!

2️⃣ **`%pip install ...`**

* `%pip` is an **IPython magic command** (single-line).
* It ensures that the package is installed in **the same Python environment as your notebook kernel**.
* It’s the recommended way for pip installs in Jupyter/Colab.

✅ Use `%pip` instead of `!pip` to avoid “module not found” surprises.

3️⃣ **`%%pip install ...`**

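* A `%%` prefix marks a **cell magic**, which must be the first line of a cell and applies to the **entire cell**.
* Standard IPython defines `pip` only as a line magic (`%pip`), so `%%pip` fails with a "cell magic not found" error unless your environment registers one.
* If you have multiple install lines in one cell, write one `%pip install ...` per line.

---

## ⚡ **Example**

```python
# Good practice in Google Colab
%pip install langchain-openai langsmith

# Several installs in one cell: repeat the line magic
%pip install langchain-core
%pip install langgraph
```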

---

## 🗂️ **What about `%%capture`?**

A typical quiet install cell looks like this:

```python
%%capture --no-stderr
%pip install ...
```

* `%%capture` is another IPython cell magic that **captures stdout and stderr** (so you don’t see all the pip output).
* `--no-stderr` means it won’t capture errors → you’ll still see them if pip fails.
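
The same idea can be sketched in plain Python: capture one stream and let the other pass through. This is only an analogy for what `%%capture --no-stderr` does, built with the standard library's `contextlib.redirect_stdout`; `noisy_install` is a made-up stand-in for a chatty install command.

```python
import contextlib
import io
import sys


def noisy_install():
    """Hypothetical stand-in for a command that writes to both streams."""
    print("Collecting example-package ...")                  # routine progress output
    print("ERROR: something went wrong", file=sys.stderr)    # error output


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):   # stdout is captured, stderr is not
    noisy_install()

captured = buffer.getvalue()
print(f"captured {len(captured.splitlines())} line(s) of stdout")
```

The progress line ends up in `captured`, while the error line still reaches the screen, which is the behaviour you want from a quiet install cell.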

---

## ✅ **Best practice for Colab**

* Use `%pip` instead of `!pip` for Python packages.
* Use `%%capture` if you want to hide noisy output.
* Always restart your runtime (`Runtime → Restart runtime`) after major installs, if needed.
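
To confirm that an install landed in the environment your kernel actually uses, you can ask Python directly. A minimal sketch using only the standard library; `describe_environment` is a hypothetical helper name, and `json` stands in for whichever package you just installed.

```python
import importlib.util
import sys


def describe_environment(package: str) -> dict:
    """Report which interpreter is running and whether `package` is importable from it."""
    spec = importlib.util.find_spec(package)
    return {
        "interpreter": sys.executable,                 # the Python binary behind this kernel
        "importable": spec is not None,                # False suggests the install went elsewhere
        "location": spec.origin if spec else None,     # where the package would be loaded from
    }


info = describe_environment("json")  # swap in e.g. "langsmith" after installing it
print(info["interpreter"])
print(info["importable"], info["location"])
```

If `importable` is `False` right after a successful install, the package most likely went into a different environment, or the runtime needs a restart.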



In [5]:
# %%capture --no-stderr
# %pip install langsmith langchain-openai langchain-core langchain-community pydantic python-dotenv openai langgraph
# %pip install --upgrade langsmith

In [1]:
%pip install -Uq langsmith langchain-openai langchain-core langchain-community pydantic python-dotenv openai langgraph

## Environ Setup

In [3]:
# --- 1) Imports ---
import os
from dotenv import load_dotenv

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS  # needs faiss-cpu once it is actually used
from langchain.chains import RetrievalQA

# --- 2) Load environment variables ---
load_dotenv("/content/API_KEYS.env", override=True)

# Confirm keys are loaded without printing the secret values
for name in ("OPENAI_API_KEY", "LANGCHAIN_API_KEY"):
    print(f"{name}:", "set" if os.getenv(name) else "MISSING")
print("LANGCHAIN_PROJECT:", os.getenv("LANGCHAIN_PROJECT"))
print("LANGCHAIN_TRACING_V2:", os.getenv("LANGCHAIN_TRACING_V2"))

# Alternatively set project name for this run outside .env file
os.environ["LANGCHAIN_PROJECT"] = "project_00"
os.environ["USER_AGENT"] = "MyRAGApp/0.1"

OPENAI_API_KEY: set
LANGCHAIN_API_KEY: MISSING
LANGCHAIN_PROJECT: my_project_name
LANGCHAIN_TRACING_V2: true
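
Printing full API keys into notebook output is risky, because the output is saved and shared with the notebook. A small masking helper shows just enough to confirm that the right key is loaded. A sketch; `mask_secret` is a hypothetical helper, not part of `dotenv` or LangChain.

```python
import os


def mask_secret(value, visible=4):
    """Return a short fingerprint of a secret instead of the secret itself."""
    if not value:
        return "<not set>"
    if len(value) <= visible * 2:
        return "*" * len(value)            # too short to reveal any part safely
    return f"{value[:visible]}...{value[-visible:]}"


for name in ("OPENAI_API_KEY", "LANGCHAIN_API_KEY"):
    print(f"{name}: {mask_secret(os.getenv(name))}")
```

If a real key has already been printed in a shared notebook, treat it as leaked and rotate it.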


## Model Selection

In [4]:
from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")

## Embeddings

The embeddings in this notebook come from OpenAI.
Here is how the setup works:

---

## ✅ **Using `OpenAIEmbeddings`**

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
```

**What’s happening?**

* `OpenAIEmbeddings` is the LangChain wrapper that calls OpenAI’s embedding endpoint.
* `model="text-embedding-3-large"` selects OpenAI's larger third-generation embedding model; it returns 3072-dimensional vectors by default.
* Your `OPENAI_API_KEY` must be set (which you have in your `.env`).

---

## 📌 **How it works**

When you call:

```python
vectorstore = FAISS.from_documents(docs, embeddings)
```

or:

```python
embeddings.embed_query("My question?")
```

LangChain calls OpenAI’s embedding endpoint, gets the vector, and returns it to your retriever or vector store.
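
Whatever the provider, the vector store relies on two methods: `embed_documents` for the chunks and `embed_query` for the question. The sketch below imitates that interface with a hash-based stand-in so it runs without an API key. `ToyEmbeddings` is hypothetical, and its vectors carry no semantic meaning; it only shows the shape of the data that flows through.

```python
import hashlib


class ToyEmbeddings:
    """Hypothetical stand-in exposing the two methods an embedding class provides."""

    def __init__(self, dimensions: int = 8):
        if not 1 <= dimensions <= 32:
            raise ValueError("a SHA-256 digest provides at most 32 bytes")
        self.dimensions = dimensions

    def embed_query(self, text: str) -> list[float]:
        # Deterministic pseudo-vector: the same text always maps to the same floats.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255 for byte in digest[: self.dimensions]]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # One vector per input text, in the same order.
        return [self.embed_query(text) for text in texts]


toy = ToyEmbeddings(dimensions=8)
doc_vectors = toy.embed_documents(["first chunk", "second chunk"])
query_vector = toy.embed_query("How does LangSmith tracing work?")
print(len(doc_vectors), len(query_vector))
```

With the real `OpenAIEmbeddings` the calls look the same, but each vector has 3072 entries and similar texts land close together.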


In [5]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# ⚡️ Tip: inspect embedding
vector = embeddings.embed_query("How does LangSmith tracing work?")
print(len(vector))  # Should match the embedding dimension

3072


## Vector Store

In [6]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

In [7]:
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
# N.B. for non-US LangSmith endpoints, you may need to specify
# api_url="https://api.smith.langchain.com" in hub.pull.
prompt = hub.pull("rlm/rag-prompt")


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

## Code Breakdown

Let’s break this block down step by step to see **what’s happening and why**. It is a compact RAG + LangGraph example, so each piece is worth a look. 🚀

---

## ✅ **🔹 1) Imports**

```python
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict
```

* `bs4` → BeautifulSoup: you’ll use it to parse the HTML blog post.
* `hub` → LangChain Hub: stores reusable prompts/templates.
* `WebBaseLoader` → loads text from a webpage.
* `Document` → LangChain’s document wrapper (metadata, page content).
* `RecursiveCharacterTextSplitter` → splits long docs into chunks.
* `START`, `StateGraph` → LangGraph’s entry-point marker and graph builder; each node is a function that reads the shared state and returns an update.
* `TypedDict` → defines your app state type for type safety.

---

## ✅ **🔹 2) Load and parse blog post**

```python
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
```

👉 This:

* Loads **Lilian Weng’s “Agent” blog post**.
* Uses `bs4.SoupStrainer` to keep only useful HTML parts (post content, title, header) → more focused chunks.
* Wraps the text as `Document` objects.

✅ **Why?** You don’t want your embeddings polluted by navbars, footers, or ads!

---

## ✅ **🔹 3) Split into chunks**

```python
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
```

👉 This:

* Breaks long text into smaller, overlapping chunks (RAG best practice).
* `chunk_size=1000`: each chunk is at most 1000 characters (length is measured in characters by default, not tokens).
* `chunk_overlap=200`: neighbouring chunks share up to 200 characters to preserve context.

✅ **Why?** Smaller chunks = better similarity search and fewer tokens sent to LLM.
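
To see how `chunk_size` and `chunk_overlap` interact, here is a deliberately simplified fixed-window chunker. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one first tries to split on paragraph breaks, then lines, then spaces); `sliding_chunks` is a hypothetical helper that only illustrates the two parameters.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap      # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break                          # this window already reached the end
        start += step
    return chunks


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the tail of one chunk is repeated at the head of the next.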

---

## ✅ **🔹 4) Index chunks**

```python
_ = vector_store.add_documents(documents=all_splits)
```

👉 This:

* Adds the chunks to your vector store (here the `InMemoryVectorStore` created earlier; `FAISS` exposes the same `add_documents` method).
* Each chunk is embedded and stored for similarity search later.

✅ **Why?** This builds your local knowledge base to support retrieval.

---

## ✅ **🔹 5) Pull RAG prompt from LangChain Hub**

```python
prompt = hub.pull("rlm/rag-prompt")
```

👉 This:

* Loads a reusable RAG-style prompt from the LangChain Hub.
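* The prompt text (printed in full in the Prompt section of this notebook) is:

  ```
  You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
  Question: {question}
  Context: {context}
  Answer:
  ```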

✅ **Why?** Standardizes your RAG behavior without hardcoding your own prompt.

---

## ✅ **🔹 6) Define app state**

```python
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
```

👉 This:

* Defines the **input, intermediate, and output state** for your graph.
* `question` → the user’s query.
* `context` → retrieved docs.
* `answer` → final LLM output.

✅ **Why?** `StateGraph` needs a clear state schema to pass data between steps.

---

## ✅ **🔹 7) Define steps**

### `retrieve`

```python
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}
```

👉 This:

* Takes the question.
* Uses similarity search to find relevant chunks (`similarity_search` returns the top 4 by default, `k=4`).
* Returns them as `context` for the next step.

✅ **Why?** Classic RAG step: “given a question, find matching chunks.”
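
Under the hood, a similarity search scores every stored vector against the query vector and keeps the best `k`. A minimal sketch of that idea with cosine similarity as the scoring function; the three-dimensional vectors and the `top_k` helper are made up for illustration, and real stores add batching and indexing on top.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, indexed_chunks, k=4):
    """Return the texts of the k chunks whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)   # best score first
    return [text for _, text in scored[:k]]


# Toy index: (chunk text, pretend embedding)
index = [
    ("chunk about planning", [0.9, 0.1, 0.0]),
    ("chunk about memory", [0.1, 0.9, 0.0]),
    ("chunk about tool use", [0.0, 0.2, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], index, k=2)
print(results)
```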

---

### `generate`

```python
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}
```

👉 This:

* Joins retrieved chunks into a single context block.
* Calls the prompt to format the input for the LLM.
* Calls the LLM to generate the answer.
* Returns the answer as your final state.

✅ **Why?** This is the generation half of RAG: “use the retrieved context to produce a grounded answer.”
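
The string handling in `generate` can be tried without an LLM. In this sketch, plain `str.format` stands in for `prompt.invoke` (which really returns chat messages, not a string), and the template text is copied from the prompt output shown later in this notebook. `build_prompt` is a hypothetical helper.

```python
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. Use the following pieces of "
    "retrieved context to answer the question. If you don't know the answer, just say "
    "that you don't know. Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \nContext: {context} \nAnswer:"
)


def build_prompt(question: str, chunks: list[str]) -> str:
    """Join retrieved chunks with blank lines and fill the template placeholders."""
    docs_content = "\n\n".join(chunks)
    return RAG_TEMPLATE.format(question=question, context=docs_content)


prompt_text = build_prompt(
    "What is task decomposition?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(prompt_text)
```

Printing the filled prompt like this is a quick way to check how much of the context window the retrieved chunks consume.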

---

## ✅ **🔹 8) Build & compile your `StateGraph`**

```python
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
```

👉 This:

* Creates a graph where:

  * **Start** → `retrieve` → `generate`
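* `add_sequence` registers each function as a node (named after the function) and connects them in order; `add_edge(START, "retrieve")` marks where execution begins.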

✅ **Why?** This modular graph pattern is reusable, composable, and debuggable — much better than messy nested calls!
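
The core idea behind this graph (each node receives the state and returns only the keys it changed) can be sketched in a few lines of plain Python. This is a simplification, not LangGraph's implementation: it ignores branching, reducers, and checkpointing. `run_sequence`, `toy_retrieve`, and `toy_generate` are hypothetical names.

```python
def run_sequence(steps, initial_state):
    """Run node functions in order, merging each partial update into the shared state."""
    state = dict(initial_state)
    for step in steps:
        update = step(state)     # a node returns only the keys it wants to set
        state.update(update)     # merge the update so later nodes can read it
    return state


def toy_retrieve(state):
    # Stand-in for a vector store lookup.
    return {"context": [f"chunk matching: {state['question']}"]}


def toy_generate(state):
    # Stand-in for the LLM call.
    return {"answer": f"Answer based on {len(state['context'])} chunk(s)"}


final_state = run_sequence([toy_retrieve, toy_generate], {"question": "What is ReAct?"})
print(final_state)
```

The final state contains `question`, `context`, and `answer`, which mirrors the dictionary that `graph.invoke(...)` returns.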

---

## ✅ **🔹 9) Next step: Run it**

After this block, you run:

```python
result = graph.invoke({"question": "What are ReAct agents?"})
print(result)
```

* This runs `retrieve` → `generate` → gives you an answer.
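* ✅ If `LANGCHAIN_TRACING_V2=true` and a valid `LANGCHAIN_API_KEY` are set, the entire run logs to **LangSmith** (the setup output above shows no LangSmith key loaded, so add one to the `.env` file first):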

  * Question
  * Retrieved docs
  * LLM output
  * Token usage

---

## ✅ **📌 What this block gives you**

✔️ A **clean RAG pipeline** with:

* **Data loading**
* **Chunking**
* **Embeddings + vector store**
* **Retrieval**
* **Prompt templating**
* **LLM generation**
* **Modular graph orchestration**

✔️ **Full observability** via LangSmith (if your API key & project are correct).

---

## 🎉 **Why this is powerful**

* You can swap the retriever, LLM, or prompt independently.
* You can add more steps — like evals, re-ranking, or feedback loops.
* You can see the entire flow in your LangSmith dashboard to debug or tune.




### Soup Strainer

In [14]:
# Inspect raw HTML (not using SoupStrainer)
import requests
html = requests.get("https://lilianweng.github.io/posts/2023-06-23-agent/").text

print(html[:1000])  # Show first 1000 chars

<!DOCTYPE html>
<html lang="en" dir="auto">

<head><meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta name="robots" content="index, follow">
<title>LLM Powered Autonomous Agents | Lil&#39;Log</title>
<meta name="keywords" content="nlp, language-model, agent, steerability, prompting" />
<meta name="description" content="Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview
In a LLM-powered autonomous agent system, LLM functions as the agent&rsquo;s brain, complemented by several key components:

Planning

Subgoal and decomposition: The agent breaks down lar

In [13]:
print(len(docs))
print(docs[0].page_content[:500])  # show a snippet
print(docs[0].metadata)

1


      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In
{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}


In [18]:
from bs4 import BeautifulSoup, SoupStrainer

# Define what you want to extract:
only_useful = SoupStrainer(class_=("post-content", "post-title", "post-header"))

# Parse HTML, filtering at parse time:
soup = BeautifulSoup(html, "html.parser", parse_only=only_useful)

# Check what you got:
print(soup.prettify()[:1000])

<header class="post-header">
 <h1 class="post-title">
  LLM Powered Autonomous Agents
 </h1>
 <div class="post-meta">
  Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng
 </div>
</header>
<div class="post-content">
 <p>
  Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as
  <a href="https://github.com/Significant-Gravitas/Auto-GPT">
   AutoGPT
  </a>
  ,
  <a href="https://github.com/AntonOsika/gpt-engineer">
   GPT-Engineer
  </a>
  and
  <a href="https://github.com/yoheinakajima/babyagi">
   BabyAGI
  </a>
  , serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
 </p>
 <h1 id="agent-system-overview">
  Agent System Overview
  <a aria-hidden="true" class="anchor" hidden="" href="#agent-system-overview">
   #
  </a>
 </h1>
 <p>
  In a LLM-powered au

In [22]:
import requests
from bs4 import BeautifulSoup, SoupStrainer

url = "https://lilianweng.github.io/posts/2023-06-23-agent/"

# Step 1: Get HTML
html = requests.get(url).text

# Step 2: Strain for useful parts
only_useful = SoupStrainer(class_=("post-content", "post-title", "post-header"))
soup = BeautifulSoup(html, "html.parser", parse_only=only_useful)

# Step 3: Inspect
print("EXTRACTED PARTS:")
extracted_parts = soup.find_all(True)

for idx, part in enumerate(extracted_parts[0:15]):
    print(f"\n--- PART {idx+1} ---")
    print(f"Tag: {part.name}")
    print(f"Class: {part.get('class')}")
    print(f"Content:\n{part.get_text(strip=True)[:500]}")


EXTRACTED PARTS:

--- PART 1 ---
Tag: header
Class: ['post-header']
Content:
LLM Powered Autonomous AgentsDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng

--- PART 2 ---
Tag: h1
Class: ['post-title']
Content:
LLM Powered Autonomous Agents

--- PART 3 ---
Tag: div
Class: ['post-meta']
Content:
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng

--- PART 4 ---
Tag: div
Class: ['post-content']
Content:
Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such asAutoGPT,GPT-EngineerandBabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.Agent System Overview#In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:PlanningSubgoal

--- PART 5 ---
Tag: p
Class: None
Co

Let’s unpack this because it’s *super important* for how the loader → splitter → vector store flow works.

---

### ✅ **What actually happens with multiple parts?**

When you do this:

```python
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4.SoupStrainer(
        class_=("post-content", "post-title", "post-header")
    )}
)
docs = loader.load()
```

👉 **`loader.load()` returns a list of `Document` objects, one per URL in `web_paths`**, not one per matched HTML element.

The `SoupStrainer` only decides which parts of the HTML get parsed:

* `<h1 class="post-title"> ... </h1>`
* `<header class="post-header"> ... </header>`
* `<div class="post-content"> ... </div>`

The text of everything that survives the filter is extracted into a single `page_content` string.

✅ So you get `len(docs) == 1` for this blog, which matches the `print(len(docs))` output shown earlier.

---

## 🔍 **Do they automatically combine?**

**Yes.** `WebBaseLoader` extracts the text of the filtered page as a whole, so the title, header, and body end up in one `Document`:

```python
[
    Document(
        metadata={"source": "https://lilianweng.github.io/posts/2023-06-23-agent/"},
        page_content="LLM Powered Autonomous Agents ... Date: June 23, 2023 | ... Building agents with LLM ...",
    )
]
```

---

## ✅ **When does it get split up?**

The single `Document` goes through your **TextSplitter**, which will:

* Take each `Document` individually.
* Chunk it into smaller overlapping parts.
* Return one flat list of **smaller chunks**, each carrying a copy of its parent's metadata.

---

## 📌 **Example:**

```python
# One URL in web_paths, so one Document:
docs = loader.load()
print(len(docs))  # 1

# Then split:
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

print(len(splits))  # 63 chunks for this post
```

So in the end:
✅ The chunks **cover the whole filtered page**
✅ They all come from one source `Document`
✅ Each chunk keeps `metadata` → so your retrieval knows which URL it came from!

---

## 🗂️ **Why this is good**

This design means:

* You can pull exactly the HTML bits you want → fine control.
* Every chunk carries its source URL in `metadata`.
* The metadata does *not* record which HTML element a passage came from; if you need that, parse the page yourself and build one `Document` per element.




In [23]:
len(extracted_parts)

401

In [24]:
# One URL in web_paths, so one Document:
docs = loader.load()
print(len(docs))  # 1

# Then split:
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

print(len(splits))  # 63 chunks for this post


1
63


In [42]:
all_splits[0]

Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refin

### Splits Inspection

The *tidiness* of your final chunks **directly affects** your RAG output quality. Let’s break down *why*, what’s happening, and how to clean it up if you want **higher-quality chunks**.

---

## 📌 **What’s in your current split?**

Your `all_splits[0]` is:

```
Document(
  metadata={...},
  page_content=(
    "LLM Powered Autonomous Agents\n\nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM..."
  )
)
```

So your chunk has:

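* ✅ **Good content**: the intro, headings, paragraphs.
* ⚠️ **Extra noise**: the “Date: … | Author: …” byline. In this single-page example it appears only in the first chunk, but when you index many pages the same kind of byline lands in one chunk per page.
* ⚠️ Extra `\n` and indentation → rarely hurts the LLM, but it counts against `chunk_size` and can waste tokens.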

---

## ✅ **Why does this matter for RAG?**

* The embeddings you generate for similarity search are only as good as the text quality.
* If many chunks carry boilerplate (like a date/author byline for each blog post you index), it can:

  * Dilute your semantic similarity.
  * Waste tokens in your context window when you pass retrieved docs to the LLM.
  * Make the final answer sound repetitive or cluttered.

---

## 📌 **Where does this come from?**

* This is from the **HTML structure**:

  * For example, the “post-header” block includes the title, date, and author (see the `SoupStrainer` output above).
  * That’s perfectly fine *once*, and the splitter does not copy it around: text only repeats between neighbouring chunks through `chunk_overlap`. But `RecursiveCharacterTextSplitter` doesn’t “know” what’s semantically important, so the byline stays glued to the intro text in the first chunk.

---

## ✅ **How tidy is the final product by default?**

👉 By default:

* It’s **pretty good** — you’re getting meaningful, readable text blocks.
* But you *can* get bylines, headers, or other boilerplate mixed into the chunk text.
* The newline characters `\n` are usually not a big deal — but for very clean retrieval, you might normalize them.

---

## ⚡️ **How to clean it up (best practices)**

### ✅ 1️⃣ Strip or filter unwanted text before splitting

**One-time removal is the best place:**

```python
# Inspect your docs before splitting:
for doc in docs:
    print("--- RAW DOC ---")
    print(doc.page_content[:500])

# Example cleanup:
for doc in docs:
    text = doc.page_content

    # Example: Remove the Date/Author line if it's always the same
    text = text.replace("Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng", "")

    # Collapse multiple newlines
    text = "\n".join([line.strip() for line in text.splitlines() if line.strip()])

    doc.page_content = text
```

Then split:

```python
all_splits = text_splitter.split_documents(docs)
```

✅ This means your **chunks will never contain that redundant metadata**, so your embeddings will be clearer.

---

### ✅ 2️⃣ Or use a more advanced cleaner

You could also:

* Use regex to strip repeated footers, headers, disclaimers, timestamps.
* Apply `strip()` to remove leading/trailing whitespace.
* Remove line breaks if you don’t need them for semantic structure.
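
As a sketch of the regex approach, the pattern below removes a whole byline line. It assumes the byline has the `Date: ... | Author: ...` shape seen on this page; other sites need their own pattern. `strip_byline` is a hypothetical helper.

```python
import re

# Matches one full line that starts with "Date:" and contains "| Author:".
BYLINE = re.compile(r"^Date: .*\|\s*Author: .*$", re.MULTILINE)


def strip_byline(text: str) -> str:
    """Remove date/author byline lines, leaving the surrounding text untouched."""
    return BYLINE.sub("", text)


sample = (
    "LLM Powered Autonomous Agents\n"
    "Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n"
    "Building agents with LLM as its core controller is a cool concept."
)
stripped = strip_byline(sample)
print(stripped)
```

A pattern like this keeps working when the date or reading time changes, unlike an exact `str.replace`.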

---

### ✅ 3️⃣ Normalizing whitespace helps

LLMs handle `\n` fine, and line breaks often carry useful structure. But runs of blank lines and indentation waste tokens and count against `chunk_size`.
One common trick:

```python
def clean_text(text):
    # Remove extra newlines
    lines = text.splitlines()
    lines = [line.strip() for line in lines if line.strip()]
    return " ".join(lines)

for doc in docs:
    doc.page_content = clean_text(doc.page_content)
```
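
One caveat with joining everything into a single line: it removes the `\n\n` paragraph boundaries that `RecursiveCharacterTextSplitter` tries first when deciding where to cut. A possible middle ground, sketched below, normalizes whitespace inside each paragraph but keeps one blank line between paragraphs. `clean_keep_paragraphs` is a hypothetical helper.

```python
import re


def clean_keep_paragraphs(text: str) -> str:
    """Collapse whitespace inside paragraphs while keeping paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text)                 # blank (or whitespace-only) lines separate paragraphs
    cleaned = [" ".join(p.split()) for p in paragraphs]     # squeeze spaces and single newlines
    return "\n\n".join(p for p in cleaned if p)             # drop empty paragraphs


raw = "Title\n    \nDate: today\n\n\nBody line one\nline two  "
tidy = clean_keep_paragraphs(raw)
print(repr(tidy))
```

Whether to keep paragraph breaks is a design choice: they help the splitter cut at natural boundaries, at the cost of a few extra characters per chunk.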

---

## 📌 **Does this really improve RAG performance?**

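✅ Usually, yes. For real-world projects, *cleaner* chunks tend to:

* Produce embeddings that reflect the content rather than the boilerplate.
* Lower hallucination risk, because retrieval is more on-point.
* Use your token budget more effectively → less wasted cost.

Measure the effect on your own queries, though; the gain depends on how much noise the source pages carry.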

---

## 🟢 **Noisy vs. cleaned chunks**

| Noisy chunks                          | Cleaned chunks                     |
| ------------------------------------- | ---------------------------------- |
| Byline text mixed into chunk content  | Kept only in metadata, or dropped  |
| Extra whitespace                      | Normalized                         |
| Embeddings diluted by boilerplate     | Embeddings focused on the content  |

---

## ✅ ✅ ✅ **Quick takeaway**

* The default tutorial gives you a *decent* baseline.
* But a tiny bit of **text pre-processing** can noticeably improve your results — especially for larger RAG projects.




In [43]:
for i, split in enumerate(all_splits[:10]):  # First 10 splits
    print(f"--- Chunk {i+1} ---")
    print(f"Length (chars): {len(split.page_content)}")
    print(f"Metadata: {split.metadata}")
    print("\nCONTENT:\n")
    print(split.page_content[:500])  # Preview first 500 chars
    print('\n' + '-'*80 + '\n')


--- Chunk 1 ---
Length (chars): 969
Metadata: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}

CONTENT:

LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In a LLM-p

--------------------------------------------------------------------------------

--- Chunk 2 ---
Length (chars): 665
Metadata: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}

CONTENT:

Memory

Short-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.
Long-term memory:

In [55]:
def clean_text(text):
    # Split the text into lines
    lines = text.splitlines()

    # Strip each line of leading/trailing whitespace and drop empty lines
    lines = [line.strip() for line in lines if line.strip()]

    # Join the cleaned lines into one string with single spaces
    return " ".join(lines)


# Pick a doc to inspect
original = docs[0].page_content

# Clean it
cleaned = clean_text(original)

# Compare
import textwrap
from pprint import pprint  # For pretty-printing dicts if needed

# Compare side by side with nice wrapping
print("--- ORIGINAL ---")
print("\n".join(textwrap.wrap(original[:500], width=80)))

print("\n--- CLEANED ---")
print("\n".join(textwrap.wrap(cleaned[:500], width=80)))


--- ORIGINAL ---
LLM Powered Autonomous Agents Building agents with LLM (large language model) as
its core controller is a cool concept. Several proof-of-concepts demos, such as
AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality
of LLM extends beyond generating well-written copies, stories, essays and
programs; it can be framed as a powerful general problem solver. Agent System
Overview# In a LLM-powered autonomous agent system, LLM functions as the agent’s
brain, complemented by se

--- CLEANED ---
LLM Powered Autonomous Agents Building agents with LLM (large language model) as
its core controller is a cool concept. Several proof-of-concepts demos, such as
AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality
of LLM extends beyond generating well-written copies, stories, essays and
programs; it can be framed as a powerful general problem solver. Agent System
Overview# In a LLM-powered autonomous agent system, LLM functions as th
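
The two previews above look the same. One likely reason (an assumption based on the output, where the date/author line is already missing from ORIGINAL) is that `docs[0]` had been cleaned in place by an earlier cell, so no untouched text was left to compare. A sketch of a comparison that avoids this: keep the raw string in its own variable and print `repr()` so whitespace stays visible. The sample string is a shortened stand-in for the page text.

```python
def clean_text(text):
    """Strip each line, drop empty lines, and join the rest with single spaces."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return " ".join(lines)


raw_sample = "LLM Powered Autonomous Agents\n    \nDate: June 23, 2023\n\n\nBuilding agents with LLM"
cleaned_sample = clean_text(raw_sample)   # raw_sample itself is left untouched

print("RAW    :", repr(raw_sample))
print("CLEANED:", repr(cleaned_sample))
print("changed:", raw_sample != cleaned_sample)
```

Reloading the page with `loader.load()` before running the comparison gives the same guarantee for the real document.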

## Prompt

## 📌 **What is `hub.pull("rlm/rag-prompt")` doing?**

**`hub.pull`** loads a prompt **from the LangChain Hub** (hosted on LangSmith), a shared place to store and version reusable prompts.

```python
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")
```

So `rlm/rag-prompt` is:

* `rlm` → the username or org on the Hub.
* `rag-prompt` → the name of the prompt they published.

This lets you reuse **battle-tested** prompt templates instead of writing one from scratch.

---

## ✅ **Can you see the prompt online?**

✔️ Yes! Every public Hub asset has a **web page**.
👉 The URL pattern is:

```
https://smith.langchain.com/hub/<username>/<asset-name>
```

So in your case:

```
https://smith.langchain.com/hub/rlm/rag-prompt
```

If you open that, you’ll see:

* The full prompt template.
* Inputs it expects (e.g., `{question}` and `{context}`).
* Example usage.
* Versions, if any.

---

## 🟢 **What does this prompt contain?**

The exact text of `rlm/rag-prompt` is printed in the inspection cell further down. For comparison, a generic **RAG prompt** often looks like this:

```
Use the following pieces of context to answer the question at the end.
If you don’t know the answer, just say you don’t know, don’t try to make up an answer.

Context:
{context}

Question:
{question}

Helpful Answer:
```

✅ It ensures:

* The LLM grounds its answer in your retrieved chunks.
* It discourages hallucinations by telling the LLM not to guess.
* Your `{context}` and `{question}` are plugged in dynamically when you call `prompt.invoke(...)`.

---

## 🗂️ **How can you inspect it locally?**

After pulling it:

```python
print(prompt)
```

or:

```python
print(prompt.messages)
```

`rlm/rag-prompt` is a `ChatPromptTemplate`, so `.messages` is available; a plain `PromptTemplate` exposes its text through `.template` instead.
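
LangChain templates list their placeholders in `input_variables`. If all you have is a raw template string, the standard library can extract the same information. A sketch; `template_variables` is a hypothetical helper, and the template fragment is taken from the prompt printed in this notebook.

```python
from string import Formatter


def template_variables(template: str) -> list[str]:
    """Return the sorted placeholder names found in a str.format-style template."""
    return sorted({field for _, field, _, _ in Formatter().parse(template) if field})


template = "Question: {question} \nContext: {context} \nAnswer:"
print(template_variables(template))
```

Checking the placeholder names up front avoids a `KeyError` when the dictionary passed to the prompt is missing a key.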

---

## ✅ **Key takeaway**

| Thing                                            | What it does                                                 |
| ------------------------------------------------ | ------------------------------------------------------------ |
| `hub.pull("rlm/rag-prompt")`                     | Loads a reusable prompt template from the Hub                |
| `https://smith.langchain.com/hub/rlm/rag-prompt` | Lets you view the exact prompt                               |
| `prompt.invoke({...})`                           | Fills in `{question}` and `{context}` when you run the graph |

---

## ✅ ✅ ✅ **Next**

* Go check it out at [smith.langchain.com/hub/rlm/rag-prompt](https://smith.langchain.com/hub/rlm/rag-prompt)


In [66]:
# The prompt is a ChatPromptTemplate with one or more messages
print(type(prompt))
# <class 'langchain_core.prompts.chat.ChatPromptTemplate'>

# The actual list of messages
print(prompt.messages)
# [HumanMessagePromptTemplate(...), ...]

# Grab the first HumanMessagePromptTemplate
human_message = prompt.messages[0]

print(type(human_message))
# <class 'langchain_core.prompts.chat.HumanMessagePromptTemplate'>

# That has a .prompt which is your underlying PromptTemplate
underlying_template = human_message.prompt

print(type(underlying_template))
# <class 'langchain_core.prompts.prompt.PromptTemplate'>

# Finally, get the raw template string!
print('\n')
print("\n".join(textwrap.wrap(underlying_template.template, width=80)))



<class 'langchain_core.prompts.chat.ChatPromptTemplate'>
[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})]
<class 'langchain_core.prompts.chat.HumanMessagePromptTemplate'>
<class 'langchain_core.prompts.prompt.PromptTemplate'>


You are an assistant for question-answering tasks. Use the following pieces of
retrieved context to answer the question. If you don't know the answer, just say
that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}  Context: {context}  Answer:


✅ **Good RAG prompts are often surprisingly simple.**

---

## 📌 **Why is it so simple?**

Your prompt:

```
You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.

Question: {question}  
Context: {context}  
Answer:
```

---

### 🔑 **What this does right**

1️⃣ **Grounds the LLM**

* It says: *“Use the context I gave you. Don’t hallucinate.”*

2️⃣ **Has a clear fallback**

* *“If you don’t know, just say you don’t know.”*
* This reduces confident nonsense when the context is irrelevant or missing.

3️⃣ **Keeps output short**

* *“Three sentences max, keep it concise.”*
* Saves tokens and keeps answers direct — especially important if you show the source passages to the user too.

4️⃣ **No unnecessary instructions**

* No fancy formatting, references, or style — just plain QA.
* The retriever does the heavy lifting: the better your chunks, the better this works.

---

## ✅ **Good RAG = simple prompt + good chunks**

You nailed it:

* The *real power* comes from **high-quality, well-chunked, relevant context**.
* A fancy prompt can’t fix irrelevant or noisy chunks.
* A simple, direct prompt often works better: it spends fewer tokens and gives the LLM fewer instructions to misread or trade off against each other.

---

## 📌 **When you might expand it**

You’d only make your RAG prompt more complex when you need:

* ✅ Citations: *“Add \[1], \[2], \[3] next to facts.”*
* ✅ Structured output: JSON, bullet lists, or tables.
* ✅ A custom style or tone: e.g., summarizing in plain language for kids.
* ✅ Additional constraints: like always adding an explicit source URL.

But for baseline QA → **short, direct instructions are king**.
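
For the citation case, a common approach is to number the chunks in the context so the model has labels to cite. The sketch below is plain Python with made-up sources; the instruction wording and the `format_numbered_context` helper are assumptions for illustration, not part of `rlm/rag-prompt`.

```python
CITED_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the numbered context below to answer the question. "
    "Add the matching source number, such as [1], next to each fact. "
    "If you don't know the answer, just say that you don't know.\n"
    "Question: {question}\n"
    "Context:\n{context}\n"
    "Answer:"
)


def format_numbered_context(chunks):
    """Prefix each chunk with [n] so the model can cite it."""
    return "\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))


# Hypothetical retrieved chunks.
chunks = ["Refunds are issued within 14 days.", "Shipping is free over $50."]
context = format_numbered_context(chunks)
cited_prompt = CITED_TEMPLATE.format(
    question="How long do refunds take?",
    context=context,
)
print(cited_prompt)
```

Because the numbers are assigned when the context is built, they can be mapped back to source metadata after the model answers.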

---

## 🟢 **Key takeaway**

| Good chunk quality          | Good retriever                                   | Simple grounding prompt                                |
| --------------------------- | ------------------------------------------------ | ------------------------------------------------------ |
| ✅ Clean text, minimal noise | ✅ Similarity search that finds relevant passages | ✅ “Use this context. Be concise. Don’t make stuff up.” |

This is what gives you reliable, traceable answers.





> *“How important is prepping and optimizing my source documentation for RAG?”*

The short answer: **It’s absolutely critical — maybe the single most important thing you control!**

---

## 📌 **Why your chunk quality matters more than a fancy prompt**

### 🗂️ **RAG is really just two big steps:**

1️⃣ **Retrieve**: Use embeddings + vector store to find relevant text chunks.

2️⃣ **Generate**: Ask the LLM to answer your question using those chunks.
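
The two steps can be sketched end to end with a toy retriever. To stay self-contained, this example swaps embeddings and a vector store for word-count vectors and cosine similarity, and it stops at building the prompt rather than calling an LLM. The documents and function names are made up.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts: a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=1):
    """Step 1: rank chunks by similarity to the question, keep the top k."""
    query = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Step 2 (up to the LLM call): put the retrieved chunks in the prompt."""
    context = "\n\n".join(context_chunks)
    return f"Question: {question}\nContext: {context}\nAnswer:"


chunks = [
    "The office is closed on public holidays.",
    "Password resets are handled by the IT help desk.",
    "Lunch is served in the cafeteria at noon.",
]
question = "Who handles password resets?"
top = retrieve(question, chunks)
print(build_prompt(question, top))
```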

---

### ⚡ **What does “garbage in, garbage out” mean for RAG?**

👉 If your source text is noisy, irrelevant, repetitive, or poorly chunked:

* The vector store will return off-topic or low-quality context.
* The LLM will either hallucinate to fill gaps **OR** parrot irrelevant details.
* No prompt in the world can fix “bad chunks in, bad answer out.”

---

### ✅ **The retrieval step is your “truth filter.”**

A clear, tight prompt just says:

> “Hey model, trust these chunks. Don’t guess.”

But if the chunks themselves are:

* Missing key facts
* Full of boilerplate (headers, footers, legal disclaimers)
* Or too big and vague

…the LLM has no good raw material to pull an accurate answer from.

---

## 🟢 **What makes a chunk “high quality”?**

✅ Clear, well-written text — free from extra noise like navbars, unrelated disclaimers.

✅ Right chunk size: small enough to stay on one topic, large enough to preserve meaning (\~500–1000 tokens is a common starting point; tune it for your documents).

✅ Overlap for context — to prevent splitting related ideas mid-paragraph.

✅ Consistent format — so your embeddings stay semantically sharp.

✅ Enriched metadata — so you can filter and display source details later.
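
Size and overlap are easy to see in a small splitter. The sketch below counts words rather than tokens to avoid a tokenizer dependency, and the function name, the tiny sizes, and the metadata fields are illustrative assumptions. In a LangChain pipeline a text splitter class would normally do this job.

```python
def chunk_words(text, chunk_size=8, overlap=2, source="unknown"):
    """Split text into word windows that share `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append({
            "text": " ".join(window),
            # Metadata kept next to the text for later filtering and display.
            "metadata": {"source": source, "start_word": start},
        })
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(1, 15))  # w1 ... w14
for chunk in chunk_words(sample, chunk_size=8, overlap=2, source="demo.txt"):
    print(chunk["metadata"]["start_word"], chunk["text"])
```

The last words of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still seen whole in at least one chunk when the overlap is large enough.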

---

## 🧹 **How do people prep their docs well?**

📄 **Good examples:**

* Strip repeated boilerplate: e.g., headers/footers that repeat every page.
* Remove unrelated sections: e.g., site nav, social share buttons.
* Normalize whitespace: e.g., clean up newlines, markdown, code blocks.
* Keep logical units together: e.g., don’t break a section heading from its paragraph.
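
A minimal cleaning pass along these lines might look like the sketch below. The heuristic (treat any line that appears on every page as boilerplate) and the sample pages are assumptions for illustration; real documents usually need format-specific rules.

```python
import re
from collections import Counter


def strip_repeated_lines(pages):
    """Drop lines that occur on every page, such as running headers and footers."""
    if len(pages) < 2:
        return pages  # nothing to compare against
    # Count each distinct line once per page.
    counts = Counter()
    for page in pages:
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    boilerplate = {line for line, n in counts.items() if n == len(pages)}
    return [
        "\n".join(l for l in page.splitlines() if l.strip() not in boilerplate)
        for page in pages
    ]


def normalize_whitespace(text):
    """Collapse runs of spaces/tabs and limit blank lines to one."""
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n\s*\n+", "\n\n", text)
    return text.strip()


pages = [
    "ACME Handbook\nVacation   policy\n\n\n\nEmployees get 20 days.\nPage footer",
    "ACME Handbook\nSick leave\nNotify your manager.\nPage footer",
]
cleaned = [normalize_whitespace(p) for p in strip_repeated_lines(pages)]
print(cleaned)
```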

---

## ✅ **How big is the impact?**

In practice, the impact is usually large:

* Improving chunk quality can improve RAG accuracy dramatically — often more than tweaking your LLM or prompt.
* If your chunks are noisy, similarity search pulls bad matches.
* Better chunks = better semantic similarity = more relevant context = less hallucination.
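
The effect of noise on similarity can be illustrated with a toy measure. This sketch uses word-count cosine similarity as a stand-in for embedding similarity (an assumption; real embedding models differ in detail), with made-up text. The same fact scores lower against the question once navigation and footer text is mixed into the chunk.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts: a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


question = "Who handles password resets?"
clean_chunk = "Password resets are handled by the IT help desk."
# Same fact, wrapped in hypothetical navigation and footer boilerplate.
noisy_chunk = (
    "Home | About | Contact | Login "
    + clean_chunk
    + " Copyright ACME. All rights reserved. Share on social media."
)

query = bag_of_words(question)
clean_score = cosine(query, bag_of_words(clean_chunk))
noisy_score = cosine(query, bag_of_words(noisy_chunk))
print(f"clean: {clean_score:.3f}  noisy: {noisy_score:.3f}")
```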

---

## 🔑 **Your secret RAG formula**

| Part                          | How much control you have         | Impact      |
| ----------------------------- | --------------------------------- | ----------- |
| Chunk quality & text cleaning | ✅ 100%                            | 🔥 Massive  |
| Chunk size & overlap          | ✅ 100%                            | 🔥 High     |
| Prompt clarity                | ✅ 100%                            | 🔥 Medium   |
| LLM choice                    | ✅ 50% (API pricing & performance) | 🔥 Medium   |
| Vector store tech             | ✅ 100%                            | 🔥 Medium   |

---

## ✅ ✅ ✅ **Key takeaway**

> 🏆 *“RAG is mostly about smart retrieval.
> Smart retrieval is all about high-quality, clean chunks.”*

A fancy prompt is just the icing on the cake.
A good retriever *is the cake.* 🍰✨


In [59]:
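import textwrap

# prompt['template'] raises "TypeError: list indices must be integers or slices, not str",
# because indexing a ChatPromptTemplate indexes its list of messages. Go through the message instead:
print("\n".join(textwrap.wrap(prompt.messages[0].prompt.template, width=80)))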

You are an assistant for question-answering tasks. Use the following pieces of
retrieved context to answer the question. If you don't know the answer, just say
that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}  Context: {context}  Answer:
