# LangChain Essentials

---

## What You'll Learn

This notebook covers the core LangChain concepts you need to build production LLM applications:

1. ✅ What is LangChain and when to use it
2. ✅ LCEL (LangChain Expression Language) - pipe syntax fundamentals
3. ✅ Document loaders and text splitters
4. ✅ Vector stores integration
5. ✅ Building RAG pipelines
6. ✅ Advanced LCEL patterns (sequential, parallel, branching)
7. ✅ Conversation memory with RunnableWithMessageHistory
8. ✅ Production patterns and best practices

## Prerequisites

✅ Completed environment setup notebook  
✅ OpenAI API key  
✅ Understanding of embeddings (helpful but not required)  

## What is LangChain?

LangChain is a framework for building LLM-powered applications. It provides:
- Pre-built components for common tasks
- Composable abstractions (chains, retrievers, memory)
- Production-ready patterns

**When to use LangChain**:
- Building RAG applications
- Creating conversational AI
- Implementing document processing pipelines
- Needing reusable, testable components

**When NOT to use LangChain**:
- Very simple, single LLM calls (call the provider API directly)
- Latency-critical code paths (abstractions add overhead)
- Highly custom logic (the abstractions can be restrictive)

---


# Section 1: Package Installation

LangChain packages are now modular. We need separate packages for different integrations.

### Packages We'll Install:

| Package | Purpose |
|---------|----------|
| `langchain` | Core framework |
| `langchain-openai` | OpenAI integrations (ChatOpenAI, embeddings) |
| `langchain-chroma` | ChromaDB integration |
| `langchain-community` | Community integrations (loaders, FAISS) |
| `langchain-text-splitters` | Text splitting |
| `pypdf` | PDF parsing backend |
| `chromadb` | Vector database client |

Let's install them:

In [None]:
# Uninstall existing langchain packages (clean slate)
!pip uninstall -y langchain langchain-core langchain-community langchain-openai langchain-chroma langchain-text-splitters

# Install with compatible versions (let pip resolve dependencies)
!pip install -qU \
    langchain \
    langchain-openai \
    langchain-chroma \
    langchain-community \
    langchain-text-splitters \
    pypdf \
    chromadb

# Show installed versions for verification
!pip list | grep langchain

print("\n✅ All packages installed successfully!")

### Verify Installation

Let's verify that the packages are installed correctly:

In [None]:
import langchain
import langchain_openai
import langchain_chroma
import langchain_community
import langchain_text_splitters
import chromadb

print(f"✅ LangChain version: {langchain.__version__}")
print(f"✅ ChromaDB version: {chromadb.__version__}")
print("\n🎉 All imports successful! Ready to build.")
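Not every package exposes a `__version__` attribute, so reading the installed distribution metadata can be a more uniform check. The sketch below uses only the standard library; the helper name `installed_versions` is an illustrative assumption, not part of LangChain.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment
            found[name] = None
    return found


versions = installed_versions([
    "langchain",
    "langchain-openai",
    "langchain-chroma",
    "langchain-community",
    "langchain-text-splitters",
    "pypdf",
    "chromadb",
])

for name, ver in versions.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

A `None` entry points at the package that needs reinstalling, which is easier to act on than a failed import further down the notebook.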

### Verify LCEL Imports

After installation, let's verify that the critical LCEL imports work correctly:

In [None]:
# Verify LCEL imports (modern LangChain approach)
try:
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough
    print("✅ LCEL imports successful!")
    print("ℹ️  Note: LangChain uses LCEL (pipe syntax) as the standard approach")
except ImportError as e:
    print(f"❌ Import failed: {e}")
    print("\n🔧 Troubleshooting:")
    print("1. Restart runtime")
    print("2. Re-run installation cell above")

### Setup API Keys

We'll need an OpenAI API key for this notebook:

In [None]:
import os
import getpass

# Set OpenAI API key
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key: ")

print("✅ API key set successfully!")

---

# Section 2: LCEL vs Traditional Chains

## What is LCEL?

**LCEL (LangChain Expression Language)** is the declarative, recommended way to compose chains in LangChain.

**Key features**:
- Pipe syntax: `prompt | llm | parser`
- Streaming built-in
- Batch processing
- Async support

## Why LCEL?

❌ **Old way (deprecated)**:
```python
from langchain.chains import LLMChain  # DEPRECATED
chain = LLMChain(llm=llm, prompt=prompt)
result = chain.run("question")
```

✅ **New way**:
```python
chain = prompt | llm | parser
result = chain.invoke({"question": "..."})
```

**Benefits**: Cleaner and more composable, with streaming, batch, and async support built in

---

## LCEL Fundamentals

This section builds a simple chain using LCEL, starting with the concepts behind it.

## Understanding LCEL Core Concepts

Before we build our first chain, let's understand three foundational LCEL concepts.

### 1. What is a Runnable?

A **Runnable** is any component in LangChain that implements a standard interface with these methods (each also has an async counterpart: `ainvoke()`, `astream()`, `abatch()`):
- `invoke()` - Process a single input
- `stream()` - Stream results as they are produced
- `batch()` - Process multiple inputs

**Examples of Runnables**:
- Prompts (`ChatPromptTemplate`)
- LLMs (`ChatOpenAI`)
- Output parsers (`StrOutputParser`)
- Retrievers
- Custom components

**Key insight**: Every LCEL component is a Runnable, so they all expose the same interface!

### 2. The Pipe Operator (`|`)

The **pipe operator** (`|`) chains Runnables together:

```python
chain = prompt | llm | parser
```

**How it works**:
1. Output of `prompt` becomes input to `llm`
2. Output of `llm` becomes input to `parser`
3. Final output is returned

**Data flow**:
```
Input → prompt (creates formatted message) → llm (generates text) → parser (extracts string) → Output
```

**Why use pipes?**
- ✅ Clear data flow (left to right)
- ✅ Composable (mix and match components)
- ✅ Streaming built-in
- ✅ Errors surface as ordinary exceptions; retries and fallbacks can be added with `.with_retry()` and `.with_fallbacks()`
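To make the pipe mechanics concrete, here is a toy sketch of how a `|` operator can compose steps in plain Python. The `Step` class and the stand-in components are hypothetical and much simpler than LangChain's real Runnable; the point is only that `a | b` builds a new object whose `invoke` feeds the output of `a` into `b`.

```python
class Step:
    """Toy stand-in for a Runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def batch(self, values):
        # Simplest possible batch: invoke each input in turn
        return [self.invoke(v) for v in values]

    def __or__(self, other):
        # `self | other` returns a new Step that runs self, then other
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for prompt, model, and parser
format_prompt = Step(lambda inputs: f"Q: {inputs['question']}")
fake_llm = Step(lambda text: {"content": text.upper()})
parser = Step(lambda message: message["content"])

chain = format_prompt | fake_llm | parser

result = chain.invoke({"question": "what is lcel?"})
print(result)

batch_results = chain.batch([{"question": "a"}, {"question": "b"}])
print(batch_results)
```

Because the composed chain is itself a `Step`, it can be piped again, which is the same property that lets LCEL chains nest inside larger chains.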

### 3. What is StrOutputParser?

**StrOutputParser** extracts the text content from LLM responses.

**Without parser**:
```python
result = llm.invoke(...)
# Returns: AIMessage(content="text here", ...)
# Need to access: result.content
```

**With parser**:
```python
result = (llm | StrOutputParser()).invoke(...)
# Returns: "text here"  (just the string!)
```

**Why use it?**
- Simplifies code (no need to access `.content`)
- Consistent output format
- Works with all LangChain LLMs

Now let's see these concepts in action!

---

In [None]:
# Simple LCEL Example
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# Define components
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that explains concepts concisely."),
    ("human", "{question}")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
output_parser = StrOutputParser()

# Compose with pipe syntax
chain = prompt | llm | output_parser

# Invoke
result = chain.invoke({"question": "What is LCEL in one sentence?"})

print("Answer:", result)

### LCEL Supports Streaming

Streaming works out of the box with LCEL:

In [None]:
# LCEL supports streaming out of the box
print("Streaming response:")
print("="*50)

for chunk in chain.stream({"question": "Explain transformers in 3 sentences."}):
    print(chunk, end="", flush=True)

print("\n" + "="*50)
print("✅ Streaming works automatically!")

### LCEL Supports Batch Processing

Process multiple inputs at once:

In [None]:
# LCEL supports batch processing
questions = [
    {"question": "What is Python?"},
    {"question": "What is JavaScript?"},
    {"question": "What is TypeScript?"}
]

results = chain.batch(questions)

for i, result in enumerate(results, 1):
    print(f"{i}. {result[:50]}...\n")
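For I/O-bound steps such as API calls, batching pays off because inputs are processed concurrently; LangChain's default `batch()` runs the individual `invoke` calls on a thread pool. The standard-library sketch below shows the same idea, with `fake_chain` as a hypothetical stand-in for a real chain.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_chain(inputs):
    """Hypothetical chain stand-in: simulates network latency, then answers."""
    time.sleep(0.05)
    return f"Answer about {inputs['question']}"


demo_questions = [
    {"question": "Python"},
    {"question": "JavaScript"},
    {"question": "TypeScript"},
]

# pool.map runs the calls concurrently but returns results in input order
with ThreadPoolExecutor(max_workers=3) as pool:
    demo_results = list(pool.map(fake_chain, demo_questions))

for question, answer in zip(demo_questions, demo_results):
    print(question["question"], "->", answer)
```

With a real chain, the degree of parallelism can be capped through the run config, for example `chain.batch(questions, config={"max_concurrency": 2})`, which helps when an API enforces rate limits.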

### Key Takeaways

✅ **LCEL is the standard** - use pipe syntax for all new chains  
✅ **Composable**: `prompt | llm | parser` - clear data flow  
✅ **Streaming & batch** - built-in without extra code  
✅ **Replace legacy chains** - LLMChain, ConversationChain are deprecated  

---


# Section 3: Advanced LCEL Patterns

Learn how to build complex chains:
- Sequential chains (multi-step)
- Parallel chains (concurrent execution)
- Branching logic (conditional)

## Sequential Chains

Chain multiple steps where each step feeds into the next:

## Understanding RunnablePassthrough

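`RunnablePassthrough` is a Runnable that returns its input unchanged. On its own it wraps nothing: the wrapping comes from using it as a value inside a dict, which LCEL converts into a `RunnableParallel` when the dict is piped into another Runnable.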

### What It Does

```python
{"text": RunnablePassthrough()}
# Input: "Hello world"
# Output: {"text": "Hello world"}
```

### Why Do We Need It?

Prompts expect dictionary inputs with named variables:

```python
# Prompt template expects:
ChatPromptTemplate.from_messages([
    ("human", "{text}")  # ‚Üê Needs {"text": "..."}
])
```

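But sometimes the previous step produces just a string. A dict containing `RunnablePassthrough()` maps that string to the variable name the prompt expects:

```python
# Without the dict:
"Hello" | prompt  # ❌ Error: a plain string is not a Runnable, so it cannot start a chain

# With the dict (converted to a RunnableParallel when piped):
{"text": RunnablePassthrough()} | prompt  # ✅ invoke("Hello") feeds {"text": "Hello"} to the prompt
```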

### Common Patterns

**Pattern 1: Wrap single input**
```python
{"text": RunnablePassthrough()}
```

**Pattern 2: Parallel composition** (the RAG pattern used in Section 7)
```python
{
    "context": retriever | format_docs,
    "question": RunnablePassthrough()
}
```

**Pattern 3: Sequential wrapping** (what we're about to do!)
```python
{"step1": RunnablePassthrough()} | process | {"step2": RunnablePassthrough()}
```

🎯 **Key insight**: `{"key": RunnablePassthrough()}` = "put the incoming value under this key so the next step gets the format it expects"
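The dict patterns above all rely on one behavior: every value in the dict is called with the same input, and the results are collected under the matching keys. A plain-Python sketch of that fan-out follows; `run_mapping`, `passthrough`, and `fake_retrieve` are hypothetical names used only for illustration, not LangChain APIs.

```python
def passthrough(value):
    """Return the input unchanged, like RunnablePassthrough."""
    return value


def run_mapping(mapping, value):
    """Call every step with the same input and collect results by key."""
    return {key: step(value) for key, step in mapping.items()}


def fake_retrieve(question):
    # Hypothetical retriever + formatter stand-in
    return f"(documents related to: {question})"


# Pattern 1: wrap a single input under the key the prompt expects
wrapped = run_mapping({"text": passthrough}, "Hello world")
print(wrapped)

# Pattern 2: build both prompt variables from one question string
rag_inputs = run_mapping(
    {"context": fake_retrieve, "question": passthrough},
    "What is LCEL?",
)
print(rag_inputs)
```

This is why the RAG chain can be invoked with a bare question string: the dict step turns that one string into the two named variables the prompt needs.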

---

In [None]:
# Sequential chain: Analyze → Summarize
from langchain_core.runnables import RunnablePassthrough

# Step 1: Analyze text
analysis_prompt = ChatPromptTemplate.from_messages([
    ("system", "Analyze the following text and extract key themes."),
    ("human", "{text}")
])

# Step 2: Summarize analysis
summary_prompt = ChatPromptTemplate.from_messages([
    ("system", "Summarize the following analysis in one sentence."),
    ("human", "{analysis}")
])

# Build sequential chain
sequential_chain = (
    {"text": RunnablePassthrough()}
    | analysis_prompt
    | llm
    | StrOutputParser()
    | {"analysis": RunnablePassthrough()}
    | summary_prompt
    | llm
    | StrOutputParser()
)

result = sequential_chain.invoke(
    "Machine learning is transforming industries. From healthcare to finance, "
    "AI systems are making predictions and automating decisions."
)

print("Final summary:", result)

## Parallel Chains

Run multiple chains concurrently and combine results:

In [None]:
# Parallel chains: Analyze text in 3 ways simultaneously
from langchain_core.runnables import RunnableParallel

# Define 3 different analysis chains
summary_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "Summarize this text in one sentence."),
        ("human", "{text}")
    ])
    | llm
    | StrOutputParser()
)

sentiment_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "What is the sentiment of this text? (positive/negative/neutral)"),
        ("human", "{text}")
    ])
    | llm
    | StrOutputParser()
)

keywords_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "Extract 3 keywords from this text."),
        ("human", "{text}")
    ])
    | llm
    | StrOutputParser()
)

# Run all 3 chains in parallel
parallel_chain = RunnableParallel({
    "summary": summary_chain,
    "sentiment": sentiment_chain,
    "keywords": keywords_chain
})

text = "LangChain makes building AI applications incredibly easy and fun. The community is helpful and the documentation is excellent!"

results = parallel_chain.invoke({"text": text})

print("Summary:", results["summary"])
print("Sentiment:", results["sentiment"])
print("Keywords:", results["keywords"])

## Branching with RunnableLambda

Add custom logic for conditional routing:

In [None]:
# Branching: Route based on text length
from langchain_core.runnables import RunnableLambda

def route_by_length(inputs):
    """Route to different prompts based on text length"""
    text = inputs["text"]
    if len(text) < 100:
        return {"text": text, "instruction": "This is short. Expand on it."}
    else:
        return {"text": text, "instruction": "This is long. Summarize it."}

routing_chain = (
    RunnableLambda(route_by_length)
    | ChatPromptTemplate.from_messages([
        ("system", "{instruction}"),
        ("human", "{text}")
    ])
    | llm
    | StrOutputParser()
)

short_text = "AI is the future."
long_text = "Artificial intelligence is revolutionizing every industry. From healthcare diagnostics to financial forecasting, AI systems are becoming indispensable tools for modern businesses."

print("Short text result:")
print(routing_chain.invoke({"text": short_text}))
print("\n" + "="*50 + "\n")
print("Long text result:")
print(routing_chain.invoke({"text": long_text}))
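Because the router is an ordinary function, its branching logic can be checked without calling the model at all. The sketch below repeats `route_by_length` from the cell above so that it runs on its own; the test inputs are made up for illustration.

```python
def route_by_length(inputs):
    """Route to different instructions based on text length."""
    text = inputs["text"]
    if len(text) < 100:
        return {"text": text, "instruction": "This is short. Expand on it."}
    return {"text": text, "instruction": "This is long. Summarize it."}


short_route = route_by_length({"text": "AI is the future."})
long_route = route_by_length({"text": "x" * 150})
boundary_route = route_by_length({"text": "y" * 100})  # exactly 100 chars counts as long

print(short_route["instruction"])
print(long_route["instruction"])
print(boundary_route["instruction"])
```

Checking the boundary case (exactly 100 characters) documents which branch it takes, something that is easy to get wrong when the threshold changes later.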

### Key Takeaways

✅ **Sequential**: Chain steps with `|` operator  
✅ **Parallel**: Use `RunnableParallel` for concurrent execution  
✅ **Branching**: Add custom logic with `RunnableLambda`  
✅ **Composable**: Mix and match patterns as needed  

---


# Section 4: Document Loading

The first step in RAG is loading documents. LangChain provides **Document Loaders** for this.

## What is a Document Loader?

A Document Loader:
- Reads files from various sources (PDF, TXT, web, databases)
- Extracts text content
- Preserves metadata (source, page numbers, etc.)

## Document Structure

Each loaded document has:
- `page_content`: The actual text
- `metadata`: Dictionary with source info (file path, page number, etc.)
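As a mental model, a loaded document is just text plus a metadata dictionary. The sketch below mimics that shape with a dataclass; `ToyDocument` is a hypothetical stand-in for illustration, not LangChain's `Document` class.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Illustrative stand-in with the same two fields as a loaded document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


doc = ToyDocument(
    page_content="Machine learning is a subset of AI.",
    metadata={"source": "ml_guide.txt"},
)

# Metadata travels with the text, so a citation can be built later
citation = f"{doc.metadata.get('source', 'Unknown')}: {doc.page_content[:20]}..."
print(citation)
```

Using `.get("source", "Unknown")` keeps the citation code working for documents whose loader did not record a source.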

## Common Document Loaders

| Loader | File Type | Use Case |
|--------|-----------|----------|
| `PyPDFLoader` | PDF | Research papers, reports |
| `TextLoader` | TXT | Plain text files |
| `DirectoryLoader` | Multiple files | Bulk loading |
| `WebBaseLoader` | Web pages | Scrape websites |

Let's load a sample document!

### Create a Sample Document

First, let's create a sample text file to work with:

In [None]:
# Create a sample document about Machine Learning
sample_content = """Machine Learning: A Comprehensive Guide

Introduction to Machine Learning
Machine learning is a subset of artificial intelligence (AI) that focuses on building systems that can learn from and make decisions based on data. Unlike traditional programming where rules are explicitly coded, machine learning algorithms learn patterns from data.

Types of Machine Learning
There are three main types of machine learning:

1. Supervised Learning: The algorithm learns from labeled data. Examples include classification and regression tasks. Common algorithms include linear regression, logistic regression, decision trees, and neural networks.

2. Unsupervised Learning: The algorithm finds patterns in unlabeled data. Examples include clustering and dimensionality reduction. Common algorithms include K-means clustering and principal component analysis (PCA).

3. Reinforcement Learning: The algorithm learns through trial and error by receiving rewards or penalties. This is commonly used in robotics, game playing, and autonomous systems.

Deep Learning
Deep learning is a subset of machine learning that uses neural networks with multiple layers (deep neural networks). It has revolutionized fields like computer vision, natural language processing, and speech recognition. Popular frameworks include TensorFlow, PyTorch, and Keras.

Applications of Machine Learning
Machine learning is used in various domains:
- Healthcare: Disease diagnosis, drug discovery
- Finance: Fraud detection, algorithmic trading
- E-commerce: Recommendation systems, demand forecasting
- Transportation: Autonomous vehicles, route optimization
- Natural Language Processing: Chatbots, translation, sentiment analysis

Challenges in Machine Learning
Despite its success, machine learning faces several challenges:
- Data quality and quantity requirements
- Model interpretability and explainability
- Bias and fairness concerns
- Computational resource requirements
- Overfitting and generalization issues

The Future of Machine Learning
The field continues to evolve with trends like AutoML, federated learning, and edge AI. As computing power increases and algorithms improve, machine learning will become even more integral to our daily lives.
"""

# Save to file
with open("ml_guide.txt", "w") as f:
    f.write(sample_content)

print("✅ Sample document created: ml_guide.txt")
print(f"Document length: {len(sample_content)} characters")

### Load Document with TextLoader

Now let's load our sample document:

In [None]:
from langchain_community.document_loaders import TextLoader

# Load the document
loader = TextLoader("ml_guide.txt")
documents = loader.load()

print(f"✅ Loaded {len(documents)} document(s)")
print(f"\nDocument structure:")
print(f"- page_content: {len(documents[0].page_content)} characters")
print(f"- metadata: {documents[0].metadata}")

print(f"\nFirst 300 characters:")
print(documents[0].page_content[:300] + "...")

### Understanding Document Metadata

Metadata is crucial for RAG because it enables:
- **Citations**: Show users where answers came from
- **Filtering**: Search only specific sources
- **Tracking**: Monitor which documents are most useful

In [None]:
# Inspect metadata
for doc in documents:
    print("Metadata:")
    for key, value in doc.metadata.items():
        print(f"  {key}: {value}")

### Key Takeaways

✅ Document loaders extract text and preserve metadata  
✅ Use `langchain_community.document_loaders` for imports  
✅ Each document has `page_content` (text) and `metadata` (source info)  
✅ Metadata enables citations and filtering  

**Next**: We'll chunk these documents into smaller pieces for better retrieval!

---


# Section 5: Text Splitting

Now let's see how **LangChain** makes text splitting production-ready!

## Why Use LangChain Text Splitters?

| Manual Chunking | LangChain Splitters |
|-----------------|---------------------|
| Write chunking logic yourself | Pre-built, tested splitters |
| Basic fixed-size or sentence split | Intelligent recursive splitting |
| Manual edge case handling | Handles edge cases automatically |
| Good for learning | Production-ready |

## RecursiveCharacterTextSplitter (Recommended)

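This splitter tries separators in order, moving to the next one only when a piece is still too large:
- Paragraphs (`\n\n`) first
- Then line breaks (`\n`)
- Then sentences (`. `)
- Then words (` `)
- Finally individual characters

Splitting at the largest natural boundary that fits keeps related text together, which preserves meaning better than fixed-size cuts.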
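The fallback order can be sketched in a few lines of plain Python. This is a simplified illustration of the recursive idea, not the library's implementation: it packs pieces greedily, recurses with the next separator on oversized pieces, and omits overlap and other refinements. The function name and sample text are assumptions made for the example.

```python
def recursive_split(text, chunk_size, separators):
    """Split on the first separator; recurse on pieces that are still too big."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut by character count
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, remaining = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into this chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece alone is too large: fall back to the next separator
            chunks.extend(recursive_split(piece, chunk_size, remaining))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "Para one is short.\n\nPara two is a bit longer than the limit allows."
pieces = recursive_split(sample, chunk_size=30, separators=["\n\n", " ", ""])
for piece in pieces:
    print(len(piece), repr(piece))
```

The first paragraph fits and stays whole, while the second exceeds the limit and is split at word boundaries instead of mid-word.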

Text splitters now live in the `langchain_text_splitters` package:

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,  # Characters per chunk
    chunk_overlap=50,  # Overlap between chunks (10%)
    separators=["\n\n", "\n", ". ", " ", ""]  # Try these in order
)

# Split our documents
chunks = text_splitter.split_documents(documents)

print(f"✅ Split {len(documents)} document(s) into {len(chunks)} chunks")
print(f"\nFirst chunk preview:")
print(chunks[0].page_content[:200] + "...")
print(f"\nChunk metadata: {chunks[0].metadata}")

### Why 512 Characters + 50 Overlap?

✅ **Sweet spot**: 512 chars ≈ 128 tokens, assuming roughly 4 characters per token for English text (good balance)  
✅ **Overlap**: Maintains context across chunks  
✅ **Not too small**: Enough context for LLM  
✅ **Not too large**: Precise retrieval  
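The arithmetic behind these defaults, and the effect of overlap, can be seen with a fixed-size sliding window. This is a simplified illustration: the real splitter applies overlap to separator-based pieces, and the 4-characters-per-token figure is a rule of thumb, not an exact conversion.

```python
def sliding_chunks(text, chunk_size, overlap):
    """Fixed-size chunks where each chunk repeats the tail of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Small numbers so the overlap is visible
demo_chunks = sliding_chunks("abcdefghijklmnopqrst", chunk_size=8, overlap=2)
print(demo_chunks)

# Back-of-the-envelope numbers for the notebook's settings
approx_tokens = 512 / 4      # rough estimate at ~4 characters per token
overlap_ratio = 50 / 512     # share of each chunk repeated from its neighbor
print(f"~{approx_tokens:.0f} tokens per chunk, {overlap_ratio:.1%} overlap")
```

Each chunk starts with the last two characters of the previous one, so a sentence cut at a chunk boundary still appears in full in at least one chunk when it is shorter than the overlap.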

In [None]:
# Inspect chunk sizes
chunk_sizes = [len(chunk.page_content) for chunk in chunks]

print(f"Chunk size statistics:")
print(f"- Average: {sum(chunk_sizes) / len(chunk_sizes):.0f} characters")
print(f"- Min: {min(chunk_sizes)} characters")
print(f"- Max: {max(chunk_sizes)} characters")
print(f"\nAll chunks have metadata: {all(chunk.metadata for chunk in chunks)}")

### Key Takeaways

✅ LangChain splitters are **production-ready**  
✅ Use `langchain_text_splitters.RecursiveCharacterTextSplitter`  
✅ 512 characters + 50 overlap is a good default  
✅ Splitters preserve metadata automatically  

**Next**: We'll store these chunks in a vector database!

---


# Section 6: Vector Stores with LangChain

Now let's use LangChain's wrapper for ChromaDB!

The Chroma integration now lives in the `langchain_chroma` package.

## Understanding Vector Embeddings

Before we create a vector store, let's understand what **embeddings** are and why they're crucial for RAG.

### What are Embeddings?

**Embeddings** are numerical representations of text that capture semantic meaning.

```
Text: "Machine learning is amazing"
↓
Embedding: [0.234, -0.891, 0.542, ..., 0.123]  (1536 numbers)
```

### Why Numbers?

Computers can't understand text directly, but they can:
- Compare numbers
- Calculate similarity
- Search efficiently

### How Similarity Works

Similar texts have similar embeddings:

```
"AI is transforming healthcare" → [0.8, 0.2, ...]
"Machine learning in medicine"  → [0.7, 0.3, ...]  (Similar!)

"I love pizza" → [-0.3, 0.9, ...]  (Very different!)
```
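"Similar" is usually measured with cosine similarity, which compares the direction of two vectors. The sketch below applies it to the two-number toy vectors from the example above; real embeddings have far more dimensions, and these values are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


healthcare = [0.8, 0.2]   # "AI is transforming healthcare"
medicine = [0.7, 0.3]     # "Machine learning in medicine"
pizza = [-0.3, 0.9]       # "I love pizza"

sim_related = cosine_similarity(healthcare, medicine)
sim_unrelated = cosine_similarity(healthcare, pizza)

print(f"healthcare vs medicine: {sim_related:.3f}")
print(f"healthcare vs pizza:    {sim_unrelated:.3f}")
```

Values near 1 mean the vectors point the same way, and values near 0 or below mean the texts have little in common.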

### Why Needed for RAG?

1. **Semantic Search**: Find documents by meaning, not just keywords
   - "ML applications" matches "machine learning uses"
2. **Fast Retrieval**: Vector databases are optimized for similarity search
3. **Better Context**: Retrieve truly relevant chunks

### The RAG Flow with Embeddings

```
1. Chunk documents → 2. Generate embeddings → 3. Store in vector DB
↓
User question → Embed question → Find similar chunks → Send to LLM
```

**Model**: We'll use `text-embedding-3-small` (fast + accurate for most use cases)

Now let's create embeddings and store them!

---

In [None]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="langchain_essentials"
)

print(f"✅ Created vector store with {len(chunks)} chunks")

### Test Similarity Search

In [None]:
# Test search
results = vector_store.similarity_search("What is deep learning?", k=3)

for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content[:100]}...\n")

### Key Takeaways

✅ LangChain provides wrappers for vector databases  
✅ Use `langchain_chroma.Chroma` for ChromaDB  
✅ `from_documents` creates store and embeds in one step  
✅ Supports similarity search out of the box  

---


# Section 7: RAG Pipeline Fundamentals

## Building a Complete RAG Pipeline with LCEL

Now let's combine everything into a complete RAG system using LCEL:

**LCEL RAG Pattern**:
```python
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt | llm | StrOutputParser()
)
```

Let's build it!

## Understanding Retrievers

A **Retriever** is a Runnable that fetches relevant documents based on a query.

### Vector Store vs Retriever

**Vector Store**: Storage + search capabilities
```python
vector_store.similarity_search("query", k=3)  # Manual search
```

**Retriever**: Runnable interface for vector store
```python
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
retriever.invoke("query")  # Same as similarity_search, but Runnable!
```

### Why Use Retriever?

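**Because it's a Runnable**, you can use it in LCEL chains:

```python
# The retriever can be piped into other components
context_chain = retriever | format_docs  # query string in, formatted context out
```

**`vector_store.similarity_search` on its own is not a Runnable**: calling it returns a list of documents, so it cannot be piped unless you wrap the method first (for example in `RunnableLambda`).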

### Search Parameters

`search_kwargs={"k": 3}`:
- `k`: Number of documents to retrieve
- `k=3` means "get top 3 most similar chunks"

**Trade-off**:
- Higher k = More context, but more noise and cost
- Lower k = Less context, but more focused

**Good default**: k=3 is a reasonable starting point for most RAG applications; tune it against your own documents
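The trade-off is easy to see with a toy top-k search: the first hit is clearly relevant, while later hits drift toward unrelated content. The vectors and texts below are made up for illustration, and `top_k` is a hypothetical helper, not how Chroma performs its search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, indexed_chunks, k):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(indexed_chunks, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# (text, toy embedding) pairs; values are invented for the example
index = [
    ("Fraud detection is a finance use case.", [0.2, 0.8]),
    ("Deep learning uses multi-layer neural networks.", [0.9, 0.1]),
    ("Route optimization helps transportation.", [0.1, 0.9]),
    ("Supervised learning needs labeled data.", [0.6, 0.4]),
]
query_vec = [1.0, 0.0]  # pretend embedding of "What is deep learning?"

top_1 = top_k(query_vec, index, k=1)
top_3 = top_k(query_vec, index, k=3)
print(top_1)
print(top_3)
```

With `k=1` only the deep learning chunk comes back; with `k=3` the fraud detection chunk is included too, adding tokens and cost without helping the answer.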

Now let's build a complete RAG chain using retrievers!

---

In [None]:
# Build RAG chain using LCEL (LangChain Expression Language)
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Setup components
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Create prompt template
template = """Answer the question based on the following context:

Context: {context}

Question: {question}

Answer:"""

prompt = ChatPromptTemplate.from_template(template)

# Helper function to format documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build LCEL chain with retriever
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print("✅ RAG chain created using LCEL!")

### Test the RAG Chain

In [None]:
# Test the RAG chain with LCEL
question = "What are the types of machine learning?"

# With LCEL, we invoke with the question directly (simpler!)
answer = rag_chain.invoke(question)

print(f"Question: {question}\n")
print(f"Answer: {answer}")
print("\n✅ LCEL makes RAG simple and clean!")

## Adding Citations

Let's enhance our RAG to show source documents:

In [None]:
# Citations with LCEL
def ask_with_citations(question):
    """Ask question and return answer with source citations"""
    # Get answer from chain
    answer = rag_chain.invoke(question)
    
    # Get source documents separately (note: this runs retrieval a second time)
    source_docs = retriever.invoke(question)
    
    # Format citations
    citations = []
    for i, doc in enumerate(source_docs, 1):
        source = doc.metadata.get("source", "Unknown")
        citations.append(f"[{i}] {source}: {doc.page_content[:100]}...")
    
    return f"{answer}\n\nSources:\n" + "\n".join(citations)

# Test with citations
result = ask_with_citations("What is deep learning?")
print(result)
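One design point in the function above: the chain retrieves documents internally and `retriever.invoke` retrieves them again, so every question costs two searches. A possible alternative is to retrieve once and reuse the same documents for both the answer and the citations. The sketch below shows that flow with hypothetical stand-ins (`fake_retrieve`, `fake_generate`) in place of the real retriever and model.

```python
def fake_retrieve(question):
    """Hypothetical retriever stand-in that counts how often it is called."""
    fake_retrieve.calls += 1
    return [
        {"page_content": "Deep learning uses neural networks with multiple layers.",
         "metadata": {"source": "ml_guide.txt"}},
        {"page_content": "Popular frameworks include TensorFlow, PyTorch, and Keras.",
         "metadata": {"source": "ml_guide.txt"}},
    ]


fake_retrieve.calls = 0


def fake_generate(context, question):
    """Hypothetical model stand-in that reports what it received."""
    return f"Answer to {question!r} using {len(context)} characters of context."


def answer_with_sources(question):
    docs = fake_retrieve(question)  # retrieve exactly once
    context = "\n\n".join(d["page_content"] for d in docs)
    answer = fake_generate(context, question)
    citations = [
        f"[{i}] {d['metadata'].get('source', 'Unknown')}: {d['page_content'][:40]}..."
        for i, d in enumerate(docs, 1)
    ]
    return {"answer": answer, "citations": citations}


cited = answer_with_sources("What is deep learning?")
print(cited["answer"])
print("\n".join(cited["citations"]))
print("retrieval calls:", fake_retrieve.calls)
```

Retrieving once also means the citations always match the context the model actually saw. In LCEL, a similar result can be reached by carrying the retrieved documents through the chain, for instance with `RunnablePassthrough.assign(...)`.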

## Production-Ready RAG Pipeline Class

In [None]:
# Imports (global scope - best practice)
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

class RAGPipeline:
    """Production-ready RAG pipeline using LCEL"""

    def __init__(self, file_path):
        # Load and chunk documents
        loader = TextLoader(file_path)
        documents = loader.load()

        text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
        chunks = text_splitter.split_documents(documents)

        # Create vector store
        embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
        self.vector_store = Chroma.from_documents(chunks, embeddings)
        self.retriever = self.vector_store.as_retriever(search_kwargs={"k": 3})
        self.llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

        # Helper to format docs
        def format_docs(docs):
            return "\n\n".join(doc.page_content for doc in docs)

        # Build LCEL chain
        template = """Answer based on context:

Context: {context}

Question: {question}

Answer:"""

        prompt = ChatPromptTemplate.from_template(template)

        self.rag_chain = (
            {"context": self.retriever | format_docs, "question": RunnablePassthrough()}
            | prompt
            | self.llm
            | StrOutputParser()
        )

        print(f"✅ RAG Pipeline ready with {len(chunks)} chunks (using LCEL)")

    def ask(self, question):
        """Ask a question and get answer with sources"""
        answer = self.rag_chain.invoke(question)
        # Get source docs separately for citations
        docs = self.retriever.invoke(question)
        return {"answer": answer, "sources": docs}

# Create pipeline
pipeline = RAGPipeline("ml_guide.txt")

In [None]:
# Test the pipeline
response = pipeline.ask("What are machine learning challenges?")

print("Answer:", response["answer"])
print(f"\n✅ Retrieved {len(response['sources'])} source documents")

### Key Takeaways

✅ **LCEL makes RAG simple**: `{context: retriever, question} | prompt | llm`  
✅ **Retriever integration**: Vector store becomes retriever  
✅ **Citations**: Retrieve source docs for attribution  
✅ **Production-ready**: Encapsulate in reusable class  

---


# Section 8: Conversation Memory with RunnableWithMessageHistory

## Understanding Conversation Memory

LLMs are **stateless**: they don't remember previous interactions. To carry context across turns, we add memory with:

**RunnableWithMessageHistory** - a wrapper that loads and saves chat history around an LCEL chain

## Memory Patterns

| Pattern | Implementation | Use Case |
|---------|----------------|----------|
| **Full History** | RunnableWithMessageHistory + ChatMessageHistory | Short conversations |
| **Sliding Window** | RunnableWithMessageHistory + custom windowed history | Keep last N messages |
| **Summarization** | Custom summarization logic | Long conversations |
| **LangGraph** | LangGraph with built-in checkpointers | Complex multi-agent apps |
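The sliding-window row in the table comes down to trimming the message list before it reaches the prompt. A minimal sketch with messages represented as `(role, content)` tuples, which is an assumption made for this example rather than LangChain's message classes:

```python
def last_n_messages(messages, n):
    """Keep only the most recent n messages; n <= 0 keeps nothing."""
    if n <= 0:
        return []
    return messages[-n:]


conversation = [
    ("human", "Hi, I'm Alice."),
    ("ai", "Hello Alice!"),
    ("human", "I work on RAG systems."),
    ("ai", "Nice, what are you building?"),
    ("human", "What's my name?"),
]

window = last_n_messages(conversation, 2)
print(window)
```

Here the window drops the turn where Alice introduced herself, so the model could no longer answer the final question; that loss of early context is the cost of a small window. `langchain_core.messages` also provides a `trim_messages` helper for trimming by token count.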

Let's implement conversational memory!

## Full History Pattern

## Understanding Conversation Memory Components

To add memory to chains, we need three components working together.

### 1. MessagesPlaceholder

**What**: Placeholder in prompt template for chat history

```python
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are helpful"),
    MessagesPlaceholder(variable_name="history"),  # ← Chat history goes here
    ("human", "{question}")
])
```

**Why needed**: Prompts are static, but chat history is dynamic (grows with each turn)

### 2. ChatMessageHistory

**What**: Stores conversation messages (user questions + AI responses)

```python
history = ChatMessageHistory()
history.add_user_message("Hi!")
history.add_ai_message("Hello! How can I help?")
# history.messages = [HumanMessage("Hi!"), AIMessage("Hello...")]
```

**Why needed**: Need to remember previous messages to maintain context

### 3. RunnableWithMessageHistory

**What**: Wraps any chain to automatically manage chat history

```python
chain_with_history = RunnableWithMessageHistory(
    chain,                      # Your LCEL chain
    get_session_history,        # Function to get/create history
    input_messages_key="question",   # What user types
    history_messages_key="history"   # Where history goes in prompt
)
```

**How it works**:
1. User sends message
2. RunnableWithMessageHistory retrieves session history
3. Injects history into prompt's MessagesPlaceholder
4. Chain executes with full context
5. Saves new messages to history
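The five steps can be traced with a toy wrapper in plain Python. This is a conceptual sketch, not LangChain's implementation: `invoke_with_history`, `echo_chain`, and the list-based history are hypothetical stand-ins, while the `config={"configurable": {"session_id": ...}}` shape mirrors the one used with the real class.

```python
session_store = {}  # session_id -> list of (role, content) tuples


def get_history(session_id):
    """Return the history list for a session, creating it on first use."""
    return session_store.setdefault(session_id, [])


def echo_chain(inputs):
    """Hypothetical chain stand-in: reports how many prior messages it saw."""
    return f"Seen {len(inputs['history'])} earlier messages; you said: {inputs['question']}"


def invoke_with_history(chain, inputs, config):
    session_id = config["configurable"]["session_id"]
    history = get_history(session_id)                     # step 2: load history
    reply = chain({**inputs, "history": list(history)})   # steps 3-4: inject and run
    history.append(("human", inputs["question"]))         # step 5: save both turns
    history.append(("ai", reply))
    return reply


alice = {"configurable": {"session_id": "alice"}}
bob = {"configurable": {"session_id": "bob"}}

first = invoke_with_history(echo_chain, {"question": "Hi, I'm Alice"}, alice)
second = invoke_with_history(echo_chain, {"question": "What's my name?"}, alice)
other = invoke_with_history(echo_chain, {"question": "Hello"}, bob)

print(first)
print(second)
print(other)
```

Alice's second call sees the two messages from her first turn, while Bob's session starts empty, which is the per-session isolation the history lookup provides.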

### The get_session_history Function

**Why a function?**: Different conversations need different histories

```python
store = {}  # session_id → ChatMessageHistory

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
```

**Multi-user example**:
- Alice's session: `session_id="alice"` → separate history
- Bob's session: `session_id="bob"` → separate history

Now let's see these components in action!

---

In [None]:
# Modern approach: RunnableWithMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.output_parsers import StrOutputParser

# In-memory store (for demo purposes)
store = {}

def get_session_history(session_id: str):
    """Retrieve or create chat history for a session"""
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

# Build conversational chain with LCEL
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{question}")
])

chain = prompt | llm | StrOutputParser()

# Wrap with message history management
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)

print("✅ Conversational chain created!")

### 💡 Note: LangGraph for Complex Applications

**What you just learned** (RunnableWithMessageHistory) is the **correct approach for LCEL-based conversational applications**.

However, for **complex applications** with advanced state management needs, LangGraph provides an alternative approach:

**Use RunnableWithMessageHistory when**:
- ✅ Building LCEL chains with conversation memory
- ✅ Simple to moderate conversational flows
- ✅ You want explicit control over history management

**Consider LangGraph when**:
- ✅ Building multi-agent systems
- ✅ Complex state management across multiple components
- ✅ Need built-in checkpointing and persistence
- ✅ Advanced features like "time travel" through conversation history

**For this training**: We're using RunnableWithMessageHistory because it's the standard for LCEL chains and perfect for most RAG applications.

**Learn more**: [LangGraph Memory Documentation](https://docs.langchain.com/oss/python/langgraph/memory)

### Test Conversational Memory

## Understanding Session Management

When invoking a chain with memory, you need to specify which session to use.

### The Config Pattern

```python
chain_with_history.invoke(
    {"question": "What's my name?"},
    config={"configurable": {"session_id": "alice"}}
)
```

### Why This Structure?

**`config`**: Reserved LangChain parameter for chain configuration

**`"configurable"`**: Nested dict for runtime-configurable parameters
- Parameters that change per invocation
- Not part of the chain definition

**`"session_id"`**: Your custom key to identify the conversation
- LangChain passes this to `get_session_history()`
- Different session_id = different conversation

### Visual Flow

```
invoke(question, config={"configurable": {"session_id": "alice"}})
                                                ↓
                RunnableWithMessageHistory calls get_session_history("alice")
                                                ↓
                        Returns Alice's ChatMessageHistory
                                                ↓
                    Injects Alice's history into prompt
                                                ↓
                                Executes chain
                                                ↓
                    Saves new messages to Alice's history
```

### Multi-Session Example

```python
# Alice's conversation
chain_with_history.invoke(
    {"question": "My name is Alice"},
    config={"configurable": {"session_id": "alice"}}
)

# Bob's conversation (completely separate!)
chain_with_history.invoke(
    {"question": "My name is Bob"},
    config={"configurable": {"session_id": "bob"}}
)
```
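
Because the config dictionary has the same shape on every call, a small helper can cut the repetition. `session_config` is a hypothetical name, not part of LangChain; it only builds the dictionary shown above.

```python
def session_config(session_id: str) -> dict:
    """Build the per-call config that selects a conversation."""
    return {"configurable": {"session_id": session_id}}


alice_cfg = session_config("alice")
bob_cfg = session_config("bob")
```

With this in place, a call becomes `chain_with_history.invoke({"question": "My name is Alice"}, config=session_config("alice"))`.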

Now let's test this pattern!

---

In [None]:
# Test conversational memory with sessions
session_id = "user-alice"

# Turn 1: Introduce yourself
response1 = chain_with_history.invoke(
    {"question": "My name is Alice"},
    config={"configurable": {"session_id": session_id}}
)
print(f"User: My name is Alice")
print(f"Assistant: {response1}\n")

# Turn 2: Share profession
response2 = chain_with_history.invoke(
    {"question": "I work as a data scientist"},
    config={"configurable": {"session_id": session_id}}
)
print(f"User: I work as a data scientist")
print(f"Assistant: {response2}\n")

# Turn 3: Test memory - ask about previous context
response3 = chain_with_history.invoke(
    {"question": "What's my name and profession?"},
    config={"configurable": {"session_id": session_id}}
)
print(f"User: What's my name and profession?")
print(f"Assistant: {response3}")

print("\n✅ Memory working! Assistant remembers Alice is a data scientist")

## Sliding Window Memory

Keep only the last N messages to prevent unbounded memory growth:

In [None]:
# Sliding window approach: Keep last N messages
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.messages import BaseMessage

class SlidingWindowHistory(ChatMessageHistory):
    """Custom history that keeps only last N messages"""
    
    def __init__(self, window_size: int = 4):
        # Initialize with empty messages first
        super().__init__()
        # Store window size in a way that works with Pydantic
        object.__setattr__(self, '_window_size', window_size)
    
    def add_message(self, message: BaseMessage) -> None:
        """Add a message and trim to window size"""
        super().add_message(message)
        # Keep only last N messages
        if len(self.messages) > self._window_size:
            # Update messages directly
            self.messages = self.messages[-self._window_size:]

# Create store with sliding window
window_store = {}

def get_window_history(session_id: str):
    if session_id not in window_store:
        window_store[session_id] = SlidingWindowHistory(window_size=4)  # Last 2 exchanges
    return window_store[session_id]

# Create chain with window
chain_with_window = RunnableWithMessageHistory(
    chain,
    get_window_history,
    input_messages_key="question",
    history_messages_key="history",
)

# Test: Add 3 exchanges, oldest should be dropped
session = "window-demo"
chain_with_window.invoke({"question": "Hi"}, config={"configurable": {"session_id": session}})
chain_with_window.invoke({"question": "My name is Bob"}, config={"configurable": {"session_id": session}})
chain_with_window.invoke({"question": "I like Python"}, config={"configurable": {"session_id": session}})

# Check how many messages are kept
history = get_window_history(session)
print(f"Window size: {len(history.messages)} messages (kept last 4)")
print("✅ Oldest messages automatically dropped!")
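
The trimming rule itself can be checked without an LLM. A minimal sketch, using `collections.deque` as a stand-in for the message list (the `(role, text)` tuples and the sample replies are placeholders, not LangChain message objects): three exchanges produce six messages, and a window of 4 keeps only the last two exchanges.

```python
from collections import deque

# A deque with maxlen discards its oldest item when a new one is appended
window = deque(maxlen=4)

exchanges = [
    ("Hi", "Hello!"),
    ("My name is Bob", "Nice to meet you, Bob."),
    ("I like Python", "Python is a great choice."),
]
for question, reply in exchanges:
    window.append(("human", question))
    window.append(("ai", reply))

kept = list(window)
```

The first exchange ("Hi") is gone from `kept`, so with a window this small the model loses anything said more than two exchanges ago.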

## Persistent Storage for Production

In production, you'll want to persist conversation history beyond in-memory storage. Here's how to use SQLite for persistent message history:

In [None]:
# Production: Persistent message history with SQLite
from langchain_community.chat_message_histories import SQLChatMessageHistory

# Create store with SQLite persistence
def get_persistent_history(session_id: str):
    """Get or create persistent chat history stored in SQLite"""
    return SQLChatMessageHistory(
        session_id=session_id,
        # Recent langchain-community releases take `connection`;
        # older releases used the now-deprecated `connection_string`
        connection="sqlite:///chat_history.db"  # Persists to file
    )

# Create chain with persistent storage
chain_with_persistent_memory = RunnableWithMessageHistory(
    chain,
    get_persistent_history,
    input_messages_key="question",
    history_messages_key="history",
)

# Test: Conversation persists even after restart!
session = "persistent-demo"
response = chain_with_persistent_memory.invoke(
    {"question": "Remember: my favorite color is blue"},
    config={"configurable": {"session_id": session}}
)
print(f"Assistant: {response}")

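# get_persistent_history builds a fresh SQLChatMessageHistory on every call,
# so this turn reads the earlier messages back from chat_history.db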
response2 = chain_with_persistent_memory.invoke(
    {"question": "What's my favorite color?"},
    config={"configurable": {"session_id": session}}
)
print(f"\nAfter 'restart': {response2}")
print("\n✅ Chat history persisted to database!")
print("💾 Check chat_history.db file for stored conversations")

### Key Takeaways

✅ **RunnableWithMessageHistory**: Standard LCEL approach for conversational memory  
✅ **Session management**: Use `session_id` to track different conversations  
✅ **Full history**: ChatMessageHistory stores complete conversation  
✅ **Sliding window**: Custom history class to keep last N messages  
✅ **Persistent storage**: Use SQLChatMessageHistory for production (survives restarts)  
✅ **LangGraph alternative**: For complex multi-agent systems, consider LangGraph  

**Production note**: For persistence beyond in-memory, use Redis, PostgreSQL, or other backends with LangChain's message history integrations.
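
To see why a database-backed history survives a restart, here is a minimal sketch using only the standard library's `sqlite3`. It is a simplified stand-in, not how `SQLChatMessageHistory` is implemented: the table layout and the `save_message` / `load_messages` helpers are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def save_message(db_path: str, session_id: str, role: str, text: str) -> None:
    """Append one message to the stored history of a session."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "session_id TEXT, role TEXT, text TEXT)"
            )
            conn.execute(
                "INSERT INTO messages (session_id, role, text) VALUES (?, ?, ?)",
                (session_id, role, text),
            )
    finally:
        conn.close()


def load_messages(db_path: str, session_id: str) -> list:
    """Read a session's messages back in insertion order."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT role, text FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        ).fetchall()
    finally:
        conn.close()


# Every call opens a new connection, so nothing is held in process memory
demo_db = os.path.join(tempfile.mkdtemp(), "toy_history.db")
save_message(demo_db, "persistent-demo", "human", "My favorite color is blue")
save_message(demo_db, "persistent-demo", "ai", "Got it.")
restored = load_messages(demo_db, "persistent-demo")
```

Because the state lives in the file rather than in a Python dict, a new process pointed at the same path would read back the same rows.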

---

---

# Section 9: Production Patterns & Best Practices

## Production Best Practices

**1. Observability**
- ✅ Use LangSmith for tracing and monitoring
- ✅ Log all operations
- ✅ Track token usage and costs

**2. Error Handling**
- ✅ Try-except in critical paths
- ✅ Retry with exponential backoff
- ✅ Fallback paths for failures

**3. Performance**
- ✅ Batch operations where possible
- ✅ Use streaming for long responses
- ✅ Cache embeddings when appropriate

**4. Testing**
- ✅ Unit test each component
- ✅ Integration tests for full chains
- ✅ LLM-as-a-judge for quality evaluation
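
As one illustration of the caching advice above, the sketch below memoizes embeddings by a hash of the text so repeated inputs skip the expensive call. `fake_embed` is a hypothetical placeholder for a real embedding call, and `CachedEmbedder` is an illustrative class, not part of LangChain.

```python
import hashlib
from typing import Callable, Dict, List


class CachedEmbedder:
    """Wrap an embedding function with an in-memory cache keyed by text hash."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._cache: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the underlying function was called

    def embed(self, text: str) -> List[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._embed_fn(text)
        return self._cache[key]


def fake_embed(text: str) -> List[float]:
    """Hypothetical placeholder; a real version would call an embedding model."""
    return [float(len(text)), float(text.count(" "))]


embedder = CachedEmbedder(fake_embed)
v1 = embedder.embed("supervised learning")
v2 = embedder.embed("supervised learning")  # served from the cache
```

An in-memory dict is lost on restart; for a long-lived service the same idea would need a persistent store behind it.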

## Error Handling Example

In [None]:
# Production-ready RAG with error handling
import time

def safe_rag_query(question: str, max_retries: int = 3, base_delay: float = 1.0):
    """RAG query with error handling and retry with exponential backoff"""
    last_error = "max_retries must be at least 1"
    for attempt in range(max_retries):
        try:
            # Attempt the query
            answer = rag_chain.invoke(question)
            sources = retriever.invoke(question)

            return {
                "answer": answer,
                "sources": sources,
                "error": None
            }

        except Exception as e:
            last_error = str(e)
            if attempt < max_retries - 1:
                # Double the wait after each failure: 1s, 2s, 4s, ...
                delay = base_delay * (2 ** attempt)
                print(f"⚠️  Attempt {attempt + 1} failed: {e}")
                print(f"🔄 Retrying in {delay:.0f}s... ({attempt + 2}/{max_retries})")
                time.sleep(delay)

    # Reached only when every attempt failed (or max_retries < 1)
    print("❌ All retries exhausted")
    return {
        "answer": None,
        "sources": None,
        "error": last_error
    }

# Test
result = safe_rag_query("What is supervised learning?")
if result["error"]:
    print(f"Error: {result['error']}")
else:
    print(f"Answer: {result['answer']}")
    print(f"Sources: {len(result['sources'])} documents")
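
Retries cover transient failures; a fallback path covers the case where the primary chain keeps failing. A minimal sketch, with hypothetical callables (`flaky_primary`, `simple_fallback`) standing in for real chains and `invoke_with_fallbacks` as an illustrative helper name:

```python
from typing import Callable, Optional, Sequence


def invoke_with_fallbacks(question: str, handlers: Sequence[Callable[[str], str]]) -> str:
    """Try each handler in order and return the first successful answer."""
    last_error: Optional[Exception] = None
    for handler in handlers:
        try:
            return handler(question)
        except Exception as exc:  # broad on purpose for this demo
            last_error = exc
    raise RuntimeError("all handlers failed") from last_error


def flaky_primary(question: str) -> str:
    """Hypothetical primary chain that always times out."""
    raise TimeoutError("primary model timed out")


def simple_fallback(question: str) -> str:
    """Hypothetical cheaper chain used when the primary one fails."""
    return f"Fallback answer for: {question}"


fallback_answer = invoke_with_fallbacks(
    "What is supervised learning?", [flaky_primary, simple_fallback]
)
```

LCEL runnables expose a similar idea through `with_fallbacks`; check the documentation for your installed version before relying on its exact arguments.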

## Streaming for Better UX

In [None]:
# Stream RAG responses for better user experience
print("Streaming RAG response:")
print("="*50)

for chunk in rag_chain.stream("What are the applications of machine learning?"):
    print(chunk, end="", flush=True)

print("\n" + "="*50)
print("✅ Streaming provides better UX for long responses")
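
Streaming to the screen and keeping the full answer are not mutually exclusive. The sketch below collects chunks while printing them; `fake_stream` is a hypothetical generator standing in for `rag_chain.stream(...)`, which is assumed to yield string chunks as in the cell above.

```python
from typing import Iterator, List


def fake_stream(text: str, chunk_size: int = 8) -> Iterator[str]:
    """Hypothetical stand-in for a streaming chain: yields fixed-size pieces."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


chunks: List[str] = []
for chunk in fake_stream("Machine learning powers search, recommendations, and fraud detection."):
    print(chunk, end="", flush=True)  # show progress immediately
    chunks.append(chunk)              # keep the piece for later use
print()

full_answer = "".join(chunks)
```

Keeping `full_answer` lets you log the response or save it to history after the stream has finished.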

## Exercise: Build Your Own RAG

**Task**: Create a RAG system for a domain of your choice

**Steps**:
1. Create a text file with domain knowledge
2. Load and chunk the document
3. Create a vector store
4. Build a RAG chain with LCEL
5. Add conversation memory
6. Test with multiple questions

**Bonus**:
- Add error handling
- Implement streaming
- Add source citations

Use the cells below for your implementation:

In [None]:
# Your implementation here
# Step 1: Create your domain content and save to file


In [None]:
# Step 2-4: Build RAG pipeline


In [None]:
# Step 5: Add conversation memory


In [None]:
# Step 6: Test your RAG system


---

# Section 10: Summary & Next Steps

## What You Learned

✅ **LangChain fundamentals** - When to use it and when not to  
✅ **LCEL (pipe syntax)** - The way to build chains  
✅ **Advanced LCEL patterns** - Sequential, parallel, branching  
✅ **Document loading** - TextLoader and metadata handling  
✅ **Text splitting** - RecursiveCharacterTextSplitter  
✅ **Vector stores** - ChromaDB integration  
✅ **RAG pipelines** - Complete implementation with LCEL  
✅ **Conversation memory** - RunnableWithMessageHistory  
✅ **Production patterns** - Error handling, streaming, best practices  

## Key Takeaways

1. **LCEL is the standard** - Use pipe syntax for all new LangChain code
2. **Start simple** - Use direct APIs for simple tasks, LangChain for RAG and complex workflows
3. **Memory patterns** - Choose full history or sliding window based on use case
4. **Production-ready** - Add error handling, observability, and testing
5. **Composability** - Mix and match LCEL patterns as needed

## Resources

**Official Documentation**:
- [LangChain Documentation](https://python.langchain.com/)
- [LCEL Guide](https://python.langchain.com/docs/expression_language/)
- [LangSmith](https://www.langchain.com/langsmith) - Observability platform

**Learn More**:
- [LangChain Academy](https://academy.langchain.com/) - Free course
- [LangChain Best Practices](https://www.swarnendu.de/blog/langchain-best-practices/)
- [Building RAG Applications](https://blog.langchain.com/)

## Next Steps

1. **Practice**: Complete the exercise above
2. **Experiment**: Try different LCEL patterns
3. **Integrate**: Build a RAG chatbot for your domain
4. **Production**: Add observability with LangSmith
5. **Advanced**: Explore LangGraph for stateful workflows

## Congratulations!

You've mastered LangChain essentials! You can now:
- Build production-ready RAG applications
- Use LCEL to compose complex chains
- Add conversation memory to chatbots
- Apply best practices for production systems

**You're ready for advanced topics: LangGraph, Multi-Agent Systems, and Function Calling!**

---

**End of Notebook**