# Introduction to LCEL and LangGraph: LangChain Powered RAG

In the following notebook we're going to focus on learning how to navigate and build useful applications using LangChain, specifically LCEL, and how to integrate different APIs together into a coherent RAG application!

We'll be building a RAG system to answer questions about how people use AI, using the "How People Use AI" dataset.

In the notebook, you'll complete the following Tasks:

- 🤝 Breakout Room #2:
    1. LangChain and LCEL Concepts
    2. Understanding States and Nodes
    3. Introduction to Qdrant Vector Databases
    4. Building a Basic Graph

Let's get started!



## Installation Requirements

Make sure Ollama is installed and running with the required models pulled (see the "Set Up Ollama" instructions below).


## Optional: LangSmith Setup for Tracing and Monitoring

LangSmith provides powerful tracing, monitoring, and debugging capabilities for LangChain applications. While not required for this notebook, setting it up will give you valuable insights into your RAG system's performance.

### Getting LangSmith Credentials

1. **Sign up for LangSmith**: Visit [smith.langchain.com](https://smith.langchain.com) and create a free account
2. **Get your API Key**: 
   - Go to Settings → API Keys
   - Create a new API key and copy it
3. **Set your environment variables** (choose one method below):

**Option A: Set environment variables in your terminal before starting Jupyter:**
```bash
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="your-api-key-here"
export LANGCHAIN_PROJECT="RAG-Assignment"
```

**Option B: Set them in the notebook (run the cell below):**


In [None]:
# Direct .env loading approach
import os
from pathlib import Path
from dotenv import load_dotenv

# Load .env file directly (without this call the key below is never read from the file)
env_path = Path("../../../../LEARNING/.env")
load_dotenv(dotenv_path=env_path)
# Get the key directly
key = os.getenv('LANGCHAIN_AEI8_API_KEY')

# Set up LangSmith tracing
if key:
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = key
    os.environ["LANGCHAIN_PROJECT"] = "RAG-Assignment"
    print("✅ LangSmith tracing enabled")
else:
    print("⚠️ LangSmith API key not found")

# Show status
print("Tracing:", os.getenv("LANGCHAIN_TRACING_V2", "false"))
print("Project:", os.getenv("LANGCHAIN_PROJECT", "Not set"))

### What LangSmith Provides

Once set up, LangSmith will automatically trace your LangChain operations and provide:

- **Execution traces**: See exactly how your RAG pipeline processes each query
- **Performance metrics**: Monitor latency, token usage, and costs
- **Debugging tools**: Inspect intermediate outputs at each step
- **Error tracking**: Identify and debug issues in your chains
- **Dataset management**: Collect and organize your queries and responses

You can view all traces and analytics in your LangSmith dashboard at [smith.langchain.com](https://smith.langchain.com).

> **Note**: LangSmith is completely optional for this assignment. The notebook will work perfectly fine without it, but it's a valuable tool for production applications.


# 🤝 Breakout Room #2

## Set Up Ollama

We'll be using Ollama to run local LLM models. Make sure you have Ollama installed and running:

1. Install Ollama from https://ollama.ai (`curl https://ollama.ai/install.sh | sh`)
2. Make sure the output of `ollama -v` reads `0.11.10` or greater.
3. Pull the models we'll use:
   ```bash
   ollama pull gpt-oss:20b # For the chat model
   ollama pull embeddinggemma:latest  # For embeddings
   ```
4. Ensure Ollama is running (it should start automatically after installation)

### A Note On Runnables

# Understanding LangChain Runnables and LCEL

In LangChain, a Runnable is like a LEGO brick in your AI application - it's a standardized component that can be easily connected with other components. The real power of Runnables comes from their ability to be combined in flexible ways using LCEL (LangChain Expression Language).

## Key Features of Runnables

### 1. Universal Interface
Every Runnable in LangChain follows the same pattern:
- Takes an input
- Performs some operation
- Returns an output

This consistency means you can treat different components (like models, retrievers, or parsers) in the same way.

### 2. Built-in Parallelization
Runnables come with methods for handling multiple inputs efficiently:
```python
# Process inputs in parallel, maintain order
results = chain.batch([input1, input2, input3])

# Process inputs as they complete; each item is an (index, output) pair
for index, result in chain.batch_as_completed([input1, input2, input3]):
    print(index, result)
```

### 3. Streaming Support
Perfect for responsive applications:
```python
# Stream outputs as they're generated
for chunk in chain.stream({"query": "Tell me a story"}):
    print(chunk, end="", flush=True)
```

### 4. Easy Composition
The `|` operator makes building pipelines intuitive:
```python
# Create a basic RAG chain
rag_chain = retriever | prompt | model | output_parser
```
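
To build some intuition for what `|` is doing, here is a toy sketch in plain Python. `ToyRunnable` is a hypothetical stand-in, not LangChain's implementation; it only shows how overloading `__or__` lets us chain steps so that the output of one becomes the input of the next.

```python
from typing import Any, Callable


class ToyRunnable:
    """Hypothetical stand-in for a Runnable: wraps a function behind `invoke`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "ToyRunnable") -> "ToyRunnable":
        # `a | b` builds a new step that runs `a`, then feeds its output to `b`.
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


# Three illustrative steps: format a prompt, fake a model call, parse the output.
toy_prompt = ToyRunnable(lambda query: f"QUERY: {query}")
toy_model = ToyRunnable(lambda prompt: {"content": prompt.upper()})
toy_parser = ToyRunnable(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
print(toy_chain.invoke("hello"))  # prints "QUERY: HELLO"
```

The real Runnable interface adds batching, streaming, and async support on top of this basic idea.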

## Common Types of Runnables

- **Language Models**: Like our `ChatOllama` instance (running locally with Ollama)
- **Prompt Templates**: Format inputs consistently
- **Retrievers**: Get relevant context from a vector store
- **Output Parsers**: Structure model outputs
- **LangGraph Nodes**: Individual components in our graph

Think of Runnables as the building blocks of your LLM application. Just like how you can combine LEGO bricks in countless ways, you can mix and match Runnables to create increasingly sophisticated applications!



## LangGraph Based RAG

Now that we have a reasonable grasp of LCEL and the idea of Runnables - let's see how we can use LangGraph to build the same system!

### Primer: What is LangGraph?
LangGraph is a tool that leverages LangChain Expression Language to build coordinated multi-actor and stateful applications that include cyclic behaviour.

#### Why Cycles?
In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Because we use cycles rather than loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion, effectively recreating application flowcharts in code in an almost 1-to-1 fashion.

#### Why LangGraph?
Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

> NOTE: We're going to focus on building a simple DAG for today's assignment as an introduction to LangGraph

### Putting the State in Stateful

Earlier we used this phrasing:

> coordinated multi-actor and stateful applications

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StateGraph`, which uses a state object (often called `AgentState`) to pass information between the various nodes of the graph.

There are more options than what we'll see below - but a common `AgentState` object is a `TypedDict` with the key `messages`, whose value is a `Sequence` of `BaseMessage` objects that is appended to whenever the state changes.

However, in our example here, we're focusing on a simpler `State` object:

```python
class State(TypedDict):
    question: str
    context: list[Document]
    response: str
```

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. **We initialize our state object**:
   ```python
   {
       "question": "",
       "context": [],
       "response": ""
   }
   ```

2. **Our user submits a query to our application.**  
   We store the user's question in `state["question"]`. Now we have:
   ```python
   {
       "question": "How tall is the Eiffel Tower?",
       "context": [],
       "response": ""
   }
   ```

3. **We pass our state object to an Agent node** which is able to read the current state. It will use the value of `state["question"]` as input and might retrieve some context documents related to the question. It then generates a response which it stores in `state["response"]`. For example:
   ```python
   {
       "question": "How tall is the Eiffel Tower?",
       "context": [Document(page_content="...some data...")],
       "response": "The Eiffel Tower is about 324 meters tall..."
   }
   ```

That's it! The important part is that we have a consistent object (`State`) that's passed around, holding the crucial information as we go from one node to the next. This ensures our application has a single source of truth about what has happened so far and what is happening now.
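
The walkthrough above can be sketched in plain Python. This is a simplified illustration and not how LangGraph is implemented: `ToyState`, `toy_retrieve`, and `toy_generate` are made-up names, plain strings stand in for `Document` objects, and we assume each returned key simply overwrites the old value.

```python
from typing import TypedDict


class ToyState(TypedDict):
    question: str
    context: list[str]  # plain strings stand in for Document objects
    response: str


def toy_retrieve(state: ToyState) -> dict:
    # A node reads the state and returns only the keys it wants to change.
    return {"context": [f"Notes related to: {state['question']}"]}


def toy_generate(state: ToyState) -> dict:
    return {"response": f"Answer based on {len(state['context'])} document(s)."}


state: ToyState = {"question": "How tall is the Eiffel Tower?", "context": [], "response": ""}

for node in (toy_retrieve, toy_generate):
    update = node(state)
    state = {**state, **update}  # merge the partial update into the full state

print(state["response"])  # prints "Answer based on 1 document(s)."
```

Each node only returns the part of the state it changed, while the rest of the state carries over untouched.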



In [3]:
from langgraph.graph import START, StateGraph
from typing_extensions import TypedDict
from langchain_core.documents import Document

class State(TypedDict):
  question: str
  context: list[Document]
  response: str

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph - where each node is connected by an edge to each other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL Runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".  
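
As a rough mental model (a toy sketch, not LangGraph's API), we can picture nodes as named functions and edges as a lookup table saying where the state goes next. The names `toy_nodes`, `toy_edges`, and `run_toy_graph` are invented for this illustration.

```python
# Nodes: names mapped to functions that take a state dict and return an update.
toy_nodes = {
    "double": lambda state: {"value": state["value"] * 2},
    "add_one": lambda state: {"value": state["value"] + 1},
}

# Edges: for each node, where the state object is passed next.
toy_edges = {"START": "double", "double": "add_one", "add_one": "END"}


def run_toy_graph(state: dict) -> dict:
    current = toy_edges["START"]
    while current != "END":
        # Run the current node, merge its update, then follow the outgoing edge.
        state = {**state, **toy_nodes[current](state)}
        current = toy_edges[current]
    return state


print(run_toy_graph({"value": 3}))  # prints {'value': 7}
```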

### Building Nodes

We're going to need two nodes:

A node for retrieval, and a node for generation.

Let's start with our `retrieve` node!

Notice how we do not need to update the state object in the node, but can instead return a modification directly to our state.

#### Building a Retriever with LangChain

In order to build our `retrieve` node, we'll first need to build a retriever!

This will involve the following steps: 

1. Ingesting Data
2. Chunking the Data
3. Vectorizing the Data and Storing it in a Vector Database
4. Converting it to a Retriever

##### Retriever Step 1: Ingesting Data

In today's lesson, we're going to be building a RAG system to answer questions about how people use AI - and we will pull information into our index (vectorized chunks stored in our vector store) through LangChain's [`PyMuPDFLoader`](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.pdf.PyMuPDFLoader.html)!

> NOTE: We'll be using an async loader during our document ingesting - but our Jupyter Kernel is already running in an async loop! This means we'll want the ability to *nest* async loops. 

In [4]:
import nest_asyncio

nest_asyncio.apply()

Now, we're good to load our documents through the [`PyMuPDFLoader`](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.pdf.PyMuPDFLoader.html)!

In [5]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import PyMuPDFLoader

directory_loader = DirectoryLoader("data", glob="**/*.pdf", loader_cls=PyMuPDFLoader)

ai_usage_knowledge_resources = directory_loader.load()

In [6]:
ai_usage_knowledge_resources[0].page_content[:1000]

'NBER WORKING PAPER SERIES\nHOW PEOPLE USE CHATGPT\nAaron Chatterji\nThomas Cunningham\nDavid J. Deming\nZoe Hitzig\nChristopher Ong\nCarl Yan Shan\nKevin Wadman\nWorking Paper 34255\nhttp://www.nber.org/papers/w34255\nNATIONAL BUREAU OF ECONOMIC RESEARCH\n1050 Massachusetts Avenue\nCambridge, MA 02138\nSeptember 2025\nWe acknowledge help and comments from Joshua Achiam, Hemanth Asirvatham, Ryan \nBeiermeister, Rachel Brown, Cassandra Duchan Solis, Jason Kwon, Elliott Mokski, Kevin Rao, \nHarrison Satcher, Gawesha Weeratunga, Hannah Wong, and Analytics & Insights team. We \nespecially thank Tyna Eloundou and Pamela Mishkin who in several ways laid the foundation for \nthis work. This study was approved by Harvard IRB (IRB25-0983). A repository containing all \ncode run to produce the analyses in this paper is available on request. The views expressed herein \nare those of the authors and do not necessarily reflect the views of the National Bureau of \nEconomic Research.\nAt least one c

#### TextSplitting aka Chunking

We'll use the `RecursiveCharacterTextSplitter` to create our toy example.

It will split based on the following rules:

- Each chunk has a maximum size of 750 tokens (the `chunk_size` used in the cell below, measured with `tiktoken`)
- It will try and split first on the `\n\n` character, then on the `\n`, then on the `<SPACE>` character, and finally it will split on individual characters.
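
To make the "try the coarsest separator first" rule concrete, here is a simplified sketch in plain Python. It is not the library's implementation: `toy_recursive_split` is a made-up helper, it measures length in characters rather than tokens, and it skips the step where the real splitter merges small pieces back together up to the chunk size.

```python
def toy_recursive_split(text: str, max_len: int, separators=("\n\n", "\n", " ", "")) -> list[str]:
    """Split on the coarsest separator, then recurse into pieces that are still too long."""
    if len(text) <= max_len:
        return [text] if text else []
    separator, remaining = separators[0], separators[1:]
    if separator == "":
        # Last resort: cut into fixed-size character windows.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    chunks = []
    for piece in text.split(separator):
        chunks.extend(toy_recursive_split(piece, max_len, remaining))
    return chunks


sample = "First paragraph.\n\nSecond paragraph is a bit longer."
print(toy_recursive_split(sample, max_len=20))
```

The first paragraph fits, so it stays whole. The second is too long, so the sketch falls back to finer separators until every piece fits.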

Let's implement it and see the results!

In [7]:
import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter

def tiktoken_len(text):
    # Using cl100k_base encoding which is a good general-purpose tokenizer
    # This works well for estimating token counts even with Ollama models
    tokens = tiktoken.get_encoding("cl100k_base").encode(
        text,
    )
    return len(tokens)

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 750,
    chunk_overlap = 0,
    length_function = tiktoken_len,
)

In [8]:
ai_usage_knowledge_chunks = text_splitter.split_documents(ai_usage_knowledge_resources)

##### 🏗️ Activity #1:

While there's nothing specifically wrong with the chunking method used above - it is a naive approach that is not sensitive to specific data formats.

Brainstorm some ideas that would split large single documents into smaller documents.

1. `YOUR IDEA HERE`
2. `YOUR IDEA HERE`
3. `YOUR IDEA HERE`

# IDEA 1. Split by Sentences

In [9]:
# Split text into complete sentences instead of random pieces
def split_by_sentences(text):
    print("=== Splitting by Sentences ===")
    sentences = text.split('. ')
    print(f"Found {len(sentences)} sentences")
    
    chunks = []
    current_chunk = ""
    
    for sentence in sentences:
        if len(current_chunk + sentence) > 500:  # If too big, start new chunk
            if current_chunk:
                chunks.append(current_chunk)
                print(f"Chunk {len(chunks)}: {current_chunk[:50]}...")
            current_chunk = sentence
        else:
            current_chunk += ". " + sentence if current_chunk else sentence
    
    if current_chunk:
        chunks.append(current_chunk)
        print(f"Chunk {len(chunks)}: {current_chunk[:50]}...")
    
    print(f"Total: {len(chunks)} chunks")
    return chunks

# Test it
text = "Hello world. This is a test. How are you today? I am fine. Thank you for asking."
split_by_sentences(text)

=== Splitting by Sentences ===
Found 4 sentences
Chunk 1: Hello world. This is a test. How are you today? I ...
Total: 1 chunks


['Hello world. This is a test. How are you today? I am fine. Thank you for asking.']

# IDEA 2. Split by Paragraphs

In [10]:
# Split text into paragraphs (double line breaks)
def split_by_paragraphs(text):
    print("=== Splitting by Paragraphs ===")
    paragraphs = text.split('\n\n')
    print(f"Found {len(paragraphs)} paragraphs")
    
    chunks = []
    for i, paragraph in enumerate(paragraphs):
        if len(paragraph) > 20:  # Only keep long paragraphs
            chunks.append(paragraph)
            print(f"Chunk {len(chunks)}: {paragraph[:50]}...")
    
    print(f"Total: {len(chunks)} chunks")
    return chunks

# Test it
text = """First paragraph here. It has some content.

Second paragraph here. It also has content.

Third paragraph here."""
split_by_paragraphs(text)

=== Splitting by Paragraphs ===
Found 3 paragraphs
Chunk 1: First paragraph here. It has some content....
Chunk 2: Second paragraph here. It also has content....
Chunk 3: Third paragraph here....
Total: 3 chunks


['First paragraph here. It has some content.',
 'Second paragraph here. It also has content.',
 'Third paragraph here.']

# IDEA 3. Split with Overlap

In [11]:
# Split text but keep some overlap between chunks
def split_with_overlap(text, size=100, overlap=20):
    print("=== Splitting with Overlap ===")
    print(f"Chunk size: {size}, Overlap: {overlap}")
    
    chunks = []
    start = 0
    
    while start < len(text):
        end = start + size
        chunk = text[start:end]
        chunks.append(chunk)
        print(f"Chunk {len(chunks)}: {chunk[:50]}...")
        start = end - overlap  # Move forward but keep overlap
    
    print(f"Total: {len(chunks)} chunks")
    return chunks

# Test it
text = "This is a very long text that we want to split into smaller pieces with some overlap between them."
split_with_overlap(text)

=== Splitting with Overlap ===
Chunk size: 100, Overlap: 20
Chunk 1: This is a very long text that we want to split int...
Chunk 2: rlap between them....
Total: 2 chunks


['This is a very long text that we want to split into smaller pieces with some overlap between them.',
 'rlap between them.']
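
Notice that the last chunk above (`'rlap between them.'`) is already fully contained in the first one, and that the loop would never finish if `overlap` were greater than or equal to `size`. One possible variation, sketched here under the assumption that we want to stop as soon as a window reaches the end of the text (`toy_split_with_overlap` is a hypothetical name):

```python
def toy_split_with_overlap(text: str, size: int = 100, overlap: int = 20) -> list[str]:
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # this window already reached the end of the text
        start = end - overlap  # step forward, keeping `overlap` characters
    return chunks


print(toy_split_with_overlap("abcdefghijklmnopqrstuvwxyz", size=10, overlap=3))
# prints ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```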

#### Embeddings and Dense Vector Search

Now that we have our individual chunks, we need a system to correctly select the relevant pieces of information to answer our query.

This sounds like a perfect job for embeddings!
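
As a quick illustration of how dense vector search picks relevant chunks, here is a toy sketch using cosine similarity. The three-dimensional vectors and chunk names are made up for the example; a real embedding model produces much longer vectors, and a vector database does this ranking for us.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for three chunks and one query.
toy_index = {
    "chunk about work usage": [0.9, 0.1, 0.0],
    "chunk about cooking": [0.0, 0.2, 0.9],
    "chunk about coding help": [0.7, 0.6, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

# Rank chunks from most to least similar to the query and keep the top 2.
ranked = sorted(
    toy_index,
    key=lambda name: cosine_similarity(toy_query, toy_index[name]),
    reverse=True,
)
print(ranked[:2])  # prints ['chunk about work usage', 'chunk about coding help']
```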

We'll be using Ollama's `embeddinggemma` model as our embedding model today! This is a powerful open-source embedding model that runs locally.

Let's load it up through LangChain.

In [12]:
from langchain_ollama import OllamaEmbeddings
 
# Using embeddinggemma which is a powerful open-source embedding model
embedding_model = OllamaEmbeddings(model="embeddinggemma:latest")

##### ❓ Question #1:

What is the embedding dimension, given that we're using `embeddinggemma`?

You will need to fill the next cell out correctly with your embedding dimension for the rest of the notebook to run.

In [13]:
embedding_dim = 768  

#### Using A Vector Database - Introduction to Qdrant

Up to this point, we've been using a dictionary to hold our embeddings - typically, we'll want to use a more robust strategy.

In this bootcamp - we'll be focusing on leveraging [Qdrant's vector database](https://qdrant.tech/qdrant-vector-database/).

Let's take a look at how we set-up Qdrant!

> NOTE: We'll be spending a lot of time learning about Qdrant throughout the remainder of our time together - but for an initial primer, please check out [this resource](https://qdrant.tech/articles/what-is-a-vector-database/)

We are going to be using an "in-memory" Qdrant client, which means that our vectors will be held in our system's memory (RAM) - this is useful for prototyping and development at smaller scales - but would need to be modified when moving to production. Luckily for us, this modification is trivial!

> NOTE: While LangChain uses the terminology "VectorStore" (also known as a Vector Library), Qdrant is a "Vector Database" - more info. on that [here.](https://weaviate.io/blog/vector-library-vs-vector-database)

In [14]:
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

client = QdrantClient(":memory:")

Next, we need to create a collection - a collection is a specific...collection of vectors within the Qdrant client.

These are useful as they allow us to create multiple different "warehouses" in a single client, which can be leveraged for personalization and more!

Also notice that we define what our vector shapes are (embedding dim) as well as our desired distance metric.

In [15]:
client.create_collection(
    collection_name="ai_usage_knowledge_index",
    vectors_config=VectorParams(size=embedding_dim, distance=Distance.COSINE),
)

True

Now we can assemble our vector database! Notice that we provide our client, our created collection, and our embedding model!

In [16]:
vector_store = QdrantVectorStore(
    client=client,
    collection_name="ai_usage_knowledge_index",
    embedding=embedding_model,
)

Now that we have our vector database set-up, we can add our documents into it!

In [17]:
_ = vector_store.add_documents(documents=ai_usage_knowledge_chunks)

#### Creating a Retriever

Now that we have an idea of how we're getting our most relevant information - let's see how we could create a pipeline that would automatically extract the closest chunk to our query and use it as context for our prompt!

This will involve a popular LangChain interface: the `as_retriever` method!

> NOTE: We can still specify how many documents we wish to retrieve per query (the `k` value).

In [18]:
retriever = vector_store.as_retriever(search_kwargs={"k": 5})

In [19]:
retriever.invoke("How do people use AI in their daily work?")

[Document(metadata={'producer': 'macOS Version 15.4.1 (Build 24E263) Quartz PDFContext, AppendMode 1.1', 'creator': 'LaTeX with hyperref', 'creationdate': '2025-09-12T20:05:32+00:00', 'source': 'data/howpeopleuseai.pdf', 'file_path': 'data/howpeopleuseai.pdf', 'total_pages': 64, 'format': 'PDF 1.6', 'title': 'How People Use ChatGPT', 'author': '', 'subject': '', 'keywords': '', 'moddate': '2025-09-15T10:32:36-04:00', 'trapped': '', 'modDate': "D:20250915103236-04'00'", 'creationDate': 'D:20250912200532Z', 'page': 34, '_id': 'ab54b37d84634b48b06d9e6246a104e4', '_collection_name': 'ai_usage_knowledge_index'}, page_content='Panel A. Work Related\nPanel B1. Asking.\nPanel B2. Doing.\nFigure 23: (continued on next page)\n33'),
 Document(metadata={'producer': 'macOS Version 15.4.1 (Build 24E263) Quartz PDFContext, AppendMode 1.1', 'creator': 'LaTeX with hyperref', 'creationdate': '2025-09-12T20:05:32+00:00', 'source': 'data/howpeopleuseai.pdf', 'file_path': 'data/howpeopleuseai.pdf', 'total_

#### Creating the Node

We're finally ready to create our node!

In [20]:
def retrieve(state: State) -> State:
  retrieved_docs = retriever.invoke(state["question"])
  return {"context" : retrieved_docs}

### Generate Node

Next, let's create our `generate` node - which will leverage LangChain and something called an "LCEL Chain" which you can read more about [here](https://python.langchain.com/docs/concepts/lcel/)!

We'll want to create a chain that does the following: 

1. Formats our inputs into a chat template suitable for RAG
2. Takes that chat template and sends it to an LLM
3. Parses that output into `str` format

Let's get chaining!

#### Chain Components: RAG Chat Template

We'll create a chat template that takes in some query and formats it as a RAG prompt using LangChain's prompt template!

In [21]:
from langchain_core.prompts import ChatPromptTemplate

HUMAN_TEMPLATE = """
#CONTEXT:
{context}

QUERY:
{query}

Use the provide context to answer the provided user query. Only use the provided context to answer the query. If you do not know the answer, or it's not contained in the provided context response with "I don't know"
"""

chat_prompt = ChatPromptTemplate.from_messages([
    ("human", HUMAN_TEMPLATE)
])

In [22]:
chat_prompt.invoke({"context" : "OUR CONTEXT HERE", "query" : "OUR QUERY HERE"}).messages[0].content

'\n#CONTEXT:\nOUR CONTEXT HERE\n\nQUERY:\nOUR QUERY HERE\n\nUse the provide context to answer the provided user query. Only use the provided context to answer the query. If you do not know the answer, or it\'s not contained in the provided context response with "I don\'t know"\n'

#### Chain Components: Generator

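We'll next set up the generator, which runs locally through Ollama. The setup section pulls `gpt-oss:20b`, but the cell below uses the smaller `llama3.2:3b` instead to avoid resource issues, so run `ollama pull llama3.2:3b` first if you follow it as written.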

In [38]:
from langchain_ollama import ChatOllama

# Using llama3.2:3b which is a smaller, more reliable local model
# (Changed from gpt-oss:20b to avoid resource issues)
ollama_chat_model = ChatOllama(model="llama3.2:3b", temperature=0.6)

Let's now call our model with a formatted prompt.

Notice that we have some nested calls here - we'll see that this is made easier by LCEL.

In [39]:
ollama_chat_model.invoke(chat_prompt.invoke({"context" : "Paris is the capital of France", "query" : "What is the capital of France?"}))

AIMessage(content='The capital of France is Paris.', additional_kwargs={}, response_metadata={'model': 'llama3.2:3b', 'created_at': '2025-09-20T15:44:23.995901Z', 'done': True, 'done_reason': 'stop', 'total_duration': 1306169250, 'load_duration': 830242583, 'prompt_eval_count': 90, 'prompt_eval_duration': 286602083, 'eval_count': 8, 'eval_duration': 188292792, 'model_name': 'llama3.2:3b'}, id='run--7f7a7307-6d5c-40ee-9135-7f8d499a1671-0', usage_metadata={'input_tokens': 90, 'output_tokens': 8, 'total_tokens': 98})

#### Chain Components: `str` Parser

Finally, let's set-up our `StrOutputParser()` which will transform our model's output into a simple `str` to be provided to the user.

> NOTE: You can see us leveraging LCEL in the example below to avoid needing to do nested calls.

In [40]:
from langchain_core.output_parsers import StrOutputParser

generator_chain = chat_prompt | ollama_chat_model | StrOutputParser()

generator_chain.invoke({"context" : "Paris is the capital of France", "query" : "What is the capital of France?"})

'According to the provided context, Paris is indeed the capital of France. Therefore, the answer to the query is:\n\nThe capital of France is Paris.'

### `generate` Node

Now we can create our `generate` Node!

In [41]:
def generate(state: State) -> State:
  generator_chain = chat_prompt | ollama_chat_model | StrOutputParser()
  response = generator_chain.invoke({"query" : state["question"], "context" : state["context"]})
  return {"response" : response}
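
Note that `generate` passes the list of `Document` objects straight into `{context}`, so the prompt ends up containing the list's string form, metadata included. One optional variation is to join only the text of each document first. The sketch below assumes a hypothetical `format_docs` helper and uses a small `ToyDocument` stand-in so that it runs on its own; with real `Document` objects only the `page_content` attribute is needed.

```python
from dataclasses import dataclass


@dataclass
class ToyDocument:
    """Stand-in exposing the same `page_content` attribute as a LangChain Document."""
    page_content: str


def format_docs(docs) -> str:
    # Join only the text of each document, separated by blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


docs = [ToyDocument("Panel A. Work Related"), ToyDocument("Panel B1. Asking.")]
print(format_docs(docs))
```

Inside the node, this would mean passing `format_docs(state["context"])` as the `context` value instead of the raw list.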

Now we can start defining our graph!

Think of the graph's state as a blank canvas that we can add nodes and edges to.

Every graph starts with two special nodes - START and END - they act as the entry and exit points to the other nodes in the graph.  

All valid graphs must start at the START node and end at the END node.

In [42]:
# Start with the blank canvas
graph_builder = StateGraph(State)

Now we can add a sequence to our "canvas" (graph) - this can be done by providing a list of nodes, which will automatically have edges that connect the i-th element to the i+1-th element in the list. The final element will be added to the END node unless otherwise specified.

In [43]:
graph_builder = graph_builder.add_sequence([retrieve, generate])

Next, let's connect our START node to our `retrieve` node by adding an edge.

In [44]:
graph_builder.add_edge(START, "retrieve")

<langgraph.graph.state.StateGraph at 0x11e4dff50>

Finally we can compile our graph! This will do basic verification to ensure that the Runnables have the correct inputs/outputs and can be matched.

In [45]:
graph = graph_builder.compile()

Finally, we can visualize our graph!

In [None]:
graph

Let's take it for a spin!

We invoke our graph like we do any other Runnable in LCEL!

> NOTE: That's right, even a compiled graph is a Runnable!

In [32]:
# Debug: Test the model with a simple query first
print("Testing Ollama model with simple query...")
try:
    simple_response = ollama_chat_model.invoke("Hello, how are you?")
    print("✅ Simple query successful!")
    print(f"Response: {simple_response.content}")
except Exception as e:
    print(f"❌ Simple query failed: {e}")

print("\n" + "="*50)
print("Testing with context formatting...")

# Test the prompt template
test_context = "This is a test context about AI usage in work."
test_query = "What is this about?"

try:
    formatted_prompt = chat_prompt.invoke({"context": test_context, "query": test_query})
    print("✅ Prompt formatting successful!")
    print(f"Formatted prompt: {formatted_prompt.messages[0].content[:200]}...")
except Exception as e:
    print(f"❌ Prompt formatting failed: {e}")

print("\n" + "="*50)
print("Testing full chain...")

try:
    test_chain = chat_prompt | ollama_chat_model | StrOutputParser()
    result = test_chain.invoke({"context": test_context, "query": test_query})
    print("✅ Full chain successful!")
    print(f"Result: {result}")
except Exception as e:
    print(f"❌ Full chain failed: {e}")
    print(f"Error type: {type(e).__name__}")
    print(f"Error details: {str(e)}")


Testing Ollama model with simple query...
✅ Simple query successful!
Response: Hello! I’m doing great—thanks for asking. How about you? Anything interesting on your mind today?

Testing with context formatting...
✅ Prompt formatting successful!
Formatted prompt: 
#CONTEXT:
This is a test context about AI usage in work.

QUERY:
What is this about?

Use the provide context to answer the provided user query. Only use the provided context to answer the query. If ...

Testing full chain...
✅ Full chain successful!
Result: It’s about AI usage in work.


In [33]:
# Debug: Test the retriever and context
print("Testing retriever...")
try:
    test_question = "What are the most common ways people use AI in their work?"
    retrieved_docs = retriever.invoke(test_question)
    print(f"✅ Retriever successful! Retrieved {len(retrieved_docs)} documents")
    
    # Check the context size and content
    total_context_length = sum(len(doc.page_content) for doc in retrieved_docs)
    print(f"Total context length: {total_context_length} characters")
    
    # Show first document preview
    if retrieved_docs:
        print(f"First document preview: {retrieved_docs[0].page_content[:200]}...")
        
    # Test with smaller context
    print("\n" + "="*50)
    print("Testing with smaller context...")
    
    # Create a smaller context for testing
    small_context = [retrieved_docs[0]] if retrieved_docs else []
    
    # Test the generate function with smaller context
    test_state = {
        "question": test_question,
        "context": small_context,
        "response": ""
    }
    
    print("Testing generate function with small context...")
    result = generate(test_state)
    print(f"✅ Generate function successful!")
    print(f"Response: {result['response'][:200]}...")
    
except Exception as e:
    print(f"❌ Retriever test failed: {e}")
    print(f"Error type: {type(e).__name__}")
    print(f"Error details: {str(e)}")


Testing retriever...
✅ Retriever successful! Retrieved 5 documents
Total context length: 837 characters
First document preview: Panel A. Work Related
Panel B1. Asking.
Panel B2. Doing.
Figure 23: (continued on next page)
33...

Testing with smaller context...
Testing generate function with small context...
✅ Generate function successful!
Response: I don’t know....


In [35]:
# Alternative: Test with a smaller model if gpt-oss:20b fails
print("Testing with alternative smaller model...")

# Check what other models are available
import subprocess
result = subprocess.run(['ollama', 'list'], capture_output=True, text=True)
print("Available models:")
print(result.stdout)

# Try with a smaller model if available
try:
    # Try llama3.2 if available, otherwise stick with gpt-oss:20b
    smaller_model = ChatOllama(model="llama3.2:3b", temperature=0.6)
    print("Testing with llama3.2:3b...")
    
    test_response = smaller_model.invoke("Hello, how are you?")
    print(f"✅ Smaller model works! Response: {test_response.content[:100]}...")
    
    # Update the model in the generate function
    print("Updating generate function to use smaller model...")
    def generate_with_smaller_model(state: State) -> State:
        generator_chain = chat_prompt | smaller_model | StrOutputParser()
        response = generator_chain.invoke({"query" : state["question"], "context" : state["context"]})
        return {"response" : response}
    
    print("✅ Smaller model setup complete!")
    
except Exception as e:
    print(f"❌ Smaller model test failed: {e}")
    print("Will continue with gpt-oss:20b")


Testing with alternative smaller model...
Available models:
NAME                     ID              SIZE      MODIFIED       
llama3.2:3b              a80c4f17acd5    2.0 GB    17 seconds ago    
gpt-oss:20b              aa4295ac10c3    13 GB     12 minutes ago    
embeddinggemma:latest    85462619ee72    621 MB    39 hours ago      

Testing with llama3.2:3b...
✅ Smaller model works! Response: I'm just a language model, so I don't have feelings or emotions like humans do. However, I'm functio...
Updating generate function to use smaller model...
✅ Smaller model setup complete!


In [36]:
# Create a more robust generate function that handles errors
def generate_robust(state: State) -> State:
    """Generate function with error handling and fallback model"""
    try:
        # Try with the original model first
        generator_chain = chat_prompt | ollama_chat_model | StrOutputParser()
        response = generator_chain.invoke({"query" : state["question"], "context" : state["context"]})
        return {"response" : response}
    except Exception as e:
        print(f"⚠️ Original model failed: {e}")
        print("Trying with smaller model...")
        
        try:
            # Fallback to smaller model
            smaller_model = ChatOllama(model="llama3.2:3b", temperature=0.6)
            generator_chain = chat_prompt | smaller_model | StrOutputParser()
            response = generator_chain.invoke({"query" : state["question"], "context" : state["context"]})
            return {"response" : response}
        except Exception as e2:
            print(f"❌ Both models failed: {e2}")
            return {"response" : "I apologize, but I'm experiencing technical difficulties and cannot process your request at the moment."}

# Rebuild the graph with the robust generate function
print("Rebuilding graph with robust generate function...")
graph_builder_robust = StateGraph(State)
graph_builder_robust = graph_builder_robust.add_sequence([retrieve, generate_robust])
graph_builder_robust.add_edge(START, "retrieve")
graph_robust = graph_builder_robust.compile()

print("✅ Robust graph created!")


Rebuilding graph with robust generate function...
✅ Robust graph created!
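
The try/except nesting in `generate_robust` can be generalized to an ordered list of candidates. The following is a minimal sketch in plain Python, not the notebook's implementation: `invoke_with_fallbacks`, `large_model`, and `small_model` are hypothetical names, and the two model functions are stand-ins that simulate a failing and a working model rather than calling Ollama.

```python
from typing import Callable, Sequence


def invoke_with_fallbacks(
    candidates: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
    default: str = "I cannot process your request at the moment.",
) -> tuple[str, str]:
    """Try each (name, callable) in order and return (name_used, response)."""
    for name, call in candidates:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers the next candidate
            print(f"{name} failed: {exc}")
    # Every candidate raised, so fall back to a fixed message
    return "none", default


# Hypothetical stand-ins for the two models; neither talks to Ollama.
def large_model(prompt: str) -> str:
    raise RuntimeError("model runner has unexpectedly stopped")


def small_model(prompt: str) -> str:
    return f"(small model) answer to: {prompt}"


used, answer = invoke_with_fallbacks(
    [("gpt-oss:20b", large_model), ("llama3.2:3b", small_model)],
    "What are the most common ways people use AI in their work?",
)
print(used, "->", answer)
```

Keeping the candidates in a list means a third model can be added without another level of nesting. LangChain runnables also expose a `with_fallbacks` method that serves a similar purpose; whether it fits this graph is left as an assumption to verify.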


In [37]:
# Test the robust graph
print("Testing robust graph with a simple question...")
try:
    response = graph_robust.invoke({"question" : "What are the most common ways people use AI in their work?"})
    print("✅ Robust graph successful!")
    print(f"Response: {response['response'][:300]}...")
    
    # Display the response nicely
    from IPython.display import Markdown, display
    display(Markdown(response["response"]))
    
except Exception as e:
    print(f"❌ Robust graph failed: {e}")
    print(f"Error type: {type(e).__name__}")
    print(f"Error details: {str(e)}")


Testing robust graph with a simple question...
⚠️ Original model failed: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details (status code: -1)
Trying with smaller model...
✅ Robust graph successful!
Response: Based on the provided context, the most common ways people use AI in their work are:

1. Productivity: The PDF mentions that productivity is increasing in the quality of decision-making in knowledge-intensive jobs.
2. Decision-making: It also mentions that the IWA classifications were carried out by...


Based on the provided context, the most common ways people use AI in their work are:

1. Productivity: The PDF mentions that productivity is increasing in the quality of decision-making in knowledge-intensive jobs.
2. Decision-making: It also mentions that the IWA classifications were carried out by two annotators, while all other classifications had three, which implies that AI is being used to assist with classification and decision-making tasks.

Additionally, the context mentions that users are using AI for various tasks such as rewriting emails, asking for help fixing errors, and expressing themselves in a more friendly manner. However, it's not clear if these are specific examples of how people use AI in their work or just general examples of AI usage.

It's worth noting that the context does not provide a comprehensive list of common ways people use AI in their work, but rather provides a few snippets of information that suggest AI is being used to enhance productivity and decision-making.

In [46]:
from IPython.display import Markdown, display
response = graph.invoke({"question" : "What are the most common ways people use AI in their work?"})
display(Markdown(response["response"]))

Based on the provided context, there is no direct information about the most common ways people use AI in their work. However, we can infer some insights from the text.

From the page content, we see that:

* Users are using AI for tasks such as rewriting emails to be more friendly (Document 2, page 31).
* There are IWA classifications that were carried out by two annotators, while all other classifications had three (Document 1, page 33).
* The text mentions "knowledge-intensive jobs" and "increasing in the quality of decision-making" (Document 3, page 36), which might imply that AI is being used to improve productivity and decision-making in certain work settings.

However, we cannot conclude with certainty what the most common ways people use AI in their work are based on this context alone.

In [47]:
response = graph.invoke({"question" : "Do people use AI for their personal lives?"})
display(Markdown(response["response"]))

Based on the provided context, there is no clear indication that people use AI for their personal lives. The documents appear to be related to a study or analysis of how people use AI in various contexts, such as work-related tasks.

The documents mention panels, figures, and classifications related to AI usage in knowledge-intensive jobs, but do not provide information about personal use of AI. Therefore, I would answer the query with "I don't know".

In [48]:
response = graph.invoke({"question" : "What concerns or challenges do people have when using AI?"})
display(Markdown(response["response"]))

Based on the provided context, I couldn't find any information that directly answers the question of what concerns or challenges people have when using AI.

However, one possible concern mentioned is social desirability bias (mentioned in the document with page 38), which refers to the tendency for people to provide more socially acceptable responses than their true thoughts or feelings. This bias can affect the accuracy and reliability of data collected on AI usage.

Additionally, the document mentions that "in knowledge-intensive jobs where productivity is increasing in the quality of decision-making" (page 36), but this doesn't necessarily imply a concern or challenge specific to using AI itself.

Therefore, I must respond with:

"I don't know."

In [49]:
response = graph.invoke({"question" : "Who is Batman?"})
display(Markdown(response["response"]))

I don't know who Batman is.

#### ❓ Question #2:
LangGraph's graph-based approach lets us visualize and manage complex flows naturally. How could we extend our current implementation to handle edge cases? For example:

- What if the retriever finds no relevant context?
- What if the response needs fact-checking?

Consider how you would modify the graph to handle these scenarios.




##### ✅ Answers

2.1 To handle cases where the retriever finds no relevant context, we can add a **validation node** that inspects the retrieved documents before generation. The version below checks only whether any context came back and whether it is long enough; it does not score relevance:


In [56]:

def validate_context(state: State) -> State:
    """Validate that we have sufficient relevant context"""
    context = state["context"]
    
    # Check if context is empty (an empty list is falsy, so no length check is needed)
    if not context:
        return {
            "context": [],
            "response": "I don't have enough relevant information in my knowledge base to answer your question. Please try rephrasing your question or asking about a different topic.",
            "needs_human": True
        }
    
    # Check if context is too short (less than 50 characters)
    total_content = " ".join([doc.page_content for doc in context])
    if len(total_content.strip()) < 50:
        return {
            "context": context,
            "response": "I found some information, but it may not be sufficient to provide a complete answer. Here's what I found:",
            "needs_human": True
        }
    
    return {"context": context, "needs_human": False}
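
The checks above look at length, not relevance. One way to approximate relevance is to drop passages whose similarity score falls below a threshold. The sketch below assumes the retriever can be adapted to return `(text, score)` pairs where a higher score means more similar (for example via a vector store's `similarity_search_with_score` method). The function name `filter_by_score`, the sample passages, the scores, and the `0.5` threshold are all illustrative assumptions.

```python
def filter_by_score(scored_docs, min_score=0.5):
    """Keep only (text, score) pairs whose similarity score meets the threshold."""
    return [(text, score) for text, score in scored_docs if score >= min_score]


# Made-up passages and scores, for illustration only
scored = [
    ("Users ask for help rewriting emails.", 0.82),
    ("Figure 23: (continued on next page)", 0.31),
]

relevant = filter_by_score(scored, min_score=0.5)

# If nothing clears the threshold, the validation node could set this flag
needs_human = len(relevant) == 0
print(relevant, needs_human)
```

A suitable threshold depends on the embedding model and the distance metric, so it would need to be tuned against real queries.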



##### ✅ Answers


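2.2 For fact-checking, we can add a **verification node** that compares the generated response with the retrieved context. The heuristic below flags absolute wording (such as "definitely" or "always") when the supporting context is very short: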

In [55]:

import re

def fact_check_response(state: State) -> State:
    """Verify the response against the retrieved context"""
    response = state["response"]
    context = state["context"]
    
    # Normalize the response and context for case-insensitive comparison
    response_lower = response.lower()
    context_text = " ".join([doc.page_content.lower() for doc in context])
    
    # Flag absolute wording; compare whole words so "all" does not match "usually"
    suspicious_phrases = ["definitely", "always", "never", "all", "every"]
    response_words = set(re.findall(r"[a-z']+", response_lower))
    contains_unsupported_claims = any(phrase in response_words for phrase in suspicious_phrases)
    
    if contains_unsupported_claims and len(context_text) < 100:
        return {
            "response": f"⚠️ {response}\n\n*Note: This response may contain unsupported claims. Please verify with additional sources.*",
            "confidence_score": 0.3
        }
    
    return {
        "response": response,
        "confidence_score": 0.8
    }
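
To cross-reference the response with the context more directly, a simple option is to measure how many of the response's content words also appear in the context. This is a rough lexical sketch, not a real fact-checker: `support_ratio`, the stopword list, and the sample strings are assumptions made for illustration, and word overlap cannot detect a claim that reuses context vocabulary to say something false.

```python
import re

# Small illustrative stopword list; a real one would be longer
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "for", "from", "that", "this", "it"}


def content_words(text: str) -> set:
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower())) - STOPWORDS


def support_ratio(response: str, context_text: str) -> float:
    """Fraction of the response's content words that also occur in the context."""
    response_words = content_words(response)
    if not response_words:
        return 0.0
    return len(response_words & content_words(context_text)) / len(response_words)


context_text = "Users ask for help rewriting emails and fixing errors."
grounded = support_ratio("Users ask for help rewriting emails.", context_text)
ungrounded = support_ratio("Batman is a superhero from Gotham.", context_text)
print(grounded, ungrounded)
```

A ratio like this could feed `confidence_score` in place of the fixed `0.3` and `0.8` values, with the cutoff chosen empirically.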


### **Enhanced Graph Structure**


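from typing_extensions import TypedDict
from langchain_core.documents import Document
from langgraph.graph import StateGraph, START, END

# Enhanced State with validation flags
class EnhancedState(TypedDict):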
    question: str
    context: list[Document]
    response: str
    needs_human: bool
    confidence_score: float

# Enhanced graph with validation and fact-checking
def create_enhanced_rag_graph():
    graph_builder = StateGraph(EnhancedState)
    
    # Add all nodes
    graph_builder.add_node("retrieve", retrieve)
    graph_builder.add_node("validate_context", validate_context)
    graph_builder.add_node("generate", generate)
    graph_builder.add_node("fact_check", fact_check_response)
    
    # Routing function: end early when validation flagged the context
    def should_continue(state: EnhancedState) -> str:
        if state.get("needs_human", False):
            return "end"
        return "generate"
    
    # Build the flow
    graph_builder.add_edge(START, "retrieve")
    graph_builder.add_edge("retrieve", "validate_context")
    graph_builder.add_conditional_edges("validate_context", should_continue, {
        "generate": "generate",
        "end": END
    })
    graph_builder.add_edge("generate", "fact_check")
    graph_builder.add_edge("fact_check", END)
    
    return graph_builder.compile()
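
The two paths through this graph can be traced without LangGraph or a running model. The sketch below is a plain-Python dry run under stated assumptions: `run_flow`, `FAKE_INDEX`, and the four stub nodes are hypothetical stand-ins, and the dictionary merge only imitates how a node's partial update overwrites keys in the state.

```python
# Hypothetical lookup table standing in for the vector store
FAKE_INDEX = {
    "How do people use AI at work?": ["Users ask for help rewriting emails."],
}


def retrieve(state):
    return {"context": FAKE_INDEX.get(state["question"], [])}


def validate_context(state):
    if not state["context"]:
        return {"response": "Not enough information.", "needs_human": True}
    return {"needs_human": False}


def generate(state):
    return {"response": "Based on the context: " + state["context"][0]}


def fact_check(state):
    return {"confidence_score": 0.8}


def should_continue(state):
    return "end" if state.get("needs_human", False) else "generate"


def run_flow(question):
    """Walk retrieve -> validate_context -> (generate -> fact_check | end)."""
    state, visited = {"question": question}, []
    for node in (retrieve, validate_context):
        state = {**state, **node(state)}  # merge the node's partial update
        visited.append(node.__name__)
    if should_continue(state) == "generate":
        for node in (generate, fact_check):
            state = {**state, **node(state)}
            visited.append(node.__name__)
    return state, visited


answered, full_path = run_flow("How do people use AI at work?")
unanswered, short_path = run_flow("Who is Batman?")
print(full_path)
print(short_path)
```

The second run stops after validation, so it never receives a `confidence_score`; downstream code should read that key with `.get` rather than assume it is present.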
