# TP -- Introduction to LangGraph

## What is LangGraph?

[LangGraph](https://langchain-ai.github.io/langgraph/) is a Python framework for building **stateful, multi-step workflows** (often called *agentic* workflows). It models your application as a **directed graph** where:

- **State** is a shared data structure that flows through the graph.
- **Nodes** are Python functions that read from the state and return updates.
- **Edges** define the order in which nodes execute, including conditional branching.

This is particularly useful for AI pipelines, where later steps branch on intermediate results and a run may need to pause for review, but the concepts apply to any multi-step process.

## What we're building

A simplified **email threat classifier** inspired by a real-world spam detection system. The pipeline:

1. Checks email URLs against a blocklist
2. Analyzes the email content for phishing keywords
3. Classifies the email as `safe`, `suspicious`, or `dangerous`
4. Generates a response message

You'll work in the `email_classifier/` package. Each part has `TODO` sections to fill in. Run the verification cells in this notebook to check your work.

**No API keys are required** -- everything uses mock data.

---
## Setup

Run the cell below to install dependencies and verify the environment.

In [None]:
%pip install -q langgraph langchain-core

In [None]:
# Quick sanity check
import langgraph
from importlib.metadata import version

print(f"LangGraph version: {version('langgraph')}")
print("Setup OK")

---
## Part 1 -- State Schema

### Concept

In LangGraph, the **state** is the single source of truth that every node can read from and write to. In this TP it is defined as a Pydantic `BaseModel` (LangGraph also accepts a `TypedDict` or a dataclass as the state schema). Each field in the model represents a piece of data that flows through the graph.

When a node returns `{"threat_level": "dangerous"}`, LangGraph merges that into the current state, updating only the `threat_level` field and leaving everything else untouched.

### TODO

Open **`email_classifier/state.py`**. Most fields are already declared. Fill in the **three TODO fields**:
- `email_id` -- a `str` with default `""`
- `has_attachments` -- a `bool` with default `False`
- `content_analysis` -- an `Optional[dict]` with default `None`

Use the existing fields as examples. Then run the verification cell below.

In [None]:
from email_classifier.checks import check_part1
check_part1()

---
## Part 2 -- Function Nodes

### Concept

A **node** in LangGraph is just a regular Python function with this signature:

```python
def my_node(state: EmailState) -> dict:
    # read from state using dot notation
    value = state.some_key
    # do some work
    result = process(value)
    # return ONLY the keys you want to update
    return {"output_key": result}
```

Key rules:
- The function receives the **full current state** as a Pydantic model instance.
- Access fields with **dot notation** (e.g. `state.urls`, not `state["urls"]`).
- It returns a dict with **only the keys it wants to update** -- not the entire state.
- LangGraph merges the returned dict into the state automatically.
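Because a node is an ordinary function, it can be called directly on a state instance, which is a convenient way to unit-test it before wiring it into a graph. A sketch, assuming the hypothetical `UrlState` model and `BLOCKLIST` set below (they are not the ones defined in `email_classifier`); only `pydantic` is needed.

```python
from typing import List

from pydantic import BaseModel, Field

# Hypothetical blocklist; the real one lives in the email_classifier package
BLOCKLIST = {"http://evil.example"}


class UrlState(BaseModel):
    # Hypothetical subset of the email state, for illustration only
    urls: List[str] = Field(default_factory=list)
    threat_level: str = "safe"


def flag_blocked_urls(state: UrlState) -> dict:
    # Read fields with dot notation
    hits = [url for url in state.urls if url in BLOCKLIST]
    # Return only the key this node is responsible for
    return {"threat_level": "dangerous" if hits else state.threat_level}


# No graph needed: call the node on a hand-built state
print(flag_blocked_urls(UrlState(urls=["http://evil.example", "https://ok.example"])))
# {'threat_level': 'dangerous'}
print(flag_blocked_urls(UrlState(urls=["https://ok.example"])))
# {'threat_level': 'safe'}
```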

### TODO

Open **`email_classifier/nodes.py`**. Node 1 (`check_urls`) is already implemented -- read it to understand the pattern. Then implement:
- `analyze_content` -- analyze the email content and set the threat level
- `generate_response` -- return the appropriate message string

Then run the verification cells below.

In [None]:
from email_classifier.checks import check_analyze_content
check_analyze_content()

In [None]:
from email_classifier.checks import check_generate_response
check_generate_response()

---
## Part 3 -- Graph Declaration

### Concept

Now that we have a state schema and node functions, we connect them into a **graph**.

```
check_urls --?--> analyze_content --> generate_response --> END
             |                              ^
             +-- (malicious URLs) ----------+
```

The `?` represents a **conditional edge**: after `check_urls`, if malicious URLs were found, we skip straight to `generate_response` (since the verdict is already `dangerous`). Otherwise we continue through `analyze_content`.

Key API:
```python
from langgraph.graph import StateGraph, START, END

workflow = StateGraph(MyState)
workflow.add_node("name", my_function)
workflow.add_edge(START, "first_node")   # set the entry point
workflow.add_edge("a", "b")              # a always goes to b
workflow.add_conditional_edges(
    "source_node",
    routing_function,                     # returns a key of the dict below
    {"option1": "node1", "option2": "node2"}
)
graph = workflow.compile()
```

### TODO

Open **`email_classifier/graph.py`**. Most of the graph is already wired up. Fill in the two TODOs:
1. Add the `"analyze_content"` node (follow the pattern of the other `add_node` calls)
2. Add a normal edge from `"analyze_content"` to `"generate_response"`

Then run the verification cells below.

In [None]:
from email_classifier.checks import check_graph_build
graph = check_graph_build()

In [None]:
from email_classifier.checks import check_graph_results
check_graph_results(graph)

In [None]:
# ===== Bonus: Visualize the graph =====
# This prints a Mermaid diagram of your graph.
# You can paste it into https://mermaid.live to see it rendered.

print(graph.get_graph().draw_mermaid())

---
## Part 4 -- Human-in-the-Loop

### Concept

In real-world systems, some decisions should not be fully automated. LangGraph supports **human-in-the-loop** (HITL) patterns where the graph can **pause** execution, let a human inspect and modify the state, and then **resume**.

This is done with:
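- A **checkpointer** (e.g. `InMemorySaver`) that saves a snapshot of the graph state after each step, keyed by `thread_id`, so that a paused run can be resumed later.
- **`interrupt_before`** (or `interrupt_after`): a list of node names, passed to `compile()`, where the graph should pause.
- **`graph.invoke()`** to start or resume execution (with a `thread_id` in the config).
- After the interrupt, you can **update the state** with `graph.update_state()` and then call `invoke` again with `None` as the input to resume from the saved checkpoint.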

```python
from langgraph.checkpoint.memory import InMemorySaver

checkpointer = InMemorySaver()
graph = workflow.compile(
    checkpointer=checkpointer,
    interrupt_before=["human_review"]   # pause BEFORE this node
)

# Run until interrupt
config = {"configurable": {"thread_id": "my-thread"}}
state = graph.invoke(input_data, config)
# --> graph pauses before human_review

# Human inspects state, optionally updates it
graph.update_state(config, {"threat_level": "safe"})

# Resume
final_state = graph.invoke(None, config)
```

### TODO

Open **`email_classifier/hitl.py`** and implement:
1. `route_after_analysis` -- routing function that sends suspicious emails to human review
2. `build_hitl_graph` -- graph builder with interrupt support

Then run the verification cells below.

In [None]:
from email_classifier.checks import check_hitl_build
hitl_graph = check_hitl_build()

In [None]:
from email_classifier.checks import check_hitl_safe
check_hitl_safe(hitl_graph)

In [None]:
from email_classifier.checks import check_hitl_interrupt
config = check_hitl_interrupt(hitl_graph)

In [None]:
# The graph is paused. You decide what happens next.
decision = input("What is your verdict? (safe / suspicious / dangerous): ").strip().lower()
while decision not in ("safe", "suspicious", "dangerous"):
    decision = input("Please enter one of: safe, suspicious, dangerous: ").strip().lower()

hitl_graph.update_state(config, {"threat_level": decision})
final_state = hitl_graph.invoke(None, config)

print(f"\nYou chose: {decision}")
print(f"Final response: {final_state['response']}")
print("\n=== Human-in-the-Loop PASSED ===")

In [None]:
# ===== Bonus: Visualize the HITL graph =====
print(hitl_graph.get_graph().draw_mermaid())

---
## Summary

That's it! Here's what we covered:

| Concept | What you saw |
|---------|-------------|
| **State Schema** | Define a Pydantic `BaseModel` that holds all data flowing through the graph |
| **Function Nodes** | Write functions that read state and return partial updates |
| **Graph Declaration** | Connect nodes with edges (linear and conditional) and compile |
| **Human-in-the-Loop** | Use `interrupt_before` + checkpointer to pause, inspect, and resume |

### Going further

- **Subgraphs**: nest one graph inside another
- **Reducers**: use `Annotated[list, operator.add]` to accumulate values instead of overwriting
- **Persistence**: replace `InMemorySaver` with a database-backed checkpointer for production
- **Streaming**: use `graph.stream()` to see node-by-node output in real time
- **LangGraph docs**: https://langchain-ai.github.io/langgraph/