# Options to format the State Schema: TypedDict, Dataclass, and Pydantic

## Setup

#### After you download the code from the GitHub repository to your computer
In terminal:
* cd project_name
* pyenv local 3.11.4
* poetry install
* poetry shell

#### To open the notebook with Jupyter Notebooks
In terminal:
* jupyter lab

Go to the folder of notebooks and open the right notebook.

#### To see the code in Visual Studio Code or your editor of choice
* Open Visual Studio Code or your editor of choice.
* Open the project folder.
* Open the 009-schema-with-pydantic.py file.

## Create your .env file
* In the GitHub repo we have included a file named .env.example.
* Rename that file to .env and add your confidential API keys there. Remember to include:
* OPENAI_API_KEY=your_openai_api_key
* LANGCHAIN_TRACING_V2=true
* LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
* LANGCHAIN_API_KEY=your_langchain_api_key
* LANGCHAIN_PROJECT=your_project_name

## Track operations
From now on, we can track the operations **and the cost** of this project from LangSmith:
* [smith.langchain.com](https://smith.langchain.com)

## Connect with the .env file located in the same directory of this notebook

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [1]:
#pip install python-dotenv

In [2]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]
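
Note that `os.environ["OPENAI_API_KEY"]` raises a `KeyError` when the variable is missing. As a minimal sketch, a small helper can report every missing variable at once. The helper name `missing_env_vars` and the choice of which names count as required are assumptions made for illustration; they are not part of the repository.

```python
import os

# Names taken from the .env list above; treating these two as required is an assumption.
REQUIRED_VARS = ["OPENAI_API_KEY", "LANGCHAIN_API_KEY"]

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Calling it right after `load_dotenv(...)` gives a readable message instead of a bare `KeyError`.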

#### Install LangChain

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [3]:
#!pip install langchain

## Connect with an LLM

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [4]:
#!pip install langchain-openai

In [5]:
from langchain_openai import ChatOpenAI

chatModel35 = ChatOpenAI(model="gpt-3.5-turbo-0125")
chatModel4o = ChatOpenAI(model="gpt-4o")

## State and State Schema in LangGraph
* We use a schema to define the format of the state data. The schema represents the structure and types of data that the app (aka graph) will use.
* All nodes are expected to communicate with that schema.
* LangGraph offers flexibility in how you define your state schema, accommodating various Python types and validation approaches.
* Two simple ways of defining the state schema are:
    * TypedDict.
    * Dataclass.

#### Examples of State Schema with TypedDict
* TypedDict allows you to specify keys and their corresponding value types, but they are not enforced at runtime.

In [6]:
from typing_extensions import TypedDict

class TypedDictState(TypedDict):
    foo: str
    bar: str
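
To see what "not enforced at runtime" means in practice, here is a minimal sketch. It imports `TypedDict` from the standard `typing` module (rather than `typing_extensions`) only to stay dependency-free; the wrong values are deliberate.

```python
from typing import TypedDict

class TypedDictState(TypedDict):
    foo: str
    bar: str

# A static checker such as mypy would flag these values,
# but Python itself accepts them without complaint.
state: TypedDictState = {"foo": 123, "bar": None}  # type: ignore

print(type(state) is dict)  # True: at runtime the state is a plain dict
```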

* For more specific value constraints, you can use things like the Literal type hint. In the following example, mood can only be either "happy" or "sad".

In [7]:
from typing import Literal

class TypedDictState(TypedDict):
    name: str
    mood: Literal["happy","sad"]
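
The `Literal` hint is also only a hint at runtime. The following sketch shows that an invalid mood is accepted, and how the allowed values could be read back from the annotation if you wanted to check them by hand. The helper `allowed_moods` is hypothetical.

```python
from typing import Literal, TypedDict, get_args, get_type_hints

class TypedDictState(TypedDict):
    name: str
    mood: Literal["happy", "sad"]

def allowed_moods():
    """Read the permitted values back from the Literal annotation (hypothetical helper)."""
    return get_args(get_type_hints(TypedDictState)["mood"])

# No runtime error, even though "driven" is not a permitted mood.
state: TypedDictState = {"name": "Julio", "mood": "driven"}  # type: ignore

print(allowed_moods())                   # ('happy', 'sad')
print(state["mood"] in allowed_moods())  # False
```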

#### Examples of State Schema with Dataclass

In [8]:
from dataclasses import dataclass

@dataclass
class DataclassState:
    name: str
    mood: Literal["happy","sad"]

#### TypedDict vs. Dataclass

The main difference between TypedDict and Dataclass lies in **what kind of structure they create** and **how they're used**. Here's a simple explanation of each:

1. **`TypedDict` (`TypedDictState`)**
- **What it is**: A `TypedDict` is a way to define the expected structure of a dictionary in Python.
- **Purpose**: It's mostly for **type checking**, not actual runtime behavior. It helps to ensure that the dictionary has the correct keys and value types.
- **Behavior**: At runtime, it's still just a dictionary. You can add or remove keys dynamically, even if that breaks the type.

**Example Usage:**
```python
state: TypedDictState = {"name": "Alice", "mood": "happy"}  # Works fine
state["age"] = 25  # This won't raise an error at runtime but will fail type checking.
```

2. **Dataclass (`DataclassState`)**
- **What it is**: A `dataclass` is a blueprint for creating Python objects (classes) with predefined fields.
- **Purpose**: It's a full-featured Python class with automatic support for common operations like creating objects, comparisons, and string representation.
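- **Behavior**: Unlike a dictionary, it declares its fields up front and you access them as attributes. Like `TypedDict`, it does not check value types at runtime, and a plain dataclass still lets you attach undeclared attributes (on Python 3.10+, `@dataclass(slots=True)` blocks that).

**Example Usage:**
```python
state = DataclassState(name="Alice", mood="happy")  # Works fine
state.name = "Bob"  # Allowed, as `name` is a defined attribute
state.age = 25  # No runtime error on a plain dataclass, but it fails type checking because `age` isn't defined in the class.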
```

**Key Differences:**
| Feature                    | `TypedDictState`                  | `DataclassState`                |
|----------------------------|-----------------------------------|---------------------------------|
| **Type**                  | Dictionary                        | Python class (object)          |
| **Used for**              | Type-checking dictionary structure | Defining structured objects    |
| **Dynamic attributes**    | Allowed at runtime                | Allowed at runtime unless `slots=True` |
| **Fixed structure**       | No, keys can be added/removed     | Fields are declared up front, but not enforced at runtime |
| **Extra features**        | None                              | Supports methods, defaults, comparisons |

---

**Which one to use?**
- Use `TypedDict` when you work with dictionaries and want static type checking.
- Use `dataclass` when you need more structured, reusable, and feature-rich objects.

* The previous options are not solid enough for production-level apps because they do not validate data at runtime, so invalid values can pass through undetected. Because of that, the most frequent way to define state schema in professional LangGraph apps is:
    * Pydantic.

#### Examples of State Schema with Pydantic

In [9]:
from pydantic import BaseModel, field_validator, ValidationError

class PydanticState(BaseModel):
    name: str
    mood: str # "happy" or "sad" 

    @field_validator('mood')
    @classmethod
    def validate_mood(cls, value):
        # Ensure the mood is either "happy" or "sad"
        if value not in ["happy", "sad"]:
            raise ValueError("Each mood must be either 'happy' or 'sad'")
        return value

try:
    state = PydanticState(name="John Doe", mood="mad")
except ValidationError as e:
    print("Validation Error:", e)

Validation Error: 1 validation error for PydanticState
mood
  Value error, Each mood must be either 'happy' or 'sad' [type=value_error, input_value='mad', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error
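
As an alternative sketch, assuming Pydantic v2 (the version shown in the error URL above), the same constraint can be expressed with a `Literal` type hint, which Pydantic validates without a custom validator. The class name `LiteralMoodState` is hypothetical.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

class LiteralMoodState(BaseModel):  # hypothetical name, for illustration only
    name: str
    mood: Literal["happy", "sad"]  # Pydantic checks Literal values at runtime

try:
    LiteralMoodState(name="John Doe", mood="mad")
except ValidationError as e:
    # Each error entry records which field failed
    print("Rejected field:", e.errors()[0]["loc"])
```

The custom `@field_validator` remains the better fit when the rule is more than a fixed set of values or when you want your own error message.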


#### Pydantic vs. TypedDict and Dataclass
Pydantic introduces **data validation at runtime**, which neither `TypedDict` nor `dataclass` provides by itself. In simple terms:

**Key Features of `PydanticState`:**
1. **Based on `BaseModel`**:
   - `PydanticState` is not a plain dictionary or a simple class. It’s a **model** that validates its data when an object is created.
   - If the data is invalid, it raises a **`ValidationError`** immediately, ensuring the data is clean and correct.

2. **Runtime Validation**:
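   - In the example, the `@field_validator` ensures that `mood` can only be `"happy"` or `"sad"`. If you try to create a `PydanticState` with any other `mood`, it raises an error at runtime. By default the validator runs when the object is created; later assignments are only re-validated if the model sets `validate_assignment=True`.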

3. **Runtime Type Enforcement**:
   - While `TypedDict` and `dataclass` rely on static type checkers, `Pydantic` actually checks types and constraints **when the program is running**. By default it also converts compatible input, for example the string `"1"` to an `int`.

4. **Better Error Handling**:
   - If invalid data is provided, `Pydantic` raises a detailed error, which is very helpful for debugging or user feedback.
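
The validator in the example runs when the model is created. As a sketch, assuming Pydantic v2, the same check can also be applied to later assignments by enabling `validate_assignment` in the model config. The class name `AssignCheckedState` is hypothetical.

```python
from pydantic import BaseModel, ConfigDict, ValidationError, field_validator

class AssignCheckedState(BaseModel):  # hypothetical name, for illustration only
    # Re-run validators whenever a field is assigned after creation
    model_config = ConfigDict(validate_assignment=True)

    name: str
    mood: str

    @field_validator("mood")
    @classmethod
    def validate_mood(cls, value):
        if value not in ["happy", "sad"]:
            raise ValueError("Each mood must be either 'happy' or 'sad'")
        return value

state = AssignCheckedState(name="Alice", mood="happy")
state.mood = "sad"  # valid, so the assignment goes through

try:
    state.mood = "angry"  # invalid, so the validator rejects it
except ValidationError as e:
    print("Rejected assignment:", e.errors()[0]["msg"])
```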

**Comparison with Previous Examples:**

| Feature                     | `TypedDictState`            | `DataclassState`          | `PydanticState`           |
|-----------------------------|-----------------------------|---------------------------|---------------------------|
| **Type Enforcement**        | Checked only by tools like `mypy` | Checked by tools like `mypy` | Enforced at runtime       |
| **Validation Rules**         | None                       | None                      | Fully supported (customizable) |
| **Dynamic Attributes**       | Allowed (runtime)          | Allowed unless `slots=True` | Not allowed               |
| **Error Handling**           | None at runtime            | None at runtime           | Raises `ValidationError`  |
| **Extra Features**           | None                       | Comparisons, methods, etc. | Validation, serialization, etc. |
| **Library Dependency**       | None                       | None                      | Requires `pydantic`       |


**Example Usage:**
- **Good Input**:
    ```python
    state = PydanticState(name="Alice", mood="happy")  # Works fine
    print(state)
    ```
    Output:
    ```
    name='Alice' mood='happy'
    ```

- **Bad Input**:
    ```python
    state = PydanticState(name="Bob", mood="angry")  # Raises ValidationError
    ```
    Output:
    ```
    ValidationError: 1 validation error for PydanticState
    mood
      Value error, Each mood must be either 'happy' or 'sad' [type=value_error, input_value='angry', input_type=str]
    ```

**When to Use Each:**
- **`TypedDict`**: When you just want type checking for dictionaries without runtime enforcement.
- **`dataclass`**: When you need structured, reusable objects but don’t need runtime validation.
- **`Pydantic`**: When you need robust runtime validation, especially for APIs, form inputs, or external data sources.

## See how TypedDict and Dataclass do not detect validation errors

#### Example with TypedDict

In [24]:
import random
from IPython.display import Image, display
from langgraph.graph import StateGraph, START, END

def node_1(state):
    print("---Node 1---")
    return {"name": state['name'] + " is ... "}

def node_2(state):
    print("---Node 2---")
    return {"mood": "happy"}

def node_3(state):
    print("---Node 3---")
    return {"mood": "sad"}

def decide_mood(state) -> Literal["node_2", "node_3"]:
    # Here, let's just do a 50 / 50 split between nodes 2 and 3
    if random.random() < 0.5:
        # 50% of the time, we return Node 2
        return "node_2"

    # 50% of the time, we return Node 3
    return "node_3"

# Build graph
builder = StateGraph(TypedDictState)
builder.add_node("node_1", node_1)
builder.add_node("node_2", node_2)
builder.add_node("node_3", node_3)

# Logic
builder.add_edge(START, "node_1")
builder.add_conditional_edges("node_1", decide_mood)
builder.add_edge("node_2", END)
builder.add_edge("node_3", END)

# Compile
graph = builder.compile()

In [11]:
graph.invoke({"name":"Julio", "mood": "driven"})

---Node 1---
---Node 3---


{'name': 'Julio is ... ', 'mood': 'sad'}

* As you can see, we introduced a mood that is not valid but we did not get any validation error.

#### Example with Dataclass

In [23]:
def node_1(state):
    print("---Node 1---")
    return {"name": state.name + " is ... "}

# Build graph
builder = StateGraph(DataclassState)
builder.add_node("node_1", node_1)
builder.add_node("node_2", node_2)
builder.add_node("node_3", node_3)

# Logic
builder.add_edge(START, "node_1")
builder.add_conditional_edges("node_1", decide_mood)
builder.add_edge("node_2", END)
builder.add_edge("node_3", END)

# Compile
graph = builder.compile()

In [13]:
graph.invoke(DataclassState(name="Julio",mood="driven"))

---Node 1---
---Node 3---


{'name': 'Julio is ... ', 'mood': 'sad'}
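
For comparison, a dataclass can be made to reject the invalid mood, but only by writing the check yourself. The following is a minimal sketch using `__post_init__`; the class name `CheckedDataclassState` and the `Mood` alias are hypothetical and not part of the notebook.

```python
from dataclasses import dataclass
from typing import Literal, get_args

Mood = Literal["happy", "sad"]  # hypothetical alias, for illustration only

@dataclass
class CheckedDataclassState:  # hypothetical name, for illustration only
    name: str
    mood: Mood

    def __post_init__(self):
        # Manual runtime check: dataclasses do not validate type hints on their own
        if self.mood not in get_args(Mood):
            raise ValueError(f"mood must be one of {get_args(Mood)}, got {self.mood!r}")

try:
    CheckedDataclassState(name="Julio", mood="driven")
except ValueError as exc:
    print("ValueError:", exc)
```

This hand-written check covers one field at creation time only, which is the kind of work Pydantic takes over.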

## See how Pydantic detects validation errors

In [25]:
# Build graph
builder = StateGraph(PydanticState)
builder.add_node("node_1", node_1)
builder.add_node("node_2", node_2)
builder.add_node("node_3", node_3)

# Logic
builder.add_edge(START, "node_1")
builder.add_conditional_edges("node_1", decide_mood)
builder.add_edge("node_2", END)
builder.add_edge("node_3", END)

# Compile
graph = builder.compile()

In [26]:
graph.invoke(PydanticState(name="Julio",mood="driven"))

ValidationError: 1 validation error for PydanticState
mood
  Value error, Each mood must be either 'happy' or 'sad' [type=value_error, input_value='driven', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error

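* As you can see, we introduced a mood that is not valid, but **this time we did get a validation error**. Note that the error is raised when `PydanticState(...)` is constructed, before the graph starts running.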
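
In an application you would usually catch this error rather than let it stop the program. Here is a minimal sketch, assuming Pydantic v2; the helper `build_state` is hypothetical, and the graph call is left as a comment so the block runs without LangGraph.

```python
from pydantic import BaseModel, ValidationError, field_validator

class PydanticState(BaseModel):
    name: str
    mood: str  # "happy" or "sad"

    @field_validator("mood")
    @classmethod
    def validate_mood(cls, value):
        if value not in ["happy", "sad"]:
            raise ValueError("Each mood must be either 'happy' or 'sad'")
        return value

def build_state(name, mood):
    """Return a validated state, or None if the input is rejected (hypothetical helper)."""
    try:
        return PydanticState(name=name, mood=mood)
    except ValidationError as e:
        # e.errors() is a list of dicts describing each failed field
        for err in e.errors():
            print("Invalid field", err["loc"], "->", err["msg"])
        return None

state = build_state("Julio", "driven")
if state is not None:
    pass  # graph.invoke(state) would go here
```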

## How to execute the code from Visual Studio Code
* In Visual Studio Code, see the file 009-schema-with-pydantic.py
* In terminal, make sure you are in the directory of the file and run:
    * python 009-schema-with-pydantic.py