# Run-Time Data Validation Frameworks in Python: External-Facing Frameworks



## Pandera: Data Validation for DataFrames

Pandera brings schema validation to pandas, Polars, and other DataFrame libraries.
It lets you define expectations for columns—types, ranges, nullability, custom checks—and validates entire datasets at runtime. It’s essentially *unit tests for data pipelines*.



**Example**: Validate a Simple DataFrame Schema


In [6]:
import pandera.pandas as pa
import pandas as pd

class PersonSchema(pa.DataFrameModel):
    name: pa.typing.Series[str] = pa.Field(nullable=False)
    age: pa.typing.Series[int] = pa.Field(ge=0, le=120)

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [30, 41, -5],   # <- invalid, age < 0
})

PersonSchema.validate(df)



SchemaError: Column 'age' failed element-wise validator number 0: greater_than_or_equal_to(0) failure cases: -5


**Exercise**: Create a schema for a DataFrame of rectangles, and validate the DataFrame below:

* columns: `length: float`, `width: float`
* both must be **positive**
* add a custom check: `area = length * width` must be **less than 100**

In [7]:
df = pd.DataFrame({
    "length": [3.0, 20.0],
    "width": [4.0, 1.0]
})



## Pydantic-AI: Validated Inputs + LLM Reasoning

Pydantic-AI is an agent framework built on Pydantic: it wraps an LLM in an “agent” whose inputs (dependencies) and outputs can be typed and validated with Pydantic models.
It puts structure around prompts, validated parameters, and model outputs, which helps when you need reproducible LLM workflows instead of loose free-form strings.



**Example**: A Validated Agent Input Model

In [None]:
from getpass import getpass
import os

# Needs an OpenAI API Key: https://platform.openai.com/login
if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass()


In [None]:
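from pydantic_ai import Agent, RunContext
from pydantic import BaseModel, Field

class Rectangle(BaseModel):
    length: float = Field(gt=0)
    width: float = Field(gt=0)

# Model names carry a provider prefix; deps_type declares the validated input type.
agent = Agent("openai:gpt-4o-mini", deps_type=Rectangle)

@agent.system_prompt
def describe_rectangle(ctx: RunContext[Rectangle]) -> str:
    # Pass the validated dependency to the model; deps are not sent automatically.
    return f"The rectangle is {ctx.deps.model_dump_json()}."

rect = Rectangle(length=3, width=4)   # validated here, before any LLM call

# Top-level await works in a notebook cell.
result = await agent.run("Compute area.", deps=rect)

print(result.output)   # LLM output
print(rect)            # validated input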
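
The validation itself happens in plain Pydantic, before the agent is ever called. The sketch below shows that step in isolation, with no API key needed; `build_deps` is a hypothetical helper introduced here only for illustration.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError

class Rectangle(BaseModel):
    length: float = Field(gt=0)
    width: float = Field(gt=0)

def build_deps(length: float, width: float) -> Optional[Rectangle]:
    """Hypothetical helper: return validated deps, or None if rejected."""
    try:
        return Rectangle(length=length, width=width)
    except ValidationError as exc:
        # exc.errors() lists each failing field and the violated constraint
        for err in exc.errors():
            print(err["loc"], err["msg"])
        return None

ok = build_deps(3, 4)     # valid: a Rectangle instance
bad = build_deps(-1, 4)   # invalid: length violates gt=0, so None
```

Because invalid input never produces a `Rectangle`, it can never reach `agent.run(...)` as `deps`.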


**Exercise**: Add a `Person` Model and Ask the LLM.
* Create an agent
* Ask the model: *"How old will this person be in five years?"*
* Run it with a `Person(name="Emma", age=3)` object
* Confirm Pydantic blocks invalid values like `age=-10`

## msgspec: Fast, Typed, and Strict Structured Data

`msgspec` provides ultra-fast, typed data structures with built-in validation when encoding or decoding.
Think of it as **dataclasses + validation + serialization**, all optimized in C.

It’s especially good for:

* JSON / MessagePack APIs
* high-performance pipelines
* applications needing strict schemas but minimal overhead



**Example**: Define a Strict Typed Structure



import msgspec

class Person(msgspec.Struct):
    name: str
    age: int

    def __post_init__(self):
        if self.age < 0:
            raise ValueError("age must be non-negative")

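data = b'{"name": "Alice", "age": -5}'  # <- valid JSON, but invalid in your domain

# __post_init__ runs during decoding, so this raises msgspec.ValidationError
person = msgspec.json.decode(data, type=Person)
print(person)  # not reached for this input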



**Exercise**: Write JSON validation code to reject this invalid Rectangle:

## Typer: Validation and Structure for Command-Line Interfaces

Typer is a modern library for building command-line interfaces using Python type hints.
It automatically parses arguments, enforces basic validation (types, required/optional values), and generates helpful error messages and documentation.
While Typer isn’t a “data validation” library in the traditional sense, it *does* validate user input at the command boundary — one of the most critical validation layers in real applications.

Typer will build a CLI and:

* convert values
* reject invalid types
* show a nice help message if you pass invalid flags

**Example**: A CLI Command With Typed Arguments

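Put this into a file called `app.py` and run it. Because the app defines only one command, Typer runs it directly and no subcommand name is needed:

```bash
python app.py --length 3 --width 4
```


```python
import typer

app = typer.Typer()

@app.command()
def rectangle_area(
    # typer.Option(...) makes these required --length / --width flags;
    # plain parameters without defaults would be positional arguments.
    length: float = typer.Option(..., help="Length of the rectangle"),
    width: float = typer.Option(..., help="Width of the rectangle"),
):
    """
    Compute the area of a rectangle.
    """
    # Typer has already converted both values to float; check the domain rule here.
    if length <= 0 or width <= 0:
        typer.echo("Both length and width must be positive!")
        raise typer.Exit(code=1)

    area = length * width
    typer.echo(f"Area: {area}")

if __name__ == "__main__":
    app()
```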
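
To see the conversion and rejection behaviour without a shell, one option is Typer's test runner. The sketch below defines the command with `typer.Option` so that values are passed as flags, then invokes it three times; with a single command registered, no subcommand name is passed. The specific inputs are illustrative.

```python
import typer
from typer.testing import CliRunner

app = typer.Typer()

@app.command()
def rectangle_area(
    length: float = typer.Option(...),
    width: float = typer.Option(...),
):
    """Compute the area of a rectangle."""
    if length <= 0 or width <= 0:
        typer.echo("Both length and width must be positive!")
        raise typer.Exit(code=1)
    typer.echo(f"Area: {length * width}")

runner = CliRunner()

ok = runner.invoke(app, ["--length", "3", "--width", "4"])            # converted to floats
bad_type = runner.invoke(app, ["--length", "three", "--width", "4"])  # rejected by Typer
bad_value = runner.invoke(app, ["--length", "0", "--width", "4"])     # rejected by our check

print(ok.exit_code, ok.output.strip())
print(bad_type.exit_code)    # usage errors exit with code 2
print(bad_value.exit_code)   # our own typer.Exit(code=1)
```

The two failure modes are distinct: a value that cannot be parsed as `float` is stopped by Typer before the function runs, while a parsable but non-positive value is stopped by the check inside the function.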




**Exercise**: Build a Validated `Person` CLI Tool

Add a `create-person` command to the app. Once an app has more than one command, each one is selected by name:

```bash
python app.py create-person --name "Emma" --age 3
```

Requirements:

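1. The command should declare its parameters as typed, required options (plain `name: str` parameters would become positional arguments rather than `--name` flags):

   ```python
   name: str = typer.Option(...)
   age: int = typer.Option(...)
   ```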
2. Validate inside the function that:

   * name is not empty
   * age ≥ 0
3. On success, print:
   `"Person(name='Emma', age=3) created!"`
4. On failure, print an error message and exit with `typer.Exit(code=1)`.

