# Validating DataFrames with `pandera`

We have a DataFrame containing data we would like to validate:

In [1]:
import pandas as pd

df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"],
})

df

|   | column1 | column2 | column3 |
|---|---------|---------|---------|
| 0 | 1       | -1.3    | value_1 |
| 1 | 4       | -1.4    | value_2 |
| 2 | 0       | -2.9    | value_3 |
| 3 | 10      | -10.1   | value_2 |
| 4 | 9       | -20.4   | value_1 |


Validation applies one rule to each column:

- **column1**: integers less than or equal to 10.
- **column2**: floats strictly between -20 and 0.
- **column3**: strings starting with "value_" that split into exactly two parts on the underscore.
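
Before turning to a library, it can help to see the three rules written directly in pandas. The following is only a minimal sketch: the names `rule1`, `rule2`, `rule3` and `report` are illustrative, and the bounds on column2 are assumed to be exclusive.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"],
})

# Rule 1: values less than or equal to 10.
rule1 = df["column1"] <= 10

# Rule 2: values strictly between -20 and 0 (exclusive bounds assumed).
rule2 = (df["column2"] > -20) & (df["column2"] < 0)

# Rule 3: starts with "value_" and splits into exactly two parts on "_".
parts = df["column3"].str.split("_")
rule3 = df["column3"].str.startswith("value_") & (parts.str.len() == 2)

# One boolean column per rule; .all() summarises whether each rule holds.
report = pd.DataFrame({"column1": rule1, "column2": rule2, "column3": rule3})
print(report.all())
```

This works, but the rules are scattered across ad hoc expressions and the dtype requirements are not checked at all, which is the gap a schema library aims to fill.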

The `pandera` library has two APIs: the "quick" API and the object-oriented API. We'll apply the same rules with both.

## "Quick" API

This schema combines built-in checks with one custom check on column3. A custom check function (or `lambda`, as here) must take a series and return either a single boolean or a boolean series.

In [2]:
import pandera.pandas as pa

schema = pa.DataFrameSchema({
    "column1": pa.Column(int, checks=pa.Check.le(10)),
    "column2": pa.Column(float, checks=[
        pa.Check.lt(0),
        pa.Check.gt(-20),   # SHOULD FAIL
    ]),
    "column3": pa.Column(str, checks=[
        pa.Check.str_startswith("value_"),
        pa.Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
    ]),
})

schema(df)

SchemaError: Column 'column2' failed element-wise validator number 1: greater_than(-20) failure cases: -20.4

## Object-oriented API

In [3]:
from pandera.typing import Series

class Schema(pa.DataFrameModel):

    column1: Series[int] = pa.Field(le=10)
    column2: Series[float] = pa.Field(lt=0, gt=-20)   # SHOULD FAIL
    column3: Series[str] = pa.Field(str_startswith="value_")

    @pa.check("column3")
    def column_3_check(cls, series: Series[str]) -> bool:
        """Check that column3 values split on '_' into exactly two parts."""
        return series.str.split("_", expand=True).shape[1] == 2

Schema.validate(df)    

SchemaError: Column 'column2' failed element-wise validator number 0: greater_than(-20) failure cases: -20.4

<hr />

© 2025 Matt Hall, licensed CC BY-SA