# ✅ Quality Validation

Check data against natural-language rules using `loclean.validate_quality`.

**Use case:** Instead of writing complex regex or SQL constraints, describe your data quality rules in plain English and let the LLM evaluate compliance.
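For contrast, here is a minimal sketch of the conventional approach for just one of the rules used below (the email rule). It uses plain Python and the standard `re` module. The pattern is an illustrative assumption, not a complete RFC 5322 validator, and `is_valid_email` is a hypothetical helper rather than part of `loclean`:

```python
import re

# Illustrative pattern: a non-empty part without '@', exactly one '@',
# then a domain containing at least one dot. Real email validation is more involved.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_valid_email(value: str) -> bool:
    """Return True if the whole string matches the simplified email pattern."""
    return EMAIL_PATTERN.fullmatch(value) is not None


for candidate in ["alice@example.com", "invalid-email", "eve@", "ivychen.com"]:
    print(f"{candidate!r}: {is_valid_email(candidate)}")
```

Each additional rule needs its own hand-written check like this one. That per-rule overhead is what the plain-English approach is meant to avoid.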

In [None]:
import polars as pl

import loclean

## Create a dataset with quality issues

The data intentionally includes empty and null names, malformed emails, out-of-range ages, and non-positive salaries:

In [None]:
df = pl.DataFrame(
    {
        "name": [
            "Alice Johnson",
            "Bob Smith",
            "",
            "Diana Prince",
            "Eve Adams",
            None,
            "Grace Hopper",
            "Hank Hill",
            "Ivy Chen",
            "Jack Black",
        ],
        "email": [
            "alice@example.com",
            "bob@corp.io",
            "invalid-email",
            "diana@example.com",
            "eve@",
            "frank@example.com",
            "grace@navy.mil",
            "hank@propane.com",
            "ivychen.com",
            "jack@rock.com",
        ],
        "age": [28, 35, -5, 42, 150, 31, 85, 47, 23, 0],
        "salary": [
            65_000,
            82_000,
            45_000,
            0,
            95_000,
            71_000,
            120_000,
            58_000,
            -1_000,
            53_000,
        ],
    }
)

df

## Define quality rules in plain English

In [None]:
rules = [
    "Name must not be empty or null",
    "Email must contain exactly one '@' followed by a domain with a dot",
    "Age must be between 1 and 120",
    "Salary must be a positive number greater than zero",
]

for i, rule in enumerate(rules, 1):
    print(f"  {i}. {rule}")
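Because these rules are simple, a deterministic baseline can show which rows *should* be flagged. This is useful for sanity-checking the LLM's verdicts. The sketch below restates the dataset as plain Python tuples (stdlib only, no `loclean` or `polars` calls) and implements each rule as literally as possible. The `check_*` and `violations` helpers are hypothetical names for illustration:

```python
# Rows mirror the DataFrame above as (name, email, age, salary) tuples.
ROWS = [
    ("Alice Johnson", "alice@example.com", 28, 65_000),
    ("Bob Smith", "bob@corp.io", 35, 82_000),
    ("", "invalid-email", -5, 45_000),
    ("Diana Prince", "diana@example.com", 42, 0),
    ("Eve Adams", "eve@", 150, 95_000),
    (None, "frank@example.com", 31, 71_000),
    ("Grace Hopper", "grace@navy.mil", 85, 120_000),
    ("Hank Hill", "hank@propane.com", 47, 58_000),
    ("Ivy Chen", "ivychen.com", 23, -1_000),
    ("Jack Black", "jack@rock.com", 0, 53_000),
]


def check_name(name):
    # Rule 1: name must not be empty or null.
    return name is not None and name.strip() != ""


def check_email(email):
    # Rule 2: exactly one '@', followed by a domain that contains a dot.
    if email.count("@") != 1:
        return False
    _, domain = email.split("@")
    return "." in domain


def check_age(age):
    # Rule 3: age must be between 1 and 120, inclusive.
    return 1 <= age <= 120


def check_salary(salary):
    # Rule 4: salary must be strictly greater than zero.
    return salary > 0


def violations(row):
    """Return the columns whose rule the given row breaks, in rule order."""
    name, email, age, salary = row
    results = [
        ("name", check_name(name)),
        ("email", check_email(email)),
        ("age", check_age(age)),
        ("salary", check_salary(salary)),
    ]
    return [column for column, passed in results if not passed]


for index, row in enumerate(ROWS):
    failed = violations(row)
    if failed:
        print(index, failed)
```

Under these literal readings, rows 2, 3, 4, 5, 8, and 9 (0-based) break at least one rule. That leaves four of the ten rows fully compliant, which gives a reference point when reading the report.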

## Run validation

In [None]:
report = loclean.validate_quality(df, rules, batch_size=10, sample_size=10)
report
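The call above passes `batch_size=10` and `sample_size=10`, but this notebook does not spell out what they control. The sketch below rests on one plausible reading, which is purely an assumption: up to `sample_size` rows are checked, and they are sent to the model `batch_size` rows at a time. To keep the sketch deterministic it takes the first rows; the library may sample differently. `plan_batches` is a hypothetical helper, not part of `loclean`:

```python
def plan_batches(n_rows, batch_size, sample_size):
    """Hypothetical: split the first min(n_rows, sample_size) row indices into batches.

    Assumes sample_size caps the rows checked and batch_size is the rows per model call.
    """
    if batch_size < 1 or sample_size < 0:
        raise ValueError("batch_size must be >= 1 and sample_size must be >= 0")
    checked = list(range(min(n_rows, sample_size)))
    # Consecutive slices of length batch_size; the last slice may be shorter.
    return [checked[i : i + batch_size] for i in range(0, len(checked), batch_size)]


print(len(plan_batches(10, batch_size=10, sample_size=10)))  # prints 1
print(plan_batches(10, batch_size=4, sample_size=10))  # prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Under the same assumption, this notebook's 10 rows with both parameters set to 10 would fit in a single batch. Lowering `batch_size` would trade fewer rows per prompt for more model calls.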