# Structured Extraction with Loclean

This notebook shows how to extract structured data from unstructured text. Loclean combines Pydantic models with GBNF grammars so that every extracted result complies with the schema you define.

> **📚 Full Documentation:** [Structured Extraction Guide](https://nxank4.github.io/loclean/guides/extraction/)

In [None]:
import loclean
import polars as pl
from pydantic import BaseModel
from typing import List, Optional, Union

## Basic Example

Extract structured data from unstructured text:

In [None]:
class Product(BaseModel):
    name: str
    price: int
    color: str

# Extract from text
item = loclean.extract("Selling red t-shirt for 50k", schema=Product)
print(f"Name: {item.name}")
print(f"Price: {item.price}")
print(f"Color: {item.color}")
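
To make "schema compliance" concrete, the sketch below uses plain Pydantic, with no Loclean call, to check two raw JSON strings against the same `Product` model. The JSON payloads and the `parse_product` helper are hypothetical and exist only for illustration; this is not a description of how Loclean works internally.

```python
import json

from pydantic import BaseModel, ValidationError


class Product(BaseModel):
    name: str
    price: int
    color: str


# Hypothetical raw outputs, written by hand for this sketch.
good_raw = '{"name": "t-shirt", "price": 50000, "color": "red"}'
bad_raw = '{"name": "t-shirt", "price": "50k", "color": "red"}'


def parse_product(raw: str):
    """Return a Product if the JSON matches the schema, otherwise None."""
    try:
        return Product(**json.loads(raw))
    except ValidationError:
        # "50k" cannot be parsed as an integer, so validation fails.
        return None


good = parse_product(good_raw)
bad = parse_product(bad_raw)
print(good)
print(bad)
```

The second payload is rejected because `price` is declared as `int`. A schema-compliant result is one that would pass this kind of validation.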

## Working with DataFrames

Extract structured data from DataFrame columns:

In [None]:
df = pl.DataFrame({
    "description": [
        "Selling red t-shirt for 50k",
        "Blue jeans available for 30k"
    ]
})

result = loclean.extract(df, schema=Product, target_col="description")

print("Extracted Data:")
print(result)

# Filter rows on a field of the extracted Polars struct column
filtered = result.filter(
    pl.col("description_extracted").struct.field("price") > 40000
)
print("\nProducts with price > 40k:")
print(filtered)

## Complex Schema Examples

### Nested Schemas

In [None]:
class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Person(BaseModel):
    name: str
    age: int
    email: str
    address: Address
    phone_numbers: List[str]
    notes: Optional[str] = None

text = """
John Doe, age 35, email: john@example.com
Lives at 123 Main St, New York, NY 10001
Phones: 555-1234, 555-5678
Notes: Preferred contact method is email
"""

person = loclean.extract(text, schema=Person)
print(f"Name: {person.name}")
print(f"Age: {person.age}")
print(f"Email: {person.email}")
print(f"Address: {person.address.street}, {person.address.city}, {person.address.state} {person.address.zip_code}")
print(f"Phone Numbers: {person.phone_numbers}")
print(f"Notes: {person.notes}")
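
The sketch below shows how Pydantic itself treats the nested and optional parts of this schema, independent of Loclean. The payload is a hypothetical hand-written dictionary that deliberately omits `notes`.

```python
from typing import List, Optional

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str


class Person(BaseModel):
    name: str
    age: int
    email: str
    address: Address
    phone_numbers: List[str]
    notes: Optional[str] = None


# Hypothetical payload for illustration; "notes" is left out on purpose.
payload = {
    "name": "John Doe",
    "age": 35,
    "email": "john@example.com",
    "address": {
        "street": "123 Main St",
        "city": "New York",
        "state": "NY",
        "zip_code": "10001",
    },
    "phone_numbers": ["555-1234", "555-5678"],
}

person = Person(**payload)

# The nested dict is validated and converted into an Address instance.
print(type(person.address).__name__)
# The omitted optional field falls back to its default.
print(person.notes)
```

Fields declared as `Optional[...] = None` can be absent from the source text without failing validation, while required fields such as `address` must be present and well-formed.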