# 🧬 Feature Discovery

Automatically discover feature crosses using `loclean.discover_features`.

**Use case:** Given a housing dataset, the LLM proposes mathematical transformations (e.g. `price_per_sqft = price / square_feet`) that maximise mutual information with the target variable.
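To make the scoring idea concrete, here is a minimal sketch of how the mutual information between a candidate feature and the target could be estimated with simple equal-width binning. It is an illustration only, not necessarily how `loclean` scores features. The helpers `equal_width_bins` and `binned_mutual_information`, the choice of three bins, and the candidate features are all assumptions made for this example.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column falls entirely into one bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def binned_mutual_information(x, y, n_bins=3):
    """Plug-in estimate of I(X; Y) in nats from two equal-length sequences."""
    bx = equal_width_bins(x, n_bins)
    by = equal_width_bins(y, n_bins)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        # p(i, j) * log(p(i, j) / (p(i) * p(j))), written with raw counts.
        mi += (count / n) * math.log(count * n / (px[i] * py[j]))
    return mi


# Same values as the housing dataset used in this notebook.
square_feet = [1200, 1800, 2400, 950, 3100, 1600, 2800, 1100, 2000, 1450]
bedrooms = [2, 3, 4, 1, 5, 3, 4, 2, 3, 2]
bathrooms = [1, 2, 3, 1, 3, 2, 3, 1, 2, 2]
price = [250_000, 380_000, 520_000, 180_000, 720_000,
         310_000, 480_000, 220_000, 400_000, 280_000]

# Illustrative candidates; these are not features proposed by the LLM.
candidates = {
    "square_feet": square_feet,
    "sqft_per_bedroom": [s / b for s, b in zip(square_feet, bedrooms)],
    "baths_per_bedroom": [ba / be for ba, be in zip(bathrooms, bedrooms)],
}

scores = {
    name: binned_mutual_information(values, price)
    for name, values in candidates.items()
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f} nats")
```

With only ten rows the estimate is coarse and sensitive to the number of bins, so treat the ranking as a rough guide rather than a precise measurement.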

In [None]:
import polars as pl

import loclean

## Create housing dataset

In [None]:
df = pl.DataFrame(
    {
        "square_feet": [1200, 1800, 2400, 950, 3100, 1600, 2800, 1100, 2000, 1450],
        "bedrooms": [2, 3, 4, 1, 5, 3, 4, 2, 3, 2],
        "bathrooms": [1, 2, 3, 1, 3, 2, 3, 1, 2, 2],
        "year_built": [1990, 2005, 2018, 1975, 2022, 2000, 2015, 1985, 2010, 1995],
        "lot_size_acres": [0.15, 0.25, 0.40, 0.10, 0.60, 0.20, 0.35, 0.12, 0.30, 0.18],
        "price": [
            250_000,
            380_000,
            520_000,
            180_000,
            720_000,
            310_000,
            480_000,
            220_000,
            400_000,
            280_000,
        ],
    }
)

print(f"Original columns: {df.columns}")
df

## Discover new features

The LLM analyses column types, sample values, and correlations to propose transformations:

In [None]:
result = loclean.discover_features(
    df, "price", n_features=5, max_retries=5, model="qwen2.5-coder:1.5b"
)

new_cols = [c for c in result.columns if c not in df.columns]
print(f"Discovered {len(new_cols)} new features: {new_cols}")
print(f"Shape: {df.shape} → {result.shape}")
result

## Inspect new features

In [None]:
if new_cols:
    # An expression inside an `if` block is not auto-displayed, so print it.
    print(result.select(new_cols).describe())