# 🪤 Trap Feature Pruning

Trap features are columns of random noise that masquerade as valid signals.
They inflate model complexity and hurt generalisation. `prune_traps()` uses
statistical profiling and LLM verification to detect and remove them.

In [None]:
import numpy as np
import polars as pl

import loclean

## Create dataset with hidden traps

We build a small housing dataset with two **real** features (`square_feet`,
`bedrooms`) and two **trap** columns (`noise_a`, `noise_b`): pure Gaussian
noise with no predictive value.

In [None]:
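rng = np.random.default_rng(42)  # fixed seed so the example is reproducible

n = 20
# Real features: floor area and bedroom count
sqft = rng.integers(800, 3000, size=n)
beds = rng.integers(1, 6, size=n)
# The target depends only on the real features, plus Gaussian noise
price = sqft * 150 + beds * 10_000 + rng.normal(0, 5000, size=n)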

df = pl.DataFrame(
    {
        "square_feet": sqft,
        "bedrooms": beds,
        "noise_a": rng.standard_normal(n).round(4),
        "noise_b": rng.standard_normal(n).round(4),
        "price": price.astype(int),
    }
)

print(f"Columns before: {df.columns}")
df.head()

## Prune trap features

The pruner profiles each numeric column's distribution and correlation with
the target, then asks the LLM to confirm whether flagged columns look like
injected noise.

In [None]:
pruned, summary = loclean.prune_traps(
    df,
    target_col="price",
    correlation_threshold=0.05,
)

print(f"Columns after:  {pruned.columns}")
print(f"Dropped:        {summary['dropped_columns']}")
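
To make the correlation part of the profiling step concrete, here is a minimal sketch that computes the absolute Pearson correlation between each feature and the target with plain NumPy. It is not `loclean`'s implementation, which the text above does not show; the helper `abs_corr_with_target` is a hypothetical name introduced only for this illustration, and the data is regenerated in the same way as in the cell above.

```python
import numpy as np


def abs_corr_with_target(features, target):
    """Return |Pearson r| between each feature column and the target.

    Hypothetical helper for illustration; not part of loclean.
    """
    target = np.asarray(target, dtype=float)
    return {
        name: abs(float(np.corrcoef(np.asarray(col, dtype=float), target)[0, 1]))
        for name, col in features.items()
    }


# Rebuild the same kind of data as the housing example.
rng = np.random.default_rng(42)
n = 20
sqft = rng.integers(800, 3000, size=n)
beds = rng.integers(1, 6, size=n)
price = sqft * 150 + beds * 10_000 + rng.normal(0, 5000, size=n)

features = {
    "square_feet": sqft,
    "bedrooms": beds,
    "noise_a": rng.standard_normal(n),
    "noise_b": rng.standard_normal(n),
}

scores = abs_corr_with_target(features, price)

# Print columns from most to least correlated with the target.
for name, r in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} |r| = {r:.3f}")
```

With only 20 rows, a pure-noise column can show a non-trivial |r| by chance, which is one reason a correlation check alone may not be enough to separate traps from real features.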

## Inspect verdicts

The summary includes per-column verdicts with the LLM's reasoning.

In [None]:
for v in summary.get("verdicts", []):
    status = "🪤 TRAP" if v["is_trap"] else "✅ KEEP"
    print(f"{status}  {v['column']}: {v['reason']}")
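
If you want the verdicts as two lists rather than printed lines, a small helper along these lines may be useful. This is a sketch that assumes the summary is a dict with the keys used above (`verdicts`, `column`, `is_trap`, `reason`); `split_verdicts` is a hypothetical name, and `sample_summary` is invented data for illustration, not real `prune_traps()` output.

```python
# Invented example data shaped like the fields used above; not real output.
sample_summary = {
    "dropped_columns": ["noise_a"],
    "verdicts": [
        {"column": "noise_a", "is_trap": True, "reason": "example reason"},
        {"column": "bedrooms", "is_trap": False, "reason": "example reason"},
    ],
}


def split_verdicts(summary):
    """Split verdict entries into (trap column names, kept column names)."""
    traps, kept = [], []
    for v in summary.get("verdicts", []):
        (traps if v["is_trap"] else kept).append(v["column"])
    return traps, kept


traps, kept = split_verdicts(sample_summary)
print(f"Traps: {traps}")
print(f"Kept:  {kept}")
```

Like the loop above, the helper uses `.get("verdicts", [])`, so a summary without verdicts yields two empty lists instead of raising.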