# 🔍 Missingness Recognition

Not all missing data is random. `recognize_missingness()` detects **Missing
Not At Random (MNAR)** patterns, where the *reason* for missingness
contains signal, and encodes them as boolean feature flags.

In [None]:
import polars as pl

import loclean

## Create dataset with MNAR pattern

Imagine a clinical trial where **income** is missing *because* the patient
is unemployed or retired. The missingness itself carries information that a
simple imputation would destroy.

In [None]:
df = pl.DataFrame(
    {
        "patient_id": list(range(1, 11)),
        "age": [28, 45, 62, 33, 51, 29, 70, 38, 55, 41],
        "employment": [
            "employed",
            "employed",
            "retired",
            "unemployed",
            "employed",
            "unemployed",
            "retired",
            "employed",
            "employed",
            "unemployed",
        ],
        "income": [
            55000,
            82000,
            None,
            None,
            91000,
            None,
            None,
            67000,
            73000,
            None,
        ],
        "diagnosis": [
            "healthy",
            "diabetes",
            "healthy",
            "diabetes",
            "healthy",
            "diabetes",
            "healthy",
            "healthy",
            "diabetes",
            "diabetes",
        ],
    }
)

print(f"Null counts: {df.null_count().to_dicts()[0]}")
df

## Detect MNAR patterns

The recogniser samples null vs. non-null rows, asks the LLM to explain
**why** missingness occurs, then compiles a boolean encoder if the pattern
is MNAR.

In [None]:
augmented, summary = loclean.recognize_missingness(df)

print(f"Original columns:  {df.columns}")
print(f"Augmented columns: {augmented.columns}")
augmented

## Inspect pattern summaries

The summary maps each analysed column to the LLM's explanation of the
missingness pattern.

In [None]:
for col, info in summary.items():
    print(f"Column: {col}")
    if isinstance(info, dict):
        for k, v in info.items():
            print(f"  {k}: {v}")
    else:
        print(f"  {info}")
    print()