# Subclassification Impact Estimation

This notebook demonstrates **subclassification (stratification) impact estimation** using `evaluate_impact()`.

Subclassification groups observations into strata based on covariate quantiles, computes a treatment effect within each stratum, and aggregates the per-stratum effects into a single estimate via a weighted average.
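
To make the three steps concrete, here is a minimal sketch of the technique on synthetic data. It is not the engine's implementation: the function `subclassification_effect`, the column names, and the choice of stratum size as the weight are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd


def subclassification_effect(df, covariate, treatment, outcome, n_strata=5):
    """Weighted average of within-stratum differences in mean outcome.

    Hypothetical helper for illustration; weights are stratum sizes (assumed).
    """
    # Step 1: assign each row to a quantile-based stratum of the covariate.
    strata = pd.qcut(df[covariate], q=n_strata, labels=False, duplicates="drop")
    total = 0.0
    weight_sum = 0
    for _, group in df.groupby(strata):
        treated = group.loc[group[treatment] == 1, outcome]
        control = group.loc[group[treatment] == 0, outcome]
        if treated.empty or control.empty:
            continue  # no within-stratum comparison is possible
        # Step 2: within-stratum effect; step 3: accumulate the weighted sum.
        total += len(group) * (treated.mean() - control.mean())
        weight_sum += len(group)
    return total / weight_sum


# Synthetic data: expensive products are treated more often and also earn
# more, so price confounds the naive comparison. The true effect is 10.
rng = np.random.default_rng(0)
n = 20_000
price = rng.uniform(10, 100, size=n)
treated = (rng.uniform(size=n) < price / 120).astype(int)
revenue = 2.0 * price + 10.0 * treated + rng.normal(0, 5, size=n)
demo = pd.DataFrame({"price": price, "treated": treated, "revenue": revenue})

is_treated = demo["treated"] == 1
naive = demo.loc[is_treated, "revenue"].mean() - demo.loc[~is_treated, "revenue"].mean()
adjusted = subclassification_effect(demo, "price", "treated", "revenue", n_strata=10)
print(f"naive: {naive:.2f}, subclassified: {adjusted:.2f}")
```

The naive difference in means is far from the true effect because it absorbs the price gap between groups. The subclassified estimate lands much closer, with a small residual bias because price still varies inside each stratum.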

## Workflow Overview

1. User provides `products.csv`
2. User configures `DATA.ENRICHMENT` for treatment assignment
3. User calls `evaluate_impact()` with the path to the configuration YAML
4. Engine handles everything internally (adapter, enrichment, model)

## Setup

In [None]:
import json
from pathlib import Path

import pandas as pd
from impact_engine import evaluate_impact
from online_retail_simulator import simulate

## Step 1: Create Products Catalog

In production, this would be your actual product catalog. Here, `simulate()` generates one for the demo.

In [None]:
output_path = Path("output/demo_subclassification")
output_path.mkdir(parents=True, exist_ok=True)

job_info = simulate("configs/demo_subclassification_catalog.yaml", job_id="catalog")
products = job_info.load_df("products")

print(f"Products catalog: {job_info.get_store().full_path('products.csv')}")
print(f"Products: {len(products)}")
products

## Step 2: Configure Subclassification

Configure the impact engine with:
- **ENRICHMENT**: Treatment assignment via quality boost (50/50 split)
- **MODEL**: `subclassification` with price as covariate

A single-day simulation (`start_date = end_date`) produces the cross-sectional data that subclassification requires.
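
As a quick illustration of what "cross-sectional" means here, the sketch below checks that a table has one row per unit and a single date. The helper `is_cross_sectional` and the column names `product_id` and `date` are hypothetical; the engine's actual schema may differ.

```python
import pandas as pd


def is_cross_sectional(df, unit_col, date_col):
    """True when every unit appears exactly once and all rows share one date."""
    return bool(df[unit_col].is_unique and df[date_col].nunique() == 1)


# A two-day panel has repeated units; filtering to one day gives a cross-section.
panel = pd.DataFrame(
    {
        "product_id": ["A", "A", "B", "B"],
        "date": ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"],
    }
)
snapshot = panel[panel["date"] == "2024-01-01"]

print(is_cross_sectional(panel, "product_id", "date"))
print(is_cross_sectional(snapshot, "product_id", "date"))
```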

In [None]:
config_path = "configs/demo_subclassification.yaml"

## Step 3: Run Impact Evaluation

A single call to `evaluate_impact()` handles everything:
- The engine creates a `CatalogSimulatorAdapter`
- The adapter simulates metrics (single-day, cross-sectional)
- The adapter applies enrichment (treatment assignment + revenue boost)
- `SubclassificationAdapter` stratifies on price and computes per-stratum effects
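
The last step can be pictured as building a per-stratum table and dropping strata that lack either treated or control units. The sketch below is an assumption about how such a table might be built, not the adapter's code: `stratum_table`, the `usable` flag, and the size-based weights are all hypothetical.

```python
import pandas as pd


def stratum_table(df, covariate, treatment, outcome, n_strata=4):
    """Per-stratum counts and effects; flags strata that lack a group."""
    work = df.assign(
        stratum=pd.qcut(df[covariate], q=n_strata, labels=False, duplicates="drop")
    )
    rows = []
    for stratum, group in work.groupby("stratum"):
        t = group.loc[group[treatment] == 1, outcome]
        c = group.loc[group[treatment] == 0, outcome]
        usable = len(t) > 0 and len(c) > 0
        rows.append(
            {
                "stratum": int(stratum),
                "n_treated": len(t),
                "n_control": len(c),
                "effect": t.mean() - c.mean() if usable else float("nan"),
                "usable": usable,
            }
        )
    return pd.DataFrame(rows)


# Toy data: the third price quartile (prices 5 and 6) has no treated units.
toy = pd.DataFrame(
    {
        "price": [1, 2, 3, 4, 5, 6, 7, 8],
        "treated": [1, 0, 1, 0, 0, 0, 1, 0],
        "revenue": [12.0, 10.0, 15.0, 11.0, 9.0, 8.0, 20.0, 14.0],
    }
)
table = stratum_table(toy, "price", "treated", "revenue", n_strata=4)

# Aggregate over usable strata only, weighting by stratum size (assumed).
kept = table[table["usable"]]
sizes = kept["n_treated"] + kept["n_control"]
estimate = (sizes / sizes.sum() * kept["effect"]).sum()
n_dropped = int((~table["usable"]).sum())
print(table)
print(f"estimate: {estimate:.2f}, strata dropped: {n_dropped}")
```

On this toy data, the three usable strata have effects of 2, 4, and 6 with equal sizes, so the estimate is about 4.0 and one stratum is dropped.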

In [None]:
results_path = evaluate_impact(config_path, str(output_path), job_id="results")
print(f"Results saved to: {results_path}")

## Step 4: Review Results

In [None]:
with open(results_path) as f:
    results = json.load(f)

data = results["data"]
estimates = data["impact_estimates"]
summary = data["model_summary"]

print("=" * 60)
print("SUBCLASSIFICATION IMPACT ESTIMATION RESULTS")
print("=" * 60)

print(f"\nModel Type: {results['model_type']}")
print(f"Estimand:   {summary['estimand']}")

print("\n--- Impact Estimates ---")
print(f"Treatment Effect:    {estimates['treatment_effect']:.4f}")
print(f"Strata Used:         {estimates['n_strata']}")
print(f"Strata Dropped:      {estimates['n_strata_dropped']}")

print("\n--- Model Summary ---")
print(f"Observations:        {summary['n_observations']}")
print(f"Treated:             {summary['n_treated']}")
print(f"Control:             {summary['n_control']}")

In [None]:
# Per-stratum details artifact
results_dir = Path(results_path).parent
stratum_path = results_dir / "subclassification__stratum_details.parquet"
stratum_df = pd.read_parquet(stratum_path)

print("--- Per-Stratum Breakdown ---")
print("-" * 70)
print(
    f"{'Stratum':<10} {'Treated':<10} {'Control':<10} {'Mean T':<12} {'Mean C':<12} {'Effect':<12}"
)
print("-" * 70)
for _, row in stratum_df.iterrows():
    print(
        f"{row['stratum']:<10} {row['n_treated']:<10} {row['n_control']:<10} "
        f"{row['mean_treated']:<12.2f} {row['mean_control']:<12.2f} {row['effect']:<12.2f}"
    )

print("\n" + "=" * 60)
print("Demo Complete!")
print("=" * 60)