# Parameter Sensitivity

Every measurement model has tuning parameters that influence the treatment effect estimate. Subclassification requires choosing the number of strata; nearest neighbour matching requires setting a caliper distance. How sensitive are results to these choices?

This notebook answers two questions:

1. **Single-seed sensitivity** — How does the estimate change as we sweep a tuning parameter?
2. **Sensitivity with uncertainty** — Are the observed patterns robust to sampling variation, or just noise?

We use the same A/A test design (true effect = 0) so that any deviation from 0 reflects estimator behavior, not a real treatment effect.

## Setup

In [None]:
import copy
import os
from pathlib import Path

import numpy as np
import pandas as pd
import yaml
from impact_engine_measure import evaluate_impact, load_results, parse_config_file
from online_retail_simulator import simulate

In [None]:
# Configurable via environment variables for CI (reduced values speed up execution)
NUM_PRODUCTS = int(os.environ.get("IE_DEMO_NUM_PRODUCTS", 20000))
N_REPS = int(os.environ.get("IE_DEMO_N_REPS", 10))

output_path = Path("output/demo_parameter_sensitivity")
output_path.mkdir(parents=True, exist_ok=True)

## Shared Data

All parameter sweeps use the same product catalog.

In [None]:
with open("configs/demo_model_selection_catalog.yaml") as f:
    catalog_config = yaml.safe_load(f)
catalog_config["RULE"]["PRODUCTS"]["PARAMS"]["num_products"] = NUM_PRODUCTS

tmp_catalog = output_path / "catalog_config.yaml"
with open(tmp_catalog, "w") as f:
    yaml.dump(catalog_config, f, default_flow_style=False)

catalog_job = simulate(str(tmp_catalog), job_id="catalog")
products = catalog_job.load_df("products")

print(f"Generated {len(products)} products")
products.head()

In [None]:
config_path = "configs/demo_model_selection.yaml"
true_te = 0  # A/A design: no treatment effect by construction

base_config = parse_config_file(config_path)

In [None]:
def run_with_override(base_config, measurement_override, storage_url, job_id, source_seed=None):
    """Override MEASUREMENT in base config, write temp YAML, run evaluate_impact().

    Optionally override the data-generating seed for Monte Carlo replications.
    """
    config = copy.deepcopy(base_config)
    config["MEASUREMENT"] = measurement_override
    if source_seed is not None:
        config["DATA"]["SOURCE"]["CONFIG"]["seed"] = source_seed

    tmp_config_path = Path(storage_url) / f"config_{job_id}.yaml"
    tmp_config_path.parent.mkdir(parents=True, exist_ok=True)
    with open(tmp_config_path, "w") as f:
        yaml.dump(config, f, default_flow_style=False)

    job_info = evaluate_impact(str(tmp_config_path), storage_url, job_id=job_id)
    result = load_results(job_info)
    return result.impact_results
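
The helper above deep-copies `base_config` before overriding nested keys. The following minimal sketch, which uses a hypothetical nested dictionary rather than the real parsed config, illustrates why a shallow copy would not be enough: nested mappings are shared, so a seed override would leak back into the base config and affect later runs.

```python
import copy

# Hypothetical stand-in for a parsed config; the real structure may differ.
base = {"DATA": {"SOURCE": {"CONFIG": {"seed": 42}}}}

# A shallow copy duplicates only the top-level mapping; nested dicts are shared.
shallow = dict(base)
shallow["DATA"]["SOURCE"]["CONFIG"]["seed"] = 7
seed_after_shallow = base["DATA"]["SOURCE"]["CONFIG"]["seed"]  # base was mutated

# Reset, then repeat with a deep copy, which duplicates every nested level.
base["DATA"]["SOURCE"]["CONFIG"]["seed"] = 42
deep = copy.deepcopy(base)
deep["DATA"]["SOURCE"]["CONFIG"]["seed"] = 7
seed_after_deep = base["DATA"]["SOURCE"]["CONFIG"]["seed"]  # base is untouched

print(f"after shallow override: {seed_after_shallow}")
print(f"after deep override:    {seed_after_deep}")
```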

## Part 1: Parameter Sensitivity (Single Seed)

For a given model and data, how sensitive is the treatment effect estimate to tuning parameters?
We sweep one parameter at a time while keeping everything else fixed.

### 1a. Subclassification: `n_strata`

More strata means finer partitioning of the covariate space.
Finer strata reduce residual covariate imbalance within each stratum, but strata that lack either treated or control units have no common support and are dropped.
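
To make the trade-off concrete, here is a minimal sketch of subclassification on synthetic data. It is not the `impact_engine_measure` implementation: the function `subclassification_att`, the quantile-based strata, the treated-count weighting, and the toy data-generating process are all assumptions chosen for illustration. Only the column names `price`, `enriched`, and `revenue` are taken from this notebook.

```python
import numpy as np
import pandas as pd


def subclassification_att(df, n_strata, covariate="price", treatment="enriched", outcome="revenue"):
    """Toy ATT estimate: quantile strata on one covariate, weighted by treated counts."""
    strata = pd.qcut(df[covariate], q=n_strata, labels=False, duplicates="drop")
    effects, weights, dropped = [], [], 0
    for _, group in df.groupby(strata):
        treated = group.loc[group[treatment] == 1, outcome]
        control = group.loc[group[treatment] == 0, outcome]
        if treated.empty or control.empty:
            dropped += 1  # no common support in this stratum
            continue
        effects.append(treated.mean() - control.mean())
        weights.append(len(treated))
    return float(np.average(effects, weights=weights)), dropped


# Synthetic A/A data (assumed): treatment is more likely for expensive products,
# revenue depends on price only, so the true effect is exactly 0.
rng = np.random.default_rng(0)
n = 2000
price = rng.lognormal(mean=3.0, sigma=0.5, size=n)
p_treat = 1.0 / (1.0 + np.exp(-2.0 * (np.log(price) - 3.0)))
toy = pd.DataFrame(
    {
        "price": price,
        "enriched": rng.binomial(1, p_treat),
        "revenue": 2.0 * price + rng.normal(0.0, 5.0, size=n),
    }
)

toy_results = {k: subclassification_att(toy, k) for k in (2, 10, 50)}
for k, (att, dropped) in toy_results.items():
    print(f"n_strata={k:>3}  ATT={att:8.3f}  strata dropped={dropped}")
```

In this toy setup, two strata leave substantial price imbalance inside each stratum, so the estimate stays far from 0. Finer strata remove most of that imbalance but raise the chance that a stratum contains only treated or only control units.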

In [None]:
n_strata_values = [2, 3, 5, 10, 20, 50, 100]
subclass_estimates = []
strata_used = []
strata_dropped = []

for n in n_strata_values:
    measurement = {
        "MODEL": "subclassification",
        "PARAMS": {
            "treatment_column": "enriched",
            "covariate_columns": ["price"],
            "n_strata": n,
            "estimand": "att",
            "dependent_variable": "revenue",
        },
    }
    results = run_with_override(base_config, measurement, str(output_path), f"subclass_strata_{n}")
    estimates = results["data"]["impact_estimates"]

    subclass_estimates.append(estimates["treatment_effect"])
    strata_used.append(estimates["n_strata"])
    strata_dropped.append(estimates["n_strata_dropped"])

subclass_sensitivity = pd.DataFrame(
    {
        "n_strata (requested)": n_strata_values,
        "Strata Used": strata_used,
        "Strata Dropped": strata_dropped,
        "Treatment Effect": subclass_estimates,
        "Absolute Error": [abs(est - true_te) for est in subclass_estimates],
    }
)

print("Subclassification: n_strata Sensitivity")
print("-" * 70)
print(subclass_sensitivity.to_string(index=False, float_format=lambda x: f"{x:.4f}"))

In [None]:
from notebook_support import plot_parameter_sensitivity

plot_parameter_sensitivity(
    param_values=n_strata_values,
    estimates=subclass_estimates,
    true_effect=true_te,
    xlabel="Number of Strata (n_strata)",
    ylabel="Treatment Effect",
    title="Subclassification: Sensitivity to n_strata",
)

### 1b. Nearest Neighbour Matching: `caliper`

The caliper controls the maximum allowed distance between a treated unit and its matched control.
Smaller values enforce tighter matches but may discard units, while larger values allow more matches with worse balance.
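
The mechanics can be sketched in a few lines of NumPy. This is an illustrative 1-to-1 matcher with replacement, not the library's algorithm: the function `nn_match_att`, the choice to measure distance on a standardized covariate (so the caliper is in standard-deviation units), and the synthetic data are assumptions.

```python
import numpy as np


def nn_match_att(x, treated, y, caliper):
    """Toy 1-to-1 nearest-neighbour matching with replacement on one covariate."""
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    y = np.asarray(y, dtype=float)
    z = (x - x.mean()) / x.std()  # assumption: caliper is in standard-deviation units
    z_t, z_c = z[treated], z[~treated]
    y_t, y_c = y[treated], y[~treated]
    # Pairwise distances, shape (n_treated, n_control)
    dist = np.abs(z_t[:, None] - z_c[None, :])
    nearest = dist.argmin(axis=1)
    nearest_dist = dist[np.arange(len(z_t)), nearest]
    keep = nearest_dist <= caliper  # treated units without a close control are discarded
    if not keep.any():
        return float("nan"), 0
    att = float(np.mean(y_t[keep] - y_c[nearest[keep]]))
    return att, int(keep.sum())


# Synthetic A/A data (assumed): price drives both treatment and revenue; true effect is 0.
rng = np.random.default_rng(0)
n = 2000
price = rng.lognormal(mean=3.0, sigma=0.5, size=n)
p_treat = 1.0 / (1.0 + np.exp(-2.0 * (np.log(price) - 3.0)))
enriched = rng.binomial(1, p_treat)
revenue = 2.0 * price + rng.normal(0.0, 5.0, size=n)

toy_calipers = [0.001, 0.05, 1.0]
toy_matches = [nn_match_att(price, enriched, revenue, c) for c in toy_calipers]
for c, (att, n_matched) in zip(toy_calipers, toy_matches):
    print(f"caliper={c:<6}  matched treated={n_matched:>4}  ATT={att:8.3f}")
```

The number of matched treated units can only grow as the caliper widens, because the nearest control for each treated unit is fixed and the caliper merely decides whether that pair is kept.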

In [None]:
caliper_values = [0.01, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
matching_estimates = []
n_matched_att_list = []

for cal in caliper_values:
    measurement = {
        "MODEL": "nearest_neighbour_matching",
        "PARAMS": {
            "treatment_column": "enriched",
            "covariate_columns": ["price"],
            "dependent_variable": "revenue",
            "caliper": cal,
            "replace": True,
            "ratio": 1,
        },
    }
    results = run_with_override(base_config, measurement, str(output_path), f"matching_caliper_{cal}")
    estimates = results["data"]["impact_estimates"]
    summary = results["data"]["model_summary"]

    matching_estimates.append(estimates["att"])
    n_matched_att_list.append(summary["n_matched_att"])

matching_sensitivity = pd.DataFrame(
    {
        "Caliper": caliper_values,
        "N Matched (ATT)": n_matched_att_list,
        "Treatment Effect (ATT)": matching_estimates,
        "Absolute Error": [abs(est - true_te) for est in matching_estimates],
    }
)

print("Nearest Neighbour Matching: Caliper Sensitivity")
print("-" * 70)
print(matching_sensitivity.to_string(index=False, float_format=lambda x: f"{x:.4f}"))

In [None]:
plot_parameter_sensitivity(
    param_values=caliper_values,
    estimates=matching_estimates,
    true_effect=true_te,
    xlabel="Caliper",
    ylabel="Treatment Effect (ATT)",
    title="Nearest Neighbour Matching: Sensitivity to Caliper",
)

## Part 2: Parameter Sensitivity with Uncertainty

Part 1 showed how estimates change with tuning parameters using a single seed.
Here we rerun each parameter value across `N_REPS` data-generating seeds and plot the mean estimate with a band of ±1 sample standard deviation.
This reveals whether apparent sensitivity is real or just noise.
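
As a small standalone illustration of the band construction, the sketch below summarizes replicated estimates per parameter value. The helper `summarize_replications` and the pure-noise toy estimates are hypothetical and only mirror the mean and sample standard deviation computed later in this notebook.

```python
import numpy as np


def summarize_replications(estimates_by_param):
    """Return {param: (mean, lower, upper)} using a mean +/- 1 sample-std band."""
    summary = {}
    for param, values in estimates_by_param.items():
        values = np.asarray(values, dtype=float)
        mean = float(values.mean())
        std = float(values.std(ddof=1))  # ddof=1 gives the sample standard deviation
        summary[param] = (mean, mean - std, mean + std)
    return summary


# Hypothetical estimator whose output is pure noise around 0 for every parameter value.
rng = np.random.default_rng(1)
toy_estimates = {p: rng.normal(0.0, 1.0, size=10) for p in (2, 5, 10)}
toy_summary = summarize_replications(toy_estimates)

for p, (mean, lower, upper) in toy_summary.items():
    print(f"param={p:>2}  mean={mean:6.3f}  band=[{lower:6.3f}, {upper:6.3f}]")
```

A single replication from this toy estimator would show differences between parameter values even though none exist by construction. Overlapping bands across parameter values are the signal that such differences are compatible with sampling noise.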

In [None]:
rng = np.random.default_rng(seed=2024)
mc_seeds = rng.integers(low=0, high=2**31, size=N_REPS).tolist()

### 2a. Subclassification: `n_strata`

In [None]:
n_strata_mc = {n: [] for n in n_strata_values}

for i, seed in enumerate(mc_seeds):
    for n in n_strata_values:
        measurement = {
            "MODEL": "subclassification",
            "PARAMS": {
                "treatment_column": "enriched",
                "covariate_columns": ["price"],
                "n_strata": n,
                "estimand": "att",
                "dependent_variable": "revenue",
            },
        }
        results = run_with_override(
            base_config,
            measurement,
            str(output_path),
            f"mc_subclass_{n}_rep{i}",
            source_seed=seed,
        )
        n_strata_mc[n].append(results["data"]["impact_estimates"]["treatment_effect"])

    if (i + 1) % 5 == 0:
        print(f"Subclassification sweep: {i + 1}/{N_REPS} replications")

In [None]:
from notebook_support import plot_parameter_sensitivity_mc

strata_means = [np.mean(n_strata_mc[n]) for n in n_strata_values]
strata_stds = [np.std(n_strata_mc[n], ddof=1) for n in n_strata_values]
strata_lower = [m - s for m, s in zip(strata_means, strata_stds)]
strata_upper = [m + s for m, s in zip(strata_means, strata_stds)]

plot_parameter_sensitivity_mc(
    param_values=n_strata_values,
    mean_estimates=strata_means,
    lower_band=strata_lower,
    upper_band=strata_upper,
    true_effect=true_te,
    xlabel="Number of Strata (n_strata)",
    ylabel="Treatment Effect",
    title=f"Subclassification: n_strata Sensitivity ({N_REPS} replications)",
)

### 2b. Nearest Neighbour Matching: `caliper`

In [None]:
caliper_mc = {c: [] for c in caliper_values}

for i, seed in enumerate(mc_seeds):
    for cal in caliper_values:
        measurement = {
            "MODEL": "nearest_neighbour_matching",
            "PARAMS": {
                "treatment_column": "enriched",
                "covariate_columns": ["price"],
                "dependent_variable": "revenue",
                "caliper": cal,
                "replace": True,
                "ratio": 1,
            },
        }
        results = run_with_override(
            base_config,
            measurement,
            str(output_path),
            f"mc_matching_{cal}_rep{i}",
            source_seed=seed,
        )
        caliper_mc[cal].append(results["data"]["impact_estimates"]["att"])

    if (i + 1) % 5 == 0:
        print(f"Matching sweep: {i + 1}/{N_REPS} replications")

In [None]:
cal_means = [np.mean(caliper_mc[c]) for c in caliper_values]
cal_stds = [np.std(caliper_mc[c], ddof=1) for c in caliper_values]
cal_lower = [m - s for m, s in zip(cal_means, cal_stds)]
cal_upper = [m + s for m, s in zip(cal_means, cal_stds)]

plot_parameter_sensitivity_mc(
    param_values=caliper_values,
    mean_estimates=cal_means,
    lower_band=cal_lower,
    upper_band=cal_upper,
    true_effect=true_te,
    xlabel="Caliper",
    ylabel="Treatment Effect (ATT)",
    title=f"Nearest Neighbour Matching: Caliper Sensitivity ({N_REPS} replications)",
)

## Key Takeaways

**Parameter sensitivity.**
- **Subclassification** is relatively stable across `n_strata` values. Very low values may under-partition, while very high values may drop strata with insufficient common support.
- **Nearest neighbour matching** is more sensitive to `caliper`. Very small calipers may discard too many units, while very large calipers degrade match quality.

**Uncertainty bands.**
- Parameter sensitivity plots with uncertainty bands show which apparent patterns are robust to sampling variation and which are noise.
- With `N_REPS=10`, the bands are informative but coarse. For publication-quality analysis, increase to `N_REPS >= 500`.
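
To get a feel for how coarse a band built from few replications is, the following sketch simulates how much the sample standard deviation itself varies with the number of replications. It assumes normally distributed estimates, which need not hold for the estimators in this notebook, and the helper `relative_spread_of_std` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)


def relative_spread_of_std(n_reps, n_trials=2000):
    """Relative variability of the sample std computed from n_reps normal draws."""
    draws = rng.normal(0.0, 1.0, size=(n_trials, n_reps))
    stds = draws.std(axis=1, ddof=1)  # one band half-width per simulated experiment
    return float(stds.std(ddof=1) / stds.mean())


spread = {n: relative_spread_of_std(n) for n in (10, 100, 500)}
for n, s in spread.items():
    print(f"N_REPS={n:>3}: band half-width varies by roughly {100 * s:.0f}% across experiments")
```

Under the normality assumption, the relative variability shrinks approximately like `1 / sqrt(2 * (N_REPS - 1))`, so going from 10 to 500 replications tightens the band width estimate by roughly a factor of seven.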