# 08 — Estimator Selection and Debugging

We show a diagnostics-driven workflow for estimator selection, followed by a
debugging checklist for cases where overlap or model fit is poor.

## Setup

```
pip install "causalrl[plots]"
```

In [1]:
from __future__ import annotations

import numpy as np
import pandas as pd

from crl.assumptions import AssumptionSet
from crl.assumptions_catalog import MARKOV, OVERLAP, SEQUENTIAL_IGNORABILITY
from crl.benchmarks.mdp_synth import SyntheticMDP, SyntheticMDPConfig
from crl.estimands.policy_value import PolicyValueEstimand
from crl.selectors import SelectionResult, select_estimator
from crl.utils.seeding import set_seed

In [2]:
set_seed(0)
np.random.seed(0)

## Run selection

We use a heuristic score that favors stable importance weights and reasonable
model fit.

In [3]:
benchmark = SyntheticMDP(SyntheticMDPConfig(seed=0, horizon=5))
dataset = benchmark.sample(num_trajectories=200, seed=1)

estimand = PolicyValueEstimand(
    policy=benchmark.target_policy,
    discount=dataset.discount,
    horizon=dataset.horizon,
    assumptions=AssumptionSet([SEQUENTIAL_IGNORABILITY, OVERLAP, MARKOV]),
)

selection = select_estimator(
    dataset,
    estimand,
    candidates=["is", "wis", "pdis", "dr", "wdr", "mrdr", "fqe"],
    return_scores=True,
)
selection.best, isinstance(selection, SelectionResult)

(ISEstimator(run_diagnostics=True), True)

In [4]:
pd.DataFrame(selection.scores).sort_values("score", ascending=False)

|   | estimator | score | diagnostics | warnings |
|---|---|---|---|---|
| 0 | IS | -0.005 | {'overlap': {'min_behavior_prob': 0.0140824109... | [Effective sample size ratio below threshold; ... |
| 1 | WIS | -0.005 | {'overlap': {'min_behavior_prob': 0.0140824109... | [Effective sample size ratio below threshold; ... |
| 2 | PDIS | -0.005 | {'overlap': {'min_behavior_prob': 0.0140824109... | [Effective sample size ratio below threshold; ... |
| 6 | FQE | -0.005 | {'overlap': {'min_behavior_prob': 0.0140824109... | [Effective sample size ratio below threshold; ... |
| 3 | DR | -0.039173 | {'overlap': {'min_behavior_prob': 0.0140824109... | [Effective sample size ratio below threshold; ... |
| 4 | WDR | -0.039173 | {'overlap': {'min_behavior_prob': 0.0140824109... | [Effective sample size ratio below threshold; ... |
| 5 | MRDR | -0.039183 | {'overlap': {'min_behavior_prob': 0.0140824109... | [Effective sample size ratio below threshold; ... |
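
The `diagnostics` and `warnings` columns are truncated in the default display. One way to read them in full is to flatten the records with pandas. The sketch below assumes that `selection.scores` is a list of dicts shaped like the rows shown above; that shape is inferred from the displayed table, not confirmed by the library. The `records` list is a hypothetical stand-in with rounded placeholder values.

```python
import pandas as pd

# Hypothetical stand-in for selection.scores. The keys mirror the columns
# displayed above; the values are rounded placeholders, not real output.
records = [
    {
        "estimator": "IS",
        "score": -0.005,
        "diagnostics": {"overlap": {"min_behavior_prob": 0.014}},
        "warnings": ["Effective sample size ratio below threshold"],
    },
    {
        "estimator": "DR",
        "score": -0.039,
        "diagnostics": {"overlap": {"min_behavior_prob": 0.014}},
        "warnings": ["Effective sample size ratio below threshold"],
    },
]

# json_normalize expands nested dicts into dotted column names such as
# "diagnostics.overlap.min_behavior_prob"; list values stay as lists.
flat = pd.json_normalize(records)

# explode gives one row per (estimator, warning) so each warning is readable.
per_warning = flat[["estimator", "warnings"]].explode("warnings")

print(flat.columns.tolist())
print(per_warning.to_string(index=False))
```

Setting `pd.set_option("display.max_colwidth", None)` before displaying the original frame is a lighter alternative when only the full text is needed.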


## Debug playbook

- **Poor overlap** → inspect the effective sample size (ESS) and the weight
  tails, consider WIS or DR, or collect data with better coverage.
- **Poor model fit** → check the Q-model MSE, increase model capacity, or
  switch to IS-based estimators.
- **Unknown propensities** → estimate the behavior policy or use model-based
  OPE.
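
The overlap check in the playbook can be sketched with plain NumPy. The example computes the Kish effective sample size, `(sum w)^2 / sum(w^2)`, and the share of total weight carried by the single largest weight. The helper names `effective_sample_size` and `weight_tail_summary` are hypothetical and are not part of `crl`; the input is assumed to be one non-negative importance weight per trajectory.

```python
from __future__ import annotations

import numpy as np


def effective_sample_size(weights: np.ndarray) -> float:
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    weights = np.asarray(weights, dtype=float)
    sum_sq = float(np.sum(weights**2))
    if sum_sq == 0.0:
        # Empty input or all-zero weights carry no information.
        return 0.0
    return float(np.sum(weights) ** 2 / sum_sq)


def weight_tail_summary(weights: np.ndarray) -> dict[str, float]:
    """Summarize how concentrated the importance weights are."""
    weights = np.asarray(weights, dtype=float)
    total = float(weights.sum())
    if weights.size == 0 or total == 0.0:
        return {"ess_ratio": 0.0, "max_weight_share": 0.0}
    return {
        # Fraction of the nominal sample size that is effectively used.
        "ess_ratio": effective_sample_size(weights) / weights.size,
        # Fraction of the total weight carried by the largest weight.
        "max_weight_share": float(weights.max() / total),
    }


uniform = np.ones(100)
skewed = np.array([100.0, 1.0, 1.0, 1.0])

print(weight_tail_summary(uniform))
print(weight_tail_summary(skewed))
```

Uniform weights give an ESS ratio of 1. When one trajectory dominates, the ratio falls toward `1 / n` and the maximum weight share approaches 1, which is the pattern to look for before trusting an IS-based estimate.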

## Takeaways

- Estimator selection is heuristic, but diagnostics ground the choice in
  observable evidence such as ESS and model fit.
- Always triangulate with multiple estimators and failure-mode checks.
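
As a minimal illustration of triangulation, the sketch below computes ordinary and weighted importance sampling estimates from the same inputs and reports how far apart they are. It assumes one cumulative importance weight and one discounted return per trajectory. The function names are hypothetical and do not come from `crl`, and the data are synthetic draws used only to exercise the code.

```python
from __future__ import annotations

import numpy as np


def is_estimate(weights: np.ndarray, returns: np.ndarray) -> float:
    """Ordinary importance sampling: mean of weight * return."""
    return float(np.mean(weights * returns))


def wis_estimate(weights: np.ndarray, returns: np.ndarray) -> float:
    """Weighted (self-normalized) importance sampling."""
    total = float(np.sum(weights))
    if total == 0.0:
        return float("nan")
    return float(np.sum(weights * returns) / total)


def relative_disagreement(estimates: dict[str, float]) -> float:
    """Spread of the estimates divided by their mean absolute value."""
    values = np.array(list(estimates.values()), dtype=float)
    scale = float(np.abs(values).mean())
    if scale == 0.0:
        return 0.0
    return float((values.max() - values.min()) / scale)


# Synthetic inputs: heavy-tailed weights and noisy returns.
rng = np.random.default_rng(0)
weights = rng.lognormal(mean=0.0, sigma=1.0, size=200)
returns = rng.normal(loc=1.0, scale=0.5, size=200)

estimates = {
    "is": is_estimate(weights, returns),
    "wis": wis_estimate(weights, returns),
}
print(estimates)
print("relative disagreement:", relative_disagreement(estimates))
```

A large disagreement does not say which estimator is wrong, but it is a cue to revisit the overlap and model-fit checks before reporting a single number.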