# 01 - Estimands and Assumptions

CausalRL requires you to be explicit about **what** you estimate (the estimand)
and **which assumptions** identify that estimand. This notebook demonstrates:

- How to define a `PolicyValueEstimand`.
- How assumptions are enforced by estimators.
- How diagnostics flag overlap violations.

## Setup

Suggested environment:

```
pip install "causalrl[plots]"
```

```python
from __future__ import annotations

import numpy as np

from crl.assumptions import AssumptionSet
from crl.assumptions_catalog import MARKOV, OVERLAP, SEQUENTIAL_IGNORABILITY
from crl.benchmarks.bandit_synth import SyntheticBandit, SyntheticBanditConfig
from crl.estimands.policy_value import PolicyValueEstimand
from crl.estimators.importance_sampling import ISEstimator
from crl.utils.seeding import set_seed
```

```python
set_seed(0)
np.random.seed(0)
```

## Define an estimand

The policy value estimand encodes the target policy, horizon, discount, and
identification assumptions. Estimators check these assumptions before
running.

```python
# Build the synthetic bandit benchmark and sample a logged dataset from it.
benchmark = SyntheticBandit(SyntheticBanditConfig(seed=0))
dataset = benchmark.sample(num_samples=500, seed=1)

# horizon=1 with discount=1.0 describes a one-step (bandit) problem.
estimand = PolicyValueEstimand(
    policy=benchmark.target_policy,
    discount=1.0,
    horizon=1,
    assumptions=AssumptionSet([SEQUENTIAL_IGNORABILITY, OVERLAP]),
)

estimand
```

```
PolicyValueEstimand(policy=TabularPolicy, discount=1.0, horizon=1, assumptions=['overlap', 'sequential_ignorability'])
```
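To make "estimators check these assumptions before running" concrete, here is a minimal sketch of what such a check could look like. `ToyEstimand` and `ToyISEstimator` are hypothetical stand-ins written for this illustration; they are not the `crl` classes, and CausalRL's actual check may happen at a different point or raise a different error.

```python
from dataclasses import dataclass
from typing import FrozenSet


# Hypothetical stand-in for an estimand; NOT crl.estimands.PolicyValueEstimand.
@dataclass(frozen=True)
class ToyEstimand:
    assumptions: FrozenSet[str]


# Hypothetical stand-in for an estimator; NOT crl's ISEstimator.
class ToyISEstimator:
    # Assumptions this toy estimator needs in order to identify the estimand.
    required = frozenset({"sequential_ignorability", "overlap"})

    def __init__(self, estimand: ToyEstimand) -> None:
        # Refuse to run if the estimand does not declare every required assumption.
        missing = self.required - estimand.assumptions
        if missing:
            raise ValueError(f"estimand is missing assumptions: {sorted(missing)}")
        self.estimand = estimand


# An estimand that declares both assumptions is accepted.
ok = ToyISEstimator(ToyEstimand(frozenset({"overlap", "sequential_ignorability"})))

# An estimand that omits sequential ignorability is rejected up front.
try:
    ToyISEstimator(ToyEstimand(frozenset({"overlap"})))
except ValueError as err:
    message = str(err)
    print(message)
```

The point of the design is that a missing assumption fails loudly before any number is produced, rather than yielding an estimate whose identification was never stated.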

## Overlap violations

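If the logging policy rarely takes actions that the target policy favors, the
importance ratios become very large and importance-weighted estimators become
unstable; if it never takes them, the target value cannot be identified from
the logged data. Here we create a toy overlap violation by capping the logged
behavior probabilities at 0.02, then inspect the diagnostics.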

```python
# Cap the logged behavior probabilities at 0.02 to simulate poor overlap.
dataset_bad = benchmark.sample(num_samples=500, seed=2)
dataset_bad.behavior_action_probs = np.clip(
    dataset_bad.behavior_action_probs, 0.0, 0.02
)

# Run importance sampling on the altered data and inspect its diagnostics.
report = ISEstimator(estimand).estimate(dataset_bad)
report.diagnostics
```

```
{'overlap': {'min_behavior_prob': 0.02,
  'min_target_prob': 0.038790349205150024,
  'fraction_behavior_below_threshold': 0.0,
  'fraction_target_below_threshold': 0.0,
  'ratio_min': 1.9395174602575012,
  'ratio_max': 35.57378517851286,
  'ratio_q50': 11.083778973023602,
  'ratio_q90': 28.88163828666249,
  'ratio_q99': 35.57378517851286,
  'support_violations': 0},
 'ess': {'ess': 291.0519313570731, 'ess_ratio': 0.5821038627141462},
 'weights': {'min': 1.9395174602575012,
  'max': 35.57378517851286,
  'mean': 13.368479302599834,
  'std': 11.327027264041313,
  'q95': 35.57378517851286,
  'q99': 35.57378517851286,
  'skew': 0.6636527695733089,
  'kurtosis': -0.9647032937905231,
  'tail_fraction': 0.508},
 'max_weight': 35.57378517851286,
 'model': {}}
```
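The diagnostics above can be read with a small NumPy sketch. It assumes the importance ratio is the target probability divided by the behavior probability, and that the effective sample size is Kish's formula, $(\sum_i w_i)^2 / \sum_i w_i^2$; the reported numbers appear consistent with both, but neither is confirmed by the library output itself. The toy probability arrays are made up for illustration and do not come from the benchmark.

```python
import numpy as np


def kish_ess(weights: np.ndarray) -> float:
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))


# Toy per-sample probabilities of the logged action (illustrative values only).
target_probs = np.array([0.70, 0.05, 0.40, 0.10, 0.70, 0.05])
behavior_probs = np.array([0.50, 0.30, 0.40, 0.20, 0.50, 0.30])

# Importance ratios with the original propensities.
ratios_ok = target_probs / behavior_probs

# Capping the propensities at 0.02, as in the cell above, inflates every ratio.
ratios_bad = target_probs / np.clip(behavior_probs, 0.0, 0.02)

print("max ratio:", ratios_ok.max(), "->", ratios_bad.max())
print("ESS:", kish_ess(ratios_ok), "->", kish_ess(ratios_bad))

# Cross-check 1: the reported ratio_min equals min_target_prob / 0.02.
ratio_min_check = 0.038790349205150024 / 0.02

# Cross-check 2: Kish's ESS rewritten with the reported weight mean and std,
# ESS = n * mean^2 / (mean^2 + std^2), using n = 500 samples.
n, mean, std = 500, 13.368479302599834, 11.327027264041313
ess_from_moments = n * mean**2 / (mean**2 + std**2)
print("ratio_min check:", ratio_min_check)
print("ESS from reported moments:", ess_from_moments)
```

Two things stand out in the report when read this way: the mean weight is about 13.4, far from the value of 1 that valid importance weights average to in expectation, and the effective sample size is roughly 58% of the 500 logged samples.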

## Takeaways

- Estimands make assumptions explicit and enforceable.
- Diagnostics are tied to assumptions (e.g., the `overlap` block reports
  importance-ratio quantiles and support violations).
- When overlap is poor, estimators warn and flag the assumption.