# IHDP Dataset Analysis

This notebook analyzes the IHDP (Infant Health and Development Program) dataset using causal inference methods.



In [1]:
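import sys
sys.path.append('..')  # add the parent directory to the import path so `src` can be imported

import numpy as np
import pandas as pd
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')  # silence all warnings to keep cell output short

# Fix the random seed so repeated runs are comparable
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)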


In [2]:
from src.data_loader import download_ihdp
from src.dowhy_pipeline import run_full_pipeline

# Download the IHDP CSV (or reuse the cached copy) and load it
data_path = download_ihdp()
data = pd.read_csv(data_path)
print(f"Loaded {len(data)} rows, {len(data.columns)} columns")


2025-10-30 16:05:14,188 - src.data_loader - INFO - IHDP dataset already exists at data\ihdp\ihdp_npci_1.csv


Loaded 746 rows, 30 columns
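
The row count above is worth a second look. IHDP is usually reported with 747 units, and the `ihdp_npci_*.csv` files are commonly distributed without a header row. If that holds for this file, `pd.read_csv` would have consumed the first record as column names, which would also be consistent with the `'treatment'` lookup failures logged further down. A minimal sketch of loading such a file with explicit names follows. The column layout (`treatment`, `y_factual`, `y_cfactual`, `mu0`, `mu1`, `x1` to `x25`) and the helper `read_headerless_ihdp` are assumptions for illustration, not taken from `src.data_loader`, and the two demo rows are made up.

```python
import io

import pandas as pd

# Assumed layout of a header-less IHDP file: 5 bookkeeping columns + 25 covariates.
IHDP_COLUMNS = (
    ["treatment", "y_factual", "y_cfactual", "mu0", "mu1"]
    + [f"x{i}" for i in range(1, 26)]
)


def read_headerless_ihdp(source):
    """Read a header-less IHDP CSV and attach the assumed column names."""
    frame = pd.read_csv(source, header=None)
    if frame.shape[1] != len(IHDP_COLUMNS):
        raise ValueError(
            f"expected {len(IHDP_COLUMNS)} columns, found {frame.shape[1]}"
        )
    frame.columns = IHDP_COLUMNS
    return frame


# Demo on two made-up rows (not real IHDP records).
row = ",".join(["1", "5.0", "4.0", "4.1", "5.2"] + ["0"] * 25)
demo_csv = "\n".join([row, row.replace("1,5.0", "0,3.5", 1)])

naive = pd.read_csv(io.StringIO(demo_csv))           # first row becomes the header
fixed = read_headerless_ihdp(io.StringIO(demo_csv))  # both rows are kept

print(len(naive), "treatment" in naive.columns)
print(len(fixed), "treatment" in fixed.columns)
```

Because `run_full_pipeline` reloads the data itself from `dataset_name`, a fix of this kind would probably belong in `src.data_loader` rather than in this cell.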


In [3]:
# Run the requested estimators through the pipeline and collect a summary table
results = run_full_pipeline(
    dataset_name="ihdp",
    estimators=["ipw", "psm", "dr", "dml"],
    output_dir=Path("../results"),
    random_state=RANDOM_SEED
)

print("\nResults:")
print(results.to_string())


2025-10-30 16:05:14,263 - src.dowhy_pipeline - INFO - Running pipeline for ihdp


2025-10-30 16:05:14,264 - src.data_loader - INFO - IHDP dataset already exists at data\ihdp\ihdp_npci_1.csv






2025-10-30 16:05:14,281 - dowhy.causal_model - INFO - Model to find the causal effect of treatment ['treatment'] on outcome ['y_factual']


2025-10-30 16:05:14,285 - dowhy.causal_identifier - INFO - Causal effect can be identified.


2025-10-30 16:05:14,288 - dowhy.causal_identifier - INFO - Instrumental variables for treatment and outcome:[]


2025-10-30 16:05:14,289 - dowhy.causal_identifier - INFO - Frontdoor variables for treatment and outcome:[]


2025-10-30 16:05:14,300 - src.dowhy_pipeline - ERROR - Failed to run ipw: "None of [Index(['treatment'], dtype='object')] are in the [columns]"






2025-10-30 16:05:14,303 - dowhy.causal_model - INFO - Model to find the causal effect of treatment ['treatment'] on outcome ['y_factual']


2025-10-30 16:05:14,305 - dowhy.causal_identifier - INFO - Causal effect can be identified.


2025-10-30 16:05:14,306 - dowhy.causal_identifier - INFO - Instrumental variables for treatment and outcome:[]


2025-10-30 16:05:14,307 - dowhy.causal_identifier - INFO - Frontdoor variables for treatment and outcome:[]


2025-10-30 16:05:14,311 - src.dowhy_pipeline - ERROR - Failed to run psm: "None of [Index(['treatment'], dtype='object')] are in the [columns]"






2025-10-30 16:05:14,313 - dowhy.causal_model - INFO - Model to find the causal effect of treatment ['treatment'] on outcome ['y_factual']


2025-10-30 16:05:14,315 - dowhy.causal_identifier - INFO - Causal effect can be identified.


2025-10-30 16:05:14,317 - dowhy.causal_identifier - INFO - Instrumental variables for treatment and outcome:[]


2025-10-30 16:05:14,319 - dowhy.causal_identifier - INFO - Frontdoor variables for treatment and outcome:[]


2025-10-30 16:05:14,323 - src.dowhy_pipeline - ERROR - Failed to run dr: econometric.doubly_robust is not an existing causal estimator.


2025-10-30 16:05:14,325 - src.dowhy_pipeline - ERROR - EconML pipeline failed: 'treatment'


2025-10-30 16:05:14,334 - src.dowhy_pipeline - INFO - Saved results to ..\results\ihdp\estimators_summary.csv


propensity_score_weighting
propensity_score_matching

Results:
  estimator  ate  runtime_seconds                                                                  error
0       ipw  NaN                0  "None of [Index(['treatment'], dtype='object')] are in the [columns]"
1       psm  NaN                0  "None of [Index(['treatment'], dtype='object')] are in the [columns]"
2        dr  NaN                0         econometric.doubly_robust is not an existing causal estimator.
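
The `dr` failure above comes from the method string `econometric.doubly_robust`, which DoWhy does not recognize. DoWhy's `estimate_effect` takes `method_name` strings of the form `<identifier>.<estimator>`, for example `backdoor.propensity_score_weighting`, and reaches EconML estimators through a `backdoor.econml.<module>.<Class>` prefix. A minimal sketch of a short-name lookup under that convention is below. The table `DOWHY_METHOD_NAMES` and the helper `resolve_method_name` are hypothetical and not taken from `src.dowhy_pipeline`; the choice of `LinearDRLearner` and `LinearDML` for `dr` and `dml` is likewise an assumption.

```python
# Hypothetical short-name table; the real mapping in src.dowhy_pipeline is not shown.
DOWHY_METHOD_NAMES = {
    "ipw": "backdoor.propensity_score_weighting",
    "psm": "backdoor.propensity_score_matching",
    "dr": "backdoor.econml.dr.LinearDRLearner",  # assumed choice of DR learner
    "dml": "backdoor.econml.dml.LinearDML",      # assumed choice of DML variant
}


def resolve_method_name(short_name: str) -> str:
    """Return the DoWhy method_name for a short estimator key, or fail early."""
    try:
        return DOWHY_METHOD_NAMES[short_name]
    except KeyError:
        known = ", ".join(sorted(DOWHY_METHOD_NAMES))
        raise ValueError(
            f"unknown estimator {short_name!r}; expected one of: {known}"
        ) from None


for key in ["ipw", "psm", "dr", "dml"]:
    print(key, "->", resolve_method_name(key))
```

Resolving names before any model is built turns a typo in the estimator list into an immediate error instead of a `NaN` row in the summary. The EconML-backed entries typically also need `method_params` with `init_params` and `fit_params` when passed to `estimate_effect`, and they require EconML to be installed.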
