# CISD Framework Tutorial: Causal-Intervention Scenario Design for Active Transportation

This tutorial demonstrates the implementation of the Causal-Intervention Scenario Design (CISD) framework and AI-augmented causal inference approaches described in the paper "Causality in Active Transportation: Exploring Travel Behavior and Well-being".

## Overview

CISD treats policy analysis as a two-stage act:
1. Choose an explicit scenario vector that bundles mediating and moderating features
2. Apply a treatment indicator to the population, asking what would happen if the same individuals experienced different treatments while scenario elements remain pinned to user-specified references

The canonical CISD estimand is defined as:

$$\text{CISD} = E_P[Y_i(1,\sigma(i)) - Y_i(0,\sigma(i))]$$

Where:
- $Y_i(d,s)$ is the potential well-being outcome for commuter $i$ under treatment $d$ and scenario $s$
- $P$ is the target population
- $\sigma: P \rightarrow S$ maps each unit to the scenario of interest
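
To make the estimand concrete, the minimal sketch below evaluates it on a toy population where both potential outcomes are known by construction. The function `potential_outcome`, the mapping `sigma`, and every coefficient are hypothetical and chosen only for illustration; they are not taken from the paper or from the `cisd` package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Toy population P: one covariate per commuter (hypothetical)
x = rng.normal(size=n)

def potential_outcome(d, s, x):
    """Hypothetical Y_i(d, s): well-being under treatment d and scenario s."""
    return 0.5 * d + 0.3 * s + 0.4 * d * s + 0.2 * x

def sigma(x):
    """Hypothetical scenario map: pin every unit to the scenario s = 1."""
    return np.ones_like(x)

s = sigma(x)
# CISD = E_P[ Y_i(1, sigma(i)) - Y_i(0, sigma(i)) ]
cisd = np.mean(potential_outcome(1, s, x) - potential_outcome(0, s, x))
print(f"CISD at s = 1: {cisd:.2f}")
```

In this toy model the contrast equals 0.5 + 0.4·s, so pinning the scenario to s = 0 instead would give 0.5: the same treatment yields a different estimand under a different scenario.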

This tutorial will cover:
1. Basic CISD implementation with synthetic data
2. Stochastic scenarios for policy simulation
3. AI-augmented causal inference with multimodal data

## Setup and Dependencies

First, let's import the necessary libraries and set up our environment.

In [None]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.model_selection import train_test_split

# Set random seed for reproducibility
np.random.seed(42)

# Import CISD package
import sys
sys.path.append('..')
from cisd.core import CISD

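# Configure plotting
# Matplotlib 3.6 renamed 'seaborn-whitegrid' to 'seaborn-v0_8-whitegrid'; use whichever is installed
whitegrid = 'seaborn-v0_8-whitegrid'
plt.style.use(whitegrid if whitegrid in plt.style.available else 'seaborn-whitegrid')
sns.set_context("talk")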

## 1. Generating Synthetic Transportation Data

Let's generate synthetic data for active transportation research, simulating a dataset that includes:

- **Features (X)**: 
  - Socioeconomic status (SES)
  - Distance to work/school
  - Age
  - Gender
  - Neighborhood walkability
  
- **Treatment (D)**: Exposure to Pedestrian-Friendly Infrastructure Project (PFIP)

- **Mediators (M)**:
  - Travel mode choice (active vs motorized)
  - Physical activity level
  - Perceived safety
  
- **Outcome (Y)**: Well-being score (combination of physical and mental well-being)

- **Scenario (S)**: Different combinations of mediator values that could be targeted by policies

In [None]:
def generate_synthetic_data(n_samples=1000):
    """Generate synthetic data for active transportation research."""
    
    # Generate baseline covariates (X)
    ses = np.random.normal(0, 1, n_samples)  # Socioeconomic status
    distance = np.random.gamma(2, 2, n_samples)  # Distance to work/school in km
    age = np.random.normal(40, 10, n_samples)  # Age in years
    gender = np.random.binomial(1, 0.5, n_samples)  # 0: male, 1: female
    walkability = 0.3*ses + 0.2*np.random.normal(0, 1, n_samples)  # Neighborhood walkability
    
    # Combine covariates
    X = np.column_stack([ses, distance, age, gender, walkability])
    
    # Generate treatment assignment (PFIP exposure)
    propensity = 1 / (1 + np.exp(-(0.5 + 0.7*ses + 0.3*walkability - 0.2*distance)))
    D = np.random.binomial(1, propensity)
    
    # Generate mediators
    # Travel mode (1: active, 0: motorized)
    active_mode_prob = 1 / (1 + np.exp(-(- 1.5 + 0.1*ses - 0.5*distance + 
                                         0.8*walkability + 1.2*D)))
    active_mode = np.random.binomial(1, active_mode_prob)
    
    # Physical activity (continuous score, not standardized)
    # Parentheses allow the expression to span two lines
    physical_activity = (0.2*ses - 0.3*distance + 0.5*active_mode +
                         0.4*D + 0.2*np.random.normal(0, 1, n_samples))
    
    # Perceived safety
    perceived_safety = 0.3*ses + 0.5*walkability + 0.7*D + 0.2*np.random.normal(0, 1, n_samples)
    
    # Combine mediators
    M = np.column_stack([active_mode, physical_activity, perceived_safety])
    
    # Generate well-being outcome
    # Parentheses allow the expression to span several lines
    Y = (0.2*ses + 0.1*age - 0.2*distance + 0.4*active_mode +
         0.3*physical_activity + 0.2*perceived_safety +
         0.5*D + 0.5*D*active_mode + 0.3*np.random.normal(0, 1, n_samples))
    
    # Standardize Y to have mean 0 and std 1
    Y = (Y - np.mean(Y)) / np.std(Y)
    
    # Create a DataFrame for easier manipulation
    df = pd.DataFrame({
        'ses': ses,
        'distance': distance,
        'age': age,
        'gender': gender,
        'walkability': walkability,
        'treatment': D,
        'active_mode': active_mode,
        'physical_activity': physical_activity, 
        'perceived_safety': perceived_safety,
        'well_being': Y
    })
    
    return df

# Generate data
df = generate_synthetic_data(n_samples=2000)

# Display the first few rows
print(f"Dataset shape: {df.shape}")
df.head()

Let's examine the distributions and relationships in our synthetic data.

In [None]:
# Basic data exploration
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
axes = axes.flatten()

# Distribution of well-being by treatment
sns.boxplot(x='treatment', y='well_being', data=df, ax=axes[0])
axes[0].set_title('Well-being by PFIP Treatment')
axes[0].set_xlabel('PFIP (1: Yes, 0: No)')

# Distribution of active mode by treatment
pd.crosstab(df.treatment, df.active_mode).plot(kind='bar', ax=axes[1])
axes[1].set_title('Active Mode by PFIP Treatment')
axes[1].set_xlabel('PFIP (1: Yes, 0: No)')
axes[1].set_ylabel('Count')

# Relationship between physical activity and well-being
sns.scatterplot(x='physical_activity', y='well_being', hue='treatment', data=df, ax=axes[2])
axes[2].set_title('Well-being vs Physical Activity')

# Effect of distance on active mode
sns.boxplot(x='active_mode', y='distance', data=df, ax=axes[3])
axes[3].set_title('Distance by Travel Mode')
axes[3].set_xlabel('Active Mode (1: Yes, 0: No)')
axes[3].set_ylabel('Distance (km)')

# Distribution of safety perception by treatment
sns.boxplot(x='treatment', y='perceived_safety', data=df, ax=axes[4])
axes[4].set_title('Safety Perception by PFIP Treatment')
axes[4].set_xlabel('PFIP (1: Yes, 0: No)')

# Relationship between SES and treatment
sns.boxplot(x='treatment', y='ses', data=df, ax=axes[5])
axes[5].set_title('SES by PFIP Treatment')
axes[5].set_xlabel('PFIP (1: Yes, 0: No)')

plt.tight_layout()
plt.show()

# Show correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()
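
Because PFIP exposure depends on SES, walkability, and distance, a raw comparison of treated and untreated commuters mixes the treatment effect with confounding. The sketch below illustrates this on a deliberately simplified, hypothetical data-generating process (one confounder, a true effect of 0.5); it is not the tutorial's dataset, and the regression adjustment is only one of several ways to remove the bias.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 20_000
ses = rng.normal(size=n)

# Treatment is more likely for high-SES units, mirroring the propensity model above
d = rng.binomial(1, 1 / (1 + np.exp(-0.7 * ses)))

# Hypothetical outcome with a true treatment effect of 0.5
y = 0.5 * d + 0.8 * ses + rng.normal(scale=0.3, size=n)

# Naive contrast: difference in group means (confounded by SES)
naive = y[d == 1].mean() - y[d == 0].mean()

# Adjusted contrast: coefficient on d after controlling for SES
adjusted = LinearRegression().fit(np.column_stack([d, ses]), y).coef_[0]

print(f"naive: {naive:.3f}, adjusted: {adjusted:.3f}")
```

The naive difference overstates the effect because treated units have higher SES on average, while the adjusted coefficient lands close to the true value of 0.5.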

## 2. Basic CISD Implementation

Now let's implement the CISD framework to estimate the causal effect of PFIP on well-being while holding the mediators at the values specified by each scenario of interest.

In [None]:
# Prepare data for CISD
X = df[['ses', 'distance', 'age', 'gender', 'walkability']].values
D = df['treatment'].values
Y = df['well_being'].values
M = df[['active_mode', 'physical_activity', 'perceived_safety']].values

# Train-test split
X_train, X_test, D_train, D_test, Y_train, Y_test, M_train, M_test = \
    train_test_split(X, D, Y, M, test_size=0.2, random_state=42)

### 2.1 Define a Scenario Selector

We'll define a few different scenario selectors to illustrate the flexibility of CISD.

In [None]:
# 1. Factual scenario (equivalent to standard ATE)
def factual_scenario_selector(X):
    """Return None to signal the factual scenario (mediators keep their observed values)."""
    return None  # None indicates factual scenario

# 2. Fixed scenario (everyone gets the same mediator values)
def fixed_scenario_selector(X):
    """Set all mediators to fixed values."""
    n_samples = X.shape[0]
    S = np.zeros((n_samples, 3))
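    # [active_mode=1, physical_activity=0, perceived_safety=0]
    # Note: the synthetic mediators are not centered, so 0 is a reference level,
    # not the sample mean
    S[:, 0] = 1  # Everyone uses active mode
    S[:, 1] = 0  # Reference physical activity level
    S[:, 2] = 0  # Reference perceived safety level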
    return S

# 3. Safety improvement scenario (perceived safety pinned to a high value)
def safety_improvement_selector(X):
    """Pin perceived safety to 1.0; NaN entries keep the factual mediator."""
    n_samples = X.shape[0]
    S = np.zeros((n_samples, 3))
    S[:, 0] = np.nan  # Keep factual active mode (will be determined by model)
    S[:, 1] = np.nan  # Keep factual physical activity
    S[:, 2] = 1.0     # Pin perceived safety to 1.0 for every unit
    return S

# 4. Distance-adaptive scenario (different mediators based on commute distance)
def distance_adaptive_selector(X):
    """Set mediators based on commute distance."""
    n_samples = X.shape[0]
    S = np.zeros((n_samples, 3))
    
    # Short distance (< 3km): active mode
    short_dist = (X[:, 1] < 3)
    S[short_dist, 0] = 1  # Active mode
    
    # Medium distance (3-10km): based on SES
    medium_dist = (X[:, 1] >= 3) & (X[:, 1] < 10)
    high_ses = X[:, 0] > 0
    S[medium_dist & high_ses, 0] = 1  # Active mode for high SES
    S[medium_dist & ~high_ses, 0] = 0  # Motorized mode for low SES
    
    # Long distance (>= 10km): motorized mode
    long_dist = (X[:, 1] >= 10)
    S[long_dist, 0] = 0  # Motorized mode
    
    # Keep physical activity as factual
    S[:, 1] = np.nan
    
    # Set perceived safety to positive for all
    S[:, 2] = 0.5
    
    return S
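
The selectors above use two conventions: `None` for a fully factual scenario and `NaN` for individual mediators that should keep their observed values. How the `cisd` package resolves these internally is not shown in this tutorial; the hypothetical helper `resolve_scenario` below is only a sketch of one way such a scenario matrix could be merged with the observed mediators.

```python
import numpy as np

def resolve_scenario(S, M_factual):
    """Fill NaN entries of a scenario matrix with factual mediator values.

    Hypothetical helper: S is None (fully factual) or an (n, k) array in
    which NaN means "keep the observed mediator".
    """
    M_factual = np.asarray(M_factual, dtype=float)
    if S is None:
        return M_factual.copy()
    S = np.asarray(S, dtype=float)
    return np.where(np.isnan(S), M_factual, S)

# Two units, three mediators: [active_mode, physical_activity, perceived_safety]
M_obs = np.array([[1.0, -0.4, 0.2],
                  [0.0, -1.1, 0.6]])

# Safety-style scenario: only the third mediator is pinned
S = np.array([[np.nan, np.nan, 1.0],
              [np.nan, np.nan, 1.0]])

print(resolve_scenario(S, M_obs))
```

Only the pinned column changes; the first two columns pass through unchanged, which is the behavior the `NaN` comments in the selectors describe.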

### 2.2 Set Up and Run CISD

In [None]:
# Initialize models for CISD
propensity_model = LogisticRegression()
outcome_models = {
    '0': RandomForestRegressor(n_estimators=100, random_state=42),
    '1': RandomForestRegressor(n_estimators=100, random_state=42)
}
# Mediator model
mediator_model = RandomForestRegressor(n_estimators=100, random_state=42)

# Initialize CISD with different scenario selectors
cisd_factual = CISD(
    scenario_selector=factual_scenario_selector,
    propensity_model=propensity_model,
    outcome_model=outcome_models,
    mediator_model=mediator_model,
    random_state=42
)

cisd_fixed = CISD(
    scenario_selector=fixed_scenario_selector,
    propensity_model=propensity_model,
    outcome_model=outcome_models,
    mediator_model=mediator_model,
    random_state=42
)

cisd_safety = CISD(
    scenario_selector=safety_improvement_selector,
    propensity_model=propensity_model,
    outcome_model=outcome_models,
    mediator_model=mediator_model,
    random_state=42
)

cisd_distance = CISD(
    scenario_selector=distance_adaptive_selector,
    propensity_model=propensity_model,
    outcome_model=outcome_models,
    mediator_model=mediator_model,
    random_state=42
)

In [None]:
# Fit CISD models
cisd_factual.fit(X_train, D_train, Y_train, M_train)
cisd_fixed.fit(X_train, D_train, Y_train, M_train)
cisd_safety.fit(X_train, D_train, Y_train, M_train)
cisd_distance.fit(X_train, D_train, Y_train, M_train)

In [None]:
# Estimate effects on test data
factual_effect = cisd_factual.estimate(X_test, D_test, Y_test, M_test)
fixed_effect = cisd_fixed.estimate(X_test, D_test, Y_test, M_test)
safety_effect = cisd_safety.estimate(X_test, D_test, Y_test, M_test)
distance_effect = cisd_distance.estimate(X_test, D_test, Y_test, M_test)

# Display results
results = pd.DataFrame({
    'Scenario': ['Factual (ATE)', 'Fixed Active Mode', 'Safety Improvement', 'Distance-Adaptive'],
    'Effect Estimate': [
        factual_effect['estimate'],
        fixed_effect['estimate'],
        safety_effect['estimate'],
        distance_effect['estimate']
    ],
    'Lower CI': [
        factual_effect['conf_int_lower'],
        fixed_effect['conf_int_lower'],
        safety_effect['conf_int_lower'],
        distance_effect['conf_int_lower']
    ],
    'Upper CI': [
        factual_effect['conf_int_upper'],
        fixed_effect['conf_int_upper'],
        safety_effect['conf_int_upper'],
        distance_effect['conf_int_upper']
    ]
})

results

In [None]:
# Visualize results
plt.figure(figsize=(12, 8))
sns.barplot(x='Scenario', y='Effect Estimate', data=results, color='skyblue')
plt.errorbar(
    x=np.arange(len(results)),
    y=results['Effect Estimate'],
    yerr=[(results['Effect Estimate'] - results['Lower CI']), 
          (results['Upper CI'] - results['Effect Estimate'])],
    fmt='none', color='black', capsize=5
)
plt.axhline(y=0, color='red', linestyle='--')
plt.title('CISD Estimates for Different Scenarios')
plt.ylabel('Effect on Well-being (standardized)')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
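
The estimates above come from the `cisd` package, whose internals are not shown here. As intuition for what a scenario-pinned contrast measures, the sketch below uses a simple plug-in (g-computation) estimator on hypothetical data: fit an outcome model on treatment, mediator, and covariate, then compare predictions at `d = 1` and `d = 0` with the mediator held at the scenario value. The function `plug_in_effect` and all coefficients are assumptions for illustration, not the package's method.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 4_000
x = rng.normal(size=n)                                   # covariate
d = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x)))          # confounded treatment
m = 0.6 * d + 0.3 * x + rng.normal(scale=0.5, size=n)    # mediator
y = 0.5 * d + 0.4 * m + 0.2 * x + rng.normal(scale=0.3, size=n)

# Outcome model on (treatment, mediator, covariate)
model = LinearRegression().fit(np.column_stack([d, m, x]), y)

def plug_in_effect(model, x, scenario):
    """Contrast predictions at d=1 and d=0 with the mediator pinned to `scenario`."""
    ones, zeros = np.ones_like(x), np.zeros_like(x)
    y1 = model.predict(np.column_stack([ones, scenario, x]))
    y0 = model.predict(np.column_stack([zeros, scenario, x]))
    return float(np.mean(y1 - y0))

# Fixed scenario: every unit's mediator is pinned to 1.0
effect_fixed = plug_in_effect(model, x, np.full(n, 1.0))
print(f"Effect with mediator pinned: {effect_fixed:.3f}")
```

By construction the total effect in this toy model is 0.5 + 0.4 × 0.6 = 0.74, but pinning the mediator blocks the indirect path, so the estimate recovers only the direct effect of about 0.5. This is why a fixed-scenario estimate can differ from a factual one.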

### 2.3 Incremental Scenario Effect

Now we'll demonstrate how to estimate the incremental scenario effect $\delta(s)$, as defined in Equation (29) of the paper:

In [None]:
# Define base and new scenarios
base_scenario = factual_scenario_selector(X_test)
safety_scenario = safety_improvement_selector(X_test)

# Compute incremental scenario effect
incremental_effect = cisd_factual.incremental_scenario_effect(
    X_test, base_scenario, safety_scenario
)

print(f"Incremental effect of safety improvement: {incremental_effect['estimate']:.4f}")
print(f"95% CI: [{incremental_effect['conf_int_lower']:.4f}, {incremental_effect['conf_int_upper']:.4f}]")
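
The sketch below illustrates the idea of an incremental scenario effect on hypothetical data. It assumes, for illustration only, that the quantity of interest is the change in the treatment effect when the mediator moves from a base value to a new value; the paper's Equation (29) remains the authoritative definition, and `incremental` is not the package's `incremental_scenario_effect`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3_000
d = rng.binomial(1, 0.5, size=n)
m = rng.normal(size=n)
# Hypothetical outcome: the treatment effect grows with the mediator (0.5 + 0.3*m)
y = 0.5 * d + 0.4 * m + 0.3 * d * m + rng.normal(scale=0.3, size=n)

def fit_ols(d, m, y):
    """Least-squares fit of y on [1, d, m, d*m]; returns the coefficient vector."""
    design = np.column_stack([np.ones_like(m), d, m, d * m])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def effect_at(coef, s):
    """Treatment effect with the mediator pinned to s: beta_d + beta_dm * s."""
    return coef[1] + coef[3] * s

def incremental(d, m, y, s_base, s_new):
    """Change in the treatment effect when moving from s_base to s_new."""
    coef = fit_ols(d, m, y)
    return effect_at(coef, s_new) - effect_at(coef, s_base)

delta = incremental(d, m, y, s_base=0.0, s_new=1.0)

# Percentile bootstrap for a rough 95% interval
boot = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    boot.append(incremental(d[idx], m[idx], y[idx], 0.0, 1.0))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"delta: {delta:.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
```

The true increment in this toy model is 0.3, the coefficient on the treatment-by-mediator interaction.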

## 3. Stochastic Scenarios for Policy Simulation

One of the key features of CISD is the ability to define stochastic scenarios where mediator values are drawn from a distribution rather than fixed. Let's implement this capability.

In [None]:
# First, let's create a simple model to predict mediators
class StochasticScenarioModel:
    def __init__(self):
        self.active_mode_model = LogisticRegression()
        self.physical_activity_model = LinearRegression()
        self.safety_model = LinearRegression()
    
    def fit(self, X, M):
        """Fit models to predict mediators from covariates."""
        # Active mode model
        self.active_mode_model.fit(X, M[:, 0])
        
        # Physical activity model
        self.physical_activity_model.fit(X, M[:, 1])
        
        # Safety perception model
        self.safety_model.fit(X, M[:, 2])
        
        return self
    
    def predict(self, X):
        """Predict mediator values."""
        n_samples = X.shape[0]
        M_pred = np.zeros((n_samples, 3))
        
        # Predict active mode probabilities
        active_probs = self.active_mode_model.predict_proba(X)[:, 1]
        M_pred[:, 0] = np.random.binomial(1, active_probs)
        
        # Predict physical activity
        pa_pred = self.physical_activity_model.predict(X)
        M_pred[:, 1] = pa_pred + np.random.normal(0, 0.5, n_samples)
        
        # Predict safety perception
        safety_pred = self.safety_model.predict(X)
        M_pred[:, 2] = safety_pred + np.random.normal(0, 0.5, n_samples)
        
        return M_pred

# Train the stochastic scenario model
stochastic_model = StochasticScenarioModel().fit(X_train, M_train)

# Define a stochastic scenario selector
def stochastic_scenario_selector(X):
    """Generate mediator values stochastically."""
    return stochastic_model.predict(X)

# Modified stochastic scenario with safety intervention
def stochastic_safety_intervention(X):
    """Generate mediator values stochastically with safety intervention."""
    M_pred = stochastic_model.predict(X)
    # Shift perceived safety up by 1.0 (in the mediator's own units)
    M_pred[:, 2] += 1.0
    return M_pred
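
A stochastic selector returns one random realization of the mediators each time it is called, so an estimate based on a single draw carries Monte Carlo noise. The sketch below shows one way to handle this, under hypothetical assumptions: average the scenario-specific effect over repeated draws. The functions `draw_scenario` and `effect_given` are illustrative stand-ins, not part of the `cisd` package.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000
x = rng.normal(size=n)

def draw_scenario(x, rng):
    """Hypothetical stochastic scenario: mediator ~ Normal(0.3*x, 0.5)."""
    return 0.3 * x + rng.normal(scale=0.5, size=x.shape[0])

def effect_given(s):
    """Hypothetical unit-level treatment effect when the mediator equals s."""
    return 0.5 + 0.3 * s

# One realization of the stochastic scenario
single = effect_given(draw_scenario(x, rng)).mean()

# Average over repeated realizations to smooth out Monte Carlo noise
draws = np.array([effect_given(draw_scenario(x, rng)).mean() for _ in range(200)])

print(f"one draw: {single:.3f}")
print(f"mean over 200 draws: {draws.mean():.3f} (sd across draws {draws.std():.3f})")
```

The spread across draws indicates how much of the variation between two stochastic-scenario estimates could be due to sampling the scenario rather than to the intervention itself.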

In [None]:
# Initialize CISD with stochastic scenarios
cisd_stochastic = CISD(
    scenario_selector=stochastic_scenario_selector,
    propensity_model=propensity_model,
    outcome_model=outcome_models,
    mediator_model=mediator_model,
    random_state=42
)

cisd_stochastic_safety = CISD(
    scenario_selector=stochastic_safety_intervention,
    propensity_model=propensity_model,
    outcome_model=outcome_models,
    mediator_model=mediator_model,
    random_state=42
)

# Fit CISD models
cisd_stochastic.fit(X_train, D_train, Y_train, M_train)
cisd_stochastic_safety.fit(X_train, D_train, Y_train, M_train)

# Estimate effects
stochastic_effect = cisd_stochastic.estimate(X_test, D_test, Y_test, M_test)
stochastic_safety_effect = cisd_stochastic_safety.estimate(X_test, D_test, Y_test, M_test)

# Add results to our table
stochastic_results = pd.DataFrame({
    'Scenario': ['Stochastic Baseline', 'Stochastic with Safety Intervention'],
    'Effect Estimate': [
        stochastic_effect['estimate'],
        stochastic_safety_effect['estimate']
    ],
    'Lower CI': [
        stochastic_effect['conf_int_lower'],
        stochastic_safety_effect['conf_int_lower']
    ],
    'Upper CI': [
        stochastic_effect['conf_int_upper'],
        stochastic_safety_effect['conf_int_upper']
    ]
})

# Combine with previous results
all_results = pd.concat([results, stochastic_results], ignore_index=True)
all_results

In [None]:
# Visualize all results
plt.figure(figsize=(14, 8))
sns.barplot(x='Scenario', y='Effect Estimate', data=all_results, color='skyblue')
plt.errorbar(
    x=np.arange(len(all_results)),
    y=all_results['Effect Estimate'],
    yerr=[(all_results['Effect Estimate'] - all_results['Lower CI']), 
          (all_results['Upper CI'] - all_results['Effect Estimate'])],
    fmt='none', color='black', capsize=5
)
plt.axhline(y=0, color='red', linestyle='--')
plt.title('CISD Estimates for Different Scenarios Including Stochastic Scenarios')
plt.ylabel('Effect on Well-being (standardized)')
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

## 4. AI-Augmented Causal Inference with Multimodal Data

Now let's demonstrate the three-layer architecture (Φ → Ψ → Γ) for causal inference with multimodal data, as described in Section 5 of the paper. In a real-world application, we would use actual images, text, GPS traces, etc., but for this tutorial, we'll simulate these modalities.

Note: The implementation below is simplified for demonstration purposes. In a real application, you would use proper deep learning models for the representation learning component.

In [None]:
# Import the three-layer architecture
from cisd.ai_pipeline import ThreeLayerArchitecture, ActiveBERTDML
from cisd.representation import StreetviewEncoder, TextEncoder, MultiModalEncoder
from cisd.balancing import KernelMMD
from cisd.causal import DoublyRobust

# Simulate multimodal data
def generate_multimodal_data(df, n_samples=None):
    """Generate simulated multimodal data based on the dataframe."""
    if n_samples is None:
        n_samples = len(df)
    else:
        n_samples = min(n_samples, len(df))
        
    # Simulated streetview images (224x224 RGB)
    # In reality, these would be loaded from files or an API
    images = np.random.rand(n_samples, 224, 224, 3)
    
    # Simulated textual comments
    # In reality, these would be actual text data
    sentences = [
        "Clean sidewalks and good lighting make walking feel safe.",
        "No bike lanes and lots of traffic, very stressful commute.",
        "Beautiful greenery along the path, really enjoy my walk.",
        "Crosswalks are well-marked, but drivers don't always stop.",
        "Too many cars and narrow sidewalks, don't feel comfortable walking.",
        "New infrastructure has made biking much safer in my neighborhood.",
        "Love the separated bike lanes that were recently added.",
        "Sidewalks are broken and uneven, difficult to navigate.",
        "Pedestrian signals give plenty of time to cross safely.",
        "Traffic calming measures have made walking much more pleasant."
    ]
    
    # Generate text based on treatment and active_mode
    texts = []
    for i in range(n_samples):
        if df.iloc[i]['treatment'] == 1 and df.iloc[i]['active_mode'] == 1:
            # PFIP + active mode: generally positive
            idx = np.random.choice([0, 2, 5, 6, 8, 9])
        elif df.iloc[i]['treatment'] == 1 and df.iloc[i]['active_mode'] == 0:
            # PFIP + motorized: mixed
            idx = np.random.choice([0, 3, 8, 9])
        elif df.iloc[i]['treatment'] == 0 and df.iloc[i]['active_mode'] == 1:
            # No PFIP + active mode: mixed
            idx = np.random.choice([2, 3, 4, 7])
        else:
            # No PFIP + motorized: generally negative
            idx = np.random.choice([1, 4, 7])
        texts.append(sentences[idx])
    
    return images, texts

# Generate multimodal data
# Recover the row positions of the earlier split: the same sample count and
# random_state reproduce the same train/test partition
idx_train, idx_test = train_test_split(range(len(df)), test_size=0.2, random_state=42)
images_train, texts_train = generate_multimodal_data(df.iloc[idx_train])
images_test, texts_test = generate_multimodal_data(df.iloc[idx_test])

# Show some example text data
for i in range(5):
    print(f"Example {i+1}: {texts_train[i]}")
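
The multimodal arrays are generated by re-running `train_test_split` on row positions with the same `random_state` as the tabular split. The short check below, on a small hypothetical array, confirms that this keeps rows aligned: splitting the data directly and splitting its indices give the same partition.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n = 50
X = np.arange(n * 2).reshape(n, 2)   # row i holds the values (2i, 2i + 1)

# Split the data directly, then split the row positions with the same settings
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)
idx_train, idx_test = train_test_split(np.arange(n), test_size=0.2, random_state=42)

# Same sample count and random_state give the same partition
assert np.array_equal(X_train, X[idx_train])
assert np.array_equal(X_test, X[idx_test])
print(f"{len(idx_train)} train rows and {len(idx_test)} test rows are aligned")
```

The alignment holds only while the sample count, `test_size`, and `random_state` all match; changing any of them in one call but not the other would silently pair images and text with the wrong rows.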

In [None]:
# Initialize the ActiveBERTDML model
active_bert_dml = ActiveBERTDML(
    image_encoder=StreetviewEncoder(pretrained=True, embedding_dim=128),
    text_encoder=TextEncoder(embedding_dim=128, model_name='bert-base-uncased'),
    balancer=KernelMMD(kernel='rbf', n_iterations=100),  # Simplified for tutorial
    causal_learner=None,  # Use default
    fusion_method='concatenate',
    latent_dim=64
)

# Fit the model
print("Fitting the ActiveBERTDML model...")
active_bert_dml.fit(
    images=images_train,
    texts=texts_train,
    D=D_train,
    Y=Y_train
)
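
The balancing layer is configured with an RBF-kernel MMD. The internals of `KernelMMD` are not shown in this tutorial, so the sketch below only illustrates the general quantity such a balancer would aim to reduce: the squared maximum mean discrepancy between treated and control embeddings. The embeddings, the bandwidth `gamma`, and the helper names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """RBF kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(a, b, gamma=0.5):
    """Biased (V-statistic) estimate of the squared MMD between samples a and b."""
    return (rbf_kernel(a, a, gamma).mean()
            + rbf_kernel(b, b, gamma).mean()
            - 2 * rbf_kernel(a, b, gamma).mean())

rng = np.random.default_rng(0)
treated = rng.normal(loc=0.5, size=(300, 4))    # hypothetical shifted embeddings
control = rng.normal(loc=0.0, size=(300, 4))
control_like = rng.normal(loc=0.0, size=(300, 4))

print(f"MMD^2 treated vs control:      {mmd2(treated, control):.4f}")
print(f"MMD^2 control vs control-like: {mmd2(control, control_like):.4f}")
```

A larger value means the two groups occupy different regions of the representation space; a balancing step would reweight or transform the embeddings to bring this value toward zero.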

In [None]:
# Estimate the effect
print("Estimating causal effects...")
ai_effect = active_bert_dml.estimate(
    images=images_test,
    texts=texts_test,
    D=D_test,
    Y=Y_test
)

# Display results
print(f"\nAI-augmented causal effect estimate: {ai_effect['ate']:.4f}")
print(f"Standard error: {ai_effect['std_err']:.4f}")
print(f"95% CI: [{ai_effect['conf_int'][0]:.4f}, {ai_effect['conf_int'][1]:.4f}]")
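
The reported estimate, standard error, and interval are the typical outputs of a doubly robust estimator. The sketch below shows a plain augmented inverse-probability-weighting (AIPW) estimator on hypothetical embeddings to illustrate how those three numbers can be produced. It omits cross-fitting for brevity and is not the implementation behind `ActiveBERTDML` or `DoublyRobust`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(3)
n = 5_000
z = rng.normal(size=(n, 3))                        # hypothetical learned embeddings
p = 1 / (1 + np.exp(-(0.6 * z[:, 0])))             # true propensity
d = rng.binomial(1, p)
# Hypothetical outcome with a true average treatment effect of 0.5
y = 0.5 * d + z @ np.array([0.8, 0.3, -0.2]) + rng.normal(scale=0.5, size=n)

# Nuisance models: propensity e(z) and outcome regressions mu_1(z), mu_0(z)
e_hat = LogisticRegression().fit(z, d).predict_proba(z)[:, 1]
e_hat = np.clip(e_hat, 0.01, 0.99)                 # guard against extreme weights
mu1 = LinearRegression().fit(z[d == 1], y[d == 1]).predict(z)
mu0 = LinearRegression().fit(z[d == 0], y[d == 0]).predict(z)

# AIPW score per unit; its mean is the ATE estimate
psi = mu1 - mu0 + d * (y - mu1) / e_hat - (1 - d) * (y - mu0) / (1 - e_hat)
ate = psi.mean()
std_err = psi.std(ddof=1) / np.sqrt(n)
ci = (ate - 1.96 * std_err, ate + 1.96 * std_err)

print(f"ATE: {ate:.3f}, SE: {std_err:.3f}, 95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

The estimator stays consistent if either the propensity model or the outcome models are correctly specified, which is the sense in which it is "doubly robust".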

## 5. Policy Simulations and Recommendations

Let's use our CISD framework to simulate different policy scenarios and make recommendations for active transportation interventions.

In [None]:
# Define different policy scenarios

# 1. Base PFIP
def base_pfip_scenario(X):
    return None  # Factual scenario

# 2. PFIP + Safety improvements
def pfip_safety_scenario(X):
    n_samples = X.shape[0]
    S = np.zeros((n_samples, 3))
    S[:, 0] = np.nan  # Keep factual active mode
    S[:, 1] = np.nan  # Keep factual physical activity
    S[:, 2] = 1.5     # Major safety improvement
    return S

# 3. PFIP + Distance reduction (e.g., through mixed land use)
def pfip_distance_scenario(X):
    # Simulate reduced distance (by 30%)
    X_mod = X.copy()
    X_mod[:, 1] = X[:, 1] * 0.7  # Reduce distance
    n_samples = X.shape[0]
    S = np.zeros((n_samples, 3))
    S[:, 0] = np.nan  # Will be determined by model
    S[:, 1] = np.nan  # Will be determined by model
    S[:, 2] = np.nan  # Will be determined by model
    return S, X_mod  # Return modified features as well

# 4. PFIP + Combined intervention (safety + distance)
def pfip_combined_scenario(X):
    # Simulate reduced distance (by 30%)
    X_mod = X.copy()
    X_mod[:, 1] = X[:, 1] * 0.7  # Reduce distance
    n_samples = X.shape[0]
    S = np.zeros((n_samples, 3))
    S[:, 0] = np.nan  # Will be determined by model
    S[:, 1] = np.nan  # Will be determined by model
    S[:, 2] = 1.5     # Major safety improvement
    return S, X_mod  # Return modified features as well

In [None]:
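# For simplicity, reuse the fitted factual model and pass each policy scenario
# in place of the observed mediators.
# Note: safety_effect and distance_effect below overwrite the Section 2 results.
cisd_model = cisd_factual  # Using the model we already trained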

# Get base scenario effect
base_effect = cisd_model.estimate(X_test, D_test, Y_test, M_test)

# Get safety scenario effect (only changing the mediators)
safety_scenario = pfip_safety_scenario(X_test)
safety_effect = cisd_model.estimate(X_test, D_test, Y_test, safety_scenario)

# Get distance scenario effect (changing both mediators and features)
distance_scenario, X_distance = pfip_distance_scenario(X_test)
distance_effect = cisd_model.estimate(X_distance, D_test, Y_test, distance_scenario)

# Get combined scenario effect
combined_scenario, X_combined = pfip_combined_scenario(X_test)
combined_effect = cisd_model.estimate(X_combined, D_test, Y_test, combined_scenario)

# Create a policy summary table
policy_results = pd.DataFrame({
    'Policy Scenario': [
        'Base PFIP Only',
        'PFIP + Safety Improvements',
        'PFIP + Distance Reduction',
        'PFIP + Combined Intervention'
    ],
    'Well-being Effect': [
        base_effect['estimate'],
        safety_effect['estimate'],
        distance_effect['estimate'],
        combined_effect['estimate']
    ],
    '95% CI': [
        f"[{base_effect['conf_int_lower']:.4f}, {base_effect['conf_int_upper']:.4f}]",
        f"[{safety_effect['conf_int_lower']:.4f}, {safety_effect['conf_int_upper']:.4f}]",
        f"[{distance_effect['conf_int_lower']:.4f}, {distance_effect['conf_int_upper']:.4f}]",
        f"[{combined_effect['conf_int_lower']:.4f}, {combined_effect['conf_int_upper']:.4f}]"
    ],
    'Implementation Complexity': [
        'Low',
        'Medium',
        'High',
        'Very High'
    ],
    'Cost Estimate': [
        '$',
        '$$',
        '$$$',
        '$$$$'
    ]
})

policy_results

In [None]:
# Visualize policy scenarios
plt.figure(figsize=(12, 8))
bar_colors = ['skyblue', 'lightgreen', 'salmon', 'mediumpurple']
bars = plt.bar(policy_results['Policy Scenario'], policy_results['Well-being Effect'], color=bar_colors)

# Add confidence intervals
ci_lower = [base_effect['conf_int_lower'], safety_effect['conf_int_lower'], 
            distance_effect['conf_int_lower'], combined_effect['conf_int_lower']]
ci_upper = [base_effect['conf_int_upper'], safety_effect['conf_int_upper'], 
            distance_effect['conf_int_upper'], combined_effect['conf_int_upper']]

plt.errorbar(
    x=np.arange(len(policy_results)),
    y=policy_results['Well-being Effect'],
    yerr=[(policy_results['Well-being Effect'] - ci_lower), 
          (np.array(ci_upper) - policy_results['Well-being Effect'])],
    fmt='none', color='black', capsize=5
)

# Add labels showing cost and complexity
for i, bar in enumerate(bars):
    plt.text(
        bar.get_x() + bar.get_width()/2,
        bar.get_height() + 0.03,
        f"{policy_results['Cost Estimate'].iloc[i]} | {policy_results['Implementation Complexity'].iloc[i]}",
        ha='center', va='bottom', fontweight='bold'
    )

plt.axhline(y=0, color='red', linestyle='--')
plt.title('Policy Scenario Comparison: Well-being Effects')
plt.ylabel('Effect on Well-being (standardized)')
plt.xticks(rotation=15, ha='right')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

# Output recommendation
best_policy_idx = policy_results['Well-being Effect'].argmax()
best_policy = policy_results['Policy Scenario'].iloc[best_policy_idx]
best_effect = policy_results['Well-being Effect'].iloc[best_policy_idx]

print(f"\n🏆 Recommended Policy: {best_policy}")
print(f"   Estimated Well-being Effect: {best_effect:.4f}")
print(f"   95% CI: {policy_results['95% CI'].iloc[best_policy_idx]}")
print(f"   Implementation Complexity: {policy_results['Implementation Complexity'].iloc[best_policy_idx]}")
print(f"   Cost Estimate: {policy_results['Cost Estimate'].iloc[best_policy_idx]}")

# Calculate efficiency (effect per cost unit)
cost_mapping = {'$': 1, '$$': 2, '$$$': 3, '$$$$': 4}
policy_results['Cost Number'] = policy_results['Cost Estimate'].map(cost_mapping)
policy_results['Efficiency'] = policy_results['Well-being Effect'] / policy_results['Cost Number']

most_efficient_idx = policy_results['Efficiency'].argmax()
most_efficient_policy = policy_results['Policy Scenario'].iloc[most_efficient_idx]

print(f"\n💰 Most Cost-Efficient Policy: {most_efficient_policy}")
print(f"   Efficiency Ratio: {policy_results['Efficiency'].iloc[most_efficient_idx]:.4f} effect per cost unit")
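
The recommendation above ranks policies by point estimate alone. The sketch below adds a rough uncertainty check: it flags every policy whose confidence interval overlaps that of the top-ranked one. The numbers are hypothetical and for illustration only (they are not outputs of this tutorial), and interval overlap is a heuristic rather than a formal test of differences.

```python
import pandas as pd

# Hypothetical estimates for illustration only (not tutorial outputs)
policies = pd.DataFrame({
    'policy': ['A', 'B', 'C'],
    'effect': [0.10, 0.42, 0.45],
    'ci_lower': [0.02, 0.30, 0.25],
    'ci_upper': [0.18, 0.54, 0.65],
})

# Top-ranked policy by point estimate
best = policies.loc[policies['effect'].idxmax()]

# A policy is clearly worse only if its whole interval lies below the best one's
policies['overlaps_best'] = policies['ci_upper'] >= best['ci_lower']

print(policies[['policy', 'effect', 'overlaps_best']])
```

When a cheaper policy overlaps the best one, as policy B does here, the data do not clearly separate them, and cost or complexity may reasonably decide the choice.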

## 6. Conclusion and Further Work

In this tutorial, we've demonstrated the implementation of the Causal-Intervention Scenario Design (CISD) framework and AI-augmented causal inference approach for active transportation research. We've covered:

1. The core CISD framework for estimating treatment effects conditional on specific scenarios
2. Different types of scenario selectors (fixed, stochastic, adaptive)
3. Incremental scenario effects for measuring marginal policy benefits
4. AI-augmented causal inference with multimodal data using the three-layer architecture
5. Policy simulation and recommendation based on well-being effects

This implementation demonstrates the potential of CISD for evidence-based transportation policy design, enabling researchers to answer complex questions about the causal pathways linking infrastructure interventions to well-being outcomes.

Further work could include:
- Extending the framework to handle spatial dependencies in transportation networks
- Incorporating longitudinal data to capture dynamic effects over time
- Scaling the AI pipeline to handle real-world image and GPS data
- Developing causal discovery methods to learn the underlying DAG structure from data
- Implementing sensitivity analyses to test the robustness of findings to unmeasured confounding

The CISD framework provides a unifying mathematical language for causal narratives in transportation research while remaining compatible with advanced machine learning estimators for efficient causal effect estimation.