# Day 3: ML Predictions to ABM Inputs

**WISE Workshop | Addis Ababa, Feb 2026**

This notebook bridges machine learning and agent-based modeling. You'll learn how to:
- Generate demand forecasts from your trained ML model
- Add uncertainty bounds to predictions
- Format outputs for simulation inputs
- See a simple ABM demonstration

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sysylvia/ethiopia-ds-workshop-2026/blob/main/notebooks/04-ml-to-abm.ipynb)

## Setup

In [None]:
# Import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
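import seaborn as sns
from IPython.display import display  # built in to Jupyter/Colab; explicit import needed only outside a notebook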

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)
print("Packages loaded!")

## Part 1: Recap - Train the ML Model

First, we'll quickly retrain our best model from Day 2.

In [None]:
# Load data
url = "https://raw.githubusercontent.com/sysylvia/ethiopia-ds-workshop-2026/main/data/supply-chain-sample.csv"
df = pd.read_csv(url)

print(f"Loaded {len(df)} records from {df['facility_id'].nunique()} facilities")
print(f"Regions: {df['region'].unique().tolist()}")

In [None]:
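# Prepare features: integer-encode the categorical columns
# (facility_encoded encodes facility_type, not individual facilities)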
le_region = LabelEncoder()
le_facility = LabelEncoder()
le_season = LabelEncoder()

df['region_encoded'] = le_region.fit_transform(df['region'])
df['facility_encoded'] = le_facility.fit_transform(df['facility_type'])
df['season_encoded'] = le_season.fit_transform(df['season'])

feature_cols = [
    'population_served', 'month', 'previous_demand', 
    'distance_to_warehouse', 'stockout_last_month', 
    'avg_delivery_days', 'storage_capacity',
    'region_encoded', 'facility_encoded', 'season_encoded'
]

X = df[feature_cols]
y = df['actual_demand']
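`LabelEncoder` assigns integer codes in the sorted order of the distinct labels, so the codes impose an arbitrary ordering that tree models can split on. This is usually workable for Random Forests but would be misleading for linear models. A label that was not seen during fitting raises an error at transform time, which matters if a new region appears later. A minimal sketch with illustrative region names (not the workshop data):

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels, not taken from the workshop dataset
regions = ["Oromia", "Addis Ababa", "Amhara", "Oromia"]

le = LabelEncoder()
codes = le.fit_transform(regions)

# Codes follow the sorted order of the distinct labels
print(le.classes_.tolist())  # ['Addis Ababa', 'Amhara', 'Oromia']
print(codes.tolist())        # [2, 0, 1, 2]

# inverse_transform maps codes back to labels
print(le.inverse_transform([0, 2]).tolist())  # ['Addis Ababa', 'Oromia']

# A label unseen during fit raises ValueError
try:
    le.transform(["Tigray"])
except ValueError as err:
    print("Unseen label:", err)
```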

In [None]:
# Train model with best hyperparameters from Day 2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(
    n_estimators=100,
    max_depth=10,
    min_samples_leaf=2,
    random_state=42,
    n_jobs=-1
)
model.fit(X_train, y_train)

# Evaluate
test_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, test_pred))
print(f"Model trained! Test RMSE: {rmse:.2f}")

## Part 2: Generate Predictions for All Facilities

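The ABM needs a demand forecast for every facility and month to drive agent behavior. Most of these rows were also used to train the model, so fit statistics computed on the full dataset will look better than the held-out test RMSE above.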

In [None]:
# Generate predictions for the full dataset
df['predicted_demand'] = model.predict(X)

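# Compare predictions to actuals
print("Predictions generated!")
# All rows (about 80% were training data): an optimistic measure
print(f"\nCorrelation, all rows (mostly in-sample): {df['predicted_demand'].corr(df['actual_demand']):.3f}")
# Held-out test rows only: a fairer estimate of forecast quality
print(f"Correlation, held-out test rows: {np.corrcoef(test_pred, y_test)[0, 1]:.3f}")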

In [None]:
# View sample predictions
sample = df[['facility_name', 'region', 'month', 'actual_demand', 'predicted_demand']].head(15)
sample['error'] = sample['predicted_demand'] - sample['actual_demand']
display(sample)

## Part 3: Add Uncertainty to Predictions

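**Key insight:** ABMs benefit from uncertainty estimates, not just point predictions: agents can use them to decide, for example, how much safety stock to hold.

With a Random Forest, a cheap uncertainty estimate is the spread (standard deviation) of the individual trees' predictions. Treat it as a rough guide: it reflects disagreement between trees (model uncertainty), not the month-to-month noise in demand, so intervals built from it are usually narrower than true prediction intervals.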

In [None]:
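# Get predictions from each tree in the forest
# (the individual trees were fitted on a NumPy array, so pass one to avoid feature-name warnings)
all_tree_predictions = np.array([tree.predict(X.to_numpy()) for tree in model.estimators_])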

# Calculate statistics across trees
df['pred_mean'] = all_tree_predictions.mean(axis=0)
df['pred_std'] = all_tree_predictions.std(axis=0)
df['pred_lower'] = df['pred_mean'] - 1.96 * df['pred_std']  # approx. 95% lower bound (assumes roughly normal spread)
df['pred_upper'] = df['pred_mean'] + 1.96 * df['pred_std']  # approx. 95% upper bound

# Ensure non-negative predictions
df['pred_lower'] = df['pred_lower'].clip(lower=0)

print("Uncertainty estimates added!")
print(f"Average prediction uncertainty (std): {df['pred_std'].mean():.2f}")
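The cell above relies on the fact that a `RandomForestRegressor` prediction is the average of its trees' predictions, so the per-tree spread sits naturally around the forest's point estimate. A minimal sketch on synthetic data (the data, tree count, and query points are assumptions for illustration) that checks this and builds the same approximate interval. Quantile regression forests or conformal prediction are alternatives when calibrated intervals are needed; they are not used here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: two features plus noise (illustrative only)
rng = np.random.default_rng(1)
X_syn = rng.uniform(0, 10, size=(300, 2))
y_syn = 15 * X_syn[:, 0] + 5 * X_syn[:, 1] + rng.normal(0, 10, size=300)

rf = RandomForestRegressor(n_estimators=50, max_depth=10, min_samples_leaf=2, random_state=0)
rf.fit(X_syn, y_syn)

X_new = np.array([[2.0, 3.0], [8.0, 1.0]])
# One row of predictions per tree: shape (n_trees, n_rows)
per_tree = np.stack([tree.predict(X_new) for tree in rf.estimators_])

mean = per_tree.mean(axis=0)
std = per_tree.std(axis=0)
lower = np.clip(mean - 1.96 * std, 0, None)  # demand cannot be negative
upper = mean + 1.96 * std

# The forest's own prediction equals the average over its trees
print(np.allclose(mean, rf.predict(X_new)))
print(np.round(lower, 1), np.round(upper, 1))
```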

In [None]:
# Visualize predictions with uncertainty for one facility
facility = df[df['facility_name'] == 'Yekatit 12 Hospital'].sort_values('month').copy()

fig, ax = plt.subplots(figsize=(12, 5))

months = facility['month']
ax.plot(months, facility['actual_demand'], 'ko-', label='Actual', markersize=8)
ax.plot(months, facility['pred_mean'], 'b-', label='Predicted', linewidth=2)
ax.fill_between(months, facility['pred_lower'], facility['pred_upper'],
                alpha=0.3, color='blue', label='Approx. 95% interval (tree spread)')

ax.set_xlabel('Month')
ax.set_ylabel('Demand (units)')
ax.set_title('Yekatit 12 Hospital: Demand Predictions with Uncertainty')
ax.legend()
ax.set_xticks(range(1, 13))
plt.tight_layout()
plt.show()

## Part 4: Format Predictions for ABM Input

Agent-based models typically need predictions in a specific format:
- One row per facility-time period
- Point estimate + uncertainty bounds
- Additional context (facility type, region, etc.)

In [None]:
# Create ABM input file
abm_input = df[[
    'facility_id', 'facility_name', 'facility_type', 'region',
    'month', 'year',
    'pred_mean', 'pred_lower', 'pred_upper', 'pred_std',
    'distance_to_warehouse', 'storage_capacity', 'avg_delivery_days'
]].copy()

# Rename for clarity
abm_input = abm_input.rename(columns={
    'pred_mean': 'expected_demand',
    'pred_lower': 'demand_95ci_lower',
    'pred_upper': 'demand_95ci_upper',
    'pred_std': 'demand_uncertainty'
})

print("ABM Input Format:")
print(f"Shape: {abm_input.shape}")
display(abm_input.head(10))
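Before handing the file to a simulation, it can help to check the format requirements listed above programmatically. A minimal sketch, assuming a hypothetical helper `check_abm_input` and treating `(facility_id, year, month)` as the key that must be unique; the toy table is illustrative, not workshop data:

```python
import pandas as pd

def check_abm_input(table: pd.DataFrame) -> list[str]:
    """Hypothetical helper: return problems found in an ABM forecast table (empty if none)."""
    required = ["facility_id", "year", "month", "expected_demand",
                "demand_95ci_lower", "demand_95ci_upper", "demand_uncertainty"]
    missing = [c for c in required if c not in table.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    # One row per facility-time period
    if table.duplicated(subset=["facility_id", "year", "month"]).any():
        problems.append("duplicate facility-month rows")
    # The bounds must bracket the point estimate
    inside = ((table["demand_95ci_lower"] <= table["expected_demand"])
              & (table["expected_demand"] <= table["demand_95ci_upper"]))
    if not inside.all():
        problems.append("bounds do not bracket expected_demand")
    if (table["demand_95ci_lower"] < 0).any():
        problems.append("negative lower bound")
    if table[required].isna().any().any():
        problems.append("missing values")
    return problems

# Toy table: F1 appears twice, and F2's lower bound exceeds its estimate
toy = pd.DataFrame({
    "facility_id": ["F1", "F1", "F2"],
    "year": [2024, 2024, 2024],
    "month": [1, 1, 1],
    "expected_demand": [100.0, 100.0, 50.0],
    "demand_95ci_lower": [80.0, 80.0, 60.0],
    "demand_95ci_upper": [120.0, 120.0, 70.0],
    "demand_uncertainty": [10.0, 10.0, 5.0],
})
print(check_abm_input(toy))
# ['duplicate facility-month rows', 'bounds do not bracket expected_demand']
```

In the notebook, the same check would be `check_abm_input(abm_input)`.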

In [None]:
# Save ABM input file
abm_input.to_csv('abm_demand_forecasts.csv', index=False)
print("Saved: abm_demand_forecasts.csv")

## Part 5: Aggregate Forecasts by Region

ABMs often need aggregate views for regional planning.

In [None]:
# Regional demand summary
regional_demand = df.groupby(['region', 'month']).agg({
    'actual_demand': 'sum',
    'pred_mean': 'sum',
    'pred_std': lambda x: np.sqrt((x**2).sum())  # Combine uncertainties (root-sum-of-squares; assumes independent facility errors)
}).reset_index()

regional_demand.columns = ['region', 'month', 'actual_total', 'predicted_total', 'uncertainty']

print("Regional Demand Summary:")
display(regional_demand.head(15))
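The root-sum-of-squares rule above holds only if facility errors are independent: variances add, so σ_region = sqrt(Σ σ_i²). If facilities share shocks (for example, a regional outbreak), their errors are correlated and the combined uncertainty is larger. A minimal sketch, assuming a single common correlation `rho` between every pair of facilities (a simplification for illustration):

```python
import numpy as np

def regional_std(stds, rho):
    """Std of a sum of facility demands, assuming one common pairwise correlation rho."""
    stds = np.asarray(stds, dtype=float)
    cov = rho * np.outer(stds, stds)   # off-diagonal covariances
    np.fill_diagonal(cov, stds ** 2)   # variances on the diagonal
    return float(np.sqrt(cov.sum()))

facility_std = [3.0, 4.0]  # illustrative per-facility uncertainties
print(regional_std(facility_std, rho=0.0))  # 5.0  (independent: root-sum-of-squares)
print(regional_std(facility_std, rho=1.0))  # 7.0  (perfectly correlated: stds add)
```

The notebook's aggregation corresponds to `rho=0`, the lower end of this range.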

In [None]:
# Visualize regional patterns
fig, ax = plt.subplots(figsize=(12, 6))

for region in df['region'].unique():
    region_data = regional_demand[regional_demand['region'] == region]
    ax.plot(region_data['month'], region_data['predicted_total'], 
            marker='o', label=region)

ax.set_xlabel('Month')
ax.set_ylabel('Total Predicted Demand')
ax.set_title('Predicted Regional Demand by Month')
ax.legend(title='Region')
ax.set_xticks(range(1, 13))
plt.tight_layout()
plt.show()

## Part 6: Simple ABM Demonstration

Here's a simple agent-based model that shows how ML predictions drive agent behavior.

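**Scenario:** Each month, every facility orders supplies from a regional warehouse based on its predicted demand, receives the delivery immediately (no lead time), and then serves the actual demand. The warehouse restocks at the end of each month.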

In [None]:
class Facility:
    """A health facility agent that orders supplies based on predicted demand."""
    
    def __init__(self, facility_id, name, storage_capacity, initial_stock=None):
        self.id = facility_id
        self.name = name
        self.storage_capacity = storage_capacity
        self.stock = initial_stock if initial_stock is not None else storage_capacity * 0.5
        self.stockout_days = 0  # counts months with unmet demand (one time step = one month)
        self.total_ordered = 0
        
    def decide_order(self, predicted_demand, uncertainty):
        """Decide how much to order based on ML predictions."""
        # Order enough to meet predicted demand + safety buffer
        safety_buffer = 1.5 * uncertainty  # Balanced: extra stock proportional to uncertainty
        target_stock = predicted_demand + safety_buffer
        
        # Don't exceed storage capacity
        target_stock = min(target_stock, self.storage_capacity)
        
        # Order if current stock is below target
        order_qty = max(0, target_stock - self.stock)
        return order_qty
    
    def receive_delivery(self, qty):
        """Receive a delivery from the warehouse."""
        self.stock += qty
        self.stock = min(self.stock, self.storage_capacity)  # Can't exceed capacity
        
    def serve_demand(self, actual_demand):
        """Serve patient demand, track stockouts."""
        served = min(self.stock, actual_demand)
        unmet = actual_demand - served
        self.stock -= served
        
        if unmet > 0:
            self.stockout_days += 1
            
        return served, unmet


class Warehouse:
    """A regional warehouse that fulfills facility orders."""
    
    def __init__(self, capacity=10000, initial_stock=8000):
        self.capacity = capacity
        self.stock = initial_stock
        self.unfulfilled_orders = 0
        
    def process_order(self, facility, qty):
        """Process an order from a facility."""
        fulfilled = min(self.stock, qty)
        self.stock -= fulfilled
        self.unfulfilled_orders += (qty - fulfilled)
        facility.total_ordered += qty
        facility.receive_delivery(fulfilled)
        return fulfilled
    
    def restock(self, qty):
        """Receive stock from central supply."""
        self.stock = min(self.stock + qty, self.capacity)


print("ABM classes defined!")
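The ordering rule in `Facility.decide_order` is an order-up-to policy: raise stock to the prediction plus a safety buffer, capped at storage capacity. Because delivery arrives before demand in this model, the target has to cover the whole month. A standalone sketch of the same arithmetic, using a hypothetical function `order_quantity` and made-up numbers:

```python
def order_quantity(predicted_demand, uncertainty, stock, capacity, k=1.5):
    """Same rule as Facility.decide_order: order up to min(prediction + k*uncertainty, capacity)."""
    target = min(predicted_demand + k * uncertainty, capacity)
    return max(0.0, float(target - stock))

# Prediction 100, uncertainty 20, 50 units on hand: target 100 + 1.5*20 = 130
print(order_quantity(100, 20, stock=50, capacity=300))   # 80.0
# Capacity binds: target is capped at 120
print(order_quantity(100, 20, stock=50, capacity=120))   # 70.0
# Already above target: no order
print(order_quantity(100, 20, stock=200, capacity=300))  # 0.0
```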

In [None]:
# Run a simple simulation for one region
def run_simulation(region_name, months=12):
    """Run ABM simulation for facilities in a region."""
    
    # Get facilities in this region
    region_data = df[df['region'] == region_name].copy()
    facility_ids = region_data['facility_id'].unique()
    
    # Create facility agents
    facilities = {}
    for fid in facility_ids:
        fac_data = region_data[region_data['facility_id'] == fid].iloc[0]
        facilities[fid] = Facility(
            fid, 
            fac_data['facility_name'],
            fac_data['storage_capacity']
        )
    
    # Create warehouse
    warehouse = Warehouse()
    
    # Track metrics
    results = []
    
    # Simulate each month
    for month in range(1, months + 1):
        month_data = region_data[region_data['month'] == month]
        
        total_demand = 0
        total_served = 0
        total_stockouts = 0
        
        for fid, facility in facilities.items():
            fac_month = month_data[month_data['facility_id'] == fid]
            if len(fac_month) == 0:
                continue
                
            fac_month = fac_month.iloc[0]
            
            # 1. Facility decides order based on ML prediction
            order_qty = facility.decide_order(
                fac_month['pred_mean'], 
                fac_month['pred_std']
            )
            
            # 2. Warehouse processes order
            warehouse.process_order(facility, order_qty)
            
            # 3. Facility serves actual demand
            actual = fac_month['actual_demand']
            served, unmet = facility.serve_demand(actual)
            
            total_demand += actual
            total_served += served
            if unmet > 0:
                total_stockouts += 1
        
        # Monthly warehouse restock (simplified)
        warehouse.restock(int(total_demand * 1.1))
        
        results.append({
            'month': month,
            'total_demand': total_demand,
            'total_served': total_served,
            'service_level': total_served / total_demand if total_demand > 0 else 1,
            'facilities_with_stockout': total_stockouts,
            'warehouse_stock': warehouse.stock
        })
    
    return pd.DataFrame(results), facilities, warehouse

# Run simulation for Addis Ababa
results, facilities, warehouse = run_simulation('Addis Ababa')

print("Simulation complete!")
display(results)

In [None]:
# Visualize simulation results
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Service level over time
axes[0, 0].plot(results['month'], results['service_level'] * 100, 'g-o')
axes[0, 0].axhline(y=95, color='r', linestyle='--', label='95% target')
axes[0, 0].set_xlabel('Month')
axes[0, 0].set_ylabel('Service Level (%)')
axes[0, 0].set_title('Service Level Over Time')
axes[0, 0].legend()
axes[0, 0].set_ylim([min(80, results['service_level'].min() * 100 - 2), 102])

# Demand vs served
axes[0, 1].bar(results['month'] - 0.2, results['total_demand'], 0.4, label='Demand', alpha=0.7)
axes[0, 1].bar(results['month'] + 0.2, results['total_served'], 0.4, label='Served', alpha=0.7)
axes[0, 1].set_xlabel('Month')
axes[0, 1].set_ylabel('Units')
axes[0, 1].set_title('Demand vs Served')
axes[0, 1].legend()

# Warehouse stock
axes[1, 0].plot(results['month'], results['warehouse_stock'], 'b-o')
axes[1, 0].axhline(y=2000, color='r', linestyle='--', label='Reference level (illustrative)')
axes[1, 0].set_xlabel('Month')
axes[1, 0].set_ylabel('Units')
axes[1, 0].set_title('Warehouse Stock Level')
axes[1, 0].legend()

# Stockouts
axes[1, 1].bar(results['month'], results['facilities_with_stockout'], color='red', alpha=0.7)
axes[1, 1].set_xlabel('Month')
axes[1, 1].set_ylabel('Number of Facilities')
axes[1, 1].set_title('Facilities with Stockouts')

plt.tight_layout()
plt.show()

In [None]:
# Summary statistics
print("=" * 50)
print("SIMULATION SUMMARY: Addis Ababa Region")
print("=" * 50)
print(f"\nAverage service level: {results['service_level'].mean()*100:.1f}%")
print(f"Months with >=95% service: {(results['service_level'] >= 0.95).sum()}/{len(results)}")
print(f"Total stockout events: {results['facilities_with_stockout'].sum()}")
print(f"Final warehouse stock: {results['warehouse_stock'].iloc[-1]:.0f} units")
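`results['service_level'].mean()` weights every month equally, regardless of volume. A pooled fill rate, `results['total_served'].sum() / results['total_demand'].sum()`, weights every unit of demand equally instead, and the two can differ a lot when monthly volumes vary. A minimal sketch with made-up monthly totals (not simulation output):

```python
import pandas as pd

# Illustrative monthly totals: a busy month fully served, a quiet month not served
toy = pd.DataFrame({"total_demand": [100, 10], "total_served": [100, 0]})
toy["service_level"] = toy["total_served"] / toy["total_demand"]

mean_of_months = toy["service_level"].mean()                     # each month counts equally
pooled = toy["total_served"].sum() / toy["total_demand"].sum()  # each unit of demand counts equally

print(f"{mean_of_months:.3f} {pooled:.3f}")  # 0.500 0.909
```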

## Part 7: Compare Scenarios

ABMs let us test "what-if" scenarios. Let's compare ordering strategies.

In [None]:
class ConservativeFacility(Facility):
    """Holds a larger safety buffer (2.5x uncertainty) than the baseline Facility."""
    
    def decide_order(self, predicted_demand, uncertainty):
        # Larger safety margin than the balanced 1.5x baseline
        safety_buffer = 2.5 * uncertainty
        target_stock = predicted_demand + safety_buffer
        target_stock = min(target_stock, self.storage_capacity)
        order_qty = max(0, target_stock - self.stock)
        return order_qty


class AggressiveFacility(Facility):
    """Orders based on point prediction only (no safety buffer)."""
    
    def decide_order(self, predicted_demand, uncertainty):
        # Just use point prediction, ignore uncertainty
        target_stock = predicted_demand
        target_stock = min(target_stock, self.storage_capacity)
        order_qty = max(0, target_stock - self.stock)
        return order_qty


def run_scenario_comparison(region_name):
    """Compare different ordering strategies."""
    
    scenarios = {
        'Balanced (1.5x uncertainty)': Facility,
        'Conservative (2.5x uncertainty)': ConservativeFacility,
        'Aggressive (no buffer)': AggressiveFacility
    }
    
    all_results = {}
    
    for scenario_name, FacilityClass in scenarios.items():
        # Build fresh agents from this scenario's Facility class
        region_data = df[df['region'] == region_name].copy()
        facility_ids = region_data['facility_id'].unique()
        
        facilities = {}
        for fid in facility_ids:
            fac_data = region_data[region_data['facility_id'] == fid].iloc[0]
            facilities[fid] = FacilityClass(
                fid, fac_data['facility_name'], fac_data['storage_capacity']
            )
        
        warehouse = Warehouse()
        results = []
        
        for month in range(1, 13):
            month_data = region_data[region_data['month'] == month]
            total_demand, total_served = 0, 0
            
            for fid, facility in facilities.items():
                fac_month = month_data[month_data['facility_id'] == fid]
                if len(fac_month) == 0:
                    continue
                fac_month = fac_month.iloc[0]
                
                order_qty = facility.decide_order(fac_month['pred_mean'], fac_month['pred_std'])
                warehouse.process_order(facility, order_qty)
                served, _ = facility.serve_demand(fac_month['actual_demand'])
                
                total_demand += fac_month['actual_demand']
                total_served += served
            
            warehouse.restock(int(total_demand * 1.1))
            results.append({
                'month': month,
                'service_level': total_served / total_demand if total_demand > 0 else 1
            })
        
        all_results[scenario_name] = pd.DataFrame(results)
    
    return all_results


# Compare scenarios
scenario_results = run_scenario_comparison('Addis Ababa')

# Plot comparison
fig, ax = plt.subplots(figsize=(12, 5))

# Use `res` so the Part 6 `results` DataFrame is not overwritten
for scenario_name, res in scenario_results.items():
    ax.plot(res['month'], res['service_level'] * 100, 'o-', label=scenario_name)

ax.axhline(y=95, color='gray', linestyle='--', alpha=0.5, label='95% target')
ax.set_xlabel('Month')
ax.set_ylabel('Service Level (%)')
ax.set_title('Service Level by Ordering Strategy')
ax.legend()
ax.set_ylim([min(80, min(r['service_level'].min() for r in scenario_results.values()) * 100 - 2), 102])
plt.tight_layout()
plt.show()

# Summary table
summary = pd.DataFrame({
    'Strategy': list(scenario_results.keys()),
    'Avg Service Level': [r['service_level'].mean() * 100 for r in scenario_results.values()],
    'Min Service Level': [r['service_level'].min() * 100 for r in scenario_results.values()],
    'Months >= 95%': [(r['service_level'] >= 0.95).sum() for r in scenario_results.values()]
})

print("\nStrategy Comparison:")
display(summary)
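One way to read the safety factors k = 0, 1.5, and 2.5: when the warehouse fills the order and capacity does not bind, a facility starts the month with prediction + k × uncertainty units, so it avoids a stockout exactly when actual demand stays below that level. If forecast errors were normal with standard deviation equal to `uncertainty`, the chance of a stockout-free month would be the standard normal CDF at k. A minimal sketch under that assumption; since the tree spread typically understates the true error, the simulated service levels should come out lower than these idealized numbers:

```python
from statistics import NormalDist

# Safety factors used by the three strategies in this notebook
strategies = {
    "Aggressive (no buffer)": 0.0,
    "Balanced (1.5x uncertainty)": 1.5,
    "Conservative (2.5x uncertainty)": 2.5,
}

for name, k in strategies.items():
    # P(demand <= mean + k*sigma) if demand ~ Normal(mean, sigma)
    p = NormalDist().cdf(k)
    print(f"{name:32s} k={k:.1f}  P(no stockout in a month) ~ {p:.3f}")
```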

## Summary

In this notebook, you learned how to:

1. **Generate predictions** from trained ML models for all facilities
2. **Add uncertainty bounds** from the spread of predictions across Random Forest trees
3. **Format outputs** for ABM input files
4. **Build simple agents** that use ML predictions for decision-making
5. **Compare scenarios** to evaluate different strategies

### The ML → ABM Integration Pattern

```
  ML MODEL                      ABM
  ────────                      ───
  Historical data ──▶ Train    
  Features ──▶ Predict demand   
  Uncertainty ──▶ Confidence    
                   ↓
            Predictions file
                   ↓
              Agent decisions ──▶ Order quantities
              System dynamics ──▶ Stock levels
              Emergent outcomes ──▶ Service levels
```

**Next:** Day 4 covers full ABM development with Dr. Ozawa

---

## Exercise (Optional)

1. Run the simulation for a different region (e.g., 'Tigray' or 'Oromia'; check `df['region'].unique()` for the names in the data)
2. Modify the `ConservativeFacility` class to use a different safety buffer
3. What happens if warehouse restock is delayed (e.g., only every 2 months)?
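For exercise 3, one possible reading of "delayed restock" is that the warehouse is resupplied only every N months, with each delivery covering the demand accumulated since the last one. A minimal starting-point sketch of that schedule, using a hypothetical helper `restock_schedule` (the 1.1 buffer mirrors the notebook's monthly rule; the demand numbers are made up):

```python
def restock_schedule(monthly_demand, interval=2, buffer=1.1):
    """Hypothetical helper: restock only every `interval` months, covering the
    demand accumulated since the last restock, scaled by `buffer`."""
    schedule = []
    pending = 0
    for month, demand in enumerate(monthly_demand, start=1):
        pending += demand
        if month % interval == 0:
            schedule.append(int(pending * buffer))
            pending = 0
        else:
            schedule.append(0)  # no delivery this month
    return schedule

print(restock_schedule([100, 120, 90, 110], interval=2))  # [0, 242, 0, 220]
```

Inside `run_simulation`, the amounts from such a schedule would replace the call to `warehouse.restock(int(total_demand * 1.1))` in the months when a delivery arrives.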

In [None]:
# Your code here
