# Optimization Workflow

Learn how to select optimal interventions under budget constraints.

This notebook demonstrates:
1. Setting up a portfolio and deterioration model
2. Configuring and running the optimizer
3. Examining intervention selections
4. Understanding the greedy algorithm
5. Comparing different budget scenarios
6. Exporting the intervention schedule

## Setup

In [1]:
# Core imports
import os
from datetime import date, timedelta

import numpy as np
import pandas as pd

# SDK imports
from asset_optimization import (
    WeibullModel,
    Optimizer,
    Simulator,
    SimulationConfig,
)

## 1. Create Portfolio Data and Model

First, we create a synthetic portfolio of water pipes with different materials and ages.

In [2]:
# Generate synthetic portfolio
np.random.seed(42)

n_assets = 500
materials = ["Cast Iron", "PVC", "Ductile Iron"]

base_date = date(2024, 1, 1)
install_dates = [
    base_date - timedelta(days=int(np.random.uniform(20 * 365, 80 * 365)))
    for _ in range(n_assets)
]

data = pd.DataFrame(
    {
        "asset_id": [f"PIPE-{i:04d}" for i in range(n_assets)],
        "install_date": pd.to_datetime(install_dates),
        "asset_type": "pipe",
        "material": np.random.choice(materials, n_assets, p=[0.4, 0.35, 0.25]),
        "diameter_mm": np.random.choice([150, 200, 300, 400], n_assets),
        "length_m": np.random.uniform(50, 500, n_assets).round(0),
    }
)

portfolio = data
print(portfolio.head())

    asset_id install_date asset_type      material  diameter_mm  length_m
0  PIPE-0000   1981-07-23       pipe           PVC          300     408.0
1  PIPE-0001   1947-01-05       pipe           PVC          400     267.0
2  PIPE-0002   1960-02-16       pipe     Cast Iron          200     103.0
3  PIPE-0003   1968-02-14       pipe  Ductile Iron          400     106.0
4  PIPE-0004   1994-08-30       pipe           PVC          400     359.0


In [3]:
# Configure deterioration model
params = {
    "Cast Iron": (3.0, 60),
    "PVC": (2.5, 80),
    "Ductile Iron": (2.8, 70),
}

model = WeibullModel(params)
print(model)

WeibullModel(types=['Cast Iron', 'PVC', 'Ductile Iron'], type_column='material', age_column='age')
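As a rough illustration of what these parameters mean, the sketch below assumes each tuple is a Weibull `(shape, scale)` pair with the scale in years, and that the risk score is the cumulative failure probability at the asset's age. Neither assumption is stated by the SDK output above, but both are consistent with the selections table later in this notebook, where a Cast Iron pipe aged 68.635181 years has a `risk_score` of 0.776172. The function name `weibull_failure_probability` is hypothetical and not part of `asset_optimization`.

```python
import math


def weibull_failure_probability(age_years, shape, scale):
    """Cumulative Weibull failure probability F(t) = 1 - exp(-(t / scale) ** shape)."""
    if age_years <= 0:
        return 0.0
    return 1.0 - math.exp(-((age_years / scale) ** shape))


# Assumed (shape, scale) pairs, copied from the params dict above.
assumed_params = {
    "Cast Iron": (3.0, 60),
    "PVC": (2.5, 80),
    "Ductile Iron": (2.8, 70),
}

shape, scale = assumed_params["Cast Iron"]
risk = weibull_failure_probability(68.635181, shape, scale)
print(f"Cast Iron at 68.6 years: {risk:.3f}")
```

Under this reading, a shape above 1 means the failure rate rises with age, and the scale is the age by which roughly 63% of assets of that material are expected to have failed.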


## 2. Configure Optimizer

The **Optimizer** uses a two-stage greedy algorithm to select interventions:

1. **Stage 1**: For each asset, find the best intervention (highest cost-effectiveness)
2. **Stage 2**: Rank all candidates by risk-to-cost ratio and greedily fill the budget

Parameters:
- **strategy**: 'greedy' (default) or 'milp' (planned)
- **min_risk_threshold**: Only consider assets above this failure probability
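The two stages and the risk threshold can be sketched in plain pandas. This is a minimal illustration, not the SDK's implementation: the function `greedy_select`, the candidate column names, the reading of "cost-effectiveness" as risk reduction per unit cost, and the sample rows (including the `Replace` option and its cost) are all assumptions made for the example.

```python
import pandas as pd


def greedy_select(candidates, budget, min_risk_threshold=0.0):
    """Two-stage greedy selection over (asset, intervention) candidate rows."""
    # Pre-filter: drop assets at or below the risk threshold.
    eligible = candidates[candidates["risk_score"] > min_risk_threshold]
    eligible = eligible.reset_index(drop=True)
    if eligible.empty:
        return eligible

    # Stage 1: per asset, keep the intervention with the best risk
    # reduction per unit cost (an assumed reading of "cost-effectiveness").
    effectiveness = eligible["risk_reduction"] / eligible["cost"]
    best = eligible.loc[effectiveness.groupby(eligible["asset_id"]).idxmax()]

    # Stage 2: rank by risk-to-cost ratio, then fill the budget greedily.
    best = best.assign(priority=best["risk_score"] / best["cost"])
    best = best.sort_values("priority", ascending=False)

    chosen, remaining = [], budget
    for idx, cost in best["cost"].items():
        if cost <= remaining:  # skip candidates that no longer fit
            chosen.append(idx)
            remaining -= cost
    return best.loc[chosen]


# Hypothetical candidates: two options for PIPE-A, one each for PIPE-B and PIPE-C.
candidates = pd.DataFrame(
    {
        "asset_id": ["PIPE-A", "PIPE-A", "PIPE-B", "PIPE-C"],
        "intervention_type": ["Repair", "Replace", "Repair", "Repair"],
        "cost": [5_000.0, 50_000.0, 5_000.0, 5_000.0],
        "risk_score": [0.80, 0.80, 0.60, 0.05],
        "risk_reduction": [0.08, 0.79, 0.06, 0.01],
    }
)

selected = greedy_select(candidates, budget=10_000, min_risk_threshold=0.1)
print(selected[["asset_id", "intervention_type", "cost", "priority"]])
```

In this toy data, PIPE-C is dropped by the threshold, PIPE-A keeps its repair option because it reduces slightly more risk per dollar than the replacement, and the two remaining repairs exactly exhaust the budget.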

In [4]:
# Create optimizer with risk threshold
optimizer = Optimizer(
    strategy="greedy",
    min_risk_threshold=0.1,  # Only consider assets with >10% failure risk
)

print(optimizer)

Optimizer(strategy='greedy', min_risk_threshold=0.1, fitted=False)


## 3. Run Optimization

The `fit()` method follows the scikit-learn pattern and returns `self`. The cell below then reads the outcome through the optimizer's `result` property.
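For readers unfamiliar with the convention, here is a toy class showing what "fit returns self" looks like in practice. `ToyOptimizer` is a hypothetical stand-in, not the SDK's `Optimizer`; storing the outcome in a trailing-underscore attribute and exposing it through a read-only property is one way the pattern can be written, assumed here for illustration only.

```python
class ToyOptimizer:
    """Hypothetical stand-in that mimics the fit-returns-self convention."""

    def fit(self, costs, budget):
        picked, spent = [], 0.0
        for i, cost in enumerate(costs):
            if spent + cost <= budget:  # take items in order while they fit
                picked.append(i)
                spent += cost
        # scikit-learn style: attributes learned in fit() end with an underscore.
        self.result_ = {"picked": picked, "spent": spent}
        return self  # returning self allows chaining: ToyOptimizer().fit(...)

    @property
    def result(self):
        if not hasattr(self, "result_"):
            raise RuntimeError("Call fit() before reading result.")
        return self.result_


toy = ToyOptimizer().fit([5_000, 5_000, 5_000], budget=10_000)
print(toy.result)
```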

In [5]:
# Run optimization with $500,000 budget
budget = 500_000

optimizer.fit(portfolio, model, budget=budget)

# Access result via .result property
result = optimizer.result
print(result)

OptimizationResult(strategy='greedy', selected=100 assets, spent=$500,000, utilization=100.0%)


## 4. Examine Selections

The result contains:
- **selections**: DataFrame of selected interventions
- **budget_summary**: Budget utilization statistics

In [6]:
# Budget summary
print("Budget Summary:")
print(f"  Total budget: ${budget:,.0f}")
print(f"  Total spent: ${result.total_spent:,.0f}")
print(f"  Utilization: {result.utilization_pct:.1f}%")
print(f"\nSelected {len(result.selections)} interventions")

Budget Summary:
  Total budget: $500,000
  Total spent: $500,000
  Utilization: 100.0%

Selected 100 interventions


In [7]:
# View top selections (highest priority first)
print("Top 15 Selected Interventions:")
result.selections.head(15)

Top 15 Selected Interventions:


|   | asset_id | intervention_type | cost | risk_score | risk_before | risk_after | risk_reduction | rank |
|---|---|---|---|---|---|---|---|---|
| 0 | PIPE-0393 | Repair | 5000.0 | 0.776172 | 0.776172 | 0.696688 | 0.079484 | 1 |
| 1 | PIPE-0076 | Repair | 5000.0 | 0.771616 | 0.771616 | 0.69139 | 0.080226 | 2 |
| 2 | PIPE-0157 | Repair | 5000.0 | 0.762974 | 0.762974 | 0.681393 | 0.081582 | 3 |
| 3 | PIPE-0092 | Repair | 5000.0 | 0.762229 | 0.762229 | 0.680533 | 0.081696 | 4 |
| 4 | PIPE-0107 | Repair | 5000.0 | 0.757438 | 0.757438 | 0.675022 | 0.082416 | 5 |
| 5 | PIPE-0399 | Repair | 5000.0 | 0.756516 | 0.756516 | 0.673963 | 0.082553 | 6 |
| 6 | PIPE-0419 | Repair | 5000.0 | 0.754539 | 0.754539 | 0.671697 | 0.082843 | 7 |
| 7 | PIPE-0398 | Repair | 5000.0 | 0.753147 | 0.753147 | 0.670102 | 0.083045 | 8 |
| 8 | PIPE-0388 | Repair | 5000.0 | 0.752893 | 0.752893 | 0.669811 | 0.083082 | 9 |
| 9 | PIPE-0462 | Repair | 5000.0 | 0.748645 | 0.748645 | 0.664957 | 0.083688 | 10 |


In [8]:
# Intervention type breakdown
type_counts = result.selections["intervention_type"].value_counts()
print("Interventions by Type:")
for itype, count in type_counts.items():
    total_cost = result.selections[result.selections["intervention_type"] == itype][
        "cost"
    ].sum()
    print(f"  {itype}: {count} assets (${total_cost:,.0f})")

Interventions by Type:
  Repair: 100 assets ($500,000)


## 5. Understand the Algorithm

The greedy algorithm prioritizes assets based on their **risk-to-cost ratio**:

```
priority = risk_score / intervention_cost
```

This means:
- High-risk assets with low intervention costs are selected first
- Assets just above the risk threshold may not be selected if budget is limited
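A small worked example makes the ratio concrete. The first two rows below reuse risk scores and costs from the selections table above; the third row (`HYPOTHETICAL-01`, higher risk but a $20,000 intervention) is invented to show how cost can outweigh risk in the ranking.

```python
import pandas as pd

candidates = pd.DataFrame(
    {
        "asset_id": ["PIPE-0393", "PIPE-0462", "HYPOTHETICAL-01"],
        "risk_score": [0.776172, 0.748645, 0.900000],
        "cost": [5_000.0, 5_000.0, 20_000.0],  # last cost is made up
    }
)

# priority = risk_score / intervention_cost
candidates["priority"] = candidates["risk_score"] / candidates["cost"]
ranked = candidates.sort_values("priority", ascending=False).reset_index(drop=True)
print(ranked)
```

The hypothetical asset has the highest risk score but ranks last, because its priority (0.9 / 20,000) is lower than that of either $5,000 repair. When all candidates cost the same, as in this notebook's output, the ranking reduces to ordering by risk score.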

In [9]:
# Look at the risk distribution of selected assets
selections = result.selections

print("Risk Score Distribution of Selected Assets:")
print(f"  Min risk: {selections['risk_score'].min():.3f}")
print(f"  Max risk: {selections['risk_score'].max():.3f}")
print(f"  Mean risk: {selections['risk_score'].mean():.3f}")
print(f"  Median risk: {selections['risk_score'].median():.3f}")

Risk Score Distribution of Selected Assets:
  Min risk: 0.512
  Max risk: 0.776
  Mean risk: 0.619
  Median risk: 0.602


In [10]:
# Show why certain assets were selected
# Add age information for context
portfolio_with_age = portfolio.copy()
portfolio_with_age["age"] = (
    pd.Timestamp.now() - portfolio_with_age["install_date"]
).dt.days / 365.25

# Join selections with portfolio data
analysis = selections.merge(
    portfolio_with_age[["asset_id", "material", "age"]], on="asset_id", how="left"
)

print("Selection Analysis (first 10):")
analysis[
    ["rank", "asset_id", "material", "age", "risk_score", "intervention_type", "cost"]
].head(10)

Selection Analysis (first 10):


|   | rank | asset_id | material | age | risk_score | intervention_type | cost |
|---|---|---|---|---|---|---|---|
| 0 | 1 | PIPE-0393 | Cast Iron | 68.635181 | 0.776172 | Repair | 5000.0 |
| 1 | 2 | PIPE-0076 | Cast Iron | 68.325804 | 0.771616 | Repair | 5000.0 |
| 2 | 3 | PIPE-0157 | Cast Iron | 67.748118 | 0.762974 | Repair | 5000.0 |
| 3 | 4 | PIPE-0092 | Cast Iron | 67.698836 | 0.762229 | Repair | 5000.0 |
| 4 | 5 | PIPE-0107 | Cast Iron | 67.383984 | 0.757438 | Repair | 5000.0 |
| 5 | 6 | PIPE-0399 | Cast Iron | 67.323751 | 0.756516 | Repair | 5000.0 |
| 6 | 7 | PIPE-0419 | Cast Iron | 67.195072 | 0.754539 | Repair | 5000.0 |
| 7 | 8 | PIPE-0398 | Cast Iron | 67.104723 | 0.753147 | Repair | 5000.0 |
| 8 | 9 | PIPE-0388 | Cast Iron | 67.088296 | 0.752893 | Repair | 5000.0 |
| 9 | 10 | PIPE-0462 | Cast Iron | 66.814511 | 0.748645 | Repair | 5000.0 |


## 6. Compare Scenarios

What if we had different budget levels? Let's compare:
- **Low budget**: $250,000
- **Medium budget**: $500,000 (current)
- **High budget**: $1,000,000

In [11]:
# Run optimization at different budget levels
budgets = {
    "low": 250_000,
    "medium": 500_000,
    "high": 1_000_000,
}

results = {}
for name, budget_amount in budgets.items():
    opt = Optimizer(strategy="greedy", min_risk_threshold=0.1)
    opt.fit(portfolio, model, budget=budget_amount)
    results[name] = opt.result

# Compare results
print("Budget Comparison:")
print("-" * 60)
print(f"{'Scenario':<10} {'Budget':>12} {'Spent':>12} {'Assets':>8} {'Util%':>8}")
print("-" * 60)
for name, res in results.items():
    budget_amount = budgets[name]
    print(
        f"{name:<10} ${budget_amount:>10,} ${res.total_spent:>10,.0f} {len(res.selections):>8} {res.utilization_pct:>7.1f}%"
    )

Budget Comparison:
------------------------------------------------------------
Scenario         Budget        Spent   Assets    Util%
------------------------------------------------------------
low        $   250,000 $   250,000       50   100.0%
medium     $   500,000 $   500,000      100   100.0%
high       $ 1,000,000 $ 1,000,000      200   100.0%
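The linear scaling in this table follows from simple arithmetic. In every scenario, spend divided by asset count comes to $5,000, which matches the uniform repair cost seen in the medium-budget selections. Assuming every candidate costs $5,000 and enough eligible assets remain, the expected count is just the budget divided by that unit cost:

```python
unit_cost = 5_000  # uniform repair cost observed in the selections above
budgets = {"low": 250_000, "medium": 500_000, "high": 1_000_000}

# With identical costs, the greedy fill simply takes as many assets as fit.
expected_counts = {name: amount // unit_cost for name, amount in budgets.items()}
print(expected_counts)
```

This shortcut would stop holding once intervention costs differ or the pool of assets above the risk threshold is exhausted.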


In [12]:
# Run a baseline simulation for context.
# Note: this run does not apply the optimizer's selections,
# so it is the same for every budget scenario.

config = SimulationConfig(
    n_years=10,
    start_year=2024,
    random_seed=42,
    failure_response="replace",
)

sim = Simulator(model, config)

# Run baseline simulation (no optimization context, just for comparison)
sim_result = sim.run(portfolio)

print("\n10-Year Simulation Results (baseline):")
print(f"  Total cost: ${sim_result.total_cost():,.0f}")
print(f"  Total failures: {sim_result.total_failures()}")


10-Year Simulation Results (baseline):
  Total cost: $7,930,000
  Total failures: 122


## 7. Export Intervention Schedule

Export results in different formats:
- **minimal**: Just asset_id, year, intervention_type, cost
- **detailed**: Includes risk scores, rankings, and optional portfolio data
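To make the minimal layout concrete, the sketch below derives the same four columns from a selections-like DataFrame using plain pandas. The helper `to_minimal_schedule` is hypothetical and only illustrates the shape of the data; it is not how `to_parquet` is implemented. The sample rows are copied from the selections table earlier in this notebook.

```python
import pandas as pd

MINIMAL_COLUMNS = ["asset_id", "year", "intervention_type", "cost"]


def to_minimal_schedule(selections, year):
    """Attach a schedule year and keep only the minimal columns."""
    return selections.assign(year=year)[MINIMAL_COLUMNS]


# Two rows taken from the selections shown earlier.
sample_selections = pd.DataFrame(
    {
        "asset_id": ["PIPE-0393", "PIPE-0076"],
        "intervention_type": ["Repair", "Repair"],
        "cost": [5000.0, 5000.0],
        "risk_score": [0.776172, 0.771616],
        "rank": [1, 2],
    }
)

minimal_schedule = to_minimal_schedule(sample_selections, year=2024)
print(minimal_schedule)
```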

In [13]:
# Export minimal format
result.to_parquet("schedule_minimal.parquet", format="minimal", year=2024)
print("Exported: schedule_minimal.parquet")

# Export detailed format with portfolio data
result.to_parquet(
    "schedule_detailed.parquet",
    format="detailed",
    year=2024,
    portfolio=portfolio_with_age,
)
print("Exported: schedule_detailed.parquet")

Exported: schedule_minimal.parquet
Exported: schedule_detailed.parquet


In [14]:
# Read back and verify
minimal = pd.read_parquet("schedule_minimal.parquet")
print("Minimal format columns:", list(minimal.columns))
minimal.head()

Minimal format columns: ['asset_id', 'year', 'intervention_type', 'cost']


|   | asset_id | year | intervention_type | cost |
|---|---|---|---|---|
| 0 | PIPE-0393 | 2024 | Repair | 5000.0 |
| 1 | PIPE-0076 | 2024 | Repair | 5000.0 |
| 2 | PIPE-0157 | 2024 | Repair | 5000.0 |
| 3 | PIPE-0092 | 2024 | Repair | 5000.0 |
| 4 | PIPE-0107 | 2024 | Repair | 5000.0 |


In [15]:
detailed = pd.read_parquet("schedule_detailed.parquet")
print("Detailed format columns:", list(detailed.columns))
detailed.head()

Detailed format columns: ['asset_id', 'year', 'intervention_type', 'cost', 'risk_score', 'rank', 'material', 'age', 'risk_before', 'risk_after', 'risk_reduction']


|   | asset_id | year | intervention_type | cost | risk_score | rank | material | age | risk_before | risk_after | risk_reduction |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PIPE-0393 | 2024 | Repair | 5000.0 | 0.776172 | 1 | Cast Iron | 68.635181 | 0.776172 | 0.696688 | 0.079484 |
| 1 | PIPE-0076 | 2024 | Repair | 5000.0 | 0.771616 | 2 | Cast Iron | 68.325804 | 0.771616 | 0.69139 | 0.080226 |
| 2 | PIPE-0157 | 2024 | Repair | 5000.0 | 0.762974 | 3 | Cast Iron | 67.748118 | 0.762974 | 0.681393 | 0.081582 |
| 3 | PIPE-0092 | 2024 | Repair | 5000.0 | 0.762229 | 4 | Cast Iron | 67.698836 | 0.762229 | 0.680533 | 0.081696 |
| 4 | PIPE-0107 | 2024 | Repair | 5000.0 | 0.757438 | 5 | Cast Iron | 67.383984 | 0.757438 | 0.675022 | 0.082416 |


## Summary

In this notebook, we covered:

1. **Portfolio Data and Model Setup**: Creating a synthetic asset portfolio with deterioration parameters
2. **Optimizer Configuration**: Using the greedy strategy with risk thresholds
3. **Running Optimization**: The `fit()` method returns self, and the outcome is read through the `result` property
4. **Examining Selections**: Understanding which assets were selected and why
5. **Budget Comparison**: Seeing how different budgets affect intervention counts
6. **Export Formats**: Saving schedules in minimal or detailed parquet format

Next: See **`visualization.ipynb`** for charts and scenario comparisons.

In [16]:
# Clean up temporary files
for f in ["schedule_minimal.parquet", "schedule_detailed.parquet"]:
    if os.path.exists(f):
        os.remove(f)
        print(f"Cleaned up: {f}")

Cleaned up: schedule_minimal.parquet
Cleaned up: schedule_detailed.parquet
