# Asset Optimization Quickstart

This notebook walks through the basic `asset-optimization` workflow: build a portfolio, attach a deterioration model, and simulate failures and costs over several years.

You'll learn how to:
1. Create and validate a portfolio
2. Configure a deterioration model
3. Run a multi-year simulation
4. Examine and export results

## Setup

Install the package if needed:

```bash
pip install asset-optimization
```

In [1]:
# Core imports
import pandas as pd
import numpy as np
from datetime import date, timedelta

# SDK imports
from asset_optimization import (
    WeibullModel,
    Simulator,
    SimulationConfig,
)
from asset_optimization.portfolio import validate_portfolio, compute_quality_metrics


## 1. Generate Synthetic Portfolio

We'll create a sample water pipe portfolio with realistic characteristics.

The portfolio includes:
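- **500 pipes** of different materials
- **Three materials**: Cast Iron, PVC, and Ductile Iron, drawn with probabilities 0.40, 0.35, and 0.25 (independently of install date in this synthetic data)
- **Install dates**: 20 to 80 years before the base date of 2024-01-01
- **Diameters and lengths**: 150 to 400 mm and 50 to 500 m, typical for water distribution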

In [2]:
# Set seed for reproducibility
np.random.seed(42)

n_assets = 500
materials = ['Cast Iron', 'PVC', 'Ductile Iron']

# Generate realistic install dates (20-80 years old)
base_date = date(2024, 1, 1)
install_dates = [
    base_date - timedelta(days=int(np.random.uniform(20*365, 80*365)))
    for _ in range(n_assets)
]

# Create portfolio DataFrame
data = pd.DataFrame({
    'asset_id': [f'PIPE-{i:04d}' for i in range(n_assets)],
    'install_date': pd.to_datetime(install_dates),
    'asset_type': 'pipe',  # All are pipes in this example
    'material': np.random.choice(materials, n_assets, p=[0.4, 0.35, 0.25]),
    'diameter_mm': np.random.choice([150, 200, 300, 400], n_assets),
    'length_m': np.random.uniform(50, 500, n_assets).round(0),
})

print(f"Generated {len(data)} assets")
data.head(10)

Generated 500 assets


,asset_id,install_date,asset_type,material,diameter_mm,length_m
0,PIPE-0000,1981-07-23,pipe,PVC,300,408.0
1,PIPE-0001,1947-01-05,pipe,PVC,400,267.0
2,PIPE-0002,1960-02-16,pipe,Cast Iron,200,103.0
3,PIPE-0003,1968-02-14,pipe,Ductile Iron,400,106.0
4,PIPE-0004,1994-08-30,pipe,PVC,400,359.0
5,PIPE-0005,1994-08-30,pipe,Cast Iron,300,244.0
6,PIPE-0006,2000-07-13,pipe,Ductile Iron,400,140.0
7,PIPE-0007,1952-01-30,pipe,Ductile Iron,300,271.0
8,PIPE-0008,1967-12-22,pipe,Ductile Iron,150,79.0
9,PIPE-0009,1961-07-24,pipe,PVC,150,312.0


## 2. Validate Portfolio Data

`validate_portfolio` checks a portfolio DataFrame directly. Calling it yourself is optional, because the same validation is enforced whenever you run a simulation or an optimization.

In [3]:
# Validate portfolio DataFrame (optional helper)
portfolio = validate_portfolio(data)

# Display portfolio summary
print(portfolio.head())
print(f"\nAsset types: {sorted(portfolio['asset_type'].unique())}")

# Age is measured from today's date, so the mean below changes with the run date
age_years = (pd.Timestamp.now() - portfolio['install_date']).dt.days / 365.25
print(f"Mean age: {age_years.mean():.1f} years")


    asset_id install_date asset_type      material  diameter_mm  length_m
0  PIPE-0000   1981-07-23       pipe           PVC          300     408.0
1  PIPE-0001   1947-01-05       pipe           PVC          400     267.0
2  PIPE-0002   1960-02-16       pipe     Cast Iron          200     103.0
3  PIPE-0003   1968-02-14       pipe  Ductile Iron          400     106.0
4  PIPE-0004   1994-08-30       pipe           PVC          400     359.0

Asset types: ['pipe']
Mean age: 52.0 years


In [4]:
# Check data quality metrics
quality = compute_quality_metrics(portfolio)
print("Data Quality Metrics:")
print(quality)


Data Quality Metrics:
              Completeness (%)  Missing Count
asset_id                 100.0              0
install_date             100.0              0
asset_type               100.0              0
material                 100.0              0
diameter_mm              100.0              0
length_m                 100.0              0
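
The two columns in this output can be approximated with plain pandas. The sketch below assumes that completeness means the share of non-null values per column; how `compute_quality_metrics` actually computes it is not shown in this notebook, and the small frame with one missing `material` is hypothetical.

```python
import pandas as pd

# Hypothetical four-row frame with one missing material value
sample = pd.DataFrame({
    "asset_id": ["PIPE-0000", "PIPE-0001", "PIPE-0002", "PIPE-0003"],
    "material": ["PVC", None, "Cast Iron", "Ductile Iron"],
    "length_m": [408.0, 267.0, 103.0, 106.0],
})

# Per-column share of non-null values and raw count of missing values
quality_sketch = pd.DataFrame({
    "Completeness (%)": sample.notna().mean() * 100,
    "Missing Count": sample.isna().sum(),
})
print(quality_sketch)
```

With a fully populated portfolio, as in this example, every column would report 100% completeness and zero missing values.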


In [5]:
# Access individual assets
oldest_idx = portfolio['install_date'].idxmin()
oldest = portfolio.loc[oldest_idx]
print(f"Oldest asset: {oldest['asset_id']}")
print(f"  Material: {oldest['material']}")
print(f"  Installed: {oldest['install_date'].date()}")


Oldest asset: PIPE-0475
  Material: Cast Iron
  Installed: 1944-06-24


## 3. Configure Deterioration Model

We use a **Weibull model** where each material type has different parameters:

- **shape (k)**: Controls failure rate behavior
  - k > 1 means increasing failure rate (typical for aging infrastructure)
- **scale (lambda)**: Characteristic life in years, the age by which about 63% of assets are expected to have failed

Relative expectations for common water pipe materials:
- Cast Iron: Older technology, shorter expected life
- PVC: Modern material, longer expected life
- Ductile Iron: Good durability, moderate expected life
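
To make the two parameters concrete, here is a minimal sketch using the standard two-parameter Weibull formulas and the example values this notebook assigns in the next cell. The helper names are illustrative and are not part of the `asset_optimization` API.

```python
import math

# Example (shape, scale) pairs used in this notebook
weibull_params = {
    "Cast Iron": (3.0, 60),
    "PVC": (2.5, 80),
    "Ductile Iron": (2.8, 70),
}

def weibull_cdf(age, shape, scale):
    """Probability that an asset has failed by the given age."""
    return 1.0 - math.exp(-((age / scale) ** shape))

def weibull_median_life(shape, scale):
    """Age by which half of the assets are expected to have failed."""
    return scale * math.log(2) ** (1.0 / shape)

median_life = {}
for material, (shape, scale) in weibull_params.items():
    # At age == scale the CDF is 1 - 1/e regardless of shape
    at_scale = weibull_cdf(scale, shape, scale)
    median_life[material] = weibull_median_life(shape, scale)
    print(f"{material}: F(scale)={at_scale:.3f}, "
          f"median life={median_life[material]:.1f} years")
```

The median life sits below the scale for every material, which is why the scale is better read as a characteristic life than as an average or expected life.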

In [6]:
# Define Weibull parameters for each material type
# Format: 'material': (shape, scale)
params = {
    'Cast Iron': (3.0, 60),      # Older, shape=3 (increasing failures)
    'PVC': (2.5, 80),            # Modern, longer life
    'Ductile Iron': (2.8, 70),   # Good durability
}

model = WeibullModel(params)
print(model)

WeibullModel(types=['Cast Iron', 'PVC', 'Ductile Iron'], type_column='material', age_column='age')


In [7]:
# The model can transform portfolio data to add failure probabilities
# First, we need to add an 'age' column
portfolio_with_age = portfolio.copy()
portfolio_with_age['age'] = (
    (pd.Timestamp.now() - portfolio_with_age['install_date']).dt.days / 365.25
)

# Transform adds failure_rate and failure_probability columns
enriched = model.transform(portfolio_with_age)
enriched[['asset_id', 'material', 'age', 'failure_rate', 'failure_probability']].head(10)

,asset_id,material,age,failure_rate,failure_probability
0,PIPE-0000,PVC,44.539357,0.012982,0.206481
1,PIPE-0001,PVC,79.085558,0.030716,0.621549
2,PIPE-0002,Cast Iron,65.971253,0.060447,0.735327
3,PIPE-0003,Ductile Iron,57.976728,0.028493,0.445661
4,PIPE-0004,PVC,31.436003,0.007698,0.092256
5,PIPE-0005,Cast Iron,31.436003,0.013725,0.133959
6,PIPE-0006,Ductile Iron,25.566051,0.006526,0.057851
7,PIPE-0007,Ductile Iron,74.017796,0.044227,0.689366
8,PIPE-0008,Ductile Iron,58.124572,0.028624,0.447996
9,PIPE-0009,PVC,64.536619,0.022642,0.44262
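
The values in this table are consistent with the standard Weibull hazard and cumulative distribution functions. The sketch below recomputes two rows by hand; it is a cross-check under that assumption, not the library's implementation, and the function names are illustrative.

```python
import numpy as np

def weibull_hazard(age, shape, scale):
    """Instantaneous failure rate h(t) = (k / lambda) * (t / lambda)^(k - 1)."""
    return (shape / scale) * (age / scale) ** (shape - 1)

def weibull_cdf(age, shape, scale):
    """Cumulative failure probability F(t) = 1 - exp(-(t / lambda)^k)."""
    return 1.0 - np.exp(-((age / scale) ** shape))

# PIPE-0000: PVC, (shape, scale) = (2.5, 80), age taken from the table
pvc_rate = weibull_hazard(44.539357, 2.5, 80)
pvc_prob = weibull_cdf(44.539357, 2.5, 80)

# PIPE-0002: Cast Iron, (shape, scale) = (3.0, 60), age taken from the table
ci_rate = weibull_hazard(65.971253, 3.0, 60)
ci_prob = weibull_cdf(65.971253, 3.0, 60)

print(f"PVC:       rate={pvc_rate:.6f}, probability={pvc_prob:.6f}")
print(f"Cast Iron: rate={ci_rate:.6f}, probability={ci_prob:.6f}")
```

The printed values should closely match the `failure_rate` and `failure_probability` columns for those two assets.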


## 4. Run Simulation

Run a **10-year simulation** that tracks:
- Costs (failure costs + intervention costs)
- Failures (sampled based on deterioration model)
- Asset aging

The simulation samples failures from a **conditional probability**, where S(t) is the model's survival function (the probability of surviving to t):
- P(fail in year t | survived to t) = (S(t) - S(t+1)) / S(t)
- Failed assets are automatically replaced (default behavior)
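
The conditional probability above can be sketched with a Weibull survival function and one Bernoulli draw per asset. This is an illustration that treats t as asset age in years and reuses the Cast Iron parameters from section 3; the function names are hypothetical and the `Simulator` internals may differ.

```python
import numpy as np

def weibull_survival(age, shape, scale):
    """S(t) = exp(-(t / lambda)^k): probability of surviving to age t."""
    return np.exp(-((age / scale) ** shape))

def conditional_failure_prob(age, shape, scale):
    """P(fail before age + 1 | survived to age) = (S(t) - S(t+1)) / S(t)."""
    s_now = weibull_survival(age, shape, scale)
    s_next = weibull_survival(age + 1, shape, scale)
    return (s_now - s_next) / s_now

# Cast Iron parameters from section 3, evaluated at four example ages
ages = np.array([20.0, 40.0, 60.0, 80.0])
p_fail = conditional_failure_prob(ages, shape=3.0, scale=60.0)

# One Bernoulli draw per asset decides whether it fails this year
rng = np.random.default_rng(42)
failed = rng.random(ages.shape) < p_fail

for age, p, f in zip(ages, p_fail, failed):
    print(f"age {age:.0f}: P(fail this year)={p:.4f}, failed={bool(f)}")
```

Because the shape is greater than 1, the annual failure probability rises with age, from well under 1% at age 20 to several percent at age 80.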

In [8]:
# Configure simulation
config = SimulationConfig(
    n_years=10,
    start_year=2024,
    random_seed=42,  # For reproducibility
    failure_response='replace',  # Replace failed assets
)

print(config)

SimulationConfig(n_years=10, start_year=2024, random_seed=42, failure_response='replace')


In [9]:
# Create simulator and run
sim = Simulator(model, config)
result = sim.run(portfolio)

print(result)

SimulationResult(years=2024-2033, total_cost=$7,930,000, failures=122)


## 5. Examine Results

The `SimulationResult` contains:
- **summary**: Year-by-year metrics
- **cost_breakdown**: Detailed cost allocation
- **failure_log**: Individual failure events

In [10]:
# Summary statistics
print(f"Total cost over {config.n_years} years: ${result.total_cost():,.0f}")
print(f"Total failures: {result.total_failures()}")
print(f"Average failures per year: {result.total_failures() / config.n_years:.1f}")

Total cost over 10 years: $7,930,000
Total failures: 122
Average failures per year: 12.2


In [11]:
# Year-by-year summary
result.summary

,year,total_cost,failure_count,intervention_count,avg_age
0,2024,585000.0,9,9,49.747936
1,2025,1170000.0,18,18,48.533146
2,2026,845000.0,13,13,47.840893
3,2027,585000.0,9,9,47.656465
4,2028,780000.0,12,12,47.033057
5,2029,975000.0,15,15,45.950552
6,2030,1105000.0,17,17,44.688984
7,2031,520000.0,8,8,44.607035
8,2032,845000.0,13,13,43.811288
9,2033,520000.0,8,8,43.720134


In [12]:
# Cost breakdown by year
result.cost_breakdown

,year,failure_direct_cost,failure_consequence_cost,intervention_cost
0,2024,90000.0,45000.0,450000.0
1,2025,180000.0,90000.0,900000.0
2,2026,130000.0,65000.0,650000.0
3,2027,90000.0,45000.0,450000.0
4,2028,120000.0,60000.0,600000.0
5,2029,150000.0,75000.0,750000.0
6,2030,170000.0,85000.0,850000.0
7,2031,80000.0,40000.0,400000.0
8,2032,130000.0,65000.0,650000.0
9,2033,80000.0,40000.0,400000.0
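
As a sanity check, the three cost components should add up to the yearly `total_cost` and to the overall total reported earlier. The sketch below rebuilds the breakdown from the values printed above rather than from `result`, so it runs on its own.

```python
import pandas as pd

# Values copied from the cost breakdown output above
breakdown = pd.DataFrame({
    "year": range(2024, 2034),
    "failure_direct_cost": [90000, 180000, 130000, 90000, 120000,
                            150000, 170000, 80000, 130000, 80000],
    "failure_consequence_cost": [45000, 90000, 65000, 45000, 60000,
                                 75000, 85000, 40000, 65000, 40000],
    "intervention_cost": [450000, 900000, 650000, 450000, 600000,
                          750000, 850000, 400000, 650000, 400000],
})

cost_columns = ["failure_direct_cost", "failure_consequence_cost", "intervention_cost"]
yearly_total = breakdown[cost_columns].sum(axis=1)
grand_total = int(yearly_total.sum())

# 122 failures were reported over the ten simulated years
cost_per_failure = grand_total / 122

print(f"Grand total: ${grand_total:,}")
print(f"Cost per failure: ${cost_per_failure:,.0f}")
```

In this run every failure carries the same cost: 10,000 direct, 5,000 consequence, and 50,000 for the replacement, which is why each yearly total is a multiple of 65,000.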


In [13]:
# Individual failure events
if not result.failure_log.empty:
    print("\nSample failures (first 10):")
    display(result.failure_log.head(10))  # display() is available in Jupyter/IPython
else:
    print("No failures recorded (lucky run!)")


Sample failures (first 10):


,year,asset_id,age_at_failure,material,direct_cost,consequence_cost
0,2024,PIPE-0051,67.461328,Ductile Iron,10000.0,5000.0
1,2024,PIPE-0139,79.253251,Cast Iron,10000.0,5000.0
2,2024,PIPE-0163,53.109514,Cast Iron,10000.0,5000.0
3,2024,PIPE-0204,57.344969,Cast Iron,10000.0,5000.0
4,2024,PIPE-0226,79.324435,Cast Iron,10000.0,5000.0
5,2024,PIPE-0238,59.685832,PVC,10000.0,5000.0
6,2024,PIPE-0270,69.514716,Cast Iron,10000.0,5000.0
7,2024,PIPE-0272,72.972621,Cast Iron,10000.0,5000.0
8,2024,PIPE-0422,26.453799,Ductile Iron,10000.0,5000.0
9,2025,PIPE-0018,47.883641,Ductile Iron,10000.0,5000.0
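
Because the failure log is an ordinary DataFrame, it can be summarized with standard pandas operations. The sketch below groups the ten sample rows shown above by material; ten rows are far too few to draw conclusions from, so treat it only as a pattern to apply to the full `result.failure_log`.

```python
import pandas as pd

# First ten failure events, copied from the sample output above
sample_log = pd.DataFrame({
    "year": [2024] * 9 + [2025],
    "asset_id": ["PIPE-0051", "PIPE-0139", "PIPE-0163", "PIPE-0204", "PIPE-0226",
                 "PIPE-0238", "PIPE-0270", "PIPE-0272", "PIPE-0422", "PIPE-0018"],
    "age_at_failure": [67.461328, 79.253251, 53.109514, 57.344969, 79.324435,
                       59.685832, 69.514716, 72.972621, 26.453799, 47.883641],
    "material": ["Ductile Iron", "Cast Iron", "Cast Iron", "Cast Iron", "Cast Iron",
                 "PVC", "Cast Iron", "Cast Iron", "Ductile Iron", "Ductile Iron"],
})

# Failure count and mean age at failure for each material
by_material = sample_log.groupby("material")["age_at_failure"].agg(["count", "mean"])
print(by_material.round(1))
```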


## 6. Export Results

Results can be exported to **Parquet** files for further analysis or reporting.

The `format` argument of `to_parquet` selects what is written:
- `summary`: Year-by-year metrics (default)
- `cost_projections`: Long format for plotting
- `failure_log`: Detailed failure events

In [14]:
# Export summary (default format)
result.to_parquet('simulation_summary.parquet')
print("Exported: simulation_summary.parquet")

# Export in long format (good for plotting)
result.to_parquet('cost_projections.parquet', format='cost_projections')
print("Exported: cost_projections.parquet")

Exported: simulation_summary.parquet
Exported: cost_projections.parquet
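
"Long format" generally means one row per year and cost type rather than one column per cost type. The exact columns written by the `cost_projections` export are not shown in this notebook, so the sketch below only illustrates the general idea with `pandas.melt`; the `cost_type` and `cost` column names are assumptions.

```python
import pandas as pd

# Two years of the wide cost breakdown shown in section 5
wide = pd.DataFrame({
    "year": [2024, 2025],
    "failure_direct_cost": [90000.0, 180000.0],
    "failure_consequence_cost": [45000.0, 90000.0],
    "intervention_cost": [450000.0, 900000.0],
})

# One row per (year, cost_type) pair, which most plotting libraries prefer
long = wide.melt(id_vars="year", var_name="cost_type", value_name="cost")
print(long)
```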


In [15]:
# Read back and verify
df = pd.read_parquet('simulation_summary.parquet')
print("Read back simulation_summary.parquet:")
df.head()

Read back simulation_summary.parquet:


,year,total_cost,failure_count,intervention_count,avg_age
0,2024,585000.0,9,9,49.747936
1,2025,1170000.0,18,18,48.533146
2,2026,845000.0,13,13,47.840893
3,2027,585000.0,9,9,47.656465
4,2028,780000.0,12,12,47.033057


## Next Steps

- See **`optimization.ipynb`** for budget-constrained intervention selection
- See **`visualization.ipynb`** for charts and scenario comparisons

In [16]:
# Clean up temporary files
import os
for f in ['simulation_summary.parquet', 'cost_projections.parquet']:
    if os.path.exists(f):
        os.remove(f)
        print(f"Cleaned up: {f}")

Cleaned up: simulation_summary.parquet
Cleaned up: cost_projections.parquet
