# Typical Periods Optimization with `cluster_reduce()`

This notebook demonstrates the new `cluster_reduce()` method for fast sizing optimization using typical periods.

## Key Concept

Unlike `cluster()`, which keeps the original number of timesteps and adds equality constraints, `cluster_reduce()` **actually reduces** the number of timesteps the solver sees:

| Method | Timesteps | Mechanism | Use Case |
|--------|-----------|-----------|----------|
| `cluster()` | 8760 | Equality constraints | Accurate operational dispatch |
| `cluster_reduce()` | 192 (8×24) | Typical periods only | Fast initial sizing |

## Features

- **Actual timestep reduction**: Only solves for typical periods (e.g., 8 days × 24h = 192 instead of 8760)
- **Timestep weighting**: Operational costs are weighted by cluster occurrence
- **Inter-period storage linking**: SOC_boundary variables track storage state across original periods
- **Cyclic constraint**: Optional cyclic storage constraint for long-term balance
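
The reduction and weighting arithmetic behind the first two features can be sketched in plain Python. The occurrence counts and per-day costs below are made-up illustrative numbers, not output from flixopt or tsam. The only assumption is that the operational cost of each typical day is multiplied by the number of original days it represents.

```python
# Illustrative numbers only: 8 typical days standing in for 365 original days.
hours_per_period = 24
cluster_occurrences = {0: 62, 1: 48, 2: 51, 3: 40, 4: 55, 5: 37, 6: 45, 7: 27}

# Hypothetical operational cost of one pass through each typical day [EUR].
cost_per_typical_day = {
    0: 9000.0, 1: 7500.0, 2: 6000.0, 3: 4000.0,
    4: 3000.0, 5: 2500.0, 6: 5000.0, 7: 12000.0,
}

# Number of timesteps the solver sees versus the number they represent.
reduced_timesteps = len(cluster_occurrences) * hours_per_period
original_timesteps = sum(cluster_occurrences.values()) * hours_per_period

# Weight each typical day's cost by how many original days it stands for.
weighted_annual_cost = sum(
    cluster_occurrences[c] * cost_per_typical_day[c] for c in cluster_occurrences
)

print(f'Reduced timesteps:  {reduced_timesteps}')
print(f'Original timesteps: {original_timesteps}')
print(f'Weighted annual operational cost: {weighted_annual_cost:,.0f} EUR')
```

Without the weights, the objective would only count 8 days of operation and investment costs would dominate the sizing decision.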

!!! note "Requirements"
    This notebook requires the `tsam` package: `pip install tsam`

In [None]:
import timeit

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import flixopt as fx

fx.CONFIG.notebook()

## Create a Full-Year Example System

We'll create a simple district heating system with a full year of hourly data.

In [None]:
# Generate synthetic yearly data
np.random.seed(42)
hours = 8760  # Full year hourly

# Create realistic heat demand profile (seasonal + daily patterns)
t = np.arange(hours)
seasonal = 50 + 40 * np.cos(2 * np.pi * t / 8760)  # Higher in winter
daily = 10 * np.sin(2 * np.pi * t / 24 - np.pi / 2)  # One daily cycle: minimum at midnight, peak at noon
noise = np.random.normal(0, 5, hours)
heat_demand = np.maximum(seasonal + daily + noise, 10)

# Create electricity price profile (one daily cycle: peak around 06:00, trough around 18:00)
hour_of_day = t % 24
elec_price = 50 + 30 * np.sin(np.pi * hour_of_day / 12) + np.random.normal(0, 5, hours)
elec_price = np.maximum(elec_price, 20)

timesteps = pd.date_range('2020-01-01', periods=hours, freq='h')

print(f'Created {hours} hourly timesteps ({hours / 24:.0f} days)')
print(f'Heat demand range: {heat_demand.min():.1f} - {heat_demand.max():.1f} MW')
print(f'Electricity price range: {elec_price.min():.1f} - {elec_price.max():.1f} EUR/MWh')

In [None]:
# Visualize the first 30 days (720 hours) of data
fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.1)

fig.add_trace(go.Scatter(x=timesteps[:720], y=heat_demand[:720], name='Heat Demand'), row=1, col=1)
fig.add_trace(go.Scatter(x=timesteps[:720], y=elec_price[:720], name='Electricity Price'), row=2, col=1)

fig.update_layout(height=400, title='First 30 Days of Data')
fig.update_yaxes(title_text='Heat Demand [MW]', row=1, col=1)
fig.update_yaxes(title_text='El. Price [EUR/MWh]', row=2, col=1)
fig.show()

In [None]:
def create_flow_system():
    """Create the district heating FlowSystem."""
    fs = fx.FlowSystem(timesteps=timesteps)

    # Effects
    costs = fx.Effect(label='costs', unit='EUR', is_objective=True)

    # Buses
    heat_bus = fx.Bus('Heat')
    elec_bus = fx.Bus('Electricity')
    gas_bus = fx.Bus('Gas')

    fs.add_elements(costs, heat_bus, elec_bus, gas_bus)

    # Gas supply
    gas_supply = fx.Source(
        'GasSupply',
        outputs=[fx.Flow('gas_out', bus='Gas', size=500, effects_per_flow_hour={'costs': 35})],
    )

    # Electricity grid
    grid_buy = fx.Source(
        'GridBuy',
        outputs=[fx.Flow('elec_out', bus='Electricity', size=200, effects_per_flow_hour={'costs': elec_price})],
    )

    grid_sell = fx.Sink(
        'GridSell',
        inputs=[fx.Flow('elec_in', bus='Electricity', size=200, effects_per_flow_hour={'costs': -elec_price * 0.9})],
    )

    # Boiler (investment)
    boiler = fx.linear_converters.Boiler(
        'Boiler',
        thermal_efficiency=0.9,
        thermal_flow=fx.Flow(
            'Q_th',
            bus='Heat',
            size=fx.InvestParameters(minimum_size=0, maximum_size=200, effects_of_investment_per_size={'costs': 50000}),
        ),
        fuel_flow=fx.Flow('Q_fu', bus='Gas'),
    )

    # CHP (investment)
    chp = fx.linear_converters.CHP(
        'CHP',
        thermal_efficiency=0.45,
        electrical_efficiency=0.35,
        thermal_flow=fx.Flow(
            'Q_th',
            bus='Heat',
            size=fx.InvestParameters(
                minimum_size=0, maximum_size=150, effects_of_investment_per_size={'costs': 150000}
            ),
        ),
        electrical_flow=fx.Flow('P_el', bus='Electricity'),
        fuel_flow=fx.Flow('Q_fu', bus='Gas'),
    )

    # Heat storage (investment)
    storage = fx.Storage(
        'ThermalStorage',
        charging=fx.Flow('charge', bus='Heat', size=50),
        discharging=fx.Flow('discharge', bus='Heat', size=50),
        capacity_in_flow_hours=fx.InvestParameters(
            minimum_size=0, maximum_size=500, effects_of_investment_per_size={'costs': 20000}
        ),
        eta_charge=0.95,
        eta_discharge=0.95,
        relative_loss_per_hour=0.005,
        initial_charge_state='equals_final',
    )

    # Heat demand
    demand = fx.Sink(
        'HeatDemand',
        inputs=[fx.Flow('Q_th', bus='Heat', size=1, fixed_relative_profile=heat_demand)],
    )

    fs.add_elements(gas_supply, grid_buy, grid_sell, boiler, chp, storage, demand)

    return fs


# Create the system
flow_system = create_flow_system()
print(f'FlowSystem created with {len(flow_system.timesteps)} timesteps')
print(f'Components: {list(flow_system.components.keys())}')

## Method 1: Full Optimization (Baseline)

First, let's solve the full problem with all 8760 timesteps.

In [None]:
solver = fx.solvers.HighsSolver(mip_gap=0.01)

start = timeit.default_timer()
fs_full = create_flow_system()
fs_full.optimize(solver)
time_full = timeit.default_timer() - start

print(f'Full optimization: {time_full:.2f} seconds')
print(f'Total cost: {fs_full.solution["costs"].item():,.0f} EUR')
print('\nOptimized sizes:')
for name, size in fs_full.statistics.sizes.items():
    print(f'  {name}: {float(size.item()):.1f}')

## Method 2: Typical Periods with `cluster_reduce()`

Now let's use the new `cluster_reduce()` method to solve with only 8 typical days (192 timesteps).

**Important**: Use `time_series_for_high_peaks` to force inclusion of peak demand periods. Without this, the typical periods may miss extreme peaks, leading to undersized components that cause infeasibility in the full-resolution dispatch stage.

In [None]:
start = timeit.default_timer()

# IMPORTANT: Use time_series_for_high_peaks to force inclusion of peak demand periods!
# Without this, the typical periods may miss extreme peaks, leading to undersized components.
# The format is the column name in the internal dataframe: 'ComponentName(FlowName)|attribute'
peak_forcing_series = ['HeatDemand(Q_th)|fixed_relative_profile']

# Create reduced FlowSystem with 8 typical days
fs_reduced = create_flow_system().transform.cluster_reduce(
    hours_per_period=24,  # 24 hours per period (daily)
    nr_of_typical_periods=8,  # 8 typical days
    time_series_for_high_peaks=peak_forcing_series,  # Force inclusion of peak demand day!
    storage_inter_period_linking=True,  # Link storage states between periods
    storage_cyclic=True,  # Cyclic constraint: SOC[0] = SOC[end]
)

time_clustering = timeit.default_timer() - start
print(f'Clustering time: {time_clustering:.2f} seconds')
print(f'Reduced from {len(flow_system.timesteps)} to {len(fs_reduced.timesteps)} timesteps')
print(f'Timestep weights (cluster occurrences): {np.unique(fs_reduced._typical_periods_info["timestep_weights"])}')

In [None]:
# Optimize the reduced system
start = timeit.default_timer()
fs_reduced.optimize(solver)
time_reduced = timeit.default_timer() - start

print(f'Reduced optimization: {time_reduced:.2f} seconds')
print(f'Total cost: {fs_reduced.solution["costs"].item():,.0f} EUR')
print(f'Speedup vs full: {time_full / (time_clustering + time_reduced):.1f}x')
print('\nOptimized sizes:')
for name, size in fs_reduced.statistics.sizes.items():
    print(f'  {name}: {float(size.item()):.1f}')

## Method 3: Two-Stage Workflow

The recommended workflow:

1. **Stage 1**: Fast sizing with `cluster_reduce()`
2. **Stage 2**: Fix sizes (with safety margin) and re-optimize for accurate dispatch

**Note**: Typical periods aggregate similar days, so individual days within a cluster may have higher demand than the typical day. Adding a 5-10% safety margin to sizes helps ensure feasibility.
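
A small worked example shows why a margin can be needed. The demand values are hypothetical and shortened to four values per day. The representative is formed here as an hour-by-hour mean, which is only one possible choice: depending on its configuration, the clustering may instead pick an actual member day.

```python
# Three hypothetical days assigned to the same cluster (heat demand in MW,
# shortened to 4 values per day for readability).
member_days = [
    [40.0, 60.0, 80.0, 50.0],
    [42.0, 58.0, 90.0, 52.0],
    [38.0, 62.0, 100.0, 48.0],
]

# Assumed representative: the hour-by-hour mean of the member days.
typical_day = [sum(values) / len(values) for values in zip(*member_days)]

typical_peak = max(typical_day)
true_peak = max(max(day) for day in member_days)

# Factor by which a size based on the typical day would have to grow
# to cover the highest peak among the member days.
required_margin = true_peak / typical_peak

print(f'Typical-day peak: {typical_peak:.1f} MW')
print(f'Highest member-day peak: {true_peak:.1f} MW')
print(f'Required margin: {(required_margin - 1) * 100:.1f}%')
```

In this made-up cluster the required margin is above 10%, so a fixed buffer alone would not be enough. This is why the margin is combined with `time_series_for_high_peaks` rather than used as a substitute for it.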

In [None]:
# Stage 1: Fast sizing (already done above)
print('Stage 1: Sizing with typical periods')
print(f'  Time: {time_clustering + time_reduced:.2f} seconds')
print(f'  Cost estimate: {fs_reduced.solution["costs"].item():,.0f} EUR')

# Apply safety margin to sizes (5-10% buffer for demand variability)
SAFETY_MARGIN = 1.05  # 5% buffer
sizes_with_margin = {name: float(size.item()) * SAFETY_MARGIN for name, size in fs_reduced.statistics.sizes.items()}
print(f'\nSizes with {(SAFETY_MARGIN - 1) * 100:.0f}% safety margin:')
for name, size in sizes_with_margin.items():
    original = fs_reduced.statistics.sizes[name].item()
    print(f'  {name}: {original:.1f} -> {size:.1f}')

# Stage 2: Fix sizes and re-optimize at full resolution
print('\nStage 2: Dispatch at full resolution')
start = timeit.default_timer()

fs_dispatch = create_flow_system().transform.fix_sizes(sizes_with_margin)
fs_dispatch.optimize(solver)

time_dispatch = timeit.default_timer() - start
print(f'  Time: {time_dispatch:.2f} seconds')
print(f'  Actual cost: {fs_dispatch.solution["costs"].item():,.0f} EUR')

# Total time comparison
total_two_stage = time_clustering + time_reduced + time_dispatch
print(f'\nTotal two-stage time: {total_two_stage:.2f} seconds')
print(f'Full optimization time: {time_full:.2f} seconds')
print(f'Two-stage speedup: {time_full / total_two_stage:.1f}x')

## Compare Results

In [None]:
results = {
    'Full (baseline)': {
        'Time [s]': time_full,
        'Cost [EUR]': fs_full.solution['costs'].item(),
        'Boiler Size': fs_full.statistics.sizes['Boiler(Q_th)'].item(),
        'CHP Size': fs_full.statistics.sizes['CHP(Q_th)'].item(),
        'Storage Size': fs_full.statistics.sizes['ThermalStorage'].item(),
    },
    'Typical Periods (sizing)': {
        'Time [s]': time_clustering + time_reduced,
        'Cost [EUR]': fs_reduced.solution['costs'].item(),
        'Boiler Size': fs_reduced.statistics.sizes['Boiler(Q_th)'].item(),
        'CHP Size': fs_reduced.statistics.sizes['CHP(Q_th)'].item(),
        'Storage Size': fs_reduced.statistics.sizes['ThermalStorage'].item(),
    },
    'Two-Stage (with margin)': {
        'Time [s]': total_two_stage,
        'Cost [EUR]': fs_dispatch.solution['costs'].item(),
        'Boiler Size': sizes_with_margin['Boiler(Q_th)'],
        'CHP Size': sizes_with_margin['CHP(Q_th)'],
        'Storage Size': sizes_with_margin['ThermalStorage'],
    },
}

comparison = pd.DataFrame(results).T
baseline_cost = comparison.loc['Full (baseline)', 'Cost [EUR]']
baseline_time = comparison.loc['Full (baseline)', 'Time [s]']
comparison['Cost Gap [%]'] = ((comparison['Cost [EUR]'] - baseline_cost) / abs(baseline_cost) * 100).round(2)
comparison['Speedup'] = (baseline_time / comparison['Time [s]']).round(1)

comparison.style.format(
    {
        'Time [s]': '{:.2f}',
        'Cost [EUR]': '{:,.0f}',
        'Boiler Size': '{:.1f}',
        'CHP Size': '{:.1f}',
        'Storage Size': '{:.0f}',
        'Cost Gap [%]': '{:.2f}',
        'Speedup': '{:.1f}x',
    }
)

## Inter-Period Storage Linking

The `cluster_reduce()` method creates special constraints to track storage state across original periods:

- **SOC_boundary[d]**: Storage state at the boundary of original period d
- **delta_SOC[c]**: Change in SOC during typical period c
- **Linking**: `SOC_boundary[d+1] = SOC_boundary[d] + delta_SOC[cluster_order[d]]`
- **Cyclic**: `SOC_boundary[0] = SOC_boundary[end]` (optional)

This lets the model represent long-term storage behavior (for example, charging over many consecutive periods) even though only the typical periods are solved.
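
The linking rule above can be traced by hand with a short sketch. This is a forward simulation with hypothetical numbers, written only to illustrate the recursion. In the actual model, `SOC_boundary` and `delta_SOC` are decision variables tied together by constraints, and effects such as self-discharge between periods are left out here.

```python
def link_soc_boundaries(soc_start, delta_soc, cluster_order):
    """Apply SOC_boundary[d+1] = SOC_boundary[d] + delta_SOC[cluster_order[d]]."""
    soc_boundary = [soc_start]
    for cluster in cluster_order:
        soc_boundary.append(soc_boundary[-1] + delta_soc[cluster])
    return soc_boundary


# Hypothetical net SOC change over each of 3 typical periods [MWh].
delta_soc = {0: 20.0, 1: -15.0, 2: -5.0}

# Which typical period represents each of 6 original periods.
cluster_order = [0, 1, 0, 2, 1, 2]

soc_boundary = link_soc_boundaries(100.0, delta_soc, cluster_order)

# The optional cyclic constraint requires the first and last boundary to match.
is_cyclic = abs(soc_boundary[0] - soc_boundary[-1]) < 1e-9

print(f'SOC at period boundaries: {soc_boundary}')
print(f'Cyclic: {is_cyclic}')
```

Because every original period mapped to the same cluster reuses the same `delta_SOC`, the number of extra variables grows with the number of original periods, not with the number of original timesteps.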

In [None]:
# Show clustering info
info = fs_reduced._typical_periods_info
print('Typical Periods Configuration:')
print(f'  Number of typical periods: {info["nr_of_typical_periods"]}')
print(f'  Timesteps per period: {info["timesteps_per_period"]}')
print(f'  Total reduced timesteps: {info["nr_of_typical_periods"] * info["timesteps_per_period"]}')
print(f'  Cluster order (first 10): {info["cluster_order"][:10]}...')
print(f'  Cluster occurrences: {dict(info["cluster_occurrences"])}')
print(f'  Storage inter-period linking: {info["storage_inter_period_linking"]}')
print(f'  Storage cyclic: {info["storage_cyclic"]}')

## API Reference

### `transform.cluster_reduce()` Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `hours_per_period` | `float` | Duration of each period in hours (e.g., 24 for daily) |
| `nr_of_typical_periods` | `int` | Number of typical periods to extract (e.g., 8) |
| `weights` | `dict[str, float]` | Optional weights for clustering each time series |
| `time_series_for_high_peaks` | `list[str]` | **IMPORTANT**: Force inclusion of high-value periods to capture peak demands |
| `time_series_for_low_peaks` | `list[str]` | Force inclusion of low-value periods |
| `storage_inter_period_linking` | `bool` | Link storage states between periods (default: True) |
| `storage_cyclic` | `bool` | Enforce cyclic storage constraint (default: True) |

### Peak Forcing

**Always use `time_series_for_high_peaks`** for demand time series to ensure extreme peaks are captured. The format is:
```python
time_series_for_high_peaks=['ComponentName(FlowName)|fixed_relative_profile']
```

Without peak forcing, the clustering algorithm may select typical periods that don't include the peak demand day, leading to undersized components and infeasibility in the dispatch stage.
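
As a convenience, the label format can be built programmatically and the peak day located before clustering. Both functions below are hypothetical helpers written for this illustration and are not part of flixopt or tsam. The demand series is synthetic.

```python
def peak_series_label(component, flow, attribute='fixed_relative_profile'):
    """Hypothetical helper: build a 'ComponentName(FlowName)|attribute' label."""
    return f'{component}({flow})|{attribute}'


def peak_day_index(hourly_values, hours_per_period=24):
    """Hypothetical helper: index of the period containing the overall maximum."""
    peak_hour = max(range(len(hourly_values)), key=hourly_values.__getitem__)
    return peak_hour // hours_per_period


label = peak_series_label('HeatDemand', 'Q_th')

# Synthetic 3-day demand with a single spike on the second day.
demand = [50.0] * 72
demand[30] = 95.0
peak_day = peak_day_index(demand)

print(f'Label for time_series_for_high_peaks: {label}')
print(f'Day containing the demand peak: {peak_day}')
```

Knowing which day holds the peak makes it easy to confirm afterwards that this day is represented among the typical periods.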

### Comparison with `cluster()`

| Feature | `cluster()` | `cluster_reduce()` |
|---------|-------------|--------------------|
| Timesteps | Original (8760) | Reduced (e.g., 192) |
| Mechanism | Equality constraints | Typical periods only |
| Solve time | Moderate reduction | Dramatic reduction |
| Accuracy | Higher | Lower (sizing only) |
| Storage handling | Via constraints | SOC boundary linking |
| Use case | Final dispatch | Initial sizing |

## Summary

The new `cluster_reduce()` method provides:

1. **Dramatic speedup** for sizing optimization by reducing timesteps
2. **Proper cost weighting** so operational costs reflect cluster occurrences
3. **Storage state tracking** across original periods via SOC_boundary variables
4. **Two-stage workflow** support via `fix_sizes()` for accurate dispatch

### Recommended Workflow

```python
# Stage 1: Fast sizing with typical periods
fs_sizing = flow_system.transform.cluster_reduce(
    hours_per_period=24,
    nr_of_typical_periods=8,
    time_series_for_high_peaks=['DemandComponent(FlowName)|fixed_relative_profile'],
)
fs_sizing.optimize(solver)

# Apply safety margin (typical periods aggregate, so individual days may exceed)
SAFETY_MARGIN = 1.05  # 5% buffer
sizes_with_margin = {
    name: float(size.item()) * SAFETY_MARGIN
    for name, size in fs_sizing.statistics.sizes.items()
}

# Stage 2: Fix sizes and optimize dispatch at full resolution
fs_dispatch = flow_system.transform.fix_sizes(sizes_with_margin)
fs_dispatch.optimize(solver)
```

### Key Considerations

- **Peak forcing is essential**: Use `time_series_for_high_peaks` to capture peak demand days
- **Safety margin recommended**: Add a 5-10% buffer to sizes, since aggregation smooths peaks
- **Two-stage is recommended**: Use `cluster_reduce()` for fast sizing, then `fix_sizes()` for dispatch
- **Storage linking preserves long-term behavior**: SOC_boundary variables ensure correct storage cycling