# Typical Periods Optimization with `cluster_reduce()`

This notebook demonstrates the `cluster_reduce()` method for fast sizing optimization using typical periods.

## Key Concept

Unlike `cluster()`, which keeps all original timesteps and links clustered periods through equality constraints, `cluster_reduce()` **actually reduces** the number of timesteps:

| Method | Timesteps | Mechanism | Use Case |
|--------|-----------|-----------|----------|
| `cluster()` | 2976 | Equality constraints | Accurate operational dispatch |
| `cluster_reduce()` | 768 (8×96) | Typical periods only | Fast initial sizing |
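
The counts in this table follow from the 15-minute resolution of the example data: 96 timesteps per day, 31 days in the month, and 8 typical days. A quick arithmetic check in plain Python:

```python
# Timestep arithmetic behind the table above (15-minute resolution)
STEPS_PER_DAY = 24 * 4  # 96 quarter-hours per day
n_days = 31             # one month of data
n_clusters = 8          # typical days kept by cluster_reduce()

full_timesteps = n_days * STEPS_PER_DAY        # timesteps kept by cluster()
reduced_timesteps = n_clusters * STEPS_PER_DAY  # timesteps kept by cluster_reduce()
reduction_factor = full_timesteps / reduced_timesteps

print(full_timesteps, reduced_timesteps, reduction_factor)
# → 2976 768 3.875
```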

## Features

- **Actual timestep reduction**: Only solves for typical periods (e.g., 8 days × 96 timesteps = 768 instead of 2976)
- **Timestep weighting**: Operational costs are weighted by cluster occurrence
- **Inter-period storage linking**: SOC_boundary variables track storage state across original periods
- **Cyclic constraint**: Optional cyclic storage constraint for long-term balance
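
To make the weighting idea concrete, here is a minimal NumPy sketch. It assumes each reduced timestep is weighted by the number of original days assigned to its cluster; the day-to-cluster mapping (`cluster_order`) and the cost values are invented for illustration, and this is not flixopt's internal code. It shows that the weighted objective over the 768 reduced timesteps equals the cost of replaying each day's typical profile over the whole month.

```python
import numpy as np

n_days, n_clusters, steps_per_day = 31, 8, 96

# Hypothetical mapping of each original day to a typical day: cluster_order[d] = c
cluster_order = np.arange(n_days) % n_clusters

# Occurrences: how many original days each typical day stands for
cluster_occurrences = np.bincount(cluster_order, minlength=n_clusters)

# Weight of each reduced timestep = occurrence count of its typical day
timestep_weights = np.repeat(cluster_occurrences, steps_per_day)

# Invented operating cost of each reduced timestep [€]
typical_cost = np.linspace(10.0, 20.0, n_clusters * steps_per_day)

# Weighted objective over the 768 reduced timesteps ...
weighted_cost = float(np.sum(timestep_weights * typical_cost))

# ... equals the cost of replaying each day's typical profile over all 31 days
replayed_cost = float(typical_cost.reshape(n_clusters, steps_per_day)[cluster_order].sum())

print(cluster_occurrences, timestep_weights.sum())
# → [4 4 4 4 4 4 4 3] 2976
```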

!!! note "Requirements"
    This notebook requires the `tsam` package: `pip install tsam`

In [None]:
import timeit

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import flixopt as fx

fx.CONFIG.notebook()

## Load the FlowSystem

We use a pre-built district heating system with real-world time series data (one month at 15-min resolution):

In [None]:
from pathlib import Path

# Generate example data if not present (for local development)
data_file = Path('data/district_heating_system.nc4')
if not data_file.exists():
    from data.generate_example_systems import create_district_heating_system

    fs = create_district_heating_system()
    fs.optimize(fx.solvers.HighsSolver(log_to_console=False))
    fs.to_netcdf(data_file, overwrite=True)

# Load the district heating system (real data from Zeitreihen2020.csv)
flow_system = fx.FlowSystem.from_netcdf(data_file)

timesteps = flow_system.timesteps
print(f'Loaded FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 96:.0f} days at 15-min resolution)')
print(f'Components: {list(flow_system.components.keys())}')

In [None]:
# Visualize the first two weeks of data (14 days × 96 quarter-hours)
TWO_WEEKS = 14 * 96
heat_demand = flow_system.components['HeatDemand'].inputs[0].fixed_relative_profile
electricity_price = flow_system.components['GridBuy'].outputs[0].effects_per_flow_hour['costs']

fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.1)

fig.add_trace(go.Scatter(x=timesteps[:TWO_WEEKS], y=heat_demand.values[:TWO_WEEKS], name='Heat Demand'), row=1, col=1)
fig.add_trace(
    go.Scatter(x=timesteps[:TWO_WEEKS], y=electricity_price.values[:TWO_WEEKS], name='Electricity Price'), row=2, col=1
)

fig.update_layout(height=400, title='First Two Weeks of Data')
fig.update_yaxes(title_text='Heat Demand [MW]', row=1, col=1)
fig.update_yaxes(title_text='El. Price [€/MWh]', row=2, col=1)
fig.show()

## Method 1: Full Optimization (Baseline)

First, let's solve the full problem with all timesteps.

In [None]:
solver = fx.solvers.HighsSolver(mip_gap=0.01)

start = timeit.default_timer()
fs_full = flow_system.copy()
fs_full.optimize(solver)
time_full = timeit.default_timer() - start

print(f'Full optimization: {time_full:.2f} seconds')
print(f'Total cost: {fs_full.solution["costs"].item():,.0f} €')
print('\nOptimized sizes:')
for name, size in fs_full.statistics.sizes.items():
    print(f'  {name}: {float(size.item()):.1f}')

## Method 2: Typical Periods with `cluster_reduce()`

Now let's use the `cluster_reduce()` method to solve with only 8 typical days (768 timesteps).

**Important**: Use `time_series_for_high_peaks` to force inclusion of peak demand periods. Without this, the typical periods may miss extreme peaks, leading to undersized components that cause infeasibility in the full-resolution dispatch stage.
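
A simplified NumPy illustration of this failure mode, on synthetic data: if a representative day is formed as the average of all days, the cold-spike day is smoothed away, whereas keeping the peak day as its own period preserves the maximum. This deliberately skips the clustering step; tsam's actual peak handling is configured through its own options and differs in detail.

```python
import numpy as np

steps_per_day = 96
t = np.arange(steps_per_day)

# Synthetic heat demand for 7 days: one daily shape, scaled per day; day 5 is a cold spike
daily_scale = np.array([1.0, 1.1, 0.9, 1.0, 1.05, 1.6, 0.95])
base_profile = 1.0 + 0.5 * np.sin(2 * np.pi * t / steps_per_day)
days = daily_scale[:, None] * base_profile[None, :]  # shape (7, 96)

# Without peak forcing: a single representative day as the mean of all days
typical_mean = days.mean(axis=0)

# With peak forcing (simplified): keep the day containing the highest value as its own period
peak_day = int(np.argmax(days.max(axis=1)))
forced_peak = days[peak_day]

print(peak_day, typical_mean.max() < days.max(), forced_peak.max() == days.max())
# → 5 True True
```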

In [None]:
start = timeit.default_timer()

# IMPORTANT: Use time_series_for_high_peaks to force inclusion of peak demand periods!
# Without this, the typical periods may miss extreme peaks, leading to undersized components.
# The format is the column name in the internal dataframe: 'ComponentName(FlowName)|attribute'
peak_forcing_series = ['HeatDemand(Q_th)|fixed_relative_profile']

# Create reduced FlowSystem with 8 typical days
fs_reduced = flow_system.transform.cluster_reduce(
    n_clusters=8,  # 8 typical days
    cluster_duration='1D',  # Daily periods (can also use hours, e.g., 24)
    time_series_for_high_peaks=peak_forcing_series,  # Force inclusion of peak demand day!
    storage_inter_period_linking=True,  # Link storage states between periods
    storage_cyclic=True,  # Cyclic constraint: SOC[0] = SOC[end]
)

time_clustering = timeit.default_timer() - start
print(f'Clustering time: {time_clustering:.2f} seconds')
print(f'Reduced from {len(flow_system.timesteps)} to {len(fs_reduced.timesteps)} timesteps')
print(f'Timestep weights (cluster occurrences): {np.unique(fs_reduced._cluster_info["timestep_weights"])}')

In [None]:
# Optimize the reduced system
start = timeit.default_timer()
fs_reduced.optimize(solver)
time_reduced = timeit.default_timer() - start

print(f'Reduced optimization: {time_reduced:.2f} seconds')
print(f'Total cost: {fs_reduced.solution["costs"].item():,.0f} €')
print(f'Speedup vs full (incl. clustering): {time_full / (time_clustering + time_reduced):.1f}x')
print('\nOptimized sizes:')
for name, size in fs_reduced.statistics.sizes.items():
    print(f'  {name}: {float(size.item()):.1f}')

## Method 3: Two-Stage Workflow

The recommended workflow:
1. **Stage 1**: Fast sizing with `cluster_reduce()`
2. **Stage 2**: Fix sizes (with safety margin) and re-optimize for accurate dispatch

**Note**: Typical periods aggregate similar days, so individual days within a cluster may have higher demand than the typical day. Adding a 5-10% safety margin to sizes helps ensure feasibility.
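
A small NumPy sketch of why a margin is needed, assuming the typical day is the mean of its member days (tsam can also represent clusters by medoids). The member days are invented: they share one daily shape at different intensities, so the worst member's peak exceeds the typical day's peak by roughly 7%. In this made-up cluster a 5% margin is not enough while 10% is, which is why the margin is better treated as a range to check than as a fixed rule.

```python
import numpy as np

steps_per_day = 96
t = np.arange(steps_per_day)
shape = 1.0 + 0.5 * np.sin(2 * np.pi * t / steps_per_day)  # common daily shape

# Invented member days of one cluster: same shape, different intensity [MW]
member_scale = np.array([6.0, 6.3, 6.6, 6.9])
members = member_scale[:, None] * shape[None, :]  # shape (4, 96)

# Typical day taken as the mean of its members (an assumption; medoids are also common)
typical_day = members.mean(axis=0)

# Factor by which the worst member's peak exceeds the typical day's peak (about 1.07 here)
required_margin = members.max() / typical_day.max()

for margin in (1.05, 1.10):
    print(f'{margin:.2f} covers worst member: {margin >= required_margin}')
# → 1.05 covers worst member: False
# → 1.10 covers worst member: True
```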

In [None]:
# Stage 1: Fast sizing (already done above)
print('Stage 1: Sizing with typical periods')
print(f'  Time: {time_clustering + time_reduced:.2f} seconds')
print(f'  Cost estimate: {fs_reduced.solution["costs"].item():,.0f} €')

# Apply safety margin to sizes (5-10% buffer for demand variability)
SAFETY_MARGIN = 1.05  # 5% buffer
sizes_with_margin = {name: float(size.item()) * SAFETY_MARGIN for name, size in fs_reduced.statistics.sizes.items()}
print(f'\nSizes with {(SAFETY_MARGIN - 1) * 100:.0f}% safety margin:')
for name, size in sizes_with_margin.items():
    original = fs_reduced.statistics.sizes[name].item()
    print(f'  {name}: {original:.1f} -> {size:.1f}')

# Stage 2: Fix sizes and re-optimize at full resolution
print('\nStage 2: Dispatch at full resolution')
start = timeit.default_timer()

fs_dispatch = flow_system.transform.fix_sizes(sizes_with_margin)
fs_dispatch.optimize(solver)

time_dispatch = timeit.default_timer() - start
print(f'  Time: {time_dispatch:.2f} seconds')
print(f'  Actual cost: {fs_dispatch.solution["costs"].item():,.0f} €')

# Total time comparison
total_two_stage = time_clustering + time_reduced + time_dispatch
print(f'\nTotal two-stage time: {total_two_stage:.2f} seconds')
print(f'Full optimization time: {time_full:.2f} seconds')
print(f'Two-stage speedup: {time_full / total_two_stage:.1f}x')

## Compare Results

In [None]:
results = {
    'Full (baseline)': {
        'Time [s]': time_full,
        'Cost [€]': fs_full.solution['costs'].item(),
        'CHP Size': fs_full.statistics.sizes['CHP(Q_th)'].item(),
        'Boiler Size': fs_full.statistics.sizes['Boiler(Q_th)'].item(),
        'Storage Size': fs_full.statistics.sizes['Storage'].item(),
    },
    'Typical Periods (sizing)': {
        'Time [s]': time_clustering + time_reduced,
        'Cost [€]': fs_reduced.solution['costs'].item(),
        'CHP Size': fs_reduced.statistics.sizes['CHP(Q_th)'].item(),
        'Boiler Size': fs_reduced.statistics.sizes['Boiler(Q_th)'].item(),
        'Storage Size': fs_reduced.statistics.sizes['Storage'].item(),
    },
    'Two-Stage (with margin)': {
        'Time [s]': total_two_stage,
        'Cost [€]': fs_dispatch.solution['costs'].item(),
        'CHP Size': sizes_with_margin['CHP(Q_th)'],
        'Boiler Size': sizes_with_margin['Boiler(Q_th)'],
        'Storage Size': sizes_with_margin['Storage'],
    },
}

comparison = pd.DataFrame(results).T
baseline_cost = comparison.loc['Full (baseline)', 'Cost [€]']
baseline_time = comparison.loc['Full (baseline)', 'Time [s]']
comparison['Cost Gap [%]'] = ((comparison['Cost [€]'] - baseline_cost) / abs(baseline_cost) * 100).round(2)
comparison['Speedup'] = (baseline_time / comparison['Time [s]']).round(1)

comparison.style.format(
    {
        'Time [s]': '{:.2f}',
        'Cost [€]': '{:,.0f}',
        'CHP Size': '{:.1f}',
        'Boiler Size': '{:.1f}',
        'Storage Size': '{:.0f}',
        'Cost Gap [%]': '{:.2f}',
        'Speedup': '{:.1f}x',
    }
)

## Inter-Period Storage Linking

The `cluster_reduce()` method creates special constraints to track storage state across original periods:

- **SOC_boundary[d]**: Storage state at the boundary of original period d
- **delta_SOC[c]**: Change in SOC during typical period c
- **Linking**: `SOC_boundary[d+1] = SOC_boundary[d] + delta_SOC[cluster_order[d]]`
- **Cyclic**: `SOC_boundary[0] = SOC_boundary[end]` (optional)

This ensures long-term storage behavior is captured correctly even though we only solve for typical periods.
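
The linking recursion can be evaluated by hand for fixed numbers. The sketch below assumes hypothetical per-cluster SOC changes `delta_soc` and a hypothetical `cluster_order`; in the actual optimization these are decision variables and constraints, and storage losses and the intra-period state are ignored here, as in the formula above. It also checks the cyclic condition, which can only hold if the SOC changes accumulated over the whole horizon sum to zero.

```python
import numpy as np

# Hypothetical net SOC change over each typical day [MWh]: delta_SOC[c]
delta_soc = np.array([40.0, -25.0, -5.0])

# Hypothetical mapping of 6 original days to typical days: cluster_order[d] = c
cluster_order = np.array([0, 1, 1, 0, 2, 1])

# Linking: SOC_boundary[d+1] = SOC_boundary[d] + delta_SOC[cluster_order[d]]
soc_boundary = np.empty(len(cluster_order) + 1)
soc_boundary[0] = 100.0  # hypothetical state at the start of the horizon [MWh]
for d, c in enumerate(cluster_order):
    soc_boundary[d + 1] = soc_boundary[d] + delta_soc[c]

# The same recursion written as a cumulative sum
assert np.allclose(soc_boundary[1:], soc_boundary[0] + np.cumsum(delta_soc[cluster_order]))

# Cyclic condition SOC_boundary[0] = SOC_boundary[end]
cyclic_ok = bool(np.isclose(soc_boundary[-1], soc_boundary[0]))
print(soc_boundary.tolist(), cyclic_ok)
# → [100.0, 140.0, 115.0, 90.0, 130.0, 125.0, 100.0] True
```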

In [None]:
# Show clustering info
info = fs_reduced._cluster_info
print('Typical Periods Configuration:')
print(f'  Number of typical periods: {info["n_clusters"]}')
print(f'  Timesteps per period: {info["timesteps_per_cluster"]}')
print(f'  Total reduced timesteps: {info["n_clusters"] * info["timesteps_per_cluster"]}')
print(f'  Cluster order (first 10): {info["cluster_order"][:10]}...')
cluster_occurrences = info['cluster_occurrences'][(None, None)]
print(f'  Cluster occurrences: {dict(cluster_occurrences)}')
print(f'  Storage inter-period linking: {info["storage_inter_period_linking"]}')
print(f'  Storage cyclic: {info["storage_cyclic"]}')

## API Reference

### `transform.cluster_reduce()` Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `n_clusters` | `int` | Number of typical periods to extract (e.g., 8) |
| `cluster_duration` | `str \| float` | Duration of each period ('1D', '24h') or hours as float |
| `weights` | `dict[str, float]` | Optional weights for clustering each time series |
| `time_series_for_high_peaks` | `list[str]` | **IMPORTANT**: Force inclusion of high-value periods to capture peak demands |
| `time_series_for_low_peaks` | `list[str]` | Force inclusion of low-value periods |
| `storage_inter_period_linking` | `bool` | Link storage states between periods (default: True) |
| `storage_cyclic` | `bool` | Enforce cyclic storage constraint (default: True) |
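
The `cluster_duration` row accepts either a duration string or a number of hours. The helper below is hypothetical (not part of flixopt) and only shows how either form maps to the number of timesteps per typical period at this notebook's 15-minute resolution, using `pandas.Timedelta`:

```python
import pandas as pd


def timesteps_per_cluster(cluster_duration, resolution='15min'):
    """Hypothetical helper: number of timesteps in one typical period."""
    one_hour = pd.Timedelta('1h')
    if isinstance(cluster_duration, str):
        hours = pd.Timedelta(cluster_duration) / one_hour  # e.g. '1D' -> 24.0
    else:
        hours = float(cluster_duration)  # numbers are read as hours
    steps = float(hours / (pd.Timedelta(resolution) / one_hour))
    if not steps.is_integer():
        raise ValueError('cluster_duration must be a whole multiple of the resolution')
    return int(steps)


print(timesteps_per_cluster('1D'), timesteps_per_cluster('24h'), timesteps_per_cluster(24))
# → 96 96 96
```

With the 15-minute data, `'1D'`, `'24h'` and `24` all give 96 timesteps per period, matching the 8 × 96 = 768 reduced timesteps used above.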

### Peak Forcing

**Always use `time_series_for_high_peaks`** for demand time series to ensure extreme peaks are captured. The format is:
```python
time_series_for_high_peaks=['ComponentName(FlowName)|fixed_relative_profile']
```

Without peak forcing, the clustering algorithm may select typical periods that don't include the peak demand day, leading to undersized components and infeasibility in the dispatch stage.
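
Because the label is a plain string, a typo in the component or flow name is easy to make. A hypothetical helper (not part of flixopt) that assembles the label from its parts, shown with the names used earlier in this notebook:

```python
def peak_series_label(component: str, flow: str, attribute: str = 'fixed_relative_profile') -> str:
    """Hypothetical helper that builds a 'ComponentName(FlowName)|attribute' label."""
    return f'{component}({flow})|{attribute}'


labels = [peak_series_label('HeatDemand', 'Q_th')]
print(labels)
# → ['HeatDemand(Q_th)|fixed_relative_profile']
```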

### Comparison with `cluster()`

| Feature | `cluster()` | `cluster_reduce()` |
|---------|-------------|--------------------|
| Timesteps | Original (2976) | Reduced (e.g., 768) |
| Mechanism | Equality constraints | Typical periods only |
| Solve time | Moderate reduction | Dramatic reduction |
| Accuracy | Higher | Lower (sizing only) |
| Storage handling | Via constraints | SOC boundary linking |
| Use case | Final dispatch | Initial sizing |

## Summary

The `cluster_reduce()` method provides:

1. **Dramatic speedup** for sizing optimization by reducing timesteps
2. **Proper cost weighting** so operational costs reflect cluster occurrences
3. **Storage state tracking** across original periods via SOC_boundary variables
4. **Two-stage workflow** support via `fix_sizes()` for accurate dispatch

### Recommended Workflow

```python
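import flixopt as fx

# `flow_system` is an existing fx.FlowSystem
solver = fx.solvers.HighsSolver(mip_gap=0.01)

# Stage 1: Fast sizing with typical periods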
fs_sizing = flow_system.transform.cluster_reduce(
    n_clusters=8,
    cluster_duration='1D',
    time_series_for_high_peaks=['DemandComponent(FlowName)|fixed_relative_profile'],
)
fs_sizing.optimize(solver)

# Apply safety margin (typical periods aggregate, so individual days may exceed)
SAFETY_MARGIN = 1.05  # 5% buffer
sizes_with_margin = {
    name: float(size.item()) * SAFETY_MARGIN
    for name, size in fs_sizing.statistics.sizes.items()
}

# Stage 2: Fix sizes and optimize dispatch at full resolution
fs_dispatch = flow_system.transform.fix_sizes(sizes_with_margin)
fs_dispatch.optimize(solver)
```

### Key Considerations

- **Peak forcing is essential**: Use `time_series_for_high_peaks` to capture peak demand days
- **Safety margin recommended**: Add 5-10% buffer to sizes since aggregation smooths peaks
- **Two-stage is recommended**: Use `cluster_reduce()` for fast sizing, then `fix_sizes()` for dispatch
- **Storage linking preserves long-term behavior**: SOC_boundary variables ensure correct storage cycling

In [None]:
# Expand the typical-period solution of the reduced system back to the original timesteps
fs_expanded = fs_reduced.transform.expand_solution()

In [None]:
fs_expanded.statistics.plot.effects()