# Time Series Clustering with `cluster()`

Accelerate investment optimization using typical periods (clustering).

This notebook demonstrates:

- **Typical periods**: Cluster similar time segments (e.g., days) and solve only representative ones
- **Weighted costs**: Automatically weight operational costs by cluster occurrence
- **Two-stage workflow**: Fast sizing with clustering, accurate dispatch at full resolution

!!! note "Requirements"
    This notebook requires the `tsam` package: `pip install tsam`

In [None]:
import timeit

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import flixopt as fx

fx.CONFIG.notebook()

## Load the FlowSystem

We use a pre-built district heating system with real-world time series data (one month at 15-min resolution):

In [None]:
from data.generate_example_systems import create_district_heating_system

flow_system = create_district_heating_system()
flow_system.connect_and_transform()

timesteps = flow_system.timesteps
print(f'FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 96:.0f} days at 15-min resolution)')
print(f'Components: {list(flow_system.components.keys())}')

In [None]:
# Visualize input data
heat_demand = flow_system.components['HeatDemand'].inputs[0].fixed_relative_profile
electricity_price = flow_system.components['GridBuy'].outputs[0].effects_per_flow_hour['costs']

fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.1)
fig.add_trace(go.Scatter(x=timesteps, y=heat_demand.values, name='Heat Demand', line=dict(width=0.5)), row=1, col=1)
fig.add_trace(
    go.Scatter(x=timesteps, y=electricity_price.values, name='Electricity Price', line=dict(width=0.5)), row=2, col=1
)
fig.update_layout(height=400, title='One Month of Input Data')
fig.update_yaxes(title_text='Heat Demand [MW]', row=1, col=1)
fig.update_yaxes(title_text='El. Price [€/MWh]', row=2, col=1)
fig.show()

## Method 1: Full Optimization (Baseline)

First, solve the complete problem with all 2976 timesteps:

In [None]:
solver = fx.solvers.HighsSolver(mip_gap=0.01)

start = timeit.default_timer()
fs_full = flow_system.copy()
fs_full.optimize(solver)
time_full = timeit.default_timer() - start

print(f'Full optimization: {time_full:.1f} seconds')
print(f'Total cost: {fs_full.solution["costs"].item():,.0f} €')
print('\nOptimized sizes:')
for name, size in fs_full.statistics.sizes.items():
    print(f'  {name}: {float(size.item()):.1f}')

## Method 2: Clustering with `cluster()`

The `cluster()` method:

1. **Clusters similar days** using the TSAM (Time Series Aggregation Module) package
2. **Reduces timesteps** to only typical periods (e.g., 8 typical days = 768 timesteps)
3. **Weights costs** by how many original days each typical day represents
4. **Handles storage** with configurable behavior via the `cluster_storage_mode` parameter of each `Storage` component

!!! warning "Peak Forcing"
    Always use `time_series_for_high_peaks` to ensure extreme demand days are captured.
    Without this, clustering may miss peak periods, causing undersized components.
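
The effect described in the warning can be reproduced with plain Python. This is a minimal sketch with made-up numbers, and it assumes the typical day is the mean of its cluster members. TSAM may represent clusters differently, so treat this as an illustration of the risk rather than of the library's internals.

```python
# Toy daily heat-demand profiles (4 timesteps per day, arbitrary units).
# All numbers are made up for illustration.
days = [
    [10, 12, 14, 11],  # ordinary day
    [11, 13, 15, 12],  # ordinary day
    [10, 20, 30, 15],  # extreme day with a pronounced peak
]

# Assumption: all three days fall into one cluster, represented by its mean profile.
typical_day = [sum(values) / len(days) for values in zip(*days)]

true_peak = max(max(day) for day in days)
typical_peak = max(typical_day)

print(f'True peak: {true_peak}, peak of typical day: {typical_peak:.1f}')
```

The typical day peaks well below the true maximum, so components sized against it could not cover the extreme day. Peak forcing avoids this by keeping the extreme day as its own typical period.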

In [None]:
start = timeit.default_timer()

# IMPORTANT: Force inclusion of peak demand periods!
peak_series = ['HeatDemand(Q_th)|fixed_relative_profile']

# Create reduced FlowSystem with 8 typical days
fs_clustered = flow_system.transform.cluster(
    n_clusters=8,  # 8 typical days
    cluster_duration='1D',  # Daily clustering
    time_series_for_high_peaks=peak_series,  # Capture peak demand day
)

time_clustering = timeit.default_timer() - start
print(f'Clustering time: {time_clustering:.1f} seconds')
print(f'Reduced: {len(flow_system.timesteps)} → {len(fs_clustered.timesteps)} timesteps')
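
The reduction printed by the cell above follows from simple arithmetic. The sketch below only restates the numbers already used in this notebook (15-min resolution, 2976 timesteps, 8 typical days); the variable names are illustrative.

```python
STEPS_PER_DAY = 24 * 4  # 96 timesteps per day at 15-min resolution

n_days = 31  # one month of input data (2976 timesteps / 96)
n_clusters = 8  # typical days requested from cluster()

full_timesteps = n_days * STEPS_PER_DAY
reduced_timesteps = n_clusters * STEPS_PER_DAY
reduction = 1 - reduced_timesteps / full_timesteps

print(f'{full_timesteps} -> {reduced_timesteps} timesteps ({reduction:.0%} fewer)')
```

Solve time does not necessarily shrink in proportion to the number of timesteps, which is why the notebook measures the speedup instead of estimating it.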

In [None]:
# Optimize the reduced system
start = timeit.default_timer()
fs_clustered.optimize(solver)
time_clustered = timeit.default_timer() - start

print(f'Clustered optimization: {time_clustered:.1f} seconds')
print(f'Total cost: {fs_clustered.solution["costs"].item():,.0f} €')
print(f'\nSpeedup vs full: {time_full / (time_clustering + time_clustered):.1f}x')
print('\nOptimized sizes:')
for name, size in fs_clustered.statistics.sizes.items():
    print(f'  {name}: {float(size.item()):.1f}')

## Understanding the Clustering

The clustering algorithm groups similar days together. Let's inspect the cluster structure:

In [None]:
# Show clustering info
info = fs_clustered.clustering
cs = info.result.cluster_structure
print('Clustering Configuration:')
print(f'  Number of typical periods: {cs.n_clusters}')
print(f'  Timesteps per period: {cs.timesteps_per_cluster}')
print(f'  Total reduced timesteps: {cs.n_clusters * cs.timesteps_per_cluster}')
print(f'  Cluster order (first 10 days): {cs.cluster_order.values[:10]}...')

# Show how many times each cluster appears
cluster_order = cs.cluster_order.values
unique, counts = np.unique(cluster_order, return_counts=True)
print('\nCluster occurrences:')
for cluster_id, count in zip(unique, counts, strict=False):
    print(f'  Cluster {cluster_id}: {count} days')
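
The occurrence counts are what the cost weighting is built on. The following sketch shows the idea with hypothetical cluster assignments and hypothetical per-day costs; it is not how flixopt implements the weighting internally, only the arithmetic behind it.

```python
from collections import Counter

# Hypothetical cluster assignment for 10 original days (cluster ID per day).
cluster_order = [0, 1, 1, 2, 0, 1, 2, 1, 0, 1]

# Hypothetical operational cost of each typical day, in €.
typical_day_cost = {0: 1200.0, 1: 950.0, 2: 1500.0}

# Weight = number of original days each typical day represents.
weights = Counter(cluster_order)

# Each typical day's cost counts once per original day it represents.
total_cost = sum(weights[c] * typical_day_cost[c] for c in typical_day_cost)

print(dict(weights))
print(f'Weighted operational cost: {total_cost:,.0f} €')
```

The weights always sum to the number of original days, so the weighted cost refers to the same time span as the full problem.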

## Method 3: Two-Stage Workflow (Recommended)

The recommended approach for investment optimization:

1. **Stage 1**: Fast sizing with `cluster()`
2. **Stage 2**: Fix sizes (with safety margin) and dispatch at full resolution

!!! tip "Safety Margin"
    Typical periods aggregate similar days, so an individual day can have higher demand
    than the typical day that represents it. A 5-10% margin on the sizes reduces the risk
    of an infeasible full-resolution dispatch.

In [None]:
# Stage 1 already done above
print('Stage 1: Sizing with typical periods')
print(f'  Time: {time_clustering + time_clustered:.1f} seconds')
print(f'  Cost estimate: {fs_clustered.solution["costs"].item():,.0f} €')

# Apply safety margin to sizes
SAFETY_MARGIN = 1.05  # 5% buffer
sizes_with_margin = {name: float(size.item()) * SAFETY_MARGIN for name, size in fs_clustered.statistics.sizes.items()}
print(f'\nSizes with {(SAFETY_MARGIN - 1) * 100:.0f}% safety margin:')
for name, size in sizes_with_margin.items():
    original = fs_clustered.statistics.sizes[name].item()
    print(f'  {name}: {original:.1f} → {size:.1f}')

In [None]:
# Stage 2: Fix sizes and optimize at full resolution
print('Stage 2: Dispatch at full resolution')
start = timeit.default_timer()

fs_dispatch = flow_system.transform.fix_sizes(sizes_with_margin)
fs_dispatch.optimize(solver)

time_dispatch = timeit.default_timer() - start
print(f'  Time: {time_dispatch:.1f} seconds')
print(f'  Actual cost: {fs_dispatch.solution["costs"].item():,.0f} €')

# Total comparison
total_two_stage = time_clustering + time_clustered + time_dispatch
print(f'\nTotal two-stage time: {total_two_stage:.1f} seconds')
print(f'Speedup vs full: {time_full / total_two_stage:.1f}x')

## Compare Results

In [None]:
results = {
    'Full (baseline)': {
        'Time [s]': time_full,
        'Cost [€]': fs_full.solution['costs'].item(),
        'CHP': fs_full.statistics.sizes['CHP(Q_th)'].item(),
        'Boiler': fs_full.statistics.sizes['Boiler(Q_th)'].item(),
        'Storage': fs_full.statistics.sizes['Storage'].item(),
    },
    'Clustered (8 days)': {
        'Time [s]': time_clustering + time_clustered,
        'Cost [€]': fs_clustered.solution['costs'].item(),
        'CHP': fs_clustered.statistics.sizes['CHP(Q_th)'].item(),
        'Boiler': fs_clustered.statistics.sizes['Boiler(Q_th)'].item(),
        'Storage': fs_clustered.statistics.sizes['Storage'].item(),
    },
    'Two-Stage': {
        'Time [s]': total_two_stage,
        'Cost [€]': fs_dispatch.solution['costs'].item(),
        'CHP': sizes_with_margin['CHP(Q_th)'],
        'Boiler': sizes_with_margin['Boiler(Q_th)'],
        'Storage': sizes_with_margin['Storage'],
    },
}

comparison = pd.DataFrame(results).T
baseline_cost = comparison.loc['Full (baseline)', 'Cost [€]']
baseline_time = comparison.loc['Full (baseline)', 'Time [s]']
comparison['Cost Gap [%]'] = ((comparison['Cost [€]'] - baseline_cost) / abs(baseline_cost) * 100).round(2)
comparison['Speedup'] = (baseline_time / comparison['Time [s]']).round(1)

comparison.style.format(
    {
        'Time [s]': '{:.1f}',
        'Cost [€]': '{:,.0f}',
        'CHP': '{:.1f}',
        'Boiler': '{:.1f}',
        'Storage': '{:.0f}',
        'Cost Gap [%]': '{:.2f}',
        'Speedup': '{:.1f}x',
    }
)
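
The two derived columns can be read as follows. The helper functions and the numbers below are hypothetical and only restate the formulas used in the cell above.

```python
def cost_gap_percent(cost: float, baseline_cost: float) -> float:
    """Relative cost deviation from the baseline, in percent."""
    # abs() keeps the sign meaningful even if the baseline cost is negative.
    return (cost - baseline_cost) / abs(baseline_cost) * 100


def speedup(time_s: float, baseline_time_s: float) -> float:
    """How many times faster a method is than the baseline."""
    return baseline_time_s / time_s


# Made-up numbers for illustration only.
baseline = {'cost': 100_000.0, 'time': 60.0}
two_stage = {'cost': 101_500.0, 'time': 15.0}

gap = cost_gap_percent(two_stage['cost'], baseline['cost'])
factor = speedup(two_stage['time'], baseline['time'])
print(f'Cost gap: {gap:.2f} %, speedup: {factor:.1f}x')
```

A positive gap means the method is more expensive than the full optimization. The clustered cost is an estimate on aggregated data, so its gap can have either sign.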

## Expand Solution to Full Resolution

Use `expand_solution()` to map the clustered solution back to all original timesteps.
Every original day receives the values of the typical day that represents its cluster:

In [None]:
# Expand the clustered solution to full resolution
fs_expanded = fs_clustered.transform.expand_solution()

print(f'Expanded: {len(fs_clustered.timesteps)} → {len(fs_expanded.timesteps)} timesteps')
print(f'Cost: {fs_expanded.solution["costs"].item():,.0f} €')
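
Conceptually, the expansion is a lookup by cluster ID. The sketch below uses hypothetical values and plain lists to show the mapping; it does not reproduce the internals of `expand_solution()`.

```python
# Hypothetical typical-day values (3 timesteps per day for brevity).
typical_values = {
    0: [5.0, 7.0, 6.0],
    1: [2.0, 3.0, 2.5],
}

# Cluster ID assigned to each original day.
cluster_order = [0, 1, 1, 0]

# Repeat each typical day's values for every original day in its cluster.
expanded = [value for cluster_id in cluster_order for value in typical_values[cluster_id]]

print(expanded)
```

Days that share a cluster end up with identical profiles, so the expanded result is an approximation of the dispatch, not a substitute for the full-resolution Stage 2 run.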

In [None]:
# Compare heat balance: Full vs Expanded
fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=['Full Optimization', 'Expanded from Clustering'])

# Full
for var in ['CHP(Q_th)', 'Boiler(Q_th)']:
    values = fs_full.solution[f'{var}|flow_rate'].values
    fig.add_trace(go.Scatter(x=fs_full.timesteps, y=values, name=var, legendgroup=var, showlegend=True), row=1, col=1)

# Expanded
for var in ['CHP(Q_th)', 'Boiler(Q_th)']:
    values = fs_expanded.solution[f'{var}|flow_rate'].values
    fig.add_trace(
        go.Scatter(x=fs_expanded.timesteps, y=values, name=var, legendgroup=var, showlegend=False), row=2, col=1
    )

fig.update_layout(height=500, title='Heat Production Comparison')
fig.update_yaxes(title_text='MW', row=1, col=1)
fig.update_yaxes(title_text='MW', row=2, col=1)
fig.show()

## Visualize Storage Operation

In [None]:
fs_clustered.statistics.plot.storage('Storage')

In [None]:
fs_expanded.statistics.plot.storage('Storage')

## API Reference

### `transform.cluster()` Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `n_clusters` | `int` | Number of typical periods (e.g., 8 typical days) |
| `cluster_duration` | `str \| float` | Duration of each cluster, given as a string (`'1D'`, `'24h'`) or as a number of hours |
| `weights` | `dict[str, float]` | Optional weights for time series in clustering |
| `time_series_for_high_peaks` | `list[str]` | **Essential**: Force inclusion of peak periods |
| `time_series_for_low_peaks` | `list[str]` | Force inclusion of minimum periods |

### Storage Behavior

Each `Storage` component has a `cluster_storage_mode` parameter that controls how it behaves during clustering:

| Mode | Description |
|------|-------------|
| `'intercluster_cyclic'` | Links storage across clusters + yearly cyclic **(default)** |
| `'intercluster'` | Links storage across clusters, free start/end |
| `'cyclic'` | Each cluster is independent but cyclic (start = end) |
| `'independent'` | Each cluster is independent, free start/end |

For a detailed comparison of storage modes, see [08c2-clustering-storage-modes](08c2-clustering-storage-modes.ipynb).
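
The difference between "cyclic" and "free start/end" can be illustrated with a simple check on a charge-state trajectory. This is only a sketch of the terminology with hypothetical values and a hypothetical helper; it is not the constraint formulation used by flixopt.

```python
def is_cyclic(levels: list[float], tol: float = 1e-9) -> bool:
    """Return True if the storage level ends where it started."""
    return abs(levels[-1] - levels[0]) <= tol


# Hypothetical charge-state trajectories within two typical days.
day_a = [50.0, 80.0, 60.0, 50.0]  # returns to its starting level
day_b = [50.0, 90.0, 70.0, 65.0]  # ends higher than it started

print(is_cyclic(day_a), is_cyclic(day_b))
```

Under a cyclic mode, every typical day would have to look like `day_a`. A trajectory like `day_b` is only possible when start and end levels are free or when storage is linked across clusters.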

### Peak Forcing Format

```python
time_series_for_high_peaks = ['ComponentName(FlowName)|fixed_relative_profile']
```
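
If several components need peak forcing, the labels can be assembled programmatically. `peak_label` is a hypothetical helper, not part of flixopt; it only follows the string format shown above.

```python
def peak_label(component: str, flow: str, attribute: str = 'fixed_relative_profile') -> str:
    """Build a label of the form 'Component(Flow)|attribute'."""
    return f'{component}({flow})|{attribute}'


# Label for the heat demand used earlier in this notebook.
labels = [peak_label('HeatDemand', 'Q_th')]
print(labels)
```

The resulting list can be passed as `time_series_for_high_peaks`, provided the component and flow names match those in the `FlowSystem`.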

### Recommended Workflow

```python
# Stage 1: Fast sizing
fs_sizing = flow_system.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
    time_series_for_high_peaks=['Demand(Flow)|fixed_relative_profile'],
)
fs_sizing.optimize(solver)

# Apply safety margin
sizes = {k: v.item() * 1.05 for k, v in fs_sizing.statistics.sizes.items()}

# Stage 2: Accurate dispatch
fs_dispatch = flow_system.transform.fix_sizes(sizes)
fs_dispatch.optimize(solver)
```

## Summary

You learned how to:

- Use **`cluster()`** to reduce time series into typical periods
- Apply **peak forcing** to capture extreme demand days
- Use **two-stage optimization** for fast yet accurate investment decisions
- **Expand solutions** back to full resolution with `expand_solution()`

### Key Takeaways

1. **Always use peak forcing** (`time_series_for_high_peaks`) for demand time series
2. **Add safety margin** (5-10%) when fixing sizes from clustering
3. **Two-stage is recommended**: clustering for sizing, full resolution for dispatch
4. **Storage handling** is configurable via `cluster_storage_mode` on each `Storage` component

### Next Steps

- **[08c2-clustering-storage-modes](08c2-clustering-storage-modes.ipynb)**: Compare storage modes using a seasonal storage system
- **[08d-clustering-multiperiod](08d-clustering-multiperiod.ipynb)**: Clustering with multiple periods and scenarios