# Clustering and Segmentation with tsam

Speed up large problems by reducing time series complexity using the [tsam](https://github.com/FZJ-IEK3-VSA/tsam) package.

This notebook demonstrates two complementary techniques:

- **Clustering** (inter-period): Identify typical periods (e.g., 8 typical days from 365 days)
- **Segmentation** (inner-period): Reduce timesteps within periods (e.g., 24 hours to 4 segments)

Both can be used independently or combined for maximum speedup.

!!! note "Requirements"
    This notebook requires the `tsam` package: `pip install tsam`

## Setup

In [3]:
import timeit

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import flixopt as fx

fx.CONFIG.notebook()


## Load the FlowSystem

We use a pre-built district heating system with real-world time series data (one month at 15-min resolution):

In [4]:
# Load the district heating system (real data from Zeitreihen2020.csv)
flow_system = fx.FlowSystem.from_netcdf('data/district_heating_system.nc4')

timesteps = flow_system.timesteps
print(f'Loaded FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 96:.0f} days at 15-min resolution)')
print(f'Components: {list(flow_system.components.keys())}')


In [None]:
# Visualize first two weeks of data
heat_demand = flow_system.components['HeatDemand'].inputs[0].fixed_relative_profile
electricity_price = flow_system.components['GridBuy'].outputs[0].effects_per_flow_hour['costs']

fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.1)

fig.add_trace(go.Scatter(x=timesteps[:1344], y=heat_demand.values[:1344], name='Heat Demand'), row=1, col=1)
fig.add_trace(go.Scatter(x=timesteps[:1344], y=electricity_price.values[:1344], name='Electricity Price'), row=2, col=1)

fig.update_layout(height=400, title='First Two Weeks of Data')
fig.update_yaxes(title_text='Heat Demand [MW]', row=1, col=1)
fig.update_yaxes(title_text='El. Price [€/MWh]', row=2, col=1)
fig.show()
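The slice length `1344` in the plot is 14 days × 96 timesteps per day. As a minimal sketch, the same number can be derived from the time index instead of being hard-coded. The `timesteps` index below is a stand-in built with pandas (an assumption mirroring one month at 15-minute resolution), not the loaded FlowSystem.

```python
import pandas as pd

# Stand-in index with the same shape as the notebook's data (assumption):
# January 2020 at 15-minute resolution.
timesteps = pd.date_range('2020-01-01', '2020-01-31 23:45', freq='15min')

step = timesteps[1] - timesteps[0]              # resolution of the index
steps_per_day = int(pd.Timedelta('1D') / step)  # timesteps in one day
two_weeks = 14 * steps_per_day                  # slice length for a two-week plot

print(f'{len(timesteps)} timesteps, {steps_per_day} per day, two weeks = {two_weeks}')
```

Deriving the count this way keeps the plot correct if the data is later resampled to a different resolution.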

## Part 1: Clustering (Inter-Period Aggregation)

**Clustering** groups similar periods together to find representative "typical" periods.

For example, with 31 days of data:

- Original: 31 days × 96 timesteps/day = 2,976 timesteps
- Clustered (8 typical days): 8 days × 96 timesteps/day = 768 representative timesteps

The optimizer only solves for 8 unique days, but weights results by how often each typical day occurred.

```python
fs.transform.cluster(
    n_clusters=8,           # Find 8 typical days
    cluster_duration='1D',  # Each cluster is 1 day
)
```
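The weighting idea described above can be illustrated with plain NumPy. This is only a sketch of the arithmetic, not how flixopt or tsam implement it: the cluster assignment and the per-day costs are hypothetical values chosen for the example.

```python
import numpy as np

n_days, n_clusters = 31, 8

# Hypothetical assignment of each original day to a typical day (0..7).
# In the notebook this would come from the clustering result instead.
cluster_order = np.arange(n_days) % n_clusters

# How many original days each typical day represents
occurrences = np.bincount(cluster_order, minlength=n_clusters)

# Hypothetical operating cost of each typical day, in €
cost_per_typical_day = np.arange(1, n_clusters + 1) * 100.0

# Weighted total: each typical day counts once per day it stands for
total_cost = float(occurrences @ cost_per_typical_day)

print(occurrences)  # [4 4 4 4 4 4 4 3]
print(total_cost)   # 13600.0
```

The occurrences always sum to the number of original days, so the weighted total covers the full horizon even though only 8 days are modeled in detail.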

In [None]:
# Cluster with 8 typical days (from 31 days)
fs_clustering_demo = flow_system.copy()
fs_clustered_demo = fs_clustering_demo.transform.cluster(n_clusters=8, cluster_duration='1D')

# Get the clustering object to access tsam results
clustering = fs_clustered_demo._clustering_info['clustering']

print(f'Original: {len(flow_system.timesteps)} timesteps ({len(flow_system.timesteps) / 96:.0f} days)')
print(f'Clustered: {clustering.nr_of_periods} typical days')
print(f'Cluster assignments: {list(clustering.tsam.clusterOrder)}')

# Plot original vs aggregated data
clustering.plot()

### Comparing Different Cluster Counts

More clusters improve accuracy but reduce the speedup. Let's compare:

In [16]:
# Test different numbers of clusters
cluster_configs = [4, 8, 12, 16]
clustering_results = {}

for n in cluster_configs:
    fs_test = flow_system.copy()
    fs_clustered = fs_test.transform.cluster(n_clusters=n, cluster_duration='1D')
    clustering_results[n] = fs_clustered._clustering_info['clustering']

# Use heat demand for comparison (most relevant for district heating)
heat_demand_col = [c for c in clustering_results[4].original_data.columns if 'Heat' in c or 'Q_th' in c][0]
print(f'Comparing: {heat_demand_col}')

Comparing: HeatDemand(Q_th)|fixed_relative_profile


In [17]:
# Compare the aggregated data for each configuration
fig = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=[f'{n} Typical Days' for n in cluster_configs],
    shared_xaxes=True,
    shared_yaxes=True,
    vertical_spacing=0.12,
    horizontal_spacing=0.08,
)

for i, (_n, clustering) in enumerate(clustering_results.items()):
    row, col = divmod(i, 2)
    row += 1
    col += 1

    original = clustering.original_data[heat_demand_col]
    aggregated = clustering.aggregated_data[heat_demand_col]

    fig.add_trace(
        go.Scatter(
            x=list(range(len(original))),
            y=original.values,
            name='Original',
            line=dict(color='lightgray'),
            showlegend=(i == 0),
        ),
        row=row,
        col=col,
    )
    fig.add_trace(
        go.Scatter(
            x=list(range(len(aggregated))),
            y=aggregated.values,
            name='Clustered',
            line=dict(color='blue', width=2),
            showlegend=(i == 0),
        ),
        row=row,
        col=col,
    )

fig.update_layout(
    title='Heat Demand: Original vs Clustered',
    height=500,
    legend=dict(orientation='h', yanchor='bottom', y=1.02),
)
fig.update_xaxes(title_text='Timestep', row=2)
fig.update_yaxes(title_text='Heat Demand [MW]', col=1)
fig.show()

In [18]:
# Calculate error metrics for each configuration
metrics = []
for n, clustering in clustering_results.items():
    original = clustering.original_data[heat_demand_col].values
    aggregated = clustering.aggregated_data[heat_demand_col].values

    rmse = np.sqrt(np.mean((original - aggregated) ** 2))
    mae = np.mean(np.abs(original - aggregated))
    max_error = np.max(np.abs(original - aggregated))
    correlation = np.corrcoef(original, aggregated)[0, 1]

    metrics.append(
        {
            'Typical Days': n,
            'RMSE': rmse,
            'MAE': mae,
            'Max Error': max_error,
            'Correlation': correlation,
        }
    )

metrics_df = pd.DataFrame(metrics).set_index('Typical Days')
metrics_df.style.format(
    {
        'RMSE': '{:.2f}',
        'MAE': '{:.2f}',
        'Max Error': '{:.2f}',
        'Correlation': '{:.4f}',
    }
)

| Typical Days | RMSE | MAE | Max Error | Correlation |
|--------------|------|-----|-----------|-------------|
| 4 | 4.84 | 4.52 | 12.19 | 0.9905 |
| 8 | 3.45 | 2.60 | 6.89 | 0.9952 |
| 12 | 1.68 | 0.83 | 6.39 | 0.9989 |
| 16 | 0.37 | 0.25 | 1.86 | 0.9999 |
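One way to turn these metrics into a decision is to pick the smallest number of typical days whose RMSE stays under a tolerance. The sketch below uses the RMSE values reported above; the tolerance of `2.0` is a hypothetical threshold, to be chosen per study.

```python
# RMSE per number of typical days, copied from the results above
rmse_by_clusters = {4: 4.84, 8: 3.45, 12: 1.68, 16: 0.37}

tolerance = 2.0  # hypothetical acceptable RMSE (same units as the profile)

# Smallest configuration that meets the tolerance;
# fall back to the largest tested configuration if none does
acceptable = [n for n, rmse in rmse_by_clusters.items() if rmse <= tolerance]
chosen = min(acceptable) if acceptable else max(rmse_by_clusters)

print(chosen)  # 12
```

A low RMSE on one input series does not guarantee a small cost gap in the optimization, so this is best treated as a first filter before comparing optimized results.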


## Part 2: Segmentation (Inner-Period Aggregation)

**Segmentation** reduces the number of timesteps *within* each period by grouping similar consecutive timesteps.

For example, with 15-minute resolution data:

- Original day: 96 timesteps (24h × 4 per hour)
- Segmented (4 segments): 4 representative timesteps per day

This is useful when you have high-resolution data but don't need that granularity for your analysis.

```python
fs.transform.cluster(
    n_clusters=None,        # Skip clustering (keep all periods)
    cluster_duration='1D',  # Segment within each day
    n_segments=4,           # Reduce to 4 segments per day
)
```
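To make the idea concrete, here is a simplified sketch that averages one day over equal-width segments. This is a deliberate simplification: the segmentation described above groups *similar* consecutive timesteps, so real segments may differ in length. The helper `segment_means` and the synthetic `day` profile are hypothetical and exist only for illustration.

```python
import numpy as np


def segment_means(day, n_segments):
    """Average consecutive timesteps over equal-width segments (simplified)."""
    chunks = np.array_split(np.asarray(day, dtype=float), n_segments)
    means = np.array([chunk.mean() for chunk in chunks])
    durations = np.array([len(chunk) for chunk in chunks])  # timesteps per segment
    return means, durations


# Synthetic daily profile with 96 timesteps (15-minute resolution)
day = np.sin(np.linspace(0, 2 * np.pi, 96)) + 2

means, durations = segment_means(day, n_segments=4)

# Expand back to the original resolution as a step function for comparison
reconstructed = np.repeat(means, durations)

print(means.round(2), durations)
```

Because each segment value is weighted by its duration, the daily mean of the reconstructed profile equals the mean of the original one; what is lost is the variation inside each segment.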

In [None]:
# Segmentation only: reduce 96 timesteps/day to 4 segments/day
fs_segmentation_demo = flow_system.copy()
fs_segmented_demo = fs_segmentation_demo.transform.cluster(
    n_clusters=None,  # No clustering - keep all 31 days
    cluster_duration='1D',  # Segment within each day
    n_segments=4,  # 4 segments per day
)

# Get the clustering object
segmentation = fs_segmented_demo._clustering_info['clustering']

print('Original: 96 timesteps per day (15-min resolution)')
print(f'Segmented: {segmentation.n_segments} segments per day')

# Plot original vs segmented data
segmentation.plot()

### Comparing Different Segment Counts

More segments improve accuracy but reduce the speedup:

In [None]:
# Test different numbers of segments
segment_configs = [4, 8, 12, 24]
segmentation_results = {}

for n_seg in segment_configs:
    fs_test = flow_system.copy()
    fs_seg = fs_test.transform.cluster(n_clusters=None, cluster_duration='1D', n_segments=n_seg)
    segmentation_results[n_seg] = fs_seg._clustering_info['clustering']

# Use heat demand for comparison
heat_demand_col = [c for c in segmentation_results[4].original_data.columns if 'Heat' in c or 'Q_th' in c][0]
print(f'Comparing: {heat_demand_col}')

In [None]:
# Compare the segmented data for first day only (clearer visualization)
fig = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=[f'{n} Segments per Day' for n in segment_configs],
    shared_xaxes=True,
    shared_yaxes=True,
    vertical_spacing=0.12,
    horizontal_spacing=0.08,
)

# Only show first day (96 timesteps) for clarity
day_length = 96

for i, (_n_seg, seg_result) in enumerate(segmentation_results.items()):
    row, col = divmod(i, 2)
    row += 1
    col += 1

    original = seg_result.original_data[heat_demand_col][:day_length]
    aggregated = seg_result.aggregated_data[heat_demand_col][:day_length]

    fig.add_trace(
        go.Scatter(
            x=list(range(len(original))),
            y=original.values,
            name='Original',
            line=dict(color='lightgray'),
            showlegend=(i == 0),
        ),
        row=row,
        col=col,
    )
    fig.add_trace(
        go.Scatter(
            x=list(range(len(aggregated))),
            y=aggregated.values,
            name='Segmented',
            line=dict(color='green', width=2),
            showlegend=(i == 0),
        ),
        row=row,
        col=col,
    )

fig.update_layout(
    title='Heat Demand (First Day): Original vs Segmented',
    height=500,
    legend=dict(orientation='h', yanchor='bottom', y=1.02),
)
fig.update_xaxes(title_text='Timestep', row=2)
fig.update_yaxes(title_text='Heat Demand [MW]', col=1)
fig.show()

In [None]:
# Calculate error metrics for segmentation
seg_metrics = []
for n_seg, seg_result in segmentation_results.items():
    original = seg_result.original_data[heat_demand_col].values
    aggregated = seg_result.aggregated_data[heat_demand_col].values

    rmse = np.sqrt(np.mean((original - aggregated) ** 2))
    mae = np.mean(np.abs(original - aggregated))
    max_error = np.max(np.abs(original - aggregated))
    correlation = np.corrcoef(original, aggregated)[0, 1]

    seg_metrics.append(
        {
            'Segments': n_seg,
            'RMSE': rmse,
            'MAE': mae,
            'Max Error': max_error,
            'Correlation': correlation,
        }
    )

seg_metrics_df = pd.DataFrame(seg_metrics).set_index('Segments')
seg_metrics_df.style.format(
    {
        'RMSE': '{:.2f}',
        'MAE': '{:.2f}',
        'Max Error': '{:.2f}',
        'Correlation': '{:.4f}',
    }
)

## Part 3: Combined Clustering + Segmentation

For maximum speedup, combine both techniques:

```python
fs.transform.cluster(
    n_clusters=8,           # 8 typical days (inter-period)
    cluster_duration='1D',
    n_segments=4,           # 4 segments per day (inner-period)
)
```

This reduces 2,976 timesteps to just 8 × 4 = 32 representative timesteps!
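The arithmetic behind the three variants, using the numbers from this notebook (31 days, 96 timesteps per day, 8 typical days, 4 segments), can be checked with a few lines:

```python
n_days, steps_per_day = 31, 96
n_clusters, n_segments = 8, 4

original = n_days * steps_per_day
sizes = {
    'Original': original,
    'Clustering only': n_clusters * steps_per_day,
    'Segmentation only': n_days * n_segments,
    'Combined': n_clusters * n_segments,
}

# Report the number of representative timesteps and the reduction factor
for name, n in sizes.items():
    print(f'{name:18s} {n:5d} timesteps ({original / n:.1f}x fewer)')
```

These factors count representative timesteps only. The actual solve-time speedup depends on the model and solver and is measured in the performance comparison.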

In [None]:
# Combined: 8 typical days × 4 segments each
fs_combined_demo = flow_system.copy()
fs_combined = fs_combined_demo.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
    n_segments=4,
)

combined_clustering = fs_combined._clustering_info['clustering']

print(f'Original: {len(flow_system.timesteps)} timesteps')
print(
    f'Combined: {combined_clustering.nr_of_periods} typical days × {combined_clustering.n_segments} segments = {combined_clustering.nr_of_periods * combined_clustering.n_segments} representative timesteps'
)

# Plot the combined result
combined_clustering.plot()

## Performance Comparison

Now let's compare the optimization performance of all approaches.

### Baseline: Full Optimization (No Aggregation)

In [19]:
solver = fx.solvers.HighsSolver(mip_gap=0.01)

start = timeit.default_timer()
fs_full = flow_system.copy()
fs_full.optimize(solver)
time_full = timeit.default_timer() - start

print(f'Full optimization: {time_full:.2f} seconds')
print(f'Cost: {fs_full.solution["costs"].item():,.0f} €')
print('\nOptimized sizes:')
for name, size in fs_full.statistics.sizes.items():
    print(f'  {name}: {float(size.item()):.1f}')


Writing constraints.: 100%|██████████| 64/64 [00:00<00:00, 75.17it/s]
Writing continuous variables.: 100%|██████████| 55/55 [00:00<00:00, 421.23it/s]
Writing binary variables.: 100%|██████████| 5/5 [00:00<00:00, 368.46it/s]


Running HiGHS 1.12.0 (git hash: 755a8e0): Copyright (c) 2025 HiGHS under MIT licence terms
MIP linopy-problem-dqtvcofp has 89316 rows; 80386 cols; 264919 nonzeros; 5955 integer variables (5955 binary)
Coefficient ranges:
  Matrix  [1e-05, 1e+03]
  Cost    [1e+00, 1e+00]
  Bound   [1e+00, 1e+03]
  RHS     [1e+00, 1e+00]
Presolving model
38694 rows, 26790 cols, 92267 nonzeros  0s
31169 rows, 18018 cols, 88849 nonzeros  0s
30836 rows, 17685 cols, 89182 nonzeros  0s
Presolve reductions: rows 30836(-58480); columns 17685(-62701); nonzeros 89182(-175737) 

Solving MIP model with:
   30836 rows
   17685 cols (5955 binary, 0 integer, 0 implied int., 11730 continuous, 0 domain fixed)
   89182 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; H => Heuristic;
     I => Shifting; J => Feasibility jump; L => Sub-MIP; P => Empty MIP; R => Randomized rounding;
     S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution; Y => HiGHS solution;
     Z => ZI Round

### Clustering Only (8 Typical Days)

In [20]:
start = timeit.default_timer()

# Cluster into 8 typical days
fs_clustered = flow_system.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
)

fs_clustered.optimize(solver)
time_clustered = timeit.default_timer() - start

print(f'Clustered optimization: {time_clustered:.2f} seconds')
print(f'Cost: {fs_clustered.solution["costs"].item():,.0f} €')
print(f'Speedup: {time_full / time_clustered:.1f}x')
print('\nOptimized sizes:')
for name, size in fs_clustered.statistics.sizes.items():
    print(f'  {name}: {float(size.item()):.1f}')


Writing constraints.: 100%|██████████| 81/81 [00:01<00:00, 65.44it/s]
Writing continuous variables.: 100%|██████████| 55/55 [00:00<00:00, 808.42it/s]
Writing binary variables.: 100%|██████████| 5/5 [00:00<00:00, 766.39it/s]


Running HiGHS 1.12.0 (git hash: 755a8e0): Copyright (c) 2025 HiGHS under MIT licence terms
MIP linopy-problem-bhnhp1id has 126461 rows; 80386 cols; 339209 nonzeros; 5955 integer variables (5955 binary)
Coefficient ranges:
  Matrix  [1e-05, 1e+03]
  Cost    [1e+00, 1e+00]
  Bound   [1e+00, 1e+03]
  RHS     [1e+00, 1e+00]
Presolving model
41449 rows, 7695 cols, 100532 nonzeros  0s
9148 rows, 5691 cols, 23883 nonzeros  0s
8222 rows, 4788 cols, 23865 nonzeros  0s
Presolve reductions: rows 8222(-118239); columns 4788(-75598); nonzeros 23865(-315344) 

Solving MIP model with:
   8222 rows
   4788 cols (1585 binary, 0 integer, 0 implied int., 3203 continuous, 0 domain fixed)
   23865 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; H => Heuristic;
     I => Shifting; J => Feasibility jump; L => Sub-MIP; P => Empty MIP; R => Randomized rounding;
     S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution; Y => HiGHS solution;
     Z => ZI Round; l => 

### Segmentation Only (4 Segments per Day)

In [None]:
start = timeit.default_timer()

# Segmentation only: reduce timesteps within each day
fs_segmented = flow_system.transform.cluster(
    n_clusters=None,  # No clustering
    cluster_duration='1D',
    n_segments=4,  # 4 segments per day
)

fs_segmented.optimize(solver)
time_segmented = timeit.default_timer() - start

print(f'Segmentation optimization: {time_segmented:.2f} seconds')
print(f'Cost: {fs_segmented.solution["costs"].item():,.0f} €')
print(f'Speedup: {time_full / time_segmented:.1f}x')
print('\nOptimized sizes:')
for name, size in fs_segmented.statistics.sizes.items():
    print(f'  {name}: {float(size.item()):.1f}')

### Combined: Clustering + Segmentation

In [None]:
start = timeit.default_timer()

# Combined: 8 typical days × 4 segments each
fs_combined_opt = flow_system.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
    n_segments=4,
)

fs_combined_opt.optimize(solver)
time_combined = timeit.default_timer() - start

print(f'Combined optimization: {time_combined:.2f} seconds')
print(f'Cost: {fs_combined_opt.solution["costs"].item():,.0f} €')
print(f'Speedup: {time_full / time_combined:.1f}x')
print('\nOptimized sizes:')
for name, size in fs_combined_opt.statistics.sizes.items():
    print(f'  {name}: {float(size.item()):.1f}')

## Compare Results

In [None]:
results = {
    'Full (baseline)': {
        'Time [s]': time_full,
        'Cost [€]': fs_full.solution['costs'].item(),
        'CHP Size': fs_full.statistics.sizes['CHP(Q_th)'].item(),
        'Boiler Size': fs_full.statistics.sizes['Boiler(Q_th)'].item(),
        'Storage Size': fs_full.statistics.sizes['Storage'].item(),
    },
    'Clustering (8 days)': {
        'Time [s]': time_clustered,
        'Cost [€]': fs_clustered.solution['costs'].item(),
        'CHP Size': fs_clustered.statistics.sizes['CHP(Q_th)'].item(),
        'Boiler Size': fs_clustered.statistics.sizes['Boiler(Q_th)'].item(),
        'Storage Size': fs_clustered.statistics.sizes['Storage'].item(),
    },
    'Segmentation (4 seg)': {
        'Time [s]': time_segmented,
        'Cost [€]': fs_segmented.solution['costs'].item(),
        'CHP Size': fs_segmented.statistics.sizes['CHP(Q_th)'].item(),
        'Boiler Size': fs_segmented.statistics.sizes['Boiler(Q_th)'].item(),
        'Storage Size': fs_segmented.statistics.sizes['Storage'].item(),
    },
    'Combined (8×4)': {
        'Time [s]': time_combined,
        'Cost [€]': fs_combined_opt.solution['costs'].item(),
        'CHP Size': fs_combined_opt.statistics.sizes['CHP(Q_th)'].item(),
        'Boiler Size': fs_combined_opt.statistics.sizes['Boiler(Q_th)'].item(),
        'Storage Size': fs_combined_opt.statistics.sizes['Storage'].item(),
    },
}

comparison = pd.DataFrame(results).T
baseline_cost = comparison.loc['Full (baseline)', 'Cost [€]']
baseline_time = comparison.loc['Full (baseline)', 'Time [s]']
comparison['Cost Gap [%]'] = ((comparison['Cost [€]'] - baseline_cost) / abs(baseline_cost) * 100).round(2)
comparison['Speedup'] = (baseline_time / comparison['Time [s]']).round(1)

comparison.style.format(
    {
        'Time [s]': '{:.2f}',
        'Cost [€]': '{:,.0f}',
        'CHP Size': '{:.1f}',
        'Boiler Size': '{:.1f}',
        'Storage Size': '{:.0f}',
        'Cost Gap [%]': '{:.2f}',
        'Speedup': '{:.1f}x',
    }
)

## Multi-Period Clustering

For multi-year investment studies, clustering is applied **independently per period** (year).
Each year gets its own set of typical days:

In [None]:
# Load raw data for multi-period example
data = pd.read_csv('../../examples/resources/Zeitreihen2020.csv', index_col=0, parse_dates=True).sort_index()
data_2w = data['2020-01-01':'2020-01-14 23:45:00']  # Two weeks
timesteps_2w = data_2w.index

# Build system with periods
fs_mp = fx.FlowSystem(
    timesteps_2w,
    periods=pd.Index([2024, 2025, 2026], name='year'),
)

# Extract the raw demand and price profiles as NumPy arrays
heat_demand_2w = data_2w['Q_Netz/MW'].to_numpy()
elec_demand_2w = data_2w['P_Netz/MW'].to_numpy()
elec_price_2w = data_2w['Strompr.€/MWh'].to_numpy()
gas_price_2w = data_2w['Gaspr.€/MWh'].to_numpy()

# Create period-varying profiles (demand scaled by 1.00, 1.05, 1.10: +5% of the base year per period)
heat_profile = fx.TimeSeriesData(
    np.stack([heat_demand_2w * 1.0, heat_demand_2w * 1.05, heat_demand_2w * 1.10]),
    dims=['period', 'time'],
)
elec_profile = fx.TimeSeriesData(
    np.stack([elec_demand_2w * 1.0, elec_demand_2w * 1.05, elec_demand_2w * 1.10]),
    dims=['period', 'time'],
)

fs_mp.add_elements(
    fx.Bus('Electricity'),
    fx.Bus('Heat'),
    fx.Bus('Gas'),
    fx.Effect('costs', '€', is_standard=True, is_objective=True),
    fx.linear_converters.Boiler(
        'Boiler',
        thermal_efficiency=0.85,
        thermal_flow=fx.Flow('Q_th', bus='Heat', size=350),
        fuel_flow=fx.Flow('Q_fu', bus='Gas'),
    ),
    fx.Source(
        'GasGrid',
        outputs=[fx.Flow('Q_Gas', bus='Gas', size=1000, effects_per_flow_hour={'costs': gas_price_2w})],
    ),
    fx.Source(
        'GridBuy',
        outputs=[fx.Flow('P_el', bus='Electricity', size=1000, effects_per_flow_hour={'costs': elec_price_2w})],
    ),
    fx.Sink('HeatDemand', inputs=[fx.Flow('Q_th', bus='Heat', size=1, fixed_relative_profile=heat_profile)]),
    fx.Sink('ElecDemand', inputs=[fx.Flow('P_el', bus='Electricity', size=1, fixed_relative_profile=elec_profile)]),
)

print(f'Multi-period system: {len(fs_mp.timesteps)} timesteps × {len(fs_mp.periods)} periods')
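The period-varying profiles above are 2-D arrays with one row per period and one column per timestep. A minimal NumPy sketch of that layout, using a synthetic base profile (hypothetical values, not the CSV data):

```python
import numpy as np
import pandas as pd

# Two weeks at 15-minute resolution, matching the slice used above
timesteps = pd.date_range('2020-01-01', periods=14 * 96, freq='15min')
periods = [2024, 2025, 2026]

# Hypothetical base demand; the notebook reads this from the CSV instead
base = np.linspace(10.0, 20.0, len(timesteps))

# One scaling factor per period, stacked along a new leading axis
growth = [1.0, 1.05, 1.10]
profile = np.stack([base * factor for factor in growth])

print(profile.shape)  # (3, 1344)
```

The leading axis must follow the order of the `periods` index so that each row is matched to the intended year.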

In [23]:
# Cluster - each period gets clustered independently
fs_mp_clustered = fs_mp.transform.cluster(n_clusters=4, cluster_duration='1D')

# Get clustering info
clustering_info = fs_mp_clustered._clustering_info
print(f'Clustering was applied to {len(clustering_info["clustering_results"])} period(s):')
for (period, _scenario), _ in clustering_info['clustering_results'].items():
    print(f'  - period={period}')

Clustering was applied to 3 period(s):
  - period=2024
  - period=2025
  - period=2026


In [24]:
# Optimize
fs_mp_clustered.optimize(solver)
print(f'Multi-period clustered cost: {fs_mp_clustered.solution["costs"].sum().item():,.0f} €')

Writing constraints.: 100%|██████████| 38/38 [00:00<00:00, 80.29it/s]
Writing continuous variables.: 100%|██████████| 22/22 [00:00<00:00, 398.66it/s]


Running HiGHS 1.12.0 (git hash: 755a8e0): Copyright (c) 2025 HiGHS under MIT licence terms
LP linopy-problem-u73pgf9e has 49392 rows; 40356 cols; 131016 nonzeros
Coefficient ranges:
  Matrix  [2e-01, 2e+01]
  Cost    [1e+00, 1e+00]
  Bound   [5e+01, 1e+03]
  RHS     [0e+00, 0e+00]
Presolving model
0 rows, 0 cols, 0 nonzeros  0s
0 rows, 0 cols, 0 nonzeros  0s
Presolve reductions: rows 0(-49392); columns 0(-40356); nonzeros 0(-131016) - Reduced to empty
Performed postsolve
Solving the original LP from the solution after postsolve

Model name          : linopy-problem-u73pgf9e
Model status        : Optimal
Objective value     :  1.3352558890e+07
P-D objective error :  1.7437154695e-15
HiGHS run time      :          0.17
Multi-period clustered cost: 13,352,559 €


## API Reference

### `transform.cluster()` Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `n_clusters` | `int \| None` | Number of typical periods (e.g., 8 typical days). Set to `None` for segmentation-only. |
| `cluster_duration` | `str \| float` | Duration per cluster ('1D', '24h', or hours as float) |
| `n_segments` | `int \| None` | Segments within each period (inner-period aggregation). Default: `None` (no segmentation) |
| `aggregate_data` | `bool` | If True (default), aggregate time series data |
| `include_storage` | `bool` | Include storage in clustering constraints (default: True) |
| `flexibility_percent` | `float` | Allow binary variable deviations (default: 0) |
| `flexibility_penalty` | `float` | Penalty for deviations (default: 0) |
| `time_series_for_high_peaks` | `list` | Force inclusion of high-value periods |
| `time_series_for_low_peaks` | `list` | Force inclusion of low-value periods |

### Common Patterns

```python
# Clustering only: 8 typical days from a year
fs.transform.cluster(n_clusters=8, cluster_duration='1D')

# Segmentation only: reduce to 4 segments per day
fs.transform.cluster(n_clusters=None, cluster_duration='1D', n_segments=4)

# Combined: 8 typical days × 4 segments each
fs.transform.cluster(n_clusters=8, cluster_duration='1D', n_segments=4)

# Force inclusion of peak demand periods
fs.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
    time_series_for_high_peaks=[heat_demand_ts],  # heat_demand_ts: placeholder for your demand series
)
```
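Why protect peak periods at all? Averaging days into typical days can flatten the single highest value, which matters when that peak drives component sizing. The sketch below only shows how to locate the day containing the peak in a synthetic series. It does not reproduce how the peak options are handled internally, and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, steps_per_day = 31, 96

# Synthetic demand between 50 and 100, with one injected peak on day 17
demand = rng.uniform(50, 100, size=n_days * steps_per_day)
demand[17 * steps_per_day + 30] = 150.0

# One row per day, then find the day whose maximum is largest
daily = demand.reshape(n_days, steps_per_day)
peak_day = int(daily.max(axis=1).argmax())

print(peak_day)  # 17
```

If the peak day is merged into a cluster of ordinary days, the aggregated profile may never reach the injected maximum of 150, and a design based on it could be undersized.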

## Summary

You learned how to:

- Use **clustering** (`n_clusters`) to identify typical periods (inter-period aggregation)
- Use **segmentation** (`n_segments`) to reduce timesteps within periods (inner-period aggregation)
- **Combine both** techniques for maximum speedup
- Cluster **multi-period** FlowSystems (each period independently)

### When to Use Each Technique

| Technique | Use Case | Example |
|-----------|----------|---------|
| **Clustering** | Many similar periods (days, weeks) | 365 days → 12 typical days |
| **Segmentation** | High-resolution data not needed | 96 timesteps/day → 4 segments |
| **Combined** | Large problems with high resolution | 365 × 96 → 12 × 4 = 48 timesteps |

### Accuracy vs. Speed Trade-off

| Approach | Speedup | Accuracy | Best For |
|----------|---------|----------|----------|
| More clusters/segments | Lower | Higher | Final results |
| Fewer clusters/segments | Higher | Lower | Screening, exploration |

### Next Steps

- **[08a-Aggregation](08a-aggregation.ipynb)**: Other aggregation techniques (resampling, two-stage)
- **[08b-Rolling Horizon](08b-rolling-horizon.ipynb)**: Sequential optimization for long time series