# Clustering with tsam

Speed up large problems by identifying typical periods using time series clustering.

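This notebook demonstrates how to use **`transform.cluster()`** to reduce time series data to a small number of representative days (typical periods). The technique is aimed at long horizons such as a full year; the example below uses one week of data to keep execution fast.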

!!! note "Requirements"
    This notebook requires the `tsam` package: `pip install tsam`

## Setup

In [7]:
import timeit

import pandas as pd

import flixopt as fx

fx.CONFIG.notebook()

flixopt.config.CONFIG

## Load Time Series Data

We use real-world district heating data at 15-minute resolution (one week for faster execution):

In [8]:
# Load time series data (15-min resolution)
data = pd.read_csv('../../examples/resources/Zeitreihen2020.csv', index_col=0, parse_dates=True).sort_index()
data = data['2020-01-01':'2020-01-07 23:45:00']  # One week
data.index.name = 'time'

timesteps = data.index

# Extract profiles
electricity_demand = data['P_Netz/MW'].to_numpy()
heat_demand = data['Q_Netz/MW'].to_numpy()
electricity_price = data['Strompr.€/MWh'].to_numpy()
gas_price = data['Gaspr.€/MWh'].to_numpy()

print(f'Timesteps: {len(timesteps)} ({len(timesteps) / 96:.0f} days at 15-min resolution)')
print(f'Heat demand: {heat_demand.min():.1f} - {heat_demand.max():.1f} MW')

Timesteps: 672 (7 days at 15-min resolution)
Heat demand: 122.2 - 254.0 MW


## Build a Simple FlowSystem

A district heating system with CHP, boiler, and storage:

In [9]:
def build_system(timesteps, heat_demand, electricity_demand, electricity_price, gas_price):
    """Build a district heating system."""
    fs = fx.FlowSystem(timesteps)

    fs.add_elements(
        # Buses
        fx.Bus('Electricity'),
        fx.Bus('Heat'),
        fx.Bus('Gas'),
        fx.Bus('Coal'),
        # Effects
        fx.Effect('costs', '€', 'Total Costs', is_standard=True, is_objective=True),
        # CHP
        fx.linear_converters.CHP(
            'CHP',
            thermal_efficiency=0.58,
            electrical_efficiency=0.22,
            status_parameters=fx.StatusParameters(effects_per_startup=1000),
            electrical_flow=fx.Flow('P_el', bus='Electricity', size=200),
            thermal_flow=fx.Flow('Q_th', bus='Heat', size=200, relative_minimum=0.3),
            fuel_flow=fx.Flow('Q_fu', bus='Coal', size=350, previous_flow_rate=100),  # size ≈ 200/0.58
        ),
        # Gas Boiler
        fx.linear_converters.Boiler(
            'Boiler',
            thermal_efficiency=0.85,
            status_parameters=fx.StatusParameters(effects_per_startup=500),
            thermal_flow=fx.Flow('Q_th', bus='Heat', size=100, relative_minimum=0.1),
            fuel_flow=fx.Flow('Q_fu', bus='Gas', size=120, previous_flow_rate=20),  # size ≈ 100/0.85
        ),
        # Thermal Storage
        fx.Storage(
            'Storage',
            capacity_in_flow_hours=500,
            initial_charge_state=100,
            eta_charge=0.95,
            eta_discharge=0.95,
            relative_loss_per_hour=0.001,
            charging=fx.Flow('Charge', size=100, bus='Heat'),
            discharging=fx.Flow('Discharge', size=100, bus='Heat'),
        ),
        # Fuel sources
        fx.Source(
            'GasGrid',
            outputs=[fx.Flow('Q_Gas', bus='Gas', size=1000, effects_per_flow_hour={'costs': gas_price})],
        ),
        fx.Source(
            'CoalSupply',
            outputs=[fx.Flow('Q_Coal', bus='Coal', size=1000, effects_per_flow_hour={'costs': 4.6})],
        ),
        # Electricity grid
        fx.Source(
            'GridBuy',
            outputs=[
                fx.Flow('P_el', bus='Electricity', size=1000, effects_per_flow_hour={'costs': electricity_price + 0.5})
            ],
        ),
        fx.Sink(
            'GridSell',
            inputs=[fx.Flow('P_el', bus='Electricity', size=1000, effects_per_flow_hour=-(electricity_price - 0.5))],
        ),
        # Demands
        fx.Sink('HeatDemand', inputs=[fx.Flow('Q_th', bus='Heat', size=1, fixed_relative_profile=heat_demand)]),
        fx.Sink(
            'ElecDemand', inputs=[fx.Flow('P_el', bus='Electricity', size=1, fixed_relative_profile=electricity_demand)]
        ),
    )

    return fs


flow_system = build_system(timesteps, heat_demand, electricity_demand, electricity_price, gas_price)
print(f'System: {len(timesteps)} timesteps')

System: 672 timesteps


## Baseline: Full Optimization

First, solve without clustering for comparison:

In [10]:
solver = fx.solvers.HighsSolver(mip_gap=0.01)

start = timeit.default_timer()
fs_full = flow_system.copy()
fs_full.optimize(solver)
time_full = timeit.default_timer() - start

print(f'Full optimization: {time_full:.2f} seconds')
print(f'Cost: {fs_full.solution["costs"].item():,.0f} €')



Writing constraints.: 100%|██████████| 71/71 [00:00<00:00, 122.37it/s]
Writing continuous variables.: 100%|██████████| 51/51 [00:00<00:00, 472.81it/s]
Writing binary variables.: 100%|██████████| 13/13 [00:00<00:00, 377.15it/s]


Running HiGHS 1.12.0 (git hash: 755a8e0): Copyright (c) 2025 HiGHS under MIT licence terms
MIP linopy-problem-vm730vxe has 26909 rows; 24221 cols; 84703 nonzeros; 8736 integer variables (8736 binary)
Coefficient ranges:
  Matrix  [1e-05, 1e+03]
  Cost    [1e+00, 1e+00]
  Bound   [1e+00, 1e+03]
  RHS     [1e-05, 1e+02]
Presolving model
17472 rows, 13440 cols, 45021 nonzeros  0s
14789 rows, 10964 cols, 45835 nonzeros  0s
12214 rows, 9019 cols, 39022 nonzeros  0s
Presolve reductions: rows 12214(-14695); columns 9019(-15202); nonzeros 39022(-45681) 

Solving MIP model with:
   12214 rows
   9019 cols (6824 binary, 0 integer, 0 implied int., 2195 continuous, 0 domain fixed)
   39022 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; H => Heuristic;
     I => Shifting; J => Feasibility jump; L => Sub-MIP; P => Empty MIP; R => Randomized rounding;
     S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution; Y => HiGHS solution;
     Z => ZI Round; l =>

## Basic Clustering

Cluster the time series into **4 typical days** (since we have 7 days of data):

```python
clustered_fs = flow_system.transform.cluster(
    n_clusters=4,           # Number of typical periods
    cluster_duration='1D',  # Duration per cluster (1 day)
)
```
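To make the idea of typical periods concrete, the following is a minimal, library-free sketch of what a clustering step does conceptually: split a series into equal-length periods, pick a few representative periods, and assign every original period to its nearest representative. It is not the algorithm used by tsam or flixopt. The greedy farthest-point selection and the names `split_periods`, `pick_representatives`, and `assign` are assumptions chosen purely for illustration.

```python
def split_periods(series, steps_per_period):
    """Cut a flat series into consecutive periods of equal length."""
    if len(series) % steps_per_period:
        raise ValueError('series length must be a multiple of steps_per_period')
    return [series[i:i + steps_per_period] for i in range(0, len(series), steps_per_period)]


def distance(a, b):
    """Squared Euclidean distance between two periods."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pick_representatives(periods, n_clusters):
    """Greedy farthest-point selection of representative periods (illustrative only)."""
    chosen = [0]  # start with the first period
    while len(chosen) < n_clusters:
        # Add the period that is farthest from all representatives chosen so far
        farthest = max(
            range(len(periods)),
            key=lambda i: min(distance(periods[i], periods[c]) for c in chosen),
        )
        chosen.append(farthest)
    return chosen


def assign(periods, representatives):
    """Map every period to the index of its nearest representative period."""
    return [min(representatives, key=lambda c: distance(p, periods[c])) for p in periods]


# Toy data: 6 "days" of 4 timesteps each, alternating low and high demand
series = [1, 2, 2, 1, 8, 9, 9, 8, 1, 2, 2, 1, 8, 9, 9, 8, 1, 2, 3, 1, 8, 9, 9, 7]
periods = split_periods(series, steps_per_period=4)
representatives = pick_representatives(periods, n_clusters=2)
cluster_order = assign(periods, representatives)

print(representatives)  # [0, 1]
print(cluster_order)    # [0, 1, 0, 1, 0, 1]
```

In this toy case, day 0 stands in for all low-demand days and day 1 for all high-demand days. Production clustering methods typically refine the representatives iteratively rather than selecting them greedily.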

In [11]:
start = timeit.default_timer()

# Cluster into 4 typical days
fs_clustered = flow_system.transform.cluster(
    n_clusters=4,
    cluster_duration='1D',
)

fs_clustered.optimize(solver)
time_clustered = timeit.default_timer() - start

print(f'Clustered optimization: {time_clustered:.2f} seconds')
print(f'Cost: {fs_clustered.solution["costs"].item():,.0f} €')
print(f'Speedup: {time_full / time_clustered:.1f}x')

Writing constraints.: 100%|██████████| 99/99 [00:00<00:00, 134.54it/s]
Writing continuous variables.: 100%|██████████| 51/51 [00:00<00:00, 739.63it/s]
Writing binary variables.: 100%|██████████| 13/13 [00:00<00:00, 407.20it/s]


Running HiGHS 1.12.0 (git hash: 755a8e0): Copyright (c) 2025 HiGHS under MIT licence terms
MIP linopy-problem-3zos3gx7 has 34889 rows; 24221 cols; 100663 nonzeros; 8736 integer variables (8736 binary)
Coefficient ranges:
  Matrix  [1e-05, 1e+03]
  Cost    [1e+00, 1e+00]
  Bound   [1e+00, 1e+03]
  RHS     [1e-05, 1e+02]
Presolving model
17852 rows, 7835 cols, 46161 nonzeros  0s
8771 rows, 6538 cols, 26638 nonzeros  0s
7501 rows, 5532 cols, 24162 nonzeros  0s
Presolve reductions: rows 7501(-27388); columns 5532(-18689); nonzeros 24162(-76501) 

Solving MIP model with:
   7501 rows
   5532 cols (4223 binary, 0 integer, 0 implied int., 1309 continuous, 0 domain fixed)
   24162 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; H => Heuristic;
     I => Shifting; J => Feasibility jump; L => Sub-MIP; P => Empty MIP; R => Randomized rounding;
     S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution; Y => HiGHS solution;
     Z => ZI Round; l => Triv

## Compare Results

In [12]:
results = {
    'Full (baseline)': {'Time [s]': time_full, 'Cost [€]': fs_full.solution['costs'].item()},
    'Clustered (4 days)': {'Time [s]': time_clustered, 'Cost [€]': fs_clustered.solution['costs'].item()},
}

comparison = pd.DataFrame(results).T
baseline_cost = comparison.loc['Full (baseline)', 'Cost [€]']
baseline_time = comparison.loc['Full (baseline)', 'Time [s]']
comparison['Cost Gap [%]'] = ((comparison['Cost [€]'] - baseline_cost) / abs(baseline_cost) * 100).round(2)
comparison['Speedup'] = (baseline_time / comparison['Time [s]']).round(1)

comparison.style.format(
    {
        'Time [s]': '{:.2f}',
        'Cost [€]': '{:,.0f}',
        'Cost Gap [%]': '{:.2f}',
        'Speedup': '{:.1f}x',
    }
)

| | Time [s] | Cost [€] | Cost Gap [%] | Speedup |
|---|----------|----------|--------------|---------|
| Full (baseline) | 11.54 | 510866 | 0.0 | 1.0x |
| Clustered (4 days) | 7.64 | 511017 | 0.03 | 1.5x |
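The derived columns follow directly from the reported times and costs. As a quick check, the short sketch below recomputes them from the numbers shown in the output above; the helper names `cost_gap_percent` and `speedup` are introduced here for illustration only.

```python
def cost_gap_percent(cost, baseline_cost):
    """Relative cost deviation from the baseline, in percent."""
    return (cost - baseline_cost) / abs(baseline_cost) * 100


def speedup(time, baseline_time):
    """How many times faster a run is compared to the baseline."""
    return baseline_time / time


# Values copied from the comparison output above
gap = cost_gap_percent(511_017, 510_866)
factor = speedup(7.64, 11.54)

print(f'Cost gap: {gap:.2f} %')   # Cost gap: 0.03 %
print(f'Speedup: {factor:.1f}x')  # Speedup: 1.5x
```

The absolute difference is 151 € on a total of roughly 511,000 €, which is well below the solver's `mip_gap=0.01` (1 %) tolerance used for both runs.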


## API Reference

### `transform.cluster()` Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `n_clusters` | `int` | Number of typical periods (e.g., 8 typical days) |
| `cluster_duration` | `str \| float` | Duration per cluster ('1D', '24h', or hours as float) |
| `aggregate_data` | `bool` | If True (default), aggregate time series data |
| `include_storage` | `bool` | Include storage in clustering constraints (default: True) |
| `flexibility_percent` | `float` | Allow binary variable deviations (default: 0) |
| `flexibility_penalty` | `float` | Penalty for deviations (default: 0) |
| `time_series_for_high_peaks` | `list` | Force inclusion of high-value periods |
| `time_series_for_low_peaks` | `list` | Force inclusion of low-value periods |
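Because `cluster_duration` accepts both strings and hours as a float, it can help to see how a duration translates into timesteps per cluster. The sketch below is a hypothetical helper, not flixopt's own parsing: it assumes only the suffixes `h`, `D`, and `W`, and the names `duration_to_timedelta` and `timesteps_per_cluster` are invented for this example.

```python
import re
from datetime import timedelta

# Assumed unit suffixes for this sketch only
_UNITS = {'h': timedelta(hours=1), 'D': timedelta(days=1), 'W': timedelta(weeks=1)}


def duration_to_timedelta(cluster_duration):
    """Convert '1D', '24h', '1W' or a number of hours into a timedelta."""
    if isinstance(cluster_duration, (int, float)):
        return timedelta(hours=cluster_duration)
    match = re.fullmatch(r'(\d+)([hDW])', cluster_duration)
    if match is None:
        raise ValueError(f'Unsupported duration: {cluster_duration!r}')
    count, unit = match.groups()
    return int(count) * _UNITS[unit]


def timesteps_per_cluster(cluster_duration, timestep):
    """Number of timesteps that fit into one cluster period."""
    steps, remainder = divmod(duration_to_timedelta(cluster_duration), timestep)
    if remainder:
        raise ValueError('cluster duration must be a multiple of the timestep')
    return steps


quarter_hour = timedelta(minutes=15)
steps = timesteps_per_cluster('1D', quarter_hour)

print(steps)                                       # 96
print(timesteps_per_cluster(24.0, quarter_hour))   # 96
print(672 // steps)                                # 7 candidate periods in one week
```

With the 15-minute data used in this notebook, a `'1D'` cluster spans 96 timesteps, so the 672 timesteps form 7 candidate days from which the typical days are chosen.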

### Common Patterns

```python
# 8 typical days from a year
fs.transform.cluster(n_clusters=8, cluster_duration='1D')

# 4 typical weeks
fs.transform.cluster(n_clusters=4, cluster_duration='1W')

# Force inclusion of peak demand periods
fs.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
    time_series_for_high_peaks=[heat_demand_ts],
)
```
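One way to picture what "forcing inclusion of peak periods" means: find the period that contains the maximum of a series, so that it can be kept as its own typical period instead of being averaged away. The following is a rough sketch under that assumption. `peak_period_index` is a hypothetical helper and does not reflect how tsam handles extreme periods internally.

```python
def peak_period_index(series, steps_per_period):
    """Return the index of the period that contains the series maximum."""
    peak_position = max(range(len(series)), key=series.__getitem__)
    return peak_position // steps_per_period


# Toy heat demand: 3 periods of 4 timesteps, with the peak (254) in the second period
heat = [120, 130, 125, 118, 140, 254, 150, 135, 122, 128, 131, 119]

print(peak_period_index(heat, steps_per_period=4))  # 1
```

Preserving such a period matters whenever capacity limits are driven by the extreme value rather than by average behaviour.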

## Summary

You learned how to use **`transform.cluster()`** to identify typical periods and reduce computational complexity.

### When to Use Clustering

| Scenario | Recommendation |
|----------|----------------|
| Annual optimization | 8-12 typical days |
| Investment decisions | Use with two-stage optimization |
| Preserve extremes | Use `time_series_for_high_peaks` |

### Next Steps

- **[08a-Aggregation](08a-aggregation.ipynb)**: Other aggregation techniques (resampling, two-stage)
- **[08b-Rolling Horizon](08b-rolling-horizon.ipynb)**: Sequential optimization for long time series