# Clustering Internals

This notebook walks through the data structures behind time series clustering.

!!! note "Prerequisites"
    This notebook assumes familiarity with [08c-clustering](08c-clustering.ipynb).

In [None]:
from pathlib import Path

import flixopt as fx

fx.CONFIG.notebook()

# Load the district heating system
data_file = Path('data/district_heating_system.nc4')
if not data_file.exists():
    from data.generate_example_systems import create_district_heating_system

    fs = create_district_heating_system()
    fs.to_netcdf(data_file)

flow_system = fx.FlowSystem.from_netcdf(data_file)

## Clustering and ClusterInfo

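After calling `transform.cluster()`, the clustering metadata is stored on the returned FlowSystem as `_cluster_info` (here `fs_clustered._cluster_info`). The leading underscore marks it as private by Python convention: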

In [None]:
fs_clustered = flow_system.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
    time_series_for_high_peaks=['HeatDemand(Q_th)|fixed_relative_profile'],
)

fs_clustered._cluster_info

The `ClusterInfo` object contains:

- **`result`**: A `ClusterResult` with timestep mapping and weights
- **`result.cluster_structure`**: A `ClusterStructure` with cluster assignments
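
To make the relationship between these pieces concrete, here is a minimal NumPy sketch of the kind of information they hold. It is a toy illustration, not flixopt's implementation: the names `cluster_order`, `occurrences`, and `timestep_mapping`, as well as the six-day data, are assumptions made up for this example.

```python
import numpy as np

# Hypothetical toy data: 6 original days assigned to 3 clusters.
# cluster_order[d] is the cluster index of original day d.
cluster_order = np.array([0, 1, 1, 2, 0, 1])
n_clusters = 3
timesteps_per_period = 24

# Cluster structure: how many original days each cluster represents.
occurrences = np.bincount(cluster_order, minlength=n_clusters)

# Timestep mapping: for every original timestep, the index of the
# representative timestep on the (shorter) clustered time axis.
cluster_of_step = np.repeat(cluster_order, timesteps_per_period)
offset_in_period = np.tile(np.arange(timesteps_per_period), len(cluster_order))
timestep_mapping = cluster_of_step * timesteps_per_period + offset_in_period

print(occurrences)              # number of days per cluster
print(timestep_mapping[48:51])  # first hours of day 2, which belongs to cluster 1
```

In this sketch the assignment array plays the role of the cluster structure, while the mapping and the occurrence counts correspond to what the list above calls timestep mapping and weights.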

In [None]:
fs_clustered._cluster_info.result

In [None]:
fs_clustered._cluster_info.result.cluster_structure

## Visualizing Clustering

Built-in plot methods show how original periods map to clusters:

In [None]:
# Which original period belongs to which cluster?
fs_clustered._cluster_info.result.cluster_structure.plot()

In [None]:
# What does each cluster's typical profile look like?
fs_clustered._cluster_info.plot_typical_periods('HeatDemand(Q_th)|fixed_relative_profile')

In [None]:
# How well does the aggregated data match the original?
fs_clustered._cluster_info.plot()

## Cluster Weights

Each representative timestep carries a weight equal to the number of original periods its cluster represents.
Multiplying per-timestep costs by these weights scales operational costs to the full time horizon:

$$\text{Objective} = \sum_{t \in \text{typical}} w_t \cdot c_t$$

Here $w_t$ is the weight and $c_t$ the cost of representative timestep $t$. The weights sum to the original timestep count:

In [None]:
print(f'Sum of weights: {fs_clustered.cluster_weight.sum().item():.0f}')
print(f'Original timesteps: {len(flow_system.timesteps)}')
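
The same bookkeeping can be checked by hand on a toy case. The sketch below is an assumption-laden illustration in plain NumPy (the occurrence counts and the flat cost of 2.0 per timestep are invented for the example); it shows why weighting by cluster occurrences recovers the full-horizon cost.

```python
import numpy as np

timesteps_per_period = 24
# Hypothetical: 6 original days grouped into 3 clusters.
occurrences = np.array([2, 3, 1])
n_original_timesteps = int(occurrences.sum()) * timesteps_per_period

# One weight per representative timestep: the occurrence count of its cluster.
weights = np.repeat(occurrences, timesteps_per_period)

# Hypothetical flat cost of 2.0 per timestep, chosen so the result is easy to verify.
cost = np.full(weights.size, 2.0)

# Weighted objective over the representative timesteps only.
objective = float((weights * cost).sum())

# Cost of the same flat profile evaluated over every original timestep.
full_resolution_cost = 2.0 * n_original_timesteps

print(weights.size, int(weights.sum()), objective, full_resolution_cost)
```

With a flat cost profile the weighted sum over 72 representative timesteps matches the cost over all 144 original timesteps exactly. For real profiles the match is only as good as the clustering itself.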

## Solution Expansion

After optimization, `transform.expand_solution()` maps the results back onto the original, full-resolution timesteps:

In [None]:
solver = fx.solvers.HighsSolver(mip_gap=0.01, log_to_console=False)
fs_clustered.optimize(solver)

fs_expanded = fs_clustered.transform.expand_solution()

print(f'Clustered: {len(fs_clustered.timesteps)} timesteps')
print(f'Expanded:  {len(fs_expanded.timesteps)} timesteps')
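
Conceptually, expansion amounts to copying each representative period into every original slot assigned to it. The following NumPy sketch illustrates that idea under stated assumptions: `cluster_order` and the `clustered` values are hypothetical toy data, and the indexing shown is a generic technique, not necessarily how `expand_solution()` is implemented.

```python
import numpy as np

# Hypothetical: 5 original periods of 4 timesteps each, grouped into 3 clusters.
cluster_order = np.array([0, 1, 1, 0, 2])

# Hypothetical solved values on the clustered axis, shape (n_clusters, timesteps_per_period).
clustered = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [5.0, 5.0, 5.0, 5.0],
])

# Integer-array indexing selects the representative period for each original
# period; flattening yields one value per original timestep.
expanded = clustered[cluster_order].reshape(-1)

print(f'Clustered: {clustered.size} values')
print(f'Expanded:  {expanded.size} values')
```

Every original period assigned to the same cluster receives identical values, so the expanded series has full length but no more information than the clustered solution.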

## Summary

| Class | Purpose |
|-------|--------|
| `ClusterInfo` | Clustering metadata, stored as `_cluster_info` on the clustered FlowSystem after `cluster()` |
| `ClusterResult` | Contains timestep mapping and weights |
| `ClusterStructure` | Maps original periods to clusters |

**Key methods:**

- `cluster_structure.plot()` - visualize cluster assignments
- `cluster_info.plot()` - compare original vs aggregated data
- `cluster_info.plot_typical_periods()` - view each cluster's profile