# Clustering Internals

Understanding the data structures and visualization tools behind time series clustering.

This notebook demonstrates:

- **Data structure**: The `Clustering` class that stores all clustering information
- **Plot accessor**: Built-in visualizations via `.plot`
- **Data expansion**: Using `expand_data()` to map aggregated data back to original timesteps

!!! note "Prerequisites"
    This notebook assumes familiarity with [08c-clustering](08c-clustering.ipynb).

In [None]:
from data.generate_example_systems import create_district_heating_system

import flixopt as fx

fx.CONFIG.notebook()

flow_system = create_district_heating_system()
flow_system.connect_and_transform()

## Clustering Metadata

`cluster()` returns a new flow system. Its clustering metadata is stored in `fs_clustered.clustering`:

In [None]:
from tsam.config import ExtremeConfig

fs_clustered = flow_system.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
    extremes=ExtremeConfig(method='new_cluster', max_value=['HeatDemand(Q_th)|fixed_relative_profile']),
)

fs_clustered.clustering

The `Clustering` object contains:
- **`cluster_order`**: Which cluster each original period (here, each day) maps to
- **`cluster_occurrences`**: How many original periods each cluster represents
- **`timestep_mapping`**: Maps each original timestep to its representative timestep
- **`original_data`** / **`aggregated_data`**: The data before and after clustering
- **`results`**: `ClusteringResults` object with xarray-like interface (`.dims`, `.coords`, `.sel()`)

In [None]:
# Cluster order shows which cluster each original period maps to
fs_clustered.clustering.cluster_order

In [None]:
# Cluster occurrences shows how many original periods each cluster represents
fs_clustered.clustering.cluster_occurrences
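
The relationship between these three properties can be illustrated with a minimal pure-Python sketch. It does not use the flixopt API: the toy `cluster_order`, the value of `timesteps_per_cluster`, and the assumption that representative timesteps are stored cluster after cluster are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy example: 6 original days, 3 clusters, 4 timesteps per day.
cluster_order = [0, 2, 0, 1, 2, 0]  # cluster assigned to each original day
timesteps_per_cluster = 4

# Occurrences: how many original days each cluster represents.
counts = Counter(cluster_order)
cluster_occurrences = [counts[c] for c in sorted(counts)]

# Timestep mapping: index of the representative timestep for every original
# timestep, ASSUMING representatives are laid out cluster after cluster.
timestep_mapping = [
    cluster * timesteps_per_cluster + step
    for cluster in cluster_order
    for step in range(timesteps_per_cluster)
]

print(cluster_occurrences)   # [3, 1, 2]
print(timestep_mapping[:8])  # [0, 1, 2, 3, 8, 9, 10, 11]
```

The occurrences are fully determined by the cluster order, and the mapping has one entry per original timestep (here 6 × 4 = 24).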

## Visualizing Clustering

The `.plot` accessor provides built-in visualizations for understanding clustering results.

In [None]:
# Compare original vs aggregated data as timeseries
# By default, plots all time-varying variables
fs_clustered.clustering.plot.compare()

In [None]:
# Compare specific variables only
fs_clustered.clustering.plot.compare(variables='HeatDemand(Q_th)|fixed_relative_profile')

In [None]:
# Duration curves show how well the aggregated data preserves the distribution
fs_clustered.clustering.plot.compare(kind='duration_curve')
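
As a reminder of what a duration curve is, the sketch below sorts a series from highest to lowest value, which discards the time ordering and keeps only the distribution. The numbers and the helper `duration_curve` are hypothetical and independent of how flixopt builds its plot.

```python
# Hypothetical demand values, for illustration only.
original = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
aggregated_expanded = [3.0, 1.0, 4.0, 3.0, 1.0, 4.0]


def duration_curve(values):
    """Return the values sorted from highest to lowest."""
    return sorted(values, reverse=True)


curve_original = duration_curve(original)
curve_aggregated = duration_curve(aggregated_expanded)

# Largest pointwise gap between the two curves: a rough measure of how well
# the aggregated data preserves the original distribution.
max_gap = max(abs(a - b) for a, b in zip(curve_original, curve_aggregated))

print(curve_original)    # [9.0, 5.0, 4.0, 3.0, 1.0, 1.0]
print(curve_aggregated)  # [4.0, 4.0, 3.0, 3.0, 1.0, 1.0]
print(max_gap)           # 5.0
```

In this toy case the aggregated series misses the peak of 9.0, which is the kind of deviation that extreme-period handling is meant to reduce.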

In [None]:
# View typical period profiles for each cluster
# Each line represents a cluster's representative day
fs_clustered.clustering.plot.clusters(variables='HeatDemand(Q_th)|fixed_relative_profile')

In [None]:
# Heatmap shows cluster assignments for each original period
fs_clustered.clustering.plot.heatmap()

## Expanding Aggregated Data

The `Clustering.expand_data()` method maps aggregated data back to the original timesteps.
This lets you compare the aggregated profile with the original one, timestep by timestep, before running any optimization:

In [None]:
# Get original and aggregated data
clustering = fs_clustered.clustering
original = clustering.original_data['HeatDemand(Q_th)|fixed_relative_profile']
aggregated = clustering.aggregated_data['HeatDemand(Q_th)|fixed_relative_profile']

# Expand aggregated data back to original timesteps
expanded = clustering.expand_data(aggregated)

print(f'Original:   {len(original.time)} timesteps')
print(f'Aggregated: {len(aggregated.time)} timesteps')
print(f'Expanded:   {len(expanded.time)} timesteps')
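
Conceptually, expansion repeats each cluster's representative profile for every original period assigned to that cluster. The following is a minimal sketch of that idea in plain Python, not the actual `expand_data()` implementation; the helper `expand`, the toy data, and the cluster-after-cluster layout of the aggregated values are assumptions.

```python
# Hypothetical toy data: 2 clusters with 3 timesteps each.
aggregated = [
    10.0, 20.0, 30.0,  # representative profile of cluster 0
    1.0, 2.0, 3.0,     # representative profile of cluster 1
]
cluster_order = [1, 0, 0, 1]  # cluster assigned to each of 4 original periods
timesteps_per_cluster = 3


def expand(aggregated, cluster_order, timesteps_per_cluster):
    """Repeat each period's representative profile in original order."""
    expanded = []
    for cluster in cluster_order:
        start = cluster * timesteps_per_cluster
        expanded.extend(aggregated[start:start + timesteps_per_cluster])
    return expanded


expanded = expand(aggregated, cluster_order, timesteps_per_cluster)

print(len(aggregated), len(expanded))  # 6 12
print(expanded[:6])                    # [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
```

The expanded series has the original length again, but it contains only values that appear in the representative profiles.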

## Summary

| Property | Description |
|----------|-------------|
| `clustering.n_clusters` | Number of representative clusters |
| `clustering.timesteps_per_cluster` | Timesteps in each cluster period |
| `clustering.cluster_order` | Maps original periods to clusters |
| `clustering.cluster_occurrences` | Count of original periods per cluster |
| `clustering.timestep_mapping` | Maps original timesteps to representative indices |
| `clustering.original_data` | Dataset before clustering |
| `clustering.aggregated_data` | Dataset after clustering |
| `clustering.results` | `ClusteringResults` with xarray-like interface |

### ClusteringResults (xarray-like)

Access the underlying tsam results via `clustering.results`:

```python
# Dimension info (like xarray)
clustering.results.dims      # ('period', 'scenario') or ()
clustering.results.coords    # {'period': [2020, 2030], 'scenario': ['high', 'low']}

# Select specific result (like xarray)
clustering.results.sel(period=2020, scenario='high')   # Label-based
clustering.results.isel(period=0, scenario=1)          # Index-based
```

### Plot Accessor Methods

| Method | Description |
|--------|-------------|
| `plot.compare()` | Compare original vs aggregated data (timeseries) |
| `plot.compare(kind='duration_curve')` | Compare as duration curves |
| `plot.clusters()` | View each cluster's profile |
| `plot.heatmap()` | Visualize cluster assignments |

### Key Parameters

```python
# Compare with options
clustering.plot.compare(
    variables='Demand|profile',       # Single variable, list, or None (all)
    kind='timeseries',                # 'timeseries' or 'duration_curve'
    select={'scenario': 'Base'},      # xarray-style selection
    colors='viridis',                 # Colorscale name, list, or dict
    facet_col='period',               # Facet by period if present
    facet_row='scenario',             # Facet by scenario if present
)

# Heatmap shows cluster assignments (no variable needed)
clustering.plot.heatmap()

# Expand aggregated data to original timesteps
expanded = clustering.expand_data(aggregated_data)
```

## Cluster Weights

Each representative timestep has a weight equal to the number of original periods that its cluster represents.
Weighting each timestep's cost by this count scales operational costs to the full time horizon:

$$\text{Objective} = \sum_{t \in \text{typical}} w_t \cdot c_t$$

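Here $w_t$ is the weight and $c_t$ the cost of representative timestep $t$. Summed over all representative timesteps, the weights equal the original timestep count: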

In [None]:
print(f'Sum of weights: {fs_clustered.cluster_weight.sum().item():.0f}')
print(f'Original timesteps: {len(flow_system.timesteps)}')
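
The weighted objective can be checked by hand on a small example. This sketch uses hypothetical occurrences and per-timestep costs and assumes every timestep of a cluster carries that cluster's occurrence count as its weight; it is not taken from the flixopt implementation.

```python
# Hypothetical toy example: 3 clusters, 4 timesteps each, 6 original periods.
cluster_occurrences = [3, 1, 2]
timesteps_per_cluster = 4

# One weight per representative timestep: the occurrence count of its cluster.
weights = [
    occurrences
    for occurrences in cluster_occurrences
    for _ in range(timesteps_per_cluster)
]

# Hypothetical cost c_t of each representative timestep.
costs = [1.0] * 4 + [2.0] * 4 + [1.5] * 4

# Objective = sum over representative timesteps of w_t * c_t.
objective = sum(w * c for w, c in zip(weights, costs))

original_timesteps = sum(cluster_occurrences) * timesteps_per_cluster

print(sum(weights), original_timesteps)  # 24 24
print(objective)                         # 32.0
```

Without the weights, the 12 representative timesteps would contribute only 18.0, which would understate the cost of the 24 original timesteps.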

## Solution Expansion

After optimization, `transform.expand()` maps the solution back to the original timesteps:

In [None]:
solver = fx.solvers.HighsSolver(mip_gap=0.01, log_to_console=False)
fs_clustered.optimize(solver)

fs_expanded = fs_clustered.transform.expand()

print(f'Clustered: {len(fs_clustered.timesteps)} timesteps')
print(f'Expanded:  {len(fs_expanded.timesteps)} timesteps')