# Clustering Internals

Understanding the data structures and visualization tools behind time series clustering.

This notebook demonstrates:

- **Data structures**: `Clustering`, `ClusterResult`, and `ClusterStructure`
- **Plot accessor**: Built-in visualizations via `.plot`
- **Data expansion**: Using `expand_data()` to map aggregated data back to original timesteps

!!! note "Prerequisites"
    This notebook assumes familiarity with [08c-clustering](08c-clustering.ipynb).

In [None]:
from pathlib import Path

import flixopt as fx

fx.CONFIG.notebook()

# Load the district heating system
data_file = Path('data/district_heating_system.nc4')
if not data_file.exists():
    from data.generate_example_systems import create_district_heating_system

    fs = create_district_heating_system()
    fs.to_netcdf(data_file)

flow_system = fx.FlowSystem.from_netcdf(data_file)

## Clustering Metadata

After calling `transform.cluster()`, the clustering metadata is stored on the returned system's `clustering` attribute (here `fs_clustered.clustering`):

In [None]:
fs_clustered = flow_system.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
    time_series_for_high_peaks=['HeatDemand(Q_th)|fixed_relative_profile'],
)

fs_clustered.clustering

The `Clustering` object contains:

- **`result`**: A `ClusterResult` with the timestep mapping and weights
- **`result.cluster_structure`**: A `ClusterStructure` with the cluster assignments
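
To make the relationship between cluster assignments and the timestep mapping concrete, here is a minimal sketch in plain NumPy. `ToyClusterStructure` and its fields are hypothetical names used for illustration only, not flixopt's classes. The sketch assumes that each original period is assigned to exactly one cluster and that all periods have the same number of timesteps.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyClusterStructure:
    """Hypothetical stand-in for illustration, NOT the flixopt class."""

    cluster_order: np.ndarray  # cluster id assigned to each original period
    timesteps_per_period: int  # e.g. 24 for hourly data with one-day periods

    def timestep_mapping(self) -> np.ndarray:
        """Index into the aggregated series for every original timestep."""
        offsets = np.arange(self.timesteps_per_period)
        start = self.cluster_order[:, None] * self.timesteps_per_period
        return (start + offsets).ravel()


# Four original periods of 3 timesteps each, grouped into 2 clusters
structure = ToyClusterStructure(cluster_order=np.array([0, 1, 1, 0]), timesteps_per_period=3)
mapping = structure.timestep_mapping()
print(mapping)
```

Periods 0 and 3 both point at the first typical period (indices 0 to 2), while periods 1 and 2 share the second one (indices 3 to 5).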

In [None]:
fs_clustered.clustering.result

In [None]:
fs_clustered.clustering.result.cluster_structure

## Visualizing Clustering

The `.plot` accessor provides built-in visualizations for understanding clustering results.

In [None]:
# Compare original vs aggregated data as timeseries
# By default, plots all time-varying variables
fs_clustered.clustering.plot.compare()

In [None]:
# Compare a single variable only
fs_clustered.clustering.plot.compare(variable='HeatDemand(Q_th)|fixed_relative_profile')

In [None]:
# Duration curves show how well the aggregated data preserves the distribution
fs_clustered.clustering.plot.compare(kind='duration_curve')
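
A duration curve is simply a series sorted in descending order, so timing information is discarded and only the distribution of values remains. The sketch below builds one with NumPy on made-up numbers. It illustrates the concept and is not necessarily how `plot.compare()` computes its output.

```python
import numpy as np


def duration_curve(values: np.ndarray) -> np.ndarray:
    """Return the values sorted from highest to lowest."""
    return np.sort(values)[::-1]


# Made-up demand profile and a made-up reconstruction from typical periods
original = np.array([3.0, 8.0, 5.0, 2.0, 9.0, 4.0])
reconstructed = np.array([2.5, 8.5, 4.5, 2.5, 8.5, 4.5])

curve_original = duration_curve(original)
curve_reconstructed = duration_curve(reconstructed)

# Largest gap between the two curves: a simple measure of distribution mismatch
max_gap = float(np.abs(curve_original - curve_reconstructed).max())
print(curve_original, curve_reconstructed, max_gap)
```

A small gap between the two curves indicates that peaks and base load are both represented, even if individual timesteps differ.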

In [None]:
# View typical period profiles for each cluster
# Each line represents a cluster's representative day
fs_clustered.clustering.plot.typical_periods(variable='HeatDemand(Q_th)|fixed_relative_profile')

In [None]:
# Heatmap shows which original period belongs to which cluster
# Rows are original periods (days), columns show cluster assignment
fs_clustered.clustering.plot.heatmap()

## Expanding Aggregated Data

The `ClusterResult.expand_data()` method maps aggregated data back to the original timesteps.
This lets you check how well the clustering reproduces the original data before running an optimization:

In [None]:
import plotly.express as px

# Get original and aggregated data
result = fs_clustered.clustering.result
original = result.original_data['HeatDemand(Q_th)|fixed_relative_profile']
aggregated = result.aggregated_data['HeatDemand(Q_th)|fixed_relative_profile']

# Expand aggregated data back to original timesteps
expanded = result.expand_data(aggregated)

print(f'Original:   {len(original.time)} timesteps')
print(f'Aggregated: {len(aggregated.time)} timesteps')
print(f'Expanded:   {len(expanded.time)} timesteps')
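
Conceptually, expansion replaces every original period with the typical period of the cluster it was assigned to. The following is a minimal NumPy sketch of that idea with made-up data. The array names are illustrative assumptions, and this is not flixopt's implementation of `expand_data()`.

```python
import numpy as np

# Cluster assigned to each of the four original periods (made-up)
cluster_order = np.array([0, 1, 1, 0])

# One representative profile per cluster, shape (n_clusters, timesteps_per_period)
typical_profiles = np.array(
    [
        [1.0, 2.0, 3.0],  # cluster 0
        [4.0, 5.0, 6.0],  # cluster 1
    ]
)

# Made-up original series: 4 periods x 3 timesteps
original = np.array([1.0, 2.0, 4.0, 4.0, 6.0, 6.0, 4.0, 4.0, 6.0, 1.0, 2.0, 2.0])

# Pick the typical profile for every original period and flatten back to a series
expanded = typical_profiles[cluster_order].ravel()

# Root-mean-square error quantifies the information lost by aggregation
rmse = float(np.sqrt(np.mean((original - expanded) ** 2)))
print(expanded, round(rmse, 3))
```

The expanded series has the same length as the original, so the two can be compared timestep by timestep.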

In [None]:
# Plot original vs expanded (reconstructed from clusters)
import xarray as xr

ds = xr.Dataset({'Original': original, 'Expanded': expanded})
df = ds.to_dataframe().reset_index().melt(id_vars='time', var_name='series', value_name='value')

fig = px.line(df, x='time', y='value', color='series', title='Original vs Expanded Heat Demand')
fig.update_layout(height=350)
fig.show()

## Cluster Weights

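Each representative timestep has a weight equal to the number of original periods it represents.
Weighting each timestep this way scales the operational costs of the typical periods to the full time horizon:

$$\text{Objective} = \sum_{t \in \text{typical}} w_t \cdot c_t$$

Here $w_t$ is the weight and $c_t$ the cost at representative timestep $t$.

Summed over all representative timesteps, the weights equal the number of original timesteps: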

In [None]:
print(f'Sum of weights: {fs_clustered.cluster_weight.sum().item():.0f}')
print(f'Original timesteps: {len(flow_system.timesteps)}')
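
The scaling argument can be checked on a toy example. The sketch below assumes, as stated above, that the weight of each typical timestep equals the number of original periods in its cluster. All arrays are made up and the variable names are not flixopt attributes.

```python
import numpy as np

timesteps_per_period = 3
n_clusters = 2
cluster_order = np.array([0, 1, 1, 0, 1])  # cluster of each original period (made-up)

# Number of original periods represented by each cluster
occurrences = np.bincount(cluster_order, minlength=n_clusters)

# Every timestep of a typical period carries its cluster's occurrence count
weights = np.repeat(occurrences, timesteps_per_period)

# Made-up cost per typical timestep, ordered cluster 0 then cluster 1
typical_costs = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

weighted_objective = float(np.sum(weights * typical_costs))

# Same total obtained by expanding the typical costs to all original periods
expanded_costs = typical_costs.reshape(n_clusters, timesteps_per_period)[cluster_order]
full_objective = float(expanded_costs.sum())

print(weights.sum(), weighted_objective, full_objective)
```

Both totals agree, and the weights sum to 5 periods times 3 timesteps, which is the original timestep count.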

## Solution Expansion

After optimization, `transform.expand_solution()` maps the results back to the original time resolution:

In [None]:
solver = fx.solvers.HighsSolver(mip_gap=0.01, log_to_console=False)
fs_clustered.optimize(solver)

fs_expanded = fs_clustered.transform.expand_solution()

print(f'Clustered: {len(fs_clustered.timesteps)} timesteps')
print(f'Expanded:  {len(fs_expanded.timesteps)} timesteps')

## Summary

| Class | Purpose |
|-------|--------|
| `Clustering` | Container for clustering metadata, stored on `fs.clustering` after `cluster()` |
| `ClusterResult` | Contains timestep mapping, weights, and `expand_data()` method |
| `ClusterStructure` | Maps original periods to clusters |

### Plot Accessor Methods

| Method | Description |
|--------|-------------|
| `plot.compare()` | Compare original vs aggregated data (timeseries) |
| `plot.compare(kind='duration_curve')` | Compare as duration curves |
| `plot.typical_periods()` | View each cluster's profile |
| `plot.heatmap()` | Visualize clustering structure |

### Key Parameters

```python
# Compare with options
clustering.plot.compare(
    variable='Demand|profile',        # Single variable, list, or None (all)
    kind='timeseries',                # 'timeseries' or 'duration_curve'
    facet_col='scenario',             # Facet by scenario if present
    facet_row='period',               # Facet by period if present
)

# Expand aggregated data to original timesteps
result = clustering.result
expanded = result.expand_data(aggregated_data)
```