# External Clustering

This notebook demonstrates different ways to apply clustering to a FlowSystem:

1. **Built-in clustering** - Let flixopt handle everything via `transform.cluster()`
2. **External tsam** - Run tsam yourself on a data subset and pass results to flixopt
3. **Custom indices** - Provide your own cluster assignments directly

The latter two options are useful when:
- You want to cluster on a subset of time series (faster tsam computation)
- You have custom clustering algorithms
- You want to reuse clustering results across multiple FlowSystems

In [None]:
import pandas as pd
import xarray as xr

import flixopt as fx

fx.CONFIG.notebook()

## Load a Pre-built FlowSystem

We'll use the district heating system from the data directory.

In [None]:
from pathlib import Path

# Generate example data if not present
data_file = Path('data/district_heating_system.nc4')
if not data_file.exists():
    from data.generate_example_systems import create_district_heating_system

    fs = create_district_heating_system()
    fs.to_netcdf(data_file)

# Load the FlowSystem
flow_system = fx.FlowSystem.from_netcdf(data_file)
print(f'Loaded FlowSystem: {len(flow_system.timesteps)} timesteps ({len(flow_system.timesteps) / 96:.0f} days)')
print(f'Components: {list(flow_system.components.keys())}')

In [None]:
# Extract key time series from the FlowSystem for later use
heat_demand = flow_system.components['HeatDemand'].inputs[0].fixed_relative_profile
elec_price = flow_system.components['GridBuy'].outputs[0].effects_per_flow_hour['costs']

print(f'Heat demand shape: {heat_demand.shape}')
print(f'Electricity price shape: {elec_price.shape}')

In [None]:
# Baseline: solve without clustering
solver = fx.solvers.HighsSolver(mip_gap=0.01, log_to_console=False)
fs_baseline = flow_system.copy()
fs_baseline.optimize(solver)
print(f'Baseline cost (no clustering): {fs_baseline.solution["costs"].item():,.0f} €')

## Option 1: Built-in Clustering

The simplest approach - let flixopt handle clustering internally using tsam.
This extracts ALL time series from the FlowSystem and clusters on them.

In [None]:
# Create clustered system using built-in method
fs_builtin = flow_system.transform.cluster(
    n_clusters=8,  # Find 8 typical days
    cluster_duration='1D',
)

fs_builtin.optimize(solver)
print(f'Built-in clustering cost: {fs_builtin.solution["costs"].item():,.0f} €')

# Access the clustering parameters
params = fs_builtin._clustering_info['parameters']
print(f'\nCluster assignments: {params.cluster_order.values}')
print(f'Period length: {params.period_length} timesteps')

## Option 2: External tsam on Data Subset

Run tsam yourself on a **subset** of time series data, then pass results to flixopt.

This is useful when:
- You only want to cluster based on the most important time series (faster tsam)
- You want more control over tsam parameters
- You want to reuse the same clustering for multiple FlowSystems
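
When running tsam yourself, `hoursPerPeriod` and `resolution` together determine how many timesteps form one period (24 h at 0.25 h resolution gives 96). The sketch below is a sanity check you might run before clustering. The helper `count_periods` is hypothetical and not part of flixopt or tsam, and the 31-day horizon is an assumed example.

```python
def count_periods(n_timesteps: int, hours_per_period: float, resolution_hours: float) -> tuple[int, int]:
    """Return (timesteps_per_period, n_periods); raise if the horizon does not split evenly."""
    steps = hours_per_period / resolution_hours
    if not steps.is_integer():
        raise ValueError('Period duration is not a whole multiple of the resolution')
    steps = int(steps)
    n_periods, remainder = divmod(n_timesteps, steps)
    if remainder:
        raise ValueError(f'{remainder} timesteps do not fit into a whole period')
    return steps, n_periods


# Assumed example: 31 days of 15-minute data
steps_per_period, n_periods = count_periods(31 * 96, hours_per_period=24, resolution_hours=0.25)
print(steps_per_period, n_periods)
```

If the check fails, the time index probably needs trimming to whole periods before clustering.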

In [None]:
import tsam.timeseriesaggregation as tsam

# Create DataFrame with only the KEY time series
# (Much faster than letting flixopt extract ALL time series)
clustering_data = pd.DataFrame(
    {
        'heat_demand': heat_demand.values,
        'elec_price': elec_price.values,
    },
    index=flow_system.timesteps,
)

print(f'Clustering on {len(clustering_data.columns)} time series (subset of FlowSystem data)')
print(f'Columns: {list(clustering_data.columns)}')

In [None]:
# Run tsam with custom parameters
aggregation = tsam.TimeSeriesAggregation(
    clustering_data,
    noTypicalPeriods=8,
    hoursPerPeriod=24,
    resolution=0.25,  # 15-min resolution
    clusterMethod='hierarchical',
)
aggregation.createTypicalPeriods()

print(f'tsam cluster order: {aggregation.clusterOrder}')

In [None]:
# Create ClusteringParameters with the external tsam aggregation
# This allows flixopt to use the tsam results to aggregate ALL FlowSystem data
params_external = fx.ClusteringParameters(
    n_clusters=8,
    cluster_duration='1D',
    tsam_aggregation=aggregation,  # Pass the tsam object for data aggregation
)

print(f'Indices populated: {params_external.has_indices}')
print(f'Cluster order: {params_external.cluster_order.values}')
print(f'Period length: {params_external.period_length}')

In [None]:
# Apply to FlowSystem using add_clustering()
fs_external = flow_system.transform.add_clustering(params_external)

fs_external.optimize(solver)
print(f'External tsam clustering cost: {fs_external.solution["costs"].item():,.0f} €')

## Option 3: Custom Indices

Provide your own cluster assignments directly - no tsam required.

This is useful when:
- You have a custom clustering algorithm
- You want to manually define typical periods (e.g., weekdays vs weekends)
- You're loading clustering results from another source
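
As an illustration of the weekday/weekend idea, here is a minimal sketch that derives a two-cluster assignment from calendar dates using only the standard library. The function name and the start date are assumptions for illustration. With such an order, `n_clusters` would presumably be 2 rather than 8.

```python
from datetime import date, timedelta


def weekday_weekend_order(start: date, n_days: int) -> list[int]:
    """Return one label per day: 0 for Monday-Friday, 1 for Saturday/Sunday."""
    return [1 if (start + timedelta(days=i)).weekday() >= 5 else 0 for i in range(n_days)]


# Hypothetical start date: 2024-01-01 is a Monday
order = weekday_weekend_order(date(2024, 1, 1), 14)
print(order)
```

The resulting list can be wrapped in an `xr.DataArray` in the same way as the modulo pattern in the next cell.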

In [None]:
# Define custom cluster assignments using a simple repeating pattern
# We have 31 days; assign them to 8 clusters in turn
n_days = len(flow_system.timesteps) // 96  # 96 timesteps per day (15-min)
print(f'Number of days: {n_days}')

# Simple pattern: every 8th day shares a cluster (days 0, 8, 16, ... -> cluster 0)
custom_cluster_order = [i % 8 for i in range(n_days)]

# Note: With custom indices (no tsam object), we use aggregate_data=False
# because there is no tsam aggregation available to transform the data.
# Clustering then only equalizes binary (on/off) decisions across similar periods.
params_custom = fx.ClusteringParameters(
    n_clusters=8,
    cluster_duration='1D',
    aggregate_data=False,  # No tsam available for data transformation
    # Provide indices directly
    cluster_order=xr.DataArray(custom_cluster_order, dims=['cluster_period'], name='cluster_order'),
    period_length=96,  # 96 timesteps per day (15-min resolution)
)

print(f'Custom indices set: {params_custom.has_indices}')
print(f'Cluster order: {params_custom.cluster_order.values}')

In [None]:
# Apply to FlowSystem
fs_custom = flow_system.transform.add_clustering(params_custom)

fs_custom.optimize(solver)
print(f'Custom clustering cost: {fs_custom.solution["costs"].item():,.0f} €')

## Comparison

In [None]:
results = pd.DataFrame(
    {
        'Method': ['Baseline (no clustering)', 'Built-in clustering', 'External tsam (subset)', 'Custom indices'],
        'Cost [€]': [
            fs_baseline.solution['costs'].item(),
            fs_builtin.solution['costs'].item(),
            fs_external.solution['costs'].item(),
            fs_custom.solution['costs'].item(),
        ],
    }
).set_index('Method')

results['Gap vs Baseline [%]'] = (results['Cost [€]'] / results.loc['Baseline (no clustering)', 'Cost [€]'] - 1) * 100
results.style.format({'Cost [€]': '{:,.0f}', 'Gap vs Baseline [%]': '{:.2f}'})

## IO: Save and Reload

Clustering indices are automatically saved with the FlowSystem and restored on load.

In [None]:
import tempfile

# Save clustered FlowSystem
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / 'clustered_system.nc4'
    fs_external.to_netcdf(path)
    print(f'Saved to: {path}')

    # Reload
    fs_loaded = fx.FlowSystem.from_netcdf(path)

    # Check clustering was restored
    params_loaded = fs_loaded._clustering_info['parameters']
    print('\nRestored clustering:')
    print(f'  has_indices: {params_loaded.has_indices}')
    print(f'  cluster_order: {params_loaded.cluster_order.values}')
    print(f'  period_length: {params_loaded.period_length}')

    # Solve reloaded system
    fs_loaded.optimize(solver)
    print(f'\nReloaded cost: {fs_loaded.solution["costs"].item():,.0f} €')
    print(f'Original cost: {fs_external.solution["costs"].item():,.0f} €')

## Advanced: Segmentation with External tsam

You can also provide segment assignments for intra-period aggregation, where adjacent timesteps within each typical period are merged into a smaller number of segments.
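
To make the idea of a segment assignment concrete, the sketch below maps each of the 96 timesteps in a day to one of 12 segment IDs. It uses equal-length segments purely for illustration. tsam chooses segment boundaries from the data, so its segments may have unequal lengths. The helper name is hypothetical.

```python
def uniform_segment_assignment(period_length: int, n_segments: int) -> list[int]:
    """Assign each timestep of a period to a segment ID, using equal-length segments."""
    return [t * n_segments // period_length for t in range(period_length)]


# 96 timesteps per day (15-min resolution) split into 12 segments of 8 timesteps each
assignment = uniform_segment_assignment(period_length=96, n_segments=12)
print(assignment[:10])
```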

In [None]:
# Run tsam with segmentation on the data subset
aggregation_seg = tsam.TimeSeriesAggregation(
    clustering_data,
    noTypicalPeriods=8,
    hoursPerPeriod=24,
    resolution=0.25,
    segmentation=True,
    noSegments=12,  # 12 segments per day (~2 hours each)
)
aggregation_seg.createTypicalPeriods()

# Create parameters with segmentation and tsam for data aggregation
params_seg = fx.ClusteringParameters(
    n_clusters=8,
    cluster_duration='1D',
    n_segments=12,
    tsam_aggregation=aggregation_seg,  # Pass tsam for data aggregation
)

print(f'Segment assignment shape: {params_seg.segment_assignment.shape}')
print(f'Segment assignment for cluster 0:\n{params_seg.segment_assignment.sel(cluster=0).values}')

In [None]:
# Apply segmented clustering
fs_segmented = flow_system.transform.add_clustering(params_seg)
fs_segmented.optimize(solver)
print(f'Segmented clustering cost: {fs_segmented.solution["costs"].item():,.0f} €')

## Summary

| Method | Data Aggregation | When to Use |
|--------|------------------|-------------|
| `transform.cluster()` | Yes | Default - let flixopt handle everything |
| `tsam_aggregation=...` | Yes | External tsam on data subset, with data aggregation |
| Direct `cluster_order` | No | Custom algorithms or manual period grouping (only binary on/off decisions are equalized) |

All methods use `ClusteringParameters`, which stores:
- `cluster_order`: Which cluster each period belongs to
- `period_length`: Timesteps per period
- `segment_assignment`: (optional) Segment IDs within each cluster
- `tsam_aggregation`: (optional) tsam object for data transformation
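
To show how `cluster_order` and `period_length` together describe the mapping from timesteps to clusters, here is a minimal sketch in plain Python. It is independent of flixopt's internal representation, and the function name and toy values are assumptions for illustration.

```python
from collections import Counter


def expand_cluster_order(cluster_order: list[int], period_length: int) -> list[int]:
    """Repeat each period's cluster ID once per timestep in that period."""
    return [cluster for cluster in cluster_order for _ in range(period_length)]


# Toy example: 4 periods of 3 timesteps each, assigned to 3 clusters
cluster_order = [0, 1, 0, 2]
period_length = 3

labels = expand_cluster_order(cluster_order, period_length)
weights = Counter(cluster_order)  # how many original periods each cluster represents

print(labels)
print(dict(weights))
```

The per-cluster counts indicate how many original periods each typical period stands in for.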