# Large-Scale Optimization: Computational Efficiency Techniques

## User Story

> *You're planning a district energy system with a full year of hourly data (8,760 timesteps). The optimization takes hours to complete. You need to find ways to get good solutions faster for iterative design exploration.*

This notebook introduces:

- **Resampling**: Reduce time resolution (e.g., hourly → 4-hourly)
- **Clustering**: Identify typical periods (e.g., 8 representative days); covered conceptually, since it requires the `tsam` package
- **Two-stage optimization**: Size with reduced data, dispatch at full resolution
- **Speed vs. accuracy trade-offs**: When to use each technique
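All three techniques target the same driver of solve time: the number of timesteps, since flow rates, storage states, and their constraints are roughly repeated per timestep. A back-of-the-envelope count for the full year in the user story, assuming 8 typical days of 24 hourly steps for the clustering case:

```python
# Back-of-the-envelope model size for one year of hourly data.
HOURS_PER_YEAR = 8760


def n_timesteps(hours: int, step_hours: int) -> int:
    """Number of timesteps after aggregating `hours` into blocks of `step_hours`."""
    return -(-hours // step_hours)  # ceiling division


full = n_timesteps(HOURS_PER_YEAR, 1)       # hourly
resampled = n_timesteps(HOURS_PER_YEAR, 4)  # 4-hourly
clustered = 8 * 24                          # assumption: 8 typical days at hourly resolution

for label, n in [('Hourly', full), ('4-hourly', resampled), ('8 typical days', clustered)]:
    print(f'{label:>15}: {n:5d} timesteps ({n / full:.1%} of full)')
```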

## Setup

In [None]:
import timeit

import numpy as np
import pandas as pd
import plotly.express as px
import xarray as xr

import flixopt as fx

fx.CONFIG.notebook()

## Create a Realistic Test Dataset

To keep run times short, we simulate one month of hourly data (720 timesteps) instead of a full year (8,760 timesteps):

In [None]:
# One month at hourly resolution
timesteps = pd.date_range('2024-01-01', periods=720, freq='h')  # 30 days
hours = np.arange(len(timesteps))
hour_of_day = hours % 24
day_of_month = hours // 24

print(f'Timesteps: {len(timesteps)} hours ({len(timesteps) / 24:.0f} days)')

In [None]:
np.random.seed(42)

# Heat demand [kW]: time-of-day pattern (morning peak, daytime, evening, night)
daily_pattern = np.select(
    [
        (hour_of_day >= 6) & (hour_of_day < 9),
        (hour_of_day >= 9) & (hour_of_day < 17),
        (hour_of_day >= 17) & (hour_of_day < 22),
    ],
    [200, 150, 180],
    default=100,
).astype(float)

# Add temperature effect (colder mid-month)
temp_effect = 1 + 0.3 * np.sin(day_of_month * np.pi / 30)
heat_demand = daily_pattern * temp_effect + np.random.normal(0, 10, len(timesteps))
heat_demand = np.clip(heat_demand, 80, 300)

# Electricity price: time-of-use with volatility
base_price = np.where((hour_of_day >= 7) & (hour_of_day <= 21), 0.15, 0.08)
elec_price = base_price * (1 + np.random.uniform(-0.2, 0.2, len(timesteps)))

# Gas price: relatively stable
gas_price = 0.06 + np.random.uniform(-0.005, 0.005, len(timesteps))

print(f'Heat demand: {heat_demand.min():.0f} - {heat_demand.max():.0f} kW')
print(f'Elec price: {elec_price.min():.3f} - {elec_price.max():.3f} €/kWh')
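The seasonal factor in `temp_effect` is half a sine wave over the 30 days: it starts at 1.0, peaks at 1.3 on day 15, and falls back toward 1.0. A standalone check of that shape, reusing only the formula from the cell above, also shows that the noise-free morning peak (200 kW × 1.3) stays inside the 80–300 kW clip range:

```python
import numpy as np

days = np.arange(30)                               # day_of_month values 0..29
temp_effect = 1 + 0.3 * np.sin(days * np.pi / 30)  # same formula as above

peak_day = int(np.argmax(temp_effect))
print(f'Day 0: {temp_effect[0]:.3f}')
print(f'Peak on day {peak_day}: {temp_effect[peak_day]:.3f}')
print(f'Day 29: {temp_effect[-1]:.3f}')
# Largest noise-free demand: morning block (200 kW) times the peak factor
print(f'Noise-free peak demand: {200 * temp_effect.max():.0f} kW')
```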

In [None]:
# Visualize the first week (168 hours) with plotly, one facet per variable
profiles = xr.Dataset(
    {
        'Heat Demand [kW]': xr.DataArray(heat_demand[:168], dims=['time'], coords={'time': timesteps[:168]}),
        'Electricity Price [€/kWh]': xr.DataArray(elec_price[:168], dims=['time'], coords={'time': timesteps[:168]}),
    }
)

df = profiles.to_dataframe().reset_index().melt(id_vars='time', var_name='variable', value_name='value')
fig = (
    px.line(df, x='time', y='value', facet_col='variable', height=300)
    .update_yaxes(matches=None, showticklabels=True)
    .for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1]))
)
fig.show()

## Build the Base FlowSystem

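A typical district heating system with investment decisions. The optimizer sizes a gas-fired CHP, a gas boiler, and a thermal storage to meet a fixed heat demand, and the CHP's electricity is sold to the grid: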

In [None]:
def build_system(timesteps, heat_demand, elec_price, gas_price):
    """Build a FlowSystem with investment optimization."""
    fs = fx.FlowSystem(timesteps)

    fs.add_elements(
        # Buses
        fx.Bus('Electricity', carrier='electricity'),
        fx.Bus('Heat', carrier='heat'),
        fx.Bus('Gas', carrier='gas'),
        # Effects
        fx.Effect('costs', '€', 'Total Costs', is_standard=True, is_objective=True),
        # Gas Supply
        fx.Source(
            'GasGrid',
            outputs=[fx.Flow('Gas', bus='Gas', size=1000, effects_per_flow_hour=gas_price)],
        ),
        # CHP with investment optimization
        fx.linear_converters.CHP(
            'CHP',
            electrical_efficiency=0.35,
            thermal_efficiency=0.50,
            status_parameters=fx.StatusParameters(
                effects_per_startup={'costs': 50},
                min_uptime=3,
            ),
            electrical_flow=fx.Flow(
                'P_el',
                bus='Electricity',
                size=fx.InvestParameters(
                    minimum_size=0,
                    maximum_size=150,
                    effects_of_investment_per_size={'costs': 30},
                ),
            ),
            thermal_flow=fx.Flow('Q_th', bus='Heat'),
            fuel_flow=fx.Flow('Q_fuel', bus='Gas', relative_minimum=0.3),
        ),
        # Gas Boiler with investment optimization
        fx.linear_converters.Boiler(
            'Boiler',
            thermal_efficiency=0.92,
            thermal_flow=fx.Flow(
                'Q_th',
                bus='Heat',
                size=fx.InvestParameters(
                    minimum_size=0,
                    maximum_size=400,
                    effects_of_investment_per_size={'costs': 10},
                ),
            ),
            fuel_flow=fx.Flow('Q_fuel', bus='Gas'),
        ),
        # Thermal Storage with investment optimization
        fx.Storage(
            'Storage',
            capacity_in_flow_hours=fx.InvestParameters(
                minimum_size=0,
                maximum_size=500,
                effects_of_investment_per_size={'costs': 2},
            ),
            initial_charge_state=0,
            eta_charge=0.95,
            eta_discharge=0.95,
            relative_loss_per_hour=0.01,
            charging=fx.Flow('Charge', bus='Heat', size=100),
            discharging=fx.Flow('Discharge', bus='Heat', size=100),
        ),
        # Electricity Sales
        fx.Sink(
            'ElecSales',
            inputs=[fx.Flow('P_el', bus='Electricity', size=200, effects_per_flow_hour=-elec_price)],
        ),
        # Heat Demand
        fx.Sink(
            'HeatDemand',
            inputs=[fx.Flow('Q_th', bus='Heat', size=1, fixed_relative_profile=heat_demand)],
        ),
    )

    return fs


# Build the base system
flow_system = build_system(timesteps, heat_demand, elec_price, gas_price)
print(f'Base system: {len(timesteps)} timesteps')
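The component parameters translate into a few useful numbers. With `electrical_efficiency=0.35` and `thermal_efficiency=0.50`, a CHP at its 150 kW electrical maximum burns about 429 kW of gas and delivers about 214 kW of heat. For the storage, the round-trip efficiency is 0.95 × 0.95, and standing losses add up over idle hours. A plain-Python check, assuming losses compound as `(1 - relative_loss_per_hour)` per hour (an assumption about how flixopt applies `relative_loss_per_hour`):

```python
# Sanity checks on the component parameters used in build_system().
el_eff, th_eff = 0.35, 0.50
p_el_max = 150.0                 # kW, CHP electrical maximum_size

fuel_in = p_el_max / el_eff      # gas input at full electrical output
q_th_max = fuel_in * th_eff      # heat output at the same operating point
print(f'CHP at {p_el_max:.0f} kW_el: fuel {fuel_in:.1f} kW, heat {q_th_max:.1f} kW')

# Storage: round-trip efficiency and standing losses
eta_charge = eta_discharge = 0.95
loss_per_hour = 0.01
round_trip = eta_charge * eta_discharge
# Assumption: losses compound hourly as (1 - loss_per_hour) ** hours
retained_24h = (1 - loss_per_hour) ** 24
print(f'Round-trip efficiency: {round_trip:.1%}')
print(f'Share of stored heat left after 24 h idle: {retained_24h:.1%}')
```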

## Technique 1: Resampling

Aggregate the hourly data into coarser time steps. Fewer timesteps mean fewer variables and constraints, so the solver finishes faster:

In [None]:
solver = fx.solvers.HighsSolver(mip_gap=0.01)

# Resample from 1h to 4h resolution
fs_resampled = flow_system.transform.resample('4h')

print(f'Original: {len(flow_system.timesteps)} timesteps')
print(f'Resampled: {len(fs_resampled.timesteps)} timesteps')
print(f'Reduction: {(1 - len(fs_resampled.timesteps) / len(flow_system.timesteps)) * 100:.0f}%')

In [None]:
# Optimize resampled system
start = timeit.default_timer()
fs_resampled.optimize(solver)
time_resampled = timeit.default_timer() - start

print(f'\nResampled optimization: {time_resampled:.2f} seconds')
print(f'Cost: {fs_resampled.solution["costs"].item():.2f} €')
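Averaging is what makes resampling lossy: a short peak inside a 4-hour block is blended with its neighbours. The effect is easy to see on a toy one-day profile that mirrors the morning block of `daily_pattern`. This sketch uses plain `pandas.Series.resample`, not flixopt's `transform.resample`:

```python
import numpy as np
import pandas as pd

# Toy one-day heat profile: 100 kW baseline, 200 kW morning peak (hours 6-8)
index = pd.date_range('2024-01-01', periods=24, freq='h')
hourly = pd.Series(np.where((index.hour >= 6) & (index.hour < 9), 200.0, 100.0), index=index)

mean_4h = hourly.resample('4h').mean()  # energy-preserving average per block
max_4h = hourly.resample('4h').max()    # keeps the peak, overstates energy

print(pd.DataFrame({'mean': mean_4h, 'max': max_4h}))
print(f'Hourly peak: {hourly.max():.0f} kW, 4h-mean peak: {mean_4h.max():.0f} kW')
print(f'Energy [kWh]: hourly {hourly.sum():.0f}, '
      f'mean-based {mean_4h.sum() * 4:.0f}, max-based {max_4h.sum() * 4:.0f}')
```

Mean aggregation keeps the total energy but flattens the peak; max aggregation keeps the peak but inflates the energy. That is the trade-off behind the `method` choices listed under Resampling Options.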

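## Technique 2: Two-Stage Optimization

Two-stage optimization combines fast sizing on resampled data with dispatch at the original resolution:

1. **Stage 1**: Size components with resampled data (fast)
2. **Stage 2**: Fix the Stage 1 sizes and optimize dispatch at full (hourly) resolution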

In [None]:
# Stage 1: Sizing with resampled data
start = timeit.default_timer()
fs_sizing = flow_system.transform.resample('4h')
fs_sizing.optimize(solver)
time_stage1 = timeit.default_timer() - start

print('=== Stage 1: Sizing ===')
print(f'Time: {time_stage1:.2f} seconds')
print('\nOptimized sizes:')
for name, size in fs_sizing.statistics.sizes.items():
    print(f'  {name}: {float(size.item()):.1f}')

In [None]:
# Stage 2: Dispatch at full resolution with fixed sizes
start = timeit.default_timer()
fs_dispatch = flow_system.transform.fix_sizes(fs_sizing.statistics.sizes)
fs_dispatch.optimize(solver)
time_stage2 = timeit.default_timer() - start

print('=== Stage 2: Dispatch ===')
print(f'Time: {time_stage2:.2f} seconds')
print(f'Cost: {fs_dispatch.solution["costs"].item():.2f} €')
print(f'\nTotal two-stage time: {time_stage1 + time_stage2:.2f} seconds')
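Stage 2 can only operate within the capacities fixed in Stage 1. If averaging hid an hourly peak, those sizes may fall short at full resolution. The core logic can be sketched without flixopt using a single toy heater sized on 4-hour means and then checked hour by hour. This is a deliberately simplified stand-in (no storage, no costs, hypothetical helper names), not what `fix_sizes()` does internally:

```python
import numpy as np


def size_on_coarse(demand: np.ndarray, block: int) -> float:
    """Stage 1 (toy): size one heater to the peak of block-averaged demand."""
    return float(demand.reshape(-1, block).mean(axis=1).max())


def shortfall_at_full_resolution(demand: np.ndarray, capacity: float) -> np.ndarray:
    """Stage 2 (toy): hourly demand that a heater of fixed capacity cannot cover."""
    return np.clip(demand - capacity, 0.0, None)


# One day: 100 kW baseline with a 200 kW morning peak (hours 6-8)
hours = np.arange(24)
demand = np.where((hours >= 6) & (hours < 9), 200.0, 100.0)

capacity = size_on_coarse(demand, block=4)
gap = shortfall_at_full_resolution(demand, capacity)
print(f'Size from 4h means: {capacity:.0f} kW (hourly peak: {demand.max():.0f} kW)')
print(f'Unmet heat at hourly resolution: {gap.sum():.0f} kWh over {int((gap > 0).sum())} hours')
```

In the flixopt model, Stage 1 sizes the CHP, boiler, and storage jointly, so the combined capacity may still cover hourly peaks; comparing the Stage 2 result against the full baseline is how this notebook checks that.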

## Technique 3: Full Optimization (Baseline)

For comparison, solve the full problem at hourly resolution, with sizes and dispatch optimized together:

In [None]:
start = timeit.default_timer()
fs_full = flow_system.copy()
fs_full.optimize(solver)
time_full = timeit.default_timer() - start

print('=== Full Optimization ===')
print(f'Time: {time_full:.2f} seconds')
print(f'Cost: {fs_full.solution["costs"].item():.2f} €')

## Compare Results

In [None]:
# Collect results
results = {
    'Full (baseline)': {
        'Time [s]': time_full,
        'Cost [€]': fs_full.solution['costs'].item(),
        'CHP Size [kW]': fs_full.statistics.sizes['CHP(P_el)'].item(),
        'Boiler Size [kW]': fs_full.statistics.sizes['Boiler(Q_th)'].item(),
        'Storage Size [kWh]': fs_full.statistics.sizes['Storage'].item(),
    },
    'Resampled (4h)': {
        'Time [s]': time_resampled,
        'Cost [€]': fs_resampled.solution['costs'].item(),
        'CHP Size [kW]': fs_resampled.statistics.sizes['CHP(P_el)'].item(),
        'Boiler Size [kW]': fs_resampled.statistics.sizes['Boiler(Q_th)'].item(),
        'Storage Size [kWh]': fs_resampled.statistics.sizes['Storage'].item(),
    },
    'Two-Stage': {
        'Time [s]': time_stage1 + time_stage2,
        'Cost [€]': fs_dispatch.solution['costs'].item(),
        'CHP Size [kW]': fs_dispatch.statistics.sizes['CHP(P_el)'].item(),
        'Boiler Size [kW]': fs_dispatch.statistics.sizes['Boiler(Q_th)'].item(),
        'Storage Size [kWh]': fs_dispatch.statistics.sizes['Storage'].item(),
    },
}

comparison = pd.DataFrame(results).T

# Add relative metrics
baseline_cost = comparison.loc['Full (baseline)', 'Cost [€]']
baseline_time = comparison.loc['Full (baseline)', 'Time [s]']
comparison['Cost Gap [%]'] = ((comparison['Cost [€]'] - baseline_cost) / baseline_cost * 100).round(2)
comparison['Speedup'] = (baseline_time / comparison['Time [s]']).round(1)

comparison.round(2)
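The two derived columns follow directly from the baseline row: Cost Gap [%] = (cost − baseline cost) / baseline cost × 100, and Speedup = baseline time / time. The same arithmetic can be wrapped in a reusable helper that also flags rows outside an accuracy budget, in the spirit of the "Monitor" takeaway. The helper name and the `tol_pct` threshold are hypothetical, and the inputs below are illustrative numbers, not results from this notebook:

```python
import pandas as pd


def add_relative_metrics(df: pd.DataFrame, baseline: str, tol_pct: float = 2.0) -> pd.DataFrame:
    """Append cost gap [%], speedup, and a tolerance flag relative to the `baseline` row."""
    out = df.copy()
    base_cost = out.loc[baseline, 'Cost [€]']
    base_time = out.loc[baseline, 'Time [s]']
    out['Cost Gap [%]'] = (out['Cost [€]'] - base_cost) / base_cost * 100
    out['Speedup'] = base_time / out['Time [s]']
    # tol_pct is a hypothetical acceptance threshold, not a flixopt setting
    out['Within Tolerance'] = out['Cost Gap [%]'].abs() <= tol_pct
    return out


# Illustrative inputs only
demo = pd.DataFrame(
    {'Time [s]': [10.0, 2.5, 4.0], 'Cost [€]': [1000.0, 990.0, 1030.0]},
    index=['Full (baseline)', 'Resampled (4h)', 'Two-Stage'],
)
print(add_relative_metrics(demo, 'Full (baseline)').round(2))
```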

## Visual Comparison: Heat Balance

In [None]:
# Full optimization heat balance
fs_full.statistics.plot.balance('Heat')

In [None]:
# Two-stage optimization heat balance
fs_dispatch.statistics.plot.balance('Heat')

## When to Use Each Technique

| Technique | Best For | Trade-off |
|-----------|----------|------------|
| **Full optimization** | Final results, small problems | Slowest, most accurate |
| **Resampling** | Quick screening, trend analysis | Fast, loses temporal detail |
| **Two-stage** | Investment decisions, large problems | Good balance of speed and accuracy |
| **Clustering** | Long horizons with recurring patterns; can preserve extreme periods | Requires the `tsam` package; not demonstrated here |
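Clustering is not run in this notebook, but the idea behind typical periods is compact: cut the series into days, group similar days, and optimize only one representative per group, weighted by how many days it stands for. A minimal sketch using a hand-written k-means on daily profiles (numpy only). This is an illustrative stand-in, not the `tsam` algorithm or the `transform.cluster()` API, and the evenly spaced initialisation is an assumption chosen for determinism:

```python
import numpy as np


def _nearest(days: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the closest centroid (Euclidean distance) for every day."""
    dist = np.linalg.norm(days[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)


def typical_days(hourly: np.ndarray, n_clusters: int, n_iter: int = 50):
    """k-means on daily profiles: return (representative days, day labels, weights)."""
    days = hourly.reshape(-1, 24)  # one row per day
    # Deterministic start: evenly spaced days serve as initial centroids
    start = np.linspace(0, len(days) - 1, n_clusters).astype(int)
    centroids = days[start].astype(float)
    for _ in range(n_iter):
        labels = _nearest(days, centroids)
        # New centroid = mean of assigned days (keep the old one if a cluster is empty)
        new = np.array([
            days[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = _nearest(days, centroids)
    weights = np.bincount(labels, minlength=n_clusters)  # days represented by each centroid
    return centroids, labels, weights


# Illustrative data: 20 cold days (~200 kW) followed by 10 mild days (~100 kW)
rng = np.random.default_rng(0)
profile = np.concatenate([np.full(20 * 24, 200.0), np.full(10 * 24, 100.0)])
profile += rng.normal(0, 5, profile.size)

centroids, labels, weights = typical_days(profile, n_clusters=2)
clustered_energy = (weights * centroids.sum(axis=1)).sum()
print('Days per representative:', weights.tolist())
# Weighted representatives reproduce the total energy of the original series
print(f'Energy: original {profile.sum():.0f} kWh, clustered {clustered_energy:.0f} kWh')
```

Optimizing 2 representative days with weights 20 and 10 replaces 720 hourly timesteps with 48, at the cost of losing the chronological order between days.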

### Resampling Options

```python
# Different resolutions
fs_2h = flow_system.transform.resample('2h')   # 2-hourly
fs_4h = flow_system.transform.resample('4h')   # 4-hourly
fs_daily = flow_system.transform.resample('1D')  # Daily

# Different aggregation methods
fs_mean = flow_system.transform.resample('4h', method='mean')  # Default
fs_max = flow_system.transform.resample('4h', method='max')    # Preserve peaks
```

### Two-Stage Workflow

```python
# Stage 1: Sizing
fs_sizing = flow_system.transform.resample('4h')
fs_sizing.optimize(solver)

# Stage 2: Dispatch
fs_dispatch = flow_system.transform.fix_sizes(fs_sizing.statistics.sizes)
fs_dispatch.optimize(solver)
```

## Summary

You learned how to:

- Use **`transform.resample()`** to reduce time resolution
- Apply **two-stage optimization** for large investment problems
- Use **`transform.fix_sizes()`** to lock in investment decisions
- Compare **speed vs. accuracy** trade-offs

### Key Takeaways

1. **Start fast**: Use resampling for initial exploration
2. **Iterate**: Refine with two-stage optimization
3. **Validate**: Run full optimization for final results
4. **Monitor**: Check cost gaps to ensure acceptable accuracy

### Further Reading

- For clustering with typical periods, see `transform.cluster()` (requires `tsam` package)
- For time selection, see `transform.sel()` and `transform.isel()`