# Example 5: Multi-Objective Exploration

When multiple aspects of representativeness matter simultaneously, a single weighted score can hide important trade-offs. This notebook sets up a **4-component objective** and compares two selection policies:

- **ParetoMaxMinStrategy**: finds the Pareto-optimal solution that maximizes the worst-performing objective (conservative, balanced)
- **WeightedSumPolicy**: collapses all objectives into a single scalar via weighted sum (simpler, but requires choosing weights a priori)

The same search is run twice; the only difference is the policy that picks the winner from the scored candidates.
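To make the difference between the two selection rules concrete, here is a minimal, library-independent sketch on a toy score table. It is not the `energy_repset` implementation: the helper names (`minmax_normalize`, `max_min_pick`, `weighted_sum_pick`), the made-up scores, and the plain min-max normalization are assumptions for illustration. For brevity it treats every objective as a cost and skips the Pareto filtering step.

```python
# Toy scores for three hypothetical candidates (all values are made up).
# Every objective here is a cost: lower is better.
scores = {
    "A": {"wasserstein": 0.10, "correlation": 0.90, "duration_curve": 0.20},
    "B": {"wasserstein": 0.40, "correlation": 0.40, "duration_curve": 0.30},
    "C": {"wasserstein": 0.90, "correlation": 0.10, "duration_curve": 0.40},
}
weights = {"wasserstein": 1.0, "correlation": 1.0, "duration_curve": 1.0}


def minmax_normalize(scores):
    """Rescale each objective to [0, 1] across candidates (0 = best, 1 = worst)."""
    names = list(next(iter(scores.values())).keys())
    lo = {n: min(s[n] for s in scores.values()) for n in names}
    hi = {n: max(s[n] for s in scores.values()) for n in names}
    normalized = {}
    for cand, s in scores.items():
        normalized[cand] = {
            n: (s[n] - lo[n]) / (hi[n] - lo[n]) if hi[n] > lo[n] else 0.0
            for n in names
        }
    return normalized


def max_min_pick(norm):
    """Pick the candidate whose worst (largest) normalized cost is smallest."""
    return min(norm, key=lambda c: max(norm[c].values()))


def weighted_sum_pick(norm, weights):
    """Pick the candidate with the smallest weighted sum of normalized costs."""
    return min(norm, key=lambda c: sum(weights[n] * v for n, v in norm[c].items()))


norm = minmax_normalize(scores)
print("max-min winner:     ", max_min_pick(norm))
print("weighted-sum winner:", weighted_sum_pick(norm, weights))
```

With these toy numbers the max-min rule prefers the balanced candidate `B`, while the equally weighted sum prefers `A`, which is excellent on two objectives and worst on the third.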

In [None]:
import pandas as pd
import energy_repset as rep
import energy_repset.diagnostics as diag

In [None]:
# Load the time series and drop the 'prices' column
url = "https://tubcloud.tu-berlin.de/s/pKttFadrbTKSJKF/download/time-series-lecture-2.csv"
df_raw = pd.read_csv(url, index_col=0, parse_dates=True).rename_axis('variable', axis=1)
df_raw = df_raw.drop('prices', axis=1)

# Slice the data by month; each month is one candidate period
slicer = rep.TimeSlicer(unit="month")
context = rep.ProblemContext(df_raw=df_raw, slicer=slicer)

feature_pipeline = rep.FeaturePipeline(engineers={
    'stats': rep.StandardStatsFeatureEngineer(),
    'pca': rep.PCAFeatureEngineer(),
})

## Rich objective set: 4 components

Each component captures a different dimension of representativeness:

| Component | What it measures | Direction |
|-----------|-----------------|----------|
| **Wasserstein** | Marginal distribution similarity | minimize |
| **Correlation** | Cross-variable dependency preservation | minimize |
| **Duration curve** | Duration curve NRMSE (load-ordered fidelity) | minimize |
| **Diversity** | Spread of selection in feature space | maximize |

The first three are *fidelity* metrics (lower = better match to the full year). Diversity is a *coverage* metric (higher = more spread). This tension is intentional: pure fidelity optimization tends to pick "average" months, while diversity pushes toward distinct ones.
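As a rough illustration of the two kinds of metric, the sketch below computes a duration-curve NRMSE (a fidelity metric) and a mean pairwise distance (a coverage metric) in plain Python. The function names, the 101-point sampling of the duration curve, and the normalization by the full-series range are assumptions for illustration; `DurationCurveFidelity` and `DiversityReward` may be defined differently.

```python
import math
from itertools import combinations


def duration_curve(values, n_points=101):
    """Sort descending and sample at evenly spaced exceedance fractions."""
    ordered = sorted(values, reverse=True)
    last = len(ordered) - 1
    return [ordered[round(i / (n_points - 1) * last)] for i in range(n_points)]


def duration_curve_nrmse(full, selection, n_points=101):
    """RMSE between the two duration curves, normalized by the full-series range."""
    dc_full = duration_curve(full, n_points)
    dc_sel = duration_curve(selection, n_points)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(dc_full, dc_sel)) / n_points)
    return rmse / (max(full) - min(full))


def mean_pairwise_distance(points):
    """Average Euclidean distance between all pairs of feature vectors."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Toy data (made up): a "full year" ramp and a selection covering only its top half.
full = list(range(100))
top_half = list(range(50, 100))
print("NRMSE, identical series:", duration_curve_nrmse(full, full))
print("NRMSE, top half only:   ", duration_curve_nrmse(full, top_half))

# Toy feature vectors (made up): a tight cluster versus a spread-out set.
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
spread = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print("diversity, clustered:", mean_pairwise_distance(clustered))
print("diversity, spread:   ", mean_pairwise_distance(spread))
```

A selection that reproduces the full series scores zero on the fidelity metric, while a spread-out selection scores higher on the coverage metric; a search has to balance the two.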

In [None]:
objective_set = rep.ObjectiveSet({
    'wasserstein': (1.0, rep.WassersteinFidelity()),
    'correlation': (1.0, rep.CorrelationFidelity()),
    'duration_curve': (1.0, rep.DurationCurveFidelity()),
    'diversity': (0.5, rep.DiversityReward()),
})

k = 3
combi_gen = rep.ExhaustiveCombiGen(k=k)
representation_model = rep.UniformRepresentationModel()

## Run A: ParetoMaxMinStrategy

In [None]:
search_pareto = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(
    objective_set, rep.ParetoMaxMinStrategy(), combi_gen
)
workflow_pareto = rep.Workflow(feature_pipeline, search_pareto, representation_model)
experiment_pareto = rep.RepSetExperiment(context, workflow_pareto)
result_pareto = experiment_pareto.run()

print(f"Selection: {result_pareto.selection}")
print(f"Scores:    {result_pareto.scores}")

## Run B: WeightedSumPolicy

Passing `experiment_pareto.feature_context` instead of `context` reuses the features already computed in Run A, which skips redundant work.

In [None]:
search_weighted = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(
    objective_set, rep.WeightedSumPolicy(normalization='robust_minmax'), combi_gen
)
workflow_weighted = rep.Workflow(feature_pipeline, search_weighted, representation_model)
experiment_weighted = rep.RepSetExperiment(experiment_pareto.feature_context, workflow_weighted)
result_weighted = experiment_weighted.run()

print(f"Selection: {result_weighted.selection}")
print(f"Scores:    {result_weighted.scores}")

In [None]:
same = result_pareto.selection == result_weighted.selection
print(f"Same selection? {same}")

## Pareto front visualization

The 2D scatter shows all 220 candidates (every way to choose k = 3 of the 12 months) in two-objective space. Pareto-optimal solutions (highlighted) form the efficient frontier: no other candidate is at least as good on both axes and strictly better on one.
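The candidate count and the dominance test can be checked with a short standalone sketch. It assumes 12 monthly slices and k = 3 as in this notebook; the `non_dominated` helper and the toy points are hypothetical and are not part of `energy_repset`.

```python
from itertools import combinations
from math import comb

# Every way to choose 3 of 12 months is one candidate.
months = range(1, 13)
candidates = list(combinations(months, 3))
print(len(candidates), comb(12, 3))


def non_dominated(points):
    """Return indices of points not dominated by any other point.

    Both coordinates are minimized. Point j dominates point i if j is no worse
    on both axes and strictly better on at least one.
    """
    front = []
    for i, (xi, yi) in enumerate(points):
        dominated = any(
            xj <= xi and yj <= yi and (xj < xi or yj < yi)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Toy (x, y) scores, made up for illustration.
toy = [(0.1, 0.9), (0.3, 0.4), (0.5, 0.5), (0.8, 0.2), (0.9, 0.9)]
front = non_dominated(toy)
print("front indices:", front)
```

The pairwise check is quadratic in the number of candidates, which is negligible for a few hundred points.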

In [None]:
fig = diag.ParetoScatter2D(
    objective_x='wasserstein', objective_y='correlation'
).plot(search_algorithm=search_pareto, selected_combination=result_pareto.selection)
fig.update_layout(title='Pareto Front: Wasserstein vs Correlation')
fig.show()

The scatter matrix shows all pairwise objective trade-offs at once.

In [None]:
fig = diag.ParetoScatterMatrix().plot(
    search_algorithm=search_pareto, selected_combination=result_pareto.selection
)
fig.update_layout(title='Pareto Scatter Matrix')
fig.show()

## Score contributions: Pareto vs Weighted Sum

Comparing the normalized score profiles of the two winners reveals where they differ. The Pareto policy tends to produce more balanced profiles, while the weighted sum may sacrifice one objective for gains on others.

In [None]:
for label, res in [('Pareto', result_pareto), ('Weighted Sum', result_weighted)]:
    fig = diag.ScoreContributionBars().plot(res.scores, normalize=True)
    fig.update_layout(title=f'Score Contributions: {label}')
    fig.show()

## Weights comparison

In [None]:
for label, res in [('Pareto', result_pareto), ('Weighted Sum', result_weighted)]:
    fig = diag.ResponsibilityBars().plot(res.weights, show_uniform_reference=True)
    fig.update_layout(title=f'Weights: {label}')
    fig.show()

## Distribution and profile diagnostics (Pareto selection)

In [None]:
selected_indices = slicer.get_indices_for_slice_combi(df_raw.index, result_pareto.selection)
df_selection = df_raw.loc[selected_indices]

for var in df_raw.columns:
    fig = diag.DistributionOverlayHistogram().plot(df_raw[var], df_selection[var], nbins=40)
    fig.update_layout(title=f'Distribution Overlay: {var}')
    fig.show()

In [None]:
fig = diag.DiurnalProfileOverlay().plot(
    df_raw, df_selection, variables=list(df_raw.columns)
)
fig.update_layout(title='Diurnal Profiles: Full Year vs Selection')
fig.show()

In [None]:
fig = diag.CorrelationDifferenceHeatmap().plot(
    df_raw, df_selection, method='pearson', show_lower_only=True
)
fig.update_layout(title='Correlation Difference: Selection - Full Year')
fig.show()

In [None]:
feature_context = experiment_pareto.feature_context
fig = diag.FeatureDistributions().plot(feature_context.df_features, nbins=20, cols=4)
fig.update_layout(title='Feature Distributions')
fig.show()