# Example 4: Comparing Representation Models

The **Representation Model** (pillar R) determines how selected periods stand in for the full year. This notebook runs a single search to find the best 3-month selection, then applies three different representation models to the *same* selection:

| Model | How it works | Weight distribution |
|-------|-------------|--------------------|
| **Uniform** | Each selected period gets weight 1/k | Equal bars |
| **KMedoids cluster-size** | Weight = fraction of all months whose closest representative is this one | Unequal: representatives that cover more months get higher weight |
| **Blended (soft assignment)** | Each original month is approximated by a weighted *combination* of the representatives | Full weight matrix (one row of weights per original month), not just one weight per representative |

The choice of R does not change *which* months are selected — only how they are weighted in the downstream model.
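
To make the three schemes concrete, here is a minimal, dependency-free sketch that computes each kind of weight on made-up 2-D feature vectors. The period labels, feature values, the `soft_row` helper, and its `temperature` parameter are illustrative assumptions. In particular, the distance-based softmax is only a stand-in for soft assignment and is not necessarily how `BlendedRepresentationModel` derives its weights.

```python
import math

# Toy 2-D feature vectors for six periods (made-up values, for illustration only).
features = {
    "Jan": (0.0, 0.0),
    "Feb": (0.2, 0.1),
    "Mar": (1.0, 1.0),
    "Apr": (1.1, 0.9),
    "May": (0.9, 1.2),
    "Jun": (3.0, 3.0),
}
selection = ["Jan", "Mar", "Jun"]  # hypothetical representatives, k = 3

# Uniform: every representative gets 1/k.
uniform = {rep: 1 / len(selection) for rep in selection}

# Cluster-size: assign each period to its nearest representative,
# then weight each representative by the share of periods it covers.
counts = {rep: 0 for rep in selection}
for vec in features.values():
    nearest = min(selection, key=lambda rep: math.dist(vec, features[rep]))
    counts[nearest] += 1
cluster_size = {rep: n / len(features) for rep, n in counts.items()}

# Soft assignment (assumed softmax over negative distances): each period
# gets a full row of weights over all representatives, and each row sums to 1.
def soft_row(vec, temperature=0.5):
    scores = [math.exp(-math.dist(vec, features[rep]) / temperature) for rep in selection]
    total = sum(scores)
    return [s / total for s in scores]

soft = {period: dict(zip(selection, soft_row(vec))) for period, vec in features.items()}

print("uniform     :", uniform)
print("cluster-size:", cluster_size)
print("soft row Feb:", soft["Feb"])
```

The first two schemes yield one number per representative, while the soft scheme yields one row per original period, which is why the blended model needs an aggregation step before it can be compared with the other two.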

In [None]:
import pandas as pd
import plotly.express as px
import energy_repset as rep
import energy_repset.diagnostics as diag

In [None]:
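# Load the example time series; the first CSV column becomes a datetime index
url = "https://tubcloud.tu-berlin.de/s/pKttFadrbTKSJKF/download/time-series-lecture-2.csv"
df_raw = pd.read_csv(url, index_col=0, parse_dates=True).rename_axis('variable', axis=1)
# Exclude the 'prices' column from the analysis
df_raw = df_raw.drop('prices', axis=1)

# Slice the data into monthly candidate periods
slicer = rep.TimeSlicer(unit="month")
context = rep.ProblemContext(df_raw=df_raw, slicer=slicer)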

## Find the best 3-month selection

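The feature pipeline combines standard statistical features with PCA components. Every 3-month combination is scored on two objectives (Wasserstein fidelity and correlation fidelity), which a weighted-sum policy combines after robust min-max normalization, so that the two score components are on comparable scales.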
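
As a rough illustration of why normalization matters before summing scores, the sketch below rescales two made-up score columns with a quantile-based min-max and then adds them with equal weights. This is one common reading of "robust min-max", not a description of what `WeightedSumPolicy` does internally. The `percentile` and `robust_minmax` helpers, the 5%/95% bounds, the example scores, and the "lower is better" convention are all assumptions.

```python
def percentile(values, q):
    """Linearly interpolated percentile of `values`, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def robust_minmax(values, lower_q=5, upper_q=95):
    """Scale to [0, 1] using quantiles as bounds; values outside are clipped."""
    lo, hi = percentile(values, lower_q), percentile(values, upper_q)
    if hi == lo:
        return [0.0 for _ in values]
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]

# Hypothetical scores for five candidate selections (assumed: lower is better).
scores = {
    "wasserstein": [0.10, 0.12, 0.11, 0.15, 0.90],
    "correlation": [0.30, 0.20, 0.25, 0.40, 0.35],
}
objective_weights = {"wasserstein": 1.0, "correlation": 1.0}

normalized = {name: robust_minmax(vals) for name, vals in scores.items()}

# Weighted sum of the normalized components, one total per candidate.
n_candidates = len(scores["wasserstein"])
totals = [
    sum(objective_weights[name] * normalized[name][i] for name in scores)
    for i in range(n_candidates)
]
best = min(range(n_candidates), key=totals.__getitem__)
print("totals:", [round(t, 3) for t in totals])
print("best candidate index:", best)
```

Without the rescaling step, the component with the larger numeric range would dominate the sum regardless of the weights passed to the objective set.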

In [None]:
feature_pipeline = rep.FeaturePipeline(engineers={
    'stats': rep.StandardStatsFeatureEngineer(),
    'pca': rep.PCAFeatureEngineer(),
})

k = 3
objective_set = rep.ObjectiveSet({
    'wasserstein': (1.0, rep.WassersteinFidelity()),
    'correlation': (1.0, rep.CorrelationFidelity()),
})
policy = rep.WeightedSumPolicy(normalization='robust_minmax')
search_algorithm = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(
    objective_set, policy, rep.ExhaustiveCombiGen(k=k)
)

# Run with uniform weights to get the selection
workflow = rep.Workflow(feature_pipeline, search_algorithm, rep.UniformRepresentationModel())
experiment = rep.RepSetExperiment(context, workflow)
result = experiment.run()

selection = result.selection
print(f"Selected months: {selection}")
print(f"Scores: {result.scores}")

## Apply three representation models to the same selection

In [None]:
feature_context = experiment.feature_context

# Model A: Uniform — 1/k each
uniform_model = rep.UniformRepresentationModel()
uniform_model.fit(feature_context)
weights_uniform = uniform_model.weigh(selection)

# Model B: KMedoids cluster-size — proportional to cluster membership
kmedoids_model = rep.KMedoidsClustersizeRepresentation()
kmedoids_model.fit(feature_context)
weights_kmedoids = kmedoids_model.weigh(selection)

# Model C: Blended (soft assignment) — weight matrix
blended_model = rep.BlendedRepresentationModel(blend_type='convex')
blended_model.fit(feature_context)
weights_blended_df = blended_model.weigh(selection)

### Weight comparison table

In [None]:
print(f"{'Month':<12} {'Uniform':>10} {'KMedoids':>10}")
print("-" * 34)
for s in selection:
    print(f"{str(s):<12} {weights_uniform[s]:>10.3f} {weights_kmedoids[s]:>10.3f}")

# Aggregate blended weights to one value per representative
blended_col_sums = weights_blended_df.sum(axis=0)
weights_blended_agg = (blended_col_sums / blended_col_sums.sum()).to_dict()
print(f"\nBlended (aggregated): {weights_blended_agg}")

### Responsibility bars: side by side

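The uniform model produces equal bars. KMedoids assigns more weight to representatives that are the closest match for more months. The blended bars are aggregated from the full weight matrix (column sums, renormalized to sum to 1), so they summarize soft assignments rather than hard cluster counts.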

In [None]:
models = {
    'Uniform': weights_uniform,
    'KMedoids': weights_kmedoids,
    'Blended (aggregated)': weights_blended_agg,
}

for label, weights in models.items():
    fig = diag.ResponsibilityBars().plot(weights, show_uniform_reference=True)
    fig.update_layout(title=f'Responsibility Weights: {label}')
    fig.show()

### Blended weight matrix

The heatmap shows the full weight matrix, transposed for plotting: how much each original month (columns) relies on each representative (rows). In the blended model, every month is expressed as a weighted mix of the three representatives rather than being assigned to just one.
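
The sketch below shows how one row of such a matrix can be read: multiplying the weights by the representatives' feature vectors gives the blended approximation of a month, and normalized column sums give one aggregate weight per representative. All labels and numbers are invented for illustration, and the `blend` helper is hypothetical rather than part of `energy_repset`.

```python
# Hypothetical feature vectors of three representatives (made-up numbers).
representatives = {
    "rep_a": [1.0, 0.0],
    "rep_b": [0.0, 1.0],
    "rep_c": [1.0, 1.0],
}

# Hypothetical rows of a blended weight matrix: one row per original month,
# one column per representative. Each row is non-negative and sums to 1.
weight_rows = {
    "month_1": {"rep_a": 0.7, "rep_b": 0.1, "rep_c": 0.2},
    "month_2": {"rep_a": 0.0, "rep_b": 0.8, "rep_c": 0.2},
}

def blend(row):
    """Weighted combination of the representatives' feature vectors."""
    dims = len(next(iter(representatives.values())))
    return [
        sum(w * representatives[rep][d] for rep, w in row.items())
        for d in range(dims)
    ]

approximations = {month: blend(row) for month, row in weight_rows.items()}

# Aggregate to one weight per representative: column sums, renormalized.
col_sums = {rep: sum(row[rep] for row in weight_rows.values()) for rep in representatives}
total = sum(col_sums.values())
aggregated = {rep: s / total for rep, s in col_sums.items()}

print("approximations:", approximations)
print("aggregated    :", aggregated)
```

The aggregation discards the per-month detail, so the heatmap remains the more faithful view of what the blended model encodes.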

In [None]:
# Work on a copy and cast the index and column labels to strings for plotting
heatmap_df = weights_blended_df.copy()
heatmap_df.index = heatmap_df.index.astype(str)
heatmap_df.columns = heatmap_df.columns.astype(str)

fig = px.imshow(
    heatmap_df.T,
    labels=dict(x='Original Month', y='Representative', color='Weight'),
    color_continuous_scale='Blues',
    aspect='auto',
    title='Blended Weight Matrix',
)
fig.show()

## Feature space and distribution fidelity

In [None]:
fig = diag.FeatureSpaceScatter2D().plot(
    feature_context.df_features, x='pc_0', y='pc_1', selection=selection
)
fig.update_layout(title='Feature Space with Selection')
fig.show()

In [None]:
selected_indices = slicer.get_indices_for_slice_combi(df_raw.index, selection)
df_selection = df_raw.loc[selected_indices]

for var in df_raw.columns:
    fig = diag.DistributionOverlayECDF().plot(df_raw[var], df_selection[var])
    fig.update_layout(title=f'ECDF Overlay: {var}')
    fig.show()
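
For readers who want to see what an ECDF overlay compares, here is a small standard-library sketch that builds empirical CDFs for two samples and measures the area between them, which equals the 1-D Wasserstein-1 distance. The `ecdf` and `wasserstein_1d` helpers and the sample values are illustrative assumptions. They are not the implementation behind `DistributionOverlayECDF` or `WassersteinFidelity`.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function x -> share of sample values that are <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf

def wasserstein_1d(a, b):
    """Area between the two ECDFs (1-D Wasserstein-1 distance)."""
    f_a, f_b = ecdf(a), ecdf(b)
    grid = sorted(set(a) | set(b))
    # Both ECDFs are constant on [left, right), so evaluate them at `left`.
    return sum(
        abs(f_a(left) - f_b(left)) * (right - left)
        for left, right in zip(grid, grid[1:])
    )

# Made-up values standing in for the full series and for the selected periods.
full_series = [1.0, 2.0, 3.0, 4.0]
selected = [2.0, 3.0, 4.0, 5.0]

print("distance to itself  :", wasserstein_1d(full_series, full_series))
print("distance to selected:", wasserstein_1d(full_series, selected))
```

A selection whose ECDF lies close to the full-year ECDF for every variable yields a small distance, which is the visual impression the overlays above are meant to convey.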