# CumulantLearner: Evaluation

This notebook shows how to use the `culearn.learn.CumulantLearner` class to perform the evaluation described in the case study of the following paper:

Igor Manojlović, Goran Švenda, Aleksandar Erdeljan, Milan Gavrić, Darko Čapko, *Cumulant Learning: Highly Accurate and Computationally Efficient Load Pattern Recognition Method for Probabilistic STLF at the LV Level*, IEEE Transactions on Smart Grid, 2024, DOI: [10.1109/TSG.2024.3481894](https://doi.org/10.1109/TSG.2024.3481894)



## Creating the learner

In [None]:
from datetime import timedelta

import plotly.express as px
from culearn.data import *
from culearn.learn import *

# Prepare the data source:
source = LCL('../data/LCL')
# Check out other data sources in the Datasets notebook.

# Load the dataset from the data source:
dataset = source.dataset()
# This may take a while the first time, but it makes the rest of the process much faster.

# Prepare time encoders that will be used to aggregate time series values before clustering:
transform_encoders = TimeEncoders(MonthOfYear(), DayType(source.calendar), TimeOfDay())

# Configure a transformer that will approximate and cluster time series values:
transformer = CumulantTransform(encoder=transform_encoders)

# Prepare time encoders that will be used to obtain input time features for regression:
regressor_encoders = TimeEncoders(MonthOfYear(), DayOfWeek(), TimeOfDay(), Holiday(source.calendar))

# Configure the regression method to predict time series patterns 48 time steps ahead:
regressor = lambda: TimeSeriesRegressor(48, t_encoder=regressor_encoders)
# You can change the underlying regression model by modifying the 'base' parameter.
# For example, you can set base=DeepS2S(epochs=1) if you only want to perform deep learning for one epoch.
# You can also set base=ShallowS2S() if you want to use shallow learning models instead.

# Configure the learner to predict half-hour cluster-level cumulants:
learner = CumulantLearner(dataset, TimeResolution(minutes=30), transformer, regressor)
# Since the regressor predicts 48 half-hour steps (24 hours) ahead, the learner provides a day-ahead forecast.

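The learner forecasts cluster-level cumulants instead of raw load values. For readers less familiar with cumulants, the sketch below computes the first four from a sample's central moments m_r: κ1 is the mean, κ2 = m2 (the variance), κ3 = m3, and κ4 = m4 - 3·m2². This is a generic plain-Python illustration. The helper `sample_cumulants` is hypothetical and uses biased (moment-based) estimators. `CumulantTransform` may estimate cumulants differently.

```python
from statistics import fmean


def sample_cumulants(values):
    """Return the first four cumulants of a sample (hypothetical helper, biased estimators)."""
    xs = [float(v) for v in values]
    mean = fmean(xs)
    # Central moments m_r = mean((x - mean) ** r)
    m2 = fmean((x - mean) ** 2 for x in xs)
    m3 = fmean((x - mean) ** 3 for x in xs)
    m4 = fmean((x - mean) ** 4 for x in xs)
    # Cumulants expressed through central moments
    return mean, m2, m3, m4 - 3 * m2 ** 2


print(sample_cumulants([1, 2, 3, 4]))
# Shifting the data by a constant changes only the first cumulant:
print(sample_cumulants([11, 12, 13, 14]))
```

One general property, not specific to `culearn`, is worth remembering here: the cumulants of a sum of independent random variables are the sums of their individual cumulants.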
## Evaluating the learner

In [None]:
# The learner will use the first 80% of the history for initial training:
fit_interval = TimeInterval(
    source.interval.start,
    source.interval.start + timedelta(int(source.interval.delta.days * 0.8))
)

# The remaining 20% of the history will be used for testing:
pred_interval = TimeInterval(fit_interval.end, source.interval.end)

# Incremental updates will be performed every 15 prediction intervals (every 15 days):
update_interval = 15

# The learner will be evaluated at the percentile probabilities 0.01, 0.02, ..., 0.99:
p = [i / 100 for i in range(1, 100)]

# Evaluation (might take a while):
e = learner.evaluate(fit_interval, pred_interval, update_interval, p)

# Save the results to CSV:
e.to_csv(source.directory, type(learner).__name__)

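The train/test split above counts whole days. The sketch below reproduces that arithmetic with the standard `datetime` module so it can be checked in isolation. Plain tuples stand in for `TimeInterval`. The helper `split_history` and the 100-day history are hypothetical. It also assumes that `source.interval.delta` equals `end - start`.

```python
from datetime import datetime, timedelta


def split_history(start, end, train_fraction=0.8):
    """Split [start, end] into training and test intervals (hypothetical helper)."""
    # Same arithmetic as the evaluation cell: the training span is truncated to whole days.
    fit_end = start + timedelta(int((end - start).days * train_fraction))
    return (start, fit_end), (fit_end, end)


# Hypothetical 100-day history, used only to illustrate the split:
start = datetime(2021, 1, 1)
end = start + timedelta(days=100)
(fit_start, fit_end), (pred_start, pred_end) = split_history(start, end)
print((fit_end - fit_start).days, (pred_end - pred_start).days)  # 80 20

# The 99 percentile probabilities used for evaluation:
p = [i / 100 for i in range(1, 100)]
print(len(p), p[0], p[-1])
```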
### Pinball score

In [None]:
e.pinball_score.mean(axis=0).plot(legend=False)

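The pinball (quantile) loss compares a forecast `y_hat` of the q-quantile with the observation `y` as `max(q * (y - y_hat), (q - 1) * (y - y_hat))`. Lower values are better. The sketch below implements this common definition and averages it over several probability levels, such as those in `p`. The helper names are hypothetical. How `culearn` aggregates or normalizes the values in `e.pinball_score` is not shown here, so this sketch describes the metric, not the library.

```python
def pinball_loss(y, y_hat, q):
    """Pinball loss of the q-quantile forecast y_hat for the observation y."""
    diff = y - y_hat
    # Under-forecasts (diff > 0) are weighted by q, over-forecasts by (1 - q).
    return max(q * diff, (q - 1) * diff)


def mean_pinball_loss(y, quantile_forecasts, probabilities):
    """Average pinball loss over several quantile levels for one observation."""
    losses = [pinball_loss(y, y_hat, q)
              for y_hat, q in zip(quantile_forecasts, probabilities)]
    return sum(losses) / len(losses)


# Hypothetical observation with 10%, 50% and 90% quantile forecasts:
print(pinball_loss(10.0, 8.0, 0.9))
print(mean_pinball_loss(10.0, [7.0, 9.0, 12.0], [0.1, 0.5, 0.9]))
```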
### Winkler score

In [None]:
e.winkler_score.mean(axis=0).plot(legend=False)

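For a central prediction interval `[lower, upper]` with nominal coverage `1 - alpha`, the Winkler score is the interval width plus a penalty of `2 / alpha` times the distance by which the observation falls outside the interval. Lower values are better, so the score rewards narrow intervals that still contain the observation. The following is a plain-Python sketch of that common definition using a hypothetical `winkler_score` helper. How the notebook's `p` values map to `alpha`, and how `e.winkler_score` is averaged, are assumptions this sketch does not confirm.

```python
def winkler_score(y, lower, upper, alpha):
    """Winkler score of one central (1 - alpha) prediction interval [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    width = upper - lower
    if y < lower:  # observation below the interval: penalize the shortfall
        return width + (2.0 / alpha) * (lower - y)
    if y > upper:  # observation above the interval: penalize the excess
        return width + (2.0 / alpha) * (y - upper)
    return width  # observation covered: only the width counts


# Hypothetical 90% interval [4, 6] (alpha = 0.1) and three observations:
for y in (5.0, 3.0, 7.0):
    print(y, winkler_score(y, 4.0, 6.0, alpha=0.1))
```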
### Clustering score

In [None]:
px.bar(e.clustering_score.reset_index(), x='k', y='score', color='selected')

### Feature extraction score

In [None]:
px.bar(e.extractor_score.reset_index(), x='feature', y='score', color='selected')

### Feature selection score

In [None]:
px.scatter_3d(e.x_selector_score.reset_index(), x='x', y='cluster', z='score', color='selected')

### Lag selection score

In [None]:
px.scatter_3d(e.y_selector_score.reset_index(), x='lag', y='cluster', z='score', color='selected')

### Regressor scores

In [None]:
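# Show the regressor scores obtained for each cluster during initial training and incremental updates:
from plotly.subplots import make_subplots

rs = e.regressor_score.reset_index()
ax_cols = sorted(set(rs.iloc[:, 0]))  # distinct values of the first index level (one subplot column each)
ax_rows = sorted(set(rs.iloc[:, 1]))  # distinct values of the second index level (one subplot row each)
ax_value = 3  # column position of the plotted score values after reset_index()

rs_fig = make_subplots(rows=len(ax_rows), cols=len(ax_cols))

for i_col, col_key in enumerate(ax_cols):
    for i_row, row_key in enumerate(ax_rows):
        mask = (rs.iloc[:, 0] == col_key) & (rs.iloc[:, 1] == row_key)
        rs_values = rs[mask].iloc[:, ax_value]
        # Label each trace with the actual index values, not their positions:
        rs_fig.add_scatter(y=rs_values, row=i_row + 1, col=i_col + 1,
                           name=f'{rs.columns[0]}={col_key}, {rs.columns[1]}={row_key}')

rs_fig.update_layout(height=800)
rs_fig.show()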

## Plotting the results

In [None]:
# Plot the normalized cluster-level prediction intervals for the last day:
fig = learner.figure(source.interval.end - timedelta(1), p=[0.5, 0.75, 0.99])
# You can add show_actual=True if you also want to show load measurements.
# However, note that this might consume a lot of memory for large clusters.
fig.show()
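
The figure above is drawn with `p=[0.5, 0.75, 0.99]`. If each value is read as the nominal coverage of a central prediction interval, the interval is bounded by the `(1 - p) / 2` and `(1 + p) / 2` quantiles. That reading is an assumption, since the notebook does not state it explicitly. The hypothetical helper below makes the conversion explicit.

```python
def central_interval_levels(coverage):
    """Lower and upper quantile levels of a central interval with the given coverage."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    tail = (1.0 - coverage) / 2.0  # probability mass left outside on each side
    return tail, 1.0 - tail


for coverage in [0.5, 0.75, 0.99]:
    lower, upper = central_interval_levels(coverage)
    print(f"{coverage:.0%} interval: quantiles {lower:.3f} to {upper:.3f}")
```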