
# CumulantTransform

This notebook shows how to transform input time series into time series of normalized cluster-level cumulants using an instance of the `culearn.learn.CumulantTransform` class. The transformation is built into the `culearn.learn.CumulantLearn` class, where it runs before regression, but it can also be used on its own when you need clustering and approximation without regression. Because it approximates time series with cluster-level cumulants instead of centroids, the transformation generalizes the pattern recognition process proposed in Figure 1 of:

Igor Manojlović, Goran Švenda, Aleksandar Erdeljan, Milan Gavrić, *Time Series Grouping Algorithm for Load Pattern Recognition*, Computers in Industry 111: 140-147, 2019, DOI: [10.1016/j.compind.2019.07.009](https://doi.org/10.1016/j.compind.2019.07.009)
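
For readers unfamiliar with cumulants, the sketch below computes the first four cumulants of a plain sample using only the standard library. It is an illustration of the statistical idea, not culearn's code: the helper name `sample_cumulants` is hypothetical, and how many cumulants `CumulantTransform` keeps or how it normalizes them is not covered here.

```python
def sample_cumulants(values):
    """First four cumulants of a sample, using biased (1/n) central moments."""
    n = len(values)
    mean = sum(values) / n
    m2, m3, m4 = (sum((v - mean) ** k for v in values) / n for k in (2, 3, 4))
    # k1 = mean, k2 = variance, k3 = third central moment,
    # k4 = fourth central moment minus three times the squared variance.
    return mean, m2, m3, m4 - 3.0 * m2 ** 2


k1, k2, k3, k4 = sample_cumulants([1.0, 2.0, 3.0, 4.0])
print(k1, k2, k3, k4)
```

Unlike a centroid, which keeps only the mean of a cluster, the higher cumulants also describe its spread, asymmetry, and tail weight.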


## Obtaining normalized cluster-level cumulants

In [None]:
import plotly.express as px
from culearn.data import *
from culearn.learn import *

# Prepare the data source:
source = LCL('../data/LCL')
# Check out other data sources in the Datasets notebook.

# Load the dataset from the data source:
dataset = source.dataset()
# This might take a while at first, but will make the rest of the process much faster.

# Prepare time encoders that will be used to aggregate time series values before clustering:
transform_encoders = TimeEncoders(MonthOfYear(), DayType(source.calendar), TimeOfDay())

# Configure the transformer that will approximate and cluster time series values:
transformer = CumulantTransform(encoder=transform_encoders)

# Obtain the normalized cluster-level cumulants:
cumulants = transformer.fit_transform(dataset.y, TimeResolution(minutes=30), source.interval)
# You can also call the fit and transform methods individually if required.
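
To give an intuition for what aggregating by time encoders means, here is a minimal standard-library sketch that groups half-hourly readings by month, day type, and time of day, then averages each group. The functions `encode` and `aggregate` are hypothetical stand-ins, not culearn's `TimeEncoders`; in particular, this sketch treats only Saturdays and Sundays as non-working days and ignores the holiday calendar that `DayType(source.calendar)` receives.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def encode(ts):
    """Hypothetical stand-in for the three encoders: month, day type, time of day."""
    day_type = "weekend" if ts.weekday() >= 5 else "workday"
    return ts.month, day_type, ts.strftime("%H:%M")


def aggregate(readings):
    """Average all readings that share the same encoded key."""
    groups = defaultdict(list)
    for ts, value in readings:
        groups[encode(ts)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Two weeks of synthetic half-hourly readings, starting on a Monday.
start = datetime(2013, 1, 7)
readings = [(start + timedelta(minutes=30 * i), float(i % 48)) for i in range(48 * 14)]
profile = aggregate(readings)
print(len(profile))  # one entry per (month, day type, time of day) combination
```

Each time series is thereby reduced to a fixed-length profile, which makes series of different lengths comparable before clustering.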

## Exploring the results

### Time series of cluster-level cumulants

In [None]:
for c in cumulants:
    print(c.ts_id)
    display(c)
    break

### Time series of cluster-level prediction intervals

In [None]:
intervals = transformer.inverse_transform(cumulants, p=[0.5, 0.75, 0.99])
for i in intervals:
    print(i.ts_id)
    display(i.to_frame())
    break

## Evaluating the results

In [None]:
p = [i / 100 for i in range(1, 100)]  # percentile probabilities 0.01, 0.02, ..., 0.99
pinball_score, winkler_score = transformer.evaluate(cumulants, p)

### Pinball score

In [None]:
pinball_score.mean(axis=0).plot(legend=False)
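
As a reminder of what this score measures, the standard pinball (quantile) loss for a single observation can be sketched as follows. This is the textbook definition, shown under the assumption that `evaluate` follows it; the function name `pinball_loss` is hypothetical and is not part of culearn.

```python
def pinball_loss(actual, forecast, q):
    """Pinball loss of a q-quantile forecast for one observation."""
    diff = actual - forecast
    # Under-forecasts are weighted by q, over-forecasts by (1 - q).
    return max(q * diff, (q - 1) * diff)


under = pinball_loss(actual=10.0, forecast=8.0, q=0.9)  # forecast too low
over = pinball_loss(actual=8.0, forecast=10.0, q=0.9)   # forecast too high
print(under, over)
```

For a high quantile such as 0.9, forecasting too low is penalized far more than forecasting too high, which is why lower scores indicate better-calibrated quantiles.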

### Winkler score

In [None]:
winkler_score.mean(axis=0).plot(legend=False)
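
The Winkler score rewards narrow prediction intervals and penalizes observations that fall outside them. A minimal sketch of the textbook definition for one observation is shown below, assuming `p` is the nominal coverage of the interval; the function is hypothetical, and culearn's own implementation may differ in detail.

```python
def winkler_score(actual, lower, upper, p):
    """Winkler score of a [lower, upper] interval with nominal coverage p."""
    alpha = 1 - p
    score = upper - lower  # interval width
    if actual < lower:
        score += 2 / alpha * (lower - actual)  # penalty for missing below
    elif actual > upper:
        score += 2 / alpha * (actual - upper)  # penalty for missing above
    return score


inside = winkler_score(actual=5.0, lower=4.0, upper=6.0, p=0.9)
outside = winkler_score(actual=3.0, lower=4.0, upper=6.0, p=0.9)
print(inside, outside)
```

When the observation lies inside the interval the score equals the interval width, so lower values are better here as well.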

### Clustering score

In [None]:
px.bar(transformer.clustering_score.reset_index(), x='k', y='score', color='selected')

### Feature extraction score

In [None]:
px.bar(transformer.extractor_score.reset_index(), x='feature', y='score', color='selected')

## Plotting the results

In [None]:
from datetime import timedelta  # explicit import in case the star imports do not re-export it

# Plot the normalized cluster-level cumulants for the last week:
last_week = TimeInterval(source.interval.end - timedelta(days=7), source.interval.end)
last_week_cumulants = [c.select(last_week) for c in cumulants]
fig = transformer.figure(last_week_cumulants, p=[0.5, 0.75, 0.99])
# You can add show_actual=True if you also want to show load measurements.
# However, note that this might consume a lot of memory for large clusters.
fig.show()