# Cumulant Learning with Transformers and Custom Datasets

This notebook shows how to apply Cumulant Learning with a custom transformer-based regression model and a custom pandas dataset. Transformers and pandas were chosen for this example because of their popularity, but the learning process can be customized in other ways as well.

## Custom Regression Model

We will use [`PositionEmbedding`](https://keras.io/keras_hub/api/modeling_layers/position_embedding/), [`TransformerEncoder`](https://keras.io/keras_hub/api/modeling_layers/transformer_encoder/), and [`TransformerDecoder`](https://keras.io/keras_hub/api/modeling_layers/transformer_decoder/) from [Keras-NLP](https://pypi.org/project/keras-nlp/) to build our custom Sequence-to-Sequence (S2S) regression model `S2S_Transformers`, which inherits from the `culearn.regression.S2S_DNN` class.

In [None]:
!pip install keras-nlp

In [None]:
from culearn.regression import *
from keras_nlp.layers import TransformerEncoder, TransformerDecoder, PositionEmbedding


class S2S_Transformers(S2S_DNN):
    def __init__(self,
                 depth: 'int > 0' = 2,
                 n_heads: 'int > 0' = 4,
                 embedding_dim: 'int > 0' = 64,
                 hidden_dim: 'int > 0' = 128,
                 max_length: 'int > 0' = 1000,
                 drop: 'float >= 0' = 0.1,
                 *args, **kwargs):
        """
        Sequence-to-Sequence (S2S) model with Transformer encoder and decoder.

        :param depth: Number of encoder-decoder layers.
        :param n_heads: Number of attention heads in each Transformer layer.
        :param embedding_dim: Number of feedforward units in input embedding layer.
        :param hidden_dim: Number of feedforward units within Transformer layers.
        :param max_length: Maximum number of positions in input sequences.
        :param drop: Dropout rate for Transformer layers.
        :param args: Base class arguments.
        :param kwargs: Base class key-value arguments.
        """
        super().__init__(*args, **kwargs)
        self.x_projection = tfl.Dense(embedding_dim)
        self.y_projection = tfl.Dense(embedding_dim)
        self.positioning = PositionEmbedding(max_length)
        self.encoders = [
            TransformerEncoder(
                num_heads=n_heads,
                intermediate_dim=hidden_dim,
                dropout=drop)
            for _ in range(depth)
        ]
        self.decoders = [
            TransformerDecoder(
                num_heads=n_heads,
                intermediate_dim=hidden_dim,
                dropout=drop)
            for _ in range(depth)
        ]

    def _s2s(self, y_past, x_future):
        # Encode past sequence.
        y_encoded = self.y_projection(y_past)
        # PositionEmbedding returns only the position vectors,
        # so they are added to the projection rather than replacing it.
        y_encoded = y_encoded + self.positioning(y_encoded)
        encoder_outputs = []
        for encoder in self.encoders:
            y_encoded = encoder(y_encoded)
            encoder_outputs.append(y_encoded)

        # Decode future sequence using corresponding encoder output.
        x_encoded = self.x_projection(x_future)
        x_encoded = x_encoded + self.positioning(x_encoded)
        decoder_output = x_encoded
        for i, decoder in enumerate(self.decoders):
            decoder_output = decoder(decoder_output, encoder_outputs[i])

        # Return the output from the last decoder layer directly.
        return decoder_output
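
To make the shapes in `_s2s` easier to follow, here is a minimal NumPy sketch of cross-attention, the mechanism a Transformer decoder uses to read the encoder output. It is not the Keras-NLP implementation: it has a single head and no learned weights, and the sequence lengths (168 past steps, 24 future steps) are hypothetical values chosen for illustration. It only shows that the decoded sequence takes its length from the future inputs while every future step attends over all past steps.

```python
import numpy as np


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention without learned weights."""
    d = queries.shape[-1]
    # Similarity of every future step to every past step: (batch, future, past).
    scores = queries @ keys_values.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the past axis.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each future step becomes a weighted average of the past representations.
    return weights @ keys_values, weights


rng = np.random.default_rng(0)
# Hypothetical sizes: one week of past steps, one day of future steps.
batch, past_len, future_len, embedding_dim = 2, 168, 24, 64
y_encoded = rng.normal(size=(batch, past_len, embedding_dim))    # encoder side
x_encoded = rng.normal(size=(batch, future_len, embedding_dim))  # decoder side

decoded, weights = cross_attention(x_encoded, y_encoded)
print(decoded.shape)  # (2, 24, 64)
print(weights.shape)  # (2, 24, 168)
```

Both projections in `S2S_Transformers` map to `embedding_dim` features, which is what allows the past and future sequences to be combined this way even though their lengths differ.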

## Custom Dataset

We will generate random numbers to represent 10 X and 100 Y time series at one-hour resolution over a one-year interval. For this example, we will store the values in memory, using one `pandas.DataFrame` for X and a list of `TimeSeriesInMemory` wrappers for Y, each wrapping a single `pandas.Series` (to support parallel processing). If your own dataset does not fit in memory and you need some kind of lazy loading, you can implement your own `TimeSeries` subclass analogous to `culearn.base.TimeSeriesInMemory` and `culearn.csv.TimeSeriesCSV`. You can also look at the different implementations of the `DataSource` class in the `culearn.data` module to see how these classes are used on real datasets.

In [None]:
from datetime import datetime

import numpy as np
import pandas as pd
from culearn.base import *

# One-hour steps covering the year 2021.
resolution = TimeResolution(hours=1)
interval = TimeInterval(datetime(2021, 1, 1), datetime(2022, 1, 1))
timestamps = pd.DatetimeIndex([step.start for step in resolution.steps(interval)])

dataset = PredictionDataset(
    # X: 10 random time series stored as columns of one DataFrame.
    x=pd.DataFrame(np.random.rand(len(timestamps), 10), index=timestamps),
    # Y: 100 random time series, each wrapped separately.
    y=[
        TimeSeriesInMemory(
            TimeSeriesID(str(i)),
            pd.Series(np.random.rand(len(timestamps)), index=timestamps))
        for i in range(100)
    ]
)
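
As a rough sanity check of the sizes involved, the same random data can be sketched with pandas alone, without any `culearn` classes. This assumes that `resolution.steps(interval)` yields one step per hour with the end of the interval excluded; if the end were included, there would be one more row.

```python
import numpy as np
import pandas as pd

# Hourly timestamps covering 2021 (not a leap year): 365 * 24 steps.
timestamps = pd.date_range("2021-01-01", periods=365 * 24, freq=pd.Timedelta(hours=1))

# 10 X series as columns of one frame, 100 Y series as separate Series objects.
x = pd.DataFrame(np.random.rand(len(timestamps), 10), index=timestamps)
y = [pd.Series(np.random.rand(len(timestamps)), index=timestamps) for _ in range(100)]

print(x.shape)            # (8760, 10)
print(len(y), len(y[0]))  # 100 8760
print(timestamps[-1])     # 2021-12-31 23:00:00
```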

## Cumulant Learning

Once we have the regression model and dataset ready, we can apply Cumulant Learning.

In [None]:
from culearn.learn import *

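# Prepare the regressors
one_day = timedelta(days=1)
# Prediction horizon: number of resolution steps in one day.
horizon = int(one_day / resolution)
# Factory that creates a new regressor on each call.
regressor = lambda: TimeSeriesRegressor(horizon, base=DeepS2S(epochs=1, hidden=lambda: S2S_Transformers()))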

# Prepare the learner
learner = CumulantLearner(dataset, resolution, CumulantTransform(), regressor)

# Train the learner
fit_interval = TimeInterval(interval.start, interval.end - one_day)
learner.fit(fit_interval, verbose=True)

# Test the learner
pred_intervals = learner.predict(fit_interval.end, p=[0.75, 0.95, 0.99], clusters=True, members=True)
# Display only the first predicted interval.
for pi in pred_intervals:
    print(pi.ts_id)
    display(pi.to_frame())
    break
pred_cumulants = learner.predict_cumulants(fit_interval.end)
# Display only the first predicted cumulant.
for pc in pred_cumulants:
    print(pc.ts_id)
    display(pc)
    break
pred_figure = learner.figure(fit_interval.end, p=[0.75, 0.95, 0.99])
pred_figure.show()
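
The horizon and hold-out arithmetic used above can be checked with the standard library alone. In this sketch, `timedelta(hours=1)` stands in for `TimeResolution(hours=1)`, on the assumption that dividing one day by the resolution yields the number of hourly steps.

```python
from datetime import datetime, timedelta

step = timedelta(hours=1)    # stand-in for TimeResolution(hours=1)
one_day = timedelta(days=1)

# Number of steps the regressor predicts at once.
horizon = one_day // step

# The last day of the dataset interval is held out from training.
start, end = datetime(2021, 1, 1), datetime(2022, 1, 1)
fit_end = end - one_day
n_fit_steps = (fit_end - start) // step

print(horizon)      # 24
print(fit_end)      # 2021-12-31 00:00:00
print(n_fit_steps)  # 8736
```

Training therefore covers 364 days, and the prediction requested at `fit_interval.end` falls on the final day of the interval.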