# Embedding models

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/210-embedding_models.ipynb)

This notebook contains examples of working with embedding models.

**Table of contents**

* [Using embedding models directly](#chapter1)  
* [Using embedding models with transforms](#chapter2)
    * [Baseline](#section_2_1)
    * [EmbeddingSegmentTransform](#section_2_2)
    * [EmbeddingWindowTransform](#section_2_3)

In [1]:
import warnings

from etna.datasets import TSDataset
from etna.datasets import generate_ar_df
from etna.datasets import load_dataset
from etna.metrics import SMAPE
from etna.models import CatBoostMultiSegmentModel
from etna.models import LinearMultiSegmentModel
from etna.pipeline import Pipeline
from etna.transforms import EmbeddingSegmentTransform
from etna.transforms import EmbeddingWindowTransform
from etna.transforms import FilterFeaturesTransform
from etna.transforms import LagTransform
from etna.transforms.embeddings.models import BaseEmbeddingModel
from etna.transforms.embeddings.models import TS2VecEmbeddingModel
from etna.transforms.embeddings.models import TSTCCEmbeddingModel

warnings.filterwarnings("ignore")



## 1. Using embedding models directly <a class="anchor" id="chapter1"></a>

We have two models to generate embeddings for time series: `TS2VecEmbeddingModel` and `TSTCCEmbeddingModel`.

Each model has the following methods:
- `fit` trains the model.
- `encode_segment` generates a single embedding for the whole series. These features are regressors.
- `encode_window` generates an embedding for each timestamp. These features are not regressors and should be lagged before they are used for forecasting.
- `freeze` enables or disables skipping training in the `fit` method. It is useful, for example, when you have a pretrained model and only want to generate embeddings without retraining during `backtest`.
- `save` and `load` save and load pretrained models, respectively.
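
The effect of `freeze` can be pictured with a toy stand-in. The class below is hypothetical and is not etna's implementation; the attribute `n_fit_calls` and the parameter name `is_freezed` are assumptions made for illustration. It only mimics the behaviour described above: a frozen model skips training in `fit`.

```python
class ToyEmbeddingModel:
    """Hypothetical stand-in that only mimics the fit/freeze contract."""

    def __init__(self) -> None:
        self.is_freezed = False
        self.n_fit_calls = 0  # counts how many times training actually ran

    def freeze(self, is_freezed: bool = True) -> None:
        # Enable or disable skipping of training in `fit`.
        self.is_freezed = is_freezed

    def fit(self, x) -> "ToyEmbeddingModel":
        if not self.is_freezed:
            self.n_fit_calls += 1  # a real model would update its weights here
        return self


toy = ToyEmbeddingModel()
toy.fit([[1.0, 2.0]])         # trains
toy.freeze()                  # e.g. right after loading pretrained weights
toy.fit([[3.0, 4.0]])         # skipped, the model is frozen
toy.freeze(is_freezed=False)
toy.fit([[5.0, 6.0]])         # trains again
print(toy.n_fit_calls)        # 2
```

During `backtest` the pipeline refits its transforms on every fold, so freezing is what keeps a pretrained embedding model from being retrained each time.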

In [2]:
df = generate_ar_df(periods=10, start_time="2001-01-01", n_segments=3)
ts = TSDataset(df, freq="D")
ts.head()

segment,segment_0,segment_1,segment_2
feature,target,target,target
timestamp,,,
2001-01-01,1.624345,1.462108,-1.100619
2001-01-02,1.012589,-0.598033,0.044105
2001-01-03,0.484417,-0.92045,0.945695
2001-01-04,-0.588551,-1.304504,1.44819
2001-01-05,0.276856,-0.170735,2.349046


Now let's work with models directly.

They expect an array of shape `(n_segments, n_timestamps, num_features)`. The example uses `TS2VecEmbeddingModel`; `TSTCCEmbeddingModel` works the same way.

In [3]:
x = ts.df.values.reshape(ts.size()).transpose(1, 0, 2)
x.shape

(3, 10, 1)
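
To see why the reshape followed by a transpose gives the expected layout, here is a small NumPy-only sketch. It assumes the wide table stores one row per timestamp with columns grouped by segment and then by feature, and that `ts.size()` returns `(n_timestamps, n_segments, n_features)`; the toy sizes and the name `x_toy` are made up for illustration.

```python
import numpy as np

n_timestamps, n_segments, n_features = 4, 3, 2

# Wide layout: one row per timestamp, columns grouped by segment, then by feature.
wide = np.arange(n_timestamps * n_segments * n_features).reshape(
    n_timestamps, n_segments * n_features
)

# Split the column axis into (segment, feature), then move segments to the front.
x_toy = wide.reshape(n_timestamps, n_segments, n_features).transpose(1, 0, 2)

print(x_toy.shape)  # (3, 4, 2)
# Segment 1, feature 0 over time is column 2 of the wide table.
print(np.array_equal(x_toy[1, :, 0], wide[:, 2]))  # True
```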

In [4]:
model_ts2vec = TS2VecEmbeddingModel(input_dims=1, output_dims=2)
model_ts2vec.fit(x, n_epochs=1)
segment_embeddings = model_ts2vec.encode_segment(x)
segment_embeddings.shape

(3, 2)

Because we are using `encode_segment`, we get one vector of `output_dims` values for each segment.

And what about `encode_window`?

In [5]:
window_embeddings = model_ts2vec.encode_window(x)
window_embeddings.shape

(3, 10, 2)

We get `output_dims` features, each with `n_timestamps` values, for every segment.
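
The relationship between the two output shapes can be illustrated without any trained model. The sketch below uses random numbers as a stand-in for per-timestamp embeddings and collapses the time axis with a maximum; this pooling is only an illustrative assumption, not a statement about what either model does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n_segments, n_timestamps, output_dims = 3, 10, 2

# Stand-in for per-timestamp embeddings, shaped like the `encode_window` output.
toy_window = rng.normal(size=(n_segments, n_timestamps, output_dims))

# Collapsing the time axis leaves one vector per segment,
# shaped like the `encode_segment` output.
toy_segment = toy_window.max(axis=1)

print(toy_window.shape, toy_segment.shape)  # (3, 10, 2) (3, 2)
```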

## 2. Using embedding models with transforms <a class="anchor" id="chapter2"></a>

In this section we will test our models on an example dataset.

In [6]:
HORIZON = 6

### 2.1 Baseline <a class="anchor" id="section_2_1"></a>

Before working with embedding features, let's make forecasts using ordinary lag features.

In [7]:
ts = load_dataset("m3_monthly")
ts.drop_features(features=["origin_timestamp"])
ts.df_exog = None
ts.head()

segment,M1000_MACRO,M1001_MACRO,M1002_MACRO,M1003_MACRO,M1004_MACRO,M1005_MACRO,M1006_MACRO,M1007_MACRO,M1008_MACRO,M1009_MACRO,...,M992_MACRO,M993_MACRO,M994_MACRO,M995_MACRO,M996_MACRO,M997_MACRO,M998_MACRO,M999_MACRO,M99_MICRO,M9_MICRO
feature,target,target,target,target,target,target,target,target,target,target,...,target,target,target,target,target,target,target,target,target,target
timestamp,,,,,,,,,,,...,,,,,,,,,,
0,,,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,


In [8]:
model = CatBoostMultiSegmentModel()

lag_transform = LagTransform(in_column="target", lags=list(range(HORIZON, HORIZON + 6)), out_column="lag")

pipeline = Pipeline(model=model, transforms=[lag_transform], horizon=HORIZON)
metrics_df, _, _ = pipeline.backtest(ts, metrics=[SMAPE()])

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    5.3s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:   10.8s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:   16.5s
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:   22.5s
[Parallel(n_jobs=1)]: Done   5 tasks      | elapsed:   28.6s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.6s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    1.2s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    1.8s
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:    2.5s
[Parallel(n_jobs=1)]: Done   5 tasks      | elapsed:    3.1s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.2s
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:    0.3s

In [9]:
print("SMAPE: ", metrics_df["SMAPE"].mean())

SMAPE:  14.862075501818936
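
For reading the scores in this notebook, it helps to recall how SMAPE is computed. The function below is a minimal pure-Python sketch under the assumption that the metric is the percentage form on a 0 to 200 scale; the handling of `0/0` terms is a choice made for this sketch only, so check the etna documentation for the exact definition used by `etna.metrics.SMAPE`.

```python
def smape(y_true, y_pred):
    """Symmetric MAPE in percent on a 0..200 scale (assumed definition)."""
    terms = [
        abs(t - p) / (abs(t) + abs(p))
        for t, p in zip(y_true, y_pred)
        if abs(t) + abs(p) > 0  # skip 0/0 terms in this sketch
    ]
    if not terms:
        return 0.0
    return 200.0 * sum(terms) / len(terms)


print(smape([100.0, 200.0], [110.0, 180.0]))  # roughly 10.03
print(smape([1.0], [-1.0]))                   # 200.0, the upper bound
```

In the notebook, `metrics_df["SMAPE"].mean()` then averages the per-segment values returned by `backtest` into a single number.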


### 2.2 EmbeddingSegmentTransform <a class="anchor" id="section_2_2"></a>

`EmbeddingSegmentTransform` calls the model's `encode_segment` method internally.

In [10]:
def forecast_with_segment_embeddings(emb_model: BaseEmbeddingModel, training_params: dict) -> float:
    model = CatBoostMultiSegmentModel(task_type="GPU", devices="0")

    emb_transform = EmbeddingSegmentTransform(
        in_columns=["target"], embedding_model=emb_model, training_params=training_params, out_column="emb"
    )
    pipeline = Pipeline(model=model, transforms=[lag_transform, emb_transform], horizon=HORIZON)
    metrics_df, _, _ = pipeline.backtest(ts, metrics=[SMAPE()])
    smape_score = metrics_df["SMAPE"].mean()
    return smape_score

Inspect the parameters of the model's `fit` method to see what can be passed to the transform through `training_params`.

Let's begin with `TSTCCEmbeddingModel`.

In [11]:
?TSTCCEmbeddingModel.fit

Signature:
TSTCCEmbeddingModel.fit(
    self,
    x: numpy.ndarray,
    n_epochs: int = 40,
    lr: float = 0.001,
    temperature: float = 0.2,
    lambda1: float = 1,
    lambda2: float = 0.7,
    verbose: bool = False,
) -> 'TSTCCEmbeddingModel'
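
Outside IPython, where the `?` syntax is not available, the standard `inspect` module exposes the same signature information. The sketch below shows one way to catch misspelled keys in a `training_params` dictionary before a long run. `toy_fit` is a hypothetical function carrying a subset of the parameters listed above, and `check_training_params` is a helper invented for this example, not part of etna.

```python
import inspect


def toy_fit(x, n_epochs: int = 40, lr: float = 0.001, temperature: float = 0.2, verbose: bool = False):
    """Hypothetical function with a subset of the parameters listed above."""


def check_training_params(fit_fn, training_params: dict) -> list:
    """Return the keys of `training_params` that `fit_fn` does not accept."""
    accepted = set(inspect.signature(fit_fn).parameters)
    return sorted(key for key in training_params if key not in accepted)


print(check_training_params(toy_fit, {"n_epochs": 10, "lr": 0.0005}))  # []
print(check_training_params(toy_fit, {"n_epoch": 10}))                 # ['n_epoch']
```

With etna installed, the same helper could presumably be pointed at `TSTCCEmbeddingModel.fit` instead of `toy_fit`.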

In [12]:
emb_model = TSTCCEmbeddingModel(input_dims=1, tc_hidden_dim=128, depth=5, output_dims=16, device="cuda")
training_params = {"n_epochs": 10}
smape_score = forecast_with_segment_embeddings(emb_model, training_params)

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:   32.6s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:  1.1min
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  1.6min
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:  2.2min
[Parallel(n_jobs=1)]: Done   5 tasks      | elapsed:  2.7min
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    1.3s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    2.7s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    3.9s
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:    5.2s
[Parallel(n_jobs=1)]: Done   5 tasks      | elapsed:    6.4s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.2s
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:    0.3s

In [13]:
print("SMAPE: ", smape_score)

SMAPE:  14.503439221629804


Better than without embeddings. Let's try `TS2VecEmbeddingModel`.

In [14]:
emb_model = TS2VecEmbeddingModel(input_dims=1, hidden_dims=128, depth=5, output_dims=16, device="cuda")
training_params = {"n_epochs": 10}
smape_score = forecast_with_segment_embeddings(emb_model, training_params)

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:   23.4s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:   46.8s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  1.2min
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:  1.6min
[Parallel(n_jobs=1)]: Done   5 tasks      | elapsed:  2.0min
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    1.6s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    2.8s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    4.2s
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:    5.5s
[Parallel(n_jobs=1)]: Done   5 tasks      | elapsed:    6.9s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.2s
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:    0.3s

In [15]:
print("SMAPE: ", smape_score)

SMAPE:  13.98802254384104


Better still: 13.99 against 14.86 for the baseline. Now let's try another transform.

### 2.3 EmbeddingWindowTransform <a class="anchor" id="section_2_3"></a>

`EmbeddingWindowTransform` calls the model's `encode_window` method internally. As discussed above, these features are not regressors and should be lagged before they are used for forecasting.
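
The reason for lagging is that a per-timestamp embedding is computed from the observed target, so it does not exist for the timestamps being forecast. Shifting it by at least `HORIZON` steps makes every forecast timestamp depend only on values that are already known. The pure-Python sketch below illustrates the shift on a made-up series; `lag` is a helper written for this example and is not etna's `LagTransform`.

```python
HORIZON = 6


def lag(values, k):
    """Shift a series forward by k steps; the first k positions have no value."""
    return ([None] * k + list(values))[: len(values)]


# Hypothetical embedding component observed at timestamps 0..9.
emb = [0.1 * t for t in range(10)]
lagged = lag(emb, HORIZON)

# At timestamp 9 the lagged feature holds the value observed at timestamp 3.
print(lagged[9] == emb[3])  # True
# The first HORIZON positions are empty because nothing precedes them.
print(lagged[:HORIZON])     # [None, None, None, None, None, None]
```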

In [16]:
def forecast_with_window_embeddings(emb_model: BaseEmbeddingModel, training_params: dict) -> float:
    model = CatBoostMultiSegmentModel(task_type="GPU", devices="0")

    output_dims = emb_model.output_dims

    emb_transform = EmbeddingWindowTransform(
        in_columns=["target"], embedding_model=emb_model, training_params=training_params, out_column="embedding_window"
    )
    lag_emb_transforms = [
        LagTransform(in_column=f"embedding_window_{i}", lags=[HORIZON], out_column=f"lag_emb_{i}")
        for i in range(output_dims)
    ]
    filter_transforms = FilterFeaturesTransform(exclude=[f"embedding_window_{i}" for i in range(output_dims)])

    transforms = [lag_transform] + [emb_transform] + lag_emb_transforms + [filter_transforms]

    pipeline = Pipeline(model=model, transforms=transforms, horizon=HORIZON)
    metrics_df, _, _ = pipeline.backtest(ts, metrics=[SMAPE()])
    smape_score = metrics_df["SMAPE"].mean()
    return smape_score

In [17]:
emb_model = TSTCCEmbeddingModel(input_dims=1, tc_hidden_dim=64, depth=5, output_dims=16, device="cuda")
training_params = {"n_epochs": 20, "lr": 0.0005}
smape_score = forecast_with_window_embeddings(emb_model, training_params)

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:  1.5min
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:  2.9min
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  4.4min
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:  5.9min
[Parallel(n_jobs=1)]: Done   5 tasks      | elapsed:  7.4min
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:   28.3s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:   56.6s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  1.4min
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:  1.9min
[Parallel(n_jobs=1)]: Done   5 tasks      | elapsed:  2.4min
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.2s
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:    0.3s

In [18]:
print("SMAPE: ", smape_score)

SMAPE:  105.1297881443976


Oops... What about `TS2VecEmbeddingModel`?

In [19]:
emb_model = TS2VecEmbeddingModel(input_dims=1, hidden_dims=64, depth=5, output_dims=16, device="cuda")
training_params = {"n_epochs": 20, "lr": 0.0005}
smape_score = forecast_with_window_embeddings(emb_model, training_params)

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:  1.2min
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:  2.4min
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  3.6min
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:  4.8min
[Parallel(n_jobs=1)]: Done   5 tasks      | elapsed:  6.1min
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:   28.5s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:   56.7s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  1.4min
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:  1.9min
[Parallel(n_jobs=1)]: Done   5 tasks      | elapsed:  2.4min
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.2s
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:    0.3s

In [20]:
print("SMAPE: ", smape_score)

SMAPE:  19.582587762579827


Window embeddings don't help on this dataset. In general, you should try both models and both transforms to find what works best.
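
To make the comparison explicit, the snippet below collects the SMAPE values printed in the cells above and ranks them. The labels are shorthand invented for this summary. Keep in mind that the runs use different hyperparameters (hidden size, number of epochs, learning rate), so this is a ranking of the specific configurations tried here rather than of the methods in general.

```python
# SMAPE values reported in the cells above (lower is better).
results = {
    "baseline (lags only)": 14.862075501818936,
    "segment + TSTCC": 14.503439221629804,
    "segment + TS2Vec": 13.98802254384104,
    "window + TSTCC": 105.1297881443976,
    "window + TS2Vec": 19.582587762579827,
}

# Print configurations from best to worst.
for name, score in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:<22} {score:8.3f}")

best = min(results, key=results.get)
print("best:", best)  # best: segment + TS2Vec
```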