# Embedding models

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/210-embedding_models.ipynb)

This notebook contains examples of using embedding models.

**Table of contents**

* [Using embedding models directly](#chapter1)  
* [Using embedding models with transforms](#chapter2)
    * [Baseline](#section_2_1)
    * [EmbeddingSegmentTransform](#section_2_2)
    * [EmbeddingWindowTransform](#section_2_3)
* [Saving and loading models](#chapter3)
* [Loading external pretrained models](#chapter4)

In [1]:
import warnings

warnings.filterwarnings("ignore")

## 1. Using embedding models directly <a class="anchor" id="chapter1"></a>

We have two models to generate embeddings for time series: `TS2VecEmbeddingModel` and `TSTCCEmbeddingModel`.

Each model has the following methods:

- `fit` to train the model.
- `encode_segment` to generate embeddings for the whole series. These features are regressors, i.e. their values are known for future timestamps.
- `encode_window` to generate embeddings for each timestamp. These features aren't regressors, so a lag transformation should be applied to them before they are used in forecasting.
- `freeze` to enable or disable skipping training in the `fit` method. It is useful, for example, when you have a pretrained model and only want to generate embeddings without additional training during `backtest`.
- `save` and `load` to save and load pretrained models, respectively.

In [2]:
from lightning.pytorch import seed_everything

seed_everything(42, workers=True)

Seed set to 42


42

In [3]:
from etna.datasets import TSDataset
from etna.datasets import generate_ar_df

df = generate_ar_df(periods=10, start_time="2001-01-01", n_segments=3)
ts = TSDataset(df, freq="D")
ts.head()

| timestamp | segment_0 / target | segment_1 / target | segment_2 / target |
|---|---|---|---|
| 2001-01-01 | 1.624345 | 1.462108 | -1.100619 |
| 2001-01-02 | 1.012589 | -0.598033 | 0.044105 |
| 2001-01-03 | 0.484417 | -0.92045 | 0.945695 |
| 2001-01-04 | -0.588551 | -1.304504 | 1.44819 |
| 2001-01-05 | 0.276856 | -0.170735 | 2.349046 |


Now let's work with the models directly.

They expect an array of shape `(n_segments, n_timestamps, num_features)`. The example below uses `TS2VecEmbeddingModel`; `TSTCCEmbeddingModel` works the same way.

In [4]:
x = ts.to_pandas().values.reshape(ts.size()).transpose(1, 0, 2)
x.shape

(3, 10, 1)
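The one-liner above relies on the column layout of the wide dataframe returned by `ts.to_pandas()`: one column per (segment, feature) pair, with all features of a segment next to each other, while `ts.size()` supplies `(n_timestamps, n_segments, n_features)`. Assuming that layout, the same reshape can be reproduced with plain NumPy on synthetic numbers:

```python
import numpy as np

n_timestamps, n_segments, n_features = 10, 3, 1

# Synthetic stand-in for ts.to_pandas().values: rows are timestamps,
# columns are (segment, feature) pairs grouped by segment (assumed layout).
wide = np.arange(n_timestamps * n_segments * n_features, dtype=float).reshape(
    n_timestamps, n_segments * n_features
)

# Split the column axis into (segment, feature), then move segments to the front.
x = wide.reshape(n_timestamps, n_segments, n_features).transpose(1, 0, 2)

print(x.shape)  # (3, 10, 1)

# x[s, t, f] is the value of feature f of segment s at timestamp t.
s, t, f = 2, 4, 0
assert x[s, t, f] == wide[t, s * n_features + f]
```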

In [5]:
from etna.transforms.embeddings.models import TS2VecEmbeddingModel
from etna.transforms.embeddings.models import TSTCCEmbeddingModel

model_ts2vec = TS2VecEmbeddingModel(input_dims=1, output_dims=2)
model_ts2vec.fit(x, n_epochs=1)
segment_embeddings = model_ts2vec.encode_segment(x)
segment_embeddings.shape

(3, 2)

Since we are using `encode_segment`, we get `output_dims` features, each consisting of one value per segment.

And what about `encode_window`?

In [6]:
window_embeddings = model_ts2vec.encode_window(x)
window_embeddings.shape

(3, 10, 2)

This time we get `output_dims` features, each consisting of `n_timestamps` values per segment.
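Conceptually, per-timestamp embeddings of shape `(n_segments, n_timestamps, output_dims)` can be collapsed over the time axis into one vector per segment. One common pooling choice, used for full-series encoding in the original TS2Vec code, is max pooling; whether etna's `encode_segment` pools exactly this way (and how `TSTCCEmbeddingModel` does it) is an assumption, and `encode_segment(x)` need not equal a pooled `encode_window(x)`. The toy below uses random numbers in place of real embeddings and only illustrates the shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_segments, n_timestamps, output_dims = 3, 10, 2

# Random stand-in shaped like the output of encode_window.
window_embeddings = rng.normal(size=(n_segments, n_timestamps, output_dims))

# Max pooling over the time axis: one vector per segment,
# shaped like the output of encode_segment.
segment_embeddings = window_embeddings.max(axis=1)

print(window_embeddings.shape)   # (3, 10, 2)
print(segment_embeddings.shape)  # (3, 2)
```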

You can change some attributes of the model after initialization, for example `device`, `batch_size` or `num_workers`.

In [7]:
model_ts2vec.device = "cuda"

## 2. Using embedding models with transforms <a class="anchor" id="chapter2"></a>

In this section we test the models on an example dataset.

In [8]:
HORIZON = 6

### 2.1 Baseline <a class="anchor" id="section_2_1"></a>

Before working with embedding features, let's make forecasts using ordinary lag features.

In [9]:
from etna.datasets import load_dataset

ts = load_dataset("m3_monthly")
ts = TSDataset(ts.to_pandas(features=["target"]), freq=None)
ts.head()

Feature: `target` for every segment.

| timestamp | M1000_MACRO | M1001_MACRO | M1002_MACRO | M1003_MACRO | M1004_MACRO | M1005_MACRO | M1006_MACRO | M1007_MACRO | M1008_MACRO | M1009_MACRO | ... | M992_MACRO | M993_MACRO | M994_MACRO | M995_MACRO | M996_MACRO | M997_MACRO | M998_MACRO | M999_MACRO | M99_MICRO | M9_MICRO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


In [10]:
from etna.metrics import SMAPE
from etna.models import CatBoostMultiSegmentModel
from etna.pipeline import Pipeline
from etna.transforms import LagTransform

model = CatBoostMultiSegmentModel()

lag_transform = LagTransform(in_column="target", lags=list(range(HORIZON, HORIZON + 6)), out_column="lag")

pipeline = Pipeline(model=model, transforms=[lag_transform], horizon=HORIZON)
metrics_df = pipeline.backtest(ts, metrics=[SMAPE()], n_folds=3)["metrics"]

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    3.8s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    7.7s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:   11.6s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:   11.6s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    1.0s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    1.9s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    2.9s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    2.9s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.0s
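The lags start at `HORIZON` because the pipeline forecasts all `HORIZON` steps at once: for the furthest step, the most recent observed value is exactly `HORIZON` steps back, so any shorter lag would be unknown at prediction time. A small pure-Python check of this reasoning (the helper `lag_is_known` is hypothetical, not part of etna):

```python
HORIZON = 6
lags = list(range(HORIZON, HORIZON + 6))  # same lags as in the cell above


def lag_is_known(step: int, lag: int) -> bool:
    """Forecast step 1..HORIZON lies after the last observed point.

    A lag refers to position step - lag relative to that point,
    which is observed only if it is <= 0.
    """
    return step - lag <= 0


# Every chosen lag is available for every forecast step.
assert all(lag_is_known(step, lag) for step in range(1, HORIZON + 1) for lag in lags)

# A lag of HORIZON - 1 would require unobserved data for the last step.
print([step for step in range(1, HORIZON + 1) if not lag_is_known(step, HORIZON - 1)])  # [6]
```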


In [11]:
print("SMAPE: ", metrics_df["SMAPE"].mean())

SMAPE:  14.739051071545429
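For reference, SMAPE (symmetric mean absolute percentage error) is reported in percent, and lower is better; the notebook averages the values stored in `metrics_df`. Below is a minimal sketch of the common textbook definition, assuming no zero denominators; etna's `SMAPE` may differ in edge-case handling:

```python
import numpy as np


def smape(y_true, y_pred) -> float:
    """Symmetric MAPE in percent: 100/n * sum(2|y - y_hat| / (|y| + |y_hat|))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ratios = 2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred))
    return float(100 * ratios.mean())


print(smape([100, 200], [100, 200]))   # 0.0
print(round(smape([100], [110]), 2))   # 9.52
```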


### 2.2 EmbeddingSegmentTransform <a class="anchor" id="section_2_2"></a>

`EmbeddingSegmentTransform` calls the model's `encode_segment` method internally.
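Because `encode_segment` returns one vector per segment, the resulting features are constant in time for each segment and are therefore known for future timestamps too, which is what makes them usable as regressors. A NumPy sketch of that broadcasting step; the `emb_{i}` column names assume the `{out_column}_{i}` pattern that this notebook uses for `embedding_window_{i}`, and the code illustrates the idea rather than the transform's actual implementation:

```python
import numpy as np

n_segments, n_timestamps, output_dims = 3, 4, 2
segment_embeddings = np.arange(n_segments * output_dims, dtype=float).reshape(n_segments, output_dims)

# Repeat each segment's vector at every timestamp (past and future alike):
# shape (n_segments, n_timestamps, output_dims).
features = np.broadcast_to(segment_embeddings[:, None, :], (n_segments, n_timestamps, output_dims))

columns = [f"emb_{i}" for i in range(output_dims)]
print(columns)      # ['emb_0', 'emb_1']
print(features[1])  # segment_1: the same row [2. 3.] at every timestamp
```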

In [12]:
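from typing import Optional

from etna.transforms import EmbeddingSegmentTransform
from etna.transforms.embeddings.models import BaseEmbeddingModel


def forecast_with_segment_embeddings(
    emb_model: BaseEmbeddingModel, training_params: Optional[dict] = None, n_folds: int = 3
) -> float:
    # Avoid a mutable default argument shared between calls
    training_params = {} if training_params is None else training_params
    model = CatBoostMultiSegmentModel()

    # Segment embeddings are constant in time, so they are used as regressors
    emb_transform = EmbeddingSegmentTransform(
        in_columns=["target"], embedding_model=emb_model, training_params=training_params, out_column="emb"
    )
    # `lag_transform`, `ts` and `HORIZON` come from earlier cells and are looked up at call time
    pipeline = Pipeline(model=model, transforms=[lag_transform, emb_transform], horizon=HORIZON)
    metrics_df = pipeline.backtest(ts, metrics=[SMAPE()], n_folds=n_folds)["metrics"]
    smape_score = metrics_df["SMAPE"].mean()
    return smape_score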

You can inspect the model's `fit` signature to see which training parameters can be passed to the transform via `training_params`.

Let's begin with `TSTCCEmbeddingModel`.

In [13]:
?TSTCCEmbeddingModel.fit

Signature:
TSTCCEmbeddingModel.fit(
    self,
    x: numpy.ndarray,
    n_epochs: int = 40,
    lr: float = 0.001,
    temperature: float = 0.2,
    lambda1: float = 1,
    lambda2: float = 0.7,
    verbose: bool = False,
) -> 'TSTCCEmbeddingModel'

In [14]:
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

emb_model = TSTCCEmbeddingModel(input_dims=1, tc_hidden_dim=16, depth=3, output_dims=6, device=device)
training_params = {"n_epochs": 10}
smape_score = forecast_with_segment_embeddings(emb_model, training_params)

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:   57.1s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:  2.0min
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  3.2min
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  3.2min
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    4.7s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    9.7s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:   14.4s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:   14.4s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.0s


In [15]:
print("SMAPE: ", smape_score)

SMAPE:  14.063941642581863


Better than without embeddings. Let's try `TS2VecEmbeddingModel`.

In [16]:
emb_model = TS2VecEmbeddingModel(input_dims=1, hidden_dims=16, depth=3, output_dims=6, device=device)
training_params = {"n_epochs": 10}
smape_score = forecast_with_segment_embeddings(emb_model, training_params)

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:   26.9s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:   55.1s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  1.4min
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  1.4min
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    3.2s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    6.4s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    9.3s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    9.3s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.0s


In [17]:
print("SMAPE: ", smape_score)

SMAPE:  13.652730950082256
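To put these scores in perspective, the relative SMAPE reduction over the lag-only baseline can be computed directly from the values printed above:

```python
baseline = 14.739051071545429  # lags only
scores = {
    "TSTCCEmbeddingModel": 14.063941642581863,
    "TS2VecEmbeddingModel": 13.652730950082256,
}

for name, score in scores.items():
    # Relative SMAPE reduction with respect to the baseline, in percent.
    reduction = 100 * (baseline - score) / baseline
    print(f"{name}: {reduction:.1f}% lower SMAPE")
```

This gives roughly 4.6% for `TSTCCEmbeddingModel` and 7.4% for `TS2VecEmbeddingModel` on this backtest.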


Even better. Now let's try another transform.

### 2.3 EmbeddingWindowTransform <a class="anchor" id="section_2_3"></a>

`EmbeddingWindowTransform` calls the model's `encode_window` method internally. As discussed above, these features are not regressors, so they should be lagged by at least the forecast horizon before being used.
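Conceptually, the `LagTransform` + `FilterFeaturesTransform` combination in the next cell shifts each window-embedding feature forward in time by `HORIZON` and then drops the unshifted columns. Assuming each window embedding depends only on data up to its own timestamp, the model then sees only values available at prediction time. A NumPy sketch of the shift on an array shaped like the output of `encode_window` (an illustration, not etna's code; `lag_along_time` is a hypothetical helper):

```python
import numpy as np

HORIZON = 6
n_segments, n_timestamps, output_dims = 2, 10, 3
window_embeddings = np.arange(n_segments * n_timestamps * output_dims, dtype=float).reshape(
    n_segments, n_timestamps, output_dims
)


def lag_along_time(values: np.ndarray, lag: int) -> np.ndarray:
    """Shift values forward along the time axis (axis 1), padding the start with NaN.

    Expects 0 < lag < values.shape[1].
    """
    lagged = np.full_like(values, np.nan)
    lagged[:, lag:, :] = values[:, :-lag, :]
    return lagged


lagged = lag_along_time(window_embeddings, HORIZON)

# The value at timestamp t is the embedding from timestamp t - HORIZON.
assert np.array_equal(lagged[:, HORIZON:], window_embeddings[:, :-HORIZON])
# The first HORIZON timestamps have no lagged value.
assert np.isnan(lagged[:, :HORIZON]).all()
```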

In [18]:
from etna.transforms import EmbeddingWindowTransform
from etna.transforms import FilterFeaturesTransform


def forecast_with_window_embeddings(emb_model: BaseEmbeddingModel, training_params: dict) -> float:
    model = CatBoostMultiSegmentModel()

    output_dims = emb_model.output_dims

    # Per-timestamp embeddings go to columns embedding_window_0, ..., embedding_window_{output_dims - 1}
    emb_transform = EmbeddingWindowTransform(
        in_columns=["target"], embedding_model=emb_model, training_params=training_params, out_column="embedding_window"
    )
    # Window embeddings aren't known for future timestamps, so lag each of them by HORIZON
    lag_emb_transforms = [
        LagTransform(in_column=f"embedding_window_{i}", lags=[HORIZON], out_column=f"lag_emb_{i}")
        for i in range(output_dims)
    ]
    # Drop the unlagged embedding columns so that the model only sees the lagged ones
    filter_transform = FilterFeaturesTransform(exclude=[f"embedding_window_{i}" for i in range(output_dims)])

    transforms = [lag_transform, emb_transform] + lag_emb_transforms + [filter_transform]

    pipeline = Pipeline(model=model, transforms=transforms, horizon=HORIZON)
    metrics_df = pipeline.backtest(ts, metrics=[SMAPE()], n_folds=3)["metrics"]
    smape_score = metrics_df["SMAPE"].mean()
    return smape_score

In [19]:
emb_model = TSTCCEmbeddingModel(input_dims=1, tc_hidden_dim=16, depth=3, output_dims=6, device=device)
training_params = {"n_epochs": 10}
smape_score = forecast_with_window_embeddings(emb_model, training_params)

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:  1.2min
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:  2.5min
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  3.8min
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  3.8min
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:   19.2s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:   39.3s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:   58.8s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:   58.8s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.0s


In [20]:
print("SMAPE: ", smape_score)

SMAPE:  121.51925421789397


Oops... What about `TS2VecEmbeddingModel`?

In [21]:
emb_model = TS2VecEmbeddingModel(input_dims=1, hidden_dims=16, depth=3, output_dims=6, device=device)
training_params = {"n_epochs": 10}
smape_score = forecast_with_window_embeddings(emb_model, training_params)

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:   42.6s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:  1.4min
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  2.1min
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  2.1min
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:   17.4s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:   34.9s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:   52.4s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:   52.4s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.0s


In [22]:
print("SMAPE: ", smape_score)

SMAPE:  33.293029647204975


Window embeddings don't help on this dataset: for both models, SMAPE is worse than the baseline. This shows that it is worth trying both models and both transforms to find what works best for your data.

## 3. Saving and loading models <a class="anchor" id="chapter3"></a>


If you have a pretrained embedding model and don't want to train it further when `fit` is called, you should "freeze" its training loop. This is helpful when using the model inside transforms, which call the model's `fit` method on each `fit` of the pipeline.
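The freezing pattern can be illustrated with a hypothetical toy class. `ToyEmbeddingModel` is not part of etna and says nothing about how etna implements freezing; only the names `fit`, `freeze` and `is_freezed` mirror the API used in this section. While the model is frozen, `fit` returns immediately, so refitting a pipeline in every backtest fold leaves the pretrained state untouched:

```python
class ToyEmbeddingModel:
    """Hypothetical stand-in showing how a freeze flag can short-circuit fit."""

    def __init__(self) -> None:
        self._is_freezed = False
        self.n_fit_calls = 0  # counts fits that actually trained

    def freeze(self, is_freezed: bool = True) -> None:
        self._is_freezed = is_freezed

    @property
    def is_freezed(self) -> bool:
        return self._is_freezed

    def fit(self, x, n_epochs: int = 1) -> "ToyEmbeddingModel":
        if self._is_freezed:
            return self  # skip training and keep the current state
        self.n_fit_calls += 1  # a real model would run its training loop here
        return self


model = ToyEmbeddingModel()
model.fit(None)
model.freeze()
model.fit(None)  # ignored while frozen
model.freeze(is_freezed=False)
model.fit(None)
print(model.n_fit_calls, model.is_freezed)  # 2 False
```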

In [23]:
MODEL_PATH = "model.zip"

In [24]:
emb_model.freeze()
emb_model.save(MODEL_PATH)

Now you can load the pretrained model.

In [25]:
model_loaded = TS2VecEmbeddingModel.load(MODEL_PATH)

If you need to fine-tune the pretrained model, you should "unfreeze" the training loop. After that, the model will train again when `fit` is called.

In [26]:
model_loaded.freeze(is_freezed=False)

To check whether the model is frozen, use the `is_freezed` property.

In [27]:
model_loaded.is_freezed

False

## 4. Loading external pretrained models <a class="anchor" id="chapter4"></a>



In this section we introduce our pretrained embedding models.

In [28]:
HORIZON = 12

ts = load_dataset("m3_monthly")
ts = TSDataset(ts.to_pandas(features=["target"]), freq=None)
ts.head()

Feature: `target` for every segment.

| timestamp | M1000_MACRO | M1001_MACRO | M1002_MACRO | M1003_MACRO | M1004_MACRO | M1005_MACRO | M1006_MACRO | M1007_MACRO | M1008_MACRO | M1009_MACRO | ... | M992_MACRO | M993_MACRO | M994_MACRO | M995_MACRO | M996_MACRO | M997_MACRO | M998_MACRO | M999_MACRO | M99_MICRO | M9_MICRO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


Let's start with the baseline pipeline with lag features.

In [29]:
model = CatBoostMultiSegmentModel()

lag_transform = LagTransform(in_column="target", lags=list(range(HORIZON, HORIZON + 6)), out_column="lag")

pipeline = Pipeline(model=model, transforms=[lag_transform], horizon=HORIZON)
metrics_df = pipeline.backtest(ts, metrics=[SMAPE()], n_folds=1)["metrics"]

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    3.7s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    3.7s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    1.1s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    1.1s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s


In [30]:
print("SMAPE: ", metrics_df["SMAPE"].mean())

SMAPE:  15.419491588673255


When using multi-segment models, as we do here, it is often useful to encode the segment with `SegmentEncoderTransform`.

In [31]:
from etna.transforms import SegmentEncoderTransform

model = CatBoostMultiSegmentModel()

lag_transform = LagTransform(in_column="target", lags=list(range(HORIZON, HORIZON + 6)), out_column="lag")
segment_transform = SegmentEncoderTransform()

pipeline = Pipeline(model=model, transforms=[lag_transform, segment_transform], horizon=HORIZON)
metrics_df = pipeline.backtest(ts, metrics=[SMAPE()], n_folds=1)["metrics"]

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    9.4s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    9.4s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    2.7s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    2.7s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s


In [32]:
print("SMAPE: ", metrics_df["SMAPE"].mean())

SMAPE:  15.42220251168472


Segment embeddings from `EmbeddingSegmentTransform` can replace the feature produced by `SegmentEncoderTransform`. The main advantage of segment embeddings is that the trained pipeline can forecast new segments, whereas `SegmentEncoderTransform` can't handle segments that weren't present during training.
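The difference can be illustrated with a toy example on synthetic series. A label-encoding scheme (assumed here to be roughly what `SegmentEncoderTransform` does) maps each segment name seen during training to an integer, so an unseen name has no code. An embedding, by contrast, is computed from the series values themselves. The hypothetical `toy_embedding` below, built from simple summary statistics, stands in for a trained embedding model, and `M_NEW` is a made-up segment name:

```python
import numpy as np

# Synthetic training series keyed by segment name.
train_series = {
    "M1000_MACRO": np.array([1.0, 2.0, 3.0, 4.0]),
    "M1001_MACRO": np.array([5.0, 5.0, 6.0, 6.0]),
}
new_series = {"M_NEW": np.array([2.0, 3.0, 3.0, 5.0])}  # hypothetical unseen segment

# Label encoding: codes exist only for segments seen during training.
segment_codes = {name: code for code, name in enumerate(sorted(train_series))}


def toy_embedding(values: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for encode_segment: summary statistics of the series."""
    return np.array([values.mean(), values.std(), values[-1] - values[0]])


print("M_NEW" in segment_codes)             # False: no code for the new segment
print(toy_embedding(new_series["M_NEW"]))   # still computable for any series
```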

To see the available pretrained embedding models, use the `list_models` method of `TS2VecEmbeddingModel` or `TSTCCEmbeddingModel`.

In [33]:
TS2VecEmbeddingModel.list_models()

['ts2vec_tiny']

Let's load `ts2vec_tiny` pretrained model.

In [34]:
emb_model = TS2VecEmbeddingModel.load(path="ts2vec_model.zip", model_name="ts2vec_tiny")

smape_score = forecast_with_segment_embeddings(emb_model, n_folds=1)

[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    8.5s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    8.5s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    5.5s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    5.5s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s


In [35]:
print("SMAPE: ", smape_score)

SMAPE:  14.581084219761888


We get a better result than with `SegmentEncoderTransform` (SMAPE 14.58 vs 15.42), plus the ability to apply the pipeline to new segments.