Add tutorial how to work with embedding models (#304)
* fix tstcc

* move lr param from __init__ to fit

* add tutorial

* fix notebook

* update changelog

* fix changelog

* lints

* fix notebook

* update readme

* fix readme

* fix readme

* write comment in libs/ts2vec/ts2vec.py

* fix notebook

* remove multiscale option in ts2vec

* lints

* fix notebook

---------

Co-authored-by: Egor Baturin <egoriyaa@github.com>
egoriyaa and Egor Baturin committed May 2, 2024
1 parent 2aee71c commit bb538d4
Showing 9 changed files with 1,297 additions and 60 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Add `EmbeddingSegmentTransform` ([#265](https://github.com/etna-team/etna/pull/265))
 - Add `EmbeddingWindowTransform` ([#265](https://github.com/etna-team/etna/pull/265))
 - Add `TSTCCEmbeddingModel` ([#294](https://github.com/etna-team/etna/pull/294))
+- Add `210-embedding_models` example notebook ([#304](https://github.com/etna-team/etna/pull/304))
 -
 -
 -
43 changes: 22 additions & 21 deletions README.md
@@ -175,27 +175,28 @@ To set up a configuration for your project you should create a `.etna` file at t

We have also prepared a set of tutorials for an easy introduction:

| Notebook | Interactive launch |
|:----------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------:|
| [Get started](https://github.com/etna-team/etna/tree/master/examples/101-get_started.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/101-get_started.ipynb) |
| [Backtest](https://github.com/etna-team/etna/tree/master/examples/102-backtest.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/102-backtest.ipynb) |
| [EDA](https://github.com/etna-team/etna/tree/master/examples/103-EDA.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/103-EDA.ipynb) |
| [Regressors and exogenous data](https://github.com/etna-team/etna/tree/master/examples/201-exogenous_data.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/201-exogenous_data.ipynb) |
| [Deep learning models](https://github.com/etna-team/etna/tree/master/examples/202-NN_examples.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/202-NN_examples.ipynb) |
| [Ensembles](https://github.com/etna-team/etna/tree/master/examples/203-ensembles.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/203-ensembles.ipynb) |
| [Outliers](https://github.com/etna-team/etna/tree/master/examples/204-outliers.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/204-outliers.ipynb) |
| [AutoML](https://github.com/etna-team/etna/tree/master/examples/205-automl.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/205-automl.ipynb) |
| [Clustering](https://github.com/etna-team/etna/tree/master/examples/206-clustering.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/206-clustering.ipynb) |
| [Feature selection](https://github.com/etna-team/etna/blob/master/examples/207-feature_selection.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/207-feature_selection.ipynb) |
| [Forecasting strategies](https://github.com/etna-team/etna/tree/master/examples/208-forecasting_strategies.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/208-forecasting_strategies.ipynb) |
| [Mechanics of forecasting](https://github.com/etna-team/etna/blob/master/examples/209-mechanics_of_forecasting.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/209-mechanics_of_forecasting.ipynb) |
| [Custom model and transform](https://github.com/etna-team/etna/tree/master/examples/301-custom_transform_and_model.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/301-custom_transform_and_model.ipynb) |
| [Inference: using saved pipeline on a new data](https://github.com/etna-team/etna/tree/master/examples/302-inference.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/302-inference.ipynb) |
| [Hierarchical time series](https://github.com/etna-team/etna/blob/master/examples/303-hierarchical_pipeline.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/303-hierarchical_pipeline.ipynb) |
| [Forecast interpretation](https://github.com/etna-team/etna/tree/master/examples/304-forecasting_interpretation.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/304-forecasting_interpretation.ipynb) |
| [Classification](https://github.com/etna-team/etna/blob/master/examples/305-classification.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/305-classification.ipynb) |
| [Prediction intervals](https://github.com/etna-team/etna/tree/master/examples/306-prediction_intervals.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/306-prediction_intervals.ipynb) |
| [Working with misaligned data](https://github.com/etna-team/etna/tree/master/examples/307-working_with_misaligned_data.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/307-working_with_misaligned_data.ipynb) |
| Notebook | Interactive launch |
|:------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| [Get started](https://github.com/etna-team/etna/tree/master/examples/101-get_started.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/101-get_started.ipynb) |
| [Backtest](https://github.com/etna-team/etna/tree/master/examples/102-backtest.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/102-backtest.ipynb) |
| [EDA](https://github.com/etna-team/etna/tree/master/examples/103-EDA.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/103-EDA.ipynb) |
| [Regressors and exogenous data](https://github.com/etna-team/etna/tree/master/examples/201-exogenous_data.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/201-exogenous_data.ipynb) |
| [Deep learning models](https://github.com/etna-team/etna/tree/master/examples/202-NN_examples.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/202-NN_examples.ipynb) |
| [Ensembles](https://github.com/etna-team/etna/tree/master/examples/203-ensembles.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/203-ensembles.ipynb) |
| [Outliers](https://github.com/etna-team/etna/tree/master/examples/204-outliers.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/204-outliers.ipynb) |
| [AutoML](https://github.com/etna-team/etna/tree/master/examples/205-automl.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/205-automl.ipynb) |
| [Clustering](https://github.com/etna-team/etna/tree/master/examples/206-clustering.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/206-clustering.ipynb) |
| [Feature selection](https://github.com/etna-team/etna/blob/master/examples/207-feature_selection.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/207-feature_selection.ipynb) |
| [Forecasting strategies](https://github.com/etna-team/etna/tree/master/examples/208-forecasting_strategies.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/208-forecasting_strategies.ipynb) |
| [Mechanics of forecasting](https://github.com/etna-team/etna/blob/master/examples/209-mechanics_of_forecasting.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/209-mechanics_of_forecasting.ipynb) |
| [Embedding models](https://github.com/etna-team/etna/blob/master/examples/210-embedding_models.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/210-embedding_models.ipynb) |
| [Custom model and transform](https://github.com/etna-team/etna/tree/master/examples/301-custom_transform_and_model.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/301-custom_transform_and_model.ipynb) |
| [Inference: using saved pipeline on a new data](https://github.com/etna-team/etna/tree/master/examples/302-inference.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/302-inference.ipynb) |
| [Hierarchical time series](https://github.com/etna-team/etna/blob/master/examples/303-hierarchical_pipeline.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/303-hierarchical_pipeline.ipynb) |
| [Forecast interpretation](https://github.com/etna-team/etna/tree/master/examples/304-forecasting_interpretation.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/304-forecasting_interpretation.ipynb) |
| [Classification](https://github.com/etna-team/etna/blob/master/examples/305-classification.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/305-classification.ipynb) |
| [Prediction intervals](https://github.com/etna-team/etna/tree/master/examples/306-prediction_intervals.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/306-prediction_intervals.ipynb) |
| [Working with misaligned data](https://github.com/etna-team/etna/tree/master/examples/307-working_with_misaligned_data.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/307-working_with_misaligned_data.ipynb) |

## Documentation

9 changes: 9 additions & 0 deletions docs/source/tutorials.rst
@@ -125,6 +125,15 @@ Intermediate
         ^^^
         How pipelines are making forecasts under the hood

+    .. grid-item-card:: Embedding models
+        :text-align: center
+        :link: tutorials/210-embedding_models
+        :link-type: doc
+        :class-header: card-tutorial-intermediate
+
+        ^^^
+        How to use embedding models
+
 Advanced
 --------

29 changes: 6 additions & 23 deletions etna/libs/ts2vec/ts2vec.py
@@ -22,7 +22,8 @@
 SOFTWARE.
 """
 # Note: Copied from ts2vec repository (https://github.com/yuezhihan/ts2vec/tree/main)
-# Removed skipping training loop when model is already pretrained
+# Removed skipping training loop when model is already pretrained. Removed "multiscale" encode option.
+# Move lr parameter to fit method

 import torch
 import torch.nn.functional as F
@@ -45,7 +46,6 @@ def __init__(
         hidden_dims=64,
         depth=10,
         device='cuda',
-        lr=0.001,
         batch_size=16,
         max_train_length=None,
         temporal_unit=0,
@@ -60,7 +60,6 @@ def __init__(
             hidden_dims (int): The hidden dimension of the encoder.
             depth (int): The number of hidden residual blocks in the encoder.
             device (str): The gpu used for training and inference.
-            lr (float): The learning rate.
             batch_size (int): The batch size.
             max_train_length (Union[int, NoneType]): The maximum allowed sequence length for training. For sequence with a length greater than <max_train_length>, it would be cropped into some sequences, each of which has a length less than <max_train_length>.
             temporal_unit (int): The minimum unit to perform temporal contrast. When training on a very long sequence, this param helps to reduce the cost of time and memory.
@@ -70,7 +69,6 @@ def __init__(

         super().__init__()
         self.device = device
-        self.lr = lr
         self.batch_size = batch_size
         self.max_train_length = max_train_length
         self.temporal_unit = temporal_unit
@@ -86,11 +84,12 @@ def __init__(
         self.n_epochs = 0
         self.n_iters = 0

-    def fit(self, train_data, n_epochs=None, n_iters=None, verbose=False):
+    def fit(self, train_data, lr=0.001, n_epochs=None, n_iters=None, verbose=False):
         ''' Training the TS2Vec model.
         Args:
             train_data (numpy.ndarray): The training data. It should have a shape of (n_instance, n_timestamps, n_features). All missing data should be set to NaN.
+            lr (float): The learning rate.
             n_epochs (Union[int, NoneType]): The number of epochs. When this reaches, the training stops.
             n_iters (Union[int, NoneType]): The number of iterations. When this reaches, the training stops. If both n_epochs and n_iters are not specified, a default setting would be used that sets n_iters to 200 for a dataset with size <= 100000, 600 otherwise.
             verbose (bool): Whether to print the training loss after each epoch.
@@ -119,7 +118,7 @@ def fit(self, train_data, n_epochs=None, n_iters=None, verbose=False):
         train_loader = DataLoader(train_dataset, batch_size=min(self.batch_size, len(train_dataset)), shuffle=True,
                                   drop_last=True)

-        optimizer = torch.optim.AdamW(self._net.parameters(), lr=self.lr)
+        optimizer = torch.optim.AdamW(self._net.parameters(), lr=lr)

         loss_log = []
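
The hunks above move the learning rate from the constructor to `fit`, so a single model instance can be trained in several stages with different rates. A minimal sketch of that pattern follows, assuming a hypothetical `TinyModel` that uses plain gradient descent instead of the real TS2Vec network and `AdamW`; none of these names come from etna.

```python
class TinyModel:
    """Hypothetical stand-in for TS2Vec: learns one weight w so that w * x ~ y."""

    def __init__(self, batch_size=16):
        # Structural settings stay in the constructor.
        self.batch_size = batch_size
        self.w = 0.0
        self.n_epochs = 0

    def fit(self, xs, ys, lr=0.001, n_epochs=1):
        # Optimisation settings such as lr are supplied per call to fit.
        for _ in range(n_epochs):
            grad = sum(2 * (self.w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
            self.w -= lr * grad
            self.n_epochs += 1
        return self


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
model = TinyModel()
model.fit(xs, ys, lr=0.05, n_epochs=20)  # coarse first stage
model.fit(xs, ys, lr=0.01, n_epochs=20)  # finer second stage on the same instance
```

Because `lr` is no longer stored on the instance, each call to `fit` states its own rate explicitly.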

@@ -214,22 +213,6 @@ def _eval_with_pooling(self, x, mask=None, slicing=None, encoding_window=None):
             if slicing is not None:
                 out = out[:, slicing]

-        elif encoding_window == 'multiscale':
-            p = 0
-            reprs = []
-            while (1 << p) + 1 < out.size(1):
-                t_out = F.max_pool1d(
-                    out.transpose(1, 2),
-                    kernel_size=(1 << (p + 1)) + 1,
-                    stride=1,
-                    padding=1 << p
-                ).transpose(1, 2)
-                if slicing is not None:
-                    t_out = t_out[:, slicing]
-                reprs.append(t_out)
-                p += 1
-            out = torch.cat(reprs, dim=-1)
-
         else:
             if slicing is not None:
                 out = out[:, slicing]
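
For reference, the removed `'multiscale'` branch max-pooled the per-timestamp representation with kernel sizes 3, 5, 9, ... (kernel `2^(p+1) + 1`, padding `2^p`, stride 1) and concatenated the results along the feature axis. A rough pure-Python sketch of that idea for a single instance, without `torch`; the helper names are hypothetical and the `slicing` step is left out.

```python
def centered_max_pool(seq, half_width):
    """Max over [t - half_width, t + half_width] for every timestep t (stride 1).

    seq is a list of feature vectors, one per timestep. Positions outside the
    series are ignored, which has the same effect as padding with -inf.
    """
    n = len(seq)
    pooled = []
    for t in range(n):
        window = seq[max(0, t - half_width): min(n, t + half_width + 1)]
        pooled.append([max(col) for col in zip(*window)])
    return pooled


def multiscale_pool(seq):
    """Concatenate pooled features for half-widths 1, 2, 4, ... along the feature axis."""
    scales = []
    p = 0
    while (1 << p) + 1 < len(seq):
        scales.append(centered_max_pool(seq, 1 << p))
        p += 1
    return [sum((scale[t] for scale in scales), []) for t in range(len(seq))]


series = [[1.0], [5.0], [2.0], [0.0], [3.0]]  # 5 timesteps, 1 feature
result = multiscale_pool(series)
```

Note that the number of scales, and therefore the output width, depends on the length of the series.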
@@ -243,7 +226,7 @@ def encode(self, data, mask=None, encoding_window=None, causal=False, sliding_le
         Args:
             data (numpy.ndarray): This should have a shape of (n_instance, n_timestamps, n_features). All missing data should be set to NaN.
             mask (str): The mask used by encoder can be specified with this parameter. This can be set to 'binomial', 'continuous', 'all_true', 'all_false' or 'mask_last'.
-            encoding_window (Union[str, int]): When this param is specified, the computed representation would the max pooling over this window. This can be set to 'full_series', 'multiscale' or an integer specifying the pooling kernel size.
+            encoding_window (Union[str, int]): When this param is specified, the computed representation would the max pooling over this window. This can be set to 'full_series' or an integer specifying the pooling kernel size.
             causal (bool): When this param is set to True, the future information would not be encoded into representation of each timestamp.
             sliding_length (Union[int, NoneType]): The length of sliding window. When this param is specified, a sliding inference would be applied on the time series.
             sliding_padding (int): This param specifies the contextual data length used for inference every sliding windows.
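
The two remaining `encoding_window` settings can be illustrated on a toy representation. This is only a sketch of the pooling semantics described in the docstring: `pool_representation` is a hypothetical helper, and the window alignment for integer kernels (start at `t - k // 2`) is an assumption modelled on a stride-1 max pool with padding `k // 2`.

```python
def pool_representation(seq, encoding_window=None):
    """Pool a (n_timestamps, n_features) nested list the way encoding_window describes."""
    if encoding_window is None:
        return seq  # keep one vector per timestamp
    if encoding_window == "full_series":
        # A single vector: per-feature maximum over the whole series.
        return [[max(col) for col in zip(*seq)]]
    if isinstance(encoding_window, int):
        n = len(seq)
        pooled = []
        for t in range(n):
            start = t - encoding_window // 2
            window = seq[max(0, start): min(n, start + encoding_window)]
            pooled.append([max(col) for col in zip(*window)])
        return pooled
    raise ValueError(f"Unsupported encoding_window: {encoding_window!r}")


series = [[1.0, 0.0], [4.0, 2.0], [3.0, 5.0], [2.0, 1.0]]  # 4 timesteps, 2 features
whole = pool_representation(series, "full_series")
local = pool_representation(series, 3)
```

With `'full_series'` the series collapses to one vector, while an integer window keeps one vector per timestamp.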
2 changes: 1 addition & 1 deletion etna/libs/tstcc/augmentations.py
@@ -66,7 +66,7 @@ def permutation(x, max_segments=5, seg_mode="random"):
             else:
                 splits = np.array_split(orig_steps, num_segs[i])
             # add `np.asarray(splits, dtype=object)` instead of `splits` due to warning about different length of arrays
-            warp = np.concatenate(np.random.permutation(np.asarray(splits, dtype=object))).ravel()
+            warp = np.concatenate(np.random.permutation(np.asarray(splits, dtype=object))).ravel().astype(int)
             ret[i] = pat[0, warp]
         else:
             ret[i] = pat
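
The cast on the `warp` line matters because NumPy accepts only integer (or boolean) arrays as indices. When all segments have the same length, the object array is two-dimensional and the concatenated result keeps `dtype=object`, which cannot be used in `pat[0, warp]`. A small sketch of the segment permutation under that assumption; `permute_segments` and the explicit `rng` argument are illustrative, not part of the library.

```python
import numpy as np


def permute_segments(pat, num_segs, rng):
    """Split the time axis into num_segs chunks, shuffle the chunks, reindex channel 0."""
    orig_steps = np.arange(pat.shape[1])
    splits = np.array_split(orig_steps, num_segs)
    # dtype=object avoids the ragged-array warning when chunk lengths differ.
    shuffled = rng.permutation(np.asarray(splits, dtype=object))
    # Cast to int so the result is a valid index array even if it came out as object dtype.
    warp = np.concatenate(shuffled).ravel().astype(int)
    return pat[0, warp], warp


rng = np.random.default_rng(0)
pat = np.arange(12, dtype=float).reshape(2, 6)  # 2 channels, 6 timesteps
permuted, warp = permute_segments(pat, num_segs=3, rng=rng)
```

The returned `warp` is a reordering of the original time indices, so `permuted` contains the same values as the first channel in shuffled segment order.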