Implement EmbeddingSegmentTransform and EmbeddingWindowTransform #265

egoriyaa · 2024-03-07T05:56:50Z

Before submitting (must do checklist)

Did you read the contribution guide?
Did you update the docs? We use Numpy format for all the methods and classes.
Did you write any new necessary tests?
Did you update the CHANGELOG?

Proposed Changes

Closing issues

closes #262

github-actions · 2024-03-07T06:01:23Z

🚀 Deployed on https://deploy-preview-265--etna-docs.netlify.app

egoriyaa · 2024-03-07T06:28:31Z

etna/transforms/embeddings/models/ts2vec.py

@@ -26,6 +26,12 @@ def __init__(
        output_dims: int = 320,
        hidden_dims: int = 64,
        depth: int = 10,
+        encoding_window: Optional[Union[Literal["multiscale"], int]] = None,


I moved the encode params back into the model constructor because ts2vec and tstcc have different encode params (tstcc doesn't have any at all), so they can't be passed to the transform.

We should discuss possible solutions to this problem. For example, we could introduce a transform parameter such as encoding_params, or something similar. There are probably other solutions as well.
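One way to read the encoding_params suggestion, as a rough sketch only: the transform stores a dict and forwards it to the model's encode call, so models with different options (or none) share one transform signature. All class names below (ToyTS2Vec, ToyTSTCC, ToySegmentTransform) are hypothetical stand-ins, not the classes in this PR, and the "embeddings" are trivial placeholders.

```python
from typing import Any, Dict, List, Optional


class ToyTS2Vec:
    """Hypothetical stand-in for a model whose encode step takes extra options."""

    def encode_segment(self, x: List[float], encoding_window: Optional[int] = None) -> List[float]:
        # Placeholder "embedding": mean over the last `encoding_window` points.
        window = encoding_window or len(x)
        return [sum(x[-window:]) / window]


class ToyTSTCC:
    """Hypothetical stand-in for a model whose encode step takes no options."""

    def encode_segment(self, x: List[float]) -> List[float]:
        # Placeholder "embedding": the maximum value.
        return [max(x)]


class ToySegmentTransform:
    """Forwards model-specific options without knowing what they are."""

    def __init__(self, embedding_model: Any, encoding_params: Optional[Dict[str, Any]] = None):
        self.embedding_model = embedding_model
        self.encoding_params = encoding_params or {}

    def transform(self, x: List[float]) -> List[float]:
        # The dict is unpacked into keyword arguments of the model's encode call.
        return self.embedding_model.encode_segment(x, **self.encoding_params)


with_options = ToySegmentTransform(ToyTS2Vec(), {"encoding_window": 2})
without_options = ToySegmentTransform(ToyTSTCC())
```

The trade-off in this sketch is that a misspelled key only fails at encode time, not when the transform is constructed.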

egoriyaa · 2024-03-07T06:30:57Z

etna/transforms/embeddings/models/ts2vec.py

@@ -101,57 +129,50 @@ def __init__(

        self._is_fitted: bool = False

+    def _prepare_data(self, x: np.ndarray) -> np.ndarray:


I moved the data modification back into the model because ts2vec and tstcc need different data transformations.

Why can't tstcc accept data in the format (n_segments, n_timestamps, input_dims)? That seems like a more general approach to me.

The source tstcc implementation takes (n_segments, input_dims, n_timestamps).

It seems very easy to just swap the dimensions inside the fit/encode_ methods.
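For reference, the dimension swap discussed here is a one-liner in NumPy. This is only a sketch: the helper names to_channels_first and to_channels_last are made up for illustration and are not part of the PR.

```python
import numpy as np


def to_channels_first(x: np.ndarray) -> np.ndarray:
    """(n_segments, n_timestamps, input_dims) -> (n_segments, input_dims, n_timestamps)."""
    return np.swapaxes(x, 1, 2)


def to_channels_last(x: np.ndarray) -> np.ndarray:
    """Inverse of to_channels_first; swapping the same two axes again restores the layout."""
    return np.swapaxes(x, 1, 2)


# 4 segments, 100 timestamps, 3 input dimensions
batch = np.arange(4 * 100 * 3, dtype=float).reshape(4, 100, 3)
swapped = to_channels_first(batch)
restored = to_channels_last(swapped)
```

np.swapaxes returns a view where possible, so the swap itself does not copy the data.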

codecov · 2024-03-07T08:03:47Z

Codecov Report

Attention: Patch coverage is 99.30556%, with 1 line in your changes missing coverage. Please review.

❗ No coverage uploaded for pull request base (embeddings@61d8077).

Files	Patch %	Lines
etna/libs/ts2vec/ts2vec.py	85.71%	1 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff              @@
##             embeddings     #265   +/-   ##
=============================================
  Coverage              ?   88.75%           
=============================================
  Files                 ?      211           
  Lines                 ?    13814           
  Branches              ?        0           
=============================================
  Hits                  ?    12261           
  Misses                ?     1553           
  Partials              ?        0

Ama16 · 2024-03-07T10:29:10Z

etna/transforms/embeddings/embedding_segment.py

+class EmbeddingSegmentTransform(IrreversibleTransform):
+    """Create the embedding features for the whole series out of embedding models."""
+
+    def __init__(self, in_columns: List[str], embedding_model: BaseEmbeddingModel, out_column: str = "emb"):


Let's give the two transforms different default names for out_column.

emb_segment and emb_window?

I think out_column should be named after the transform that created it (with all its parameters); look at the other transforms. If that seems too unwieldy, let's at least make it "embedding_segment" here.
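A rough illustration of the "name the column after the transform" idea, assuming the default is derived from the object's repr. ToyTransform and get_out_column are hypothetical names used only for this sketch; the real transforms may build their names differently.

```python
from typing import List, Optional


class ToyTransform:
    """Hypothetical transform whose default output name encodes its parameters."""

    def __init__(self, in_columns: List[str], out_column: Optional[str] = None):
        self.in_columns = in_columns
        self.out_column = out_column

    def __repr__(self) -> str:
        return f"{type(self).__name__}(in_columns={self.in_columns!r})"

    def get_out_column(self) -> str:
        # An explicit name wins; otherwise fall back to the repr of the transform.
        return self.out_column if self.out_column is not None else repr(self)


default_name = ToyTransform(in_columns=["target"]).get_out_column()
explicit_name = ToyTransform(in_columns=["target"], out_column="embedding_segment").get_out_column()
```

With a repr-based default, two transforms with different parameters cannot silently write to the same column.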

Ama16 · 2024-03-07T11:48:14Z

etna/transforms/embeddings/models/ts2vec.py

@@ -26,6 +26,12 @@ def __init__(
        output_dims: int = 320,
        hidden_dims: int = 64,
        depth: int = 10,
+        encoding_window: Optional[Union[Literal["multiscale"], int]] = None,
+        sliding_length: Optional[int] = None,


I don't really like this idea. Maybe we can come up with a compromise? For example, we could make it possible to change these parameters explicitly.

What does "explicitly" mean?

Ama16 · 2024-03-07T13:01:19Z

etna/transforms/embeddings/embedding_window.py

+    """Create the embedding features for each timestamp of series out of embedding models."""
+
+    def __init__(self, in_columns: List[str], embedding_model: BaseEmbeddingModel, out_column: str = "emb"):
+        """Init EmbeddingSegmentTransform.


EmbeddingSegmentTransform?

egoriyaa · 2024-03-07T14:41:31Z

What shall we do with inference tests?

d-a-bunin · 2024-03-07T16:06:33Z

etna/transforms/embeddings/embedding_segment.py

+class EmbeddingSegmentTransform(IrreversibleTransform):
+    """Create the embedding features for the whole series out of embedding models."""
+
+    def __init__(self, in_columns: List[str], embedding_model: BaseEmbeddingModel, out_column: str = "emb"):


I think out_column should be named after the transform that created it (with all its parameters); look at the other transforms. If that seems too unwieldy, let's at least make it "embedding_segment" here.

d-a-bunin · 2024-03-07T16:12:40Z

etna/transforms/embeddings/models/ts2vec.py

@@ -101,57 +129,50 @@ def __init__(

        self._is_fitted: bool = False

+    def _prepare_data(self, x: np.ndarray) -> np.ndarray:


Why can't tstcc accept data in the format (n_segments, n_timestamps, input_dims)? That seems like a more general approach to me.

d-a-bunin · 2024-03-07T16:14:09Z

etna/transforms/embeddings/models/ts2vec.py

@@ -26,6 +26,12 @@ def __init__(
        output_dims: int = 320,
        hidden_dims: int = 64,
        depth: int = 10,
+        encoding_window: Optional[Union[Literal["multiscale"], int]] = None,


We should discuss possible solutions to this problem. For example, we could introduce a transform parameter such as encoding_params, or something similar. There are probably other solutions as well.

d-a-bunin · 2024-03-07T16:16:47Z

tests/test_transforms/test_embeddings/test_embedding_segment_transform.py

+
+
+@pytest.mark.parametrize("embedding_model", [TS2VecEmbeddingModel(input_dims=3, n_epochs=1)])
+def test_second_fit_not_update_state(ts_with_exog_nan_begin, embedding_model):


Can you explain what this test does? What state do you mean?

The source implementation of ts2vec does not support a second fit: calling fit again does nothing. By the way, I don't think that is good (though there are some hacks to get around this restriction).

There are three options in this situation:

1. Leave everything as it is.
2. Remove this restriction.
3. Document the hacks for getting around it.

What do you mean by the source implementation?

The one in libs/ts2vec.
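To make "second fit does not update state" concrete, here is a toy sketch of the behaviour such a test would check. ToyModel is hypothetical, and the _is_fitted guard is an assumption made for illustration; the thread does not say how the ts2vec code actually ignores the second call.

```python
from typing import List, Optional


class ToyModel:
    """Hypothetical model whose fit is a no-op once it has been fitted."""

    def __init__(self) -> None:
        self._is_fitted: bool = False
        self.weights: Optional[List[float]] = None

    def fit(self, x: List[float]) -> "ToyModel":
        if self._is_fitted:
            # Assumed guard: a second call leaves the learned state untouched.
            return self
        self.weights = [sum(x) / len(x)]
        self._is_fitted = True
        return self


model = ToyModel()
model.fit([1.0, 2.0, 3.0])
weights_after_first_fit = list(model.weights)

# Fitting on different data must not change anything.
model.fit([10.0, 20.0, 30.0])
weights_after_second_fit = list(model.weights)
```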

d-a-bunin · 2024-03-07T16:20:35Z

tests/test_transforms/test_embeddings/test_embedding_segment_transform.py

+        TS2VecEmbeddingModel(input_dims=3, output_dims=3, n_epochs=1),
+    ],
+)
+def test_transform_new_segments(


Do we need this test if there is a similar test in inference tests?

d-a-bunin · 2024-03-07T16:27:50Z

tests/test_transforms/test_inference/test_transform.py

@@ -17,6 +17,8 @@
 from etna.transforms import DensityOutliersTransform
 from etna.transforms import DeseasonalityTransform


About failed inference tests:

Check that two identical calls to the model's encode methods give exactly the same result. Maybe we could set a random seed to make them more deterministic. If two subsequent calls do give the same results, the problem seems to be in processing a different number of segments for some reason. Try to investigate it.

If we can't fix the problem, we could increase the tolerance used when checking equality. The results seem to be very similar anyway.
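Both checks can be sketched with NumPy's testing helpers. The toy_encode function below is a hypothetical stand-in for a model's encode call, with a small amount of seeded noise to imitate nondeterminism; the tolerance value is arbitrary.

```python
from typing import Optional

import numpy as np


def toy_encode(x: np.ndarray, seed: Optional[int] = None) -> np.ndarray:
    """Hypothetical encoder: a fixed mapping plus tiny random noise."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=1e-6, size=x.shape)
    return x * 2.0 + noise


x = np.arange(6, dtype=float).reshape(2, 3)

# Check 1: with the same seed, two identical calls must match exactly.
first = toy_encode(x, seed=0)
second = toy_encode(x, seed=0)
np.testing.assert_array_equal(first, second)

# Check 2: without a seed, compare with an absolute tolerance instead.
unseeded = toy_encode(x)
np.testing.assert_allclose(unseeded, first, atol=1e-4)
```

If the first check passes but the comparison across different numbers of segments still fails, the discrepancy comes from the batching rather than from randomness.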

* Implement TS2VecModel (#253)
* add ts2vec model
* delete unnecessary utils
* add multiscale mode
* revert to common encode in model class
* lints
* reformat save method, add _is_fitted attr
* fix embeddings shapes
* fix
* one more fix
* pass numpy array to fit
* add tests checking nans in embeddings
* update changelog
---------
Co-authored-by: Egor Baturin <egoriyaa@github.com>

* Implement EmbeddingSegmentTransform and EmbeddingWindowTransform (#265)
* add transforms
* update changelog
* fix ts2vec tests
* fix
* update rst, encoding_params
* fix
* fix
* fix
* fix docstring
* add training_params
* add freeze method
* fix inference tests
* lints
* fix lisence
* fix lisence, fix docs
* fix quotes
---------
Co-authored-by: Egor Baturin <egoriyaa@github.com>

* Implement TSTCC (#294)
* add tstcc
* add einops package
* remove pd.testing in inference tests
* fix
* add verbose param, refactor logging, fix warning
* fix logging loss
* add changelog
* catch torch warning
* fix
* catch nn.Conv1d warning
---------
Co-authored-by: Egor Baturin <egoriyaa@github.com>

* lints
* fix
* Add tutorial how to work with embedding models (#304)
* fix tstcc
* move lr param from __init__ to fit
* add tutorial
* fix notebook
* update changelog
* fix changelog
* lints
* fix notebook
* update readme
* fix readme
* fix readme
* write comment in libs/ts2vec/ts2vec.py
* fix notebook
* remove multiscale option in ts2vec
* lints
* fix notebook
---------
Co-authored-by: Egor Baturin <egoriyaa@github.com>

* fix atol in inference tests
* downgrade poetry
---------
Co-authored-by: Egor Baturin <egoriyaa@github.com>

add transforms

3442006

github-actions bot temporarily deployed to pull request March 7, 2024 06:01 Inactive

egoriyaa self-assigned this Mar 7, 2024

egoriyaa added this to the Embeddings milestone Mar 7, 2024

update changelog

103d633

egoriyaa commented Mar 7, 2024

egoriyaa requested review from Ama16 and d-a-bunin March 7, 2024 06:32

github-actions bot temporarily deployed to pull request March 7, 2024 06:47 Inactive

fix ts2vec tests

4315de7

github-actions bot temporarily deployed to pull request March 7, 2024 07:27 Inactive

Ama16 requested changes Mar 7, 2024

d-a-bunin requested changes Mar 7, 2024

d-a-bunin reviewed Mar 7, 2024

fix

090d57e

github-actions bot temporarily deployed to pull request March 12, 2024 07:45 Inactive

update rst, encoding_params

3fd2d61

github-actions bot temporarily deployed to pull request March 12, 2024 12:56 Inactive

Egor Baturin added 3 commits March 12, 2024 16:07

fix

4456252

fix

f30b37a

fix

64d9c1f

egoriyaa requested review from d-a-bunin and Ama16 March 12, 2024 13:13

d-a-bunin reviewed Mar 12, 2024

etna/transforms/embeddings/embedding_segment.py (outdated)

tests/test_transforms/test_embeddings/test_embedding_window_transform.py (outdated)

github-actions bot temporarily deployed to pull request March 12, 2024 14:42 Inactive

fix docstring

4286328

github-actions bot temporarily deployed to pull request March 21, 2024 20:23 Inactive

add training_params

bd0be16

github-actions bot temporarily deployed to pull request March 22, 2024 14:02 Inactive

add freeze method

7bf5fd1

github-actions bot temporarily deployed to pull request March 22, 2024 15:27 Inactive

Egor Baturin added 2 commits March 22, 2024 18:45

fix inference tests

dc5703c

lints

950788f

github-actions bot temporarily deployed to pull request March 22, 2024 15:53 Inactive

egoriyaa requested a review from d-a-bunin March 22, 2024 15:57

d-a-bunin requested changes Mar 22, 2024

fix lisence

acf63cc

github-actions bot temporarily deployed to pull request March 23, 2024 21:03 Inactive

egoriyaa requested a review from d-a-bunin March 25, 2024 07:34

d-a-bunin reviewed Mar 25, 2024

etna/libs/ts2vec/losses.py (outdated)

etna/transforms/embeddings/models/ts2vec.py (outdated)

d-a-bunin requested changes Mar 25, 2024

etna/transforms/embeddings/models/ts2vec.py

etna/transforms/embeddings/models/ts2vec.py

etna/libs/ts2vec/ts2vec.py

fix lisence, fix docs

4f940ad

egoriyaa requested a review from d-a-bunin March 25, 2024 12:29

github-actions bot temporarily deployed to pull request March 25, 2024 12:30 Inactive

d-a-bunin approved these changes Mar 25, 2024

fix quotes

8058c7e

github-actions bot temporarily deployed to pull request March 25, 2024 15:14 Inactive

egoriyaa merged commit 7761884 into embeddings Mar 26, 2024
16 checks passed
