Implement EmbeddingSegmentTransform and EmbeddingWindowTransform #265

Merged
merged 16 commits into from
Mar 26, 2024

egoriyaa
Collaborator

@egoriyaa egoriyaa commented Mar 7, 2024

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use the NumPy docstring format for all methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Proposed Changes

Closing issues

closes #262

github-actions bot commented Mar 7, 2024

🚀 Deployed on https://deploy-preview-265--etna-docs.netlify.app

@github-actions github-actions bot temporarily deployed to pull request March 7, 2024 06:01 Inactive
@egoriyaa egoriyaa self-assigned this Mar 7, 2024
@egoriyaa egoriyaa added this to the Embeddings milestone Mar 7, 2024
@@ -26,6 +26,12 @@ def __init__(
output_dims: int = 320,
hidden_dims: int = 64,
depth: int = 10,
encoding_window: Optional[Union[Literal["multiscale"], int]] = None,
Collaborator Author

I moved the encoding params back into the model constructor because ts2vec and tstcc have different encoding params (tstcc doesn't have any at all), so they can't be passed to the transform.

Collaborator

We should discuss possible solutions to this problem. For example, we could add a parameter to the transform, such as encoding_params or something similar. There are probably other solutions as well.
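A minimal sketch of the `encoding_params` idea, assuming the transform just stores an opaque dict and forwards it as keyword arguments to the model's encode method. The class names and the `reduce` option are hypothetical illustrations, not the etna API.

```python
from typing import Any, Dict, Optional

import numpy as np


class HypotheticalEmbeddingModel:
    """Toy stand-in for an embedding model (not the etna API)."""

    def encode_segment(self, x: np.ndarray, **encoding_params: Any) -> np.ndarray:
        # x has shape (n_segments, n_timestamps, input_dims).
        # Model-specific options arrive as keyword arguments; a model that has
        # no such options would simply accept none.
        reduce = encoding_params.get("reduce", "mean")  # hypothetical option
        if reduce == "mean":
            return x.mean(axis=1)
        if reduce == "max":
            return x.max(axis=1)
        raise ValueError(f"Unknown reduce option: {reduce!r}")


class HypotheticalEmbeddingSegmentTransform:
    """Transform that forwards an opaque dict of encoding params to the model."""

    def __init__(
        self,
        embedding_model: HypotheticalEmbeddingModel,
        encoding_params: Optional[Dict[str, Any]] = None,
    ):
        self.embedding_model = embedding_model
        # The transform never inspects these; it only passes them through.
        self.encoding_params = encoding_params if encoding_params is not None else {}

    def transform(self, x: np.ndarray) -> np.ndarray:
        return self.embedding_model.encode_segment(x, **self.encoding_params)


x = np.arange(12, dtype=float).reshape(2, 3, 2)  # 2 segments, 3 timestamps, 2 dims
model = HypotheticalEmbeddingModel()
print(HypotheticalEmbeddingSegmentTransform(model).transform(x))
print(HypotheticalEmbeddingSegmentTransform(model, {"reduce": "max"}).transform(x))
```

With this design the transform stays model-agnostic and each model owns the meaning of its keys; the trade-off is that a misspelled key is only caught if the model validates it (the toy model above silently falls back to the default).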

@@ -101,57 +129,50 @@ def __init__(

self._is_fitted: bool = False

def _prepare_data(self, x: np.ndarray) -> np.ndarray:
Collaborator Author

I moved the data preparation back into the model because ts2vec and tstcc need different data transformations.

Collaborator

Why can't tstcc accept data in the format (n_segments, n_timestamps, input_dims)? That seems like a more general approach to me.

Collaborator Author

@egoriyaa egoriyaa Mar 11, 2024

The source tstcc implementation takes input of shape (n_segments, input_dims, n_timestamps).

Collaborator

It seems easy enough to just swap the dimensions inside the fit/encode_ methods.
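A sketch of the suggested swap, assuming the model keeps the common (n_segments, n_timestamps, input_dims) layout at its public boundary and only converts right before handing data to the tstcc code. The helper name is hypothetical; `np.swapaxes` is standard NumPy and returns a view, so no data is copied.

```python
import numpy as np


def swap_time_and_features(x: np.ndarray) -> np.ndarray:
    """Swap the last two axes of a 3D array.

    (n_segments, n_timestamps, input_dims) <-> (n_segments, input_dims, n_timestamps).
    Applying it twice returns the original layout.
    """
    if x.ndim != 3:
        raise ValueError(f"Expected a 3D array, got shape {x.shape}")
    return np.swapaxes(x, 1, 2)


# Inside a hypothetical fit/encode wrapper:
x = np.random.default_rng(0).normal(size=(4, 100, 3))  # common layout
x_tstcc = swap_time_and_features(x)                     # layout the tstcc code expects
print(x_tstcc.shape)  # (4, 3, 100)
```

Because the swap is its own inverse, the same helper can convert encoder outputs back if they come out channels-first.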

@egoriyaa egoriyaa requested review from Ama16 and d-a-bunin March 7, 2024 06:32
@github-actions github-actions bot temporarily deployed to pull request March 7, 2024 06:47 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 7, 2024 07:27 Inactive

codecov bot commented Mar 7, 2024

Codecov Report

Attention: Patch coverage is 99.30556% with 1 line in your changes missing coverage. Please review.

❗ No coverage uploaded for pull request base (embeddings@61d8077).

Files                         Patch %   Lines
etna/libs/ts2vec/ts2vec.py    85.71%    1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             embeddings     #265   +/-   ##
=============================================
  Coverage              ?   88.75%           
=============================================
  Files                 ?      211           
  Lines                 ?    13814           
  Branches              ?        0           
=============================================
  Hits                  ?    12261           
  Misses                ?     1553           
  Partials              ?        0           
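A quick arithmetic check on the coverage figures above: Hits + Misses + Partials add up to Lines, and the percentages match a round-down to two decimals. The round-down rule is an assumption (we believe it is Codecov's default rounding), and the helper below is illustrative, not Codecov code.

```python
import math


def coverage_percent(hits: int, lines: int, precision: int = 2) -> float:
    """Coverage as a percentage, rounded *down* to `precision` decimals (assumed rule)."""
    factor = 10 ** precision
    return math.floor(hits / lines * 100 * factor) / factor


hits, misses, partials, lines = 12261, 1553, 0, 13814
assert hits + misses + partials == lines

print(coverage_percent(hits, lines))  # 88.75 (ordinary rounding would give 88.76)
print(coverage_percent(6, 7))         # 85.71, consistent with 1 of 7 patch lines missing in ts2vec.py
```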


class EmbeddingSegmentTransform(IrreversibleTransform):
"""Create the embedding features for the whole series out of embedding models."""

def __init__(self, in_columns: List[str], embedding_model: BaseEmbeddingModel, out_column: str = "emb"):
Collaborator

Let's give the transforms different default names for out_column.

Collaborator Author

emb_segment and emb_window?

Collaborator

I think out_column should be named after the transform that created it (with all its parameters); look at other transforms. If that seems too clumsy, let's at least make it "embedding_segment" here.
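To make the two naming options concrete, a hypothetical sketch: one helper builds a name from the transform and its parameters, the other expands a short prefix such as "embedding_segment" into one column per embedding dimension. Neither helper nor the `_{i}` suffix scheme is taken from etna; they only illustrate the choice being discussed.

```python
from typing import Any, Dict, List


def repr_style_name(transform_name: str, params: Dict[str, Any]) -> str:
    """Option 1: name the output after the transform and all its parameters."""
    args = ", ".join(f"{key} = {value!r}" for key, value in params.items())
    return f"{transform_name}({args})"


def embedding_columns(prefix: str, output_dims: int) -> List[str]:
    """Option 2: a short prefix, one column per embedding dimension."""
    return [f"{prefix}_{i}" for i in range(output_dims)]


print(repr_style_name("EmbeddingSegmentTransform", {"in_columns": ["target"]}))
print(embedding_columns("embedding_segment", 3))
```

The first option is unambiguous but grows long once a model object is among the parameters; the second stays readable but relies on distinct prefixes per transform.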

@@ -26,6 +26,12 @@ def __init__(
output_dims: int = 320,
hidden_dims: int = 64,
depth: int = 10,
encoding_window: Optional[Union[Literal["multiscale"], int]] = None,
sliding_length: Optional[int] = None,
Collaborator

I don't really like this idea. Maybe we can come up with a compromise? For example, make it possible to change these parameters explicitly.

Collaborator Author

What do you mean by "explicitly"?

"""Create the embedding features for each timestamp of series out of embedding models."""

def __init__(self, in_columns: List[str], embedding_model: BaseEmbeddingModel, out_column: str = "emb"):
"""Init EmbeddingSegmentTransform.
Collaborator

EmbeddingSegmentTransform? Shouldn't this be EmbeddingWindowTransform?

Collaborator Author

egoriyaa commented Mar 7, 2024

What should we do about the inference tests?

etna/transforms/embeddings/embedding_segment.py (outdated, resolved)
etna/transforms/embeddings/embedding_window.py (outdated, resolved)


@pytest.mark.parametrize("embedding_model", [TS2VecEmbeddingModel(input_dims=3, n_epochs=1)])
def test_second_fit_not_update_state(ts_with_exog_nan_begin, embedding_model):
Collaborator

Can you explain what this test checks? What state do you mean?

Collaborator Author

The source implementation of ts2vec doesn't support a second fit: calling fit again does nothing.
By the way, I don't think that's good (though there are some hacks to get around this restriction).

There are 3 options here:

  1. Leave everything as it is.
  2. Remove this restriction.
  3. Document the hacks for getting around it.
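Reading "state" as the learned parameters, a test like `test_second_fit_not_update_state` asserts that a second `fit` call leaves them unchanged. A minimal sketch with a toy model (hypothetical; it only mimics the fit-once behaviour described in this thread and is not the ts2vec code):

```python
import numpy as np


class FitOnceModel:
    """Toy stand-in: after the first fit, further fit calls are no-ops."""

    def __init__(self) -> None:
        self._is_fitted = False
        self.mean_ = None  # the "state" learned during fit

    def fit(self, x: np.ndarray) -> "FitOnceModel":
        if self._is_fitted:
            # Mirrors the restriction: a repeated fit leaves the state untouched.
            return self
        self.mean_ = x.mean(axis=(0, 1))
        self._is_fitted = True
        return self


def check_second_fit_does_not_update_state() -> None:
    model = FitOnceModel().fit(np.zeros((2, 5, 3)))
    state_before = model.mean_.copy()
    model.fit(np.ones((2, 5, 3)))  # different data, but the state must not change
    np.testing.assert_array_equal(model.mean_, state_before)


check_second_fit_does_not_update_state()
```

Option 2 from the list would amount to dropping the early return (or resetting `_is_fitted`), after which this check would be expected to fail.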

Collaborator

  • What is the "source implementation"?

Collaborator Author

The one in libs/ts2vec.

TS2VecEmbeddingModel(input_dims=3, output_dims=3, n_epochs=1),
],
)
def test_transform_new_segments(
Collaborator

Do we need this test if there is a similar one in the inference tests?

@@ -17,6 +17,8 @@
from etna.transforms import DensityOutliersTransform
from etna.transforms import DeseasonalityTransform
Collaborator

About failed inference tests:

  • Check that two identical calls to the models' encode methods give exactly the same result. Maybe we could set a random seed to make them more deterministic. If two subsequent calls do give the same results, the problem is probably in processing a different number of segments for some reason. Try to investigate it.
  • If we can't fix the problem, we could increase the tolerances of the equality check. The results look very similar anyway.
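A sketch of both suggestions with a toy encoder (hypothetical; a real model would also need its deep-learning framework's RNG seeded, not just NumPy's): check exact reproducibility under a fixed seed, check that encoding fewer segments matches the corresponding rows of a full run, and compare with an explicit `atol` where only approximate equality is expected.

```python
import numpy as np


def encode(x: np.ndarray, seed: int) -> np.ndarray:
    """Toy stochastic encoder: a random projection drawn from a seeded generator."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(x.shape[-1], 4))
    return x @ weights  # (n_segments, n_timestamps, 4)


x = np.random.default_rng(0).normal(size=(3, 10, 2))

# 1. With a fixed seed, two identical calls should match exactly.
np.testing.assert_array_equal(encode(x, seed=42), encode(x, seed=42))

# 2. Encoding a subset of segments should match the matching rows of a full run;
#    an explicit absolute tolerance absorbs tiny floating-point differences.
full = encode(x, seed=42)
subset = encode(x[:2], seed=42)
np.testing.assert_allclose(subset, full[:2], rtol=0, atol=1e-8)
```

If check 1 passes but check 2 fails for a real model, that points at how the number of segments is handled rather than at randomness.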

@github-actions github-actions bot temporarily deployed to pull request March 12, 2024 07:45 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 12, 2024 12:56 Inactive
Egor Baturin added 3 commits March 12, 2024 16:07
@github-actions github-actions bot temporarily deployed to pull request March 12, 2024 14:42 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 21, 2024 20:23 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 22, 2024 14:02 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 22, 2024 15:27 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 22, 2024 15:53 Inactive
@egoriyaa egoriyaa requested a review from d-a-bunin March 22, 2024 15:57
etna/libs/ts2vec/ts2vec.py (resolved)
etna/transforms/embeddings/models/ts2vec.py (outdated, resolved)
etna/transforms/embeddings/models/ts2vec.py (outdated, resolved)
etna/transforms/embeddings/models/ts2vec.py (resolved)
etna/transforms/embeddings/models/ts2vec.py (outdated, resolved)
etna/transforms/embeddings/models/ts2vec.py (outdated, resolved)
etna/transforms/embeddings/models/ts2vec.py (outdated, resolved)
etna/transforms/embeddings/models/ts2vec.py (outdated, resolved)
etna/libs/ts2vec/ts2vec.py (resolved)
@github-actions github-actions bot temporarily deployed to pull request March 23, 2024 21:03 Inactive
@egoriyaa egoriyaa requested a review from d-a-bunin March 25, 2024 07:34
etna/libs/ts2vec/losses.py (outdated, resolved)
etna/transforms/embeddings/models/ts2vec.py (outdated, resolved)
@egoriyaa egoriyaa requested a review from d-a-bunin March 25, 2024 12:29
@github-actions github-actions bot temporarily deployed to pull request March 25, 2024 12:30 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 25, 2024 15:14 Inactive
@egoriyaa egoriyaa merged commit 7761884 into embeddings Mar 26, 2024
16 checks passed
egoriyaa added a commit that referenced this pull request May 3, 2024
* Implement TS2VecModel (#253)

* add ts2vec model

* delete unnecessary utils

* add multiscale mode

* revert to common encode in model class

* lints

* reformat save method, add _is_fitted attr

* fix embeddings shapes

* fix

* one more fix

* pass numpy array to fit

* add tests checking nans in embeddings

* update changelog

---------

Co-authored-by: Egor Baturin <egoriyaa@github.com>

* Implement EmbeddingSegmentTransform and EmbeddingWindowTransform (#265)

* add transforms

* update changelog

* fix ts2vec tests

* fix

* update rst, encoding_params

* fix

* fix

* fix

* fix docstring

* add training_params

* add freeze method

* fix inference tests

* lints

* fix license

* fix license, fix docs

* fix quotes

---------

Co-authored-by: Egor Baturin <egoriyaa@github.com>

* Implement TSTCC (#294)

* add tstcc

* add einops package

* remove pd.testing in inference tests

* fix

* add verbose param, refactor logging, fix warning

* fix logging loss

* add changelog

* catch torch warning

* fix

* catch nn.Conv1d warning

---------

Co-authored-by: Egor Baturin <egoriyaa@github.com>

* lints

* fix

* Add tutorial how to work with embedding models (#304)

* fix tstcc

* move lr param from __init__ to fit

* add tutorial

* fix notebook

* update changelog

* fix changelog

* lints

* fix notebook

* update readme

* fix readme

* fix readme

* write comment in libs/ts2vec/ts2vec.py

* fix notebook

* remove multiscale option in ts2vec

* lints

* fix notebook

---------

Co-authored-by: Egor Baturin <egoriyaa@github.com>

* fix atol in inference tests

* downgrade poetry

---------

Co-authored-by: Egor Baturin <egoriyaa@github.com>

3 participants