Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement TS2VecModel #253

Merged
merged 12 commits into from
Mar 6, 2024
Merged

Implement TS2VecModel #253

merged 12 commits into from
Mar 6, 2024

Conversation

egoriyaa
Copy link
Collaborator

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Proposed Changes

Closing issues

closes #245

@egoriyaa egoriyaa added this to the Embeddings milestone Feb 27, 2024
@egoriyaa egoriyaa self-assigned this Feb 27, 2024
Copy link

github-actions bot commented Feb 27, 2024

🚀 Deployed on https://deploy-preview-253--etna-docs.netlify.app

@github-actions github-actions bot temporarily deployed to pull request February 27, 2024 02:06 Inactive
etna/libs/ts2vec/ts2vec.py Outdated Show resolved Hide resolved
etna/libs/ts2vec/ts2vec.py Outdated Show resolved Hide resolved
@github-actions github-actions bot temporarily deployed to pull request February 27, 2024 02:20 Inactive
@egoriyaa
Copy link
Collaborator Author

egoriyaa commented Feb 27, 2024

Can I change something in libs/? Or I should leave all that it was in a source code?

For example, unused imports and typings

Copy link

codecov bot commented Feb 27, 2024

Codecov Report

Attention: Patch coverage is 75.63218% with 106 lines in your changes are missing coverage. Please review.

Project coverage is 88.64%. Comparing base (aa9890a) to head (fa1e821).

Files Patch % Lines
etna/libs/ts2vec/ts2vec.py 66.26% 56 Missing ⚠️
etna/libs/ts2vec/utils.py 56.89% 25 Missing ⚠️
etna/libs/ts2vec/encoder.py 65.45% 19 Missing ⚠️
etna/transforms/embeddings/models/base.py 82.35% 3 Missing ⚠️
etna/libs/ts2vec/losses.py 95.55% 2 Missing ⚠️
etna/libs/ts2vec/dilated_conv.py 97.14% 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##           embeddings     #253      +/-   ##
==============================================
- Coverage       89.07%   88.64%   -0.43%     
==============================================
  Files             199      208       +9     
  Lines           13249    13684     +435     
==============================================
+ Hits            11801    12130     +329     
- Misses           1448     1554     +106     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot temporarily deployed to pull request February 27, 2024 14:59 Inactive
etna/transforms/embeddings/models/ts2vec.py Outdated Show resolved Hide resolved
etna/transforms/embeddings/models/ts2vec.py Show resolved Hide resolved
etna/transforms/embeddings/models/ts2vec.py Show resolved Hide resolved
etna/transforms/embeddings/models/ts2vec.py Outdated Show resolved Hide resolved
etna/transforms/embeddings/models/ts2vec.py Outdated Show resolved Hide resolved
)

def _prepare_data(self, df: pd.DataFrame) -> np.ndarray:
"""Prepare data for the embedding model."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably clarify what this transformation does. E.g. in the comments or in the docstring.

etna/transforms/embeddings/models/ts2vec.py Outdated Show resolved Hide resolved
etna/transforms/embeddings/models/ts2vec.py Outdated Show resolved Hide resolved
@github-actions github-actions bot temporarily deployed to pull request February 29, 2024 07:24 Inactive
self.embedding_model.fit(train_data=x, n_epochs=self.n_epochs, n_iters=self.n_iters, verbose=self.verbose)
return self

def encode_segment(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i decided to pass parameters for encode here.
the advantage is that we can call encode methods several times with different params with one model

data=x,
mask=mask,
encoding_window=encoding_window,
causal=True,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ama16 is that true for encode_window?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to generate without leaks and prevent the user from generating with leaks, then yes

@github-actions github-actions bot temporarily deployed to pull request February 29, 2024 08:51 Inactive
etna/transforms/embeddings/models/ts2vec.py Outdated Show resolved Hide resolved
etna/transforms/embeddings/models/ts2vec.py Outdated Show resolved Hide resolved
etna/transforms/embeddings/models/ts2vec.py Outdated Show resolved Hide resolved
etna/transforms/embeddings/models/ts2vec.py Show resolved Hide resolved
etna/transforms/embeddings/models/ts2vec.py Show resolved Hide resolved
etna/transforms/embeddings/models/ts2vec.py Outdated Show resolved Hide resolved


@pytest.mark.parametrize("output_dims", [2, 3])
def test_full_series_equal_values(simple_ts_with_exog, output_dims):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think checking that values are the same is a part of the format. Or, as I suggested above, we shouldn't make a copy of the same data.



@pytest.mark.parametrize("output_dims", [2, 3])
def test_full_series_equal_values(simple_ts_with_exog, output_dims):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you testing in this test? That all values are equal?
Why then is there a function that cuts off the nans? On the contrary, you need to check that it generates the same values with nans

@egoriyaa egoriyaa requested review from Ama16 and d-a-bunin March 4, 2024 06:04
@github-actions github-actions bot temporarily deployed to pull request March 4, 2024 06:48 Inactive
etna/transforms/embeddings/models/base.py Outdated Show resolved Hide resolved
n_timestamps = len(df.index)
n_segments = df.columns.get_level_values("segment").nunique()
df = df.sort_index(axis=1)
x = df.values.reshape((n_timestamps, n_segments, self.input_dims)).transpose(1, 0, 2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there are other features except for target inside df? Do we want to consider such a case?

Copy link
Collaborator Author

@egoriyaa egoriyaa Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assume that common interface is using Transform, where attribute in_column is used.
Here we can add note about such case


Notes
-----
Model works with the index sorted in the alphabetic order. Thus, output embeddings correspond to the segments,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really think it should be in a Notes section. It is really important for the user, not some implementation detail. There are two options:

  • Return segments in their order in df.
  • Write in a docstring extended description: Output embeddings correspond to the segments, sorted in alphabetical order.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ama16 why is there such condition?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above


Notes
-----
Model works with the index sorted in the alphabetic order. Thus, output embeddings correspond to the segments,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same about this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above

@egoriyaa egoriyaa requested a review from d-a-bunin March 5, 2024 11:30
@github-actions github-actions bot temporarily deployed to pull request March 5, 2024 11:33 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 5, 2024 16:26 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 6, 2024 07:54 Inactive
@egoriyaa egoriyaa merged commit 61d8077 into embeddings Mar 6, 2024
16 checks passed
@d-a-bunin d-a-bunin mentioned this pull request Mar 7, 2024
egoriyaa added a commit that referenced this pull request May 3, 2024
* Implement TS2VecModel (#253)

* add ts2vec model

* delete unnecessary utils

* add multiscale mode

* revert to common encode in model class

* lints

* reformat save method, add _is_fitted attr

* fix embeddings shapes

* fix

* one more fix

* pass numpy array to fit

* add tests checking nans in embeddings

* update changelog

---------

Co-authored-by: Egor Baturin <egoriyaa@github.com>

* Implement EmbeddingSegmentTransform and EmbeddingWindowTransform (#265)

* add transforms

* update changelog

* fix ts2vec tests

* fix

* update rst, encoding_params

* fix

* fix

* fix

* fix docstring

* add training_params

* add freeze method

* fix inference tests

* lints

* fix lisence

* fix lisence, fix docs

* fix quotes

---------

Co-authored-by: Egor Baturin <egoriyaa@github.com>

* Implement TSTCC (#294)

* add tstcc

* add einops package

* remove pd.testing in inference tests

* fix

* add verbose param, refactor logging, fix warning

* fix logging loss

* add changelog

* catch torch warning

* fix

* catch nn.Conv1d warning

---------

Co-authored-by: Egor Baturin <egoriyaa@github.com>

* lints

* fix

* Add tutorial how to work with embedding models (#304)

* fix tstcc

* move lr param from __init__ to fit

* add tutorial

* fix notebook

* update changelog

* fix changelog

* lints

* fix notebook

* update readme

* fix readme

* fix readme

* write comment in libs/ts2vec/ts2vec.py

* fix notebook

* remove multiscale option in ts2vec

* lints

* fix notebook

---------

Co-authored-by: Egor Baturin <egoriyaa@github.com>

* fix atol in inference tests

* downgrade poetry

---------

Co-authored-by: Egor Baturin <egoriyaa@github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants