-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement TS2VecModel #253
Conversation
🚀 Deployed on https://deploy-preview-253--etna-docs.netlify.app |
Can I change something in For example, unused imports and typings |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## embeddings #253 +/- ##
==============================================
- Coverage 89.07% 88.64% -0.43%
==============================================
Files 199 208 +9
Lines 13249 13684 +435
==============================================
+ Hits 11801 12130 +329
- Misses 1448 1554 +106 ☔ View full report in Codecov by Sentry. |
tests/test_transforms/test_embeddings/test_models/test_ts2vec.py
Outdated
Show resolved
Hide resolved
tests/test_transforms/test_embeddings/test_models/test_ts2vec.py
Outdated
Show resolved
Hide resolved
) | ||
|
||
def _prepare_data(self, df: pd.DataFrame) -> np.ndarray: | ||
"""Prepare data for the embedding model.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably clarify what this transformation does. E.g. in the comments or in the docstring.
self.embedding_model.fit(train_data=x, n_epochs=self.n_epochs, n_iters=self.n_iters, verbose=self.verbose) | ||
return self | ||
|
||
def encode_segment( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i decided to pass parameters for encode here.
the advantage is that we can call encode methods several times with different params with one model
data=x, | ||
mask=mask, | ||
encoding_window=encoding_window, | ||
causal=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ama16 is that true for encode_window
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to generate without leaks and prevent the user from generating with leaks, then yes
|
||
|
||
@pytest.mark.parametrize("output_dims", [2, 3]) | ||
def test_full_series_equal_values(simple_ts_with_exog, output_dims): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think checking that values are the same is a part of the format. Or, as I suggested above, we shouldn't make a copy of the same data.
|
||
|
||
@pytest.mark.parametrize("output_dims", [2, 3]) | ||
def test_full_series_equal_values(simple_ts_with_exog, output_dims): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are you testing in this test? That all values are equal?
Why then is there a function that cuts off the nans? On the contrary, you need to check that it generates the same values with nans
n_timestamps = len(df.index) | ||
n_segments = df.columns.get_level_values("segment").nunique() | ||
df = df.sort_index(axis=1) | ||
x = df.values.reshape((n_timestamps, n_segments, self.input_dims)).transpose(1, 0, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there are other features except for target
inside df
? Do we want to consider such a case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's assume that common interface is using Transform, where attribute in_column
is used.
Here we can add note about such case
|
||
Notes | ||
----- | ||
Model works with the index sorted in the alphabetic order. Thus, output embeddings correspond to the segments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think it should be in a Notes
section. It is really important for the user, not some implementation detail. There are two options:
- Return segments in their order in
df
. - Write in a docstring extended description:
Output embeddings correspond to the segments, sorted in alphabetical order
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ama16 why is there such condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as above
|
||
Notes | ||
----- | ||
Model works with the index sorted in the alphabetic order. Thus, output embeddings correspond to the segments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as above
* Implement TS2VecModel (#253) * add ts2vec model * delete unnecessary utils * add multiscale mode * revert to common encode in model class * lints * reformat save method, add _is_fitted attr * fix embeddings shapes * fix * one more fix * pass numpy array to fit * add tests checking nans in embeddings * update changelog --------- Co-authored-by: Egor Baturin <egoriyaa@github.com> * Implement EmbeddingSegmentTransform and EmbeddingWindowTransform (#265) * add transforms * update changelog * fix ts2vec tests * fix * update rst, encoding_params * fix * fix * fix * fix docstring * add training_params * add freeze method * fix inference tests * lints * fix lisence * fix lisence, fix docs * fix quotes --------- Co-authored-by: Egor Baturin <egoriyaa@github.com> * Implement TSTCC (#294) * add tstcc * add einops package * remove pd.testing in inference tests * fix * add verbose param, refactor logging, fix warning * fix logging loss * add changelog * catch torch warning * fix * catch nn.Conv1d warning --------- Co-authored-by: Egor Baturin <egoriyaa@github.com> * lints * fix * Add tutorial how to work with embedding models (#304) * fix tstcc * move lr param from __init__ to fit * add tutorial * fix notebook * update changelog * fix changelog * lints * fix notebook * update readme * fix readme * fix readme * write comment in libs/ts2vec/ts2vec.py * fix notebook * remove multiscale option in ts2vec * lints * fix notebook --------- Co-authored-by: Egor Baturin <egoriyaa@github.com> * fix atol in inference tests * downgrade poetry --------- Co-authored-by: Egor Baturin <egoriyaa@github.com>
Before submitting (must do checklist)
Proposed Changes
Closing issues
closes #245