-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement EmbeddingSegmentTransform and EmbeddingWindowTransform #265
Conversation
🚀 Deployed on https://deploy-preview-265--etna-docs.netlify.app |
@@ -26,6 +26,12 @@ def __init__( | |||
output_dims: int = 320, | |||
hidden_dims: int = 64, | |||
depth: int = 10, | |||
encoding_window: Optional[Union[Literal["multiscale"], int]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I return encode params in constructor of model because ts2vec и tstcс have different encode params (tstcc doesnt have at all) and they cant be passed to transform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should discuss possible solutions for this problem. For example, we could introduce in transform parameter like: encoding_params
or smth like this. Probably, there are also other solutions.
@@ -101,57 +129,50 @@ def __init__( | |||
|
|||
self._is_fitted: bool = False | |||
|
|||
def _prepare_data(self, x: np.ndarray) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I return modification of data to model because ts2vec и tstcс have different data transformations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we in tstcc accept data in a format (n_segments, n_timestamps, input_dims)? It seems a more general approach to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source tstcc implementation gets (n_segments, input_dims, n_timestamps)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems very easy just to swap dimensions inside the fit
/encode_
methods.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## embeddings #265 +/- ##
=============================================
Coverage ? 88.75%
=============================================
Files ? 211
Lines ? 13814
Branches ? 0
=============================================
Hits ? 12261
Misses ? 1553
Partials ? 0 ☔ View full report in Codecov by Sentry. |
class EmbeddingSegmentTransform(IrreversibleTransform): | ||
"""Create the embedding features for the whole series out of embedding models.""" | ||
|
||
def __init__(self, in_columns: List[str], embedding_model: BaseEmbeddingModel, out_column: str = "emb"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets give different default names for out_column for transforms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emb_segment
and emb_window
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think out_column
should be named like transform that created it (with all its parameters). Look at other transforms. If it seems too bad, let's at least make it "embedding_segment" here.
@@ -26,6 +26,12 @@ def __init__( | |||
output_dims: int = 320, | |||
hidden_dims: int = 64, | |||
depth: int = 10, | |||
encoding_window: Optional[Union[Literal["multiscale"], int]] = None, | |||
sliding_length: Optional[int] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like this idea. Maybe we can come up with a compromise? For example, make it possible to explicitly change these parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what means "explicitly"?
"""Create the embedding features for each timestamp of series out of embedding models.""" | ||
|
||
def __init__(self, in_columns: List[str], embedding_model: BaseEmbeddingModel, out_column: str = "emb"): | ||
"""Init EmbeddingSegmentTransform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EmbeddingSegmentTransform?
What shall we do with inference tests? |
class EmbeddingSegmentTransform(IrreversibleTransform): | ||
"""Create the embedding features for the whole series out of embedding models.""" | ||
|
||
def __init__(self, in_columns: List[str], embedding_model: BaseEmbeddingModel, out_column: str = "emb"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think out_column
should be named like transform that created it (with all its parameters). Look at other transforms. If it seems too bad, let's at least make it "embedding_segment" here.
@@ -101,57 +129,50 @@ def __init__( | |||
|
|||
self._is_fitted: bool = False | |||
|
|||
def _prepare_data(self, x: np.ndarray) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we in tstcc accept data in a format (n_segments, n_timestamps, input_dims)? It seems a more general approach to me.
@@ -26,6 +26,12 @@ def __init__( | |||
output_dims: int = 320, | |||
hidden_dims: int = 64, | |||
depth: int = 10, | |||
encoding_window: Optional[Union[Literal["multiscale"], int]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should discuss possible solutions for this problem. For example, we could introduce in transform parameter like: encoding_params
or smth like this. Probably, there are also other solutions.
|
||
|
||
@pytest.mark.parametrize("embedding_model", [TS2VecEmbeddingModel(input_dims=3, n_epochs=1)]) | ||
def test_second_fit_not_update_state(ts_with_exog_nan_begin, embedding_model): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what does this test? What state do you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source implementation of ts2vec do not support 2nd fit. it will not do nothing.
by the way it is not good in my opinion (but there are some hacks how to overcome this restriction)
3 ways in this situation
- leave everything as it was
- remove this restriction
- write about hacks how to overcome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- What is a source implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in libs/ts2vec
TS2VecEmbeddingModel(input_dims=3, output_dims=3, n_epochs=1), | ||
], | ||
) | ||
def test_transform_new_segments( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this test if there is a similar test in inference tests?
tests/test_transforms/test_embeddings/test_embedding_segment_transform.py
Show resolved
Hide resolved
tests/test_transforms/test_embeddings/test_embedding_window_transform.py
Show resolved
Hide resolved
@@ -17,6 +17,8 @@ | |||
from etna.transforms import DensityOutliersTransform | |||
from etna.transforms import DeseasonalityTransform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About failed inference tests:
- Check that two identical calls to encode models gives exactly the same result. May be we could set some random seed to make it more deterministic. If two subsequent calls give the same results it seems like the problem in procoessing different number of segments for some reason. Try to investigate it.
- If we can't fix the problem then we could increase the margins for checking equality. It seems like the results are very similar anyway.
tests/test_transforms/test_embeddings/test_embedding_window_transform.py
Outdated
Show resolved
Hide resolved
* Implement TS2VecModel (#253) * add ts2vec model * delete unnecessary utils * add multiscale mode * revert to common encode in model class * lints * reformat save method, add _is_fitted attr * fix embeddings shapes * fix * one more fix * pass numpy array to fit * add tests checking nans in embeddings * update changelog --------- Co-authored-by: Egor Baturin <egoriyaa@github.com> * Implement EmbeddingSegmentTransform and EmbeddingWindowTransform (#265) * add transforms * update changelog * fix ts2vec tests * fix * update rst, encoding_params * fix * fix * fix * fix docstring * add training_params * add freeze method * fix inference tests * lints * fix lisence * fix lisence, fix docs * fix quotes --------- Co-authored-by: Egor Baturin <egoriyaa@github.com> * Implement TSTCC (#294) * add tstcc * add einops package * remove pd.testing in inference tests * fix * add verbose param, refactor logging, fix warning * fix logging loss * add changelog * catch torch warning * fix * catch nn.Conv1d warning --------- Co-authored-by: Egor Baturin <egoriyaa@github.com> * lints * fix * Add tutorial how to work with embedding models (#304) * fix tstcc * move lr param from __init__ to fit * add tutorial * fix notebook * update changelog * fix changelog * lints * fix notebook * update readme * fix readme * fix readme * write comment in libs/ts2vec/ts2vec.py * fix notebook * remove multiscale option in ts2vec * lints * fix notebook --------- Co-authored-by: Egor Baturin <egoriyaa@github.com> * fix atol in inference tests * downgrade poetry --------- Co-authored-by: Egor Baturin <egoriyaa@github.com>
Before submitting (must do checklist)
Proposed Changes
Closing issues
closes #262