[ENH] Add keep_original_columns option to FourierFeatures trafo #4008
Great, really nice! Deprecation by the book.
@ltsaprounis is listed as owner so would have to approve before we merge.
Whilst we're on this topic, I have a question regarding what the behaviour of this transformer should be for panel data. Here is an example:
import numpy as np
import pandas as pd
from sktime.transformations.series.fourier import FourierFeatures
# Panel dataframe with two timeseries
# Note there are entries in the two time series which have the same date.
df = pd.DataFrame(data={'c0':
{('h0_0', pd.to_datetime('2000-01-01 00:00:00')): 2.1482432004268444,
('h0_0', pd.to_datetime('2000-01-02 00:00:00')): 1.5974533778936917,
('h0_0', pd.to_datetime('2000-02-01 00:00:00')): 2.5374903855394737,
('h0_1', pd.to_datetime('2000-02-01 00:00:00')): 1.2991497904155045,
('h0_1', pd.to_datetime('2000-02-02 00:00:00')): 1.920059660128214,
('h0_1', pd.to_datetime('2000-02-03 00:00:00')): 1.0}})
display(df)
c0
h0_0 2000-01-01 2.1482432004268444
h0_0 2000-01-02 1.5974533778936917
h0_0 2000-02-01 2.5374903855394737
h0_1 2000-02-01 1.2991497904155045
h0_1 2000-02-02 1.920059660128214
h0_1 2000-02-03 1.0
# Fit the transformer and create fourier features
trafo = FourierFeatures(sp_list=[7], fourier_terms_list=[1], freq="D")
trafo.fit(df)
result = trafo.transform(df)
display(result)
c0 sin_7_1 cos_7_1
h0_0 2000-01-01 2.148 0.0 1.0
h0_0 2000-01-02 1.597 0.782 0.623
h0_0 2000-02-01 * 2.537 0.434 -0.901
h0_1 2000-02-01 * 1.299 0.0 1.0
h0_1 2000-02-02 1.92 0.782 0.623
h0_1 2000-02-03 1.0 0.975 -0.223
The rows with the same time points (marked with an *) have different feature values. @ltsaprounis - do you know if this is problematic or not? If it is, a simple solution would be to use a single reference time (i.e., the earliest time across all the time series) when computing the integer time index to pass to the sine and cosine functions. However, this would model a shared seasonal component across all the time series. What do you think?
@KishManani, it could simply be an option: a shared time axis (with the reference point possibly being the earliest time stamp occurring), or an individual time axis, which is also the current (and recommended future?) default.
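To make the two options concrete, here is a minimal sketch in plain Python. It is not sktime's implementation: the helper `fourier_terms` and the names `individual` and `shared` are hypothetical, and it assumes a daily frequency, a seasonal period of 7 and a single Fourier term, as in the example above.

```python
import math
from datetime import date


def fourier_terms(dates, reference, sp=7, k=1):
    """Return rounded (sin, cos) pairs, counting whole days from `reference`."""
    out = []
    for d in dates:
        t = (d - reference).days  # integer time index in days
        angle = 2 * math.pi * k * t / sp
        out.append((round(math.sin(angle), 3), round(math.cos(angle), 3)))
    return out


# Same two series as in the panel example above.
panel = {
    "h0_0": [date(2000, 1, 1), date(2000, 1, 2), date(2000, 2, 1)],
    "h0_1": [date(2000, 2, 1), date(2000, 2, 2), date(2000, 2, 3)],
}

# Individual time axis: each series counts from its own first timestamp.
individual = {name: fourier_terms(ds, min(ds)) for name, ds in panel.items()}

# Shared time axis: every series counts from the earliest timestamp overall.
global_start = min(min(ds) for ds in panel.values())
shared = {name: fourier_terms(ds, global_start) for name, ds in panel.items()}

for name in panel:
    print(name, "individual:", individual[name], "shared:", shared[name])
```

With the individual axis the values reproduce the output shown above. With the shared axis the two rows dated 2000-02-01 get identical features, at the cost of assuming a common seasonal phase across all series.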
This is a very pragmatic way forward. I'll wait for @ltsaprounis to comment and leave this for a separate PR.
Hi @fkiraly! Any idea of whether this can be merged?
It's been a month now without reaction, and this moves an existing transformer towards an interface desideratum (do not keep columns). There is also a deprecation, which makes this easily reversible should discussion come up.
Reference Issues/PRs
See discussion here: #3996 (comment)
What does this implement/fix? Explain your changes.
This PR makes the following changes:
- Adds a keep_original_columns option and sets the default to True, with a deprecation warning that in the future this should be set to False. This will make it easier to use FourierFeatures in the context of a pipeline (or FeatureUnion) where we do not wish to pass through the other columns of X.
- [...] keep_original_columns as an argument.
- Removes copies of X to make this more memory efficient.
What should a reviewer concentrate their feedback on?
- [...] X here? @ltsaprounis - would be great to hear your thoughts!
Any other comments?
For additional context around the impact of removing copies of X, see the discussion here: #3996 (comment).
PR checklist
For all contributions