Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Add hour_of_week option to DateTimeFeatures transformer #4724

Merged
merged 12 commits into from Jul 6, 2023

Conversation

VyomkeshVyas
Copy link
Contributor

@VyomkeshVyas VyomkeshVyas commented Jun 20, 2023

What does this implement/fix? Explain your changes.

This PR adds hour_of_week as an option in the manual_section argument in the DateTimeFeatures transformer.
hour_of_week could be used as an alternate to the hour_of_day and day_of_week combined.

Example usage :

import pandas as pd
from sktime.transformations.series.date import DateTimeFeatures

y = pd.DataFrame(
    data={"y": range(6)},
    index=pd.date_range(start="2023-01-01", freq="H", periods=6),
) 
transformer = DateTimeFeatures(
    manual_selection=["hour_of_week"], keep_original_columns=True
)

yt = transformer.fit_transform(y)

What should a reviewer concentrate their feedback on?

Around three lines have been added in the existing code to calculate the hour_of_week (under the comprehensive feature_scope), aligning with the current code structure.
I'm not entirely sure about the utility of fourier column in the DUMMIES dataframe (computed in _prep_dummies function) and its role in calculating the final datetime features.

Did you add any tests for the change?

I've added a unit test in the existing tests test_date.py.

PR checklist

For all contributions
  • I've added a unit test and made sure it pass locally.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 20, 2023

Great contribution!

We should check with @KishManani (current owner of the estimator) whether this addition is fine.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Remarks only from maintenance perspective:

  • @KishManani is the current owner so should review and approve
  • Adding this to the "comprehensive" feature set directly will return additional columns to users currently using that pre-defined feature set, so it may break code somewhere. We will hence need to follow deprecation policy, earliest we can switch it in is 0.21.0. Can we add it and not have it part of any of the pre-defined feature sets for now, and add a warning that this will expand in 0.21.0? Alternatively, just not add it at all.

Copy link
Contributor

@KishManani KishManani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Out of curiosity, have you found this feature useful in practice? What was the use case that drove you to use it?

I'm wondering where we want to draw the line in terms of adding more features in here (e.g., in theory we could add minute of week, second of week, etc.) but these are unlikely to be helpful I would have thought.

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 20, 2023

Great! We now just have to deal with the deprecation/change (or not putting it in the feature set).

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 21, 2023

@KishManani, @VyomkeshVyas, does on of you perchance know whether a feature can be added that does not belong to any of the pre-defined feature sets?

For change policy, the easiest options would be to:

  • keep it as not part of a feature set (if it is possible to have it outside)
  • or raise a warning whenever "comprehensive" is selected at init, that from 0.22.0 it will contain this new feature.

@VyomkeshVyas
Copy link
Contributor Author

LGTM.

Out of curiosity, have you found this feature useful in practice? What was the use case that drove you to use it?

I'm wondering where we want to draw the line in terms of adding more features in here (e.g., in theory we could add minute of week, second of week, etc.) but these are unlikely to be helpful I would have thought.

I came across this idea while working on a price forecasting task for data having hourly granularity. I can't generalize the results but, with a simple linear regression model there was a slight increase (0.10) in RMSE by replacing hour_of_day and day_of_week feature with hour_of_week, while a slight decrease (0.06) in RMSE with XG Boost model.

I think theoretically there are many possible combinations of generating features and it wouldn't be worthwhile to include every single of those, but as suggested by @fkiraly, Seasonal Remainder feature extractor is a good option, a user will be able to create any possible combination of features as per the requirement.

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 25, 2023

@VyomkeshVyas, have you thought about your preference regarding the warning/deprecation procedure?

@fkiraly fkiraly added module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing enhancement Adding new functionality labels Jun 25, 2023
@VyomkeshVyas
Copy link
Contributor Author

VyomkeshVyas commented Jun 26, 2023

@VyomkeshVyas, have you thought about your preference regarding the warning/deprecation procedure?

@fkiraly I think "raise a warning whenever "comprehensive" is selected at init, that from 0.22.0 it will contain this new feature" is a good option as keeping the feature outside the pre-defined feature sets will itself create a new feature set with just "hour_of_week" as feature, which might not make sense to a user.

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 28, 2023

agreed - do you want me to add the warnings, or is it clear where/how?

@fkiraly
Copy link
Collaborator

fkiraly commented Jul 4, 2023

@VyomkeshVyas, do you want me to add the warnings, or do you want to do this?

This is nearly done, let's try to get it into 0.20.1! Happy either way, just let us know.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Good to go now.

(FYI, I added a reminder comment for 0.22.0)

@fkiraly fkiraly merged commit 36695db into sktime:main Jul 6, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants