[ENH] Add hour_of_week option to DateTimeFeatures transformer #4724
…into dtime-hr-week
Great contribution! We should check with @KishManani (current owner of the estimator) whether this addition is fine.
Thanks!
Remarks only from maintenance perspective:
- @KishManani is the current owner, so should review and approve.
- Adding this to the "comprehensive" feature set directly will return additional columns to users currently using that pre-defined feature set, so it may break code somewhere. We will hence need to follow the deprecation policy; the earliest we can switch it in is 0.21.0. Can we add it without making it part of any of the pre-defined feature sets for now, and add a warning that "comprehensive" will expand in 0.21.0? Alternatively, just not add it at all.
LGTM.
Out of curiosity, have you found this feature useful in practice? What was the use case that drove you to use it?
I'm wondering where we want to draw the line in terms of adding more features in here. In theory we could add minute of week, second of week, etc., but I would have thought these are unlikely to be helpful.
Great! We now just have to deal with the deprecation/change (or not putting it in the feature set).
@KishManani, @VyomkeshVyas, does one of you perchance know whether a feature can be added that does not belong to any of the pre-defined feature sets? For change policy, the easiest options would be to:
I came across this idea while working on a price forecasting task for data with hourly granularity. I can't generalize the results, but with a simple linear regression model RMSE increased slightly (by 0.10) when I replaced the hour_of_day and day_of_week features with hour_of_week, while with an XGBoost model RMSE decreased slightly (by 0.06). In theory there are many possible feature combinations and it wouldn't be worthwhile to include every one of them, but as suggested by @fkiraly, a Seasonal Remainder feature extractor is a good option: it would let a user create any combination of features they need.
@VyomkeshVyas, have you thought about your preference regarding the warning/deprecation procedure?
@fkiraly I think the option to "raise a warning whenever 'comprehensive' is selected at init, that from 0.22.0 it will contain this new feature" is a good one. Keeping the feature outside the pre-defined feature sets would itself create a new feature set with just "hour_of_week" in it, which might not make sense to a user.
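The warning option could look roughly like the sketch below. It is an illustration under assumptions, not the code merged in this PR: the class name ScopedFeaturesSketch, the message wording, and the choice of FutureWarning are all hypothetical. Only the "comprehensive" scope name and the 0.22.0 target come from the thread.

```python
import warnings


class ScopedFeaturesSketch:
    """Hypothetical stand-in for a transformer with pre-defined feature sets."""

    def __init__(self, feature_scope="minimal"):
        self.feature_scope = feature_scope
        # Warn only users who select the feature set that is going to change.
        if feature_scope == "comprehensive":
            warnings.warn(
                'From 0.22.0, feature_scope="comprehensive" will also return '
                "the hour_of_week feature, which adds a column to the output.",
                FutureWarning,
                stacklevel=2,
            )
```

Users of the other feature sets see no warning, so the notice reaches only code whose output columns would change.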
agreed - do you want me to add the warnings, or is it clear where/how?
@VyomkeshVyas, do you want me to add the warnings, or do you want to do this? This is nearly done, let's try to get it into 0.20.1! Happy either way, just let us know.
Thanks! Good to go now.
(FYI, I added a reminder comment for 0.22.0)
What does this implement/fix? Explain your changes.
This PR adds hour_of_week as an option in the manual_selection argument of the DateTimeFeatures transformer. hour_of_week can be used as an alternative to hour_of_day and day_of_week combined.
Example usage:
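A minimal sketch of what the feature computes, assuming hour_of_week is defined as day_of_week * 24 + hour_of_day with Monday as day 0 (that definition is an assumption, not confirmed in this thread). It uses plain pandas rather than the DateTimeFeatures transformer itself, and the helper name hour_of_week is hypothetical:

```python
import pandas as pd


def hour_of_week(index: pd.DatetimeIndex) -> pd.Index:
    """Hour within the week: 0 is Monday 00:00, 167 is Sunday 23:00.

    Assumed definition: day_of_week * 24 + hour_of_day, Monday = 0.
    """
    return index.dayofweek * 24 + index.hour


# One full week of hourly timestamps; 2024-01-01 is a Monday.
idx = pd.date_range("2024-01-01", periods=24 * 7, freq=pd.Timedelta(hours=1))
features = pd.DataFrame(
    {
        "day_of_week": idx.dayofweek,
        "hour_of_day": idx.hour,
        "hour_of_week": hour_of_week(idx),
    },
    index=idx,
)

# Wednesday 05:00 falls in hour 2 * 24 + 5 = 53 of the week.
wednesday_value = int(hour_of_week(pd.DatetimeIndex(["2024-01-03 05:00"]))[0])
```

Under this definition each (day_of_week, hour_of_day) pair maps to exactly one hour_of_week value, which is why the single column can stand in for the two.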
What should a reviewer concentrate their feedback on?
Around three lines have been added to the existing code to calculate hour_of_week (under the comprehensive feature_scope), in line with the current code structure.
I'm not entirely sure about the utility of the fourier column in the DUMMIES dataframe (computed in the _prep_dummies function) and its role in calculating the final datetime features.
Did you add any tests for the change?
I've added a unit test to the existing tests in test_date.py.
PR checklist
For all contributions