DOC new example on feature engineering for cyclic time features #20281

Merged
merged 42 commits into from
Jul 26, 2021

ogrisel
Member

@ogrisel ogrisel commented Jun 16, 2021

Here is a prototype example to explore some cyclic date-related feature engineering strategies being discussed in #20259.

This is not meant to be reviewed or merged in its current state. In particular, I have not written any narrative yet, but once we have converged on which models we want to highlight, we can turn this into a full-fledged tutorial.

Update: I think this example is interesting to consider for merging irrespective of the outcome of the discussion in #20259.

@ogrisel ogrisel changed the title WIP cyclic feature engineering example DOC cyclic feature engineering example Jun 17, 2021
@ogrisel ogrisel marked this pull request as ready for review June 17, 2021 18:51
@ogrisel ogrisel force-pushed the cyclic_feature_engineering branch from 4f69c63 to e620c5f Compare June 17, 2021 21:27
@ogrisel
Member Author

ogrisel commented Jun 17, 2021

@ogrisel ogrisel changed the title DOC cyclic feature engineering example DOC new example on feature engineering for cyclic time features Jun 18, 2021
Member

@rth rth left a comment

A very nice and clearly written example tutorial!

cc @lorentzenchr

examples/applications/plot_cyclical_feature_engineering.py
# %%
# We observe that this model performance can almost rival the performance of
# the gradient boosted trees with an average error around 6% of the maximum
# demand.
Member

@rth rth Jun 18, 2021

Not asking to add it here, since it's already quite long, but how do polynomial features without cyclic_spline_transformer perform?

Also, I wonder if KBinsDiscretizer would produce mostly equivalent results in terms of score, though with fewer artifacts in the prediction.

Member Author

Intuitively I think I would agree with you. Let me try on my local copy what you suggest out of curiosity.

Member Author

@ogrisel ogrisel Jun 18, 2021

  • Polynomial features or a polynomial kernel approximation on the raw time features do not work any better than the linear model on the raw time features.

  • Binning, on the other hand, is only slightly worse than spline features (one or two percent worse than the matching model, with or without the polynomial kernel approximation). Here are the plots when binning:

binned-linear
binned+poly

Member

Interesting, thanks for checking it!

@rth
Member

rth commented Jun 18, 2021

It's probably too late to change it now in sphinx-gallery, but plot_cyclical_feature_engineering doesn't really make much sense as a name for something like this. It's not even an example, more a tutorial.

@ogrisel
Member Author

ogrisel commented Jun 18, 2021

It's probably too late to change it now in sphinx-gallery, but plot_cyclical_feature_engineering doesn't really make much sense as a name for something like this. It's not even an example, more a tutorial.

It's not too late at all. Where should we put this? What name for the file and the title do you suggest?

@ogrisel
Member Author

ogrisel commented Jun 20, 2021

I added one-hot encoding of the time features because it's a natural strong baseline in this case and it makes for interesting analysis. I also started to reorganize the order a bit. I want to do it further (move the plot for the features + linear model before introducing the Nystroem kernel models).

Member

@thomasjpfan thomasjpfan left a comment

This example only requiring ~ 15 seconds is very impressive given the scope.

examples/applications/plot_cyclical_feature_engineering.py
# %%
# We visualize those predictions by zooming on the last 96 hours (4 days) of
# the test set to get some qualitative insights:
plt.figure(figsize=(12, 4))
Member

May we use the OO interface of matplotlib? (especially now that we have so many plots)
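
For reference, here is a minimal sketch of what the object-oriented matplotlib interface looks like for this kind of plot. The arrays are synthetic stand-ins (an assumption, not the example's data); only the `plt.subplots` / `ax.*` pattern is the point.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the last 96 hours of targets and predictions.
rng = np.random.default_rng(0)
hours = np.arange(96)
y_true = 0.5 + 0.4 * np.sin(2 * np.pi * hours / 24)
y_pred = y_true + rng.normal(scale=0.05, size=hours.shape)

# Object-oriented interface: keep explicit Figure and Axes handles
# instead of relying on pyplot's implicit "current figure" state.
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(hours, y_true, "x-", alpha=0.4, color="black", label="Actual demand")
ax.plot(hours, y_pred, "x-", label="Model predictions")
ax.set_xlabel("Hour (last 4 days of the test set)")
ax.set_title("Predictions on the last 96 hours")
_ = ax.legend()
```

With many figures in one script, holding `fig` and `ax` explicitly makes it harder to draw on the wrong figure by accident.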

# Again we zoom on the last 4 days of the test set:

last_hours = slice(-96, None)
plt.figure(figsize=(12, 4))
Member

mpl OO interface here as well?

Member

@TomDLT TomDLT left a comment

Awesome example!

np.linspace(0, 26, 1000).reshape(-1, 1),
columns=["hour"],
)
splines = periodic_spline_transformer(24, n_knots=12).fit_transform(hour_df)
Member

@TomDLT TomDLT Jun 21, 2021

I find it non-intuitive to have 11 splines in the figure. I know this number is arbitrary, but to relate splines with bins, wouldn't it make more sense to have 12 splines (and 13 knots)?
Then in periodic_spline_transformer, the default would be n_knots = period + 1.

Member Author

@ogrisel ogrisel Jun 22, 2021

I'll see what I can do. It makes me think that maybe the SplineTransformer should allow for n_splines and a period argument...

But that could make the parameters docstring very complex to understand. /cc @lorentzenchr.

Member

When writing the SplineTransformer, I thought the number of knots was more intuitive than the number of splines/dof. But I documented the numbers very clearly (and the count is accessible via n_features_out_). I did not, however, think of periodic splines or a period. That came only a little later with @mlondschien.

A period argument, however, would make sense in my opinion.

Member

I was only suggesting to use periodic_spline_transformer(24, n_knots=13) in this example, to get 12 splines instead of 11. I agree the numbers are well documented in SplineTransformer.

Contributor

Nice writeup! Contrary to what you wrote above, @ogrisel, the numbers of knots you are choosing are not natural; they are arbitrary. I would vary the period but keep the number of knots fixed for month / weekday / hour (e.g. 5?). If you use period + 1 knots, resulting in period splines, the resulting splines are equivalent to one-hot-encoded features (assuming integer-valued features). This is why the performance of splines is so similar to that of one-hot-encoded features. To benefit from the additional "smoothness" of splines, you will need to reduce the number of splines. Note that you could use non-evenly spaced knots, e.g. via quantiles.

Contributor

If you want to display the strengths of periodic splines, I would suggest including interactions between periodic transformations of the time variables. With e.g. 4 knots this would be manageable, whereas it would explode for one-hot-encoded features.

Member Author

I updated the example to control for the number of splines and made the number of knots a technical detail.

Member

@lorentzenchr lorentzenchr left a comment

Just excellent and a lot of fun to read!

@lorentzenchr
Member

Already now: the best tutorial/example of the year. And we still have a year to go🥳 🍻

@rth
Member

rth commented Jun 24, 2021

It's not too late at all. Where should we put this? What name for the file and the title do you suggest?

Maybe something like cyclical_feature_engineering_tutorial.html or cyclical_feature_engineering_example.html would be a better name? We would then need to change the sphinx-gallery pattern matching to include those patterns.

@ogrisel
Member Author

ogrisel commented Jun 25, 2021

And then we would need to change sphinx-gallery pattern matching to include those patterns.

Agreed, but I would rather not change the sphinx gallery as part of this PR but instead coordinate via #18257.

@ogrisel
Member Author

ogrisel commented Jun 25, 2021

I think I addressed all the comments. Thanks for the reviews!

#
# Here, we do minimal ordinal encoding for the categorical variables and then
# let the model know that it should treat those as categorical variables by
# using a dedicated tree splitting rule.
Member

It might be nice to mention that we explicitly provide the order of the categories to avoid automatic ordering based on lexicography.

Member Author

done.
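
The point about explicit category order can be sketched as follows. The season names are hypothetical stand-ins for the example's categorical columns; the sketch only shows how `OrdinalEncoder` behaves with and without the `categories` argument.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical season column; the category names are assumptions.
season = np.array([["spring"], ["summer"], ["fall"], ["winter"]], dtype=object)

# Default: categories are sorted, i.e. lexicographic order for strings.
default_codes = OrdinalEncoder().fit_transform(season).ravel()

# Explicit: the encoder uses the order we pass in.
explicit = OrdinalEncoder(categories=[["spring", "summer", "fall", "winter"]])
explicit_codes = explicit.fit_transform(season).ravel()

print(default_codes)   # "fall" sorts first, so the natural order is lost
print(explicit_codes)  # codes follow the order given in `categories`
```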

# %%
# This model has an average error around 4 to 5% of the maximum demand. This is
# quite good for a first trial without any hyper-parameter tuning! We just had
# to make the categorical variables explicit. Note that the time related
Member

I see that you mentioned this point now. I was expecting to see it a bit earlier :)

Member Author

I find it more lightweight to do it this way.

@glemaitre glemaitre self-requested a review June 25, 2021 13:19

"""
==========================
Cyclic feature engineering
Member

I don't know if we should have something more related to "date-time encoding". I think it might be easier to find than "cyclic", even if the title is correct.

Member Author

I changed the title. The title and the filename no longer match though. I think it's ok but not 100% sure.

@mlondschien
Contributor

This is a very nice tutorial!

Since I added the periodic feature to the SplineTransformer I thought I would weigh in:

  • As mentioned above, there is no benefit in using any transformer that produces k - 1 features or more on a variable with k distinct values in a linear model. This also holds for the SplineTransformer, e.g. with include_bias=False and n_knots=8 for weekday. The difference here is due to regularisation.
  • weekday and month do not take enough values to justify using splines. A better use of splines would be on dayoftheyear, where I would assume splines to outperform one-hot-encoded month. However, this is not included in the data (and engineering it is probably out of the scope of this tutorial).
  • (periodic) splines allow for a smooth reduction of the number of features compared with one-hot encoding. This is not necessary for month, weekday and hour, but might be valuable to construct interactions of these features. E.g. adding an hour and workday interaction with (only) 8 features reduces the MAE and RMSE of the one-hot and cyclic splines pipelines by ~25%:
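from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures

# periodic_spline_transformer is the helper defined in the example under review.
hour_workday_interaction = make_pipeline(
    ColumnTransformer(
        [
            ("cyclic_hour", periodic_spline_transformer(24, n_splines=8), ["hour"]),
            ("workingday", FunctionTransformer(lambda x: x == "True"), ["workingday"]),
        ]
    ),
    # Keep only pairwise products, i.e. the hour x workingday interaction terms.
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
)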
  • I don't think that OLS (or ridge) is a good fit here. A log-link GLM, e.g. GridSearchCV(TweedieRegressor(power=2), param_grid={"alpha": alphas}), is probably a better fit (the gamma distribution performs better). Again, I assume that this is out of scope for this tutorial.
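
To make the last suggestion concrete, here is a rough sketch of a grid-searched gamma GLM with a log link. The data is synthetic and the `alphas` grid is an assumption chosen for illustration; neither comes from the example under review.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic, strictly positive target with a multiplicative hourly pattern.
rng = np.random.default_rng(0)
hour = rng.integers(0, 24, size=500)
X = np.column_stack(
    [np.sin(2 * np.pi * hour / 24), np.cos(2 * np.pi * hour / 24)]
)
mean = np.exp(1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1])
y = rng.gamma(shape=5.0, scale=mean / 5.0)

alphas = np.logspace(-4, 0, 5)  # assumed grid, not taken from the thread
search = GridSearchCV(
    # power=2 selects the gamma distribution; link="log" keeps predictions positive.
    TweedieRegressor(power=2, link="log", max_iter=1000),
    param_grid={"alpha": alphas},
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
predictions = search.predict(X)
```

Unlike ridge regression, this model cannot predict negative demand, at the cost of an iterative solver.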

@ogrisel
Member Author

ogrisel commented Jun 25, 2021

I don't think that OLS (or ridge) is a good fit here. A log-link GLM, e.g. GridSearchCV(TweedieRegressor(power=2), param_grid={"alpha": alphas}), is probably a better fit (the gamma distribution performs better). Again, I assume that this is out of scope for this tutorial.

That's a good point. But the execution speed won't be the same. Maybe I will add a note.

I will think a bit how to take your other insightful remarks into account.

@ogrisel ogrisel force-pushed the cyclic_feature_engineering branch from 26c7b6f to a51a667 Compare July 18, 2021 16:04
Member

@lorentzenchr lorentzenchr left a comment

some nitpicks

@ogrisel
Member Author

ogrisel commented Jul 18, 2021

@mlondschien I have updated the notebook to take your remarks into account in the last few commits.

@ogrisel
Member Author

ogrisel commented Jul 19, 2021

@mlondschien we have a problem with the periodic splines:

tmp

There are 12 splines for a period of 24, as expected, in the figure. The periodic signal seems to start again at the right location to ensure continuity, but a spline seems to be missing near the end of the period. However, the number of splines (i.e. output features) was correct.

This seems to be caused by using include_bias=False. Using include_bias=True and setting n_knots = n_splines + 1 seems to fix the problem:

tmp2

See the commit below:

@mlondschien
Contributor

What is the use-case / expected behaviour here? Are you interested in producing pretty plots or features for modelling?

I see three (non-compatible) outcomes here (i.e. for a periodicity of 24):

  • knots that are spaced two hours apart
  • 12 splines
  • include_bias=False

If you want 12 splines with knots that are spaced two hours apart, you need to pass n_knots=13, knots="uniform" (or equivalently knots=np.linspace(0, 24, 13).reshape(-1, 1)) and include_bias=True. The 12 splines will sum to one (include_bias=True), so using them in a non-regularized model is discouraged. This is what you have implemented in 26051c4 via periodic_spline_transformer(24, 12). If you plot the resulting splines, you should get the second figure you posted.

If you want splines with knots that are spaced two hours apart without an intercept, you need to pass n_knots=13, knots="uniform" (or equivalently knots=np.linspace(0, 24, 13).reshape(-1, 1)) and include_bias=False. If you plot the resulting splines, it will appear as if one was missing. To get this you have to revert 26051c4 and pass periodic_spline_transformer(24, 11). I imagine that this is what you might want.

If you want 12 splines without an intercept, you need to pass n_knots=14, knots="uniform" (or equivalently knots=np.linspace(0, 24, 14).reshape(-1, 1)) and include_bias=False. This is what you got before 26051c4 with periodic_spline_transformer(24, 12). The knots will be 24 / 13 ~ 1.85 apart. The result is the first figure you posted. I doubt that this is what you intended.

@lorentzenchr
Member

As we always use an L2 penalty (Ridge regression), we can set include_bias=True.
As @mlondschien already pointed out, include_bias=True means that the sum over all spline basis functions gives one at every point. In other words, a linear combination of the splines gives an intercept column.

@lorentzenchr
Member

BTW, I reeeeeeeally like the addition of interactions in 8d066f3!

@ogrisel
Member Author

ogrisel commented Jul 19, 2021

If you want 12 splines with knots that are spaced two hours apart, you need to pass n_knots=13, knots="uniform" (or equivalently knots=np.linspace(0, 24, 13).reshape(-1, 1)) and include_bias=True.

I agree. Since we use regularization, I think this is what makes the most sense for this example: we want a symmetric handling of all the hours. I prefer not to use knots="uniform", so that the period is controlled explicitly.

@ogrisel ogrisel merged commit 45bb9ab into scikit-learn:main Jul 26, 2021
@ogrisel ogrisel deleted the cyclic_feature_engineering branch July 26, 2021 09:02
@ogrisel
Member Author

ogrisel commented Jul 26, 2021

I merged this. Thank you everyone for the detailed reviews.

@apachaves

This looks awesome! Thank you for this.

@glemaitre
Member

Yeah indeed 🥇

TomDLT pushed a commit to TomDLT/scikit-learn that referenced this pull request Jul 29, 2021
…it-learn#20281)



Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
…it-learn#20281)



Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>