PolynomialFeatures: allow user defined combinations of features #19533

dhimmel · 2021-02-23T14:44:51Z

As of v0.24.1, sklearn.preprocessing.PolynomialFeatures has three options that determine which combinations of features are generated:

degree: the maximum number of features to combine into a polynomial feature
interaction_only: filters out any combinations that include the same feature multiple times
include_bias: adds a column of ones

These are nice options, but they cannot express every use case for generating polynomial feature combinations. For example, my data has 4 features: a, b, c, d. I want to transform these features into a, ab, ac, ad. I didn't see any way to achieve this with PolynomialFeatures directly, so instead I created a subsequent step in my pipeline to select a subset of columns from the PolynomialFeatures output.
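For reference, here is a minimal sketch of that workaround, assuming a small made-up array `X` whose columns stand for a, b, c, d. It looks up the wanted terms in the fitted `powers_` attribute and keeps only those output columns; the `wanted` list and variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Made-up data; columns stand for a, b, c, d
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0]])

poly = PolynomialFeatures(degree=2, include_bias=False).fit(X)

# Exponent vectors of the wanted terms: a, a*b, a*c, a*d
wanted = [(1, 0, 0, 0), (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1)]

# powers_[i, j] is the exponent of input feature j in output feature i
powers = [tuple(row) for row in poly.powers_.tolist()]
keep = [powers.index(w) for w in wanted]

# Generate all degree-2 features, then keep only the wanted columns
X_sel = poly.transform(X)[:, keep]
```

This still computes every degree-2 term before discarding most of them, which is the overhead the proposal below would avoid.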

Letting users specify any combination of features would be a general purpose solution. For example, I propose supporting something like:

# combinations by feature name (for situations when feature names are available)
PolynomialFeatures(combinations=[("a",), ("a", "b"), ("a", "c"), ("a", "d")])

# combinations by index
PolynomialFeatures(combinations=[(0,), (0, 1), (0, 2), (0, 3)])

# another set of combinations by index that isn't currently possible
PolynomialFeatures(combinations=[(0,), (1,), (0, 0), (1, 1, 1), (0, 1)])

Does this make sense? Is there some other way of generating custom combinations of polynomial features in a pipeline that I am overlooking?


dhimmel · 2021-02-23T15:00:40Z

One challenge is the user might not always know what features exist at some intermediate stage of a pipeline where PolynomialFeatures is applied. Therefore, perhaps combinations could also be a function that takes a list of feature names (or even just the number of features) and returns an iterable of combinations. This would allow combinations to be determined dynamically based on the input features.

This is more complex, and shouldn't detract from solving the case where the user does know all the features of the input data and would like to provide specific static combinations.
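A rough sketch of how a callable could be resolved, using plain NumPy rather than any existing scikit-learn API. Both `first_with_all` and `apply_combinations` are hypothetical names made up for illustration.

```python
import numpy as np

def first_with_all(n_features):
    """Hypothetical rule: feature 0 alone, then feature 0 times each other feature."""
    return [(0,)] + [(0, j) for j in range(1, n_features)]

def apply_combinations(X, combinations):
    """Build one output column per combination of column indices.

    `combinations` is either a static iterable of index tuples or a
    callable that receives the number of input features and returns one.
    """
    X = np.asarray(X)
    combos = combinations(X.shape[1]) if callable(combinations) else combinations
    # Each output column is the product of the listed input columns
    return np.column_stack([X[:, list(c)].prod(axis=1) for c in combos])

# Made-up data with 4 features
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0]])
X_dynamic = apply_combinations(X, first_with_all)
X_static = apply_combinations(X, [(0,), (0, 1), (0, 2), (0, 3)])
```

With 4 input features, the callable and the static list produce the same columns; the callable would adapt if an earlier pipeline step changed the number of features.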

ogrisel · 2021-02-23T16:03:49Z

Indeed, specifying feature combinations by name is probably safer than by position. But this would require propagating feature metadata through the transformers in a pipeline.

This is a use case to keep in mind for SLEP015 scikit-learn/enhancement_proposals#48

jnothman · 2021-02-23T22:40:05Z

This could also be achieved with FunctionTransformer.
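One way this might look, as a sketch rather than a recommended recipe: a small helper (here called `product_features`, a made-up name) multiplies the columns listed in each index tuple, and `FunctionTransformer` passes the combinations in through `kw_args`.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def product_features(X, combinations):
    """Return one column per combination: the product of the listed columns."""
    X = np.asarray(X)
    # Repeated indices give powers, e.g. (1, 1, 1) yields column 1 cubed
    return np.column_stack([X[:, list(c)].prod(axis=1) for c in combinations])

# a, ab, ac, ad by column index
combinations = [(0,), (0, 1), (0, 2), (0, 3)]
transformer = FunctionTransformer(
    product_features, kw_args={"combinations": combinations}
)

# Made-up data; columns stand for a, b, c, d
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0]])
X_new = transformer.fit_transform(X)
```

The transformer can then be used as an ordinary pipeline step. It works by column position only, so it does not address the feature-name case discussed above.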

dhimmel added the New Feature label Feb 23, 2021

cmarmo added the module:preprocessing label Feb 24, 2021

gaspard-dv mentioned this issue Mar 5, 2021

PolynomialFeatures always generates all combinations with degree less than the degree parameter #19627

Closed
