As of v0.24.1, sklearn.preprocessing.PolynomialFeatures has three options that determine which combinations of features are generated:
degree: the maximum number of features to combine into a polynomial feature
interaction_only: filters out any combinations that include the same feature multiple times
include_bias: adds a column of ones
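To make the limitation concrete, here is a pure-Python sketch of how these three options appear to determine which combinations are generated. It assumes the enumeration behaves like `itertools.combinations` when interaction_only=True and like `itertools.combinations_with_replacement` otherwise, over degrees starting at 0 (with bias) or 1 (without) up to `degree`. The helper `polynomial_index_combinations` is hypothetical and only lists column-index tuples, not values.

```python
from itertools import chain, combinations, combinations_with_replacement


def polynomial_index_combinations(n_features, degree=2, interaction_only=False, include_bias=True):
    """List the column-index tuples a PolynomialFeatures-style expansion would produce.

    Each tuple names the input columns multiplied together; () is the bias column.
    """
    # interaction_only=True forbids repeating a feature, so plain combinations suffice;
    # otherwise a feature may repeat, e.g. (0, 0) stands for a**2.
    comb = combinations if interaction_only else combinations_with_replacement
    start = 0 if include_bias else 1
    return list(
        chain.from_iterable(comb(range(n_features), d) for d in range(start, degree + 1))
    )


print(polynomial_index_combinations(4, degree=2, include_bias=False))
print(polynomial_index_combinations(4, degree=2, interaction_only=True, include_bias=False))
```

With 4 features and degree=2 (no bias) this sketch yields 14 tuples, and 10 with interaction_only=True. Both contain (0,), (0, 1), (0, 2), (0, 3), but also combinations such as (1,) or (1, 2) that no setting of the three options removes.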
These are nice options, but they can't capture every use case for generating polynomial feature combinations. For example, my data has 4 features: a, b, c, d, and I want to transform them into a, ab, ac, ad. I didn't see any way to achieve this with PolynomialFeatures directly, so instead I added a subsequent step to my pipeline that selects a subset of columns from the PolynomialFeatures output.
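The workaround amounts to expanding everything and then discarding most of it. The sketch below mimics those two pipeline steps in plain Python, assuming a degree-2 expansion without a bias column; `expand_and_select` is a hypothetical name, and in a real pipeline the expansion would come from PolynomialFeatures and the selection from a separate transformer.

```python
from itertools import combinations_with_replacement
from math import prod


def expand_and_select(rows, keep):
    """Expand each row to all degree-1 and degree-2 products, then keep a subset.

    `keep` lists sorted index tuples, e.g. (0, 2) for a*c, in the order wanted.
    """
    n_features = len(rows[0])
    # Step 1 (PolynomialFeatures-like): every combination up to degree 2, no bias.
    all_combos = [(i,) for i in range(n_features)]
    all_combos += list(combinations_with_replacement(range(n_features), 2))
    # Step 2 (selection step): positions of the wanted combinations in the expansion.
    positions = [all_combos.index(combo) for combo in keep]
    result = []
    for row in rows:
        expanded = [prod(row[i] for i in combo) for combo in all_combos]
        result.append([expanded[p] for p in positions])
    return result


# Features a, b, c, d = 2, 3, 5, 7; we want a, ab, ac, ad.
print(expand_and_select([[2, 3, 5, 7]], [(0,), (0, 1), (0, 2), (0, 3)]))
# [[2, 6, 10, 14]]
```

For the 4-feature case, 14 columns are computed only to keep 4, and the selection has to match the expansion's column order exactly, which is what makes this approach brittle.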
Letting users specify arbitrary combinations of features would be a general-purpose solution. For example, I propose supporting something like:
```python
# combinations by feature name (for situations when feature names are available)
PolynomialFeatures(combinations=[("a",), ("a", "b"), ("a", "c"), ("a", "d")])

# combinations by index
PolynomialFeatures(combinations=[(0,), (0, 1), (0, 2), (0, 3)])

# another set of combinations by index that isn't currently possible
PolynomialFeatures(combinations=[(0,), (1,), (0, 0), (1, 1, 1), (0, 1)])
```
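A rough sketch of the semantics I have in mind for the proposed `combinations` parameter, written as a standalone function because the parameter does not exist in PolynomialFeatures. Each tuple becomes one output column, the product of the listed input columns, so a repeated index raises that feature to a power. `custom_polynomial_features` is a hypothetical name, and X is a plain list of rows for simplicity.

```python
from math import prod


def custom_polynomial_features(X, combinations):
    """One output column per index tuple: the product of the listed input columns.

    Repeating an index raises that feature to a power, e.g. (1, 1, 1) -> x1**3,
    and an empty tuple yields a constant 1 (the product of nothing).
    """
    return [[prod(row[i] for i in combo) for combo in combinations] for row in X]


# The third example from the proposal, applied to a single row x0=2, x1=3.
print(custom_polynomial_features([[2, 3]], [(0,), (1,), (0, 0), (1, 1, 1), (0, 1)]))
# [[2, 3, 4, 27, 6]]
```

Since an empty tuple gives a column of ones, include_bias could be expressed through the same parameter.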
Does this make sense? Is there some other way of generating custom combinations of polynomial features in a pipeline that I am overlooking?
The text was updated successfully, but these errors were encountered:
One challenge is the user might not always know what features exist at some intermediate stage of a pipeline where PolynomialFeatures is applied. Therefore, perhaps combinations could also be a function that takes a list of feature names (or even just the number of features) and returns an iterable of combinations. This would allow combinations to be determined dynamically based on the input features.
This is more complex, and shouldn't detract from solving the case where the user does know all the features of the input data and would like to provide specific static combinations.
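A sketch of the callable variant, assuming the callable receives either the input feature names or just the number of input features and returns the combinations to generate. Both functions are hypothetical examples of such callables; they reproduce the a, ab, ac, ad case without hard-coding the other feature names.

```python
def first_feature_interactions(feature_names):
    """Hypothetical callable: the first feature alone, then times each other feature."""
    first, *others = feature_names
    return [(first,)] + [(first, other) for other in others]


def first_feature_interactions_by_index(n_features):
    """Same rule when only the number of input features is known."""
    return [(0,)] + [(0, j) for j in range(1, n_features)]


print(first_feature_interactions(["a", "b", "c", "d"]))
print(first_feature_interactions_by_index(4))
```

Because the rule is evaluated at fit time, the same callable keeps working if an upstream step adds or removes features after the first one.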
Indeed, specifying feature combinations by name is probably safer than by position. But this would require propagating feature metadata (such as feature names) through the transformers in a pipeline.
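To illustrate why names are safer: if an upstream step reorders or drops columns, index tuples silently select the wrong features, whereas resolving names against the current feature names either finds the right positions or fails loudly. This sketch assumes the transformer can see a list of its input feature names, which is exactly the metadata propagation mentioned above; `resolve_by_name` is hypothetical.

```python
def resolve_by_name(combinations, feature_names):
    """Translate name tuples into index tuples for the current column order."""
    position = {name: i for i, name in enumerate(feature_names)}
    missing = sorted({n for combo in combinations for n in combo if n not in position})
    if missing:
        # Fail loudly instead of silently multiplying the wrong columns.
        raise KeyError(f"unknown feature names: {missing}")
    return [tuple(position[n] for n in combo) for combo in combinations]


wanted = [("a",), ("a", "b"), ("a", "c"), ("a", "d")]
# An upstream step swapped the first two columns; "a" is now at index 1.
print(resolve_by_name(wanted, ["b", "a", "c", "d"]))
# [(1,), (1, 0), (1, 2), (1, 3)]
```

With index tuples, the same swap would quietly turn (0,), (0, 1), ... into b, ba, bc, bd.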