Interaction terms between categorical and numerical features #15263

@lorentzenchr

Description

ColumnTransformer combined with PolynomialFeatures can create interactions between features that share the same individual preprocessing. I have found no (convincing) solution, however, for interactions between features that need different individual preprocessing, e.g. a one-hot-encoded categorical column and a continuous numerical column.
Such interactions might improve models from sklearn.linear_model, for instance by letting a linear model fit a separate slope for the numerical feature within each category.
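
To make the target concrete, here is a minimal sketch of the interaction columns meant above, built by hand with pandas rather than through any scikit-learn API. The output column names (`a_blue:x`, `a_red:x`) are an illustrative assumption, not an existing convention.

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'red', 'blue', 'blue'],
                   'x': [1, 1, 1, 2]})

# One-hot encode 'a'; get_dummies sorts the levels, giving a_blue, a_red.
dummies = pd.get_dummies(df['a'], prefix='a').astype(float)

# Multiply every indicator column by 'x' row-wise. Each resulting column
# holds 'x' for rows of one category and 0 elsewhere, so a linear model
# can learn one slope per category.
interactions = dummies.mul(df['x'], axis=0)
interactions.columns = [f'{col}:x' for col in dummies.columns]  # illustrative names
print(interactions)
```

This works outside a pipeline, but it does not remember the categories seen during fit, which is why a transformer-based solution would be preferable.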

Code Example

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures


df = pd.DataFrame({'a': ['red', 'red', 'blue', 'blue'],
                   'b': ['high', 'low', 'high', 'low'],
                   'x': [1, 1, 1, 2],
                   'y': [2, 3, 4, 2]
                  })

# Interactions between features that need no individual preprocessing,
# i.e. numerical ones, work fine.
column_trans = ColumnTransformer(
    [('xy_num',
      PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
      ['x', 'y'])],
    remainder='drop')
column_trans.fit_transform(df)

# Interactions between one-hot-encoded features also work, using a helper pipeline.
cat_cat = make_pipeline(
    OneHotEncoder(),
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
)
column_trans = ColumnTransformer(
    [('ab_cat', cat_cat, ['a', 'b'])],
    remainder='drop')
column_trans.fit_transform(df)

Expected Results

# No clue how to get interactions between one-hot-encoded 'a' and 'x';
# magic_pipeline stands in for the missing functionality.
column_trans = ColumnTransformer(
    [('a_x',
      magic_pipeline(OneHotEncoder(), 'passthrough'),
      ['a', 'x'])],
     remainder='drop')
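
One possible workaround, sketched here under stated assumptions, is a small custom transformer. `CatNumInteraction` is a hypothetical name and not part of scikit-learn; it assumes the first selected column is categorical and the second is numerical, and it returns a dense array. It was not tried against sklearn 0.21 specifically.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder


class CatNumInteraction(BaseEstimator, TransformerMixin):
    """Hypothetical helper: one-hot encode column 0 and multiply each
    indicator column by the numerical column 1."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=object)
        # Remember the categories seen during fit.
        self.encoder_ = OneHotEncoder(handle_unknown='ignore')
        self.encoder_.fit(X[:, [0]])
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=object)
        onehot = self.encoder_.transform(X[:, [0]]).toarray()
        x_num = X[:, [1]].astype(float)
        # Broadcasting (n, k) * (n, 1) scales every indicator by 'x'.
        return onehot * x_num


df = pd.DataFrame({'a': ['red', 'red', 'blue', 'blue'],
                   'x': [1, 1, 1, 2]})

column_trans = ColumnTransformer(
    [('a_x', CatNumInteraction(), ['a', 'x'])],
    remainder='drop')
out = column_trans.fit_transform(df)
print(out)
```

This hard-codes the preprocessing of each column inside the transformer, so it does not generalize the way the `magic_pipeline` above would; it only shows that the desired output is well defined.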

Versions

sklearn version 0.21
