# Interaction Transformer

This notebook demonstrates the `InteractionTransformer` class. This transformer applies the `pd.DataFrame.product` method to the input `X`. <br>
It generates interaction terms between the selected columns, and the degree of interaction can be chosen explicitly.
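
As a rough illustration of what an interaction term is, the sketch below computes a row-wise product with plain pandas. It is not the transformer's own code: the two rows are taken from the California housing data used in this notebook, and the new column name simply mirrors the space-separated names that appear in the outputs further down.

```python
import pandas as pd

# Two rows of the California housing data used in this notebook.
df = pd.DataFrame({"HouseAge": [41.0, 21.0], "Population": [322.0, 2401.0]})

# A degree-2 interaction term: the row-wise product of the two columns.
df["HouseAge Population"] = df[["HouseAge", "Population"]].product(axis=1)

print(df)
```

The resulting values can be compared against the `HouseAge Population` column in the transformer output later in the notebook.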



In [1]:
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing

In [2]:
import tubular
from tubular.numeric import InteractionTransformer

In [3]:
tubular.__version__

'0.3.1'

## Load California housing dataset from sklearn

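import os, ssl
# Skip HTTPS certificate verification for the dataset download when
# PYTHONHTTPSVERIFY is not set and an unverified context is available.
if (not os.environ.get('PYTHONHTTPSVERIFY', '') and
    getattr(ssl, '_create_unverified_context', None)):
    ssl._create_default_https_context = ssl._create_unverified_context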

In [4]:
cali = fetch_california_housing()
cali_df = pd.DataFrame(cali["data"], columns=cali["feature_names"])
print(cali_df.shape)
cali_df.head()

(20640, 8)


Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25


## Simple usage    
### Initialising InteractionTransformer

The user can specify the following: <br>
- `columns` the columns in the `DataFrame` passed to the `transform` method to be transformed <br>
- `min_degree` the minimum degree of expected interaction (default value is 2) <br>
- `max_degree` the maximum degree of expected interaction (default value is 2) <br>
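
To make the effect of `min_degree` and `max_degree` concrete, here is a minimal sketch that enumerates which column combinations fall inside a degree range. The helper `interaction_terms` is hypothetical and written only for illustration; it is not tubular's implementation, and it assumes each term combines distinct columns (no squared terms), which is consistent with the outputs shown in this notebook.

```python
from itertools import combinations


def interaction_terms(columns, min_degree=2, max_degree=2):
    """Hypothetical helper: list column combinations for each degree in range."""
    terms = []
    for degree in range(min_degree, max_degree + 1):
        # combinations() yields every group of `degree` distinct columns
        terms.extend(combinations(columns, degree))
    return terms


terms = interaction_terms(["HouseAge", "Population", "MedInc"], 2, 3)
names = [" ".join(term) for term in terms]
print(names)
```

With `min_degree=2` and `max_degree=3` this yields three pairwise terms followed by one three-way term.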


In [5]:
interaction_transformer = InteractionTransformer(
    columns=["HouseAge", "Population", "MedInc"], min_degree=2, max_degree=3
)

### InteractionTransformer fit
There is no `fit` method for the `InteractionTransformer` because the transformation it applies does not 'learn' anything from the data.

### InteractionTransformer transform
When running `transform` with this configuration, new interaction columns are added to the input `X`; each one is the product of a combination of the selected columns.

In [6]:
cali_df_2 = interaction_transformer.transform(cali_df)
cali_df_2.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,HouseAge Population,HouseAge MedInc,Population MedInc,HouseAge Population MedInc
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23,13202.0,341.3332,2680.7144,109909.2904
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22,50421.0,174.3294,19931.6614,418564.8894
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24,25792.0,377.3848,3599.6704,187182.8608
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25,29016.0,293.4412,3148.8498,163740.1896
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25,29380.0,200.0024,2173.103,113001.356


## Automatically generated columns

In [7]:
auto_generated_name_transformer = InteractionTransformer(
    columns=["HouseAge", "Population", "MedInc"]
)

In [8]:
cali_df_3 = auto_generated_name_transformer.transform(cali_df)
cali_df_3.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,HouseAge Population,HouseAge MedInc,Population MedInc
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23,13202.0,341.3332,2680.7144
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22,50421.0,174.3294,19931.6614
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24,25792.0,377.3848,3599.6704
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25,29016.0,293.4412,3148.8498
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25,29380.0,200.0024,2173.103


## Select degree of interaction

Only available with scikit-learn version 1.0 or later.

In [9]:
interaction_deg_3_only_transformer = InteractionTransformer(
    columns=["HouseAge", "Population", "MedInc"], min_degree=3, max_degree=3
)

In [10]:
cali_df_4 = interaction_deg_3_only_transformer.transform(cali_df)
cali_df_4.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,HouseAge Population MedInc
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23,109909.2904
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22,418564.8894
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24,187182.8608
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25,163740.1896
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25,113001.356
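
The three configurations in this notebook add 4, 3 and 1 columns respectively. A short sketch of the counting behind that, assuming each interaction term is a product of distinct columns (as the outputs above suggest); `n_interaction_columns` is a hypothetical helper, not part of tubular:

```python
from math import comb


def n_interaction_columns(n_columns, min_degree=2, max_degree=2):
    """Hypothetical helper: count interaction terms within a degree range."""
    # comb(n, d) is the number of ways to pick d distinct columns out of n
    return sum(comb(n_columns, d) for d in range(min_degree, max_degree + 1))


counts = {
    "degree 2 to 3": n_interaction_columns(3, 2, 3),
    "degree 2 only (defaults)": n_interaction_columns(3),
    "degree 3 only": n_interaction_columns(3, 3, 3),
}
print(counts)
```

The number of terms grows quickly with more columns and a wider degree range, so restricting the degree is a simple way to keep the output width manageable.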
