# CrossColumnMultiplyTransformer
This notebook demonstrates the CrossColumnMultiplyTransformer class. The transformer applies a multiplicative adjustment to the values of one column, based on the values in other columns.

In [1]:
import pandas as pd
import numpy as np

In [2]:
import tubular
from tubular.mapping import CrossColumnMultiplyTransformer

In [3]:
tubular.__version__

'0.3.0'

## Create dummy dataset

In [4]:
df = pd.DataFrame(
    {
        "factor1": [
            np.nan,
            "1.0",
            "2.0",
            "1.0",
            "3.0",
            "3.0",
            "2.0",
            "2.0",
            "1.0",
            "3.0",
        ],
        "factor2": ["z", "z", "x", "y", "x", "x", "z", "y", "x", "y"],
        "target": [18.5, 21.2, 33.2, 53.3, 24.7, 19.2, 31.7, 42.0, 25.7, 33.9],
        "target_int": [2, 1, 3, 4, 5, 6, 5, 8, 9, 8],
    }
)

In [5]:
df.head()

Unnamed: 0,factor1,factor2,target,target_int
0,,z,18.5,2
1,1.0,z,21.2,1
2,2.0,x,33.2,3
3,1.0,y,53.3,4
4,3.0,x,24.7,5


In [6]:
df.dtypes

factor1        object
factor2        object
target        float64
target_int      int64
dtype: object

## Simple usage

### Initialising CrossColumnMultiplyTransformer

The user must pass in a dict of mappings; each item in it must itself be a dict of mappings for a specific column. <br>
The column to be adjusted is also specified by the user. <br>
As shown below, values of a column that do not need an adjustment can be left out of the dictionary. <br>
All multiplicative adjustments must be numeric (int or float).
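The structure described above can be checked before the transformer is created. The sketch below is only an illustration: `check_mappings` is a hypothetical helper written for this notebook, not part of tubular, and the transformer's own validation may differ.

```python
from numbers import Real


def check_mappings(mappings):
    """Hypothetical helper: check for a dict of per-column dicts of numeric multipliers."""
    if not isinstance(mappings, dict):
        raise TypeError("mappings must be a dict")
    for column, column_mappings in mappings.items():
        if not isinstance(column_mappings, dict):
            raise TypeError(f"mappings[{column!r}] must be a dict")
        for level, multiplier in column_mappings.items():
            # bool is a subclass of int, so it is excluded explicitly
            if isinstance(multiplier, bool) or not isinstance(multiplier, Real):
                raise TypeError(
                    f"multiplier for {column!r} = {level!r} must be int or float"
                )


# A well-formed mapping passes silently
check_mappings({"factor1": {"1.0": 1.1, "2.0": 0.5, "3.0": 4}})
```

A mapping such as `{"factor1": {"1.0": "1.1"}}` would raise a `TypeError` here, because the multiplier is a string rather than a number.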

In [7]:
mappings = {
    "factor1": {
        "1.0": 1.1,
        "2.0": 0.5,
        "3.0": 4,
    }
}

adjust_column = "target"

In [8]:
map_1 = CrossColumnMultiplyTransformer(
    adjust_column=adjust_column, mappings=mappings, copy=True, verbose=True
)

BaseTransformer.__init__() called


### CrossColumnMultiplyTransformer fit
There is no fit method for the CrossColumnMultiplyTransformer, as the user sets the mappings dictionary when initialising the object.

### CrossColumnMultiplyTransformer transform
Mappings were specified for only one column (factor1) when creating map_1, so only this column will be used to adjust the values of the adjust_column when the transform method is run.

In [9]:
df[["factor1", "target"]].head(10)

Unnamed: 0,factor1,target
0,,18.5
1,1.0,21.2
2,2.0,33.2
3,1.0,53.3
4,3.0,24.7
5,3.0,19.2
6,2.0,31.7
7,2.0,42.0
8,1.0,25.7
9,3.0,33.9


In [10]:
df[df["factor1"].isin(["1.0", "2.0", "3.0"])]["target"].groupby(df["factor1"]).mean()

factor1
1.0    33.400000
2.0    35.633333
3.0    25.933333
Name: target, dtype: float64

In [11]:
df_2 = map_1.transform(df)

BaseTransformer.transform() called


In [12]:
df_2[["factor1", "target"]].head(10)

Unnamed: 0,factor1,target
0,,18.5
1,1.0,23.32
2,2.0,16.6
3,1.0,58.63
4,3.0,98.8
5,3.0,76.8
6,2.0,15.85
7,2.0,21.0
8,1.0,28.27
9,3.0,135.6


In [13]:
df_2[df_2["factor1"].isin(["1.0", "2.0", "3.0"])]["target"].groupby(
    df_2["factor1"]
).mean()

factor1
1.0     36.740000
2.0     17.816667
3.0    103.733333
Name: target, dtype: float64
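As a cross-check, the same adjustment can be sketched in plain pandas. This is not the transformer's implementation, only an equivalent calculation. It assumes that values without a mapping (including nulls) are left unchanged, which is consistent with row 0 above. The names `demo` and `factor1_multipliers` are made up for the illustration.

```python
import numpy as np
import pandas as pd

# First five rows of the dummy dataset
demo = pd.DataFrame(
    {
        "factor1": [np.nan, "1.0", "2.0", "1.0", "3.0"],
        "target": [18.5, 21.2, 33.2, 53.3, 24.7],
    }
)
factor1_multipliers = {"1.0": 1.1, "2.0": 0.5, "3.0": 4}

# Look up a multiplier per row; unmapped values (including NaN) get 1, i.e. no change
multiplier = demo["factor1"].map(factor1_multipliers).fillna(1)
adjusted = demo["target"] * multiplier

print(adjusted.round(2).tolist())
```

The printed values should agree with the first five rows of the transformed target column shown above.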

## Column dtype conversion
If the column to be multiplied has an int dtype but the multipliers specified are non-integer, then the column will be converted to a float dtype.

In [14]:
mappings_2 = {
    "factor1": {
        "1.0": 1.1,
        "2.0": 0.5,
        "3.0": 4,
    }
}

adjust_column_2 = "target_int"

In [15]:
map_2 = CrossColumnMultiplyTransformer(
    adjust_column=adjust_column_2, mappings=mappings_2, copy=True, verbose=True
)

BaseTransformer.__init__() called


In [16]:
df["target_int"].dtype

dtype('int64')

In [17]:
df["target_int"].value_counts(dropna=False)

5    2
8    2
2    1
1    1
3    1
4    1
6    1
9    1
Name: target_int, dtype: int64

In [18]:
df_3 = map_2.transform(df)

BaseTransformer.transform() called


In [19]:
df_3["target_int"].dtype

dtype('float64')

In [20]:
df_3["target"].value_counts(dropna=False)

18.5    1
21.2    1
33.2    1
53.3    1
24.7    1
19.2    1
31.7    1
42.0    1
25.7    1
33.9    1
Name: target, dtype: int64
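The dtype change described in this section follows from ordinary pandas arithmetic, and can be seen without the transformer. The snippet below is a small sketch using made-up series; it shows pandas behaviour only and makes no claim about how the transformer handles integer-only multipliers internally.

```python
import pandas as pd

target_int = pd.Series([1, 3, 4, 5], name="target_int")
multiplier = pd.Series([1.1, 0.5, 1.1, 4])

print(target_int.dtype)  # int64

# Multiplying an integer column by non-integer values promotes it to float
adjusted = target_int * multiplier
print(adjusted.dtype)  # float64

# Multiplying by an integer keeps the integer dtype in plain pandas
print((target_int * 2).dtype)  # int64
```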

## Specifying multiple columns

If more than one column is used to define the mappings, the multipliers from each column are all applied to the adjust_column. Because multiplication is commutative, the order in which they are applied does not matter.
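The order-independence can be illustrated with a short pandas sketch. The helper `apply_in_order` and the frame `demo` are hypothetical names used only here, and the code is an equivalent calculation rather than the transformer's own logic. Results are compared with `np.allclose` because floating-point products can differ in the last digits depending on the order of multiplication.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    {
        "factor1": ["1.0", "2.0", "1.0", "3.0"],
        "factor2": ["z", "x", "y", "x"],
        "target": [21.2, 33.2, 53.3, 24.7],
    }
)
mappings = {
    "factor1": {"1.0": 1.1, "2.0": 0.5, "3.0": 4},
    "factor2": {"x": 6},
}


def apply_in_order(frame, column_order):
    """Multiply `target` by each column's multipliers, one column at a time."""
    result = frame["target"].copy()
    for column in column_order:
        # Values without a mapping get a multiplier of 1 (no change)
        result = result * frame[column].map(mappings[column]).fillna(1)
    return result


forward = apply_in_order(demo, ["factor1", "factor2"])
backward = apply_in_order(demo, ["factor2", "factor1"])

print(np.allclose(forward, backward))  # True
```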

In [21]:
mappings_4 = {
    "factor1": {
        "1.0": 1.1,
        "2.0": 0.5,
        "3.0": 4,
    },
    "factor2": {
        "x": 6,
    },
}

adjust_column_4 = "target"

In [22]:
map_4 = CrossColumnMultiplyTransformer(
    adjust_column=adjust_column_4, mappings=mappings_4, copy=True, verbose=True
)

BaseTransformer.__init__() called


In [23]:
df[["factor1", "factor2", "target"]].head()

Unnamed: 0,factor1,factor2,target
0,,z,18.5
1,1.0,z,21.2
2,2.0,x,33.2
3,1.0,y,53.3
4,3.0,x,24.7


In the above example, the target for row 1 would be adjusted only by a factor of 1.1 (factor1 = '1.0', and factor2 = 'z' has no mapping), whereas row 2 would be adjusted by a factor of 3 (factor1 = '2.0' gives a multiplier of 0.5 and factor2 = 'x' gives a multiplier of 6, and 0.5 x 6 = 3).
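The per-row arithmetic can be written out as a small function. `combined_multiplier` is a hypothetical helper for illustration only; it assumes that a value with no entry in the mappings contributes a multiplier of 1.

```python
mappings = {
    "factor1": {"1.0": 1.1, "2.0": 0.5, "3.0": 4},
    "factor2": {"x": 6},
}


def combined_multiplier(row):
    """Product of the multipliers that apply to one row (1 where no mapping exists)."""
    product = 1
    for column, column_mappings in mappings.items():
        product *= column_mappings.get(row[column], 1)
    return product


row_1 = {"factor1": "1.0", "factor2": "z"}
row_2 = {"factor1": "2.0", "factor2": "x"}

print(combined_multiplier(row_1))  # 1.1
print(combined_multiplier(row_2))  # 3.0
```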

In [24]:
df_5 = map_4.transform(df)

BaseTransformer.transform() called


In [25]:
df_5[["factor1", "factor2", "target"]].head()

Unnamed: 0,factor1,factor2,target
0,,z,18.5
1,1.0,z,23.32
2,2.0,x,99.6
3,1.0,y,58.63
4,3.0,x,592.8
