# TwoColumnOperatorTransformer
This notebook demonstrates the TwoColumnOperatorTransformer class. The transformer applies a pandas DataFrame method that combines two columns using an operator. Examples are shown here for addition and modulo.

In [1]:
import tubular
from tubular.numeric import TwoColumnOperatorTransformer
from sklearn.datasets import fetch_california_housing

import pandas as pd

In [2]:
tubular.__version__

'0.3.3'

## Load California housing dataset from sklearn
The dataset is fetched with fetch_california_housing and loaded into a pandas DataFrame, using the feature names supplied by sklearn as column names.

In [3]:
cali = fetch_california_housing()
cali_df = pd.DataFrame(cali["data"], columns=cali["feature_names"])
print(cali_df.shape)
cali_df.head()

(20640, 8)


| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |


## Examples

The transformer assigns the output of the method to a new column. The method is applied in the form (column 1) operator (column 2), so the order of the columns matters when the operation is not commutative. Additional keyword arguments can also be supplied; these are passed on to the pandas.DataFrame method being called.

The minimal arguments to initialise the transformer are given below, in the order used in the examples that follow. More can be found in the class documentation.
- `pd_method_name` The name of the pandas DataFrame method to apply, e.g. `"add"` or `"mod"`.
- `columns` A list of the two column names, in the order [column 1, column 2].
- `new_column_name` The name of the new column.
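
To make the "(column 1) operator (column 2)" convention concrete, here is a minimal plain-pandas sketch of the same idea. It is not tubular's implementation: the helper `apply_two_column_method` is a hypothetical name introduced for illustration, and it works on pandas Series rather than through the transformer. The toy values are the first rows of Latitude and Longitude from the table above.

```python
import numpy as np
import pandas as pd

# Toy frame: first three rows of Latitude/Longitude from the dataset preview.
df = pd.DataFrame(
    {
        "Latitude": [37.88, 37.86, 37.85],
        "Longitude": [-122.23, -122.22, -122.24],
    }
)


def apply_two_column_method(frame, method_name, column1, column2, **method_kwargs):
    """Hypothetical helper: compute (column1) <method> (column2) as a Series."""
    # Look up the pandas method by name, e.g. "add", "sub" or "mod".
    method = getattr(frame[column1], method_name)
    # Any extra keyword arguments are forwarded to the pandas method.
    return method(frame[column2], **method_kwargs)


# Order matters for a non-commutative method such as "sub".
lat_minus_lon = apply_two_column_method(df, "sub", "Latitude", "Longitude")
lon_minus_lat = apply_two_column_method(df, "sub", "Longitude", "Latitude")

# Keyword arguments change how the pandas method behaves: here a missing
# Longitude is treated as 0.0 instead of producing NaN in the sum.
df_with_gap = df.copy()
df_with_gap.loc[0, "Longitude"] = np.nan
total_filled = apply_two_column_method(
    df_with_gap, "add", "Latitude", "Longitude", fill_value=0.0
)
```

Swapping the two column names flips the sign of the subtraction, which is why the order of the columns has to be chosen deliberately for methods like `sub`, `div` and `mod`.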


### Addition

In [4]:
addition = TwoColumnOperatorTransformer(
    "add", ["Latitude", "Longitude"], "Latitude + Longitude"
)

In [5]:
cali_df_2 = addition.transform(cali_df)

In [6]:
cali_df_2[["Latitude", "Longitude", "Latitude + Longitude"]].head()

| | Latitude | Longitude | Latitude + Longitude |
|---|---|---|---|
| 0 | 37.88 | -122.23 | -84.35 |
| 1 | 37.86 | -122.22 | -84.36 |
| 2 | 37.85 | -122.24 | -84.39 |
| 3 | 37.85 | -122.25 | -84.4 |
| 4 | 37.85 | -122.25 | -84.4 |


### Modulo

In [7]:
modulo = TwoColumnOperatorTransformer(
    "mod", ["Population", "HouseAge"], "Population mod HouseAge"
)

In [8]:
cali_df_3 = modulo.transform(cali_df)

In [9]:
cali_df_3[["Population", "HouseAge", "Population mod HouseAge"]].head()

| | Population | HouseAge | Population mod HouseAge |
|---|---|---|---|
| 0 | 322.0 | 41.0 | 35.0 |
| 1 | 2401.0 | 21.0 | 7.0 |
| 2 | 496.0 | 52.0 | 28.0 |
| 3 | 558.0 | 52.0 | 38.0 |
| 4 | 565.0 | 52.0 | 45.0 |
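
As a quick sanity check on which column is the dividend, the first rows of the output above can be reproduced in plain pandas. This is only an illustrative sketch using `Series.mod` on values copied from the table, not a call into tubular; the variable names are made up for the example.

```python
import pandas as pd

# First three rows of Population and HouseAge, copied from the output above.
sample = pd.DataFrame(
    {
        "Population": [322.0, 2401.0, 496.0],
        "HouseAge": [41.0, 21.0, 52.0],
    }
)

# Population is the dividend and HouseAge the divisor: this matches the table.
population_mod_house_age = sample["Population"].mod(sample["HouseAge"])

# Swapping the operands gives a different result, since mod does not commute.
house_age_mod_population = sample["HouseAge"].mod(sample["Population"])

print(population_mod_house_age.tolist())  # [35.0, 7.0, 28.0]
print(house_age_mod_population.tolist())  # [41.0, 21.0, 52.0]
```

The values in the new column therefore correspond to Population mod HouseAge, following the order in which the two columns were passed to the transformer.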
