# BaseTransformer
This notebook demonstrates the functionality of the BaseTransformer class. This is the base class for the package, and all other transformers in the package should inherit from it, so the functionality shown below is also available in those transformers. <br>
This is 'behind the scenes' functionality that is useful to be aware of, rather than the actual transformations applied to data before building or predicting with models. <br>
Examples of the actual preprocessing transformations can be found in the other notebooks in this folder.

In [1]:
import pandas as pd
import numpy as np

In [2]:
import tubular
from tubular.base import BaseTransformer

In [3]:
tubular.__version__

'0.2.8'

## Load Boston house price dataset from sklearn
Note, the prepare_boston_df function used below modifies the original Boston dataset to include null values and pandas categorical dtypes.

In [4]:
boston_df = tubular.testing.test_data.prepare_boston_df()
boston_df.shape

(506, 17)

In [5]:
boston_df.head()

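| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | target | ZN_cat | CHAS_cat | RAD_cat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | NaN | 4.09 | NaN | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 | 18.0 | 0.0 | NaN |
| 1 | 0.02731 | NaN | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 | NaN | 0.0 | 2.0 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | NaN | 17.8 | 392.83 | 4.03 | 34.7 | 0.0 | 0.0 | 2.0 |
| 3 | NaN | NaN | 2.18 | 0.0 | 0.458 | NaN | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | NaN | NaN | 33.4 | NaN | 0.0 | 3.0 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | NaN | NaN | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 | 0.0 | 0.0 | 3.0 |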


In [6]:
boston_df.dtypes

CRIM         float64
ZN            object
INDUS        float64
CHAS          object
NOX          float64
RM           float64
AGE          float64
DIS          float64
RAD           object
TAX          float64
PTRATIO      float64
B            float64
LSTAT        float64
target       float64
ZN_cat      category
CHAS_cat    category
RAD_cat     category
dtype: object
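
To make the note about null values and categorical dtypes concrete, here is a minimal sketch of how such modifications can be made with plain pandas. The toy frame and its values are illustrative assumptions, not the actual code or data used by the package's test helper.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for two Boston columns (values are illustrative only).
toy_df = pd.DataFrame(
    {
        "CRIM": [0.1, 0.2, 0.3, 0.4],
        "CHAS": [0.0, 0.0, 1.0, 0.0],
    }
)

# Introduce a null value in a numeric column.
toy_df.loc[3, "CRIM"] = np.nan

# Add a pandas categorical version of an existing column.
toy_df["CHAS_cat"] = toy_df["CHAS"].astype("category")

print(toy_df.dtypes)
```

The resulting frame has one missing CRIM value and a CHAS_cat column with the category dtype, similar in spirit to the columns shown above.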

## Initialising BaseTransformer
### Not setting columns 
Columns do not have to be specified when initialising BaseTransformer objects. Both the fit and transform methods call the columns_set_or_check method to ensure that the columns attribute is set before the transformer does any work.

In [7]:
base_1 = BaseTransformer(copy = True, verbose = True)

BaseTransformer.__init__() called


## BaseTransformer fit
Not all transformers in the package implement a fit method. If the user directly specifies the values the transformer needs (e.g. passes the impute value), there is no need for one.
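
As an illustration of why fit can be unnecessary, the sketch below uses a hypothetical ConstantImputer class (an assumption for this example, not a class from the package). Because the impute value is passed in at initialisation, nothing has to be learned from the data and transform can be called straight away.

```python
import numpy as np
import pandas as pd


class ConstantImputer:
    """Hypothetical transformer: the impute value is supplied up front."""

    def __init__(self, impute_value, columns):
        self.impute_value = impute_value
        self.columns = columns

    def transform(self, X):
        X = X.copy()
        # Nothing was learned from data, so no fit step is required.
        X[self.columns] = X[self.columns].fillna(self.impute_value)
        return X


df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
imputed = ConstantImputer(impute_value=0.0, columns=["a"]).transform(df)
print(imputed["a"].tolist())  # [1.0, 0.0, 3.0]
```

A transformer that had to learn its impute value, for example the column mean, would need a fit method to compute it first.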
### Setting columns in fit
If the columns attribute is not set when fit is called, columns_set_or_check will set columns to be all columns in X.

In [8]:
base_1.columns is None

True

In [9]:
base_1.fit(boston_df)

BaseTransformer.fit() called


BaseTransformer(columns=['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                         'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'target',
                         'ZN_cat', 'CHAS_cat', 'RAD_cat'],
                copy=True, verbose=True)

In [10]:
base_1.columns

['CRIM',
 'ZN',
 'INDUS',
 'CHAS',
 'NOX',
 'RM',
 'AGE',
 'DIS',
 'RAD',
 'TAX',
 'PTRATIO',
 'B',
 'LSTAT',
 'target',
 'ZN_cat',
 'CHAS_cat',
 'RAD_cat']
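
The behaviour shown above can be approximated with a small stand-in class. This is a sketch under stated assumptions: SketchTransformer is hypothetical, and its columns_set_or_check is a guess at the general idea (default to all columns, otherwise check they exist), not the package's actual implementation.

```python
import pandas as pd


class SketchTransformer:
    """Hypothetical stand-in for illustration; not tubular's BaseTransformer."""

    def __init__(self, columns=None, copy=True):
        self.columns = columns
        self.copy = copy

    def columns_set_or_check(self, X):
        if self.columns is None:
            # No columns were given, so default to every column in X.
            self.columns = list(X.columns)
        else:
            # Otherwise check that the requested columns exist in X.
            missing = [c for c in self.columns if c not in X.columns]
            if missing:
                raise ValueError(f"columns not in X: {missing}")

    def fit(self, X, y=None):
        self.columns_set_or_check(X)
        return self

    def transform(self, X):
        self.columns_set_or_check(X)
        return X.copy() if self.copy else X


df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
sketch = SketchTransformer()
print(sketch.columns is None)  # True
sketch.fit(df)
print(sketch.columns)  # ['a', 'b']
```

Returning self from fit mirrors the scikit-learn convention and is what allows the fitted object to be displayed, as in the cell output above.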

## BaseTransformer transform
All transformers will implement a transform method.
### Transform with copy
Setting copy = True ensures that the input dataset is not modified in transform; a copy is returned instead.

In [11]:
boston_df_2 = base_1.transform(boston_df)

BaseTransformer.transform() called


In [12]:
pd.testing.assert_frame_equal(boston_df_2, boston_df)

In [13]:
boston_df_2 is boston_df

False
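
The practical consequence of returning a copy can be seen with plain pandas. The sketch below does not use the package at all; it only shows that a copied DataFrame has equal contents but is a separate object, so later changes to it do not reach the original.

```python
import pandas as pd

original = pd.DataFrame({"a": [1, 2, 3]})
copied = original.copy()

# Same contents, but a distinct object.
pd.testing.assert_frame_equal(copied, original)
print(copied is original)  # False

# Changing the copy leaves the original untouched.
copied.loc[0, "a"] = 100
print(original.loc[0, "a"])  # 1
```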

### Transform without copy
Setting copy = False skips the copy, so transform works on the input object directly. This can be useful if you are working with a large dataset or are concerned about the time taken to copy.

In [14]:
base_2 = BaseTransformer(copy = False, verbose = True)

BaseTransformer.__init__() called


In [15]:
boston_df_3 = base_2.fit_transform(boston_df)

BaseTransformer.fit() called
BaseTransformer.transform() called


In [16]:
boston_df_3 is boston_df

True
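
The trade-off of skipping the copy is that any later change to the returned object also changes the input. The sketch below illustrates this with a hypothetical passthrough helper (an assumption for this example, not a function from the package) that mirrors the copy flag.

```python
import pandas as pd


def passthrough(X, copy):
    # Hypothetical helper: return a copy of X, or X itself when copy is False.
    return X.copy() if copy else X


original = pd.DataFrame({"a": [1, 2, 3]})
same = passthrough(original, copy=False)

# Adding a column to the returned object also adds it to the input.
same["b"] = same["a"] * 2
print("b" in original.columns)  # True
```

If the input data is needed unchanged later on, keeping copy = True is the safer choice.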