# BaseTransformer
This notebook demonstrates the functionality of the BaseTransformer class. It is the base class for the package, and all other transformers in the package should inherit from it, so the functionality shown below is also available in those transformers. <br>
This is 'behind the scenes' functionality that is useful to be aware of, rather than the actual transformations applied to data before building or predicting with models. <br>
Examples of the actual preprocessing transformations can be found in the other notebooks in this folder.

In [1]:
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing

In [2]:
import tubular
from tubular.base import BaseTransformer

In [3]:
tubular.__version__

'1.1.1'

## Load California housing dataset from sklearn

In [4]:
cali = fetch_california_housing()
cali_df = pd.DataFrame(cali["data"], columns=cali["feature_names"])

In [5]:
cali_df.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.023810 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.971880 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.802260 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |


In [6]:
cali_df.dtypes

MedInc        float64
HouseAge      float64
AveRooms      float64
AveBedrms     float64
Population    float64
AveOccup      float64
Latitude      float64
Longitude     float64
dtype: object

## Initialising BaseTransformer


In [7]:
base_1 = BaseTransformer(columns="HouseAge", copy=True, verbose=True)

BaseTransformer.__init__() called


## BaseTransformer fit
Not all transformers in the package implement a fit method. If the user directly specifies the values the transformer needs (for example, by passing the impute value), there is no need for one.
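To make the distinction concrete, here is a minimal sketch written with plain pandas rather than tubular. Both classes (`ConstantImputerSketch`, `MeanImputerSketch`) and their parameters are hypothetical names invented for this illustration; they are not part of the package and do not reflect how its imputers are implemented.

```python
import pandas as pd


class ConstantImputerSketch:
    """Hypothetical transformer: the fill value is passed in, so no fit is needed."""

    def __init__(self, column, impute_value):
        self.column = column
        self.impute_value = impute_value

    def transform(self, X):
        X = X.copy()
        X[self.column] = X[self.column].fillna(self.impute_value)
        return X


class MeanImputerSketch:
    """Hypothetical transformer: the fill value must be learned from data in fit."""

    def __init__(self, column):
        self.column = column

    def fit(self, X, y=None):
        # Learn the value to impute with from the training data.
        self.impute_value_ = X[self.column].mean()
        return self

    def transform(self, X):
        X = X.copy()
        X[self.column] = X[self.column].fillna(self.impute_value_)
        return X


df = pd.DataFrame({"HouseAge": [41.0, None, 52.0]})

# No fit call: the value to use was supplied directly.
constant_filled = ConstantImputerSketch("HouseAge", impute_value=0.0).transform(df)

# fit is required here, because the mean has to be computed first.
mean_filled = MeanImputerSketch("HouseAge").fit(df).transform(df)
```

In the first class everything transform needs is known at initialisation, whereas the second cannot transform until fit has stored a learned value.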


## BaseTransformer transform
All transformers will implement a transform method.
### Transform with copy
Initialising the transformer with `copy=True` ensures that the input dataset is not modified in transform.

In [8]:
cali_df_2 = base_1.transform(cali_df)

BaseTransformer.transform() called


In [9]:
pd.testing.assert_frame_equal(cali_df_2, cali_df)

In [10]:
cali_df_2 is cali_df

False
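The practical effect of returning a copy can be sketched with pandas alone. The function `transform_sketch` below is a hypothetical stand-in for a transform step, assumed to copy its input when `copy=True`; it is not tubular's implementation.

```python
import pandas as pd


def transform_sketch(X, copy):
    # Hypothetical stand-in for a transform step, not tubular's code.
    if copy:
        X = X.copy()
    return X


df = pd.DataFrame({"HouseAge": [41.0, 21.0, 52.0]})
out = transform_sketch(df, copy=True)

# Editing the returned frame leaves the input untouched,
# because the two are separate objects.
out.loc[0, "HouseAge"] = -1.0
input_value_after_edit = df.loc[0, "HouseAge"]
```

This is why later edits to a transformed dataset cannot leak back into the data that was passed in.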

### Transform without copy
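Initialising the transformer with `copy=False` skips the copy, which can be useful if you are working with a large dataset or are concerned about the time taken to copy. The object returned is then the input object itself.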

In [11]:
base_2 = BaseTransformer(copy=False, verbose=True)

BaseTransformer.__init__() called


In [12]:
cali_df_3 = base_2.fit_transform(cali_df)

BaseTransformer.fit() called
BaseTransformer.transform() called


In [13]:
cali_df_3 is cali_df

True
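Because the returned object is the input itself, any later in-place change to the result also changes the original. The sketch below illustrates this with plain pandas; `passthrough_sketch` is a hypothetical helper assumed to return its argument unchanged when `copy=False`, and is not tubular's implementation.

```python
import pandas as pd


def passthrough_sketch(X, copy=False):
    # Hypothetical stand-in for a transform step, not tubular's code.
    return X.copy() if copy else X


df = pd.DataFrame({"HouseAge": [41.0, 21.0, 52.0]})
out = passthrough_sketch(df, copy=False)

# out and df are the same object, so a new column added to out
# also appears on df.
out["HouseAge_doubled"] = out["HouseAge"] * 2
column_leaked_into_input = "HouseAge_doubled" in df.columns
```

The saving in time and memory therefore comes with a trade-off: keep a separate copy yourself if the untransformed data is still needed.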