# Scikit-learn custom transformers

A custom transformer that receives a pandas DataFrame can either modify that object in place or leave it intact and return a transformed copy. This notebook compares the two approaches.
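One way to make this choice explicit is a `copy` flag on the transformer, similar in spirit to the `copy` parameter that built-in transformers such as `StandardScaler` expose. The following is only a minimal sketch: the class name `RatioColumn` and the column names `a` and `b` are hypothetical and chosen for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class RatioColumn(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that adds new_column = X["a"] / X["b"]."""

    def __init__(self, copy=True):
        # scikit-learn convention: store init arguments unchanged
        self.copy = copy

    def fit(self, X, y=None):
        # Nothing to learn from the data
        return self

    def transform(self, X, y=None):
        # copy=True leaves the caller's DataFrame intact; copy=False modifies it
        X_out = X.copy() if self.copy else X
        X_out["new_column"] = X_out["a"] / X_out["b"]
        return X_out


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

safe = RatioColumn(copy=True).fit_transform(df)
print("new_column" in df.columns)  # False: the input was not touched

in_place = RatioColumn(copy=False).fit_transform(df)
print("new_column" in df.columns)  # True: the input was modified
```

Defaulting to `copy=True` is the more defensive choice, since callers then have to opt in to having their data modified.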

## Generate example DataFrames

In [1]:
import pandas as pd

In [2]:
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
df1

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 4 | 7 |
| 1 | 2 | 5 | 8 |
| 2 | 3 | 6 | 9 |


In [3]:
df2 = pd.DataFrame({"x": [7, 8, 9], "y": [1, 2, 3], "z": [4, 5, 6]})
df2

|   | x | y | z |
|---|---|---|---|
| 0 | 7 | 1 | 4 |
| 1 | 8 | 2 | 5 |
| 2 | 9 | 3 | 6 |


## Define two custom transformers

In [4]:
from sklearn.base import BaseEstimator, TransformerMixin


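class ByReference(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained
        return self

    def transform(self, X, y=None):
        # Adds the column to the caller's DataFrame and returns that same object
        X["new_column"] = X["a"] / X["b"]
        return X


class ByCopy(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained
        return self

    def transform(self, X, y=None):
        # Works on a copy, so the caller's DataFrame is left unchanged
        X_copy = X.copy()
        X_copy["new_column"] = X_copy["y"] / X_copy["x"]
        return X_copy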

## Transformation by reference

In [5]:
by_ref = ByReference()
df_by_ref = by_ref.fit_transform(df1)

In [6]:
df1

|   | a | b | c | new_column |
|---|---|---|---|------------|
| 0 | 1 | 4 | 7 | 0.25 |
| 1 | 2 | 5 | 8 | 0.4 |
| 2 | 3 | 6 | 9 | 0.5 |


In [7]:
df_by_ref

|   | a | b | c | new_column |
|---|---|---|---|------------|
| 0 | 1 | 4 | 7 | 0.25 |
| 1 | 2 | 5 | 8 | 0.4 |
| 2 | 3 | 6 | 9 | 0.5 |


## Transformation by copy

In [8]:
by_copy = ByCopy()
df_by_copy = by_copy.fit_transform(df2)

In [9]:
df2

|   | x | y | z |
|---|---|---|---|
| 0 | 7 | 1 | 4 |
| 1 | 8 | 2 | 5 |
| 2 | 9 | 3 | 6 |


In [10]:
df_by_copy

|   | x | y | z | new_column |
|---|---|---|---|------------|
| 0 | 7 | 1 | 4 | 0.142857 |
| 1 | 8 | 2 | 5 | 0.25 |
| 2 | 9 | 3 | 6 | 0.333333 |


## Conclusion

The `df1` DataFrame is passed to the `ByReference` transformer, whose `transform` method modifies the object it receives. After the transformation, `df1` itself contains `new_column`, and because `transform` returns its input, `df_by_ref` refers to that same object.
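A quick way to confirm this is an identity check with `is`. The sketch below repeats the `ByReference` definition so that it runs on its own, and assumes scikit-learn's default output configuration; the extra `flag` column is only there for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ByReference(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        X["new_column"] = X["a"] / X["b"]
        return X


df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
df_by_ref = ByReference().fit_transform(df1)

# Identity, not just equality: both names point at one DataFrame
same_object = df_by_ref is df1
print(same_object)

# A later change made through one name is visible through the other
df_by_ref["flag"] = True
print(list(df1.columns))  # ['a', 'b', 'c', 'new_column', 'flag']
```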

The `df2` DataFrame, by contrast, is passed to the `ByCopy` transformer, which copies its input before modifying it. `df2` is therefore left unchanged, and the returned `df_by_copy` is a deep copy of `df2` (the default behavior of `DataFrame.copy()`) with `new_column` added.
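The independence of the two objects can be checked in the same spirit. This sketch repeats the `ByCopy` definition so that it runs on its own; overwriting a value with `100` is an arbitrary edit used only to show that the original data does not change.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ByCopy(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        X_copy = X.copy()  # deep copy by default
        X_copy["new_column"] = X_copy["y"] / X_copy["x"]
        return X_copy


df2 = pd.DataFrame({"x": [7, 8, 9], "y": [1, 2, 3], "z": [4, 5, 6]})
df_by_copy = ByCopy().fit_transform(df2)

print(df_by_copy is df2)  # False: a separate object was returned
print(list(df2.columns))  # ['x', 'y', 'z']

# Edit the result and confirm the original values are untouched
df_by_copy.loc[0, "x"] = 100
print(df2.loc[0, "x"])  # 7
```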