# Hand-made Standardizer

👇 Consider the following data and how `StandardScaler` transforms it.

In [1]:
import pandas as pd
data = pd.read_csv("data.csv")
data

|   | one | two | three |
|---|-----|-----|-------|
| 0 | 1 | 2 | 3 |
| 1 | 2 | 3 | 4 |
| 2 | 3 | 4 | 5 |


In [2]:
from sklearn.preprocessing import StandardScaler

StandardScaler().fit_transform(data)

array([[-1.22474487, -1.22474487, -1.22474487],
       [ 0.        ,  0.        ,  0.        ],
       [ 1.22474487,  1.22474487,  1.22474487]])

👇 Re-implement `StandardScaler` by hand as a **class**. As a reminder, standardization consists of subtracting the feature's mean ($\mu$) from each data point ($x$) and dividing the result by the feature's standard deviation ($s$).

$$
\Large
z=\frac{(x-\mu )}{s}
$$

<details>
<summary>💡 Hint</summary>

This Stack Overflow [post](https://stackoverflow.com/questions/44220290/sklearn-standardscaler-result-different-to-manual-result) on pandas' `std()` function might help 😉
      
</details>
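The point behind the hint can be checked directly. The sketch below rebuilds the `one` column from the table above (an assumption, since `data.csv` itself is not shown) and compares pandas' default `std()`, which uses `ddof=1`, with `std(ddof=0)`.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the `one` column of data.csv
one = pd.Series([1, 2, 3], name="one")

sample_std = one.std()            # pandas default: ddof=1 (sample std)
population_std = one.std(ddof=0)  # ddof=0 (population std), NumPy's default

# Standardize the first value with each definition
z_sample = (one[0] - one.mean()) / sample_std
z_population = (one[0] - one.mean()) / population_std

print(sample_std, population_std)  # 1.0 and roughly 0.8165
print(z_sample, z_population)      # -1.0 and roughly -1.2247
```

Only the `ddof=0` variant reproduces the `-1.22474487` returned by `StandardScaler` above.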





In [3]:
from sklearn.base import TransformerMixin
from sklearn.base import BaseEstimator
# Bonus: lets us raise a NotFittedError when the transform method is called before the instance is fitted
from sklearn.exceptions import NotFittedError

# Create a class
class CustomStandardizer(TransformerMixin, BaseEstimator):
    
    def __init__(self):
        pass
    
    def fit(self, X, y=None):
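        # Store the per-feature (column-wise) mean and std in instance attributes.
        # ddof=0 gives the population std used by StandardScaler (pandas defaults to ddof=1).
        self._mean = X.mean(axis=0)
        self._std = X.std(axis=0, ddof=0)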
        # Return self to allow chaining
        return self
    
    def transform(self, X, y=None): 
        # Check if the instance was fitted
        if not (hasattr(self, "_mean") and hasattr(self, "_std")):
            raise NotFittedError("This CustomStandardizer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.")
        # Standardization
        standardized_feature = (X - self._mean) / self._std
        return standardized_feature

In [4]:
CustomStandardizer().fit_transform(data)

|   | one | two | three |
|---|-----|-----|-------|
| 0 | -1.224745 | -1.224745 | -1.224745 |
| 1 | 0.0 | 0.0 | 0.0 |
| 2 | 1.224745 | 1.224745 | 1.224745 |
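One way to confirm that the hand-made result agrees with scikit-learn is to compare the two numerically instead of by eye. A minimal sketch, assuming the `DataFrame` below matches the contents of `data.csv` as displayed earlier:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumed stand-in for data.csv, rebuilt from the table shown earlier
data = pd.DataFrame({"one": [1, 2, 3], "two": [2, 3, 4], "three": [3, 4, 5]})

# Manual standardization, column by column, with the population std (ddof=0)
manual = (data - data.mean(axis=0)) / data.std(axis=0, ddof=0)

# Reference result from scikit-learn (a NumPy array)
reference = StandardScaler().fit_transform(data)

matches = np.allclose(manual.to_numpy(), reference)
print(matches)  # True
```

Note that the hand-made version returns a `DataFrame` when given one, while `StandardScaler` returns a NumPy array by default, hence the `to_numpy()` call before comparing.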


In [5]:
# new_data: same mean as the original data, but with standard deviation 10 times larger
new_data = 10 * (data - data.mean()) + data.mean()

In [6]:
custom_scaler = CustomStandardizer().fit(data)
custom_scaler.transform(new_data)

|   | one | two | three |
|---|-----|-----|-------|
| 0 | -12.247449 | -12.247449 | -12.247449 |
| 1 | 0.0 | 0.0 | 0.0 |
| 2 | 12.247449 | 12.247449 | 12.247449 |
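The factor of 10 in this output follows from the definition. The scaler still uses the mean $\mu$ and standard deviation $s$ learned from `data`, while each new point is $x_{\text{new}} = 10\,(x-\mu)+\mu$. A short derivation, using the same symbols as the formula above:

$$
z_{\text{new}}=\frac{x_{\text{new}}-\mu}{s}=\frac{10\,(x-\mu)+\mu-\mu}{s}=10\,\frac{(x-\mu)}{s}=10\,z
$$

Each standardized value is therefore 10 times the corresponding value obtained for `data`, which is why $-1.224745$ becomes $-12.247449$.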


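`fit_transform` (inherited from `TransformerMixin`) fits a fresh instance and transforms the data in a single call, so no `NotFittedError` is raised here. Calling `transform` alone on an unfitted instance would raise it.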

In [7]:
custom_scaler2 = CustomStandardizer()
custom_scaler2.fit_transform(data)

|   | one | two | three |
|---|-----|-----|-------|
| 0 | -1.224745 | -1.224745 | -1.224745 |
| 1 | 0.0 | 0.0 | 0.0 |
| 2 | 1.224745 | 1.224745 | 1.224745 |
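To actually see the error, `transform` has to be called on an instance that was never fitted. A minimal sketch, assuming the data matches the table shown earlier and repeating a compact copy of the class so the block runs on its own:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError


class CustomStandardizer(TransformerMixin, BaseEstimator):
    """Compact copy of the class defined above, repeated so this block is self-contained."""

    def fit(self, X, y=None):
        self._mean = X.mean(axis=0)
        self._std = X.std(axis=0, ddof=0)
        return self

    def transform(self, X, y=None):
        if not (hasattr(self, "_mean") and hasattr(self, "_std")):
            raise NotFittedError("This CustomStandardizer instance is not fitted yet.")
        return (X - self._mean) / self._std


# Assumed stand-in for data.csv, rebuilt from the table shown earlier
data = pd.DataFrame({"one": [1, 2, 3], "two": [2, 3, 4], "three": [3, 4, 5]})

unfitted_scaler = CustomStandardizer()
try:
    unfitted_scaler.transform(data)  # no fit() call beforehand
    raised = False
except NotFittedError as error:
    raised = True
    print(f"NotFittedError: {error}")
```

The `try`/`except` is only there so the example runs to completion. In a notebook cell, calling `transform` directly would display the traceback.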


As a bonus exercise, read about the [`inverse_transform`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler.inverse_transform) method of the `StandardScaler` class and try to implement it in your custom scaler. 
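
One possible starting point for the bonus: solving $z=\frac{(x-\mu)}{s}$ for $x$ gives $x = z \cdot s + \mu$. The sketch below is only one way to do it. The class name `InvertibleStandardizer` and the helper `_require_fitted` are hypothetical names introduced for illustration, and the data is assumed to match the table shown earlier.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError


class InvertibleStandardizer(TransformerMixin, BaseEstimator):
    """Hypothetical variant of CustomStandardizer with an inverse_transform method."""

    def fit(self, X, y=None):
        self._mean = X.mean(axis=0)
        self._std = X.std(axis=0, ddof=0)
        return self

    def _require_fitted(self):
        # Hypothetical helper: shared fitted-state check for both directions
        if not (hasattr(self, "_mean") and hasattr(self, "_std")):
            raise NotFittedError("This InvertibleStandardizer instance is not fitted yet.")

    def transform(self, X, y=None):
        self._require_fitted()
        return (X - self._mean) / self._std

    def inverse_transform(self, X):
        # Undo the standardization: x = z * s + mu
        self._require_fitted()
        return X * self._std + self._mean


# Assumed stand-in for data.csv, rebuilt from the table shown earlier
data = pd.DataFrame({"one": [1, 2, 3], "two": [2, 3, 4], "three": [3, 4, 5]})

scaler = InvertibleStandardizer().fit(data)
recovered = scaler.inverse_transform(scaler.transform(data))

round_trip_ok = np.allclose(recovered, data)
print(round_trip_ok)  # True
```

A round trip through `transform` and `inverse_transform` should give back the original data up to floating-point error, which is what the final check verifies.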