# Hand-made Standardizer

## 1. Key Concept: Stateless vs. Stateful Transformers

👇 Consider the following train and test sets

In [1]:
import numpy as np
import pandas as pd

X_train = pd.DataFrame({
    'A': {0: 1, 1: 2, 2: 3},
    'B': {0: 2, 1: 3, 2: 4},
    'C': {0: 3, 1: 4, 2: 5}})
display(X_train)

X_test = pd.DataFrame({
    'A': {0: 1, 1: 2, 2: 3},
    'B': {0: 2, 1: 3, 2: 4},
    'C': {0: 3, 1: 4, 2: 10}})
display(X_test)

|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 2 | 3 | 4 |
| 2 | 3 | 4 | 5 |


|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 2 | 3 | 4 |
| 2 | 3 | 4 | 10 |


👇 And the following pipeline

In [2]:
from sklearn import set_config; set_config(display='diagram')
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.pipeline import make_pipeline, make_union

scaler = StandardScaler()
feature_averager = FunctionTransformer(lambda df: pd.DataFrame(1/3 * (df["A"] + df["B"] + df["C"])))
pipe = make_union(scaler, feature_averager)
pipe

In [3]:
pipe.fit(X_train)
pd.DataFrame(pipe.transform(X_train))

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | -1.224745 | -1.224745 | -1.224745 | 2.0 |
| 1 | 0.0 | 0.0 | 0.0 | 3.0 |
| 2 | 1.224745 | 1.224745 | 1.224745 | 4.0 |


In [4]:
pd.DataFrame(pipe.transform(X_test))

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | -1.224745 | -1.224745 | -1.224745 | 2.0 |
| 1 | 0.0 | 0.0 | 0.0 | 3.0 |
| 2 | 1.224745 | 1.224745 | 7.348469 | 5.666667 |


☝️ Notice how the `StandardScaler` and the `FunctionTransformer` are fundamentally different:

1️⃣ `FunctionTransformer` can only perform **stateless** transformations
 
$(X_1, X_2, X_3) \rightarrow \frac{X_1 + X_2 + X_3}{3}$ for our `feature_averager`

Other examples of stateless transformations:

$X \rightarrow \log(X)$  
$(X_1, X_2) \rightarrow X_1 + 5X_2^2$

2️⃣ `StandardScaler` performs a **stateful** transformation

$
X \rightarrow \frac{(X-\mu )}{\sigma}
$

- It needs to **store** information from the train set during `.fit` (here, `mean_train` and `std_train`)
- It then **reuses** that stored information during `.transform`, on *both* the train and test sets
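
To make the stored state visible, here is a minimal sketch that refits a `StandardScaler` on the same data and inspects what `.fit` kept. `mean_` and `scale_` are scikit-learn's fitted attributes; `z_manual` is just a name picked for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

X_train = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4], 'C': [3, 4, 5]})
X_test = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4], 'C': [3, 4, 10]})

scaler = StandardScaler().fit(X_train)

# State learned during .fit(): statistics of the TRAIN set, one value per column
print(scaler.mean_)   # train means
print(scaler.scale_)  # train standard deviations

# .transform() reuses that stored state, even when it is given the test set
z_manual = (X_test.to_numpy() - scaler.mean_) / scaler.scale_
print(np.allclose(z_manual, scaler.transform(X_test)))
```

The 10 in column `C` of the test set becomes $(10 - 4) / 0.8165 \approx 7.35$: it is scaled with the train mean and standard deviation, not the test ones.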

☝️ What if we wanted to code our own stateful custom transformer? For that, we will have to code our own class.

## 2. Create your own stateful transformer

### 2.1 CustomStandardizer

👉 Try to code your own class `CustomStandardizer` that should behave exactly like `StandardScaler` from scikit-learn.  
This means having a `fit()` and `transform()` method.

Then, fit it on `X_train` and transform `X_test` with it to compare with the original scikit-learn version!





In [None]:
# TransformerMixin provides fit_transform(), built from fit() and transform()
from sklearn.base import TransformerMixin, BaseEstimator

class CustomStandardizer(TransformerMixin, BaseEstimator):
    
    def __init__(self):
        pass
    
    def fit(self, X, y=None):
        # Store the train-set statistics as instance attributes (X is expected to be a DataFrame).
        # ddof=0 gives the population standard deviation, which is what StandardScaler uses.
        self.means_ = X.mean()
        self.stds_ = X.std(ddof=0)
        # Return "self" to allow chaining fit and transform.
        return self
    
    def transform(self, X, y=None): 
        # Apply the statistics stored during fit, whichever dataset X is.
        return (X - self.means_) / self.stds_
    
    

In [None]:
# Try it out below
custom_standardizer = CustomStandardizer()
custom_standardizer.fit(X_train)
custom_standardizer.transform(X_test)

In [None]:
from nbresult import ChallengeResult

tmp = CustomStandardizer()
tmp_train = np.array(tmp.fit_transform(X_train))
tmp_test = np.array(tmp.transform(X_test))

result = ChallengeResult('standardizer', 
                         X_train_transformed=tmp_train,
                         X_test_transformed=tmp_test
)

result.write()
print(result.check())

<details>
<summary>💡 Hint if the test above fails only by a small margin </summary>

Be careful: there is a slight difference between `np.std()` and the pandas `.std()` method! This Stack Overflow [post](https://stackoverflow.com/questions/44220290/sklearn-standardscaler-result-different-to-manual-result) might help 😉
      
</details>
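
A quick sketch of the difference the hint points at, assuming the defaults of recent NumPy and pandas versions: `np.std()` divides by $n$ (`ddof=0`), whereas the pandas `.std()` method divides by $n - 1$ (`ddof=1`). `StandardScaler` uses the former.

```python
import numpy as np
import pandas as pd

values = [1, 2, 3]

population_std = np.std(values)        # NumPy default: ddof=0, divides by n
sample_std = pd.Series(values).std()   # pandas default: ddof=1, divides by n - 1

print(population_std, sample_std)

# Passing ddof explicitly makes both libraries agree
print(np.isclose(pd.Series(values).std(ddof=0), population_std))
```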

### 2.2 Inverse Transform

❗️ Scikit-learn transformers also have [`inverse_transform`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler.inverse_transform) methods. Try to implement one in your custom scaler!
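
As a starting point, the inverse can be derived by solving the standardization formula for $X$. This is a sketch of the algebra only, using the same $\mu$ and $\sigma$ that were stored during `.fit`:

$$
Z = \frac{X - \mu}{\sigma} \quad \Longleftrightarrow \quad X = Z \sigma + \mu
$$

For column `A` of `X_train`, $\mu = 2$ and $\sigma \approx 0.8165$, so $Z \approx -1.2247$ maps back to $-1.2247 \times 0.8165 + 2 \approx 1$.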

In [None]:
# YOUR CODE HERE

In [None]:
# Test yourself below

custom_scaler = CustomStandardizer().fit(X_train)
X_train_transformed = custom_scaler.transform(X_train)
display(X_train_transformed)

X_train_detransformed = custom_scaler.inverse_transform(X_train_transformed)
display(X_train_detransformed)

In [None]:
assert np.allclose(X_train_detransformed, X_train)

### 2.3 Complete custom pipeline!

👉 Now that we have replicated scikit-learn's `StandardScaler`, we can create many new ones!

Try to create the following:

- A `CustomStandardizer(shrink_factor = 1)` which takes one additional argument to allow scaling by more than 1 standard deviation


- A `FeatureAverager()` class that improves upon the one you built in section 1 by scaling the result:

$$(X_1, X_2, X_3) \rightarrow \frac{\frac{1}{3} (X_1 + X_2 + X_3)}{\max(X_1 + X_2 + X_3)}$$

Then, use them both in your initial feature union `pipe` to make your own custom pipeline!
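
One possible sketch of the `shrink_factor` variant. The exercise does not spell out the exact formula, so this version ASSUMES that shrinking means dividing by `shrink_factor` standard deviations, i.e. $(X - \mu) / (\text{shrink\_factor} \cdot \sigma)$. The attribute names `means_` and `stds_` are our own choice.

```python
import numpy as np
import pandas as pd
from sklearn.base import TransformerMixin, BaseEstimator

class CustomStandardizer(TransformerMixin, BaseEstimator):

    def __init__(self, shrink_factor=1):
        # Hyperparameters are stored unchanged in __init__, as scikit-learn expects
        self.shrink_factor = shrink_factor

    def fit(self, X, y=None):
        # State learned from the data passed to fit (expected to be a DataFrame)
        self.means_ = X.mean()
        self.stds_ = X.std(ddof=0)
        return self

    def transform(self, X, y=None):
        # ASSUMPTION: divide by shrink_factor standard deviations
        return (X - self.means_) / (self.shrink_factor * self.stds_)

X_train = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4], 'C': [3, 4, 5]})

default = CustomStandardizer().fit_transform(X_train)
shrunk = CustomStandardizer(shrink_factor=2).fit_transform(X_train)

# With shrink_factor=2, every standardized value is halved
print(np.allclose(shrunk, default / 2))
```

With the default `shrink_factor=1`, this sketch reduces to the plain standardizer from section 2.1.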

In [None]:
# Feature Averager



class FeatureAverager(TransformerMixin, BaseEstimator):
    
    def __init__(self):
        pass
    
    def fit(self, X, y=None):
        # Store what needs to be stored as instance attributes. Return "self" to allow chaining fit and transform.
        pass  # YOUR CODE HERE
    
    def transform(self, X, y=None): 
        pass  # YOUR CODE HERE
    

Test your custom feature averager by fitting it on `X_train` and transforming it

In [None]:
pass  # YOUR CODE HERE
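
If you get stuck, here is one possible sketch of the averager. It ASSUMES that the maximum in the denominator is learned on the data passed to `.fit` (which is what makes the transformer stateful) and then reused as-is in `.transform`. The attribute name `max_sum_` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import TransformerMixin, BaseEstimator

class FeatureAverager(TransformerMixin, BaseEstimator):

    def fit(self, X, y=None):
        # ASSUMPTION: the max of the row sums is computed once, on the fit data
        self.max_sum_ = (X["A"] + X["B"] + X["C"]).max()
        return self

    def transform(self, X, y=None):
        # Row average of the three features, scaled by the stored max
        row_average = (X["A"] + X["B"] + X["C"]) / 3
        return pd.DataFrame(row_average / self.max_sum_)

X_train = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4], 'C': [3, 4, 5]})
X_test = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4], 'C': [3, 4, 10]})

averager = FeatureAverager().fit(X_train)
train_out = averager.transform(X_train)
test_out = averager.transform(X_test)
print(train_out)
print(test_out)
```

On `X_train` the row sums are 6, 9 and 12, so the stored maximum is 12 and the last test row, whose sum is 17, ends up scaled by the train maximum rather than its own.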

In [None]:
from nbresult import ChallengeResult

tmp = FeatureAverager()
tmp_train = np.array(tmp.fit_transform(X_train))
tmp_test = np.array(tmp.transform(X_test))

result = ChallengeResult('feature_averager', 
                         X_train_transformed=tmp_train,
                         X_test_transformed=tmp_test
)

result.write()
print(result.check())

Create a feature union named `pipe` using your custom standardizer and the feature averager you created

In [None]:
pass  # YOUR CODE HERE

Fit `pipe` on `X_train`, then transform both `X_train` and `X_test`

In [None]:
# fit and transform X_train

pass  # YOUR CODE HERE

In [None]:
# transform X_test (reuse what was fitted on X_train, do not refit)

pass  # YOUR CODE HERE
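
For reference, a sketch of how the pieces could fit together. It repeats compact versions of both transformers only so that it runs on its own, and it carries the same ASSUMPTIONS as above: `shrink_factor` multiplies the standard deviation, and the averager's maximum is learned during `.fit`. Your own classes may legitimately differ.

```python
import numpy as np
import pandas as pd
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.pipeline import make_union

class CustomStandardizer(TransformerMixin, BaseEstimator):
    def __init__(self, shrink_factor=1):
        self.shrink_factor = shrink_factor

    def fit(self, X, y=None):
        self.means_, self.stds_ = X.mean(), X.std(ddof=0)
        return self

    def transform(self, X, y=None):
        # ASSUMPTION: divide by shrink_factor standard deviations
        return (X - self.means_) / (self.shrink_factor * self.stds_)

class FeatureAverager(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        # ASSUMPTION: the max of the row sums is learned on the fit data
        self.max_sum_ = (X["A"] + X["B"] + X["C"]).max()
        return self

    def transform(self, X, y=None):
        return pd.DataFrame((X["A"] + X["B"] + X["C"]) / 3 / self.max_sum_)

X_train = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4], 'C': [3, 4, 5]})
X_test = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4], 'C': [3, 4, 10]})

pipe = make_union(CustomStandardizer(), FeatureAverager())
train_out = pipe.fit_transform(X_train)  # state is learned on the train set only
test_out = pipe.transform(X_test)        # the stored state is reused on the test set
print(pd.DataFrame(train_out))
print(pd.DataFrame(test_out))
```

The union stacks the three standardized columns and the scaled average side by side, so both outputs have four columns.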

In [None]:
from nbresult import ChallengeResult

tmp = pipe
tmp_train = np.array(tmp.fit_transform(X_train))
tmp_test = np.array(tmp.transform(X_test))

result = ChallengeResult('feature_union_custom_transformers', 
                         X_train_transformed=tmp_train,
                         X_test_transformed=tmp_test
)

result.write()
print(result.check())

🏁 Congratulations! Don't forget to commit and push your notebooks.