# Behavioral Patterns

Behavioral patterns are guidelines for designing classes that let objects communicate with each other. Their goal is to make those interactions easier and more understandable.

The ten behavioral patterns are:
1. Chain of Responsibility
2. Command
3. Iterator
4. Mediator
5. Memento
6. Observer
7. State
8. Strategy
9. Template method
10. Visitor

Again, we are going to cover each of the patterns and work through some examples in the context of machine learning engineering.

# Memento

**What is a memento?**

A memento is a class designed to create checkpoints of an object's state. It may be confused with the `command` design pattern, but a memento captures the state of an object, whereas a command captures the attributes of a request.
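
To make the definition concrete, here is a minimal, library-free sketch of the three roles usually associated with the pattern: the object whose state is saved (originator), the snapshot itself (memento), and the history that stores snapshots (caretaker). The names `TrainingConfig`, `ConfigMemento` and `History` are hypothetical and chosen only for illustration.

```python
import copy


class ConfigMemento:
    """Memento: a snapshot of a TrainingConfig's state (hypothetical name)."""

    def __init__(self, state: dict):
        # Deep copy so later changes to the originator cannot leak in.
        self._state = copy.deepcopy(state)

    def get_state(self) -> dict:
        # Hand out a copy so the stored snapshot stays untouched.
        return copy.deepcopy(self._state)


class TrainingConfig:
    """Originator: the object whose state we want to checkpoint."""

    def __init__(self, learning_rate: float, layers: list):
        self.learning_rate = learning_rate
        self.layers = layers

    def save(self) -> ConfigMemento:
        return ConfigMemento(
            {"learning_rate": self.learning_rate, "layers": self.layers}
        )

    def restore(self, memento: ConfigMemento) -> None:
        state = memento.get_state()
        self.learning_rate = state["learning_rate"]
        self.layers = state["layers"]


class History:
    """Caretaker: stores mementos without inspecting their contents."""

    def __init__(self):
        self._checkpoints = []

    def push(self, memento: ConfigMemento) -> None:
        self._checkpoints.append(memento)

    def pop(self) -> ConfigMemento:
        return self._checkpoints.pop()


config = TrainingConfig(learning_rate=0.01, layers=[64, 32])
history = History()

history.push(config.save())       # checkpoint before experimenting
config.learning_rate = 0.1        # change the state...
config.layers.append(16)
config.restore(history.pop())     # ...and roll back to the checkpoint

print(config.learning_rate, config.layers)
```

A command object, by contrast, would describe the request itself (for example "set the learning rate to 0.1") rather than store what the object looked like before the change.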

**When should we use it?**

It should be used when you need to save and retrieve a specific state of an object in an application.

**Scenario**

You're a lead machine learning engineer. You have designed a pipeline that processes data with a standard scaler, a min-max scaler and a robust scaler. Your goal is to design a system that can save and retrieve the object state after applying some transformations.

In [40]:
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

import pandas as pd

In [41]:
df = load_diabetes(as_frame=True, return_X_y=True)

In [42]:
df = pd.concat([df[0], df[1]], axis=1)

In [43]:
df2 = df.copy()

## Antipattern

The easiest antipattern to fall into is doing things manually.

In [44]:
class DataNormalization:
    def __init__(self):
        pass

    def minmax_scaler(self, df: pd.DataFrame, columns: list) -> pd.DataFrame:
        scaler = MinMaxScaler()
        df[columns] = scaler.fit_transform(df[columns])
        return df

    def standard_scaler(self, df: pd.DataFrame, columns: list) -> pd.DataFrame:
        scaler = StandardScaler()
        df[columns] = scaler.fit_transform(df[columns])
        return df

    def robust_scaler(self, df: pd.DataFrame, columns: list) -> pd.DataFrame:
        scaler = RobustScaler()
        df[columns] = scaler.fit_transform(df[columns])
        return df

Let's apply the data normalizer to the dataset.

In [45]:
data_transformation = DataNormalization()

In [46]:
result_minmax = data_transformation.minmax_scaler(df, df.columns[:-1].tolist())

In [47]:
df == result_minmax

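| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | True | True | True | True | True | True | True | True | True | True |
| 1 | True | True | True | True | True | True | True | True | True | True | True |
| 2 | True | True | True | True | True | True | True | True | True | True | True |
| 3 | True | True | True | True | True | True | True | True | True | True | True |
| 4 | True | True | True | True | True | True | True | True | True | True | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 437 | True | True | True | True | True | True | True | True | True | True | True |
| 438 | True | True | True | True | True | True | True | True | True | True | True |
| 439 | True | True | True | True | True | True | True | True | True | True | True |
| 440 | True | True | True | True | True | True | True | True | True | True | True |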


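The problem is that each method modifies `df` in place and returns that same object. Assigning the result to a new variable does not create a new DataFrame; Python only binds a second name to the same data. That is why `df == result_minmax` is `True` everywhere: the original values have been overwritten. We therefore have no snapshot of the original data source, and it is impossible to roll back.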
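
The aliasing can be checked directly. The sketch below uses a small hypothetical DataFrame and a hand-written `scale_in_place` helper (neither is part of the notebook) to show that the returned frame is the same object as the input, and that only an explicit `.copy()` taken beforehand preserves the original values.

```python
import pandas as pd


def scale_in_place(df: pd.DataFrame, column: str) -> pd.DataFrame:
    # Hypothetical helper: mutates the frame it receives, like the methods above.
    df[column] = df[column] / df[column].max()
    return df


raw = pd.DataFrame({"bmi": [10.0, 20.0, 40.0]})
snapshot = raw.copy()                 # independent copy taken before scaling
result = scale_in_place(raw, "bmi")

print(result is raw)                  # both names point to the same object
print(raw["bmi"].tolist())            # the input now holds the scaled values
print(snapshot["bmi"].tolist())       # the original values survive only in the copy
```

Taking that copy by hand before every transformation is what the memento automates.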

**How to solve the antipattern?**

1. Create a memento class that stores a snapshot, and keep a history of those mementos.
2. Add a state-loading function to the data class that restores a snapshot from a given memento.

## Pattern

In [48]:
class DataMemento:
    def __init__(self, data):
        self.data = data.copy()

    def get_data(self):
        return self.data

Next, we create a new class that saves and restores mementos:

In [49]:
class NewDataNormalization:
    def __init__(self):
        self.history = []

    def minmax_scaler(
        self, df: pd.DataFrame, columns: list, save_backup: bool = False
    ) -> pd.DataFrame:
        if save_backup:
            self.save_state(df)
        scaler = MinMaxScaler()
        df[columns] = scaler.fit_transform(df[columns])
        return df

    def standard_scaler(
        self, df: pd.DataFrame, columns: list, save_backup: bool = False
    ) -> pd.DataFrame:
        if save_backup:
            self.save_state(df)
        scaler = StandardScaler()
        df[columns] = scaler.fit_transform(df[columns])
        return df

    def robust_scaler(
        self, df: pd.DataFrame, columns: list, save_backup: bool = False
    ) -> pd.DataFrame:
        if save_backup:
            self.save_state(df)
        scaler = RobustScaler()
        df[columns] = scaler.fit_transform(df[columns])
        return df

    def save_state(self, df: pd.DataFrame):
        memento = DataMemento(df)
        self.history.append(memento)
        return memento

    def restore_state(self, index: int) -> pd.DataFrame:
        if index < 0 or index >= len(self.history):
            raise IndexError("Invalid checkpoint index")
        # Return a copy so later in-place scaling cannot alter the checkpoint.
        return self.history[index].get_data().copy()

In [50]:
new_pipeline = NewDataNormalization()

In [51]:
result_minmax = new_pipeline.minmax_scaler(
    df2, df2.columns[:-1].tolist(), save_backup=True
)

In [52]:
new_pipeline.history[0].get_data()

| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.050680 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 | 151.0 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 | 75.0 |
| 2 | 0.085299 | 0.050680 | 0.044451 | -0.005670 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.025930 | 141.0 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | -0.009362 | 206.0 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | -0.046641 | 135.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 437 | 0.041708 | 0.050680 | 0.019662 | 0.059744 | -0.005697 | -0.002566 | -0.028674 | -0.002592 | 0.031193 | 0.007207 | 178.0 |
| 438 | -0.005515 | 0.050680 | -0.015906 | -0.067642 | 0.049341 | 0.079165 | -0.028674 | 0.034309 | -0.018114 | 0.044485 | 104.0 |
| 439 | 0.041708 | 0.050680 | -0.015906 | 0.017293 | -0.037344 | -0.013840 | -0.024993 | -0.011080 | -0.046883 | 0.015491 | 132.0 |
| 440 | -0.045472 | -0.044642 | 0.039062 | 0.001215 | 0.016318 | 0.015283 | -0.028674 | 0.026560 | 0.044529 | -0.025930 | 220.0 |


In [53]:
result_minmax

| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.666667 | 1.0 | 0.582645 | 0.549296 | 0.294118 | 0.256972 | 0.207792 | 0.282087 | 0.562217 | 0.439394 | 151.0 |
| 1 | 0.483333 | 0.0 | 0.148760 | 0.352113 | 0.421569 | 0.306773 | 0.623377 | 0.141044 | 0.222437 | 0.166667 | 75.0 |
| 2 | 0.883333 | 1.0 | 0.516529 | 0.436620 | 0.289216 | 0.258964 | 0.246753 | 0.282087 | 0.496578 | 0.409091 | 141.0 |
| 3 | 0.083333 | 0.0 | 0.301653 | 0.309859 | 0.495098 | 0.447211 | 0.233766 | 0.423131 | 0.572923 | 0.469697 | 206.0 |
| 4 | 0.516667 | 0.0 | 0.206612 | 0.549296 | 0.465686 | 0.417331 | 0.389610 | 0.282087 | 0.362385 | 0.333333 | 135.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 437 | 0.683333 | 1.0 | 0.421488 | 0.704225 | 0.431373 | 0.359562 | 0.259740 | 0.282087 | 0.605672 | 0.530303 | 178.0 |
| 438 | 0.466667 | 1.0 | 0.285124 | 0.183099 | 0.627451 | 0.619522 | 0.259740 | 0.423131 | 0.415810 | 0.666667 | 104.0 |
| 439 | 0.683333 | 1.0 | 0.285124 | 0.530563 | 0.318627 | 0.323705 | 0.272727 | 0.249647 | 0.305030 | 0.560606 | 132.0 |
| 440 | 0.283333 | 0.0 | 0.495868 | 0.464789 | 0.509804 | 0.416335 | 0.259740 | 0.393512 | 0.657026 | 0.409091 | 220.0 |


As shown above, the saved snapshot and the min-max result now differ: the snapshot keeps the original values, while `result_minmax` holds the scaled ones. We have gained the option to back up our data through different snapshots.
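
The cells above inspect the saved snapshot but do not perform a rollback. As a sketch of how one might look, the self-contained example below uses a simplified stand-in for the classes above: `FrameMemento` and `ScalingPipeline` are hypothetical names, the data is a small made-up frame, and the min-max scaling is written by hand rather than with scikit-learn. It restores the data from checkpoint 0 after scaling.

```python
import pandas as pd


class FrameMemento:
    """Hypothetical stand-in for DataMemento: stores a copy of a DataFrame."""

    def __init__(self, data: pd.DataFrame):
        self._data = data.copy()

    def get_data(self) -> pd.DataFrame:
        # Return a copy so the stored checkpoint cannot be mutated by callers.
        return self._data.copy()


class ScalingPipeline:
    """Hypothetical stand-in for NewDataNormalization with one scaler."""

    def __init__(self):
        self.history = []

    def save_state(self, df: pd.DataFrame) -> None:
        self.history.append(FrameMemento(df))

    def minmax_scale(
        self, df: pd.DataFrame, columns: list, save_backup: bool = False
    ) -> pd.DataFrame:
        if save_backup:
            self.save_state(df)
        # Hand-written min-max scaling, applied in place like the notebook's methods.
        col_min = df[columns].min()
        col_max = df[columns].max()
        df[columns] = (df[columns] - col_min) / (col_max - col_min)
        return df

    def restore_state(self, index: int) -> pd.DataFrame:
        if index < 0 or index >= len(self.history):
            raise IndexError("Invalid checkpoint index")
        return self.history[index].get_data()


data = pd.DataFrame({"bmi": [18.0, 24.0, 30.0], "target": [75.0, 141.0, 206.0]})
pipeline = ScalingPipeline()

scaled = pipeline.minmax_scale(data, ["bmi"], save_backup=True)
restored = pipeline.restore_state(0)   # roll back to the state before scaling

print(scaled["bmi"].tolist())      # scaled values
print(restored["bmi"].tolist())    # original values recovered from the checkpoint
```

Because each call with `save_backup=True` appends to `history`, chaining several scalers would leave one checkpoint per step, and any of them can be restored by index.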