## Create scikit-learn compatible transformers
- We are going to create most of our feature engineering steps with open source libraries.
- For transformations that no class in feature-engine or scikit-learn covers, we have to write custom transformers. These classes need the methods and attributes that make them "scikit-learn compatible":
    - Capture the elapsed time (in years) between a year variable and YrSold (when the house was sold).
        - For example, we replace YearBuilt with (YrSold - YearBuilt).
    - Encode categorical variables whose levels are already ordered, converting them from strings to numbers.

## NOTE: I also created a .py version of this notebook, src/preprocessing.py, which we will import in notebook 7.

## References:
- https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator
- https://sklearn-template.readthedocs.io/en/latest/user_guide.html

```python
import numpy as np
import pandas as pd

# scikit-learn
from sklearn.base import BaseEstimator, TransformerMixin
```
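Inheriting from `BaseEstimator` and `TransformerMixin` gives a custom class its scikit-learn plumbing. `BaseEstimator` supplies `get_params`/`set_params`, which it reads from the constructor's argument names, so `__init__` should store each argument under an attribute of the same name. `TransformerMixin` supplies `fit_transform`. The sketch below is minimal. `AddConstant` is a hypothetical transformer written only to show these inherited methods, and it is not part of this project.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone


class AddConstant(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: adds a constant to the listed columns."""

    def __init__(self, variables, constant=1):
        # store each constructor argument under the same name,
        # so BaseEstimator.get_params() can find it
        self.variables = variables
        self.constant = constant

    def fit(self, X, y=None):
        # nothing to learn; returning self keeps the sklearn contract
        return self

    def transform(self, X):
        X = X.copy()
        for var in self.variables:
            X[var] = X[var] + self.constant
        return X


df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
tr = AddConstant(variables=["a"], constant=5)

print(tr.get_params())       # inherited from BaseEstimator
print(tr.fit_transform(df))  # inherited from TransformerMixin

# clone() and set_params() both rely on get_params()
tr2 = clone(tr).set_params(constant=0)
```

The same inherited methods are what let pipeline tools such as `clone` and grid search copy and reconfigure our transformers.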

### Transformer to capture the elapsed time (in years) between a year variable and YrSold (when the house was sold)

```python
# inherit the required parent classes from scikit-learn
class TemporalVariableTransformer(BaseEstimator, TransformerMixin):

    # constructor
    def __init__(self, variables, reference_variable):  # reference_variable is YrSold
        """Constructor

        Args:
            variables (List[str]): a list of year variable names
            reference_variable (str): name of the reference variable

        Returns:
            None
        """

        # Error handling: sanity check on the input parameter (it should be a list)
        if not isinstance(variables, list):
            raise ValueError('variables should be a list')

        # set the attributes variables and reference_variable
        self.variables = variables  # a list of variables
        self.reference_variable = reference_variable

    # fit method
    def fit(self, X, y=None):
        """Fit

        Args:
            X (DataFrame): an input DataFrame of features to train the transformer
            y (Series): an input Series of the response variable (optional, unused)

        Returns:
            self
        """
        # This transformer doesn't learn any parameters. Nonetheless, we still need
        # a fit method so that the class is compatible with sklearn

        return self

    def transform(self, X):
        """Transform

        Args:
            X (DataFrame): an input DataFrame of features to be transformed

        Returns:
            X (DataFrame): the transformed DataFrame of features
        """

        # Make a copy of the input DataFrame of features to be transformed
        # so we won't overwrite the original DataFrame that was passed as argument
        X = X.copy()

        # Perform the transformation: X[var] = X[reference_variable] - X[var]
        for var in self.variables:
            X[var] = X[self.reference_variable] - X[var]

        return X
```
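The core of `transform` is a vectorised column subtraction on a copy of the input. Here is a minimal sketch of the same logic in plain pandas. It assumes the Kaggle House Prices column names, and the values are made up for illustration rather than taken from the dataset. It also shows that the caller's DataFrame keeps its raw years.

```python
import pandas as pd

# made-up rows, assuming the House Prices year column names
houses = pd.DataFrame({
    "YearBuilt": [1990, 2005],
    "YearRemodAdd": [2000, 2005],
    "YrSold": [2008, 2010],
})

year_vars = ["YearBuilt", "YearRemodAdd"]
reference = "YrSold"

X = houses.copy()  # work on a copy, as transform() does
for var in year_vars:
    # elapsed years between the event and the sale
    X[var] = X[reference] - X[var]

print(X)       # year columns now hold elapsed years
print(houses)  # the original frame still holds the raw years
```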

### Encoding categorical variables whose levels are already ordered (from strings to numbers)
- We can use this transformer to recode as many variables as we want
- Put the variables that we want to recode in a list called "variables"
- "mappings" is a dictionary that maps the old encoding to the new encoding
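The recoding relies on `pandas.Series.map` with a dictionary. Each level found in the dictionary is replaced by its code, and any level missing from the dictionary becomes `NaN`, so the mapping should cover every level. The small sketch below uses made-up data. The `quality_mapping` dict is hypothetical and shows one mapping shared by several variables.

```python
import pandas as pd

# hypothetical mapping shared by several quality variables
quality_mapping = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

X = pd.DataFrame({
    "ExterQual": ["TA", "Gd", "Ex"],
    "KitchenQual": ["Gd", "TA", "NA"],  # "NA" is not in the mapping
})

for var in ["ExterQual", "KitchenQual"]:
    # levels absent from the dict are mapped to NaN
    X[var] = X[var].map(quality_mapping)

print(X)
```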

```python
class Mapper(BaseEstimator, TransformerMixin):
    """Recode the levels of the given variables with a mapping dictionary.

    Args:
        variables (List[str]): a list of variables to be recoded (specified by user)
        mappings (dict): a dictionary of mappings from old to new encoding
    """

    def __init__(self, variables, mappings):

        # Error handling: check to ensure variables is a list
        if not isinstance(variables, list):
            raise ValueError('variables should be a list')

        # Error handling: check to ensure mappings is a dict
        if not isinstance(mappings, dict):
            raise ValueError('mappings should be a dictionary')

        # set attributes at instantiation of class
        self.variables = variables
        self.mappings = mappings

    def fit(self, X, y=None):  # need to have y as argument to make class compatible with sklearn pipeline
        """Fit

        Args:
            X (DataFrame): an input DataFrame of features to train the transformer
            y (Series): an input Series of the response variable (optional, unused)

        Returns:
            self
        """
        # This transformer doesn't learn any parameters. Nonetheless, we still need
        # a fit method so that the class is compatible with sklearn

        return self

    def transform(self, X):
        """Transform

        Args:
            X (DataFrame): an input DataFrame of features to be transformed

        Returns:
            X (DataFrame): the transformed DataFrame of features
        """

        # Make a copy of the input DataFrame of features to be transformed
        # so we won't overwrite the original DataFrame that was passed as argument
        X = X.copy()

        # Recode the levels of each variable
        for var in self.variables:
            X[var] = X[var].map(self.mappings)

        return X
```

## NOTE: the mappings below form a dict whose keys are the variable names and whose values are dictionaries with the mapping for each level:


```python
ordinal_mappings = {
    'MSZoning': {'Rare': 0, 'RM': 1, 'RH': 2, 'RL': 3, 'FV': 4},
    'Street': {'Rare': 0, 'Pave': 1},
    'Alley': {'Grvl': 0, 'Pave': 1, 'Missing': 2},
    'LotShape': {'Reg': 0, 'IR1': 1, 'Rare': 2, 'IR2': 3},
    'LandContour': {'Bnk': 0, 'Lvl': 1, 'Low': 2, 'HLS': 3},
    'Utilities': {'Rare': 0, 'AllPub': 1},
    'LotConfig': {'Inside': 0, 'FR2': 1, 'Corner': 2, 'Rare': 3, 'CulDSac': 4},
    'LandSlope': {'Gtl': 0, 'Mod': 1, 'Rare': 2},
}
```
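`Mapper.transform` passes `self.mappings` straight to `Series.map`, so a nested dictionary like `ordinal_mappings` has to be consumed one variable at a time, pairing each column with its own inner dictionary. The sketch below does that pairing in plain pandas, using a subset of `ordinal_mappings` and made-up rows. In a pipeline this would correspond to one `Mapper([var], mapping)` step per variable. That usage is an assumption on our part, not something stated above.

```python
import pandas as pd

# subset of ordinal_mappings: variable name -> {level: code}
ordinal_mappings = {
    "Street": {"Rare": 0, "Pave": 1},
    "LandSlope": {"Gtl": 0, "Mod": 1, "Rare": 2},
}

# made-up rows for illustration
X = pd.DataFrame({
    "Street": ["Pave", "Rare", "Pave"],
    "LandSlope": ["Gtl", "Mod", "Rare"],
})

for var, mapping in ordinal_mappings.items():
    # each column is recoded with its own inner dictionary
    X[var] = X[var].map(mapping)

print(X)
```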