# Object Oriented Programming with SKLearn

## Objectives

- Understand the concept of object-oriented inheritance
- Understand the main object types of the Scikit-Learn API
- Extend and create custom Scikit-Learn classes

# Inheritance

We've already learned a lot about object-oriented programming and how to create our own classes.

We can also define classes in terms of _other_ classes. The new classes then **inherit** the attributes and methods of the classes they're built on.

## Motivation: So What's the Benefit? 

_More abstraction is better_

Take a look at this code below. Look at how much we've already done:

In [None]:
# Look at all that code we wrote... do we have to do it all again...?
class Robot():
    purpose = 'To love humans'
    
    # We'd like to start off with some initial attributes
    def __init__(self, first_name='?', last_name=''):
        # Clean the names of extra spaces at beginning & end
        first_name = first_name.strip()
        last_name = last_name.strip()    
        # Setting attributes
        self._first_name = first_name
        self._last_name = last_name
        # Combine first and last names and remove any extra spacing
        self.name = ' '.join([first_name,last_name]).strip()

           
    def change_name(self, new_name):
        self.name = new_name
    
    def speak(self):
        print(f'I am {self.name}!')

Let's say we wanted to make another bot with some extra functionality - like keeping track of its battery charge.

Do we have to copy and paste this and then add our new functionality? 

Nope! We can add functionality on top of the stuff we already did!

In [None]:
class BatteryBot(Robot): # Specify the base class(es) we inherit from
    '''A robot that takes care of garbage while we're away!'''
    # Added functionality
    battery = 100
    
    def speak(self):
        print(f"I'm {self.name} and have {self.battery}% battery charged")
        self.battery -= 10

In [None]:
new_robot = BatteryBot('Wall-e')
new_robot.speak()

In [None]:
new_robot.speak()

And I still keep the other functionality from the original class!

In [None]:
new_robot.change_name('E-llaw') # Note we never defined this in BatteryBot!
new_robot.speak()

## Inheritance in Data Science

A lot of the motivation behind how we write our code can be summed up as "Never reinvent the wheel," and **inheritance** makes this really easy.

Later, we'll be taking Scikit-Learn's objects and customizing them to our particular needs. This can be a common practice as we use libraries and tools to write reproducible code.

Inheritance allows us to write some of this code quickly by avoiding a lot of "boilerplate" code (the same code we write over and over just to do a minor change).

# Duck Typing

But we don't need inheritance to do everything. 

Another way to share functionality across different objects is called **duck typing**. The term comes from the saying: 
> **"If it walks like a duck and it quacks like a duck, then it must be a duck."**

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Rubber_Duck_Front_View_in_Fine_Day_20140107.jpg/800px-Rubber_Duck_Front_View_in_Fine_Day_20140107.jpg?20140107225055" alt="inflatable duck on water image from wikimedia commons" width=400>

When you're using the concept of duck typing, you really don't care about the object _type_.

Instead, all you care about are the **methods and properties** of the object.
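
Here's a minimal sketch of the idea in plain Python. The class and function names (`Duck`, `RobotDuck`, `make_it_speak`) are made up for illustration. Notice that the two classes share no parent class, yet the function works with both because each has a `speak()` method.

```python
class Duck:
    def speak(self):
        return 'Quack!'


class RobotDuck:
    # Not related to Duck by inheritance, but it has the same method name
    def speak(self):
        return 'Beep-quack!'


def make_it_speak(thing):
    # No isinstance() check: we only need `thing` to have a speak() method
    return thing.speak()


sounds = [make_it_speak(obj) for obj in (Duck(), RobotDuck())]
print(sounds)  # ['Quack!', 'Beep-quack!']
```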

## Duck Typing in Scikit-Learn

Scikit-Learn relies more on duck typing than on pure inheritance. In general, if an object has the methods that `sklearn` expects, then it's mostly compatible with other `sklearn` objects!

Inheritance is still used in Scikit-Learn, typically to avoid _boilerplate_ code. Usually this involves the base classes in [`sklearn.base`](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.base), such as [`sklearn.base.BaseEstimator`](https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator).
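
As a rough illustration of the boilerplate that `BaseEstimator` saves us, the sketch below defines a tiny class and uses the `get_params()` and `set_params()` methods it inherits. The class `Multiplier` and its `factor` argument are hypothetical and exist only for this example.

```python
from sklearn.base import BaseEstimator


class Multiplier(BaseEstimator):
    # Hypothetical estimator used only for illustration
    def __init__(self, factor=2.0):
        # Convention: store each constructor argument under the same name
        self.factor = factor


mult = Multiplier(factor=3.0)

# Both methods below come from BaseEstimator; we never wrote them
params = mult.get_params()
mult.set_params(factor=5.0)

print(params)
print(mult.factor)
```

We only wrote an `__init__`, but the object can already report and update its own settings, which is what tools like grid search rely on.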

# Scikit-Learn's API: (Estimators, Transformers, Predictors)

SKLearn has a great [API](https://scikit-learn.org/stable/developers/develop.html) whose objects are consistent and easy to make compatible with your own objects!

(But I thought SKLearn was a library...? Explore [APIs VS Libraries](https://rapidapi.com/blog/api-vs-library/#:~:text=and%20Google%20APIs.-,API%20vs.%20Library,-If%20a%20running))

Let's go over the objects that will be most relevant to us in the near future.

Each of these class types will **ALWAYS** have the methods listed for it! That's the power of SKLearn!

## 1) Estimator

> This is an object that can take in data and _estimate_ (or *learn*) some parameters. 

This means regression and classification models are estimators but so are objects that transform the original dataset ([Transformers](#Transformer)) such as `StandardScaler`.

### `fit`

All estimators estimate/learn when you call the `fit()` method and pass in the dataset. Other parameters can be used to "help" the estimator learn. These are called **hyperparameters**, parameters used to tweak the learning process. In Scikit-Learn they are normally set when the estimator object is created rather than passed to `fit()`.
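
To make the split between hyperparameters and learned parameters concrete, here's a small sketch with `LinearRegression`. The toy arrays `X_demo` and `y_demo` are invented for this example. The hyperparameter (`fit_intercept`) goes into the constructor, while the learned values show up as attributes with a trailing underscore after `fit()`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data that follows y = 2x + 1 exactly
X_demo = np.array([[1.0], [2.0], [3.0]])
y_demo = np.array([3.0, 5.0, 7.0])

# Hyperparameter: chosen by us when the object is created
model = LinearRegression(fit_intercept=True)

# fit() learns from the data and returns the estimator itself
returned = model.fit(X_demo, y_demo)

# Learned parameters end in an underscore
print(model.coef_, model.intercept_)  # roughly [2.] and 1.0
```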

## 2) Transformer

> Some estimators can change the original data to something new, a **transformation**. 

You've seen examples of these **transformers** whenever you've done scaling, data cleaning, or expanding/reducing a dataset.

### `transform`

You call a transformer's `transform()` method to apply the transformation to a dataset _after_ a `fit()` call.

###  `fit_transform`

Remember that all estimators have a `fit()` method, so a transformer (a type of estimator in sklearn) can use the `fit()` method to learn something about the given dataset. After learning with `fit()`, a transformation on the dataset can be made with the `transform()` method. 

When you call `fit` and `transform` with the same dataset (for example, `X_train`), you can simply call the `fit_transform()` method. This gives essentially the same result as calling `fit()` and then `transform()` on the dataset, but possibly with some optimizations baked in.

(But be careful! Don't re-fit every time. Remember, you fit on the training data and then only _transform_ the test data. You do NOT re-fit on the test data.)
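
The fit-on-train, transform-on-test pattern might look like the sketch below, using `StandardScaler` as the transformer. The tiny arrays `X_train_demo` and `X_test_demo` are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train_demo = np.array([[0.0], [2.0], [4.0]])
X_test_demo = np.array([[2.0], [6.0]])

scaler = StandardScaler()

# Learn the mean and standard deviation from the training data, then scale it
X_train_scaled = scaler.fit_transform(X_train_demo)

# Reuse what was learned from training data; no second fit here
X_test_scaled = scaler.transform(X_test_demo)

print(scaler.mean_)     # mean learned from the training data only
print(X_test_scaled)
```

The test value `2.0` maps to `0` because it equals the _training_ mean, not because of anything learned from the test set.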

## 3) Predictor

> We would use the `fit()` method to train our predictor object and then feed in new data to make predictions (based on what it learned in the fitting stage).

We've used **predictors** whenever we've made predictions like with a `LinearRegression` model.

### `predict`

As you can probably guess, the `predict()` method predicts results from a dataset given to it after being trained with the `fit()` method.

### `score`

Predictors also have a `score()` method that can be used to evaluate how well the predictor performed on a dataset (such as the test set).

All predictors have a default scoring metric - usually, R-Squared for regression models and accuracy for classification models.
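
Here's a short sketch of `predict()` and `score()` together, again with `LinearRegression`. The toy data is invented and perfectly linear, so we'd expect an R-Squared of about 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data that follows y = 2x exactly
X_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = np.array([2.0, 4.0, 6.0, 8.0])

reg = LinearRegression().fit(X_demo, y_demo)

# Predict for a value the model has not seen
preds = reg.predict(np.array([[5.0]]))

# Default score for a regressor is R-Squared
r2 = reg.score(X_demo, y_demo)

print(preds, r2)
```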

## Observing a Scikit-Learn Class Definition from Source

Let's begin by taking a look at the source code for `sklearn`'s [SimpleImputer](https://github.com/scikit-learn/scikit-learn/blob/baf0ea25d/sklearn/impute/_base.py#L132).

Take a minute to peruse the source code on your own. What do you notice?

# Creating a Scikit-Learn Transformer

Let's try to create a new _transformer_ that will transform the data in the following manner:

- If the value is **positive**, scale the value by the **largest value** in that column
- If the value is **negative**, change it to $0$

In [None]:
# Imports
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

## Creating a New Transformer

First, we create our base estimator/transformer through inheritance of [`sklearn.base.BaseEstimator`](https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator):

In [None]:
class SpecialTransformer(BaseEstimator):
    pass

my_special_trans = SpecialTransformer()
my_special_trans

This by itself is pretty useless. But we can now add in a new `fit()` method, which will find the maximum value for each column/feature.

## Creating a `fit` Method

In [None]:
class SpecialTransformer(BaseEstimator):
    # Don't need an __init__ method
    
    # Let's define our fit method! Takes in `self`, plus X and y
    # Note: by convention, we accept a y parameter (we won't use it)
    def fit(self, X, y=None):
        # Get the maximum value for each column/feature
        # Can use numpy, setting axis=0 to get max for each column of array
        # Note we use a trailing underscore for values "learned" from fit()
        self.max_ = np.max(X, axis=0)
        # Then, return self
        return self

In [None]:
my_special_trans = SpecialTransformer()

In [None]:
## Let's use some test data
# Note each column is a feature, each row a data point
X = np.array([
    [-4, 400, 40],
    [10, -100, 1],
    [6, -800, 700],
    [2, 0, 400],
    [8, 200, 1000]
])

X

> Quick check: What would be the max values for each column/feature?

In [None]:
# What happens before we fit?
my_special_trans.max_

In [None]:
# Now let's check what that looks like AFTER we fit
my_special_trans.fit(X)
my_special_trans.max_

Great! 

## Creating `transform` Method

Let's now actually implement a way to transform our data:

In [None]:
class SpecialTransformer(BaseEstimator): 
    # Our fit method
    def fit(self, X, y=None):
        self.max_ = np.max(X, axis=0) 
        return self
    
    # Now define our transform method
    def transform(self, X):
        '''
        Docstring to remind us of our goal!
        Scale the values passed in: 
            - Negatives go to 0
            - Positives scaled by maximum value found in fit()
        '''
        # A note - nicer to do this on a copy of our array
        X_copy = np.copy(X)
        # If negative value, turn it to 0
        # We can use a boolean mask to do this:
        # find all values where the cell is less than zero, then set to 0
        X_copy[X_copy < 0] = 0
        # Now, divide everything by self's max_ value 
        # (previous negative values will remain 0)
        # Be sure to return the transformed data
        return X_copy / self.max_

In [None]:
# Recall the data
X

In [None]:
# Create a SpecialTransformer and fit with the data
my_special_trans = SpecialTransformer()
my_special_trans.fit(X)

In [None]:
# Transform the data
X_new = my_special_trans.transform(X)
X_new

## Conclusion

We've now created our very own transformer! We could even feed in one dataset to _fit_ our object and then a different dataset to _transform_.

We should note that there's still a lot of customization we could have done. 

For example, we didn't consider what happens if the maximum value for a feature was $0$. We really should code how we want that to be handled (but we just ignored it for now).
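
One possible way to handle a maximum of $0$ is sketched below. The rule used here (swap a zero maximum for $1$ so the column stays all zeros) is just an assumption for illustration, not the only reasonable choice, and `X_demo` and `safe_max` are made-up names.

```python
import numpy as np

# First column has no positive values, so its max is 0
X_demo = np.array([
    [-3.0, 4.0],
    [0.0, 8.0],
])

max_ = np.max(X_demo, axis=0)

# Assumed rule: replace a 0 max with 1 to avoid dividing by zero
safe_max = np.where(max_ == 0, 1.0, max_)

X_copy = np.copy(X_demo)
X_copy[X_copy < 0] = 0
X_scaled = X_copy / safe_max

print(X_scaled)
```

Without the `np.where` step, dividing by `max_` would produce `nan` values (and a runtime warning) in the first column.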

We also could have gotten the `fit_transform()` method automatically by also inheriting from [`TransformerMixin`](https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html#sklearn.base.TransformerMixin). See the code below:

In [None]:
class SpecialTransformer(BaseEstimator, TransformerMixin):
    
    def fit(self, X, y=None):
        self.max_ = np.max(X,axis=0) 
        return self
    
    def transform(self, X):
        X_copy = np.copy(X)
        X_copy[X_copy < 0] = 0
        return X_copy / self.max_

In [None]:
my_special_trans = SpecialTransformer()
# Note we can now do fit_transform()
X_new = my_special_trans.fit_transform(X)
X_new

# Exercise: Create Your Own Transformer

Your turn! Let's try to recreate the [`MinMaxScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) object!

Min-max scaling transforms the values in the following way:

```
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```

Remember from above - by passing `axis=0` like this, you get the min and max for each column effortlessly.
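
As a quick refresher on what `axis=0` does, here's a sketch on a small made-up array (`demo`), deliberately different from the exercise data so it doesn't give anything away.

```python
import numpy as np

# 3 rows (data points), 2 columns (features)
demo = np.array([
    [1, 50],
    [3, 10],
    [2, 30],
])

col_min = demo.min(axis=0)  # one value per column
col_max = demo.max(axis=0)  # one value per column
row_max = demo.max(axis=1)  # one value per row, for contrast

print(col_min)  # [ 1 10]
print(col_max)  # [ 3 50]
print(row_max)  # [50 10 30]
```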

In [None]:
# Recall our test data
X = np.array([
    [-4, 400, 40],
    [10, -100, 1],
    [6, -800, 700],
    [2, 0, 400],
    [8, 200, 1000]
])

X

In [None]:
# Feel free to test out your code here, before writing it into a class


In [None]:
# Now create your class! Call it MyMinMaxScaler


    # Define a fit method
    
        
    
    # Define a transform method
    
    
    

## Test Your Code!

Once you have it, you can test it against the data below and Scikit-Learn's `MinMaxScaler`.

In [None]:
# Test against SKLearn's MinMaxScaler
from sklearn.preprocessing import MinMaxScaler
sklearn_scaler = MinMaxScaler()
X_sklearn_scaled = sklearn_scaler.fit_transform(X)
X_sklearn_scaled

In [None]:
# Catches errors
try:
    # Your implementation
    my_scaler = MyMinMaxScaler()
    my_scaler.fit(X)
    X_my_scaled = my_scaler.transform(X)
    display(X_my_scaled)
    
    # Check against MinMaxScaler
    print('MinMaxScaler and MyMinMaxScaler same?')
    print(X_sklearn_scaled.round(5) == X_my_scaled.round(5))
except Exception:
    print('Check your fit() and transform() methods!')