<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Agenda" data-toc-modified-id="Agenda-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Agenda</a></span></li><li><span><a href="#Describe-the-relationship-of-classes-to-objects,-and-learn-to-code-classes" data-toc-modified-id="Describe-the-relationship-of-classes-to-objects,-and-learn-to-code-classes-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Describe the relationship of classes to objects, and learn to code classes</a></span><ul class="toc-item"><li><span><a href="#Classes" data-toc-modified-id="Classes-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Classes</a></span></li><li><span><a href="#Methods" data-toc-modified-id="Methods-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Methods</a></span></li><li><span><a href="#Magic-Methods" data-toc-modified-id="Magic-Methods-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Magic Methods</a></span></li><li><span><a href="#Positional-vs.-Named-arguments" data-toc-modified-id="Positional-vs.-Named-arguments-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Positional vs. 
Named arguments</a></span></li><li><span><a href="#Exercise" data-toc-modified-id="Exercise-2.5"><span class="toc-item-num">2.5&nbsp;&nbsp;</span>Exercise</a></span></li></ul></li><li><span><a href="#Overview-of-inheritance" data-toc-modified-id="Overview-of-inheritance-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Overview of inheritance</a></span><ul class="toc-item"><li><span><a href="#Another-Example" data-toc-modified-id="Another-Example-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Another Example</a></span></li><li><span><a href="#Exercise" data-toc-modified-id="Exercise-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Exercise</a></span></li></ul></li><li><span><a href="#Important-data-science-tools-through-the-lens-of-objects:" data-toc-modified-id="Important-data-science-tools-through-the-lens-of-objects:-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Important data science tools through the lens of objects:</a></span><ul class="toc-item"><li><span><a href="#StandardScaler" data-toc-modified-id="StandardScaler-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span><code>StandardScaler</code></a></span><ul class="toc-item"><li><span><a href="#Attributes" data-toc-modified-id="Attributes-4.1.1"><span class="toc-item-num">4.1.1&nbsp;&nbsp;</span>Attributes</a></span><ul class="toc-item"><li><span><a href="#.scale_" data-toc-modified-id=".scale_-4.1.1.1"><span class="toc-item-num">4.1.1.1&nbsp;&nbsp;</span><code>.scale_</code></a></span></li></ul></li></ul></li><li><span><a href="#Task:-One-hot-Encoder" data-toc-modified-id="Task:-One-hot-Encoder-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Task: One-hot Encoder</a></span></li></ul></li></ul></div>

![fvo](https://cdn.educba.com/academy/wp-content/uploads/2018/07/Functional-Programming-vs-OOP-1.png)

# Object-Oriented Programming

In [None]:
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import make_regression

## Agenda

SWBAT:

1. describe the relationship between classes and objects, and code their own classes;
2. explain the notion of inheritance;
3. describe how the object structure is used in `sklearn` tools like `StandardScaler` and `OneHotEncoder`.

## Describe the relationship of classes to objects, and learn to code classes

Each object is an instance of a **class**. A class defines a bundle of attributes and functions (called *methods* when they belong to a class), the point being that **every object of that class will automatically have those attributes and methods**.

A class is like a blueprint that describes how to create a specific type of object.

![blueprint](img/blueprint.jpeg)

### Classes

We can define **new** classes of objects altogether by using the keyword `class`:

In [None]:
class Car:
    """Automotive object"""
    pass # This is called a stub.

In [None]:
# Instantiate a car object

ferrari = Car()
type(ferrari)

In [None]:
# We can give the Ferrari four wheels

ferrari.wheels = 4
ferrari.wheels

But wouldn't it be nice not to have to do that every time? We'll just include the 4-wheels specification in the blueprint!

In [None]:
class Car:
    """Automotive object"""
    
    wheels = 4                      # This is an attribute of *every* car.

In [None]:
civic = Car()
civic.wheels

In [None]:
#  Then we can add more attributes
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.
    doors = 4

In [None]:
ferrari = Car()
ferrari.doors

In [None]:
ferrari.wheels

In [None]:
# Does your Ferrari have only 2 doors? 
# These attributes can be overwritten.

ferrari.doors = 2
ferrari.doors
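
It is worth seeing what "overwritten" means here. The sketch below, which assumes the same two-attribute `Car` class as above, suggests that assigning through one instance stores a new value on that instance only, while the class-level value (and every other car) is left alone. `vars()` is a built-in that shows what is stored on an object itself.

```python
class Car:
    """Automotive object"""

    wheels = 4
    doors = 4


ferrari = Car()
civic = Car()

# Assigning through the instance creates an attribute on that instance only.
ferrari.doors = 2

print(ferrari.doors)   # value stored on the ferrari instance
print(civic.doors)     # civic still falls back to the class attribute
print(Car.doors)       # the class attribute itself is unchanged
print(vars(ferrari))   # what is stored on ferrari itself
print(vars(civic))     # nothing is stored on civic itself
```

So the blueprint supplies the default, and an individual object can shadow that default without affecting the blueprint.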

### Methods

We can also write functions that are associated with each class.  
As said above, a function associated with a class is called a method.

In [None]:
#  Then we can add methods
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.
    doors = 4

    def honk(self):                   # These are methods we can call on *any* car.
        print('Beep beep')

In [None]:
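# Create two separate Car objects (chained assignment would bind
# both names to the *same* object)
ferrari = Car()
civic = Car()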
ferrari.honk()
civic.honk()

In [None]:
type(ferrari.wheels)

In [None]:
# No parentheses: we inspect the method itself rather than calling it
type(ferrari.honk)

Wait a second, what's that `self` doing? <br/> Every instance method takes the object it is called on as its first parameter, and by convention that parameter is named `self`. **It refers to the individual object, i.e. to the instance of the class**.
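
To make the role of `self` concrete, here is a minimal sketch. It uses a pared-down `Car` whose `honk()` returns its string instead of printing it (our own change, made only so the two calls can be compared). Calling a method on an instance is equivalent to calling the function on the class and passing the instance in explicitly.

```python
class Car:
    """Automotive object"""

    def honk(self):
        # Returning instead of printing is an assumption made for this demo.
        return 'Beep beep'


ferrari = Car()

# Usual style: Python passes ferrari in as `self` for us.
via_instance = ferrari.honk()

# Equivalent explicit style: we pass the instance ourselves.
via_class = Car.honk(ferrari)

print(via_instance == via_class)
```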

### Magic Methods

It is common for a class to have magic methods. These are identifiable by the "dunder" (i.e. **d**ouble **under**score) prefixes and suffixes, such as `__init__()`. These methods will get called **automatically** as a result of a different call, as we'll see below.

For more on these "magic methods", see [here](https://www.geeksforgeeks.org/dunder-magic-methods-python/).

When we create an instance of a class, Python invokes `__init__()` to initialize the object.  Let's add `__init__()` to our class.

In [None]:
#  Now we add __init__ to the class
class Car:
    """Automotive object"""
    
    WHEELS = 4                      # As a convention, capital letters
                                    # are used for constants.
    
    def __init__(self, doors, fwd): # By adding doors and moving to init,
                                    # we shall now need to pass parameters when
                                    # instantiating the object!
        self.doors = doors
        self.fwd = fwd
        

    def honk(self):                 # These are methods we can call on *any* car.
        print('Beep beep')

In [None]:
# This will throw an error: doors and fwd are now required arguments!
civic = Car()

In [None]:
civic = Car(doors=4, fwd=True)

print(civic.doors)
print(civic.fwd)

We can also pass default arguments if there is a value for a certain parameter which is very common.

In [None]:
#  Now with default arguments
class Car:
    """Automotive object"""
    
    WHEELS = 4                     
    
    # default arguments included now in __init__
    def __init__(self, doors=4, fwd=False):
        
        self.doors = doors
        self.fwd = fwd
        

    def honk(self):                  
        print('Beep beep')

In [None]:
civic = Car()
print(civic.doors)
print(civic.fwd)
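
`__init__()` is only one of many magic methods. As a minimal sketch, the simplified `Car` below adds `__repr__()` and `__eq__()`; these two method bodies are our own illustration and are not part of the `Car` class used elsewhere in this notebook. Python calls `__repr__()` when it needs a string representation of the object, and `__eq__()` when two objects are compared with `==`.

```python
class Car:
    """Automotive object"""

    def __init__(self, doors=4):
        self.doors = doors

    def __repr__(self):
        # Called automatically by repr() and when the object is displayed.
        return f'Car(doors={self.doors})'

    def __eq__(self, other):
        # Called automatically by the == operator.
        if not isinstance(other, Car):
            return NotImplemented
        return self.doors == other.doors


civic = Car()
accord = Car()
coupe = Car(doors=2)

print(repr(civic))
print(civic == accord)   # same number of doors
print(civic == coupe)    # different number of doors
```

Without a custom `__eq__()`, two distinct objects compare as unequal even if all of their attributes match, because the default comparison checks whether they are the very same object.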

### Positional vs. Named arguments

In [None]:
# we can pass our arguments without names

civic = Car(4, True)

In [None]:
# or with names

civic = Car(doors=4, fwd=True)
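
The two styles can also be mixed. The sketch below, using a pared-down `Car` with the same `__init__()` signature as above, suggests three things: named arguments may come in any order, positional arguments must come before named ones, and any parameter we leave out keeps its default.

```python
class Car:
    """Automotive object"""

    def __init__(self, doors=4, fwd=False):
        self.doors = doors
        self.fwd = fwd


a = Car(2, True)             # positional: order matters
b = Car(fwd=True, doors=2)   # named: order does not matter
c = Car(2, fwd=True)         # mixed: positional first, then named
d = Car(fwd=True)            # doors falls back to its default

print(vars(a) == vars(b) == vars(c))
print(vars(d))
```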

In [None]:
# The self argument allows our methods to update our attributes.

# Then we can add more attributes.

class Car:
    """Automotive object"""
    
    WHEELS = 4                     
    
    # default arguments included now in __init__
    def __init__(self, doors=4, fwd=False,
                 driver_mood='peaceful'):
        
        self.doors = doors
        self.fwd = fwd
        self.driver_mood = driver_mood
        

    def honk(self):                  
        print('Beep beep')
        self.driver_mood = 'aggravated'

In [None]:
civic = Car()
print(civic.driver_mood)
civic.honk()
print(civic.driver_mood)

### Exercise

Let's add an attribute `moving` which indicates, with a boolean, whether the car is moving or not.

Fill in the functions `stop()` and `go()` so that the attribute `moving` will reflect the car's present state of motion after the method is called.

Make sure the method works by calling it, then printing the attribute.

In [None]:
# Then we can add more attributes
class Car:
    """Automotive object"""
    
    WHEELS = 4
    
    # default arguments included now in __init__
    def __init__(self, doors=4, fwd=False, driver_mood='peaceful'):
        
        self.doors = doors
        self.fwd = fwd
        self.driver_mood = driver_mood
        self.moving = False           # a new car starts out at rest
        
    def honk(self):                   # These are methods we can call on *any* car.
        print('Beep beep')
        
    def go(self):
        pass
    
    def stop(self):
        pass

In [None]:
# Test your code from above

civic = Car()
print(civic.moving)

civic.go()
print(civic.moving)

civic.stop()
print(civic.moving)

## Overview of inheritance

We can also define classes in terms of *other* classes, in which case the new classes **inherit** the attributes and methods from the classes in terms of which they're defined.

Suppose we decided we want to create an electric car class.

In [None]:
#  ElectricCar inherits everything from Car
class ElectricCar(Car):
    """Automotive object"""
    
    pass

In [None]:
prius = ElectricCar()
prius.honk()
prius.WHEELS
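
We can check the inheritance relationship directly with the built-ins `isinstance()` and `issubclass()`. The sketch below uses pared-down copies of `Car` and `ElectricCar` (an assumption made to keep the example self-contained). It also suggests where `honk()` actually lives: it is not copied into the child class but looked up on the parent.

```python
class Car:
    """Automotive object"""

    WHEELS = 4

    def honk(self):
        return 'Beep beep'


class ElectricCar(Car):
    """Electric automotive object"""
    pass


prius = ElectricCar()

print(isinstance(prius, ElectricCar))   # a prius is an ElectricCar...
print(isinstance(prius, Car))           # ...and therefore also a Car
print(issubclass(ElectricCar, Car))

# The order in which Python searches classes for attributes and methods
print([cls.__name__ for cls in ElectricCar.__mro__])

# honk is not defined on ElectricCar itself; it is found on Car
print('honk' in vars(ElectricCar))
```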

In [None]:
#  Then we can add more attributes
class ElectricCar(Car):
    """Automotive object"""
    
    # default arguments included now in __init__
    def __init__(self, hybrid=False):
        super().__init__()                   # super() refers to the parent class.
                                             # See https://realpython.com/python-super/
                                             # for more.
        self.hybrid = hybrid                 # store the value that was passed in

In [None]:
#  And we can override methods and parent attributes
class ElectricCar(Car):
    """Automotive object"""
    
    # default arguments included now in __init__
    def __init__(self, hybrid=False):
        
        # Prius owners are calmer than the average car owner
        super().__init__(driver_mood='serene')
        
        self.hybrid = hybrid
        
    # override inherited methods
    
    def go(self):
        
        print('Whirrrrrr')
        self.moving = True

In [None]:
prius = ElectricCar()
print(prius.moving)

In [None]:
prius.go()
print(prius.moving)
print(prius.driver_mood)

In [None]:
prius.stop()

### Another Example

In [None]:
class Shape:
    def __init__(self, n):
        self.n_sides = n
    sides = []
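
One caution about a list defined at the class level, like `sides` above: it is a single object shared by every instance. The sketch below uses a hypothetical `DemoShape` class (named differently so it does not interfere with `Shape`) to suggest the difference between mutating that shared list and assigning a new list to one instance.

```python
class DemoShape:
    """Hypothetical class used only to illustrate shared class attributes."""

    sides = []

    def __init__(self, n):
        self.n_sides = n


first = DemoShape(3)
second = DemoShape(3)

# Mutating the list in place changes the one list that all instances share.
first.sides.append(2)
print(second.sides)
print(DemoShape.sides)

# Assigning a new list creates an attribute on `first` only.
first.sides = [3, 4, 5]
print(first.sides)
print(second.sides)
```

Assigning a fresh list to an instance, rather than appending to the shared one, keeps each shape's sides separate.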

In [None]:
class Triangle(Shape):
    def __init__(self):
        Shape.__init__(self, 3)

    def findArea(self):
        a, b, c = self.sides
        # calculate the semi-perimeter
        s = (a + b + c) / 2
        area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
        print('The area of the triangle is %0.2f' %area)

In [None]:
isosc = Triangle()

isosc.n_sides

In [None]:
# This will throw an error!

isosc.findArea()

In [None]:
isosc.sides = [2, 2, 2]

isosc.findArea()

### Exercise

Use inheritance together with `StandardScaler` to create your own scaler that includes, as an attribute, a list of the smallest and largest z-scores for each feature to which the scaler has been fitted.

**Test**: After you run `fit_extra()` on the $X$ defined below and then fetch the extremes attribute, your array values should match these:

| | Smallest z-score | Largest z-score |
| - | - | - |
| Feature 1 | -1.2068162135708724 | 1.6237142612014306 |
| Feature 2 | -1.1298429699595565 | 1.6037759986784352 |

In [None]:
X, y = make_regression(n_features=2, n_samples=5, random_state=42)

In [None]:
class MyScaler(StandardScaler):
    
    def __init__(self):
        super().__init__()
    
    def fit_extra(self, X):
        
        # ???
        
        self.fit(X)
        return self

In [None]:
new = MyScaler()

In [None]:
new.fit_extra(X)

In [None]:
new.extremes

In [None]:
new.transform(X)

<details><summary>
    Answer code here
    </summary>
    <code>self.extremes = [((min(feat)-feat.mean()) / feat.std(),
        (max(feat)-feat.mean()) / feat.std()) for feat in X.T]</code>
    </details>

## Important data science tools through the lens of objects: 

We are becoming more and more familiar with methods such as `fit()` and `fit_transform()`.

After instantiating a `StandardScaler`, `LinearRegression`, or `OneHotEncoder`, we use `fit()` to learn about the dataset. What is learned is saved on the object as attributes whose names, by `sklearn` convention, end in an underscore (e.g. `mean_`, `scale_`).

### `StandardScaler`

The `StandardScaler` takes a series and, for each element, computes the difference between the element and the mean of the series, and then divides by the standard deviation.

$\Large z = \frac{x - \mu}{s}$
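
To connect the formula to the object structure, here is a minimal sketch of a scaler written in plain Python. `ToyScaler` is a hypothetical stand-in and not `sklearn`'s implementation; it only mimics the pattern of learning in `fit()` and storing the results in the attributes `mean_` and `scale_`. It uses the population standard deviation (dividing by $n$), which is also what `StandardScaler` uses.

```python
import statistics


class ToyScaler:
    """Hypothetical one-feature scaler, for illustration only."""

    def fit(self, values):
        # Learn from the data and save what is learned as attributes.
        self.mean_ = statistics.mean(values)
        self.scale_ = statistics.pstdev(values)   # population std (divides by n)
        return self

    def transform(self, values):
        # Apply z = (x - mean) / std using the saved attributes.
        return [(x - self.mean_) / self.scale_ for x in values]


data = [1.0, 2.0, 3.0, 4.0, 5.0]
toy = ToyScaler().fit(data)

print(toy.mean_)
print(toy.scale_)
print(toy.transform([5.0]))
```

Note that `transform()` never looks at the training data again; everything it needs was saved on the object during `fit()`.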

What attributes and methods are available for a Standard Scaler object? Let's go back to the code on [GitHub](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/_data.py). In many typical cases the `.fit()` method relies on the [`_incremental_mean_and_var()` function](https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/extmath.py/).

#### Attributes

##### `.scale_`

In [None]:
# Instantiate a standard scaler object
greg = StandardScaler()

# We can instantiate as many scaler objects as we want
max_ = StandardScaler()

In [None]:
greg == max_

In [None]:
# Let's create a series of 1,000 draws from a normal distribution

np.random.seed(42)
series_1 = np.random.normal(3, 1, 1000)

print(series_1.mean())
print(series_1.std())

When we fit the `StandardScaler`, it studies the object passed to it, and saves what is learned in its instance attributes.

In [None]:
greg.fit(series_1.reshape(-1,1))

# standard deviation is saved in the attribute "scale_"
greg.scale_

In [None]:
# mean is saved into the attribute "mean_"
greg.mean_

In [None]:
# Knowledge Check

# What value should I pass into the `transform()` method to
# get a return of 0?

greg.transform([])

In [None]:
# We can then use these attributes to transform objects

np.random.seed(42)
random_numbers = np.random.normal(3, 1, 2)
random_numbers

In [None]:
greg.transform(random_numbers.reshape(-1, 1))

In [None]:
# We can also use a scaler on a DataFrame

series_1 = np.random.normal(3, 1, 1000)
series_2 = np.random.uniform(0, 100, 1000)
df_2 = pd.DataFrame([series_1, series_2]).T
ss_df = StandardScaler()
ss_df.fit_transform(df_2)

In [None]:
ss_df.transform([[5, 50]])

### Task: One-hot Encoder

In [None]:
np.random.seed(42)
# Let's create a DataFrame that records a total number of orders
# by day of the week: 

days = np.random.choice(['m', 't', 'w', 'th', 'f', 's', 'su'], 1000)
orders = np.random.randint(0, 1000, 1000)

df = pd.DataFrame([days, orders]).T
df.columns = ['days', 'orders']
df.head()

In [None]:
df.shape

Let's interact with an important parameter we can pass when instantiating the `OneHotEncoder` object: `drop`.  

By dropping a column, we avoid the [dummy variable trap](https://en.wikipedia.org/wiki/Dummy_variable_(statistics)).  

By passing `drop='first'`, the encoder will drop the first category of each feature, i.e. the first entry listed in `categories_`.

If we want to drop a particular category instead, we can pass it in a list, with one entry per feature.
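
To see why dropping a category loses no information, here is a minimal sketch of one-hot encoding in plain Python. The helper `one_hot()` is hypothetical and is not how `OneHotEncoder` is implemented; it only illustrates the idea that the dropped category is still identifiable as the row of all zeros.

```python
def one_hot(values, drop=None):
    """Hypothetical helper: one-hot encode a list, optionally dropping a category."""
    categories = sorted(set(values))
    kept = [c for c in categories if c != drop]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


columns, rows = one_hot(['m', 't', 'w', 'm'], drop='m')

print(columns)
for row in rows:
    print(row)
```

Each `'m'` row is encoded as all zeros, so the remaining columns are enough to recover every category.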

In [None]:
# Instantiate a OneHotEncoder object

ohe = OneHotEncoder(drop=['m'])

In [None]:
ohe_matrix = ohe.fit_transform(df[['days']])

In [None]:
ohe_matrix

In [None]:
# Look at __dict__ and check out drop_idx_.
# Did it do what you wanted it to do?

ohe.__dict__['drop_idx_']

In [None]:
# check out the categories_ attribute
ohe.categories_

In [None]:
# Check out the object itself
ohe_matrix

It is a sparse matrix. A matrix is called sparse when it is composed mostly of zeros, and a sparse format saves memory by storing only the nonzero entries.
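
As a rough sketch of the idea, the snippet below stores only the nonzero entries of a small one-hot style matrix in a dictionary keyed by `(row, column)`. This layout is an assumption chosen for readability; the sparse formats actually used by the encoder are more compact, but the principle of not storing the zeros is the same.

```python
dense = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
]

# Keep only the nonzero entries, keyed by their (row, column) position.
nonzero = {
    (i, j): value
    for i, row in enumerate(dense)
    for j, value in enumerate(row)
    if value != 0
}

n_rows, n_cols = len(dense), len(dense[0])

# Rebuild the dense matrix: anything not stored is a zero.
rebuilt = [[nonzero.get((i, j), 0) for j in range(n_cols)] for i in range(n_rows)]

print(len(nonzero), 'stored values instead of', n_rows * n_cols)
print(rebuilt == dense)
```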

In [None]:
# We can convert it to a DataFrame like so
oh_df = pd.DataFrame.sparse.from_spmatrix(ohe_matrix)

In [None]:
# Now, using the categories_ attribute, set the column names
# to the correct days of the week.

ohe_columns = list(ohe.categories_[0])
# drop_idx_ holds one index per feature; we encoded a single feature
ohe_columns.pop(int(ohe.drop_idx_[0]))
oh_df.columns = ohe_columns
oh_df.head()

In [None]:
# Add the onehotencoded columns to the original df, and drop the days column

df = df.join(oh_df).drop('days', axis=1)
df.head()