In [1]:
import numpy as np


## Object Oriented Programming

Week 6 | 1.2

---

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*
- Understand the structure of a class in Python
- Describe the difference between attributes and methods


## Classes and Objects

Python is an object-oriented programming language: it is built around the notion of objects.

A class defines a type of object. You can think of a class definition as a "blueprint" that specifies how a new object is constructed when the class is instantiated.

> We have encountered these already.  Can you name a few examples? Where have we seen them?



## Everywhere

## Simplified Introduction

Classes can have characteristics / data attached to them (typically referred to as *attributes*) and procedures that define actions (typically referred to as *methods*).  Simplified, attributes can be thought of as information, and methods as functions.

> Can you name an example of an attribute / method you've seen?
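
As one possible answer, here is a small sketch using only a built-in type (the variable names are illustrative and not part of the lesson). A complex number stores its real part as an attribute, while `conjugate` is a method that has to be called.

```python
# Attributes are data stored on an object; methods are functions attached to it.
z = 3 + 4j

real_part = z.real        # attribute: accessed without parentheses
flipped = z.conjugate()   # method: called with parentheses

print(real_part)  # 3.0
print(flipped)    # (3-4j)
```

The difference to notice is the parentheses: an attribute is looked up, a method is called.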

## Creating a Class for DSI Student

In [2]:
class dsi_student(object):
    def __init__(self, name, cohort, mood):
        self.name = name
        self.cohort = cohort
        self.mood = mood
        

> What are the methods / attributes here?  What is `__init__`?

What are the components of this?

**`class`**

- The `class` keyword is like `def`, but instead of defining a function it defines a class.

**`object`**

- `object` in the parentheses of the class definition indicates that this class "inherits" from the `object` class. The `object` class is a very general, very fundamental class in Python. Inheritance means that whatever properties and functions are part of the `object` class are passed down to our `dsi_student` class.


**`def __init__(self)`**

- `__init__` is our class's initialization function. It is called automatically when you instantiate the class, for example by typing `dsi_student("Michael Roman", "Hoppers", "exuberant")`.

**`self`**

- `self` is the (confusing) first argument of every method defined in a class. It is a variable that refers to the **current instantiation of the class**. What does this mean? When you instantiate a class and assign it to a variable with `student = dsi_student("Michael Roman", "Hoppers", "exuberant")`, `self` inside the class's methods is a reference to that particular instance, `student`.

**instance attributes**

- `self.name`, `self.cohort`, and `self.mood` are "attributes" (variables) that are attached to one particular instantiation of the class.
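
The pieces described above can be checked directly. The sketch below repeats the class and adds a `who_am_i` method purely for illustration (it is hypothetical and not part of the lesson's class), along with a second, made-up student.

```python
class dsi_student(object):
    def __init__(self, name, cohort, mood):
        self.name = name
        self.cohort = cohort
        self.mood = mood

    def who_am_i(self):
        # Hypothetical helper for illustration: hands back `self` unchanged.
        return self


mike = dsi_student("Michael Roman", "Hoppers", "exuberant")
other = dsi_student("Ada", "Hoppers", "curious")  # made-up second student

# `self` inside a method is the very object the method was called on.
same_object = mike.who_am_i() is mike

# Each instance carries its own values for the attributes.
moods = (mike.mood, other.mood)

# Because the class inherits from `object`, every instance is also an `object`.
inherits = isinstance(mike, object)

print(same_object)
print(moods)
print(inherits)
```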

## Let's create a student

In [3]:
mike = dsi_student("Michael Roman", "Hoppers", "exuberant")

In [4]:
print(mike.name)
print(mike.cohort)
print(mike.mood)

Michael Roman
Hoppers
exuberant


## We have three attributes.  Let's create a method
Let's say something upsets Mike and changes his mood.  We need a way to change Mike's mood.





In [5]:
class dsi_student(object):
    def __init__(self, name, cohort, mood):
        self.name = name
        self.cohort = cohort
        self.mood = mood
        
    def change_mood(self, new_mood):
        self.mood = new_mood

In [6]:
mike = dsi_student("Michael Roman", "Hoppers", "exuberant")
print(mike.mood)
mike.change_mood("melancholy")
print(mike.mood)

exuberant
melancholy


We can use methods to change attributes, and do many more things!
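
As a small sketch of "many more things": a method can also compute and return a value from the attributes. The `describe` method below is hypothetical, added only for illustration. The example also shows that `mike.change_mood(...)` is shorthand for calling the function on the class and passing the instance in as `self`.

```python
class dsi_student(object):
    def __init__(self, name, cohort, mood):
        self.name = name
        self.cohort = cohort
        self.mood = mood

    def change_mood(self, new_mood):
        self.mood = new_mood

    def describe(self):
        # Hypothetical extra method: builds a string from the attributes.
        return "{} ({}) is feeling {}".format(self.name, self.cohort, self.mood)


mike = dsi_student("Michael Roman", "Hoppers", "exuberant")

# The usual call: Python passes `mike` in as `self` automatically.
mike.change_mood("melancholy")
first = mike.describe()

# The same kind of call written out explicitly through the class.
dsi_student.change_mood(mike, "exuberant")
second = mike.describe()

print(first)
print(second)
```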

## A slightly more complex example...

[soccer / football](./code/oop-soccer.ipynb)

## Now we're going to build a simple linear regression class, similar to what's in scikit-learn

## A quick dive into linear algebra...


### Deriving the least squares solution to linear regression

With target vector $y$ and predictor matrix $X$, we can formulate a regression as:

### $$ y = X\beta + \epsilon $$

Where $\beta$ is our vector of coefficients and $\epsilon$ is our vector of errors, or residuals.

We can equivalently formulate this as a calculation of the residuals:

### $$ \epsilon = y - X\beta $$

Our goal is to minimize the sum of squared residuals. The sum of squared residuals is equivalent to the dot product of the vector of residuals with itself:

### $$ \sum_{i=1}^n \epsilon_i^2 = 
\left[\begin{array}{ccc}
\epsilon_1 & \cdots & \epsilon_n
\end{array}\right] 
\left[\begin{array}{c}
\epsilon_1 \\ \vdots \\ \epsilon_n
\end{array}\right] = \epsilon' \epsilon
$$



Therefore we can write the sum of squared residuals as:

### $$ \epsilon' \epsilon = (y - X\beta)' (y - X\beta) $$

Which becomes:

### $$ \epsilon' \epsilon = y'y - y'X\beta - \beta' X' y + \beta' X' X \beta $$

Now take the derivative with respect to $\beta$:

### $$ \frac{\partial \epsilon' \epsilon}{\partial \beta} = 
-2X'y + 2X'X\beta$$
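
The derivative relies on two steps that are easy to skip over. As a sketch of the intermediate reasoning, treating $\beta$ as a column vector: the two cross terms are scalars, and a scalar equals its own transpose, so they can be combined.

### $$ y'X\beta = (y'X\beta)' = \beta'X'y $$

### $$ \epsilon' \epsilon = y'y - 2\beta'X'y + \beta'X'X\beta $$

The standard matrix derivative rules then give each piece, using the fact that $X'X$ is symmetric:

### $$ \frac{\partial (\beta'X'y)}{\partial \beta} = X'y, \qquad \frac{\partial (\beta'X'X\beta)}{\partial \beta} = 2X'X\beta $$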

We want to minimize the sum of squared errors, and so we set the derivative to zero and solve for the beta coefficient vector:

### $$ 0 = -2X'y + 2X'X\beta \\
X'X\beta = X'y \\
\beta = (X'X)^{-1}X'y$$
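
A quick numerical check of this result can be reassuring. The sketch below uses a tiny made-up data set (not the lesson's housing data) in which $y = 1 + 2x$ exactly, and compares the closed-form solution with NumPy's least-squares solver `np.linalg.lstsq`.

```python
import numpy as np

# Small synthetic data set (made up for illustration): y = 1 + 2*x exactly.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])  # the first column of ones acts as the intercept
y = np.array([1.0, 3.0, 5.0, 7.0])

# Closed-form solution: beta = (X'X)^-1 X'y
beta_closed = np.dot(np.linalg.inv(np.dot(X.T, X)), np.dot(X.T, y))

# Reference solution from NumPy's least-squares solver
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# At the minimum, the gradient -2X'y + 2X'X beta should be numerically zero
gradient = -2 * np.dot(X.T, y) + 2 * np.dot(np.dot(X.T, X), beta_closed)

print(beta_closed)
print(beta_lstsq)
print(gradient)
```

Explicitly inverting $X'X$ mirrors the formula, which is why it is used here. In practice a solver such as `np.linalg.lstsq` is usually the more numerically stable choice.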

## The starting point

In [7]:
class SimpleLinearRegression(object):
    
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

### 2. Adding a class function

Now, just like with `__init__`, we can add functions to the class.

Let's add a `fit()` method that will calculate the coefficients for a linear regression.

In [8]:
class SimpleLinearRegression(object):
    
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None
        
    def fit(self, X, y):
        # betas = (X'X)^-1 X'Y
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        self.coef_ = np.dot(XtX_inv_Xt, y)

Notice that we assigned `self.coef_` inside of the `fit()` function.
This sets the attribute `coef_` on the instance, and that attribute can be accessed by any other function in the class without passing it as an argument!
It can also be accessed by you after instantiating the class and calling `fit()`.
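
To see this in action, here is a small sketch that repeats the class from the cell above and fits it on made-up data. The numbers are invented for illustration, and `X` is assumed to already contain a column of ones.

```python
import numpy as np


class SimpleLinearRegression(object):

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X, y):
        # betas = (X'X)^-1 X'y
        XtX_inv = np.linalg.inv(np.dot(X.T, X))
        self.coef_ = np.dot(np.dot(XtX_inv, X.T), y)


# Made-up data: y = 1 + 2*x, with the column of ones already included in X.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

slr = SimpleLinearRegression()
before = slr.coef_   # None: nothing has been fitted yet
slr.fit(X, y)
after = slr.coef_    # set inside fit(), readable from outside the class

print(before)
print(after)
```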

### 3. Assigning attributes during instantiation
There is an issue here: we may pass in an `X` matrix without an intercept column.
Add a keyword argument to the `__init__` function which will specify whether the `X` matrix should have an intercept added or not.

In [9]:
class SimpleLinearRegression(object):
    
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept
        
    def fit(self, X, y):
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        self.coef_ = np.dot(XtX_inv_Xt, y)

Now, if we instantiate the class, the `fit_intercept` argument is stored on the instance as the attribute `fit_intercept`, like so:

In [10]:
slr = SimpleLinearRegression(fit_intercept=True)
slr.fit_intercept

True

In [11]:
slr = SimpleLinearRegression(fit_intercept=False)
slr.fit_intercept

False

### 4. Add a function to add an intercept to the `X` matrix if necessary

This function will be called from inside the `fit` function.
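
Before wiring it into the class, it may help to see the operation on its own. This is a standalone sketch on a made-up 3x2 matrix, using `np.ones` and `np.concatenate`.

```python
import numpy as np

# A made-up 3x2 predictor matrix without an intercept column.
X = np.array([[10.0, 1.0],
              [20.0, 2.0],
              [30.0, 3.0]])

# One column of ones, with as many rows as X.
intercept = np.ones((X.shape[0], 1))

# Prepend it so the first coefficient plays the role of the intercept.
X_with_intercept = np.concatenate([intercept, X], axis=1)

print(X_with_intercept.shape)  # (3, 3)
print(X_with_intercept)
```

Because the ones are placed in the first column, the first fitted beta is the intercept and the remaining betas line up with the original columns of `X`.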

In [12]:
class SimpleLinearRegression(object):
    
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept
        
    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        X = np.concatenate([intercept, X], axis=1)
        return X
        
    def fit(self, X, y):
        
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        betas = np.dot(XtX_inv_Xt, y)
        
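        if self.fit_intercept:
            # First beta belongs to the column of ones added above
            self.intercept_ = betas[0]
            self.coef_ = betas[1:]
        else:
            # No intercept column was added, so every beta is a coefficient
            self.intercept_ = 0.0
            self.coef_ = betas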

### 5. Try out the class...

Let's instantiate the class and try out the coefficient fitting function on the housing data.

In [13]:
import pandas as pd
house = pd.read_csv("housing-data.csv")
y = house.price.values
X = house[['sqft','bdrms','age']].values

In [14]:
slr = SimpleLinearRegression(fit_intercept=True)
print(slr.fit_intercept)
print(slr.coef_)
print(slr.intercept_)

True
None
None


In [15]:
slr.fit(X, y)
print(slr.coef_)
print(slr.intercept_)

[  139.33484671 -8621.47045953   -81.21787764]
92451.6278416


### 6. Add the `predict` function.

Let's add another of the class methods found in the real `LinearRegression` class: `predict`. It will take a design matrix `X` and return predictions for those rows.

In [16]:
class SimpleLinearRegression(object):
    
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept
        
    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        X = np.concatenate([intercept, X], axis=1)
        return X
        
    def fit(self, X, y):
        
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        betas = np.dot(XtX_inv_Xt, y)
        
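        if self.fit_intercept:
            # First beta belongs to the column of ones added above
            self.intercept_ = betas[0]
            self.coef_ = betas[1:]
        else:
            # No intercept column was added, so every beta is a coefficient
            self.intercept_ = 0.0
            self.coef_ = betas
        
    def predict(self, X):
        if self.fit_intercept:
            X = self.add_intercept(X)
            betas = np.concatenate([[self.intercept_], self.coef_])
        else:
            betas = self.coef_
            
        return np.dot(X, betas)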

In [17]:
slr = SimpleLinearRegression(fit_intercept=True)
slr.fit(X,y)
y_hat = slr.predict(X)

In [18]:
y_hat

array([ 354062.48251176,  287248.87062952,  397417.26195747,
        268527.1538634 ,  469878.94531868,  329591.12619219,
        279352.25678876,  260788.62369656,  257732.25463971,
        273535.20928732,  327706.82348284,  343064.02719228,
        326275.2722563 ,  669306.04311939,  238553.16519158,
        372182.11686444,  254095.17616961,  232470.09254391,
        421084.27168899,  478584.09095951,  309218.30398827,
        331856.66518255,  289024.47818102,  327036.16773894,
        605675.92658069,  214982.47518854,  267382.10451866,
        417491.20685022,  370849.7786572 ,  431982.76030362,
        328196.7549217 ,  222758.9147067 ,  336117.4924744 ,
        498239.03279902,  308351.92433696,  262750.49730719,
        237436.29823206,  352753.5386212 ,  639901.74497352,
        355715.31585793,  303813.15674696,  375413.5419335 ,
        411008.87848962,  227616.47381753,  188236.72488687,
        310815.93794646,  233313.64040447])

## Compare to scikit-learn

In [19]:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X,y)
lr.predict(X)

array([ 354062.48251176,  287248.87062952,  397417.26195747,
        268527.15386339,  469878.94531868,  329591.12619219,
        279352.25678876,  260788.62369656,  257732.25463971,
        273535.20928732,  327706.82348284,  343064.02719228,
        326275.2722563 ,  669306.04311939,  238553.16519158,
        372182.11686444,  254095.17616961,  232470.09254391,
        421084.27168899,  478584.0909595 ,  309218.30398827,
        331856.66518254,  289024.47818102,  327036.16773894,
        605675.92658069,  214982.47518854,  267382.10451866,
        417491.20685022,  370849.7786572 ,  431982.76030362,
        328196.7549217 ,  222758.9147067 ,  336117.4924744 ,
        498239.03279901,  308351.92433696,  262750.49730719,
        237436.29823206,  352753.5386212 ,  639901.74497352,
        355715.31585793,  303813.15674696,  375413.5419335 ,
        411008.87848962,  227616.47381753,  188236.72488687,
        310815.93794646,  233313.64040447])
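
The two printed arrays agree except in the last displayed digit of a few entries, which is ordinary floating-point noise from the two libraries computing the same quantity in different ways. A tolerance-based comparison is the usual way to confirm this. The sketch below copies the four differing pairs from the outputs above and assumes Python 3.5+ for `math.isclose`.

```python
import math

# Pairs copied from the two printed arrays: (our class, scikit-learn)
pairs = [
    (268527.1538634, 268527.15386339),
    (478584.09095951, 478584.0909595),
    (331856.66518255, 331856.66518254),
    (498239.03279902, 498239.03279901),
]

# Exact equality fails on these printed values...
exactly_equal = [a == b for a, b in pairs]

# ...but they agree to within a tiny relative tolerance.
close_enough = [math.isclose(a, b, rel_tol=1e-9) for a, b in pairs]

print(exactly_equal)
print(close_enough)
```

With the full arrays in memory, `np.allclose(y_hat, lr.predict(X))` performs the same kind of check in one call.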

## Conclusion
In this lesson we looked at the basic structure of classes in Python, and explained the difference between attributes and methods.  We saw that attributes can be set at instantiation, or set by methods within the class.



## Independent work:

Now your turn!

Take your pick of the two scenarios below.

**Option 1. Design a bank system.** A bank (one class) can have multiple accounts (also a class), each with a balance and account holder.

- Write methods to deposit, withdraw and transfer money between accounts.
- Consider what attributes the bank and accounts should have.
- What restrictions should you create on your methods?
    


**Option 2. (harder) Write a tic-tac-toe game!**

Things to think about:

- Initialize the board as an attribute variable: `self.board`
- Here are some methods you may want to write: `turn()`, `play()`, `update_board()`, `check_win()`
- Consider creating booleans `self.over` and `self.win` to check if the game is over after each move
- Start by having the computer make random moves. Then, once the game mechanics work, start to improve the AI. See if you can have it block losses. Then, have it check where it can win.