
## Linear Regression and Object Oriented Programming

Week 3 | 4.1


---

### Deriving the least squares solution to linear regression

With target vector $y$ and predictor matrix $X$, we can formulate a regression as:

### $$ y = X\beta + \epsilon $$

Where $\beta$ is our vector of coefficients and $\epsilon$ is our vector of errors, or residuals.

We can equivalently formulate this as a calculation of the residuals:

### $$ \epsilon = y - X\beta $$

Our goal is to minimize the sum of squared residuals. The sum of squared residuals is equivalent to the dot product of the vector of residuals with itself:

### $$ \sum_{i=1}^n \epsilon_i^2 = 
\left[\begin{array}{ccc}
\epsilon_1 & \cdots & \epsilon_n
\end{array}\right] 
\left[\begin{array}{c}
\epsilon_1 \\ \vdots \\ \epsilon_n
\end{array}\right] = \epsilon' \epsilon
$$

Therefore we can write the sum of squared residuals as:

### $$ \epsilon' \epsilon = (y - X\beta)' (y - X\beta) $$

Which becomes:

### $$ \epsilon' \epsilon = y'y - y'X\beta - \beta' X' y + \beta' X' X \beta $$

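Now take the derivative with respect to $\beta$. Because $y'X\beta$ is a scalar, it equals its own transpose $\beta'X'y$, so the two middle terms combine into $-2\beta'X'y$: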

### $$ \frac{\partial \epsilon' \epsilon}{\partial \beta} = 
-2X'y + 2X'X\beta$$

We want to minimize the sum of squared errors, and so we set the derivative to zero and solve for the beta coefficient vector:

### $$ 0 = -2X'y + 2X'X\beta \\
X'X\beta = X'y \\
\beta = (X'X)^{-1}X'y$$
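
To make the closed-form solution concrete, here is a minimal sketch in Python 3 with NumPy. The synthetic data (true intercept 2, true slope 3) is an assumption made purely for illustration, and the sketch assumes $X'X$ is invertible. It solves the linear system $X'X\beta = X'y$ directly rather than forming the inverse, then cross-checks the answer against NumPy's own least squares solver.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2 + 3*x + noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=100)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X'X) beta = X'y for beta
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)                            # close to [2, 3]
print(np.allclose(beta, beta_lstsq))   # True
```

Solving the system with `np.linalg.solve` is generally preferred over computing $(X'X)^{-1}$ explicitly, since it avoids an unnecessary matrix inversion.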

In [None]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('darkgrid')
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

In [9]:
# load data from a local path
data_path = '~/DSI-SF-5/datasets/fast_food_chains/discrim.csv'
house = pd.read_csv(data_path)

---

## Classes and Objects

In Python, everything is an "object" of a specific type. This is the basis of what is known as object-oriented programming.

A class is a type of object. You can think of a class definition as a sort of "blueprint" that specifies the construction of a new object when instantiated.

Knowing how to define and use classes is essential to programming in Python at an intermediate or advanced level.



---

### Coding a simple version of `LinearRegression`


In the afternoon lesson, you will be exposed to open-source implementations that are commonly used in industry: the [statsmodels](http://statsmodels.sourceforge.net/devel/index.html) and [sklearn](http://scikit-learn.org/stable/) packages. Sklearn is especially popular, so we will create our linear regression class by imitating how sklearn structures its own models. For instance, just like sklearn, we will include `fit` and `predict` methods in our class.

Here is the [Sklearn documentation for Linear Regression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html), provided for your reference. Once again, we will cover both of these packages in the afternoon lesson.

We will walk through the creation of this class (a simplified version).


### 1. The class definition

Below is the beginning of our class blueprint:

In [2]:
class SimpleLinearRegression(object):
    
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

What are the components of this?

**`class`**

- The `class` keyword is like `def`, but instead of defining a function it defines a class.

**`object`**

- `object` in the parentheses of the class definition indicates that this class "inherits" from the `object` class. The `object` class is a very general, very fundamental class in Python. Inheritance means that whatever attributes and methods are part of the `object` class are passed down to our `SimpleLinearRegression` class. (In Python 3, every class inherits from `object` automatically, so the parentheses are optional there.)

**`def __init__(self)`**

- The `def __init__(self):` is our class's initialization function. This function is called when you instantiate the class by typing `SimpleLinearRegression()`.

**`self`**

- `self` is the (confusing) first argument to every function defined inside a class. It is a variable that refers to the **current instance of the class**. What does this mean? When you instantiate a class and assign it to a variable with `slr = SimpleLinearRegression()`, then inside any function called on `slr`, `self` is a reference to that specific object. The function therefore works with that object's own data. This lets you have multiple instances of a class that share the same function names but keep separate state.

**instance attributes**

- `self.coef_` and `self.intercept_`, likewise, are "attributes" (variables) that are attached to the instance of the class. When `self` refers to `slr`, for example, `self.coef_` is the same thing as `slr.coef_`.
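
As a small illustration of `self`, the sketch below creates two separate instances of the class and shows that each one carries its own attributes. The value assigned to `coef_` is a placeholder chosen only for the demonstration.

```python
class SimpleLinearRegression(object):

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None


# Two separate instances: inside __init__, self referred to each one in turn
slr_a = SimpleLinearRegression()
slr_b = SimpleLinearRegression()

# Setting an attribute on one instance does not affect the other
slr_a.coef_ = [1.5]   # placeholder value, for illustration only

print(slr_a.coef_)    # [1.5]
print(slr_b.coef_)    # None
```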

---

### 2. Adding a class function

Now, just like with `__init__`, we can add functions to the class.

Let's add a `calculate_betas()` method that will calculate the coefficients for a linear regression.

Inside `calculate_betas()`, assign the result to `self.coef_`.

This sets the attribute `self.coef_` on the instance, and this attribute can be accessed by _any other function in the class without passing it as an argument!_

It can also be accessed by you after instantiating the class.
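
One possible version of this step is sketched below. It assumes that `X` already contains an intercept column and that $X'X$ is invertible. The `n_coefficients` function is a hypothetical helper, added only to show another function reading `self.coef_` without receiving it as an argument.

```python
import numpy as np


class SimpleLinearRegression(object):

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def calculate_betas(self, X, y):
        # Solve the normal equations (X'X) beta = X'y and store the result
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.coef_ = np.linalg.solve(X.T @ X, X.T @ y)

    def n_coefficients(self):
        # Hypothetical helper: reads self.coef_ set by calculate_betas
        return len(self.coef_)


# Toy data following y = 1 + 2*x exactly; the first column of X is all ones
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

slr = SimpleLinearRegression()
slr.calculate_betas(X, y)

print(slr.coef_)             # approximately [1, 2]
print(slr.n_coefficients())  # 2
```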

---

### 3. Assigning attributes during instantiation

There is an issue here - we may pass an `X` matrix in without an intercept column (a column of ones).

Add a keyword argument to the `__init__` function which will specify whether the `X` matrix should have an intercept added or not.
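
A minimal sketch of this step follows. The keyword name `fit_intercept` is a choice made here to mirror the parameter of the same name in sklearn's `LinearRegression`; any other name would work equally well.

```python
class SimpleLinearRegression(object):

    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        # Store the choice on the instance so other functions can consult it
        self.fit_intercept = fit_intercept


slr_default = SimpleLinearRegression()
slr_no_intercept = SimpleLinearRegression(fit_intercept=False)

print(slr_default.fit_intercept)       # True
print(slr_no_intercept.fit_intercept)  # False
```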

---

### 4. Add a function to add an intercept to the `X` matrix if necessary

This function will be called from inside the `fit` function.
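
The sketch below shows one way this could look. The function name `add_intercept` and the decision to store the intercept separately from the remaining coefficients are assumptions made for this example, not the only reasonable design. It also assumes `X` is two-dimensional (rows are observations).

```python
import numpy as np


class SimpleLinearRegression(object):

    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept

    def add_intercept(self, X):
        # Prepend a column of ones so the first coefficient is the intercept
        ones = np.ones((X.shape[0], 1))
        return np.hstack([ones, X])

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        if self.fit_intercept:
            X = self.add_intercept(X)
        betas = np.linalg.solve(X.T @ X, X.T @ y)
        if self.fit_intercept:
            self.intercept_, self.coef_ = betas[0], betas[1:]
        else:
            self.intercept_, self.coef_ = 0.0, betas
        return self


# Toy data following y = 1 + 2*x exactly, with no intercept column in X
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]

slr = SimpleLinearRegression().fit(X, y)
print(slr.intercept_)  # approximately 1
print(slr.coef_)       # approximately [2]
```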

---

### 5. Try out the class...

Let's instantiate the class and try out the coefficient fitting function on the housing data.

---

### 6. Add the `predict` function.

Let's add some more of the class methods that are in the real `LinearRegression` class.

First off, add the `predict` function. It will take a design matrix `X` and return predictions for those rows.
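
A possible `predict` is sketched below. To keep the example short, the coefficients are set by hand rather than fitted, which is an assumption of this sketch; in the full class they would be set by `fit`. The sketch also assumes the intercept is stored separately in `intercept_`, so `X` is passed without a column of ones.

```python
import numpy as np


class SimpleLinearRegression(object):

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def predict(self, X):
        # Prediction for each row: x . coef_ + intercept_
        X = np.asarray(X, dtype=float)
        return X @ self.coef_ + self.intercept_


slr = SimpleLinearRegression()
# Set by hand for illustration; normally these come from fitting
slr.coef_ = np.array([2.0])
slr.intercept_ = 1.0

print(slr.predict([[0.0], [4.0]]))  # [1. 9.]
```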

---

### 7. Add a `score` method.

This will calculate the $R^2$ of your model on a provided `X` and `y`.

You'll probably need to write a helper function to calculate the sum of squared errors, since this will be run for both the baseline model and the regression model in order to calculate the $R^2$.
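
One way to structure this is sketched below, using $R^2 = 1 - SSE_{model} / SSE_{baseline}$, where the baseline model always predicts the mean of `y`. The helper name `_sum_squared_errors` is hypothetical, and the coefficients are again set by hand (an assumption of this sketch) so that the arithmetic is easy to follow.

```python
import numpy as np


class SimpleLinearRegression(object):

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_

    def _sum_squared_errors(self, y_true, y_pred):
        # Helper shared by the baseline model and the regression model
        return np.sum((y_true - y_pred) ** 2)

    def score(self, X, y):
        y = np.asarray(y, dtype=float)
        # Baseline model: always predict the mean of y
        baseline_pred = np.full(y.shape, y.mean())
        sse_model = self._sum_squared_errors(y, self.predict(X))
        sse_baseline = self._sum_squared_errors(y, baseline_pred)
        return 1.0 - sse_model / sse_baseline


slr = SimpleLinearRegression()
# Set by hand for illustration; normally these come from fitting
slr.coef_ = np.array([2.0])
slr.intercept_ = 1.0

X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 8.0]

# Predictions are [1, 3, 5, 7], so SSE_model = 1 and SSE_baseline = 26.75
r2 = slr.score(X, y)
print(r2)  # 1 - 1/26.75, roughly 0.963
```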

Check against sklearn's implementation:

In [3]:
from sklearn.linear_model import LinearRegression

---

### 8. Inspecting a class

When we want to know more about a class object, we can use the `inspect` module. Specifically, the `inspect.getmembers()` function takes an instance of a class as an argument and returns a list of `(name, value)` pairs describing its members.

This can be helpful for finding out which attributes and methods are available; in effect, it shows the blueprint of a class object in memory. Depending on the way the class was implemented, you can usually find useful information hiding inside of `slr.__class__.__dict__`, which can be easier to look at. The "right way" is to use the `inspect` module.

In [4]:
import inspect
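
Here is a short sketch of `inspect.getmembers()` applied to a stripped-down version of our class. The body of `fit` is deliberately left empty, since only the presence of the function matters for inspection.

```python
import inspect


class SimpleLinearRegression(object):

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X, y):
        # Body omitted: only the function's existence matters here
        return self


slr = SimpleLinearRegression()

# getmembers returns a list of (name, value) tuples sorted by name
members = inspect.getmembers(slr)

# Filter out the double-underscore names inherited from object
public = [name for name, value in members if not name.startswith('__')]
print(public)  # ['coef_', 'fit', 'intercept_']

# A predicate narrows the result, here to bound methods only
methods = inspect.getmembers(slr, predicate=inspect.ismethod)
print([name for name, value in methods])
```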

---

### Special Class Methods

|Method|Description|Sample call|
|--|--|--|
|`__init__(self [, args...])`|Constructor (with any optional arguments)|`obj = className(args)`|
|`__del__(self)`|Destructor, called when an object is about to be destroyed|`del obj`|
|`__repr__(self)`|Evaluatable string representation|`repr(obj)`|
|`__str__(self)`|Printable string representation|`str(obj)`|
|`__cmp__(self, x)`|Object comparison (Python 2 only; Python 3 uses `__eq__`, `__lt__`, and related methods)|`cmp(obj, x)`|

The `__repr__` function reports back something descriptive about what the class represents. You can basically do whatever you want with it, but its purpose is to convey something descriptive about your class.
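
As an illustration, the sketch below adds `__repr__` and `__str__` to a stripped-down version of our class. The exact wording of both strings is an arbitrary choice made for this example.

```python
class SimpleLinearRegression(object):

    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept

    def __repr__(self):
        # Looks like the call that would recreate the object
        return 'SimpleLinearRegression(fit_intercept={!r})'.format(
            self.fit_intercept)

    def __str__(self):
        # Friendlier description used by print() and str()
        return 'Linear regression (fitted: {})'.format(self.coef_ is not None)


slr = SimpleLinearRegression(fit_intercept=False)
print(repr(slr))  # SimpleLinearRegression(fit_intercept=False)
print(str(slr))   # Linear regression (fitted: False)
```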

The `__del__` method is the counterpart to `__init__`. You can use it to run cleanup code when an instance of your class is about to be destroyed.

Generally it works well, but in practice there are a few things to watch out for. Read more about [safely using Python destructors](http://eli.thegreenplace.net/2009/06/12/safely-using-destructors-in-python).