# Building our Hypothesis Function

### Introduction

In this lesson we'll see how we can use our simple linear regression formula to build our hypothesis function.  To do this, we need to know the linear regression model's coefficient and y-intercept.  We'll also need a list of data to predict on.  We'll write this method and encapsulate the related data using object orientation.

### How scikit-learn "Fits"

First, let's review how scikit-learn fits a model and makes predictions.

In [4]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()

inputs = [800, 1500, 2000, 3500, 4000]
sklearn_inputs = [
     [800], 
    [1500], 
    [2000], 
    [3500], 
    [4000]
]

outcomes = [330, 780, 1130, 1310, 1780]
model.fit(sklearn_inputs, outcomes)

model.coef_
# approximately 0.387 (returned as a one-element array)

model.intercept_
# 153.26

model.predict(sklearn_inputs)

array([ 462.66593527,  733.39275919,  926.76906199, 1506.89797038,
       1700.27427318])
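To see where numbers like these can come from, here is a minimal sketch that recomputes the slope and intercept with the closed-form least squares formulas for a single feature. This is an illustration under that assumption, not a description of scikit-learn's internals, and the names `slope`, `sxy`, and `sxx` are ours.

```python
# Closed-form least squares for one feature (illustration only;
# scikit-learn's internal solver may work differently).
inputs = [800, 1500, 2000, 3500, 4000]
outcomes = [330, 780, 1130, 1310, 1780]

n = len(inputs)
mean_x = sum(inputs) / n
mean_y = sum(outcomes) / n

# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outcomes))
sxx = sum((x - mean_x) ** 2 for x in inputs)
slope = sxy / sxx

# The fitted line passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

# Predicting is the line equation applied to each input.
predictions = [slope * x + intercept for x in inputs]

print(round(slope, 3), round(intercept, 2))
print([round(p, 2) for p in predictions])
```

The printed slope, intercept, and predictions should agree with `model.coef_`, `model.intercept_`, and `model.predict(sklearn_inputs)` above, up to rounding.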

Now that we remember how sklearn predicts outputs from our inputs, it's time to write our own predict function.

### Creating our Hypothesis Class

In [22]:
class Hypothesis:
    def __init__(self, coef, intercept):
        self.coef = coef
        self.intercept = intercept
        
    def predict(self, data):
        # use the intercept stored on the instance, not a global variable
        return data.dot(self.coef) + self.intercept

The `Hypothesis` class above is in charge of making our predictions.  It is initialized with a `coef` value for the coefficient and an `intercept` value for the y-intercept, which correspond to sklearn's `coef_` and `intercept_` attributes.

In [23]:
import numpy as np
coef = 0.39
intercept = 153


> We use rounded versions of the coefficient and intercept that sklearn found above.  Further down, we capture our various amounts to spend on advertising in a numpy array.

In [24]:
hyp = Hypothesis(coef, intercept)

In [25]:
hyp.__dict__

{'coef': 0.39, 'intercept': 153}

To make predictions, we pass the array of `ad_spends` to `predict`, which should return a prediction for each value.

In [26]:
ad_spends = np.array([800, 1500, 2000, 3500, 4000])
hyp.predict(ad_spends)

array([ 465.,  738.,  933., 1518., 1713.])
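As a check on what `predict` computes, the sketch below applies the hypothesis function to one ad spend at a time in plain Python. The helper name `predict_one` is hypothetical and only used for illustration.

```python
coef = 0.39
intercept = 153
ad_spends = [800, 1500, 2000, 3500, 4000]

def predict_one(ad_spend):
    # hypothetical helper: the hypothesis function for a single input
    return coef * ad_spend + intercept

# Apply the hypothesis function to each ad spend in turn.
predictions = [predict_one(spend) for spend in ad_spends]

# Round to hide floating point noise before printing.
print([round(p, 2) for p in predictions])
```

Each value should match the corresponding entry returned by `hyp.predict(ad_spends)`.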

### Matrix Vector Multiplication

Now let's say that our input data is in the form of the following matrix, where each row represents an apartment and the columns hold its number of bedrooms and its square feet.  Our task is to predict the listing price.

In [28]:
#             bedrooms, square feet
X = np.array([[3, 150], 
              [4, 400], 
              [2, 400], 
              [1, 200]
             ])

Our hypothesis function is the following:
    
$\text{listing_price} = 2*\text{bedrooms} + 3*\text{sq_feet} + 100$

Use matrix vector multiplication to calculate the prediction for each observation.

In [29]:
import numpy as np
coef = np.array([2, 3])
bias = 100

In [30]:
X.dot(coef) + bias


array([ 556, 1308, 1304,  702])

So we can see that we can make predictions with the following procedure:

$\hat{y} = X \cdot \theta + \text{bias} =  \begin{pmatrix}
    3 & 150 \\
    4 & 400\\
    2 & 400\\
    1 & 200
\end{pmatrix} \cdot \begin{pmatrix}
    2 \\ 3
\end{pmatrix} + \begin{pmatrix}
        100 \\
        100\\
        100\\
        100
\end{pmatrix} = \begin{pmatrix}
        556 \\
        1308\\
        1304\\
        702
\end{pmatrix}$
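The expression above can be checked in code.  The sketch below assumes the `Hypothesis` class from earlier, written so that `predict` reads the intercept from `self`, and shows that the same method works when `coef` is a vector and the data is a matrix.  It also builds the column of 100s explicitly with `np.full` to mirror the equation, although numpy would broadcast a scalar bias on its own.

```python
import numpy as np

class Hypothesis:
    def __init__(self, coef, intercept):
        self.coef = coef
        self.intercept = intercept

    def predict(self, data):
        # matrix (or vector) times coefficients, plus the intercept
        return data.dot(self.coef) + self.intercept

X = np.array([[3, 150],
              [4, 400],
              [2, 400],
              [1, 200]])

hyp = Hypothesis(np.array([2, 3]), 100)

# Scalar intercept: numpy broadcasts 100 across all four rows.
broadcast_predictions = hyp.predict(X)

# Explicit bias vector, mirroring the column of 100s in the equation.
bias_vector = np.full(X.shape[0], 100)
explicit_predictions = X @ hyp.coef + bias_vector

print(broadcast_predictions)
print(explicit_predictions)
```

Both approaches should give the same four predictions; the scalar form is usually preferred because it avoids building the extra vector.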

Note that another way to perform matrix vector multiplication is to take the dot product of each row of the matrix with the coefficient vector, one row at a time.

In [13]:
first_row = X[0, :]


In [14]:
first_row.dot(coef) + bias

556
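Repeating this for every row gives the full set of predictions.  The following sketch loops over the rows of `X` and compares the result with the single matrix vector product; the variable names `row_by_row` and `all_at_once` are ours.

```python
import numpy as np

X = np.array([[3, 150],
              [4, 400],
              [2, 400],
              [1, 200]])
coef = np.array([2, 3])
bias = 100

# One dot product per row (per apartment), then add the bias.
row_by_row = np.array([row.dot(coef) + bias for row in X])

# The same computation as a single matrix vector multiplication.
all_at_once = X.dot(coef) + bias

print(row_by_row)
print(np.array_equal(row_by_row, all_at_once))
```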

### Summary

In this lesson, we saw how we can use our simple linear regression formula to write a `predict` method on a `Hypothesis` class.  To do this, we needed to know the linear regression model's coefficient and y-intercept, along with an array of data to predict on.  We also saw that when each observation has several features, the predictions come from a matrix vector multiplication plus the bias.