## Machine Learning!

Now that we've done our EDA, we can build a model.  But wait... what's a model?  And what's machine learning?






## Inference 

So far we've been doing "descriptive statistics," which simply summarize the data we have.  Now we're going to start doing some inferential statistics: we will infer, or predict, something given the data.


## Model?
A model is just a mathematical function that describes the relationship between variables.  In this case, we will create a model that describes the relationship between price and all those other variables...

Imagine this model...

$$price = bedrooms * \beta_1$$

This model would describe a very simple relationship between bedrooms and price, where we'd just multiply the number of bedrooms by a coefficient $\beta_1$ and arrive at price.
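
To make that concrete, here is a minimal sketch of the one-variable model as a Python function. The function name and the coefficient value are made up for illustration; nothing here is learned from our data.

```python
def predict_price_simple(bedrooms, beta_1):
    """One-variable model: price = bedrooms * beta_1."""
    return bedrooms * beta_1

# beta_1 = 100_000 is a hypothetical coefficient, purely for illustration
print(predict_price_simple(3, 100_000))  # 300000
```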


## Ok, but that's not a very good model!

It really isn't.  The technical term is 'underspecified.'  Let's build this one instead...

$$price = bedrooms * \beta_1 + baths\_full * \beta_2 + baths\_half * \beta_3 + garage * \beta_4 + sqft * \beta_5 $$

This model would be more fully specified.
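
A model like this is just a sum of feature-times-coefficient terms, which is a dot product. The sketch below assumes NumPy is available; the coefficients and the example house are hypothetical values chosen so the arithmetic is easy to check, not values learned from our data.

```python
import numpy as np

# Feature order used by the hypothetical values below
FEATURES = ["bedrooms", "baths_full", "baths_half", "garage", "sqft"]

def predict_price_full(x, betas):
    """Multi-variable model: price = sum over features of x_j * beta_j."""
    return float(np.dot(x, betas))

# Hypothetical coefficients, one per feature (not learned from the data)
betas = np.array([10_000.0, 20_000.0, 5_000.0, 15_000.0, 100.0])
# Hypothetical house: 3 bedrooms, 2 full baths, 1 half bath, 2 garage spaces, 1,800 sqft
house = np.array([3, 2, 1, 2, 1800])

print(predict_price_full(house, betas))  # 285000.0
```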

## But how do you know what $\beta_n$ is?

Those beta coefficients are *learnable* parameters.  This is the "learning" part of machine learning.  Machine learning is a subset of AI where we construct models and then use data to teach the machine the model's coefficients (we say parameters).

## Ok, fine, but how does a machine actually learn?

* Step 1 = Initialize $\beta_n$ to random values

* Step 2 = Measure how wrong the model is for each observation
    * This is called a loss function
    * $(\hat{y} - y)^2$   the squared difference between the predicted price and the actual price
    * The average loss over all $m$ observations is called the 'cost function' (the extra factor of $\frac{1}{2}$ just makes the derivative cleaner), and it looks like this:
        * $$J = \frac{1}{2m} \sum_{i=1}^{m}( \hat{y}_i - y_i)^2 $$
        
* Step 3 = Now that we know how wrong we are, we can nudge $\beta_n$ in the correct direction
    * The direction comes from the slope of the cost function $J$ w.r.t. each $\beta_n$; we step against the slope, since we want $J$ to go down
    * We call this slope the gradient...
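
The cost function and its gradient can be sketched in a few lines of NumPy. This is an illustration under assumptions: the toy data is made up (generated by y = 2 * x), the function names are hypothetical, and the betas start at zero rather than at random values so the numbers are easy to verify by hand.

```python
import numpy as np

def cost(X, y, betas):
    """J = 1/(2m) * sum((y_hat - y)^2), with y_hat = X @ betas."""
    m = len(y)
    y_hat = X @ betas
    return float(np.sum((y_hat - y) ** 2) / (2 * m))

def gradient(X, y, betas):
    """Partial derivative of J w.r.t. each beta: (1/m) * X^T (y_hat - y)."""
    m = len(y)
    y_hat = X @ betas
    return X.T @ (y_hat - y) / m

# Made-up toy data: 3 observations, 1 feature, true relationship y = 2 * x
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

betas = np.array([0.0])          # starting guess (zero here, for easy checking)
print(cost(X, y, betas))         # 56 / 6, about 9.33
print(gradient(X, y, betas))     # negative, so beta should increase
```

A negative gradient means the cost falls as the coefficient grows, so stepping against the gradient moves beta up toward 2.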



## How does a machine actually learn (part 2)

So then we do this thing called gradient descent.

Repeat Until Converged: {
$$\beta = \beta-\alpha\frac{\partial}{\partial\beta}J(\beta)$$
}

So, every time we update $\beta$, we set it equal to its previous value minus $\alpha$ (the learning rate) multiplied by the partial derivative of the cost function $J$ with respect to $\beta$.
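
Here is a minimal sketch of that update rule as a loop in NumPy. It makes a few assumptions worth flagging: it runs a fixed number of iterations instead of testing for convergence, the learning rate `alpha=0.1` and the toy data (y = 2 * x) are made up, and the function name is hypothetical.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000, seed=0):
    """Fit betas for y_hat = X @ betas by repeated gradient steps."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    betas = rng.normal(size=n)            # Step 1: random starting values
    for _ in range(n_iters):              # fixed count stands in for "until converged"
        y_hat = X @ betas                 # current predictions
        grad = X.T @ (y_hat - y) / m      # gradient of J w.r.t. betas
        betas = betas - alpha * grad      # beta = beta - alpha * dJ/dbeta
    return betas

# Made-up toy data: true relationship is y = 2 * x
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

betas = gradient_descent(X, y)
print(betas)  # very close to [2.]
```

If `alpha` is too large the updates overshoot and the cost grows instead of shrinking; if it is too small, learning is slow.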

## If you don't understand this math that's ok, here's what you should remember...
1.  There is a point to calculus.
2.  We start with random betas and then we update them over and over again until we make the cost function as small as possible.

# So let's build a model already!

Overall plan:
* Load our data
* Test/Train Split
* Train our model
* Measure its goodness

In [12]:
import pandas as pd
# load our data
df = pd.read_csv("../data/sfh_house_data.csv", index_col=0)
df = df[["baths_full", "baths_half", "bedrooms","garage", "sqft", "price"]]

In [13]:
# Train / Test Split
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


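y = df['price']
X = df.drop('price', axis=1)
# Split first, then fit the scaler on the training set only, so that no
# information from the test set leaks into the scaling.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)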
print("X_train =",X_train.shape)
print("y_train =",y_train.shape)
print("X_test =",X_test.shape)
print("y_test =",y_test.shape)

X_train = (803, 5)
y_train = (803,)
X_test = (201, 5)
y_test = (201,)


In [14]:
# Fitting the model
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

# Print the fitted model as an equation (safe to skip).
# Note: the features were standardized, so each coefficient is the change
# in price per one standard deviation of that feature.
terms = [f"{name}*{coef}" for name, coef in zip(X.columns, model.coef_)]
print("price = " + " + ".join(terms + [str(model.intercept_)]))

price = baths_full*21486.3693338 + baths_half*6164.1007002 + bedrooms*6546.61839286 + garage*17800.6055221 + sqft*76135.7463411 + 221388.590509


In [15]:
# Let's measure how good the model is, using the test set
from sklearn.metrics import mean_absolute_error
y_hat = model.predict(X_test)
print('${:,.2f}'.format(mean_absolute_error(y_test, y_hat)))

$41,087.00
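
That number is the mean absolute error: on average, the model's predictions on the test set miss the actual price by roughly 41,000 dollars. As a sketch of what `mean_absolute_error` computes, here is the same calculation by hand in NumPy. The helper name and the three example prices are made up for illustration.

```python
import numpy as np

def mean_absolute_error_by_hand(y_true, y_pred):
    """Average of |actual - predicted| over all observations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up prices: the errors are 10,000, 20,000 and 0
actual = [200_000, 300_000, 400_000]
predicted = [210_000, 280_000, 400_000]

print('${:,.2f}'.format(mean_absolute_error_by_hand(actual, predicted)))  # $10,000.00
```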
