
# Machine Learning with a Library

### Introduction

Over the last few lessons, we built a simple linear regression model from scratch.  We did so using the three components of any machine learning algorithm: first building our hypothesis function, then a cost function to evaluate the hypothesis, and then fitting our model to our data.  Let's review.

1. Our **hypothesis function** was simply a line: a function that, given an input, predicted an output.  In our example of T-shirt sales, given an advertising budget, the function predicted a number of sales.  

2. We **fit** the model by comparing its predictions against the actual data.  We calculate the difference between each actual value and the value that our model predicts.  This difference is called the error.  Then we square each of those errors and add up the squared errors.  Fitting means choosing the parameters that make this sum as small as possible.  

3. Now we can **predict** new outputs with our fitted model.  For example, we could predict sales amounts for ad budgets that we never saw before.
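
As a rough sketch of steps 1 and 2, the snippet below scores one candidate line against data.  It borrows the temperature and customer numbers used later in this lesson.  The slope of 1 and intercept of 30 are arbitrary guesses chosen for illustration, not fitted values, and the function names are our own.

```python
# Observations: the temperature/customer data used later in this lesson
temperatures = [80, 65, 50, 70, 60]
customers = [120, 100, 85, 100, 90]

def hypothesis(x, slope, intercept):
    """Step 1: a line that predicts an output for a given input."""
    return slope * x + intercept

def sum_of_squared_errors(xs, ys, slope, intercept):
    """Step 2: square each error (actual minus predicted) and add them up."""
    errors = [y - hypothesis(x, slope, intercept) for x, y in zip(xs, ys)]
    return sum(error ** 2 for error in errors)

# An arbitrary guess for the parameters (assumed values, not a fitted line)
sse = sum_of_squared_errors(temperatures, customers, slope=1, intercept=30)
print(sse)  # 150
```

A different slope or intercept gives a different total, and the best fit line is the one with the smallest total.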

### Using a machine learning library

Now wouldn't it be nice if, instead of writing these algorithms from scratch, we could use a tool to do these for us?  Well we can.

Scikit-learn is an excellent tool for running machine learning algorithms.  Let's get going.

## Going through our three steps

With the `sklearn` library we can follow our three-step process of (1) creating an initial model, (2) fitting the model, and (3) using the parameters of the model to make new predictions.

### 1. Creating an initial model

When working with scikit-learn, we create an initial model by using the `LinearRegression` class from the `sklearn.linear_model` module.

So first we import the `LinearRegression` class.

In [0]:
from sklearn.linear_model import LinearRegression

And now we can create our initial model.

In [0]:
model = LinearRegression()

This model is simply an instance of the linear regression class.  Right now, it has no knowledge of our past data.

In [0]:
model

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

So we were able to build an initial linear regression model in two lines:

```python
from sklearn.linear_model import LinearRegression
model = LinearRegression()
```

And now that we have an initial model, we are ready to move onto step 2: fitting our model to the data.

### 2. Fit the model with the actual data

Now that we have initialized our model with the code `model = LinearRegression()`, it is time to pass some data into this model so that we can fit it.  In the last lesson, we saw how we can fit the model by choosing parameters that minimize the error of our function.

Here we do the same thing by passing the data into our model, so that the linear regression model can adjust its parameters to fit this data.  Suppose we have the following observations:

| temperature | actual customers |
| ------------- |:-------------:|
| 80 | 120 |
| 65 | 100 |
| 50 | 85 |
| 70 | 100 |
| 60 | 90 |


Remember that the temperatures are the inputs, and that each temperature is used to explain the target value of customers.  So we might like to simply pass these inputs and outputs to our model as two lists.  And our model has a `fit` method to do precisely that.

In [0]:
temperatures = [80, 65, 50, 70, 60]
amounts = [120, 100, 85, 100, 90]

However, scikit-learn requires our input data to be in a specific format: a nested list, with one inner list per observation.  It wants us to organize the data associated with our features like so:

In [0]:
inputs = [[80], [65], [50], [70], [60]]

> Another way to accomplish this is with NumPy.

In [0]:
import numpy as np
temperatures = np.array([80, 65, 50, 70, 60])
nested_temps = temperatures.reshape(5, -1)
nested_temps

array([[80],
       [65],
       [50],
       [70],
       [60]])

> We'll explain why our input data needs to be in this format in a couple of lessons.  For now, let's just go with it.
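
A minimal sketch of the shape convention, assuming only NumPy: scikit-learn expects the inputs as a two-dimensional array with one row per observation and one column per feature.  Here `reshape(-1, 1)` lets NumPy infer the number of rows, which for this data gives the same result as the `reshape(5, -1)` call above.

```python
import numpy as np

temperatures = np.array([80, 65, 50, 70, 60])
print(temperatures.shape)  # (5,)

# -1 asks NumPy to infer the row count; 1 is our single feature column
nested_temps = temperatures.reshape(-1, 1)
print(nested_temps.shape)  # (5, 1)

# Same values as the nested list written by hand
print(nested_temps.tolist())  # [[80], [65], [50], [70], [60]]
```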

The output data can stay the same.

In [0]:
customers = [120, 100, 85, 100, 90]

Ok, now that we know the format for our data, the next thing to do is to fit our linear model to the data.  We do this by using the `fit` method on our linear model and passing through the data in the proper format.

In [0]:
# nested list for the inputs
inputs = [
    [80], 
    [65], 
    [50], 
    [70], 
    [60]
]

# single list for the outputs
outputs = [120, 100, 85, 100, 90]

model.fit(inputs, outputs)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

### 3. Viewing our trained model

Believe it or not, with that last line we have fit our model to our data.  Here we'll prove it.  Remember that in our simple linear regression model we have two parameters: our intercept and our coefficient.  

Let's start with our intercept.

In [0]:
model.intercept_
# 24.25

24.25000000000003

Now for our coefficient.

In [0]:
model.coef_


array([1.15])

Remember that these numbers fit into our general formula of $y = mx + b$, where $m$ is our coefficient and $b$ is our intercept.  So what we learned is that when we fit a model to our data, we found a best fit line of $y = 1.15x + 24.25$. 
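
As a check on the values reported by `model.coef_` and `model.intercept_`, here is a sketch that computes the slope and intercept directly with the standard closed-form least-squares formulas for a single input, in plain Python.  It is for illustration only, and is not a claim about how scikit-learn computes the fit internally.

```python
temperatures = [80, 65, 50, 70, 60]
customers = [120, 100, 85, 100, 90]

n = len(temperatures)
mean_x = sum(temperatures) / n
mean_y = sum(customers) / n

# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperatures, customers))
denominator = sum((x - mean_x) ** 2 for x in temperatures)
slope = numerator / denominator

# The best fit line passes through the point (mean_x, mean_y)
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # 1.15 24.25
```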

Let's plot our data and our line as we did previously.

In [0]:
import plotly.graph_objects as go

scatter_trace = go.Scatter(x = temperatures, y = outputs, name = 'observations', mode = 'markers')
model_trace = go.Scatter(x = temperatures, y = 1.15*temperatures + 24.25,
                         name = 'expected', mode = 'lines')

layout = {'yaxis': {'title': 'customers'}, 'xaxis': {'title': 'temperature'}}
go.Figure(data = [scatter_trace, model_trace], layout = layout)

It looks like our model did a pretty good job.
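
One way to put a number on "pretty good" is to look at the errors directly.  The sketch below, in plain Python, recomputes the line's predictions from the coefficient and intercept reported by the model (1.15 and 24.25) and lists how far each one lands from the observed customer count.

```python
temperatures = [80, 65, 50, 70, 60]
customers = [120, 100, 85, 100, 90]

# Parameters reported by model.coef_ and model.intercept_
slope, intercept = 1.15, 24.25

predictions = [slope * x + intercept for x in temperatures]
errors = [actual - predicted for actual, predicted in zip(customers, predictions)]

# Round for display; floating point arithmetic can leave tiny remainders
print([round(error, 2) for error in errors])  # [3.75, 1.0, 3.25, -4.75, -3.25]

largest_miss = max(abs(error) for error in errors)
print(round(largest_miss, 2))  # 4.75
```

Every prediction lands within five customers of the observed value.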

### 4. Predicting new customers

To plot the line above, we plugged each temperature into our formula and plotted the results as a line.

In [0]:
predicted_amounts = 1.15*temperatures + 24.25

predicted_amounts
# array([116.25,  99.  ,  81.75, 104.75,  93.25])


array([116.25,  99.  ,  81.75, 104.75,  93.25])

Not bad at all.  Of course, scikit-learn has a built-in `predict` method that lets us see the outputs of our model.  We pass in the temperatures we would like predictions for.  Once again, we use a nested list for our inputs.

In [0]:
inputs = [
    [80],
    [65],
    [50],
    [70],
    [60]
]

model.predict(inputs)

array([116.25,  99.  ,  81.75, 104.75,  93.25])

Notice that our predictions match what we calculated by hand above.  We can pass in whichever inputs we like, and our trained model will predict the customers for us.  For example, say the forecast for next week has temperatures of 92, 87 and 89 degrees.  Let's see how many customers we can expect.

In [0]:
inputs = [
    [92],
    [87],
    [89]
]

model.predict(inputs)

array([130.05, 124.3 , 126.6 ])
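
To confirm that `predict` is simply applying the fitted line, the sketch below refits the model and compares `predict` against the by-hand formula built from `coef_` and `intercept_`.  It assumes scikit-learn and NumPy are installed, and the names `forecast`, `predicted` and `by_hand` are our own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

inputs = [[80], [65], [50], [70], [60]]
outputs = [120, 100, 85, 100, 90]
model = LinearRegression().fit(inputs, outputs)

# Next week's forecast, one row per temperature
forecast = np.array([92, 87, 89]).reshape(-1, 1)
predicted = model.predict(forecast)

# The same calculation by hand: y = m * x + b
by_hand = model.coef_[0] * forecast[:, 0] + model.intercept_

print(np.allclose(predicted, by_hand))  # True
```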

### Summary

In this lesson, we saw how to use the scikit-learn library to fit a machine learning model and make new predictions with our fitted model. 

We do so using similar steps to what we saw in our introduction to machine learning lesson.

1. Create an initial model
2. Fit the model to data
3. Use the fitted model to make new predictions

We can translate these steps into code with the following:

In [0]:
# import the model class
from sklearn.linear_model import LinearRegression

# training data: nested list of inputs, flat list of outputs
inputs = [[80], [65], [50], [70], [60]]
outputs = [120, 100, 85, 100, 90]

# 1. Create an initial model
linear_regression = LinearRegression()

# 2. Fit the model to data
linear_regression.fit(inputs, outputs)

# 3. Use the fitted model to make new predictions
linear_regression.predict(inputs)

array([116.25,  99.  ,  81.75, 104.75,  93.25])

The other thing to remember is that if we want to see the numbers behind these predictions, we can read them from the corresponding attributes of the fitted model.

In [0]:
linear_regression.coef_

array([1.15])

In [0]:
linear_regression.intercept_

24.25