# The Machine Learning Process

### Introduction

Ok, let's dive into machine learning feet first.

Let's imagine we are in the business of selling real estate.  We have a list of prospective clients and we want to identify the ones who are most likely to purchase. 

<img src="./leads.jpg" width="40%">

## Our Process

### 1. Get the Training Data

We look to our past data to try to predict whether a prospective client is likely to purchase. Here are prospective clients, or *leads*, that we tried to sell to in the past.

| Attended College | Under Thirty | Borough   | Income | Customer |
| ---------------- | ------------ | --------- | ------ | :------: |
| ?                | Yes          | Manhattan | < 55   |    0     |
| Yes              | Yes          | Brooklyn  | < 55   |    0     |
| ?                | No           | Brooklyn  | < 55   |    1     |

Each row of our data represents the characteristics of a different lead.  

Let's focus on the first row in the table above.  We see that the lead:

1. May or may not have attended college (we do not know)
2. Is under 30
3. Is from Manhattan
4. Makes under 55k
5. Did not become a customer

> The number `0` means that the lead did not become a customer.  The `1` means that she did.

We call each row of data an **observation** and the entire set of data above our **training data**.

The idea is to use these past observations to come up with a formula that comes close to predicting which of our leads will turn into customers, and which will not.  

We'll eventually use more than three observations, but this is a fine place to start. 
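
As a minimal sketch, the table above could be written in Python as a list of dictionaries, one per observation. The key names (`attended_college`, `under_thirty`, and so on) are illustrative choices, and `None` is an assumed stand-in for the unknown `?` entries.

```python
# Each dictionary is one observation: four inputs plus the past outcome.
# None stands in for the "?" (unknown) entries in the table.
training_data = [
    {"attended_college": None, "under_thirty": True,  "borough": "Manhattan", "income": "< 55", "customer": 0},
    {"attended_college": True, "under_thirty": True,  "borough": "Brooklyn",  "income": "< 55", "customer": 0},
    {"attended_college": None, "under_thirty": False, "borough": "Brooklyn",  "income": "< 55", "customer": 1},
]

first_lead = training_data[0]
print(first_lead["borough"], first_lead["customer"])  # prints: Manhattan 0
```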

> The first step in machine learning is to collect our **training data**.  Notice that for each **observation** in our training data, we have both the past inputs, as well as the past outcome that occurred.

### 2. Train an algorithm

Let's take another look at our data. 

| Attended College | Under Thirty | Borough   | Income | Customer |
| ---------------- | ------------ | --------- | ------ | :------: |
| ?                | Yes          | Manhattan | < 55   |    0     |
| Yes              | Yes          | Brooklyn  | < 55   |    0     |
| ?                | No           | Brooklyn  | < 55   |    1     |

Ok, now really imagine that you are a salesperson and want to use past data to identify your ideal customers.  One approach might be to look at the past data, and then go through each column to see if there's a characteristic that distinguishes customers from non-customers.

Do you see any?  

Well, `Under 30` appears to do the best.  Our two leads who were under 30 did not become customers, and our one lead over 30 did become a customer.

> We see this if we split our leads into two piles based on whether or not they are `Under 30`.  

<img src="./leads-under.png" width="30%">

> Notice that splitting based on age perfectly splits our data between 1s and 0s -- customers and non-customers.

Believe it or not, this is how a decision tree is trained.  It looks for the features that best split the data.  We'll learn more details about this later.
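
The search for the best-splitting feature can be sketched in a few lines of Python. This is a simplified illustration, not how a real decision tree library does it: real implementations typically score splits with measures such as Gini impurity or entropy. Here we simply count how many observations land in a pile whose majority outcome matches their own. The function name `split_score`, the key names, and the use of `None` for `?` are all assumptions made for this example.

```python
from collections import Counter, defaultdict

training_data = [
    {"attended_college": None, "under_thirty": True,  "borough": "Manhattan", "income": "< 55", "customer": 0},
    {"attended_college": True, "under_thirty": True,  "borough": "Brooklyn",  "income": "< 55", "customer": 0},
    {"attended_college": None, "under_thirty": False, "borough": "Brooklyn",  "income": "< 55", "customer": 1},
]
FEATURES = ["attended_college", "under_thirty", "borough", "income"]


def split_score(observations, feature):
    """Count observations that agree with the majority outcome of their pile."""
    piles = defaultdict(list)
    for obs in observations:
        # Unknown values (None) are treated as just another category.
        piles[obs[feature]].append(obs["customer"])
    # In each pile, the size of the most common outcome is the number we get right.
    return sum(Counter(outcomes).most_common(1)[0][1] for outcomes in piles.values())


scores = {feature: split_score(training_data, feature) for feature in FEATURES}
best_feature = max(scores, key=scores.get)
print(best_feature)  # prints: under_thirty
```

Only `under_thirty` places all three observations in a pile that matches their outcome, which is why it wins.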

The broader point is that we just trained our machine learning algorithm.

> In **training**, our machine learning algorithm looks at past observations to discover a pattern that can be used to predict the outcomes of these observations.

### 3. Find a hypothesis function

Now that we have looked to our past data to find a pattern, we can use this past pattern to make predictions on our data.  This is our formula:

Take a lead, and predict that: 
* If he is under 30, he *will not* become a customer
* If he is not under 30, he *will* become a customer

Or if you prefer a diagram:

<img src="./under-30.png" width="30%">

That works.

This is called the hypothesis function of our machine learning algorithm.
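
As a rough sketch, a hypothesis function is just a function from a lead to a predicted outcome. The version below follows the pattern in the training table, where the two leads under 30 did not purchase and the one lead not under 30 did. The name `hypothesis` and the `under_thirty` key are hypothetical, chosen only for illustration.

```python
def hypothesis(lead):
    """Predict 1 (becomes a customer) or 0 (does not) from the lead's age group."""
    return 0 if lead["under_thirty"] else 1


# The three past observations, reduced to the one feature the function uses.
training_data = [
    {"under_thirty": True,  "customer": 0},
    {"under_thirty": True,  "customer": 0},
    {"under_thirty": False, "customer": 1},
]

# Compare each prediction with the outcome we already know.
matches = [hypothesis(lead) == lead["customer"] for lead in training_data]
print(matches)  # prints: [True, True, True]
```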

> Our **hypothesis function** takes in our observations and comes close to predicting their output.

We can try our hypothesis function on our past data to see how it does.  But note that we already *know* whether the leads in our training data became customers or not.  It's right there in our chart: 

| Attended College | Under Thirty | Borough   | Income | Customer |
| ---------------- | ------------ | --------- | ------ | :------: |
| ?                | Yes          | Manhattan | < 55   |    0     |
| Yes              | Yes          | Brooklyn  | < 55   |    0     |
| ?                | No           | Brooklyn  | < 55   |    1     |

The real point is to take new data, where we do not know the outcome, and use the information we have about each lead to predict whether he will become a customer.  For example, let's say two new leads just came in:

| Attended College | Under Thirty | Borough   | Income |
| ---------------- | ------------ | --------- | ------ |
| No               | Yes          | Brooklyn  | > 55   |
| Yes              | No           | Brooklyn  | > 55   |

We would feed these leads through our hypothesis function, and predict that the first lead will not become a customer (as she is under 30), and that the second lead will, as he is not under 30.
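
In code, scoring the two new leads might look like the sketch below. It assumes the rule found in the training data (the leads under 30 did not purchase, the lead not under 30 did), and the function and key names are again hypothetical.

```python
def hypothesis(lead):
    """Predict 1 (becomes a customer) or 0 (does not) from the lead's age group."""
    return 0 if lead["under_thirty"] else 1


# New leads have features but no "customer" value: that is what we want to predict.
new_leads = [
    {"attended_college": False, "under_thirty": True,  "borough": "Brooklyn", "income": "> 55"},
    {"attended_college": True,  "under_thirty": False, "borough": "Brooklyn", "income": "> 55"},
]

predictions = [hypothesis(lead) for lead in new_leads]
print(predictions)  # prints: [0, 1]
```

Notice that the function ignores college, borough, and income entirely. With only three past observations, age was the only feature it learned to use.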

<img src="./under-30.png" width="30%">

### Wrapping Up

So can we really begin to make predictions based on just three past observations?  Well, no. 

But the process will stay the same.  And notice that this process is similar to what we as humans would do:

Want to predict future outcomes?  Well, look to the past observations to find the attributes that are associated with one outcome versus another.  And then use those attributes to predict future outcomes.

Here, this was **gathering our training data**, **training our algorithm**, and **finding a hypothesis function** to make predictions with.

### Some Terminology

Let's get a couple of things out of the way:

1. Algorithm vs Model

* We'll call a machine learning **algorithm** the combination of the training procedure and hypothesis function that predicts future outcomes.
* A specific *example* of this algorithm, trained on specific data, that has a specific hypothesis function is a machine learning **model**.  

So all decision trees will follow a training procedure of finding features that best separate the data.  But our specific *model* was trained on the three observations above and found the hypothesis function below:

<img src="./under-30.png" width="30%">
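
One way to picture the distinction in code: the class below plays the role of the algorithm (a training procedure plus a way of predicting), while the fitted object plays the role of the model. This is a hedged sketch, not a real library API. `SingleSplitClassifier` is a made-up name, and to keep things short it is told which feature to split on rather than searching for the best one.

```python
from collections import Counter, defaultdict


class SingleSplitClassifier:
    """Hypothetical one-split 'algorithm': training procedure plus prediction rule."""

    def __init__(self, feature):
        self.feature = feature
        self.rule = {}  # filled in by fit(): feature value -> predicted outcome

    def fit(self, observations, target="customer"):
        """Training procedure: learn the majority outcome for each feature value."""
        piles = defaultdict(list)
        for obs in observations:
            piles[obs[self.feature]].append(obs[target])
        self.rule = {
            value: Counter(outcomes).most_common(1)[0][0]
            for value, outcomes in piles.items()
        }
        return self

    def predict(self, lead):
        """Hypothesis function: look up the learned outcome for this lead."""
        return self.rule[lead[self.feature]]


training_data = [
    {"under_thirty": True,  "customer": 0},
    {"under_thirty": True,  "customer": 0},
    {"under_thirty": False, "customer": 1},
]

# The trained instance is the model: the algorithm applied to specific data.
model = SingleSplitClassifier("under_thirty").fit(training_data)
print(model.rule)  # prints: {True: 0, False: 1}
```

Training the same class on different observations would produce a different `rule`, and therefore a different model.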

2. Our training data

Remember that the general idea in machine learning is to start with training data, and use past observations to find a pattern that will allow us to predict future outcomes.

| Attended College | Under Thirty | Borough   | Income | Customer |
| ---------------- | ------------ | --------- | ------ | :------: |
| ?                | Yes          | Manhattan | < 55   |    0     |
| Yes              | Yes          | Brooklyn  | < 55   |    0     |
| ?                | No           | Brooklyn  | < 55   |    1     |


Notice that we seem to be distinguishing our inputs (the first four columns) from the output of whether someone became a customer.  This is typical in machine learning.  We call the input columns our **features** and the output we are trying to predict our **target**. 
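
Separating features from the target is usually one of the first things done in code. Here is a small sketch using plain Python; the key names and the use of `None` for `?` are assumptions for illustration.

```python
training_data = [
    {"attended_college": None, "under_thirty": True,  "borough": "Manhattan", "income": "< 55", "customer": 0},
    {"attended_college": True, "under_thirty": True,  "borough": "Brooklyn",  "income": "< 55", "customer": 0},
    {"attended_college": None, "under_thirty": False, "borough": "Brooklyn",  "income": "< 55", "customer": 1},
]
TARGET = "customer"

# Features: every column except the one we are trying to predict.
features = [{key: value for key, value in obs.items() if key != TARGET} for obs in training_data]
# Target: the outcome column on its own.
targets = [obs[TARGET] for obs in training_data]

print(list(features[0]))  # prints: ['attended_college', 'under_thirty', 'borough', 'income']
print(targets)            # prints: [0, 0, 1]
```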

Once we have our training data, we can then train a decision tree model by finding the feature that best splits the data, and use that to arrive at a hypothesis function.

### Summary

In this lesson, we learned the general process for training a machine learning algorithm.  We begin with our training data, which is a collection of observations.  Each observation consists of one or more features, and an associated outcome called a target.  

We train our machine learning model by seeing which features are associated with which target values.  In a decision tree, we do this by seeing which features best separate our target values. 

<img src="./leads-under.png" width="30%">

Then, through training, our model arrives at its hypothesis function, which allows it to predict the target values of future data.