# The Machine Learning Process

### Introduction

Ok, let's dive into machine learning feet first.  In this lesson we'll get an overview of the machine learning process.  

> To be specific, we'll be covering the machine learning process for something called **supervised learning**.  We won't worry about the details of that term for now.  Supervised learning is the dominant form of machine learning, so it is a good place to start.

To start off, let's imagine we are in the business of selling real estate.  We have a list of prospective clients and we want to identify the ones who are most likely to purchase. 

<img src="https://storage.googleapis.com/curriculum-assets/intro-to-ml/leads.jpg" width="40%">

## Our Process

### 1. Get the Training Data

The first step is to look to our past data to try to discover what makes a prospective client likely to purchase. Here are prospective clients, or *leads*, that we tried to sell to in the past.

| Attended College | Under Thirty | Borough   | Income | Customer |
| ---------------- | ------------ | --------- | ------ | :------: |
| ?                | Yes          | Manhattan | < 55   |    0     |
| Yes              | Yes          | Brooklyn  | < 55   |    0     |
| ?                | No           | Brooklyn  | < 55   |    1     |

As we can see, each row of our data represents the characteristics of a different lead.  

Let's zoom in on the first row in the table above.  We see that:

1. Whether the lead attended college is unknown
2. The lead is under 30
3. The lead is from Manhattan
4. The lead makes under 55k
5. The lead did not become a customer

> The number `0` means that the lead did not become a customer.  The `1` means that she did.

We call each row of data an **observation** and the entire set of data above our **training data**.

The idea is to use these past observations to find a formula that will come close to predicting which future leads will become customers, and which will not.  

We'll eventually use more than three observations, but this is a fine place to start. 

> The first step in machine learning is to collect our **training data**.  Notice that for each **observation** in our training data, we have both the past inputs, as well as the past outcome that occurred.
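As a minimal sketch, the table above could be written in Python as a list of dictionaries, one per observation.  The key names and the use of `None` for the unknown `?` entries are assumptions made for illustration; they are not part of the original data.

```python
# Each dictionary is one observation (one row of the table above).
# Key names are hypothetical; None stands in for the unknown "?" values.
training_data = [
    {"attended_college": None, "under_thirty": True,
     "borough": "Manhattan", "income": "< 55", "customer": 0},
    {"attended_college": True, "under_thirty": True,
     "borough": "Brooklyn", "income": "< 55", "customer": 0},
    {"attended_college": None, "under_thirty": False,
     "borough": "Brooklyn", "income": "< 55", "customer": 1},
]

# Every observation holds both the past inputs and the past outcome.
for observation in training_data:
    print(observation["under_thirty"], observation["customer"])
```

Notice that the `customer` entry, the past outcome, sits alongside the inputs in every observation.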

### 2. Train an algorithm

Let's take another look at our data. 

| Attended College | Under Thirty | Borough   | Income | Customer |
| ---------------- | ------------ | --------- | ------ | :------: |
| ?                | Yes          | Manhattan | < 55   |    0     |
| Yes              | Yes          | Brooklyn  | < 55   |    0     |
| ?                | No           | Brooklyn  | < 55   |    1     |

Ok, so now let's really imagine we are a salesperson and we want to use past data to identify our ideal customers.  How would we do it?  One approach might be to look at the past data, and then go through each column to see if there's a characteristic that distinguishes customers from non-customers.

> Do you see any?  

Well, `Under Thirty` appears to do the best.  Our first two leads, who were under 30, *did not* become customers, and our one lead over 30 *did* become a customer.

Believe it or not, this is how a decision tree is trained.  It starts by looking for the columns that best split the data.  But we'll come back to that later.

The broader point for now is that we saw how we can *train* a machine learning algorithm.

> In **training**, our machine learning algorithm looks at past observations to discover a pattern that can be used to predict the outcomes of these observations.  It can then apply that same logic to predict the outcomes of future data.
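The column-by-column search described above can be sketched in Python.  This is a simplified illustration, not how a real decision tree library is implemented: it scores each column by how many observations we would predict correctly if we guessed the most common outcome for each value in that column, whereas real implementations typically use measures such as Gini impurity or entropy.  The function name `split_accuracy` and the dictionary keys are hypothetical.

```python
from collections import Counter, defaultdict

# The three training observations; keys are hypothetical, None means unknown.
training_data = [
    {"attended_college": None, "under_thirty": True,
     "borough": "Manhattan", "income": "< 55", "customer": 0},
    {"attended_college": True, "under_thirty": True,
     "borough": "Brooklyn", "income": "< 55", "customer": 0},
    {"attended_college": None, "under_thirty": False,
     "borough": "Brooklyn", "income": "< 55", "customer": 1},
]

def split_accuracy(observations, feature, target="customer"):
    """Fraction of observations predicted correctly when we guess the
    most common outcome within each value of `feature`."""
    outcomes_by_value = defaultdict(list)
    for obs in observations:
        outcomes_by_value[obs[feature]].append(obs[target])
    # For each group, the majority outcome gets that many predictions right.
    correct = sum(Counter(outcomes).most_common(1)[0][1]
                  for outcomes in outcomes_by_value.values())
    return correct / len(observations)

columns = ["attended_college", "under_thirty", "borough", "income"]
scores = {column: split_accuracy(training_data, column) for column in columns}
best_column = max(scores, key=scores.get)

print(scores)
print(best_column)  # under_thirty
```

On these three observations, `under_thirty` is the only column that separates customers from non-customers perfectly; each of the other columns gets two out of three right.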

### 3. Find a hypothesis function

Now that we have looked to our past data to find a pattern, we can use what we learned to make predictions on data where we do not know the outcome.  This is our formula:

Take a lead, and predict that: 
* If he is under 30, he *will not* become a customer
* If he is over 30, he *will* become a customer 

Or if you prefer a diagram:

<img src="https://storage.googleapis.com/curriculum-assets/intro-to-ml/customer-dtree.png" width="30%">

That works.

This is called the hypothesis function of our machine learning algorithm.

> Our **hypothesis function** takes in our observations and comes close to predicting their output.
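The rule above can be sketched as a small Python function.  The name `hypothesis` and the `under_thirty` key are assumptions chosen for illustration; the outputs `0` and `1` follow the encoding used in the `Customer` column.

```python
def hypothesis(lead):
    """Return 1 if we predict the lead becomes a customer, else 0."""
    # Rule discovered in training: leads under 30 did not become customers.
    return 0 if lead["under_thirty"] else 1

# The three training observations, reduced to the one feature the rule uses.
training_data = [
    {"under_thirty": True, "customer": 0},
    {"under_thirty": True, "customer": 0},
    {"under_thirty": False, "customer": 1},
]

predictions = [hypothesis(lead) for lead in training_data]
print(predictions)  # [0, 0, 1]
```

The predictions match the `Customer` column for all three training observations.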

It's fine to try out our hypothesis function on our training data, to make sure that the predictions match what we observed.  But the real power of a hypothesis function is in making predictions on new data, where we don't know the outcome but can now predict it.

For example, let's say two new leads just came in:

| Attended College | Under Thirty | Borough   | Income |
| ---------------- | ------------ | --------- | ------ |
| No               | No           | Brooklyn  | > 55   |
| Yes              | Yes          | Brooklyn  | > 55   |

We would feed these leads through our hypothesis function, and predict that the first lead will become a customer, as she is over 30, and that the second lead will not, as he is under 30.
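Here is a minimal sketch of that step in Python, assuming the under-30 rule is written as a hypothetical function named `hypothesis` and each new lead is a dictionary with illustrative key names.  Note that the new leads have no `customer` entry, since their outcome is not yet known.

```python
def hypothesis(lead):
    """Return 1 if we predict the lead becomes a customer, else 0."""
    return 0 if lead["under_thirty"] else 1

# The two new leads from the table above; there is no known outcome yet.
new_leads = [
    {"attended_college": False, "under_thirty": False,
     "borough": "Brooklyn", "income": "> 55"},
    {"attended_college": True, "under_thirty": True,
     "borough": "Brooklyn", "income": "> 55"},
]

new_predictions = [hypothesis(lead) for lead in new_leads]
print(new_predictions)  # [1, 0]
```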

<img src="https://storage.cloud.google.com/curriculum-assets/intro-to-ml/customer-dtree.png" width="40%">

So can we really begin to make predictions based on just three past observations?  Well, no.  

But as we move to larger datasets, and as you learn more machine learning algorithms, the general process will stay the same:

> Start with training data, and use past observations to find a pattern that will allow us to predict future outcomes.

### One more thing

Let's take another look at our training data.

| Attended College | Under Thirty | Borough   | Income | Customer |
| ---------------- | ------------ | --------- | ------ | :------: |
| ?                | Yes          | Manhattan | < 55   |    0     |
| Yes              | Yes          | Brooklyn  | < 55   |    0     |
| ?                | No           | Brooklyn  | < 55   |    1     |


Notice that we seem to be distinguishing our inputs (the first four columns) from the output of whether someone became a customer.  This is typical in machine learning.  We call the input columns our **features** and the output we are trying to predict our **target**. 
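To make the distinction concrete, here is a short sketch that separates the features from the target for the three observations in our training data.  The key names are hypothetical, and the variable names `X` and `y` are a common convention for features and target rather than anything required.

```python
# The three training observations; keys are hypothetical, None means unknown.
training_data = [
    {"attended_college": None, "under_thirty": True,
     "borough": "Manhattan", "income": "< 55", "customer": 0},
    {"attended_college": True, "under_thirty": True,
     "borough": "Brooklyn", "income": "< 55", "customer": 0},
    {"attended_college": None, "under_thirty": False,
     "borough": "Brooklyn", "income": "< 55", "customer": 1},
]

target_name = "customer"

# Features: every column except the target.
X = [{key: value for key, value in obs.items() if key != target_name}
     for obs in training_data]

# Target: the outcome we are trying to predict.
y = [obs[target_name] for obs in training_data]

print(list(X[0].keys()))
print(y)  # [0, 0, 1]
```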

### Summary

In this lesson, we were introduced to the general process in machine learning.  

We start with training data, and use past observations to find a pattern that will allow us to predict the outcomes of future data.  Let's review:

1. Start with training data  
    * Our training data consists of a set of observations.
    * In each **observation** of our training data, we have both the past inputs, as well as the past outcome that occurred.  We call the inputs the **features** and the output the **target**.  Above, the `Customer` column is the target.
    
2. Training
    * Here, we look at past observations to discover a hypothesis function that we can use to predict the targets.
    
3. Our hypothesis function
    * Takes in the features of observations and comes close to predicting their target.

Above, we saw how we could train a decision tree by looking at the training data of customer leads, and discovering that the `Under Thirty` feature was most associated with whether a lead became a customer. We then used that discovery to form a hypothesis function that predicts that each new lead who is under 30 will not become a customer, and that each new lead who is over 30 will.
