# The Hypothesis Function

### Introduction

In the last few lessons, we were introduced to the machine learning process through decision trees.  We saw that the process is to gather our training data, train a model to find a hypothesis function, and then use that hypothesis function to make predictions.

Over the next lessons, we'll go deeper into learning about the hypothesis function and training procedure of decision trees.

> Why the focus on decision trees?  Well, decision trees are the building blocks of some of the most popular and effective machine learning algorithms used today, like XGBoost and Random Forests.  Don't worry if those names are new to you.

## The Decision Tree's Hypothesis Function

Let's start by viewing an example of a hypothesis function for a decision tree.

<img src="decision-tree-real-estate.png" width="70%">

Notice that the hypothesis function of this decision tree is a little more complex than what we saw in our introduction to machine learning.  But the goal is the same.

> It takes in the features of a new lead and then predicts whether or not that lead will become a customer.

Let's make sure we understand how the hypothesis function above can make predictions.  

At each diamond, we ask a question.  For example: did the lead attend college?  Then, based on the lead's answer to that question, we move down a branch of the tree.

Let's try it out on a new lead to see how we can use it to make a prediction.  

| Attended College | Under Thirty | Borough   | Income |
| ---------------- | ------------ | --------- | ------ |
| ?                | No           | Manhattan | < 55   |

<img src="decision-tree-real-estate.png" width="50%">

Reading our decision tree from left to right, it first asks us to look at the lead's value for `college`.  Because `college` has a value of `?` for our lead above, we follow the branch to the `under thirty` diamond.  This tells us to ask another question: is the lead under thirty?  Because our lead is not under thirty, we take the `No` branch and predict that the lead will become a `customer`.

So the hypothesis function above predicts whether each prospective lead will become a customer or not.  At each diamond we ask a question, and then the observation moves down one branch of the tree or the other based on the relevant value, until a prediction is made.

### Translating the Data to Code

Of course, we'll eventually want to automate these predictions with code.  So how do we do this? 

The first step is to represent each observation as a dictionary.

| Attended College | Under Thirty | Borough   | Income |
| ---------------- | ------------ | --------- | ------ |
| Yes              | No           | Manhattan | < 55   |
| ?                | Yes          | Brooklyn  | < 55   |

In [None]:
customer_1 = {'college': True, 'under_thirty': False, 'borough': 'Manhattan', 'income_under_55': True}
customer_2 = {'college': '?', 'under_thirty': True, 'borough': 'Brooklyn', 'income_under_55': True}
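
Once an observation is a dictionary, each feature can be looked up by its key.  The short sketch below reuses the two observations from the cell above.  The `leads` list and the `boroughs` name are our own additions for illustration, not part of the lesson's code.

```python
# The two observations from the cell above, one dictionary per lead.
customer_1 = {'college': True, 'under_thirty': False, 'borough': 'Manhattan', 'income_under_55': True}
customer_2 = {'college': '?', 'under_thirty': True, 'borough': 'Brooklyn', 'income_under_55': True}

# Look up a single feature by its key.
print(customer_1['borough'])   # Manhattan
print(customer_2['college'])   # ?

# Hypothetical helper list: group the observations so we can loop over them.
leads = [customer_1, customer_2]
boroughs = [lead['borough'] for lead in leads]
print(boroughs)                # ['Manhattan', 'Brooklyn']
```

Looking up a value by key like this is what the hypothesis function does each time it asks a question about a lead.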

### The hypothesis function in code

We can represent the hypothesis function of a decision tree as a series of `if else` statements.

For example, let's think about how we can represent a smaller decision tree than the one we have above.

> A smaller decision tree

<img src="./customer-dtree.png" width="40%">

Here is the decision tree in Python.

In [5]:
def decision_tree_1(customer):
    if customer['under_thirty'] == True:
        return 0
    else:
        return 1

> Press `shift + enter` on the cell above and on the cells below to run them.

In [6]:
customer_1 = {'college': True, 'under_thirty': False, 'borough': 'Manhattan', 'income_under_55': True}
customer_2 = {'college': '?', 'under_thirty': True, 'borough': 'Brooklyn', 'income_under_55': True}

In [9]:
decision_tree_1(customer_1)
# 1 
decision_tree_1(customer_2)
# 0

0
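
Because the hypothesis function is just a Python function, we can also apply it to several observations at once.  The sketch below repeats `decision_tree_1` and the two customers so that it runs on its own.  The `customers` list and the `predictions` name are assumptions added for illustration.

```python
def decision_tree_1(customer):
    # Same single-question tree as above: under thirty -> 0, otherwise -> 1.
    if customer['under_thirty'] == True:
        return 0
    else:
        return 1

customer_1 = {'college': True, 'under_thirty': False, 'borough': 'Manhattan', 'income_under_55': True}
customer_2 = {'college': '?', 'under_thirty': True, 'borough': 'Brooklyn', 'income_under_55': True}

# Hypothetical list of observations to predict on.
customers = [customer_1, customer_2]

# Call the hypothesis function once per observation.
predictions = [decision_tree_1(customer) for customer in customers]
print(predictions)  # [1, 0]
```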

Now it's your turn.  Represent the hypothesis function below in Python.

<img src="tree-small-2.png" width="40%">

> Replace the word `pass` with your `if else` statement.

In [1]:
def decision_tree_2(customer):
    pass

Press `shift + return` on the cell above.  Then check your work below.

In [2]:
customer_1 = {'college': True, 'under_thirty': False, 'borough': 'Manhattan', 'income_under_55': True}
customer_2 = {'college': False, 'under_thirty': True, 'borough': 'Brooklyn', 'income_under_55': True}

decision_tree_2(customer_1)
# 1

1

In [3]:
decision_tree_2(customer_2)
# 0

0

Ok, now it's time to try coding the larger hypothesis function from above.  To do so we'll need some nested `if else` statements.

<img src="decision-tree-real-estate.png" width="70%">

If you'd like to translate the entire hypothesis function above into code, give it a shot here.

In [14]:
def decision_tree(customer):
    pass

> If you find the above a little difficult, we filled in some of the code for you below.

> Replace the word **pass** with the correct return values.  You can check your work below by pressing `shift + return` over the next few cells.  The answer is at the end of this lesson.

In [22]:
def decision_tree(customer):
    if customer['college'] == True:
        return 1
    elif customer['college'] == '?':
        if customer['under_thirty'] == True:
            pass
        else:
            pass
    else:
        return 0

In [16]:
customer_1 = {'college': True, 'under_thirty': False, 'borough': 'Manhattan', 'income_under_55': True}
customer_2 = {'college': '?', 'under_thirty': True, 'borough': 'Brooklyn', 'income_under_55': True}
customer_3 = {'college': '?', 'under_thirty': False, 'borough': 'Brooklyn', 'income_under_55': True}

In [17]:
decision_tree(customer_1)
# 1

1

In [20]:
decision_tree(customer_2)
# 0

0

In [21]:
decision_tree(customer_3)
# 1

1

### Summary 

In this lesson, we learned about the hypothesis function for decision trees.  We saw that a decision tree's hypothesis function puts each observation through a series of tests, and from the results predicts whether the observation falls into one category or another.  We can represent a decision tree in code with a series of `if else` statements.


### Answers

In [72]:
def decision_tree_2(customer):
    if customer['college'] == True:
        return 1
    else:
        return 0

In [23]:
def decision_tree(customer):
    if customer['college'] == True:
        return 1
    elif customer['college'] == '?':
        if customer['under_thirty'] == True:
            return 0
        else:
            return 1
    else:
        return 0

In [24]:
decision_tree(customer_2)

0
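
To connect the code back to the diagram, it can help to see which questions the tree asks on the way to a prediction.  The following is a minimal sketch and not part of the lesson's solution: `decision_tree_with_path` is a hypothetical variant of the answer above that returns the prediction together with the list of features it checked.  It only covers the `college` and `under_thirty` questions that the answer uses.

```python
def decision_tree_with_path(customer):
    # Hypothetical helper: same logic as decision_tree above,
    # but it also records each feature it checks along the way.
    path = ['college']
    if customer['college'] == True:
        return 1, path
    elif customer['college'] == '?':
        path.append('under_thirty')
        if customer['under_thirty'] == True:
            return 0, path
        else:
            return 1, path
    else:
        return 0, path

customer_3 = {'college': '?', 'under_thirty': False, 'borough': 'Brooklyn', 'income_under_55': True}

prediction, path = decision_tree_with_path(customer_3)
print(prediction)  # 1
print(path)        # ['college', 'under_thirty']
```

Here `customer_3` follows the same route as the lead in the walk-through earlier in the lesson: the `?` for college sends it to the under-thirty question, and the `No` answer leads to a prediction of 1.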