<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Let's-Play-a-Game" data-toc-modified-id="Let's-Play-a-Game-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Let's Play a Game</a></span></li><li><span><a href="#Decision-Trees-Intro" data-toc-modified-id="Decision-Trees-Intro-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Decision Trees Intro</a></span><ul class="toc-item"><li><span><a href="#Example-of-Decision-Tree" data-toc-modified-id="Example-of-Decision-Tree-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Example of Decision Tree</a></span></li><li><span><a href="#Brief-Summary-of-Steps" data-toc-modified-id="Brief-Summary-of-Steps-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Brief Summary of Steps</a></span><ul class="toc-item"><li><span><a href="#Note:-Greedy-Search" data-toc-modified-id="Note:-Greedy-Search-2.2.1"><span class="toc-item-num">2.2.1&nbsp;&nbsp;</span>Note: Greedy Search</a></span></li></ul></li></ul></li><li><span><a href="#🧠-Knowledge-Check:-Making-the-Decisions" data-toc-modified-id="🧠-Knowledge-Check:-Making-the-Decisions-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>🧠 Knowledge Check: Making the Decisions</a></span><ul class="toc-item"><li><span><a href="#Q-01" data-toc-modified-id="Q-01-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Q-01</a></span><ul class="toc-item"><li><span><a href="#Solution" data-toc-modified-id="Solution-3.1.1"><span class="toc-item-num">3.1.1&nbsp;&nbsp;</span>Solution</a></span></li></ul></li><li><span><a href="#Q-02" data-toc-modified-id="Q-02-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Q-02</a></span><ul class="toc-item"><li><span><a href="#Solution" data-toc-modified-id="Solution-3.2.1"><span class="toc-item-num">3.2.1&nbsp;&nbsp;</span>Solution</a></span></li></ul></li><li><span><a href="#Q-03" data-toc-modified-id="Q-03-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Q-03</a></span><ul class="toc-item"><li><span><a href="#Solution" 
data-toc-modified-id="Solution-3.3.1"><span class="toc-item-num">3.3.1&nbsp;&nbsp;</span>Solution</a></span></li></ul></li></ul></li><li><span><a href="#Visual-Example-of-a-Decision-Tree" data-toc-modified-id="Visual-Example-of-a-Decision-Tree-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Visual Example of a Decision Tree</a></span></li></ul></div>

# Let's Play a Game

Pick something for Akinator to guess, then let's see if he can guess it!

https://en.akinator.com/game


# Decision Trees Intro

- Supervised machine learning algorithm (we focus on classification here; trees can also be used for regression)
- Graph/pathway → we make _decisions_ along the way

We keep going through the data, splitting it into **partitions** based on the features.

Each decision is basically a fork in the road:

![Road splits at billboard "What Now?"](images/fork_in_road.jpg)

## Example of Decision Tree


Work Status |  Age  | Favorite Website
------------|-------|-------------------------
 Student    | Young | A
 Working    | Young | B
 Working    | Old   | C
 Working    | Young | B
 Student    | Young | A
 Student    | Young | A



- If someone is a young worker, what website do we recommend?
- If someone is an old worker, what website then?
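
One way to read the table is as a small tree of questions. The sketch below hard-codes rules that happen to fit these six rows. The function name `recommend_website` and the choice to ask about work status first are illustrative assumptions; nothing here is learned by an algorithm.

```python
def recommend_website(work_status, age):
    """Hand-written decision tree that matches the six rows in the table above."""
    # First decision: every student in the table prefers website A.
    if work_status == "Student":
        return "A"
    # Second decision (workers only): age separates website B from website C.
    if age == "Young":
        return "B"
    return "C"


print(recommend_website("Working", "Young"))
print(recommend_website("Working", "Old"))
```

Each `if` plays the role of one decision, and each `return` is a final partition (a leaf).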

## Brief Summary of Steps

1. Start with features and a target (the class each data point belongs to)
2. Make a *decision* (a split) based on some *metric* using the features
    - data are split into partitions
3. Continue on each partition, making more splits using the features of the data points in that partition
4. Keep doing that until a **stopping condition** is hit
    - Minimum number of data points in a final partition
    - Maximum number of layers deep
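
The steps above can be sketched in plain Python. This is a minimal illustration, not how any particular library implements it: the metric is assumed to be Gini impurity (the notes only say "some metric"), and the names `gini`, `best_split`, `build_tree`, `predict`, `max_depth` and `min_samples` are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when all labels agree, larger when classes are mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Step 2: try every (feature, threshold) pair, keep the lowest weighted impurity."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put points on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Steps 3 and 4: recurse on each partition until a stopping condition is hit."""
    majority = Counter(y).most_common(1)[0][0]
    # Stopping conditions: pure partition, too few points, or too many layers.
    if len(set(y)) == 1 or len(y) < min_samples or depth >= max_depth:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, feature, threshold = split
    left = [(row, lab) for row, lab in zip(X, y) if row[feature] <= threshold]
    right = [(row, lab) for row, lab in zip(X, y) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [l for _, l in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([r for r, _ in right], [l for _, l in right],
                            depth + 1, max_depth, min_samples),
    }


def predict(tree, row):
    """Follow the decisions from the root down to a leaf."""
    while isinstance(tree, dict):
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree


# Toy data: one feature, and the class changes somewhere between 3 and 7.
X = [[1], [2], [3], [7], [8], [9]]
y = [0, 0, 0, 1, 1, 1]
tree = build_tree(X, y)
print(tree)
print(predict(tree, [2.5]), predict(tree, [8.5]))
```

Leaves are stored as plain class labels and inner nodes as dictionaries, which keeps the sketch short; a real implementation would likely use a dedicated node type.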

### Note: Greedy Search

At each decision we make the split that looks best at that moment (**greedy**), which doesn't necessarily lead to the best overall solution
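
A small XOR-style dataset shows why looking only one split ahead can be short-sighted. The sketch below assumes Gini impurity as the split metric, and the helper names `gini` and `weighted_gini` are hypothetical. No single first split improves the impurity, yet two levels of splits separate the classes completely.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(points, labels, feature, threshold):
    """Impurity after splitting on points[feature] <= threshold, weighted by partition size."""
    left = [lab for p, lab in zip(points, labels) if p[feature] <= threshold]
    right = [lab for p, lab in zip(points, labels) if p[feature] > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)


# XOR-style data: the class is 1 only when exactly one feature is 1.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

# Impurity before any split, and after each possible first split.
parent_impurity = gini(labels)
first_split = {feature: weighted_gini(points, labels, feature, 0.5) for feature in (0, 1)}
print(parent_impurity, first_split)

# Split on feature 0 anyway, then split each half on feature 1.
left_half = [(p, lab) for p, lab in zip(points, labels) if p[0] <= 0.5]
right_half = [(p, lab) for p, lab in zip(points, labels) if p[0] > 0.5]
second_split = [
    weighted_gini([p for p, _ in half], [lab for _, lab in half], 1, 0.5)
    for half in (left_half, right_half)
]
print(second_split)
```

A greedy learner that stops when the best available split brings no improvement would give up at the root here, even though a two-level tree fits the data perfectly.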

# 🧠 Knowledge Check: Making the Decisions

Let's plot some data with two classes and two features. We want to draw as few straight lines (vertical or horizontal) as possible to separate the two classes.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

# Seed NumPy's global random generator so a fresh top-to-bottom run draws the same points
np.random.seed(27)

In [None]:
def helper_create_plot(n=300):
    '''
    Create a plot to practice how a decision tree makes its cuts/decisions.
    '''
    X = []
    y = []

    for i in range(n):
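        # Draw a random point with both coordinates in [0, 10)
        nx = np.random.random()*10
        ny = np.random.random()*10
        X.append((nx,ny))

        # Label the point: the right half (nx > 5) is class 0 above ny = 1,
        # the left half is class 0 above ny = 7; everything else is class 1
        if nx > 5:
            if ny > 1:
                y.append(0)
            else:
                y.append(1)
        else:
            if ny > 7:
                y.append(0)
            else:
                y.append(1)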

    X = np.array(X)

    f, ax = plt.subplots(1)

    ax.scatter(X[:,0], X[:,1], c=y, s=20, cmap='Set1')
    plt.xticks(range(11))
    plt.xlabel('X1')
    plt.yticks(range(11))
    plt.ylabel('X2')
    
    return f, ax

In [None]:
def create_line(ax, direction, threshold, x_range=(0,10), y_range=(0,10), color='blue'):
    '''
    Draw a vertical or horizontal cut at threshold on ax.

    x_range limits how far a horizontal cut extends;
    y_range does the same for a vertical cut.
    '''
    if direction == 'vertical':
        ax.vlines(threshold, y_range[0], y_range[1], colors=color)
    elif direction == 'horizontal':
        ax.hlines(threshold, x_range[0], x_range[1], colors=color)
    else:
        print('Direction does not exist')
    

In [None]:
f,ax = helper_create_plot()

## Q-01

Looking at the example above, would a **vertical** or a **horizontal** cut better split the classes?

Also, what threshold should we use?

In [None]:
# 'horizontal' or 'vertical'
q1_direction =
# Between 0 and 10
q1_threshold =

In [None]:
# Test your answer by running this cell
f,ax = helper_create_plot()
create_line(ax,q1_direction,q1_threshold)

### Solution

A **vertical** line at a threshold of about **5** does the best job of splitting the classes

In [None]:
q1_direction = 'vertical'
q1_threshold = 5

f,ax = helper_create_plot();
create_line(ax,q1_direction, q1_threshold);
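
The eyeballed answer can be checked numerically. The sketch below is an illustration under a few assumptions: it scores cuts with Gini impurity (the notes do not name a metric), it only tries integer thresholds from 1 to 9, and it replaces the random points with a regular grid labeled by the same rule as `helper_create_plot`, so the result does not depend on a seed. The names `label`, `gini`, `coords` and `results` are hypothetical.

```python
from collections import Counter


def label(x1, x2):
    """Same labeling rule as helper_create_plot."""
    if x1 > 5:
        return 0 if x2 > 1 else 1
    return 0 if x2 > 7 else 1


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# Regular 20 x 20 grid with coordinates 0.25, 0.75, ..., 9.75 on each axis.
coords = [0.25 + 0.5 * i for i in range(20)]
points = [(x1, x2) for x1 in coords for x2 in coords]
labels = [label(x1, x2) for x1, x2 in points]

# A vertical cut thresholds X1 (index 0); a horizontal cut thresholds X2 (index 1).
results = []
for feature, direction in ((0, "vertical"), (1, "horizontal")):
    for threshold in range(1, 10):
        left = [lab for p, lab in zip(points, labels) if p[feature] <= threshold]
        right = [lab for p, lab in zip(points, labels) if p[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        results.append((score, direction, threshold))

# The lowest weighted impurity marks the best single cut.
best_score, best_direction, best_threshold = min(results)
print(best_direction, best_threshold, round(best_score, 3))
```

The lowest score belongs to the vertical cut at 5, in line with the solution above.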

## Q-02

Splitting further, what line and threshold should we use next for the left partition (X1 below the first threshold)?

In [None]:
# 'horizontal' or 'vertical'
q2_direction = 
# Between 0 and 10
q2_threshold =

In [None]:
# Test your answer by running this cell
f,ax = helper_create_plot()
create_line(ax,q1_direction, q1_threshold)
create_line(ax,q2_direction, q2_threshold, x_range=(0, q1_threshold))

### Solution

A **horizontal** line at a threshold of about **7** does the best job of splitting the left partition

In [None]:
q2_direction = 'horizontal'
q2_threshold = 7

f,ax = helper_create_plot()
create_line(ax,q1_direction,q1_threshold)
create_line(ax,q2_direction, q2_threshold, x_range=(0, q1_threshold))

## Q-03

Splitting further, what line and threshold should we use next for the right partition (X1 above the first threshold)?

In [None]:
# 'horizontal' or 'vertical'
q3_direction = 
# Between 0 and 10
q3_threshold = 

In [None]:
# Test your answer by running this cell
f,ax = helper_create_plot()
create_line(ax, q1_direction, q1_threshold)
create_line(ax, q2_direction, q2_threshold, x_range=(0, q1_threshold))
create_line(ax, q3_direction, q3_threshold, x_range=(q1_threshold, 10))

### Solution

A **horizontal** line at a threshold of about **1** does the best job of splitting the right partition

In [None]:
q3_direction = 'horizontal'
q3_threshold = 1

f,ax = helper_create_plot()
create_line(ax, q1_direction, q1_threshold)
create_line(ax, q2_direction, q2_threshold, x_range=(0, q1_threshold))
create_line(ax, q3_direction, q3_threshold, x_range=(q1_threshold, 10))
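
Taken together, the three cuts describe a tree that is two decisions deep. Below is a minimal sketch of that tree as code, assuming the thresholds from the solutions (5, 7 and 1) and the class labels 0 and 1 used in `helper_create_plot`; `predict_class` is a hypothetical name.

```python
def predict_class(x1, x2):
    """Classify a point using the three cuts from Q-01, Q-02 and Q-03."""
    if x1 > 5:
        # Right partition (Q-01), then the horizontal cut at 1 (Q-03)
        return 0 if x2 > 1 else 1
    # Left partition (Q-01), then the horizontal cut at 7 (Q-02)
    return 0 if x2 > 7 else 1


# One sample point from each of the four final partitions
for point in [(8, 5), (8, 0.5), (2, 9), (2, 3)]:
    print(point, predict_class(*point))
```

Every point passes through exactly two decisions before reaching a final partition.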

# Visual Example of a Decision Tree

> From http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

![Visual of decision tree](images/ex-decision-tree.png)