# Decision Tree Classifiers

## Theory -- Explained by Examples.
### Workings of a Binary Classifier

Predict whether people will go golfing, based on records of when people went golfing under certain weather conditions.

![Golfing](images/golfing.png)

## Objective
Separate the classes as cleanly as possible in a tree structure, i.e. minimize _Entropy_, or equivalently maximize _Information Gain_, at each split.

## A look at the data
- Feature values (for 'outlook') = [sunny, overcast, rainy]
- Target = [play]
- Target values = [yes, no]  (hence 'binary' classification)

- play golf?
  - Outlook
    - sunny   : yes=3, no=2
    - overcast: yes=4, no=0
    - rainy   : yes=2, no=3
  - Temperature
    - hot : yes=2, no=2
    - mild: yes=...
    

## Shannon Entropy, and entropy in data.
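The Shannon entropy measures how mixed the class labels in a data set are: the lower the entropy, the more cleanly the classes are separated.

$${\bf H}( {\bf X}) = - \sum_{i=1}^n P(x_i) \log_2 P(x_i),$$

where $P(x_i)$ is the fraction of samples that belong to class $x_i$.

- If the labels are _all_ true, or _all_ false, then $H = 0$ (minimal entropy, no "mess" in the data set).
- If the labels are perfectly mixed (equally divided between the two classes), then $H = 1$ (maximal entropy for a binary target).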

We have 9 times 'yes' and 5 times 'no' for the label 'play', which leads to

$${\bf H}(\textrm{play}) = - 0.64 \log_2(0.64) - 0.36 \log_2(0.36) = 0.94,$$

since $9+5=14$, $9/14 \approx 0.64$, and $5/14 \approx 0.36$.
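
As a quick check of the arithmetic above, the entropy can be computed directly from the class counts. This is a minimal sketch in plain Python; the helper name `entropy` is our own choice for illustration and is not taken from any library.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    # Empty classes are skipped: the term 0 * log2(0) is taken to be 0.
    # p * log2(1 / p) is the same as -p * log2(p).
    return sum((c / total) * log2(total / c) for c in counts if c > 0)

h_play = entropy([9, 5])   # 9 x 'yes', 5 x 'no'
h_pure = entropy([4, 0])   # all samples in one class
h_mixed = entropy([7, 7])  # classes equally divided

print(round(h_play, 2), h_pure, h_mixed)
```

The pure and the perfectly mixed case reproduce the two limiting values $H = 0$ and $H = 1$ mentioned in the bullets.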


## Tree Node Construction
- Iterate through _all_ features (outlook, temperature, etc.)
- For _each_ feature, iterate through _all_ of its values (or ranges of values) as candidate splits

Each candidate split is then scored by its Information Gain.

## Information Gain
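The Information Gain of a feature ${\bf X}$ with respect to the target ${\bf T}$ is defined as

$$ \mathcal{G} \equiv {\bf H}({\bf T}) - {\bf E}({\bf T},{\bf X}),$$

where ${\bf H}({\bf T})$ is the entropy of the target before the split, and ${\bf E}({\bf T},{\bf X})$ is the weighted average entropy of the target after splitting on the values ${\bf x}$ of the feature:

$${\bf E}({\bf T},{\bf X}) = \sum_x P({\bf x}) {\bf H}({\bf x}).$$

Here $P({\bf x})$ is the fraction of samples with feature value ${\bf x}$, and ${\bf H}({\bf x})$ is the entropy of the target within that subset.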

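For 'outlook', the weighted entropy after the split follows from our table data (5, 4 and 5 of the 14 samples fall in the three outlook groups):

$$ {\bf E}(\textrm{play}, \textrm{outlook}) = \frac{5}{14} \cdot 0.971 + \frac{4}{14} \cdot 0.0 + \frac{5}{14} \cdot 0.971 = 0.694, $$

so the Information Gain for 'outlook' becomes:

$$ \mathcal{G} = 0.940 - 0.694 \approx 0.247 .$$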
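
The same number can be reproduced from the (yes, no) counts listed for 'outlook' in the data overview. The sketch below is illustrative only: the `entropy` helper and the variable names are our own, and the third outlook group is assumed to be 'rainy'.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return sum((c / total) * log2(total / c) for c in counts if c > 0)

# (yes, no) counts per 'outlook' value, taken from the data overview
outlook = {"sunny": (3, 2), "overcast": (4, 0), "rainy": (2, 3)}

n_samples = sum(sum(c) for c in outlook.values())            # 14
totals = [sum(col) for col in zip(*outlook.values())]        # [9, 5]

# Weighted average entropy after splitting on 'outlook'
weighted = sum(sum(c) / n_samples * entropy(c) for c in outlook.values())

# Information gain = entropy before the split minus entropy after it
gain = entropy(totals) - weighted

print(round(weighted, 3), round(gain, 3))
```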

For all the features, we get 
- **0.247 for 'outlook'**
- **0.029 for 'temperature'**
- **0.152 for 'humidity'**
- **0.048 for 'wind'**

We conclude that the tree should split on the 'outlook' feature first, since this split yields the largest information gain.
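
The selection procedure from the Tree Node Construction section (score every feature, keep the best one) can be sketched as follows. Note the assumptions: the function names are our own, and the six rows are a made-up toy table for illustration only; they are not the 14-row golfing table from this notebook.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum((c / total) * log2(total / c) for c in Counter(labels).values())

def information_gain(rows, feature, target):
    """Entropy of the target minus its weighted entropy after the split."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - weighted

def best_feature(rows, features, target):
    """Return the feature with the largest gain, plus all gains."""
    gains = {f: information_gain(rows, f, target) for f in features}
    return max(gains, key=gains.get), gains

# Hypothetical toy data (NOT the golfing table)
rows = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "yes", "play": "no"},
]

best, gains = best_feature(rows, ["outlook", "windy"], "play")
print(best, gains)
```

A full tree builder would repeat this selection recursively on each resulting subset until the subsets are pure or a depth limit is reached.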

### Training "result"
We get a tree something like this:
![TreeResult](images/tree_result.png)

# Classifying the Iris Data Set
### Objective
- get the data
- select the features
- train a model (make the tree)
- plot the tree
- understand the idea

## Introducing the iris dataset

![Iris](images/03_iris.png)

- 50 samples of 3 different species of iris (150 samples total)
- Measurements: sepal length, sepal width, petal length, petal width

In [None]:
from sklearn.datasets import load_iris

In [None]:
from sklearn.tree import DecisionTreeClassifier

In [None]:
iris = load_iris()

In [None]:
iris

In [None]:
iris.target_names

In [None]:
iris.feature_names

In [None]:
# Select the number of features (between 2 and 4, really).
n_features = 4

In [None]:
X = iris.data[:,-n_features:]

# Get the target values (classes)
y = iris.target

In [None]:
X, y

In [None]:
tree_clf = DecisionTreeClassifier(max_depth=3)

In [None]:
tree_clf.fit(X,y)

Get the _'Graphviz'_ package with your favourite package manager: 

`$ conda install graphviz`

After exporting the tree to `iris_tree.dot` (see the next cells), convert the graph to PNG on your command line:

`$ dot -Tpng iris_tree.dot -o iris_tree.png`

In [None]:
from sklearn.tree import export_graphviz

In [None]:
export_graphviz(
    tree_clf, 
    out_file="iris_tree.dot",
    feature_names=iris.feature_names[-n_features:],
    class_names=iris.target_names,
    rounded=True,
    filled=True
)


Alternatively, we can render the tree directly in the notebook, after installing the Python interface to Graphviz:

`$ conda install python-graphviz`

In [None]:
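from sklearn import tree
import graphviz 
# Export the fitted tree as DOT source (returned as a string, since out_file=None)
dot_data = tree.export_graphviz(tree_clf, out_file=None) 
graph = graphviz.Source(dot_data) 
# Write the DOT source to the file "iris" and render it (PDF by default)
graph.render("iris") 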

In [None]:
# Same export, now with readable feature and class names and coloured nodes
dot_data = tree.export_graphviz(tree_clf, out_file=None, 
                      feature_names=iris.feature_names[-n_features:],  
                      class_names=iris.target_names,  
                      filled=True, rounded=True,  
                      special_characters=True)  
graph = graphviz.Source(dot_data)  
graph 
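
If Graphviz is not available, the learned rules can also be inspected as plain text. The sketch below uses `export_text` from `sklearn.tree` (present in recent scikit-learn releases). It refits its own tree so that it is self-contained, and the `random_state=0` argument is our addition for reproducibility, not part of the cells above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Same settings as above, plus a fixed random_state (our assumption)
tree_clf = DecisionTreeClassifier(max_depth=3, random_state=0)
tree_clf.fit(iris.data, iris.target)

# One line per node: the split condition, or the predicted class for a leaf
tree_rules = export_text(tree_clf, feature_names=list(iris.feature_names))
print(tree_rules)
```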

In [None]:
prediction = tree_clf.predict([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
        [5.5, 4.2, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.2],
        [5. , 3.2, 1.2, 0.2],
        [5.5, 3.5, 1.3, 0.2],
        [4.9, 3.6, 1.4, 0.1],
        [4.4, 3. , 1.3, 0.2],
        [5.1, 3.4, 1.5, 0.2],
        [5. , 3.5, 1.3, 0.3],
        [4.5, 2.3, 1.3, 0.3],
        [4.4, 3.2, 1.3, 0.2],
        [5. , 3.5, 1.6, 0.6],
        [5.1, 3.8, 1.9, 0.4],
        [4.8, 3. , 1.4, 0.3],
        [5.1, 3.8, 1.6, 0.2],
        [4.6, 3.2, 1.4, 0.2],
        [5.3, 3.7, 1.5, 0.2],
        [5. , 3.3, 1.4, 0.2],
        [7. , 3.2, 4.7, 1.4],
        [6.4, 3.2, 4.5, 1.5],
        [6.9, 3.1, 4.9, 1.5],
        [5.5, 2.3, 4. , 1.3],
        [6.5, 2.8, 4.6, 1.5],
        [5.7, 2.8, 4.5, 1.3],
        [6.3, 3.3, 4.7, 1.6],
        [4.9, 2.4, 3.3, 1. ],
        [6.6, 2.9, 4.6, 1.3],
        [5.2, 2.7, 3.9, 1.4],
        [5. , 2. , 3.5, 1. ],
        [5.9, 3. , 4.2, 1.5],
        [6. , 2.2, 4. , 1. ],
        [6.1, 2.9, 4.7, 1.4],
        [5.6, 2.9, 3.6, 1.3],
        [6.7, 3.1, 4.4, 1.4],
        [5.6, 3. , 4.5, 1.5],
        [5.8, 2.7, 4.1, 1. ],
        [6.2, 2.2, 4.5, 1.5],
        [5.6, 2.5, 3.9, 1.1],
        [5.9, 3.2, 4.8, 1.8],
        [6.1, 2.8, 4. , 1.3],
        [6.3, 2.5, 4.9, 1.5],
        [6.1, 2.8, 4.7, 1.2],
        [6.4, 2.9, 4.3, 1.3],
        [6.6, 3. , 4.4, 1.4],
        [6.8, 2.8, 4.8, 1.4],
        [6.7, 3. , 5. , 1.7],
        [6. , 2.9, 4.5, 1.5],
        [5.7, 2.6, 3.5, 1. ],
        [5.5, 2.4, 3.8, 1.1],
        [5.5, 2.4, 3.7, 1. ],
        [5.8, 2.7, 3.9, 1.2],
        [6. , 2.7, 5.1, 1.6],
        [5.4, 3. , 4.5, 1.5],
        [6. , 3.4, 4.5, 1.6],
        [6.7, 3.1, 4.7, 1.5],
        [6.3, 2.3, 4.4, 1.3],
        [5.6, 3. , 4.1, 1.3],
        [5.5, 2.5, 4. , 1.3],
        [5.5, 2.6, 4.4, 1.2],
        [6.1, 3. , 4.6, 1.4],
        [5.8, 2.6, 4. , 1.2],
        [5. , 2.3, 3.3, 1. ],
        [5.6, 2.7, 4.2, 1.3],
        [5.7, 3. , 4.2, 1.2],
        [5.7, 2.9, 4.2, 1.3],
        [6.2, 2.9, 4.3, 1.3],
        [5.1, 2.5, 3. , 1.1],
        [5.7, 2.8, 4.1, 1.3],
        [6.3, 3.3, 6. , 2.5],
        [5.8, 2.7, 5.1, 1.9],
        [7.1, 3. , 5.9, 2.1],
        [6.3, 2.9, 5.6, 1.8],
        [6.5, 3. , 5.8, 2.2],
        [7.6, 3. , 6.6, 2.1],
        [4.9, 2.5, 4.5, 1.7],
        [7.3, 2.9, 6.3, 1.8],
        [6.7, 2.5, 5.8, 1.8],
        [7.2, 3.6, 6.1, 2.5],
        [6.5, 3.2, 5.1, 2. ],
        [6.4, 2.7, 5.3, 1.9],
        [6.8, 3. , 5.5, 2.1],
        [5.7, 2.5, 5. , 2. ],
        [5.8, 2.8, 5.1, 2.4],
        [6.4, 3.2, 5.3, 2.3],
        [6.5, 3. , 5.5, 1.8],
        [7.7, 3.8, 6.7, 2.2],
        [7.7, 2.6, 6.9, 2.3],
        [6. , 2.2, 5. , 1.5],
        [6.9, 3.2, 5.7, 2.3],
        [5.6, 2.8, 4.9, 2. ],
        [7.7, 2.8, 6.7, 2. ],
        [6.3, 2.7, 4.9, 1.8],
        [6.7, 3.3, 5.7, 2.1],
        [7.2, 3.2, 6. , 1.8],
        [6.2, 2.8, 4.8, 1.8],
        [6.1, 3. , 4.9, 1.8],
        [6.4, 2.8, 5.6, 2.1],
        [7.2, 3. , 5.8, 1.6],
        [7.4, 2.8, 6.1, 1.9],
        [7.9, 3.8, 6.4, 2. ],
        [6.4, 2.8, 5.6, 2.2],
        [6.3, 2.8, 5.1, 1.5],
        [6.1, 2.6, 5.6, 1.4],
        [7.7, 3. , 6.1, 2.3],
        [6.3, 3.4, 5.6, 2.4],
        [6.4, 3.1, 5.5, 1.8],
        [6. , 3. , 4.8, 1.8],
        [6.9, 3.1, 5.4, 2.1],
        [6.7, 3.1, 5.6, 2.4],
        [6.9, 3.1, 5.1, 2.3],
        [5.8, 2.7, 5.1, 1.9],
        [6.8, 3.2, 5.9, 2.3],
        [6.7, 3.3, 5.7, 2.5],
        [6.7, 3. , 5.2, 2.3],
        [6.3, 2.5, 5. , 1.9],
        [6.5, 3. , 5.2, 2. ],
        [6.2, 3.4, 5.4, 2.3],
        [5.9, 3. , 5.1, 1.8]])

In [None]:
prediction

In [None]:
len(prediction)

In [None]:
prediction - iris.target

Strange: since we predicted on the training data itself, these differences should all have been zero... but they are not. Why?
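
One way to investigate is to count the misclassified training samples and compare against a tree without a depth limit. The sketch below is only a starting point, not a definitive answer: it refits both trees itself, the variable names are our own, and `random_state=0` is added for reproducibility. A plausible explanation to test is that `max_depth=3` caps the number of splits, so the tree cannot isolate every training sample.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Depth-limited tree (as in this notebook) versus an unrestricted tree
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
deep = DecisionTreeClassifier(random_state=0).fit(X, y)

# Row indices where the prediction differs from the true class
wrong_shallow = np.flatnonzero(shallow.predict(X) != y)
wrong_deep = np.flatnonzero(deep.predict(X) != y)

print("misclassified (max_depth=3):  ", len(wrong_shallow), "rows", wrong_shallow)
print("misclassified (no depth limit):", len(wrong_deep))
print("training accuracy (max_depth=3):", shallow.score(X, y))
```

Keep in mind that a tree which fits the training data perfectly is not necessarily better: it may simply be overfitting, which only a held-out test set can reveal.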