# Decision Tree (CART)
## <a href="#I">I Implementing Decision Tree with Scikit-Learn</a>
### <a href="#I.1">I.1 Preparing the Data</a>
### <a href="#I.2">I.2 Training the Algorithm</a>
### <a href="#I.3">I.3 Making Predictions</a>
### <a href="#I.4">I.4 Evaluating the Algorithm</a>
### <a href="#I.5">I.5 Advantages and Disadvantages of CART</a>


A decision tree is one of the most frequently and widely used __supervised machine learning algorithms__. It can perform both __classification and regression tasks__ (__CART__). <br>

The intuition behind the decision tree algorithm is simple, yet very powerful.<br>

A classic algorithm for constructing a decision tree is ID3. CART, the variant that scikit-learn implements in an optimized form, follows the same greedy idea but always makes binary splits. <br>
Briefly, the steps of the algorithm are:
- Select the best attribute, A, from the dataset.
- Assign A as the decision attribute for the NODE.
- Partition ("split") the dataset into 2 subsets using the attribute A.
- For each subset, create a new descendant of the NODE.
- If the examples are perfectly classified, then STOP; otherwise iterate over the new leaf nodes.
<br>

ID3 considers the "best attribute" to be the one with the most __information gain__, a measure that expresses how well an attribute splits the data into groups based on classification.
<br>
When you create a Decision Tree object with scikit-learn, you can choose between 2 criteria for measuring the quality of a split: Gini impurity (gini) and entropy-based information gain (entropy).
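
To make the two criteria concrete, here is a minimal sketch of how entropy, Gini impurity, and information gain can be computed for a set of class labels. The helper names (`entropy`, `gini`, `information_gain`) and the toy labels are illustrative assumptions; they are not part of scikit-learn's API.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, left, right, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

# Toy example (made-up labels): a split that separates the classes perfectly.
parent = ["a", "a", "b", "b"]
left, right = ["a", "a"], ["b", "b"]
print(entropy(parent))                        # 1.0
print(gini(parent))                           # 0.5
print(information_gain(parent, left, right))  # 1.0
```

A perfectly mixed two-class node has the maximum entropy of 1 bit, and a split that produces two pure children recovers all of it as gain.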

ID3 is a greedy algorithm that grows the tree top-down, at each node selecting the attribute that best classifies the local training examples. <br>
This process continues until the tree perfectly classifies the training examples or until all attributes have been used.
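
The greedy, top-down procedure described above can be sketched in a few lines of plain Python. This is an illustrative toy, not scikit-learn's implementation: it assumes numeric features, makes CART-style binary threshold splits, and uses entropy as the criterion. The names `best_split`, `build_tree`, and `predict_one`, as well as the data, are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(X, y):
    """Return (gain, feature_index, threshold) of the best binary split, or None."""
    best = None
    parent = entropy(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send examples to both sides
            children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            gain = parent - children
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best

def build_tree(X, y):
    """Grow the tree top-down until a node is pure or no split improves it."""
    if len(set(y)) == 1:
        return y[0]  # pure leaf
    split = best_split(X, y)
    if split is None or split[0] <= 0:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    _, j, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[j] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [lab for _, lab in left]),
        "right": build_tree([r for r, _ in right], [lab for _, lab in right]),
    }

def predict_one(tree, row):
    """Follow the splits from the root until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Toy data (made up for illustration): two numeric features, two classes.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0]]
y = ["a", "a", "b", "b"]
tree = build_tree(X, y)
print(tree)
print(predict_one(tree, [1.5, 4.5]))
```

Because each node only looks for the locally best split, the sketch never revisits an earlier decision, which is what "greedy" means here.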


<a id="I"></a>
## I Implementing Decision Tree with Scikit-Learn

We will be using the __iris dataset__ to build a decision tree __classifier__. The data set describes 3 classes of the iris plant with the following attributes:

- sepal length
- sepal width
- petal length
- petal width
- class: Iris Setosa, Iris Versicolour, Iris Virginica

The task is to predict the class of the iris plant based on the attributes. 


In [13]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Load the iris data
data = load_iris()
print('Classes to predict: ', data.target_names)


Classes to predict:  ['setosa' 'versicolor' 'virginica']


<a id="I.1"></a>
### I.1 Preparing the Data:

Preparing the data involves:

1. Dividing the data into attributes and labels 
2. Dividing the data into training and testing sets.

In [14]:
# Extract the data attributes (feature matrix)
X = data.data
# Extract the target/class labels
y = data.target
print('Number of examples in the data:', X.shape[0])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0, test_size = 0.25)


Number of examples in the data: 150
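
As a quick sanity check on the split, the sketch below inspects the size of each subset and the class balance of the training set. Passing `stratify=y` is an optional variation that the cell above does not use; it asks `train_test_split` to keep the class proportions roughly equal in both subsets. The `Xs_*`/`ys_*` variable names are only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X, y = data.data, data.target

# Same split as above: 25% of the 150 examples are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, test_size=0.25
)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)

# Optional variation (an assumption, not part of the original cell):
# a stratified split keeps the three classes balanced in each subset.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, random_state=0, test_size=0.25, stratify=y
)
print(np.bincount(y_train))   # class counts without stratification
print(np.bincount(ys_train))  # class counts with stratification
```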


<a id="I.2"></a>
### I.2 Training the Algorithm:

We have divided the data into training and testing sets. Now it is time to build our Decision Tree using the training data.<br>
Since this is a classification problem, we will use the __DecisionTreeClassifier__ class from sklearn.tree.<br> Next, we will set the '__criterion__' to '__entropy__', so that splits are chosen by information gain.

In [15]:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(criterion = 'entropy')
#Training the decision tree classifier. 
clf.fit(X_train, y_train)

DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best')

<a id="I.3"></a>
### I.3 Making Predictions:


In [16]:
# Predict labels on the test set.
y_pred = clf.predict(X_test)

<a id="I.4"></a>
### I.4 Evaluating the Algorithm:

The __confusion matrix__, __precision__, __recall__, and the __F1 score__ are among the most commonly used metrics for classification tasks. <br>
In this case, we will simply use __accuracy_score()__ to calculate the accuracy of the predicted labels.

In [17]:
from sklearn.metrics import accuracy_score
print('Accuracy Score on train data: ', accuracy_score(y_true=y_train, y_pred=clf.predict(X_train)))
print('Accuracy Score on test data: ', accuracy_score(y_true=y_test, y_pred=y_pred))

Accuracy Score on train data:  1.0
Accuracy Score on test data:  0.9736842105263158
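
Since the confusion matrix, precision, recall, and F1 are mentioned above, here is a minimal sketch of how they can be obtained with `sklearn.metrics`. It retrains a tree on the same split so that the block is self-contained. Setting `random_state=0` on the classifier is an added assumption to make the run repeatable, so the numbers may differ slightly from the cell above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0, test_size=0.25
)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall, and F1, plus overall averages.
print(classification_report(y_test, y_pred, target_names=data.target_names))
```

The off-diagonal entries of the matrix show which classes are confused with each other, which a single accuracy number cannot reveal.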


From the result it can be observed that the accuracy on the test data is good (> 97%), but we can tune the parameters of the decision tree to try to increase it. <br>
The following script lets you experiment with different parameters and visualize the corresponding tree.
These parameters include: 
- criterion for evaluating a split (__entropy__ for information gain, or __Gini__ impurity), 
- maximum tree depth, 
- minimum number of samples required at a leaf node, 
- ...

In [1]:
from IPython.display import SVG, display
from graphviz import Source
from ipywidgets import interactive
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# load dataset
data = load_iris()
# feature matrix
X = data.data
# target vector
y = data.target
# feature names, used to label the split nodes in the plot
labels = data.feature_names
data.target_names
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.25)

def plot_tree(crit, split, depth, min_split, min_leaf=1):
    # Fit a tree with the selected hyperparameters
    estimator = DecisionTreeClassifier(
        random_state=0,
        criterion=crit,
        splitter=split,
        max_depth=depth,
        min_samples_split=min_split,
        min_samples_leaf=min_leaf,
    )
    estimator.fit(X_train, y_train)
    y_pred = estimator.predict(X_test)

    print('Accuracy Score on train data: ', accuracy_score(y_true=y_train, y_pred=estimator.predict(X_train)))
    print('Accuracy Score on test data: ', accuracy_score(y_true=y_test, y_pred=y_pred))

    # Export the fitted tree to Graphviz DOT source and render it as SVG
    graph = Source(export_graphviz(
        estimator,
        out_file=None,
        feature_names=labels,
        class_names=list(data.target_names),
        filled=True,
    ))
    display(SVG(graph.pipe(format='svg')))

    return estimator

# Build a dropdown or slider for each hyperparameter
inter = interactive(
    plot_tree,
    crit=["gini", "entropy"],
    split=["best", "random"],
    depth=(2, 10),
    min_split=(2, 10),
    min_leaf=(1, 10),
)

display(inter)

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

interactive(children=(Dropdown(description='crit', options=('gini', 'entropy'), value='gini'), Dropdown(descri…
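
The interactive cell above needs Graphviz and ipywidgets. Where those are not available, a plain-text view of the tree can serve as a rough substitute. The sketch below uses `sklearn.tree.export_text`; the choice of `max_depth=2` and `random_state=0` is arbitrary and only keeps the printed rules short and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0, test_size=0.25
)

# A shallow tree keeps the printed rules short; max_depth=2 is an arbitrary choice.
estimator = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
estimator.fit(X_train, y_train)

# export_text renders the learned splits as indented if/else rules.
rules = export_text(estimator, feature_names=list(data.feature_names))
print(rules)
```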

<a id="I.5"></a>
### I.5 Advantages and Disadvantages of CART

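The advantages of decision trees are: 

- Easy to use and understand. 
- Can handle both categorical and numerical data (scikit-learn's implementation expects categorical features to be encoded numerically first). 
- Resistant to outliers and insensitive to feature scaling, hence require little data preprocessing. 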
- New features can be easily added. 
- Can be used to build larger classifiers by using ensemble methods (RandomForest).

The disadvantages of decision trees are: 

- Prone to overfitting: the algorithm used internally keeps splitting on attributes until either it classifies all the data points or there are no more attributes to split on. As a result, it tends to create trees that perform really well on the training data at the expense of accuracy on the entire distribution of data. One way to avoid this is to prevent the tree from growing too deep by stopping it before it perfectly classifies the training data.<br>
The __maximum tree depth__ and the __minimum number of samples__ required for a split are 2 parameters that can be set when the model is instantiated. 
- Require some measure, such as accuracy on held-out data, of how well they are doing. 
- Need to be careful with parameter tuning. 
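
The effect of limiting tree growth can be checked directly. The sketch below is an illustration rather than a tuning recipe: it fits trees with several `max_depth` values on the same iris split and prints the train and test accuracy for each. The depth values and `random_state=0` are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0, test_size=0.25
)

results = {}
for depth in (1, 2, 3, None):  # None lets the tree grow until its leaves are pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, clf.predict(X_train))
    test_acc = accuracy_score(y_test, clf.predict(X_test))
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

A large gap between train and test accuracy is a sign of overfitting. On a small, clean dataset such as iris the gap is usually modest, so the pattern tends to be clearer on noisier data.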
