# Setup

In [1]:
'''
First, let's make sure this notebook works well in both Python 2 and 3, import a few common modules, ensure
Matplotlib plots figures inline, and prepare a function to save the figures.
'''
# To support both Python 2 and Python 3
from __future__ import division, print_function, unicode_literals

# Common imports
import numpy as np
import os

# To make this notebook's output stable across runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

# Where to save the figures (the images/decision_trees directory must already exist)
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "decision_trees"

def image_path(fig_id):
    return os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID, fig_id)

def save_fig(fig_id, tight_layout=True):
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(image_path(fig_id) + ".png", format='png', dpi=300)

# Training and Visualizing a Decision Tree

In [5]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
iris = load_iris()
X = iris.data[:, 2:] # petal length and width
y = iris.target
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

'''
You can visualize the trained Decision Tree by first using the export_graphviz() function to output a graph
definition file called iris_tree.dot.
'''
from sklearn.tree import export_graphviz
export_graphviz(
        tree_clf,
        out_file=image_path("iris_tree.dot"),
        feature_names=iris.feature_names[2:],
        class_names=iris.target_names,
        rounded=True,
        filled=True
    )

'''
Then you can convert this .dot file to a variety of formats such as PDF or PNG using the dot command-line
tool from the graphviz package. This command, run from the directory that contains iris_tree.dot, converts
the .dot file to a .png image file:
dot -Tpng iris_tree.dot -o iris_tree.png
'''
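If Graphviz is not installed, a plain-text rendering of the same tree can serve as a quick check. The sketch below assumes a Scikit-Learn version that provides export_text (0.21 or later) and retrains the classifier so that it runs on its own. The random_state value is an arbitrary choice made here, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width, as above
y = iris.target

# random_state=42 is an assumption added for reproducibility
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Render the learned splits as indented text instead of a .dot file
rules = export_text(tree_clf, feature_names=list(iris.feature_names[2:]))
print(rules)
```

Each indented line is one split or leaf, so the output can be read top to bottom in the same order as the Graphviz figure.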

'''
One of the many qualities of Decision Trees is that they require very little data preparation. In particular, 
they don’t require feature scaling or centering at all.
'''
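One way to check this claim is to train the same tree on raw and on standardized features and compare the predictions. This is a minimal sketch: StandardScaler and the fixed random_state are choices made for this illustration, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target
X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance

# Same hyperparameters and seed, so only the feature scale differs
tree_raw = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X_scaled, y)

pred_raw = tree_raw.predict(X)
pred_scaled = tree_scaled.predict(X_scaled)
same_predictions = np.array_equal(pred_raw, pred_scaled)
print("Identical predictions:", same_predictions)
```

The split thresholds differ between the two trees, but scaling preserves the ordering of the feature values, so the way the instances are partitioned does not change.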

# Regularization Hyperparameters

In [6]:
'''
Decision Trees make very few assumptions about the training data (as opposed to linear models, which
obviously assume that the data is linear, for example). If left unconstrained, the tree structure will adapt
itself to the training data, fitting it very closely, and most likely overfitting it. Such a model is often 
called a nonparametric model, not because it does not have any parameters (it often has a lot) but because the
number of parameters is not determined prior to training, so the model structure is free to stick closely to
the data. In contrast, a parametric model such as a linear model has a predetermined number of
parameters, so its degrees of freedom are limited, reducing the risk of overfitting (but increasing the risk of
underfitting).

To avoid overfitting the training data, you need to restrict the Decision Tree’s freedom during training. As
you know by now, this is called regularization. The regularization hyperparameters depend on the algorithm
used, but generally you can at least restrict the maximum depth of the Decision Tree. In Scikit-Learn, this is
controlled by the max_depth hyperparameter (the default value is None, which means unlimited). Reducing 
max_depth will regularize the model and thus reduce the risk of overfitting. The DecisionTreeClassifier class
has a few other parameters that similarly restrict the shape of the Decision Tree: min_samples_split 
(the minimum number of samples a node must have before it can be split), min_samples_leaf (the minimum number 
of samples a leaf node must have), min_weight_fraction_leaf (same as min_samples_leaf but expressed as a 
fraction of the total number of weighted instances), max_leaf_nodes (maximum number of leaf nodes), and 
max_features (maximum number of features that are evaluated for splitting at each node). Increasing min_*
hyperparameters or reducing max_* hyperparameters will regularize the model.
'''
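The effect of a min_* hyperparameter can be seen by comparing an unrestricted tree with a restricted one. The sketch below uses a hypothetical toy dataset from make_moons (it is not part of the original notebook), and the names Xm and ym are chosen so that the iris X and y are left untouched.

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

# Hypothetical noisy toy dataset, for illustration only
Xm, ym = make_moons(n_samples=100, noise=0.25, random_state=53)

# No restrictions: the tree keeps splitting until every leaf is pure
deep_tree = DecisionTreeClassifier(random_state=42).fit(Xm, ym)
# Every leaf must contain at least 4 training instances
restricted_tree = DecisionTreeClassifier(min_samples_leaf=4, random_state=42).fit(Xm, ym)

deep_nodes = deep_tree.tree_.node_count
restricted_nodes = restricted_tree.tree_.node_count
print("Unrestricted tree nodes:", deep_nodes)
print("min_samples_leaf=4 tree nodes:", restricted_nodes)
print("Unrestricted training accuracy:", deep_tree.score(Xm, ym))
print("Restricted training accuracy:", restricted_tree.score(Xm, ym))
```

The unrestricted tree fits the training set perfectly, which on noisy data is usually a sign of overfitting. The restricted tree is smaller and gives up some training accuracy in exchange.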
'''
NOTE: Other algorithms work by first training the Decision Tree without restrictions, then pruning (deleting) 
unnecessary nodes. A node whose children are all leaf nodes is considered unnecessary if the purity improvement
it provides is not statistically significant. Standard statistical tests, such as the χ² test, are used to 
estimate the probability that the improvement is purely the result of chance (which is called the null 
hypothesis). If this probability, called the p-value, is higher than a given threshold (typically 5%,
controlled by a hyperparameter), then the node is considered unnecessary and its children are deleted. The 
pruning continues until all unnecessary nodes have been pruned.
'''
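The significance check described in the note can be sketched with SciPy's chi2_contingency. The helper is_unnecessary and the class counts below are hypothetical, made up for this illustration. This is not a Scikit-Learn API; it only shows the p-value decision for a single node.

```python
from scipy.stats import chi2_contingency

def is_unnecessary(children_class_counts, threshold=0.05):
    """Return True if the split's purity gain could plausibly be due to chance."""
    # Rows are the child nodes, columns are the classes
    result = chi2_contingency(children_class_counts)
    p_value = result[1]
    return bool(p_value > threshold)

# Hypothetical class counts in the two children of a node
weak_split = [[10, 12], [11, 9]]    # both children have a similar class mix
strong_split = [[30, 2], [3, 28]]   # each child is nearly pure

print("Prune weak split:", is_unnecessary(weak_split))
print("Prune strong split:", is_unnecessary(strong_split))
```

A full pruning pass would apply this test repeatedly to nodes whose children are all leaves, deleting the children whenever the test returns True.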


# Regression

In [7]:
'''
Decision Trees are also capable of performing regression tasks. Let’s build a regression tree using
Scikit-Learn’s DecisionTreeRegressor class, training it on a noisy quadratic dataset with max_depth=2.
'''
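from sklearn.tree import DecisionTreeRegressor
# Assumed dataset (missing from this notebook): noisy quadratic, replacing the iris X and y
X = np.random.rand(200, 1)
y = 4 * (X[:, 0] - 0.5) ** 2 + np.random.randn(200) / 10
tree_reg = DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X, y)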

DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,
           max_leaf_nodes=None, min_impurity_split=1e-07,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, presort=False, random_state=None,
           splitter='best')
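A regression tree predicts a constant value in each leaf, so a tree with max_depth=2 can output at most four distinct values. The sketch below illustrates this on an assumed noisy quadratic dataset. The data-generating formula, the names Xq, yq and reg, and the seed are choices made here, not taken from the original notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Assumed noisy quadratic dataset, for illustration only
rng = np.random.RandomState(42)
m = 200
Xq = rng.rand(m, 1)
yq = 4 * (Xq[:, 0] - 0.5) ** 2 + rng.randn(m) / 10

reg = DecisionTreeRegressor(max_depth=2, random_state=42).fit(Xq, yq)

# A depth-2 tree has at most 4 leaves, hence at most 4 distinct predictions
grid = np.linspace(0, 1, 500).reshape(-1, 1)
distinct_predictions = np.unique(reg.predict(grid))
print("Distinct predicted values:", len(distinct_predictions))

# With the default squared-error criterion, each leaf predicts the mean
# target value of the training instances that fall into it
leaf_ids = reg.apply(Xq)
for leaf in np.unique(leaf_ids):
    in_leaf = leaf_ids == leaf
    leaf_mean = yq[in_leaf].mean()
    leaf_pred = reg.predict(Xq[in_leaf][:1])[0]
    print("leaf", leaf, "mean target:", round(leaf_mean, 3), "prediction:", round(leaf_pred, 3))
```

The two printed columns should agree for every leaf, which is why the tree's predictions look like a step function when plotted.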