### INTRO

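* non-parametric supervised method for classification & regression
* returns a model that predicts target values by learning simple decision rules from the features
* little data prep needed (no scaling or normalization)
* supports both numbers & categories in principle (the scikit-learn implementation needs categories encoded as numbers)
* white box model: the conditions behind each prediction are visible and explainable
* validation with statistical tests possible
* overfitting is a risk; limit tree depth, set a minimum number of samples per leaf, or prune
* unstable: small variations in the data can produce a very different tree. Use ensembles as a workaround
* some concepts are hard to learn (XOR, parity, ...)
* results can be biased if some classes dominate. Balance the dataset before fitting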
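
A minimal sketch of the workarounds named above, assuming the iris data used elsewhere in these notes. The parameter values (`max_depth=3`, `min_samples_leaf=5`, `n_estimators=50`) are illustrative choices, not tuned recommendations, and `class_weight="balanced"` is shown as one alternative to pre-balancing (iris is already balanced, so it changes little here).

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    # unconstrained tree: grows until every leaf is pure
    "full tree": DecisionTreeClassifier(random_state=0),
    # depth/leaf limits as a simple guard against overfitting
    "limited tree": DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0),
    # reweights classes inversely to their frequency
    "balanced tree": DecisionTreeClassifier(class_weight="balanced", random_state=0),
    # ensemble of trees to reduce the variance of a single tree
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# mean 5-fold cross-validated accuracy per model
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

On a dataset this small the differences between the variants are usually minor; the point is only to show where each knob lives.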

### CLASSIFICATION

[API](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)

[demo:tree structure](plot_unveil_tree_structure.ipynb) | [demo:iris](plot_iris.ipynb)

In [10]:
#example
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
cross_val_score(clf, iris.data, iris.target, cv=10)

array([ 1.        ,  0.93333333,  1.        ,  0.93333333,  0.93333333,
        0.86666667,  0.93333333,  1.        ,  1.        ,  1.        ])

In [11]:
#example
from sklearn import tree
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

#predict class
print(clf.predict([[2., 2.]]))
#predict class probabilities
clf.predict_proba([[2., 2.]])

[1]


array([[ 0.,  1.]])

In [12]:
#example
from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

#export tree in Graphviz format
with open("iris.dot", 'w') as f:
    tree.export_graphviz(clf, out_file=f)
#create PDF output: run `dot -Tpdf iris.dot -o iris.pdf` in a shell (needs Graphviz)
#then remove the intermediate file
import os
os.unlink('iris.dot')

In [13]:
#create PDF with pydotplus
#*** BUGGED ***
#import pydotplus 
#dot_data = tree.export_graphviz(clf, out_file=None) 
#graph = pydotplus.graph_from_dot_data(dot_data) 
#graph.write_pdf("iris.pdf")
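
As a possible fallback when the Graphviz/pydotplus route does not work, the learned rules can be printed as plain text with `sklearn.tree.export_text` (available in scikit-learn 0.21 and later). This is a sketch; `max_depth=2` is only there to keep the printout short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# plain-text rendering of the decision rules; no Graphviz install needed
rules = export_text(clf, feature_names=iris.feature_names)
print(rules)
```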

### REGRESSION

[API](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html#sklearn.tree.DecisionTreeRegressor) |
[example](plot_tree_regression.ipynb)

In [14]:
#example
from sklearn import tree
X = [[0, 0], [2, 2]]
y = [0.5, 2.5]
clf = tree.DecisionTreeRegressor()
clf = clf.fit(X, y)
clf.predict([[1, 1]])

array([ 0.5])

In [15]:
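#example
#note: load_boston was removed in scikit-learn 1.2; this cell needs an older release
from sklearn.datasets import load_boston
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
boston = load_boston()
regressor = DecisionTreeRegressor(random_state=0)
#scores are R^2 per fold (the regressor's default); negative = worse than always predicting the mean
cross_val_score(regressor, boston.data, boston.target, cv=10)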

array([ 0.52939335,  0.60461936, -1.60907519,  0.4356399 ,  0.77280671,
        0.42090343,  0.23656049,  0.36140653, -2.06488186, -1.01206601])

### MULTIPLE OUTPUTS

* Supported by both DecisionTreeClassifier & DecisionTreeRegressor: fit on a Y of shape (n_samples, n_outputs)

[demo](plot_tree_regression_multioutput.ipynb)
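
A small sketch of multi-output regression, assuming synthetic sine/cosine targets (the data and `max_depth=5` are made up for illustration). One tree predicts both outputs at once.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(100, 1)), axis=0)
# two targets per sample: Y has shape (n_samples, n_outputs)
Y = np.column_stack([np.sin(X).ravel(), np.cos(X).ravel()])

reg = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, Y)
pred = reg.predict([[0.0], [1.5]])
print(pred.shape)  # one row per query, one column per output
```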

### COMPLEXITY

* DT construction time (balanced tree): O(#samples x #features x log(#samples))
* DT query time (balanced tree): O(log(#samples))
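
The query cost is bounded by the depth of the tree, so one way to check the log(#samples) estimate is to compare a fitted tree's depth with log2(#samples). A sketch on made-up random data; real trees are rarely perfectly balanced, so the depth may well exceed the logarithm.

```python
import math
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
n_samples = 2000
X = rng.rand(n_samples, 5)
# synthetic target: a diagonal boundary that axis-aligned splits can only approximate
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
depth = clf.get_depth()        # longest root-to-leaf path = worst-case comparisons per query
n_leaves = clf.get_n_leaves()

print("depth:", depth)
print("leaves:", n_leaves)
print("log2(n_samples):", round(math.log2(n_samples), 1))
```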

### ALGORITHMS

* [ID3](https://en.wikipedia.org/wiki/ID3_algorithm)
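* C4.5 - successor to ID3; handles continuous features
* C5.0 - Quinlan's successor to C4.5, released under a proprietary license
* [CART](https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees_.28CART.29) - supports numerical targets (regression); scikit-learn uses an optimized version of CART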
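
The algorithms above differ mainly in how they score a candidate split: ID3 uses information gain based on entropy, and CART commonly uses Gini impurity for classification. The helper functions below are hypothetical, written by hand for illustration, and are not part of scikit-learn (where the choice is made through the `criterion` parameter).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = [0, 0, 1, 1]
print(entropy(parent), gini(parent))             # impurity of an evenly mixed node
print(information_gain(parent, [0, 0], [1, 1]))  # gain of a split that separates the classes
```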