# ML recipes #2 - Visualising a decision tree

The [iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set) provided with `sklearn` is commonly used to train an ML classifier to distinguish between three species of iris flowers, based on measurements of their sepals and petals.

In [1]:
from sklearn.datasets import load_iris
from sklearn import tree

In [2]:
iris = load_iris()

In [3]:
print(iris.feature_names)
print(iris.target_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'] 
['setosa' 'versicolor' 'virginica']


In [4]:
print(iris.data[0], iris.target[0])

[ 5.1  3.5  1.4  0.2] 0
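
A quick way to see how the samples are laid out is to check the array shape and the per-class counts. The following is a minimal sketch, assuming Python 3; the names `class_counts` and `first_idx` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Number of samples and number of features per sample
print(iris.data.shape)

# Number of samples for each class label (0, 1, 2)
class_counts = np.bincount(iris.target)
print(class_counts)

# Index at which each class label first appears in the target array
first_idx = [int(np.argmax(iris.target == label))
             for label in np.unique(iris.target)]
print(first_idx)
```

The samples are stored grouped by species, so the first occurrence of each label marks the start of that species' block.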


In ML, a classifier must be tested on data that does not form part of the training data set. To do so, the first occurrence of each species (indices 0, 50, and 100, since the data set is ordered by species with 50 samples each) is removed from the training data and held out as the test data.

In [5]:
import numpy as np

test_idx = [0, 50, 100]  # first sample of each species

# Training data
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)

# Testing data
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]
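
Holding out three hand-picked samples keeps the example small. For a larger held-out set, one possible alternative (not what this notebook does) is `train_test_split` from `sklearn.model_selection`. The sketch below assumes Python 3 and scikit-learn 0.18 or later; the 20% test size and `random_state=0` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Hold out 20% of the samples for testing.
# stratify=iris.target keeps the three species equally represented
# in both the training and the testing subsets.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data,
    iris.target,
    test_size=0.2,
    stratify=iris.target,
    random_state=0,
)

print(X_train.shape, X_test.shape)
```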

In [6]:
clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best')

The labels of the testing data are shown below; when the testing features are passed to the trained model, it should predict these same labels.

In [7]:
print(test_target)

[0 1 2]


The model is now used to predict the labels from the testing features:

In [8]:
print(clf.predict(test_data))

[0 1 2]
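
Rather than comparing the two printed arrays by eye, the match can be summarised as a single number. The following is a minimal sketch, assuming Python 3, that rebuilds the same split and uses `accuracy_score` from `sklearn.metrics`; the variable names `predicted` and `accuracy`, and the fixed `random_state=0`, are additions for illustration.

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

iris = load_iris()
test_idx = [0, 50, 100]

# Same split as above: hold out the first sample of each species
train_data = np.delete(iris.data, test_idx, axis=0)
train_target = np.delete(iris.target, test_idx)
test_data = iris.data[test_idx]
test_target = iris.target[test_idx]

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(train_data, train_target)

# Fraction of testing samples whose predicted label matches the true label
predicted = clf.predict(test_data)
accuracy = accuracy_score(test_target, predicted)
print(accuracy)
```

With only three testing samples the score can only take the values 0, 1/3, 2/3, or 1, so it is a sanity check rather than a reliable estimate of model quality.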


The example code below is adapted from the sklearn [tutorial](http://scikit-learn.org/stable/modules/tree.html). `pydot` was replaced with `pydotplus`, as per the StackOverflow discussion [here](http://stackoverflow.com/questions/38176472/graph-write-pdfiris-pdf-attributeerror-list-object-has-no-attribute-writ).

Issues with `pydotplus` not finding the GraphViz executables were solved with the help of [1](http://stackoverflow.com/questions/18438997/why-is-pydot-unable-to-find-graphvizs-executables-in-windows-8), [2](http://stackoverflow.com/questions/28312534/graphvizs-executables-are-not-found-python-3-4), and [3](http://www.graphviz.org/content/executables-not-found-systems-path-new-python-and-graphviz).

In [9]:
import pydotplus

# With out_file=None, export_graphviz returns the DOT source as a string
# (scikit-learn 0.18+), so no StringIO buffer is needed.
dot_data = tree.export_graphviz(clf,
                                out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True,
                                rounded=True,
                                impurity=False)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf("iris.pdf")  # returns True on success

True
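
If GraphViz is not available, the tree's decision rules can also be inspected as plain text. The sketch below assumes Python 3 and scikit-learn 0.21 or later, where `tree.export_text` is available; it trains on the full data set purely to keep the example self-contained, and `random_state=0` is an arbitrary choice.

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(iris.data, iris.target)

# Render the fitted tree's decision rules as indented plain text,
# using the feature names instead of generic feature indices
rules = tree.export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each level of indentation corresponds to one level of depth in the tree, and each leaf line reports the predicted class.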

Note: calling `predict()` with a single sample passed as a 1d array emits the following warning:
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
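
As the warning suggests, a single sample can be reshaped into a 2d array with one row before it is passed to `predict()`. The following is a minimal sketch, assuming Python 3; the names `sample`, `sample_2d`, and `prediction` are illustrative, and the classifier is trained on the full data set only to keep the example self-contained.

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(iris.data, iris.target)

# A single sample is a 1d array holding one value per feature
sample = iris.data[0]

# reshape(1, -1) turns it into a 2d array: one row (sample),
# with the number of columns (features) inferred automatically
sample_2d = sample.reshape(1, -1)

prediction = clf.predict(sample_2d)
print(sample.shape, sample_2d.shape, prediction)
```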