## A simple tutorial on using decision trees for classification
We first load the required libraries.

In [1]:
import numpy as np
from sklearn.tree import DecisionTreeClassifier

Next, we load the Iris dataset that ships with scikit-learn.

In [2]:
from sklearn.datasets import load_iris
data = load_iris()
print(data.data.shape)
print(data.target.shape)

(150, 4)
(150,)


We have 150 data points, each with $4$ features, and one class label per data point.
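To see what those features and labels actually are, a quick look at the metadata bundled with the dataset can help. The snippet below is a minimal sketch; the variable name `class_counts` is introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris()

# Names of the four measured features (the columns of data.data)
print(data.feature_names)

# Names of the three species, encoded as 0, 1 and 2 in data.target
print(data.target_names)

# Number of data points per class (illustrative variable name)
class_counts = np.bincount(data.target)
print(class_counts)  # [50 50 50]
```

The classes are perfectly balanced, which is worth keeping in mind when reading the evaluation metrics later on.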

Let's first split the data into train and test sets, holding out 20% of the data points for testing.

In [3]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(data.data, data.target, test_size=0.2)
print(xtrain.shape)
print(xtest.shape)

(120, 4)
(30, 4)
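The split above is random, so it changes from run to run and the test set is not guaranteed to contain the same number of examples from each class. If reproducibility matters, one option is to fix the seed and stratify on the labels. This is only a sketch: the seed `0` is an arbitrary choice, and the names `xtr`, `xte`, `ytr`, `yte` are used here so the variables from the cell above are not overwritten.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()

# stratify keeps the class proportions equal in both parts;
# random_state=0 is an arbitrary seed that makes the split repeatable
xtr, xte, ytr, yte = train_test_split(
    data.data, data.target, test_size=0.2, stratify=data.target, random_state=0
)

print(xtr.shape, xte.shape)
# Each of the three classes contributes 10 of the 30 test points
print(np.bincount(yte))
```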


Now let's train the model!

In [4]:
model = DecisionTreeClassifier()
model.fit(xtrain, ytrain)
model

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best')
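With the defaults shown above (`max_depth=None`, `min_samples_leaf=1`), the tree keeps splitting until its leaves are pure or cannot be split further. A rough way to see how large the resulting tree is, and which features it relies on, is sketched below. The split seed and the names `tree`, `depth`, `n_leaves` and `importances` are assumptions made for this example, not part of the notebook above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
xtr, xte, ytr, yte = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0  # arbitrary seed
)

tree = DecisionTreeClassifier(random_state=0).fit(xtr, ytr)

# Size of the fitted tree
depth = tree.get_depth()
n_leaves = tree.get_n_leaves()
print("Depth:", depth, "Leaves:", n_leaves)

# Impurity-based importance of each feature; the values sum to 1
importances = tree.feature_importances_
for name, value in zip(data.feature_names, importances):
    print(f"{name}: {value:.3f}")
```

If the tree turns out deeper than needed, passing a small `max_depth` to `DecisionTreeClassifier` is one common way to limit it.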

Let's put our model to the test.

In [5]:
ypredicted = model.predict(xtest)

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# average='micro' pools the counts over all classes before computing each metric
acc = accuracy_score(ytest, ypredicted)
prec = precision_score(ytest, ypredicted, average='micro')
recall = recall_score(ytest, ypredicted, average='micro')
f1 = f1_score(ytest, ypredicted, average='micro')
print("Accuracy:", acc)
print("Precision:", prec)
print("Recall:", recall)
print("F1 Score:", f1)

Accuracy: 0.9666666666666667
Precision: 0.9666666666666667
Recall: 0.9666666666666667
F1 Score: 0.9666666666666667
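All four numbers are identical, and that is expected: for a single-label multiclass problem, micro-averaged precision, recall and F1 all reduce to plain accuracy. To get more information out of the evaluation, one could look at macro-averaged scores and the confusion matrix instead. The following is a sketch under its own assumptions (a fresh stratified split with an arbitrary seed, and illustrative variable names), so its numbers will not match the run above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
xtr, xte, ytr, yte = train_test_split(
    data.data, data.target, test_size=0.2, stratify=data.target, random_state=0
)
pred = DecisionTreeClassifier(random_state=0).fit(xtr, ytr).predict(xte)

# Micro-averaged precision coincides with accuracy
acc = accuracy_score(yte, pred)
micro_prec = precision_score(yte, pred, average='micro')
print("Accuracy:", acc, "Micro precision:", micro_prec)

# Macro-averaging computes the metric per class, then takes the unweighted mean
macro_f1 = f1_score(yte, pred, average='macro')
print("Macro F1:", macro_f1)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(yte, pred, labels=[0, 1, 2])
print(cm)
```

The off-diagonal entries of the confusion matrix show which pairs of species the tree confuses, something a single averaged score cannot reveal.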


So, what has the model learnt? We can plot the fitted tree to find out.

In [9]:
from sklearn.tree import plot_tree

import matplotlib.pyplot as plt
%matplotlib notebook

plot_tree(model, filled=True, feature_names=data.feature_names)
plt.show()

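When an interactive plot is not available, the same tree can also be printed as plain-text rules with `export_text` from `sklearn.tree`. The sketch below fits a fresh tree on the full dataset purely to stay self-contained; the names `tree` and `rules` are illustrative, and in the notebook you would pass the already fitted `model` instead.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()

# Fitted on all 150 points here only to keep the example short
tree = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# One line per node: split conditions for internal nodes, predicted class for leaves
rules = export_text(tree, feature_names=list(data.feature_names))
print(rules)
```

Each indented level corresponds to one split, so reading a path from the top to a `class:` line gives the sequence of threshold tests that leads to that prediction.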