In this short notebook, you will practice training and reading a decision tree. First, explore a tree trained on a built-in dataset (iris). Next, write a small dataset of your own from scratch, train a decision tree classifier on it, and visualize the tree. As you read each tree, ask yourself:
- Does the logic match your expectations?
- Are the important features what you expect?

In [0]:
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn import tree

### Part one
This part shows how to load a built-in dataset, split it into training and test sets, train a model, and evaluate its accuracy.

In [0]:
iris = load_iris()
examples = iris.data
labels = iris.target

In [0]:
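from sklearn.model_selection import train_test_split
# Hold out 20% of the examples for testing. The split is random, so results
# vary between runs unless you also pass random_state=<some integer>.
X_train, X_test, y_train, y_test = train_test_split(examples, labels, test_size=0.2)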

In [0]:
print(len(X_train), len(X_test))

In [0]:
clf = tree.DecisionTreeClassifier(criterion="gini") # Experiment with max_depth=2
clf = clf.fit(X_train, y_train)

In [0]:
fig = plt.figure(figsize=(6, 6), dpi=200)
# Passing feature_names labels each split with the feature it tests
_ = tree.plot_tree(clf,
                   filled=True,
                   feature_names=list(iris.feature_names),
                   class_names=list(iris.target_names))
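
The two questions from the introduction (does the logic match your expectations, and are the important features what you expect) can also be explored without a plot. The sketch below is one possible approach, not part of the original exercise: it prints the tree as text rules with `tree.export_text` and lists the impurity-based `feature_importances_`. It trains on the full iris dataset for simplicity, and `max_depth=2` and `random_state=0` are arbitrary choices made only to keep the output short and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()

# max_depth=2 and random_state=0 are illustrative assumptions, not required settings
clf = tree.DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
clf = clf.fit(iris.data, iris.target)

# Print the learned splits as indented if/else rules
rules = tree.export_text(clf, feature_names=list(iris.feature_names))
print(rules)

# Impurity-based importance of each feature (the values sum to 1)
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

A feature with importance 0 was never used in a split, which is worth checking against the plotted tree.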

Evaluate how accurate your model is on the test set.

In [0]:
from sklearn.metrics import accuracy_score
y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

In [0]:
from sklearn.metrics import confusion_matrix
# Rows are the true classes, columns are the predicted classes
confusion_matrix(y_test, y_pred)
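
Test accuracy is easier to interpret next to training accuracy. The following is a minimal sketch, under assumed settings, that follows up on the `max_depth=2` suggestion by training trees of several depths and reporting both scores. The depths tried and `random_state=0` are illustrative assumptions, and the `results` dictionary is a name introduced here for convenience.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn import tree

iris = load_iris()
# random_state=0 is an arbitrary seed so the split is repeatable
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

results = {}  # maps max_depth -> (train accuracy, test accuracy)
for depth in (1, 2, None):  # None places no limit on the depth of the tree
    clf = tree.DecisionTreeClassifier(criterion="gini", max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, clf.predict(X_train))
    test_acc = accuracy_score(y_test, clf.predict(X_test))
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

A large gap between the two numbers suggests the tree has memorized the training examples rather than learned rules that generalize.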

### Part two: Create your own dataset, then train and visualize a tree
Train and visualize a tree as above. Questions to think about:
- Does the logic in the tree match your expectations?
- Are the important features what you expect?
- How accurate is your model on the test set?
- How does that compare with its accuracy on the training set?
- If the two accuracies differ, why?

In [0]:
dataset = [[90, 25, 0, "apple"],
           [110, 30, 0, "apple"],
           [100, 30, 0, "apple"],
           [120, 30, 1, "orange"],
           [140, 40, 1, "orange"], 
           [120, 50, 0, "banana"], 
           [170, 40, 0, "banana"]]

feature_names = ["weight", "radius", "bumpy"]

In [0]:
examples = [row[:-1] for row in dataset]
labels = [row[-1] for row in dataset]

In [0]:
# This block of code encodes your string labels as integers.
# le.classes_ lists the original names, ordered by their integer code.
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(labels)
print(list(le.classes_))
labels = le.transform(labels)
print(labels)
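
Once the labels are encoded, the tree predicts integers rather than fruit names. As a small sketch of how to go back, `LabelEncoder.inverse_transform` maps integer codes to the original strings. The short label list and the codes `[2, 0]` below are made up for illustration and stand in for real predictions.

```python
from sklearn import preprocessing

# A made-up label list, used only to illustrate the encoding
labels = ["apple", "apple", "orange", "banana"]

le = preprocessing.LabelEncoder()
encoded = le.fit_transform(labels)

# classes_ is sorted, so each integer code is an index into this array
print(list(le.classes_))
print(list(encoded))

# Map integer codes (for example, model predictions) back to the original strings
decoded = le.inverse_transform([2, 0])
print(list(decoded))
```

Because `le.classes_` is ordered by integer code, it can also serve as the `class_names` argument when plotting the tree.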

In [0]:
# Your code here
# Train and visualize a tree, and explore the questions above