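In the previous section we treated the decision tree as a black box. In this section we'll build one from real data: write the code, visualize the result, and practice reading the tree. That way you'll understand how a decision tree works behind the scenes.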

Why did we start with a decision tree? Because decision trees have an unusual property: they're easy to read and understand.

## Goals

1. Import dataset.
2. Train a classifier.
3. Predict label for new flower.
4. Visualize the tree.

```python
# Import dataset
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)  # names of the 4 measurements
print(iris.target_names)   # names of the 3 species (labels 0, 1, 2)
print(iris.data[0])        # features of the first example
print(iris.target[0])      # label of the first example
for i in range(len(iris.target)):
    print("Example %d: label %s, features %s" % (i, iris.target[i], iris.data[i]))
```

```
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
['setosa' 'versicolor' 'virginica']
[5.1 3.5 1.4 0.2]
0
Example 0: label 0, features [5.1 3.5 1.4 0.2]
Example 1: label 0, features [4.9 3.  1.4 0.2]
Example 2: label 0, features [4.7 3.2 1.3 0.2]
Example 3: label 0, features [4.6 3.1 1.5 0.2]
Example 4: label 0, features [5.  3.6 1.4 0.2]
Example 5: label 0, features [5.4 3.9 1.7 0.4]
Example 6: label 0, features [4.6 3.4 1.4 0.3]
Example 7: label 0, features [5.  3.4 1.5 0.2]
Example 8: label 0, features [4.4 2.9 1.4 0.2]
Example 9: label 0, features [4.9 3.1 1.5 0.1]
Example 10: label 0, features [5.4 3.7 1.5 0.2]
Example 11: label 0, features [4.8 3.4 1.6 0.2]
Example 12: label 0, features [4.8 3.  1.4 0.1]
Example 13: label 0, features [4.3 3.  1.1 0.1]
Example 14: label 0, features [5.8 4.  1.2 0.2]
Example 15: label 0, features [5.7 4.4 1.5 0.4]
Example 16: label 0, features [5.4 3.9 1.3 0.4]
Example 17: label 0, features [5.1 3.5 1.4 0.3]
...
```
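The loop above prints all 150 rows. A more compact way to get a feel for the dataset is to check its shape and how many examples carry each label. The per-class count also explains why the indices 0, 50 and 100 used in the next step pick exactly one flower of each species: the rows are grouped by class. A minimal sketch:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# 150 examples (rows), each with 4 features (columns)
print(iris.data.shape)

# Count how many examples carry each label 0, 1, 2
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(name, count)

# Rows are grouped by class in blocks of 50, so these indices hit one of each species
print(iris.target[[0, 50, 100]])
```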

## Testing Data

- Examples used to "test" the classifier's accuracy
- Not part of the training data

Just as in programming, testing is a very important part of ML.
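Holding out three hand-picked examples is enough for a demo, but three predictions say little about accuracy. A common alternative (not the approach this notebook uses) is to let scikit-learn's `train_test_split` hold out a random, class-balanced fraction of the data and score the predictions with `accuracy_score`. The 20% test size and `random_state=0` below are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Hold out 20% of the examples as testing data; stratify keeps the
# three classes equally represented in both the training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=0
)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)  # the tree never sees X_test during training

predictions = clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)  # fraction of correct predictions
print(len(X_test), "test examples, accuracy:", accuracy)
```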

```python
# Train a classifier

import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
test_index = [0, 50, 100]

# training data
train_target = np.delete(iris.target, test_index)
train_data = np.delete(iris.data, test_index, axis=0)

# testing data
test_target = iris.target[test_index]
test_data = iris.data[test_index]

clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)

print(test_target)
print(clf.predict(test_data))
```

```
[0 1 2]
[0 1 2]
```
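Goal 3 is to predict the label of a new flower. Once the classifier is trained, that is just one more call to `predict`, passing a 2-D array with one row per flower. The measurements below are hypothetical (made up for illustration, not taken from the dataset); with petals this short and narrow, the tree should place the flower in the setosa class:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
test_index = [0, 50, 100]

# Same training data as above: everything except the three held-out rows
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(np.delete(iris.data, test_index, axis=0), np.delete(iris.target, test_index))

# Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm)
new_flower = np.array([[5.0, 3.4, 1.5, 0.2]])

# predict expects a 2-D array and returns one label per row
label = clf.predict(new_flower)[0]
print(label, iris.target_names[label])
```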


```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree
import pydotplus

# COLLECT TRAINING DATA

iris = load_iris()
test_idx = [0, 50, 100]

# training data
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)

# testing data
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]

# TRAIN CLASSIFIER

clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)

# MAKE PREDICTIONS

print(test_target)  # expected outcome
print(clf.predict(test_data))  # model's predicted outcome

# VISUALISE THE TREE

# non-coloured version
# dot_data = tree.export_graphviz(clf, out_file=None)
# graph = pydotplus.graph_from_dot_data(dot_data)
# graph.write_pdf("iris.pdf")

# coloured version, with feature and class names on the nodes
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf("irisColoured.pdf")  # needs Graphviz's dot executable
```

```
[0 1 2]
[0 1 2]

InvocationException: GraphViz's executables not found
```

Adding it to the path or using the pip graphviz?

Neither of those. Adding Library\bin\graphviz to PATH could break Spyder and Matplotlib, and the pip graphviz packages are not compatible with the graphviz package provided by conda.

To clarify even further:

1. `conda install graphviz` installs the Graphviz C executables and libraries (e.g. the `dot` executable).
2. `pip install graphviz` installs one of the Python bindings for Graphviz (the one used by Dask).
3. `conda install python-graphviz` installs the same package as `pip install graphviz`, but patched to work with conda's Graphviz C package.

Requiring a specific configuration in the Python `graphviz` package would mean a tight coupling between these packages where there shouldn't be one.

There's no additional configuration; you just need to run

```
conda install python-graphviz
```

instead of

```
pip install graphviz
```

(as I said above).
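The `InvocationException` earlier means pydotplus could not find Graphviz's `dot` program on the `PATH`. One quick way to check from Python is `shutil.which`. The sketch below also shows a possible fallback, which is an assumption of this rewrite rather than part of the original setup: always save the DOT source produced by `export_graphviz` (that step needs no Graphviz executables), and only call `dot -Tpdf` to render it when the executable is actually found. The file names `iris.dot` and `iris.pdf` are illustrative.

```python
import shutil
import subprocess

import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
test_index = [0, 50, 100]
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(np.delete(iris.data, test_index, axis=0), np.delete(iris.target, test_index))

# export_graphviz only builds DOT text; it does not call the Graphviz executables
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True)
with open("iris.dot", "w") as f:  # hypothetical output file name
    f.write(dot_data)

# shutil.which returns the full path of "dot", or None if it is not on PATH
dot_path = shutil.which("dot")
if dot_path is None:
    print("dot not found on PATH; saved iris.dot only")
else:
    # Render the saved DOT file to PDF with the Graphviz command-line tool
    subprocess.run([dot_path, "-Tpdf", "iris.dot", "-o", "iris.pdf"], check=True)
    print("rendered iris.pdf using", dot_path)
```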


*How about networkx? That also uses the dot executable, right? And I think there are many more packages that do.
I don't know, but nobody has complained about it. We'll look into it if somebody reports an issue.*
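If installing Graphviz is not an option at all, scikit-learn (version 0.21 and later) can print a fitted tree as plain text with `tree.export_text`, which needs no external executables. This sketch retrains the same classifier. Read the output top to bottom: each indented line is one test on a feature, and each `class:` line is a leaf whose label 0, 1 or 2 corresponds to `iris.target_names` (setosa, versicolor, virginica).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
test_index = [0, 50, 100]
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(np.delete(iris.data, test_index, axis=0), np.delete(iris.target, test_index))

# Plain-text rendering of the tree; feature_names replaces the default
# "feature_0", "feature_1", ... with readable measurement names
text = tree.export_text(clf, feature_names=iris.feature_names)
print(text)
```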

## *Choosing good features is one of your most important jobs*