# [Lab] Section 8.2: Decision Tree Modeling and Visualization

## Lab Introduction

In this lab, we walk through how to use the DecisionTreeClassifier class in sklearn to carry out the full decision tree modeling process, and then visualize the trained tree.

### Key Points

- Using DecisionTreeClassifier for modeling
- Visualizing the decision tree with graphviz
- Evaluating feature importance

## 1. Loading the Dataset

In [2]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import tree
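# import graphviz  # uncomment to render the tree; requires the graphviz Python package


def load_data():
    """Load the iris dataset and split it 70/30 into training and test sets."""
    data = load_iris()
    X, y = data.data, data.target
    feature_names = data.feature_names
    # A fixed random_state keeps the split reproducible
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)
    return X_train, X_test, y_train, y_test, feature_names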
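
To sanity-check the split, a short sketch like the following prints the shapes of the two sets and the number of samples per class in each. It assumes the same `test_size=0.3` and `random_state=42` as `load_data()`; the names `train_counts` and `test_counts` are illustrative and not part of the original lab.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)

# 150 samples with 4 features each; 30% are held out for testing
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)

# Count samples per class (labels 0, 1, 2) in each split
train_counts = np.bincount(y_train, minlength=3)
test_counts = np.bincount(y_test, minlength=3)
print("Train class counts:", train_counts)
print("Test class counts:", test_counts)
```

Because the split is not stratified, the class counts can differ between the two sets; passing `stratify=y` to `train_test_split` is one way to keep them proportional.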

## 2. Model Training
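
The classifier in this section is trained with `criterion='gini'`. As background, the Gini impurity of a node whose classes occur with proportions p_k is 1 - sum(p_k^2). The sketch below computes it for an array of labels; `gini_impurity` is a hypothetical helper written for illustration and is not a scikit-learn function.

```python
import numpy as np


def gini_impurity(labels):
    """Return the Gini impurity 1 - sum(p_k^2) of an array of integer class labels."""
    counts = np.bincount(labels)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)


print(gini_impurity(np.array([0, 0, 0, 0])))  # pure node → 0.0
print(gini_impurity(np.array([0, 0, 1, 1])))  # two classes, evenly mixed → 0.5
```

A split is considered better the more it lowers the sample-weighted impurity of the child nodes relative to the parent.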

In [3]:
def train(X_train, X_test, y_train, y_test, feature_names):
    # Gini impurity as the split criterion; every leaf must hold at least 5 samples
    model = tree.DecisionTreeClassifier(criterion='gini', min_samples_leaf=5, random_state=30)
    model.fit(X_train, y_train)
    print("Accuracy on the test set:", model.score(X_test, y_test))
    # Export the fitted tree as DOT source (returned as a string because out_file=None)
    dot_data = tree.export_graphviz(model, out_file=None,
                                    feature_names=feature_names,
                                    filled=True, rounded=True,
                                    special_characters=True)
    # To render the tree, uncomment `import graphviz` above and the two lines below:
    # graph = graphviz.Source(dot_data)
    # graph.render('iris')
    # The Graphviz binaries must also be installed locally:
    # Mac: brew install graphviz
    # Ubuntu: sudo apt install graphviz
    # CentOS: sudo yum install graphviz
    # https://graphviz.org/download/
    print("Feature importances:", model.feature_importances_)
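
If the `graphviz` Python package or the Graphviz binaries are unavailable, the fitted tree can still be inspected. The sketch below is an addition rather than part of the original lab: it prints the tree as indented text with `tree.export_text` and saves the DOT source to a file. The filename `iris.dot` is an arbitrary choice.

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)

# Same hyperparameters as train() in this lab
model = tree.DecisionTreeClassifier(criterion='gini', min_samples_leaf=5, random_state=30)
model.fit(X_train, y_train)

# Plain-text rendering of the split rules; needs no extra dependencies
rules = tree.export_text(model, feature_names=list(data.feature_names))
print(rules)

# Save the DOT source so it can be rendered later with the Graphviz CLI
dot_data = tree.export_graphviz(model, out_file=None,
                                feature_names=data.feature_names,
                                filled=True, rounded=True,
                                special_characters=True)
with open('iris.dot', 'w', encoding='utf-8') as f:
    f.write(dot_data)
```

With Graphviz installed, `dot -Tpng iris.dot -o iris.png` converts the saved file to an image.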

## 3. Results

In [4]:

if __name__ == '__main__':
    X_train, X_test, y_train, y_test, feature_names = load_data()
    print("Feature names:", feature_names)
    train(X_train, X_test, y_train, y_test, feature_names)

Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Accuracy on the test set: 1.0
Feature importances: [0.00536513 0.         0.07057937 0.9240555 ]
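
The importance array follows the same order as `feature_names`, so pairing the two makes the result easier to read. Here is a minimal sketch, assuming the same split and hyperparameters as in this lab; `ranked` is an illustrative name, and the exact values may vary with the scikit-learn version.

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)
model = tree.DecisionTreeClassifier(criterion='gini', min_samples_leaf=5, random_state=30)
model.fit(X_train, y_train)

# Pair each feature name with its importance and sort from most to least important
ranked = sorted(zip(data.feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name:<20} {importance:.4f}")

# Importances are normalized, so they sum to 1
print("Sum:", model.feature_importances_.sum())
```

In the run shown above, petal width accounts for about 92% of the total importance, while sepal width contributes nothing.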


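## Lab Summary

In this lab, we covered how to use the DecisionTreeClassifier class in sklearn to carry out the full decision tree modeling process, how to visualize the trained tree, and how to print its feature importances.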