## **Demo 1202: Custom ID3 Decision Tree**

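### **Case 1: Classifying job applicants with a custom decision tree classifier**  
Read the `id3_tree.py` code file to see how the ID3 decision tree classifier is defined
* The classifier implements a minimal decision tree that supports only two class labels, True and False
* `build_tree_id3`: builds the decision tree. The tree has the following structure:  
![](../images/120201.png)   
Every decision node also gets an extra branch keyed by None. That branch holds only a leaf, and the leaf returns the majority label of all samples under its parent node
* `classify`: classifies new data with an already-built decision tree
 * The data to be predicted may well contain feature names or feature values that never appeared in the training samples, so the classifier must be able to handle that case
 * For simplicity, this example treats both unexpected and missing feature names/values as the value None, so the search continues down the None branch of the node currently being visited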
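
The None-branch fallback described above can be sketched as follows. This is only an illustration, not the code in `id3_tree.py`: the function name `classify_sketch`, the tuple layout `(feature_name, {value: subtree})` with boolean leaves, and the hand-written `example_tree` are all assumptions made for this sketch.

```python
def classify_sketch(tree, sample):
    """Walk an assumed (feature_name, {value: subtree}) tree down to a boolean leaf."""
    # A leaf is a plain True/False label.
    if isinstance(tree, bool):
        return tree

    feature_name, branches = tree
    # A missing feature name yields None, which selects the None branch.
    value = sample.get(feature_name)
    # A feature value never seen during training also falls back to None.
    if value not in branches:
        value = None
    return classify_sketch(branches[value], sample)


# Hypothetical hand-written tree, used only to exercise the sketch.
example_tree = (
    "level",
    {
        "Senior": ("tweets", {"no": False, "yes": True, None: False}),
        "Mid": True,
        None: True,
    },
)

print(classify_sketch(example_tree, {"level": "Senior", "tweets": "yes"}))
print(classify_sketch(example_tree, {"level": "Senior"}))   # missing 'tweets'
print(classify_sketch(example_tree, {"level": "Intern"}))   # unseen value
```

Treating "missing" and "unseen" identically keeps the lookup to a single code path, at the cost of not distinguishing the two situations.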

In [1]:
''' Classify with a custom-built ID3 decision tree '''

import id3_tree

inputs = [
({'level':'Senior', 'lang':'Java', 'tweets':'no', 'phd':'no'}, False),
({'level':'Senior', 'lang':'Java', 'tweets':'no', 'phd':'yes'}, False),
({'level':'Mid', 'lang':'Python', 'tweets':'no', 'phd':'no'}, True),
({'level':'Junior', 'lang':'Python', 'tweets':'no', 'phd':'no'}, True),
({'level':'Junior', 'lang':'R', 'tweets':'yes', 'phd':'no'}, True),
({'level':'Junior', 'lang':'R', 'tweets':'yes', 'phd':'yes'}, False),
({'level':'Mid', 'lang':'R', 'tweets':'yes', 'phd':'yes'}, True),
({'level':'Senior', 'lang':'Python', 'tweets':'no', 'phd':'no'}, False),
({'level':'Senior', 'lang':'R', 'tweets':'yes', 'phd':'no'}, True),
({'level':'Junior', 'lang':'Python', 'tweets':'yes', 'phd':'no'}, True),
({'level':'Senior', 'lang':'Python', 'tweets':'yes', 'phd':'yes'}, True),
({'level':'Mid', 'lang':'Python', 'tweets':'no', 'phd':'yes'}, True),
({'level':'Mid', 'lang':'Java', 'tweets':'yes', 'phd':'no'}, True),
({'level':'Junior', 'lang':'Python', 'tweets':'no', 'phd':'yes'}, False)
]

tree = id3_tree.build_tree_id3(inputs)
print("Constructed decision tree:")
print(tree)
print("Predictions:")
print(id3_tree.classify(tree, {"level": "Junior", "lang": "Java", "tweets": "yes", "phd": "no"}))   # True
print(id3_tree.classify(tree, {"level": "Junior", "lang": "Java", "tweets": "yes", "phd": "yes"}))  # False
print(id3_tree.classify(tree, {"level": "Intern"}))  # True: unseen value, falls back to the None branch
print(id3_tree.classify(tree, {"level": "Senior"}))  # False: missing 'tweets', falls back to the None branch

Constructed decision tree:
('level', {'Senior': ('tweets', {'no': False, 'yes': True, None: False}), 'Mid': True, 'Junior': ('phd', {'no': True, 'yes': False, None: True}), None: True})
Predictions:
True
False
True
False
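
To see why a tree like the one printed above would start with `level`, the sketch below computes the weighted entropy of each candidate split on the 14 training rows. Standard ID3 picks the feature with the highest information gain, which is the same as the lowest weighted entropy after the split. Whether `id3_tree.py` uses exactly this criterion is an assumption, and the helper names `entropy` and `partition_entropy` are made up for this sketch.

```python
import math
from collections import Counter, defaultdict

# The 14 training rows from the cell above, rewritten as
# (level, lang, tweets, phd, label) so the sketch is self-contained.
FEATURES = ("level", "lang", "tweets", "phd")
ROWS = [
    ("Senior", "Java",   "no",  "no",  False),
    ("Senior", "Java",   "no",  "yes", False),
    ("Mid",    "Python", "no",  "no",  True),
    ("Junior", "Python", "no",  "no",  True),
    ("Junior", "R",      "yes", "no",  True),
    ("Junior", "R",      "yes", "yes", False),
    ("Mid",    "R",      "yes", "yes", True),
    ("Senior", "Python", "no",  "no",  False),
    ("Senior", "R",      "yes", "no",  True),
    ("Junior", "Python", "yes", "no",  True),
    ("Senior", "Python", "yes", "yes", True),
    ("Mid",    "Python", "no",  "yes", True),
    ("Mid",    "Java",   "yes", "no",  True),
    ("Junior", "Python", "no",  "yes", False),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def partition_entropy(rows, feature_index):
    """Weighted average entropy of the label groups produced by one feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature_index]].append(row[-1])
    total = len(rows)
    return sum(len(g) / total * entropy(g) for g in groups.values())


scores = {name: partition_entropy(ROWS, i) for i, name in enumerate(FEATURES)}
best_feature = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("lowest partition entropy:", best_feature)
```

On this data `level` gives the lowest weighted entropy, which is consistent with `level` being the root of the printed tree.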


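### **Case 2: Classifying cars with the custom decision tree classifier and computing the accuracy**
* For the car classification training data, see the `car.csv` file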

In [2]:
''' Classify the car data with the custom ID3 decision tree '''
import numpy as np
import id3_tree

# Feature names, in the same order as the first six columns of car.csv
feature_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety']

inputs = []
with open("car.csv") as ifile:
    next(ifile, None)                   # skip the first line (header row)
    for line in ifile:
        tokens = line.strip().split(',')
        rowDict = dict(zip(feature_names, tokens))
        # The last column is the class label: 'unacc' maps to False, anything else to True
        inputs.append((rowDict, tokens[-1] != 'unacc'))

# Randomly assign each row to the training set (probability = ratio) or to the test set
train_inputs = []
test_inputs = []
datasets = train_inputs, test_inputs
ratio = 0.75
for row in inputs:
    dataset_index = 0 if np.random.random() < ratio else 1
    datasets[dataset_index].append(row)

tree = id3_tree.build_tree_id3(train_inputs)

correct_count = 0
for row in test_inputs:
    predict = id3_tree.classify(tree, row[0])
    if predict == row[1]:
        correct_count += 1
print("Prediction accuracy:", correct_count / len(test_inputs))

Prediction accuracy: 0.9462102689486552
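
Because the split above draws a fresh random number for every row without a fixed seed, the train/test sizes and the reported accuracy change from run to run. A minimal sketch of a reproducible alternative is shown below. It swaps in a different technique (a seeded shuffle followed by a cut) instead of the per-row draw used in the cell, and the helper names `split_data` and `accuracy` are hypothetical.

```python
import random


def split_data(rows, train_ratio, seed=0):
    """Shuffle a copy of rows with a fixed seed and cut it at train_ratio."""
    rng = random.Random(seed)       # private generator, global state untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, labeled_rows):
    """Fraction of (features, label) pairs for which predict(features) == label."""
    if not labeled_rows:
        raise ValueError("labeled_rows is empty, accuracy is undefined")
    correct = sum(1 for features, label in labeled_rows if predict(features) == label)
    return correct / len(labeled_rows)


# Toy usage with a stand-in predictor; with the real tree one would pass
# something like `lambda features: id3_tree.classify(tree, features)`.
toy_rows = [({"x": i}, i % 2 == 0) for i in range(100)]
train_rows, test_rows = split_data(toy_rows, 0.75)
print(len(train_rows), len(test_rows))
print(accuracy(lambda features: features["x"] % 2 == 0, test_rows))
```

The explicit empty-set check also avoids the division by zero that would occur if every row happened to land in the training set.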
