# AdaBoost
AdaBoost is short for *adaptive boosting*. It works as follows:

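Each sample in the training data is assigned a weight, and these weights form a vector $D$. Initially, all weights are set to the same value. A weak classifier is first trained on the training data and its error rate is computed; a weak classifier is then trained again on the same dataset. For this second round the sample weights are adjusted: samples that were classified correctly in the first round have their weights decreased, and samples that were misclassified in the first round have their weights increased.

To combine all the weak classifiers into a final classification, AdaBoost assigns each classifier a weight $\alpha$, computed from that classifier's error rate:

$$\alpha = \frac{1}{2} \ln\left(\frac{1 - \varepsilon}{\varepsilon}\right)$$

where $\varepsilon$ is the weak classifier's error rate, measured as the sum of the weights $D$ of the misclassified samples.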
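As a quick check of the formula, the sketch below evaluates $\alpha$ for a few error rates. The helper name `classifier_weight` and the `eps` guard are assumptions made for illustration; they are not part of the notebook's code.

```python
import numpy as np

def classifier_weight(error, eps=1e-16):
    """alpha = 0.5 * ln((1 - error) / error); eps avoids division by zero."""
    return 0.5 * np.log((1.0 - error) / max(error, eps))

# A lower error rate gives a larger vote; an error of 0.5 (random guessing) gives 0.
for error in (0.1, 0.2, 0.5):
    print(error, classifier_weight(error))
```

An error rate of 0.2 gives $\alpha = \frac{1}{2}\ln 4 \approx 0.693$, and an error rate above 0.5 would give a negative $\alpha$.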

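The sample weights are then updated using $\alpha$:
- If sample $i$ is classified correctly, its weight becomes $D_i^{(t+1)} = \frac{D_i^{(t)} e^{-\alpha}}{\mathrm{Sum}(D)}$
- If sample $i$ is misclassified, its weight becomes $D_i^{(t+1)} = \frac{D_i^{(t)} e^{\alpha}}{\mathrm{Sum}(D)}$

Here $\mathrm{Sum}(D)$ is the sum of the updated weights before normalization, so the new weights again sum to 1.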
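The update can be traced by hand on five samples. The sketch below is a minimal illustration using plain NumPy arrays; the `predictions` vector is an assumed stump output that gets only the first sample wrong, and `D_new` is a name introduced here. Because labels and predictions are both $\pm 1$, the product `labels * predictions` is $+1$ for a correct sample and $-1$ for a misclassified one, which selects $e^{-\alpha}$ or $e^{\alpha}$ in a single expression.

```python
import numpy as np

labels = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
predictions = np.array([-1.0, 1.0, -1.0, -1.0, 1.0])  # assumed: only sample 0 is wrong
D = np.full(5, 0.2)  # equal initial weights

error = D[predictions != labels].sum()       # weighted error rate
alpha = 0.5 * np.log((1.0 - error) / error)  # classifier weight

# exp(-alpha) for correct samples, exp(+alpha) for misclassified ones
D_new = D * np.exp(-alpha * labels * predictions)
D_new /= D_new.sum()                         # normalize so the weights sum to 1
print(D_new)
```

After the update the misclassified sample carries weight 0.5 and each correct sample carries 0.125, so the next weak classifier is pushed to get the first sample right.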

In [1]:
import numpy as np

In [2]:
def load_dataset():
    dataset = np.matrix([[1., 2.1],
                        [2., 1.1],
                        [1.3, 1.],
                        [1., 1.],
                        [2., 1.]])
    labels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return dataset, labels

In [3]:
dataset, labels = load_dataset()

In [4]:
dataset

matrix([[ 1. ,  2.1],
        [ 2. ,  1.1],
        [ 1.3,  1. ],
        [ 1. ,  1. ],
        [ 2. ,  1. ]])

In [5]:
labels

[1.0, 1.0, -1.0, -1.0, 1.0]

## Decision stump
### Pseudocode
```
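Set the minimum error rate to +infinity
For each feature in the dataset (first loop):
    For each step (second loop):
        For each inequality (third loop):
            Build a decision stump and test it on the weighted data
            If its error rate is lower than the minimum error rate, make this stump the best stump
Return the best decision stump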
```
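A single pass through the innermost loop can be sketched on its own. The example below is a minimal illustration with plain NumPy arrays rather than the notebook's matrices; `weighted_error` is a hypothetical helper, and the toy data is repeated so the block runs by itself.

```python
import numpy as np

X = np.array([[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
D = np.full(len(y), 1.0 / len(y))  # uniform sample weights

def weighted_error(feature, threshold, inequal):
    """Weighted error of one stump candidate (hypothetical helper)."""
    column = X[:, feature]
    if inequal == 'leqslant':
        predictions = np.where(column <= threshold, -1.0, 1.0)
    else:
        predictions = np.where(column > threshold, -1.0, 1.0)
    # sum the weights of the misclassified samples
    return float(D @ (predictions != y).astype(float))

print(weighted_error(0, 1.3, 'leqslant'))
print(weighted_error(0, 1.3, 'geqslant'))
```

With uniform weights the two candidates score about 0.2 and 0.8, the values the training log reports for feature 0 at threshold 1.30. Flipping the inequality flips every prediction, so the two errors always sum to 1.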

In [6]:
def stump_classify(data_matrix, feature, threshold, inequal):
    """Predict -1.0 for samples on the selected side of the threshold, +1.0 for the rest."""
    result = np.ones((data_matrix.shape[0], 1))
    if inequal == 'leqslant':
        result[data_matrix[:, feature] <= threshold] = -1.0
    else:
        result[data_matrix[:, feature] > threshold] = -1.0
    return result

In [7]:
def build_stump(dataset, labels, D):
    """
    Find the decision stump with the lowest weighted error.

    :param D: sample weights, an (n_samples, 1) matrix
    :return: (best_stump, min_error, result); result holds the best stump's predictions
    """
    data_matrix = np.asmatrix(dataset)  # (5, 2)
    label_matrix = np.asmatrix(labels).T  # (5, 1)
    n_samples, n_features = data_matrix.shape
    num_steps = 10.0  # number of threshold steps per feature
    best_stump = {}  # best decision stump found so far
    result = np.asmatrix(np.zeros((n_samples, 1)))  # (5, 1)
    min_error = np.inf  # init error sum, to +infinity

    for i in range(n_features):  # loop over all dimensions
        range_min = data_matrix[:, i].min()
        range_max = data_matrix[:, i].max()
        step_size = (range_max - range_min) / num_steps

        # thresholds run from one step below the minimum up to the maximum
        for j in range(-1, int(num_steps) + 1):
            for inequal in ['leqslant', 'geqslant']:  # try both inequality directions
                threshold = (range_min + float(j) * step_size)
                # classify with feature i, this threshold, and this inequality
                predictions = stump_classify(data_matrix, i, threshold, inequal)
                errors = np.asmatrix(np.ones((n_samples, 1)))  # (5, 1)
                errors[predictions == label_matrix] = 0
                # total error weighted by D, taken out of the 1x1 matrix as a float
                weighted_error = float((D.T * errors)[0, 0])
                print("split: feature %d, threshold %.2f, ineqal: %s, the weighted error is %.3f"
                      % (i, threshold, inequal, weighted_error))

                if weighted_error < min_error:
                    min_error = weighted_error
                    result = predictions.copy()
                    best_stump['feature'] = i
                    best_stump['threshold'] = threshold
                    best_stump['inequal'] = inequal
    return best_stump, min_error, result

## Implementing the full AdaBoost algorithm
```
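For each iteration:
    Use build_stump() to find the best decision stump
    Add the best decision stump to the list of stumps
    Compute alpha
    Compute the new weight vector D
    Update the aggregate class estimate
    If the error rate equals 0.0, exit the loop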
```

In [8]:
def adaboost_train(dataset, labels, iters=40):
    stumps = []
    n_samples = dataset.shape[0]
    label_matrix = np.asmatrix(labels).T  # (n_samples, 1)
    D = np.asmatrix(np.ones((n_samples, 1)) / n_samples)  # init D to all equal
    agg_result = np.asmatrix(np.zeros((n_samples, 1)))

    for i in range(iters):
        best_stump, error, result = build_stump(dataset, labels, D)  # build stump
        error = float(np.asarray(error).item())  # accept a float or a 1x1 matrix
        print("D:", D.T)
        # calc alpha; max(error, 1e-16) avoids division by zero when error=0
        alpha = float(0.5 * np.log((1.0 - error) / max(error, 1e-16)))
        best_stump['alpha'] = alpha
        stumps.append(best_stump)  # store stump params in the list
        print("result: ", result.T)
        # exponent is -alpha for correctly classified samples, +alpha for misclassified ones
        expon = np.multiply(-1 * alpha * label_matrix, result)
        D = np.multiply(D, np.exp(expon))  # new D for the next iteration
        D = D / D.sum()
        # training error of the combined classifier; stop early once it reaches 0
        agg_result += alpha * result
        print("agg_result: ", agg_result.T)
        agg_error = np.multiply(np.sign(agg_result) != label_matrix, np.ones((n_samples, 1)))
        error_rate = agg_error.sum() / n_samples
        print("total error: ", error_rate)
        if error_rate == 0.0:
            break
    return stumps, agg_result

In [9]:
stumps, agg_result = adaboost_train(dataset, labels)

split: feature 0, threshold 0.90, ineqal: leqslant, the weighted error is 0.400
split: feature 0, threshold 0.90, ineqal: geqslant, the weighted error is 0.600
split: feature 0, threshold 1.00, ineqal: leqslant, the weighted error is 0.400
split: feature 0, threshold 1.00, ineqal: geqslant, the weighted error is 0.600
split: feature 0, threshold 1.10, ineqal: leqslant, the weighted error is 0.400
split: feature 0, threshold 1.10, ineqal: geqslant, the weighted error is 0.600
split: feature 0, threshold 1.20, ineqal: leqslant, the weighted error is 0.400
split: feature 0, threshold 1.20, ineqal: geqslant, the weighted error is 0.600
split: feature 0, threshold 1.30, ineqal: leqslant, the weighted error is 0.200
split: feature 0, threshold 1.30, ineqal: geqslant, the weighted error is 0.800
split: feature 0, threshold 1.40, ineqal: leqslant, the weighted error is 0.200
split: feature 0, threshold 1.40, ineqal: geqslant, the weighted error is 0.800
split: feature 0, threshold 1.50, ineqal

In [10]:
stumps

[{'alpha': 0.6931471805599453,
  'feature': 0,
  'inequal': 'leqslant',
  'threshold': 1.3},
 {'alpha': 0.9729550745276565,
  'feature': 1,
  'inequal': 'leqslant',
  'threshold': 1.0},
 {'alpha': 0.8958797346140273,
  'feature': 0,
  'inequal': 'leqslant',
  'threshold': 0.90000000000000002}]

In [11]:
agg_result

matrix([[ 1.17568763],
        [ 2.56198199],
        [-0.77022252],
        [-0.77022252],
        [ 0.61607184]])

In [12]:
def ada_test(test_data, stumps):
    """Classify test_data with the weighted vote of all trained stumps."""
    data_matrix = np.asmatrix(test_data)
    n_samples = data_matrix.shape[0]
    agg_result = np.asmatrix(np.zeros((n_samples, 1)))
    for i in range(len(stumps)):
        result = stump_classify(data_matrix,
                                stumps[i]['feature'],
                                stumps[i]['threshold'],
                                stumps[i]['inequal'])
        agg_result += stumps[i]['alpha'] * result  # add this stump's weighted vote
        print(agg_result)  # running total after each stump
    return np.sign(agg_result)

In [13]:
ada_test([0., 0.], stumps)

[[-0.69314718]]
[[-1.66610226]]
[[-2.56198199]]


matrix([[-1.]])
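The same weighted vote can be applied to several points at once. The sketch below is a minimal illustration using plain NumPy arrays; `ensemble_score` is a hypothetical helper, not a function from the notebook. It copies the three stumps printed in `In [10]` (with the last threshold written as 0.9) and repeats the toy data so the block runs by itself.

```python
import numpy as np

# Stump parameters copied from the trained ensemble shown above.
stumps = [
    {'alpha': 0.6931471805599453, 'feature': 0, 'inequal': 'leqslant', 'threshold': 1.3},
    {'alpha': 0.9729550745276565, 'feature': 1, 'inequal': 'leqslant', 'threshold': 1.0},
    {'alpha': 0.8958797346140273, 'feature': 0, 'inequal': 'leqslant', 'threshold': 0.9},
]

def ensemble_score(X, stumps):
    """Sum of alpha-weighted stump votes for each row of X (hypothetical helper)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    score = np.zeros(X.shape[0])
    for stump in stumps:
        column = X[:, stump['feature']]
        if stump['inequal'] == 'leqslant':
            vote = np.where(column <= stump['threshold'], -1.0, 1.0)
        else:
            vote = np.where(column > stump['threshold'], -1.0, 1.0)
        score += stump['alpha'] * vote
    return score

X = np.array([[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]])
scores = ensemble_score(X, stumps)
print(scores)
print(np.sign(scores))  # predicted labels
```

On the training data the scores reproduce `agg_result` from `In [11]`, and their signs match `labels`. The magnitude of a score indicates how strongly the stumps agree.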