# AdaBoost

Input: Training dataset $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $N$ is the number of training samples and $y_i \in \{-1, +1\}$.

Output: The final classifier $G(x) = \text{sign}(\alpha_1 G_1(x) + \alpha_2 G_2(x) + \ldots + \alpha_m G_m(x))$, i.e. the sign of a weighted linear combination of $m$ base classifiers $G_k(x)$.

## AdaBoost Algorithm
### 1. Train
1. Initialize the weights for all samples:
$$D_1 = (w_{1, 1}, w_{1, 2}, \ldots, w_{1, N}), w_{1, i} = \frac{1}{N}, i = 1, 2, \ldots, N$$
2. For $k = 1, 2, \ldots, m$:
    1. Choose the base learner $G_k(x)$ according to some rule, typically the one with the smallest weighted error $e_k$ (defined in the next step). For example, for a threshold classifier, try every candidate threshold and keep the one with the smallest $e_k$.
    2. Compute the weighted classification error of $G_k(x)$ on the training set, where $I(\cdot)$ is the indicator function:
    $$ e_k = \sum_{i=1}^N w_{k, i} \, I(G_k(x_i) \neq y_i) $$
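    3. Compute the weight of $G_k(x)$ in the final classifier; $\alpha_k > 0$ whenever $e_k < \frac{1}{2}$, and it grows as $e_k$ shrinks:
    $$\alpha_k = \frac{1}{2} \ln{\frac{1 - e_k}{e_k}}$$
    4. Update the sample weights so that misclassified points gain weight relative to correctly classified ones:
    $$D_{k+1} = (w_{k+1, 1}, w_{k+1, 2}, \ldots, w_{k+1, N})$$
    $$w_{k+1, i} = \frac{w_{k, i} e^{-\alpha_k y_i G_k(x_i)}}{Z_k} $$
    where $Z_k$ is the normalization factor:
    $$Z_k = \sum_{i=1}^N{w_{k, i} e^{-\alpha_k y_i G_k(x_i)}}$$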
    
3. Combine the base classifiers into the final classifier:
$$ G(x) = \text{sign}( \alpha_1 G_1(x) + \alpha_2 G_2(x) + \ldots + \alpha_m G_m(x) )$$
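The training steps above can be sketched in NumPy for one-dimensional inputs. This is a minimal sketch, not the book's reference implementation: the base learner is assumed to be a decision stump whose candidate thresholds are the midpoints between sorted feature values, the toy inputs are assumed to be $x = 0, 1, \ldots, 9$ with the labels used in the book example below, and the helper names `fit_stump`, `stump_predict`, `adaboost_fit` and `adaboost_predict` are hypothetical. Clipping $e_k$ away from 0 and 1 is an added safeguard so that $\alpha_k$ stays finite.

```python
import numpy as np


def stump_predict(x, threshold, direction):
    # direction = +1: predict +1 when x < threshold, else -1
    # direction = -1: predict -1 when x < threshold, else +1
    return np.where(x < threshold, direction, -direction)


def fit_stump(x, y, w):
    """Steps 2.1/2.2: pick the stump with the smallest weighted error e_k."""
    values = np.unique(x)
    # candidate thresholds: midpoints between neighbours plus one beyond each end
    thresholds = np.concatenate(([values[0] - 0.5],
                                 (values[:-1] + values[1:]) / 2,
                                 [values[-1] + 0.5]))
    best = (np.inf, None, None)
    for direction in (1, -1):
        for threshold in thresholds:
            error = w[stump_predict(x, threshold, direction) != y].sum()
            if error < best[0]:  # strict '<' keeps the first stump on ties
                best = (error, threshold, direction)
    return best


def adaboost_fit(x, y, m):
    n = len(x)
    w = np.full(n, 1.0 / n)  # step 1: D_1 is uniform
    stumps, alphas = [], []
    for _ in range(m):
        e, threshold, direction = fit_stump(x, y, w)
        e = np.clip(e, 1e-10, 1 - 1e-10)  # added safeguard against e_k = 0 or 1
        alpha = 0.5 * np.log((1 - e) / e)  # step 2.3
        pred = stump_predict(x, threshold, direction)
        w = w * np.exp(-alpha * y * pred)  # step 2.4: numerator of w_{k+1,i}
        w = w / w.sum()  # divide by Z_k
        stumps.append((threshold, direction))
        alphas.append(alpha)
    return stumps, np.array(alphas)


def adaboost_predict(x, stumps, alphas):
    # step 3: sign of the alpha-weighted vote
    f = sum(a * stump_predict(x, t, d) for (t, d), a in zip(stumps, alphas))
    return np.sign(f)


# assumed toy data: x = 0, 1, ..., 9
x = np.arange(10)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
stumps, alphas = adaboost_fit(x, y, m=3)
print(stumps)
print(alphas)
print(adaboost_predict(x, stumps, alphas))
```

With three rounds this should recover the stumps $x < 2.5$, $x < 8.5$ and $x > 5.5$ and classify all ten training points correctly. In round 1 two stumps tie at $e_1 = 0.3$; the strict comparison keeps the first one found.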


### 2. Predict
$$ G(x) = \text{sign}( \alpha_1 G_1(x) + \alpha_2 G_2(x) + \ldots + \alpha_m G_m(x) )$$
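Prediction is a weighted vote, so one confident base classifier can outvote several weaker ones. A small sketch; the function name `combine` and the numbers are made up for illustration:

```python
import numpy as np


def combine(alphas, base_predictions):
    """G(x) = sign(sum_k alpha_k * G_k(x)).

    alphas: (m,) classifier weights
    base_predictions: (m, n) array, row k holds G_k(x) in {-1, +1} for n samples
    """
    f = np.asarray(alphas) @ np.asarray(base_predictions)
    return np.sign(f)  # np.sign returns 0 if f is exactly 0


alphas = [0.2, 0.3, 0.9]
base_predictions = [[1, 1],
                    [1, -1],
                    [-1, -1]]
print(combine(alphas, base_predictions))
```

For the first sample two of the three classifiers vote $+1$, yet the weighted sum is $0.2 + 0.3 - 0.9 = -0.4 < 0$, so $G(x) = -1$.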

## Properties
1. The sample weights sum to 1 in every round:
$$\sum_{i=1}^N{w_{k, i}} = 1, \quad k=1,2, \ldots, m$$
2. The classifier weights are not normalized, so in general they do not sum to 1:
$$\sum_{k=1}^m{\alpha_k} \neq 1 \quad \text{(in general)}$$
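The first statement follows from the initialization and the normalization by $Z_k$, by induction on $k$:

$$\sum_{i=1}^N w_{1,i} = N \cdot \frac{1}{N} = 1, \qquad \sum_{i=1}^N w_{k+1,i} = \frac{1}{Z_k} \sum_{i=1}^N w_{k,i} e^{-\alpha_k y_i G_k(x_i)} = \frac{Z_k}{Z_k} = 1.$$

The second holds because no such normalization is applied to the $\alpha_k$: each one is computed from its own $e_k$ alone.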


## Code

### 1. Example from Li Hang's Book

In [0]:
import math
import numpy as np

def get_d(d, gm_x, y, alpha):
    '''
        d: (N,), (w_1, w_2, ..., w_N)
        alpha: scalar
        y: (N, ), true labels
        gm_x: (N, ), predicted labels
    '''
    z = np.sum(d * np.exp(-alpha * y * gm_x))
    new_d = d / z * np.exp(-alpha * y * gm_x)
    return new_d

def get_alpha(e):
    '''
        e: scalar, weighted error of the base classifier
    '''
    alpha = (1.0 / 2) * math.log((1 - e) / e) # 1.0 / 2 avoids integer division (1 / 2 == 0 in Python 2)
    return alpha

def update_d(d, gm_x, y):
    '''
        d: weight of samples. (w_1, w_2, ..., w_N), (N,)
        y: true labels. (N,)
        gm_x: predicted labels (N,)
    '''
    e = np.sum(d[y != gm_x]) # weighted error: total weight of the misclassified samples
    alpha = get_alpha(e)
    return get_d(d, gm_x, y, alpha)

In [2]:
d1 = np.array([0.1] * 10)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
g1_x = np.array([1] * 3 + [-1] * 7)

d2 = update_d(d1, g1_x, y)
print(d2)

[0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857
 0.16666667 0.16666667 0.16666667 0.07142857]


In [3]:
g2_x = np.array([1] * 9 + [-1])

d3 = update_d(d2, g2_x, y)
print(d3)

[0.04545455 0.04545455 0.04545455 0.16666667 0.16666667 0.16666667
 0.10606061 0.10606061 0.10606061 0.04545455]


In [4]:
g3_x = np.array([-1] * 6 + [1] * 4) # G_3 predicts -1 for the first 6 points and +1 for the last 4

d4 = update_d(d3, g3_x, y)
print(d4)

[0.125      0.125      0.125      0.10185185 0.10185185 0.10185185
 0.06481481 0.06481481 0.06481481 0.125     ]
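The three `update_d` calls above only update the sample weights. As a sketch of how the example finishes, the snippet below recomputes $\alpha_1, \alpha_2, \alpha_3$ from the same predictions (with $G_3$ predicting $+1$ for the last four points) and checks that the combined classifier $G(x)$ fits all ten training labels. It is self-contained and repeats the weight update inline instead of calling `update_d`; the list name `g_x` is illustrative.

```python
import math

import numpy as np

y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
g_x = [np.array([1] * 3 + [-1] * 7),  # G_1: +1 for the first 3 points
       np.array([1] * 9 + [-1]),      # G_2: +1 for the first 9 points
       np.array([-1] * 6 + [1] * 4)]  # G_3: +1 for the last 4 points

d = np.full(len(y), 1.0 / len(y))  # D_1
alphas = []
for gm_x in g_x:
    e = np.sum(d[gm_x != y])             # weighted error e_k
    alpha = 0.5 * math.log((1 - e) / e)  # alpha_k
    d = d * np.exp(-alpha * y * gm_x)    # unnormalized w_{k+1,i}
    d = d / np.sum(d)                    # divide by Z_k
    alphas.append(alpha)

f = sum(a * g for a, g in zip(alphas, g_x))  # alpha_1 G_1 + alpha_2 G_2 + alpha_3 G_3
G = np.sign(f)
print(np.round(alphas, 4))
print('training errors:', int(np.sum(G != y)))
```

The weights $\alpha_k$ should come out near 0.4236, 0.6496 and 0.7520, the final `d` should equal the `d4` printed above, and the training error count should be 0.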


### 2. MNIST
Reference: [lihang_book_algorithm](https://github.com/WenDesi/lihang_book_algorithm/blob/master/AdaBoost/adaboost.py)

#### 1. Import packages

In [0]:
import math
import numpy as np
import pandas as pd
import cv2
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

#### 2. Prepare data

In [0]:
def binarization(img):
    bin_img = img.astype(np.uint8)
    cv2.threshold(bin_img, 50, 1, cv2.THRESH_BINARY_INV, bin_img) # pixel = 0 if value > 50 else 1
    return bin_img

raw_data = pd.read_csv('/content/gdrive/My Drive/data/train_binary.csv', header=0) # binary classification
data = raw_data.values
imgs = data[:, 1:] # each row: the label in the first column, then the 784 pixel values
labels = data[:, 0]

# reduce the size of training dataset to train faster
imgs = imgs[:1000] # (1000, 784)
labels = labels[:1000] # (1000,)

# binarization
for index, img in enumerate(imgs):
    imgs[index] = binarization(img)

# map labels {0, 1} to {-1, +1}
labels = 2 * labels.astype(int) - 1

# hold out 33% of the samples for testing and train on the remaining 67%
x_train, x_test, y_train, y_test = train_test_split(imgs, labels, test_size=0.33, random_state=23323)
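The per-image `cv2.threshold` loop and the `map`-based label conversion can also be written as whole-array NumPy operations. A minimal sketch, assuming pixel values in the uint8 range: with `cv2.THRESH_BINARY_INV` and `maxval = 1`, a pixel becomes 0 if it is greater than the threshold and 1 otherwise, which is the same as `pixel <= 50`. The helper names `binarize_all` and `to_plus_minus_one` are hypothetical.

```python
import numpy as np


def binarize_all(imgs, thresh=50):
    # same rule as cv2.THRESH_BINARY_INV with maxval=1: 0 if pixel > thresh else 1
    return (np.asarray(imgs) <= thresh).astype(np.uint8)


def to_plus_minus_one(labels):
    # map {0, 1} labels to {-1, +1}
    return 2 * np.asarray(labels).astype(int) - 1


imgs = np.array([[0, 50, 51, 255],
                 [10, 200, 49, 50]])
print(binarize_all(imgs))
print(to_plus_minus_one([0, 1, 1, 0]))
```

Applied to the full `(1000, 784)` pixel array, `imgs = binarize_all(imgs)` should replace the per-row loop without needing OpenCV.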

#### 3. Build model

In [0]:
class BaseClassifier(object): # decision stump on one feature: +1 if (x < threshold or x > threshold, per self.sign) else -1
    def fit(self, features, labels, d):
        '''
            features: (B, )
            labels: (B, ), {-1, +1}
            d: (w1, w2, ..., wN)
        '''
        # select the best sign and threshold
        best_error = math.inf
        for sign in ["<", ">"]:
            for threshold in [-0.5, 0.5, 1.5]:
                self.sign = sign
                self.threshold = threshold
                y_predicted = self.predict(features)
                error = np.sum(d[y_predicted != labels])
                if error < best_error: # store current model
                    best_error = error
                    best_sign = sign
                    best_threshold = threshold
                    
        self.sign = best_sign
        self.threshold = best_threshold
        return best_error      
      
    def predict(self, features):
        '''
            features: (B,)
        '''
        y_predicted = list()
        for feature in features:
            if (self.sign == '<' and feature < self.threshold) or (self.sign == '>' and feature > self.threshold):
                y_predicted.append(1)
            else:
                y_predicted.append(-1)
        return np.array(y_predicted)
    def __str__(self):
        return '1 if x {} {} else 0'.format(self.sign, self.threshold)
              
class AdaBoost(object):
    def __init__(self, m=20):
        self.m = m # number of base classifiers

    def _get_alpha(self, e):
        '''
            e: scalar
        '''
        alpha = (1.0 / 2) * math.log((1 - e) / e) # 1.0 / 2 avoids integer division (1 / 2 == 0 in Python 2)
        return alpha
      
      
    def _get_d(self, d, gm_x, y, alpha):
        '''
            d: (N,), (w_1, w_2, ..., w_N)
            alpha: scalar
            y: (N, ), true labels
            gm_x: (N, ), predicted labels
        '''
        z = np.sum(d * np.exp(-alpha * y * gm_x))
        new_d = d / z * np.exp(-alpha * y * gm_x)
        return new_d
          
      
    def fit(self, features, labels):
        '''
            features: (B, feature_size)
        '''
        self.alphas = list()
        self.i_classifiers = list() # each entry is (feature index, BaseClassifier fitted on that feature)
        
        
        print('N:', features.shape[0])
        d = np.array([1 / features.shape[0]] * features.shape[0])
        
        # iterate m classifiers
        for i in range(self.m):
            error, index, classifier = self._find_classifier(features, labels, d) # (index of feature, Sign classifier)
            
            print('k={} || error:{}'.format(i, error))
            print('k={} || classifier:{}'.format(i, classifier))
            print('k={} || index:{}'.format(i, index))
            
            # calculate weight of current classifier (alpha)
            alpha = self._get_alpha(error)
            print('k={} || alpha:{}'.format(i, alpha))
            
            # update weights of samples (d)
            d = self._get_d(d, classifier.predict(features[:, index]), labels, alpha)
            
            self.alphas.append(alpha)
            self.i_classifiers.append((index, classifier))
            
        self.alphas = np.array(self.alphas)
        self.i_classifiers = np.array(self.i_classifiers)
            
    def predict(self, features):
        y_predicted = np.zeros(features.shape[0])
        for i in range(self.m):
            alpha  = self.alphas[i]
            (index, classifier) = self.i_classifiers[i]
            y_predicted += alpha * classifier.predict(features[:, index])
        return np.sign(y_predicted)
      
      
    def _find_classifier(self, features, labels, d):
        best_error = math.inf
        for index in range(features.shape[1]):
            classifier = BaseClassifier()
            error = classifier.fit(features[:, index], labels, d)
            if error < best_error:
                best_error = error
                best_classifier = classifier
                best_index = index
        return best_error, best_index, best_classifier
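`BaseClassifier.predict` builds its output one sample at a time in a Python loop, and `_find_classifier` triggers 6 such calls (2 signs × 3 thresholds) for every one of the 784 pixel columns in each round. A vectorized sketch of the same stump follows; it keeps the original search order and tie-breaking, and the function names `stump_predict` and `fit_stump` are hypothetical.

```python
import numpy as np


def stump_predict(features, sign, threshold):
    # vectorized equivalent of BaseClassifier.predict: +1 where the condition holds, else -1
    features = np.asarray(features)
    hit = features < threshold if sign == '<' else features > threshold
    return np.where(hit, 1, -1)


def fit_stump(features, labels, d, thresholds=(-0.5, 0.5, 1.5)):
    # same search order as BaseClassifier.fit: sign first, then threshold; keep the first best
    best_error, best_sign, best_threshold = np.inf, None, None
    for sign in ('<', '>'):
        for threshold in thresholds:
            error = np.sum(d[stump_predict(features, sign, threshold) != labels])
            if error < best_error:
                best_error, best_sign, best_threshold = error, sign, threshold
    return best_error, best_sign, best_threshold


features = np.array([0, 1, 1, 0])
labels = np.array([-1, 1, 1, -1])
d = np.full(4, 0.25)
print(fit_stump(features, labels, d))
```

On this toy binary feature the best stump is `x > 0.5` with zero weighted error.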

#### 4. Train and Test

In [8]:
ab = AdaBoost()
ab.fit(x_train, y_train)
y_predicted = ab.predict(x_test)
score = accuracy_score(y_test, y_predicted) # signature is accuracy_score(y_true, y_pred)
print(score)

N: 670
k=0 || error:0.08656716417910448
k=0 || classifier:1 if x > 0.5 else 0
k=0 || index:359
k=0 || alpha:1.1781446359829533
k=1 || error:0.14550935316655395
k=1 || classifier:1 if x < 0.5 else 0
k=1 || index:435
k=1 || alpha:0.885132594620742
k=2 || error:0.18600766381303824
k=2 || classifier:1 if x > 0.5 else 0
k=2 || index:441
k=2 || alpha:0.7380815373971066
k=3 || error:0.253161490551473
k=3 || classifier:1 if x > 0.5 else 0
k=3 || index:511
k=3 || alpha:0.5409106943050461
k=4 || error:0.2891682607321551
k=4 || classifier:1 if x < 1.5 else 0
k=4 || index:0
k=4 || alpha:0.4497135062042572
k=5 || error:0.20771102822392118
k=5 || classifier:1 if x < 0.5 else 0
k=5 || index:380
k=5 || alpha:0.6693891811298859
k=6 || error:0.2881779709044539
k=6 || classifier:1 if x > 0.5 else 0
k=6 || index:656
k=6 || alpha:0.45212483861074665
k=7 || error:0.2579865586070224
k=7 || classifier:1 if x > 0.5 else 0
k=7 || index:397
k=7 || alpha:0.528229936473063
k=8 || error:0.25066544882409225
k=8 || c