***
## AdaBoost
***

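AdaBoost combines multiple weak classifiers into one strong classifier. The weak classifiers are trained one after another, and each one puts more emphasis on the samples that the previous ones misclassified.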

#### Weak Learner (Decision Stump)

A decision tree with only one split: it looks at a single feature of the sample and compares it against a single threshold.
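As a rough illustration of such a stump, the sketch below labels four toy samples using one feature and one threshold. The helper name `stump_predict`, the toy matrix, and the use of `>=` for the flipped polarity (so that the two polarities are exact complements) are assumptions made for this example.

```python
import numpy as np

def stump_predict(X, feature_idx, threshold, polarity=1):
    """Label samples -1 or +1 by comparing one feature against one threshold."""
    column = X[:, feature_idx]
    predictions = np.ones(X.shape[0])
    if polarity == 1:
        predictions[column < threshold] = -1
    else:
        # Assumption of this sketch: exact complement of the polarity-1 rule
        predictions[column >= threshold] = -1
    return predictions

# Toy data: 4 samples, 2 features (values are made up)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

print(stump_predict(X, feature_idx=0, threshold=3.0).tolist())
# [-1.0, -1.0, 1.0, 1.0]
print(stump_predict(X, feature_idx=0, threshold=3.0, polarity=-1).tolist())
# [1.0, 1.0, -1.0, -1.0]
```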

#### Error

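In the first iteration all samples have the same weight, so the error is the fraction of misclassified samples:
$$\epsilon_1 = \frac{\text{misclassifications}}{\text{samples}} = \frac{\text{misclassifications}}{N}$$

In general, the error is the sum of the weights of the misclassified samples:
$$\epsilon_t = \sum_{\text{miss}} \text{weights}$$

If $\epsilon_t > 0.5$, just flip the decision of the classifier; the error then becomes $1 - \epsilon_t$.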
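A minimal sketch of the weighted error and the flip rule, assuming labels in $\{-1, +1\}$. The function name `weighted_error` and the toy arrays are made up for illustration.

```python
import numpy as np

def weighted_error(y_true, y_pred, weights):
    """Sum the weights of the misclassified samples."""
    return float(np.sum(weights[y_true != y_pred]))

y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, 1, 1])   # samples 1 and 3 are wrong
weights = np.full(5, 1 / 5)            # uniform weights, as in the first iteration

# With uniform weights this equals misclassifications / N = 2 / 5
print(weighted_error(y_true, y_pred, weights))

# A classifier that is wrong on 4 of 5 samples becomes useful once flipped
bad_pred = -y_true.copy()
bad_pred[0] = y_true[0]                # correct on the first sample only
error = weighted_error(y_true, bad_pred, weights)   # about 0.8
polarity = 1
if error > 0.5:
    error = 1 - error                  # about 0.2 after flipping
    polarity = -1
print(error, polarity)
```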

#### Weights

Initial weights are $w_0 = \frac{1}{N}$ for each sample

Update rule:

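$$w = \frac{w \cdot \exp \left(- \alpha_t \cdot y \cdot h_t(X) \right)}{\operatorname{sum}(w)}$$

where $h_t(X) \in \{-1, +1\}$ is the prediction of classifier $t$, so the product $y \cdot h_t(X)$ is 1 for a correct classification and -1 for a misclassification. The denominator is the sum of the updated weights, which normalizes them to sum to 1.

Misclassified samples therefore get larger weights and have a higher impact on the next classifier.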
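The update can be checked on a small example. The sketch below assumes five samples with uniform weights and a classifier that gets only the last one wrong; `update_weights` is a hypothetical helper name, and `alpha` uses the formula from the Performance section with an error of 0.2.

```python
import numpy as np

def update_weights(weights, alpha, y_true, y_pred):
    """Scale each weight by exp(-alpha * y * h(x)), then renormalize to sum to 1."""
    new_weights = weights * np.exp(-alpha * y_true * y_pred)
    return new_weights / np.sum(new_weights)

y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, 1, -1, -1, -1])    # only the last sample is misclassified
weights = np.full(5, 0.2)
alpha = 0.5 * np.log((1 - 0.2) / 0.2)    # error = 0.2, so alpha = 0.5 * ln(4)

new_weights = update_weights(weights, alpha, y_true, y_pred)
print(new_weights)   # approximately 0.125 for the correct samples, 0.5 for the wrong one
```

In this example the single misclassified sample ends up carrying half of the total weight, which is why the next classifier is pushed to get it right.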

#### Performance

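$$\alpha_t = 0.5 \cdot \log \left( \frac{1 - \epsilon_t}{\epsilon_t} \right)$$

$\alpha_t$ is the weight of classifier $t$ in the final vote. It grows as the error $\epsilon_t$ shrinks and is 0 for $\epsilon_t = 0.5$.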
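To get a feel for the formula, the following sketch evaluates it for a few error values. The function name `stump_alpha` is an assumption; the small `eps` term mirrors the `EPS` guard used in the implementation further down and only matters when the error is exactly 0.

```python
import numpy as np

def stump_alpha(error, eps=1e-10):
    """alpha = 0.5 * log((1 - error) / error); eps guards against division by zero."""
    return 0.5 * np.log((1.0 - error + eps) / (error + eps))

# alpha shrinks toward 0 as the error approaches 0.5 (random guessing)
for error in (0.01, 0.1, 0.3, 0.5):
    print(f"error={error:.2f} -> alpha={stump_alpha(error):.3f}")
```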

#### Prediction

$$y = \operatorname{sign} \left( \sum_{t=1}^{T} \alpha_t \cdot h_t(X) \right)$$
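The weighted vote can be sketched with made-up numbers: three hypothetical stumps, their `alphas`, and their predictions on four samples (none of these values come from a trained model).

```python
import numpy as np

# Rows = stumps, columns = samples; entries are each stump's prediction
stump_preds = np.array([[1,  1, -1, -1],
                        [1, -1,  1, -1],
                        [1, -1,  1,  1]])
alphas = np.array([0.9, 0.4, 0.3])

scores = alphas @ stump_preds    # weighted sum of votes per sample
y_pred = np.sign(scores)
print(y_pred.tolist())           # [1.0, 1.0, -1.0, -1.0]
```

For the second and third samples the first stump is outnumbered two to one, but its larger $\alpha$ (0.9 versus 0.4 + 0.3) still decides the sign.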

#### Training

Initialize the weight of each sample to $\frac{1}{N}$.

for $t$ in $1, \dots, T$:

- Train a weak classifier (greedy search to find the best feature and threshold)
- Calculate the error $\epsilon_t = \sum_{miss} weights$
    - flip the error and the decision if $\epsilon_t > 0.5$
- Calculate $\alpha_t$
- Update the weights $w$
***

In [1]:
import numpy as np

# Decision stump used as weak classifier
class DecisionStump():
    def __init__(self):
        self.polarity = 1
        self.feature_idx = None
        self.threshold = None
        self.alpha = None

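    def predict(self, X):
        n_samples = X.shape[0]
        X_column = X[:, self.feature_idx]
        # Start with all +1, then mark one side of the threshold as -1
        predictions = np.ones(n_samples)
        if self.polarity == 1:
            predictions[X_column < self.threshold] = -1
        else:
            # Flipped rule; samples exactly at the threshold stay +1, so this
            # is not the exact complement of the polarity-1 branch
            predictions[X_column > self.threshold] = -1

        return predictions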


class Adaboost():

    def __init__(self, n_clf=5):
        self.n_clf = n_clf

    def fit(self, X, y):
        n_samples, n_features = X.shape

        # Initialize weights to 1/N
        w = np.full(n_samples, (1 / n_samples))

        self.clfs = []
        # Iterate through classifiers
        for _ in range(self.n_clf):
            clf = DecisionStump()

            min_error = float('inf')
            # greedy search to find best threshold and feature
            for feature_i in range(n_features):
                X_column = X[:, feature_i]
                thresholds = np.unique(X_column)

                for threshold in thresholds:
                    # predict with polarity 1
                    p = 1
                    predictions = np.ones(n_samples)
                    predictions[X_column < threshold] = -1

                    # Error = sum of weights of misclassified samples
                    misclassified = w[y != predictions]
                    error = sum(misclassified)

                    if error > 0.5:
                        error = 1 - error
                        p = -1

                    # store the best configuration
                    if error < min_error:
                        clf.polarity = p
                        clf.threshold = threshold
                        clf.feature_idx = feature_i
                        min_error = error

            # calculate alpha; EPS avoids division by zero when min_error is 0
            EPS = 1e-10
            clf.alpha = 0.5 * np.log((1.0 - min_error + EPS) / (min_error + EPS))

            # calculate predictions and update weights
            predictions = clf.predict(X)

            w *= np.exp(-clf.alpha * y * predictions)
            # Normalize to one
            w /= np.sum(w)

            # Save classifier
            self.clfs.append(clf)

    def predict(self, X):
        clf_preds = [clf.alpha * clf.predict(X) for clf in self.clfs]
        y_pred = np.sum(clf_preds, axis=0)
        y_pred = np.sign(y_pred)

        return y_pred

In [2]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split


def accuracy(y_true, y_pred):
    accuracy = np.sum(y_true == y_pred) / len(y_true)
    return accuracy

data = datasets.load_breast_cancer()
X = data.data
y = data.target

# The implementation expects labels in {-1, 1}, so relabel class 0 as -1
y[y == 0] = -1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)

# Adaboost classification with 5 weak classifiers
clf = Adaboost(n_clf=5)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy(y_test, y_pred)
print("Accuracy:", acc)

Accuracy: 0.9736842105263158


## References
- [AdaBoost in Python - Machine Learning From Scratch 13 - Python Tutorial](https://youtu.be/wF5t4Mmv5us?list=PLqnslRFeH2Upcrywf-u2etjdxxkL8nl7E)
- [MLfromscratch](https://github.com/python-engineer/MLfromscratch/blob/master/mlfromscratch/adaboost.py)