# Building Adaboost

### Introduction

* Remember to show the cost function of adaboost vs gradient boosting.

### Do it once

* Hypothesis function

Here, we'll let $f_m$  represent singly weak learner or decision tree, and we'll let $F(x)$ represent the adaboost model.  We'll have a positive target be represented by $1$, and a negative be represented by $-1$.

For a classification problem, adaboost then makes predictions by each decision tree voting on the prediction of each observation.  The votes are weighted the cost of each decision tree, with those with a lower cost getting a weight.  This is what the hypothesis function looks like:

$F(x) = sign(\alpha_1*f_1(x) + \alpha_2*f_2(x) ... \alpha_M*f_m(x)) = sign \left(  \sum_{m=1}^M \alpha_m*f_m(x) \right ) $

So, after adding up the weighted vote for each observation, if the sum is positive the adaboost model predicts 1, and if the sum is negative the model predicts -1.

* Training procedure

Now let's move onto how an adaboost model trains.  With adaboost, trees are trained one after the other.  After the first tree is trained, the next tree places a higher weight on training to the observations that the previous tree predicted incorrectly.  So each tree trains to the weaknesses of the previsouly trained tree.

> We can see this if we look at the graph below, illustrating this process of placing higher weights on the misclassified observations.

<img src="./successive-trees.png" width="100%">

> From [Adaboost Intuition](https://xavierbourretsicotte.github.io/AdaBoost.html).

1. It's a voting algorithm 

2. Not everyone gets equal votes

3. Then we weight the observations

So, going from top left to bottom right, in the first square each observation receives equal weight.  We can see that the classifier properly classified each of the red observations but missed on some of the lower blue observations.  In the next second square, these improperly blue observations were weighted higher, and this time properly classified.  

This procedure continues.  Each time, the decision tree trains by assigning higher weight to the observations previously trained incorrectly.

### Onto the Code

Below is the pseudocode for the adaboost algorithm.  Let's look over it now.  Then we'll develop a deeper understanding of it as we implement it step by step in code.

<img src="./adaboost.png" width="70%">

In [34]:
from sklearn.datasets import make_moons

X, bool_y = make_moons(n_samples=3000, random_state = 1)

In [35]:
import numpy as np
y = np.where(bool_y == 0, -1, 1)
y

array([ 1,  1, -1, ..., -1, -1, -1])

In [36]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)

In [37]:
from sklearn.tree import DecisionTreeClassifier

In [38]:
import numpy as np
w = np.ones(y_train.shape[0])

In [39]:
dtr = DecisionTreeClassifier(max_depth = 2).fit(X_train, y_train, sample_weight = w)

In [40]:
y_hat = dtr.predict(X_train)

> Calculate error of t.

In [41]:
error_t = w[y_hat != y_train].sum()/y_train.shape[0]
error_t

0.08088888888888889

In [42]:
# error_t = .15

> The higher the error, the lower the value of alpha.  In other words, alpha is how well the learner performs.

In [43]:
alpha = .5*np.log((1 - error_t)/error_t)
alpha

1.2151652742757622

> Then find the new weights.

In [16]:
w*np.exp(-alpha*y_train*y_hat)

array([0.29040893, 0.29040893, 0.29040893, ..., 0.29040893, 0.29040893,
       0.29040893])

> Is it correct, `y_train*y_hat` (positive when correct, negative when incorrect)
> When correct, multiply weight by fraction, when incorrect to exponent.  

> If the model performs well, but incorrectly classified this, then weight even higher.  If model performs well, and correctly classified, then weight even lower.

In [17]:
(y_train==y_hat).mean()

0.9222222222222223

### Complete Many Times

Ok, now let's loop through this procedure.

In [57]:
import numpy as np

ws = []
alphas = []
y_hats = []
errors = []
dtcs = []
w = np.ones(y_train.shape[0])/y_train.shape[0]

for i in range(30):
    dtc = DecisionTreeClassifier(max_depth = 2).fit(X_train, y_train,
                                                    sample_weight = w)
    y_hat = dtc.predict(X_train)
    error_t = w[y_hat != y_train].sum()/y_train.shape[0]
    alpha = .5*np.log((1 - error_t)/error_t)
    w = w*np.exp(-alpha*y_train*y_hat)
    w = w/w.sum()
    ws.append(w)
    alphas.append(alpha)
    y_hats.append(y_hat)
    errors.append(error_t)
    dtcs.append(dtc)

In [58]:
alphas_arr = np.array(alphas)
tree_preds = np.array(y_hats) 

In [59]:
tree_preds.shape

(30, 2250)

In [60]:
def predict(dtrs, alphas, X):
    preds = np.vstack([alpha*dtr.predict(X) for dtr, alpha in zip(dtrs, alphas)])
    cum_preds = preds.sum(axis = 0)
    return np.where(cum_preds > 0, 1, -1)

In [61]:
predictions = predict(dtcs, alphas, X_test)
# predictions

In [62]:
from sklearn.metrics import accuracy_score

In [63]:
accuracy_score(y_test, predictions)

0.996

In [64]:
from sklearn.metrics import precision_score, recall_score

precision_score(y_test, predictions), recall_score(y_test, predictions)

(0.9972222222222222, 0.9944598337950139)

### Resources

[Building Adaboost from scratch](https://geoffruddock.com/adaboost-from-scratch-in-python/)

[Adaboost Summary](https://xavierbourretsicotte.github.io/AdaBoost.html)

[Boosting MIT Lecture](https://www.youtube.com/watch?v=UHBmv7qCey4)

[Adaboost TA Session](https://www.youtube.com/watch?v=gmok1h8wG-Q&t=3s)