# AdaBoost
For this exercise you will implement AdaBoost from scratch and apply it to a spam dataset, classifying each example as spam or not spam. To learn the base classifiers you can call sklearn's `DecisionTreeClassifier`, using `max_depth=1` (decision stumps) as the default depth. Note that sklearn's own default is `max_depth=None`.
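For reference, here is one standard formulation of discrete AdaBoost. It assumes labels $y_i \in \{-1, +1\}$, $N$ training points and $M$ rounds. The exercise's `adaboost` module may use an equivalent variant. One example is $\alpha_m = \ln\frac{1-\epsilon_m}{\epsilon_m}$ combined with $w_i \leftarrow w_i \exp(\alpha_m \mathbb{1}[h_m(x_i) \ne y_i])$. That variant produces the same normalized weights and the same sign for the final vote.

$$
\begin{aligned}
& w_i^{(1)} = \tfrac{1}{N}, \qquad i = 1, \dots, N \\
& \text{for } m = 1, \dots, M: \\
& \qquad h_m = \text{depth-1 tree fit to } (x_i, y_i) \text{ with sample weights } w_i^{(m)} \\
& \qquad \epsilon_m = \frac{\sum_{i} w_i^{(m)}\, \mathbb{1}[h_m(x_i) \ne y_i]}{\sum_{i} w_i^{(m)}} \\
& \qquad \alpha_m = \tfrac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m} \\
& \qquad w_i^{(m+1)} = \frac{w_i^{(m)} \exp\!\big(-\alpha_m\, y_i\, h_m(x_i)\big)}{\sum_{j} w_j^{(m)} \exp\!\big(-\alpha_m\, y_j\, h_m(x_j)\big)} \\
& H(x) = \operatorname{sign}\Big( \sum_{m=1}^{M} \alpha_m\, h_m(x) \Big)
\end{aligned}
$$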

Here is how you train a decision tree classifier with weights.

```python
h = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
h.fit(X, Y, sample_weight=w)
```
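The weighted `fit` call is the core of each boosting round. Below is a minimal sketch of the full loop built around it. It assumes labels in {-1, +1} and the standard discrete AdaBoost updates. The names `adaboost_sketch` and `adaboost_predict_sketch` are hypothetical stand-ins, not the exercise module's `adaboost`/`adaboost_predict`. The error clip at `1e-10` is an assumed safeguard. Like the notebook's `trees, weights = adaboost(X, Y, 2)`, it returns a list of trees and a list of their vote weights.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def adaboost_sketch(X, y, num_iter, max_depth=1):
    """Hypothetical from-scratch AdaBoost; y must contain labels in {-1, +1}."""
    y = np.asarray(y)
    n = X.shape[0]
    w = np.full(n, 1.0 / n)  # uniform sample weights to start
    trees, alphas = [], []
    for _ in range(num_iter):
        h = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
        h.fit(X, y, sample_weight=w)  # weighted fit of the base classifier
        pred = h.predict(X)
        # weighted training error of this round's tree
        err = np.sum(w[pred != y]) / np.sum(w)
        # assumed safeguard: keep err away from 0 and 1 so alpha stays finite
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        # misclassified points (y * pred == -1) gain weight, correct ones lose it
        w = w * np.exp(-alpha * y * pred)
        w = w / np.sum(w)
        trees.append(h)
        alphas.append(alpha)
    return trees, alphas


def adaboost_predict_sketch(X, trees, alphas):
    """Sign of the alpha-weighted vote; ties (score == 0) go to +1 by assumption."""
    score = np.zeros(X.shape[0])
    for alpha, tree in zip(alphas, trees):
        score += alpha * tree.predict(X)
    return np.where(score >= 0, 1.0, -1.0)
```

On the three-point toy data used later in the notebook (`x = [[0, -1], [1, 0], [-1, 0]]`, `y = [-1, 1, 1]`), one round is enough for `adaboost_predict_sketch` to reproduce the labels.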

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from pathlib import Path
from adaboost import *  # exercise module: provides parse_spambase_data, adaboost, adaboost_predict, accuracy

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
```

```python
PATH = Path("data")
```

```python
y_test = np.array([1., -1., 1., 1., -1., -1., 1., 1., 1., -1.])
X, Y = parse_spambase_data(PATH/"tiny.spam.train")
for i in range(len(y_test)): assert(y_test[i] == Y[i])
n, m = X.shape
assert(n == 10)
assert(m == 57)
```
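The test above fixes only the interface of `parse_spambase_data`. It must return a feature matrix with 57 columns and labels in {-1, +1}. Below is a minimal sketch that assumes the files are comma-separated with the 0/1 spam label in the last column, as in the UCI Spambase dataset. `parse_spambase_data_sketch` is a hypothetical name, and the exercise's own parser may differ.

```python
import numpy as np


def parse_spambase_data_sketch(source):
    """Hypothetical parser for comma-separated rows whose last column is a 0/1 label.

    `source` can be anything np.loadtxt accepts: a path, a file object,
    or a list of text lines.
    """
    data = np.loadtxt(source, delimiter=",", ndmin=2)
    X = data[:, :-1]  # every column except the last is a feature
    # map the {0, 1} label to {-1, +1}, the convention AdaBoost expects
    Y = np.where(data[:, -1] > 0, 1.0, -1.0)
    return X, Y
```

With a real file this would be called as `parse_spambase_data_sketch(PATH/"tiny.spam.train")`, because `np.loadtxt` also accepts `pathlib.Path` objects.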

```python
X, Y = parse_spambase_data(PATH/"tiny.spam.train")
trees, weights = adaboost(X, Y, 2)
y_hat_0 = trees[0].predict(X)
assert(len(trees) == 2)
assert(len(weights) == 2)
assert(isinstance(trees[0], DecisionTreeClassifier))
assert(np.array_equal(y_hat_0[:5], [1.,-1.,1, 1, -1]))
```

```python
y_hat_0
```

```
array([ 1., -1.,  1.,  1., -1., -1., -1.,  1.,  1., -1.])
```
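Each base classifier is a depth-1 tree, which amounts to a single threshold on a single feature. That split can be read from scikit-learn's fitted `tree_` attribute through `tree_.feature` and `tree_.threshold`, where node 0 is the root. A small helper might look like the one below. `describe_stump` is a hypothetical name, not part of the exercise or of scikit-learn.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def describe_stump(stump):
    """Return (feature index, threshold) of a fitted depth-1 tree's root split.

    Samples with X[:, feature] <= threshold go to the left leaf. If the tree
    never split (all training labels equal), the feature index is negative (-2).
    """
    return int(stump.tree_.feature[0]), float(stump.tree_.threshold[0])
```

In the notebook, `describe_stump(trees[0])` would report which of the 57 spambase columns the first round split on, and at what value.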

```python
y_hat_0 = trees[0].predict(X)
assert(np.array_equal(y_hat_0[:5], [1.,-1.,1, 1, -1]))
```

```python
x = np.array([[0, -1], [1, 0], [-1, 0]])
y = np.array([-1, 1, 1])
trees, weights = adaboost(x, y, 1)
pred = adaboost_predict(x, trees, weights)
assert(np.array_equal(pred, y))
```
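In the three-point example above, a single stump on the second feature separates the two classes. Its weighted error is therefore 0, and $\alpha = \tfrac{1}{2}\ln\frac{1-\epsilon}{\epsilon}$ diverges. The sketch below shows why an implementation may need to clip the error in this case. It uses a hypothetical `reweight` helper with the standard discrete AdaBoost update, and the default `eps=1e-10` is an arbitrary assumed value. Without the clip, the renormalized weights become NaN.

```python
import numpy as np


def reweight(w, y, pred, eps=1e-10):
    """One AdaBoost round's alpha and renormalized weights; eps=None disables the clip."""
    err = np.sum(w[pred != y]) / np.sum(w)  # weighted error of this round
    if eps is not None:
        err = np.clip(err, eps, 1 - eps)  # assumed safeguard against err == 0 or 1
    with np.errstate(divide="ignore", invalid="ignore"):
        alpha = 0.5 * np.log((1 - err) / err)  # inf when err == 0 and unclipped
        w_new = w * np.exp(-alpha * y * pred)  # every weight becomes 0 if alpha is inf
        w_new = w_new / np.sum(w_new)  # 0 / 0 -> NaN in that case
    return alpha, w_new
```

The tests also show a known property of this update. After reweighting, the points that the current stump misclassified carry exactly half of the total weight.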

```python
X, Y = parse_spambase_data(PATH/"spambase.train")
X_test, Y_test = parse_spambase_data(PATH/"spambase.test")
trees, trees_weights = adaboost(X, Y, 10)
Yhat = adaboost_predict(X, trees, trees_weights)
Yhat_test = adaboost_predict(X_test, trees, trees_weights)

acc_test = accuracy(Y_test, Yhat_test)
acc_train = accuracy(Y, Yhat)
print("Train Accuracy %.4f" % acc_train)
print("Test Accuracy %.4f" % acc_test)
assert(np.around(acc_train, decimals=4)==0.9111)
assert(np.around(acc_test, decimals=4)==0.9190)
```

```
Train Accuracy 0.9111
Test Accuracy 0.9190
```
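The notebook reports accuracy only for the final 10-tree ensemble. To see how accuracy changes from round to round, the weighted votes can be accumulated one tree at a time. Below is a minimal sketch that assumes `accuracy` means the fraction of correctly predicted labels and breaks ties toward +1. `accuracy_sketch` and `staged_accuracy` are hypothetical names.

```python
import numpy as np


def accuracy_sketch(y, y_hat):
    """Fraction of labels predicted correctly."""
    return float(np.mean(np.asarray(y) == np.asarray(y_hat)))


def staged_accuracy(y, stage_preds, weights):
    """Accuracy of the weighted vote after 1, 2, ..., len(stage_preds) rounds.

    stage_preds[m] holds round m's predictions in {-1, +1};
    weights[m] is that round's alpha.
    """
    score = np.zeros(len(y))
    accs = []
    for pred, alpha in zip(stage_preds, weights):
        score += alpha * np.asarray(pred)  # add this round's weighted vote
        y_hat = np.where(score >= 0, 1.0, -1.0)  # ties go to +1 (assumption)
        accs.append(accuracy_sketch(y, y_hat))
    return accs
```

With the notebook's variables this would be called as `staged_accuracy(Y_test, [t.predict(X_test) for t in trees], trees_weights)`, returning one test accuracy per round. The last entry should match the final ensemble's accuracy only if `adaboost_predict` breaks ties the same way.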
