#### Section - B

- In this project you will apply the AdaBoost boosting algorithm to implement an ensemble learning approach for solving a (binary) classification problem. 
- The (one-dimensional) training data set D is given in Table 4.12 on page 352 of the textbook. The base classifier is a simple, one-level decision tree (decision stump), as explained on p. 303 of the textbook.
- Determine the number of boosting rounds and show the result of each round (the probability distribution p_i at each round, the records chosen at each round, the model (tree) obtained at each round, and the ε and α at each round), as well as the result obtained on D with the final ensemble classifier.
- Note that the textbook uses the notation w (weight) for what we called p (probability) in the derivation we did in the lectures. (The textbook has quite a few typos!) Also, do not forget the stopping condition we discussed.
- What is the result of running your ensemble classifier on the following test data? X = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0


Submit source code and your output (the round-wise results and the result on the test data) following the styles of Figures 4.46, 4.49, and 4.50 of the textbook.
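
To make the base classifier concrete, here is a minimal sketch of a hand-rolled decision stump for one-dimensional data. It is an illustration only: the notebook itself uses scikit-learn's `DecisionTreeClassifier(max_depth=1)`, and the helper name `fit_stump` and its exhaustive midpoint search are assumptions rather than the textbook's procedure. The data values are copied from the training table shown in this notebook, with "+" encoded as 1 and "-" as 0.

```python
import numpy as np

def fit_stump(x, y, weights=None):
    """Hypothetical helper: search all split points and both orientations
    for the stump with the smallest weighted error on (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    if weights is None:
        weights = np.full(len(x), 1.0 / len(x))  # uniform weights by default
    weights = np.asarray(weights, dtype=float)

    xs = np.unique(x)
    # Candidate thresholds: midpoints between neighbouring values, plus one
    # threshold on each side of the data (a stump that predicts a single class).
    candidates = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0]))

    best = None
    for threshold in candidates:
        for left_label in (0, 1):
            # Predict left_label for x <= threshold, the other label otherwise.
            predictions = np.where(x <= threshold, left_label, 1 - left_label)
            error = weights[predictions != y].sum()
            if best is None or error < best[0]:
                best = (float(error), float(threshold), left_label)
    return best

x_train = [0.5, 3.0, 4.5, 4.6, 4.9, 5.2, 5.3, 5.5, 7.0, 9.5]
y_train = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
error, threshold, left_label = fit_stump(x_train, y_train)
print(error, threshold, left_label)
```

With uniform weights the search settles on a split near 5.05 that predicts 1 on the left and misclassifies three of the ten records (weighted error of about 0.3). No single stump separates this data, which is why boosting is needed.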

In [163]:
import pandas as pd
import numpy as np

#### Section - B : 1. Load datasets

In [164]:
training_dataset_df = pd.read_csv("1d_training_dataset.csv")
training_dataset_df = training_dataset_df.T
training_dataset_df.reset_index(inplace=True)
training_dataset_df.columns = ["x", "y"]
training_dataset_df = training_dataset_df[1:]
training_dataset_df

Unnamed: 0,x,y
1,0.5,-
2,3.0,-
3,4.5,+
4,4.6,+
5,4.9,+
6,5.2,-
7,5.3,-
8,5.5,+
9,7.0,-
10,9.5,-


In [165]:
X_test = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
X_test

[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

##### Replace the class labels in "y": "+" with 1 and "-" with 0

In [166]:
training_dataset_df['y'] = training_dataset_df['y'].map({'+': 1, "-": 0})
training_dataset_df

Unnamed: 0,x,y
1,0.5,0
2,3.0,0
3,4.5,1
4,4.6,1
5,4.9,1
6,5.2,0
7,5.3,0
8,5.5,1
9,7.0,0
10,9.5,0


In [167]:
training_dataset_df.dtypes

x    object
y     int64
dtype: object

In [168]:
training_dataset_df['x'] = training_dataset_df['x'].astype(float)
training_dataset_df

Unnamed: 0,x,y
1,0.5,0
2,3.0,0
3,4.5,1
4,4.6,1
5,4.9,1
6,5.2,0
7,5.3,0
8,5.5,1
9,7.0,0
10,9.5,0


In [169]:
training_dataset_df.dtypes

x    float64
y      int64
dtype: object

#### Section - B : 2. Task

- Apply the AdaBoost boosting algorithm.
- Use a simple, one-level decision tree (decision stump) as the base classifier (as explained on p. 303 of the textbook).
- Determine the number of boosting rounds and show the result of each round (the probability distribution p_i at each round, the records chosen at each round, the model (tree) obtained at each round, and the ε and α at each round), as well as the result obtained on D with the final ensemble classifier.
- Note that the textbook uses the notation w (weight) for what we called p (probability) in the derivation we did in the lectures. (The textbook has quite a few typos!) Also, do not forget the stopping condition we discussed.
- What is the result of running your ensemble classifier on the following test data? X = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0


Submit source code and your output (the round-wise results and the result on the test data) following the styles of Figures 4.46, 4.49, and 4.50 of the textbook.

**Algorithm from textbook:**

Write it here
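
As a stand-in until the textbook listing is written out, the following is a hedged summary reconstructed from the step comments and update code in `train_adaboost`. It is not a transcription of the textbook, and the superscript notation $p_i^{(t)}$ is an assumption made for readability.

```latex
\begin{align*}
p_i^{(1)} &= \frac{1}{n}, \qquad i = 1, \dots, n \\
\varepsilon^{(t)} &= \sum_{i \,:\, M^{(t)}(x_i) \neq y_i} p_i^{(t)}
  && \text{(error of } M^{(t)} \text{ on } D \text{, not on } D^{(t)}\text{)} \\
\alpha^{(t)} &= \frac{1}{2} \ln \frac{1 - \varepsilon^{(t)}}{\varepsilon^{(t)}} \\
p_i^{(t+1)} &=
  \begin{cases}
    \dfrac{p_i^{(t)}}{2\,\varepsilon^{(t)}} & \text{if } M^{(t)}(x_i) \neq y_i \\[2ex]
    \dfrac{p_i^{(t)}}{2\,(1 - \varepsilon^{(t)})} & \text{otherwise}
  \end{cases} \\
M(x) &= \sum_{t=1}^{T} \alpha^{(t)} M^{(t)}(x), \qquad \text{output } \operatorname{sign}(M(x))
\end{align*}
```

If $\varepsilon^{(t)} \geq 0.5$, the round is restarted with every $p_i$ reset to $1/n$. The two-case update is the usual exponential update $p_i^{(t)} e^{\pm\alpha^{(t)}} / Z^{(t)}$ with normaliser $Z^{(t)} = 2\sqrt{\varepsilon^{(t)}(1 - \varepsilon^{(t)})}$ written out, so the new probabilities always sum to 1.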
    

In [170]:
from sklearn.tree import DecisionTreeClassifier

In [366]:
def train_adaboost(XTRAIN, YTRAIN):
    """Train AdaBoost with decision stumps; return (alphas, round_models)."""
    SAMPLESIZE = len(XTRAIN)
    y_true = YTRAIN.ravel()  # labels as a flat 1-D array

    # setting initial probabilities p_i = 1/n
    samples_probabilities = np.repeat(1/SAMPLESIZE, SAMPLESIZE)

    round_models = list()
    alphas = list()

    curr_round = 1
    epsilon_threshold = 0.4
    epsilon_t_current = 1.1 # start above the threshold so the loop runs at least once
    # stopping condition used here: stop once a round's error on D is at or below the threshold
    while epsilon_t_current > epsilon_threshold:
        print("\n\nWorking on round: ", curr_round)

        # a) Create a training dataset D(t) (of the same size n) by sampling (with replacement)
        # from the distribution defined by p(t)
        bootstrap_sample = np.random.choice(np.arange(SAMPLESIZE),
                                            size=SAMPLESIZE,
                                            replace=True,
                                            p=samples_probabilities)

        print("Records (Indexes) chosen in this round: ")
        print(bootstrap_sample)
        bootstrap_XTRAIN = XTRAIN[bootstrap_sample]
        bootstrap_YTRAIN = y_true[bootstrap_sample]

        # b) Create model M(t) using A on D(t) (e.g., if A is a decision tree induction
        # algorithm, M(t) is a decision tree)
        # max_depth=1 gives us a decision stump.
        # No sample_weight is passed: the resampling in (a) already reflects p(t), and
        # the probabilities are indexed by position in D, not by position in D(t).
        base_classifier_t = DecisionTreeClassifier(max_depth=1)
        base_classifier_t.fit(bootstrap_XTRAIN, bootstrap_YTRAIN)

        yhat_t = base_classifier_t.predict(XTRAIN)
        misclassified = yhat_t != y_true
        print(misclassified)

        # c) Calculate the error of M(t) on D (note: not on D(t)):
        # the total probability of the misclassified records
        epsilon_t = samples_probabilities[misclassified].sum()
        print("epsilon_t: ", epsilon_t)
        epsilon_t_current = epsilon_t

        # d) Find model M(t)'s weight: α(t) = (1/2) ln((1 − ε(t)) / ε(t))
        alpha_t = 0.5 * np.log((1-epsilon_t)/epsilon_t)
        print("alpha_t", alpha_t)

        # e) If ε(t) ≥ 0.5: re-start the current iteration by re-initializing
        # each p_i to 1/n for i = 1,2,...,n and go to step (a)
        if epsilon_t >= 0.5:
            print("Epsilon is at least 0.5, hence resetting this round")
            # Do not count this round as completed
            samples_probabilities = np.repeat(1/SAMPLESIZE, SAMPLESIZE)
        else:
            # count this round as completed
            curr_round += 1

            # update the probabilities of the records in D:
            # misclassified records get p_i / (2ε), correct ones get p_i / (2(1 − ε))
            samples_probabilities = np.where(misclassified,
                                             samples_probabilities/(2*epsilon_t),
                                             samples_probabilities/(2*(1-epsilon_t)))
            print("new_sample_probabilities: ", samples_probabilities.tolist())

            # only append alphas and models for completed rounds
            alphas.append(alpha_t)
            round_models.append(base_classifier_t)

    # 3. Obtain the ensemble model M(x) = sum over t = 1..T of α(t) M(t)(x) for any
    # (training or test) record x. Output ensemble classification result as sign(M(x)).
    return alphas, round_models
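
Step 3 in the comments above is described but not implemented. A minimal sketch of the weighted vote follows. One caveat is that this notebook encodes the labels as 0/1, whereas sign(M(x)) presumes votes of -1/+1, so the sketch maps 0 to -1 before summing. The names `ThresholdStump` and `predict_ensemble` are hypothetical, and the thresholds and alphas are illustrative values, not results produced by this notebook.

```python
import numpy as np

class ThresholdStump:
    """Hypothetical stand-in for a fitted stump (anything with .predict works)."""
    def __init__(self, threshold, left_label, right_label):
        self.threshold = threshold
        self.left_label = left_label
        self.right_label = right_label

    def predict(self, X):
        X = np.asarray(X, dtype=float).reshape(-1)
        return np.where(X <= self.threshold, self.left_label, self.right_label)

def predict_ensemble(models, alphas, X):
    """Weighted vote: sign of the alpha-weighted sum of -1/+1 votes, returned as 0/1."""
    X = np.asarray(X, dtype=float).reshape(-1, 1)
    score = np.zeros(len(X))
    for model, alpha in zip(models, alphas):
        votes = 2 * np.asarray(model.predict(X)) - 1  # map labels {0, 1} to {-1, +1}
        score += alpha * votes
    # A score of exactly 0 is mapped to class 0 here; that tie-break is an arbitrary choice.
    return (score > 0).astype(int)

# Illustrative stumps and weights only (assumed values).
models = [ThresholdStump(4.85, 1, 0), ThresholdStump(3.75, 0, 1), ThresholdStump(5.75, 1, 0)]
alphas = [0.2, 0.3, 0.4]
X_test = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(predict_ensemble(models, alphas, X_test))
```

Because `predict_ensemble` only relies on a `.predict` method, fitted `DecisionTreeClassifier` stumps could be passed in place of the stand-in class.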

        
        

In [367]:
XTRAIN_numpy = training_dataset_df['x'].to_numpy()
YTRAIN_numpy = training_dataset_df['y'].to_numpy()

In [368]:
XTRAIN_numpy = XTRAIN_numpy.reshape(-1,1)
XTRAIN_numpy.shape

(10, 1)

In [369]:
XTRAIN_numpy

array([[0.5],
       [3. ],
       [4.5],
       [4.6],
       [4.9],
       [5.2],
       [5.3],
       [5.5],
       [7. ],
       [9.5]])

In [370]:
YTRAIN_numpy = YTRAIN_numpy.reshape(-1,1)
YTRAIN_numpy.shape

(10, 1)

In [371]:
YTRAIN_numpy

array([[0],
       [0],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0]])

#### TRAINING

In [372]:
train_adaboost(XTRAIN_numpy, YTRAIN_numpy)



Working on round:  1
Records (Indexes) chosen in this round: 
[7 8 5 2 9 2 7 2 5 9]
[ True  True False False  True False False  True False False]
epsilon_t:  0.4
alpha_t 0.2027325540540821
new_sample_probabilities:  [0.125, 0.125, 0.08333333333333334, 0.08333333333333334, 0.125, 0.08333333333333334, 0.08333333333333334, 0.125, 0.08333333333333334, 0.08333333333333334]
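
The round-1 numbers above can be checked by hand. The sketch below recomputes ε, α and the updated probabilities from the misclassification mask printed in the output, using only the standard library. The variable names are chosen for this check and are not part of the notebook's code.

```python
import math

n = 10
p = [1 / n] * n  # uniform starting distribution
# Misclassification mask copied from the round-1 output above.
misclassified = [True, True, False, False, True, False, False, True, False, False]

# Error on D: total probability of the misclassified records.
epsilon = sum(p_i for p_i, wrong in zip(p, misclassified) if wrong)

# Model weight: alpha = (1/2) ln((1 - epsilon) / epsilon).
alpha = 0.5 * math.log((1 - epsilon) / epsilon)

# Update: misclassified records are divided by 2*epsilon, the rest by 2*(1 - epsilon).
new_p = [p_i / (2 * epsilon) if wrong else p_i / (2 * (1 - epsilon))
         for p_i, wrong in zip(p, misclassified)]

print(round(epsilon, 4), round(alpha, 4))
print([round(v, 4) for v in new_p])
print(round(sum(new_p), 4))
```

The four misclassified records move from 0.1 to 0.125 and the six correct ones drop to about 0.0833, so each group carries half of the total probability and the new distribution still sums to 1.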
