### Colab Activity 20.2: Implementing the AdaBoost Algorithm

**Time: 60 minutes**

This activity focuses on the `AdaBoostClassifier` and on how its performance changes when you swap the base classifier it uses. As discussed in the lectures, adaptive boosting fits a set number of estimators one after another. After each fit it reweights the training samples so that the samples the previous estimator misclassified count more for the next one. Each fitted estimator also receives a weight based on its accuracy, and the ensemble's prediction is the weighted combination of these estimators' predictions.
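The reweighting loop described above can be sketched from scratch. This is a minimal illustration of the classic two-class (discrete) AdaBoost with decision stumps, run on synthetic data from `make_classification` with labels recoded to -1/+1. It is not scikit-learn's implementation, which uses the multiclass SAMME family of algorithms, and the helper names `adaboost_fit` and `adaboost_predict` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_estimators=50):
    """Two-class discrete AdaBoost with decision stumps; y must be in {-1, +1}."""
    n_samples = len(y)
    w = np.full(n_samples, 1.0 / n_samples)  # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_estimators):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)  # fit the weak learner to the weighted data
        pred = stump.predict(X)
        err = w[pred != y].sum() / w.sum()  # weighted training error
        err = np.clip(err, 1e-10, 1 - 1e-10)  # guard against log(0) and division by 0
        alpha = 0.5 * np.log((1 - err) / err)  # more accurate stumps get larger votes
        # Misclassified samples (y * pred = -1) are scaled up, correct ones down
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()  # renormalize so the weights sum to 1
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)


def adaboost_predict(stumps, alphas, X):
    """Weighted vote: the sign of the alpha-weighted sum of stump predictions."""
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(scores >= 0, 1, -1)


# Hypothetical synthetic data, not the fetal health set used below
X, y01 = make_classification(n_samples=500, n_features=10, n_clusters_per_class=1,
                             class_sep=2.0, random_state=0)
y = np.where(y01 == 1, 1, -1)  # recode labels {0, 1} -> {-1, +1}
stumps, alphas = adaboost_fit(X, y, n_estimators=50)
acc = np.mean(adaboost_predict(stumps, alphas, X) == y)
print(f"training accuracy: {acc:.3f}")
```

The `alphas` array holds the per-estimator weights used in the final vote, and `w` is the per-sample weighting that drives each successive fit.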

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)
- [Problem 5](#-Problem-5)

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, train_test_split
```

```python
# Load the fetal health data directly from the zipped CSV
df = pd.read_csv('data/fetal.zip', compression='zip')
```

```python
# Features are every column except the target, `fetal_health`
X = df.drop('fetal_health', axis=1).values
y = df['fetal_health']
```

```python
# Hold out 25% of the rows (the default test_size) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
```

[Back to top](#-Index)

### Problem 1

#### `AdaBoostClassifier`



Instantiate the weak learner that `AdaBoostClassifier` uses by default, a `DecisionTreeClassifier` with `max_depth=1`, and assign it to `ans1` below.

```python
ans1 = ''
```

```python
### ANSWER CHECK
ans1
```

Expected output:

```
DecisionTreeClassifier(max_depth=1)
```

[Back to top](#-Index)

### Problem 2

#### Fitting the Ensemble


Define an `AdaBoostClassifier` with default parameters and fit it to `X_train` and `y_train`. Assign the fitted model to `model_1` below.

Assign the model's accuracy on the test data (`X_test`, `y_test`) to `model_1_acc` below.
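For a classifier, the `score` method returns mean accuracy, so it gives the same number as `accuracy_score` on the model's predictions. The sketch below checks this on hypothetical three-class synthetic data standing in for the fetal health set; the variable names are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical three-class data, not the activity's dataset
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = AdaBoostClassifier(random_state=0).fit(X_tr, y_tr)

# score() on a classifier computes accuracy, exactly like accuracy_score
acc_from_score = model.score(X_te, y_te)
acc_from_metric = accuracy_score(y_te, model.predict(X_te))
print(acc_from_score, acc_from_metric)
```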

```python
model_1 = ''
model_1_acc = ''
```

```python
### ANSWER CHECK
print(model_1_acc)
```

Expected output:

```
0.881578947368421
```


[Back to top](#-Index)

### Problem 3

#### Grid Searching the Ensemble


As the documentation states [on this page](https://scikit-learn.org/stable/modules/ensemble.html#usage), the main parameters to search are the number of estimators and the complexity of the base estimator.  

In the code cell below, create a parameter grid that considers the following parameters:

- *number of estimators* (`n_estimators`): 100, 200
- *maximum depth of the base tree* (`max_depth`): 1, 2, 3

Name this grid `params`.

Next, use `params` with an `AdaBoostClassifier` whose base estimator is a `DecisionTreeClassifier` to perform a grid search named `tree_grid` on the training data. For this step, be sure to set `random_state = 42` in your `AdaBoostClassifier`.

Finally, compute the accuracy of `tree_grid` on the test data and assign it to `grid_acc`.



```python
params = ''
tree_grid = ''
grid_acc = ''
```

```python
### ANSWER CHECK
print(grid_acc)
```

Expected output:

```
0.9210526315789473
```


[Back to top](#-Index)

### Problem 4

#### A Different Base Estimator


Consider using a different base estimator, such as `LogisticRegression`. Explore its regularization parameter `C` with the values

- `C = [.001, 0.01, 0.1, 1.0, 10.0]`

Create a `Pipeline` that first scales the data (e.g. with `StandardScaler`) and then fits an `AdaBoostClassifier` with `random_state = 42` that uses a `LogisticRegression` model as its base estimator. Grid search the pipeline over these `C` values and assign the accuracy of the resulting model on the test data to `score2`.



```python
score2 = ''
```

```python
### ANSWER CHECK
print(score2)
```

Expected output:

```
0.9078947368421053
```


[Back to top](#-Index)

### Problem 5

#### Evaluating the models


Which model achieved the highest accuracy on the test data?

- `a`: Default `AdaBoostClassifier` (Problem 2)
- `b`: Grid-searched tree model (Problem 3)
- `c`: Grid-searched logistic model (Problem 4)
- `d`: None of the above

Assign your answer as a string to `ans5` below.

```python
ans5 = ''
```

```python
### ANSWER CHECK
print(ans5)
```

Expected output:

```
b
```
