# Boosting

This notebook implements boosting, which is an ensemble method that combines several weak models to make a final prediction. Specifically, each model will attempt to correct the mistakes made by the previous models.

Unlike bagging, boosting does not train models independently on random samples of the data. Instead, the models are trained sequentially: AdaBoost increases the weights of misclassified data points so that later models focus on the instances that are difficult to predict, while gradient boosting fits each new model to the errors of the current ensemble. Boosting is primarily used to reduce model bias, because training weak models one after another lets the ensemble gradually refine its ability to capture complex patterns in the data.
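To make the reweighting idea concrete, here is a minimal sketch of a single round of the textbook binary AdaBoost update, using NumPy only. The labels and the weak learner's predictions are made-up toy values, and this is an illustration of the idea rather than the exact update scikit-learn performs internally.

```python
import numpy as np

# Toy labels in {-1, +1} and the predictions of one hypothetical weak learner
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])  # the second point is misclassified

# Every observation starts with the same weight
w = np.full(len(y_true), 1 / len(y_true))

# Weighted error rate of the weak learner
miss = y_pred != y_true
err = np.sum(w[miss]) / np.sum(w)

# Say of the weak learner in the final vote: larger when the error is smaller
alpha = 0.5 * np.log((1 - err) / err)

# Increase the weights of misclassified points, decrease the others, renormalize
w_new = w * np.exp(-alpha * y_true * y_pred)
w_new = w_new / w_new.sum()

print("error:", err)
print("alpha:", alpha)
print("updated weights:", w_new)
```

After this round the single misclassified point carries half of the total weight, so the next weak learner is pushed to classify it correctly.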

The scikit-learn library has two boosting methods that I will be exploring: AdaBoost and Gradient Boosting.

---

First, load the relevant libraries.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Import a nice function for plotting decision boundaries
from mlxtend.plotting import plot_decision_regions

# Set the Seaborn theme
sns.set_theme()

# Import functions to help with training/testing endeavors and evaluate performance
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

# Import functions to perform boosting, with relevant models
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
#from sklearn.ensemble import GradientBoostingRegressor

## The Data

The models will be trained using the [Hawks](https://github.com/kary5678/INDE-577/blob/main/Data/hawks.csv) dataset. This dataset contains observations for three species of hawks, with attributes such as age, sex, wing length, body weight, and tail length.

The code block below reads the dataset into a pandas DataFrame, keeps only the relevant variables, and drops any rows with missing values in those variables.

In [3]:
# Read in the data and subset it to the relevant columns/observations
hawks = pd.read_csv("../../../Data/hawks.csv")
hawks = hawks[["Species", "Wing", "Tail", "Weight", "Culmen", "Hallux"]].dropna(axis=0)
hawks

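| | Species | Wing | Tail | Weight | Culmen | Hallux |
|---|---|---|---|---|---|---|
| 0 | RT | 385.0 | 219 | 920.0 | 25.7 | 30.1 |
| 2 | RT | 381.0 | 235 | 990.0 | 26.7 | 31.3 |
| 3 | CH | 265.0 | 220 | 470.0 | 18.7 | 23.5 |
| 4 | SS | 205.0 | 157 | 170.0 | 12.5 | 14.3 |
| 5 | RT | 412.0 | 230 | 1090.0 | 28.5 | 32.2 |
| ... | ... | ... | ... | ... | ... | ... |
| 903 | RT | 380.0 | 224 | 1525.0 | 26.0 | 27.6 |
| 904 | SS | 190.0 | 150 | 175.0 | 12.7 | 15.4 |
| 905 | RT | 360.0 | 211 | 790.0 | 21.9 | 27.6 |
| 906 | RT | 369.0 | 207 | 860.0 | 25.2 | 28.0 |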


## 1. AdaBoost

### 1A. Logistic Regression

Logistic regression can be used as a weak learner in boosting because it is computationally efficient and has low complexity: each fitted model is limited to a linear decision boundary.
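As a rough illustration, the sketch below boosts logistic regression with `AdaBoostClassifier`. It uses a synthetic three-class dataset from `make_classification` as a stand-in for the hawks data, and the number of estimators and other settings are arbitrary assumptions rather than tuned values. The base model is passed positionally because the keyword name for it differs between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic three-class data (a stand-in for the three hawk species)
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# AdaBoost with logistic regression as the weak learner
ada_logit = AdaBoostClassifier(
    LogisticRegression(max_iter=1000), n_estimators=25, random_state=42
)
ada_logit.fit(X_train, y_train)

test_accuracy = ada_logit.score(X_test, y_test)
print(f"Boosting rounds used: {len(ada_logit.estimators_)}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

AdaBoost needs a base model whose `fit` method accepts `sample_weight`, which `LogisticRegression` does.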

### 1B. Decision Stumps

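Decision stumps, which are decision trees with a single split, are often used as the weak learners in boosting. They have low complexity, so each one can only capture a simple pattern in the data, and the ensemble builds up more complex patterns by combining many of them. A depth-1 decision tree is also the default base model of scikit-learn's `AdaBoostClassifier`.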
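The following sketch shows what a stump looks like in practice. It fits AdaBoost with `DecisionTreeClassifier(max_depth=1)` on a synthetic binary dataset (an assumption for illustration, not the hawks data) and inspects the first fitted stump.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data used only for illustration
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0,
    random_state=0,
)

# A decision stump is a tree restricted to a single split (max_depth=1)
ada_stumps = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0
)
ada_stumps.fit(X, y)

# Inspect the first weak learner: one root node and two leaves
first_stump = ada_stumps.estimators_[0]
print("Depth of first stump:", first_stump.get_depth())
print("Nodes in first stump:", first_stump.tree_.node_count)
print("Feature used for the split:", first_stump.tree_.feature[0])
```

Each stump splits on a single feature, so any interaction between features has to come from combining many stumps.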

### 1C. Decision Trees
 
Decision trees are the most commonly used base models in boosting due to their simplicity and ability to capture complex interactions between features.
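For comparison with stumps, here is a hedged sketch that boosts slightly deeper trees and tracks test accuracy after each boosting round with `staged_score`. The data is synthetic, and `max_depth=3` and `n_estimators=30` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data used only for illustration
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0,
    random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Trees of depth 3 can model interactions between features within one learner
ada_trees = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3), n_estimators=30, random_state=1
)
ada_trees.fit(X_train, y_train)

# Test accuracy of the ensemble after each boosting round
staged_acc = list(ada_trees.staged_score(X_test, y_test))
print(f"Accuracy after round 1: {staged_acc[0]:.3f}")
print(f"Accuracy after round {len(staged_acc)}: {staged_acc[-1]:.3f}")
```

Plotting `staged_acc` against the round number is a simple way to see whether additional rounds are still helping.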

Finally, neural networks can also be used as base models in boosting. However, because they are far more complex than the weak learners described above, they are a less common choice than shallow trees.

## 2. Gradient Boosting