# Mathematical Intuition Behind AdaBoost Boosting Technique
## Introduction to AdaBoost  
AdaBoost (Adaptive Boosting) is a powerful ensemble technique that combines multiple "weak learners" to form a strong predictive model. Each weak learner attempts to correct the errors made by previous learners by focusing more on the misclassified instances. This is achieved by dynamically adjusting the weights of data points after each iteration.

## Weak Learners  
A weak learner is a model that performs only slightly better than random guessing; for binary classification this means an error rate below 50% (measured with the current sample weights once boosting has started). Decision stumps (single-level decision trees) are commonly used as weak learners in AdaBoost.

## AdaBoost Model Construction  
AdaBoost iteratively trains weak classifiers $f_m(x)$ and assigns each one a weight based on its performance. The final strong classifier is a weighted sum of these weak learners:

$F(x) = \sum_{m=1}^{M} \alpha_m f_m(x)$

Where:
- $M$ is the number of weak learners
- $\alpha_m$ is the weight assigned to weak learner $f_m(x)$
- $f_m(x) \in \{-1, +1\}$ is the output of the weak learner

The predicted class is the sign of $F(x)$.
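The weighted vote can be sketched in a few lines of Python. The two stumps and the `alphas` values below are hypothetical placeholders chosen for illustration, not fitted quantities. The sketch assumes labels in $\{-1, +1\}$ and takes the sign of $F(x)$ as the predicted class, which is the usual convention.

```python
def stump_a(x):
    """Hypothetical weak learner: +1 if the first feature is below 1.5, else -1."""
    return 1 if x[0] < 1.5 else -1


def stump_b(x):
    """Hypothetical weak learner: +1 if the second feature is below 1.5, else -1."""
    return 1 if x[1] < 1.5 else -1


def ensemble_score(x, learners, alphas):
    """F(x) = sum over m of alpha_m * f_m(x)."""
    return sum(alpha * f(x) for f, alpha in zip(learners, alphas))


def ensemble_predict(x, learners, alphas):
    """Predicted class is the sign of F(x); a score of 0 is sent to +1 (an assumption)."""
    return 1 if ensemble_score(x, learners, alphas) >= 0 else -1


learners = [stump_a, stump_b]
alphas = [0.8, 0.3]  # illustrative learner weights, not computed from data

# For x = (2, 1): stump_a votes -1, stump_b votes +1, so F(x) = -0.8 + 0.3
score = ensemble_score((2, 1), learners, alphas)
label = ensemble_predict((2, 1), learners, alphas)
```

Because the first learner has the larger weight, its vote wins even though the two learners disagree.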

### Weight Update  
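After each weak learner is trained, the weights of incorrectly classified samples are increased relative to the others, making them more important for the next learner.
With labels and predictions in $\{-1, +1\}$, the weight update for sample $i$ is given by:

$w_i^{(m+1)} = w_i^{(m)} \times e^{-\alpha_m y_i f_m(x_i)}$

Where:
- $w_i^{(m)}$ is the current weight of sample $i$
- $\alpha_m$ is the weight of the current weak learner
- $y_i f_m(x_i)$ equals $+1$ if the sample is classified correctly (its weight is multiplied by $e^{-\alpha_m}$) and $-1$ if it is misclassified (its weight is multiplied by $e^{\alpha_m}$)

Once the weights are renormalized, this is equivalent to multiplying each weight by $e^{2\alpha_m \cdot I(y_i \neq f_m(x_i))}$, where $I(y_i \neq f_m(x_i))$ is an indicator function that equals 1 if the sample is misclassified and 0 otherwise.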
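A minimal sketch of the reweighting step follows. It uses the common exponential form $w_i \cdot e^{-\alpha_m y_i f_m(x_i)}$, which raises the weights of misclassified samples and lowers the rest; after normalization only the ratio between the two groups matters. The predictions and the value of `alpha` are illustrative assumptions, not results from a trained learner.

```python
import math


def update_weights(weights, y_true, y_pred, alpha):
    """Multiply each weight by exp(-alpha * y_i * f(x_i)).

    With labels in {-1, +1}, y_i * f(x_i) is +1 for a correct prediction
    (the weight shrinks) and -1 for a mistake (the weight grows).
    """
    return [w * math.exp(-alpha * y * p)
            for w, y, p in zip(weights, y_true, y_pred)]


weights = [0.25, 0.25, 0.25, 0.25]
y_true = [+1, -1, +1, -1]
y_pred = [+1, -1, -1, -1]  # hypothetical learner that misses the third sample
alpha = 0.5                # illustrative learner weight

new_weights = update_weights(weights, y_true, y_pred, alpha)
```

Only the third weight grows; the other three shrink by the same factor, so the next learner is pushed toward the sample that was missed.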

## Example Dataset  
| Feature 1 ($X_1$) | Feature 2 ($X_2$) | Label ($Y$) |
|----------------|----------------|-----------|
| 1              | 2              | +1        |
| 2              | 1              | -1        |
| 1              | 1              | +1        |
| 2              | 2              | -1        |

## Decision Stump  
A decision stump is a weak learner that splits the data based on a single feature. For example:
- **Stump 1:** If $X_1 < 1.5$, predict +1; otherwise, predict -1
- **Stump 2:** If $X_2 < 1.5$, predict +1; otherwise, predict -1
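To make the two stumps concrete, the sketch below evaluates them on the four rows of the example dataset. The helper names `make_stump` and `error_rate` are assumptions introduced for illustration, and the error here is the plain (unweighted) fraction of mistakes.

```python
# Example dataset from the table: (X1, X2) pairs and their labels Y
X = [(1, 2), (2, 1), (1, 1), (2, 2)]
y = [+1, -1, +1, -1]


def make_stump(feature_index, threshold):
    """Return a stump that predicts +1 below the threshold and -1 otherwise."""
    def stump(x):
        return 1 if x[feature_index] < threshold else -1
    return stump


def error_rate(stump, X, y):
    """Fraction of samples the stump misclassifies."""
    mistakes = sum(1 for xi, yi in zip(X, y) if stump(xi) != yi)
    return mistakes / len(y)


stump_1 = make_stump(0, 1.5)  # Stump 1: split on X1
stump_2 = make_stump(1, 1.5)  # Stump 2: split on X2

preds_1 = [stump_1(xi) for xi in X]
preds_2 = [stump_2(xi) for xi in X]
err_1 = error_rate(stump_1, X, y)
err_2 = error_rate(stump_2, X, y)
```

On this toy table Stump 1 classifies every row correctly, while Stump 2 gets the first two rows wrong and is therefore no better than chance.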

## Choosing the Best Stump (Feature Selection) Using Entropy  
To select the best feature for the stump, we compute the **information gain** by evaluating the entropy before and after splitting:

### Entropy Formula
$H(S) = - \sum_{c \in \{+1, -1\}} p(c) \log_2 p(c)$

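Where $p(c)$ is the proportion of class $c$ in the set $S$, and terms with $p(c) = 0$ contribute 0. When samples carry weights, as in AdaBoost, proportions and set sizes are taken by weight rather than by count. The information gain of splitting $S$ into $S_L$ and $S_R$ is the entropy before the split minus the size-weighted entropy after it:

$IG = H(S) - \frac{|S_L|}{|S|} H(S_L) - \frac{|S_R|}{|S|} H(S_R)$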
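As a rough illustration, the sketch below computes the entropy and the information gain of each candidate split on the example dataset. It assumes uniform sample weights and a threshold of 1.5 for both features; the function names are hypothetical, and proportions are computed from weights so the same code would also work after reweighting.

```python
import math


def entropy(labels, weights):
    """Weighted binary entropy in bits; classes with zero weight contribute 0."""
    total = sum(weights)
    h = 0.0
    for c in (+1, -1):
        p = sum(w for lab, w in zip(labels, weights) if lab == c) / total
        if p > 0:
            h -= p * math.log2(p)
    return h


def information_gain(values, labels, weights, threshold):
    """Entropy before the split minus the weight-averaged entropy after it."""
    total = sum(weights)
    gain = entropy(labels, weights)
    left = [(lab, w) for v, lab, w in zip(values, labels, weights) if v < threshold]
    right = [(lab, w) for v, lab, w in zip(values, labels, weights) if v >= threshold]
    for side in (left, right):
        if side:  # skip an empty branch
            side_labels, side_weights = zip(*side)
            gain -= (sum(side_weights) / total) * entropy(side_labels, side_weights)
    return gain


x1 = [1, 2, 1, 2]
x2 = [2, 1, 1, 2]
y = [+1, -1, +1, -1]
w = [0.25, 0.25, 0.25, 0.25]  # uniform weights (assumed for the first round)

gain_x1 = information_gain(x1, y, w, 1.5)
gain_x2 = information_gain(x2, y, w, 1.5)
```

Splitting on $X_1$ yields two pure branches and recovers the full 1 bit of entropy, whereas splitting on $X_2$ leaves both branches evenly mixed and gains nothing, so $X_1$ would be chosen.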

## Weight Normalization  
The weights $w_i$ are normalized after each iteration so that they sum to 1:

$w_i^{(m+1)} \leftarrow \frac{w_i^{(m+1)}}{\sum_{j=1}^{N} w_j^{(m+1)}}$
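Normalization itself is a one-line division by the total. The sketch below uses made-up unnormalized weights, and the guard against a non-positive sum is an added safeguard rather than part of the formula.

```python
def normalize(weights):
    """Rescale weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


raw = [0.1, 0.1, 0.1, 0.3]  # illustrative unnormalized weights
normalized = normalize(raw)
```

The relative sizes are preserved: the last sample still carries three times the weight of each of the others, but the weights can now be read as a probability distribution over samples.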

## Giving Weights to the Second Weak Learner  
1. Compute the weighted error rate of the current weak learner $f_m$ (for the first learner, $m = 1$ and every sample starts with weight $1/N$):
   
   $\epsilon_m = \frac{\sum_{i=1}^{N} w_i^{(m)} I(y_i \neq f_m(x_i))}{\sum_{i=1}^{N} w_i^{(m)}}$

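2. Compute the weight for the weak learner (here $\log$ is the natural logarithm, so any $\epsilon_m < 0.5$ gives $\alpha_m > 0$, and a smaller error gives a larger weight):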
   
   $\alpha_m = \frac{1}{2} \log \left( \frac{1 - \epsilon_m}{\epsilon_m} \right)$

3. Update and normalize the sample weights, then train the next weak learner on the reweighted data.
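The three steps can be sketched as one boosting round. The predictions below come from a hypothetical learner that misses one of four samples, the reweighting uses the exponential form $w_i \cdot e^{-\alpha_m y_i f_m(x_i)}$, and the clipping inside `learner_weight` is an added safeguard (an assumption, not part of the formula) so that a perfect learner with $\epsilon_m = 0$ does not cause a division by zero.

```python
import math


def weighted_error(weights, y_true, y_pred):
    """Step 1: share of total weight that sits on misclassified samples."""
    wrong = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)
    return wrong / sum(weights)


def learner_weight(error, eps=1e-10):
    """Step 2: alpha = 0.5 * ln((1 - error) / error), with error clipped to (0, 1)."""
    error = min(max(error, eps), 1 - eps)
    return 0.5 * math.log((1 - error) / error)


weights = [0.25, 0.25, 0.25, 0.25]  # uniform starting weights
y_true = [+1, -1, +1, -1]
y_pred = [+1, -1, -1, -1]           # hypothetical learner: third sample is wrong

epsilon = weighted_error(weights, y_true, y_pred)
alpha = learner_weight(epsilon)

# Step 3: reweight with exp(-alpha * y * f(x)), then normalize to sum to 1
raw = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, y_true, y_pred)]
total = sum(raw)
next_weights = [w / total for w in raw]
```

With this choice of $\alpha_m$, the misclassified samples end up holding half of the total weight after normalization, so the learner that was just trained would score no better than chance on the reweighted data.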

By iteratively reweighting the samples so that each new learner concentrates on the examples its predecessors got wrong, AdaBoost combines many weak learners into a single ensemble that is more accurate than any of them alone.