```{contents}
```


# Assumptions

## 1. **Weak Learners Should Perform Slightly Better than Random**

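* The base learners (often *decision stumps*, i.e. decision trees of depth 1) only need to beat random guessing by a small margin:

  * For binary classification → weighted training error below 50% in every round.
* Boosting works by combining many weak rules. A learner with weighted error ε gets the vote weight α = ½·ln((1 − ε)/ε), which falls to zero at ε = 0.5, so a base learner that is no better than chance contributes nothing and AdaBoost cannot make progress.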
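
To make the "better than chance" requirement concrete, here is a minimal sketch of the vote weight used in binary (discrete) AdaBoost. The function name `learner_weight` is illustrative, not taken from any library, and the sketch assumes the weighted error lies strictly between 0 and 1.

```python
import math

def learner_weight(weighted_error: float) -> float:
    """Vote weight (alpha) of one weak learner in binary, discrete AdaBoost."""
    if not 0.0 < weighted_error < 1.0:
        raise ValueError("weighted_error must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - weighted_error) / weighted_error)

# Lower error -> larger vote; error 0.5 -> zero vote; error above 0.5 -> negative vote.
for err in (0.10, 0.30, 0.45, 0.50, 0.60):
    print(f"error={err:.2f} -> alpha={learner_weight(err):+.3f}")
```

A learner sitting exactly at 50% error gets a weight of zero, which is why a pool of chance-level learners gives the ensemble nothing to combine.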

---

## 2. **Additive Model of Errors**

* AdaBoost builds an additive model: the final classifier is a weighted vote of weak learners added one at a time, each trained to correct the errors left by the previous ones.
* Misclassified samples get higher weights → future weak learners focus on them.
* This assumes the remaining errors have learnable structure and can be *reduced step-by-step*, rather than being pure random noise.
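
The reweighting step can be sketched in a few lines. This is an illustrative version of one round of discrete AdaBoost, assuming the sample weights sum to 1 and the weighted error is strictly between 0 and 1; `reweight` and the toy data are hypothetical names and values chosen for the example.

```python
import math

def reweight(weights, is_correct):
    """One boosting round: return (alpha, new normalized sample weights)."""
    # Weighted error = total weight of the misclassified samples.
    error = sum(w for w, ok in zip(weights, is_correct) if not ok)
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Shrink weights of correct samples, grow weights of misclassified ones.
    scaled = [w * math.exp(-alpha if ok else alpha)
              for w, ok in zip(weights, is_correct)]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]

weights = [0.2] * 5                            # uniform start over 5 samples
is_correct = [True, True, True, True, False]   # the learner misses the last one
alpha, new_weights = reweight(weights, is_correct)
print(alpha, new_weights)
```

In this toy case the single misclassified sample ends up holding half of the total weight, so the next weak learner cannot do well without getting it right.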

---

## 3. **Data is (Relatively) Clean**

* AdaBoost is **sensitive to noisy data and outliers**, because:

  * Misclassified points get higher weights repeatedly.
  * Outliers that are impossible to classify correctly receive disproportionate focus.
* Implicit assumption: the dataset has little label noise and few extreme outliers.
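
The effect on a single unlearnable point can be illustrated with a small simulation. The setup is hypothetical and deliberately simplified: index 0 is an outlier that every weak learner gets wrong, and each round the learner also misses a few points it has never missed before. No real weak learner is trained; only the weight bookkeeping is shown.

```python
import math

def simulate_outlier_weight(n_points=100, n_rounds=5, fresh_errors=9):
    """Track the weight of one point that every weak learner misclassifies."""
    weights = [1.0 / n_points] * n_points
    history = [weights[0]]
    for t in range(n_rounds):
        # Misclassified this round: the outlier plus a fresh batch of points.
        start = 1 + t * fresh_errors
        wrong = {0, *range(start, start + fresh_errors)}
        error = sum(weights[i] for i in wrong)
        alpha = 0.5 * math.log((1.0 - error) / error)
        weights = [w * math.exp(alpha if i in wrong else -alpha)
                   for i, w in enumerate(weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        history.append(weights[0])
    return history

history = simulate_outlier_weight()
for t, w in enumerate(history):
    print(f"after round {t}: outlier weight = {w:.3f}")
```

Under these assumptions the outlier starts at 1% of the total weight and already holds a quarter of it after two rounds, which is the "disproportionate focus" described above.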

---

## 4. **Feature Independence Isn’t Required (Unlike Naive Bayes)**

* AdaBoost does **not assume independence of features**.
* It can handle correlated features, but redundant features add training cost, since each weak learner still has to evaluate them as candidates.

---

## 5. **Sufficient Number of Weak Learners**

* Boosting assumes that with **enough iterations (weak learners)**, the combined strong learner reaches a low training error, provided each weak learner stays better than chance.
* Too few learners → underfitting; too many learners → risk of overfitting (though AdaBoost is surprisingly resistant to overfitting on clean data).
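
The role of the number of rounds can be sketched with the classical upper bound on AdaBoost's training error, the product of `2 * sqrt(e * (1 - e))` over the weighted errors `e` of the individual rounds. The constant error of 0.4 per round below is an assumption made for illustration, and the bound covers training error only; it says nothing about overfitting on held-out data.

```python
import math

def training_error_bound(weighted_errors):
    """Upper bound on ensemble training error: prod of 2*sqrt(e*(1-e))."""
    bound = 1.0
    for e in weighted_errors:
        bound *= 2.0 * math.sqrt(e * (1.0 - e))
    return bound

# Hypothetical: every weak learner has weighted error 0.4 (a 10-point edge).
for n_rounds in (10, 50, 200):
    print(n_rounds, training_error_bound([0.4] * n_rounds))
```

Each round multiplies the bound by a factor below 1 as long as the error stays below 0.5, so a weak edge needs many rounds before the bound becomes small, while learners at exactly 0.5 leave it stuck at 1.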

---

## 6. **Weak Learners Should Be Simple**

* Base learners should be simple (e.g., decision stumps or very shallow trees).
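* If base learners are too complex (deep trees), each one can already fit the training set almost perfectly, leaving little error for later rounds to correct; the result behaves like an ensemble of strong models and overfits more easily.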
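
To show how simple a weak learner can be, here is a sketch of a decision stump on a single feature, fitted by brute force to minimize weighted error. The function `fit_stump` and the toy data are illustrative assumptions; labels are taken to be +1/-1.

```python
def fit_stump(xs, ys, weights):
    """Find the 1-D threshold rule with the lowest weighted error.

    Returns (threshold, polarity, weighted_error). The rule predicts
    `polarity` when x > threshold and `-polarity` otherwise.
    """
    best = None
    sorted_x = sorted(set(xs))
    # Candidates: one threshold below the minimum, then midpoints between neighbors.
    candidates = [sorted_x[0] - 1.0] + [
        (a + b) / 2.0 for a, b in zip(sorted_x, sorted_x[1:])
    ]
    for threshold in candidates:
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if (polarity if x > threshold else -polarity) != y
            )
            if best is None or error < best[2]:
                best = (threshold, polarity, error)
    return best

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [-1, -1, 1, -1, 1, 1]
weights = [1.0 / len(xs)] * len(xs)
threshold, polarity, error = fit_stump(xs, ys, weights)
print(threshold, polarity, error)
```

On this toy data no single threshold separates the classes, so the best stump still misclassifies one of the six points. That is the intended regime: better than chance, far from perfect, with the remaining error left for later rounds.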

---

**Summary**

AdaBoost works best under these assumptions:

* Weak learners perform slightly better than chance.
* Errors can be sequentially corrected.
* Data is relatively clean (not dominated by noise or outliers).
* Features do not need to be independent.
* Enough learners are combined to reduce bias.
* Base learners are simple (e.g., decision stumps).
