### Core Idea

* Build many **weak learners (decision trees)** sequentially.
* Each new tree tries to correct errors made by previous trees.
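* Uses **gradient descent** in function space: each tree is fit to the gradient of the loss with respect to the current predictions (XGBoost additionally uses second-order gradient information).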

---

### Workflow

1. **Initialize model** with a simple prediction (e.g., mean of target for regression, log-odds for classification).
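2. **Compute gradients (pseudo-residuals):** The negative gradient of the loss with respect to the current prediction; for squared error this is simply actual minus predicted.
3. **Fit decision tree** to these residuals.
4. **Update prediction:** Add the new tree's output, scaled by the learning rate, to the previous prediction.
5. **Repeat** steps 2 to 4 until the maximum number of trees is reached or the validation metric stops improving.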
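
The steps above can be sketched in plain Python. This is a minimal illustration of generic gradient boosting with squared-error loss, not XGBoost's actual implementation: it assumes a single numeric feature, uses depth-1 trees (stumps), and the names `fit_stump` and `boost` are hypothetical.

```python
from statistics import mean

def fit_stump(x, residuals):
    """Fit a depth-1 regression tree on one feature by minimizing squared error.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda xi: left_value if xi <= threshold else right_value

def boost(x, y, n_trees=50, learning_rate=0.1):
    base = mean(y)                       # step 1: start from the mean of the target
    preds = [base] * len(y)
    trees = []
    for _ in range(n_trees):             # step 5: repeat for a fixed number of trees
        # step 2: residuals = negative gradient of 1/2 * (y - pred)^2
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        tree = fit_stump(x, residuals)   # step 3: fit a tree to the residuals
        # step 4: add the tree's output, scaled by the learning rate
        preds = [pi + learning_rate * tree(xi) for xi, pi in zip(x, preds)]
        trees.append(tree)

    def predict(xi):
        return base + learning_rate * sum(tree(xi) for tree in trees)
    return predict

# Toy data, made up for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
predict = boost(x, y)

baseline_mse = mean((yi - mean(y)) ** 2 for yi in y)
boosted_mse = mean((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))
print(baseline_mse, boosted_mse)
```

On the training data the boosted error should fall well below the mean-only baseline, because each round removes a fraction of the remaining residual.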

---

### Objective Function

XGBoost minimizes:

$$
Obj = \sum_i l(y_i, \hat{y}_i) + \sum_k \Omega(f_k)
$$

* $l$: loss function (e.g., log loss for classification, MSE for regression)
* $\Omega(f_k)$: regularization term for the $k$-th tree $f_k$; it prevents overfitting by penalizing complex trees (many leaves, large leaf weights)
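
To make the two sums concrete, here is a small sketch that evaluates the objective by hand. It assumes squared-error loss and the penalty form $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_j w_j^2$ (with $T$ leaves and leaf weights $w_j$), which is the form given in the XGBoost paper but is not spelled out above; the L1 term is left out for brevity, and all function names are hypothetical.

```python
def squared_error(y_true, y_pred):
    """Sum of l(y_i, y_hat_i) with l = (y - y_hat)^2."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))

def tree_penalty(leaf_weights, gamma=1.0, reg_lambda=1.0):
    """Omega(f) = gamma * T + 0.5 * lambda * sum(w_j^2) for one tree (assumed form)."""
    return gamma * len(leaf_weights) + 0.5 * reg_lambda * sum(w ** 2 for w in leaf_weights)

def objective(y_true, y_pred, trees, gamma=1.0, reg_lambda=1.0):
    """Total objective: training loss plus the penalty summed over all trees."""
    loss = squared_error(y_true, y_pred)
    penalty = sum(tree_penalty(leaves, gamma, reg_lambda) for leaves in trees)
    return loss + penalty

# Each tree is represented only by its list of leaf weights (toy values).
trees = [[0.5, -0.5], [1.0]]
value = objective([1.0, 2.0], [0.5, 2.5], trees)
print(value)  # 4.25 = loss 0.5 + penalties 2.25 and 1.5
```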

---

### Key Features

* **Regularization (L1 + L2)** → controls overfitting.
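* **Parallelized split finding** → trees are still added sequentially, but the search for the best split within each tree runs in parallel across features, giving faster training than standard GBM.
* **Handling missing values** automatically: each split learns a default direction for missing entries.
* **Tree pruning** → removes splits whose gain falls below a threshold (`gamma`).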
* **Learning rate (η)** → scales contribution of each tree.
* **Early stopping** → halts training when the validation metric has not improved for a set number of rounds.
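
The regularization and pruning features meet in the split-gain calculation. The sketch below uses the leaf-weight and gain formulas from the XGBoost paper, where `g` and `h` are sums of first and second derivatives of the loss over the rows in a node; the function names and the toy numbers are assumptions for illustration.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0):
    """Optimal leaf value w* = -G / (H + lambda); a larger lambda shrinks it toward 0."""
    return -grad_sum / (hess_sum + reg_lambda)

def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Gain of a split: improvement in the structure score minus the per-leaf cost gamma."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    children = score(g_left, h_left) + score(g_right, h_right)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (children - parent) - gamma

# Toy node: 3 rows on each side, squared-error loss so every hessian is 1.
print(split_gain(-6.0, 3.0, 6.0, 3.0))              # 9.0  -> split is worth keeping
print(split_gain(-6.0, 3.0, 6.0, 3.0, gamma=10.0))  # -1.0 -> split would be pruned
print(leaf_weight(-6.0, 3.0))                       # 1.5
```

A split is kept only when its gain is positive, so raising `gamma` prunes more aggressively, while raising `reg_lambda` lowers both the gain and the magnitude of the leaf values.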

---

### Important Hyperparameters

* `n_estimators` → number of trees
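* `max_depth` → maximum depth of each tree
* `learning_rate` → shrinkage factor (`eta` in the native API)
* `subsample` → fraction of rows sampled per tree
* `colsample_bytree` → fraction of features sampled per tree
* `reg_lambda`, `reg_alpha` → L2 and L1 regularization (`lambda`, `alpha` in the native API)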
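
As a rough sketch, these settings can be collected in a dictionary and passed to the scikit-learn style wrapper, which spells the regularization parameters `reg_lambda` and `reg_alpha`. The values below are hypothetical starting points rather than recommendations, and `build_classifier` is an illustrative helper, not part of the library.

```python
# Hypothetical starting values; tune them on your own validation data.
params = {
    "n_estimators": 300,      # number of trees
    "max_depth": 4,           # maximum depth of each tree
    "learning_rate": 0.05,    # shrinkage applied to each tree's output
    "subsample": 0.8,         # fraction of rows sampled per tree
    "colsample_bytree": 0.8,  # fraction of features sampled per tree
    "reg_lambda": 1.0,        # L2 regularization
    "reg_alpha": 0.0,         # L1 regularization
}

def build_classifier(params):
    """Create an XGBoost classifier from a parameter dict (requires xgboost)."""
    # Imported lazily so the dict above can be inspected without xgboost installed.
    from xgboost import XGBClassifier
    return XGBClassifier(**params)
```

With the `xgboost` package installed, `build_classifier(params).fit(X_train, y_train)` would train the model, where `X_train` and `y_train` stand in for your own feature matrix and labels. A lower `learning_rate` generally needs a larger `n_estimators` to reach the same fit.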

---

### Example Use Cases

* Classification: spam detection, fraud detection, churn prediction
* Regression: house prices, demand forecasting
* Ranking: recommender systems, search ranking

