# XGBoost Tutorial

**XGBoost** (eXtreme Gradient Boosting) is an optimized, scalable **ensemble machine learning algorithm** built on the **gradient boosted decision tree** framework. It is known for fast training, strong predictive performance on tabular data, and the ability to handle large, complex datasets, and it has featured in many winning entries in machine learning competitions such as those on Kaggle.

---

## 🛠️ How XGBoost Works (The Boosting Concept)

XGBoost is an advanced implementation of the **Boosting** ensemble technique. Unlike **Random Forest**, where trees are built independently (bagging), XGBoost builds trees **sequentially** and iteratively to correct the errors of the previous trees.

1.  **Initial Prediction:** The process starts with a simple initial prediction (often a constant value).
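2.  **Calculate Residuals (Errors):** The residual, the actual target value minus the current prediction, is calculated for each sample. For squared-error loss this residual equals the negative gradient of the loss, which is the quantity gradient boosting fits in general.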
3.  **Train a New Tree:** A new, simple decision tree (a "weak learner") is trained specifically to predict these **residuals**.
4.  **Update the Model:** The prediction of this new tree is added to the initial model's prediction, and the process is repeated. Each subsequent tree learns from the **errors** of the **cumulative model** up to that point.
5.  **Final Prediction:** The final prediction is the **sum** of the initial prediction and the weighted predictions of all the sequentially built trees.

Because each new tree targets what the current ensemble still gets wrong, the model concentrates on the data points that remain hard to predict, turning a sequence of weak learners into a single, strong predictive model.
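The five steps above can be sketched from scratch in a few lines. This is a minimal illustration, not XGBoost's actual implementation: it assumes squared-error regression on 1-D inputs, uses depth-1 trees ("stumps"), and sticks to plain Python so it runs without dependencies. The names `fit_stump`, `fit_boosted`, and `predict` are hypothetical helpers chosen for this example.

```python
# From-scratch sketch of boosting for squared-error regression.
# Assumptions: 1-D inputs, depth-1 trees (stumps), fixed learning rate.

def fit_stump(xs, residuals):
    """Find the threshold split that best fits the residuals (least squares)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, n_trees=30, learning_rate=0.3):
    base = sum(ys) / len(ys)                  # step 1: constant initial prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # step 2: residuals
        stump = fit_stump(xs, residuals)                # step 3: fit a weak learner
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)  # step 4: update
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    """Step 5: initial prediction plus the weighted sum of all trees."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (made up for illustration): y = x^2 on 20 evenly spaced points.
xs = [i / 2 for i in range(20)]
ys = [x ** 2 for x in xs]
baseline = fit_boosted(xs, ys, n_trees=0)   # constant prediction only
model = fit_boosted(xs, ys, n_trees=30)
print(f"MSE with 0 trees:  {mse(baseline, xs, ys):.3f}")
print(f"MSE with 30 trees: {mse(model, xs, ys):.3f}")
```

The training error shrinks as trees are added because every stump is fitted to whatever error the running sum of predictions still leaves behind.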

---

## 🚀 Key Optimization Features

The "eXtreme" in XGBoost refers to its performance optimizations and architectural enhancements compared to traditional gradient boosting:

* **Regularization:** It includes $\text{L}1$ (Lasso) and $\text{L}2$ (Ridge) **regularization** terms in its objective function. This penalizes complex models, which helps prevent **overfitting** and improves generalization.
* **Parallel Processing:** Although the trees are built sequentially, XGBoost utilizes **parallelization** within the tree-building process (for split-finding) to significantly speed up training time.
* **Handling Missing Data:** It has a built-in routine to handle **sparse data** and **missing values** by automatically learning the best direction for a split when a value is absent.
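* **Tree Pruning:** It grows each tree up to the maximum depth (`max_depth`) and then prunes back any split whose gain falls below the complexity parameter ($\gamma$). Pruning **after** growth can keep a weak split that enables a strong split beneath it, which stopping growth at the first weak split would miss.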
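To make the regularization, pruning, and missing-value points concrete, here is a small sketch of the regularized split gain from the XGBoost paper, $\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$, where $G$ and $H$ are sums of first and second derivatives of the loss over the rows in each child. The function names are hypothetical, the L1 term is left out for brevity, and the numbers are invented for illustration; this is not the library's code.

```python
def leaf_score(g_sum, h_sum, reg_lambda):
    """Score contribution of one leaf: G^2 / (H + lambda)."""
    return g_sum ** 2 / (h_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Regularized gain of splitting a node into a left and a right child."""
    parent = leaf_score(g_left + g_right, h_left + h_right, reg_lambda)
    children = (leaf_score(g_left, h_left, reg_lambda)
                + leaf_score(g_right, h_right, reg_lambda))
    return 0.5 * (children - parent) - gamma


def default_direction(g_left, h_left, g_right, h_right, g_missing, h_missing,
                      reg_lambda=1.0, gamma=0.0):
    """Try sending rows with a missing value left, then right; keep the better gain."""
    gain_left = split_gain(g_left + g_missing, h_left + h_missing,
                           g_right, h_right, reg_lambda, gamma)
    gain_right = split_gain(g_left, h_left,
                            g_right + g_missing, h_right + h_missing,
                            reg_lambda, gamma)
    return ("left", gain_left) if gain_left >= gain_right else ("right", gain_right)


# Made-up gradient statistics for one candidate split.
stats = dict(g_left=-4.0, h_left=2.0, g_right=6.0, h_right=3.0)

# A larger L2 penalty shrinks the gain, so fewer splits look worthwhile.
print(split_gain(**stats, reg_lambda=1.0))
print(split_gain(**stats, reg_lambda=10.0))

# A gain below zero after subtracting gamma marks the split for pruning.
print(split_gain(**stats, gamma=10.0) < 0)

# Rows with a missing feature value go to whichever side yields the higher gain.
print(default_direction(**stats, g_missing=-2.0, h_missing=1.0))
```

In the scikit-learn style wrapper these knobs appear as the `reg_lambda` and `gamma` parameters (with `reg_alpha` for the L1 term omitted here).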

---

## 🆚 XGBoost vs. Random Forest

| Feature | XGBoost (Boosting) | Random Forest (Bagging) |
| :--- | :--- | :--- |
| **Tree Construction** | Trees are built **sequentially**, correcting the errors of the preceding trees. | Trees are built **in parallel** and independently from random data subsets. |
| **Focus** | Primarily focuses on **reducing bias** (error) by learning from mistakes. | Primarily focuses on **reducing variance** by averaging independent results. |
| **Accuracy** | Often achieves **higher predictive accuracy**, especially on large or heterogeneous datasets. | Generally **highly accurate**, but can sometimes overfit noisy data. |
| **Overfitting** | Can overfit if too many boosting rounds are used, but built-in **regularization**, tree pruning, and early stopping give direct control. | Resistant, but less control over individual tree complexity. |
| **Performance** | **Highly optimized** for speed and scalability (eXtreme) on large datasets. | Efficient, but not as optimized for distributed computing as XGBoost. |

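In equation form, the model $f$ approximates the target $y$ as a sum of trees $t_k$. Each tree fits what the earlier trees left unexplained, and $\Sigma_k$ denotes the residual remaining after the first $k$ trees:

$f(x) \approx y$\
\
$f(x) = t_1(x) + t_2(x) + \dots + t_n(x)$\
\
$t_1(x) \approx y$\
\
$t_2(x) \approx y - t_1(x)$\
\
$t_3(x) \approx y - t_1(x) - t_2(x)$\
\
$\Sigma_1 = y - t_1(x)$\
\
$\Sigma_2 = \Sigma_1 - t_2(x)$\
\
$\Sigma_3 = \Sigma_2 - t_3(x)$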
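
As a worked illustration with made-up numbers (not taken from any dataset), suppose a single sample has target $y = 10$ and three successive trees output $6$, $3$, and $0.8$. Each tree aims at whatever the earlier trees left over:

$$
\begin{aligned}
t_1(x) &= 6   & y - t_1(x) &= 4 \\
t_2(x) &= 3   & y - t_1(x) - t_2(x) &= 1 \\
t_3(x) &= 0.8 & y - t_1(x) - t_2(x) - t_3(x) &= 0.2 \\
f(x) &= 6 + 3 + 0.8 = 9.8 \approx 10
\end{aligned}
$$

The leftover error shrinks from $4$ to $1$ to $0.2$ as trees are added, which is the behaviour the sequential fitting is designed to produce.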



# Attempt 1

## Import Libraries

```python
import pandas as pd
import numpy as np
import xgboost as xgb

from sklearn.model_selection import train_test_split
```