<a href="https://colab.research.google.com/github/peeka-boo0/ml-learning-journey/blob/main/notebooks/notebook_2/Day_17_01_XGB_parametersByGridsearch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

A **full XGBoost tuning guide** in one go, with **easy explanations**.

---

## 🔑 XGBoost Hyperparameters (with simple meaning)

### **1. Boosting & tree complexity (usually the biggest effect)**

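* `n_estimators`: number of boosting rounds (trees). More trees = closer fit to the training data but slower training, and too many can overfit.
* `learning_rate` (alias `eta`): how much each new tree's contribution is shrunk. Small → needs more trees (slower) but usually generalizes better.
* `max_depth`: maximum depth of each tree. Deeper trees capture more feature interactions but overfit more easily.
* `min_child_weight`: minimum sum of instance weight (hessian) needed in a child. For squared-error regression this is roughly "min samples per leaf". Big → fewer splits → simpler model.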
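`min_child_weight` is a threshold on the *sum of hessians* (second derivatives of the loss) in a child, so reading it as "min samples" is only exact when every sample's hessian is 1, as with squared error. For binary log-loss the hessian is p(1 − p), so confident predictions contribute less and the same threshold needs more samples. A small pure-Python sketch; the helper names are hypothetical and it assumes every sample in the child has the same hessian:

```python
import math


def hessian_logistic(p: float) -> float:
    """Second derivative of binary log-loss w.r.t. the raw score, at predicted probability p."""
    return p * (1.0 - p)


def min_samples_needed(min_child_weight: float, hessian_per_sample: float) -> int:
    """Smallest sample count whose hessian sum reaches min_child_weight
    (simplification: all samples in the child share the same hessian)."""
    return math.ceil(min_child_weight / hessian_per_sample)


# Squared error: hessian is 1 per sample, so min_child_weight ~ min samples per child
print(min_samples_needed(5, 1.0))                    # 5
# Log-loss: an uncertain child (p = 0.5) vs a confident one (p = 0.9)
print(min_samples_needed(1, hessian_logistic(0.5)))  # 4
print(min_samples_needed(1, hessian_logistic(0.9)))  # 12
```

So on classification tasks, `min_child_weight=1` can already block splits in "pure" regions where the model is very confident.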

### **2. Sampling (to reduce overfitting & speed up)**

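* `subsample`: fraction of training rows randomly sampled for each tree (similar in spirit to bagging, but without replacement, so no row is duplicated).
* `colsample_bytree`: fraction of features (columns) randomly sampled for each tree.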
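A sketch of what the two sampling knobs mean for a single tree, using NumPy: each boosting round sees a random subset of rows and a random subset of feature columns. This is an illustration only, not XGBoost's internal sampler (its exact scheme, e.g. how it rounds counts or whether it keeps each row with probability `subsample`, may differ); `sample_round` is a hypothetical helper:

```python
import numpy as np


def sample_round(n_rows, n_cols, subsample, colsample_bytree, rng):
    """Pick the row and column indices one boosting round (one tree) would see.
    Rows are drawn WITHOUT replacement, unlike classic bootstrap bagging."""
    n_keep_rows = max(1, int(subsample * n_rows))
    n_keep_cols = max(1, int(colsample_bytree * n_cols))
    rows = rng.choice(n_rows, size=n_keep_rows, replace=False)
    cols = rng.choice(n_cols, size=n_keep_cols, replace=False)
    return np.sort(rows), np.sort(cols)


rng = np.random.default_rng(0)
for tree in range(3):
    rows, cols = sample_round(n_rows=1000, n_cols=64, subsample=0.8,
                              colsample_bytree=0.8, rng=rng)
    # Every tree gets a different 800-row, 51-feature slice of the data
    print(f"tree {tree}: {len(rows)} rows, {len(cols)} features, first cols {cols[:5]}")
```

Because each tree sees slightly different data, the trees make less correlated errors, which is where the overfitting reduction comes from; fewer rows and columns per tree is where the speed-up comes from.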

### **3. Regularization (to punish complexity)**

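* `gamma` (alias `min_split_loss`): minimum loss reduction required to make a split. Big gamma → fewer splits.
* `reg_alpha`: L1 regularization on leaf weights (can push small leaf weights to exactly zero).
* `reg_lambda`: L2 regularization on leaf weights (shrinks them toward zero; default 1).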
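To see where `gamma`, `reg_lambda` and `reg_alpha` actually enter, here are the split-gain and leaf-weight formulas from the XGBoost documentation's introduction to boosted trees, with G and H the sums of gradients and hessians in a node. The soft-thresholding used for `reg_alpha`, and leaving alpha out of the gain, are simplifying assumptions on my part; the helper names are hypothetical, not XGBoost API:

```python
def leaf_weight(G, H, reg_lambda=1.0, reg_alpha=0.0):
    """Optimal leaf value -G / (H + lambda).
    Assumption: reg_alpha (L1) soft-thresholds G, so a small gradient sum gives exactly 0."""
    if G > reg_alpha:
        g = G - reg_alpha
    elif G < -reg_alpha:
        g = G + reg_alpha
    else:
        return 0.0
    return -g / (H + reg_lambda)


def split_gain(GL, HL, GR, HR, reg_lambda=1.0, gamma=0.0):
    """Gain = 1/2 * [GL^2/(HL+lambda) + GR^2/(HR+lambda) - (GL+GR)^2/(HL+HR+lambda)] - gamma.
    The split is only worth making if this is > 0."""
    def score(G, H):
        return G * G / (H + reg_lambda)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


print(split_gain(-4, 2, 4, 2))            # 16/3 = 5.33...: worth splitting
print(split_gain(-4, 2, 4, 2, gamma=6))   # negative: gamma blocks this split
print(leaf_weight(-4, 2))                 # 4/3 = 1.33...
print(leaf_weight(-4, 2, reg_lambda=2))   # 1.0: bigger lambda shrinks the weight
print(leaf_weight(-0.5, 2, reg_alpha=1))  # 0.0: L1 zeroes out a weak leaf
```

`reg_lambda` sits in every denominator, so it shrinks both the leaf values and the gain; `gamma` is a flat "price per split" subtracted from the gain.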

### **4. Other**

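* `scale_pos_weight`: (binary classification) upweights the positive class when classes are imbalanced; a common starting value is `count(negative) / count(positive)`.
* `n_jobs`: number of parallel threads XGBoost uses.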
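For a binary problem, the XGBoost docs suggest `sum(negative instances) / sum(positive instances)` as a typical starting value for `scale_pos_weight`. A minimal sketch with made-up labels (the helper name is hypothetical); the digits example below is multi-class, so this parameter does not apply there:

```python
import numpy as np


def suggested_scale_pos_weight(y):
    """Heuristic starting value: number of negatives / number of positives (labels 0/1)."""
    y = np.asarray(y)
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == 0))
    return n_neg / n_pos


y = np.array([0] * 900 + [1] * 100)   # made-up labels: 10% positive
print(suggested_scale_pos_weight(y))  # 9.0
```

It would then be passed as `XGBClassifier(scale_pos_weight=suggested_scale_pos_weight(y_train))`; treat it as a starting point to tune, not a fixed rule.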

---


## ⚡ Example Full GridSearch with All Important Params

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Load data
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

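# Base model
# (use_label_encoder is deprecated since XGBoost 1.6 and no longer needed)
# n_jobs=1 so XGBoost threads don't compete with GridSearchCV's n_jobs=-1 workers
xgb = XGBClassifier(eval_metric="mlogloss", n_jobs=1, random_state=42)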

# Parameter grid (common choices)
param_grid = {
    "n_estimators": [100, 200],       # number of trees
    "learning_rate": [0.05, 0.1, 0.3], # step size
    "max_depth": [3, 5, 7],           # tree depth
    "min_child_weight": [1, 3, 5],    # min hessian sum per child
    "subsample": [0.8, 1.0],          # % of data per tree
    "colsample_bytree": [0.8, 1.0],   # % of features per tree
    "gamma": [0, 1],                  # min loss reduction
    "reg_alpha": [0, 0.1],            # L1 regularization
    "reg_lambda": [1, 2]              # L2 regularization
}

# Grid search
grid = GridSearchCV(
    xgb,
    param_grid,
    cv=3,          # 3-fold cross-validation
    scoring="accuracy",
    verbose=1,
    n_jobs=-1
)
grid.fit(X_train, y_train)

# Results
print("Best Params:", grid.best_params_)
print("Best CV Score:", grid.best_score_)

best_model = grid.best_estimator_
y_pred = best_model.predict(X_test)
print("Test Accuracy:", accuracy_score(y_test, y_pred))

# Fitting 3 folds for each of 1728 candidates, totalling 5184 fits
# (oh hell nah, my laptop doesn't know that much power exists; it was going to take ~25 min
# to produce this result, so that's why I didn't run this code)