## 🔹 **Basic Level Questions (1–10)**

1. **What is Gradient Boosting?**
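   Gradient Boosting is an ensemble technique that builds models sequentially, where each new model corrects the errors of the ensemble built so far. It minimizes a differentiable loss function by gradient descent in function space: each new model is fit to the negative gradient of the loss with respect to the current predictions.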



2. **How does Gradient Boosting work?**
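   It starts with an initial model, usually a constant prediction (e.g., the mean of the target for squared-error regression), and computes the residuals (errors) of the current predictions. A new weak learner is fit to these residuals and added to the ensemble, scaled by the learning rate. This repeats for a fixed number of rounds or until the validation error stops improving.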
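
A minimal pure-Python sketch of this loop for regression, assuming squared-error loss, a single numeric feature, and one-split decision stumps as the weak learner. The helper names (`fit_stump`, `gradient_boost`, `predict`) are hypothetical, not a library API, and the code favors readability over speed.

```python
# Minimal gradient boosting for regression with squared-error loss.
# Weak learner: a one-split decision stump on a single numeric feature.


def fit_stump(xs, targets):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so that both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x identical: fall back to a constant prediction
        mean = sum(targets) / len(targets)
        return float("inf"), mean, mean
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_estimators=50, learning_rate=0.1):
    f0 = sum(ys) / len(ys)  # initial model: a constant (the mean of y)
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # Residuals = negative gradient of 1/2 * (y - F)^2 with respect to F
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)  # fit the weak learner to the residuals
        stumps.append(stump)
        # Add the new learner, scaled by the learning rate
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return f0, learning_rate, stumps


def predict(model, x):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9]
model = gradient_boost(xs, ys, n_estimators=100, learning_rate=0.1)
print([round(predict(model, x), 2) for x in xs])
```

Because each stump is fit to what the ensemble still gets wrong, the early rounds separate the low and high groups, and later rounds refine the small remaining errors.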



3. **What is a weak learner in Gradient Boosting?**
   A weak learner is typically a shallow decision tree (e.g., depth 1 or 3) that performs slightly better than random guessing.



4. **How is Gradient Boosting different from AdaBoost?**

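   * **AdaBoost** reweights the training samples after each round, giving more weight to misclassified examples, and combines the learners with weights based on their error.
   * **Gradient Boosting** fits each new learner to the negative gradient of a differentiable loss function (the residuals, for squared error), so it works with any such loss.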



5. **What is the loss function in Gradient Boosting?**
   It depends on the task:

   * Regression: MSE (squared error) or MAE (absolute error)
   * Classification: log loss (binary) or multiclass log loss (cross-entropy)
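
For reference, a small pure-Python sketch of these losses, assuming plain lists as input (`y_pred` holds numeric predictions, `p_pred` the probability of class 1, and each row of `prob_rows` one probability per class). The function names are illustrative only.

```python
import math


def mse(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def binary_log_loss(y_true, p_pred, eps=1e-15):
    # y in {0, 1}; p is the predicted probability of class 1
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def multiclass_log_loss(y_true, prob_rows, eps=1e-15):
    # y_true holds class indices; each row of prob_rows sums to 1
    return -sum(math.log(max(row[y], eps)) for y, row in zip(y_true, prob_rows)) / len(y_true)


print(mse([3.0, 5.0], [2.5, 5.5]))                  # 0.25
print(mae([3.0, 5.0], [2.5, 5.5]))                  # 0.5
print(binary_log_loss([1, 0], [0.5, 0.5]))          # about 0.693 (ln 2)
print(multiclass_log_loss([0], [[0.7, 0.2, 0.1]]))  # about 0.357 (-ln 0.7)
```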


6. **Why do we use gradient descent in Gradient Boosting?**
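   To minimize the chosen loss function. The gradient of the loss with respect to the current predictions shows how each prediction should change to reduce the loss; each new model is fit to the negative gradient, so adding it moves the ensemble a step downhill.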
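
To make the idea concrete, the sketch below runs ordinary gradient descent directly on the vector of training predictions for a squared-error loss. This is a simplification: Gradient Boosting cannot update predictions point by point (it must also predict unseen inputs), so at each step it fits a tree to the negative gradient instead of subtracting the gradient itself. The helper names are hypothetical.

```python
# Plain gradient descent on the vector of training predictions F,
# for the loss L(F) = 1/2 * sum_i (y_i - F_i)^2.


def squared_loss(ys, preds):
    return 0.5 * sum((y - f) ** 2 for y, f in zip(ys, preds))


def gradient(ys, preds):
    # dL/dF_i = F_i - y_i
    return [f - y for y, f in zip(ys, preds)]


ys = [3.0, -1.0, 2.0]
preds = [0.0, 0.0, 0.0]  # loss starts at 7.0
learning_rate = 0.5
losses = []
for step in range(3):
    g = gradient(ys, preds)
    # Step against the gradient; Gradient Boosting approximates -g with a tree
    preds = [f - learning_rate * gi for f, gi in zip(preds, g)]
    losses.append(squared_loss(ys, preds))

print(losses)  # [1.75, 0.4375, 0.109375]
```

With a step size of 0.5, every step halves each residual, so the loss drops by a factor of four per step.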



7. **What are residuals in Gradient Boosting?**
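   The differences between the actual values and the current ensemble's predictions; the next model is trained on them. For squared-error loss they are proportional to the negative gradient of the loss, which is why fitting residuals is a special case of gradient boosting.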



8. **Why are decision trees commonly used as base learners in Gradient Boosting?**
   They capture nonlinear relationships and feature interactions, are fast to train when kept shallow, and need no feature scaling.



9. **What is learning rate in Gradient Boosting?**
   A hyperparameter that scales how much each new model contributes to the overall prediction. Smaller values usually generalize better but need more trees, so training is slower.



10. **What is shrinkage in Gradient Boosting?**
    Another name for learning rate; it “shrinks” the contribution of each tree to prevent overfitting.
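
A deliberately idealized sketch of the speed side of this trade-off: assume a weak learner that predicts the current residual exactly, so each round removes a fraction `learning_rate` of the remaining error. The function `rounds_to_fit` is hypothetical, and the sketch does not model the generalization benefit of shrinkage, which comes from taking many small steps with imperfect trees.

```python
# Idealised illustration: each weak learner is assumed to predict the
# current residual exactly, so every round removes a fraction
# `learning_rate` of the remaining error.


def rounds_to_fit(initial_residual, learning_rate, tol=1e-3):
    residual = initial_residual
    rounds = 0
    while abs(residual) > tol:
        residual -= learning_rate * residual  # shrunken update
        rounds += 1
    return rounds


for lr in (1.0, 0.3, 0.1):
    print(lr, rounds_to_fit(1.0, lr))  # 1.0 1, then 0.3 20, then 0.1 66
```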

---

## 🔹 **Intermediate Level Questions (11–20)**

11. **What are the main hyperparameters in Gradient Boosting?**

    * Learning rate
    * Number of estimators
    * Max depth
    * Subsample
    * Min samples split / leaf
    * Loss function


12. **What is the effect of increasing the number of estimators?**
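    Each extra tree adds model capacity, so training error keeps decreasing. Validation error usually improves at first and then worsens once the model starts to overfit, so the number of trees should be tuned together with the learning rate (or set by early stopping).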



13. **How does Gradient Boosting prevent overfitting?**
    Through:

    * Early stopping
    * A small learning rate (shrinkage)
    * Limiting tree complexity (e.g., max depth, minimum samples per leaf)
    * Subsampling (stochastic GB)


14. **What is stochastic gradient boosting?**
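    A variant in which each tree is fit on a random fraction of the training rows, drawn without replacement at every iteration (the `subsample` parameter in scikit-learn). The added randomness reduces variance, which often improves generalization, and each iteration becomes cheaper.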
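
A minimal sketch of the sampling step, assuming rows are drawn without replacement and `subsample` is the fraction kept per round (the same meaning as scikit-learn's `subsample` parameter). `subsample_indices` is a hypothetical helper; in a full implementation the tree for each round would be fit only on the pseudo-residuals of the sampled rows.

```python
import random


def subsample_indices(n_samples, subsample, rng):
    # Draw a fraction of rows *without* replacement for one boosting round;
    # always keep at least one row.
    k = max(1, int(round(subsample * n_samples)))
    return sorted(rng.sample(range(n_samples), k))


rng = random.Random(0)  # fixed seed for reproducibility
for m in range(3):
    idx = subsample_indices(10, 0.5, rng)
    # The tree for round m would be fit only on these rows
    print(m, idx)
```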


15. **What’s the difference between bagging and boosting?**

    * **Bagging** (e.g., Random Forest): models trained independently, in parallel, on bootstrap samples, then combined by averaging or voting
    * **Boosting**: models trained sequentially, each correcting the errors of the ensemble so far


16. **Can Gradient Boosting be used for classification?**
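    Yes. Binary classification minimizes log loss on the log-odds scale; multiclass classification minimizes multiclass log loss, with a softmax turning per-class scores into probabilities.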
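
For binary log loss, a common setup (used here as an assumption) starts from the log-odds of the positive class and fits each tree to `y - sigmoid(F)`, the negative gradient of the loss with respect to the raw score `F`. A minimal pure-Python sketch of those two pieces, with hypothetical helper names:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def initial_log_odds(ys):
    # F_0 = log(p / (1 - p)), p = fraction of positives (both classes must be present)
    p = sum(ys) / len(ys)
    return math.log(p / (1 - p))


def pseudo_residuals(ys, raw_scores):
    # For binary log loss on raw score F: -dL/dF = y - sigmoid(F)
    return [y - sigmoid(f) for y, f in zip(ys, raw_scores)]


ys = [1, 0, 1, 1]
f0 = initial_log_odds(ys)  # log(0.75 / 0.25) = log(3)
print(pseudo_residuals(ys, [f0] * len(ys)))  # about [0.25, -0.75, 0.25, 0.25]
```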


17. **What is early stopping in Gradient Boosting?**
    A technique that stops adding trees once the validation error has not improved for a chosen number of consecutive iterations, to avoid overfitting.
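
A minimal sketch of the patience rule, assuming the validation loss after each boosting round has already been recorded in a list. `early_stopping_round` is a hypothetical helper; libraries expose the same idea through options such as scikit-learn's `n_iter_no_change` or XGBoost's `early_stopping_rounds`, each with its own details.

```python
def early_stopping_round(val_losses, patience=5):
    """Return the 0-based index of the best round, stopping once
    `patience` consecutive rounds fail to improve on the best loss."""
    best_loss = float("inf")
    best_round = 0
    rounds_without_improvement = 0
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_round = loss, i
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                break  # stop adding trees
    return best_round


val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58]
print(early_stopping_round(val_losses, patience=3))  # 3: round 3 had the lowest loss
```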


18. **What are some drawbacks of Gradient Boosting?**

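    * Computationally intensive: trees are built sequentially, so training cannot be parallelized across iterations (only within a tree)
    * Prone to overfitting, especially on noisy data, unless regularized
    * Sensitive to hyperparameters, which need careful tuning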


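19. **How is feature importance calculated in Gradient Boosting?**
    Common measures, using XGBoost's names:

    * Gain: average loss reduction from the splits that use the feature
    * Cover: average number of samples (more precisely, sum of second derivatives of the loss) in the nodes split on the feature
    * Frequency (weight): number of times the feature is used in splits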
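
A sketch of how these three numbers can be aggregated from split statistics, assuming a hypothetical list of `(feature, loss_reduction, n_samples)` records collected from all trees. It reports per-split averages for gain and cover, in the spirit of XGBoost's `gain` and `cover` importance types; note that XGBoost's cover sums second derivatives of the loss, which equals the sample count for squared-error loss.

```python
from collections import defaultdict

# Hypothetical split records from all trees: (feature, loss_reduction, n_samples_in_node)
splits = [
    ("age", 12.0, 100),
    ("income", 30.0, 100),
    ("age", 4.0, 40),
    ("income", 6.0, 60),
    ("income", 3.0, 20),
]


def importances(splits):
    gain, cover, freq = defaultdict(float), defaultdict(float), defaultdict(int)
    for feature, loss_reduction, n in splits:
        gain[feature] += loss_reduction
        cover[feature] += n
        freq[feature] += 1
    # Average gain and cover per split; frequency is the raw split count
    return {
        f: {"gain": gain[f] / freq[f], "cover": cover[f] / freq[f], "frequency": freq[f]}
        for f in freq
    }


print(importances(splits))
```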


20. **What’s the difference between GBDT and XGBoost?**
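    XGBoost is a highly optimized implementation of gradient-boosted decision trees (GBDT) with:

    * Regularization (L1/L2 penalties on leaf weights)
    * Parallelized split finding within each tree
    * Built-in handling of missing values (a learned default direction at each split)
    * Tree pruning (splits whose loss reduction falls below `gamma` are removed)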
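
The regularization and pruning can be seen in the second-order formulas published for XGBoost. In the sketch below, `G` and `H` are sums of first and second derivatives of the loss over the samples in a node, `lam` is the L2 penalty on leaf weights and `gamma` the cost per added leaf; the L1 term is omitted, and the function names are illustrative rather than XGBoost's API.

```python
# Regularised leaf weight and split gain (second-order form published for XGBoost).


def leaf_weight(G, H, lam):
    # Optimal leaf value: w* = -G / (H + lambda)
    return -G / (H + lam)


def split_gain(GL, HL, GR, HR, lam, gamma):
    # Gain = 1/2 [GL^2/(HL+lam) + GR^2/(HR+lam) - (GL+GR)^2/(HL+HR+lam)] - gamma
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


# Squared error from a starting prediction of 0: g_i = F_i - y_i, h_i = 1.
ys = [1.0, 2.0, 10.0, 11.0]
g = [0.0 - y for y in ys]
GL, HL = sum(g[:2]), 2.0  # left child: first two samples
GR, HR = sum(g[2:]), 2.0  # right child: last two samples

# Leaf values are shrunk toward 0 (unregularised means would be 1.5 and 10.5)
print(leaf_weight(GL, HL, lam=1.0), leaf_weight(GR, HR, lam=1.0))  # 1.0 7.0
print(split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0))   # about 17.4: keep the split
print(split_gain(GL, HL, GR, HR, lam=1.0, gamma=20.0))  # about -2.6: prune the split
```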

---

## 🔹 **Advanced Level Questions (21–30)**

21. **What is the role of the negative gradient in Gradient Boosting?**
    It's used as the pseudo-residual (target) for the next model to learn.


22. **How does Gradient Boosting handle multiclass classification?**
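    At each iteration it builds one tree per class. The per-class raw scores are converted into probabilities with a softmax, and each class's tree is fit to the negative gradient of the multiclass log loss, `y_k - p_k` for class `k`.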
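
A minimal sketch of the per-class targets for one training example, assuming `scores` holds the current raw score for each class. For multiclass log loss with softmax, the negative gradient for class `k` is `y_k - p_k`; the helper names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multiclass_pseudo_residuals(label, scores):
    # One raw score per class; the tree for class k is fit to y_k - p_k
    probs = softmax(scores)
    return [(1.0 if k == label else 0.0) - p for k, p in enumerate(probs)]


r = multiclass_pseudo_residuals(2, [0.0, 0.0, 0.0])
print(r)  # probabilities are 1/3 each, so about [-1/3, -1/3, 2/3]
```

The residuals always sum to zero, since the probabilities sum to one and exactly one class is the true label.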


23. **Explain Gradient Boosting in terms of function approximation.**
    It's a stage-wise additive model where each stage adds a function (tree) to better approximate the target function by minimizing a loss.
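
In symbols, with training pairs $(x_i, y_i)$, loss $L$, learning rate $\nu$ and $M$ trees, the stage-wise additive model can be written as follows. This is a simplified form with a fixed learning rate that omits the per-leaf line search of Friedman's original algorithm:

$$
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{m-1}}, \quad i = 1, \dots, n \\
h_m &= \text{tree fit to } \{(x_i, r_{im})\}_{i=1}^{n} \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x) \\
F_M(x) &= F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
\end{aligned}
$$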


24. **What are pseudo-residuals?**
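    The negative gradient of the loss function with respect to the current predictions, evaluated at each training point; these are the targets for the tree fit at each stage. For the squared-error loss ½(y - F)², they equal the ordinary residuals y - F.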
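
The definition can be checked numerically. This sketch estimates the negative gradient with central differences for two example losses (the helper names are hypothetical, and real implementations use analytic derivatives): for ½(y - F)² it recovers `y - F`, and for |y - F| it recovers the sign of `y - F`, away from the kink at y = F where the absolute loss is not differentiable.

```python
def pseudo_residuals(loss, ys, preds, h=1e-6):
    # -dL/dF estimated by central differences at the current predictions
    return [-(loss(y, f + h) - loss(y, f - h)) / (2 * h) for y, f in zip(ys, preds)]


def half_squared_error(y, f):
    return 0.5 * (y - f) ** 2


def absolute_error(y, f):
    return abs(y - f)


ys = [3.0, -1.0, 2.0]
preds = [2.0, 0.0, 2.5]
print(pseudo_residuals(half_squared_error, ys, preds))  # about [1.0, -1.0, -0.5], i.e. y - F
print(pseudo_residuals(absolute_error, ys, preds))      # about [1.0, -1.0, -1.0], i.e. sign(y - F)
```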


25. **What is the bias-variance trade-off in Gradient Boosting?**

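    * Boosting mainly reduces bias, because each new tree corrects errors the ensemble still makes
    * But variance can grow if it is not regularized (e.g., too many trees, too deep trees, or too high a learning rate)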


26. **What is the difference between XGBoost and LightGBM?**

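    * **XGBoost**: level-wise (depth-wise) tree growth by default
    * **LightGBM**: leaf-wise tree growth, which always splits the leaf with the largest loss reduction; often faster, but more prone to overfitting on small datasets unless `num_leaves` or `max_depth` is limited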


27. **How does CatBoost differ from XGBoost and LightGBM?**

    * Handles categorical features natively (via ordered target statistics), so they don’t need one-hot encoding
    * Reduces overfitting from target leakage via ordered boosting
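
CatBoost's native handling of categorical features is built on ordered target statistics: each row's category is encoded using only the targets of rows that come before it, which avoids leaking the row's own label. A simplified sketch, assuming a single fixed row order and a prior value `prior` with weight `prior_weight`; CatBoost itself uses random permutations and more machinery, so this is not its implementation, and the helper name is hypothetical.

```python
from collections import defaultdict


def ordered_target_stats(categories, targets, prior, prior_weight=1.0):
    """Encode each row using only the targets of *earlier* rows with the same category."""
    sums, counts = defaultdict(float), defaultdict(int)
    encoded = []
    for cat, y in zip(categories, targets):
        # Smoothed mean of previous targets for this category
        encoded.append((sums[cat] + prior_weight * prior) / (counts[cat] + prior_weight))
        sums[cat] += y  # update only after encoding, so a row never sees its own target
        counts[cat] += 1
    return encoded


cats = ["a", "b", "a", "a", "b"]
ys = [1, 0, 0, 1, 1]
print(ordered_target_stats(cats, ys, prior=0.5))  # [0.5, 0.5, 0.75, 0.5, 0.25]
```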


28. **What is histogram-based Gradient Boosting?**
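    Converts continuous features into a small number of discrete bins (histograms) before training, so split finding scans bin boundaries instead of every unique value. This speeds up training and reduces memory usage. It is used in LightGBM, XGBoost’s `hist` tree method, and scikit-learn’s `HistGradientBoostingClassifier` / `HistGradientBoostingRegressor`.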
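
A minimal sketch of the binning step, assuming quantile-based cut points (libraries differ in the exact rule, and the helper names are hypothetical).

```python
import bisect


def quantile_bin_edges(values, max_bins):
    # Up to max_bins - 1 cut points taken at evenly spaced positions
    # in the sorted values (approximate quantiles); duplicates are merged.
    s = sorted(values)
    n = len(s)
    return sorted({s[(i * n) // max_bins] for i in range(1, max_bins)})


def to_bins(values, edges):
    # Bin index = number of edges <= value, so indices run from 0 to len(edges)
    return [bisect.bisect_right(edges, v) for v in values]


values = [0.1, 0.4, 0.35, 0.8, 0.95, 0.2, 0.6, 0.7]
edges = quantile_bin_edges(values, max_bins=4)
print(edges)                   # [0.35, 0.6, 0.8]
print(to_bins(values, edges))  # [0, 1, 1, 3, 3, 0, 2, 2]
```

After binning, a split can be expressed as `bin <= b`, so only `len(edges)` thresholds per feature need to be evaluated, however many distinct raw values the feature has.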


29. **When would you prefer Random Forest over Gradient Boosting?**

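    * When training time matters and many cores are available: Random Forest trees are independent, so they can be trained in parallel
    * When the data is noisy, or when there is little time for hyperparameter tuning
    * For quick baselines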


30. **How do you tune hyperparameters in Gradient Boosting?**
    Use techniques like:

    * Grid search (`GridSearchCV`) or randomized search (`RandomizedSearchCV`)
    * Bayesian optimization (e.g., Optuna)
    * Tune in order: learning rate → `n_estimators` → `max_depth` → `min_child_weight` → `subsample` → regularization