https://phdinds-aim.github.io/time_series_handbook/08_WinningestMethods/lightgbm_m5_tuning.html

#### Basic Rules:

1. Number of Estimators/Trees
2. Depth of Tree (`i`) -> a tree of depth `i` has at most `2**i` leaves, so `num_leaves` should stay at or below `2**max_depth`
3. Learning Rate
4. Boosting type
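
The depth-to-leaves relationship in item 2 can be checked with a few lines of Python. This is a minimal sketch; the helper names `max_leaves_for_depth` and `leaves_within_depth` are hypothetical and not part of LightGBM.

```python
def max_leaves_for_depth(depth: int) -> int:
    """Upper bound on the number of leaves in a binary tree of the given depth."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 2 ** depth


def leaves_within_depth(num_leaves: int, max_depth: int) -> bool:
    """True if num_leaves can fit in a tree limited to max_depth.

    A non-positive max_depth is treated as "no limit", which matches
    LightGBM's convention for max_depth.
    """
    if max_depth <= 0:
        return True
    return num_leaves <= max_leaves_for_depth(max_depth)


for depth in (4, 6, 8):
    print(depth, max_leaves_for_depth(depth))
```

A check like this is handy when `num_leaves` and `max_depth` are tuned together, since a `num_leaves` value above `2**max_depth` can never be reached.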


✔ For large datasets, tune `max_depth` to avoid deep trees that slow down training.

✔ For small datasets, raise `min_data_in_leaf` so that leaves are not fitted to a handful of samples, which prevents overfitting.

🚀 Best practice: Set `min_data_in_leaf` first, then adjust `max_depth` only if the model still overfits or trains too slowly.


    graph TD
    A[Start] --> B[Set Objective & Metric]
    B --> C[Tune Model Complexity]
    C --> D[Regularization to Prevent Overfitting]
    D --> E[Optimize Speed]
    E --> F["Enable GPU (Optional)"]
    F --> G[Tuning Strategies]
    G --> H1["Grid Search (Slow but Exhaustive)"]
    G --> H2["Bayesian Optimization (Fast & Efficient)"]
    H1 --> I[Evaluate Model Performance]
    H2 --> I
    I --> J[Select Best Hyperparameters]
    J --> K[Train Final Model]
    K --> L[Deploy Model]
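
To see why the flowchart labels grid search as slow, it helps to count how many models a grid implies. The sketch below uses only the standard library; the grid values and the fold count are illustrative assumptions, and `grid_candidates` is a hypothetical helper.

```python
from itertools import product

# Hypothetical search grid; the candidate values are assumptions for illustration.
grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "num_leaves": [31, 63, 127],
    "max_depth": [6, 8, 10],
    "min_data_in_leaf": [10, 20, 50],
}


def grid_candidates(grid):
    """Yield one parameter dict per point of the Cartesian grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


candidates = list(grid_candidates(grid))
n_folds = 5  # assumed number of cross-validation folds

print(len(candidates))            # 81 parameter combinations (3**4)
print(len(candidates) * n_folds)  # 405 model fits with 5-fold CV
```

Each extra parameter multiplies the number of fits, whereas a Bayesian optimizer runs a fixed budget of trials and uses earlier results to pick the next candidate.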


---

### **📌 Detailed Table for LightGBM Hyperparameter Tuning**

| **Category**          | **Hyperparameter**        | **Effect** | **Recommended Values** |
|----------------------|------------------------|------------|----------------|
| **1. Core Settings**  | `objective` | Defines the task type (Regression, Classification, etc.) | `"regression"`, `"binary"`, `"multiclass"` |
|                      | `metric` | Evaluation metric for the model | `"rmse"`, `"mae"`, `"binary_logloss"`, `"multi_logloss"`, `"auc"` |
|                      | `learning_rate` | Controls step size per iteration | `0.01 - 0.1` (Lower values usually generalize better but need more trees) |
|                      | `n_estimators` | Number of boosting rounds (trees) | `500 - 5000` (Use **early stopping**) |
| **2. Model Complexity** | `num_leaves` | Maximum leaves per tree; the main complexity control (More = Higher Accuracy, Risk of Overfitting) | `20 - 150` (Start with `31`; keep at or below `2**max_depth`) |
|                      | `max_depth` | Limits the depth of trees (`-1` = no limit) | `4 - 12` |
|                      | `min_data_in_leaf` | Minimum samples per leaf node | `10 - 50` |
|                      | `feature_fraction` | Uses a subset of features for each tree | `0.6 - 1.0` |
| **3. Regularization** | `lambda_l1` | L1 Regularization (Lasso) | `0 - 1` (Start with `0.1`) |
|                      | `lambda_l2` | L2 Regularization (Ridge) | `0 - 1` |
|                      | `min_gain_to_split` | Minimum gain required for a split | `0 - 0.1` |
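| **4. Speed Optimization** | `max_bin` | Maximum number of bins for feature discretization (fewer bins = faster training, coarser splits) | `255` (default); lower values such as `63` train faster, higher values such as `512` trade speed for finer splits |
|                      | `bagging_fraction` | Uses only part of the data per iteration (takes effect only when `bagging_freq > 0`) | `0.6 - 0.9` |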
|                      | `bagging_freq` | Frequency of bagging (0 = off) | `1 - 5` |
|                      | `num_threads` | Number of CPU threads used | `0` (OpenMP default) or the number of physical cores; in the scikit-learn API, `n_jobs=-1` uses all threads |
| **5. GPU Acceleration** | `device` | Enable GPU support | `"gpu"` |
|                      | `gpu_platform_id` | Select GPU platform | `0` |
|                      | `gpu_device_id` | Select GPU device | `0` |
| **6. Tuning Strategies** | Grid search | Evaluates every combination on a predefined grid | Slow; finds the best point on the grid, not necessarily the global optimum |
|                      | Bayesian optimization (e.g. Optuna) | Uses earlier trial results to choose the next candidates | Fast & Recommended |
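
As a starting point, the table's values can be collected into a single parameter dictionary. The sketch below is one possible baseline for a regression task, not a tuned configuration: the specific values are simply picked from the ranges above, and `check_params` is a hypothetical helper rather than a LightGBM function.

```python
# Baseline parameters for a regression task, using LightGBM's native
# parameter names. Values are picked from the ranges in the table.
params = {
    "objective": "regression",
    "metric": "rmse",
    "learning_rate": 0.05,
    "num_leaves": 31,
    "max_depth": 8,
    "min_data_in_leaf": 20,
    "feature_fraction": 0.8,
    "lambda_l1": 0.1,
    "lambda_l2": 0.1,
    "min_gain_to_split": 0.0,
    "max_bin": 255,
    "bagging_fraction": 0.8,
    "bagging_freq": 1,
}


def check_params(p):
    """Return a list of warnings for common inconsistencies (hypothetical helper)."""
    problems = []
    depth = p.get("max_depth", -1)
    # A depth-limited tree cannot hold more than 2**depth leaves.
    if depth > 0 and p.get("num_leaves", 31) > 2 ** depth:
        problems.append("num_leaves exceeds 2**max_depth")
    # Row subsampling is only applied when bagging_freq is positive.
    if p.get("bagging_fraction", 1.0) < 1.0 and p.get("bagging_freq", 0) == 0:
        problems.append("bagging_fraction has no effect while bagging_freq is 0")
    if not 0.0 < p.get("feature_fraction", 1.0) <= 1.0:
        problems.append("feature_fraction must be in (0, 1]")
    return problems


print(check_params(params))
```

A dictionary like this can be passed as the `params` argument of `lightgbm.train`; the number of boosting rounds is given separately through `num_boost_round`.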


| **Boosting Type** | **Description** | **Speed** | **Overfitting Prevention** | **Best For** |
|------------------|---------------|----------|------------------------|--------------|
| `gbdt` (Gradient Boosting Decision Trees) | Default method; each new tree fits the gradient of the loss on the current predictions | ✅ Fast | ⚠️ Can Overfit | General-purpose machine learning tasks |
| `rf` (Random Forest Mode) | Trees are trained independently using bagging (random subsampling); requires bagging (`bagging_freq > 0`, `bagging_fraction < 1`) or `feature_fraction < 1` | ❌ Slower | ✅ Strong | Variance reduction on large datasets |
| `dart` (Dropouts Meet Multiple Additive Regression Trees) | Randomly drops trees during training to prevent overfitting, improving generalization | ❌ Slower | ✅ Best | Small datasets prone to overfitting |
| `goss` (Gradient-based One-Side Sampling) | Keeps all high-gradient samples and randomly subsamples the low-gradient ones | 🚀 Fastest | ⚠️ Medium | Large datasets where training speed matters |

### **📌 Choosing the Right Boosting Type**
- ✅ **Use `"gbdt"`** for most cases (default and well-balanced).  
- ✅ **Use `"rf"`** when you want a **random forest-like approach**.  
- ✅ **Use `"dart"`** for **small datasets prone to overfitting**.  
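- ✅ **Use `"goss"`** for **faster training on large datasets** (in LightGBM 4.0 and later it is selected with `data_sample_strategy="goss"`; `boosting="goss"` is still accepted for backward compatibility).  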
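
The rules of thumb above can be written as a small lookup. This is a sketch only: `boosting_params` and its use-case labels are hypothetical, the numeric values are illustrative, and the extra bagging settings for `rf` reflect the assumption that random-forest mode needs row or feature subsampling enabled.

```python
def boosting_params(use_case: str) -> dict:
    """Map a use case to boosting-related parameters (hypothetical helper)."""
    table = {
        "general": {"boosting": "gbdt"},
        # Assumption: rf mode needs subsampling enabled so trees differ.
        "random_forest": {"boosting": "rf", "bagging_fraction": 0.8, "bagging_freq": 1},
        # drop_rate is the fraction of previous trees dropped per iteration in DART.
        "small_data": {"boosting": "dart", "drop_rate": 0.1},
        "fast": {"boosting": "goss"},
    }
    if use_case not in table:
        raise ValueError(f"unknown use case: {use_case!r}")
    # Return a copy so callers cannot mutate the lookup table.
    return dict(table[use_case])


base = {"objective": "binary", "learning_rate": 0.05}
print({**base, **boosting_params("random_forest")})
```

Merging the result into a base dictionary keeps the boosting choice separate from the rest of the configuration, so switching modes is a one-word change.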
