<div style="padding:20px;color:white;margin:0;font-size:270%;text-align:center;display:fill;border-radius:5px;background-color:#4f4e4e;overflow:hidden;font-weight:500">Extreme Gradient Boosting (XGBoost) <br> Wrangling with Hyperparameters</div>

<div style="padding:20px;color:black;margin:0;font-size:200%;text-align:center;display:fill;border-radius:5px;background-color:#d9d9d9;overflow:hidden;font-weight:500">Extreme Gradient Boosting (XGBoost)</div>

# 🧾 1. Extreme Gradient Boosting (XGBoost)

XGBoost is an open-source software library which provides a regularizing gradient boosting framework for C++, Java, Python, R, Julia, Perl, and Scala. It works on Linux, Windows, and macOS. From the project description, it aims to provide a "Scalable, Portable and Distributed Gradient Boosting Library".

Gradient boosting refers to **a class of ensemble machine learning algorithms that can be used for classification or regression predictive modeling problems**. **Ensembles are constructed from decision tree models**: trees are added one at a time, and each new tree is fit to correct the errors of the trees already in the ensemble.

**XGBoost is a more regularized form of Gradient Boosting**. It adds L1 and L2 regularization terms to the training objective, which can improve how well the model generalizes. Training is fast and can be parallelized across cores and distributed across clusters.

XGBoost (Extreme Gradient Boosting) follows the same gradient boosting framework but is implemented more efficiently. It offers both a linear model solver and tree learning algorithms. Much of its speed comes from its capacity to do parallel computation on a single machine.
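To make the boosting idea concrete, here is a minimal sketch in plain NumPy. It is not XGBoost's implementation: it assumes squared-error loss, uses one-split "stumps" as the trees, and the helper names (`fit_stump`, `predict_stump`) and the synthetic data are made up for illustration.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single threshold split that best fits the residuals."""
    best = None
    for threshold in np.unique(x)[:-1]:      # skip the max so the right side is never empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

learning_rate = 0.3                           # plays the role of eta
prediction = np.full_like(y, y.mean())        # start from a constant model
mse_history = [np.mean((y - prediction) ** 2)]
for _ in range(20):
    residual = y - prediction                 # negative gradient of squared error
    stump = fit_stump(x, residual)
    prediction += learning_rate * predict_stump(stump, x)
    mse_history.append(np.mean((y - prediction) ** 2))

print(round(mse_history[0], 4), round(mse_history[-1], 4))
```

Each round fits a small tree to what the current ensemble still gets wrong, so the training error shrinks as trees are added.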

<div align="center">
  <img src="https://cdn.educba.com/academy/wp-content/uploads/2019/06/XGBoost-Algorithm1.jpg" />
</div>

*educba*

<div style="padding:20px;color:black;margin:0;font-size:200%;text-align:center;display:fill;border-radius:5px;background-color:#d9d9d9;overflow:hidden;font-weight:500"> Introduction to Hyperparameters</div>

# 🛠  2. Hyperparameters

### What is meant by hyperparameter tuning?
In machine learning, **hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm**. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are learned. The hyperparameter tuning process is a tightrope walk to achieve a balance between underfitting and overfitting. Underfitting is when the machine learning model is unable to reduce the error for either the test or training set. Overfitting is the opposite: the model fits the training set closely, but its error on unseen data stays high.

### What are some examples of hyperparameters?
Some examples of model hyperparameters include:
- The penalty in a logistic regression classifier, i.e. L1 or L2 regularization.
- The learning rate for training a neural network.
- The C and sigma hyperparameters for support vector machines.

### Why do we use hyperparameter tuning?
Hyperparameter tuning is an essential part of controlling the behavior of a machine learning model. If we don't correctly tune our hyperparameters, our estimated model parameters produce suboptimal results, as they don't minimize the loss function. This means our model makes more errors.
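The distinction between learned parameters and hyperparameters can be sketched with a small example. It assumes a ridge (L2-penalized) linear regression on synthetic data, which is not part of this notebook; `fit_ridge` and the candidate penalties are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(120, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + rng.normal(scale=1.0, size=120)

X_train, y_train = X[:80], y[:80]
X_val, y_val = X[80:], y[80:]

def fit_ridge(X, y, penalty):
    # The weights are model parameters: learned from data for a fixed hyperparameter.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)

# The penalty is a hyperparameter: chosen by comparing validation error.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
val_errors = {}
for penalty in candidates:
    w = fit_ridge(X_train, y_train, penalty)
    val_errors[penalty] = float(np.mean((X_val @ w - y_val) ** 2))

best_penalty = min(val_errors, key=val_errors.get)
print(val_errors, best_penalty)
```

The same loop structure (fix a hyperparameter, fit, score on held-out data, keep the best) is what the tuning steps later in this notebook do with cross-validation.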

<div style="padding:20px;color:black;margin:0;font-size:200%;text-align:center;display:fill;border-radius:5px;background-color:#d9d9d9;overflow:hidden;font-weight:500"> XGBoost Parameters - Classification</div>

# ⌛️ 3. XGBoost Hyperparameters for Classification

<div style="padding:10px;color:black;margin:0;font-size:150%;text-align:center;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>XGBoost Boosters</b></div>

#### -> gbtree :
 - **gbtree** for **tree-based models**;  
 
#### -> gblinear :
 - **gblinear** for **linear models** to run at each iteration.

#### -> dart :
 - **dart** is also a tree based model. 
 - XGBoost mostly combines a huge number of regression trees with a small learning rate. In this situation, trees added early are significant and trees added late are unimportant. *Vinayak and Gilad-Bachrach* proposed adding dropout techniques from the deep neural net community to boosted trees and reported better results in some situations; the method is called DART. It drops trees in order to reduce over-fitting, and it may prevent trivial trees (trees that only correct trivial errors) from being added. Because of the randomness introduced in training, expect a few differences from gbtree: training can be slower because the random dropout prevents usage of the prediction buffer, and early stopping might not be stable.
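As a rough illustration, a DART parameter dictionary could look like the sketch below. The parameter names are the ones listed in the XGBoost parameter documentation for the dart booster; the values are arbitrary placeholders, not tuned recommendations.

```python
# Hypothetical starting point for the dart booster; values are illustrative only.
dart_params = {
    "booster": "dart",
    "objective": "binary:logistic",
    "eta": 0.1,
    "max_depth": 5,
    "sample_type": "uniform",    # how the trees to drop are selected
    "normalize_type": "tree",    # how new and dropped trees are re-weighted
    "rate_drop": 0.1,            # fraction of previous trees dropped in a round
    "skip_drop": 0.5,            # probability of skipping dropout in a round
}
print(dart_params)
```

Such a dictionary would be passed as `params` to `xgb.train` or `xgb.cv` in the same way as the gbtree dictionaries used later.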


<div style="padding:10px;color:black;margin:0;font-size:150%;text-align:center;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>General parameters</b></div>
General parameters relate to which booster we are using to do boosting, commonly a tree or linear model.

#### 1. booster [default=gbtree]
 - **gbtree** for **tree-based models**;  **gblinear** for **linear models** to run at each iteration.
 
#### 2. silent [default=0]
 - **To activate silent mode, set it to 1; no running messages will be printed.**
 - Recent XGBoost releases deprecate `silent` in favor of `verbosity` (0 = silent, 1 = warning, 2 = info, 3 = debug).

It’s generally a good idea to keep it at 0, as the messages might help in understanding the model and how the metrics are going.

#### 3. nthread [default to maximum number of threads available if not set]
 - This is used for parallel processing; enter the number of cores to use.
 - If you wish to run on all cores, leave the value unset and the algorithm will detect the number automatically.


<div style="padding:10px;color:black;margin:0;font-size:150%;text-align:center;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>Booster parameters</b></div>

Booster parameters depend on which booster you have chosen. We will discuss tree-based boosters here.

#### 1. eta [default=0.3]
- Alias of **learning_rate**. It shrinks the contribution of each new tree, which makes the boosting process more conservative and helps generalization.
- **Typical final values to be used:** 0.01-0.2

#### 2. min_child_weight [default=1]
- Defines the **minimum sum of weights of all observations required in a child**. It is **used to control over-fitting**. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.
- Values that are too high can lead to under-fitting; hence, it should be tuned using CV.
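In XGBoost the "weight" being summed is the hessian (second-order gradient) of the loss. The sketch below assumes logistic loss, where each row contributes p * (1 - p); the helper name and the example probabilities are invented for illustration.

```python
import numpy as np

def hessian_sum(predicted_prob):
    """Sum of second-order gradients p * (1 - p) for logistic loss."""
    p = np.asarray(predicted_prob, dtype=float)
    return float(np.sum(p * (1.0 - p)))

min_child_weight = 1.0
uncertain_child = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]        # each row contributes 0.25
confident_child = [0.99, 0.98, 0.99, 0.97, 0.99, 0.98]  # each row contributes very little

for name, child in [("uncertain", uncertain_child), ("confident", confident_child)]:
    h = hessian_sum(child)
    verdict = "child allowed" if h >= min_child_weight else "child rejected"
    print(name, round(h, 4), verdict)
```

Both children hold six rows, but the one whose rows are already predicted confidently carries little hessian weight, so a split producing it would be rejected at `min_child_weight = 1`.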

#### 3. max_depth [default=6]
- The **maximum depth of a tree**. It is also used **to control over-fitting**, as a higher depth will allow the model to learn relations very specific to a particular sample. Should be tuned using CV.
- **Typical values:** 3-10

#### 4. max_leaf_nodes
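- The **maximum number of terminal nodes** or leaves in a tree. XGBoost's own parameter for this is named `max_leaves`; `max_leaf_nodes` is the name used by scikit-learn's GBM.
- Since binary trees are created, a depth of n would produce a maximum of 2^n leaves. If this is defined, GBM will ignore max_depth.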

#### 5. gamma [default=0]
- A node is split only when the resulting split gives a positive reduction in the loss function. **Gamma specifies the minimum loss reduction required for a split to occur.**
- Makes the algorithm conservative. The values can vary depending on the loss function and should be tuned. **Can be used to control overfitting.**
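The role of gamma can be sketched with the split-gain formula from the XGBoost paper, where G and H are the sums of first- and second-order gradients in a node. The function names and the gradient statistics below are hypothetical.

```python
def leaf_score(grad_sum, hess_sum, reg_lambda):
    return grad_sum ** 2 / (hess_sum + reg_lambda)

def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction of a split, minus the gamma penalty for adding a leaf."""
    gain = 0.5 * (
        leaf_score(g_left, h_left, reg_lambda)
        + leaf_score(g_right, h_right, reg_lambda)
        - leaf_score(g_left + g_right, h_left + h_right, reg_lambda)
    )
    return gain - gamma

# Hypothetical gradient statistics for one candidate split.
stats = dict(g_left=-4.0, h_left=3.0, g_right=5.0, h_right=4.0)
for gamma in (0.0, 1.0, 5.0):
    gain = split_gain(**stats, gamma=gamma)
    print(gamma, round(gain, 4), "split" if gain > 0 else "no split")
```

With these numbers the raw gain is 4.4375, so the split survives gamma = 1 but is pruned at gamma = 5.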

#### 6. max_delta_step [default=0]
- The **maximum delta step we allow each tree’s weight estimation to be**. If the value is set to 0, there is no constraint. If it is set to a positive value, it can help make the update step more conservative.
- This is generally not used, but it might help in logistic regression when the classes are extremely imbalanced.

#### 7. subsample [default=1]
- It denotes the **fraction of observations to be randomly sampled for each tree**.
- **Lower values make the algorithm more conservative and prevent overfitting**, but values that are too small might lead to under-fitting.
- **Typical values:** 0.5-1

#### 8. colsample_bytree [default=1]
- Denotes the **fraction of columns to be randomly sampled for each tree**. A smaller colsample_bytree provides additional regularization.
- **Typical values:** 0.5-1

#### 9. colsample_bylevel [default=1]
- Denotes **the subsample ratio of columns for each split, in each level**.
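Row and column sampling can be pictured with the rough NumPy sketch below. It only illustrates the idea of drawing a fresh subset per tree; XGBoost's internal sampling may differ in detail, and the sizes and `sample_for_tree` helper are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n_rows, n_cols = 1000, 13
subsample, colsample_bytree = 0.7, 0.8

def sample_for_tree(rng):
    """Draw the row and column indices one tree would be trained on."""
    rows = rng.choice(n_rows, size=round(subsample * n_rows), replace=False)
    cols = rng.choice(n_cols, size=max(1, round(colsample_bytree * n_cols)), replace=False)
    return np.sort(rows), np.sort(cols)

for tree_index in range(3):
    rows, cols = sample_for_tree(rng)
    print(tree_index, len(rows), cols.tolist())
```

Because every tree sees a different slice of the data, the trees are less correlated, which is where the regularizing effect comes from.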

#### 10. alpha [default=0]
- **L1 regularization term**.
- L1 regularization pushes uninformative weights to exactly zero by subtracting a small amount from the weight at each iteration until it reaches zero. It is also called regularization for sparsity. In XGBoost it applies to leaf weights (rather than feature weights); a larger value means more regularization.

#### 11. lambda [default=1]
- **L2 regularization term**.
- This is used to handle the regularization part of XGBoost. It should be explored to reduce overfitting.
- L2 regularization acts like a force that removes a small percentage of each weight at each iteration, so weights shrink toward zero but never become exactly zero. It penalizes (weight)², and the strength of the penalty is set by the regularization rate (lambda).
- Smoother than alpha. It also applies to leaf weights.
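The different behavior of alpha and lambda shows up in the formula for the optimal leaf weight. The sketch below assumes the usual second-order objective, with G and H the gradient and hessian sums of a leaf; the function name and numbers are illustrative.

```python
def leaf_weight(grad_sum, hess_sum, reg_alpha=0.0, reg_lambda=1.0):
    """Leaf weight with L1 soft-thresholding (alpha) and L2 shrinkage (lambda)."""
    if abs(grad_sum) <= reg_alpha:
        return 0.0                      # alpha can set the weight to exactly zero
    shrunk = grad_sum - reg_alpha if grad_sum > 0 else grad_sum + reg_alpha
    return -shrunk / (hess_sum + reg_lambda)   # lambda only scales the weight down

G, H = -3.0, 4.0  # hypothetical gradient and hessian sums for one leaf
for alpha, lam in [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 1.0)]:
    print(alpha, lam, leaf_weight(G, H, reg_alpha=alpha, reg_lambda=lam))
```

Increasing lambda makes the weight smaller but never zero, while a large enough alpha removes the leaf's contribution entirely.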

#### 12. scale_pos_weight [default=1]
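- Controls the balance of positive and negative weights. A value **greater than 0 should be used**; in case of high class imbalance, adjusting it helps in faster convergence. A typical value to consider is sum(negative instances) / sum(positive instances).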
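A common starting value can be computed directly from the labels. The label vector below is a made-up example with a 90/10 class split.

```python
import numpy as np

y = np.array([0] * 90 + [1] * 10)     # hypothetical imbalanced labels
n_negative = int(np.sum(y == 0))
n_positive = int(np.sum(y == 1))
scale_pos_weight = n_negative / n_positive
print(scale_pos_weight)
```

For a roughly balanced target, this ratio is close to 1, which is the default.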

<div style="padding:10px;color:black;margin:0;font-size:150%;text-align:center;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>Learning task parameters</b></div>

Learning task parameters decide on the learning scenario. For example, regression tasks may use different parameters than ranking tasks.

#### 1. objective [default=reg:linear]
- This **defines the loss function to be minimized**. In recent XGBoost releases `reg:linear` has been renamed `reg:squarederror`, which is now the default.
- Mostly used values are:
  - **reg:linear** - used for regression with squared error
  - **reg:logistic** - logistic regression; the output is a probability
  - **binary:logistic** - logistic regression for binary classification, returns predicted probability (rather than decision class)
  - **multi:softmax** - multiclass classification using the softmax objective, returns predicted class (not probabilities); you also need to set an additional num_class (number of classes) parameter defining the number of unique classes
  - **multi:softprob** - same as softmax, but returns predicted probability of each data point belonging to each class.
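The difference between the two multiclass objectives is only in what is reported. A small NumPy sketch, with hypothetical raw scores for two rows and three classes:

```python
import numpy as np

def softmax(scores):
    shifted = scores - scores.max(axis=1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

raw_scores = np.array([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
probabilities = softmax(raw_scores)              # what multi:softprob reports
predicted_class = probabilities.argmax(axis=1)   # what multi:softmax reports

print(probabilities.round(3))
print(predicted_class)
```

Keeping the probabilities is usually more flexible, since the class can always be recovered with an argmax afterwards.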
 
#### 2. eval_metric [ default according to objective ]
- The metric to be used for validation data.
- The default values are rmse for regression and error for classification; recent XGBoost releases default to logloss for `binary:logistic`.
- **Typical values are:**
 - **rmse** – root mean square error
 - **mae** – mean absolute error
 - **logloss** – negative log-likelihood
 - **error** – Binary classification error rate (0.5 threshold)
 - **merror** – Multiclass classification error rate
 - **mlogloss** – Multiclass logloss
 - **auc** – Area under the curve
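To see what these metrics measure, the sketch below computes several of them by hand on a tiny, made-up set of labels and predicted probabilities. It mirrors the definitions above rather than XGBoost's internal code.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.7, 0.1])

rmse = np.sqrt(np.mean((y_true - y_prob) ** 2))
mae = np.mean(np.abs(y_true - y_prob))
logloss = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
error = np.mean((y_prob > 0.5).astype(int) != y_true)   # 0.5 threshold

# AUC: share of (positive, negative) pairs ranked correctly; ties count half.
pos = y_prob[y_true == 1]
neg = y_prob[y_true == 0]
diff = pos[:, None] - neg[None, :]
auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

print(round(rmse, 4), round(mae, 4), round(logloss, 4), round(error, 4), round(auc, 4))
```

Note that `error` depends on the 0.5 threshold while `auc` depends only on the ranking of the predictions, which is why the two can disagree.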
 
#### 3. seed [default=0]
- for reproducibility.

<div style="padding:10px;color:black;margin:0;font-size:150%;text-align:center;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>Command line parameters</b></div>

- relate to behavior of CLI version of XGBoost.

<div style="padding:20px;color:black;margin:0;font-size:200%;text-align:center;display:fill;border-radius:5px;background-color:#d9d9d9;overflow:hidden;font-weight:500">Understanding Bias-Variance Tradeoff</div>

# 🧮 4. Understanding Bias-Variance Tradeoff

If you take a machine learning or statistics course, this is likely to be one of the most important concepts. When we allow the model to get more complicated (e.g. more depth), the model has a better ability to fit the training data, resulting in a less biased model. However, such a complicated model requires more data to fit.

Most of the parameters in XGBoost are about the bias-variance tradeoff. The best model should trade the model complexity with its predictive power carefully. The XGBoost Parameters Documentation will tell you whether each parameter will make the model more conservative or not. This can be used to help you turn the knob between a complicated model and a simple model.
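The tradeoff can be observed on a toy problem. The sketch below swaps in polynomial regression for trees, with the polynomial degree standing in for model complexity; the data and the degree grid are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1, 1, size=60))
y = np.sin(3 * x) + rng.normal(scale=0.2, size=60)
train_idx = np.arange(0, 60, 2)   # even positions for training
val_idx = np.arange(1, 60, 2)     # odd positions for validation

results = []
for degree in (1, 3, 5, 9):
    coeffs = np.polyfit(x[train_idx], y[train_idx], deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x[train_idx]) - y[train_idx]) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x[val_idx]) - y[val_idx]) ** 2)
    results.append((degree, train_mse, val_mse))
    print(degree, round(train_mse, 4), round(val_mse, 4))
```

Training error can only go down as the degree grows, while validation error typically stops improving at some point; the same pattern appears in the train and test AUC columns of the tuning tables later in this notebook.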

## More Resources :
- [📋 Bias-Variance Tradeoff ➡️ with NumPy & Seaborn](https://www.kaggle.com/code/azminetoushikwasi/bias-variance-tradeoff-with-numpy-seaborn)
- [Mastering Bias-Variance Tradeoff with Polynomials](https://medium.com/@azmine_wasi/mastering-bias-variance-tradeoff-with-polynomials-part-02-29f9bb53bb26)

In [13]:
import numpy as np 
import pandas as pd 
import os
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
import warnings
import json
from sklearn import manifold

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import OrdinalEncoder

import xgboost as xgb

from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.model_selection import StratifiedKFold

from sklearn.metrics import roc_auc_score
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, KFold
from sklearn.metrics import confusion_matrix
from xgboost import plot_tree

warnings.filterwarnings('ignore')

In [14]:
df_train=pd.read_csv('../spaceship_titanic/spaceship-titanic/train.csv')
df_test=pd.read_csv('../spaceship_titanic/spaceship-titanic/test.csv')
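def info_of_dataset(df):
    """Summarize dtype, null count, and a scaled null count for each column."""
    df1=df.dtypes
    df2=df.isnull().sum()
    # This divides by the number of columns (df.shape[1]), so it is not a
    # fraction of rows; dividing by df.shape[0] would give the share of missing rows.
    df3=df.isnull().sum()/df.shape[1]
    df4=pd.DataFrame([df1,df2,df3])
    return df4.T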

In [15]:
info_of_dataset(df_train)

Unnamed: 0,0,1,2
PassengerId,object,0,0.0
HomePlanet,object,201,14.357143
CryoSleep,object,217,15.5
Cabin,object,199,14.214286
Destination,object,182,13.0
Age,float64,179,12.785714
VIP,object,203,14.5
RoomService,float64,181,12.928571
FoodCourt,float64,183,13.071429
ShoppingMall,float64,208,14.857143


In [16]:
def process_df(df):
    df=df.drop(['Name','PassengerId'],axis=1)
    df=df.dropna()
    
    target=df['Transported']
    df=df.drop(['Transported'],axis=1)
    target = target.astype(int)
    
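    # Cabin has the form deck/num/side. The slices below take single characters
    # at fixed positions 0, 2 and 5 rather than the '/'-separated fields.
    df['Cabin_1']= df['Cabin'].str[0]
    df['Cabin_2']= df['Cabin'].str[2]
    df['Cabin_3']= df['Cabin'].str[5]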
    df=df.drop(['Cabin'],axis=1)
                                             
    
    # Create the training and test datasets
    X_train, X_test, y_train, y_test = train_test_split(df, 
                                                    target, 
                                                    test_size = 0.2, 
                                                    random_state=100,
                                                    stratify=target)

    numaric_columns=list(df.select_dtypes(include=np.number).columns)
    print("Numaric columns ("+str(len(numaric_columns))+") :",", ".join(numaric_columns))
    
    cat_columns=df.select_dtypes(include=['object']).columns.tolist()
    print("Categorical columns ("+str(len(cat_columns))+") :",", ".join(cat_columns))
    
    
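    # Copy the numeric slices so the column assignments below work on
    # independent frames (avoids pandas' SettingWithCopyWarning).
    X_train_n=X_train[numaric_columns].copy()
    X_test_n=X_test[numaric_columns].copy()
    
    X_train_c=X_train[cat_columns]
    X_test_c=X_test[cat_columns]
    
    # Ordinal encoding: fit on the training split only, then apply to the test split
    encoder=OrdinalEncoder()
    X_train_c = encoder.fit_transform(X_train_c)
    X_train_c=pd.DataFrame(X_train_c)
    X_test_c = encoder.transform(X_test_c)
    X_test_c=pd.DataFrame(X_test_c)
    
    # pd.DataFrame(...) gives the encoded frames a fresh 0..n-1 index, while
    # X_train_n / X_test_n keep the original row labels. The assignments below
    # match rows by label, not position: labels below n pick up another row's
    # encoding and labels >= n become NaN (mean-filled further down).
    # Passing index=X_train.index / index=X_test.index would keep rows aligned.
    i=1
    for column in X_train_c:
        X_train_n["cat_"+str(i)]=X_train_c[column]
        X_test_n["cat_"+str(i)]=X_test_c[column]
        i=i+1
    
    #X_train=pd.concat([X_train_n,X_train_c],axis=1,ignore_index=True)
    #X_test=pd.concat([X_test_n,X_test_c],axis=1,ignore_index=True)
    
    # Fill remaining NaN values with each column's mean
    X_train_n=X_train_n.fillna(X_train_n.mean())
    X_test_n=X_test_n.fillna(X_test_n.mean())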
    
    
    return X_train_n, X_test_n, y_train, y_test

<div style="padding:10px;color:black;font-size:150%;text-align:center;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>Preparing functions for making things easier later</b></div>

In [17]:
X_train,X_test,y_train,y_test=process_df(df_train)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Numaric columns (6) : Age, RoomService, FoodCourt, ShoppingMall, Spa, VRDeck
Categorical columns (7) : HomePlanet, CryoSleep, Destination, VIP, Cabin_1, Cabin_2, Cabin_3
(5411, 13) (1353, 13) (5411,) (1353,)


In [19]:
X_train.head(20)

Unnamed: 0,Age,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,cat_1,cat_2,cat_3,cat_4,cat_5,cat_6,cat_7
6279,29.0,2.0,3.0,1501.0,26.0,0.0,0.655172,0.351124,1.467466,0.02099,4.265967,3.250675,3.208865
4457,29.0,574.0,0.0,4940.0,1831.0,2.0,1.0,1.0,2.0,0.0,1.0,8.0,12.0
3518,42.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2.0,3.0,0.0
4668,46.0,0.0,4.0,834.0,0.0,32.0,0.0,0.0,2.0,0.0,5.0,9.0,0.0
4947,25.0,662.0,34.0,0.0,79.0,148.0,0.0,1.0,2.0,0.0,6.0,1.0,10.0
4891,25.0,10.0,0.0,908.0,5.0,0.0,1.0,1.0,0.0,0.0,1.0,3.0,0.0
1766,49.0,0.0,0.0,0.0,724.0,0.0,1.0,0.0,2.0,0.0,2.0,1.0,0.0
4415,35.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,2.0,0.0,2.0,2.0,0.0
3659,18.0,659.0,0.0,2316.0,0.0,23.0,0.0,1.0,1.0,0.0,6.0,7.0,0.0
684,38.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,0.0


In [None]:
import wandb

wandb.init(project="xgb", entity="surviffer")


#### Base parameters : 
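- booster = "gbtree",      **tree-based booster (the default)**
- objective = "binary:logistic",    **as per the problem (binary classification)**
- tree_method="gpu_hist"   **using GPU; this requires an XGBoost build with GPU support (use "hist" on CPU-only installs)**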

In [20]:
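def xgb_helper(PARAMETERS,V_PARAM_NAME=None,V_PARAM_VALUES=None,BR=10):
    """Run 5-fold CV on the training data.

    With no V_PARAM_VALUES, return the CV results for PARAMETERS as given.
    Otherwise run one CV per candidate value of V_PARAM_NAME, print the
    final-round (train AUC, test AUC) per value, and return the last CV results.
    """
    temp_dmatrix =xgb.DMatrix(data=X_train, label=y_train)
    
    if V_PARAM_VALUES is None:
        cv_results = xgb.cv(dtrain=temp_dmatrix, nfold=5,num_boost_round=BR,params=PARAMETERS, as_pandas=True, seed=123 )
        return cv_results
    
    params=dict(PARAMETERS)  # work on a copy so the caller's dict is not modified
    results=[]
    
    for v_param_value in V_PARAM_VALUES:
        params[V_PARAM_NAME]=v_param_value
        cv_results = xgb.cv(dtrain=temp_dmatrix, nfold=5,num_boost_round=BR,params=params, as_pandas=True, seed=123)
        # Keep the mean train and test AUC of the last boosting round
        results.append((cv_results["train-auc-mean"].iloc[-1],cv_results["test-auc-mean"].iloc[-1]))
        
    data = list(zip(V_PARAM_VALUES, results))
    print(pd.DataFrame(data,columns=[V_PARAM_NAME,"auc"]))
    
    return cv_results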

<div style="padding:10px;color:black;font-size:150%;text-align:center;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>Create a general base model and evaluate performance</b></div>

In [21]:
PARAMETERS={"objective":'binary:logistic',"eval_metric":"auc"}
xgb_helper(PARAMETERS)

Unnamed: 0,train-auc-mean,train-auc-std,test-auc-mean,test-auc-std
0,0.858413,0.002187,0.821862,0.005628
1,0.870282,0.001689,0.82869,0.004939
2,0.876322,0.000856,0.835879,0.002703
3,0.880106,0.001456,0.838758,0.004261
4,0.884137,0.001624,0.838674,0.003967
5,0.887057,0.001697,0.841544,0.00351
6,0.889695,0.002341,0.842214,0.005057
7,0.891214,0.002497,0.842708,0.005743
8,0.892886,0.002178,0.843644,0.005887
9,0.893928,0.002417,0.844092,0.005543


<div style="padding:10px;color:black;font-size:150%;text-align:center;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>Optimizing number of boosting rounds<br>(as we will be using DMatrix from xgb)</b></div>

In [22]:
# Create the DMatrix: train_dmatrix
train_dmatrix =xgb.DMatrix(data=X_train, label=y_train)

# Create the parameter dictionary for each tree: params 
params = {"objective":"binary:logistic", "max_depth":5}

# Create list of number of boosting rounds
num_rounds = [5, 10, 15, 20, 25]

# Empty list to store the final-round test AUC per XGBoost model
final_auc_per_round = []

# Iterate over num_rounds and build one model per num_boost_round parameter
for curr_num_rounds in num_rounds:

    # Perform cross-validation: cv_results
    cv_results = xgb.cv(dtrain=train_dmatrix, params=params, nfold=5, num_boost_round=curr_num_rounds, metrics="auc", as_pandas=True, seed=123)
    
    # Append the mean test AUC of the final boosting round
    final_auc_per_round.append(cv_results["test-auc-mean"].iloc[-1])

# Print the resultant DataFrame
num_rounds_aucs = list(zip(num_rounds, final_auc_per_round))
print(pd.DataFrame(num_rounds_aucs,columns=["num_boosting_rounds","auc"]))

   num_boosting_rounds       auc
0                    5  0.840033
1                   10  0.843594
2                   15  0.845747
3                   20  0.845400
4                   25  0.844094


### Pointers:
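- Taking num_boosting_rounds = 10. The test AUC peaks at 15 rounds (0.8457), but the gain over 10 rounds (0.8436) is small, so the smaller number is kept to avoid overfitting.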

<div style="padding:10px;color:black;padding-left: 45px;font-size:150%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>   1. Choose the learning rate. You can start with a relatively high one; 0.1 to 0.5 is fine for starting in most cases.</b></div>

In [10]:
PARAMETERS={"objective":'binary:logistic',"eval_metric":"auc","learning_rate": 0.5}
xgb_helper(PARAMETERS)

Unnamed: 0,train-auc-mean,train-auc-std,test-auc-mean,test-auc-std
0,0.858413,0.002187,0.821862,0.005628
1,0.872958,0.002187,0.831666,0.004109
2,0.879817,0.001506,0.836776,0.005321
3,0.884087,0.002211,0.838199,0.005545
4,0.888099,0.002316,0.840862,0.006667
5,0.89092,0.002014,0.84152,0.005769
6,0.893914,0.00133,0.842223,0.005199
7,0.896395,0.002359,0.842609,0.005717
8,0.897477,0.002852,0.84267,0.005631
9,0.900422,0.002209,0.842716,0.005787


### Pointers:
- A good starting classifier, with **train-auc-mean of 0.900422 and test-auc-mean of 0.842716. Let's keep tuning and reduce overfitting.**

<div style="padding:10px;color:black;padding-left: 45px;font-size:150%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>  2. Using CV, tune max_depth and min_child_weight next.</b></div>

<div style="padding:6px;color:black;padding-left: 60px;font-size:120%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>  2.1. Tuning max_depth.</b></div>

**Tips:** Keep it around 3-10.

In [23]:
PARAMETERS={"objective":'binary:logistic',"eval_metric":"auc","learning_rate": 0.5}
V_PARAM_NAME="max_depth"
V_PARAM_VALUES=range(3,10,1)

data=xgb_helper(PARAMETERS,V_PARAM_NAME=V_PARAM_NAME,V_PARAM_VALUES=V_PARAM_VALUES);

   max_depth                                       auc
0          3  (0.8643899441624237, 0.8434407018878254)
1          4  (0.8762391828253827, 0.8432457875673736)
2          5  (0.8898517716460612, 0.8449580853400329)
3          6  (0.9004222967227931, 0.8427158084686912)
4          7  (0.9107993432039982, 0.8379593667718235)
5          8  (0.9233976130022082, 0.8309442810154544)
6          9  (0.9331601626845524, 0.8328975608830239)


### Pointers:
- Taking max_depth 5, as per the score.

<div style="padding:6px;color:black;padding-left: 60px;font-size:120%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500">
<b>  2.2. Tuning min_child_weight.</b></div>

**Tips:** Keep it small for imbalanced datasets; larger values are fine for balanced ones.

In [24]:
PARAMETERS={"objective":'binary:logistic',"eval_metric":"auc","learning_rate": 0.5,"max_depth":5}
V_PARAM_NAME="min_child_weight"
V_PARAM_VALUES=range(0,5,1)

data=xgb_helper(PARAMETERS,V_PARAM_NAME=V_PARAM_NAME,V_PARAM_VALUES=V_PARAM_VALUES);

   min_child_weight                                       auc
0                 0  (0.8923932136315347, 0.8400307515925263)
1                 1  (0.8898517716460612, 0.8449580853400329)
2                 2  (0.8878844356167308, 0.8430174554684318)
3                 3   (0.8842988914848681, 0.842868768786811)
4                 4  (0.8841976959835126, 0.8426672020116197)


### Pointers:
- Taking min_child_weight 1, as per the score.

<div style="padding:10px;color:black;padding-left: 45px;font-size:150%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>  3. It's time for gamma.</b></div>

**Tips:** Keep it small, like 0.1-0.2, for starting. It will be tuned later.

In [25]:
PARAMETERS={"objective":'binary:logistic',"eval_metric":"auc","learning_rate": 0.5,"max_depth":5,"min_child_weight":1}
V_PARAM_NAME = "gamma"
V_PARAM_VALUES = [0.1,0.2,0.5,1,1.5,2]

data=xgb_helper(PARAMETERS,V_PARAM_NAME=V_PARAM_NAME,V_PARAM_VALUES=V_PARAM_VALUES);

   gamma                                       auc
0    0.1   (0.889994877151358, 0.8443115370456209)
1    0.2  (0.8908020107961383, 0.8447684831269007)
2    0.5   (0.888948000423167, 0.8450077335411604)
3    1.0   (0.888184172336776, 0.8451616022595031)
4    1.5  (0.8875863399894248, 0.8435783340359313)
5    2.0  (0.8861811009373095, 0.8449147712912233)


### Pointers:
- Taking gamma = 1, as per the score.

<div style="padding:10px;color:black;padding-left: 45px;font-size:150%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>  4. Tune subsample and colsample_bytree.</b></div>

<div style="padding:6px;color:black;padding-left: 60px;font-size:120%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500">
<b>  4.1. Tuning subsample.</b></div>

**Tips:** Keep it small in range 0.5-0.9.

In [26]:
PARAMETERS={"objective":'binary:logistic',"eval_metric":"auc","learning_rate": 0.5,"max_depth":5,"min_child_weight":1,"gamma":1}
V_PARAM_NAME = "subsample"
V_PARAM_VALUES = [.4,.5,.6,.7,.8,.9]

data=xgb_helper(PARAMETERS,V_PARAM_NAME=V_PARAM_NAME,V_PARAM_VALUES=V_PARAM_VALUES);

   subsample                                       auc
0        0.4    (0.8749532101660904, 0.83281692487631)
1        0.5  (0.8789542430190034, 0.8356286366243391)
2        0.6  (0.8804439995005579, 0.8371190653296372)
3        0.7  (0.8852418637174774, 0.8388110573107215)
4        0.8  (0.8868771320489373, 0.8385871061415084)
5        0.9  (0.8880784719598278, 0.8414592031099557)


### Pointers:
- Taking subsample = 0.7 (test AUC 0.8388). Note that 0.9 gives the highest test AUC in this run (0.8415).

<div style="padding:6px;color:black;padding-left: 60px;font-size:120%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500">
<b>  4.2. Tune colsample_bytree.</b></div>

**Tips:** Keep it small in range 0.5-0.9.

In [27]:
PARAMETERS={"objective":'binary:logistic',"eval_metric":"auc","learning_rate": 0.5,"max_depth":5,"min_child_weight":1,"gamma":1,"subsample":0.7}
V_PARAM_NAME = "colsample_bytree"
V_PARAM_VALUES = [.4,.5,.6,.7,.8,.9]

data=xgb_helper(PARAMETERS,V_PARAM_NAME=V_PARAM_NAME,V_PARAM_VALUES=V_PARAM_VALUES);

   colsample_bytree                                       auc
0               0.4  (0.8732754700508629, 0.8243029336177725)
1               0.5  (0.8763308230288717, 0.8286368940154538)
2               0.6  (0.8797842450245362, 0.8345199439450719)
3               0.7   (0.882606499584198, 0.8387111950766428)
4               0.8  (0.8816423223073414, 0.8372514630828988)
5               0.9   (0.882945245407946, 0.8377939587450118)


### Pointers:
- Taking 0.7, as per the score.

<div style="padding:6px;color:black;padding-left: 60px;font-size:120%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500">
<b>  4.3. Tune scale_pos_weight.</b></div>

**Tips:** Based on class imbalance.

In [29]:
PARAMETERS={"objective":'binary:logistic',"eval_metric":"auc","learning_rate": 0.5,"max_depth":5,"min_child_weight":1,
            "gamma":1,"subsample":0.7,"colsample_bytree":.7}

V_PARAM_NAME = "scale_pos_weight"
V_PARAM_VALUES = [.5,1,2]

data=xgb_helper(PARAMETERS,V_PARAM_NAME=V_PARAM_NAME,V_PARAM_VALUES=V_PARAM_VALUES);

   scale_pos_weight                                       auc
0               0.5  (0.8788106918058312, 0.8392510901997741)
1               1.0   (0.882606499584198, 0.8387111950766428)
2               2.0  (0.8820958822662963, 0.8372847814140968)


### Pointers:
- Taking 1

<div style="padding:10px;color:black;padding-left: 45px;font-size:150%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>  5. Tuning Regularization Parameters (alpha,lambda).</b></div>

<div style="padding:6px;color:black;padding-left: 60px;font-size:120%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500">
<b>  5.1. Tune alpha.</b></div>

**Tips:** A larger value means more L1 regularization; the default is 0.

In [31]:
PARAMETERS={"objective":'binary:logistic',"eval_metric":"auc","learning_rate": 0.5,"max_depth":5,"min_child_weight":1,
            "gamma":1,"subsample":0.7,"colsample_bytree":.7, "scale_pos_weight":1}

V_PARAM_NAME = "reg_alpha"
V_PARAM_VALUES = np.linspace(start=0.001, stop=1, num=20).tolist()

data=xgb_helper(PARAMETERS,V_PARAM_NAME=V_PARAM_NAME,V_PARAM_VALUES=V_PARAM_VALUES);

    reg_alpha                                       auc
0    0.001000  (0.8825902776110934, 0.8384799697154699)
1    0.053579  (0.8827765101105249, 0.8379853999988344)
2    0.106158  (0.8832475962307849, 0.8375166770723563)
3    0.158737  (0.8827204177159876, 0.8389957032452664)
4    0.211316   (0.8819419237829805, 0.838982933856221)
5    0.263895  (0.8816977163693667, 0.8371289055016309)
6    0.316474  (0.8819867012809773, 0.8372788608832871)
7    0.369053    (0.88093476339202, 0.8375452674205288)
8    0.421632  (0.8815444908724631, 0.8391031538913749)
9    0.474211  (0.8814753024250257, 0.8378582929768082)
10   0.526789  (0.8811426042809007, 0.8380137049217223)
11   0.579368  (0.8809835952946458, 0.8381551718822834)
12   0.631947  (0.8803760144680881, 0.8397224351644514)
13   0.684526  (0.8803515938517756, 0.8395350359151322)
14   0.737105  (0.8793435692692295, 0.8376811108037694)
15   0.789684  (0.8795401523438638, 0.8391728378912641)
16   0.842263  (0.8789200872826889, 0.8392052025

### Pointers:
- Taking .15

<div style="padding:6px;color:black;padding-left: 60px;font-size:120%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500">
<b>  5.2. Tune lambda.</b></div>

**Tips:** A larger value means more L2 regularization; the default is 1.

In [32]:
PARAMETERS={"objective":'binary:logistic',"eval_metric":"auc","learning_rate": 0.5,"max_depth":5,"min_child_weight":1,
            "gamma":1,"subsample":0.7,"colsample_bytree":.8, "scale_pos_weight":1,"reg_alpha":0.15}

V_PARAM_NAME = "reg_lambda"
V_PARAM_VALUES = np.linspace(start=0.001, stop=1, num=20).tolist()

data=xgb_helper(PARAMETERS,V_PARAM_NAME=V_PARAM_NAME,V_PARAM_VALUES=V_PARAM_VALUES);

    reg_lambda                                       auc
0     0.001000  (0.8837629294030112, 0.8377773090342933)
1     0.053579  (0.8834974755847746, 0.8384035914936796)
2     0.106158  (0.8841805799636513, 0.8375535646739086)
3     0.158737  (0.8847404995246787, 0.8366095528781997)
4     0.211316  (0.8836092981816227, 0.8386059183125034)
5     0.263895  (0.8821506722332023, 0.8379296769712348)
6     0.316474  (0.8836651397716764, 0.8370513569434376)
7     0.369053   (0.882388942144463, 0.8367691535569829)
8     0.421632  (0.8847808117276802, 0.8371616555450355)
9     0.474211   (0.883592293060415, 0.8374150080375327)
10    0.526789  (0.8836578971826203, 0.8387685700484389)
11    0.579368  (0.8832800764386283, 0.8407992397031212)
12    0.631947  (0.8823439507992432, 0.8386409162656416)
13    0.684526   (0.883278644113819, 0.8407971549564776)
14    0.737105   (0.882643406935463, 0.8386381056850116)
15    0.789684  (0.8818644813284889, 0.8388627272415266)
16    0.842263  (0.882067240883

### Pointers:
- Taking 1

<div style="padding:10px;color:black;padding-left: 45px;font-size:150%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>  6. Lastly, Reduce Learning Rate and add more trees</b></div>

<div style="padding:6px;color:black;padding-left: 60px;font-size:120%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500">
<b>  6.1. Reduce Learning rate.</b></div>

In [33]:
PARAMETERS={"objective":'binary:logistic',"eval_metric":"auc","max_depth":5,"min_child_weight":1,
            "gamma":1,"subsample":0.7,"colsample_bytree":.7, "scale_pos_weight":1,"reg_alpha":0.15,
           "reg_lambda":1}

V_PARAM_NAME = "learning_rate"
V_PARAM_VALUES = np.linspace(start=0.01, stop=0.3, num=10).tolist()

data=xgb_helper(PARAMETERS,V_PARAM_NAME=V_PARAM_NAME,V_PARAM_VALUES=V_PARAM_VALUES);

   learning_rate                                       auc
0       0.010000  (0.8524824902942804, 0.8283759555174786)
1       0.042222    (0.8559181175211865, 0.82989955699058)
2       0.074444  (0.8595689536235975, 0.8315237841377243)
3       0.106667  (0.8620043872936567, 0.8324881862476035)
4       0.138889  (0.8648855464037115, 0.8342868908063013)
5       0.171111  (0.8685044164209602, 0.8361353526161105)
6       0.203333  (0.8706196613214325, 0.8390313464586476)
7       0.235556  (0.8727173745167882, 0.8397642287729165)
8       0.267778  (0.8734459152951304, 0.8397836765291448)
9       0.300000   (0.876039324652579, 0.8401660722604714)


### Pointers:
- Taking `learning_rate = 0.3`, which gives the highest AUC in both columns of this sweep.
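
The choice above can be reproduced from the printed sweep. The sketch below copies the table into a plain list and assumes each `auc` tuple is ordered (train AUC, test AUC). That ordering is an assumption, since `xgb_helper` is defined elsewhere in the notebook. `learning_rate_sweep` and `best_by_test_auc` are names introduced here for illustration only.

```python
# Rows copied from the sweep output above: (learning_rate, (auc_a, auc_b)).
# Assumption: each tuple is (train AUC, test AUC).
learning_rate_sweep = [
    (0.010000, (0.8524824902942804, 0.8283759555174786)),
    (0.042222, (0.8559181175211865, 0.82989955699058)),
    (0.074444, (0.8595689536235975, 0.8315237841377243)),
    (0.106667, (0.8620043872936567, 0.8324881862476035)),
    (0.138889, (0.8648855464037115, 0.8342868908063013)),
    (0.171111, (0.8685044164209602, 0.8361353526161105)),
    (0.203333, (0.8706196613214325, 0.8390313464586476)),
    (0.235556, (0.8727173745167882, 0.8397642287729165)),
    (0.267778, (0.8734459152951304, 0.8397836765291448)),
    (0.300000, (0.876039324652579, 0.8401660722604714)),
]


def best_by_test_auc(sweep):
    """Return (value, train_auc, test_auc) for the row with the highest test AUC."""
    value, (train_auc, test_auc) = max(sweep, key=lambda row: row[1][1])
    return value, train_auc, test_auc


best_lr, train_auc, test_auc = best_by_test_auc(learning_rate_sweep)
gap = train_auc - test_auc  # a widening gap hints at overfitting
print(f"best learning_rate={best_lr:.3f}  test AUC={test_auc:.4f}  gap={gap:.4f}")
```

On these numbers the helper selects 0.3. The gap between the two AUC columns also widens as the learning rate grows (about 0.024 at 0.01 versus about 0.036 at 0.3), which is worth watching once more trees are added.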

<div style="padding:10px;color:black;padding-left: 45px;font-size:150%;text-align:left;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>  Full Model</b></div>

In [35]:
PARAMETERS={"objective":'binary:logistic',"eval_metric":"auc","max_depth":5,"min_child_weight":1,
            "gamma":1,"subsample":0.7,"colsample_bytree":.7, "scale_pos_weight":1,"reg_alpha":0.15,
           "reg_lambda":1,"learning_rate": 0.3}

# tree_method="gpu_hist" raised "XGBoost version not compiled with GPU support"
# on this CPU-only build, so train with the CPU histogram method instead.
clf = xgb.XGBClassifier(tree_method="hist", n_estimators=800, **PARAMETERS)

clf.fit(X_train,y_train)

clf.save_model("categorical-model.json")

In [36]:
pred = clf.predict(X_test)

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test, pred, target_names=["0","1"]))


In [None]:
# plot_roc_curve was removed in scikit-learn 1.2; RocCurveDisplay replaces it
from sklearn.metrics import RocCurveDisplay
RocCurveDisplay.from_estimator(clf, X_test, y_test)

In [None]:
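from sklearn.metrics import roc_auc_score
# AUC is a ranking metric: score the positive-class probabilities, not the hard 0/1 labels in pred
roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])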

In [None]:
# Get a graph
graph = xgb.to_graphviz(clf, num_trees=1)
# Or get a matplotlib axis
ax = xgb.plot_tree(clf, num_trees=1)
# Get feature importances
xgb.plot_importance(clf)
plt.show()

<div style="padding:20px;color:black;margin:0;font-size:200%;text-align:center;display:fill;border-radius:5px;background-color:#d9d9d9;overflow:hidden;font-weight:500">Which parameter does what?</div>

# 📒 5. Which parameter does what?

<div style="padding:10px;color:black;margin:0;font-size:150%;text-align:center;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>Control Overfitting</b></div>


When you observe high training accuracy but low test accuracy, you have likely run into overfitting.
There are, in general, two ways to control overfitting in XGBoost:
- **The first way is to directly control model complexity.**
  - This includes **max_depth, min_child_weight** and **gamma**.
  
  
- **The second way is to add randomness to make training robust to noise.**
  - This includes **subsample** and **colsample_bytree**.
  - You can also reduce the step size **eta** (`learning_rate` in the scikit-learn API). Remember to increase **num_round** (`n_estimators`) when you do so.
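
To see why a smaller **eta** needs more boosting rounds, consider an idealized sketch. It is not XGBoost's actual training loop. It assumes every tree fits the current residual perfectly and that its prediction is scaled by eta, so the residual shrinks by a factor of (1 - eta) per round. The function name `rounds_needed` and the 1% tolerance are illustrative assumptions.

```python
def rounds_needed(eta, tolerance=0.01):
    """Count rounds until the remaining residual fraction drops to `tolerance`.

    Idealized model: each round removes a fraction `eta` of the current residual.
    """
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must be in (0, 1]")
    residual = 1.0  # fraction of the initial error still unexplained
    rounds = 0
    while residual > tolerance:
        residual *= (1.0 - eta)
        rounds += 1
    return rounds


for eta in (0.3, 0.1, 0.01):
    print(f"eta={eta:<5} rounds needed={rounds_needed(eta)}")
```

In this toy model, dropping eta from 0.3 to 0.1 roughly triples the number of rounds needed, which is why the number of trees has to grow whenever the learning rate is reduced.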

<div style="padding:10px;color:black;margin:0;font-size:120%;text-align:center;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>Control Overfitting - Code Example : Method 1</b></div>
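
A minimal sketch of method 1, written in plain Python rather than calling XGBoost. It uses the split-gain formula from the XGBoost paper to show how **gamma**, **min_child_weight** and **reg_lambda** decide whether a node is split. `split_gain` and `accept_split` are hypothetical helper names, the gradient and hessian sums are made-up inputs, and the L1 term (**reg_alpha**) is left out for brevity. **max_depth** acts separately by capping how many such splits can be stacked.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Gain of a candidate split from the gradient (g) and hessian (h) sums of its children."""
    def score(g, h):
        return g * g / (h + reg_lambda)

    gain = 0.5 * (
        score(g_left, h_left)
        + score(g_right, h_right)
        - score(g_left + g_right, h_left + h_right)
    )
    return gain - gamma  # gamma is the minimum loss reduction a split must deliver


def accept_split(g_left, h_left, g_right, h_right,
                 reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Reject splits whose children are too light or whose gain does not beat gamma."""
    if h_left < min_child_weight or h_right < min_child_weight:
        return False
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma) > 0.0


# Illustrative (made-up) statistics for one candidate split
candidate = dict(g_left=-4.0, h_left=2.0, g_right=3.0, h_right=2.0)

print(accept_split(**candidate, gamma=1.0))             # gain clears gamma
print(accept_split(**candidate, gamma=5.0))             # larger gamma prunes the split
print(accept_split(**candidate, min_child_weight=3.0))  # children too light
```

Raising **gamma** or **min_child_weight** makes the tree more conservative because fewer candidate splits pass these checks.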

<div style="padding:10px;color:black;margin:0;font-size:150%;text-align:center;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>Faster training performance</b></div>

The **tree_method** parameter controls how trees are built. Set it to **hist**, or to **gpu_hist** on a GPU-enabled XGBoost build, for faster computation.
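
As of XGBoost 2.0, the `gpu_hist` value is deprecated in favor of `tree_method="hist"` combined with `device="cuda"`. The helper below is a hypothetical `fast_tree_params` (not part of XGBoost) that picks matching keyword arguments from a version string. Treat it as a sketch under that assumption.

```python
def fast_tree_params(xgboost_version, use_gpu=False):
    """Return histogram-based tree_method settings for the given XGBoost version.

    Hypothetical helper: it only builds a dict, it does not check for a GPU.
    """
    major = int(xgboost_version.split(".")[0])
    if not use_gpu:
        return {"tree_method": "hist"}
    if major >= 2:
        # XGBoost 2.0+: choose the device separately from the algorithm
        return {"tree_method": "hist", "device": "cuda"}
    # Older releases: GPU training is selected through the tree method itself
    return {"tree_method": "gpu_hist"}


print(fast_tree_params("1.7.6", use_gpu=True))
print(fast_tree_params("2.0.3", use_gpu=True))
print(fast_tree_params("2.0.3", use_gpu=False))
```

The result can be unpacked into the estimator, for example `xgb.XGBClassifier(**fast_tree_params(xgb.__version__), n_estimators=800)`. The GPU variants still require an XGBoost build compiled with GPU support.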

<div style="padding:10px;color:black;margin:0;font-size:150%;text-align:center;display:fill;border-radius:5px;background-color:#f5f2f2;overflow:hidden;font-weight:500"><b>Handle Imbalanced Dataset</b></div>

In common cases such as ad click-through logs, the dataset is extremely imbalanced. This can affect the training of an XGBoost model, and there are two ways to improve it.

- **If you care only about the overall performance metric (AUC) of your prediction**
  - Balance the positive and negative weights via **scale_pos_weight**
  - Use AUC for evaluation
  
  
- **If you care about predicting the right probability**
  - In such a case, you cannot re-balance the dataset
  - Set parameter **max_delta_step** to a finite number (say 1) to help convergence
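
The XGBoost documentation suggests sum(negative instances) / sum(positive instances) as a typical starting value for **scale_pos_weight**. A minimal sketch with made-up labels follows. `suggested_scale_pos_weight` is a helper name introduced here, and the two parameter dictionaries simply restate the two options above.

```python
def suggested_scale_pos_weight(labels):
    """Ratio of negative to positive examples for binary 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives


# Made-up, heavily imbalanced labels: 5% positives
labels = [0] * 950 + [1] * 50
weight = suggested_scale_pos_weight(labels)

# Option 1: only the ranking quality (AUC) matters
ranking_params = {"scale_pos_weight": weight, "eval_metric": "auc"}

# Option 2: calibrated probabilities matter, so leave the class balance alone
probability_params = {"scale_pos_weight": 1, "max_delta_step": 1}

print(weight)
print(ranking_params)
print(probability_params)
```

Either dictionary can be merged into the `PARAMETERS` dictionary used in this notebook before fitting.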