- Generally, the XGBoost hyperparameters have been divided into 4 categories. They are as follows -

  - 1. General parameters
  - 2. Booster parameters
  - 3. Learning task parameters
  - 4. Command line parameters

- Before running a XGBoost model, we must set three types of parameters - **general parameters**, **booster parameters** and **task parameters**.

- The fourth type of parameters are **command line parameters**. They are only used in the console version of XGBoost. So, we will skip these parameters and limit our discussion to the first three type of parameters.

- Link : [Params for XGboost](https://xgboost.readthedocs.io/en/latest/parameter.html#parameters-for-tree-booster)
- General Parameters:
    - booster (gbtree/gblinear/tree)
    - verbosity
    - nthread

- Booster Parameters
    - Tree booster
        - eta (learning rate)
        - gamma
        - max_depth
        - min_child_weight
        - max_delta_step (try in imbalanced data especially with logistic regression)
        - subsample
        - colsample_(bytree, bylevel, bynode) (e.g {'colsample_bytree': 0.5, 'colsample_bylevel': 0.5, 'colsample_bynode': 0.5 } for 64 will reauslt in 8 features per split)
        - lambda (l2 regularization)
        - alpha (l1 regularization)
        - tree_method (auto/ exact(greedy)/ approx/ hist/ gpu_hist)
        - scale_pos_weight
        - max_leaves

- Learning Task Parameters
    - objective (loss function tro be minimized)
    - eval_metric (rmse/mae/acc/auc..,)
    - seed

- k-fold CV in XGBoost
    - nfolds
    - num_boost_round
    - metrics
    - early_stopping_rounds (stop training if hold out metric does not improve for given number of rounds)
    - seed


In [2]:
# imports

# read data - Wholesale customer data.csv [https://archive.ics.uci.edu/ml/datasets/wholesale+customers]

# split feature and target variable

# run basic eda (summary statistics, missing value checks, shape, categprocal feature handling)

# train test split

# make baseline model and get prediction and evaluation metric

# k-fold cross validation

# use xgboost/rf for feature importance


## Bayesian Optimization

- finding the best parameter for any ML/DL model
- constantly learns from previous optimizations
    - surrogate model
    - acquisition function

## Hyperopt

- Search for hyperparameters in a search space optimally
- Used in model tuning

## Parts of the optimiztaion process:

- Initialize domain space
- Define objective function
- Optimization algorithm
- Result evaluation

In [1]:
import os
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import accuracy_score
from hyperopt import tpe, hp, fmin, Trials, STATUS_OK

In [2]:
os.getcwd()

'C:\\Users\\DELL\\Documents\\College\\Semester V\\Machine Learning\\Ready Codes'

In [4]:
df = pd.read_csv('Wholesale customers data.csv')

In [5]:
df.head()

Unnamed: 0,Channel,Region,Fresh,Milk,Grocery,Frozen,Detergents_Paper,Delicassen
0,2,3,12669,9656,7561,214,2674,1338
1,2,3,7057,9810,9568,1762,3293,1776
2,2,3,6353,8808,7684,2405,3516,7844
3,1,3,13265,1196,4221,6404,507,1788
4,2,3,22615,5410,7198,3915,1777,5185


In [6]:
X = df.drop('Channel', axis = 1)
y = df['Channel']

In [7]:
y.value_counts()

1    298
2    142
Name: Channel, dtype: int64

In [8]:
y[y==2] = 0
y[y==1] = 1

In [9]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.3)

The available hyperopt optimization algorithms are -

- **hp.choice(label, options)** — Returns one of the options, which should be a list or tuple.

- **hp.randint(label, upper)** — Returns a random integer between the range [0, upper).

- **hp.uniform(label, low, high)** — Returns a value uniformly between low and high.

- **hp.quniform(label, low, high, q)** — Returns a value round(uniform(low, high) / q) * q, i.e it rounds the decimal values and returns an integer.

- **hp.normal(label, mean, std)** — Returns a real value that’s normally-distributed with mean and standard deviation sigma.

In [10]:
space={'max_depth': hp.quniform("max_depth", 3, 18, 1),
       'gamma': hp.uniform ('gamma', 1,9),
       'reg_alpha' : hp.quniform('reg_alpha', 40,180,1),
       'reg_lambda' : hp.uniform('reg_lambda', 0,1),
       'colsample_bytree' : hp.uniform('colsample_bytree', 0.5,1),
       'min_child_weight' : hp.quniform('min_child_weight', 0, 10, 1),
       'n_estimators': 180,
       'seed': 0
       }

In [11]:
def objective(space):
       clf = xgb.XGBClassifier(
              n_estimators = space['n_estimators'],
              gamma = int(space['gamma']),
              reg_alpha = int(space['reg_alpha']),
              reg_lambda = int(space['reg_lambda']),
              colsample_bytree = int(space['colsample_bytree']),
              min_child_weight  = int(space['min_child_weight']),
              max_depth = int(space['max_depth'])
       )
       evaluation = [(X_train, y_train), (X_test, y_test)]
       clf.fit(X_train, y_train,
               eval_set=evaluation, eval_metric = 'auc',
               early_stopping_rounds = 10, verbose = False)

       pred = clf.predict(X_test)
       accuracy = accuracy_score(y_test, pred>0.5)
       print('SCORE:', accuracy)
       return {'loss': -accuracy, 'status': STATUS_OK}

In [12]:
trials = Trials()
best_hyperparameters = fmin(fn = objective, space = space, algo = tpe.suggest, max_evals = 100, trials = trials)

  0%|                                                                          | 0/100 [00:00<?, ?trial/s, best loss=?]




SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
  2%|▉                                              | 2/100 [00:00<00:29,  3.31trial/s, best loss: -0.2803030303030303]






SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
  6%|██▊                                            | 6/100 [00:01<00:12,  7.56trial/s, best loss: -0.8712121212121212]






SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
  8%|███▊                                           | 8/100 [00:01<00:10,  9.00trial/s, best loss: -0.8712121212121212]






SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 12%|█████▌                                        | 12/100 [00:01<00:09,  9.75trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 14%|██████▍                                       | 14/100 [00:01<00:08, 10.53trial/s, best loss: -0.8787878787878788]






SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 16%|███████▎                                      | 16/100 [00:01<00:07, 11.06trial/s, best loss: -0.8787878787878788]






SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 20%|█████████▏                                    | 20/100 [00:02<00:06, 11.90trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
 20%|█████████▏                                    | 20/100 [00:02<00:06, 11.90trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
SCORE:                                                                                                                 
0.7196969696969697                                                                                                     
 23%|██████████▌                                   | 23/100 [00:02<00:11,  6.97trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
 25%|███████████▌                                  | 25/100 [00:03<00:10,  6.98trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
                                                                                                                       





SCORE:
0.8636363636363636                                                                                                     
SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
 29%|█████████████▎                                | 29/100 [00:03<00:10,  6.71trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
 31%|██████████████▎                               | 31/100 [00:04<00:10,  6.67trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 33%|███████████████▏                              | 33/100 [00:04<00:10,  6.38trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
SCORE:                                                                                                                 
 35%|████████████████                              | 35/100 [00:04<00:10,  6.19trial/s, best loss: -0.8787878787878788]





0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
 37%|█████████████████                             | 37/100 [00:05<00:10,  6.05trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 39%|█████████████████▉                            | 39/100 [00:05<00:10,  6.06trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 42%|███████████████████▎                          | 42/100 [00:05<00:08,  6.79trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
                                                                                                                       





0.7196969696969697
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
                                                                                                                       





0.2803030303030303
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 47%|█████████████████████▌                        | 47/100 [00:06<00:07,  7.12trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8636363636363636                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 50%|███████████████████████                       | 50/100 [00:07<00:06,  7.56trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 51%|███████████████████████▍                      | 51/100 [00:07<00:06,  7.12trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
 53%|████████████████████████▍                     | 53/100 [00:07<00:07,  6.13trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 56%|█████████████████████████▊                    | 56/100 [00:07<00:06,  6.95trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8636363636363636                                                                                                     
SCORE:                                                                                                                 
 57%|██████████████████████████▏                   | 57/100 [00:08<00:05,  7.19trial/s, best loss: -0.8787878787878788]





0.8636363636363636                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 59%|███████████████████████████▏                  | 59/100 [00:08<00:05,  7.13trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.7196969696969697                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 61%|████████████████████████████                  | 61/100 [00:08<00:06,  6.27trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
 63%|████████████████████████████▉                 | 63/100 [00:08<00:05,  6.79trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.7196969696969697                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
                                                                                                                       





SCORE:
0.8787878787878788                                                                                                     
SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
 67%|██████████████████████████████▊               | 67/100 [00:09<00:04,  6.71trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
 69%|███████████████████████████████▋              | 69/100 [00:09<00:04,  6.31trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8636363636363636                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 71%|████████████████████████████████▋             | 71/100 [00:10<00:04,  6.10trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 73%|█████████████████████████████████▌            | 73/100 [00:10<00:04,  6.37trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 75%|██████████████████████████████████▌           | 75/100 [00:10<00:03,  6.64trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.7196969696969697                                                                                                     
SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
 77%|███████████████████████████████████▍          | 77/100 [00:11<00:03,  6.12trial/s, best loss: -0.8787878787878788]




SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 78%|███████████████████████████████████▉          | 78/100 [00:11<00:03,  5.64trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 80%|████████████████████████████████████▊         | 80/100 [00:11<00:03,  6.21trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 82%|█████████████████████████████████████▋        | 82/100 [00:12<00:02,  6.29trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
 84%|██████████████████████████████████████▋       | 84/100 [00:12<00:02,  6.22trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
 86%|███████████████████████████████████████▌      | 86/100 [00:12<00:02,  6.16trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 88%|████████████████████████████████████████▍     | 88/100 [00:13<00:01,  6.04trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
SCORE:                                                                                                                 
                                                                                                                       





0.2803030303030303
SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
 92%|██████████████████████████████████████████▎   | 92/100 [00:13<00:01,  5.57trial/s, best loss: -0.8787878787878788]




SCORE:                                                                                                                 
0.8712121212121212                                                                                                     
 93%|██████████████████████████████████████████▊   | 93/100 [00:13<00:01,  5.78trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.7196969696969697                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 95%|███████████████████████████████████████████▋  | 95/100 [00:14<00:00,  5.88trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 97%|████████████████████████████████████████████▌ | 97/100 [00:14<00:00,  5.58trial/s, best loss: -0.8787878787878788]




SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
 98%|█████████████████████████████████████████████ | 98/100 [00:14<00:00,  5.05trial/s, best loss: -0.8787878787878788]




SCORE:                                                                                                                 
0.2803030303030303                                                                                                     
 99%|█████████████████████████████████████████████▌| 99/100 [00:15<00:00,  5.09trial/s, best loss: -0.8787878787878788]





SCORE:                                                                                                                 
0.8787878787878788                                                                                                     
100%|█████████████████████████████████████████████| 100/100 [00:15<00:00,  6.59trial/s, best loss: -0.8787878787878788]


In [13]:
print(best_hyperparameters)

{'colsample_bytree': 0.8492302659678359, 'gamma': 7.557621411513204, 'max_depth': 8.0, 'min_child_weight': 1.0, 'reg_alpha': 41.0, 'reg_lambda': 0.053735640799807305}
