# Optuna: A hyperparameter optimization framework

* *In this kernel I will use the amazing **Optuna** to find the best hyperparameters for our model.*

**Optuna is an automatic hyperparameter optimization framework designed particularly for machine learning. It features an imperative, define-by-run user API, so code written with Optuna is highly modular and the search space for the hyperparameters can be constructed dynamically.**
* To learn more about Optuna, check this [link](https://optuna.org/).

# Basic Concepts
We use the terms study and trial as follows:
* Study: an optimization run based on an objective function
* Trial: a single execution of the objective function

In [1]:
#!pip install optuna 
import optuna

In [8]:
import xgboost as xgb
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

In [3]:
train = pd.read_csv('../input/tabular-playground-series-jan-2021/train.csv')
test  = pd.read_csv('../input/tabular-playground-series-jan-2021/test.csv')
sub = pd.read_csv('../input/tabular-playground-series-jan-2021/sample_submission.csv')

In [3]:
train.head()

| | id | cont1 | cont2 | cont3 | cont4 | cont5 | cont6 | cont7 | cont8 | cont9 | cont10 | cont11 | cont12 | cont13 | cont14 | target |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0.67039 | 0.8113 | 0.643968 | 0.291791 | 0.284117 | 0.855953 | 0.8907 | 0.285542 | 0.558245 | 0.779418 | 0.921832 | 0.866772 | 0.878733 | 0.305411 | 7.243043 |
| 1 | 3 | 0.388053 | 0.621104 | 0.686102 | 0.501149 | 0.64379 | 0.449805 | 0.510824 | 0.580748 | 0.418335 | 0.432632 | 0.439872 | 0.434971 | 0.369957 | 0.369484 | 8.203331 |
| 2 | 4 | 0.83495 | 0.227436 | 0.301584 | 0.293408 | 0.606839 | 0.829175 | 0.506143 | 0.558771 | 0.587603 | 0.823312 | 0.567007 | 0.677708 | 0.882938 | 0.303047 | 7.776091 |
| 3 | 5 | 0.820708 | 0.160155 | 0.546887 | 0.726104 | 0.282444 | 0.785108 | 0.752758 | 0.823267 | 0.574466 | 0.580843 | 0.769594 | 0.818143 | 0.914281 | 0.279528 | 6.957716 |
| 4 | 8 | 0.935278 | 0.421235 | 0.303801 | 0.880214 | 0.66561 | 0.830131 | 0.487113 | 0.604157 | 0.874658 | 0.863427 | 0.983575 | 0.900464 | 0.935918 | 0.435772 | 7.951046 |


In [4]:
columns = [col for col in train.columns.to_list() if col not in ['id','target']]

In [5]:
data=train[columns]
target=train['target']

## Let's build our optimization function using optuna

### This function uses an XGBRegressor model and takes
* the data
* the target
* trial (the Optuna object for a single execution, used to suggest hyperparameter values)
### and returns
* RMSE (Root Mean Squared Error)
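
For reference, RMSE is the square root of the mean squared difference between predictions and targets. A minimal pure-Python sketch follows; the helper name `rmse_score` and the sample numbers are illustrative, and the kernel itself relies on scikit-learn's `mean_squared_error(..., squared=False)`.

```python
import math

def rmse_score(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError('inputs must be non-empty and of equal length')
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up targets and predictions, only to exercise the function.
print(rmse_score([7.0, 8.0, 9.0], [7.0, 8.0, 12.0]))
```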

## Notes:
* I used some XGBRegressor hyperparameters from the official XGBoost site.
* If you would like to add more parameters or change them, check this [link](https://xgboost.readthedocs.io/en/latest/parameter.html).
* I also used early_stopping_rounds to avoid overfitting.
* To speed up training, enable the GPU. Otherwise comment out the first entry of `param` (`tree_method`) to train on the CPU, which takes much longer 😩.
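
The early-stopping rule mentioned above can be pictured as a patience counter on the validation score. The sketch below is a simplified illustration in plain Python, not XGBoost's actual implementation; the function name and the score list are made up.

```python
def early_stopping_round(val_scores, patience):
    """Return (stop_round, best_round) for a list of validation errors.

    Training stops at the first round where the best (lowest) error is
    already `patience` rounds old; if that never happens, it runs to the end.
    """
    best_round = 0
    for i, score in enumerate(val_scores):
        if score < val_scores[best_round]:
            best_round = i
        elif i - best_round >= patience:
            return i, best_round
    return len(val_scores) - 1, best_round

# Made-up validation RMSE per boosting round: improves, then drifts upward.
scores = [0.80, 0.74, 0.71, 0.70, 0.705, 0.71, 0.72, 0.73]
stop_round, best_round = early_stopping_round(scores, patience=3)
print(stop_round, best_round)
```

With `n_estimators` set to 4000 and a patience of 100, the same rule lets a trial finish long before all 4000 trees are built once the hold-out score stalls.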

In [6]:
def objective(trial, data=data, target=target):
    # Hold out 15% of the rows; every trial is scored on this same split.
    train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.15, random_state=42)
    param = {
        'tree_method': 'gpu_hist',  # train on the GPU to speed things up (XGBoost 2.0+ prefers tree_method='hist' with device='cuda')
        # L2 and L1 regularization, sampled on a log scale
        # (Optuna 3.0+ deprecates suggest_loguniform in favor of suggest_float(..., log=True))
        'lambda': trial.suggest_loguniform('lambda', 1e-3, 10.0),
        'alpha': trial.suggest_loguniform('alpha', 1e-3, 10.0),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
        'subsample': trial.suggest_categorical('subsample', [0.4, 0.5, 0.6, 0.7, 0.8, 1.0]),
        'learning_rate': trial.suggest_categorical('learning_rate', [0.008, 0.009, 0.01, 0.012, 0.014, 0.016, 0.018, 0.02]),
        'n_estimators': 4000,  # upper bound; early stopping usually ends training sooner
        'max_depth': trial.suggest_categorical('max_depth', [5, 7, 9, 11, 13, 15, 17, 20]),
        'random_state': trial.suggest_categorical('random_state', [24, 48, 2020]),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 300),
    }
    model = xgb.XGBRegressor(**param)

    # Stop adding trees once the hold-out RMSE has not improved for 100 rounds.
    model.fit(train_x, train_y, eval_set=[(test_x, test_y)], early_stopping_rounds=100, verbose=False)

    preds = model.predict(test_x)

    # squared=False makes mean_squared_error return the RMSE instead of the MSE.
    rmse = mean_squared_error(test_y, preds, squared=False)

    return rmse
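
`lambda` and `alpha` above are drawn log-uniformly, meaning the logarithm of the value is uniform over the range, so each decade between 1e-3 and 10 is equally likely. The sketch below illustrates that idea with the standard library only; it is not Optuna's sampler, and the helper name and seed are assumptions. In recent Optuna releases the same search space is written as `trial.suggest_float(name, low, high, log=True)`.

```python
import math
import random

def sample_loguniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [sample_loguniform(rng, 1e-3, 10.0) for _ in range(10_000)]

# Count draws per decade; the four counts should come out roughly equal.
decade_counts = [0, 0, 0, 0]  # [1e-3,1e-2), [1e-2,1e-1), [1e-1,1), [1,10]
for value in draws:
    decade = int(math.floor(math.log10(value))) + 3
    decade = max(0, min(decade, 3))  # guard against rounding at the range edges
    decade_counts[decade] += 1
print(decade_counts)
```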

## Everything is ready, so let's start 🏄
* The goal is to minimize the RMSE, which is why I set direction='minimize'.
* You can vary n_trials (the number of executions).

In [9]:
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)
print('Number of finished trials:', len(study.trials))
print('Best trial:', study.best_trial.params)

[I 2021-01-14 20:43:26,270] A new study created in memory with name: no-name-c001ceff-0ee3-4f5f-8506-0909e7c021ac
[I 2021-01-14 20:44:45,870] Trial 0 finished with value: 0.6946865297554603 and parameters: {'lambda': 0.004462945910704844, 'alpha': 0.01852222899338946, 'colsample_bytree': 0.5, 'subsample': 0.8, 'learning_rate': 0.016, 'max_depth': 15, 'random_state': 2020, 'min_child_weight': 29}. Best is trial 0 with value: 0.6946865297554603.
[I 2021-01-14 20:46:04,506] Trial 1 finished with value: 0.6950240210701261 and parameters: {'lambda': 0.0017944587749744762, 'alpha': 0.5557535345992904, 'colsample_bytree': 0.8, 'subsample': 1.0, 'learning_rate': 0.01, 'max_depth': 15, 'random_state': 2020, 'min_child_weight': 61}. Best is trial 0 with value: 0.6946865297554603.
[I 2021-01-14 20:46:22,657] Trial 2 finished with value: 0.6961449843462042 and parameters: {'lambda': 0.16824823603688913, 'alpha': 0.00366335321358022, 'colsample_bytree

[I 2021-01-14 21:04:00,943] Trial 24 finished with value: 0.6941345874738224 and parameters: {'lambda': 0.01690585875048765, 'alpha': 0.11903963144420927, 'colsample_bytree': 0.9, 'subsample': 0.8, 'learning_rate': 0.008, 'max_depth': 20, 'random_state': 2020, 'min_child_weight': 146}. Best is trial 7 with value: 0.693583206033291.
[I 2021-01-14 21:04:52,192] Trial 25 finished with value: 0.6939200335176035 and parameters: {'lambda': 0.23426470621866696, 'alpha': 3.6391114106875917, 'colsample_bytree': 0.7, 'subsample': 0.5, 'learning_rate': 0.012, 'max_depth': 17, 'random_state': 24, 'min_child_weight': 105}. Best is trial 7 with value: 0.693583206033291.
[I 2021-01-14 21:08:21,024] Trial 26 finished with value: 0.6943864839175 and parameters: {'lambda': 0.005452555635612149, 'alpha': 0.7561730873636894, 'colsample_bytree': 0.5, 'subsample': 0.8, 'learning_rate': 0.008, 'max_depth': 17, 'random_state': 2020, 'min_child_weight': 28}. Best is trial 7 w

[I 2021-01-14 21:22:20,968] Trial 48 finished with value: 0.6935488077698737 and parameters: {'lambda': 0.008064004164059587, 'alpha': 0.06473630502436334, 'colsample_bytree': 0.5, 'subsample': 0.7, 'learning_rate': 0.01, 'max_depth': 15, 'random_state': 2020, 'min_child_weight': 300}. Best is trial 38 with value: 0.6933412322042144.
[I 2021-01-14 21:23:10,473] Trial 49 finished with value: 0.6933653939116505 and parameters: {'lambda': 0.007247040484410593, 'alpha': 0.06082421640514651, 'colsample_bytree': 0.5, 'subsample': 0.7, 'learning_rate': 0.01, 'max_depth': 15, 'random_state': 2020, 'min_child_weight': 297}. Best is trial 38 with value: 0.6933412322042144.


Number of finished trials: 50
Best trial: {'lambda': 0.0030282073258141168, 'alpha': 0.01563845128469084, 'colsample_bytree': 0.5, 'subsample': 0.7, 'learning_rate': 0.01, 'max_depth': 15, 'random_state': 2020, 'min_child_weight': 257}


In [10]:
study.trials_dataframe()

| | number | value | datetime_start | datetime_complete | duration | params_alpha | params_colsample_bytree | params_lambda | params_learning_rate | params_max_depth | params_min_child_weight | params_random_state | params_subsample | state |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0.694687 | 2021-01-14 20:43:26.274073 | 2021-01-14 20:44:45.869100 | 0 days 00:01:19.595027 | 0.018522 | 0.5 | 0.004463 | 0.016 | 15 | 29 | 2020 | 0.8 | COMPLETE |
| 1 | 1 | 0.695024 | 2021-01-14 20:44:45.871557 | 2021-01-14 20:46:04.506065 | 0 days 00:01:18.634508 | 0.555754 | 0.8 | 0.001794 | 0.01 | 15 | 61 | 2020 | 1.0 | COMPLETE |
| 2 | 2 | 0.696145 | 2021-01-14 20:46:04.507859 | 2021-01-14 20:46:22.656991 | 0 days 00:00:18.149132 | 0.003663 | 0.8 | 0.168248 | 0.014 | 5 | 55 | 48 | 1.0 | COMPLETE |
| 3 | 3 | 0.696619 | 2021-01-14 20:46:22.659382 | 2021-01-14 20:46:42.534940 | 0 days 00:00:19.875558 | 0.005717 | 0.8 | 0.008038 | 0.008 | 5 | 127 | 48 | 0.8 | COMPLETE |
| 4 | 4 | 0.696026 | 2021-01-14 20:46:42.536841 | 2021-01-14 20:47:00.244935 | 0 days 00:00:17.708094 | 0.339408 | 1.0 | 0.001897 | 0.01 | 5 | 242 | 48 | 0.5 | COMPLETE |
| 5 | 5 | 0.694006 | 2021-01-14 20:47:00.247320 | 2021-01-14 20:47:28.795079 | 0 days 00:00:28.547759 | 0.081718 | 0.3 | 0.566578 | 0.02 | 11 | 79 | 48 | 0.8 | COMPLETE |
| 6 | 6 | 0.693965 | 2021-01-14 20:47:28.797172 | 2021-01-14 20:48:02.383504 | 0 days 00:00:33.586332 | 0.30778 | 0.5 | 0.218453 | 0.018 | 17 | 127 | 48 | 0.6 | COMPLETE |
| 7 | 7 | 0.693583 | 2021-01-14 20:48:02.385608 | 2021-01-14 20:50:11.562938 | 0 days 00:02:09.177330 | 1.407286 | 0.5 | 0.036699 | 0.008 | 15 | 49 | 2020 | 0.8 | COMPLETE |
| 8 | 8 | 0.694213 | 2021-01-14 20:50:11.565349 | 2021-01-14 20:50:41.296615 | 0 days 00:00:29.731266 | 0.302927 | 0.3 | 9.2722 | 0.016 | 9 | 120 | 48 | 0.6 | COMPLETE |
| 9 | 9 | 0.694596 | 2021-01-14 20:50:41.298846 | 2021-01-14 20:51:06.812473 | 0 days 00:00:25.513627 | 0.17394 | 0.8 | 0.001116 | 0.012 | 9 | 206 | 2020 | 1.0 | COMPLETE |


# Let's do some Quick Visualization for Hyperparameter Optimization Analysis
### Optuna provides various visualization features in optuna.visualization to analyze optimization results visually

In [11]:
#plot_optimization_history: shows the scores from all trials as well as the best score so far at each point.
optuna.visualization.plot_optimization_history(study)

In [14]:
#plot_parallel_coordinate: interactively visualizes the hyperparameters and scores
optuna.visualization.plot_parallel_coordinate(study)

In [15]:
#plot_slice: shows the evolution of the search. You can see where in the hyperparameter space your search
#went and which parts of the space were explored more.
optuna.visualization.plot_slice(study)

In [17]:
#plot_contour: plots parameter interactions on an interactive chart. You can choose which hyperparameters you would like to explore.
optuna.visualization.plot_contour(study, params=['alpha',
                            #'max_depth',
                            'lambda',
                            'subsample',
                            'learning_rate'])  # each parameter listed once

In [18]:
#Visualize parameter importances.
optuna.visualization.plot_param_importances(study)

In [19]:
#Visualize empirical distribution function
optuna.visualization.plot_edf(study)
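
The empirical distribution function plotted by `plot_edf` gives, for each threshold, the fraction of trials whose objective value is at or below it. A plain-Python sketch of that calculation, using the ten trial values shown in the `trials_dataframe()` output above (the helper name is illustrative):

```python
def empirical_cdf(values, threshold):
    """Fraction of values that are less than or equal to threshold."""
    return sum(v <= threshold for v in values) / len(values)

# Objective values (RMSE) of trials 0-9 as listed in the trials dataframe.
trial_values = [0.694687, 0.695024, 0.696145, 0.696619, 0.696026,
                0.694006, 0.693965, 0.693583, 0.694213, 0.694596]

for threshold in (0.694, 0.695, 0.697):
    print(threshold, empirical_cdf(trial_values, threshold))
```

When minimizing, a curve that rises quickly at low thresholds means a large share of trials reached good scores.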

# Let's create an XGBRegressor model with the best hyperparameters

In [10]:
Best_trial= {'lambda': 0.0030282073258141168, 'alpha': 0.01563845128469084, 'colsample_bytree': 0.5,
             'subsample': 0.7,'n_estimators': 4000, 'learning_rate': 0.01,'max_depth': 15,
             'random_state': 2020, 'min_child_weight': 257,'tree_method':'gpu_hist'}
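
Instead of copying the values by hand, the dictionary could be assembled from `study.best_params` plus the settings that were fixed inside `objective`. The sketch below uses a stand-in dict holding the values printed by the study above so that it runs on its own; the variable names are illustrative.

```python
# Stand-in for study.best_params, copied from the "Best trial" output.
best_params = {'lambda': 0.0030282073258141168, 'alpha': 0.01563845128469084,
               'colsample_bytree': 0.5, 'subsample': 0.7, 'learning_rate': 0.01,
               'max_depth': 15, 'random_state': 2020, 'min_child_weight': 257}

# Settings that were fixed in the objective and therefore not part of the search.
fixed_params = {'n_estimators': 4000, 'tree_method': 'gpu_hist'}

final_params = {**best_params, **fixed_params}
print(final_params)
```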

In [11]:
preds = np.zeros(test.shape[0])
kf = KFold(n_splits=5, random_state=48, shuffle=True)
rmse = []  # validation RMSE of each fold
for fold, (trn_idx, val_idx) in enumerate(kf.split(train[columns], train['target'])):
    X_tr, X_val = train[columns].iloc[trn_idx], train[columns].iloc[val_idx]
    y_tr, y_val = train['target'].iloc[trn_idx], train['target'].iloc[val_idx]
    model = xgb.XGBRegressor(**Best_trial)
    model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], early_stopping_rounds=100, verbose=False)
    # Average the test predictions of the five fold models.
    preds += model.predict(test[columns]) / kf.n_splits
    rmse.append(mean_squared_error(y_val, model.predict(X_val), squared=False))
    print(fold + 1, rmse[fold])

1 0.6980713043151195
2 0.6952765526112837
3 0.6958580627587057
4 0.6955146269478062
5 0.6957423743842032


In [12]:
np.mean(rmse)

0.6960925842034238
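
Reporting the spread across folds alongside the mean gives a rough sense of how stable the score is. A small sketch using the five fold values printed above:

```python
import statistics

# Per-fold validation RMSE, copied from the output of the loop.
fold_rmse = [0.6980713043151195, 0.6952765526112837, 0.6958580627587057,
             0.6955146269478062, 0.6957423743842032]

mean_rmse = statistics.fmean(fold_rmse)
spread = statistics.stdev(fold_rmse)  # sample standard deviation across folds
print(f'mean={mean_rmse:.6f} stdev={spread:.6f}')
```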