## Setting a competition BASELINE with optuna + [xgboost, lightgbm, catboost]

To save the time and effort that hyperparameter tuning takes in competitions, I packaged the logic that applies optuna tuning to three libraries (xgboost, lightgbm, and catboost) and produces predictions into a library.

The supported prediction types are:
- Regression
- Binary classification
- Multi-class classification

I plan to keep improving the library so that optimization runs faster.
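
To make the idea concrete, here is a minimal, dependency-free sketch of the tune-then-evaluate loop that such a wrapper automates. It is not the library's code: it swaps Optuna's sampler for plain random search and the boosting model for a toy predictor (`alpha * mean(train targets)`), and every name in it (`kfold_indices`, `cv_mse`, `random_search`) is hypothetical. It only illustrates how `n_trials`, `cv`, and `seed` interact.

```python
import random
from statistics import mean


def kfold_indices(n_samples, n_splits, seed):
    """Shuffle row indices and yield (train, valid) index lists for each fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        valid = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, valid


def cv_mse(alpha, y, n_splits=5, seed=0):
    """Cross-validated MSE of the toy model: prediction = alpha * mean(train targets)."""
    fold_errors = []
    for train, valid in kfold_indices(len(y), n_splits, seed):
        prediction = alpha * mean(y[i] for i in train)
        fold_errors.append(mean((y[i] - prediction) ** 2 for i in valid))
    return mean(fold_errors)


def random_search(y, n_trials=20, cv=5, seed=0):
    """Sample alpha n_trials times and keep the value with the lowest CV error."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        alpha = rng.uniform(0.0, 1.5)  # stand-in for a hyperparameter suggestion
        trials.append((cv_mse(alpha, y, n_splits=cv, seed=seed), alpha))
    best_score, best_alpha = min(trials)
    return best_alpha, best_score, trials


y = [10.0 + 0.1 * i for i in range(50)]
best_alpha, best_score, trials = random_search(y, n_trials=20, cv=5, seed=321)
print(f"best alpha: {best_alpha:.3f}, CV MSE: {best_score:.3f}")
```

A real study would replace the random sampler with Optuna and the toy predictor with a boosted model, but the structure (sample parameters, score them with K-fold, keep the best) stays the same.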

## Installation

In [None]:
!pip install -U teddynote

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting teddynote
  Downloading teddynote-0.2.1-py3-none-any.whl (10 kB)
Collecting optuna
  Downloading optuna-3.0.5-py3-none-any.whl (348 kB)
Collecting catboost
  Downloading catboost-1.1.1-cp38-none-manylinux1_x86_64.whl (76.6 MB)
Collecting colorlog
  Downloading colorlog-6.7.0-py2.py3-none-any.whl (11 kB)
Collecting cmaes>=0.8.2
  Downloading cmaes-0.9.0-py3-none-any.whl (23 kB)
Collecting alembic>=1.5.0
  Downloading alembic-1.9.1-py3-none-any.whl (210 kB)
Collecting importlib-metadata<5.0.0
  Downloading importlib_metadata-4.13.0-py3-none-any.whl (23 kB)
Collecting cliff
  Downloading cliff-4.1.0-py3-none-any.whl (81 kB)
Collecti

In [None]:
# Import the module
from teddynote import models

## Loading the sample datasets

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older scikit-learn.
from sklearn.datasets import load_iris, load_boston, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

import lightgbm as lgb
import xgboost as xgb
import catboost as cb

from lightgbm import LGBMRegressor, LGBMClassifier
from xgboost import XGBRegressor, XGBClassifier
from catboost import CatBoostRegressor, CatBoostClassifier

warnings.filterwarnings('ignore')

SEED = 2021

In [None]:
# Binary Class Datasets
cancer = load_breast_cancer()
cancer_df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
cancer_df['target'] = cancer['target']
cancer_df.head()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | 0 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 | 0 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 | 0 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 | 0 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 | 0 |


In [None]:
# Multi Class Datasets
iris = load_iris()
iris_df = pd.DataFrame(iris['data'], columns=iris['feature_names'])
iris_df['target'] = iris['target']
iris_df.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [None]:
# Regression Datasets
boston = load_boston()
boston_df = pd.DataFrame(boston['data'], columns=boston['feature_names'])
boston_df['target'] = boston['target']
boston_df.head()

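| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |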


## Basic usage

### optimize()

```
optimize(
    x,
    y,
    test_data=None,
    cat_features=None,
    eval_metric='f1',
    cv=5,
    seed=None,
    n_rounds=3000,
    n_trials=100,
)
```

**Input parameters**

- `x`: feature data
- `y`: target data
- `test_data`: data to predict on (the feature columns of the test set)
- `cat_features`: categorical columns
- `eval_metric`: metric to optimize ('f1', 'accuracy', 'recall', 'precision', 'mse', 'rmse', 'rmsle')
- `cv`: number of cross-validation folds
- `seed`: random seed
- `n_rounds`: maximum number of training iterations
- `n_trials`: number of optuna hyperparameter tuning trials

**Returns**
- `params`: the best hyperparameters
- `preds`: predictions for `test_data`, if `test_data` was provided
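
As a reference for the `eval_metric` names, the sketch below writes out the textbook definitions of the error metrics and of accuracy in plain Python. These are the standard formulas, not code taken from the library, and the `direction` helper is a hypothetical illustration of the assumption that error metrics are minimized while score metrics are maximized (which matches the "Best is trial ..." lines in the logs later in this notebook). How the library averages `f1`, `recall`, and `precision` across classes is not stated here, so those are left out.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (inputs must be greater than -1)."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the target exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Assumption: errors are minimized, scores are maximized.
ERROR_METRICS = {"mse", "rmse", "rmsle"}


def direction(eval_metric):
    """Hypothetical helper: optimization direction for a metric name."""
    return "minimize" if eval_metric in ERROR_METRICS else "maximize"


print(mse([1, 2, 3], [1, 2, 5]), rmse([1, 2, 3], [1, 2, 5]))
print(direction("mse"), direction("f1"))
```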

### Automatic saving of results

The predictions produced by tuning and predicting with `optimize()` are saved automatically as a `numpy array`.

- Save location: the `models` folder
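
Judging from the `saving model...` lines in the logs later in this notebook, the file name appears to combine the model name with the best trial value rounded to five decimals. The helper below is a hypothetical reconstruction of that pattern, inferred from the logs only; it is not the library's own function.

```python
def prediction_path(model_name, best_value, folder="models"):
    """Hypothetical helper: rebuild the file name pattern seen in the logs."""
    return f"{folder}/{model_name}-{best_value:.5f}.npy"


print(prediction_path("CatBoostClassifier", 0.96))
print(prediction_path("CatBoostRegressor", 9.659289371715955))
```

A path built this way can be compared against the one printed after `saving model...` before passing it to `models.load_prediction_from_file`.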

## CatBoost + Optuna

### Multi-class classification

In [None]:
catboostoptuna = models.CatBoostClassifierOptuna(use_gpu=False)

params, preds = catboostoptuna.optimize(iris_df.drop(columns='target'),
                                        iris_df['target'],
                                        test_data=iris_df.drop(columns='target'),
                                        seed=321,
                                        eval_metric='recall', n_trials=3)

(np.squeeze(preds) == iris_df['target']).mean()

[I 2022-12-29 09:03:57,465] A new study created in memory with name: no-name-28ffcb0e-3683-45ee-bc03-2a34daf1a7b6


metric type: recall, score: 1.00000
metric type: recall, score: 0.90000
metric type: recall, score: 0.96667
metric type: recall, score: 0.96667


[I 2022-12-29 09:03:58,418] Trial 0 finished with value: 0.96 and parameters: {'bootstrap_type': 'MVS', 'boosting_type': 'Ordered', 'od_type': 'IncToDec', 'colsample_bylevel': 0.06671297835170709, 'l2_leaf_reg': 7.658494804634539e-07, 'learning_rate': 0.14785483236532374, 'iterations': 1957, 'min_child_samples': 29, 'depth': 11}. Best is trial 0 with value: 0.96.


metric type: recall, score: 0.96667
metric type: recall, score: 0.56667
metric type: recall, score: 0.80000
metric type: recall, score: 0.50000


[I 2022-12-29 09:03:58,679] Trial 1 finished with value: 0.58 and parameters: {'bootstrap_type': 'MVS', 'boosting_type': 'Plain', 'od_type': 'IncToDec', 'colsample_bylevel': 0.015558701922017922, 'l2_leaf_reg': 2.001369927344747e-08, 'learning_rate': 0.02431371358278197, 'iterations': 751, 'min_child_samples': 21, 'depth': 7}. Best is trial 0 with value: 0.96.


metric type: recall, score: 0.86667
metric type: recall, score: 0.16667
metric type: recall, score: 1.00000
metric type: recall, score: 0.93333
metric type: recall, score: 0.96667


[I 2022-12-29 09:03:59,268] Trial 2 finished with value: 0.9533333333333334 and parameters: {'bootstrap_type': 'MVS', 'boosting_type': 'Ordered', 'od_type': 'IncToDec', 'colsample_bylevel': 0.027519767569376283, 'l2_leaf_reg': 6.08111486794213e-08, 'learning_rate': 0.29707857950800076, 'iterations': 1243, 'min_child_samples': 14, 'depth': 3}. Best is trial 0 with value: 0.96.


metric type: recall, score: 0.93333
metric type: recall, score: 0.93333
saving model...models/CatBoostClassifier-0.96000.npy


0.9733333333333334
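
When reading this log, it helps to know that each trial's value appears to be the mean of the five cross-validation fold scores (`cv=5` by default). The check below uses the trial 0 numbers copied from the output above. Note that the fifth fold's line is printed after the "Trial 0 finished" message, presumably because of output buffering. This is an inference from the numbers, not a statement about the library's internals.

```python
from statistics import mean

# Fold scores of trial 0, copied from the log above. The fifth score is the
# first "metric type" line printed after the "Trial 0 finished" message.
fold_scores = [1.00000, 0.90000, 0.96667, 0.96667, 0.96667]

trial_value = mean(fold_scores)
print(f"{trial_value:.2f}")  # 0.96, the value reported for trial 0
```

The final `0.9733...` is a different quantity: it is the accuracy of the predictions on `test_data`, which in this example is the same data the model was trained on.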

### Binary classification

In [None]:
catboostoptuna = models.CatBoostClassifierOptuna()

params, preds = catboostoptuna.optimize(cancer_df.drop(columns='target'),
                                        cancer_df['target'],
                                        test_data=cancer_df.drop(columns='target'),
                                        seed=321,
                                        eval_metric='recall', n_trials=3)

(np.squeeze(preds) == cancer_df['target']).mean()

[I 2022-12-29 09:06:28,540] A new study created in memory with name: no-name-cfbf7b45-c7d6-4352-9719-4b6e506c68f4


metric type: recall, score: 0.92308
metric type: recall, score: 0.97015
metric type: recall, score: 0.98592
metric type: recall, score: 0.98718


[I 2022-12-29 09:06:30,479] Trial 0 finished with value: 0.9680010734943633 and parameters: {'bootstrap_type': 'MVS', 'boosting_type': 'Ordered', 'od_type': 'IncToDec', 'colsample_bylevel': 0.023857798068534813, 'l2_leaf_reg': 1.4290831783864455e-07, 'learning_rate': 0.3389404251938573, 'iterations': 1370, 'min_child_samples': 12, 'depth': 7}. Best is trial 0 with value: 0.9680010734943633.


metric type: recall, score: 0.97368
metric type: recall, score: 1.00000
metric type: recall, score: 1.00000
metric type: recall, score: 0.98592
metric type: recall, score: 1.00000


[I 2022-12-29 09:06:44,969] Trial 1 finished with value: 0.9945515196441809 and parameters: {'bootstrap_type': 'Bayesian', 'boosting_type': 'Ordered', 'od_type': 'IncToDec', 'colsample_bylevel': 0.09789744541150466, 'l2_leaf_reg': 4.9967720608342804e-05, 'learning_rate': 0.028600910563671197, 'iterations': 1944, 'min_child_samples': 17, 'depth': 6, 'bagging_temperature': 9.155827151667337}. Best is trial 1 with value: 0.9945515196441809.


metric type: recall, score: 0.98684
metric type: recall, score: 0.96923
metric type: recall, score: 0.98507
metric type: recall, score: 0.98592


[I 2022-12-29 09:06:45,635] Trial 2 finished with value: 0.9802843937352639 and parameters: {'bootstrap_type': 'Bayesian', 'boosting_type': 'Plain', 'od_type': 'Iter', 'colsample_bylevel': 0.07729743157935254, 'l2_leaf_reg': 6.516661372618588e-07, 'learning_rate': 0.0116185794104273, 'iterations': 103, 'min_child_samples': 1, 'depth': 3, 'bagging_temperature': 29.442691081225654}. Best is trial 1 with value: 0.9945515196441809.


metric type: recall, score: 0.97436
metric type: recall, score: 0.98684
saving model...models/CatBoostClassifier-0.99455.npy


0.9929701230228472

### Regression

In [None]:
for col in ['CHAS', 'RAD', 'ZN']:
    boston_df[col] = boston_df[col].astype('int')

catboostoptuna_reg = models.CatBoostRegressorOptuna(use_gpu=False)

params, preds = catboostoptuna_reg.optimize(boston_df.drop(columns='target'),
                                            boston_df['target'],
                                            test_data=boston_df.drop(columns='target'),
                                            # Must be int or str type; float is not allowed
                                            cat_features=['CHAS', 'RAD', 'ZN'],
                                            eval_metric='mse', n_trials=3)

mean_squared_error(boston_df['target'], preds)

[I 2022-12-29 09:13:27,466] A new study created in memory with name: no-name-c4a6f7ed-707a-470b-abdf-6a1e54d6924a


error type: mse, error: 9.78167
error type: mse, error: 9.17342
error type: mse, error: 21.05971
error type: mse, error: 7.93784


[I 2022-12-29 09:13:28,317] Trial 0 finished with value: 11.242691378761268 and parameters: {'bootstrap_type': 'Bernoulli', 'boosting_type': 'Plain', 'od_type': 'Iter', 'colsample_bylevel': 0.055635877341427525, 'l2_leaf_reg': 1.3063906272297266e-07, 'learning_rate': 0.05566893527221241, 'iterations': 933, 'min_child_samples': 4, 'depth': 12, 'subsample': 0.8127546834847629}. Best is trial 0 with value: 11.242691378761268.


error type: mse, error: 8.26081
error type: mse, error: 15.06716
error type: mse, error: 8.06386
error type: mse, error: 8.47345
error type: mse, error: 10.32539


[I 2022-12-29 09:13:29,273] Trial 1 finished with value: 9.659289371715955 and parameters: {'bootstrap_type': 'MVS', 'boosting_type': 'Plain', 'od_type': 'Iter', 'colsample_bylevel': 0.08204797882854901, 'l2_leaf_reg': 0.07333292199252602, 'learning_rate': 0.08005184519178517, 'iterations': 1589, 'min_child_samples': 1, 'depth': 5}. Best is trial 1 with value: 9.659289371715955.


error type: mse, error: 6.36659
error type: mse, error: 19.71679
error type: mse, error: 5.56402
error type: mse, error: 8.31567
error type: mse, error: 12.71012


[I 2022-12-29 09:13:31,195] Trial 2 finished with value: 10.908498331028131 and parameters: {'bootstrap_type': 'MVS', 'boosting_type': 'Plain', 'od_type': 'Iter', 'colsample_bylevel': 0.0971780340050969, 'l2_leaf_reg': 1.5331796221592077e-06, 'learning_rate': 0.17603300275969247, 'iterations': 691, 'min_child_samples': 21, 'depth': 8}. Best is trial 1 with value: 9.659289371715955.


error type: mse, error: 8.23590
saving model...models/CatBoostRegressor-9.65929.npy


5.681159126457748

### Loading saved predictions from a file

In [None]:
# Predictions saved as a numpy array can be loaded back.
models.load_prediction_from_file('/content/models/CatBoostRegressor-3.48059.npy')

array([25.82585968, 20.68006021, 32.84489183, 35.65161464, 32.78903836,
       25.80110956, 19.98998853, 18.82686324, 16.59023856, 19.59561169,
       18.09095126, 19.70100316, 20.33241957, 19.85520235, 18.20452363,
       19.97112548, 22.31164823, 18.05131059, 17.03215382, 17.45285002,
       15.31319352, 17.69064344, 16.39416142, 18.09231114, 17.26043346,
       15.94433693, 17.8622751 , 15.4895261 , 18.21393005, 21.33716669,
       14.48495421, 17.5680783 , 12.7236004 , 16.18968978, 14.21964801,
       23.13882888, 21.9074767 , 23.22506389, 23.16177266, 29.24684204,
       35.53260448, 27.85917278, 25.00109791, 24.65877509, 23.02005912,
       20.59479697, 20.59479697, 19.49851231, 18.65180189, 19.66256934,
       20.87246509, 22.86097377, 24.930397  , 22.8855505 , 18.19944467,
       36.0808    , 24.10941311, 36.75498042, 24.0082003 , 21.70310133,
       19.50025966, 19.66628114, 23.33020854, 24.84047106, 31.72573978,
       24.40566168, 19.02161942, 21.04979048, 18.85374287, 21.11

### Visualizing the hyperparameter tuning

In [None]:
# Visualize the tuning results
catboostoptuna_reg.visualize()

| | number | value | datetime_start | datetime_complete | duration | params_boosting_type | params_bootstrap_type | params_colsample_bylevel | params_depth | params_iterations | params_l2_leaf_reg | params_learning_rate | params_min_child_samples | params_od_type | params_subsample | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 11.242691 | 2022-12-29 09:13:27.471799 | 2022-12-29 09:13:28.317403 | 0 days 00:00:00.845604 | Plain | Bernoulli | 0.055636 | 12 | 933 | 1.306391e-07 | 0.055669 | 4 | Iter | 0.812755 | COMPLETE |
| 2 | 2 | 10.908498 | 2022-12-29 09:13:29.276466 | 2022-12-29 09:13:31.181382 | 0 days 00:00:01.904916 | Plain | MVS | 0.097178 | 8 | 691 | 1.53318e-06 | 0.176033 | 21 | Iter | | COMPLETE |
| 1 | 1 | 9.659289 | 2022-12-29 09:13:28.323602 | 2022-12-29 09:13:29.272726 | 0 days 00:00:00.949124 | Plain | MVS | 0.082048 | 5 | 1589 | 0.07333292 | 0.080052 | 1 | Iter | | COMPLETE |
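
The table lists one row per trial, ordered here from the highest `value` to the lowest. Since `mse` is an error, the best trial is the one with the smallest value. The short sketch below picks it out from a few columns copied by hand from the table above; the list-of-dicts layout is only for illustration and is not what `visualize()` returns.

```python
# A few columns copied from the table above (value is the mean CV mse).
trials = [
    {"number": 0, "value": 11.242691, "depth": 12, "iterations": 933},
    {"number": 2, "value": 10.908498, "depth": 8, "iterations": 691},
    {"number": 1, "value": 9.659289, "depth": 5, "iterations": 1589},
]

# For an error metric the best trial is the one with the lowest value.
best = min(trials, key=lambda trial: trial["value"])
print(best["number"], best["value"])
```

This agrees with the "Best is trial 1" message in the regression log.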


## XGBoost

### Multi-class classification

In [None]:
xgboptuna = models.XGBClassifierOptuna(use_gpu=False)

params, preds = xgboptuna.optimize(iris_df.drop(columns='target'),
                                   iris_df['target'],
                                   test_data=iris_df.drop(columns='target'),
                                   seed=321,
                                   eval_metric='f1', n_trials=3)

(preds == iris_df['target']).mean()

[I 2022-12-29 09:14:56,089] A new study created in memory with name: no-name-30381ae1-579f-499c-8614-b6b7c7db921f


metric type: f1, score: 0.08829
metric type: f1, score: 0.16667
metric type: f1, score: 0.06667


[I 2022-12-29 09:14:57,195] Trial 0 finished with value: 0.1015404415404415 and parameters: {'lambda': 0.001997388231849061, 'alpha': 0.0006132805599297649, 'colsample_bytree': 0.7642155856174857, 'subsample': 0.5266229668726202, 'learning_rate': 0.0006640778583932129, 'n_estimators': 667, 'max_depth': 30, 'min_child_weight': 33}. Best is trial 0 with value: 0.1015404415404415.


metric type: f1, score: 0.13846
metric type: f1, score: 0.04762
metric type: f1, score: 0.08829
metric type: f1, score: 0.16667
metric type: f1, score: 0.06667


[I 2022-12-29 09:14:57,717] Trial 1 finished with value: 0.10718146718146715 and parameters: {'lambda': 0.0001341468272216203, 'alpha': 0.0005128767714703468, 'colsample_bytree': 0.5899542564007023, 'subsample': 0.6631681207047757, 'learning_rate': 0.0001350596986692828, 'n_estimators': 2477, 'max_depth': 30, 'min_child_weight': 245}. Best is trial 1 with value: 0.10718146718146715.


metric type: f1, score: 0.16667
metric type: f1, score: 0.04762
metric type: f1, score: 0.08829
metric type: f1, score: 0.13846
metric type: f1, score: 0.06667
metric type: f1, score: 0.13846


[I 2022-12-29 09:14:58,132] Trial 2 finished with value: 0.09589941589941589 and parameters: {'lambda': 0.4738177610556752, 'alpha': 0.004144571311046486, 'colsample_bytree': 0.8913967246468639, 'subsample': 0.8785923888618219, 'learning_rate': 0.0004391753034763145, 'n_estimators': 4365, 'max_depth': 15, 'min_child_weight': 189}. Best is trial 1 with value: 0.10718146718146715.


metric type: f1, score: 0.04762
saving model...models/XGBClassifier-0.10718.npy


0.3333333333333333

### Binary classification

In [None]:
xgboptuna_binary = models.XGBClassifierOptuna(use_gpu=False)

params, preds = xgboptuna_binary.optimize(cancer_df.drop(columns='target'),
                                          cancer_df['target'],
                                          test_data=cancer_df.drop(columns='target'),
                                          eval_metric='accuracy', n_trials=3)

(preds == cancer_df['target']).mean()

[I 2022-12-29 09:15:00,938] A new study created in memory with name: no-name-eb6ffa9c-3dc4-4117-8082-50da39c73491


metric type: accuracy, score: 0.59649
metric type: accuracy, score: 0.69298
metric type: accuracy, score: 0.62281
metric type: accuracy, score: 0.63158


[I 2022-12-29 09:15:25,517] Trial 0 finished with value: 0.6273560006210216 and parameters: {'lambda': 1.2413682640115096e-05, 'alpha': 0.030900183378150283, 'colsample_bytree': 0.6771700662771362, 'subsample': 0.7058559286075331, 'learning_rate': 0.0003092386008034649, 'n_estimators': 2152, 'max_depth': 27, 'min_child_weight': 130}. Best is trial 0 with value: 0.6273560006210216.


metric type: accuracy, score: 0.59292
metric type: accuracy, score: 0.90351
metric type: accuracy, score: 0.95614
metric type: accuracy, score: 0.94737
metric type: accuracy, score: 0.91228


[I 2022-12-29 09:15:33,822] Trial 1 finished with value: 0.9297003570874087 and parameters: {'lambda': 5.8073997524807416e-05, 'alpha': 0.000328992736353654, 'colsample_bytree': 0.553033500880707, 'subsample': 0.885698064381888, 'learning_rate': 0.00383230058859935, 'n_estimators': 4096, 'max_depth': 11, 'min_child_weight': 45}. Best is trial 1 with value: 0.9297003570874087.


metric type: accuracy, score: 0.92920
metric type: accuracy, score: 0.57895
metric type: accuracy, score: 0.67544
metric type: accuracy, score: 0.64035
metric type: accuracy, score: 0.66667


[I 2022-12-29 09:15:57,096] Trial 2 finished with value: 0.6273249495419966 and parameters: {'lambda': 0.027291097288808318, 'alpha': 0.00010582047198008938, 'colsample_bytree': 0.5328067308903317, 'subsample': 0.8031408333573117, 'learning_rate': 0.00010115009540749037, 'n_estimators': 3430, 'max_depth': 27, 'min_child_weight': 120}. Best is trial 1 with value: 0.9297003570874087.


metric type: accuracy, score: 0.57522
saving model...models/XGBClassifier-0.92970.npy


0.8646748681898067

### Regression

In [None]:
xgboptuna_reg = models.XGBRegressorOptuna()

params, preds = xgboptuna_reg.optimize(boston_df.drop(columns='target'),
                                       boston_df['target'],
                                       test_data=boston_df.drop(columns='target'),
                                       eval_metric='mse', n_trials=3)

mean_squared_error(boston_df['target'], preds)

[I 2022-12-29 09:16:57,639] A new study created in memory with name: no-name-ae7a00e6-92f7-41d2-b5c0-acbcf2f46d53


error type: mse, error: 53.60435
error type: mse, error: 92.34599
error type: mse, error: 99.80514
error type: mse, error: 87.34863


[I 2022-12-29 09:17:13,722] Trial 0 finished with value: 84.4422455786637 and parameters: {'lambda': 0.7635246861751053, 'alpha': 0.0031490611875899005, 'colsample_bytree': 0.8705756780009448, 'subsample': 0.8759323931606535, 'learning_rate': 0.002118437275978508, 'n_estimators': 3830, 'max_depth': 6, 'min_child_weight': 213}. Best is trial 0 with value: 84.4422455786637.


error type: mse, error: 89.10712
error type: mse, error: 113.74239
error type: mse, error: 197.72973
error type: mse, error: 129.61431
error type: mse, error: 107.41140


[I 2022-12-29 09:17:35,275] Trial 1 finished with value: 144.85258270621802 and parameters: {'lambda': 0.03437630049838279, 'alpha': 0.00018414872645525245, 'colsample_bytree': 0.8630890292344646, 'subsample': 0.551387281694677, 'learning_rate': 0.00034872348268562915, 'n_estimators': 419, 'max_depth': 16, 'min_child_weight': 137}. Best is trial 0 with value: 84.4422455786637.


error type: mse, error: 175.76509
error type: mse, error: 484.68630
error type: mse, error: 534.23987
error type: mse, error: 513.40338
error type: mse, error: 446.41880


[I 2022-12-29 09:17:57,241] Trial 2 finished with value: 483.1852927076376 and parameters: {'lambda': 0.0005758862991605531, 'alpha': 0.034543622027623705, 'colsample_bytree': 0.5415550556820339, 'subsample': 0.5537703008151961, 'learning_rate': 3.1901659384450854e-05, 'n_estimators': 2690, 'max_depth': 28, 'min_child_weight': 113}. Best is trial 0 with value: 84.4422455786637.


error type: mse, error: 437.17812
saving model...models/XGBRegressor-84.44225.npy


84.43627046555166

## LGBM

### Binary classification

In [None]:
lgbmoptuna_binary = models.LGBMClassifierOptuna()

params, preds = lgbmoptuna_binary.optimize(cancer_df.drop(columns='target'),
                                           cancer_df['target'],
                                           test_data=cancer_df.drop(columns='target'),
                                           eval_metric='accuracy', n_trials=3)

(preds == cancer_df['target']).mean()

[I 2022-12-29 09:17:59,660] A new study created in memory with name: no-name-c14a21dd-6289-4a6c-b9c2-625ac6369216


Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's binary_logloss: 0.662971	training's score: 0.621978	valid_1's binary_logloss: 0.64947	valid_1's score: 0.649123
metric type: accuracy, score: 0.64912
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's binary_logloss: 0.667191	training's score: 0.613187	valid_1's binary_logloss: 0.634472	valid_1's score: 0.684211
metric type: accuracy, score: 0.68421
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's binary_logloss: 0.65604	training's score: 0.635165	valid_1's binary_logloss: 0.677486	valid_1's score: 0.596491
metric type: accuracy, score: 0.59649
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's binary_logloss: 0.654814	training's score: 0.637363	valid_1's binary_logloss: 0.682801	valid_1's scor

[I 2022-12-29 09:17:59,879] Trial 0 finished with value: 0.6274025772395592 and parameters: {'lambda_l1': 4.595642532463337e-05, 'lambda_l2': 4.871147705405551e-06, 'path_smooth': 2.132089812428851e-08, 'learning_rate': 0.00016017208935090687, 'feature_fraction': 0.6721675485445667, 'bagging_fraction': 0.5525546970317589, 'num_leaves': 15, 'min_data_in_leaf': 91, 'max_bin': 136, 'n_estimators': 174, 'bagging_freq': 8, 'min_child_weight': 15}. Best is trial 0 with value: 0.6274025772395592.


metric type: accuracy, score: 0.58772
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's binary_logloss: 0.659178	training's score: 0.629386	valid_1's binary_logloss: 0.664422	valid_1's score: 0.619469
metric type: accuracy, score: 0.61947
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's binary_logloss: 0.661964	training's score: 0.624176	valid_1's binary_logloss: 0.653758	valid_1's score: 0.640351
metric type: accuracy, score: 0.64035
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's binary_logloss: 0.654901	training's score: 0.637363	valid_1's binary_logloss: 0.682896	valid_1's score: 0.587719
metric type: accuracy, score: 0.58772
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's binary_logloss: 0.658525	training's score: 0.630769	valid_1's

[I 2022-12-29 09:18:00,150] Trial 1 finished with value: 0.6274646793976091 and parameters: {'lambda_l1': 0.006204017854053708, 'lambda_l2': 0.6166727012607798, 'path_smooth': 0.0006456447868947258, 'learning_rate': 2.5445778619417826e-05, 'feature_fraction': 0.884989593521972, 'bagging_fraction': 0.8085943157981701, 'num_leaves': 65, 'min_data_in_leaf': 18, 'max_bin': 216, 'n_estimators': 1450, 'bagging_freq': 2, 'min_child_weight': 6}. Best is trial 1 with value: 0.6274646793976091.


Early stopping, best iteration is:
[1]	training's binary_logloss: 0.663743	training's score: 0.620614	valid_1's binary_logloss: 0.646886	valid_1's score: 0.654867
metric type: accuracy, score: 0.65487
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's binary_logloss: 0.652512	training's score: 0.632967	valid_1's binary_logloss: 0.667747	valid_1's score: 0.605263
metric type: accuracy, score: 0.60526
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's binary_logloss: 0.657031	training's score: 0.624176	valid_1's binary_logloss: 0.649213	valid_1's score: 0.640351
metric type: accuracy, score: 0.64035
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's binary_logloss: 0.648844	training's score: 0.63956	valid_1's binary_logloss: 0.68321	valid_1's score: 0.578947
metric type: accuracy, score: 0.57895
Training unt

[I 2022-12-29 09:18:00,385] Trial 2 finished with value: 0.6274802049371215 and parameters: {'lambda_l1': 3.2735035295480926e-06, 'lambda_l2': 2.653937126026768e-06, 'path_smooth': 1.963247188884402e-06, 'learning_rate': 0.006562765507047732, 'feature_fraction': 0.8017540250821625, 'bagging_fraction': 0.8901809067407016, 'num_leaves': 63, 'min_data_in_leaf': 69, 'max_bin': 207, 'n_estimators': 2766, 'bagging_freq': 1, 'min_child_weight': 7}. Best is trial 2 with value: 0.6274802049371215.


Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's binary_logloss: 0.659785	training's score: 0.618421	valid_1's binary_logloss: 0.638887	valid_1's score: 0.663717
metric type: accuracy, score: 0.66372
saving model...models/LGBMClassifier-0.62748.npy


0.984182776801406

### Multi-class classification

In [None]:
lgbmoptuna = models.LGBMClassifierOptuna()

# Note: test_data is not passed here, so per the return description above `preds`
# presumably carries no predictions and the comparison below is not a real accuracy.
params, preds = lgbmoptuna.optimize(iris_df.drop(columns='target'),
                                    iris_df['target'],
                                    seed=321,
                                    eval_metric='recall', n_trials=3)


(preds == iris_df['target']).mean()

[I 2022-12-29 09:18:00,881] A new study created in memory with name: no-name-c582fc83-6a04-4285-8dd0-afa9c6f2873c


Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[5]	training's multi_logloss: 1.0964	training's score: 0.358333	valid_1's multi_logloss: 1.11515	valid_1's score: 0.433333
metric type: recall, score: 0.23333
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's multi_logloss: 1.09835	training's score: 0.391667	valid_1's multi_logloss: 1.10043	valid_1's score: 0.266667
metric type: recall, score: 0.30000
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's multi_logloss: 1.09586	training's score: 0.375	valid_1's multi_logloss: 1.12255	valid_1's score: 0.2
metric type: recall, score: 0.20000
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[11]	training's multi_logloss: 1.09766	training's score: 0.333333	valid_1's multi_logloss: 1.09978	valid_1's score: 0.4
metric type: recall, 

[I 2022-12-29 09:18:01,294] Trial 0 finished with value: 0.24000000000000005 and parameters: {'lambda_l1': 2.3535929325302383, 'lambda_l2': 0.008567351717265869, 'path_smooth': 0.00016181683657077886, 'learning_rate': 6.361102878608731e-05, 'feature_fraction': 0.8738901851917171, 'bagging_fraction': 0.7294948502560171, 'num_leaves': 75, 'min_data_in_leaf': 21, 'max_bin': 190, 'n_estimators': 1805, 'bagging_freq': 2, 'min_child_weight': 10}. Best is trial 0 with value: 0.24000000000000005.


Early stopping, best iteration is:
[1]	training's multi_logloss: 1.09199	training's score: 0.333333	valid_1's multi_logloss: 1.15819	valid_1's score: 0.4
metric type: recall, score: 0.16667
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's multi_logloss: 1.0986	training's score: 0.333333	valid_1's multi_logloss: 1.13617	valid_1's score: 0.333333
metric type: recall, score: 0.23333
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's multi_logloss: 1.09861	training's score: 0.341667	valid_1's multi_logloss: 1.10278	valid_1's score: 0.3
metric type: recall, score: 0.30000
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's multi_logloss: 1.09865	training's score: 0.366667	valid_1's multi_logloss: 1.15202	valid_1's score: 0.2
metric type: recall, score: 0.20000
Training until validation scores don't improve for

[I 2022-12-29 09:18:01,661] Trial 1 finished with value: 0.24000000000000005 and parameters: {'lambda_l1': 3.760470476637525e-05, 'lambda_l2': 0.027379296021520953, 'path_smooth': 7.08379720738119e-07, 'learning_rate': 0.0008683343434183733, 'feature_fraction': 0.8207101251014834, 'bagging_fraction': 0.5899219011052734, 'num_leaves': 69, 'min_data_in_leaf': 58, 'max_bin': 142, 'n_estimators': 1917, 'bagging_freq': 14, 'min_child_weight': 2}. Best is trial 0 with value: 0.24000000000000005.


Early stopping, best iteration is:
[1]	training's multi_logloss: 1.09861	training's score: 0.341667	valid_1's multi_logloss: 1.10278	valid_1's score: 0.3
metric type: recall, score: 0.30000
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's multi_logloss: 1.09833	training's score: 0.283333	valid_1's multi_logloss: 1.23072	valid_1's score: 0.533333
metric type: recall, score: 0.16667
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's multi_logloss: 1.0986	training's score: 0.333333	valid_1's multi_logloss: 1.13617	valid_1's score: 0.333333
metric type: recall, score: 0.23333
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	training's multi_logloss: 1.09861	training's score: 0.341667	valid_1's multi_logloss: 1.10278	valid_1's score: 0.3
metric type: recall, score: 0.30000
Training until validation scores don't improv

[I 2022-12-29 09:18:02,051] Trial 2 finished with value: 0.24000000000000005 and parameters: {'lambda_l1': 0.00040788633657923463, 'lambda_l2': 0.5139457564836926, 'path_smooth': 3.3818590908878444e-08, 'learning_rate': 1.0512867547406974e-05, 'feature_fraction': 0.6775006827010839, 'bagging_fraction': 0.7898075669072202, 'num_leaves': 89, 'min_data_in_leaf': 49, 'max_bin': 238, 'n_estimators': 491, 'bagging_freq': 1, 'min_child_weight': 11}. Best is trial 0 with value: 0.24000000000000005.


Early stopping, best iteration is:
[1]	training's multi_logloss: 1.09833	training's score: 0.283333	valid_1's multi_logloss: 1.23072	valid_1's score: 0.533333
metric type: recall, score: 0.16667


0.0

### Regression

In [None]:
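lgbmoptuna_reg = models.LGBMRegressorOptuna()

# Tune on the Boston data. test_data reuses the training features here,
# so the MSE computed below is a training-set error.
params, preds = lgbmoptuna_reg.optimize(boston_df.drop('target', axis=1), 
                                        boston_df['target'], 
                                        test_data=boston_df.drop('target', axis=1), 
                                        eval_metric='mse', n_trials=3)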

mean_squared_error(boston_df['target'], preds)

[I 2022-12-29 09:18:02,073] A new study created in memory with name: no-name-30121c32-b317-4bb5-b5a2-e0cdd63061d8


Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[1709]	training's l2: 37.034	training's score: 37.034	valid_1's l2: 29.9855	valid_1's score: 29.9855
error type: mse, error: 29.98554
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[1709]	training's l2: 31.9411	training's score: 31.9411	valid_1's l2: 52.88	valid_1's score: 52.88
error type: mse, error: 52.88001
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[1709]	training's l2: 35.9263	training's score: 35.9263	valid_1's l2: 29.5901	valid_1's score: 29.5901
error type: mse, error: 29.59012
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[1709]	training's l2: 35.4721	training's score: 35.4721	valid_1's l2: 37.1197	valid_1's score: 37.1197
error type: mse, error: 37.11968
Training until validat

[I 2022-12-29 09:18:08,981] Trial 0 finished with value: 37.12134099747261 and parameters: {'lambda_l1': 1.6920160712340157e-08, 'lambda_l2': 0.3121584304813956, 'path_smooth': 0.0005433919976498505, 'learning_rate': 0.0005108441186681401, 'feature_fraction': 0.7244640806269197, 'bagging_fraction': 0.7705110343972117, 'num_leaves': 42, 'min_data_in_leaf': 46, 'max_bin': 236, 'n_estimators': 1709, 'bagging_freq': 7, 'min_child_weight': 17}. Best is trial 0 with value: 37.12134099747261.


Did not meet early stopping. Best iteration is:
[1709]	training's l2: 35.1082	training's score: 35.1082	valid_1's l2: 36.0314	valid_1's score: 36.0314
error type: mse, error: 36.03135
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[2881]	training's l2: 75.2725	training's score: 75.2725	valid_1's l2: 60.7196	valid_1's score: 60.7196
error type: mse, error: 60.71961
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[2881]	training's l2: 75.3598	training's score: 75.3598	valid_1's l2: 61.2104	valid_1's score: 61.2104
error type: mse, error: 61.21044
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[2881]	training's l2: 73.989	training's score: 73.989	valid_1's l2: 67.8801	valid_1's score: 67.8801
error type: mse, error: 67.88007
Training until validation scores don't improve for 30 rounds.
Did not meet early

[I 2022-12-29 09:18:21,044] Trial 1 finished with value: 73.41621480687674 and parameters: {'lambda_l1': 3.0027719552771627e-05, 'lambda_l2': 8.163835722554557e-08, 'path_smooth': 3.859678822007846e-08, 'learning_rate': 3.968029869948861e-05, 'feature_fraction': 0.8779071970478933, 'bagging_fraction': 0.6087878515039717, 'num_leaves': 47, 'min_data_in_leaf': 33, 'max_bin': 209, 'n_estimators': 2881, 'bagging_freq': 1, 'min_child_weight': 18}. Best is trial 0 with value: 37.12134099747261.


Did not meet early stopping. Best iteration is:
[2881]	training's l2: 71.44	training's score: 71.44	valid_1's l2: 78.1232	valid_1's score: 78.1232
error type: mse, error: 78.12321
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[484]	training's l2: 14.7942	training's score: 14.7942	valid_1's l2: 20.0686	valid_1's score: 20.0686
error type: mse, error: 20.06863
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[551]	training's l2: 14.1071	training's score: 14.1071	valid_1's l2: 21.6504	valid_1's score: 21.6504
error type: mse, error: 21.65038
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[551]	training's l2: 13.7774	training's score: 13.7774	valid_1's l2: 19.08	valid_1's score: 19.08
error type: mse, error: 19.07996
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iterat

[I 2022-12-29 09:18:23,201] Trial 2 finished with value: 19.899965059769222 and parameters: {'lambda_l1': 0.0006948134857875728, 'lambda_l2': 1.816559704151923e-08, 'path_smooth': 9.091897723940985e-06, 'learning_rate': 0.023874159548122453, 'feature_fraction': 0.8700281555072255, 'bagging_fraction': 0.6519540223204592, 'num_leaves': 32, 'min_data_in_leaf': 57, 'max_bin': 189, 'n_estimators': 551, 'bagging_freq': 1, 'min_child_weight': 15}. Best is trial 2 with value: 19.899965059769222.


Did not meet early stopping. Best iteration is:
[551]	training's l2: 14.4414	training's score: 14.4414	valid_1's l2: 20.1203	valid_1's score: 20.1203
error type: mse, error: 20.12029
saving model...models/LGBMRegressor-19.89997.npy


17.11354253780762
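
The log above suggests that each trial's reported value is the mean of five validation-fold errors. The following is a minimal sketch that checks this arithmetic for Trial 0 of the regression run. It assumes that the fifth `error type: mse` line, which is printed just after the "Trial 0 finished" message, still belongs to Trial 0 (the two output streams appear to be interleaved). The names `fold_errors` and `trial_value` are illustrative only and are not part of `teddynote`.

```python
from statistics import mean

# Per-fold MSE values copied from the "error type: mse" lines for Trial 0.
# Assumption: the fifth value (36.03135) is logged after the trial summary
# only because stdout and the optuna log are interleaved.
fold_errors = [29.98554, 52.88001, 29.59012, 37.11968, 36.03135]

# Value reported in the "Trial 0 finished with value: ..." log line.
trial_value = 37.12134099747261

cv_mean = mean(fold_errors)
print(f"mean of fold errors: {cv_mean:.5f}")
print(f"reported trial value: {trial_value:.5f}")

# The two agree up to the rounding of the printed fold errors.
assert abs(cv_mean - trial_value) < 1e-4
```

If this reading is right, the value optuna minimizes is a cross-validated error, which is why it differs from the training-set MSE of 17.11 computed at the end of the cell.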

## Customizing hyperparameter ranges

In [None]:
lgbmoptuna = models.LGBMRegressorOptuna()

# Print the default hyperparameter settings
lgbmoptuna.print_params()

name: verbose, fixed_value: -1, type: fixed
name: lambda_l1, low: 1e-08, high: 5, type: loguniform
name: lambda_l2, low: 1e-08, high: 5, type: loguniform
name: path_smooth, low: 1e-08, high: 0.001, type: loguniform
name: learning_rate, low: 1e-05, high: 0.1, type: loguniform
name: feature_fraction, low: 0.5, high: 0.9, type: uniform
name: bagging_fraction, low: 0.5, high: 0.9, type: uniform
name: num_leaves, low: 15, high: 90, type: int
name: min_data_in_leaf, low: 10, high: 100, type: int
name: max_bin, low: 100, high: 255, type: int
name: n_estimators, low: 100, high: 3000, type: int
name: bagging_freq, low: 0, high: 15, type: int
name: min_child_weight, low: 1, high: 20, type: int


**About `param_type`**

`param_type` can be one of `int`, `uniform`, `loguniform`, `categorical`, or `fixed`.

- `int`, `uniform`, and `loguniform` correspond to the options optuna uses to define a search range.

```
# Examples
# - integer range (int)
lgbmoptuna.set_param(models.OptunaParam('num_leaves', low=10, high=25, param_type='int'))

# - categorical choice (categorical)
cboptuna.set_param(models.OptunaParam('bootstrap_type', categorical_value=['Bayesian', 'Bernoulli', 'MVS'], param_type='categorical'))

# - fixed value (fixed)
cboptuna.set_param(models.OptunaParam('one_hot_max_size', fixed_value=1024, param_type='fixed'))
```
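
To make the five `param_type` values more concrete, here is a minimal sketch of how a single value could be drawn for each type. It uses only the standard library and is not how `teddynote` or optuna is implemented. `sample_param` is a hypothetical helper, and the sketch assumes that `int` includes both bounds and that `loguniform` samples uniformly in log space, as optuna's log-scaled float suggestion does.

```python
import math
import random


def sample_param(rng, param_type, low=None, high=None,
                 categorical_value=None, fixed_value=None):
    """Draw one value for a parameter (hypothetical helper, illustration only)."""
    if param_type == 'int':
        # Integer range, both bounds inclusive.
        return rng.randint(low, high)
    if param_type == 'uniform':
        # Every value between low and high is equally likely.
        return rng.uniform(low, high)
    if param_type == 'loguniform':
        # Uniform in log space: each order of magnitude is equally likely.
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    if param_type == 'categorical':
        # Pick one of the listed choices.
        return rng.choice(categorical_value)
    if param_type == 'fixed':
        # Not searched at all: always the same value.
        return fixed_value
    raise ValueError(f"unknown param_type: {param_type}")


rng = random.Random(0)
num_leaves = sample_param(rng, 'int', low=10, high=25)
learning_rate = sample_param(rng, 'loguniform', low=1e-5, high=0.1)
bootstrap_type = sample_param(
    rng, 'categorical', categorical_value=['Bayesian', 'Bernoulli', 'MVS'])
one_hot_max_size = sample_param(rng, 'fixed', fixed_value=1024)

print(num_leaves, learning_rate, bootstrap_type, one_hot_max_size)
```

A log-uniform range suits parameters such as `learning_rate` or `lambda_l1`, whose default bounds span several orders of magnitude. A plain uniform draw over the same bounds would almost never produce the small values.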

In [None]:
# Define the hyperparameter ranges
lgbmoptuna.set_param(models.OptunaParam('num_leaves', low=10, high=25, param_type='int'))
lgbmoptuna.set_param(models.OptunaParam('n_estimators', low=0, high=500, param_type='int'))
# Print the updated settings
lgbmoptuna.print_params()

name: verbose, fixed_value: -1, type: fixed
name: lambda_l1, low: 1e-08, high: 5, type: loguniform
name: lambda_l2, low: 1e-08, high: 5, type: loguniform
name: path_smooth, low: 1e-08, high: 0.001, type: loguniform
name: learning_rate, low: 1e-05, high: 0.1, type: loguniform
name: feature_fraction, low: 0.5, high: 0.9, type: uniform
name: bagging_fraction, low: 0.5, high: 0.9, type: uniform
name: num_leaves, low: 10, high: 25, type: int
name: min_data_in_leaf, low: 10, high: 100, type: int
name: max_bin, low: 100, high: 255, type: int
name: n_estimators, low: 0, high: 500, type: int
name: bagging_freq, low: 0, high: 15, type: int
name: min_child_weight, low: 1, high: 20, type: int


In [None]:
# Check how the results change with the new ranges
params, preds = lgbmoptuna.optimize(boston_df.drop('target', axis=1), 
                                    boston_df['target'], 
                                    test_data=boston_df.drop('target', axis=1), 
                                    eval_metric='mse', n_trials=3)

[I 2022-12-29 09:18:23,372] A new study created in memory with name: no-name-ae35f087-a228-4a68-b3c8-60d35f41590e


Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[247]	training's l2: 21.093	training's score: 21.093	valid_1's l2: 29.1066	valid_1's score: 29.1066
error type: mse, error: 29.10655
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[247]	training's l2: 19.4262	training's score: 19.4262	valid_1's l2: 38.4706	valid_1's score: 38.4706
error type: mse, error: 38.47062
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[247]	training's l2: 22.0686	training's score: 22.0686	valid_1's l2: 20.341	valid_1's score: 20.341
error type: mse, error: 20.34104
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[247]	training's l2: 23.1349	training's score: 23.1349	valid_1's l2: 17.5461	valid_1's score: 17.5461
error type: mse, error: 17.54613


[I 2022-12-29 09:18:24,381] Trial 0 finished with value: 25.833186951988512 and parameters: {'lambda_l1': 3.1348333324228786e-05, 'lambda_l2': 3.084018572804572, 'path_smooth': 0.00016940884371250385, 'learning_rate': 0.044391881817682145, 'feature_fraction': 0.8220894360847955, 'bagging_fraction': 0.8920904182836557, 'num_leaves': 14, 'min_data_in_leaf': 96, 'max_bin': 203, 'n_estimators': 247, 'bagging_freq': 1, 'min_child_weight': 17}. Best is trial 0 with value: 25.833186951988512.


Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[247]	training's l2: 21.9319	training's score: 21.9319	valid_1's l2: 23.7016	valid_1's score: 23.7016
error type: mse, error: 23.70159
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[215]	training's l2: 7.99189	training's score: 7.99189	valid_1's l2: 14.3525	valid_1's score: 14.3525
error type: mse, error: 14.35247
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[246]	training's l2: 6.81822	training's score: 6.81822	valid_1's l2: 18.4926	valid_1's score: 18.4926
error type: mse, error: 18.49256
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[110]	training's l2: 11.4237	training's score: 11.4237	valid_1's l2: 8.87499	valid_1's score: 8.87499
error type: mse, error: 8.87499
Training until validation scores don't improve 

[I 2022-12-29 09:18:25,373] Trial 1 finished with value: 14.31766601817004 and parameters: {'lambda_l1': 4.8092750923671334e-05, 'lambda_l2': 1.1459858330453022e-05, 'path_smooth': 3.219119419681446e-07, 'learning_rate': 0.04668186493593296, 'feature_fraction': 0.8820692747245038, 'bagging_fraction': 0.8367016179345099, 'num_leaves': 12, 'min_data_in_leaf': 41, 'max_bin': 200, 'n_estimators': 246, 'bagging_freq': 7, 'min_child_weight': 8}. Best is trial 1 with value: 14.31766601817004.


Did not meet early stopping. Best iteration is:
[246]	training's l2: 6.62346	training's score: 6.62346	valid_1's l2: 21.0676	valid_1's score: 21.0676
error type: mse, error: 21.06764
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[121]	training's l2: 72.4573	training's score: 72.4573	valid_1's l2: 72.8465	valid_1's score: 72.8465
error type: mse, error: 72.84648
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[121]	training's l2: 72.147	training's score: 72.147	valid_1's l2: 72.1677	valid_1's score: 72.1677
error type: mse, error: 72.16768
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[121]	training's l2: 69.5731	training's score: 69.5731	valid_1's l2: 84.8518	valid_1's score: 84.8518
error type: mse, error: 84.85181
Training until validation scores don't improve for 30 rounds.


[I 2022-12-29 09:18:25,940] Trial 2 finished with value: 72.88286497884431 and parameters: {'lambda_l1': 1.1057237328517206e-06, 'lambda_l2': 7.825102342319326e-05, 'path_smooth': 1.738304803573946e-08, 'learning_rate': 0.0010080869572794613, 'feature_fraction': 0.845567092557858, 'bagging_fraction': 0.5118440354952796, 'num_leaves': 13, 'min_data_in_leaf': 31, 'max_bin': 118, 'n_estimators': 121, 'bagging_freq': 8, 'min_child_weight': 12}. Best is trial 1 with value: 14.31766601817004.


Did not meet early stopping. Best iteration is:
[121]	training's l2: 76.1123	training's score: 76.1123	valid_1's l2: 55.535	valid_1's score: 55.535
error type: mse, error: 55.53499
Training until validation scores don't improve for 30 rounds.
Did not meet early stopping. Best iteration is:
[121]	training's l2: 70.7505	training's score: 70.7505	valid_1's l2: 79.0134	valid_1's score: 79.0134
error type: mse, error: 79.01336
saving model...models/LGBMRegressor-14.31767.npy


Print the results of each trial.

In [None]:
# Print the results of each trial as a DataFrame
lgbmoptuna.study.trials_dataframe()

| | number | value | datetime_start | datetime_complete | duration | params_bagging_fraction | params_bagging_freq | params_feature_fraction | params_lambda_l1 | params_lambda_l2 | params_learning_rate | params_max_bin | params_min_child_weight | params_min_data_in_leaf | params_n_estimators | params_num_leaves | params_path_smooth | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 25.833187 | 2022-12-29 09:18:23.375976 | 2022-12-29 09:18:24.381118 | 0 days 00:00:01.005142 | 0.89209 | 1 | 0.822089 | 3.1e-05 | 3.084019 | 0.044392 | 203 | 17 | 96 | 247 | 14 | 0.0001694088 | COMPLETE |
| 1 | 1 | 14.317666 | 2022-12-29 09:18:24.383579 | 2022-12-29 09:18:25.372613 | 0 days 00:00:00.989034 | 0.836702 | 7 | 0.882069 | 4.8e-05 | 1.1e-05 | 0.046682 | 200 | 8 | 41 | 246 | 12 | 3.219119e-07 | COMPLETE |
| 2 | 2 | 72.882865 | 2022-12-29 09:18:25.374771 | 2022-12-29 09:18:25.939203 | 0 days 00:00:00.564432 | 0.511844 | 8 | 0.845567 | 1e-06 | 7.8e-05 | 0.001008 | 118 | 12 | 31 | 121 | 13 | 1.738305e-08 | COMPLETE |
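
Because `mse` is an error metric, the best trial is the one with the smallest `value`. The sketch below reproduces that selection from the three rows shown above using plain Python. The list `trials` is hand-copied from the output for illustration and is not an object returned by the library.

```python
# (number, value) pairs copied from the trials output above.
trials = [
    {"number": 0, "value": 25.833187},
    {"number": 1, "value": 14.317666},
    {"number": 2, "value": 72.882865},
]

# For an error metric such as MSE, lower is better.
best = min(trials, key=lambda t: t["value"])
print(f"best trial: {best['number']} (value={best['value']})")
```

This agrees with the "Best is trial 1" message in the optimization log.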


Visualize the hyperparameter tuning results.

In [None]:
lgbmoptuna.visualize()

Print the best hyperparameters.

In [None]:
lgbmoptuna.get_best_params()

{'lambda_l1': 4.8092750923671334e-05,
 'lambda_l2': 1.1459858330453022e-05,
 'path_smooth': 3.219119419681446e-07,
 'learning_rate': 0.04668186493593296,
 'feature_fraction': 0.8820692747245038,
 'bagging_fraction': 0.8367016179345099,
 'num_leaves': 12,
 'min_data_in_leaf': 41,
 'max_bin': 200,
 'n_estimators': 246,
 'bagging_freq': 7,
 'min_child_weight': 8}
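
The returned dictionary contains only the twelve searched parameters. The fixed `verbose: -1` entry listed by `print_params()` is not in it. If you want to reuse the result outside the library, one option is to merge the two yourself, as in the sketch below. `fixed_params` and `final_params` are hypothetical names, and passing the merged dictionary to an estimator such as `LGBMRegressor(**final_params)` is an assumption about how you might use it, not something the notebook shows.

```python
# Best parameters as printed by get_best_params() above.
best_params = {
    'lambda_l1': 4.8092750923671334e-05,
    'lambda_l2': 1.1459858330453022e-05,
    'path_smooth': 3.219119419681446e-07,
    'learning_rate': 0.04668186493593296,
    'feature_fraction': 0.8820692747245038,
    'bagging_fraction': 0.8367016179345099,
    'num_leaves': 12,
    'min_data_in_leaf': 41,
    'max_bin': 200,
    'n_estimators': 246,
    'bagging_freq': 7,
    'min_child_weight': 8,
}

# Fixed settings shown by print_params() are not part of the search result.
fixed_params = {'verbose': -1}

# Searched values take precedence if a key ever appears in both dictionaries.
final_params = {**fixed_params, **best_params}

for name, value in final_params.items():
    print(f"{name}: {value}")
```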