## Structuring Code for a Machine Learning Project

- This notebook builds a template that can be reused across ML projects.

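1. **Load the required libraries and data.**


2. **Perform EDA.** The EDA should be driven by the problem you are trying to solve.


3. **Preprocess the data.** The key question at this stage is how to do **feature engineering**.


4. **Split the data.** Check that the train data and the test data do not differ in distribution.


5. **Train a model.** Decide which model to use. A GBM is a good default because it tends to perform well.


6. **Tune the hyper-parameters.** Continue until the target performance is reached. Keep validating along the way and **take care not to overfit**.


7. **Run the final test.** Create a submission file in the format required by the data analysis competition and check the score.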

## 1. Loading Libraries and Data

In [11]:
# The four standard data-analysis libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Models and performance evaluation
# (For ML on tabular data, I usually just run these two models first. RF in particular is handy for quick tests.)
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from lightgbm import LGBMClassifier

# StratifiedKFold (CV) and partial: both are needed for the Optuna objective
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from functools import partial

# Optuna, a library for hyper-parameter tuning
import optuna

In [12]:
# Load the data.
base_path = '../data/'
train = pd.read_csv(base_path + 'train.csv')
test = pd.read_csv(base_path + 'test.csv')
submission = pd.read_csv(base_path + 'sample_submission.csv')
print(train.shape, test.shape, submission.shape)

(101763, 23) (67842, 22) (67842, 2)


## 2. EDA

- Check the basic facts you need to know about the data.


- Check class imbalance, the target distribution, outliers, and correlations.

In [13]:
train.columns

Index(['id', 'loc', 'v(g)', 'ev(g)', 'iv(g)', 'n', 'v', 'l', 'd', 'i', 'e',
       'b', 't', 'lOCode', 'lOComment', 'lOBlank', 'locCodeAndComment',
       'uniq_Op', 'uniq_Opnd', 'total_Op', 'total_Opnd', 'branchCount',
       'defects'],
      dtype='object')
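
The checks listed above could be sketched roughly as follows. This is only an illustration on a small synthetic DataFrame, not the competition data: the column names `loc`, `branchCount`, and `defects` are borrowed from the listing above, the values are made up, the helper `eda_summary` is hypothetical, and the 1.5 × IQR rule is just one common convention for flagging outliers.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `train`; the values are made up for illustration only.
rng = np.random.default_rng(61)
demo = pd.DataFrame({
    "loc": rng.gamma(shape=2.0, scale=10.0, size=1000),
    "branchCount": rng.poisson(lam=5, size=1000).astype(float),
})
demo["defects"] = rng.random(1000) < 0.2


def eda_summary(df: pd.DataFrame, target: str) -> dict:
    """Collect basic checks: class balance, outliers (IQR rule), correlations."""
    features = df.drop(columns=[target])

    # Class imbalance / target distribution: share of each class.
    class_ratio = df[target].value_counts(normalize=True)

    # Outliers: count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] per feature.
    q1, q3 = features.quantile(0.25), features.quantile(0.75)
    iqr = q3 - q1
    outlier_mask = (features < q1 - 1.5 * iqr) | (features > q3 + 1.5 * iqr)
    outlier_counts = outlier_mask.sum()

    # Correlation of each feature with the target cast to float.
    target_corr = features.corrwith(df[target].astype(float))

    return {
        "class_ratio": class_ratio,
        "outlier_counts": outlier_counts,
        "target_corr": target_corr,
    }


summary = eda_summary(demo, target="defects")
print(summary["class_ratio"])
print(summary["outlier_counts"])
print(summary["target_corr"])
```

On the real data, the same function could be called as `eda_summary(train.drop(columns=['id']), target='defects')`, assuming `id` is only a row identifier.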

## 3. Preprocessing

### Missing-value handling
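
The notebook leaves this step empty. As a minimal sketch of what it might contain, the snippet below counts missing values and fills them with training-set medians. The toy frames and the choice of median imputation are assumptions for illustration, not the approach used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frames standing in for `train` and `test`; the values are made up.
toy_train = pd.DataFrame({"loc": [10.0, np.nan, 30.0, 50.0], "n": [1.0, 2.0, np.nan, 4.0]})
toy_test = pd.DataFrame({"loc": [np.nan, 20.0], "n": [3.0, np.nan]})

# 1) Count missing values per column.
print(toy_train.isna().sum())

# 2) Learn the fill values on the training data only, then apply them to both
#    sets so that no information from the test set leaks into preprocessing.
fill_values = toy_train.median(numeric_only=True)
train_filled = toy_train.fillna(fill_values)
test_filled = toy_test.fillna(fill_values)

print(train_filled)
print(test_filled)
```

If `isna().sum()` reports zero for every column of the real data, this step can simply be skipped.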

## 4. Splitting the Training Data

In [14]:
# Used for a first quick test; the actual training uses K-Fold CV.
from sklearn.model_selection import train_test_split

X = train.drop(columns=['defects'])
y = train.defects

# Hold out 20% of the data as a validation set, stratified on the target.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=61, stratify=y)

In [15]:
print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)
print(y_train.mean(), y_val.mean())

(81410, 22) (81410,) (20353, 22) (20353,)
0.2266429185603734 0.22664963396059548
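
The near-identical target means above show that the stratified split preserved the class ratio. The overview also asks for a check that train and test features do not differ in distribution. One rough way to do that, sketched here on synthetic data, is a per-column standardized mean difference. The helper name `standardized_mean_diff` is hypothetical, and the statistic only catches shifts in the mean, so treat it as a first screen rather than a full test.

```python
import numpy as np
import pandas as pd


def standardized_mean_diff(train_df: pd.DataFrame, test_df: pd.DataFrame) -> pd.Series:
    """Per-column (mean_train - mean_test) / pooled std over shared numeric columns."""
    cols = train_df.select_dtypes("number").columns.intersection(test_df.columns)
    mean_diff = train_df[cols].mean() - test_df[cols].mean()
    pooled_std = np.sqrt((train_df[cols].var() + test_df[cols].var()) / 2.0)
    # Largest absolute difference first.
    return (mean_diff / pooled_std).sort_values(key=np.abs, ascending=False)


# Synthetic example: one column with the same distribution, one with a shifted mean.
rng = np.random.default_rng(61)
a = pd.DataFrame({"same": rng.normal(0, 1, 5000), "shifted": rng.normal(0, 1, 5000)})
b = pd.DataFrame({"same": rng.normal(0, 1, 5000), "shifted": rng.normal(2, 1, 5000)})

smd = standardized_mean_diff(a, b)
print(smd)
```

With the notebook's variables this would be called as `standardized_mean_diff(train, test)`; columns with a large absolute value are the ones worth plotting side by side.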


## 5. Training and Evaluation

In [16]:
# Quick LightGBM test
# Reasonable settings for a first run. (Not necessarily the best; this is just an example.)
model = LGBMClassifier(
    n_jobs=-1,
    random_state=61
)

In [17]:
print("\nFitting LightGBM...")
model.fit(X_train, y_train)


Fitting LightGBM...
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.007258 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3806
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365


In [18]:
# Change the metric to suit each task.
evaluation_metric = roc_auc_score

In [19]:
print("Prediction")
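# Note: predict() returns hard 0/1 labels, and the scores printed below were computed from them.
# ROC AUC is normally computed from scores such as model.predict_proba(X)[:, 1].
pred_train = model.predict(X_train)
pred_val = model.predict(X_val)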


train_score = evaluation_metric(y_train, pred_train)
val_score = evaluation_metric(y_val, pred_val)

print("Train Score : %.4f" % train_score)
print("Validation Score : %.4f" % val_score)

Prediction
Train Score : 0.6770
Validation Score : 0.6676
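
The scores above come from hard class labels. ROC AUC measures how well the model ranks positives above negatives, so it is usually computed from predicted probabilities instead. The toy example below, with made-up labels and scores rather than this notebook's model, shows how thresholding at 0.5 can hide a ranking that is actually perfect.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up ground truth and predicted probabilities for the positive class.
y_true = np.array([0, 0, 0, 1, 1, 1])
proba = np.array([0.10, 0.20, 0.30, 0.35, 0.40, 0.90])

# What predict() would return with the default 0.5 threshold.
hard_labels = (proba >= 0.5).astype(int)

auc_from_proba = roc_auc_score(y_true, proba)         # uses the full ranking
auc_from_labels = roc_auc_score(y_true, hard_labels)  # only sees 0/1 labels

print(auc_from_proba, auc_from_labels)
```

Here every positive is scored above every negative, so the probability-based AUC is 1.0, while the label-based AUC drops to about 0.667 because two positives fall below the threshold. With a fitted classifier, the probabilities would come from `model.predict_proba(X_val)[:, 1]`.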


## 6. Hyper-parameter Tuning

In [22]:
def optimizer(trial, X, y, K):
    # Define the search space for the hyper-parameters to tune.
    max_depth = trial.suggest_int('max_depth', 15, 25)
    num_leaves = trial.suggest_categorical('num_leaves', [128,256,512,1024])
    min_child_samples = trial.suggest_int('min_child_samples', 5, 100)
    colsample_bytree = trial.suggest_float('colsample_bytree', 0.5, 0.8)
    n_estimators = trial.suggest_int('n_estimators', 50, 2047)
    learning_rate = trial.suggest_float('learning_rate', 0.001, 0.3)

    # Specify the model to tune. Optuna runs take a long time, so I usually test with RF first and then switch to LGBM.
    model = LGBMClassifier(
        max_depth=max_depth,
        num_leaves=num_leaves,
        min_child_samples=min_child_samples,
        colsample_bytree=colsample_bytree,
        n_estimators=n_estimators,
        learning_rate=learning_rate,
        random_state=61,
        eval_metric='binary_logloss',
    )

    # Set up stratified K-Fold cross-validation.
    folds = StratifiedKFold(n_splits=K, random_state=61, shuffle=True)
    scores = []

    for train_idx, val_idx in folds.split(X, y):
        X_train = X.iloc[train_idx, :]
        y_train = y.iloc[train_idx]

        X_val = X.iloc[val_idx, :]
        y_val = y.iloc[val_idx]

        model.fit(X_train, y_train)
        # predict() yields hard labels; the trial values logged below were computed this way.
        preds = model.predict(X_val)
        score = evaluation_metric(y_val, preds)
        scores.append(score)

    # Return the mean validation score (ROC AUC here) across the K folds.
    return np.mean(scores)

In [23]:
K = 5   # number of folds
opt_func = partial(optimizer, X=X, y=y, K=K)

study = optuna.create_study(direction="maximize")  # whether to minimize or maximize the objective
study.optimize(opt_func, n_trials=50)

[I 2023-10-12 16:56:44,364] A new study created in memory with name: no-name-32b6f6e3-c459-46b5-9dc4-7583543aba5d


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.006514 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002156 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[L

[I 2023-10-12 16:56:51,132] Trial 0 finished with value: 0.6554727119295547 and parameters: {'max_depth': 22, 'num_leaves': 1024, 'min_child_samples': 48, 'colsample_bytree': 0.5738115835734765, 'n_estimators': 81, 'learning_rate': 0.24677706632620663}. Best is trial 0 with value: 0.6554727119295547.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.006593 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.006916 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 16:57:26,718] Trial 1 finished with value: 0.6496533694658323 and parameters: {'max_depth': 15, 'num_leaves': 128, 'min_child_samples': 68, 'colsample_bytree': 0.5465080307132649, 'n_estimators': 1079, 'learning_rate': 0.22925211508669857}. Best is trial 0 with value: 0.6554727119295547.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.014039 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.010475 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 16:59:12,809] Trial 2 finished with value: 0.6587894356793729 and parameters: {'max_depth': 20, 'num_leaves': 1024, 'min_child_samples': 98, 'colsample_bytree': 0.7748090885311691, 'n_estimators': 1423, 'learning_rate': 0.01407085894134133}. Best is trial 2 with value: 0.6587894356793729.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.012112 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.011121 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:02:56,072] Trial 3 finished with value: 0.6476592250308411 and parameters: {'max_depth': 25, 'num_leaves': 512, 'min_child_samples': 54, 'colsample_bytree': 0.5936071989661091, 'n_estimators': 1630, 'learning_rate': 0.16948290862732593}. Best is trial 2 with value: 0.6587894356793729.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.016626 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013394 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:03:27,000] Trial 4 finished with value: 0.6541585971061712 and parameters: {'max_depth': 16, 'num_leaves': 128, 'min_child_samples': 31, 'colsample_bytree': 0.688406883052448, 'n_estimators': 408, 'learning_rate': 0.22294987933893878}. Best is trial 2 with value: 0.6587894356793729.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.012613 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013287 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:04:27,708] Trial 5 finished with value: 0.6517175541706038 and parameters: {'max_depth': 15, 'num_leaves': 128, 'min_child_samples': 81, 'colsample_bytree': 0.7530170464065351, 'n_estimators': 1027, 'learning_rate': 0.18934343469367432}. Best is trial 2 with value: 0.6587894356793729.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013451 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.011685 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:04:59,125] Trial 6 finished with value: 0.6568193698574994 and parameters: {'max_depth': 24, 'num_leaves': 128, 'min_child_samples': 47, 'colsample_bytree': 0.5821285234456054, 'n_estimators': 552, 'learning_rate': 0.05413315693588195}. Best is trial 2 with value: 0.6587894356793729.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.009570 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.012928 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:05:57,434] Trial 7 finished with value: 0.6581377051270503 and parameters: {'max_depth': 16, 'num_leaves': 128, 'min_child_samples': 71, 'colsample_bytree': 0.569038291442038, 'n_estimators': 876, 'learning_rate': 0.04305009710465173}. Best is trial 2 with value: 0.6587894356793729.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.018871 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013213 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:09:32,111] Trial 8 finished with value: 0.6538442501346393 and parameters: {'max_depth': 16, 'num_leaves': 1024, 'min_child_samples': 36, 'colsample_bytree': 0.6794363578334728, 'n_estimators': 1648, 'learning_rate': 0.05349837851012683}. Best is trial 2 with value: 0.6587894356793729.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013838 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.012360 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:10:21,035] Trial 9 finished with value: 0.6466478942095588 and parameters: {'max_depth': 18, 'num_leaves': 512, 'min_child_samples': 89, 'colsample_bytree': 0.6389961813065926, 'n_estimators': 748, 'learning_rate': 0.22225951989417045}. Best is trial 2 with value: 0.6587894356793729.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013752 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013735 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:13:59,677] Trial 10 finished with value: 0.6634569658368192 and parameters: {'max_depth': 20, 'num_leaves': 256, 'min_child_samples': 7, 'colsample_bytree': 0.7877714744904999, 'n_estimators': 1941, 'learning_rate': 0.0019831552074608405}. Best is trial 10 with value: 0.6634569658368192.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.012416 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013029 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:17:25,574] Trial 11 finished with value: 0.6622296311294742 and parameters: {'max_depth': 20, 'num_leaves': 256, 'min_child_samples': 6, 'colsample_bytree': 0.7872113403383444, 'n_estimators': 2034, 'learning_rate': 0.006326133597003708}. Best is trial 10 with value: 0.6634569658368192.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013048 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.010214 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:20:31,299] Trial 12 finished with value: 0.6545514377421257 and parameters: {'max_depth': 20, 'num_leaves': 256, 'min_child_samples': 7, 'colsample_bytree': 0.7840519573792116, 'n_estimators': 2015, 'learning_rate': 0.10993904929309878}. Best is trial 10 with value: 0.6634569658368192.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.012153 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.010154 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:27:46,815] Trial 13 finished with value: 0.6644553594120093 and parameters: {'max_depth': 22, 'num_leaves': 256, 'min_child_samples': 5, 'colsample_bytree': 0.7453759396136168, 'n_estimators': 2044, 'learning_rate': 0.002144840837902651}. Best is trial 13 with value: 0.6644553594120093.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.011237 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013180 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:30:54,828] Trial 14 finished with value: 0.6538616532835709 and parameters: {'max_depth': 22, 'num_leaves': 256, 'min_child_samples': 20, 'colsample_bytree': 0.7336270682042454, 'n_estimators': 1759, 'learning_rate': 0.10858569444996202}. Best is trial 13 with value: 0.6644553594120093.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.010736 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.009236 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:32:58,007] Trial 15 finished with value: 0.6515100528695429 and parameters: {'max_depth': 22, 'num_leaves': 256, 'min_child_samples': 20, 'colsample_bytree': 0.728907134358558, 'n_estimators': 1304, 'learning_rate': 0.29842835258263495}. Best is trial 13 with value: 0.6644553594120093.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.014998 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013090 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:35:58,842] Trial 16 finished with value: 0.6538911248087593 and parameters: {'max_depth': 18, 'num_leaves': 256, 'min_child_samples': 19, 'colsample_bytree': 0.7926518856721786, 'n_estimators': 1863, 'learning_rate': 0.09453602587839133}. Best is trial 13 with value: 0.6644553594120093.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.016180 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003958 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[L

[I 2023-10-12 17:39:01,697] Trial 17 finished with value: 0.6624921666217585 and parameters: {'max_depth': 23, 'num_leaves': 256, 'min_child_samples': 5, 'colsample_bytree': 0.5054116298834979, 'n_estimators': 1424, 'learning_rate': 0.0026728659865737072}. Best is trial 13 with value: 0.6644553594120093.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.009131 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.009752 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:41:27,305] Trial 18 finished with value: 0.6528902834009125 and parameters: {'max_depth': 19, 'num_leaves': 256, 'min_child_samples': 31, 'colsample_bytree': 0.7483891983274931, 'n_estimators': 1831, 'learning_rate': 0.08744550653067491}. Best is trial 13 with value: 0.6644553594120093.


[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.015049 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3802
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [Info] Number of positive: 18451, number of negative: 62959
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.009710 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3801
[LightGBM] [Info] Number of data points in the train set: 81410, number of used features: 22
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.226643 -> initscore=-1.227365
[LightGBM] [Info] Start training from score -1.227365
[LightGBM] [

[I 2023-10-12 17:43:28,245] Trial 19 finished with value: 0.6542541118802042 and parameters: {'max_depth': 23, 'num_leaves': 256, 'min_child_samples': 15, 'colsample_bytree': 0.799832017983449, 'n_estimators': 1478, 'learning_rate': 0.13556161046095966}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 17:46:46,291] Trial 20 finished with value: 0.6541802290399442 and parameters: {'max_depth': 21, 'num_leaves': 512, 'min_child_samples': 36, 'colsample_bytree': 0.7110583229949102, 'n_estimators': 2026, 'learning_rate': 0.03649981711126294}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 17:48:41,772] Trial 21 finished with value: 0.6614971476500068 and parameters: {'max_depth': 24, 'num_leaves': 256, 'min_child_samples': 7, 'colsample_bytree': 0.5034902044383918, 'n_estimators': 1278, 'learning_rate': 0.0027425212596389964}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 17:51:04,719] Trial 22 finished with value: 0.6576153247332733 and parameters: {'max_depth': 23, 'num_leaves': 256, 'min_child_samples': 15, 'colsample_bytree': 0.6368555889470956, 'n_estimators': 1579, 'learning_rate': 0.02774247344377742}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 17:54:06,358] Trial 23 finished with value: 0.6575517697811117 and parameters: {'max_depth': 21, 'num_leaves': 256, 'min_child_samples': 6, 'colsample_bytree': 0.7615155923792962, 'n_estimators': 1866, 'learning_rate': 0.025857194701296618}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 17:56:48,994] Trial 24 finished with value: 0.6494137961688546 and parameters: {'max_depth': 25, 'num_leaves': 256, 'min_child_samples': 28, 'colsample_bytree': 0.6758842101673881, 'n_estimators': 1272, 'learning_rate': 0.0016124860974311445}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 17:59:18,159] Trial 25 finished with value: 0.655670350669751 and parameters: {'max_depth': 23, 'num_leaves': 256, 'min_child_samples': 13, 'colsample_bytree': 0.7129087822740289, 'n_estimators': 1710, 'learning_rate': 0.06075637704537404}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:01:38,199] Trial 26 finished with value: 0.6535730020526038 and parameters: {'max_depth': 21, 'num_leaves': 256, 'min_child_samples': 23, 'colsample_bytree': 0.764291533460864, 'n_estimators': 1490, 'learning_rate': 0.07616785609912277}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:04:22,313] Trial 27 finished with value: 0.6580832464061752 and parameters: {'max_depth': 19, 'num_leaves': 256, 'min_child_samples': 12, 'colsample_bytree': 0.626887369304604, 'n_estimators': 1909, 'learning_rate': 0.02711499514582785}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:06:50,756] Trial 28 finished with value: 0.6526243523621631 and parameters: {'max_depth': 24, 'num_leaves': 1024, 'min_child_samples': 40, 'colsample_bytree': 0.6521728389441058, 'n_estimators': 1133, 'learning_rate': 0.06846684270508681}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:07:18,814] Trial 29 finished with value: 0.6622570890288604 and parameters: {'max_depth': 22, 'num_leaves': 512, 'min_child_samples': 27, 'colsample_bytree': 0.7398256790515221, 'n_estimators': 205, 'learning_rate': 0.03320636377008683}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:09:55,507] Trial 30 finished with value: 0.6532941968010288 and parameters: {'max_depth': 21, 'num_leaves': 1024, 'min_child_samples': 55, 'colsample_bytree': 0.7675158770264592, 'n_estimators': 1705, 'learning_rate': 0.049376379768622074}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:10:41,617] Trial 31 finished with value: 0.6611196182491217 and parameters: {'max_depth': 22, 'num_leaves': 512, 'min_child_samples': 27, 'colsample_bytree': 0.7363459736387775, 'n_estimators': 333, 'learning_rate': 0.02023897297349926}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:11:07,654] Trial 32 finished with value: 0.5 and parameters: {'max_depth': 23, 'num_leaves': 512, 'min_child_samples': 11, 'colsample_bytree': 0.5013628073119065, 'n_estimators': 179, 'learning_rate': 0.0026565759629410216}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:11:18,684] Trial 33 finished with value: 0.6542521691718864 and parameters: {'max_depth': 22, 'num_leaves': 512, 'min_child_samples': 24, 'colsample_bytree': 0.7746627974737852, 'n_estimators': 68, 'learning_rate': 0.035534088381207356}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:12:49,429] Trial 34 finished with value: 0.6610927251689597 and parameters: {'max_depth': 20, 'num_leaves': 512, 'min_child_samples': 5, 'colsample_bytree': 0.751875508919509, 'n_estimators': 646, 'learning_rate': 0.019537907733956597}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:14:12,637] Trial 35 finished with value: 0.6586156818909984 and parameters: {'max_depth': 19, 'num_leaves': 512, 'min_child_samples': 44, 'colsample_bytree': 0.6088187133263868, 'n_estimators': 950, 'learning_rate': 0.018239568474771442}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:15:56,034] Trial 36 finished with value: 0.6562746462302571 and parameters: {'max_depth': 22, 'num_leaves': 256, 'min_child_samples': 16, 'colsample_bytree': 0.5393431986476701, 'n_estimators': 1400, 'learning_rate': 0.04300271717597913}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:18:17,278] Trial 37 finished with value: 0.6586348124568184 and parameters: {'max_depth': 24, 'num_leaves': 1024, 'min_child_samples': 57, 'colsample_bytree': 0.7775103184822298, 'n_estimators': 1140, 'learning_rate': 0.01328696031332813}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:19:41,563] Trial 38 finished with value: 0.6566727306998363 and parameters: {'max_depth': 25, 'num_leaves': 128, 'min_child_samples': 11, 'colsample_bytree': 0.7980450750824557, 'n_estimators': 1569, 'learning_rate': 0.06325241342268775}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:20:24,290] Trial 39 finished with value: 0.658240485739371 and parameters: {'max_depth': 21, 'num_leaves': 512, 'min_child_samples': 64, 'colsample_bytree': 0.7017967527594442, 'n_estimators': 454, 'learning_rate': 0.034431871289517225}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:21:32,676] Trial 40 finished with value: 0.6556985811472282 and parameters: {'max_depth': 18, 'num_leaves': 256, 'min_child_samples': 35, 'colsample_bytree': 0.7435490243152889, 'n_estimators': 862, 'learning_rate': 0.0472836505032856}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:25:37,472] Trial 41 finished with value: 0.6493778738903334 and parameters: {'max_depth': 20, 'num_leaves': 256, 'min_child_samples': 5, 'colsample_bytree': 0.7791649163678115, 'n_estimators': 1959, 'learning_rate': 0.0010138771650056108}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:28:59,916] Trial 42 finished with value: 0.6580966693286967 and parameters: {'max_depth': 19, 'num_leaves': 256, 'min_child_samples': 10, 'colsample_bytree': 0.7615809807336867, 'n_estimators': 2030, 'learning_rate': 0.015938002259222983}. Best is trial 13 with value: 0.6644553594120093.



[I 2023-10-12 18:31:03,204] Trial 43 finished with value: 0.6616974336674921 and parameters: {'max_depth': 20, 'num_leaves': 128, 'min_child_samples': 17, 'colsample_bytree': 0.7305053552969952, 'n_estimators': 1814, 'learning_rate': 0.011468588119674347}. Best is trial 13 with value: 0.6644553594120093.




In [24]:
# All data recorded for every trial Optuna ran
study.trials_dataframe()
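
If only the saved console output is available rather than the live `study` object, the per-trial values can still be recovered from the `Trial N finished with value: ...` lines. The sketch below is an illustration and not part of the original notebook: `TRIAL_LINE` and `parse_trial_line` are hypothetical names, and the pattern assumes the log line format printed by the tuning cell.

```python
import ast
import re

# Assumed format of Optuna's per-trial summary line, taken from the log output.
TRIAL_LINE = re.compile(
    r"Trial (?P<number>\d+) finished with value: (?P<value>[-+0-9.eE]+) "
    r"and parameters: (?P<params>\{.*?\})\."
)


def parse_trial_line(line):
    """Return (trial_number, value, params), or None if the line is not a trial summary."""
    match = TRIAL_LINE.search(line)
    if match is None:
        return None
    # The parameter dict is printed as a Python literal, so literal_eval can read it.
    params = ast.literal_eval(match.group("params"))
    return int(match.group("number")), float(match.group("value")), params


log_lines = [
    "[I 2023-10-12 18:11:07,654] Trial 32 finished with value: 0.5 and parameters: "
    "{'max_depth': 23, 'num_leaves': 512, 'min_child_samples': 11, "
    "'colsample_bytree': 0.5013628073119065, 'n_estimators': 179, "
    "'learning_rate': 0.0026565759629410216}. "
    "Best is trial 13 with value: 0.6644553594120093.",
    "[LightGBM] [Info] Start training from score -1.227365",
]

# Keep only the lines that are trial summaries.
trials = [t for t in map(parse_trial_line, log_lines) if t is not None]
```

When the `study` object is still in memory, `study.trials_dataframe()` is the simpler route; the parser is only a fallback for archived logs.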

In [25]:
print("Best Score: %.4f" % study.best_value) # print the best score
print("Best params: ", study.best_trial.params) # hyper-parameters that produced the best score

In [26]:
# Visualize the optimization history
optuna.visualization.plot_optimization_history(study)

In [27]:
# Importance of each hyper-parameter
optuna.visualization.plot_param_importances(study)

### 7. Testing and creating the submission file

In [28]:
# Refit the best model on each CV fold and collect that fold's test-set predictions.
# Relies on X, y, test, K and evaluation_metric defined in earlier cells.
def oof_preds(best_model):

    # Build the stratified K-fold splitter
    folds = StratifiedKFold(n_splits=K, random_state=42, shuffle=True)
    final_preds = []  # one array of test-set probabilities per fold
    scores = []       # validation score of each fold
    # Refit best_model on each fold's training split
    for i, (train_idx, val_idx) in enumerate(folds.split(X, y)):
        X_train = X.iloc[train_idx, :]
        y_train = y.iloc[train_idx]
        X_val = X.iloc[val_idx, :]
        y_val = y.iloc[val_idx]

        print(f"========== Fold {i+1} ==========")
        best_model.fit(X_train, y_train)
        preds = best_model.predict_proba(X_val)[:, 1]      # validation-fold probabilities
        test_preds = best_model.predict_proba(test)[:, 1]  # test-set probabilities
        final_preds.append(test_preds)
        score = evaluation_metric(y_val, preds)

        scores.append(score)

    avg_score = np.mean(scores)
    print(f"CV score : {avg_score:.4f}")
    return final_preds

In [29]:
test.info() # no missing values

In [91]:
## Build X_test: apply the same preprocessing that was applied to the training data.


In [30]:
best_params = study.best_trial.params

# define best model
best_model = LGBMClassifier(**best_params,
                           random_state=61)

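# Model finalization: refit the configuration with the best validation score on each CV fold
# and average the per-fold test predictions (the model is not refit on the full data here).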

preds = oof_preds(best_model)
preds = np.mean(preds, axis=0)
preds
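
To make the averaging step concrete: `oof_preds` returns one array of test probabilities per fold, and `np.mean(preds, axis=0)` collapses them into a single value per test row. The sketch below spells out the same element-wise mean in plain Python on made-up numbers; `average_fold_predictions` is a hypothetical helper written for illustration, not something the notebook defines.

```python
from statistics import fmean


def average_fold_predictions(fold_preds):
    """Average per-fold prediction lists element-wise (one value per test row)."""
    if not fold_preds:
        raise ValueError("need at least one fold")
    n_rows = len(fold_preds[0])
    if any(len(p) != n_rows for p in fold_preds):
        raise ValueError("every fold must predict the same number of rows")
    # zip(*fold_preds) groups the predictions that belong to the same test row.
    return [fmean(row) for row in zip(*fold_preds)]


# Three hypothetical folds, four test rows each (illustrative values only).
fold_preds = [
    [0.10, 0.80, 0.30, 0.50],
    [0.20, 0.70, 0.30, 0.40],
    [0.30, 0.90, 0.30, 0.60],
]
averaged = average_fold_predictions(fold_preds)
```

Averaging over folds tends to smooth out the variance of any single fold's model, which is the usual motivation for this kind of K-fold ensembling.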

In [31]:
submission['defects'] = preds

In [32]:
submission.to_csv(base_path+"submission_lightgbm_kfold.csv", index=False)
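
Before uploading, it can help to sanity-check the written file. The following is a minimal sketch under stated assumptions: the submission is expected to have exactly the columns `id` and `defects` (the `id` name is assumed from the training columns), and predictions are expected to be probabilities in [0, 1]. `validate_submission` is a hypothetical helper, and the demo file is written to a temporary directory with made-up rows.

```python
import csv
import math
import os
import tempfile


def validate_submission(path, expected_rows, id_col="id", target_col="defects"):
    """Return a list of problems found in a submission CSV (empty list means none found)."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != [id_col, target_col]:
            problems.append(f"unexpected columns: {reader.fieldnames}")
            return problems
        rows = list(reader)
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    for i, row in enumerate(rows):
        try:
            p = float(row[target_col])
        except (TypeError, ValueError):
            problems.append(f"row {i}: non-numeric prediction {row[target_col]!r}")
            continue
        if math.isnan(p) or not 0.0 <= p <= 1.0:
            problems.append(f"row {i}: prediction {p} outside [0, 1]")
    return problems


# Demo on a tiny, made-up submission file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "submission.csv")
    with open(demo_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "defects"])
        writer.writerows([[101763, 0.12], [101764, 0.87]])
    issues = validate_submission(demo_path, expected_rows=2)
```

In this notebook the call would look like `validate_submission(base_path + "submission_lightgbm_kfold.csv", expected_rows=len(test))`, assuming the sample submission really uses those two column names.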