# LightGBM Model Train - Holdout
Train a model with LightGBM. This notebook does not use the Azure Machine Learning Python SDK.

## Prerequisites
Import the required libraries.

In [9]:
import lightgbm as lgb
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import pandas as pd
import joblib
import os

## Data Preparation
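Load the Boston housing prices sample dataset. Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell requires an older scikit-learn release.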

In [10]:
boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['target'] = boston.target

## Split Data
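Split the loaded data into explanatory variables (features) and the target variable, then hold out part of the rows as a test set.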

In [11]:
X, y = df.drop(['target'], axis=1), df['target']

In [12]:
# Hold out 20% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=0)
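
To make the holdout proportions concrete, the following is a minimal sketch that applies the same `train_test_split` call to random stand-in data shaped like the Boston dataset (506 rows, 13 features). The names `X_demo`, `y_demo`, and the `_tr`/`_te` suffixes are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-in data with the same shape as the Boston dataset (assumption:
# only the shape matters here, not the values).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(506, 13))
y_demo = rng.normal(size=506)

# Same split settings as the notebook cell above.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.2,
                                          random_state=0)

# The test size is rounded up, so 506 rows give 102 test rows and 404 training rows.
print(X_tr.shape, X_te.shape)
```

Because `random_state` is fixed, repeated runs produce the same partition, which keeps the reported holdout metrics reproducible.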

## Model Training
Build a regression model with LightGBM.

In [13]:
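# Parameter list: https://lightgbm.readthedocs.io/en/latest/Parameters.html
params = {
    'boosting_type': 'gbdt',    # gradient boosted decision trees
    'learning_rate': 0.15,
    'n_estimators': 1000,       # upper bound on boosting rounds; early stopping may end sooner
    'colsample_bytree': 1.0,    # use all features for every tree
    'reg_alpha': 1e-3,          # L1 regularization
    'reg_lambda': 1e-3,         # L2 regularization
    'seed': 1234,
}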

In [14]:
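# Callback factory: returns a LightGBM callback that records the latest
# value of every evaluation metric in `store` (a plain dictionary).
def log_evaluation(store):
    def callback(env):
        for data_name, eval_name, result, _ in env.evaluation_result_list:
            key = f'{data_name}+{eval_name}'
            store[key] = result
    return callback

In [15]:
clf = lgb.LGBMRegressor(**params)

# Latest metric values, filled in by the callback during training
eval_results = {}

# Note: LightGBM 4.0 and later no longer accept `verbose` and `early_stopping_rounds`
# in fit(); there, pass lgb.log_evaluation(50) and lgb.early_stopping(100) via `callbacks`.
result = clf.fit(X_train, y_train,
                 verbose=50,
                 #categorical_feature=X_cat.columns.tolist(),
                 eval_set=[(X_train, y_train), (X_test, y_test)],
                 eval_names=['train', 'test'],
                 early_stopping_rounds=100,
                 callbacks=[log_evaluation(eval_results)]
                )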

Training until validation scores don't improve for 100 rounds
[50]	train's l2: 2.52896	test's l2: 24.6795
[100]	train's l2: 1.03211	test's l2: 23.5957
[150]	train's l2: 0.463691	test's l2: 23.1745
[200]	train's l2: 0.232365	test's l2: 23.2039
Early stopping, best iteration is:
[113]	train's l2: 0.79494	test's l2: 22.952
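
The `l2` metric in the log is the mean squared error, so its square root gives an error in the same unit as the target (median home value in thousands of dollars). A minimal sketch using the values printed for the best iteration; the variable names are illustrative only.

```python
import math

# l2 (mean squared error) values copied from the best iteration [113] in the log
train_l2 = 0.79494
test_l2 = 22.952

# RMSE is the square root of the MSE
train_rmse = math.sqrt(train_l2)
test_rmse = math.sqrt(test_l2)

print(f"train RMSE: {train_rmse:.3f}, test RMSE: {test_rmse:.3f}")
```

The test error is several times larger than the training error, which suggests the model fits the training rows much more closely than the held-out rows.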

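To see what the `log_evaluation` callback collects without running a full training job, here is a small sketch that calls a variant of it by hand. `FakeEnv` is a hypothetical stand-in for the environment object LightGBM passes to callbacks, modeling only the `evaluation_result_list` field of `(data_name, eval_name, result, is_higher_better)` tuples; the metric values are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for LightGBM's callback environment (assumption:
# only the fields shown here are needed by the callback).
FakeEnv = namedtuple("FakeEnv", ["iteration", "evaluation_result_list"])


def log_evaluation(store):
    """Return a callback that writes each metric value into `store`."""
    def callback(env):
        for data_name, eval_name, result, _ in env.evaluation_result_list:
            store[f"{data_name}+{eval_name}"] = result
    return callback


metrics = {}
cb = log_evaluation(metrics)

# Simulate two boosting rounds with made-up metric values.
cb(FakeEnv(0, [("train", "l2", 9.0, False), ("test", "l2", 30.0, False)]))
cb(FakeEnv(1, [("train", "l2", 7.5, False), ("test", "l2", 28.0, False)]))

print(metrics)  # {'train+l2': 7.5, 'test+l2': 28.0}
```

Each call overwrites the previous value for a key, so the dictionary holds only the most recently evaluated metrics, not a per-iteration history.
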

## Model Save
Serialize the trained model to a pickle file with joblib.

In [16]:
model_file = 'lgb.pkl'
OUTPUT_DIR = 'model'

In [17]:
os.makedirs(OUTPUT_DIR, exist_ok=True)

In [18]:
with open(os.path.join(OUTPUT_DIR, model_file), 'wb') as f:
    joblib.dump(clf, f)
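
A saved file can be read back with `joblib.load`. The sketch below shows the dump and load round trip on a plain dictionary used as a stand-in for the fitted model, written to a temporary directory so it does not touch the `model` folder; `stand_in_model` and `restored` are illustrative names.

```python
import os
import tempfile

import joblib

# Stand-in object (assumption); in the notebook this would be the fitted `clf`.
stand_in_model = {"name": "demo", "weights": [0.1, 0.2, 0.3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lgb.pkl")
    joblib.dump(stand_in_model, path)   # serialize to disk
    restored = joblib.load(path)        # read it back

print(restored == stand_in_model)  # True
```

With the real model, the restored object would expose the same `predict` method as `clf`. As with any pickle-based format, only load files from sources you trust.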