The score has plateaued with the LightGBM models used so far, so I will try other algorithms as well and aim to improve the score by ensembling them.
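How the models will be combined is not specified at this point, so the following is only a minimal sketch that assumes a weighted average of each model's predictions. The helper name `blend_predictions`, the weights, and the sample values are hypothetical and are not taken from the rent data.

```python
def blend_predictions(predictions, weights=None):
    """Weighted average of several models' predictions (one list per model)."""
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_rows = len(predictions[0])
    if any(len(p) != n_rows for p in predictions):
        raise ValueError("all models must predict the same number of rows")
    if weights is None:
        # Assumption: equal weights when nothing else is known
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]


# Illustrative numbers only (two models, two rows)
gbdt_pred = [100000.0, 80000.0]
dart_pred = [110000.0, 70000.0]
blended = blend_predictions([gbdt_pred, dart_pred])
print(blended)
```

Equal weights are a reasonable starting point. The weights could instead be chosen from each model's validation RMSE.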

In [20]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import scipy as sp
import lightgbm as lgb
import category_encoders as ce
import mojimoji
import re
from cmath import nan
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score
import optuna
from sklearn.metrics import mean_squared_error

Load the training data.

In [21]:
house_age = pd.read_csv('house_age.csv')
area_size = pd.read_csv('area_size.csv')
room_arrange_scores = pd.read_csv('room_arrange_scores.csv')
contract_span = pd.read_csv('contract_span.csv')
floor_scores = pd.read_csv('floor_scores.csv')
Floor_scores = pd.read_csv('capital_floor_scores.csv')
stations = pd.read_csv('stations.csv')
minits = pd.read_csv('minits.csv')
addresses = pd.read_csv('addresses.csv')
room_arrange = pd.read_csv('room_arrange.csv')

rent = pd.read_csv('rent.csv')

Load the test data.

In [22]:
test_house_age = pd.read_csv('test_house_age.csv')
test_area_size = pd.read_csv('test_area_size.csv')
test_room_arrange_scores = pd.read_csv('test_room_arrange_scores.csv')
test_contract_span = pd.read_csv('test_contract_span.csv')
test_floor_scores = pd.read_csv('test_floor_scores.csv')
test_Floor_scores = pd.read_csv('test_capital_floor_scores.csv')
test_stations = pd.read_csv('test_stations.csv')
test_minits = pd.read_csv('test_minits.csv')
test_addresses = pd.read_csv('test_addresses.csv')
test_room_arrange = pd.read_csv('test_room_arrange.csv')

In [47]:
X_train = pd.concat([house_age, area_size, contract_span, floor_scores, Floor_scores, stations, minits, addresses, room_arrange], axis=1)
y_train = rent

X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.3, random_state=0)

X_test = pd.concat([test_house_age, test_area_size, test_contract_span, test_floor_scores, test_Floor_scores, test_stations, test_minits, test_addresses, test_room_arrange], axis=1)

category_lists = ['最寄り駅', '所在地', 'L', 'D', 'K', 'S']
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_valid, y_valid, reference=lgb_train)

params = {
    'objective':'regression',
    'boosting_type':'rf',  # use random forest mode
    'metrics':'rmse',
    'bagging_freq': 10,
    'bagging_fraction': 0.5,
    'reg_lambda': 3.681194978110037e-06,
    'max_bin': 522,
    'num_leaves': 124
}

model = lgb.train(
                    params,
                    lgb_train, 
                    valid_sets=[lgb_train, lgb_eval], 
                    verbose_eval=10, 
                    num_boost_round=3000, 
                    early_stopping_rounds=10,
                    categorical_feature = category_lists
                    )

y_pred = model.predict(X_test, num_iteration=model.best_iteration)

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2444
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
Training until validation scores don't improve for 10 rounds
[10]	training's rmse: 35472.1	valid_1's rmse: 41641.7
[20]	training's rmse: 32771.7	valid_1's rmse: 39412.2
[30]	training's rmse: 31925.3	valid_1's rmse: 38847
[40]	training's rmse: 31901.6	valid_1's rmse: 39228.9
Early stopping, best iteration is:
[30]	training's rmse: 31925.3	valid_1's rmse: 38847


The validation RMSE (38,847 at the best iteration) is not very good, so I will try parameter tuning.

In [61]:
X_train = pd.concat([house_age, area_size, contract_span, floor_scores, Floor_scores, stations, minits, addresses, room_arrange], axis=1)
y_train = rent

X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.3, random_state=0)

X_test = pd.concat([test_house_age, test_area_size, test_contract_span, test_floor_scores, test_Floor_scores, test_stations, test_minits, test_addresses, test_room_arrange], axis=1)

def objective(trial):
    category_lists = ['最寄り駅', '所在地', 'L', 'D', 'K', 'S']
    lgb_train = lgb.Dataset(X_train, y_train)
    lgb_eval = lgb.Dataset(X_valid, y_valid, reference=lgb_train)

    params = {
        'objective':'regression',
        'boosting_type':'rf',
        'metrics':'rmse',
        'learning_rate':0.05,
        'bagging_freq':1,
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.21, 0.99, log=True),  # distinct name; the original reused 'reg_lambda' here
        'reg_lambda': trial.suggest_float('reg_lambda', 0.0000001, 0.0001, log=True),
        'max_bin': trial.suggest_int('max_bin', 255, 600),
        'num_leaves': trial.suggest_int('num_leaves', 32, 128),
    }

    model = lgb.train(
                        params,
                        lgb_train, 
                        valid_sets=[lgb_train, lgb_eval], 
                        #verbose_eval=10, 
                        num_boost_round=3000, 
                        early_stopping_rounds=10,
                        categorical_feature = category_lists
                        )

    y_pred_valid = model.predict(X_valid, num_iteration=model.best_iteration)

    loss = mean_squared_error(y_valid, y_pred_valid, squared=False)
    return loss
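The objective returns the validation RMSE, which is what `mean_squared_error(..., squared=False)` computes. As far as I know, newer scikit-learn releases deprecate the `squared` argument in favor of `root_mean_squared_error`, so a dependency-free version may be handy. The sketch below uses only the standard library; the function name `rmse` and the sample values are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equally long sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative rents: the errors are 3000 and 4000
example_loss = rmse([100000.0, 120000.0], [97000.0, 124000.0])
print(example_loss)
```

Because the errors are squared before averaging, a few badly mispredicted rents dominate the score.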

In [None]:
optuna.logging.disable_default_handler()
study = optuna.create_study(sampler=optuna.samplers.RandomSampler(seed=0))
study.optimize(objective, n_trials=200)
study.best_params
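The search spaces above pass `log=True` to `suggest_float`, which, as I understand it, samples uniformly in the logarithm of the value, so each decade of the range is equally likely. The following standard-library sketch illustrates that idea for the `reg_lambda` range. It is not Optuna's implementation, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    if not 0 < low <= high:
        raise ValueError("need 0 < low <= high")
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(rng, 1e-7, 1e-4) for _ in range(1000)]

# Two of the three decades (1e-7 to 1e-5) lie below 1e-5,
# so roughly two thirds of the draws should fall there.
share_below_1e_5 = sum(s < 1e-5 for s in samples) / len(samples)
print(share_below_1e_5)
```

A log scale suits regularization strengths, which span several orders of magnitude. For a fraction such as `bagging_fraction`, a linear scale may be the more natural choice.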

Use the tuned parameters.

In [57]:
X_train = pd.concat([house_age, area_size, contract_span, floor_scores, Floor_scores, stations, minits, addresses, room_arrange], axis=1)
y_train = rent

X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.3, random_state=0)

X_test = pd.concat([test_house_age, test_area_size, test_contract_span, test_floor_scores, test_Floor_scores, test_stations, test_minits, test_addresses, test_room_arrange], axis=1)

category_lists = ['最寄り駅', '所在地', 'L', 'D', 'K', 'S']
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_valid, y_valid, reference=lgb_train)

params = {
    'objective':'regression',
    'boosting_type':'rf',
    'metrics':'rmse',
    'bagging_freq': 1,
    'bagging_fraction': 0.8,
    'reg_lambda': 0.8541733691883097,  # note: outside the 1e-7 to 1e-4 range declared for reg_lambda in objective()
    'max_bin': 510,
    'num_leaves': 119
}

model = lgb.train(
                    params,
                    lgb_train, 
                    valid_sets=[lgb_train, lgb_eval], 
                    verbose_eval=10, 
                    num_boost_round=3000, 
                    early_stopping_rounds=10,
                    categorical_feature = category_lists
                    )

y_pred = model.predict(X_test, num_iteration=model.best_iteration)

You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2420
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
Training until validation scores don't improve for 10 rounds
[10]	training's rmse: 29013.2	valid_1's rmse: 36036.6
Early stopping, best iteration is:
[8]	training's rmse: 29070.4	valid_1's rmse: 35940.1


New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


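Random forest does not look promising (tuning only brought the validation RMSE down to about 35,940), so I will use another algorithm.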

Use DART.

In [63]:
X_train = pd.concat([house_age, area_size, contract_span, floor_scores, Floor_scores, stations, minits, addresses, room_arrange], axis=1)
y_train = rent

X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.3, random_state=0)

X_test = pd.concat([test_house_age, test_area_size, test_contract_span, test_floor_scores, test_Floor_scores, test_stations, test_minits, test_addresses, test_room_arrange], axis=1)

category_lists = ['最寄り駅', '所在地', 'L', 'D', 'K', 'S']
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_valid, y_valid, reference=lgb_train)

params = {
    'objective':'regression',
    'boosting_type':'dart',
    'metrics':'rmse',
    'lambda_l1':0.000001
}

model = lgb.train(
                    params,
                    lgb_train, 
                    valid_sets=[lgb_train, lgb_eval], 
                    verbose_eval=10, 
                    num_boost_round=3000, 
                    early_stopping_rounds=10,
                    categorical_feature = category_lists
                    )

y_pred = model.predict(X_test, num_iteration=model.best_iteration)

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1877
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 40839.2	valid_1's rmse: 45284.8
[20]	training's rmse: 26245.4	valid_1's rmse: 33594.9
[30]	training's rmse: 20647.8	valid_1's rmse: 29474.7
[40]	training's rmse: 19195.2	valid_1's rmse: 28396.9
[50]	training's rmse: 23317.3	valid_1's rmse: 31380.5
[60]	training's rmse: 27004.9	valid_1's rmse: 34222.3
[70]	training's rmse: 25752.1	valid_1's rmse: 33259.3
[80]	training's rmse: 21192.3	valid_1's rmse: 29947.9
[90]	training's rmse: 21533.4	valid_1's rmse: 30263.4
[100]	training's rmse: 22276.4	valid_1's rmse: 30863.2
[110]	training's rmse: 28096	valid_1's rmse: 35321.1
[120]	training's rmse: 20539.7	valid_1's rmse: 29637.7
[130]	training's rmse: 18872.2	valid_1's rms

DART looks promising: the validation RMSE reached about 28,400 at iteration 40. I will tune it.
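One caveat when reading the DART log: the validation RMSE does not fall monotonically, and training ran well past iteration 40 even though `early_stopping_rounds=10` was set. As far as I know, LightGBM does not apply early stopping in dart mode, which would be consistent with this. The sketch below only re-reads the values already printed in the logs above; the helper names are hypothetical.

```python
# Validation RMSE copied from the DART log above (printed every 10 iterations).
# The entry for iteration 130 is truncated in the log, so it is left out.
dart_valid_rmse = {
    10: 45284.8, 20: 33594.9, 30: 29474.7, 40: 28396.9,
    50: 31380.5, 60: 34222.3, 70: 33259.3, 80: 29947.9,
    90: 30263.4, 100: 30863.2, 110: 35321.1, 120: 29637.7,
}
# Best validation RMSE of the tuned random forest run, from its log.
rf_tuned_valid_rmse = 35940.1


def best_checkpoint(valid_rmse_by_iter):
    """Return (iteration, rmse) for the lowest logged validation RMSE."""
    return min(valid_rmse_by_iter.items(), key=lambda item: item[1])


def count_increases(valid_rmse_by_iter):
    """Count how often the score got worse from one checkpoint to the next."""
    scores = [valid_rmse_by_iter[i] for i in sorted(valid_rmse_by_iter)]
    return sum(later > earlier for earlier, later in zip(scores, scores[1:]))


best_iter, best_rmse = best_checkpoint(dart_valid_rmse)
relative_gain = 1 - best_rmse / rf_tuned_valid_rmse
print(best_iter, best_rmse, count_increases(dart_valid_rmse), round(relative_gain, 3))
```

Since only every tenth iteration is logged, the true best iteration may lie between checkpoints. If early stopping is indeed inactive, the number of boosting rounds would have to be chosen explicitly rather than read from `model.best_iteration`.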

In [65]:
X_train = pd.concat([house_age, area_size, contract_span, floor_scores, Floor_scores, stations, minits, addresses, room_arrange], axis=1)
y_train = rent

X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.3, random_state=0)

X_test = pd.concat([test_house_age, test_area_size, test_contract_span, test_floor_scores, test_Floor_scores, test_stations, test_minits, test_addresses, test_room_arrange], axis=1)

def objective(trial):
    category_lists = ['最寄り駅', '所在地', 'L', 'D', 'K', 'S']
    lgb_train = lgb.Dataset(X_train, y_train)
    lgb_eval = lgb.Dataset(X_valid, y_valid, reference=lgb_train)

    params = {
        'objective':'regression',
        'boosting_type':'dart',
        'metrics':'rmse',
        'learning_rate':0.05,
        'reg_lambda': trial.suggest_float('reg_lambda', 0.0000001, 0.0001, log=True),
        'max_bin': trial.suggest_int('max_bin', 255, 600),
        'num_leaves': trial.suggest_int('num_leaves', 32, 128),
    }

    model = lgb.train(
                        params,
                        lgb_train, 
                        valid_sets=[lgb_train, lgb_eval], 
                        verbose_eval=10, 
                        num_boost_round=3000, 
                        early_stopping_rounds=10,
                        categorical_feature = category_lists
                        )

    y_pred_valid = model.predict(X_valid, num_iteration=model.best_iteration)

    loss = mean_squared_error(y_valid, y_pred_valid, squared=False)
    return loss

In [66]:
optuna.logging.disable_default_handler()
study = optuna.create_study(sampler=optuna.samplers.RandomSampler(seed=0))
study.optimize(objective, n_trials=50)
study.best_params

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2404
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52663.9	valid_1's rmse: 55365.7
[20]	training's rmse: 39657.6	valid_1's rmse: 44316.7
[30]	training's rmse: 31838.6	valid_1's rmse: 38205.8
[40]	training's rmse: 28519.3	valid_1's rmse: 35667.6
[50]	training's rmse: 31848.8	valid_1's rmse: 38332.3
[60]	training's rmse: 35992.3	valid_1's rmse: 41896.1
[70]	training's rmse: 37909.6	valid_1's rmse: 43613
[80]	training's rmse: 32864.7	valid_1's rmse: 39352.6
[90]	training's rmse: 32710.1	valid_1's rmse: 39197.6
[100]	training's rmse: 34820.4	valid_1's rmse: 41039.9
[110]	training's rmse: 41580.9	valid_1's rmse: 46803.8
[120]	training's rmse: 33282.9	valid_1's rmse: 39793.8
[130]	training's rmse: 30130.2	valid_1's rms

[LightGBM] [Info] Total Bins 2174
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52622.8	valid_1's rmse: 55414.6
[20]	training's rmse: 39627	valid_1's rmse: 44482.8
[30]	training's rmse: 31876.7	valid_1's rmse: 38415.4
[40]	training's rmse: 28477.7	valid_1's rmse: 35814.1
[50]	training's rmse: 31836.2	valid_1's rmse: 38484.5
[60]	training's rmse: 35907.7	valid_1's rmse: 41994.5
[70]	training's rmse: 37924.2	valid_1's rmse: 43747.6
[80]	training's rmse: 32918.6	valid_1's rmse: 39553.6
[90]	training's rmse: 32791.6	valid_1's rmse: 39384.1
[100]	training's rmse: 34829.3	valid_1's rmse: 41121.4
[110]	training's rmse: 41619.8	valid_1's rmse: 46919.7
[120]	training's rmse: 33371.1	valid_1's rmse: 39915.5
[130]	training's rmse: 30211.6	valid_1's rms

[LightGBM] [Info] Total Bins 2526
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52317.6	valid_1's rmse: 55227.9
[20]	training's rmse: 39200.6	valid_1's rmse: 44279.3
[30]	training's rmse: 31503.2	valid_1's rmse: 38169.5
[40]	training's rmse: 28052.2	valid_1's rmse: 35485.1
[50]	training's rmse: 31522.9	valid_1's rmse: 38230
[60]	training's rmse: 35792.9	valid_1's rmse: 41806.9
[70]	training's rmse: 37719.9	valid_1's rmse: 43520.2
[80]	training's rmse: 32706.4	valid_1's rmse: 39331.2
[90]	training's rmse: 32645.4	valid_1's rmse: 39237.8
[100]	training's rmse: 34680.4	valid_1's rmse: 40984.9
[110]	training's rmse: 41458.2	valid_1's rmse: 46739.3
[120]	training's rmse: 33018.8	valid_1's rmse: 39683.6
[130]	training's rmse: 29921.7	valid_1's rms

[LightGBM] [Info] Total Bins 2456
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52451.9	valid_1's rmse: 55106.6
[20]	training's rmse: 39412.6	valid_1's rmse: 44100.1
[30]	training's rmse: 31682.3	valid_1's rmse: 38128.6
[40]	training's rmse: 28322.8	valid_1's rmse: 35594.1
[50]	training's rmse: 31686.6	valid_1's rmse: 38235.3
[60]	training's rmse: 35816.4	valid_1's rmse: 41809.7
[70]	training's rmse: 37755.9	valid_1's rmse: 43499.1
[80]	training's rmse: 32757.4	valid_1's rmse: 39305.9
[90]	training's rmse: 32640.5	valid_1's rmse: 39150.4
[100]	training's rmse: 34733.5	valid_1's rmse: 40962.8
[110]	training's rmse: 41490.4	valid_1's rmse: 46746.8
[120]	training's rmse: 33193.2	valid_1's rmse: 39712
[130]	training's rmse: 30053.4	valid_1's rms

[LightGBM] [Info] Total Bins 2550
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 53335.3	valid_1's rmse: 56056.4
[20]	training's rmse: 40168.7	valid_1's rmse: 45021.1
[30]	training's rmse: 32881.8	valid_1's rmse: 39139.7
[40]	training's rmse: 29958.5	valid_1's rmse: 36838
[50]	training's rmse: 33119.8	valid_1's rmse: 39398
[60]	training's rmse: 37006.9	valid_1's rmse: 42750.3
[70]	training's rmse: 38747.3	valid_1's rmse: 44218.6
[80]	training's rmse: 33879.6	valid_1's rmse: 40099
[90]	training's rmse: 33757.3	valid_1's rmse: 39981.4
[100]	training's rmse: 35667.9	valid_1's rmse: 41625.3
[110]	training's rmse: 42230.5	valid_1's rmse: 47289
[120]	training's rmse: 34189	valid_1's rmse: 40386.4
[130]	training's rmse: 31166	valid_1's rmse: 37887.6

[LightGBM] [Info] Total Bins 1889
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52589.8	valid_1's rmse: 55169.3
[20]	training's rmse: 39751.2	valid_1's rmse: 44650.4
[30]	training's rmse: 31956.6	valid_1's rmse: 38746.1
[40]	training's rmse: 28538.2	valid_1's rmse: 36146.3
[50]	training's rmse: 31875.3	valid_1's rmse: 38633.1
[60]	training's rmse: 36044.9	valid_1's rmse: 42214.2
[70]	training's rmse: 38011	valid_1's rmse: 44000.6
[80]	training's rmse: 33021.2	valid_1's rmse: 39880.2
[90]	training's rmse: 32802.3	valid_1's rmse: 39600.9
[100]	training's rmse: 34941.8	valid_1's rmse: 41509.4
[110]	training's rmse: 41690.2	valid_1's rmse: 47118.8
[120]	training's rmse: 33237.5	valid_1's rmse: 40120.5
[130]	training's rmse: 30115.8	valid_1's rms

[LightGBM] [Info] Total Bins 2512
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52218.2	valid_1's rmse: 55196.6
[20]	training's rmse: 39072	valid_1's rmse: 44216.8
[30]	training's rmse: 31293.9	valid_1's rmse: 38238.9
[40]	training's rmse: 27849.8	valid_1's rmse: 35638.1
[50]	training's rmse: 31287.8	valid_1's rmse: 38296.2
[60]	training's rmse: 35712	valid_1's rmse: 41958.4
[70]	training's rmse: 37716.8	valid_1's rmse: 43678.3
[80]	training's rmse: 32662.1	valid_1's rmse: 39454.5
[90]	training's rmse: 32613.3	valid_1's rmse: 39366.7
[100]	training's rmse: 34683.1	valid_1's rmse: 41140.8
[110]	training's rmse: 41437.9	valid_1's rmse: 46854.6
[120]	training's rmse: 33023.6	valid_1's rmse: 39769.9
[130]	training's rmse: 29806.8	valid_1's rmse:

[LightGBM] [Info] Total Bins 2213
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52490.4	valid_1's rmse: 55307.6
[20]	training's rmse: 39426.3	valid_1's rmse: 44409
[30]	training's rmse: 31689.2	valid_1's rmse: 38272.3
[40]	training's rmse: 28177.4	valid_1's rmse: 35687.6
[50]	training's rmse: 31589.7	valid_1's rmse: 38356.3
[60]	training's rmse: 35821.6	valid_1's rmse: 41919.5
[70]	training's rmse: 37836.2	valid_1's rmse: 43679.2
[80]	training's rmse: 32800.7	valid_1's rmse: 39454.9
[90]	training's rmse: 32710.9	valid_1's rmse: 39323.7
[100]	training's rmse: 34789.8	valid_1's rmse: 41098.1
[110]	training's rmse: 41552.3	valid_1's rmse: 46853.1
[120]	training's rmse: 33213.3	valid_1's rmse: 39852.6
[130]	training's rmse: 30100.7	valid_1's rms

[LightGBM] [Info] Total Bins 2352
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 53258.2	valid_1's rmse: 55898.7
[20]	training's rmse: 40012.7	valid_1's rmse: 44800
[30]	training's rmse: 32627.9	valid_1's rmse: 38894.7
[40]	training's rmse: 29553.2	valid_1's rmse: 36541.2
[50]	training's rmse: 32691.2	valid_1's rmse: 39020.1
[60]	training's rmse: 36687.5	valid_1's rmse: 42412.8
[70]	training's rmse: 38504.3	valid_1's rmse: 44021.8
[80]	training's rmse: 33697.8	valid_1's rmse: 39967.4
[90]	training's rmse: 33571.3	valid_1's rmse: 39818.9
[100]	training's rmse: 35558.4	valid_1's rmse: 41507.4
[110]	training's rmse: 42181.2	valid_1's rmse: 47200.9
[120]	training's rmse: 33985	valid_1's rmse: 40208.2
[130]	training's rmse: 30914.9	valid_1's rmse:

[LightGBM] [Info] Total Bins 2270
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52809.4	valid_1's rmse: 55622.9
[20]	training's rmse: 40054.6	valid_1's rmse: 44815.5
[30]	training's rmse: 32194.8	valid_1's rmse: 38727
[40]	training's rmse: 28887.5	valid_1's rmse: 36185.3
[50]	training's rmse: 32066.5	valid_1's rmse: 38675.7
[60]	training's rmse: 36188.4	valid_1's rmse: 42218.2
[70]	training's rmse: 38124.5	valid_1's rmse: 43892.7
[80]	training's rmse: 33204.5	valid_1's rmse: 39768.7
[90]	training's rmse: 33079	valid_1's rmse: 39618.9
[100]	training's rmse: 35134	valid_1's rmse: 41376.1
[110]	training's rmse: 41839.9	valid_1's rmse: 47097.1
[120]	training's rmse: 33648.2	valid_1's rmse: 40115.6
[130]	training's rmse: 30543.8	valid_1's rmse: 3

[LightGBM] [Info] Total Bins 2444
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52529.5	valid_1's rmse: 55262.5
[20]	training's rmse: 39565.8	valid_1's rmse: 44319
[30]	training's rmse: 31763.5	valid_1's rmse: 38266.6
[40]	training's rmse: 28412.4	valid_1's rmse: 35708.7
[50]	training's rmse: 31703.9	valid_1's rmse: 38312.5
[60]	training's rmse: 35873.6	valid_1's rmse: 41861.2
[70]	training's rmse: 37856.7	valid_1's rmse: 43576.7
[80]	training's rmse: 32842	valid_1's rmse: 39358.8
[90]	training's rmse: 32673.7	valid_1's rmse: 39143
[100]	training's rmse: 34719.2	valid_1's rmse: 40919.2
[110]	training's rmse: 41510.9	valid_1's rmse: 46710.7
[120]	training's rmse: 33257.7	valid_1's rmse: 39733.7
[130]	training's rmse: 30085.5	valid_1's rmse: 3

[LightGBM] [Info] Total Bins 1889
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52732	valid_1's rmse: 55277.8
[20]	training's rmse: 39961.4	valid_1's rmse: 44798.6
[30]	training's rmse: 32216.7	valid_1's rmse: 38880.4
[40]	training's rmse: 28822.5	valid_1's rmse: 36260.8
[50]	training's rmse: 31987.7	valid_1's rmse: 38713.9
[60]	training's rmse: 36145.8	valid_1's rmse: 42301.9
[70]	training's rmse: 38091.9	valid_1's rmse: 44036.1
[80]	training's rmse: 33071.2	valid_1's rmse: 39908.2
[90]	training's rmse: 32961.9	valid_1's rmse: 39743.2
[100]	training's rmse: 34995.6	valid_1's rmse: 41544.4
[110]	training's rmse: 41752.5	valid_1's rmse: 47187.6
[120]	training's rmse: 33446.1	valid_1's rmse: 40302.1
[130]	training's rmse: 30191.5	valid_1's rms

[LightGBM] [Info] Total Bins 2336
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52353.7	valid_1's rmse: 55102.9
[20]	training's rmse: 39432.3	valid_1's rmse: 44298.3
[30]	training's rmse: 31447.3	valid_1's rmse: 38035
[40]	training's rmse: 28022.9	valid_1's rmse: 35455.4
[50]	training's rmse: 31393.2	valid_1's rmse: 38048.2
[60]	training's rmse: 35700.9	valid_1's rmse: 41620.5
[70]	training's rmse: 37730.6	valid_1's rmse: 43389.2
[80]	training's rmse: 32722.2	valid_1's rmse: 39220.1
[90]	training's rmse: 32578.5	valid_1's rmse: 39038.4
[100]	training's rmse: 34654.7	valid_1's rmse: 40822.7
[110]	training's rmse: 41449.8	valid_1's rmse: 46603.2
[120]	training's rmse: 33070.9	valid_1's rmse: 39540.4
[130]	training's rmse: 29926.5	valid_1's rms

[LightGBM] [Info] Total Bins 2125
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52593.2	valid_1's rmse: 55094.6
[20]	training's rmse: 39632.8	valid_1's rmse: 44055.1
[30]	training's rmse: 31924.6	valid_1's rmse: 38122.7
[40]	training's rmse: 28673.9	valid_1's rmse: 35709
[50]	training's rmse: 31897	valid_1's rmse: 38231.9
[60]	training's rmse: 36035.1	valid_1's rmse: 41804.4
[70]	training's rmse: 37937.1	valid_1's rmse: 43454.3
[80]	training's rmse: 32950.1	valid_1's rmse: 39256.6
[90]	training's rmse: 32835.2	valid_1's rmse: 39123.3
[100]	training's rmse: 34872.4	valid_1's rmse: 40884.1
[110]	training's rmse: 41643.7	valid_1's rmse: 46660.4
[120]	training's rmse: 33414.5	valid_1's rmse: 39702.6
[130]	training's rmse: 30240.6	valid_1's rmse:

[LightGBM] [Info] Total Bins 1917
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52804.4	valid_1's rmse: 55299.3
[20]	training's rmse: 39904.5	valid_1's rmse: 44506.1
[30]	training's rmse: 32200	valid_1's rmse: 38371.3
[40]	training's rmse: 28774.4	valid_1's rmse: 35752.3
[50]	training's rmse: 32076	valid_1's rmse: 38374.6
[60]	training's rmse: 36234.5	valid_1's rmse: 41955.5
[70]	training's rmse: 38107.5	valid_1's rmse: 43625.8
[80]	training's rmse: 33112.3	valid_1's rmse: 39438.9
[90]	training's rmse: 33002.3	valid_1's rmse: 39269.8
[100]	training's rmse: 35115.3	valid_1's rmse: 41068.6
[110]	training's rmse: 41848.9	valid_1's rmse: 46836.7
[120]	training's rmse: 33515.8	valid_1's rmse: 39800.9
[130]	training's rmse: 30385.2	valid_1's rmse:

[LightGBM] [Info] Total Bins 2021
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 53344.3	valid_1's rmse: 55658.7
[20]	training's rmse: 40171.3	valid_1's rmse: 44531.4
[30]	training's rmse: 32697.7	valid_1's rmse: 38447.5
[40]	training's rmse: 29527	valid_1's rmse: 36008.5
[50]	training's rmse: 32754.7	valid_1's rmse: 38605.8
[60]	training's rmse: 36706.4	valid_1's rmse: 42079.5
[70]	training's rmse: 38606.9	valid_1's rmse: 43780.4
[80]	training's rmse: 33812.3	valid_1's rmse: 39702.8
[90]	training's rmse: 33536.4	valid_1's rmse: 39430
[100]	training's rmse: 35464	valid_1's rmse: 41104.5
[110]	training's rmse: 42148.8	valid_1's rmse: 46891.3
[120]	training's rmse: 34014.8	valid_1's rmse: 39898.5
[130]	training's rmse: 30906.8	valid_1's rmse: 3

[LightGBM] [Info] Total Bins 2127
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52419.8	valid_1's rmse: 54983.4
[20]	training's rmse: 39454.4	valid_1's rmse: 43966.7
[30]	training's rmse: 31622.8	valid_1's rmse: 37944.2
[40]	training's rmse: 28287.5	valid_1's rmse: 35383.6
[50]	training's rmse: 31669.5	valid_1's rmse: 38091.3
[60]	training's rmse: 35850.4	valid_1's rmse: 41660.8
[70]	training's rmse: 37808.3	valid_1's rmse: 43351.4
[80]	training's rmse: 32816.8	valid_1's rmse: 39146.3
[90]	training's rmse: 32641.9	valid_1's rmse: 38942.4
[100]	training's rmse: 34747.9	valid_1's rmse: 40734.6
[110]	training's rmse: 41551.5	valid_1's rmse: 46548
[120]	training's rmse: 33245.5	valid_1's rmse: 39482.7
[130]	training's rmse: 30081.4	valid_1's rms

[LightGBM] [Info] Total Bins 2590
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 53263.5	valid_1's rmse: 55892.9
[20]	training's rmse: 40150.9	valid_1's rmse: 44827.5
[30]	training's rmse: 32710.8	valid_1's rmse: 38817.8
[40]	training's rmse: 29728.5	valid_1's rmse: 36562.2
[50]	training's rmse: 32846.6	valid_1's rmse: 39094.7
[60]	training's rmse: 36823.1	valid_1's rmse: 42513.8
[70]	training's rmse: 38610.7	valid_1's rmse: 44053.8
[80]	training's rmse: 33757.9	valid_1's rmse: 39948.5
[90]	training's rmse: 33462.7	valid_1's rmse: 39657
[100]	training's rmse: 35408.4	valid_1's rmse: 41328.9
[110]	training's rmse: 42082.1	valid_1's rmse: 47092.2
[120]	training's rmse: 33933.5	valid_1's rmse: 40068.3
[130]	training's rmse: 30878.9	valid_1's rms

[LightGBM] [Info] Total Bins 1987
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52649.9	valid_1's rmse: 55121.9
[20]	training's rmse: 39660.7	valid_1's rmse: 44102.2
[30]	training's rmse: 31955.2	valid_1's rmse: 38053.6
[40]	training's rmse: 28616.1	valid_1's rmse: 35539.6
[50]	training's rmse: 31874.2	valid_1's rmse: 38110.6
[60]	training's rmse: 35990.4	valid_1's rmse: 41679.5
[70]	training's rmse: 37967.3	valid_1's rmse: 43406.2
[80]	training's rmse: 32976	valid_1's rmse: 39219.4
[90]	training's rmse: 33005.3	valid_1's rmse: 39163.9
[100]	training's rmse: 35052.4	valid_1's rmse: 40917.6
[110]	training's rmse: 41773.6	valid_1's rmse: 46673.8
[120]	training's rmse: 33371.1	valid_1's rmse: 39578.2
[130]	training's rmse: 30235.3	valid_1's rms

[LightGBM] [Info] Total Bins 2219
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 53014.2	valid_1's rmse: 55745.5
[20]	training's rmse: 39987.1	valid_1's rmse: 44873
[30]	training's rmse: 32437.6	valid_1's rmse: 38894
[40]	training's rmse: 29178.5	valid_1's rmse: 36344.6
[50]	training's rmse: 32347.5	valid_1's rmse: 38957.5
[60]	training's rmse: 36425.6	valid_1's rmse: 42457.4
[70]	training's rmse: 38311.3	valid_1's rmse: 44114.7
[80]	training's rmse: 33409.6	valid_1's rmse: 39973.6
[90]	training's rmse: 33293.4	valid_1's rmse: 39841.3
[100]	training's rmse: 35204.1	valid_1's rmse: 41485.7
[110]	training's rmse: 41895.9	valid_1's rmse: 47209.3
[120]	training's rmse: 33809.8	valid_1's rmse: 40286.5
[130]	training's rmse: 30671.5	valid_1's rmse:

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1953
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52716.1	valid_1's rmse: 55314.6
[20]	training's rmse: 39696.7	valid_1's rmse: 44268.6
[30]	training's rmse: 31921.5	valid_1's rmse: 38143.2
[40]	training's rmse: 28504	valid_1's rmse: 35514.2
[50]	training's rmse: 31844.9	valid_1's rmse: 38154.7
[60]	training's rmse: 36111.9	valid_1's rmse: 41824.5
[70]	training's rmse: 38089.3	valid_1's rmse: 43534.2
[80]	training's rmse: 33157	valid_1's rmse: 39372.1
[90]	training's rmse: 33035.6	valid_1's rmse: 39220.9
[100]	training's rmse: 35058.2	valid_1's rmse: 40989.5
[110]	training's rmse: 41781.6	valid_1's rmse: 46751.8
[120]	training's rmse: 33367.8	valid_1's rmse: 39634.8
[130]	training's rmse: 30209.9	valid_1's rmse:

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2013
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52921.3	valid_1's rmse: 55306.6
[20]	training's rmse: 40220.2	valid_1's rmse: 44482.2
[30]	training's rmse: 32443	valid_1's rmse: 38354.2
[40]	training's rmse: 29151.8	valid_1's rmse: 35781.9
[50]	training's rmse: 32325.9	valid_1's rmse: 38352.9
[60]	training's rmse: 36346.8	valid_1's rmse: 41822.9
[70]	training's rmse: 38278	valid_1's rmse: 43550.3
[80]	training's rmse: 33348	valid_1's rmse: 39403.8
[90]	training's rmse: 33169.8	valid_1's rmse: 39186
[100]	training's rmse: 35183.7	valid_1's rmse: 40936
[110]	training's rmse: 41895.1	valid_1's rmse: 46702.4
[120]	training's rmse: 33614.2	valid_1's rmse: 39644.9
[130]	training's rmse: 30532.4	valid_1's rmse: 37110

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1943
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52596.5	valid_1's rmse: 55161.7
[20]	training's rmse: 39768.1	valid_1's rmse: 44383.6
[30]	training's rmse: 32078.9	valid_1's rmse: 38239
[40]	training's rmse: 28538.5	valid_1's rmse: 35552.5
[50]	training's rmse: 31886.1	valid_1's rmse: 38224.7
[60]	training's rmse: 36134.8	valid_1's rmse: 41864.7
[70]	training's rmse: 38048.1	valid_1's rmse: 43549.7
[80]	training's rmse: 32971.4	valid_1's rmse: 39241.8
[90]	training's rmse: 32874.1	valid_1's rmse: 39106.6
[100]	training's rmse: 35003.9	valid_1's rmse: 40954.5
[110]	training's rmse: 41739.6	valid_1's rmse: 46711.5
[120]	training's rmse: 33387.6	valid_1's rmse: 39664.4
[130]	training's rmse: 30088.8	valid_1's rms

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2582
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52633.9	valid_1's rmse: 55376.5
[20]	training's rmse: 39691.6	valid_1's rmse: 44504.5
[30]	training's rmse: 31897.2	valid_1's rmse: 38454.6
[40]	training's rmse: 28565.4	valid_1's rmse: 35888.8
[50]	training's rmse: 31846.7	valid_1's rmse: 38492.2
[60]	training's rmse: 35978	valid_1's rmse: 41989.7
[70]	training's rmse: 37922	valid_1's rmse: 43678.4
[80]	training's rmse: 32973	valid_1's rmse: 39548
[90]	training's rmse: 32823.7	valid_1's rmse: 39351.6
[100]	training's rmse: 34945.8	valid_1's rmse: 41178.9
[110]	training's rmse: 41683.4	valid_1's rmse: 46924.2
[120]	training's rmse: 33444.1	valid_1's rmse: 39908.7
[130]	training's rmse: 30322.7	valid_1's rmse: 373

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2328
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52436.9	valid_1's rmse: 55180.1
[20]	training's rmse: 39607.3	valid_1's rmse: 44359.8
[30]	training's rmse: 31699.6	valid_1's rmse: 38144.4
[40]	training's rmse: 28190.1	valid_1's rmse: 35528.7
[50]	training's rmse: 31569.4	valid_1's rmse: 38125.1
[60]	training's rmse: 35734	valid_1's rmse: 41638.8
[70]	training's rmse: 37760.9	valid_1's rmse: 43408.1
[80]	training's rmse: 32760.7	valid_1's rmse: 39193.7
[90]	training's rmse: 32744.5	valid_1's rmse: 39138.2
[100]	training's rmse: 34752.8	valid_1's rmse: 40899.3
[110]	training's rmse: 41517	valid_1's rmse: 46687.1
[120]	training's rmse: 33157.2	valid_1's rmse: 39611.1
[130]	training's rmse: 29988.6	valid_1's rmse:

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2071
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 53247.3	valid_1's rmse: 55813.5
[20]	training's rmse: 39972.7	valid_1's rmse: 44717.6
[30]	training's rmse: 32609.3	valid_1's rmse: 38826.5
[40]	training's rmse: 29571.8	valid_1's rmse: 36562.9
[50]	training's rmse: 32739.6	valid_1's rmse: 39111.2
[60]	training's rmse: 36741	valid_1's rmse: 42553.3
[70]	training's rmse: 38500.4	valid_1's rmse: 44056
[80]	training's rmse: 33614.5	valid_1's rmse: 39920.8
[90]	training's rmse: 33463.9	valid_1's rmse: 39757.7
[100]	training's rmse: 35439.9	valid_1's rmse: 41459.7
[110]	training's rmse: 42069.8	valid_1's rmse: 47182.7
[120]	training's rmse: 33879.1	valid_1's rmse: 40143
[130]	training's rmse: 30886.2	valid_1's rmse: 3

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1959
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 53092.9	valid_1's rmse: 55568.4
[20]	training's rmse: 40168.3	valid_1's rmse: 44571.1
[30]	training's rmse: 32540.2	valid_1's rmse: 38499.3
[40]	training's rmse: 29281.3	valid_1's rmse: 35998
[50]	training's rmse: 32447.8	valid_1's rmse: 38534.9
[60]	training's rmse: 36452.2	valid_1's rmse: 42021.7
[70]	training's rmse: 38361	valid_1's rmse: 43723.3
[80]	training's rmse: 33437.5	valid_1's rmse: 39553.5
[90]	training's rmse: 33279.9	valid_1's rmse: 39395.8
[100]	training's rmse: 35254.9	valid_1's rmse: 41090.5
[110]	training's rmse: 41963	valid_1's rmse: 46851.6
[120]	training's rmse: 33809.6	valid_1's rmse: 39897.2
[130]	training's rmse: 30721.9	valid_1's rmse: 3

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1921
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52789	valid_1's rmse: 55293.4
[20]	training's rmse: 40045.2	valid_1's rmse: 44583
[30]	training's rmse: 32292	valid_1's rmse: 38391.3
[40]	training's rmse: 28808.9	valid_1's rmse: 35766
[50]	training's rmse: 32118.2	valid_1's rmse: 38378.9
[60]	training's rmse: 36196.9	valid_1's rmse: 41874
[70]	training's rmse: 38178.7	valid_1's rmse: 43658.5
[80]	training's rmse: 33174.7	valid_1's rmse: 39484
[90]	training's rmse: 33048.1	valid_1's rmse: 39354.4
[100]	training's rmse: 35092.3	valid_1's rmse: 41128.6
[110]	training's rmse: 41807.1	valid_1's rmse: 46849
[120]	training's rmse: 33478.4	valid_1's rmse: 39817.9
[130]	training's rmse: 30324.2	valid_1's rmse: 37235.1
[

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2059
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52733.1	valid_1's rmse: 55487.4
[20]	training's rmse: 39743.4	valid_1's rmse: 44548.8
[30]	training's rmse: 32019.6	valid_1's rmse: 38575.8
[40]	training's rmse: 28653.6	valid_1's rmse: 35966.7
[50]	training's rmse: 31957.7	valid_1's rmse: 38598.8
[60]	training's rmse: 36017.1	valid_1's rmse: 42089.6
[70]	training's rmse: 38027.1	valid_1's rmse: 43872.9
[80]	training's rmse: 33039.7	valid_1's rmse: 39708.8
[90]	training's rmse: 32954.4	valid_1's rmse: 39581.2
[100]	training's rmse: 35053.3	valid_1's rmse: 41398.2
[110]	training's rmse: 41781.1	valid_1's rmse: 47100.1
[120]	training's rmse: 33453.7	valid_1's rmse: 40132.5
[130]	training's rmse: 30350.6	valid_1's r

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2308
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 52336	valid_1's rmse: 55127.3
[20]	training's rmse: 39409.2	valid_1's rmse: 44293.2
[30]	training's rmse: 31573	valid_1's rmse: 38155.4
[40]	training's rmse: 28086.9	valid_1's rmse: 35536.4
[50]	training's rmse: 31445.4	valid_1's rmse: 38130.5
[60]	training's rmse: 35695.5	valid_1's rmse: 41693.2
[70]	training's rmse: 37677.2	valid_1's rmse: 43387.8
[80]	training's rmse: 32693.4	valid_1's rmse: 39223
[90]	training's rmse: 32659.7	valid_1's rmse: 39134.5
[100]	training's rmse: 34691.5	valid_1's rmse: 40899.7
[110]	training's rmse: 41471.8	valid_1's rmse: 46674.1
[120]	training's rmse: 33106.4	valid_1's rmse: 39612.1
[130]	training's rmse: 29950.9	valid_1's rmse: 3

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2370
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 53325	valid_1's rmse: 55920.6
[20]	training's rmse: 40038.4	valid_1's rmse: 44726.1
[30]	training's rmse: 32673.9	valid_1's rmse: 38847.4
[40]	training's rmse: 29628.3	valid_1's rmse: 36551.3
[50]	training's rmse: 32769.6	valid_1's rmse: 39019.7
[60]	training's rmse: 36727.5	valid_1's rmse: 42396.7
[70]	training's rmse: 38577	valid_1's rmse: 44050.6
[80]	training's rmse: 33678.6	valid_1's rmse: 39872.4
[90]	training's rmse: 33507.7	valid_1's rmse: 39686.2
[100]	training's rmse: 35475.4	valid_1's rmse: 41384.7
[110]	training's rmse: 42126.7	valid_1's rmse: 47139.4
[120]	training's rmse: 34010.9	valid_1's rmse: 40131.5
[130]	training's rmse: 30945.1	valid_1's rmse:

New categorical_feature is ['D', 'K', 'L', 'S', '所在地', '最寄り駅']


You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2077
[LightGBM] [Info] Number of data points in the train set: 22029, number of used features: 13
[LightGBM] [Info] Start training from score 118651.337373
[10]	training's rmse: 53080.9	valid_1's rmse: 55694.8
[20]	training's rmse: 39947	valid_1's rmse: 44638.4
[30]	training's rmse: 32477.7	valid_1's rmse: 38729.2
[40]	training's rmse: 29362.8	valid_1's rmse: 36385.8
[50]	training's rmse: 32500.6	valid_1's rmse: 38863.3
[60]	training's rmse: 36487.5	valid_1's rmse: 42346.8
[70]	training's rmse: 38336	valid_1's rmse: 43954.6
[80]	training's rmse: 33377.6	valid_1's rmse: 39769.6
[90]	training's rmse: 33281.7	valid_1's rmse: 39681.5
[100]	training's rmse: 35221.6	valid_1's rmse: 41331
[110]	training's rmse: 41954.5	valid_1's rmse: 47101.5
[120]	training's rmse: 33843.6	valid_1's rmse: 40148.1
[130]	training's rmse: 30755.1	valid_1's rmse: 3

Use the tuning results to train the actual model.

In [None]:
X_train = pd.concat([house_age, area_size, contract_span, floor_scores, Floor_scores, stations, minits, addresses, room_arrange], axis=1)
y_train = rent

X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.3, random_state=0)

X_test = pd.concat([test_house_age, test_area_size, test_contract_span, test_floor_scores, test_Floor_scores, test_stations, test_minits, test_addresses, test_room_arrange], axis=1)

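# Categorical columns, declared on the Dataset so the validation set inherits them.
category_lists = ['最寄り駅', '所在地', 'L', 'D', 'K', 'S']
lgb_train = lgb.Dataset(X_train, y_train, categorical_feature=category_lists)
lgb_eval = lgb.Dataset(X_valid, y_valid, reference=lgb_train)

# Hyperparameters taken from the tuning run above.
params = {
    'objective': 'regression',
    'boosting_type': 'rf',  # use LightGBM's random forest mode
    'metric': 'rmse',
    'bagging_freq': 10,
    'bagging_fraction': 0.5,
    'reg_lambda': 3.681194978110037e-06,
    'max_bin': 522,
    'num_leaves': 124
}

model = lgb.train(
    params,
    lgb_train,
    valid_sets=[lgb_train, lgb_eval],
    num_boost_round=3000,
    # Callbacks replace the verbose_eval / early_stopping_rounds arguments,
    # which were removed in LightGBM 4.0.
    callbacks=[
        lgb.early_stopping(stopping_rounds=10),
        lgb.log_evaluation(period=10),
    ],
)

y_pred = model.predict(X_test, num_iteration=model.best_iteration)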
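
Since the goal of this notebook is to ensemble several models, one way to check whether blending helps is to compare the validation RMSE of each model with that of their average. The following is a minimal sketch, not the notebook's actual ensembling code: `rmse` and `blend` are hypothetical helpers, and the arrays are toy numbers standing in for `y_valid` and for `predict(X_valid)` outputs from two different models.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, the same metric shown in the training logs."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def blend(predictions, weights=None):
    """Weighted average of per-model predictions (equal weights by default)."""
    stacked = np.vstack([np.asarray(p, dtype=float).ravel() for p in predictions])
    return np.average(stacked, axis=0, weights=weights)


# Toy values (assumed for illustration only), not results from this notebook.
y_true = np.array([100000.0, 120000.0, 80000.0])
pred_a = np.array([110000.0, 115000.0, 70000.0])   # e.g. the random forest model
pred_b = np.array([95000.0, 130000.0, 85000.0])    # e.g. a second model

blended = blend([pred_a, pred_b])

print("model A RMSE:", rmse(y_true, pred_a))
print("model B RMSE:", rmse(y_true, pred_b))
print("blend RMSE:  ", rmse(y_true, blended))
```

In the toy data the two models err in opposite directions, so the average scores better than either one. On real data the gain depends on how correlated the models' errors are, so the comparison should be made on `X_valid` before blending the test predictions.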