# train.ipynb: Train the decision trees on the full training set
This notebook relies on preprocessed data to cut down on run time. Run `waveform.py` first.

In [6]:
import multiprocessing
import numpy as np
import h5py
import matplotlib.pyplot as plt
%matplotlib inline
from utils import loadData, getNum, getPePerWF, saveData, lossfunc_train, lossfunc_eval
from tqdm import tqdm # show a progress bar
import lightgbm as lgb # package used to train the decision trees

The data preprocessed by `waveform.py` is stored in `./train`, with file names of the form `f"final_{i}_wf.h5"`.
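
Since the loading loop further down expects one preprocessed file per training index, a quick existence check can fail early instead of partway through a 20-minute load. This is only a minimal sketch: the helper name `missingProcessedFiles` is hypothetical, and the index range 2 to 19 is taken from that loading loop.

```python
from pathlib import Path

def missingProcessedFiles(root='./train/final_', indices=range(2, 20)):
    """Return the expected preprocessed files that do not exist yet (hypothetical helper)."""
    expected = [Path(f"{root}{i}_wf.h5") for i in indices]
    return [path for path in expected if not path.is_file()]

missing = missingProcessedFiles()
if missing:
    print("Run waveform.py first; missing files:", [str(path) for path in missing])
```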

In [7]:
trainPathRoot = './data/final-'
processedPathRoot = './train/final_'

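From the preprocessed data, read the raw inputs needed for the two training sets, and concatenate the data from the 18 training files.
1. `intTrainWF`: waveform integral
2. `pointsPerTrainWF`: number of points above the threshold
3. `pePerTrainWFCalc`: calculated number of PEs for each waveform
4. `meanPeTimePerTrainWF`: hand-computed mean PETime for each waveform
5. `pePerTrainWF`: true number of PEs for each waveform
6. `wfIndices`: ndarray of shape (n+1,); entry i is the index in the Waveform table where EventID=i first appears, and entry n is the length of the Waveform table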
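
To make the `wfIndices` layout concrete, here is a small sketch on toy data. It assumes the Waveform table is sorted by EventID and that every event has at least one waveform. `eventIds` is a made-up array; the real indices come from `getNum` in `utils`, whose implementation is not shown in this notebook.

```python
import numpy as np

# Toy EventID column of a Waveform table, sorted by event (made-up data).
eventIds = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])

# First row of each event, followed by the table length as a sentinel.
_, firstRows = np.unique(eventIds, return_index=True)
wfIndicesToy = np.append(firstRows, eventIds.shape[0])

# Rows of event i are wfIndicesToy[i]:wfIndicesToy[i + 1].
rowsOfEvent1 = np.arange(wfIndicesToy[1], wfIndicesToy[2])
print(wfIndicesToy)   # [0 3 5 9]
print(rowsOfEvent1)   # [3 4]
```

The trailing sentinel is what lets the last event be sliced the same way as all the others.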

In [8]:
intTrainWF = np.array([], dtype='<i4') # waveform integral
pointsPerTrainWF = np.array([], dtype='<i2') # number of points above the threshold
pePerTrainWFCalc = np.array([], dtype='<i2') # calculated number of PEs for each waveform
meanPeTimePerTrainWF = np.array([], dtype='<f8') # hand-computed mean PETime for each waveform
pePerTrainWF = np.array([], dtype='<i2') # true number of PEs for each waveform

wfIndices = np.array([0], dtype=int)
p = np.array([], dtype='<f8')

previousIndex = 0
for i in tqdm(range(2, 20)):
    with h5py.File(f"{processedPathRoot}{i}_wf.h5", 'r') as ipt:
        intTrainWF = np.append(intTrainWF, ipt['Waveform']['intWF'][...])
        pointsPerTrainWF = np.append(pointsPerTrainWF, ipt['Waveform']['pointsPerWF'][...])
        pePerTrainWFCalc = np.append(pePerTrainWFCalc, ipt['Waveform']['pePerWFCalc'][...])
        meanPeTimePerTrainWF = np.append(meanPeTimePerTrainWF, ipt['Waveform']['meanPeTimePerWF'][...])
        pePerTrainWF = np.append(pePerTrainWF, ipt['Waveform']['pePerWF'][...])
    
    trainPET, trainWF, trainPT = loadData(f"{trainPathRoot}{i}.h5", 'PT')
    numPEW, wfFakeIndices = getNum(trainWF)

    # Make room for this file's events in the global index and label arrays
    wfIndices = np.append(wfIndices, np.zeros(trainPT.shape[0]))
    p = np.append(p, np.zeros(trainPT.shape[0]))

    finalIndex = previousIndex + trainPT.shape[0] + 1

    # Shift the per-file waveform indices by the number of waveforms read so far
    wfIndices[previousIndex:finalIndex] = wfFakeIndices + wfIndices[previousIndex]

    p[previousIndex:(finalIndex-1)] = trainPT['p']

    previousIndex = finalIndex-1

  0%|          | 0/18 [00:00<?, ?it/s]

Structure of data:
<HDF5 dataset "PETruth": shape (9137011,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (6041164,), type "|V2008"> Waveform /Waveform


  6%|▌         | 1/18 [01:10<20:06, 70.96s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9222643,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (6021458,), type "|V2008"> Waveform /Waveform


 11%|█         | 2/18 [02:35<21:00, 78.76s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9009875,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (5987414,), type "|V2008"> Waveform /Waveform


 17%|█▋        | 3/18 [03:47<18:58, 75.88s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9147704,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (6004696,), type "|V2008"> Waveform /Waveform


 22%|██▏       | 4/18 [05:00<17:25, 74.66s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9262473,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (6115519,), type "|V2008"> Waveform /Waveform


 28%|██▊       | 5/18 [06:15<16:10, 74.67s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9216927,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (6075119,), type "|V2008"> Waveform /Waveform


 33%|███▎      | 6/18 [07:22<14:27, 72.27s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9103034,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (5990370,), type "|V2008"> Waveform /Waveform


 39%|███▉      | 7/18 [08:30<12:57, 70.67s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9139087,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (5965634,), type "|V2008"> Waveform /Waveform


 44%|████▍     | 8/18 [09:37<11:35, 69.51s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9067611,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (1999,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (5965655,), type "|V2008"> Waveform /Waveform


 50%|█████     | 9/18 [10:44<10:20, 68.98s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9219415,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (6039820,), type "|V2008"> Waveform /Waveform


 56%|█████▌    | 10/18 [11:52<09:07, 68.47s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9141287,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (5995751,), type "|V2008"> Waveform /Waveform


 61%|██████    | 11/18 [12:59<07:56, 68.04s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9157071,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (5984516,), type "|V2008"> Waveform /Waveform


 67%|██████▋   | 12/18 [14:08<06:50, 68.42s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9191741,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (1998,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (6020168,), type "|V2008"> Waveform /Waveform


 72%|███████▏  | 13/18 [15:16<05:41, 68.34s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9143911,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (6038985,), type "|V2008"> Waveform /Waveform


 78%|███████▊  | 14/18 [16:24<04:32, 68.17s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9219236,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (1999,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (6039233,), type "|V2008"> Waveform /Waveform


 83%|████████▎ | 15/18 [17:34<03:25, 68.64s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9104863,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (5981409,), type "|V2008"> Waveform /Waveform


 89%|████████▉ | 16/18 [18:39<02:15, 67.66s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9202609,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (6058586,), type "|V2008"> Waveform /Waveform


 94%|█████████▍| 17/18 [19:49<01:08, 68.18s/it]

Structure of data:
<HDF5 dataset "PETruth": shape (9261569,), type "|V20"> PETruth /PETruth
<HDF5 dataset "ParticleTruth": shape (2000,), type "|V40"> ParticleTruth /ParticleTruth
<HDF5 dataset "Waveform": shape (6085040,), type "|V2008"> Waveform /Waveform


100%|██████████| 18/18 [21:03<00:00, 70.20s/it]
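
The loop above grows each array with `np.append`, which copies the whole array on every call. One possible alternative, sketched here on in-memory toy chunks rather than the real HDF5 files, is to collect the per-file pieces in a list and concatenate once. The sketch also shows how per-file event boundaries are shifted into a single global index. `toyChunks` and `toyFakeIndices` are made-up stand-ins for the data read from each file and for the per-file indices returned by `getNum`.

```python
import numpy as np

# Made-up per-file waveform features and per-file event boundaries.
toyChunks = [np.array([10, 11, 12]), np.array([20, 21]), np.array([30, 31, 32, 33])]
toyFakeIndices = [np.array([0, 2, 3]), np.array([0, 2]), np.array([0, 1, 4])]

pieces = []            # per-file arrays, concatenated once at the end
globalIndices = [0]    # global first-row index of every event, plus the final length
for chunk, fakeIndices in zip(toyChunks, toyFakeIndices):
    pieces.append(chunk)
    offset = globalIndices[-1]  # number of rows read from earlier files
    # Drop the leading 0 of each per-file index and shift by that offset.
    globalIndices.extend((fakeIndices[1:] + offset).tolist())

allRows = np.concatenate(pieces)
globalIndices = np.array(globalIndices)
print(allRows.shape[0], globalIndices)   # 9 [0 2 3 5 6 9]
```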


The code below is identical to `model.ipynb`, except for the sizes of the training and validation sets.
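
The split in the following cells simply holds out the last rows as the validation set: the last 10,000,000 waveforms for the first model and the last 3,600 events for the second. A minimal sketch of such a tail split on toy arrays follows; the helper name `tailSplit` is hypothetical.

```python
import numpy as np

def tailSplit(features, labels, numValid):
    """Hold out the last numValid rows for validation (hypothetical helper)."""
    trainX = np.stack([f[:-numValid] for f in features], axis=1)
    validX = np.stack([f[-numValid:] for f in features], axis=1)
    return trainX, labels[:-numValid], validX, labels[-numValid:]

featureA = np.arange(10)
featureB = np.arange(10) * 2
labels = np.arange(10) % 3
trainX, trainY, validX, validY = tailSplit((featureA, featureB), labels, numValid=3)
print(trainX.shape, validX.shape)   # (7, 2) (3, 2)
```

Because the rows are not shuffled, the validation rows all come from the last files that were read.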

In [9]:
trainDataForPePerWF = lgb.Dataset(
    np.stack(
        (intTrainWF[:-10000000], pointsPerTrainWF[:-10000000], pePerTrainWFCalc[:-10000000]),
        axis=1
    ),
    label=pePerTrainWF[:-10000000],
)
validationDataForPePerWF = lgb.Dataset(
    np.stack(
        (intTrainWF[-10000000:], pointsPerTrainWF[-10000000:], pePerTrainWFCalc[-10000000:]),
        axis=1
    ),
    label=pePerTrainWF[-10000000:],
    reference=trainDataForPePerWF,
)
trainDataForPePerWF.save_binary('./train/trainPePerWF.bin')
validationDataForPePerWF.save_binary('./train/validPePerWF.bin')

[LightGBM] [Info] Saving data to binary file ./train/trainPePerWF.bin




[LightGBM] [Info] Saving data to binary file ./train/validPePerWF.bin


<lightgbm.basic.Dataset at 0x7ff450631be0>

In [10]:
trainDataForPePerWF = lgb.Dataset('./train/trainPePerWF.bin')
validationDataForPePerWF = lgb.Dataset('./train/validPePerWF.bin', reference=trainDataForPePerWF)
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': {'rmse'},
    'num_leaves': 2**11,
    'learning_rate': 0.01,
    'feature_fraction': 1,
    'bagging_fraction': 1,
    'bagging_freq': 5,
    'verbose': 0,
    'num_threads': 20,
    'max_depth': 20,
}
gbmForPePerWF = lgb.train(
    params,
    trainDataForPePerWF,
    num_boost_round=3000,
    valid_sets=validationDataForPePerWF,
    early_stopping_rounds=100,
)
gbmForPePerWF.save_model('modelPePerWF.txt')

You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[1]	valid_0's rmse: 1.82572
Training until validation scores don't improve for 100 rounds
[2]	valid_0's rmse: 1.80789
[3]	valid_0's rmse: 1.79023
[4]	valid_0's rmse: 1.77276
[5]	valid_0's rmse: 1.75547
[6]	valid_0's rmse: 1.73836
[7]	valid_0's rmse: 1.72142
[8]	valid_0's rmse: 1.70466
[9]	valid_0's rmse: 1.68806
[10]	valid_0's rmse: 1.67164
[11]	valid_0's rmse: 1.65539
[12]	valid_0's rmse: 1.6393
[13]	valid_0's rmse: 1.62338
[14]	valid_0's rmse: 1.60763
[15]	valid_0's rmse: 1.59203
[16]	valid_0's rmse: 1.5766
[17]	valid_0's rmse: 1.56132
[18]	valid_0's rmse: 1.54621
[19]	valid_0's rmse: 1.53124
[20]	valid_0's rmse: 1.51644
[21]	valid_0's rmse: 1.50178
[22]	valid_0's rmse: 1.48728
[23]	valid_0's rmse: 1.47293
[24]	valid_0's rmse: 1.45873
[25]	valid_0's rmse: 1.44467
[26]	valid_0's rmse: 1.43076
[27]	valid_0's rmse: 1.41699
[28]	valid_0's rmse: 1.40337
[29]	valid_0's 

[263]	valid_0's rmse: 0.306831
[264]	valid_0's rmse: 0.306286
[265]	valid_0's rmse: 0.305752
[266]	valid_0's rmse: 0.305227
[267]	valid_0's rmse: 0.304712
[268]	valid_0's rmse: 0.304206
[269]	valid_0's rmse: 0.303709
[270]	valid_0's rmse: 0.303222
[271]	valid_0's rmse: 0.302743
[272]	valid_0's rmse: 0.302273
[273]	valid_0's rmse: 0.301812
[274]	valid_0's rmse: 0.30136
[275]	valid_0's rmse: 0.300916
[276]	valid_0's rmse: 0.30048
[277]	valid_0's rmse: 0.300052
[278]	valid_0's rmse: 0.299632
[279]	valid_0's rmse: 0.29922
[280]	valid_0's rmse: 0.298815
[281]	valid_0's rmse: 0.298418
[282]	valid_0's rmse: 0.298029
[283]	valid_0's rmse: 0.297646
[284]	valid_0's rmse: 0.297271
[285]	valid_0's rmse: 0.296903
[286]	valid_0's rmse: 0.296542
[287]	valid_0's rmse: 0.296188
[288]	valid_0's rmse: 0.29584
[289]	valid_0's rmse: 0.295498
[290]	valid_0's rmse: 0.295164
[291]	valid_0's rmse: 0.294835
[292]	valid_0's rmse: 0.294513
[293]	valid_0's rmse: 0.294197
[294]	valid_0's rmse: 0.293886
[295]	valid_

[529]	valid_0's rmse: 0.278409
[530]	valid_0's rmse: 0.278406
[531]	valid_0's rmse: 0.278404
[532]	valid_0's rmse: 0.278402
[533]	valid_0's rmse: 0.278399
[534]	valid_0's rmse: 0.278397
[535]	valid_0's rmse: 0.278395
[536]	valid_0's rmse: 0.278392
[537]	valid_0's rmse: 0.27839
[538]	valid_0's rmse: 0.278388
[539]	valid_0's rmse: 0.278386
[540]	valid_0's rmse: 0.278384
[541]	valid_0's rmse: 0.278382
[542]	valid_0's rmse: 0.27838
[543]	valid_0's rmse: 0.278378
[544]	valid_0's rmse: 0.278376
[545]	valid_0's rmse: 0.278374
[546]	valid_0's rmse: 0.278373
[547]	valid_0's rmse: 0.278371
[548]	valid_0's rmse: 0.278369
[549]	valid_0's rmse: 0.278368
[550]	valid_0's rmse: 0.278366
[551]	valid_0's rmse: 0.278364
[552]	valid_0's rmse: 0.278363
[553]	valid_0's rmse: 0.278361
[554]	valid_0's rmse: 0.27836
[555]	valid_0's rmse: 0.278358
[556]	valid_0's rmse: 0.278357
[557]	valid_0's rmse: 0.278356
[558]	valid_0's rmse: 0.278354
[559]	valid_0's rmse: 0.278353
[560]	valid_0's rmse: 0.278352
[561]	valid

<lightgbm.basic.Booster at 0x7ff450631190>

In [13]:
gbmForPePerWF = lgb.Booster(model_file='./modelPePerWF.txt')
pePerTrainWFFinal = gbmForPePerWF.predict(
    np.stack(
        (intTrainWF, pointsPerTrainWF, pePerTrainWFCalc),
        axis=1
    )
)

In [14]:
splitPePerTrainWFFinal = np.split(pePerTrainWFFinal, wfIndices[1:-1].astype(int))
peTotal = np.empty(p.shape[0])
peMean = np.empty(p.shape[0])
peStd = np.empty(p.shape[0])
for index, pePerTrainWFFinalChunk in enumerate(tqdm(splitPePerTrainWFFinal)):
    peTotal[index] = np.sum(pePerTrainWFFinalChunk)
    peMean[index] = np.mean(pePerTrainWFFinalChunk)
    peStd[index] = np.std(pePerTrainWFFinalChunk)

splitMeanPeTimePerTrainWF = np.split(meanPeTimePerTrainWF, wfIndices[1:-1].astype(int))
peTimeMean = np.empty(p.shape[0])
peTimeStd = np.empty(p.shape[0])
for index, meanPeTimePerTrainWFFinalChunk in enumerate(tqdm(splitMeanPeTimePerTrainWF)):
    peTimeMean[index] = np.nanmean(meanPeTimePerTrainWFFinalChunk)
    peTimeStd[index] = np.nanstd(meanPeTimePerTrainWFFinalChunk)

100%|██████████| 35996/35996 [00:02<00:00, 14626.48it/s]
100%|██████████| 35996/35996 [00:06<00:00, 5542.83it/s]
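
`np.split` takes the interior cut points, which is why the cell above passes `wfIndices[1:-1]`: dropping the leading 0 and the trailing table length leaves exactly one chunk per event. The sketch below illustrates this on made-up values. It also shows why NaN-aware reductions matter for the time features; the assumption here, not confirmed by this notebook, is that some waveforms have no defined mean PETime and are stored as NaN.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0, 4.0, 5.0, 6.0])   # made-up per-waveform values
wfIndicesToy = np.array([0, 3, 4, 6])                   # 3 events: rows 0-2, 3, 4-5

# Interior boundaries only: one chunk per event.
chunks = np.split(values, wfIndicesToy[1:-1])
print(len(chunks))   # 3

plainMeans = [float(np.mean(c)) for c in chunks]        # NaN poisons the first event
nanAwareMeans = [float(np.nanmean(c)) for c in chunks]  # NaN entries are ignored
print(plainMeans)      # [nan, 4.0, 5.5]
print(nanAwareMeans)   # [2.0, 4.0, 5.5]
```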


In [15]:
trainDataForP = lgb.Dataset(
    np.stack(
        (peTotal[:-3600], peMean[:-3600], peStd[:-3600], peTimeMean[:-3600], peTimeStd[:-3600]),
        axis=1
    ),
    label=p[:-3600]
)
validationDataForP = lgb.Dataset(
    np.stack(
        (peTotal[-3600:], peMean[-3600:], peStd[-3600:], peTimeMean[-3600:], peTimeStd[-3600:]),
        axis=1
    ),
    label=p[-3600:],
    reference=trainDataForP
)
trainDataForP.save_binary('./train/trainDataForP.bin')
validationDataForP.save_binary('./train/validDataForP.bin')

You can set `force_col_wise=true` to remove the overhead.
[1]	valid_0's rmse: 6.14646	valid_0's custom: 5.37767
Training until validation scores don't improve for 300 rounds
[2]	valid_0's rmse: 6.08513	valid_0's custom: 5.27082
[3]	valid_0's rmse: 6.02441	valid_0's custom: 5.16609
[4]	valid_0's rmse: 5.96432	valid_0's custom: 5.06349
[5]	valid_0's rmse: 5.9048	valid_0's custom: 4.96287
[6]	valid_0's rmse: 5.84589	valid_0's custom: 4.86428
[7]	valid_0's rmse: 5.78757	valid_0's custom: 4.76766
[8]	valid_0's rmse: 5.72982	valid_0's custom: 4.67294
[9]	valid_0's rmse: 5.67268	valid_0's custom: 4.58013
[10]	valid_0's rmse: 5.61609	valid_0's custom: 4.48915
[11]	valid_0's rmse: 5.56008	valid_0's custom: 4.39998
[12]	valid_0's rmse: 5.50461	valid_0's custom: 4.31259
[13]	valid_0's rmse: 5.44972	valid_0's custom: 4.22694
[14]	valid_0's rmse: 5.39536	valid_0's custom: 4.14299
[15]	valid_0's rmse: 5.34156	valid_0's custom: 4.06072
[16]	valid_0's rmse: 5.28828	valid_0's custom: 3.98007
[17]	valid

[67]	valid_0's rmse: 3.17429	valid_0's custom: 1.43294
[68]	valid_0's rmse: 3.14274	valid_0's custom: 1.40458
[69]	valid_0's rmse: 3.11152	valid_0's custom: 1.37679
[70]	valid_0's rmse: 3.0806	valid_0's custom: 1.34954
[71]	valid_0's rmse: 3.04999	valid_0's custom: 1.32284
[72]	valid_0's rmse: 3.0197	valid_0's custom: 1.29668
[73]	valid_0's rmse: 2.98972	valid_0's custom: 1.27103
[74]	valid_0's rmse: 2.96002	valid_0's custom: 1.24589
[75]	valid_0's rmse: 2.93062	valid_0's custom: 1.22124
[76]	valid_0's rmse: 2.90152	valid_0's custom: 1.19709
[77]	valid_0's rmse: 2.87271	valid_0's custom: 1.17342
[78]	valid_0's rmse: 2.84418	valid_0's custom: 1.15021
[79]	valid_0's rmse: 2.81596	valid_0's custom: 1.12748
[80]	valid_0's rmse: 2.78801	valid_0's custom: 1.1052
[81]	valid_0's rmse: 2.76037	valid_0's custom: 1.08337
[82]	valid_0's rmse: 2.73297	valid_0's custom: 1.06195
[83]	valid_0's rmse: 2.70587	valid_0's custom: 1.04098
[84]	valid_0's rmse: 2.67904	valid_0's custom: 1.02042
[85]	valid_0'

[131]	valid_0's rmse: 1.68097	valid_0's custom: 0.401586
[132]	valid_0's rmse: 1.6645	valid_0's custom: 0.393751
[133]	valid_0's rmse: 1.6482	valid_0's custom: 0.386079
[134]	valid_0's rmse: 1.63206	valid_0's custom: 0.378556
[135]	valid_0's rmse: 1.61608	valid_0's custom: 0.371179
[136]	valid_0's rmse: 1.60028	valid_0's custom: 0.363959
[137]	valid_0's rmse: 1.58464	valid_0's custom: 0.35688
[138]	valid_0's rmse: 1.56915	valid_0's custom: 0.349939
[139]	valid_0's rmse: 1.55382	valid_0's custom: 0.343137
[140]	valid_0's rmse: 1.53865	valid_0's custom: 0.336471
[141]	valid_0's rmse: 1.52363	valid_0's custom: 0.329933
[142]	valid_0's rmse: 1.50877	valid_0's custom: 0.323532
[143]	valid_0's rmse: 1.49405	valid_0's custom: 0.31725
[144]	valid_0's rmse: 1.47948	valid_0's custom: 0.311098
[145]	valid_0's rmse: 1.46508	valid_0's custom: 0.305074
[146]	valid_0's rmse: 1.45083	valid_0's custom: 0.299168
[147]	valid_0's rmse: 1.4367	valid_0's custom: 0.293378
[148]	valid_0's rmse: 1.42273	valid_

[194]	valid_0's rmse: 0.914842	valid_0's custom: 0.119214
[195]	valid_0's rmse: 0.906306	valid_0's custom: 0.11701
[196]	valid_0's rmse: 0.897872	valid_0's custom: 0.114854
[197]	valid_0's rmse: 0.889514	valid_0's custom: 0.112737
[198]	valid_0's rmse: 0.881238	valid_0's custom: 0.110661
[199]	valid_0's rmse: 0.873059	valid_0's custom: 0.108628
[200]	valid_0's rmse: 0.864965	valid_0's custom: 0.106637
[201]	valid_0's rmse: 0.856962	valid_0's custom: 0.104685
[202]	valid_0's rmse: 0.849037	valid_0's custom: 0.102771
[203]	valid_0's rmse: 0.841207	valid_0's custom: 0.100897
[204]	valid_0's rmse: 0.833452	valid_0's custom: 0.0990586
[205]	valid_0's rmse: 0.825783	valid_0's custom: 0.0972576
[206]	valid_0's rmse: 0.818199	valid_0's custom: 0.0954926
[207]	valid_0's rmse: 0.810694	valid_0's custom: 0.0937629
[208]	valid_0's rmse: 0.803277	valid_0's custom: 0.0920691
[209]	valid_0's rmse: 0.795936	valid_0's custom: 0.0904074
[210]	valid_0's rmse: 0.788674	valid_0's custom: 0.0887804
[211]	va

[258]	valid_0's rmse: 0.520331	valid_0's custom: 0.0392165
[259]	valid_0's rmse: 0.516173	valid_0's custom: 0.0386117
[260]	valid_0's rmse: 0.512061	valid_0's custom: 0.038017
[261]	valid_0's rmse: 0.508008	valid_0's custom: 0.0374348
[262]	valid_0's rmse: 0.504003	valid_0's custom: 0.0368658
[263]	valid_0's rmse: 0.500043	valid_0's custom: 0.0363065
[264]	valid_0's rmse: 0.496133	valid_0's custom: 0.0357598
[265]	valid_0's rmse: 0.492261	valid_0's custom: 0.0352225
[266]	valid_0's rmse: 0.488441	valid_0's custom: 0.0346956
[267]	valid_0's rmse: 0.484675	valid_0's custom: 0.0341811
[268]	valid_0's rmse: 0.480944	valid_0's custom: 0.0336755
[269]	valid_0's rmse: 0.477265	valid_0's custom: 0.0331823
[270]	valid_0's rmse: 0.473621	valid_0's custom: 0.0326954
[271]	valid_0's rmse: 0.470036	valid_0's custom: 0.0322208
[272]	valid_0's rmse: 0.466505	valid_0's custom: 0.0317559
[273]	valid_0's rmse: 0.463012	valid_0's custom: 0.0313012
[274]	valid_0's rmse: 0.459552	valid_0's custom: 0.030852

[323]	valid_0's rmse: 0.335617	valid_0's custom: 0.0170484
[324]	valid_0's rmse: 0.333868	valid_0's custom: 0.0168861
[325]	valid_0's rmse: 0.332145	valid_0's custom: 0.016726
[326]	valid_0's rmse: 0.330442	valid_0's custom: 0.0165705
[327]	valid_0's rmse: 0.328758	valid_0's custom: 0.0164174
[328]	valid_0's rmse: 0.327115	valid_0's custom: 0.0162685
[329]	valid_0's rmse: 0.325497	valid_0's custom: 0.016123
[330]	valid_0's rmse: 0.323895	valid_0's custom: 0.0159781
[331]	valid_0's rmse: 0.322318	valid_0's custom: 0.0158372
[332]	valid_0's rmse: 0.320763	valid_0's custom: 0.015699
[333]	valid_0's rmse: 0.319231	valid_0's custom: 0.015564
[334]	valid_0's rmse: 0.317719	valid_0's custom: 0.0154318
[335]	valid_0's rmse: 0.316242	valid_0's custom: 0.0153021
[336]	valid_0's rmse: 0.314773	valid_0's custom: 0.015173
[337]	valid_0's rmse: 0.313336	valid_0's custom: 0.0150486
[338]	valid_0's rmse: 0.311928	valid_0's custom: 0.0149269
[339]	valid_0's rmse: 0.310527	valid_0's custom: 0.0148062
[3

[388]	valid_0's rmse: 0.264006	valid_0's custom: 0.011128
[389]	valid_0's rmse: 0.263416	valid_0's custom: 0.011086
[390]	valid_0's rmse: 0.262815	valid_0's custom: 0.0110428
[391]	valid_0's rmse: 0.262234	valid_0's custom: 0.0110006
[392]	valid_0's rmse: 0.261681	valid_0's custom: 0.0109631
[393]	valid_0's rmse: 0.261132	valid_0's custom: 0.0109261
[394]	valid_0's rmse: 0.260576	valid_0's custom: 0.0108866
[395]	valid_0's rmse: 0.260039	valid_0's custom: 0.010851
[396]	valid_0's rmse: 0.259516	valid_0's custom: 0.0108163
[397]	valid_0's rmse: 0.259003	valid_0's custom: 0.0107822
[398]	valid_0's rmse: 0.258483	valid_0's custom: 0.0107486
[399]	valid_0's rmse: 0.258004	valid_0's custom: 0.0107168
[400]	valid_0's rmse: 0.257508	valid_0's custom: 0.0106845
[401]	valid_0's rmse: 0.257038	valid_0's custom: 0.0106535
[402]	valid_0's rmse: 0.256555	valid_0's custom: 0.0106221
[403]	valid_0's rmse: 0.256089	valid_0's custom: 0.0105916
[404]	valid_0's rmse: 0.255626	valid_0's custom: 0.0105618


[453]	valid_0's rmse: 0.241513	valid_0's custom: 0.00963353
[454]	valid_0's rmse: 0.241358	valid_0's custom: 0.0096235
[455]	valid_0's rmse: 0.241206	valid_0's custom: 0.00961357
[456]	valid_0's rmse: 0.241054	valid_0's custom: 0.00960376
[457]	valid_0's rmse: 0.240878	valid_0's custom: 0.00959258
[458]	valid_0's rmse: 0.240726	valid_0's custom: 0.00958306
[459]	valid_0's rmse: 0.240554	valid_0's custom: 0.00957217
[460]	valid_0's rmse: 0.24042	valid_0's custom: 0.00956396
[461]	valid_0's rmse: 0.240275	valid_0's custom: 0.00955509
[462]	valid_0's rmse: 0.240138	valid_0's custom: 0.00954658
[463]	valid_0's rmse: 0.239991	valid_0's custom: 0.00953766
[464]	valid_0's rmse: 0.239863	valid_0's custom: 0.00952906
[465]	valid_0's rmse: 0.239709	valid_0's custom: 0.00951874
[466]	valid_0's rmse: 0.23959	valid_0's custom: 0.00951086
[467]	valid_0's rmse: 0.239478	valid_0's custom: 0.00950391
[468]	valid_0's rmse: 0.239352	valid_0's custom: 0.00949513
[469]	valid_0's rmse: 0.239226	valid_0's cu

[515]	valid_0's rmse: 0.235626	valid_0's custom: 0.00926532
[516]	valid_0's rmse: 0.235584	valid_0's custom: 0.00926307
[517]	valid_0's rmse: 0.235546	valid_0's custom: 0.00926078
[518]	valid_0's rmse: 0.235512	valid_0's custom: 0.00925916
[519]	valid_0's rmse: 0.235492	valid_0's custom: 0.00925786
[520]	valid_0's rmse: 0.235459	valid_0's custom: 0.00925646
[521]	valid_0's rmse: 0.235428	valid_0's custom: 0.00925566
[522]	valid_0's rmse: 0.235393	valid_0's custom: 0.00925414
[523]	valid_0's rmse: 0.235366	valid_0's custom: 0.00925298
[524]	valid_0's rmse: 0.235341	valid_0's custom: 0.00925281
[525]	valid_0's rmse: 0.235312	valid_0's custom: 0.00925073
[526]	valid_0's rmse: 0.235282	valid_0's custom: 0.00924912
[527]	valid_0's rmse: 0.235238	valid_0's custom: 0.00924779
[528]	valid_0's rmse: 0.235214	valid_0's custom: 0.00924642
[529]	valid_0's rmse: 0.235178	valid_0's custom: 0.00924533
[530]	valid_0's rmse: 0.235146	valid_0's custom: 0.00924353
[531]	valid_0's rmse: 0.235128	valid_0's

[579]	valid_0's rmse: 0.234262	valid_0's custom: 0.00921494
[580]	valid_0's rmse: 0.234266	valid_0's custom: 0.00921625
[581]	valid_0's rmse: 0.234251	valid_0's custom: 0.00921558
[582]	valid_0's rmse: 0.234252	valid_0's custom: 0.00921586
[583]	valid_0's rmse: 0.234241	valid_0's custom: 0.0092163
[584]	valid_0's rmse: 0.234244	valid_0's custom: 0.00921666
[585]	valid_0's rmse: 0.234219	valid_0's custom: 0.00921517
[586]	valid_0's rmse: 0.234219	valid_0's custom: 0.00921541
[587]	valid_0's rmse: 0.234218	valid_0's custom: 0.00921646
[588]	valid_0's rmse: 0.234208	valid_0's custom: 0.00921619
[589]	valid_0's rmse: 0.234205	valid_0's custom: 0.0092164
[590]	valid_0's rmse: 0.234181	valid_0's custom: 0.00921595
[591]	valid_0's rmse: 0.234183	valid_0's custom: 0.00921665
[592]	valid_0's rmse: 0.234159	valid_0's custom: 0.00921623
[593]	valid_0's rmse: 0.234147	valid_0's custom: 0.00921677
[594]	valid_0's rmse: 0.234142	valid_0's custom: 0.00921697
[595]	valid_0's rmse: 0.234129	valid_0's c

[640]	valid_0's rmse: 0.234143	valid_0's custom: 0.00924259
[641]	valid_0's rmse: 0.234143	valid_0's custom: 0.00924203
[642]	valid_0's rmse: 0.234171	valid_0's custom: 0.00924496
[643]	valid_0's rmse: 0.234168	valid_0's custom: 0.00924465
[644]	valid_0's rmse: 0.234175	valid_0's custom: 0.00924627
[645]	valid_0's rmse: 0.234179	valid_0's custom: 0.00924635
[646]	valid_0's rmse: 0.234208	valid_0's custom: 0.00924889
[647]	valid_0's rmse: 0.234231	valid_0's custom: 0.00925045
[648]	valid_0's rmse: 0.234242	valid_0's custom: 0.00925134
[649]	valid_0's rmse: 0.234264	valid_0's custom: 0.00925255
[650]	valid_0's rmse: 0.234253	valid_0's custom: 0.00925205
[651]	valid_0's rmse: 0.234262	valid_0's custom: 0.00925346
[652]	valid_0's rmse: 0.234258	valid_0's custom: 0.00925314
[653]	valid_0's rmse: 0.23427	valid_0's custom: 0.00925392
[654]	valid_0's rmse: 0.234271	valid_0's custom: 0.00925377
[655]	valid_0's rmse: 0.234273	valid_0's custom: 0.00925414
[656]	valid_0's rmse: 0.234284	valid_0's 

[705]	valid_0's rmse: 0.23477	valid_0's custom: 0.00929622
[706]	valid_0's rmse: 0.234789	valid_0's custom: 0.0092976
[707]	valid_0's rmse: 0.234809	valid_0's custom: 0.00929867
[708]	valid_0's rmse: 0.23483	valid_0's custom: 0.00930044
[709]	valid_0's rmse: 0.234848	valid_0's custom: 0.00930135
[710]	valid_0's rmse: 0.234874	valid_0's custom: 0.00930335
[711]	valid_0's rmse: 0.234888	valid_0's custom: 0.0093042
[712]	valid_0's rmse: 0.234896	valid_0's custom: 0.00930671
[713]	valid_0's rmse: 0.234904	valid_0's custom: 0.00930709
[714]	valid_0's rmse: 0.234911	valid_0's custom: 0.0093087
[715]	valid_0's rmse: 0.23493	valid_0's custom: 0.00930997
[716]	valid_0's rmse: 0.234951	valid_0's custom: 0.00931167
[717]	valid_0's rmse: 0.234968	valid_0's custom: 0.00931247
[718]	valid_0's rmse: 0.234982	valid_0's custom: 0.00931457
[719]	valid_0's rmse: 0.234981	valid_0's custom: 0.00931471
[720]	valid_0's rmse: 0.234997	valid_0's custom: 0.00931597
[721]	valid_0's rmse: 0.23501	valid_0's custom

[766]	valid_0's rmse: 0.235363	valid_0's custom: 0.00936678
[767]	valid_0's rmse: 0.235366	valid_0's custom: 0.00936746
[768]	valid_0's rmse: 0.235383	valid_0's custom: 0.00936866
[769]	valid_0's rmse: 0.235383	valid_0's custom: 0.00936883
[770]	valid_0's rmse: 0.235385	valid_0's custom: 0.00936912
[771]	valid_0's rmse: 0.235401	valid_0's custom: 0.00937052
[772]	valid_0's rmse: 0.235418	valid_0's custom: 0.00937196
[773]	valid_0's rmse: 0.235413	valid_0's custom: 0.00937254
[774]	valid_0's rmse: 0.235412	valid_0's custom: 0.00937267
[775]	valid_0's rmse: 0.235414	valid_0's custom: 0.0093736
[776]	valid_0's rmse: 0.235415	valid_0's custom: 0.00937427
[777]	valid_0's rmse: 0.235393	valid_0's custom: 0.00937354
[778]	valid_0's rmse: 0.235386	valid_0's custom: 0.00937304
[779]	valid_0's rmse: 0.23539	valid_0's custom: 0.00937392
[780]	valid_0's rmse: 0.235389	valid_0's custom: 0.00937445
[781]	valid_0's rmse: 0.235404	valid_0's custom: 0.00937653
[782]	valid_0's rmse: 0.235393	valid_0's c

[832]	valid_0's rmse: 0.235496	valid_0's custom: 0.00942468
[833]	valid_0's rmse: 0.235484	valid_0's custom: 0.00942511
[834]	valid_0's rmse: 0.235463	valid_0's custom: 0.00942472
[835]	valid_0's rmse: 0.235455	valid_0's custom: 0.00942525
[836]	valid_0's rmse: 0.235438	valid_0's custom: 0.00942504
[837]	valid_0's rmse: 0.235419	valid_0's custom: 0.00942436
[838]	valid_0's rmse: 0.235408	valid_0's custom: 0.00942488
[839]	valid_0's rmse: 0.235395	valid_0's custom: 0.00942437
[840]	valid_0's rmse: 0.235383	valid_0's custom: 0.00942416
[841]	valid_0's rmse: 0.235399	valid_0's custom: 0.00942516
[842]	valid_0's rmse: 0.235418	valid_0's custom: 0.00942754
[843]	valid_0's rmse: 0.235437	valid_0's custom: 0.00943017
[844]	valid_0's rmse: 0.235437	valid_0's custom: 0.0094305
[845]	valid_0's rmse: 0.235453	valid_0's custom: 0.00943142
[846]	valid_0's rmse: 0.235473	valid_0's custom: 0.009434
[847]	valid_0's rmse: 0.235491	valid_0's custom: 0.00943674
[848]	valid_0's rmse: 0.235486	valid_0's cu

<lightgbm.basic.Booster at 0x7ff3d0151be0>

In [None]:
trainDataForP = lgb.Dataset('./train/trainDataForP.bin')
validationDataForP = lgb.Dataset('./train/validDataForP.bin', reference=trainDataForP)

params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': {'rmse'},
    'num_leaves': 2**11,
    'learning_rate': 0.01,
    'feature_fraction': 1,
    'bagging_fraction': 1,
    'bagging_freq': 5,
    'verbose': 0,
    'num_threads': 20,
    'max_depth': -1,
}
gbmForP = lgb.train(
    params,
    trainDataForP,
    num_boost_round=3000,
    valid_sets=validationDataForP,
    early_stopping_rounds=300,
    fobj=lossfunc_train,
    feval=lossfunc_eval
)
gbmForP.save_model('modelP.txt')
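
`lossfunc_train` and `lossfunc_eval` are imported from `utils`, and their definitions are not shown in this notebook. For orientation, the sketch below shows the general shape LightGBM expects: a custom objective returns the gradient and Hessian of the loss with respect to the predictions, and a custom metric returns a name, a value, and a higher-is-better flag. The relative squared error used here is purely a placeholder and is not claimed to be the loss defined in `utils`. `FakeDataset` is a stand-in so the sketch runs without LightGBM.

```python
import numpy as np

class FakeDataset:
    """Stand-in exposing get_label(), like lgb.Dataset (illustration only)."""
    def __init__(self, label):
        self._label = np.asarray(label, dtype=float)

    def get_label(self):
        return self._label

def exampleObjective(preds, trainData):
    """Placeholder loss ((pred - y) / y)^2: return its gradient and Hessian."""
    y = trainData.get_label()
    grad = 2.0 * (preds - y) / y**2
    hess = 2.0 / y**2
    return grad, hess

def exampleMetric(preds, evalData):
    """Return (name, value, is_higher_better) for the placeholder loss."""
    y = evalData.get_label()
    value = float(np.sqrt(np.mean(((preds - y) / y) ** 2)))
    return 'custom', value, False

data = FakeDataset([2.0, 4.0])
grad, hess = exampleObjective(np.array([3.0, 4.0]), data)
print(grad, hess)   # [0.5 0. ] [0.5   0.125]
name, value, higherIsBetter = exampleMetric(np.array([3.0, 4.0]), data)
```

Functions of this shape are what the `fobj` and `feval` arguments in the cell above receive. Note that LightGBM 4.0 and later no longer accept `fobj` or `early_stopping_rounds` in `lgb.train`; there the objective callable goes into `params['objective']` and early stopping is requested through the `lgb.early_stopping` callback.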