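# final.ipynb: Solving the final task with the trained decision trees
Prerequisites: run `train.ipynb` first, and preferably `waveform.py` as well. If you ran `model.ipynb` instead, add `play` to the model file names accordingly.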

In [6]:
import multiprocessing
import warnings
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import h5py
from utils import loadData, saveData, getNum, getPePerWF, saveans, lossfunc_eval, lossfunc_train
import lightgbm as lgb
from tqdm import tqdm

Load the test set.

In [2]:
testpath = "data/final.h5"
testWF = loadData(testpath, 'test')

Structure of data:
<HDF5 dataset "Waveform": shape (12178193,), type "|V2008"> Waveform /Waveform


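If `waveform.py` has already been run, run the first code block below, which loads the precomputed quantities from `./train/final_wf.h5`. Otherwise, run the code blocks after it, which compute the same quantities from the raw waveforms.

Either way, the goal is to obtain the inputs required by the first decision tree and to prepare for the second one.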

In [3]:
with h5py.File('./train/final_wf.h5', 'r') as ipt:
    intTestWF = ipt['Waveform']['intWF'][...]
    pointsPerTestWF = ipt['Waveform']['pointsPerWF'][...]
    pePerTestWFCalc = ipt['Waveform']['pePerWFCalc'][...]
    meanPeTimePerTestWF = ipt['Waveform']['meanPeTimePerWF'][...]
    wfIndices = ipt['WfIndices'][...]

In [3]:
numPEW, wfIndices = getNum(testWF)

In [9]:
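print("Removing waveform noise...")
# Samples below the threshold of 918 become positive pulse heights; all others are set to 0.
denoisedTestWF = np.empty((testWF.shape[0], 1000), dtype='<i2')
step = 2000000  # process the waveforms in chunks of 2,000,000
for i in tqdm(range(testWF.shape[0] // step + 1)):
    denoisedTestWF[step*i:step*(i+1)] = np.where(
        testWF['Waveform'][step*i:step*(i+1), :] < 918,
        918-testWF['Waveform'][step*i:step*(i+1), :],
        0
    )
print("Integrating waveforms...")
intTestWF = np.sum(denoisedTestWF, axis=1)
print("Counting points beyond the threshold...")
pointsPerTestWF = np.sum(denoisedTestWF > 0, axis=1)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # Apply getPePerWF to every denoised waveform using 8 worker processes.
    with multiprocessing.Pool(8) as p:
        res = np.array(
            list(
                tqdm(
                    p.imap(
                        getPePerWF,
                        denoisedTestWF
                    ),
                    total=denoisedTestWF.shape[0]
                )
            )
        )

# Column 0: calculated PE count per waveform; column 1: mean PE time per waveform.
pePerTestWFCalc = res[:, 0]
meanPeTimePerTestWF = res[:, 1]


  0%|          | 0/7 [00:00<?, ?it/s]

Removing waveform noise...


100%|██████████| 7/7 [00:55<00:00,  7.99s/it]


Integrating waveforms...
Counting points beyond the threshold...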


100%|██████████| 12178193/12178193 [33:26<00:00, 6068.65it/s]
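
The denoising, integration, and point counting above can be tried on a tiny synthetic array. This is only a minimal sketch, assuming the same threshold of 918 as in the cell above; `toy_wf` and its values are made up for illustration and have 6 samples per waveform instead of 1000.

```python
import numpy as np

THRESHOLD = 918  # same cut as in the cell above

# Hypothetical toy data: 2 waveforms with 6 samples each.
toy_wf = np.array(
    [[920, 917, 900, 918, 925, 910],
     [919, 921, 918, 930, 922, 920]],
    dtype="<i2",
)

# Samples below the threshold are flipped into positive pulse heights;
# everything else (including samples equal to the threshold) becomes 0.
denoised = np.where(toy_wf < THRESHOLD, THRESHOLD - toy_wf, 0)

integral = np.sum(denoised, axis=1)    # summed pulse height per waveform
points = np.sum(denoised > 0, axis=1)  # number of samples below the threshold

print(integral)  # [27  0]
print(points)    # [3 0]
```

The second toy waveform never dips below the threshold, so both its integral and its point count are zero.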


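Predict the number of PEs in each waveform, `pePerWF`, with the first decision tree (the first code block below),

or with a linear fit (the second code block below). A third block simply reuses the directly calculated `pePerTestWFCalc`. Whichever of these blocks is run last determines the `pePerWF` used from here on.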

In [4]:
gbmForPePerWF = lgb.Booster(model_file='./model/modelPePerWF.txt')
pePerWF = gbmForPePerWF.predict(
    np.stack(
        (intTestWF, pointsPerTestWF, pePerTestWFCalc),
        axis=1
    )
)
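
`Booster.predict` is given a 2-D array with one row per waveform and one column per feature, which is why the three arrays are stacked along `axis=1`. The sketch below shows only the stacking step on made-up values; the array names are hypothetical, and the column order presumably has to match the one used when the model was trained.

```python
import numpy as np

# Hypothetical per-waveform features for four waveforms.
integral = np.array([27.0, 0.0, 150.0, 310.0])
points = np.array([3, 0, 12, 20])
pe_calc = np.array([1, 0, 2, 3])

# axis=1 puts each input array into its own column.
features = np.stack((integral, points, pe_calc), axis=1)

print(features.shape)  # (4, 3)
print(features[0])     # [27.  3.  1.]
```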

In [10]:
pePerWF = np.round(intTestWF * 0.00651683 + 0.5644330147776138).astype(int)
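
To see what the linear fit does, it can be applied to a few made-up integrals. The slope and intercept are copied from the cell above; the `integrals` values are hypothetical.

```python
import numpy as np

SLOPE = 0.00651683
INTERCEPT = 0.5644330147776138

# Hypothetical waveform integrals.
integrals = np.array([0, 100, 250, 1000])

# Linear map followed by rounding to the nearest integer PE count.
pe = np.round(integrals * SLOPE + INTERCEPT).astype(int)

print(pe)  # [1 1 2 7]
```

Because the intercept is larger than 0.5, even a waveform with zero integral is assigned one PE by this fit.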

In [10]:
pePerWF = pePerTestWFCalc

Compute the five features required by the second decision tree.

In [11]:
# Split the per-waveform PE counts into chunks at the interior boundaries in wfIndices.
splitPePerTestWFFinal = np.split(pePerWF, wfIndices[1:-1])

# Per-chunk total, mean, and standard deviation of the PE count.
peTotal = np.empty(4000)
peMean = np.empty(4000)
peStd = np.empty(4000)
for index, pePerTestWFFinalChunk in enumerate(tqdm(splitPePerTestWFFinal)):
    peTotal[index] = np.sum(pePerTestWFFinalChunk)
    peMean[index] = np.mean(pePerTestWFFinalChunk)
    peStd[index] = np.std(pePerTestWFFinalChunk)

# Per-chunk mean and standard deviation of the mean PE time, ignoring NaN entries.
splitMeanPeTimePerTestWF = np.split(meanPeTimePerTestWF, wfIndices[1:-1])
peTimeMean = np.empty(4000)
peTimeStd = np.empty(4000)
for index, meanPeTimePerTestWFFinalChunk in enumerate(tqdm(splitMeanPeTimePerTestWF)):
    peTimeMean[index] = np.nanmean(meanPeTimePerTestWFFinalChunk)
    peTimeStd[index] = np.nanstd(meanPeTimePerTestWFFinalChunk)


100%|██████████| 4000/4000 [00:00<00:00, 12073.58it/s]
100%|██████████| 4000/4000 [00:00<00:00, 5236.77it/s]
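
The grouping above relies on `np.split` with a list of boundary indices. The following sketch reproduces the idea on made-up data. It assumes that the boundary array starts at 0 and ends at the total number of waveforms, which is consistent with the `wfIndices[1:-1]` slice but is not confirmed by the notebook; all names and values here are hypothetical.

```python
import numpy as np

# Hypothetical data: 7 waveforms belonging to 3 groups.
pe_per_wf = np.array([1, 2, 0, 3, 1, 1, 4])
mean_time = np.array([250.0, 300.0, np.nan, 280.0, 310.0, 290.0, 270.0])

# Assumed layout: start index of every group, plus the total length at the end.
boundaries = np.array([0, 3, 5, 7])

# Dropping the first and last boundary leaves the interior split points.
chunks = np.split(pe_per_wf, boundaries[1:-1])
pe_total = np.array([c.sum() for c in chunks])
pe_mean = np.array([c.mean() for c in chunks])

# nanmean skips the NaN entry in the first group.
time_chunks = np.split(mean_time, boundaries[1:-1])
time_mean = np.array([np.nanmean(c) for c in time_chunks])

print(pe_total)   # [3 4 5]
print(pe_mean)    # [1.  2.  2.5]
print(time_mean)  # [275. 295. 280.]
```

Using `np.nanmean` and `np.nanstd` keeps a single NaN time from turning the whole group's statistic into NaN.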


Feed them into the second decision tree to obtain the final answer, the momentum p.

In [12]:
gbmForP = lgb.Booster(model_file='./model/modelPPlay.txt')
answerP = gbmForP.predict(
    np.stack(
        (peTotal, peMean, peStd, peTimeMean, peTimeStd),
        axis=1
    )
)

Save the answer in the standard format. Done!

In [13]:
saveans(answerP, './ans/ansPlay2.h5')