# BDT Debugging

This notebook, together with performance_report_copy.py, illustrates a current bug in the BDT prediction: the code here, which reads directly from an awkward file and predicts, gives different prediction values from those produced by performance_report_copy.py. Note: the directories point to the files in my personal area, but the files are also included in this debugging folder for reference.
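A minimal sketch of how the two sets of predictions could be compared numerically rather than by eye. The helper name `compare_predictions`, the tolerance, and the toy values are assumptions for illustration; they are not taken from performance_report_copy.py.

```python
import numpy as np


def compare_predictions(a, b, atol=1e-5):
    """Return (indices where |a - b| > atol, largest absolute difference).

    Hypothetical helper: `a` and `b` stand for the prediction arrays
    produced by the two code paths for the same events, in the same order.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    diff = np.abs(a - b)
    mismatched = np.flatnonzero(diff > atol)
    worst = float(diff.max()) if diff.size else 0.0
    return mismatched, worst


# Toy values only, not real model output.
idx, worst = compare_predictions([1.0, 2.0, 3.0], [1.0, 2.5, 3.0])
print("mismatched indices:", idx.tolist(), "largest difference:", worst)
```

Knowing which events disagree, and by how much, helps tell a systematic difference (every value shifted) from one that affects only certain rows.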

In [2]:
%matplotlib inline
import os
import time
import json
import uproot
import awkward
import xgboost as xgb
import pandas as pd
import numpy as np
import mvatrain.preprocessors as mpp
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from os.path import join

These are the model and signal file that will be used. The model was trained on the total signal (2mu2e and 4mu) with the corrected maxd0 and mind0 calculation. The signal file was generated from a skimmed ROOT file of 1000 events from a 4mu sample. The same model and file are passed to performance_report_copy.py.

In [3]:
OUTPUT_DIR = "/uscms/home/sadie/nobackup/ffAna-master/mvatrain/outputs/earl_grey_strong_redux"  # directory containing the trained model
DATA_DIR = "/uscms/home/sadie/nobackup/ffAna-master/mvatrain/data/signal_190813.awkd"  # path to the signal file (a file, despite the name)

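Load the model. Because it was trained on GPU nodes, the if statement clears the saved GPU predictor attribute so that the model works when running on CPU nodes. The model is loaded twice, as xgbm_optimized and xgbm_optimized2; xgbm_optimized2 is the one used for the predictions in the next cell: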

In [8]:
print("loading model...")
xgbm_optimized = xgb.Booster({"nthread": 16})
xgbm_optimized.load_model(join(OUTPUT_DIR, "model_optimized/model.bin"))
if xgbm_optimized.attributes().get('SAVED_PARAM_predictor', None)=='gpu_predictor':
    xgbm_optimized.set_attr(SAVED_PARAM_predictor=None)
print("model loaded.")
print("loading model...")
xgbm_optimized2 = xgb.Booster({"nthread": 16})
xgbm_optimized2.load_model(join(OUTPUT_DIR, "model_optimized/model.bin"))
if xgbm_optimized2.attributes().get('SAVED_PARAM_predictor', None)=='gpu_predictor':
    xgbm_optimized2.set_attr(SAVED_PARAM_predictor=None)
print("model loaded.")

loading model...
model loaded.
loading model...
model loaded.


Read in the awkward file and do predictions:

In [24]:
dataset2_ = awkward.load(DATA_DIR)
df2 = pd.DataFrame(dict(dataset2_))
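# NOTE: fillna returns a new DataFrame; the result is not assigned here,
# so df2 is unchanged and any NaN values remain in it.
df2.fillna(0)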

# Print the column names of the DataFrame and of the loaded awkward dataset.
for x in df2.keys():
    print(x)
for x in dataset2_.keys():
    print(x)

feature_cols = [n for n in dataset2_.keys() if n != "target"]

df3 = df2[feature_cols]

xgtest2 = xgb.DMatrix(df3)

predictions = xgbm_optimized2.predict(xgtest2)

target
pt
eta
neufrac
maxd0
mind0
tkiso
pfiso
spreadpt
spreaddr
lambda
epsilon
ecf1
ecf2
ecf3
target
pt
eta
neufrac
maxd0
mind0
tkiso
pfiso
spreadpt
spreaddr
lambda
epsilon
ecf1
ecf2
ecf3
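One detail of the cell above worth checking: `df2.fillna(0)` is called without assigning the result. By default `DataFrame.fillna` returns a new frame and leaves the original untouched, so any NaN values in `df2` would still reach `xgb.DMatrix`. Whether this is related to the discrepancy is not established here. The sketch below shows the behaviour on a toy frame; the column names are borrowed from the feature list, and the values are made up.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; only the column names come from the notebook.
toy = pd.DataFrame({"pt": [30.0, np.nan], "maxd0": [np.nan, 0.2]})

# Calling fillna without using the return value leaves `toy` unchanged.
toy.fillna(0)
nan_left = int(toy.isna().sum().sum())

# Assigning the result gives a frame with the NaNs replaced.
filled = toy.fillna(0)
nan_after = int(filled.isna().sum().sum())

print("NaNs without assignment:", nan_left)
print("NaNs with assignment:", nan_after)
```

Running `df2.isna().sum()` on the real data would show whether any feature column contains NaN values at all; if none does, the missing assignment has no effect on the predictions.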


Print the predictions for comparison with the values from performance_report_copy.py:

In [23]:
for x in predictions:
    print(x)

-4.5141335
8.543285
8.483538
6.732876
7.404385
7.7008843
10.431449
6.509923
9.296654
9.330179
-5.383587
6.732003
8.742421
-4.422081
-6.968762
-7.761949
-5.2850986
7.918211
-3.732893
6.628977
5.637325
6.6871777
-7.899145
9.490824
8.497035
7.2984076
-2.599769
-3.7468753
-4.3057327
-7.192308
8.543003
-8.480057
9.797422
8.376005
-9.149614
9.60353
-6.798461
-8.550571
-6.967067
8.359665
8.477427
-4.473496
-8.560316
-5.1448793
-6.6046925
-2.6661017
9.078332
9.886828
8.583348
-6.536708
-6.2914567
7.9772363
-7.202888
7.534166
-7.839554
4.101373
7.404954
-1.0759785
7.1726885
7.449983
8.854927
-3.669893
-4.3671503
4.174787
7.306478
6.2965064
6.276579
7.3504777
-7.1874285
8.161432
8.3072815
9.379136
-3.9224758
5.3677945
7.6419725
7.6859436
-9.02056
-3.8004897
6.161835
-4.9708843
-4.1922817
-7.9893956
10.491176
8.468589
-5.8010955
9.779403
8.202558
-3.2667341
-8.450048
9.529846
-5.1116095
9.7993355
-2.575331
-4.6485753
7.444374
7.251807
7.848078
8.522001
9.6486435
-9.243121
-6.3921614
9.802706
-4.7