# 6b-Model and 2b-Model Application

This notebook analyzes the performance of the 6b-Model and the 2b-Model on selected simulated events.

`trsm` is a custom module; its `TRSM` class loads and handles the selected events.

In [1]:
# Allow for automatic reload of modules (useful for development)
%load_ext autoreload
%autoreload 2

In [2]:
from trsm import TRSM, combos_6j

In [3]:
import numpy as np
import awkward as ak

In [4]:
%matplotlib widget
import matplotlib.pyplot as plt
from consistent_plots import hist, hist2d

## Load events, apply model

Load each file with the `TRSM` class, build the six-jet combinations for every event, then apply the 6b-Model to assign a score to each combination.
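
For orientation, the number of six-jet combinations per event depends on the jet multiplicity of the file. The sketch below uses only the standard library and placeholder jet indices. It does not reproduce `combos_6j`, whose internals are not shown in this notebook, and the helper name `six_jet_combinations` is hypothetical.

```python
from itertools import combinations
from math import comb

def six_jet_combinations(n_jets):
    """Return every way to choose 6 jet indices out of n_jets (illustrative only)."""
    return list(combinations(range(n_jets), 6))

for n_jets in (6, 7, 8):
    combos = six_jet_combinations(n_jets)
    # math.comb gives the same count without enumerating the combinations
    assert len(combos) == comb(n_jets, 6)
    print(n_jets, "jets ->", len(combos), "six-jet combinations")
```

The 8-jet file therefore has four times as many combinations per event as the 7-jet file, which is consistent with its slower iteration rate in the progress bars.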

In [5]:
filename6 = '../../NMSSM_XYH_YToHH_6b_MX_700_MY_400_6jet_testing_set_6jets_2021Aug.root'
filename7 = '../../NMSSM_XYH_YToHH_6b_MX_700_MY_400_6jet_testing_set_7jets_2021Aug.root'
filename8 = '../../NMSSM_XYH_YToHH_6b_MX_700_MY_400_6jet_testing_set_8jets_2021Aug.root'

In [6]:
trsm7 = TRSM(filename=filename7)

In [None]:
trsm8 = TRSM(filename=filename8)

In [7]:
tag = '20210816_5btag_req'

In [8]:
combos7 = combos_6j(trsm7, 7)

100%|██████████████████████████████████████████████████████| 129355/129355 [04:02<00:00, 532.37it/s]


Total events chosen: 129355


In [None]:
combos8 = combos_6j(trsm8, 8)

100%|████████████████████████████████████████████████████████| 87382/87382 [06:21<00:00, 228.93it/s]


In [9]:
combos7.apply_6j_model(tag)

Applying 6b Model to combinations. Please wait.


2021-08-19 09:12:30.262833: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-19 09:12:33.536002: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-08-19 09:12:33.541534: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2000000000 Hz


Selecting highest scoring combination from each event.


100%|██████████████████████████████████████████████████████| 129355/129355 [16:39<00:00, 129.45it/s]


In [10]:
scores7 = combos7.scores_combo

In [None]:
combos8.apply_6j_model(tag)
scores8 = combos8.scores_combo

## Event Selection

The 6b-Model assigns a score to every six-jet combination in an event to identify the combination most likely to be correct, i.e. the one that contains all six signal b jets. Events fall into two categories: events that contain a correct combination (it exists in the event, though it is not necessarily assigned the highest score) and events that do not (no combination in the event is correct, so all of its combinations *should* be assigned a low score).
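
The fractions reported for a score cut can be illustrated on toy data. This is a minimal sketch under stated assumptions: the array layout (one row per event, one column per combination), the toy scores, and the function `selection_stats` are all hypothetical, and the definitions are one reading of the printed summary, not the implementation of `get_stats`.

```python
import numpy as np

# Toy inputs (hypothetical): 4 events, 7 six-jet combinations each.
scores = np.array([
    [0.10, 0.95, 0.20, 0.05, 0.30, 0.15, 0.25],  # signal combo ranked first
    [0.85, 0.40, 0.10, 0.20, 0.05, 0.15, 0.30],  # signal combo present, not ranked first
    [0.90, 0.10, 0.20, 0.05, 0.15, 0.25, 0.30],  # no signal combo, high best score
    [0.30, 0.10, 0.20, 0.05, 0.15, 0.25, 0.35],  # no signal combo, low best score
])
is_signal = np.zeros_like(scores, dtype=bool)
is_signal[0, 1] = True
is_signal[1, 1] = True

def selection_stats(scores, is_signal, cut):
    """Event-level fractions for a cut on the highest combination score."""
    rows = np.arange(len(scores))
    best = scores.argmax(axis=1)             # index of the top-scoring combo
    best_score = scores[rows, best]
    best_is_signal = is_signal[rows, best]   # top-scoring combo is the correct one
    has_signal = is_signal.any(axis=1)       # event contains a correct combo
    passes = best_score > cut
    return {
        "events_with_signal": has_signal.mean(),
        "signal_ranked_first": best_is_signal[has_signal].mean(),
        "removed_by_cut": 1.0 - passes.mean(),
        "passing_with_signal": has_signal[passes].mean(),
        "passing_signal_ranked_first": best_is_signal[passes].mean(),
    }

stats = selection_stats(scores, is_signal, cut=0.8)
for name, value in stats.items():
    print(f"{name}: {100 * value:.0f} %")
```

Note that the last two fractions are undefined (NaN) if no event passes the cut, so a real implementation would need a guard for that case.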

In [19]:
combos7.get_stats(0.8)

35 % of all events contain signal combos
80 % of events containing signal combo assigned highest score to signal combo
42 % of events removed by applying score cut 0.8
50 % of events above score cut 0.8 contain signal combo
42 % of events above score cut 0.8 contain signal combo assigned highest score


In [29]:
combos8.get_stats(0.8)

41% of all events contain signal combos
68% of events containing signal combo assigned highest score to signal combo
28% of events removed by applying score cut 0.8
50% of events above score cut 0.8 contain signal combo
36% of events above score cut 0.8 contain signal combo assigned highest score


## Score Distribution by Event
Each entry is the highest-scoring combination from one event.

In [27]:
from trsm import plot_highest_score, plot_combo_scores

In [26]:
fig, ax = plot_highest_score(combos7)


## Score Distribution by Combination

In [33]:
fig, ax = plot_combo_scores(combos7)


In [71]:
import vector

In [44]:
# Convert the six-jet invariant masses to NumPy and flatten to one value per combination
combo7_m = ak.to_numpy(combos7.sixjet_p4.mass)
combo7_m = combo7_m.reshape(combo7_m.shape[0])

In [47]:
from trsm import plot_combo_score_v_mass

In [50]:
fig, ax = plot_combo_score_v_mass(combos7)


In [52]:
from trsm import plot_highest_score_v_mass

In [58]:
fig, ax = plot_highest_score_v_mass(combos7)


In [13]:
score_bins = np.arange(0,1.01,0.01)

In [16]:
from numba import jit

# forceobj belongs to the decorator only; the function itself takes no arguments
@jit(forceobj=True)
def calculate_confusion_matrix():
    tpr_arr = []
    fpr_arr = []
    prec_arr = []
    recall_arr = []
    for i in score_bins:
        true_pos  = np.sum(scores7[combos7.signal_mask] > i)
        false_pos = np.sum(scores7[~combos7.signal_mask] > i)
        true_neg  = np.sum(scores7[~combos7.signal_mask] <= i)
        false_neg = np.sum(scores7[combos7.signal_mask] <= i)
        
        tpr = true_pos / (true_pos + false_neg) * 100
        fpr = false_pos / (false_pos + true_neg) * 100
        
        tpr_arr.append(tpr)
        fpr_arr.append(fpr)
        
        precision = true_pos / (true_pos + false_pos)
        recall = true_pos / (true_pos + false_neg)
        
        prec_arr.append(precision)
        recall_arr.append(recall)
        
        
    tpr = np.asarray(tpr_arr)
    fpr = np.asarray(fpr_arr)
    precision = np.asarray(prec_arr)
    recall = np.asarray(recall_arr)
    
    f1_score = 2 * precision * recall / (precision + recall)
    
    return tpr, fpr, precision, recall, f1_score
    
tpr, fpr, prec, rec, f1 = calculate_confusion_matrix()



In [17]:
auc = round(np.sum( (tpr[:-1] + tpr[1:]) / 2 * (fpr[:-1] - fpr[1:]) ) / 100, 3)

The F1 score is the harmonic mean of precision and recall.
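
As a worked example, the sketch below computes F1 from raw counts and returns NaN when precision or recall is undefined. The counts are made up for illustration and the function name `f1_score` is hypothetical. The undefined case is relevant here because at the highest thresholds there may be no combinations above the cut, so the precision becomes 0/0.

```python
import math

def f1_score(true_pos, false_pos, false_neg):
    """F1 from raw counts; NaN when precision or recall is undefined."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    if predicted_pos == 0 or actual_pos == 0:
        return math.nan
    precision = true_pos / predicted_pos
    recall = true_pos / actual_pos
    if precision + recall == 0:
        return math.nan
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Made-up counts: precision = 40/80 = 0.5, recall = 40/50 = 0.8
example = f1_score(true_pos=40, false_pos=40, false_neg=10)
# Nothing above threshold: precision is 0/0, so F1 is undefined
empty = f1_score(true_pos=0, false_pos=0, false_neg=50)
print(f"F1 = {example:.3f}, undefined case -> {empty}")
```

The harmonic mean stays below the arithmetic mean whenever precision and recall differ, so a threshold cannot score well on F1 by maximizing only one of the two.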

In [64]:
# fig, ax = plt.subplots()
# ax.set_title(r"$^7C_6$ Combinations ROC")

# ax.plot(score_bins, prec, label='precision')
# ax.plot(score_bins, rec, label='recall')
# ax.plot(score_bins, f1, label='F1')

# ax.legend()
# ax.set_xlabel('Score Threshold')

In [29]:
fig, ax = plt.subplots()
ax.set_title(r"$^7C_6$ Combinations G-Mean")

y = np.sqrt(tpr/100*(1-fpr/100))

ax.plot(score_bins, y)
ax.set_ylabel('G-Mean')
ax.set_xlabel('Score Threshold')


Text(0.5, 0, 'Score Threshold')

In [18]:
fig, ax = plt.subplots()
ax.set_title(r"$^7C_6$ Combinations ROC")

ax.plot(fpr, tpr)
# ax.scatter(fpr[::10], tpr[::10], s=6, color='orange')
ax.set_ylabel('True Positives %')
ax.set_xlabel('False Positives %')

props = dict(boxstyle='round', facecolor='white', alpha=0.5)
ax.text(0.75, 0.1, f"auc = {auc}%", transform=ax.transAxes, bbox=props)

fig.savefig('roc.pdf')
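
The area quoted on the ROC plot comes from a trapezoidal sum over the threshold scan. The sketch below shows the same sum on rates expressed as fractions rather than percentages, and checks it against two curves whose area is known. The function name `roc_auc` and the test curves are illustrative assumptions, not part of the analysis code.

```python
import numpy as np

def roc_auc(tpr, fpr):
    """Trapezoidal area under a ROC curve sampled at increasing thresholds.

    tpr and fpr are fractions in [0, 1]. fpr is expected to decrease as the
    threshold rises, so consecutive widths are taken as fpr[i] - fpr[i + 1].
    """
    tpr = np.asarray(tpr, dtype=float)
    fpr = np.asarray(fpr, dtype=float)
    return float(np.sum((tpr[:-1] + tpr[1:]) / 2 * (fpr[:-1] - fpr[1:])))

# Random classifier: tpr equals fpr at every threshold, area is 0.5
rates = np.linspace(1.0, 0.0, 101)
auc_random = roc_auc(rates, rates)

# Perfect classifier: fpr drops to 0 while tpr is still 1, area is 1
auc_perfect = roc_auc(tpr=[1.0, 1.0, 0.0], fpr=[1.0, 0.0, 0.0])

print(auc_random, auc_perfect)
```

When the rates are stored in percent, as in this notebook, the same sum is larger by a factor of 10⁴, so dividing by 100 leaves the area expressed in percent.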


In [31]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9,5))

n, b, _ = hist(ax, p4_7.mass[combos7.signal_mask], bins=np.linspace(400,900,100), label='Correct Combinations', color='limegreen')
n, b, _ = hist(ax, p4_7.mass[~combos7.signal_mask], bins=np.linspace(400,900,100), label='Incorrect Combinations', color='lightcoral')


ax.set_xlabel('Invariant Mass of 6-jet System [GeV]')
ax.set_ylabel('Count')
ax.legend(loc=2)

# No score cut is applied here; use a distinct name so later cells do not overwrite this figure
fig.savefig('inv_mass_all_combos.pdf')

plt.show()


In [32]:
len(evt_score_mask)

129355

In [33]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9,5))

n, b, _ = hist(ax, p4_7.mass[evt_score_mask][sgnl_evt], bins=np.linspace(400,900,100), label='Correct Highest Score', color='limegreen')
n, b, _ = hist(ax, p4_7.mass[evt_score_mask][~sgnl_evt], bins=np.linspace(400,900,100), label='Incorrect Highest Score', color='lightcoral')


ax.set_xlabel('Invariant Mass of 6-jet System [GeV]')
ax.set_ylabel('Count')
ax.legend(loc=2)

# Highest-scoring combinations before the score cut; saved under its own name
fig.savefig('inv_mass_highest_score.pdf')

plt.show()


In [35]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9,5))

score_cut_mask = evt_high_score > 0.8
sgnl_score_cut = sgnl_evt & score_cut_mask
bkgd_score_cut = (~sgnl_evt) & score_cut_mask

n, b, _ = hist(ax, p4_7.mass[evt_score_mask][sgnl_score_cut], bins=np.linspace(400,900,100), label='Correct Highest Score > 0.8', color='limegreen')
n, b, _ = hist(ax, p4_7.mass[evt_score_mask][bkgd_score_cut], bins=np.linspace(400,900,100), label='Incorrect Highest Score > 0.8', color='lightcoral')


ax.set_xlabel('Invariant Mass of 6-jet System [GeV]')
ax.set_ylabel('Count')
ax.legend(loc=2)

fig.savefig('inv_mass_after_score_cut.pdf')

plt.show()


In [62]:
# fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10,4))

# ax[0].set_title("Correct Combos")
# ax[1].set_title("Incorrect Combos")

# n, b, _ = hist(ax[0], p4_7.mass[combos7.signal_mask], bins=np.linspace(400,900,100), label='No cut', color='limegreen')
# n, b, _ = hist(ax[0], p4_7.mass[combos7.signal_mask][scores_7[combos7.signal_mask] > 0.8], bins=np.linspace(400,900,100), label='score > 0.8', color='darkgreen')
# n, b, _ = hist(ax[1], p4_7.mass[~combos7.signal_mask], bins=np.linspace(400,900,100), label='No cut', color='lightcoral')
# n, b, _ = hist(ax[1], p4_7.mass[~combos7.signal_mask][scores_7[~combos7.signal_mask] > 0.8], bins=np.linspace(400,900,100), label='score > 0.8', color='maroon')
# # n, xedges, yedges, imb = hist2d(ax[1], p4_7.mass[~combos7.signal_mask], scores_7[~combos7.signal_mask], xbins=np.linspace(0,2000,100))


# ax[0].set_xlabel('Invariant Mass of 6-jet System [GeV]')
# ax[1].set_xlabel('Invariant Mass of 6-jet System [GeV]')
# ax[0].set_ylabel('Count')
# ax[1].set_ylabel('Count')
# ax[0].legend()
# ax[1].legend()

# fig.savefig('inv_mass_postscorecut.pdf')

# plt.show()

In [63]:
# fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(6,5), sharex=True, gridspec_kw={'height_ratios':[3,1]})

# ns, b, _ = hist(ax[0], p4_7.mass[combos7.signal_mask][scores_7[combos7.signal_mask] > 0.8], bins=np.linspace(400,900,100), label='sgnl score > 0.8', color='darkgreen')
# nb, b, _ = hist(ax[0], p4_7.mass[~combos7.signal_mask][scores_7[~combos7.signal_mask] > 0.8], bins=np.linspace(400,900,100), label='bkgd score > 0.8', color='maroon')
# # n, xedges, yedges, imb = hist2d(ax[1], p4_7.mass[~combos7.signal_mask], scores_7[~combos7.signal_mask], xbins=np.linspace(0,2000,100))

# x = (b[1:] + b[:-1])/2

# ax[1].plot(x, ns/nb)
# ax[1].plot(x, np.ones_like(x), '--k', alpha=0.2)

# ax[1].set_xlabel('Invariant Mass of 6-jet System [GeV]')
# ax[0].set_ylabel('Count')
# ax[0].legend()

# ax[1].set_ylabel(r'Ratio $S/B_C$')

# plt.show()

In [39]:
# plt.close('all')

In [28]:
combos7.create_pairs(tag)

100%|██████████████████████████████████████████████████████| 129355/129355 [03:18<00:00, 652.62it/s]


In [41]:
combos7.pair_features.shape

(1940325, 9)
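
The first dimension of this shape is consistent with one row per jet pair drawn from a single six-jet combination in each event. That reading is an assumption based on the numbers alone, since `create_pairs` is not shown here. A quick check with the standard library:

```python
from itertools import combinations

n_events = 129355      # events reported by the progress bar
jets_in_combo = 6      # jets in the selected six-jet combination

# Every unordered pair of jets within one six-jet combination
pairs = list(combinations(range(jets_in_combo), 2))
n_rows = n_events * len(pairs)

print(len(pairs), "pairs per event ->", n_rows, "rows")
```

The second dimension, 9, would then be the number of input features per pair.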

In [30]:
combos7.apply_2j_model('20210817_4btag_req')

Applying 6b Model to combinations. Please wait.


100%|██████████████████████████████████████████████████████| 129355/129355 [21:14<00:00, 101.51it/s]


In [31]:
combos7.select_highest_scoring_pairs()

100%|██████████████████████████████████████████████████████| 129355/129355 [20:47<00:00, 103.66it/s]


In [46]:
# np.savez("../../inputs_2jet_train.npz", inputs=combos7.pair_features, targets=combos7.pair_target)

In [47]:
# pairs = pairs_2j(combos7)

In [48]:
len(combos7.all3Higgs_mask[combos7.all3Higgs_mask > 0])/len(combos7.all3Higgs_mask)

0.28535425766302036

In [49]:
combos7.pair_target[combos7.pair_target > -1]

array([0, 1, 1, ..., 2, 0, 1])

In [50]:
from matplotlib import ticker

In [51]:
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(10,3))

ax = axs[0]
ax.set_title(r"$^7C_6$ Combinations with Pairing, Normalized", pad=10)

bins = np.linspace(0,1.01,100)

scores_2_noscorecut = combos7.scores_pairs[:,0]


n_2_all, b_2_all, x_2_all = norm_hist(scores_2_noscorecut)
c_2_all, b_2_all, x_2_all = norm_hist(scores_2_noscorecut[combos7.pair_target > -1])
w_2_all, b_2_all, x_2_all = norm_hist(scores_2_noscorecut[combos7.pair_target == -1])

hist(ax, x_2_all, weights=n_2_all, bins=b_2_all, label='All pairs')
hist(ax, x_2_all, weights=c_2_all, bins=b_2_all, label='Correct pairs')
hist(ax, x_2_all, weights=w_2_all, bins=b_2_all, label='Incorrect pairs')
ax.legend(fontsize='small', loc=9)

ax.set_xlabel('Assigned Score')
ax.set_ylabel('AU')

textstr = f'Entries = {len(scores_2_noscorecut)}'
props = dict(boxstyle='round', facecolor='white', alpha=1)
ax.text(0.8, 1.02, textstr, transform=ax.transAxes, fontsize=9,
        verticalalignment='top', bbox=props)

ax = axs[1]
ax.set_title(r"$^7C_6$ Combinations with Pairing", pad=10)

formatter = ticker.ScalarFormatter(useMathText=True)
formatter.set_scientific(True) 
formatter.set_powerlimits((-1,1)) 
ax.yaxis.set_major_formatter(formatter) 

# Use the shared bin edges so the three histograms can be compared bin by bin
n_2_all, b_2_all = np.histogram(scores_2_noscorecut, bins=bins)
c_2_all, b_2_all = np.histogram(scores_2_noscorecut[combos7.pair_target > -1], bins=bins)
w_2_all, b_2_all = np.histogram(scores_2_noscorecut[combos7.pair_target == -1], bins=bins)

x_2_all = (b_2_all[1:] + b_2_all[:-1]) / 2

# hist(ax, x_2_all, weights=n_2_all, bins=b_2_all, label='All 6-jet combos')
hist(ax, x_2_all, weights=c_2_all, bins=b_2_all, label='Correct pairs')
hist(ax, x_2_all, weights=w_2_all, bins=b_2_all, label='Incorrect pairs')
ax.legend(fontsize='small', loc=9)

ax.set_xlabel('Assigned Score')
ax.set_ylabel('Number of Combinations Per Bin')

textstr = f'Entries = {len(scores_2_noscorecut)}'
props = dict(boxstyle='round', facecolor='white', alpha=1)
ax.text(0.8, 1.02, textstr, transform=ax.transAxes, fontsize=9,
        verticalalignment='top', bbox=props)

ax.text(0.3, 0.25, f'Ratio correct/total above 0.8 = {100*sum(c_2_all[x_2_all > 0.8])/sum(n_2_all[x_2_all > 0.8]):.0f}%', transform=ax.transAxes)
# ax.ticklabel_format(useMathText=True, useOffset=True)

plt.tight_layout()
plt.show()
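
The cell above calls `norm_hist`, which is neither defined nor imported anywhere in this notebook. Judging only from how its return values are used (weights, bin edges, bin centers passed on to `hist`), a stand-in could look like the sketch below. The signature, the default binning, and the choice to normalize to unit sum rather than unit area are all assumptions.

```python
import numpy as np

def norm_hist(values, bins=100):
    """Hypothetical stand-in: histogram whose bin contents sum to 1.

    Returns (weights, bin_edges, bin_centers), matching how the notebook
    unpacks the result.
    """
    counts, edges = np.histogram(values, bins=bins)
    total = counts.sum()
    # Avoid 0/0 for an empty selection
    weights = counts / total if total > 0 else counts.astype(float)
    centers = (edges[1:] + edges[:-1]) / 2
    return weights, edges, centers

# Toy scores (random, for illustration only)
rng = np.random.default_rng(0)
toy_scores = rng.uniform(0, 1, size=1000)
n, b, x = norm_hist(toy_scores, bins=np.linspace(0, 1, 101))
print(n.sum(), len(b), len(x))
```

Passing explicit bin edges, as in the usage line, keeps histograms of different subsets aligned so that they can be overlaid or divided bin by bin.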


In [52]:
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(10,3))

ax = axs[0]
ax.set_title(r"$^7C_6$ Combinations with Pairing, Normalized")

bins = np.linspace(0,1.01,100)

scores_2 = combos7.scores_pairs[:,0]


n_7, b_7, x_7 = norm_hist(scores_2)
c_n_7, b_7, x_7 = norm_hist(scores_2[combos7.pair_target > -1])
w_n_7, b_7, x_7 = norm_hist(scores_2[combos7.pair_target == -1])

# hist(ax, x_7, weights=n_7, bins=b_7, label='All pairs')
hist(ax, x_7, weights=c_n_7, bins=b_7, label='Correct pairs')
hist(ax, x_7, weights=w_n_7, bins=b_7, label='Incorrect pairs')
ax.legend(fontsize='small', loc=9)

ax.set_xlabel('Assigned Score')
ax.set_ylabel('AU')

textstr = f'Entries = {len(scores_2)}'
props = dict(boxstyle='round', facecolor='white', alpha=1)
ax.text(0.8, 1.02, textstr, transform=ax.transAxes, fontsize=9,
        verticalalignment='top', bbox=props)

ax = axs[1]
ax.set_title(r"$^7C_6$ Combinations with Pairing")

# Use the shared bin edges so the three histograms can be compared bin by bin
n_7, b_7 = np.histogram(scores_2, bins=bins)
c_n_7, b_7 = np.histogram(scores_2[combos7.pair_target > -1], bins=bins)
w_n_7, b_7 = np.histogram(scores_2[combos7.pair_target == -1], bins=bins)

x_7 = (b_7[1:] + b_7[:-1]) / 2

# hist(ax, x_7, weights=n_7, bins=b_7, label='All 6-jet combos')
hist(ax, x_7, weights=c_n_7, bins=b_7, label='Correct pairs')
hist(ax, x_7, weights=w_n_7, bins=b_7, label='Incorrect pairs')
ax.legend(fontsize='small', loc=9)

ax.set_xlabel('Assigned Score')
ax.set_ylabel('Number of Combinations Per Bin')

textstr = f'Entries = {len(scores_2)}'
props = dict(boxstyle='round', facecolor='white', alpha=1)
ax.text(0.8, 1.02, textstr, transform=ax.transAxes, fontsize=9,
        verticalalignment='top', bbox=props)

ax.text(0.3, 50000, f'Ratio correct/total above 0.8 = {100*sum(c_n_7[x_7 > 0.8])/sum(n_7[x_7 > 0.8]):.0f}%')

plt.tight_layout()
plt.show()
