# 6b-Model and 2b-Model Application

This notebook analyzes the performance of the 6b-Model and the 2b-Model on selected simulated events.

`TRSM` is a class from the custom `trsm` module, created to handle the selected events.

In [1]:
# Allow for automatic reload of modules (useful for development)
%load_ext autoreload
%autoreload 2

In [2]:
from trsm import TRSM, combos_6j

In [3]:
import os
os.getcwd()

'/uscms_data/d3/srosenzw/workarea/higgs/sixb_analysis/CMSSW_10_2_18/src/sixb/6jet_classifier/analyzers'

In [4]:
import numpy as np
import awkward as ak

In [5]:
%matplotlib widget
import matplotlib.pyplot as plt
from consistent_plots import hist, hist2d

## Load events, apply model

Load the file, apply the `TRSM` class, build the six-jet combinations, and apply the 6b-Model to assign a score to every combination in each event.
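As a rough illustration of the combination step, the sketch below enumerates the six-jet combinations that can be drawn from seven jets with `itertools.combinations`. It is not the `combos_6j` implementation, which is not shown in this notebook, and the jet indices are placeholders. Seven jets give 7 combinations per event, which is consistent with the 905485 combinations reported for 129355 events in the `combos_6j` output.

```python
from itertools import combinations
from math import comb

n_jets = 7  # jets kept per event (assumed from the call combos_6j(trsm_set, 7))
jet_indices = range(n_jets)

# Every way of choosing 6 of the 7 jets; each tuple is one candidate combination
six_jet_combos = list(combinations(jet_indices, 6))

for combo in six_jet_combos:
    print(combo)

# Each combination omits exactly one jet, so there are C(7, 6) = 7 of them
n_combos_per_event = comb(n_jets, 6)

n_events = 129355
total_combos = n_events * n_combos_per_event
print(total_combos)
```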

In [6]:
filename = '../../NMSSM_XYH_YToHH_6b_MX_700_MY_400_6jet_testing_set_7jets_2021Aug.root'
trsm_set = TRSM(filename=filename)

In [7]:
tag = '20210816_5btag_req'

In [8]:
combos7 = combos_6j(trsm_set, 7)

100%|██████████████████████████████████████████████████████| 129355/129355 [04:02<00:00, 532.70it/s]
ic| combo_p4.pt: <Array [[120, 115, 76, ... 41.7, 40, 35.9]] type='905485 * var * float64'>
ic| type(combo_p4.pt): <class 'awkward.highlevel.Array'>


Total events chosen: 129355


In [9]:
combos7.apply_6j_model(tag)
scores_7 = combos7.scores_combo

2021-08-18 08:55:00.984662: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-18 08:55:08.881742: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 108658200 exceeds 10% of free system memory.
2021-08-18 08:55:09.044102: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-08-18 08:55:09.063731: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2000000000 Hz


## Event Selection

The 6b-Model assigns a score to every six-jet combination in each event in order to identify the combination most likely to be correct, i.e. the one that contains all six signal b jets. Events are separated into two categories. The first contains events with a correct combination (it exists in the event, but is not necessarily assigned the highest score). The second contains events without a correct combination, so every combination in these events *should* be assigned a low score.
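The selection logic can be sketched on toy data as follows. This is an illustration only: the scores and the `correct_idx` array are made up, and the names `has_correct` and `correct_is_best` are hypothetical stand-ins for what `sgnl_evt` and `sgnl_high_score` appear to represent, judging from the comments in `get_stats`.

```python
import numpy as np

# Toy scores: 4 events x 7 combinations (values are made up for illustration)
scores = np.array([
    [0.10, 0.92, 0.30, 0.05, 0.20, 0.15, 0.40],
    [0.60, 0.20, 0.10, 0.85, 0.30, 0.25, 0.05],
    [0.15, 0.10, 0.35, 0.20, 0.05, 0.30, 0.25],
    [0.70, 0.10, 0.20, 0.30, 0.95, 0.15, 0.05],
])

# Index of the correct combination in each event, or -1 if the event has none
correct_idx = np.array([1, 0, -1, 4])

best_idx = scores.argmax(axis=1)   # highest-scoring combination per event
best_score = scores.max(axis=1)    # its score

has_correct = correct_idx >= 0     # event contains a correct combination
correct_is_best = has_correct & (best_idx == correct_idx)

print(best_idx)         # [1 3 2 4]
print(has_correct)      # [ True  True False  True]
print(correct_is_best)  # [ True False False  True]
```

In the second toy event the correct combination exists but is outscored, which is the case the first category allows for.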

In [10]:
combos7.select_highest_scoring_combos()

100%|██████████████████████████████████████████████████████| 129355/129355 [05:58<00:00, 360.40it/s]


In [11]:
evt_score_mask = combos7.high_score_combo_mask
evt_high_score = combos7.evt_highest_score

In [12]:
sgnl_evt = combos7.sgnl_evt
sgnl_high_score = combos7.sgnl_high_score

In [13]:
def get_stats(score_cut = 0):
    # what percentage of events contain signal jets
    percent_signal = np.sum(sgnl_evt)/len(sgnl_evt) * 100
    
    # what percentage of signal events are the highest scoring
    percent_signal_highest_score = np.sum(sgnl_high_score)/np.sum(sgnl_evt) * 100
    
    # score cut removes this percentage
    percent_removal = (1 - np.sum(evt_high_score > score_cut)/len(evt_high_score)) * 100
    
    # of events with highest score above threshold, what percentage of those contain signal combo?
    percent_high_score_are_signal = np.sum(np.logical_and(sgnl_evt, evt_high_score > score_cut))/np.sum(evt_high_score > score_cut) * 100
    
    # of events with highest score above threshold, what percentage of those are signal combos with the highest score in the event?
    percent_high_score_signal_with_highest_score = np.sum(np.logical_and(sgnl_high_score, evt_high_score > score_cut))/np.sum(evt_high_score > score_cut) * 100
    
    print(f"{percent_signal:.0f}% of all events contain signal combos")
    print(f"{percent_signal_highest_score:.0f}% of events containing signal combo assigned highest score to signal combo")
    print(f"{percent_removal:.0f}% of events removed by applying score cut {score_cut}")
    print(f"{percent_high_score_are_signal:.0f}% of events above score cut {score_cut} contain signal combo")
    print(f"{percent_high_score_signal_with_highest_score:.0f}% of events above score cut {score_cut} contain signal combo assigned highest score")

In [14]:
get_stats(0.8)

35% of all events contain signal combos
80% of events containing signal combo assigned highest score to signal combo
42% of events removed by applying score cut 0.8
50% of events above score cut 0.8 contain signal combo
42% of events above score cut 0.8 contain signal combo assigned highest score
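The printed percentages can be combined to estimate how many correctly ranked events survive the cut. The sketch below uses only the rounded values printed above, so the result is approximate (each input carries a rounding error of up to half a percentage point).

```python
# Rounded percentages printed by get_stats(0.8), converted to fractions
frac_signal = 0.35            # events containing a signal combo
frac_signal_top = 0.80        # of those, the signal combo has the highest score
frac_removed = 0.42           # events removed by the cut
frac_pass_signal_top = 0.42   # of passing events, the signal combo is top scored

# Fraction of all events in which the signal combo is the top-scoring one
signal_top_all = frac_signal * frac_signal_top

# Fraction of all events that pass the cut with the signal combo on top
signal_top_pass = (1 - frac_removed) * frac_pass_signal_top

# Approximate efficiency of the cut on correctly ranked events
efficiency = signal_top_pass / signal_top_all
print(f"{signal_top_all:.2f} {signal_top_pass:.2f} {efficiency:.2f}")
```

With these inputs, roughly 87% of the events whose signal combination is ranked first are retained by the 0.8 cut.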


## Score Distribution by Event
Each entry is the highest score assigned to any combination in an event.

In [16]:
fig, ax = plt.subplots()

score_bins = np.arange(0, 1.01, 0.01)

n_signal, edges = np.histogram(evt_high_score[sgnl_evt], bins=score_bins)
n_bkgd, edges   = np.histogram(evt_high_score[~sgnl_evt], bins=score_bins)

x = (edges[1:] + edges[:-1])/2

n_signal = n_signal / np.sum(n_signal)
n_bkgd = n_bkgd / np.sum(n_bkgd)

n_signal, edges, _ = hist(ax, x, weights=n_signal, bins=score_bins, label='Events with correct combos')
n_bkgd, edges, _ = hist(ax, x, weights=n_bkgd, bins=score_bins, label='Events with no correct combos')

ax.legend(loc=2)
ax.set_xlabel('Highest Assigned Score Amongst Combinations in Events')
ax.set_ylabel('AU')
ax.set_title('Distribution of Highest Scoring Combination in Events')

fig.savefig('evt_max_score_dist.pdf')


In [17]:
fig, ax = plt.subplots()

n_signal, edges = np.histogram(evt_high_score[sgnl_evt], bins=score_bins)
n_bkgd, edges   = np.histogram(evt_high_score[sgnl_evt & sgnl_high_score], bins=score_bins)

x = (edges[1:] + edges[:-1])/2

n_signal = n_signal / np.sum(n_signal)
n_bkgd = n_bkgd / np.sum(n_bkgd)

n_signal, edges, _ = hist(ax, x, weights=n_signal, bins=score_bins, label='Events with correct combos')
n_bkgd, edges, _ = hist(ax, x, weights=n_bkgd, bins=score_bins, label='Events with correct combo scored highest')

ax.legend(loc=2)
ax.set_xlabel('Highest Assigned Score Amongst Combinations in Events')
ax.set_ylabel('AU')
ax.set_title('Distribution of Highest Scoring Combination in Events')


In [18]:
def norm_hist(arr, bins=100):
    n, b = np.histogram(arr, bins=bins)
    x = (b[:-1] + b[1:]) / 2
    
    return n/n.max(), b, x
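Note that `norm_hist` scales each histogram to its tallest bin, whereas the event-level plots above divide by the total number of entries. The sketch below contrasts the two conventions on made-up scores; neither array comes from the notebook's data.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
scores = rng.beta(2, 5, size=5000)   # toy scores in (0, 1)

counts, edges = np.histogram(scores, bins=50, range=(0, 1))

peak_norm = counts / counts.max()    # tallest bin becomes 1 (as in norm_hist)
sum_norm = counts / counts.sum()     # bin contents sum to 1

print(peak_norm.max(), sum_norm.sum())
```

Peak normalization makes shapes easy to overlay but hides relative population sizes, so the two kinds of plot should not be compared bin by bin.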

## Score Distribution by Combination

In [19]:
fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(6,7))

ax = axs[0]
ax.set_title(r"$^7C_6$ Combinations Normalized")

n_7, b_7, x_7 = norm_hist(scores_7)
c_7, b_7, x_7 = norm_hist(scores_7[combos7.sgnl_mask])
w_7, b_7, x_7 = norm_hist(scores_7[~combos7.sgnl_mask])

hist(ax, x_7, weights=n_7, bins=b_7, label='All 6-jet combos')
hist(ax, x_7, weights=c_7, bins=b_7, label='Correct 6-jet combo')
hist(ax, x_7, weights=w_7, bins=b_7, label='Incorrect 6-jet combo')
ax.legend(fontsize='small', loc=9)

ax.set_xlabel('Assigned Score')
ax.set_ylabel('AU')

textstr = f'Entries = {len(scores_7)}'
props = dict(boxstyle='round', facecolor='white', alpha=1)
ax.text(0.8, 1.02, textstr, transform=ax.transAxes, fontsize=9,
        verticalalignment='top', bbox=props)

ax = axs[1]
ax.set_title(r"$^7C_6$ Combinations")

n_7, b_7 = np.histogram(scores_7, bins=100)
c_n_7, b_7 = np.histogram(scores_7[combos7.sgnl_mask], bins=100)
w_n_7, b_7 = np.histogram(scores_7[~combos7.sgnl_mask], bins=100)

x_7 = (b_7[1:] + b_7[:-1]) / 2

hist(ax, x_7, weights=n_7, bins=b_7, label='All 6-jet combos')
hist(ax, x_7, weights=c_n_7, bins=b_7, label='Correct 6-jet combo')
hist(ax, x_7, weights=w_n_7, bins=b_7, label='Incorrect 6-jet combo')
ax.legend(fontsize='small', loc=9)

ax.set_xlabel('Assigned Score')
ax.set_ylabel('Number of Combinations Per Bin')

textstr = f'Entries = {len(scores_7)}'
props = dict(boxstyle='round', facecolor='white', alpha=1)
ax.text(0.8, 1.02, textstr, transform=ax.transAxes, fontsize=9,
        verticalalignment='top', bbox=props)

plt.tight_layout()
fig.savefig('score_dist.pdf')
plt.show()


In [20]:
pt_7  = combos7.combo_features[:,0:6]
eta_7 = combos7.combo_features[:,6:12]
phi_7 = combos7.combo_features[:,12:18]
m_7   = np.ones(pt_7.shape)*4

In [21]:
import vector

In [22]:
p4_7_0 = vector.obj(pt=pt_7[:,0], eta=eta_7[:,0], phi=phi_7[:,0], mass=m_7[:,0])
p4_7_1 = vector.obj(pt=pt_7[:,1], eta=eta_7[:,1], phi=phi_7[:,1], mass=m_7[:,1])
p4_7_2 = vector.obj(pt=pt_7[:,2], eta=eta_7[:,2], phi=phi_7[:,2], mass=m_7[:,2])
p4_7_3 = vector.obj(pt=pt_7[:,3], eta=eta_7[:,3], phi=phi_7[:,3], mass=m_7[:,3])
p4_7_4 = vector.obj(pt=pt_7[:,4], eta=eta_7[:,4], phi=phi_7[:,4], mass=m_7[:,4])
p4_7_5 = vector.obj(pt=pt_7[:,5], eta=eta_7[:,5], phi=phi_7[:,5], mass=m_7[:,5])

In [23]:
p4_7 = p4_7_0 + p4_7_1 + p4_7_2 + p4_7_3 + p4_7_4 + p4_7_5
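For reference, the invariant mass of the summed four-vectors can also be computed directly with NumPy. This is a minimal sketch assuming the standard collider coordinate relations (px = pt cos φ, py = pt sin φ, pz = pt sinh η); the helper `invariant_mass` is hypothetical and is not part of `vector` or `trsm`.

```python
import numpy as np

def invariant_mass(pt, eta, phi, mass):
    """Invariant mass of the system formed by summing the input four-vectors.

    Each argument has shape (n_combos, n_jets); the sum runs over the jet axis.
    """
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    energy = np.sqrt(px**2 + py**2 + pz**2 + mass**2)

    # Sum the components over the jets in each combination
    e_tot = energy.sum(axis=1)
    px_tot, py_tot, pz_tot = px.sum(axis=1), py.sum(axis=1), pz.sum(axis=1)

    m2 = e_tot**2 - (px_tot**2 + py_tot**2 + pz_tot**2)
    return np.sqrt(np.clip(m2, 0, None))  # guard against tiny negative round-off

# Toy check: two massless back-to-back jets with pt = 50 at eta = 0
pt = np.array([[50.0, 50.0]])
eta = np.array([[0.0, 0.0]])
phi = np.array([[0.0, np.pi]])
mass = np.zeros_like(pt)

print(invariant_mass(pt, eta, phi, mass))  # approximately [100.]
```

In the notebook's setting the inputs would be the `(n_combos, 6)` arrays `pt_7`, `eta_7`, `phi_7`, and `m_7`.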

In [60]:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10,4))

fig.suptitle("Combination Analysis")

ax[0].set_title("Signal 7 Combos")
ax[1].set_title("Background 7 Combos")

n, xedges, yedges, ims = hist2d(ax[0], p4_7.mass[combos7.sgnl_mask], scores_7[combos7.sgnl_mask], xbins=np.linspace(400,900,100))
n, xedges, yedges, imb = hist2d(ax[1], p4_7.mass[~combos7.sgnl_mask], scores_7[~combos7.sgnl_mask], xbins=np.linspace(0,2000,100))

plt.colorbar(ims, ax=ax[0])
plt.colorbar(imb, ax=ax[1])

ax[0].set_xlabel('Invariant Mass of 6-jet System [GeV]')
ax[1].set_xlabel('Invariant Mass of 6-jet System [GeV]')
ax[0].set_ylabel('Assigned Score')
ax[1].set_ylabel('Assigned Score')

fig.savefig('score_v_mass.pdf')

plt.show()


In [61]:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10,4))

fig.suptitle("Event Analysis")

ax[0].set_title("Combos with Highest Score in\nEvents with Signal Combo")
ax[1].set_title("Combos with Highest Score in\nEvents without Signal Combo")

n, xedges, yedges, ims = hist2d(ax[0], p4_7.mass[evt_score_mask][sgnl_evt], evt_high_score[sgnl_evt], xbins=np.linspace(400,900,100))
n, xedges, yedges, imb = hist2d(ax[1], p4_7.mass[evt_score_mask][~sgnl_evt], evt_high_score[~sgnl_evt], xbins=np.linspace(0,2000,100))

plt.colorbar(ims, ax=ax[0])
plt.colorbar(imb, ax=ax[1])

ax[0].set_xlabel('Invariant Mass of 6-jet System [GeV]')
ax[1].set_xlabel('Invariant Mass of 6-jet System [GeV]')
ax[0].set_ylabel('Assigned Score')
ax[1].set_ylabel('Assigned Score')

fig.savefig('score_v_mass_by_event.pdf')  # separate name so the combination-level plot is not overwritten

plt.show()


In [25]:
score_bins = np.arange(0,1.01,0.01)

In [26]:
from numba import jit

@jit(forceobj=True)
def calculate_confusion_matrix():
    tpr_arr = []
    fpr_arr = []
    prec_arr = []
    recall_arr = []
    for i in score_bins:
        true_pos  = np.sum(scores_7[combos7.sgnl_mask] > i)
        false_pos = np.sum(scores_7[~combos7.sgnl_mask] > i)
        true_neg  = np.sum(scores_7[~combos7.sgnl_mask] <= i)
        false_neg = np.sum(scores_7[combos7.sgnl_mask] <= i)
        
        tpr = true_pos / (true_pos + false_neg) * 100
        fpr = false_pos / (false_pos + true_neg) * 100
        
        tpr_arr.append(tpr)
        fpr_arr.append(fpr)
        
        precision = true_pos / (true_pos + false_pos)
        recall = true_pos / (true_pos + false_neg)
        
        prec_arr.append(precision)
        recall_arr.append(recall)
        
        
    tpr = np.asarray(tpr_arr)
    fpr = np.asarray(fpr_arr)
    precision = np.asarray(prec_arr)
    recall = np.asarray(recall_arr)
    
    f1_score = 2 * precision * recall / (precision + recall)
    
    return tpr, fpr, precision, recall, f1_score
    
tpr, fpr, prec, rec, f1 = calculate_confusion_matrix()



In [27]:
auc = round(np.sum( (tpr[:-1] + tpr[1:]) / 2 * (fpr[:-1] - fpr[1:]) ) / 100, 3)
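The threshold scan and the trapezoidal AUC can be checked on toy data. The sketch below works in fractions rather than percentages and vectorizes the loop with broadcasting; `roc_points` and `trapezoid_auc` are hypothetical helper names, and the scores and labels are made up.

```python
import numpy as np

def roc_points(scores, is_signal, thresholds):
    """True- and false-positive rates (as fractions) for each threshold."""
    scores = np.asarray(scores)
    is_signal = np.asarray(is_signal, dtype=bool)
    # Broadcast: rows are thresholds, columns are combinations
    passed = scores[None, :] > np.asarray(thresholds)[:, None]
    tpr = passed[:, is_signal].mean(axis=1)
    fpr = passed[:, ~is_signal].mean(axis=1)
    return tpr, fpr

def trapezoid_auc(tpr, fpr):
    """Area under the ROC curve; fpr decreases as the threshold increases."""
    return np.sum((tpr[:-1] + tpr[1:]) / 2 * (fpr[:-1] - fpr[1:]))

# Toy data: a perfectly separating classifier should give an AUC of 1
scores = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
labels = np.array([1, 1, 1, 0, 0, 0])
thresholds = np.linspace(0, 1, 101)

tpr_toy, fpr_toy = roc_points(scores, labels, thresholds)
auc_toy = trapezoid_auc(tpr_toy, fpr_toy)
print(auc_toy)
```

Because the false-positive rate falls as the threshold rises, the width of each trapezoid is written as `fpr[:-1] - fpr[1:]`, matching the sign convention of the cell above.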

F1 is the harmonic mean of precision and recall.
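Written out in terms of the true positives (TP), false positives (FP), and false negatives (FN) counted at a given score threshold, the quantities computed in `calculate_confusion_matrix` are:

$$
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} = \frac{2\,TP}{2\,TP + FP + FN}
$$

Precision is undefined at any threshold where no combination passes ($TP + FP = 0$), which can happen at the upper end of the scan.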

In [64]:
# fig, ax = plt.subplots()
# ax.set_title(r"$^7C_6$ Combinations ROC")

# ax.plot(score_bins, prec, label='precision')
# ax.plot(score_bins, rec, label='recall')
# ax.plot(score_bins, f1, label='F1')

# ax.legend()
# ax.set_xlabel('Score Threshold')

In [29]:
fig, ax = plt.subplots()
ax.set_title(r"$^7C_6$ Combinations G-Mean")

y = np.sqrt(tpr/100*(1-fpr/100))

ax.plot(score_bins, y)
ax.set_ylabel('G-Mean')
ax.set_xlabel('Score Threshold')
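The G-mean is the geometric mean of the true-positive rate and the true-negative rate (1 - FPR), and its maximum is one common way to pick a working point. A minimal sketch with made-up rates (fractions, not percentages) follows; none of the numbers come from this notebook.

```python
import numpy as np

# Toy rates (fractions) at five thresholds; values are made up for illustration
thresholds = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
tpr_toy = np.array([1.0, 0.95, 0.80, 0.50, 0.0])
fpr_toy = np.array([1.0, 0.60, 0.20, 0.05, 0.0])

# Geometric mean of sensitivity (TPR) and specificity (1 - FPR)
g_mean = np.sqrt(tpr_toy * (1 - fpr_toy))

best = np.argmax(g_mean)
print(thresholds[best], round(float(g_mean[best]), 3))
```

Applied to the notebook's arrays, the equivalent would be `score_bins[np.argmax(y)]`.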


In [30]:
fig, ax = plt.subplots()
ax.set_title(r"$^7C_6$ Combinations ROC")

ax.plot(fpr, tpr)
# ax.scatter(fpr[::10], tpr[::10], s=6, color='orange')
ax.set_ylabel('True Positive Rate [%]')
ax.set_xlabel('False Positive Rate [%]')

props = dict(boxstyle='round', facecolor='white', alpha=0.5)
ax.text(0.75, 0.1, f"auc = {auc}%", transform=ax.transAxes, bbox=props)

fig.savefig('roc.pdf')


In [31]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9,5))

n, b, _ = hist(ax, p4_7.mass[combos7.sgnl_mask], bins=np.linspace(400,900,100), label='Correct Combinations', color='limegreen')
n, b, _ = hist(ax, p4_7.mass[~combos7.sgnl_mask], bins=np.linspace(400,900,100), label='Incorrect Combinations', color='lightcoral')


ax.set_xlabel('Invariant Mass of 6-jet System [GeV]')
ax.set_ylabel('Count')
ax.legend(loc=2)

fig.savefig('inv_mass_all_combos.pdf')  # no score cut applied here; separate name avoids overwriting

plt.show()


In [32]:
len(evt_score_mask)

129355

In [33]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9,5))

n, b, _ = hist(ax, p4_7.mass[evt_score_mask][sgnl_evt], bins=np.linspace(400,900,100), label='Correct Highest Score', color='limegreen')
n, b, _ = hist(ax, p4_7.mass[evt_score_mask][~sgnl_evt], bins=np.linspace(400,900,100), label='Incorrect Highest Score', color='lightcoral')


ax.set_xlabel('Invariant Mass of 6-jet System [GeV]')
ax.set_ylabel('Count')
ax.legend(loc=2)

fig.savefig('inv_mass_highest_score.pdf')  # no score cut applied here; separate name avoids overwriting

plt.show()


In [35]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9,5))

score_cut_mask = evt_high_score > 0.8
sgnl_score_cut = sgnl_evt & score_cut_mask
bkgd_score_cut = (~sgnl_evt) & score_cut_mask

n, b, _ = hist(ax, p4_7.mass[evt_score_mask][sgnl_score_cut], bins=np.linspace(400,900,100), label='Correct Highest Score > 0.8', color='limegreen')
n, b, _ = hist(ax, p4_7.mass[evt_score_mask][bkgd_score_cut], bins=np.linspace(400,900,100), label='Incorrect Highest Score > 0.8', color='lightcoral')


ax.set_xlabel('Invariant Mass of 6-jet System [GeV]')
ax.set_ylabel('Count')
ax.legend(loc=2)

fig.savefig('inv_mass_after_score_cut.pdf')

plt.show()


In [62]:
# fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10,4))

# ax[0].set_title("Correct Combos")
# ax[1].set_title("Incorrect Combos")

# n, b, _ = hist(ax[0], p4_7.mass[combos7.sgnl_mask], bins=np.linspace(400,900,100), label='No cut', color='limegreen')
# n, b, _ = hist(ax[0], p4_7.mass[combos7.sgnl_mask][scores_7[combos7.sgnl_mask] > 0.8], bins=np.linspace(400,900,100), label='score > 0.8', color='darkgreen')
# n, b, _ = hist(ax[1], p4_7.mass[~combos7.sgnl_mask], bins=np.linspace(400,900,100), label='No cut', color='lightcoral')
# n, b, _ = hist(ax[1], p4_7.mass[~combos7.sgnl_mask][scores_7[~combos7.sgnl_mask] > 0.8], bins=np.linspace(400,900,100), label='score > 0.8', color='maroon')
# # n, xedges, yedges, imb = hist2d(ax[1], p4_7.mass[~combos7.sgnl_mask], scores_7[~combos7.sgnl_mask], xbins=np.linspace(0,2000,100))


# ax[0].set_xlabel('Invariant Mass of 6-jet System [GeV]')
# ax[1].set_xlabel('Invariant Mass of 6-jet System [GeV]')
# ax[0].set_ylabel('Count')
# ax[1].set_ylabel('Count')
# ax[0].legend()
# ax[1].legend()

# fig.savefig('inv_mass_postscorecut.pdf')

# plt.show()

In [63]:
# fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(6,5), sharex=True, gridspec_kw={'height_ratios':[3,1]})

# ns, b, _ = hist(ax[0], p4_7.mass[combos7.sgnl_mask][scores_7[combos7.sgnl_mask] > 0.8], bins=np.linspace(400,900,100), label='sgnl score > 0.8', color='darkgreen')
# nb, b, _ = hist(ax[0], p4_7.mass[~combos7.sgnl_mask][scores_7[~combos7.sgnl_mask] > 0.8], bins=np.linspace(400,900,100), label='bkgd score > 0.8', color='maroon')
# # n, xedges, yedges, imb = hist2d(ax[1], p4_7.mass[~combos7.sgnl_mask], scores_7[~combos7.sgnl_mask], xbins=np.linspace(0,2000,100))

# x = (b[1:] + b[:-1])/2

# ax[1].plot(x, ns/nb)
# ax[1].plot(x, np.ones_like(x), '--k', alpha=0.2)

# ax[1].set_xlabel('Invariant Mass of 6-jet System [GeV]')
# ax[0].set_ylabel('Count')
# ax[0].legend()

# ax[1].set_ylabel(r'Ratio $S/B_C$')

# plt.show()

In [39]:
# plt.close('all')

In [40]:
combos7.create_pairs(tag)
# combos7.create_pairs(tag)

100%|██████████████████████████████████████████████████████| 129355/129355 [06:08<00:00, 351.29it/s]


In [41]:
combos7.pair_features.shape

(1940325, 9)
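The row count is consistent with 15 jet pairs per event, since six jets can be paired in C(6, 2) = 15 ways and 15 × 129355 = 1940325. This is an inference from the shape alone; how `create_pairs` actually builds its rows is not shown here. The sketch below just reproduces the counting.

```python
from itertools import combinations

jets = range(6)  # the six jets of the selected combination (placeholder indices)

# All unordered jet pairs that could be tested as Higgs candidates
pairs = list(combinations(jets, 2))
print(len(pairs))

n_events = 129355
n_rows = n_events * len(pairs)
print(n_rows)
```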

In [42]:
combos7.apply_2j_model('20210817_4btag_req')

In [45]:
combos7.select_highest_scoring_pairs()

100%|███████████████████████████████████████████████████████| 129355/129355 [26:05<00:00, 82.61it/s]


In [46]:
# np.savez("../../inputs_2jet_train.npz", inputs=combos7.pair_features, targets=combos7.pair_target)

In [47]:
# pairs = pairs_2j(combos7)

In [48]:
len(combos7.all3Higgs_mask[combos7.all3Higgs_mask > 0])/len(combos7.all3Higgs_mask)

0.28535425766302036

In [49]:
combos7.pair_target[combos7.pair_target > -1]

array([0, 1, 1, ..., 2, 0, 1])

In [50]:
from matplotlib import ticker

In [51]:
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(10,3))

ax = axs[0]
ax.set_title(r"$^7C_6$ Combinations with Pairing, Normalized", pad=10)

bins = np.linspace(0,1.01,100)

scores_2_noscorecut = combos7.scores_pairs[:,0]


n_2_all, b_2_all, x_2_all = norm_hist(scores_2_noscorecut)
c_2_all, b_2_all, x_2_all = norm_hist(scores_2_noscorecut[combos7.pair_target > -1])
w_2_all, b_2_all, x_2_all = norm_hist(scores_2_noscorecut[combos7.pair_target == -1])

hist(ax, x_2_all, weights=n_2_all, bins=b_2_all, label='All pairs')
hist(ax, x_2_all, weights=c_2_all, bins=b_2_all, label='Correct pairs')
hist(ax, x_2_all, weights=w_2_all, bins=b_2_all, label='Incorrect pairs')
ax.legend(fontsize='small', loc=9)

ax.set_xlabel('Assigned Score')
ax.set_ylabel('AU')

textstr = f'Entries = {len(scores_2_noscorecut)}'
props = dict(boxstyle='round', facecolor='white', alpha=1)
ax.text(0.8, 1.02, textstr, transform=ax.transAxes, fontsize=9,
        verticalalignment='top', bbox=props)

ax = axs[1]
ax.set_title(r"$^7C_6$ Combinations with Pairing", pad=10)

formatter = ticker.ScalarFormatter(useMathText=True)
formatter.set_scientific(True) 
formatter.set_powerlimits((-1,1)) 
ax.yaxis.set_major_formatter(formatter) 

n_2_all, b_2_all = np.histogram(scores_2_noscorecut, bins=100)
c_2_all, b_2_all = np.histogram(scores_2_noscorecut[combos7.pair_target > -1], bins=100)
w_2_all, b_2_all = np.histogram(scores_2_noscorecut[combos7.pair_target == -1], bins=100)

x_2_all = (b_2_all[1:] + b_2_all[:-1]) / 2

# hist(ax, x_2_all, weights=n_2_all, bins=b_2_all, label='All 6-jet combos')
hist(ax, x_2_all, weights=c_2_all, bins=b_2_all, label='Correct pairs')
hist(ax, x_2_all, weights=w_2_all, bins=b_2_all, label='Incorrect pairs')
ax.legend(fontsize='small', loc=9)

ax.set_xlabel('Assigned Score')
ax.set_ylabel('Number of Combinations Per Bin')

textstr = f'Entries = {len(scores_2_noscorecut)}'
props = dict(boxstyle='round', facecolor='white', alpha=1)
ax.text(0.8, 1.02, textstr, transform=ax.transAxes, fontsize=9,
        verticalalignment='top', bbox=props)

ax.text(0.3, 0.25, f'Ratio correct/total above 0.8 = {100*sum(c_2_all[x_2_all > 0.8])/sum(n_2_all[x_2_all > 0.8]):.0f}%', transform=ax.transAxes)
# ax.ticklabel_format(useMathText=True, useOffset=True)

plt.tight_layout()
plt.show()
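The "ratio correct/total above 0.8" annotation sums histogram bins whose centers exceed 0.8, which matches a true cut at 0.8 only if 0.8 falls on a bin edge. The same quantity can be computed directly from the scores, as in this sketch; the scores and truth flags are made up for illustration.

```python
import numpy as np

# Toy pair scores and truth flags (made up for illustration)
pair_scores = np.array([0.95, 0.90, 0.85, 0.82, 0.60, 0.40, 0.30, 0.10])
is_correct = np.array([True, True, False, True, True, False, False, False])

cut = 0.8
above = pair_scores > cut

# Purity: fraction of pairs above the cut that are correct
purity = is_correct[above].mean()
print(f"{100 * purity:.0f}%")
```

With the notebook's arrays, `is_correct` would correspond to `combos7.pair_target > -1` and `pair_scores` to `scores_2_noscorecut`.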


In [52]:
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(10,3))

ax = axs[0]
ax.set_title(r"$^7C_6$ Combinations with Pairing, Normalized")

bins = np.linspace(0,1.01,100)

scores_2 = combos7.scores_pairs[:,0]


n_7, b_7, x_7 = norm_hist(scores_2)
c_n_7, b_7, x_7 = norm_hist(scores_2[combos7.pair_target > -1])
w_n_7, b_7, x_7 = norm_hist(scores_2[combos7.pair_target == -1])

# hist(ax, x_7, weights=n_7, bins=b_7, label='All pairs')
hist(ax, x_7, weights=c_n_7, bins=b_7, label='Correct pairs')
hist(ax, x_7, weights=w_n_7, bins=b_7, label='Incorrect pairs')
ax.legend(fontsize='small', loc=9)

ax.set_xlabel('Assigned Score')
ax.set_ylabel('AU')

textstr = f'Entries = {len(scores_2)}'
props = dict(boxstyle='round', facecolor='white', alpha=1)
ax.text(0.8, 1.02, textstr, transform=ax.transAxes, fontsize=9,
        verticalalignment='top', bbox=props)

ax = axs[1]
ax.set_title(r"$^7C_6$ Combinations with Pairing")

n_7, b_7 = np.histogram(scores_2, bins=100)
c_n_7, b_7 = np.histogram(scores_2[combos7.pair_target > -1], bins=100)
w_n_7, b_7 = np.histogram(scores_2[combos7.pair_target == -1], bins=100)

x_7 = (b_7[1:] + b_7[:-1]) / 2

# hist(ax, x_7, weights=n_7, bins=b_7, label='All 6-jet combos')
hist(ax, x_7, weights=c_n_7, bins=b_7, label='Correct pairs')
hist(ax, x_7, weights=w_n_7, bins=b_7, label='Incorrect pairs')
ax.legend(fontsize='small', loc=9)

ax.set_xlabel('Assigned Score')
ax.set_ylabel('Number of Combinations Per Bin')

textstr = f'Entries = {len(scores_2)}'
props = dict(boxstyle='round', facecolor='white', alpha=1)
ax.text(0.8, 1.02, textstr, transform=ax.transAxes, fontsize=9,
        verticalalignment='top', bbox=props)

ax.text(0.3, 50000, f'Ratio correct/total above 0.8 = {100*sum(c_n_7[x_7 > 0.8])/sum(n_7[x_7 > 0.8]):.0f}%')

plt.tight_layout()
plt.show()
