# Data 2017 Tuning Analysis (v10)

This notebook evaluates the training of the v10 tuning on 2017 data samples. The tuning is set to reach the same detection probability as the cut-based selection on the same sample. Let's check:

- Best tuning configuration;
- Tuning efficiencies using the best configuration;
- Training plots;
- ROC curve.

**NOTE**: The input files are stored in: `/Volumes/castor/cern_data`

**NOTE**: All output files must be stored in: `/Volumes/castor/tuning_data/Zee/v8`

**NOTE**: The `cutbased`, `v7` and `v8` tunings will be used to compare the data and Monte Carlo ringer tunings.

## Import all necessary packages:

In [1]:
from saphyra.analysis import crossval_table
from Gaugi import load
import os, re, sys, glob
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import collections
%matplotlib inline
%config InlineBackend.figure_format='retina'

Welcome to JupyROOT 6.16/00
Using all sub packages with ROOT dependence


## Create the tuning file dict:

In [2]:
def create_op_dict(op):
    d = {
              op+'_pd_ref'    : "reference/"+op+"_cutbased/pd_ref#0",
              op+'_fa_ref'    : "reference/"+op+"_cutbased/fa_ref#0",
              op+'_sp_ref'    : "reference/"+op+"_cutbased/sp_ref",
              op+'_pd_val'    : "reference/"+op+"_cutbased/pd_val#0",
              op+'_fa_val'    : "reference/"+op+"_cutbased/fa_val#0",
              op+'_sp_val'    : "reference/"+op+"_cutbased/sp_val",
              op+'_pd_op'     : "reference/"+op+"_cutbased/pd_op#0",
              op+'_fa_op'     : "reference/"+op+"_cutbased/fa_op#0",
              op+'_sp_op'     : "reference/"+op+"_cutbased/sp_op",
            
              # Counts
              op+'_pd_ref_passed'    : "reference/"+op+"_cutbased/pd_ref#1",
              op+'_fa_ref_passed'    : "reference/"+op+"_cutbased/fa_ref#1",
              op+'_pd_ref_total'     : "reference/"+op+"_cutbased/pd_ref#2",
              op+'_fa_ref_total'     : "reference/"+op+"_cutbased/fa_ref#2",   
              op+'_pd_val_passed'    : "reference/"+op+"_cutbased/pd_val#1",
              op+'_fa_val_passed'    : "reference/"+op+"_cutbased/fa_val#1",
              op+'_pd_val_total'     : "reference/"+op+"_cutbased/pd_val#2",
              op+'_fa_val_total'     : "reference/"+op+"_cutbased/fa_val#2",  
              op+'_pd_op_passed'     : "reference/"+op+"_cutbased/pd_op#1",
              op+'_fa_op_passed'     : "reference/"+op+"_cutbased/fa_op#1",
              op+'_pd_op_total'      : "reference/"+op+"_cutbased/pd_op#2",
              op+'_fa_op_total'      : "reference/"+op+"_cutbased/fa_op#2",
    } 
    return d

tuned_info = collections.OrderedDict( {
              # validation
              "max_sp_val"      : 'summary/max_sp_val',
              "max_sp_pd_val"   : 'summary/max_sp_pd_val#0',
              "max_sp_fa_val"   : 'summary/max_sp_fa_val#0',
              # Operation
              "max_sp_op"       : 'summary/max_sp_op',
              "max_sp_pd_op"    : 'summary/max_sp_pd_op#0',
              "max_sp_fa_op"    : 'summary/max_sp_fa_op#0',
    
              #"loss"            : 'loss',
              #"val_loss"        : 'val_loss',
              #"accuracy"        : 'accuracy',
              #"val_accuracy"    : 'val_accuracy',
              #"max_sp_best_epoch_val": 'max_sp_best_epoch_val',
              } )

tuned_info.update(create_op_dict('tight'))
tuned_info.update(create_op_dict('medium'))
tuned_info.update(create_op_dict('loose'))
tuned_info.update(create_op_dict('vloose'))
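
As a quick sanity check on the dictionary above, the sketch below rebuilds the same key layout in a compact form and counts the entries. The helper `op_keys` is hypothetical and not part of saphyra; it only mirrors the key pattern of `create_op_dict`. The 8 bookkeeping columns are the ones that appear in the tables of this notebook (`train_tag`, `et_bin`, `eta_bin`, `model_idx`, `sort`, `init`, `file_name`, `tuned_idx`).

```python
def op_keys(op):
    """Return the key names that create_op_dict(op) produces (hypothetical helper)."""
    keys = []
    for s in ('ref', 'val', 'op'):
        # Efficiencies: detection (pd), fake (fa) and SP
        keys += ['%s_%s_%s' % (op, m, s) for m in ('pd', 'fa', 'sp')]
        # Counts: passed and total, only for pd and fa
        keys += ['%s_%s_%s_%s' % (op, m, s, c)
                 for m in ('pd', 'fa') for c in ('passed', 'total')]
    return keys

summary_keys = ['max_sp_val', 'max_sp_pd_val', 'max_sp_fa_val',
                'max_sp_op', 'max_sp_pd_op', 'max_sp_fa_op']

all_keys = list(summary_keys)
for op in ['tight', 'medium', 'loose', 'vloose']:
    all_keys += op_keys(op)

n_keys = len(all_keys)      # 6 summary keys + 4 * 21 per-operation keys
n_columns = n_keys + 8      # plus the 8 bookkeeping columns
print(n_keys, n_columns)
```

The column total agrees with the second dimension of the table shapes printed in this notebook.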

## Open all tuning files:

In [3]:
cv_v7 = crossval_table( tuned_info )
cv_v8 = crossval_table( tuned_info )
cv_v10 = crossval_table( tuned_info )


cv_v7.fill( '../tuning_data/Zee/v7/*/*/*.pic.gz', 'v7')
cv_v8.fill( '../tuning_data/Zee/v8/*/*/*.pic.gz', 'v8')
cv_v10.fill( '../tuning_data/Zee/v10/*/*/*.2.pic.gz', 'v10')
#!ls ../tuning_data/*

I0730 01:22:07.161144 140735835288448 macros.py:23] Reading file for v7 tag from ../tuning_data/Zee/v7/*/*/*.pic.gz
I0730 01:22:07.162093 140735835288448 macros.py:23] There are 1000 files for this task...
I0730 01:22:07.162756 140735835288448 macros.py:23] Filling the table... 
I0730 01:22:23.825721 140735835288448 macros.py:23] End of fill step, a pandas DataFrame was created...
I0730 01:22:24.118874 140735835288448 macros.py:23] Reading file for v8 tag from ../tuning_data/Zee/v8/*/*/*.pic.gz
I0730 01:22:24.119894 140735835288448 macros.py:23] There are 1000 files for this task...
I0730 01:22:24.120582 140735835288448 macros.py:23] Filling the table... 
I0730 01:22:43.972186 140735835288448 macros.py:23] End of fill step, a pandas DataFrame was created...
I0730 01:22:44.143077 140735835288448 macros.py:23] Reading file for v10 tag from ../tuning_data/Zee/v10/*/*/*.2.pic.gz
I0730 01:22:44.143923 140735835288448 macros.py:23] There are 500 files for this task...
I0730 01:22:44.144577 1


Let's keep only the best init for each sort, configuration and Et/eta bin. To do this we must choose a figure of merit and keep the init that maximizes it in each configuration. Here, we select the best inits by looking for the maximum SP value in the validation set.
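
For reference, the SP index combines the detection probability (Pd) and the fake rate (Fa) into a single figure of merit. The sketch below uses the usual definition, SP = sqrt( sqrt(Pd (1 - Fa)) x (Pd + (1 - Fa)) / 2 ). The function name `sp_index` is an illustration and is not taken from saphyra.

```python
import math

def sp_index(pd_eff, fa_eff):
    """SP index from detection probability and fake rate (both in [0, 1])."""
    rejection = 1.0 - fa_eff
    geometric = math.sqrt(pd_eff * rejection)    # geometric mean of Pd and (1 - Fa)
    arithmetic = 0.5 * (pd_eff + rejection)      # arithmetic mean of Pd and (1 - Fa)
    return math.sqrt(geometric * arithmetic)

# Mean Pd and Fa of the v7 tuning in the first bin (et=0, eta=0),
# taken from the cross-validation table of this notebook
print(round(sp_index(0.986779, 0.032511), 4))
```

The result should be close to, but not necessarily identical to, the mean SP reported for that bin, since averaging SP values over sorts is not the same as computing the SP of averaged efficiencies.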

In [4]:

best_inits_v7 = cv_v7.filter_inits("max_sp_val")
best_inits_v8 = cv_v8.filter_inits("max_sp_val")
best_inits_v10 = cv_v10.filter_inits("max_sp_val")

print(best_inits_v7.shape)
print(best_inits_v8.shape)
print(best_inits_v10.shape)

(2250, 98)
(2250, 98)
(250, 98)
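
The selection above can be pictured with plain pandas. This is a minimal sketch under the assumption that `filter_inits` keeps, for every (Et bin, eta bin, model, sort) group, the row with the largest value of the chosen column; the actual saphyra implementation may differ. The toy table and the helper `keep_best_init` are made up for illustration.

```python
import pandas as pd

def keep_best_init(table, key):
    """Keep one row per (et_bin, eta_bin, model_idx, sort): the one maximizing `key`."""
    groups = ['et_bin', 'eta_bin', 'model_idx', 'sort']
    best_rows = table.groupby(groups)[key].idxmax()   # index label of the best row per group
    return table.loc[best_rows].reset_index(drop=True)

# Toy table: one bin, one model, two sorts, two inits per sort
toy = pd.DataFrame({
    'et_bin':     [0, 0, 0, 0],
    'eta_bin':    [0, 0, 0, 0],
    'model_idx':  [0, 0, 0, 0],
    'sort':       [0, 0, 1, 1],
    'init':       [0, 1, 0, 1],
    'max_sp_val': [0.97, 0.98, 0.99, 0.96],
})

best = keep_best_init(toy, 'max_sp_val')
print(best[['sort', 'init', 'max_sp_val']])
```

In the toy table, init 1 is kept for sort 0 and init 0 for sort 1.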


## Fix the reference counts:

In [5]:
def fix_counts(table, op):
    # The reference totals are taken from the operation set, and the reference
    # passed counts are rebuilt as (reference efficiency) x (operation total).
    table['%s_fa_ref_total'%op]  = table['%s_fa_op_total'%op]
    table['%s_pd_ref_total'%op]  = table['%s_pd_op_total'%op]
    table['%s_fa_ref_passed'%op] = table['%s_fa_ref'%op] * table['%s_fa_op_total'%op]
    table['%s_pd_ref_passed'%op] = table['%s_pd_ref'%op] * table['%s_pd_op_total'%op]
    
for op in ['tight','medium','loose','vloose']:
    fix_counts(best_inits_v7, op)
    fix_counts(best_inits_v8, op)
    fix_counts(best_inits_v10, op)




In [6]:
best_inits_v10.head()

| index | train_tag | et_bin | eta_bin | model_idx | sort | init | file_name | tuned_idx | max_sp_val | max_sp_pd_val | ... | vloose_pd_ref_total | vloose_fa_ref_total | vloose_pd_val_passed | vloose_fa_val_passed | vloose_pd_val_total | vloose_fa_val_total | vloose_pd_op_passed | vloose_fa_op_passed | vloose_pd_op_total | vloose_fa_op_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | v10 | 0 | 0 | 0 | 0 | 0 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.980407 | 0.985503 | ... | 226243 | 187639 | 22365 | 543 | 22625 | 18764 | 223657 | 5038 | 226243 | 187639 |
| 17 | v10 | 0 | 0 | 0 | 1 | 0 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.979003 | 0.983116 | ... | 226243 | 187639 | 22366 | 619 | 22625 | 18764 | 223652 | 5299 | 226243 | 187639 |
| 2 | v10 | 0 | 0 | 0 | 2 | 1 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.980362 | 0.984663 | ... | 226243 | 187639 | 22365 | 536 | 22625 | 18763 | 223652 | 4861 | 226243 | 187639 |
| 15 | v10 | 0 | 0 | 0 | 3 | 0 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.979431 | 0.987138 | ... | 226243 | 187639 | 22365 | 575 | 22624 | 18764 | 223654 | 4999 | 226243 | 187639 |
| 19 | v10 | 0 | 0 | 0 | 4 | 1 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.980974 | 0.986961 | ... | 226243 | 187639 | 22365 | 514 | 22624 | 18764 | 223651 | 5090 | 226243 | 187639 |


## Plot the evolution for each configuration

Here, each configuration corresponds to a number of neurons in the hidden layer. The tuning procedure follows this topology:

- First layer with 100 inputs;
- Second layer with 2 to 10 neurons (all with hyperbolic tangent);
- One output with hyperbolic tangent.

Let's check how the SP, fake rate and detection probability evolve with the layer configuration, calculating the mean and standard deviation over the 10 best inits (1 init per sort, 10 sorts, 10 values per configuration).
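
The averaging described above amounts to a group-by over the configuration index. The sketch below illustrates it on made-up numbers (three sorts instead of ten); the column names follow the table used in this notebook, everything else is an assumption for illustration.

```python
import pandas as pd

# Toy table: two configurations (model_idx 0 and 1), three sorts each, one bin
toy = pd.DataFrame({
    'model_idx':  [0, 0, 0, 1, 1, 1],
    'sort':       [0, 1, 2, 0, 1, 2],
    'max_sp_val': [0.970, 0.980, 0.990, 0.960, 0.960, 0.960],
})

# Mean and sample standard deviation (ddof=1, the pandas default)
# per configuration, expressed in percent
summary = toy.groupby('model_idx')['max_sp_val'].agg(['mean', 'std']) * 100
print(summary)
```

Each row of `summary` corresponds to one point, with its error bar, in the evolution plot.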

In [7]:

def plot_evolution( t , et_bin, eta_bin , output, display=False):

    sp_val_mean = []; sp_val_std = []
    for model_idx in t.model_idx.unique():
        table = t.loc[ (t.model_idx==model_idx) & (t.et_bin==et_bin) & (t.eta_bin==eta_bin)]    
        sp_val_mean.append( table['max_sp_val'].mean() * 100)
        sp_val_std.append( table['max_sp_val'].std() * 100)
    neurons = [ i+2 for i in range(len(sp_val_mean))]
    fig, ax = plt.subplots(1,1, figsize=(15,5))
    plt.errorbar(neurons, sp_val_mean, yerr=sp_val_std, label='Max SP (validation)')
    ax.set(xlabel='#neurons', ylabel='SP',
       title='Max SP in the validation set (et=%d, eta=%d)'%(et_bin,eta_bin))
    ax.grid()
    print('saving %s...'%output)
    plt.savefig(output)
    if display:
        plt.show()      
    else:
        plt.close(fig)

        
for et_bin in best_inits_v10.et_bin.unique():
    for eta_bin in best_inits_v10.eta_bin.unique():
        plot_evolution(best_inits_v10, et_bin, eta_bin, 'sp_evolution_per_config_et%d_eta%d.pdf'%(et_bin,eta_bin))

saving sp_evolution_per_config_et0_eta0.pdf...
saving sp_evolution_per_config_et0_eta1.pdf...
saving sp_evolution_per_config_et0_eta2.pdf...
saving sp_evolution_per_config_et0_eta3.pdf...
saving sp_evolution_per_config_et0_eta4.pdf...
saving sp_evolution_per_config_et1_eta0.pdf...
saving sp_evolution_per_config_et1_eta1.pdf...
saving sp_evolution_per_config_et1_eta2.pdf...
saving sp_evolution_per_config_et1_eta3.pdf...
saving sp_evolution_per_config_et1_eta4.pdf...
saving sp_evolution_per_config_et2_eta0.pdf...
saving sp_evolution_per_config_et2_eta1.pdf...
saving sp_evolution_per_config_et2_eta2.pdf...
saving sp_evolution_per_config_et2_eta3.pdf...
saving sp_evolution_per_config_et2_eta4.pdf...
saving sp_evolution_per_config_et3_eta0.pdf...
saving sp_evolution_per_config_et3_eta1.pdf...
saving sp_evolution_per_config_et3_eta2.pdf...
saving sp_evolution_per_config_et3_eta3.pdf...
saving sp_evolution_per_config_et3_eta4.pdf...
saving sp_evolution_per_config_et4_eta0.pdf...
saving sp_evo

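For simplicity, let's keep five neurons (model id = 3) in the hidden layer for all phase spaces of the v7 and v8 tunings. The v10 table holds a single configuration per phase space, so its model id is 0.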

## Calculate the cross validation table for all phase spaces

In [8]:
best_inits_v7 = best_inits_v7.loc[(best_inits_v7.model_idx==3)]
best_inits_v8 = best_inits_v8.loc[(best_inits_v8.model_idx==3)]
best_inits_v10 = best_inits_v10.loc[(best_inits_v10.model_idx==0)]


print(best_inits_v7.shape)
print(best_inits_v8.shape)
print(best_inits_v10.shape)

best_inits = pd.concat([best_inits_v7, best_inits_v8, best_inits_v10])
best_inits.head()

(250, 98)
(250, 98)
(250, 98)


| index | train_tag | et_bin | eta_bin | model_idx | sort | init | file_name | tuned_idx | max_sp_val | max_sp_pd_val | ... | vloose_pd_ref_total | vloose_fa_ref_total | vloose_pd_val_passed | vloose_fa_val_passed | vloose_pd_val_total | vloose_fa_val_total | vloose_pd_op_passed | vloose_fa_op_passed | vloose_pd_op_total | vloose_fa_op_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 162 | v7 | 0 | 0 | 3 | 0 | 0 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 3 | 0.978133 | 0.987805 | ... | 4916 | 332468 | 486 | 1047 | 492 | 33247 | 4860 | 14381 | 4916 | 332468 |
| 79 | v7 | 0 | 0 | 3 | 1 | 0 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 3 | 0.978463 | 0.98374 | ... | 4916 | 332468 | 486 | 1572 | 492 | 33247 | 4860 | 14867 | 4916 | 332468 |
| 125 | v7 | 0 | 0 | 3 | 2 | 0 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 3 | 0.97549 | 0.98374 | ... | 4916 | 332468 | 486 | 1683 | 492 | 33247 | 4860 | 15954 | 4916 | 332468 |
| 148 | v7 | 0 | 0 | 3 | 3 | 1 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 3 | 0.971869 | 0.973577 | ... | 4916 | 332468 | 486 | 2315 | 492 | 33247 | 4860 | 14499 | 4916 | 332468 |
| 111 | v7 | 0 | 0 | 3 | 4 | 1 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 3 | 0.977755 | 0.993902 | ... | 4916 | 332468 | 486 | 1209 | 492 | 33246 | 4860 | 14605 | 4916 | 332468 |


In [9]:
# 25 bins x 10 sorts = 250 rows per tuning, x 3 tunings (v7, v8 and v10) = 750 rows
best_inits.shape

(750, 98)

In [10]:
cv_v10.describe(best_inits)

| index | train_tag | et_bin | eta_bin | max_sp_val_mean | max_sp_val_std | max_sp_pd_val_mean | max_sp_pd_val_std | max_sp_fa_val_mean | max_sp_fa_val_std | max_sp_op_mean | ... | vloose_pd_ref_total | vloose_fa_ref_total | vloose_pd_val_total_mean | vloose_pd_val_total_std | vloose_fa_val_total_mean | vloose_fa_val_total_std | vloose_pd_op_total_mean | vloose_pd_op_total_std | vloose_fa_op_total_mean | vloose_fa_op_total_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | v7 | 0 | 0 | 0.977106 | 0.003830 | 0.986779 | 0.006862 | 0.032511 | 0.004407 | 0.974279 | ... | 4916 | 332468 | 491.6 | 0.516398 | 33246.8 | 0.421637 | 4916.0 | 0.0 | 332468.0 | 0.0 |
| 1 | v7 | 0 | 1 | 0.967836 | 0.004014 | 0.980051 | 0.007838 | 0.044288 | 0.006140 | 0.963947 | ... | 3008 | 253818 | 300.8 | 0.421637 | 25381.8 | 0.421637 | 3008.0 | 0.0 | 253818.0 | 0.0 |
| 2 | v7 | 0 | 2 | 0.948442 | 0.008567 | 0.967991 | 0.016779 | 0.070815 | 0.016083 | 0.938309 | ... | 1155 | 48521 | 115.5 | 0.527046 | 4852.1 | 0.316228 | 1155.0 | 0.0 | 48521.0 | 0.0 |
| 3 | v7 | 0 | 3 | 0.962749 | 0.003036 | 0.976954 | 0.009870 | 0.051316 | 0.008947 | 0.959627 | ... | 4555 | 331654 | 455.5 | 0.527046 | 33165.4 | 0.516398 | 4555.0 | 0.0 | 331654.0 | 0.0 |
| 4 | v7 | 0 | 4 | 0.967934 | 0.009243 | 0.987278 | 0.014849 | 0.051154 | 0.015170 | 0.954410 | ... | 472 | 40642 | 47.2 | 0.421637 | 4064.2 | 0.421637 | 472.0 | 0.0 | 40642.0 | 0.0 |
| 5 | v7 | 1 | 0 | 0.984498 | 0.001172 | 0.989059 | 0.002560 | 0.020052 | 0.001328 | 0.984097 | ... | 32722 | 168165 | 3272.2 | 0.421637 | 16816.5 | 0.527046 | 32722.0 | 0.0 | 168165.0 | 0.0 |
| 6 | v7 | 1 | 1 | 0.978056 | 0.001465 | 0.986244 | 0.004452 | 0.030091 | 0.003898 | 0.977335 | ... | 19264 | 129473 | 1926.4 | 0.516398 | 12947.3 | 0.483046 | 19264.0 | 0.0 | 129473.0 | 0.0 |
| 7 | v7 | 1 | 2 | 0.959119 | 0.002546 | 0.976630 | 0.009583 | 0.058194 | 0.009761 | 0.956317 | ... | 6033 | 21806 | 603.3 | 0.483046 | 2180.6 | 0.516398 | 6033.0 | 0.0 | 21806.0 | 0.0 |
| 8 | v7 | 1 | 3 | 0.972069 | 0.001092 | 0.983663 | 0.003394 | 0.039452 | 0.003209 | 0.971316 | ... | 22526 | 158040 | 2252.6 | 0.516398 | 15804.0 | 0.000000 | 22526.0 | 0.0 | 158040.0 | 0.0 |
| 9 | v7 | 1 | 4 | 0.962842 | 0.004688 | 0.982429 | 0.008657 | 0.056515 | 0.010450 | 0.956809 | ... | 2162 | 19039 | 216.2 | 0.421637 | 1903.9 | 0.316228 | 2162.0 | 0.0 | 19039.0 | 0.0 |


## Create the beamer table

In [11]:
# Create beamer presentation
for op in ['tight','medium','loose','vloose']:
    cv_v10.dump_beamer_table( best_inits ,  [15,20,30,40,50,100000], 
                     [0, 0.8 , 1.37, 1.54, 2.37, 2.5], 
                     [op],
                     'tuning_v10_'+op,
                     title = op+' Tunings (v10)',
                     tags = ['v7', 'v8','v10'],
                    )


I0730 01:23:08.021392 140735835288448 BeamerAPI.py:475] Started creating beamer file tuning_v10_tight.pdf latex code...
I0730 01:23:15.653999 140735835288448 BeamerAPI.py:475] Started creating beamer file tuning_v10_medium.pdf latex code...
I0730 01:23:22.596184 140735835288448 BeamerAPI.py:475] Started creating beamer file tuning_v10_loose.pdf latex code...
I0730 01:23:29.399542 140735835288448 BeamerAPI.py:475] Started creating beamer file tuning_v10_vloose.pdf latex code...




## Plot the training curves for each sort

In [12]:

def plot_training_curves_for_each_sort(table, et_bin, eta_bin, output, display=False):
    
    table = table.loc[(table.et_bin==et_bin) & (table.eta_bin==eta_bin)] 
    nsorts = len(table.sort.unique())
    fig, ax = plt.subplots(nsorts,2, figsize=(15,20))
    fig.suptitle(r'Monitoring Train Plot - Et = %d, Eta = %d'%(et_bin,eta_bin), fontsize=15)
    for idx, sort in enumerate(table.sort.unique()):
        current_table = table.loc[table.sort==sort]
        path=current_table.file_name.values[0]
        history = load(path)['tunedData'][current_table.model_idx.values[0]]['history']
        best_epoch = history['max_sp_best_epoch_val'][-1]
        # Make the plot here
        ax[idx, 0].set_xlabel('Epochs')
        ax[idx, 0].set_ylabel('Loss')
        ax[idx, 0].plot(history['loss'], c='b', label='Train Step')
        ax[idx, 0].plot(history['val_loss'], c='r', label='Validation Step') 
        ax[idx, 0].axvline(x=best_epoch, c='k', label='Best epoch')
        ax[idx, 0].legend()
        ax[idx, 0].grid()
        ax[idx, 1].set_xlabel('Epochs')
        ax[idx, 1].set_ylabel('SP')
        ax[idx, 1].plot(history['max_sp_val'], c='r', label='Validation Step') 
        ax[idx, 1].axvline(x=best_epoch, c='k', label='Best epoch')
        ax[idx, 1].legend()
        ax[idx, 1].grid()
    
    print(output)
    plt.savefig(output)
    if display:
        plt.show()
    else:
        plt.close(fig)
    
for et_bin in best_inits.et_bin.unique():
    for eta_bin in best_inits.eta_bin.unique():
        plot_training_curves_for_each_sort(best_inits.loc[best_inits.train_tag=='v10'],et_bin, eta_bin, 'train_evolution_best_config_et%d_eta%d.pdf'%(et_bin,eta_bin))

train_evolution_best_config_et0_eta0.pdf
train_evolution_best_config_et0_eta1.pdf
train_evolution_best_config_et0_eta2.pdf
train_evolution_best_config_et0_eta3.pdf
train_evolution_best_config_et0_eta4.pdf
train_evolution_best_config_et1_eta0.pdf
train_evolution_best_config_et1_eta1.pdf
train_evolution_best_config_et1_eta2.pdf
train_evolution_best_config_et1_eta3.pdf
train_evolution_best_config_et1_eta4.pdf
train_evolution_best_config_et2_eta0.pdf
train_evolution_best_config_et2_eta1.pdf
train_evolution_best_config_et2_eta2.pdf
train_evolution_best_config_et2_eta3.pdf
train_evolution_best_config_et2_eta4.pdf
train_evolution_best_config_et3_eta0.pdf
train_evolution_best_config_et3_eta1.pdf
train_evolution_best_config_et3_eta2.pdf
train_evolution_best_config_et3_eta3.pdf
train_evolution_best_config_et3_eta4.pdf
train_evolution_best_config_et4_eta0.pdf
train_evolution_best_config_et4_eta1.pdf
train_evolution_best_config_et4_eta2.pdf
train_evolution_best_config_et4_eta3.pdf
train_evolution_

## Check the integrated values:

Here we check the efficiencies of the v10 tuning integrated over all phase spaces (values in percent):

In [13]:
cv_v10.integrate(best_inits, 'v10') * 100

| statistic | tight_pd_ref | tight_fa_ref | tight_pd_val | tight_fa_val | tight_pd_op | tight_fa_op | medium_pd_ref | medium_fa_ref | medium_pd_val | medium_fa_val | ... | loose_pd_val | loose_fa_val | loose_pd_op | loose_fa_op | vloose_pd_ref | vloose_fa_ref | vloose_pd_val | vloose_fa_val | vloose_pd_op | vloose_fa_op |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mean | 99.04946 | 25.621525 | 99.048973 | 0.636078 | 99.049485 | 0.570007 | 99.185367 | 29.22653 | 99.184091 | 0.690721 | ... | 99.622392 | 1.167215 | 99.621915 | 1.014587 | 99.666488 | 78.680117 | 99.66615 | 1.296809 | 99.666455 | 1.09269 |
| std | 0.0 | 0.0 | 0.003152 | 0.011599 | 0.000196 | 0.006661 | 0.0 | 5.851389e-15 | 0.002094 | 0.0104 | ... | 0.002829 | 0.028836 | 0.000129 | 0.013568 | 0.0 | 0.0 | 0.002425 | 0.059423 | 0.00023 | 0.015164 |
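
One way to picture the integration is to sum the passed and total counts over all phase-space bins for each sort, take the ratio, and then compute the mean and standard deviation across sorts. The sketch below does this on invented counts; it assumes this is what `integrate` does, which has not been checked against the saphyra source.

```python
import pandas as pd

# Toy counts: two bins, two sorts (numbers are invented)
toy = pd.DataFrame({
    'et_bin':             [0, 0, 1, 1],
    'sort':               [0, 1, 0, 1],
    'tight_pd_op_passed': [90, 92, 190, 186],
    'tight_pd_op_total':  [100, 100, 200, 200],
})

# Sum the counts over bins for each sort, then form the integrated efficiency
per_sort = toy.groupby('sort')[['tight_pd_op_passed', 'tight_pd_op_total']].sum()
integrated = per_sort['tight_pd_op_passed'] / per_sort['tight_pd_op_total']

# Mean and standard deviation across sorts, in percent
result = integrated.agg(['mean', 'std']) * 100
print(result)
```

Summing counts before dividing weights each bin by its number of events, which differs from averaging the per-bin efficiencies.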


## Get the Best sort for all phase spaces:


Here we select the best sort for each phase space. We can use this information to export the tuning of each phase space to a single file that will be used in the pileup adjustment step (using the prometheus framework). The best sort is always the one with the highest max SP value in its phase space.

In [14]:
best_sorts = cv_v10.filter_sorts(best_inits_v10, 'max_sp_val')
print(best_sorts.shape)
best_sorts.head(25)

(25, 98)


| index | train_tag | et_bin | eta_bin | model_idx | sort | init | file_name | tuned_idx | max_sp_val | max_sp_pd_val | ... | vloose_pd_ref_total | vloose_fa_ref_total | vloose_pd_val_passed | vloose_fa_val_passed | vloose_pd_val_total | vloose_fa_val_total | vloose_pd_op_passed | vloose_fa_op_passed | vloose_pd_op_total | vloose_fa_op_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | v10 | 0 | 0 | 0 | 4 | 1 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.980974 | 0.986961 | ... | 226243 | 187639 | 22365 | 514 | 22624 | 18764 | 223651 | 5090 | 226243 | 187639 |
| 32 | v10 | 0 | 1 | 0 | 9 | 0 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.972888 | 0.981804 | ... | 136848 | 143657 | 13518 | 656 | 13684 | 14366 | 135180 | 5757 | 136848 | 143657 |
| 42 | v10 | 0 | 2 | 0 | 2 | 1 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.949986 | 0.959608 | ... | 50009 | 30037 | 4881 | 343 | 5001 | 3004 | 48805 | 2937 | 50009 | 30037 |
| 66 | v10 | 0 | 3 | 0 | 3 | 1 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.973435 | 0.984837 | ... | 220933 | 205792 | 21711 | 740 | 22093 | 20580 | 217078 | 7092 | 220933 | 205792 |
| 82 | v10 | 0 | 4 | 0 | 2 | 1 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.95254 | 0.958614 | ... | 19330 | 15512 | 1812 | 67 | 1933 | 1551 | 18083 | 734 | 19330 | 15512 |
| 106 | v10 | 1 | 0 | 0 | 3 | 1 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.988932 | 0.993384 | ... | 1457047 | 316581 | 144971 | 573 | 145705 | 31658 | 1449715 | 5801 | 1457047 | 316581 |
| 135 | v10 | 1 | 1 | 0 | 3 | 0 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.982953 | 0.988715 | ... | 871041 | 227709 | 86774 | 870 | 87104 | 22771 | 867739 | 8338 | 871041 | 227709 |
| 145 | v10 | 1 | 2 | 0 | 8 | 1 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.973211 | 0.978613 | ... | 275404 | 47550 | 26669 | 125 | 27540 | 4755 | 266304 | 1307 | 275404 | 47550 |
| 177 | v10 | 1 | 3 | 0 | 1 | 0 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.981608 | 0.989645 | ... | 1099898 | 307253 | 109515 | 1159 | 109990 | 30725 | 1095157 | 11329 | 1099898 | 307253 |
| 190 | v10 | 1 | 4 | 0 | 7 | 0 | /Users/jodafons/Desktop/phd_local/prometheus/a... | 0 | 0.972431 | 0.984919 | ... | 81560 | 36522 | 7817 | 105 | 8156 | 3652 | 78081 | 1087 | 81560 | 36522 |


## Dump all best tunings:

Here we will collect the best model of each of the 25 phase spaces (5 Et bins x 5 eta bins) so that they can be dumped to file. We use the same neural networks for all operation points; only the thresholds vary. The threshold corrections (alpha and beta) and the corresponding files will be calculated and generated using the pileup correction tool.

In [15]:

def get_best_models( best_sorts , etbins, etabins):

    from saphyra.layers.RpLayer import RpLayer
    # Just to remove the keras dependence
    import tensorflow as tf
    model_from_json = tf.keras.models.model_from_json
    import json
    
    models = []

    for et_bin in range(len(etbins)-1):
        for eta_bin in range(len(etabins)-1):
            d_tuned = {}
            # Row of the best sort for this phase space
            best = best_sorts.loc[(best_sorts.et_bin==et_bin) & (best_sorts.eta_bin==eta_bin)]
            tuned = load(best.file_name.values[0])['tunedData'][best.model_idx.values[0]]
            # Rebuild the network from its JSON description and restore the weights
            model = model_from_json( json.dumps(tuned['sequence'], separators=(',', ':')) , 
                                 custom_objects={'RpLayer':RpLayer} )
            model.set_weights( tuned['weights'] )
            
            # Drop the last layer so that the model ends at the layer used for inference
            model._layers.pop()
            model.summary()
            
            
            d_tuned['model']    = model
            d_tuned['etBin']    = [etbins[et_bin], etbins[et_bin+1]]
            d_tuned['etaBin']   = [etabins[eta_bin], etabins[eta_bin+1]]
            d_tuned['etBinIdx'] = et_bin
            d_tuned['etaBinIdx']= eta_bin
            models.append(d_tuned)
    return models
            
        
    
etbins = [15,20,30,40,50,100000]
etabins = [0, 0.8 , 1.37, 1.54, 2.37, 2.5]


models = get_best_models( best_sorts, etbins, etabins)

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
Input (InputLayer)           [(None, 100)]             0         
_________________________________________________________________
Reshape_layer (Reshape)      (None, 100, 1)            0         
_________________________________________________________________
conv1d_layer_1 (Conv1D)      (None, 99, 16)            48        
_________________________________________________________________
conv1d_layer_2 (Conv1D)      (None, 98, 32)            1056      
_________________________________________________________________
flatten (Flatten)            (None, 3136)              0         
_________________________________________________________________
dense_layer (Dense)          (None, 32)                100384    
_________________________________________________________________
output_for_inference (Dense) (None, 1)                 33    
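
The parameter counts in the summary can be reproduced by hand, which also helps to read the architecture. The sketch below assumes a kernel size of 2 and no padding for both convolutions; this is inferred from the output shapes (100 to 99 to 98) and from the parameter counts, not from the training configuration.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units

length = 100                        # ring inputs, reshaped to (100, 1)
conv1 = conv1d_params(2, 1, 16)     # conv1d_layer_1
length -= 1                         # unpadded convolution with kernel size 2
conv2 = conv1d_params(2, 16, 32)    # conv1d_layer_2
length -= 1
flat = length * 32                  # flatten: 98 positions x 32 filters
dense = dense_params(flat, 32)      # dense_layer
output = dense_params(32, 1)        # output_for_inference

print(conv1, conv2, flat, dense, output)
```

Almost all of the parameters sit in the dense layer that follows the flattening step.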


## Dump to onnx format:

We will dump all best models using dummy thresholds. These thresholds should be rewritten using the pileup correction tool provided by the prometheus framework.
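
Each model in the configuration file of this notebook gets a `slope` and an `offset`, together with a `MaxAverageMu` value, which suggests a threshold that depends linearly on the average pileup. The sketch below assumes the form `threshold = slope * min(avgmu, max_avgmu) + offset`; this form, the function names and the example numbers are assumptions for illustration and are not taken from the prometheus code.

```python
def pileup_threshold(avgmu, slope, offset, max_avgmu=100.0):
    """Linear threshold in the average pileup, saturated at max_avgmu."""
    return slope * min(avgmu, max_avgmu) + offset

def accept(nn_output, avgmu, slope, offset, max_avgmu=100.0):
    """A candidate passes when the network output is above the threshold."""
    return nn_output > pileup_threshold(avgmu, slope, offset, max_avgmu)

# With dummy values (slope = offset = 0) the threshold is flat at zero
print(pileup_threshold(40.0, 0.0, 0.0))

# Hypothetical corrected values, for illustration only
print(pileup_threshold(40.0, -0.01, 0.5))
print(pileup_threshold(150.0, -0.01, 0.5))   # saturates at max_avgmu
```

With dummy values every operation point shares the same flat threshold, so the exported files only become usable after the correction step.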

In [16]:
def convert_to_onnx_with_dummy_thresholds( models, name, version, signature, model_output_format , operation, output):

    import onnx
    import keras2onnx
    from ROOT import TEnv
    model_etmin_vec = []
    model_etmax_vec = []
    model_etamin_vec = []
    model_etamax_vec = []
    model_paths = []

    slopes = []
    offsets = []

    for model in models:

        model_etmin_vec.append( model['etBin'][0] )
        model_etmax_vec.append( model['etBin'][1] )
        model_etamin_vec.append( model['etaBin'][0] )
        model_etamax_vec.append( model['etaBin'][1] )

        etBinIdx = model['etBinIdx']
        etaBinIdx = model['etaBinIdx']

        # Convert the keras model to ONNX
        onnx_model = keras2onnx.convert_keras(model['model'], model['model'].name)

        onnx_model_name = model_output_format%( etBinIdx, etaBinIdx )
        model_paths.append( onnx_model_name )

        # Save the ONNX model
        onnx.save_model(onnx_model, onnx_model_name)

        slopes.append( 0.0 )
        offsets.append( 0.0 )


    def list_to_str( l ):
        # Join the values into a single '; '-separated string
        return '; '.join( str(ll) for ll in l )

    # Write the config file
    file = TEnv( 'ringer' )
    file.SetValue( "__name__", name )
    file.SetValue( "__version__", version )
    file.SetValue( "__operation__", operation )
    file.SetValue( "__signature__", signature )
    file.SetValue( "Model__size"  , str(len(models)) )
    file.SetValue( "Model__etmin" , list_to_str(model_etmin_vec) )
    file.SetValue( "Model__etmax" , list_to_str(model_etmax_vec) )
    file.SetValue( "Model__etamin", list_to_str(model_etamin_vec) )
    file.SetValue( "Model__etamax", list_to_str(model_etamax_vec) )
    file.SetValue( "Model__path"  , list_to_str( model_paths ) )
    file.SetValue( "Threshold__size"  , str(len(models)) )
    file.SetValue( "Threshold__etmin" , list_to_str(model_etmin_vec) )
    file.SetValue( "Threshold__etmax" , list_to_str(model_etmax_vec) )
    file.SetValue( "Threshold__etamin", list_to_str(model_etamin_vec) )
    file.SetValue( "Threshold__etamax", list_to_str(model_etamax_vec) )
    file.SetValue( "Threshold__slope" , list_to_str(slopes) )
    file.SetValue( "Threshold__offset", list_to_str(offsets) )
    file.SetValue( "Threshold__MaxAverageMu", 100)
    file.WriteFile(output)


    
for op in ['Tight','Medium','Loose','VeryLoose']:

    # Name pattern of the ONNX files (filled with the Et and eta bin indices)
    model_format = 'data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.model_v10.electron'+op+'.et%d_eta%d.onnx'
    output = "ElectronRinger%sTriggerConfig.conf"%op
    convert_to_onnx_with_dummy_thresholds( models, 'TrigL2_20200715_v10', 'v10', 'electron', model_format ,op ,output)


tf executing eager_mode: True
I0730 01:24:34.405292 140735835288448 main.py:44] tf executing eager_mode: True
tf.keras model eager_mode: False
I0730 01:24:34.407180 140735835288448 main.py:46] tf.keras model eager_mode: False
The ONNX operator number change on the optimization: 22 -> 13
I0730 01:24:34.456341 140735835288448 topology.py:348] The ONNX operator number change on the optimization: 22 -> 13
W0730 01:24:34.461817 140735835288448 onnx_ex.py:64] The maximum opset needed by this model is only 11.
tf executing eager_mode: True
I0730 01:24:38.607749 140735835288448 main.py:44] tf executing eager_mode: True
tf.keras model eager_mode: False
I0730 01:24:38.610275 140735835288448 main.py:46] tf.keras model eager_mode: False
The ONNX operator number change on the optimization: 22 -> 13
I0730 01:24:38.659491 140735835288448 topology.py:348] The ONNX operator number change on the optimization: 22 -> 13
W0730 01:24:38.665402 140735835288448 onnx_ex.py:64] The maximum opset needed by this model is only 11.
tf executing eager_mode: True
I0730 01:24:38.671103 140735835288448 main.py:44] tf executing eager_mode: True
tf.keras model eager_mode: False
I0730 01:24:38.673591 140735835288448 main.py:46] tf.keras model eager_mode: False
The ONNX operator number change on the optimization: 22 -> 13
I0730 01:24:38.724418 140735835288448 topology.py:348] The ONNX operator number change on the optim

tf executing eager_mode: True
I0730 01:24:39.630661 140735835288448 main.py:44] tf executing eager_mode: True
tf.keras model eager_mode: False
I0730 01:24:39.632922 140735835288448 main.py:46] tf.keras model eager_mode: False
The ONNX operator number change on the optimization: 22 -> 13
I0730 01:24:39.679224 140735835288448 topology.py:348] The ONNX operator number change on the optimization: 22 -> 13
W0730 01:24:39.685298 140735835288448 onnx_ex.py:64] The maximum opset needed by this model is only 11.
tf executing eager_mode: True
I0730 01:24:39.691112 140735835288448 main.py:44] tf executing eager_mode: True
tf.keras model eager_mode: False
I0730 01:24:39.693562 140735835288448 main.py:46] tf.keras model eager_mode: False
The ONNX operator number change on the optimization: 22 -> 13
I0730 01:24:39.742051 140735835288448 topology.py:348] The ONNX operator number change on the optimization: 22 -> 13
W0730 01:24:39.747051 140735835288448 onnx_ex.py:64] The maximum opset needed by this 

tf.keras model eager_mode: False
I0730 01:24:40.661092 140735835288448 main.py:46] tf.keras model eager_mode: False
The ONNX operator number change on the optimization: 22 -> 13
I0730 01:24:40.710628 140735835288448 topology.py:348] The ONNX operator number change on the optimization: 22 -> 13
W0730 01:24:40.716466 140735835288448 onnx_ex.py:64] The maximum opset needed by this model is only 11.
tf executing eager_mode: True
I0730 01:24:40.722604 140735835288448 main.py:44] tf executing eager_mode: True
tf.keras model eager_mode: False
I0730 01:24:40.724778 140735835288448 main.py:46] tf.keras model eager_mode: False
The ONNX operator number change on the optimization: 22 -> 13
I0730 01:24:40.775504 140735835288448 topology.py:348] The ONNX operator number change on the optimization: 22 -> 13
W0730 01:24:40.781157 140735835288448 onnx_ex.py:64] The maximum opset needed by this model is only 11.
tf executing eager_mode: True
I0730 01:24:40.787529 140735835288448 main.py:44] tf executing