# Set references for v9 tuning and export to ONNX

This notebook applies the linear pileup (avgmu) correction to the neural network output threshold and exports the best v9 models to ONNX and Keras formats. The Keras versions are typically used in the prometheus framework; the ONNX versions will be used in the athena framework.

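**NOTE**: ONNX (Open Neural Network Exchange) is an open format for representing machine learning models; Microsoft's ONNX Runtime is one engine that runs inference on such models.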

**NOTE**: We will export all tunings from the v9 r0 derivation.

In [1]:
from saphyra import crossval_table, get_color_fader
import saphyra
import numpy as np
import pandas as pd
import collections
import os
import matplotlib
import matplotlib.pyplot as plt
from pprint import pprint
%config InlineBackend.figure_format = 'retina'

Welcome to JupyROOT 6.23/01
Using all sub packages with ROOT dependence


In [2]:
def create_op_dict(op):
    d = {
              op+'_pd_ref'    : "reference/"+op+"_cutbased/pd_ref#0",
              op+'_fa_ref'    : "reference/"+op+"_cutbased/fa_ref#0",
              op+'_sp_ref'    : "reference/"+op+"_cutbased/sp_ref",
              op+'_pd_val'    : "reference/"+op+"_cutbased/pd_val#0",
              op+'_fa_val'    : "reference/"+op+"_cutbased/fa_val#0",
              op+'_sp_val'    : "reference/"+op+"_cutbased/sp_val",
              op+'_pd_op'     : "reference/"+op+"_cutbased/pd_op#0",
              op+'_fa_op'     : "reference/"+op+"_cutbased/fa_op#0",
              op+'_sp_op'     : "reference/"+op+"_cutbased/sp_op",

              # Counts
              op+'_pd_ref_passed'    : "reference/"+op+"_cutbased/pd_ref#1",
              op+'_fa_ref_passed'    : "reference/"+op+"_cutbased/fa_ref#1",
              op+'_pd_ref_total'     : "reference/"+op+"_cutbased/pd_ref#2",
              op+'_fa_ref_total'     : "reference/"+op+"_cutbased/fa_ref#2",
              op+'_pd_val_passed'    : "reference/"+op+"_cutbased/pd_val#1",
              op+'_fa_val_passed'    : "reference/"+op+"_cutbased/fa_val#1",
              op+'_pd_val_total'     : "reference/"+op+"_cutbased/pd_val#2",
              op+'_fa_val_total'     : "reference/"+op+"_cutbased/fa_val#2",
              op+'_pd_op_passed'     : "reference/"+op+"_cutbased/pd_op#1",
              op+'_fa_op_passed'     : "reference/"+op+"_cutbased/fa_op#1",
              op+'_pd_op_total'      : "reference/"+op+"_cutbased/pd_op#2",
              op+'_fa_op_total'      : "reference/"+op+"_cutbased/fa_op#2",
    }
    return d

tuned_info = collections.OrderedDict( {
              # validation
              "max_sp_val"      : 'summary/max_sp_val',
              "max_sp_pd_val"   : 'summary/max_sp_pd_val#0',
              "max_sp_fa_val"   : 'summary/max_sp_fa_val#0',
              # Operation
              "max_sp_op"       : 'summary/max_sp_op',
              "max_sp_pd_op"    : 'summary/max_sp_pd_op#0',
              "max_sp_fa_op"    : 'summary/max_sp_fa_op#0',
              } )

tuned_info.update(create_op_dict('tight'))
tuned_info.update(create_op_dict('medium'))
tuned_info.update(create_op_dict('loose'))
tuned_info.update(create_op_dict('vloose'))
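
To make the key layout concrete, the sketch below rebuilds the same mapping for one operating point with loops and prints a few entries. Reading the `#0`, `#1` and `#2` suffixes as efficiency, passed count and total count follows from the key names used in the cell above; how `crossval_table` resolves those suffixes internally is an assumption here. `sketch_op_dict` is a hypothetical helper written for this illustration, not part of saphyra.

```python
def sketch_op_dict(op, sets=("ref", "val", "op")):
    """Loop-based variant of create_op_dict (illustration only)."""
    base = "reference/" + op + "_cutbased/"
    d = {}
    for s in sets:
        for kind in ("pd", "fa"):
            key = "{}_{}_{}".format(op, kind, s)
            path = base + "{}_{}".format(kind, s)
            d[key] = path + "#0"              # assumed: efficiency
            d[key + "_passed"] = path + "#1"  # assumed: events passing the cut
            d[key + "_total"] = path + "#2"   # assumed: events considered
        # The SP entry carries no '#' suffix in the original mapping.
        d["{}_sp_{}".format(op, s)] = base + "sp_" + s
    return d


tight = sketch_op_dict("tight")
for key in ("tight_pd_ref", "tight_pd_ref_passed", "tight_pd_ref_total", "tight_sp_op"):
    print(key, "->", tight[key])
```

Each operating point contributes 21 entries, so the four `update` calls add 84 keys to the six summary keys already in `tuned_info`.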

In [3]:
etbins = [15, 20, 30, 40, 50, 1000000]
etabins = [0.0, 0.8, 1.37, 1.54, 2.37, 2.50]
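
The two edge lists define a 5 x 5 grid, which is where the 25 models mentioned later come from. As a minimal sketch of how a candidate could be assigned to a cell, the code below uses `bisect` on the same edges. It assumes half-open intervals `[low, high)` and that the absolute value of eta is used; the convention saphyra actually applies is not stated in this notebook. `find_bin` and `find_et_eta_bin` are hypothetical helpers.

```python
from bisect import bisect_right

etbins = [15, 20, 30, 40, 50, 1000000]
etabins = [0.0, 0.8, 1.37, 1.54, 2.37, 2.50]


def find_bin(value, edges):
    """Return i such that edges[i] <= value < edges[i+1], or None if out of range."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


def find_et_eta_bin(et, eta):
    # Assumption: the edges cover only the positive side, so |eta| is looked up.
    return find_bin(et, etbins), find_bin(abs(eta), etabins)


print(find_et_eta_bin(27.5, -1.45))
```

For the values shown, the candidate falls in the second Et bin and the third eta bin, that is `(1, 2)`.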

## 1) Reading all tunings:


In [4]:
cv  = crossval_table( tuned_info, etbins = etbins , etabins = etabins )

In [5]:
cv.fill(  '/Volumes/castor/tuning_data/Zee/v9/*.v9_et*.r0/*/*.gz', 'v9')

2020-12-19 18:28:34,202 | Py.crossval_table                       INFO Reading file for v9 tag from /Volumes/castor/tuning_data/Zee/v9/*.v9_et*.r0/*/*.gz
2020-12-19 18:28:34,202 | Py.crossval_table                       INFO There are 1250 files for this task...
2020-12-19 18:28:34,202 | Py.crossval_table                       INFO Filling the table... 
2020-12-19 18:29:08,874 | Py.crossval_table                       INFO End of fill step, a pandas DataFrame was created...


### 1.1) Get best inits and sorts:

In [6]:
best_inits = cv.filter_inits("max_sp_val")
best_sorts = cv.filter_sorts( best_inits , 'max_sp_op')
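
The selection is done in two stages, and the order matters. The sketch below mimics it on a toy table, assuming that `filter_inits` keeps the best initialization per bin and sort, and that `filter_sorts` then keeps the best sort per bin. That reading is inferred from the method names; `best_by` and the numbers are made up for illustration and are not saphyra's implementation.

```python
def best_by(rows, group_keys, metric):
    """Keep, for every group, the row with the largest value of `metric`."""
    best = {}
    for row in rows:
        group = tuple(row[k] for k in group_keys)
        if group not in best or row[metric] > best[group][metric]:
            best[group] = row
    return list(best.values())


# Toy table: one (et, eta) bin, two sorts, two inits per sort.
rows = [
    {"et_bin": 0, "eta_bin": 0, "sort": 0, "init": 0, "max_sp_val": 0.970, "max_sp_op": 0.972},
    {"et_bin": 0, "eta_bin": 0, "sort": 0, "init": 1, "max_sp_val": 0.975, "max_sp_op": 0.971},
    {"et_bin": 0, "eta_bin": 0, "sort": 1, "init": 0, "max_sp_val": 0.973, "max_sp_op": 0.974},
    {"et_bin": 0, "eta_bin": 0, "sort": 1, "init": 1, "max_sp_val": 0.968, "max_sp_op": 0.976},
]

# Stage 1: best init per (bin, sort), judged on the validation SP.
toy_best_inits = best_by(rows, ("et_bin", "eta_bin", "sort"), "max_sp_val")
# Stage 2: best sort per bin, judged on the operation SP.
toy_best_sorts = best_by(toy_best_inits, ("et_bin", "eta_bin"), "max_sp_op")

print(toy_best_sorts[0]["sort"], toy_best_sorts[0]["init"])
```

In this toy table the row with the highest `max_sp_op` (sort 1, init 1) is dropped in the first stage because its validation SP is lower, so the final pick is sort 1, init 0.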

### 1.2) Get best models:

Get the best model for each bin. With 5 Et bins and 5 eta bins, 25 models are expected.

In [7]:
best_models = cv.get_best_models(best_sorts, remove_last=True)
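
A quick structural check can confirm the expected count before the slow correction step is launched. The sketch below assumes the result is a nested list indexed as `[et_bin][eta_bin]` whose entries are dictionaries with a `'model'` key, which is how the next cell accesses it. `check_grid` is a hypothetical helper, and plain strings stand in for the Keras models.

```python
def check_grid(models, n_et=5, n_eta=5):
    """Raise ValueError unless `models` is an n_et x n_eta grid with a model in every cell."""
    if len(models) != n_et:
        raise ValueError("expected %d Et rows, got %d" % (n_et, len(models)))
    for et_bin, row in enumerate(models):
        if len(row) != n_eta:
            raise ValueError("Et bin %d has %d eta entries" % (et_bin, len(row)))
        for eta_bin, entry in enumerate(row):
            if entry.get("model") is None:
                raise ValueError("no model for et%d_eta%d" % (et_bin, eta_bin))
    return n_et * n_eta


# Stand-in for best_models: strings replace the Keras models.
fake_models = [[{"model": "net_et%d_eta%d" % (et, eta)} for eta in range(5)]
               for et in range(5)]
print(check_grid(fake_models))
```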

In [8]:
best_models[0][0]['model'].summary()

Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
Input_rings (InputLayer)        [(None, 100)]        0                                            
__________________________________________________________________________________________________
Input_showers (InputLayer)      [(None, 6)]          0                                            
__________________________________________________________________________________________________
dense_rings_layer (Dense)       (None, 5)            505         Input_rings[0][0]                
__________________________________________________________________________________________________
dense_shower_layer (Dense)      (None, 5)            35          Input_showers[0][0]              
_______________________________________________________________________________________

## 2) Linear correction:

Here we set each threshold so that the network operates at the same detection probability (pd) as the corresponding cut-based reference, using the pileup linear correction strategy. Because the classifier efficiency depends on the pileup, the neural network threshold is allowed to vary linearly with avgmu so that the trigger efficiency stays approximately constant as the pileup changes.
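
In code form, the idea is that the cut on the network output becomes a straight line in avgmu instead of a constant. The sketch below shows only the decision rule; `slope` and `offset` are made-up numbers for illustration, not values from the v9 tuning, and the way `correction_table` fits them is not reproduced here.

```python
def passes(output, avgmu, slope, offset):
    """Accept a candidate when the network output exceeds the pileup-dependent threshold."""
    threshold = slope * avgmu + offset
    return output > threshold


# Illustrative parameters only.
slope, offset = -0.002, 0.30

# The same network output is judged differently at low and high pileup.
print(passes(0.22, avgmu=20, slope=slope, offset=offset))
print(passes(0.22, avgmu=50, slope=slope, offset=offset))
```

With these numbers the threshold is about 0.26 at avgmu = 20 and about 0.20 at avgmu = 50, so an output of 0.22 is rejected in the first case and accepted in the second.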

### 2.1) Get all PD/FA values:

Read all reference values from the storage.

In [9]:
# Calculate all pd/fa values from the reference files
from saphyra.core import ReferenceReader

ref_path = '/Volumes/castor/cern_data/files/Zee/data17_13TeV.AllPeriods.sgn.probes_lhmedium_EGAM1.bkg.VProbes_EGAM7.GRL_v97/references/data17_13TeV.AllPeriods.sgn.probes_lhmedium_EGAM1.bkg.VProbes_EGAM7.GRL_v97_et{ET}_eta{ETA}.ref.pic.gz'
ref_paths = [[ ref_path.format(ET=et,ETA=eta) for eta in range(5)] for et in range(5)]
ref_matrix = [[ {} for eta in range(5)] for et in range(5)]
references = ['tight_cutbased', 'medium_cutbased' , 'loose_cutbased', 'vloose_cutbased']
for et_bin in range(5):
    for eta_bin in range(5):
        # Load each reference file once per bin rather than once per operating point
        refObj = ReferenceReader().load(ref_paths[et_bin][eta_bin])
        for name in references:
            # Use pd_ref/fa_ref so the pandas alias `pd` imported in the first cell is not overwritten
            pd_ref = refObj.getSgnPassed(name)/refObj.getSgnTotal(name)
            fa_ref = refObj.getBkgPassed(name)/refObj.getBkgTotal(name)
            ref_matrix[et_bin][eta_bin][name] = {'pd':pd_ref, 'fa':fa_ref}
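
Each reference value is simply a passed/total ratio. As a worked example, the sketch below recomputes the tight cut-based reference for the first bin from the counts that appear in the first row of the table printed in section 2.3. `efficiency` is a hypothetical helper; returning 0.0 for an empty bin is a choice made for this sketch, not behaviour taken from `ReferenceReader`.

```python
def efficiency(passed, total):
    """Fraction of events passing the selection; 0.0 for an empty bin."""
    return passed / total if total else 0.0


# Counts for tight_cutbased, et_bin 0, eta_bin 0.
pd_tight = efficiency(227619, 232819)   # signal: detection probability
fa_tight = efficiency(23318, 187639)    # background: false alarm rate

print(round(pd_tight, 4), round(fa_tight, 4))
```

The results agree with the `reference_signal_eff` and `reference_background_eff` columns of that row to within rounding.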

### 2.2) Create data generator:

Since the models of each tuning are fed with a different data layout, we need a generator that opens the data file, builds the input matrices and applies any pre-processing that is required.

In [10]:
def generator( path ):
    def norm1( data ):
        norms = np.abs( data.sum(axis=1) )
        norms[norms==0] = 1
        return data/norms[:,None]
    from Gaugi import load
    d = load(path)
    feature_names = d['features'].tolist()

    # How many events?
    n = d['data'].shape[0]
    
    # extract rings
    data_rings = norm1(d['data'][:,1:101])

    # extract all shower shapes
    data_reta   = d['data'][:, feature_names.index('L2Calo_reta')].reshape((n,1)) / 1.0
    data_eratio = d['data'][:, feature_names.index('L2Calo_eratio')].reshape((n,1)) / 1.0
    data_f1     = d['data'][:, feature_names.index('L2Calo_f1')].reshape((n,1)) / 0.6
    data_f3     = d['data'][:, feature_names.index('L2Calo_f3')].reshape((n,1)) / 0.04
    data_weta2  = d['data'][:, feature_names.index('L2Calo_weta2')].reshape((n,1)) / 0.02
    data_wstot  = d['data'][:, feature_names.index('L2Calo_wstot')].reshape((n,1)) / 1.0
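    # Fix out-of-range shower shape values: eratio above 10 is reset to 0,
    # the remaining eratio values are capped at 1, and wstot below -99 is reset to 0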
    data_eratio[data_eratio>10.0]=0
    data_eratio[data_eratio>1.]=1.0
    data_wstot[data_wstot<-99]=0
    data_shower = np.concatenate( (data_reta,data_eratio,data_f1,data_f3,data_weta2, data_wstot), axis=1)
    
    target = d['target']
    avgmu = d['data'][:,0]
    
    return [data_rings,data_shower], target, avgmu
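
The two pre-processing steps that are easiest to misread are the ring normalization and the eratio cleanup. The sketch below restates them for a single event with plain Python lists, to show their effect without NumPy. `norm1_row` and `fix_eratio` are hypothetical, list-based equivalents written for illustration; the generator above remains the reference.

```python
def norm1_row(rings):
    """Divide one event's ring energies by the absolute value of their sum."""
    norm = abs(sum(rings))
    if norm == 0:
        norm = 1.0  # leave all-zero events untouched instead of dividing by zero
    return [r / norm for r in rings]


def fix_eratio(value):
    """Same two-step cleanup as in generator(): outliers go to 0, the rest is capped at 1."""
    if value > 10.0:
        return 0.0
    return min(value, 1.0)


print(norm1_row([2.0, -1.0, 3.0]))
print([fix_eratio(v) for v in (12.0, 3.0, 0.4)])
```

Note that the normalization uses the absolute value of the sum, not the sum of absolute values, so individual rings keep their sign and can exceed 1 in magnitude when negative energies are present.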

In [11]:
path = '/Volumes/castor/cern_data/files/Zee/data17_13TeV.AllPeriods.sgn.probes_lhmedium_EGAM1.bkg.VProbes_EGAM7.GRL_v97/data17_13TeV.AllPeriods.sgn.probes_lhmedium_EGAM1.bkg.VProbes_EGAM7.GRL_v97_et{ET}_eta{ETA}.npz'
paths = [[ path.format(ET=et,ETA=eta) for eta in range(5)] for et in range(5)]

In [12]:
# create the table class
from saphyra.utils import correction_table
ct  = correction_table( generator, etbins , etabins, 0.02, 0.5, 16, 60, xmin_percentage=0.05, xmax_percentage=99.95 )

### 2.3) Apply linear correction:

**NOTE**: This step takes about 25 minutes.

In [13]:
# Fill it
ct.fill(paths, best_models, ref_matrix)

Fitting... |############################################################| 25/25
Fitting... ... finished task in 1542.481434s.


In [14]:
ct.table().head()

| | name | et_bin | eta_bin | reference_signal_passed | reference_signal_total | reference_signal_eff | reference_background_passed | reference_background_total | reference_background_eff | signal_passed | ... | signal_eff | background_passed | background_total | background_eff | signal_corrected_passed | signal_corrected_total | signal_corrected_eff | background_corrected_passed | background_corrected_total | background_corrected_eff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tight_cutbased | 0 | 0 | 227619 | 232819 | 0.977666 | 23318 | 187639 | 0.124271 | 227583 | ... | 0.97751 | 3073 | 187639 | 0.016377 | 227593 | 232819 | 0.977553 | 3004 | 187639 | 0.016009 |
| 1 | medium_cutbased | 0 | 0 | 227780 | 232819 | 0.97836 | 24336 | 187639 | 0.129701 | 227740 | ... | 0.978185 | 3138 | 187639 | 0.016724 | 227728 | 232819 | 0.978133 | 3044 | 187639 | 0.016223 |
| 2 | loose_cutbased | 0 | 0 | 229996 | 232819 | 0.987876 | 31867 | 187639 | 0.169837 | 229980 | ... | 0.987806 | 4272 | 187639 | 0.022767 | 229975 | 232819 | 0.987785 | 4216 | 187639 | 0.022469 |
| 3 | vloose_cutbased | 0 | 0 | 230152 | 232819 | 0.988548 | 32748 | 187639 | 0.174527 | 230138 | ... | 0.988485 | 4420 | 187639 | 0.023556 | 230171 | 232819 | 0.988626 | 4383 | 187639 | 0.023359 |
| 4 | tight_cutbased | 0 | 1 | 137861 | 141000 | 0.977742 | 31938 | 143657 | 0.222321 | 137837 | ... | 0.977567 | 3936 | 143657 | 0.027399 | 137832 | 141000 | 0.977532 | 3873 | 143657 | 0.02696 |
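
One way to read this table is to check how closely the corrected signal efficiency follows the cut-based reference, and how much the background efficiency drops. The sketch below does that for the four et_bin 0, eta_bin 0 rows, with the values copied by hand from the output above. The 5e-4 tolerance is an illustrative choice, not a criterion defined by the tuning.

```python
rows = [
    # (name, reference_signal_eff, signal_corrected_eff,
    #  reference_background_eff, background_corrected_eff)
    ("tight_cutbased",  0.977666, 0.977553, 0.124271, 0.016009),
    ("medium_cutbased", 0.978360, 0.978133, 0.129701, 0.016223),
    ("loose_cutbased",  0.987876, 0.987785, 0.169837, 0.022469),
    ("vloose_cutbased", 0.988548, 0.988626, 0.174527, 0.023359),
]


def max_signal_gap(rows):
    """Largest absolute difference between reference and corrected signal efficiency."""
    return max(abs(ref_pd - cor_pd) for _, ref_pd, cor_pd, _, _ in rows)


def background_reduction(rows):
    """Ratio reference_background_eff / background_corrected_eff per operating point."""
    return {name: ref_fa / cor_fa for name, _, _, ref_fa, cor_fa in rows}


gap = max_signal_gap(rows)
reduction = background_reduction(rows)
print("largest signal gap: %.6f" % gap)
for name, factor in reduction.items():
    print("%-16s background reduced by a factor of %.1f" % (name, factor))
```

For these rows the signal efficiencies agree with the reference to better than 0.03 percentage points, while the background efficiency is several times lower than the cut-based value.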


### 2.4) Create beamer report:

In [15]:
ct.dump_beamer_table(ct.table(), best_models, 'data17_13TeV v9 tuning', 
                                              'correction_v9_data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.pdf',
                                              'correction_v9_data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose')


Applying ATLAS style settings...
2020-12-19 18:55:34,233 | Py.BeamerTexReportTemplate1             INFO Started creating beamer file correction_v9_data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.pdf latex code...


## 3) Export all tunings:
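
The export cell in this section mixes two formatting styles in one template: `{op}` is filled with `str.format` in the notebook, while the `%d` fields are left for the bin indices. The sketch below shows how the two steps combine into the file names seen in the export log. That `ct.export` fills the `%d` fields with `(et_bin, eta_bin)` and appends `.onnx` is inferred from that log, not from the saphyra source; `onnx_file_name` is a hypothetical helper.

```python
model_name_format = ('data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose'
                     '.model_v9.electron{op}.et%d_eta%d')


def onnx_file_name(op, et_bin, eta_bin):
    # Step 1: str.format fills {op}; the %d fields pass through untouched.
    per_op = model_name_format.format(op=op)
    # Step 2: %-formatting fills the bin indices (assumed to happen inside export).
    return per_op % (et_bin, eta_bin) + '.onnx'


print(onnx_file_name('Tight', 0, 1))
```

Keeping the two placeholder styles separate lets the operating point be fixed once per call while the bin indices are filled later for each of the 25 models.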

In [20]:
model_name_format = 'data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.model_v9.electron{op}.et%d_eta%d'
config_name_format = 'ElectronRinger{op}TriggerConfig.conf'
for idx, op in enumerate(['Tight','Medium','Loose','VeryLoose']):
    ct.export(best_models, 
              model_name_format.format(op=op), 
              config_name_format.format(op=op), 
              references[idx], 
              to_onnx=True)

Saving ONNX file as data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.model_v9.electronTight.et0_eta0.onnx
Saving ONNX file as data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.model_v9.electronTight.et0_eta1.onnx
Saving ONNX file as data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.model_v9.electronTight.et0_eta2.onnx
Saving ONNX file as data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.model_v9.electronTight.et0_eta3.onnx
Saving ONNX file as data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.model_v9.electronTight.et0_eta4.onnx
Saving ONNX file as data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.model_v9.electronTight.et1_eta0.onnx
Saving ONNX file as data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.model_v9.electronTight.et1_eta1.onnx
Saving ONNX file as data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.model_v9.electronTight.et1_eta2.onnx
Saving ONNX file as data17_13TeV_EGAM1_probes_lhmedium_EGAM7_vetolhvloose.model_v9.electronTight.et1_eta3.onnx
S

2020-12-19 19:15:31.432443: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-19 19:15:31.445417: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7ffa91f2bc00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-19 19:15:31.445431: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
tf executing eager_mode: True
tf.keras model eager_mode: False
The ONNX operator number change on the optimization: 19 -> 12
The maximum opset needed by this model is only 11.
2020-12-19 19:15:33.720723: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (one