In this notebook we summarize the performance of balanced optimal transport (OT) distances between events, used in tandem with interpretable machine learning (ML), for anomaly detection and classification.

We consider the following settings:

**kNN classification:**

- 3D ground space (`kNN_distance_matrix_3D.json`)
- 2D ground space (`kNN_distance_matrix_2D.json`)
- 2D planed $p_{\rm T}$ ground space, one file per total $p_{\rm T}$ bin:
  - `kNN_2D_planed_50GeV.json`: (0, 50) GeV
  - `kNN_2D_planed_100GeV.json`: (50, 100) GeV
  - `kNN_2D_planed_150GeV.json`: (100, 150) GeV
  - `kNN_2D_planed_200GeV.json`: (150, 200) GeV
  - `kNN_2D_planed_500GeV.json`: (200, 500) GeV
  - `kNN_2D_planed_1000GeV.json`: (500, 1000) GeV

**kNN anomaly detection (anomaly-augmented background as signal):**

- 3D ground space (`kNN_3D_anomalyaug.json`)

**oneClassSVM anomaly detection:**

- 3D ground space (`OneClassSVM.json`)

In [1]:
NSIGFIGS = 4
OVERWRITEFILES = True

# Preliminaries

## Information about data

The data we are using was part of the [ML4Jets 2021 data challenge](https://indico.cern.ch/event/980214/contributions/4413658/attachments/2278124/3870358/ml4jets_data_challenge.pdf). It is publicly available in `.h5` format, so it's great for testing out new methods!

**Publication:**

E. Govorkova, E. Puljak, T. Aarrestad, M. Pierini, K. A. Woźniak and J. Ngadiuba, LHC physics dataset for unsupervised New Physics detection at 40 MHz, Sci. Data 9, 118 (2022),
doi:[10.1038/s41597-022-01187-8](https://www.nature.com/articles/s41597-022-01187-8), arXiv:2107.02157

**Data:**

NOTE: The original data had several bugs in it. The links in the original materials (i.e. publication and slides) point to the incorrect Version 1 of the data. Version 2 should be used instead. [This website](https://mpp-hep.github.io/ADC2021/) contains the correct links/descriptions/usage information.

- `background_for_training.h5`: 4 million Standard Model (SM) background "training" data ([link to data](https://zenodo.org/record/5046428#.ZB9yKezMKHu))
- `Ato4l_lepFilter_13TeV_filtered.h5`: Neutral scalar boson events, $A \rightarrow 4l$, mass = $50$ GeV ([link to data](https://zenodo.org/record/7152590#.ZB9yROzMKHu))
- `leptoquark_LOWMASS_lepFilter_13TeV_filtered.h5`: Leptoquark events, ${\rm LQ} \rightarrow b \tau$ ([link to data](https://zenodo.org/record/7152599#.ZB9yZOzMKHu))
- `hToTauTau_13TeV_PU20_filtered.h5`: Scalar boson events, $h^0 \rightarrow \tau \tau$ ([link to data](https://zenodo.org/record/7152614#.ZB9ybOzMKHt))
- `hChToTauNu_13TeV_PU20_filtered.h5`: Charged scalar boson events, $h^\pm \rightarrow \tau \nu$ ([link to data](https://zenodo.org/record/7152617#.ZB9yf-zMKHt))
- `BlackBox_background_mix.h5`: Mystery events ([link to data](https://zenodo.org/record/5072068#.ZB9yk-zMKHt))

In [2]:
sigAliasList    = ['sig_A', 'sig_h0', 'sig_hch', 'sig_LQ']
sigFilenameList = ['Ato4l_lepFilter_13TeV_filtered.h5', 'hToTauTau_13TeV_PU20_filtered.h5', 'hChToTauNu_13TeV_PU20_filtered.h5', 'leptoquark_LOWMASS_lepFilter_13TeV_filtered.h5']

## Google Drive preliminaries (since we're running on Google Colab)

In [3]:
#-- "Mount" Google Drive to access data and save files/images --#
# NOTE: If running locally, comment out this cell and change the basePath accordingly
# Reference: https://stackoverflow.com/questions/49031798/when-i-use-google-colaboratory-how-to-save-image-weights-in-my-google-drive
from google.colab import drive
drive.mount('/content/gdrive')

# You will be asked to sign into a Google account and give Google Colab access

Mounted at /content/gdrive


In [4]:
# To check that mounting worked, run the following. You should see the contents of the directory listed.
! ls '/content/gdrive/My Drive/Research/AnomalyDetectionWithOT/OnML4Jets2021DataChallenge/Data/'

anomalyAugmented_background_for_training_50k.h5
anomalyAugmented_background_for_training.h5
Ato4l_lepFilter_13TeV_filtered.h5
background_for_training.h5
BlackBox_background_mix.h5
finalScoreDict_2D_nEvents1000_nRepeat5_eventToEnsembleTypeMAX.npz
finalScoreDict_2D_nEvents1000_nRepeat5_eventToEnsembleTypeMEAN.npz
finalScoreDict_2D_nEvents1000_nRepeat5_eventToEnsembleTypeMIN.npz
finalScoreDict_2D_nEvents1000_nRepeat5.npz
finalScoreDict_2D_nEvents10_nRepeat5_eventToEnsembleTypeMAX.npz
finalScoreDict_3DanomalykNN_nEvents1000_nRepeat5.npz
finalScoreDict_3D_nEvents1000_nRepeat5_eventToEnsembleTypeMAX.npz
finalScoreDict_3D_nEvents1000_nRepeat5_eventToEnsembleTypeMEAN.npz
finalScoreDict_3D_nEvents1000_nRepeat5_eventToEnsembleTypeMIN.npz
finalScoreDict_3D_nEvents1000_nRepeat5.npz
finalScoreDict_3D_nEvents10_nRepeat5_eventToEnsembleTypeMAX.npz
finalScoreDict_3DoneClassSVM_nEvents1000_nRepeat5.npz
hChToTauNu_13TeV_PU20_filtered.h5
hToTauTau_13TeV_PU20_filtered.h5
leptoquark_LOWMASS_lepFilter_13TeV

In [5]:
#-- Set base directory and data directory path --#
basePath   = '/content/gdrive/My Drive/Research/AnomalyDetectionWithOT/OnML4Jets2021DataChallenge/'
dataPath   = 'Data/'

bkgPath    = basePath+dataPath+'background_for_training.h5'
sigPathList = []
for x in sigFilenameList:
  sigPathList.append(basePath+dataPath+x)

## Import libraries

Wasserstein distances are computed with the POT (Python Optimal Transport) library (see [here](https://pythonot.github.io/index.html)). Since this notebook only summarizes previously computed results, we do not call it directly here.

In [6]:
import numpy as np
from numpy.random import RandomState
import numpy.ma as ma

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
%matplotlib inline

import h5py
from numpy.random import Generator, PCG64
from sklearn import metrics
import itertools

import os.path

import json

## Functions

To keep things tidy, functions are defined externally in `centralFunctions.ipynb`. Running that notebook from here with `%run` defines its functions in this session as if they were written here.


In [7]:
%cd /content/gdrive/My Drive/Research/AnomalyDetectionWithOT/OnML4Jets2021DataChallenge/
%run centralFunctions.ipynb

/content/gdrive/My Drive/Research/AnomalyDetectionWithOT/OnML4Jets2021DataChallenge
Collecting POT
  Downloading POT-0.9.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (823 kB)
Installing collected packages: POT
Successfully installed POT-0.9.3


# Classification

## kNN 3D ground space

In [8]:
filepath   = basePath+'Results/'+'kNN_distance_matrix_3D.json'

In [9]:
scoreDict = loadJSONFile(filepath)

print(" Top keys:         ", scoreDict.keys())
print("   Sub keys:       ", scoreDict['repeat0'].keys())
print("     Sub sub keys: ",scoreDict['repeat0']['ROC_metric_sig_A'].keys())

 Top keys:          dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
   Sub keys:        dict_keys(['ROC_metric_sig_A', 'ROC_metric_sig_h0', 'ROC_metric_sig_hch', 'ROC_metric_sig_LQ'])
     Sub sub keys:  dict_keys(['auc', 'fpr', 'tpr', 'SI', 'fprInv', 'F1'])
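
The printed keys suggest a three-level layout: repeat, then a signal-specific ROC entry, then the metric. The sketch below builds a tiny dictionary with that layout, writes it to JSON, and reads it back with the standard `json` module. `load_scores` is a hypothetical stand-in, not the `loadJSONFile` defined in `centralFunctions.ipynb` (whose `INVERTED` option is not reproduced here), and all numbers are made-up placeholders.

```python
import json
import os
import tempfile

def load_scores(path):
    """Hypothetical stand-in for loadJSONFile: read a nested score dict from JSON."""
    with open(path, "r") as f:
        return json.load(f)

# Toy dictionary mirroring the key layout printed above (values are placeholders)
toy = {
    f"repeat{r}": {
        f"ROC_metric_{alias}": {"auc": 0.5, "fpr": [0.0, 1.0], "tpr": [0.0, 1.0]}
        for alias in ["sig_A", "sig_h0", "sig_hch", "sig_LQ"]
    }
    for r in range(5)
}

# Round-trip through a temporary JSON file
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_scores.json")
    with open(toy_path, "w") as f:
        json.dump(toy, f)
    loaded = load_scores(toy_path)

print(list(loaded.keys()))
print(list(loaded["repeat0"].keys()))
```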


### Get and report average performance for tables

In [10]:
print(scoreDict.keys())
getRepeatAvStd(scoreDict)
print(scoreDict.keys())
print(scoreDict['avStdQuantities']['sig_A'].keys())

dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
Analyzing signal type = sig_A 
Analyzing signal type = sig_h0 
Analyzing signal type = sig_hch 
Analyzing signal type = sig_LQ 
dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4', 'avStdQuantities'])
dict_keys(['auc', 'fpr', 'SI', 'fprInv', 'F1'])
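
`getRepeatAvStd` adds an `avStdQuantities` entry keyed by signal alias, with a `mean` and `std` for each metric. A minimal sketch of that kind of aggregation, for the scalar `auc` only, is shown below. `average_auc_over_repeats` is a hypothetical helper, and the use of the population standard deviation (NumPy's default `ddof=0`) is an assumption; the actual implementation lives in `centralFunctions.ipynb`.

```python
import statistics

def average_auc_over_repeats(score_dict, aliases):
    """Hypothetical helper: mean and std of 'auc' across all 'repeatN' entries."""
    repeats = [k for k in score_dict if k.startswith("repeat")]
    summary = {}
    for alias in aliases:
        values = [score_dict[r]["ROC_metric_" + alias]["auc"] for r in repeats]
        summary[alias] = {
            "auc": {
                "mean": statistics.fmean(values),
                # Assumption: population std (ddof=0); use statistics.stdev for ddof=1
                "std": statistics.pstdev(values),
            }
        }
    return summary

# Toy input with made-up AUC values for a single signal
toy_scores = {
    "repeat0": {"ROC_metric_sig_A": {"auc": 0.90}},
    "repeat1": {"ROC_metric_sig_A": {"auc": 0.92}},
    "repeat2": {"ROC_metric_sig_A": {"auc": 0.88}},
}
result = average_auc_over_repeats(toy_scores, ["sig_A"])
print(result["sig_A"]["auc"])
```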


In [11]:
#-- Report results for tables--#
indx = indxOfCertainTPR([np.linspace(0, 1, 101)], TPRval = 0.3)[0] # Assuming base TPR value

print("AUC:")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['auc']['mean'], scoreDict['avStdQuantities'][alias]['auc']['std'])

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx])

print("SI at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], scoreDict['avStdQuantities'][alias]['SI']['std'][indx])

print("F1 at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], scoreDict['avStdQuantities'][alias]['F1']['std'][indx])

AUC:
   sig_A, mean, std:  0.9020570647076511 0.014260414200471186
   sig_h0, mean, std:  0.7712851497591762 0.018137049369822508
   sig_hch, mean, std:  0.9197953147848965 0.006547174999609766
   sig_LQ, mean, std:  0.8766399182331396 0.014149130221591219
Inverse FPR at TPR=0.3
   sig_A, mean, std:  111.76213942783602 44.36260942876099
   sig_h0, mean, std:  18.254523101918057 1.910918235043383
   sig_hch, mean, std:  102.22610579796773 59.91223954177877
   sig_LQ, mean, std:  28.490429243390444 6.9379977542382925
SI at TPR=0.3
   sig_A, mean, std:  3.0938089386050986 0.5989023933996813
   sig_h0, mean, std:  1.2787543124326444 0.06832433236073215
   sig_hch, mean, std:  2.88374004719849 0.871816735566458
   sig_LQ, mean, std:  1.5847398445160459 0.21218815574200584
F1 at TPR=0.3
   sig_A, mean, std:  0.4579004069804392 0.0012902493171862571
   sig_h0, mean, std:  0.4426702330425657 0.0020566046699665485
   sig_hch, mean, std:  0.456537627525052 0.0031169630303672373
   sig_LQ, mean, 
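
The metrics above are read off at a fixed working point, TPR = 0.3, on a 101-point TPR grid. The sketch below shows one way to locate that grid index and to form the inverse FPR and the significance improvement from a (TPR, FPR) pair. The definition SI = TPR/sqrt(FPR) is the common convention and is assumed here rather than taken from `centralFunctions.ipynb`; `index_of_tpr` and `working_point_metrics` are hypothetical names, and the FPR value is made up.

```python
import math

def index_of_tpr(tpr_grid, tpr_value):
    """Index of the grid point closest to the requested TPR."""
    return min(range(len(tpr_grid)), key=lambda i: abs(tpr_grid[i] - tpr_value))

def working_point_metrics(tpr, fpr):
    """Inverse FPR and significance improvement (assumed SI = TPR / sqrt(FPR))."""
    return {"fprInv": 1.0 / fpr, "SI": tpr / math.sqrt(fpr)}

# Same grid as np.linspace(0, 1, 101), built without NumPy
tpr_grid = [i / 100 for i in range(101)]
indx = index_of_tpr(tpr_grid, 0.3)

# Made-up FPR at that working point, for illustration only
metrics_at_wp = working_point_metrics(tpr_grid[indx], fpr=0.01)
print(indx, metrics_at_wp)
```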

In [12]:
# Same order as above but easier for copying over to draft
print("AUC:")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['mean'], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['std'], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("SI at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("F1 at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

AUC:
0.9021  $\pm$  0.01426
0.7713  $\pm$  0.01814
0.9198  $\pm$  0.006547
0.8766  $\pm$  0.01415
Inverse FPR at TPR=0.3
111.8  $\pm$  44.36
18.25  $\pm$  1.911
102.2  $\pm$  59.91
28.49  $\pm$  6.938
SI at TPR=0.3
3.094  $\pm$  0.5989
1.279  $\pm$  0.06832
2.884  $\pm$  0.8718
1.585  $\pm$  0.2122
F1 at TPR=0.3
0.4579  $\pm$  0.00129
0.4427  $\pm$  0.002057
0.4565  $\pm$  0.003117
0.4484  $\pm$  0.004377
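
The values above are rounded to `NSIGFIGS` significant figures before being pasted into the draft. A minimal sketch of such rounding follows. `round_to_sig_fig` and `format_mean_std` are hypothetical names; the former may differ from the `roundToSigFig` in `centralFunctions.ipynb` in edge cases such as zero or non-finite input.

```python
import math

def round_to_sig_fig(x, n_sig):
    """Round x to n_sig significant figures (hypothetical sketch)."""
    if x == 0 or not math.isfinite(x):
        return x  # nothing sensible to round
    exponent = math.floor(math.log10(abs(x)))
    return round(x, n_sig - 1 - exponent)

def format_mean_std(mean, std, n_sig=4):
    """Format a mean and std as a LaTeX 'mean pm std' entry."""
    # The raw string keeps the backslash in \pm without an invalid-escape warning
    return f"{round_to_sig_fig(mean, n_sig)} " + r"$\pm$" + f" {round_to_sig_fig(std, n_sig)}"

entry = format_mean_std(0.9020570647076511, 0.014260414200471186)
print(entry)
```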


## kNN 2D ground space

In [13]:
filepath   = basePath+'Results/'+'kNN_distance_matrix_2D.json'

In [14]:
scoreDict = loadJSONFile(filepath)

print(" Top keys:         ", scoreDict.keys())
print("   Sub keys:       ", scoreDict['repeat0'].keys())
print("     Sub sub keys: ",scoreDict['repeat0']['ROC_metric_sig_A'].keys())

 Top keys:          dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
   Sub keys:        dict_keys(['ROC_metric_sig_A', 'ROC_metric_sig_h0', 'ROC_metric_sig_hch', 'ROC_metric_sig_LQ'])
     Sub sub keys:  dict_keys(['auc', 'fpr', 'tpr', 'SI', 'fprInv', 'F1'])
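
Each `ROC_metric_*` entry stores the ROC curve (`fpr`, `tpr`) alongside its `auc`. As a reminder of how the two relate, the sketch below integrates TPR over FPR with the trapezoidal rule. This is a generic illustration under the hypothetical name `trapezoid_auc`; the stored AUC values may have been produced differently, for example with `sklearn.metrics`.

```python
def trapezoid_auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule; fpr must be non-decreasing."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += 0.5 * (tpr[i] + tpr[i - 1]) * (fpr[i] - fpr[i - 1])
    return area

# A random classifier traces the diagonal, a perfect one hugs the upper-left corner
diagonal_auc = trapezoid_auc([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
perfect_auc = trapezoid_auc([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])
print(diagonal_auc, perfect_auc)
```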


### Get and report average performance for tables

In [15]:
print(scoreDict.keys())
getRepeatAvStd(scoreDict)
print(scoreDict.keys())
print(scoreDict['avStdQuantities']['sig_A'].keys())

dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
Analyzing signal type = sig_A 
Analyzing signal type = sig_h0 
Analyzing signal type = sig_hch 
Analyzing signal type = sig_LQ 
dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4', 'avStdQuantities'])
dict_keys(['auc', 'fpr', 'SI', 'fprInv', 'F1'])


In [16]:
#-- Report results for tables--#
indx = indxOfCertainTPR([np.linspace(0, 1, 101)], TPRval = 0.3)[0] # Assuming base TPR value

print("AUC:")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['auc']['mean'], scoreDict['avStdQuantities'][alias]['auc']['std'])

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx])

print("SI at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], scoreDict['avStdQuantities'][alias]['SI']['std'][indx])

print("F1 at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], scoreDict['avStdQuantities'][alias]['F1']['std'][indx])

AUC:
   sig_A, mean, std:  0.6947582733388014 0.017948357453657943
   sig_h0, mean, std:  0.669764010403308 0.011783244458465292
   sig_hch, mean, std:  0.8103431594284152 0.016734256523563844
   sig_LQ, mean, std:  0.7905600332483893 0.025314221471884484
Inverse FPR at TPR=0.3
   sig_A, mean, std:  7.923026559597614 1.294821085547253
   sig_h0, mean, std:  8.17182222334151 1.3590789128082998
   sig_hch, mean, std:  17.491854070243775 4.275584665654385
   sig_LQ, mean, std:  16.01372266942044 4.557511060647353
SI at TPR=0.3
   sig_A, mean, std:  0.8412879908126587 0.06875559693256297
   sig_h0, mean, std:  0.8542381189157672 0.07158307024625277
   sig_hch, mean, std:  1.2449207376195273 0.14671996497641435
   sig_LQ, mean, std:  1.1871197943872231 0.1717304335290795
F1 at TPR=0.3
   sig_A, mean, std:  0.41978390411259525 0.006138771112548631
   sig_h0, mean, std:  0.4208778825467185 0.006355072346046929
   sig_hch, mean, std:  0.44114930101174965 0.004139489228981458
   sig_LQ, mean, s

In [17]:
# Same order as above but easier for copying over to draft
print("AUC:")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['mean'], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['std'], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("SI at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("F1 at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

AUC:
0.6948  $\pm$  0.01795
0.6698  $\pm$  0.01178
0.8103  $\pm$  0.01673
0.7906  $\pm$  0.02531
Inverse FPR at TPR=0.3
7.923  $\pm$  1.295
8.172  $\pm$  1.359
17.49  $\pm$  4.276
16.01  $\pm$  4.558
SI at TPR=0.3
0.8413  $\pm$  0.06876
0.8542  $\pm$  0.07158
1.245  $\pm$  0.1467
1.187  $\pm$  0.1717
F1 at TPR=0.3
0.4198  $\pm$  0.006139
0.4209  $\pm$  0.006355
0.4411  $\pm$  0.004139
0.4387  $\pm$  0.006468


## kNN 2D planed $p_{\rm T}$ ground space

- `kNN_2D_planed_50GeV.json`: (0, 50) GeV
- `kNN_2D_planed_100GeV.json`: (50, 100) GeV
- `kNN_2D_planed_150GeV.json`: (100, 150) GeV
- `kNN_2D_planed_200GeV.json`: (150, 200) GeV
- `kNN_2D_planed_500GeV.json`: (200, 500) GeV
- `kNN_2D_planed_1000GeV.json`: (500, 1000) GeV
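
The file names encode the upper edge of each total-$p_{\rm T}$ bin. A small lookup like the one below can map a total $p_{\rm T}$ value to the matching results file. `PT_BINS` and `planed_file_for` are hypothetical names, and treating the bins as half-open intervals (low edge included, high edge excluded) is an assumption, since the text does not say how bin edges are handled.

```python
# (low, high) edges in GeV, taken from the list above
PT_BINS = [(0, 50), (50, 100), (100, 150), (150, 200), (200, 500), (500, 1000)]

def planed_file_for(total_pt):
    """Return the results file for a total pT in GeV, or None if outside all bins."""
    for low, high in PT_BINS:
        if low <= total_pt < high:  # assumption: half-open bins [low, high)
            return f"kNN_2D_planed_{high}GeV.json"
    return None

print(planed_file_for(75.0))
```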

### Total $p_{\rm T} \in (0, 50)~{\rm GeV}$

In [18]:
filepath   = basePath+'Results/'+'kNN_2D_planed_50GeV.json'

In [19]:
scoreDict = loadJSONFile(filepath, INVERTED=True)

print(" Top keys:         ", scoreDict.keys())
print("   Sub keys:       ", scoreDict['repeat0'].keys())
print("     Sub sub keys: ",scoreDict['repeat0']['ROC_metric_sig_A'].keys())

 Top keys:          dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
   Sub keys:        dict_keys(['ROC_metric_sig_A', 'ROC_metric_sig_h0', 'ROC_metric_sig_hch', 'ROC_metric_sig_LQ'])
     Sub sub keys:  dict_keys(['auc', 'fpr', 'tpr', 'SI', 'fprInv', 'F1'])


#### Get and report average performance for tables

In [20]:
print(scoreDict.keys())
getRepeatAvStd(scoreDict)
print(scoreDict.keys())
print(scoreDict['avStdQuantities']['sig_A'].keys())

dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
Analyzing signal type = sig_A 
Analyzing signal type = sig_h0 
Analyzing signal type = sig_hch 
Analyzing signal type = sig_LQ 
dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4', 'avStdQuantities'])
dict_keys(['auc', 'fpr', 'SI', 'fprInv', 'F1'])


In [21]:
#-- Report results for tables--#
indx = indxOfCertainTPR([np.linspace(0, 1, 101)], TPRval = 0.3)[0] # Assuming base TPR value

print("AUC:")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['auc']['mean'], scoreDict['avStdQuantities'][alias]['auc']['std'])

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx])

print("SI at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], scoreDict['avStdQuantities'][alias]['SI']['std'][indx])

print("F1 at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], scoreDict['avStdQuantities'][alias]['F1']['std'][indx])

AUC:
   sig_A, mean, std:  0.6193241561554373 0.013909554559565948
   sig_h0, mean, std:  0.5560362022323223 0.026013491872497616
   sig_hch, mean, std:  0.5752272736176252 0.037555171735308014
   sig_LQ, mean, std:  0.6840947062389287 0.03150385066000139
Inverse FPR at TPR=0.3
   sig_A, mean, std:  5.720323135923932 1.0427996374966004
   sig_h0, mean, std:  4.994751616508332 0.545751959500575
   sig_hch, mean, std:  5.165253383500511 2.332377570460752
   sig_LQ, mean, std:  9.287559135652662 2.4428859950935604
SI at TPR=0.3
   sig_A, mean, std:  0.7145600428307213 0.06268116105177085
   sig_h0, mean, std:  0.6692488318748542 0.03750405327656465
   sig_hch, mean, std:  0.667468668058542 0.13809242844040834
   sig_LQ, mean, std:  0.9052009329542259 0.12514959092758332
F1 at TPR=0.3
   sig_A, mean, std:  0.4055913263194852 0.00776735846331888
   sig_h0, mean, std:  0.3993394610626233 0.006404104515055896
   sig_hch, mean, std:  0.39488579890506886 0.01779246103658605
   sig_LQ, mean, std

In [22]:
# Same order as above but easier for copying over to draft
print("AUC:")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['mean'], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['std'], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("SI at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("F1 at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

AUC:
0.6193  $\pm$  0.01391
0.556  $\pm$  0.02601
0.5752  $\pm$  0.03756
0.6841  $\pm$  0.0315
Inverse FPR at TPR=0.3
5.72  $\pm$  1.043
4.995  $\pm$  0.5458
5.165  $\pm$  2.332
9.288  $\pm$  2.443
SI at TPR=0.3
0.7146  $\pm$  0.06268
0.6692  $\pm$  0.0375
0.6675  $\pm$  0.1381
0.9052  $\pm$  0.1251
F1 at TPR=0.3
0.4056  $\pm$  0.007767
0.3993  $\pm$  0.006404
0.3949  $\pm$  0.01779
0.4237  $\pm$  0.01063


### Total $p_{\rm T} \in (50, 100)~{\rm GeV}$

In [23]:
filepath   = basePath+'Results/'+'kNN_2D_planed_100GeV.json'

In [24]:
scoreDict = loadJSONFile(filepath, INVERTED=True)

print(" Top keys:         ", scoreDict.keys())
print("   Sub keys:       ", scoreDict['repeat0'].keys())
print("     Sub sub keys: ",scoreDict['repeat0']['ROC_metric_sig_A'].keys())

 Top keys:          dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
   Sub keys:        dict_keys(['ROC_metric_sig_A', 'ROC_metric_sig_h0', 'ROC_metric_sig_hch', 'ROC_metric_sig_LQ'])
     Sub sub keys:  dict_keys(['auc', 'fpr', 'tpr', 'SI', 'fprInv', 'F1'])


#### Get and report average performance for tables

In [25]:
print(scoreDict.keys())
getRepeatAvStd(scoreDict)
print(scoreDict.keys())
print(scoreDict['avStdQuantities']['sig_A'].keys())

dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
Analyzing signal type = sig_A 
Analyzing signal type = sig_h0 
Analyzing signal type = sig_hch 
Analyzing signal type = sig_LQ 
dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4', 'avStdQuantities'])
dict_keys(['auc', 'fpr', 'SI', 'fprInv', 'F1'])


In [26]:
#-- Report results for tables--#
indx = indxOfCertainTPR([np.linspace(0, 1, 101)], TPRval = 0.3)[0] # Assuming base TPR value

print("AUC:")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['auc']['mean'], scoreDict['avStdQuantities'][alias]['auc']['std'])

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx])

print("SI at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], scoreDict['avStdQuantities'][alias]['SI']['std'][indx])

print("F1 at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], scoreDict['avStdQuantities'][alias]['F1']['std'][indx])

AUC:
   sig_A, mean, std:  0.797380753997553 0.008993921054521814
   sig_h0, mean, std:  0.6761395554738507 0.035688653156475125
   sig_hch, mean, std:  0.607118567366921 0.026939186162531433
   sig_LQ, mean, std:  0.7468665866068318 0.021490142570251385
Inverse FPR at TPR=0.3
   sig_A, mean, std:  32.50918717373412 6.754414568235046
   sig_h0, mean, std:  7.727572381160046 2.136936191331599
   sig_hch, mean, std:  5.751161196686206 1.5366540739928831
   sig_LQ, mean, std:  11.439286580295612 2.9745172972154434
SI at TPR=0.3
   sig_A, mean, std:  1.6989672370784958 0.17160988567507615
   sig_h0, mean, std:  0.8266494187991382 0.10749063817764246
   sig_hch, mean, std:  0.7137543873285404 0.08854637440361669
   sig_LQ, mean, std:  1.0054062863386701 0.1320540495474197
F1 at TPR=0.3
   sig_A, mean, std:  0.450468632203808 0.0019917800051518194
   sig_h0, mean, std:  0.4176216278306392 0.008589758258329395
   sig_hch, mean, std:  0.40475626800289455 0.009902786513418773
   sig_LQ, mean, s

In [27]:
# Same order as above but easier for copying over to draft
print("AUC:")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['mean'], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['std'], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("SI at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("F1 at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

AUC:
0.7974  $\pm$  0.008994
0.6761  $\pm$  0.03569
0.6071  $\pm$  0.02694
0.7469  $\pm$  0.02149
Inverse FPR at TPR=0.3
32.51  $\pm$  6.754
7.728  $\pm$  2.137
5.751  $\pm$  1.537
11.44  $\pm$  2.975
SI at TPR=0.3
1.699  $\pm$  0.1716
0.8266  $\pm$  0.1075
0.7138  $\pm$  0.08855
1.005  $\pm$  0.1321
F1 at TPR=0.3
0.4505  $\pm$  0.001992
0.4176  $\pm$  0.00859
0.4048  $\pm$  0.009903
0.4306  $\pm$  0.007892


### Total $p_{\rm T} \in (100, 150)~{\rm GeV}$

In [28]:
filepath   = basePath+'Results/'+'kNN_2D_planed_150GeV.json'

In [29]:
scoreDict = loadJSONFile(filepath, INVERTED=True)

print(" Top keys:         ", scoreDict.keys())
print("   Sub keys:       ", scoreDict['repeat0'].keys())
print("     Sub sub keys: ",scoreDict['repeat0']['ROC_metric_sig_A'].keys())

 Top keys:          dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
   Sub keys:        dict_keys(['ROC_metric_sig_A', 'ROC_metric_sig_h0', 'ROC_metric_sig_hch', 'ROC_metric_sig_LQ'])
     Sub sub keys:  dict_keys(['auc', 'fpr', 'tpr', 'SI', 'fprInv', 'F1'])


#### Get and report average performance for tables

In [30]:
print(scoreDict.keys())
getRepeatAvStd(scoreDict)
print(scoreDict.keys())
print(scoreDict['avStdQuantities']['sig_A'].keys())

dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
Analyzing signal type = sig_A 
Analyzing signal type = sig_h0 
Analyzing signal type = sig_hch 
Analyzing signal type = sig_LQ 
dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4', 'avStdQuantities'])
dict_keys(['auc', 'fpr', 'SI', 'fprInv', 'F1'])


In [31]:
#-- Report results for tables--#
indx = indxOfCertainTPR([np.linspace(0, 1, 101)], TPRval = 0.3)[0] # Assuming base TPR value

print("AUC:")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['auc']['mean'], scoreDict['avStdQuantities'][alias]['auc']['std'])

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx])

print("SI at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], scoreDict['avStdQuantities'][alias]['SI']['std'][indx])

print("F1 at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], scoreDict['avStdQuantities'][alias]['F1']['std'][indx])

AUC:
   sig_A, mean, std:  0.7011452113981134 0.005785647854299646
   sig_h0, mean, std:  0.5667354772174112 0.04953517731908581
   sig_hch, mean, std:  0.6015605971417531 0.03723336061553597
   sig_LQ, mean, std:  0.6883180348051865 0.015572400289487753
Inverse FPR at TPR=0.3
   sig_A, mean, std:  10.959239610023166 1.5057977233687423
   sig_h0, mean, std:  4.628903164609797 1.0147236114209228
   sig_hch, mean, std:  5.744332790019625 1.3499935185698095
   sig_LQ, mean, std:  8.602780129306655 0.8668260379253375
SI at TPR=0.3
   sig_A, mean, std:  0.9902822497404277 0.06762049166763459
   sig_h0, mean, std:  0.6412754965419624 0.07186805037842407
   sig_hch, mean, std:  0.7129559936909151 0.09149054639897862
   sig_LQ, mean, std:  0.8784145759300168 0.04433735965110864
F1 at TPR=0.3
   sig_A, mean, std:  0.4307761208038784 0.003856379817598979
   sig_h0, mean, std:  0.3932557790812114 0.013542976912522544
   sig_hch, mean, std:  0.40381158611601553 0.015705413620848124
   sig_LQ, mean

In [32]:
# Same order as above but easier for copying over to draft
print("AUC:")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['mean'], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['std'], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("SI at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("F1 at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

AUC:
0.7011  $\pm$  0.005786
0.5667  $\pm$  0.04954
0.6016  $\pm$  0.03723
0.6883  $\pm$  0.01557
Inverse FPR at TPR=0.3
10.96  $\pm$  1.506
4.629  $\pm$  1.015
5.744  $\pm$  1.35
8.603  $\pm$  0.8668
SI at TPR=0.3
0.9903  $\pm$  0.06762
0.6413  $\pm$  0.07187
0.713  $\pm$  0.09149
0.8784  $\pm$  0.04434
F1 at TPR=0.3
0.4308  $\pm$  0.003856
0.3933  $\pm$  0.01354
0.4038  $\pm$  0.01571
0.4233  $\pm$  0.003548


### Total $p_{\rm T} \in (150, 200)~{\rm GeV}$

In [33]:
filepath   = basePath+'Results/'+'kNN_2D_planed_200GeV.json'

In [34]:
scoreDict = loadJSONFile(filepath, INVERTED=True)

print(" Top keys:         ", scoreDict.keys())
print("   Sub keys:       ", scoreDict['repeat0'].keys())
print("     Sub sub keys: ",scoreDict['repeat0']['ROC_metric_sig_A'].keys())

 Top keys:          dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
   Sub keys:        dict_keys(['ROC_metric_sig_A', 'ROC_metric_sig_h0', 'ROC_metric_sig_hch', 'ROC_metric_sig_LQ'])
     Sub sub keys:  dict_keys(['auc', 'fpr', 'tpr', 'SI', 'fprInv', 'F1'])


#### Get and report average performance for tables

In [35]:
print(scoreDict.keys())
getRepeatAvStd(scoreDict)
print(scoreDict.keys())
print(scoreDict['avStdQuantities']['sig_A'].keys())

dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
Analyzing signal type = sig_A 
Analyzing signal type = sig_h0 
Analyzing signal type = sig_hch 
Analyzing signal type = sig_LQ 
dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4', 'avStdQuantities'])
dict_keys(['auc', 'fpr', 'SI', 'fprInv', 'F1'])


In [36]:
#-- Report results for tables--#
indx = indxOfCertainTPR([np.linspace(0, 1, 101)], TPRval = 0.3)[0] # Assuming base TPR value

print("AUC:")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['auc']['mean'], scoreDict['avStdQuantities'][alias]['auc']['std'])

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx])

print("SI at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], scoreDict['avStdQuantities'][alias]['SI']['std'][indx])

print("F1 at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], scoreDict['avStdQuantities'][alias]['F1']['std'][indx])

AUC:
   sig_A, mean, std:  0.6251299810860171 0.020683095471606176
   sig_h0, mean, std:  0.5761864516416402 0.02572489625664325
   sig_hch, mean, std:  0.6665967908890511 0.021584085318690237
   sig_LQ, mean, std:  0.6981814488486944 0.019362046157130636
Inverse FPR at TPR=0.3
   sig_A, mean, std:  6.605930760481655 1.5480214085001431
   sig_h0, mean, std:  4.896175128209654 0.6393034001378478
   sig_hch, mean, std:  8.071437291308364 1.8286225994933683
   sig_LQ, mean, std:  10.501947319418028 2.782630904738234
SI at TPR=0.3
   sig_A, mean, std:  0.7654720256199452 0.09040097735067379
   sig_h0, mean, std:  0.6621453543887257 0.04472103343582272
   sig_hch, mean, std:  0.8462222469957391 0.09859706695093602
   sig_LQ, mean, std:  0.9632156581674881 0.12778816876644647
F1 at TPR=0.3
   sig_A, mean, std:  0.4112088811062393 0.010569348976099601
   sig_h0, mean, std:  0.3979728988627437 0.007998368794588803
   sig_hch, mean, std:  0.4194717397517076 0.009260432875393104
   sig_LQ, mean,

In [37]:
# Same order as above but easier for copying over to draft
print("AUC:")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['mean'], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['std'], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("SI at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("F1 at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

AUC:
0.6251  $\pm$  0.02068
0.5762  $\pm$  0.02572
0.6666  $\pm$  0.02158
0.6982  $\pm$  0.01936
Inverse FPR at TPR=0.3
6.606  $\pm$  1.548
4.896  $\pm$  0.6393
8.071  $\pm$  1.829
10.5  $\pm$  2.783
SI at TPR=0.3
0.7655  $\pm$  0.0904
0.6621  $\pm$  0.04472
0.8462  $\pm$  0.0986
0.9632  $\pm$  0.1278
F1 at TPR=0.3
0.4112  $\pm$  0.01057
0.398  $\pm$  0.007998
0.4195  $\pm$  0.00926
0.4281  $\pm$  0.008049
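
The rounded values above show fewer than four digits in some cases (for example `10.5` and `0.0904`) because a float drops trailing zeros when printed. A minimal sketch of rounding to significant figures that reproduces this behaviour is given below. The source does not show `roundToSigFig`, so `round_to_sig_fig` is a hypothetical equivalent, not the notebook's helper.

```python
import math

def round_to_sig_fig(value, n_sig_figs):
    """Hypothetical equivalent of roundToSigFig: round to n significant figures."""
    if value == 0:
        return 0.0  # log10 is undefined at zero
    # Position of the leading digit, e.g. 1 for 10.5 and -2 for 0.0904
    exponent = math.floor(math.log10(abs(value)))
    return round(value, n_sig_figs - 1 - exponent)

# Three full-precision values taken from the printed output above
examples = [
    round_to_sig_fig(x, 4)
    for x in (10.501947319418028, 0.09040097735067379, 0.007998368794588803)
]
```

If fixed-width entries are preferred for a table, string formatting such as `f"{value:.4g}"` is an alternative, although it also drops trailing zeros.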


### Total $p_{\rm T} \in (200, 500)~{\rm GeV}$

In [38]:
filepath   = basePath+'Results/'+'kNN_2D_planed_500GeV.json'

In [39]:
scoreDict = loadJSONFile(filepath, INVERTED=True)

print(" Top keys:         ", scoreDict.keys())
print("   Sub keys:       ", scoreDict['repeat0'].keys())
print("     Sub sub keys: ",scoreDict['repeat0']['ROC_metric_sig_A'].keys())

 Top keys:          dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
   Sub keys:        dict_keys(['ROC_metric_sig_A', 'ROC_metric_sig_h0', 'ROC_metric_sig_hch', 'ROC_metric_sig_LQ'])
     Sub sub keys:  dict_keys(['auc', 'fpr', 'tpr', 'SI', 'fprInv', 'F1'])


#### Get and report average performance for tables

In [40]:
print(scoreDict.keys())
getRepeatAvStd(scoreDict)
print(scoreDict.keys())
print(scoreDict['avStdQuantities']['sig_A'].keys())

dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
Analyzing signal type = sig_A 
Analyzing signal type = sig_h0 
Analyzing signal type = sig_hch 
Analyzing signal type = sig_LQ 
dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4', 'avStdQuantities'])
dict_keys(['auc', 'fpr', 'SI', 'fprInv', 'F1'])
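
The averaged curves are read off at a single working point, TPR = 0.3. A minimal sketch of that lookup follows, assuming the averaged arrays live on the grid `np.linspace(0, 1, 101)` and that the index is the grid point closest to the requested TPR. The source does not show `indxOfCertainTPR`, so `index_of_tpr` is a hypothetical equivalent with the same call shape (a list of TPR arrays in, a list of indices out).

```python
import numpy as np

def index_of_tpr(tpr_list, tpr_val=0.3):
    """Hypothetical equivalent: for each TPR array, index of the entry closest to tpr_val."""
    return [int(np.argmin(np.abs(np.asarray(tpr) - tpr_val))) for tpr in tpr_list]

base_tpr = np.linspace(0, 1, 101)  # assumed base TPR grid
indx = index_of_tpr([base_tpr], tpr_val=0.3)[0]
```

On this grid the spacing is 0.01, so the working point falls exactly on a grid node and no interpolation error is introduced at the lookup.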


In [41]:
#-- Report results for tables--#
indx = indxOfCertainTPR([np.linspace(0, 1, 101)], TPRval=0.3)[0]  # index of TPR = 0.3, assuming the base TPR grid np.linspace(0, 1, 101)

print("AUC:")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['auc']['mean'], scoreDict['avStdQuantities'][alias]['auc']['std'])

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx])

print("SI at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], scoreDict['avStdQuantities'][alias]['SI']['std'][indx])

print("F1 at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], scoreDict['avStdQuantities'][alias]['F1']['std'][indx])

AUC:
   sig_A, mean, std:  0.6258593862764383 0.024167621160884223
   sig_h0, mean, std:  0.5841808994695518 0.006719364053971755
   sig_hch, mean, std:  0.6885509588655917 0.020427488648409705
   sig_LQ, mean, std:  0.6166094819906541 0.020159541926277634
Inverse FPR at TPR=0.3
   sig_A, mean, std:  6.16329548850563 0.6384270538110367
   sig_h0, mean, std:  4.522568001710299 0.4746227280433852
   sig_hch, mean, std:  7.88692435538473 0.7401787117282455
   sig_LQ, mean, std:  5.8074571033719105 0.7262570293924676
SI at TPR=0.3
   sig_A, mean, std:  0.7435573602421115 0.03838905061227093
   sig_h0, mean, std:  0.636985282699469 0.033088454152113304
   sig_hch, mean, std:  0.8412273796020638 0.039942177318503296
   sig_LQ, mean, std:  0.7214036047842101 0.044044034844943736
F1 at TPR=0.3
   sig_A, mean, std:  0.40989526188439723 0.004700744078859924
   sig_h0, mean, std:  0.3939257593831689 0.005834842230686874
   sig_hch, mean, std:  0.42020930609019524 0.003690353818426614
   sig_LQ, m

In [42]:
# Same order as above but easier for copying over to draft
print("AUC:")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['mean'], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['std'], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("SI at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("F1 at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

AUC:
0.6259  $\pm$  0.02417
0.5842  $\pm$  0.006719
0.6886  $\pm$  0.02043
0.6166  $\pm$  0.02016
Inverse FPR at TPR=0.3
6.163  $\pm$  0.6384
4.523  $\pm$  0.4746
7.887  $\pm$  0.7402
5.807  $\pm$  0.7263
SI at TPR=0.3
0.7436  $\pm$  0.03839
0.637  $\pm$  0.03309
0.8412  $\pm$  0.03994
0.7214  $\pm$  0.04404
F1 at TPR=0.3
0.4099  $\pm$  0.004701
0.3939  $\pm$  0.005835
0.4202  $\pm$  0.00369
0.4069  $\pm$  0.005521
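
The three working-point metrics are not independent. The sketch below is a consistency check under two assumptions that the source does not state: that SI is the significance improvement $\mathrm{TPR}/\sqrt{\mathrm{FPR}}$, and that F1 is computed on a test set with equal numbers of signal and background events. The function name `si_and_f1` is illustrative.

```python
import math

def si_and_f1(tpr, inv_fpr):
    """Derive SI and F1 from TPR and inverse FPR under the stated assumptions."""
    fpr = 1.0 / inv_fpr
    si = tpr / math.sqrt(fpr)          # assumed definition of SI
    precision = tpr / (tpr + fpr)      # assumes equal signal and background counts
    f1 = 2 * precision * tpr / (precision + tpr)
    return si, f1

# sig_A in this bin: mean inverse FPR of 6.163 at TPR = 0.3 (from the output above)
si, f1 = si_and_f1(tpr=0.3, inv_fpr=6.163)
```

For `sig_A` this gives roughly 0.745 and 0.410, close to the reported means of 0.7436 and 0.4099. Exact agreement is not expected, because the mean of a metric over repeats differs from the metric evaluated at the mean inverse FPR.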


### Total $p_{\rm T} \in (500, 1000)~{\rm GeV}$

In [43]:
filepath   = basePath+'Results/'+'kNN_2D_planed_1000GeV.json'

In [44]:
scoreDict = loadJSONFile(filepath, INVERTED=True)

print(" Top keys:         ", scoreDict.keys())
print("   Sub keys:       ", scoreDict['repeat0'].keys())
print("     Sub sub keys: ",scoreDict['repeat0']['ROC_metric_sig_A'].keys())

 Top keys:          dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
   Sub keys:        dict_keys(['ROC_metric_sig_A', 'ROC_metric_sig_h0', 'ROC_metric_sig_hch', 'ROC_metric_sig_LQ'])
     Sub sub keys:  dict_keys(['auc', 'fpr', 'tpr', 'SI', 'fprInv', 'F1'])


#### Get and report average performance for tables

In [45]:
print(scoreDict.keys())
getRepeatAvStd(scoreDict)
print(scoreDict.keys())
print(scoreDict['avStdQuantities']['sig_A'].keys())

dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
Analyzing signal type = sig_A 
Analyzing signal type = sig_h0 
Analyzing signal type = sig_hch 
Analyzing signal type = sig_LQ 
dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4', 'avStdQuantities'])
dict_keys(['auc', 'fpr', 'SI', 'fprInv', 'F1'])


In [46]:
#-- Report results for tables--#
indx = indxOfCertainTPR([np.linspace(0, 1, 101)], TPRval=0.3)[0]  # index of TPR = 0.3, assuming the base TPR grid np.linspace(0, 1, 101)

print("AUC:")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['auc']['mean'], scoreDict['avStdQuantities'][alias]['auc']['std'])

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx])

print("SI at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], scoreDict['avStdQuantities'][alias]['SI']['std'][indx])

print("F1 at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], scoreDict['avStdQuantities'][alias]['F1']['std'][indx])

AUC:
   sig_A, mean, std:  0.798913933102407 0.02230500501928155
   sig_h0, mean, std:  0.5829378691399614 0.021238964288079334
   sig_hch, mean, std:  0.6202817253317441 0.0341775808408001
   sig_LQ, mean, std:  0.5284963612145445 0.015364811569643429
Inverse FPR at TPR=0.3
   sig_A, mean, std:  27.23862082434311 14.046736919289579
   sig_h0, mean, std:  4.954647972202563 0.6115861632887708
   sig_hch, mean, std:  6.2972793548395884 1.5504535239440518
   sig_LQ, mean, std:  3.8560903470524535 0.5762526482992578
SI at TPR=0.3
   sig_A, mean, std:  1.5134496139148408 0.3905470990213326
   sig_h0, mean, std:  0.666315083412993 0.0414528447970033
   sig_hch, mean, std:  0.746830650502067 0.09284853637041392
   sig_LQ, mean, std:  0.5873056969220986 0.04453527934232501
F1 at TPR=0.3
   sig_A, mean, std:  0.4455056541811812 0.007099954804610173
   sig_h0, mean, std:  0.3987824716662447 0.006866453516108084
   sig_hch, mean, std:  0.4087818218693721 0.01160634372569117
   sig_LQ, mean, std: 

In [47]:
# Same order as above but easier for copying over to draft
print("AUC:")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['mean'], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['std'], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("SI at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("F1 at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

AUC:
0.7989  $\pm$  0.02231
0.5829  $\pm$  0.02124
0.6203  $\pm$  0.03418
0.5285  $\pm$  0.01536
Inverse FPR at TPR=0.3
27.24  $\pm$  14.05
4.955  $\pm$  0.6116
6.297  $\pm$  1.55
3.856  $\pm$  0.5763
SI at TPR=0.3
1.513  $\pm$  0.3905
0.6663  $\pm$  0.04145
0.7468  $\pm$  0.09285
0.5873  $\pm$  0.04454
F1 at TPR=0.3
0.4455  $\pm$  0.0071
0.3988  $\pm$  0.006866
0.4088  $\pm$  0.01161
0.3835  $\pm$  0.0101


# Anomaly Detection

## kNN 3D ground space, anomaly augmented background as signal

In [48]:
filepath   = basePath+'Results/'+'kNN_3D_anomalyaug.json'

In [49]:
scoreDict = loadJSONFile(filepath)

print(" Top keys:         ", scoreDict.keys())
print("   Sub keys:       ", scoreDict['repeat0'].keys())
print("     Sub sub keys: ",scoreDict['repeat0']['ROC_metric_sig_A'].keys())

 Top keys:          dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
   Sub keys:        dict_keys(['ROC_metric_sig_A', 'ROC_metric_sig_h0', 'ROC_metric_sig_hch', 'ROC_metric_sig_LQ'])
     Sub sub keys:  dict_keys(['auc', 'fpr', 'tpr', 'SI', 'fprInv', 'F1'])


### Get and report average performance for tables

In [50]:
print(scoreDict.keys())
getRepeatAvStd(scoreDict)
print(scoreDict.keys())
print(scoreDict['avStdQuantities']['sig_A'].keys())

dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
Analyzing signal type = sig_A 
Analyzing signal type = sig_h0 
Analyzing signal type = sig_hch 
Analyzing signal type = sig_LQ 
dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4', 'avStdQuantities'])
dict_keys(['auc', 'fpr', 'SI', 'fprInv', 'F1'])


In [51]:
#-- Report results for tables--#
indx = indxOfCertainTPR([np.linspace(0, 1, 101)], TPRval=0.3)[0]  # index of TPR = 0.3, assuming the base TPR grid np.linspace(0, 1, 101)

print("AUC:")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['auc']['mean'], scoreDict['avStdQuantities'][alias]['auc']['std'])

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx])

print("SI at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], scoreDict['avStdQuantities'][alias]['SI']['std'][indx])

print("F1 at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], scoreDict['avStdQuantities'][alias]['F1']['std'][indx])

AUC:
   sig_A, mean, std:  0.8324266999999999 0.005851705653909785
   sig_h0, mean, std:  0.6864171 0.008349107349890724
   sig_hch, mean, std:  0.8134343000000002 0.006917888020198079
   sig_LQ, mean, std:  0.7569604 0.007814717143953435
Inverse FPR at TPR=0.3
   sig_A, mean, std:  35.43949841049924 5.8985048198167105
   sig_h0, mean, std:  10.622491398764643 1.7538700857839327
   sig_hch, mean, std:  22.117746968986012 3.928158053035346
   sig_LQ, mean, std:  13.37941358504798 0.9646422196454041
SI at TPR=0.3
   sig_A, mean, std:  1.7763932927704502 0.1496935638324087
   sig_h0, mean, std:  0.9741047516728983 0.07811599219942801
   sig_hch, mean, std:  1.4036136334862133 0.12623389390243472
   sig_LQ, mean, std:  1.0958938458978564 0.03932727323383196
F1 at TPR=0.3
   sig_A, mean, std:  0.4514558416225357 0.0017088656863462012
   sig_h0, mean, std:  0.42970562707071736 0.004352275948733205
   sig_hch, mean, std:  0.4455413216752492 0.002834946074295304
   sig_LQ, mean, std:  0.436330

In [52]:
# Same order as above but easier for copying over to draft
print("AUC:")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['mean'], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['std'], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("SI at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("F1 at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

AUC:
0.8324  $\pm$  0.005852
0.6864  $\pm$  0.008349
0.8134  $\pm$  0.006918
0.757  $\pm$  0.007815
Inverse FPR at TPR=0.3
35.44  $\pm$  5.899
10.62  $\pm$  1.754
22.12  $\pm$  3.928
13.38  $\pm$  0.9646
SI at TPR=0.3
1.776  $\pm$  0.1497
0.9741  $\pm$  0.07812
1.404  $\pm$  0.1262
1.096  $\pm$  0.03933
F1 at TPR=0.3
0.4515  $\pm$  0.001709
0.4297  $\pm$  0.004352
0.4455  $\pm$  0.002835
0.4363  $\pm$  0.001699


In [53]:
filename = basePath + dataPath + 'finalScoreDict_3DanomalykNN_nEvents1000_nRepeat5.npz'
print(filename)

if os.path.exists(filename) and not OVERWRITEFILES:
  print("File already exists")
else:
  np.savez(filename, **scoreDict)

/content/gdrive/My Drive/Research/AnomalyDetectionWithOT/OnML4Jets2021DataChallenge/Data/finalScoreDict_3DanomalykNN_nEvents1000_nRepeat5.npz
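
Because `scoreDict` is a nested dictionary, `np.savez(filename, **scoreDict)` stores each top-level entry as a zero-dimensional object array. Reading such a file back therefore needs `allow_pickle=True`, and `.item()` to unwrap each dictionary. The round trip below is a minimal sketch on a toy dictionary written to a temporary directory; the toy keys mirror the structure printed above, and the numbers are made up.

```python
import os
import tempfile
import numpy as np

# Toy nested dictionary with the same shape as scoreDict (made-up values)
toy = {
    'repeat0': {'ROC_metric_sig_A': {'auc': 0.83}},
    'avStdQuantities': {'sig_A': {'auc': {'mean': 0.83, 'std': 0.01}}},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_scores.npz')
    np.savez(path, **toy)  # each dict is pickled inside a 0-d object array
    # allow_pickle=True is required for object arrays; use it only on trusted files
    with np.load(path, allow_pickle=True) as archive:
        restored = {key: archive[key].item() for key in archive.files}
```

Since the results already exist as JSON, writing the averaged dictionary back to JSON would avoid pickling altogether, at the cost of converting NumPy arrays to lists first.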


## oneClassSVM 3D ground space

In [54]:
filepath   = basePath+'Results/'+'OneClassSVM.json'

In [55]:
scoreDict = loadJSONFile(filepath, INVERTED=True)

print(" Top keys:         ", scoreDict.keys())
print("   Sub keys:       ", scoreDict['repeat0'].keys())
print("     Sub sub keys: ",scoreDict['repeat0']['ROC_metric_sig_A'].keys())

 Top keys:          dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
   Sub keys:        dict_keys(['ROC_metric_sig_A', 'ROC_metric_sig_h0', 'ROC_metric_sig_hch', 'ROC_metric_sig_LQ'])
     Sub sub keys:  dict_keys(['auc', 'fpr', 'tpr', 'SI', 'fprInv', 'F1'])


### Get and report average performance for tables

In [56]:
print(scoreDict.keys())
getRepeatAvStd(scoreDict)
print(scoreDict.keys())
print(scoreDict['avStdQuantities']['sig_A'].keys())

dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4'])
Analyzing signal type = sig_A 
Analyzing signal type = sig_h0 
Analyzing signal type = sig_hch 
Analyzing signal type = sig_LQ 
dict_keys(['repeat0', 'repeat1', 'repeat2', 'repeat3', 'repeat4', 'avStdQuantities'])
dict_keys(['auc', 'fpr', 'SI', 'fprInv', 'F1'])


In [57]:
#-- Report results for tables--#
indx = indxOfCertainTPR([np.linspace(0, 1, 101)], TPRval=0.3)[0]  # index of TPR = 0.3, assuming the base TPR grid np.linspace(0, 1, 101)

print("AUC:")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['auc']['mean'], scoreDict['avStdQuantities'][alias]['auc']['std'])

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx])

print("SI at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], scoreDict['avStdQuantities'][alias]['SI']['std'][indx])

print("F1 at TPR=0.3")
for alias in sigAliasList:
  print("   %s, mean, std: "%alias, scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], scoreDict['avStdQuantities'][alias]['F1']['std'][indx])

AUC:
   sig_A, mean, std:  0.7682 0.006209669878503925
   sig_h0, mean, std:  0.6622 0.007406753674856432
   sig_hch, mean, std:  0.8096 0.00820609529557142
   sig_LQ, mean, std:  0.7386 0.013781872151489365
Inverse FPR at TPR=0.3
   sig_A, mean, std:  14.230637344788155 0.36028904317041455
   sig_h0, mean, std:  6.209409643597402 0.15397041381779136
   sig_hch, mean, std:  28.938496448979244 2.702823952070606
   sig_LQ, mean, std:  10.511847670731312 0.8962393764048461
SI at TPR=0.3
   sig_A, mean, std:  1.1308097907429324 0.014331975131505059
   sig_h0, mean, std:  0.7472704842868603 0.009310354242338451
   sig_hch, mean, std:  1.6097100421914092 0.07561676998885514
   sig_LQ, mean, std:  0.9712773725452204 0.04105975650009234
F1 at TPR=0.3
   sig_A, mean, std:  0.4378558210169949 0.0005744051379021845
   sig_h0, mean, std:  0.4106394574007964 0.0011449658380605894
   sig_hch, mean, std:  0.44948494294103974 0.001136146749418482
   sig_LQ, mean, std:  0.42987386869915456 0.0024360698

In [58]:
# Same order as above but easier for copying over to draft
print("AUC:")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['mean'], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['auc']['std'], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("Inverse FPR at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['fprInv']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("SI at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['SI']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

print("F1 at TPR=0.3")
for alias in sigAliasList:
  mean = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['mean'][indx], NSIGFIGS)
  std  = roundToSigFig(scoreDict['avStdQuantities'][alias]['F1']['std'][indx], NSIGFIGS)
  print(mean, r' $\pm$ ', std)

AUC:
0.7682  $\pm$  0.00621
0.6622  $\pm$  0.007407
0.8096  $\pm$  0.008206
0.7386  $\pm$  0.01378
Inverse FPR at TPR=0.3
14.23  $\pm$  0.3603
6.209  $\pm$  0.154
28.94  $\pm$  2.703
10.51  $\pm$  0.8962
SI at TPR=0.3
1.131  $\pm$  0.01433
0.7473  $\pm$  0.00931
1.61  $\pm$  0.07562
0.9713  $\pm$  0.04106
F1 at TPR=0.3
0.4379  $\pm$  0.0005744
0.4106  $\pm$  0.001145
0.4495  $\pm$  0.001136
0.4299  $\pm$  0.002436
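
Since these values are meant to be copied into a draft, one option is to emit a complete LaTeX table row per metric instead of one line per signal. The sketch below does this for the OneClassSVM AUC values printed above; the helper `latex_row` and the row layout (one column per signal, in the order of the printed output) are assumptions, not part of the notebook.

```python
def latex_row(label, pairs):
    """Format (mean, std) pairs as one LaTeX table row: label & $m \\pm s$ & ... \\\\"""
    cells = [f"${mean} \\pm {std}$" for mean, std in pairs]
    return " & ".join([label] + cells) + r" \\"

# OneClassSVM AUC (mean, std) per signal, copied from the rounded output above
auc_pairs = [(0.7682, 0.00621), (0.6622, 0.007407), (0.8096, 0.008206), (0.7386, 0.01378)]
row = latex_row("AUC", auc_pairs)
```

Printing `row` yields a single line that can be pasted between the `\hline` separators of a `tabular` environment.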


In [59]:
filename = basePath + dataPath + 'finalScoreDict_3DoneClassSVM_nEvents1000_nRepeat5.npz'
print(filename)

if os.path.exists(filename) and not OVERWRITEFILES:
  print("File already exists")
else:
  np.savez(filename, **scoreDict)

/content/gdrive/My Drive/Research/AnomalyDetectionWithOT/OnML4Jets2021DataChallenge/Data/finalScoreDict_3DoneClassSVM_nEvents1000_nRepeat5.npz
