# Image Classification of ATLAS Calorimeter Topo-Clusters (Jan)

## This is a stripped-down version of Max's re-write, so I have removed *some* functionality.

Quick Navigation:
- [Simple feed-forward Neural Network](#Simple-feed-forward-Neural-Network.)
- [ROC Curve Scans](#ROC-Curve-Scans)
- [Combination Network](#Combination-Network)
- [Convolutional Neural Network](#Convolutional-Neural-Network)
- [Correlation Plots](#Correlation-Plots)

First, let's make sure that we have `latex` set up correctly. 

We need this for the `atlas_mpl_style` package, which some of ML4Pion's pre-existing utilities rely on. As of October 22, 2020, the [UChicago ML platform](ml.maniac.uchicago.edu) does *not* have `latex` pre-installed. Our own installation script takes care of this. Installing `latex` with `conda` [does not work well at the moment](https://github.com/conda-forge/texlive-core-feedstock/issues/19), so the script falls back on the slower, regular method for installing `texlive`. That installation is not added to our `$PATH`, so we must set it now, locally for the notebook. (For now we avoid touching the `.bash_profile` on the ML platform or [making a custom Jupyter kernel](https://stackoverflow.com/a/53595397).)

In [1]:
# Check if latex is set up already.
# We use some Jupyter magic -- alternatively one could use python's subprocess here.
has_latex = !command -v latex
has_latex = (not has_latex == [])

# If latex was not a recognized command, our setup script should have installed it
# at a fixed location that is not on the $PATH, so we add that location here via Jupyter magic.
# See https://ipython.readthedocs.io/en/stable/interactive/shell.html for info.
if(not has_latex):
    latex_prefix = '/local/home/jano/texlive/2020/bin/x86_64-linux' # '/usr/local/texlive/2020/bin/x86_64-linux'
    jupyter_env = %env
    path = jupyter_env['PATH']
    path = path + ':' + latex_prefix
    %env PATH = $path
    jupyter_env = %env
    path = jupyter_env['PATH']

%load_ext autoreload
%autoreload 2

env: PATH=/local/home/jano/miniconda3/envs/ml4p/bin:/local/home/jano/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/local/home/jano/texlive/2020/bin/x86_64-linux
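
As the comment in the cell above notes, the same check can be done without Jupyter magic. The following is a minimal sketch using only the standard library. The helper name `ensure_on_path` is an illustrative assumption (it is not part of ML4Pion), and the fallback directory is whatever location your own `texlive` installation uses.

```python
import os
import shutil

def ensure_on_path(executable, fallback_dir):
    """Make `executable` findable, appending `fallback_dir` to PATH if needed.

    Hypothetical helper (not part of ML4Pion). Returns True if the
    executable can be located afterwards, False otherwise.
    """
    if shutil.which(executable) is None:
        # This only changes the environment of the current process (the kernel).
        current = os.environ.get('PATH', '')
        os.environ['PATH'] = (current + os.pathsep + fallback_dir) if current else fallback_dir
    return shutil.which(executable) is not None

# Example: fall back to the install location used in the cell above.
# has_latex = ensure_on_path('latex', '/local/home/jano/texlive/2020/bin/x86_64-linux')
```

Like the `%env` approach, this leaves `.bash_profile` untouched and only affects the running notebook.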


In [2]:
#import libraries and some constants

import os, sys
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, LogNorm
import pandas as pd
import ROOT as rt # I will use this for some plotting
import uproot as ur
import atlas_mpl_style as ampl
ampl.use_atlas_style()

params = {'legend.fontsize': 13,
          'axes.labelsize': 18}
plt.rcParams.update(params)

path_prefix = os.getcwd() + '/../'
plotpath = path_prefix+'classifier/Plots/'
modelpath = path_prefix+'classifier/Models/'
# %config InlineBackend.figure_format = 'svg'

# metadata
layers = ["EMB1", "EMB2", "EMB3", "TileBar0", "TileBar1", "TileBar2"]
cell_size_phi = [0.098, 0.0245, 0.0245, 0.1, 0.1, 0.1]
cell_size_eta = [0.0031, 0.025, 0.05, 0.1, 0.1, 0.2]
len_phi = [4, 16, 16, 4, 4, 4]
len_eta = [128, 16, 8, 4, 4, 2]
cell_shapes = {layers[i]:(len_eta[i],len_phi[i]) for i in range(len(layers))}

Welcome to JupyROOT 6.22/02


In [3]:
# fancy display names for each pion type
pi_latex = {
    'pi0': '\(\pi^{0}\)',
    'piplus': '\(\pi^{+}\)',
    'piminus': '\(\pi^{-}\)',
}

Now let's import our resolution utilities. These take care of some plotting, using `matplotlib` and the `atlas_mpl_style` package.

In [4]:
path_prefix = os.getcwd() + '/../'
if(path_prefix not in sys.path): sys.path.append(path_prefix)
from util import resolution_util as ru
from util import plot_util as pu
from util import ml_util as mu

Now, we will import our data from the `ROOT` files into a `pandas` DataFrame. The first cell takes care of scalars, and the second takes care of vectors.

In [5]:
# import pi+- vs. pi0 images
source = 'pion' # also try 'jet'

if(source == 'pion'):
    inputpath = path_prefix+'data/pion/'
    rootfiles = ["pi0", "piplus", "piminus"]
    branches = ['runNumber', 'eventNumber', 'truthE', 'truthPt', 'truthEta', 'truthPhi', 'clusterIndex', 'nCluster', 'clusterE', 'clusterECalib', 'clusterPt', 'clusterEta', 'clusterPhi', 'cluster_nCells', 'cluster_sumCellE', 'cluster_ENG_CALIB_TOT', 'cluster_ENG_CALIB_OUT_T', 'cluster_ENG_CALIB_DEAD_TOT', 'cluster_EM_PROBABILITY', 'cluster_HAD_WEIGHT', 'cluster_OOC_WEIGHT', 'cluster_DM_WEIGHT', 'cluster_CENTER_MAG', 'cluster_FIRST_ENG_DENS', 'cluster_cell_dR_min', 'cluster_cell_dR_max', 'cluster_cell_dEta_min', 'cluster_cell_dEta_max', 'cluster_cell_dPhi_min', 'cluster_cell_dPhi_max', 'cluster_cell_centerCellEta', 'cluster_cell_centerCellPhi', 'cluster_cell_centerCellLayer', 'cluster_cellE_norm']
elif(source == 'jet'):
    inputpath = path_prefix+'jets/training/'
    rootfiles = ["pi0", "piplus"]
    branches = ['runNumber', 'eventNumber', 'truthE', 'truthPt', 'truthEta', 'truthPhi', 'clusterIndex', 'nCluster', 'clusterE', 'clusterECalib', 'clusterPt', 'clusterEta', 'clusterPhi', 'cluster_nCells', 'cluster_ENG_CALIB_TOT']
else:
    raise ValueError("Unknown source '{}': expected 'pion' or 'jet'.".format(source))

trees = {
    rfile : ur.open(inputpath+rfile+".root")['ClusterTree']
    for rfile in rootfiles
}
pdata = {
    ifile : itree.pandas.df(branches, flatten=False)
    for ifile, itree in trees.items()
}

total = 0
for key in rootfiles:
    total += len(pdata[key])

for key in rootfiles:
    n = len(pdata[key])
    print("Number of {a:<7} events: {b:>10}\t({c:.1f}%)".format(a=key, b = n, c = 100. * n / total))
print("Total: {}".format(total))

Number of pi0     events:     263891	(23.3%)
Number of piplus  events:     435967	(38.4%)
Number of piminus events:     434627	(38.3%)
Total: 1134485


The number of events in each category may be quite different. Ultimately we want to train our classifier on a "balanced" dataset, with equal numbers of entries from each category.

We're training our network to distinguish $\pi^\pm$ from $\pi^0$ events, so we will eventually merge our $\pi^+$ and $\pi^-$ data.

We therefore first generate selected indices for all categories, such that the merged $\pi^\pm$ sample and the $\pi^0$ sample end up the same size, and *then* we merge things.

Note that we're dealing with both DataFrames (`pdata`) and uproot trees (`trees`, whose contents get loaded into `pcells`), so when we select and merge data we have to do it the same way for both sets of objects. Otherwise we might scramble our $\pi^\pm$ data, which will matter *if* we ever want to use inputs beyond just the images (from `trees`) as network input.
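
To illustrate the ordering concern with a toy example: a boolean mask returns rows in their original order, while integer indexing returns rows in the order of the index array. The two selections only line up row by row if the integer indices are sorted. This is a sketch on made-up arrays, not on the real `pdata` or `pcells`; all names below are for illustration only.

```python
import numpy as np
import pandas as pd

rng_demo = np.random.default_rng(0)   # seeded only to make the toy example reproducible
n_total, n_keep = 10, 4

# Toy stand-ins: row i of the frame and image i both carry the value i.
frame = pd.DataFrame({'clusterE': np.arange(n_total, dtype=float)})
images = np.repeat(np.arange(n_total, dtype=float)[:, None], 3, axis=1)

# Draw a random subset, then sort it.
selected = np.sort(rng_demo.choice(n_total, n_keep, replace=False))
mask = np.zeros(n_total, dtype=bool)
mask[selected] = True

frame_sel = frame[mask]        # boolean mask: rows come back in original order
images_sel = images[selected]  # integer indexing: rows come back in index order

# With sorted indices the two selections agree row by row.
aligned = np.array_equal(frame_sel['clusterE'].to_numpy(), images_sel[:, 0])
print(aligned)
```

Dropping the `np.sort` call would generally break this agreement, because `rng.choice` returns the indices in random order.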

In [6]:
n_indices = {}
n_max = int(np.min(np.array([len(pdata[key]) for key in trees.keys()])))
rng = np.random.default_rng()

# If we have a piminus key, assume the datasets are piplus, piminus, pi0
if('piminus' in trees.keys()):
    n_indices['piplus']  = int(np.ceil((n_max / 2)))
    n_indices['piminus'] = int(np.floor((n_max / 2)))
    n_indices['pi0']     = n_max
    
# Otherwise, assume we already have piplus (or piplus + piminus) and pi0, no merging needed
else: n_indices = {key:n_max for key in trees.keys()}
# Sort the selected indices so that integer indexing (used for pcells) and the
# boolean mask below (used for pdata) return the same rows in the same order.
indices = {key:np.sort(rng.choice(len(pdata[key]), n_indices[key], replace=False)) for key in trees.keys()}

# Make a boolean-mask version of our indices, which we use to select rows of the pandas DataFrames.
bool_indices = {}
for key in pdata.keys():
    bool_indices[key] = np.full(len(pdata[key]), False)
    bool_indices[key][indices[key]] = True

# Apply the (bool) indices to pdata
for key in trees.keys():
    pdata[key] = pdata[key][bool_indices[key]]


# prepare pcells -- immediately apply our selected indices
pcells = {
    ifile : {
        layer : mu.setupCells(itree, layer, indices = indices[ifile])
        for layer in layers
    }
    for ifile, itree in trees.items()
}

In [7]:
# Now with the data extracted from the trees into pcells, we merge pdata and pcells as needed.
# Note the order in which we concatenate things: piplus -> piplus + piminus.
if('piminus' in trees.keys()):
    print('Merging piplus and piminus.')
    
    # merge pdata (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    pdata['piplus'] = pd.concat([pdata['piplus'], pdata['piminus']])
    del pdata['piminus']
    
    # merge contents of pcells, in the same piplus -> piminus order
    for layer in layers:
        pcells['piplus'][layer] = np.vstack((pcells['piplus'][layer],pcells['piminus'][layer]))
    del pcells['piminus']

Merging piplus and piminus.


We'll train the network on $\pi^\pm$ events (merged above and stored under the `piplus` key) and $\pi^0$ events.

In [8]:
from keras.utils import np_utils
training_dataset = ['pi0','piplus']

# create train/validation/test subsets containing 70%/10%/20%
# of events from each type of pion event
for p_index, plabel in enumerate(training_dataset):
    mu.splitFrameTVT(pdata[plabel],trainfrac=0.7)
    pdata[plabel]['label'] = p_index

# merge pi0 and pi+ events
pdata_merged = pd.concat([pdata[ptype] for ptype in training_dataset])
pcells_merged = {
    layer : np.concatenate([pcells[ptype][layer]
                            for ptype in training_dataset])
    for layer in layers
}
plabels = np_utils.to_categorical(pdata_merged['label'],len(training_dataset))
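
The split helper `mu.splitFrameTVT` lives in `ml_util` and is not shown in this notebook. As a rough sketch of what such a 70%/10%/20% split might look like (an assumption, not the actual implementation), the function below adds boolean `train`, `val` and `test` columns to a DataFrame. It also shows a one-hot encoding of integer labels using `np.eye`, which yields the same kind of array that `to_categorical` produces.

```python
import numpy as np
import pandas as pd

def split_frame_tvt_sketch(frame, trainfrac=0.7, valfrac=0.1, seed=0):
    """Add boolean 'train', 'val' and 'test' columns to `frame` in place.

    Hypothetical stand-in for mu.splitFrameTVT; the real helper may differ.
    """
    n = len(frame)
    n_train = int(round(trainfrac * n))
    n_val = int(round(valfrac * n))
    # rank[i] is a unique random rank for row i, so each row lands in exactly one subset
    rank = np.random.default_rng(seed).permutation(n)
    frame['train'] = rank < n_train
    frame['val'] = (rank >= n_train) & (rank < n_train + n_val)
    frame['test'] = rank >= n_train + n_val

toy = pd.DataFrame({'label': [0, 1] * 5})
split_frame_tvt_sketch(toy)

# One-hot encoding of integer labels: row k of the identity matrix encodes class k.
one_hot = np.eye(2)[toy['label'].to_numpy()]
```

Because the three columns are boolean masks, they can be used to index both the DataFrame and any NumPy array with the same row order, which is how `pdata_merged.train` is used later on.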

### Plot a few example images.

These are the images that we will use to train our network (together with a few other variables).

In [None]:
# plots for E = 0.5-2000 GeV pi0/pi+/pi- samples

# specify which cluster to plot
cluster = 100

# make the plot
plt.cla(); plt.clf()
fig = plt.figure(figsize=(60,20))
fig.patch.set_facecolor('white')

i = 0
for ptype, pcell in pcells.items():
    for layer in layers:
        i = i+1
        plt.subplot(3,6,i)
        plt.imshow(pcell[layer][cluster].reshape(cell_shapes[layer]), extent=[-0.2, 0.2, -0.2, 0.2],
            cmap=plt.get_cmap('Blues'), origin='lower', interpolation='nearest')
        plt.colorbar()
        plt.title(ptype + ' in ' + str(layer))
        ampl.set_xlabel(r"$\Delta\phi$")
        ampl.set_ylabel(r"$\Delta\eta$")

# show the plots
plt.savefig(plotpath+'plots_pi0_plus_minus.pdf')
plt.show()
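
The `extent` of ±0.2 used for every panel can be cross-checked against the metadata defined earlier: multiplying the cell size by the number of cells along each axis gives a window of roughly 0.4 in both $\eta$ and $\phi$ for every layer. The sketch below copies the metadata lists so that it runs on its own; the `_demo` names are only for illustration.

```python
layers_demo = ["EMB1", "EMB2", "EMB3", "TileBar0", "TileBar1", "TileBar2"]
cell_size_phi_demo = [0.098, 0.0245, 0.0245, 0.1, 0.1, 0.1]
cell_size_eta_demo = [0.0031, 0.025, 0.05, 0.1, 0.1, 0.2]
len_phi_demo = [4, 16, 16, 4, 4, 4]
len_eta_demo = [128, 16, 8, 4, 4, 2]

# For each layer: (window width in eta, window width in phi, number of pixels)
window = {}
for name, d_eta, n_eta, d_phi, n_phi in zip(
        layers_demo, cell_size_eta_demo, len_eta_demo, cell_size_phi_demo, len_phi_demo):
    window[name] = (d_eta * n_eta, d_phi * n_phi, n_eta * n_phi)
    print("{:<9} eta width: {:.4f}  phi width: {:.4f}  pixels: {}".format(name, *window[name]))
```

The pixel count in the last column is the length of the flattened image for that layer, which is also the input size of the corresponding single-layer network.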

### Plot a few histograms.

These are a bit uglier than the `matplotlib` ones Max made, but it's perhaps even easier to see any differences between $\pi^\pm$ and $\pi^0$.

In [None]:
rt.gStyle.SetOptStat(0)

# For storing histograms and legends, to prevent overwriting. (TODO: Probably better ways to do this in PyROOT)
histos = []
legends = []

qtys = ['cluster_nCells', 'clusterE', 'clusterEta', 'clusterPhi', 'cluster_EM_PROBABILITY', 'cluster_sumCellE']
qty_labels = ['Cells/Cluster', 'Cluster Energy [GeV]', 'Cluster #eta', 'Cluster #phi', 'Cluster EMProb', 'Cluster SumCellE']
qty_ranges = [(0,500), (50,200), (-0.8,0.8), (-4.,4.), (0.,1.), (0.,2500.)]

if(source == 'jet'):
    qtys = ['cluster_nCells', 'clusterE', 'clusterEta', 'clusterPhi']
    qty_labels = ['Cells/Cluster', 'Cluster Energy [GeV]', 'Cluster #eta', 'Cluster #phi']
    qty_ranges = [(0,300), (0,100), (-0.8,0.8), (-4.,4.)]

# Set up a canvas.
plot_size = 500
nx = int(np.ceil(len(qtys) / 2))
ny = 2
n_pad = nx * ny
canvas = rt.TCanvas('cluster_hists','c1',plot_size * nx,plot_size * ny)
canvas.Divide(nx,ny)

colors = {'piplus':rt.kRed,'piminus':rt.kBlue,'pi0':rt.kOrange}
styles = {'piplus':3440, 'piminus':3404, 'pi0':1001}

n_bins=20
# The loop variable is named qty_range to avoid shadowing the numpy generator `rng` defined earlier.
for i, (qty, label, qty_range) in enumerate(zip(qtys, qty_labels, qty_ranges)):
    canvas.cd(i+1)
    leg = rt.TLegend(0.7,0.8,0.9,0.9)
    for ptype, p in pdata.items():
        hist = rt.TH1F('h_'+str(ptype)+'_'+str(qty),'',n_bins,qty_range[0],qty_range[1])
        for entry in p[qty]: hist.Fill(entry)
        integral = hist.Integral()
        if(integral != 0): hist.Scale(1./integral)
        hist.SetLineColor(colors[ptype])
        hist.SetLineWidth(2)
        hist.SetFillColorAlpha(colors[ptype],0.5)
        hist.SetFillStyle(styles[ptype])
        hist.Draw('HIST SAME')
        hist.GetXaxis().SetTitle(label)
        hist.GetYaxis().SetTitle('Normalised events')
        hist.SetMaximum(1.5 * hist.GetMaximum())
        leg.AddEntry(hist,pi_latex[ptype],'f')
        histos.append(hist)
    # Draw and store the legend once per pad, after all histograms have been added.
    leg.Draw()
    legends.append(leg)
    if(qty in ['cluster_nCells','clusterE', 'cluster_EM_PROBABILITY', 'cluster_sumCellE']): rt.gPad.SetLogy()
canvas.Draw()
canvas.SaveAs(plotpath+'hist_pi0_plus_minus.pdf')

## Simple feed-forward Neural Network.
<div style="text-align: right"> <a href="#Image-Classification-of-ATLAS-Calorimeter-Topo-Clusters-(Jan)">Top</a> </div>

First, we're going to train a simple, feed-forward neural network. This will be our "baseline network".

Let's import `TensorFlow` and get our GPU ready. We assume that only one GPU is available; if you have more, modify the list below.

In [9]:
ngpu = 1
gpu_list = ["/gpu:"+str(i) for i in range(ngpu)]
#gpu_list = ["/gpu:0"] #["/gpu:0","/gpu:1","/gpu:2","/gpu:3"]

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # disable some of the tensorflow info printouts, only display errors
import tensorflow as tf
strategy = tf.distribute.MirroredStrategy(devices=gpu_list)
# strategy = tf.distribute.MirroredStrategy()
ngpu = strategy.num_replicas_in_sync
print ('Number of devices: {}'.format(ngpu))
#sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
Number of devices: 1


In [None]:
from models import baseline_nn_model

lr = 5e-5
dropout = 0.2 # < 0 -> no dropout
model = baseline_nn_model(strategy, lr=lr, dropout=dropout)

In [None]:
# build the model
models = {}
for layer in layers:
    npix = cell_shapes[layer][0]*cell_shapes[layer][1]
    models[layer] = model(npix)
    models[layer].summary()

Now we train the model for each calorimeter layer in turn.

In [None]:
nepochs = 100
batch_size = 200 * ngpu
verbose = 1 # 2 for a lot of printouts

model_history = {}
model_performance = {}
model_scores = {}
for layer in layers:
    print('On layer ' + layer + '.')
    
    # train+validate model
    model_history[layer] = models[layer].fit(
        pcells_merged[layer][pdata_merged.train], plabels[pdata_merged.train],
        validation_data = (
            pcells_merged[layer][pdata_merged.val], plabels[pdata_merged.val]
        ),
        epochs = nepochs, batch_size = batch_size, verbose = verbose,
    )
    
    model_history[layer] = model_history[layer].history
    
    # get overall performance metric
    model_performance[layer] = models[layer].evaluate(
        pcells_merged[layer][pdata_merged.test], plabels[pdata_merged.test],
        verbose = 0,
    )
    
    # get network scores for the dataset
    model_scores[layer] = models[layer].predict(
        pcells_merged[layer]
    )
    print('Finished layer ' + layer + '.')

Now we can save this model to a set of files.

In [None]:
import pickle
import os
flat_dir = modelpath + 'flat' # directory for our "flat", single calo-layer networks
os.makedirs(flat_dir, exist_ok=True) # create the directory if it does not exist yet

for layer in layers:
    print('Saving ' + layer)
    models[layer].save(flat_dir + '/' +'model_' + layer + '_flat_do20.h5')
    
    with open(flat_dir + '/' + 'model_' + layer + '_flat_do20.history','wb') as model_history_file:
        pickle.dump(model_history[layer], model_history_file)
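
The `.history` files written above are plain pickled dictionaries (the `history` attribute of the object returned by `fit`), so they can be read back without TensorFlow. A minimal round-trip sketch, using a temporary directory and placeholder numbers that are not real training results:

```python
import os
import pickle
import tempfile

# Placeholder values for illustration only, not real results.
toy_history = {'loss': [0.69, 0.52], 'acc': [0.55, 0.74]}

with tempfile.TemporaryDirectory() as tmp_dir:
    history_path = os.path.join(tmp_dir, 'model_demo.history')
    # Write the dictionary in binary mode ...
    with open(history_path, 'wb') as history_file:
        pickle.dump(toy_history, history_file)
    # ... and read it back the same way.
    with open(history_path, 'rb') as history_file:
        restored = pickle.load(history_file)
```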

Alternatively we can load a saved model from a set of files (use this to skip training above, if it's been done before).

In [None]:
import pickle
flat_dir = modelpath + 'flat' # directory for our "flat", single calo-layer networks
models = {}
model_history = {}
model_scores = {}
for layer in layers:
    print('Loading ' + layer)
    models[layer] = tf.keras.models.load_model(flat_dir + '/'+'model_' + layer + '_flat_do20.h5')
    
    # load history object
    with open(flat_dir + '/' + 'model_' + layer + '_flat_do20.history','rb') as model_history_file:
        model_history[layer] = pickle.load(model_history_file)
    
    # recalculate network scores for the dataset
    model_scores[layer] = models[layer].predict(
        pcells_merged[layer]
    )

Let's plot model accuracy and loss as a function of training epoch, for each layer.

In [None]:
#TODO: Log scale doesn't seem to actually affect the curves. Why? (weird mpl behaviour)
use_log = False
x_lim = [0.,100.]
y_lim = {'acc':[0.5,1.],'loss':[0.2,0.7]}
for layer in layers:
#     print(history_flat[layer_i].history.keys())
    plt.cla(); plt.clf()
    fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,6))
    fig.patch.set_facecolor('white')
    
    if(use_log): 
        ax1.set_yscale('log')
        ax2.set_yscale('log')
        
    ax1.plot(model_history[layer]['acc'])
    ax1.plot(model_history[layer]['val_acc'])
    ax1.set_title('model accuracy for ' + layer)
    ax1.set(xlabel = 'epoch',ylabel='accuracy')
    ax1.set_xlim(x_lim)
    ax1.set_ylim(y_lim['acc'])
    ax1.legend(['train', 'validation'], loc='upper left')
    ax1.grid(True)
    extent = ax1.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
    plt.savefig(plotpath + 'accuracy_' + layer + '.pdf',bbox_inches=extent)

    # summarize history for loss
    ax2.plot(model_history[layer]['loss'])
    ax2.plot(model_history[layer]['val_loss'])
    ax2.set_title('model loss for ' + layer)
    ax2.set(xlabel = 'epoch',ylabel='loss')
    ax2.set_xlim(x_lim)
    ax2.set_ylim(y_lim['loss'])
    ax2.legend(['train', 'validation'], loc='upper right')
    ax2.grid(True)
    extent = ax2.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
    plt.savefig(plotpath + 'loss_' + layer + '.pdf',bbox_inches=extent)
    plt.show()

Now let's look at ROC curves.

In [None]:
from sklearn.metrics import roc_curve, auc

roc_fpr = {}
roc_tpr = {}
roc_thresh = {}
roc_auc = {}

for layer in layers:
    roc_fpr[layer], roc_tpr[layer], roc_thresh[layer] = roc_curve(
        plabels[pdata_merged.test][:,1],
        model_scores[layer][pdata_merged.test,1],
        drop_intermediate=False,
    )
    roc_auc[layer] = auc(roc_fpr[layer], roc_tpr[layer])
    print('Area under curve for ' + layer + ': ' + str(roc_auc[layer]))
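
For readers unfamiliar with what `roc_curve` and `auc` compute, here is a small NumPy-only sketch of the idea: sort the clusters by decreasing score, sweep the threshold down through them, and accumulate the true and false positive rates. The area is then obtained with the trapezoid rule. This is a simplified illustration (it assumes all scores are distinct), not a replacement for the `sklearn` functions used above; the function names and toy values are made up.

```python
import numpy as np

def roc_points(y_true, y_score):
    """ROC curve points, assuming all scores are distinct (ties are not merged)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score, kind='stable')   # highest score first
    hits = y_true[order]
    # Lowering the threshold past each cluster adds one true or one false positive.
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - hits) / (1 - hits).sum()])
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the curve via the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

toy_labels = np.array([0, 0, 1, 1])
toy_roc_scores = np.array([0.1, 0.4, 0.35, 0.8])
toy_fpr, toy_tpr = roc_points(toy_labels, toy_roc_scores)
toy_auc = auc_trapezoid(toy_fpr, toy_tpr)
print(toy_auc)
```

An area of 0.5 corresponds to the diagonal (random guessing), and 1.0 to perfect separation.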

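For comparison, let's get the area under the curve for the old method, the existing `cluster_EM_PROBABILITY` variable. Since label 1 corresponds to charged pions, we use `1 - cluster_EM_PROBABILITY` as the score.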

In [None]:
lc_fpr, lc_tpr, lc_thresh = roc_curve(
    plabels[pdata_merged.test][:,1],
    1-pdata_merged["cluster_EM_PROBABILITY"][pdata_merged.test],
)
lc_auc = auc(lc_fpr, lc_tpr)
print("Area under curve for cluster_EM_PROB: " + str(lc_auc))

Now let's compare one of our new, simple networks against the old method, `EMProb`.

In [None]:
comp_method = 'EMB1'
pu.roc_plot([lc_fpr,roc_fpr[comp_method]],[lc_tpr,roc_tpr[comp_method]],
            figfile=plotpath + 'roc_lc_only.pdf',
            labels=['LC EMProb (area = {:.3f})'.format(lc_auc),
                    comp_method +' (area = {:.3f})'.format(roc_auc[comp_method])],
            extra_lines=[[[0, 1], [0, 1]]],
            title=r'Simple NN ROC curve: classification of $\pi^+$ vs. $\pi^0$')

# plt.cla(); plt.clf()
# fig = plt.figure()
# fig.patch.set_facecolor('white')
# plt.plot(lc_fpr, lc_tpr, label='LC EMProb (area = {:.3f})'.format(lc_auc))
# plt.plot(roc_fpr['EMB1'], roc_tpr['EMB1'], label='EMB1 (area = {:.3f})'.format(roc_auc['EMB1']))
# plt.plot([0, 1], [0, 1], 'k--')
# plt.xlim(0,1.1)
# plt.ylim(0,1.1)
# plt.title('Simple NN ROC curve: classification of $\pi^+$ vs. $\pi^0$')
# ampl.set_xlabel('False positive rate')
# ampl.set_ylabel('True positive rate')
# plt.legend(loc='best')
# plt.savefig(plotpath + 'roc_lc_only.pdf')
# plt.show()

In [None]:
plt.cla(); plt.clf()
# fig, ax = plt.subplots(1, 2, tight_layout=True, figsize=(10,4))
# fig.patch.set_facecolor('white')

# colors for our simple NN's
colors = ['xkcd:red','xkcd:light orange','xkcd:gold','xkcd:green','xkcd:blue','xkcd:violet']

fig = plt.figure()
fig.patch.set_facecolor('white')
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(lc_fpr, lc_tpr, label='LC EMProb (area = {:.3f})'.format(lc_auc),linestyle='-.')
for layer_i, layer_name in enumerate(layers):
    plt.plot(roc_fpr[layer_name], roc_tpr[layer_name], label='{} (area = {:.3f})'.format(layer_name, roc_auc[layer_name]),color=colors[layer_i])
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title(r'Simple NN ROC curve: classification of $\pi^+$ vs. $\pi^0$')
plt.legend(loc='best')
plt.savefig(plotpath + 'roc_layers.pdf')
plt.show()

# Zoom in view of the upper left corner.
fig = plt.figure()
fig.patch.set_facecolor('white')
plt.xlim(0, 0.25)
plt.ylim(0.6, 1)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(lc_fpr, lc_tpr, label='LC EMProb (area = {:.3f})'.format(lc_auc),linestyle='-.')
for layer_i, layer_name in enumerate(layers):
    plt.plot(roc_fpr[layer_name], roc_tpr[layer_name], label='{} (area = {:.3f})'.format(layer_name, roc_auc[layer_name]),color=colors[layer_i])
# ax[1].plot(fpr_nn, tpr_nn, label='Simple NN (area = {:.3f})'.format(auc_nn))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve (zoomed in at top left)')
plt.legend(loc='best')
plt.savefig(plotpath + 'roc_zoom_layers.pdf')
plt.show()

## ROC Curve Scans
<div style="text-align: right"> <a href="#Image-Classification-of-ATLAS-Calorimeter-Topo-Clusters-(Jan)">Top</a> </div>

In [None]:
# a convenience class for creating ROC curve scans
display_digits=2
class roc_var:
    def __init__(self,
                 name, # name of variable as it appears in the root file
                 bins, # endpoints of bins as a list
                 df,   # dataframe to construct subsets from
                 latex='', # optional latex to display variable name with
                 vlist=None, # optional list to append class instance to
                ):
        self.name = name
        self.bins = bins
        
        if(latex == ''): self.latex = name
        else:            self.latex = latex
        
        self.selections = []
        self.labels = []
        for i, point in enumerate(self.bins):
            if(i == 0):
                self.selections.append( df[name]<point )
                self.labels.append(self.latex+'<'+str(round(point,display_digits)))
            else:
                # the lower edge is inclusive, so values equal to a bin edge are not dropped
                self.selections.append( (df[name]>=self.bins[i-1]) & (df[name]<point) )
                self.labels.append(str(round(self.bins[i-1],display_digits))+'<='+self.latex+'<'+str(round(point,display_digits)))
            # after the last edge, add the overflow bin (also when only one edge is given)
            if(i == len(self.bins)-1):
                self.selections.append( df[name]>=point )
                self.labels.append(self.latex+'>='+str(round(point,display_digits)))
        
        if(vlist is not None):
            vlist.append(self)

In [None]:
def roc_scan(varlist,scan_targets,labels):
    '''
    Creates a set of ROC curve plots by scanning over the specified variables.
    One set is created for each target (neural net score dataset).
    
    varlist: a list of roc_var instances to scan over
    scan_targets: a list of neural net score datasets to use
    labels: a list of target names (strings); must be the same length as scan_targets
    '''
    for target, target_label in zip(scan_targets,labels):
        for v in varlist:
            # prepare matplotlib figure
            plt.cla()
            plt.clf()
            fig = plt.figure()
            fig.patch.set_facecolor('white')
            plt.plot([0, 1], [0, 1], 'k--')

            for binning, label in zip(v.selections,v.labels):
                # first generate ROC curve
                x, y, t = roc_curve(
                    plabels[pdata_merged.test & binning][:,1],
                    target[pdata_merged.test & binning],
                    drop_intermediate=False,
                )
                var_auc = auc(x,y)
                plt.plot(x, y, label=label+' (area = {:.3f})'.format(var_auc))

            plt.title('ROC Scan of '+target_label+' over '+v.latex)
            plt.xlim(0,1.1)
            plt.ylim(0,1.1)
            ampl.set_xlabel('False positive rate')
            ampl.set_ylabel('True positive rate')
            plt.legend()
            plt.savefig(plotpath+'roc_scan_'+target_label+'_'+v.name+'.pdf')
            plt.show()

In [None]:
# specify variables we are interested in scanning over
varlist = []
cluster_e = roc_var(
    name='clusterE',
    bins=[1,10,50,500],
    df=pdata_merged,
    vlist=varlist,
)

pdata_merged['abs_clusterEta'] = np.abs(pdata_merged.clusterEta)
cluster_eta = roc_var(
    name='abs_clusterEta',
    bins=[0.2,0.4,0.6],
    df=pdata_merged,
    vlist=varlist,
    latex='abs(clusterEta)'
)
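
To make the binning concrete: a list of N bin edges defines N+1 selections (an underflow bin, N-1 interior bins and an overflow bin), so the `clusterE` edges `[1,10,50,500]` produce five ROC curves per scan. The sketch below reproduces this kind of binning with `np.digitize` on a few made-up energies. It uses the convention that the lower edge of each bin is inclusive, and the helper name `bin_masks` is only for illustration.

```python
import numpy as np

def bin_masks(values, edges):
    """Return one boolean mask per bin: below edges[0], between edges, above edges[-1]."""
    values = np.asarray(values)
    # np.digitize returns k such that edges[k-1] <= value < edges[k]
    which = np.digitize(values, edges)
    return [which == k for k in range(len(edges) + 1)]

toy_energy = np.array([0.5, 1.0, 5.0, 20.0, 100.0, 1000.0])  # made-up values
masks = bin_masks(toy_energy, [1, 10, 50, 500])
counts = [int(m.sum()) for m in masks]
print(counts)
```

Each mask can be combined with the test-set mask using `&`, in the same way `roc_scan` combines `pdata_merged.test` with a selection.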


In [None]:
# begin the scan
targets = [model_scores[layer][:,1] for layer in layers]+[1-pdata_merged["cluster_EM_PROBABILITY"]]
labels = layers+['LC']
roc_scan(varlist,targets,labels)
        

## ResNet

We can also train an instance of ResNet50.

To do this, we will want to appropriately up/downscale all of our calorimeter images, so they are all of the same dimensions.

In [10]:
for key, val in cell_shapes.items():
    print(key,val)

EMB1 (128, 4)
EMB2 (16, 16)
EMB3 (8, 16)
TileBar0 (4, 4)
TileBar1 (4, 4)
TileBar2 (2, 4)


Based on the above, a reasonable choice is to rescale all images to shape `(128,16)`. This corresponds to the maximum dimension along each axis, so we only need to do some upscaling. The nice thing about avoiding downscaling is that we do not lose information.
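
How the rescaling is actually done inside the `resnet` model in `models.py` is not shown here. As one possible illustration, every layer shape listed above divides `(128,16)` evenly, so a nearest-neighbour upscaling can be written with `np.repeat` alone. The function below is a sketch under that assumption; its name and the optional `preserve_sum` flag are hypothetical.

```python
import numpy as np

def upscale_nearest(images, target_shape=(128, 16), preserve_sum=False):
    """Upscale a batch of images of shape (n, eta, phi) by repeating pixels.

    Hypothetical helper: requires the target shape to be an integer
    multiple of the image shape along both axes.
    """
    n_eta, n_phi = images.shape[1:]
    f_eta, rem_eta = divmod(target_shape[0], n_eta)
    f_phi, rem_phi = divmod(target_shape[1], n_phi)
    if rem_eta or rem_phi:
        raise ValueError("target shape must be an integer multiple of the image shape")
    out = np.repeat(np.repeat(images, f_eta, axis=1), f_phi, axis=2)
    if preserve_sum:
        # Each pixel was copied f_eta * f_phi times; rescale to keep the image total fixed.
        out = out / (f_eta * f_phi)
    return out

toy_images = np.arange(8, dtype=float).reshape(1, 2, 4)   # one TileBar2-sized image
upscaled = upscale_nearest(toy_images)
upscaled_norm = upscale_nearest(toy_images, preserve_sum=True)
```

Repeating pixels multiplies the summed pixel value of an image by the upscaling factor, which is why the sketch offers the `preserve_sum` option. Whether that matters depends on how the images are normalised before they reach the network.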

In [14]:
from models import resnet
tf.keras.backend.set_image_data_format('channels_last')
lr = 5e-5
input_shape = (128,16)
model_resnet = resnet(strategy, lr=lr)(input_shape)

In [12]:
# minor data prep -- key names match those defined within resnet model in models.py!
pcells_merged_unflattened = {'input' + str(i):pcells_merged[key].reshape(tuple([-1] + list(cell_shapes[key]))) for i,key in enumerate(pcells_merged.keys())}

rn_train = {key:val[pdata_merged.train] for key,val in pcells_merged_unflattened.items()}
rn_valid = {key:val[pdata_merged.val] for key,val in pcells_merged_unflattened.items()}
rn_test = {key:val[pdata_merged.test] for key,val in pcells_merged_unflattened.items()}

In [None]:
nepochs = 10
batch_size = 20 * ngpu
verbose = 1 # 2 for a lot of printouts

model_key = 'resnet'

# train+validate model
model_history[model_key] = model_resnet.fit(
    x=rn_train,
    y=plabels[pdata_merged.train],
    validation_data=(
        rn_valid,
        plabels[pdata_merged.val]
    ),
    epochs=nepochs,
    batch_size=batch_size,
    verbose=verbose
)
    
model_history[model_key] = model_history[model_key].history
    
# get overall performance metric
model_performance[model_key] = model_resnet.evaluate(
    x=rn_test,
    y=plabels[pdata_merged.test],
    verbose=0
)
    
# get network scores for the dataset
model_scores[model_key] = model_resnet.predict(
    pcells_merged_unflattened
)

Epoch 1/10
 2956/14840 [====>.........................] - ETA: 10:14 - loss: 0.4077 - acc: 0.8615

## Combination Network
<div style="text-align: right"> <a href="#Image-Classification-of-ATLAS-Calorimeter-Topo-Clusters-(Jan)">Top</a> </div>

Here we train a simple combination network. Its inputs are the *outputs* of the simple, feed-forward neural networks from above.
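
To show the shape of the combination network's input: each single-layer network outputs a two-column score array per cluster, and we keep only the second column (the score for class 1) from each. Stacking these columns side by side gives one row per cluster and one column per calorimeter layer. The sketch below uses random numbers in place of real network scores, and all names are for illustration only.

```python
import numpy as np

layer_names = ["EMB1", "EMB2", "EMB3", "TileBar0", "TileBar1", "TileBar2"]
rng_demo = np.random.default_rng(1)   # random stand-in for real network outputs

toy_scores = {}
for name in layer_names:
    p = rng_demo.uniform(size=5)
    toy_scores[name] = np.column_stack([1.0 - p, p])   # two-class scores, rows sum to 1

# One column per layer, in the fixed order given by layer_names.
stacked = np.column_stack([toy_scores[name][:, 1] for name in layer_names])
print(stacked.shape)
```

Iterating over an explicit list of layer names, rather than over the dictionary keys, keeps the column order fixed and excludes any other models whose scores may be stored in the same dictionary.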

In [None]:
from models import simple_combine_model

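# Iterate over `layers` (not over the keys of model_scores), so that only the six flat networks are used, in a fixed order.
model_scores_stack = np.column_stack( [model_scores[layer][:,1] for layer in layers] )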
lr = 1e-3
n_input = model_scores_stack.shape[1]
model_simpleCombine = simple_combine_model(strategy, lr=lr, n_input = n_input)()

In [None]:
epochs = 50
batch_size = 200*ngpu
verbose = 2

simpleCombine_history = model_simpleCombine.fit(model_scores_stack[pdata_merged.train], plabels[pdata_merged.train],
                                                validation_data=(model_scores_stack[pdata_merged.val],
                                                                 plabels[pdata_merged.val]),
                                                epochs=epochs, batch_size=batch_size, verbose=verbose)
simpleCombine_history = simpleCombine_history.history

... and save the results to a file.

In [None]:
import pickle
import os
simple_dir = modelpath + 'simple' # directory for our "simple" network
os.makedirs(simple_dir, exist_ok=True) # create the directory if it does not exist yet

model_simpleCombine.save(simple_dir + '/' +'model_simple_do20.h5')
with open(simple_dir + '/' + 'model_simple_do20.history','wb') as model_history_file:
    pickle.dump(simpleCombine_history, model_history_file)

We can also load the model from the file.

In [None]:
import pickle
simple_dir = modelpath + 'simple' # directory for our "simple" network
model_simpleCombine = tf.keras.models.load_model(simple_dir + '/' + 'model_simple_do20.h5')
with open(simple_dir + '/' + 'model_simple_do20.history','rb') as model_history_file:
    simpleCombine_history = pickle.load(model_history_file)

Let's plot some results from this model.

In [None]:
pu.make_plot(
    [simpleCombine_history['acc'],simpleCombine_history['val_acc']],
    figfile = plotpath+'accuracy_simpleCombine.pdf',
    xlabel = 'epoch', ylabel = 'accuracy',
    x_log = False, y_log = False,
    labels = ['train','validation'],
    title = 'Model accuracy for simple combination',
)

# summarize history for loss
pu.make_plot(
    [simpleCombine_history['loss'],simpleCombine_history['val_loss']],
    figfile = plotpath+'loss_simpleCombine.pdf',
    xlabel = 'epoch', ylabel = 'loss',
    x_log = False, y_log = False,
    labels = ['train','validation'],
    title = 'Model loss for simple combination',
)

In [None]:
from sklearn.metrics import roc_curve, auc

simpleCombine_score = model_simpleCombine.predict(model_scores_stack)
simpleCombine_fpr, simpleCombine_tpr, simpleCombine_thresh = roc_curve(
    plabels[pdata_merged.test,1], simpleCombine_score[pdata_merged.test,1]
)
simpleCombine_auc = auc(simpleCombine_fpr, simpleCombine_tpr)
print("Area under curve for simpleCombine: " + str(simpleCombine_auc))

In [None]:
pu.roc_plot(
    [simpleCombine_fpr,lc_fpr]+[roc_fpr[layer] for layer in layers],
    [simpleCombine_tpr,lc_tpr]+[roc_tpr[layer] for layer in layers],
    figfile = plotpath+'roc_simpleCombine.pdf',
    extra_lines=[[[0, 1], [0, 1]]], labels=[
        'simpleCombine (area = {:.3f})'.format(simpleCombine_auc),
        'LC EMProb (area = {:.3f})'.format(lc_auc),
    ]+[layer+' (area = {:.3f})'.format(roc_auc[layer]) for layer in layers],
    title=r'Simple NN ROC curve: classification of $\pi^+$ vs. $\pi^0$')

pu.roc_plot(
    [simpleCombine_fpr,lc_fpr]+[roc_fpr[layer] for layer in layers],
    [simpleCombine_tpr,lc_tpr]+[roc_tpr[layer] for layer in layers],
    figfile = plotpath+'roc_zoom_simpleCombine.pdf',
    x_max=0.25, y_min=0.6,
    extra_lines=[[[0, 1], [0, 1]]], labels=[
        'simpleCombine (area = {:.3f})'.format(simpleCombine_auc),
        'LC EMProb (area = {:.3f})'.format(lc_auc),
    ]+[layer+' (area = {:.3f})'.format(roc_auc[layer]) for layer in layers],
    title='Simple NN ROC curve: zoomed in at top left')

In [None]:
roc_scan(varlist,[simpleCombine_score[:,1]],['simpleCombine'])