In [1]:
import sys
sys.path.append('../')

import utils as pic

# Classifying Collider Images Using 3D Convolutional Neural Networks

Evan Koenig

PHY7097 Final Presentation

## Introduction

### At the LHC, proton-proton collisions are used to study the interactions between elementary particles and the fundamentals of physics. In these collisions, thousands of particles are created and measured in trackers and calorimeters so that each event can be reconstructed. One crucial step in this reconstruction is particle identification. This can be a difficult task involving many variables and complicated relationships, which makes it a natural job for machine learning.

<img src="plots/CollisionImage.png" alt="Collision Image">

[[1] Hackathon GitHub](https://github.com/ML4SCI/ML4SCIHackathon/blob/main/ParticleImagesChallenge/images/CollisionImage.png)

## Particle Images

### The electromagnetic calorimeter (ECAL) is used to measure energy showers from electrons and photons. These energy showers are captured in pixel-like cells in the ECAL and can be displayed as 32x32 images.
<img src="plots/energy-images.jpg" alt="Calorimeter Images" >


### Along with the energy deposits, the relative time of the hits is also saved.
<img src="plots/timing-images.jpg" alt="Timing images">

### To understand the temporal component, we can investigate a space-time scatter plot of the energy deposits.
<img src="plots/spacetime-scatter.jpg" alt="Spacetime scatter plot">

### We can convert this 3D scatter plot to a 3D image by binning the time. We are then left with a dataset that we can feed into a 3D convolutional network.

## 3D Convolutional Network

### A 3D convolutional network is a generalization of the 2D convolutional network. It uses 3D kernel filters that allow the network to learn relationships between neighboring voxels (3D pixels).
<img src="plots/1-s2.0-S0010448518301349-gr3_lrg.jpg" alt="3D convolutional network scheme">

[[2] FeatureNet](https://www.sciencedirect.com/science/article/pii/S0010448518301349#fig3)
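
### To make the voxel picture concrete, here is a minimal NumPy sketch of a single 3D kernel sliding over a small volume. It only illustrates the operation (no padding, stride 1, and a hypothetical helper name `conv3d_single`); it is not how Keras implements `Conv3D` internally, and the toy volume is made up.

```python
import numpy as np

def conv3d_single(volume, kernel):
    """Slide one 3D kernel over a 3D volume (no padding, stride 1).

    Like deep learning libraries, this computes a cross-correlation:
    the kernel is not flipped.
    """
    kt, ky, kx = kernel.shape
    out_shape = (volume.shape[0] - kt + 1,
                 volume.shape[1] - ky + 1,
                 volume.shape[2] - kx + 1)
    out = np.zeros(out_shape)
    for t in range(out_shape[0]):
        for y in range(out_shape[1]):
            for x in range(out_shape[2]):
                # Weighted sum of the voxel neighbourhood under the kernel
                patch = volume[t:t + kt, y:y + ky, x:x + kx]
                out[t, y, x] = np.sum(patch * kernel)
    return out

# Toy volume: 5 time bins of 6x6 pixels with a single bright voxel
volume = np.zeros((5, 6, 6))
volume[2, 3, 3] = 1.0

# A 3x3x3 averaging kernel mixes each voxel with its 26 neighbours
kernel = np.ones((3, 3, 3)) / 27.0

response = conv3d_single(volume, kernel)
print(response.shape)  # (3, 4, 4)
```

### The single bright voxel is spread over every output position whose 3x3x3 window contains it, which is how a 3D filter relates a deposit to its neighbours in both space and time.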

## Building the Model

### The 3D image data is built with the following choices:
* ### Energy > 0.005: to remove small energy deposits
* ### |Time| < 0.1: to reduce empty cells
* ### 21 time bins with a 0.0099 time step: to keep enough time information
* ### Earlier energy deposits linger in later time bins: so that the 3D image is like a folded-out movie

<p float="left">
<img src="plots/photon.gif" alt="Photon energy gif" width=600>
<img src="plots/electron.gif" alt="Electron energy gif" width=600>
</p>
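
### The transformation listed above can be sketched in NumPy as follows. This is an illustrative reimplementation under stated assumptions, not the `pic.timeordered_BC` helper used later in this notebook: the function name `to_cumulative_3d`, the equal-width bin edges from `np.linspace`, and the toy inputs are all hypothetical.

```python
import numpy as np

def to_cumulative_3d(energy, time, min_e=0.005, max_abs_t=0.1, n_bins=21):
    """Turn one (32, 32) energy image and its (32, 32) timing image
    into an (n_bins, 32, 32) cumulative energy 'movie'."""
    # Keep only cells passing the energy and time cuts
    keep = (energy > min_e) & (np.abs(time) < max_abs_t)

    # Assign each cell to a time bin; equal-width bins are an assumption
    edges = np.linspace(-max_abs_t, max_abs_t, n_bins + 1)
    bin_idx = np.clip(np.digitize(time, edges) - 1, 0, n_bins - 1)

    # Place each kept deposit in the frame of its own time bin
    frames = np.zeros((n_bins,) + energy.shape)
    rows, cols = np.nonzero(keep)
    frames[bin_idx[rows, cols], rows, cols] = energy[rows, cols]

    # Cumulative sum along time: earlier deposits linger in later frames
    return np.cumsum(frames, axis=0)

# Toy example with two deposits at different times
energy = np.zeros((32, 32))
time = np.zeros((32, 32))
energy[10, 10], time[10, 10] = 1.0, -0.05   # early hit
energy[20, 20], time[20, 20] = 0.5, 0.05    # late hit

image = to_cumulative_3d(energy, time)
print(image.shape)  # (21, 32, 32)
```

### In this toy image the early hit appears from its own time bin onward, and the late hit joins it in later frames, so the last frame contains both deposits.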

### In Python, the model is defined as follows:

In [2]:
from tensorflow import keras
from tensorflow.keras import layers
import pickle

model = keras.Sequential()

model.add(layers.Reshape((21, 32, 32, 1),
          input_shape=(21, 32, 32)))

model.add(layers.Conv3D(64, 3, padding='same', activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool3D())

model.add(layers.Conv3D(32, 3, padding='same', activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool3D())

model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(2, activation='softmax'))

In [3]:
modeldir = '../models/cnn3d-v2-1.00e-01-9.90e-03/'
model = keras.models.load_model(modeldir)
with open(f'{modeldir}/history.pkl', 'rb') as f_history:
    history = pickle.load(f_history)
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 reshape (Reshape)           (None, 21, 32, 32, 1)     0         
                                                                 
 conv3d (Conv3D)             (None, 21, 32, 32, 64)    1792      
                                                                 
 dropout (Dropout)           (None, 21, 32, 32, 64)    0         
                                                                 
 batch_normalization (BatchN  (None, 21, 32, 32, 64)   256       
 ormalization)                                                   
                                                                 
 max_pooling3d (MaxPooling3D  (None, 10, 16, 16, 64)   0         
 )                                                               
                                                                 
 conv3d_1 (Conv3D)           (None, 10, 16, 16, 32)    5
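
### The parameter counts in the summary can be checked by hand. A `Conv3D` layer with a cubic kernel of size k has (k^3 * C_in + 1) * C_out weights, where the +1 is the bias of each filter, and `BatchNormalization` stores four values per channel (two trainable, two moving statistics). The short check below reproduces the first two counts shown above; the helper names are hypothetical.

```python
def conv3d_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output filter for a cubic 3D kernel."""
    return (kernel ** 3 * in_channels + 1) * out_channels

def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels

print(conv3d_params(3, 1, 64))   # 1792, matches conv3d
print(batchnorm_params(64))      # 256, matches batch_normalization
```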

## Model Results

### This model was trained with
* ### 20000 training/validation images
* ### A batch size of 100
* ### 100 epochs

<img src="plots/model-history.jpg" alt="Model history">

### The model's performance on the validation data fluctuates considerably. How does the model perform on test data?

### The performance of the model can be evaluated using the receiver operating characteristic (ROC) curve. This curve plots the true positive rate (the fraction of signal images labeled correctly) against the false positive rate (the fraction of background images labeled incorrectly as signal) as the classification threshold is varied.

<img src="plots/800px-Roc_curve.svg.png" alt="General ROC curve">
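
### As an illustration of how such a curve is built, the sketch below sweeps a threshold over classifier scores and records the true and false positive rates with plain NumPy. The labels and scores are made-up toy values, not outputs of the model in this notebook, and `roc_points` is a hypothetical helper.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays, one point per distinct threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    # Start above the highest score so the curve begins at (0, 0)
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    # An image is labeled positive when its score reaches the threshold
    tpr = np.array([np.sum((scores >= t) & (y_true == 1)) / n_pos
                    for t in thresholds])
    fpr = np.array([np.sum((scores >= t) & (y_true == 0)) / n_neg
                    for t in thresholds])
    return fpr, tpr

# Toy labels (1 = positive class) and classifier scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_points(y_true, scores)

# Area under the curve with the trapezoidal rule
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(auc)  # 0.75
```

### An area of 1 corresponds to perfect separation and 0.5 to random guessing, so a curve that hugs the top-left corner indicates a better classifier.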

### For this model, the test data gives the following ROC curve:

<img src="plots/model-roc.jpg" alt="Model ROC curve">

### This model does perform well on the test data, but the fluctuations in the validation performance are a concern. Some hyperparameter optimization could help!

## Hyperparameter Optimization Using Talos

### Optimizing hyperparameters can be a daunting task, since there can be so many different parameters to change. The Talos Python package can take any Keras model and, with minimal changes to the existing code, run a grid or random search of the hyperparameter space. The package can be installed using `pip install talos`. Starting from the model built so far, we can set up a parameter search over the number of filters, kernel size, number of dense nodes, and activation. The first step is to wrap the existing model in its own function:

In [4]:
def cnn3d_model(X_train,y_train,X_valid,y_valid,params):
    """Build CNN3D model and train it with passed parameters

    Args:
        X_train (array): Training data
        y_train (array): Training target
        X_valid (array): Validation data
        y_valid (array): Validation target
        params (dict): Dictionary of hyperparameters for this round

    Returns:
        tuple: The Keras training history and the trained model
    """
    model = keras.Sequential()

    model.add(layers.Reshape((21, 32, 32, 1),
            input_shape=(21, 32, 32)))

    model.add(layers.Conv3D(params['filters_1'], params['kernal_1'], padding='same', activation=params['activation']))
    model.add(layers.Dropout(0.2))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPool3D())

    model.add(layers.Conv3D(params['filters_2'], params['kernal_2'], padding='same', activation=params['activation']))
    model.add(layers.Dropout(0.2))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPool3D())

    model.add(layers.Flatten())
    model.add(layers.Dense(params['dense_1'], activation=params['activation']))
    model.add(layers.Dense(params['dense_2'], activation=params['activation']))
    model.add(layers.Dense(2, activation='softmax'))

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    history = model.fit(
        X_train, y_train,
        validation_data=(X_valid, y_valid),
        epochs=20,
        batch_size=10,
        shuffle=True,
        verbose=0
    )
    return history, model

### Once the model function is defined, we need to define the parameter space for Talos to search:

In [5]:
params = {
    'activation': ['relu','elu'],
    'filters_1': [32,64,96],
    'kernal_1': [1,3,5],
    'filters_2': [32,64,96],
    'kernal_2': [1,3,5],
    'dense_1': [32,64,96],
    'dense_2': [32,64,96],
}

### With the parameters defined, we can call `talos.Scan` to begin a parameter search. However, a full grid search would require 1458 permutations (2 activations times 3^6 combinations of the remaining six parameters), which is long and expensive. Talos allows you to select a random fraction of the permutations to run as a random search; here `fraction_limit=0.01` keeps the search to 14 of them.
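
### The size of the full grid can be counted directly from the parameter dictionary. The sketch below repeats the dictionary so that it is self-contained and enumerates every combination with `itertools.product`; truncating the sampled fraction to an integer is an assumption that happens to agree with the 14 rounds reported by the scan's progress bar.

```python
from itertools import product

params = {
    'activation': ['relu', 'elu'],
    'filters_1': [32, 64, 96],
    'kernal_1': [1, 3, 5],
    'filters_2': [32, 64, 96],
    'kernal_2': [1, 3, 5],
    'dense_1': [32, 64, 96],
    'dense_2': [32, 64, 96],
}

# Every combination of one value per hyperparameter
grid = list(product(*params.values()))
n_total = len(grid)

# Assumption: the sampled count is the truncated fraction of the grid
n_sampled = int(n_total * 0.01)

print(n_total, n_sampled)  # 1458 14
```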

In [6]:
X_train, y_train = pic.load_data(0,100)

X_train,_,_,_ = pic.timeordered_BC(X_train,cumulative=True,min_t=-0.1,max_t=0.1)
y_train = keras.utils.to_categorical(y_train)

In [7]:
import talos

scan = talos.Scan(
    x=X_train, y=y_train, params=params, model=cnn3d_model,
    experiment_name='cnn3d', fraction_limit=0.01,
)

100%|██████████| 14/14 [01:47<00:00,  7.69s/it]


### Once the scan is finished, the `scan.data` attribute holds the metrics for each of the models that it ran in a pandas DataFrame.

In [8]:
scan.data

| | start | end | duration | round_epochs | loss | accuracy | val_loss | val_accuracy | activation | dense_1 | dense_2 | filters_1 | filters_2 | kernal_1 | kernal_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11/29/21-201603 | 11/29/21-201613 | 9.483703 | 20 | 0.523241 | 0.7 | 0.72106 | 0.483333 | relu | 64 | 32 | 96 | 32 | 5 | 1 |
| 1 | 11/29/21-201613 | 11/29/21-201622 | 8.852265 | 20 | 0.612493 | 0.657143 | 0.753675 | 0.516667 | relu | 96 | 96 | 96 | 32 | 1 | 5 |
| 2 | 11/29/21-201622 | 11/29/21-201631 | 9.023381 | 20 | 0.63573 | 0.65 | 3.772153 | 0.516667 | elu | 96 | 64 | 96 | 32 | 1 | 5 |
| 3 | 11/29/21-201631 | 11/29/21-201638 | 7.201939 | 20 | 0.592546 | 0.642857 | 2.848106 | 0.516667 | elu | 64 | 96 | 96 | 96 | 3 | 1 |
| 4 | 11/29/21-201638 | 11/29/21-201644 | 5.527343 | 20 | 0.681178 | 0.542857 | 2.189734 | 0.483333 | elu | 32 | 96 | 64 | 64 | 1 | 1 |
| 5 | 11/29/21-201644 | 11/29/21-201652 | 7.575909 | 20 | 0.677442 | 0.571429 | 1.640824 | 0.516667 | elu | 96 | 96 | 32 | 64 | 5 | 5 |
| 6 | 11/29/21-201652 | 11/29/21-201706 | 13.611354 | 20 | 0.640614 | 0.642857 | 1.392052 | 0.516667 | relu | 64 | 96 | 96 | 96 | 1 | 5 |
| 7 | 11/29/21-201706 | 11/29/21-201712 | 5.817755 | 20 | 0.398432 | 0.807143 | 0.698768 | 0.466667 | relu | 96 | 32 | 64 | 96 | 3 | 1 |
| 8 | 11/29/21-201712 | 11/29/21-201719 | 6.945966 | 20 | 0.410493 | 0.792857 | 0.697622 | 0.516667 | relu | 96 | 64 | 96 | 64 | 3 | 1 |
| 9 | 11/29/21-201719 | 11/29/21-201724 | 4.835487 | 20 | 0.461609 | 0.764286 | 0.689068 | 0.533333 | elu | 32 | 32 | 32 | 64 | 5 | 1 |


### This information is also saved in a new directory as a CSV file. For the models it trained, we can use pandas methods to determine the best parameters.

In [11]:
scan.data.sort_values("val_loss")[:5]

| | start | end | duration | round_epochs | loss | accuracy | val_loss | val_accuracy | activation | dense_1 | dense_2 | filters_1 | filters_2 | kernal_1 | kernal_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 11/29/21-201719 | 11/29/21-201724 | 4.835487 | 20 | 0.461609 | 0.764286 | 0.689068 | 0.533333 | elu | 32 | 32 | 32 | 64 | 5 | 1 |
| 8 | 11/29/21-201712 | 11/29/21-201719 | 6.945966 | 20 | 0.410493 | 0.792857 | 0.697622 | 0.516667 | relu | 96 | 64 | 96 | 64 | 3 | 1 |
| 7 | 11/29/21-201706 | 11/29/21-201712 | 5.817755 | 20 | 0.398432 | 0.807143 | 0.698768 | 0.466667 | relu | 96 | 32 | 64 | 96 | 3 | 1 |
| 0 | 11/29/21-201603 | 11/29/21-201613 | 9.483703 | 20 | 0.523241 | 0.7 | 0.72106 | 0.483333 | relu | 64 | 32 | 96 | 32 | 5 | 1 |
| 11 | 11/29/21-201733 | 11/29/21-201739 | 6.259173 | 20 | 0.635692 | 0.585714 | 0.721075 | 0.483333 | elu | 32 | 96 | 64 | 64 | 5 | 1 |


## Conclusion

### The model presented does very well with the test data, but there is room for improvement. Investigating a larger variety of hyperparameters, such as time steps, energy cuts, and model complexity, could help improve the model. Integrating the Talos framework into the workflow can help streamline this process, but it remains a time-consuming task.