# Reproducing the results of the paper *Searching for Exotic Particles in High-Energy Physics with Deep Learning*

The paper *Searching for Exotic Particles in High-Energy Physics with Deep Learning* by Baldi et al. is one of the most popular papers presenting the successful usage of deep neural networks in high-energy particle physics applications.

This example reproduces this important result with only about 100 lines of code using Keras.



In [None]:
import numpy as np
np.random.seed(1234)
import os
os.environ["KERAS_BACKEND"] = "tensorflow"
import subprocess
import h5py
import pickle

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import callbacks
from keras.optimizers import Adam
from keras.models import load_model


from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc

import matplotlib.pyplot as plt

## Download the dataset

This can take a while! The final dataset has a size of about 1.2 GB.

In [None]:
#if not os.path.exists("HIGGS_small.h5"):
#    subprocess.call("wget http://mlphysics.ics.uci.edu/data/higgs/HIGGS_small.h5", shell=True)

In [None]:
#if not os.path.exists("HIGGS.h5"):
#    subprocess.call("wget http://mlphysics.ics.uci.edu/data/higgs/HIGGS.h5", shell=True)

## Read the inputs and targets

The inputs consist of 21 low-level and 7 high-level variables. We want to reproduce the result of the paper that uses all features as inputs, called `lo+hi-level` in the paper.

In [None]:
file_ = h5py.File("HIGGS_data_small.h5", "r")
inputs = np.array(file_["features"][:,:])
targets = np.array(file_["targets"])

In [None]:
print("input data",inputs.shape)
print(inputs[1:5])
print("target data ",targets.shape)
print(targets[1:5])
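
The division into low-level and high-level variables described above can be illustrated with plain array slicing. This is only a sketch on random stand-in data: it assumes that the 21 low-level variables occupy the first columns and the 7 high-level variables the last ones, as in the description of the UCI HIGGS data set, and this ordering is not verified here for the HDF5 file. The names `N_LOW`, `N_HIGH` and `fake_inputs` are hypothetical.

```python
import numpy as np

# Counts taken from the text; the column ORDER is an assumption.
N_LOW, N_HIGH = 21, 7

rng = np.random.default_rng(0)
# Random stand-in for the real `inputs` array (5 events, 28 features).
fake_inputs = rng.normal(size=(5, N_LOW + N_HIGH))

low_level = fake_inputs[:, :N_LOW]    # assumed: first 21 columns
high_level = fake_inputs[:, N_LOW:]   # assumed: last 7 columns

print("low-level block: ", low_level.shape)
print("high-level block:", high_level.shape)
```

With the real data, the same slices applied to `inputs` would allow training on only one group of variables, provided the column order is confirmed first.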

## Set up the models

The models defined below do not exactly match the setup in the paper. Instead, we define a shallow neural network with a single hidden layer and a deep neural network with 5 hidden layers.

### Shallow network
  
We use only one hidden layer with 500 units and a tanh activation function.

In [None]:
model_shallow = Sequential()
model_shallow.add(Dense(500, kernel_initializer="glorot_normal", activation="tanh",
    input_dim=inputs.shape[1]))
model_shallow.add(Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid"))

model_shallow.summary()

### Deep network

Five hidden layers with 200 units each and ReLU activation functions.

In [None]:
model_deep = Sequential()

model_deep.add(Dense(200, kernel_initializer="glorot_normal", activation="relu" ,input_dim=inputs.shape[1]))
model_deep.add(Dense(200, kernel_initializer="glorot_normal", activation="relu"))
model_deep.add(Dense(200, kernel_initializer="glorot_normal", activation="relu"))
model_deep.add(Dense(200, kernel_initializer="glorot_normal", activation="relu"))
model_deep.add(Dense(200, kernel_initializer="glorot_normal", activation="relu"))

model_deep.add(Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid"))

model_deep.summary()
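
The parameter counts reported by `model.summary()` can be cross-checked by hand. The sketch below counts weights and biases of stacked dense layers, assuming 28 input features (21 low-level plus 7 high-level); the helper names `dense_params` and `count_params` are hypothetical and not part of Keras.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def count_params(n_inputs, hidden_units, n_outputs=1):
    """Total trainable parameters of a stack of dense layers."""
    sizes = [n_inputs] + list(hidden_units) + [n_outputs]
    return sum(dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:]))


n_features = 28  # assumed: 21 low-level + 7 high-level variables

params_shallow = count_params(n_features, [500])      # one hidden layer
params_deep = count_params(n_features, [200] * 5)     # five hidden layers

print("shallow:", params_shallow)
print("deep:   ", params_deep)
```

If the numbers printed by `model.summary()` differ from these, the input dimension of the loaded file is probably not 28.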

We then compile the models by defining the loss function and the optimizer to use.
In this case we use *Nadam*, the Adam optimizer with Nesterov momentum.

In [None]:
for model in [model_shallow, model_deep]:
    model.compile(
        loss="binary_crossentropy",
        optimizer="nadam",
        metrics=["accuracy"])
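
To make the loss function used above concrete, here is a minimal NumPy sketch of the binary cross-entropy for a batch of predictions. It is an illustration only: the function name and the clipping constant `eps` are assumptions, and the Keras implementation differs in its details.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; eps is an illustrative clipping value."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions so that log(0) cannot occur.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    per_event = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_event))


# Confident and correct predictions give a small loss ...
loss_good = binary_crossentropy([1, 0], [0.9, 0.1])
# ... while confident but wrong predictions are penalised strongly.
loss_bad = binary_crossentropy([1, 0], [0.1, 0.9])

print(loss_good, loss_bad)
```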

## Define training and test data

To speed up the training, we use only a reduced subset of the data set. We then split the data into training and test data, using 10% for testing and the remainder for training.
Feel free to enlarge the amount of data used.
The dataset contains about 10M events in total. Here we use only 1M.

In [None]:
# total number of events 
ntotal_evts = inputs.shape[0]
nused_evts = 1000000
evtoffset = 0
print("using for test and training",nused_evts," of a total ",ntotal_evts)

inputs_train, inputs_test, targets_train, targets_test = train_test_split(
        inputs[evtoffset:evtoffset+nused_evts], targets[evtoffset:evtoffset+nused_evts], test_size=0.10, random_state=1234, shuffle=True)

## Prepare pre-processing

As preprocessing, we use the standard scaler provided by the `sklearn` package. This preprocessing method takes each input variable, subtracts its mean and divides by its standard deviation, so that the final distribution is centered around 0 with a width of 1.

In [None]:
preprocessing_input = StandardScaler()
preprocessing_input.fit(inputs_train)
# Store the fitted scaler so that the same transformation can be reused later
with open("HIGGS_preprocessing.pickle", "wb") as f:
    pickle.dump(preprocessing_input, f)
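
What the standard scaler does can be written out by hand. The following sketch uses random stand-in data and plain NumPy; the array names are hypothetical. The mean and standard deviation are computed on the training sample only and then applied unchanged to the test sample, which mirrors calling `fit` on the training inputs and `transform` on both.

```python
import numpy as np

rng = np.random.default_rng(1234)
# Random stand-ins for the training and test inputs (4 features each).
x_train = rng.normal(loc=5.0, scale=3.0, size=(1000, 4))
x_test = rng.normal(loc=5.0, scale=3.0, size=(200, 4))

# Per-feature statistics from the training sample only.
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)  # population standard deviation (ddof=0)

x_train_scaled = (x_train - mean) / std
x_test_scaled = (x_test - mean) / std  # same parameters, no refit

print(x_train_scaled.mean(axis=0), x_train_scaled.std(axis=0))
```

The scaled test sample is only approximately centered at 0 with width 1, because its own statistics differ slightly from those of the training sample.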

## Train the models

The following code trains the models. Here you can quickly experience why deep learning depends heavily on GPUs to speed up the training!
During training, we use 25% of the data for validation and 75% for the actual training.
We also define the batch size here (512).
The `ModelCheckpoint` callback saves each model whenever the validation loss improves, so that it can be reused later for predictions.

In [None]:
for model, name in zip([model_shallow, model_deep], ["HIGGS_model_shallow.h5", "HIGGS_model_deep.h5"]):
    print("\nTrain now the model ",name)
    model.fit(
            preprocessing_input.transform(inputs_train),
            targets_train,
            batch_size=512,
            epochs=10,
            validation_split=0.25,
            callbacks=[callbacks.ModelCheckpoint(name, monitor = 'val_loss', verbose=True,
                                                save_best_only=True, mode='auto')]
    )
    
   
    #model.save(modelFileName)
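
The event bookkeeping implied by the settings in the code cells above can be worked out with a few lines of arithmetic. This is a sketch under the assumption that the values in the code cells (`nused_evts = 1000000`, `test_size=0.10`, `validation_split=0.25`, `batch_size=512`) are used unchanged; for other sample sizes the libraries' rounding may differ by one event. Note that Keras takes the validation events from the end of the arrays passed to `fit`, before any shuffling.

```python
import math

n_used = 1_000_000        # nused_evts in the notebook
test_size = 0.10          # passed to train_test_split
validation_split = 0.25   # passed to model.fit
batch_size = 512

n_test = round(n_used * test_size)          # events held out for testing
n_fit = n_used - n_test                     # events passed to model.fit
n_train = int(n_fit * (1.0 - validation_split))  # events used for weight updates
n_val = n_fit - n_train                     # events used for validation
steps_per_epoch = math.ceil(n_train / batch_size)

print("test:      ", n_test)
print("train:     ", n_train)
print("validation:", n_val)
print("steps per epoch:", steps_per_epoch)
```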

## Test the models

We load the saved models and the preprocessing object and apply them to the previously defined test data.


In [None]:
model_shallow = load_model("HIGGS_model_shallow.h5")
model_deep = load_model("HIGGS_model_deep.h5")

In [None]:
with open("HIGGS_preprocessing.pickle", "rb") as f:
    preprocessing_input = pickle.load(f)

### Model prediction using the test data 

In [None]:
print(inputs_test.shape[0])

ntest_evts = inputs_test.shape[0]
# In case we want to use a smaller set of test data:
# ntest_evts = 200000
predictions_shallow = model_shallow.predict(
        preprocessing_input.transform(inputs_test[:ntest_evts]))
predictions_deep = model_deep.predict(
        preprocessing_input.transform(inputs_test[:ntest_evts]))

### Compute ROC curve and AUC

In [None]:
fpr_shallow, tpr_shallow, _ = roc_curve(targets_test[:ntest_evts], predictions_shallow)
fpr_deep, tpr_deep, _ = roc_curve(targets_test[:ntest_evts], predictions_deep)

auc_shallow = auc(fpr_shallow, tpr_shallow)
auc_deep = auc(fpr_deep, tpr_deep)

print("AUC shallow model = ",auc_shallow)
print("AUC  deep model   = ",auc_deep)
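
To show what `roc_curve` and `auc` compute, here is a minimal NumPy sketch on a toy example. It is not the scikit-learn implementation: it assumes that no two events share the same score and that both classes are present, and the function name `roc_auc_sketch` is hypothetical.

```python
import numpy as np


def roc_auc_sketch(y_true, scores):
    """ROC points and area under the curve, assuming no tied scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)            # highest score first
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted)              # signal accepted as the cut is lowered
    fps = np.cumsum(1 - y_sorted)          # background accepted as the cut is lowered
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    # Trapezoidal rule for the area under the (fpr, tpr) curve.
    area = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
    return fpr, tpr, float(area)


fpr_toy, tpr_toy, auc_toy = roc_auc_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print("toy AUC =", auc_toy)
```

Here `tpr` is the signal efficiency and `1 - fpr` is the background rejection.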

### Plot the result

In [None]:
plt.figure(figsize=(6,6))
plt.plot(tpr_deep, 1.0-fpr_deep, lw=3, alpha=0.8,
        label="Deep (AUC={:.2f})".format(auc_deep))
plt.plot(tpr_shallow, 1.0-fpr_shallow, lw=3, alpha=0.8,
        label="Shallow (AUC={:.2f})".format(auc_shallow))
plt.xlabel("Signal efficiency")
plt.ylabel("Background rejection")
plt.legend(loc=3)
plt.xlim((0.0, 1.0))
plt.ylim((0.0, 1.0))
plt.savefig("HIGGS_roc.png", bbox_inches="tight")