# Particle Classification

In this notebook I build a neural network that identifies particles from their detector signals. The data comes from a GEANT-based simulation of electron-proton inelastic scattering and can be found [here](https://www.kaggle.com/naharrison/particle-identification-from-detector-responses).

In [1]:
# using plaidml to connect to my eGPU
import os

os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

### Preprocessing Data

In [2]:
# read csv file into a pandas dataframe
import numpy as np
import pandas as pd

data = pd.read_csv('pid-5M.csv')
data.head(10)



Unnamed: 0,id,p,theta,beta,nphe,ein,eout
0,211,0.780041,1.08148,0.989962,0,0.0,0.0
1,211,0.260929,0.778892,0.90245,0,0.0,0.0
2,2212,0.773022,0.185953,0.642428,4,0.1019,0.0
3,211,0.476997,0.445561,0.951471,0,0.0,0.0
4,2212,2.12329,0.337332,0.908652,0,0.034379,0.049256
5,211,0.403296,0.694215,0.958553,0,0.0,0.0
6,2212,1.38262,0.436689,0.844835,0,0.200275,0.053651
7,2212,1.13313,0.276831,0.781295,0,0.044038,0.09398
8,2212,0.656291,0.542507,0.560291,0,0.083406,0.0
9,2212,2.07721,0.130479,0.909951,0,0.036164,0.04596


The label being predicted takes one of four particle types, which makes this a multi-class classification problem, so the label needs to be encoded before training. For this data, a simple way to do that is to create dummy (one-hot) columns.

In [3]:
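# one-hot encode the particle id: one 0/1 column per particle type
dummy_field = ['id']
for each in dummy_field:
    dummies = pd.get_dummies(data[each], prefix=each, drop_first=False)
    data = pd.concat([data, dummies], axis=1)

# the original id column is now redundant
drop = ['id']

data = data.drop(drop, axis=1)
data.head()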

Unnamed: 0,p,theta,beta,nphe,ein,eout,id_-11,id_211,id_321,id_2212
0,0.780041,1.08148,0.989962,0,0.0,0.0,0,1,0,0
1,0.260929,0.778892,0.90245,0,0.0,0.0,0,1,0,0
2,0.773022,0.185953,0.642428,4,0.1019,0.0,0,0,0,1
3,0.476997,0.445561,0.951471,0,0.0,0.0,0,1,0,0
4,2.12329,0.337332,0.908652,0,0.034379,0.049256,0,0,0,1
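
The dummy columns in the table above are a one-hot encoding: each row has a 1 in the column of its particle type and 0 everywhere else. As a minimal sketch of the idea, written in plain Python on a made-up list of IDs (the `one_hot` helper is hypothetical and is not how pandas implements `get_dummies`), the encoding can be reproduced by hand:

```python
# Toy stand-in for the `id` column: one PDG particle code per row (made-up values).
ids = [211, 211, 2212, 211, 2212, 321, -11]

# Sorted unique codes give the same column order as the table:
# id_-11, id_211, id_321, id_2212.
categories = sorted(set(ids))

def one_hot(value, categories):
    """Return a list with a 1 at the position of `value` and 0 elsewhere."""
    return [1 if value == c else 0 for c in categories]

encoded = [one_hot(v, categories) for v in ids]

print(categories)
for code, row in zip(ids, encoded):
    print(code, row)
```

Every encoded row sums to one, which is what lets the network's softmax output be compared directly against it.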


In [4]:
from sklearn.model_selection import train_test_split

# hold out 20% of the rows for testing
train, test = train_test_split(data, test_size=0.2)

# the four dummy columns are the labels
train_labels = train[['id_-11', 'id_211', 'id_321', 'id_2212']]
test_labels = test[['id_-11', 'id_211', 'id_321', 'id_2212']]
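
Because `train_test_split` shuffles the rows at random, the split (and the row indices shown in later outputs) will differ from run to run unless a seed is fixed, for example through its `random_state` parameter. The sketch below illustrates the idea of a seeded shuffle split in plain Python; `shuffle_split` is a hypothetical helper for illustration only, not scikit-learn's implementation:

```python
import random

def shuffle_split(rows, test_size=0.2, seed=0):
    """Shuffle a copy of `rows` and split off a `test_size` fraction at the end."""
    rng = random.Random(seed)          # fixed seed -> the same split on every run
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    n_train = len(shuffled) - n_test
    return shuffled[:n_train], shuffled[n_train:]

rows = list(range(10))                 # stand-in for row indices
train_rows, test_rows = shuffle_split(rows, test_size=0.2, seed=0)
print(len(train_rows), len(test_rows))
```

Calling the helper twice with the same seed returns the same split, which makes results easier to compare between runs.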


In [5]:
train_labels.head()

Unnamed: 0,id_-11,id_211,id_321,id_2212
4949263,0,1,0,0
992357,0,0,0,1
1186942,0,1,0,0
2093962,0,1,0,0
757787,0,0,0,1


In [12]:
# remove the label columns so only the six features remain
train = train.drop(['id_-11', 'id_211', 'id_321', 'id_2212'], axis=1)


In [13]:
# same for the test set
test = test.drop(['id_-11', 'id_211', 'id_321', 'id_2212'], axis=1)

### Building the model

The model I built for this data is a simple classifier with two dense hidden layers, each followed by a dropout layer. It differs from a binary classification model in two ways: the final layer uses the softmax activation function, and the loss is categorical cross-entropy. Softmax acts like a multi-class sigmoid: it makes the outputs sum to one, giving a probability for each class, which is what we want in multi-class classification. Categorical cross-entropy is the matching loss function: it compares those probabilities with the one-hot label and penalizes the model by the negative log of the probability it assigned to the true class.
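
To make the two ideas concrete, here is a minimal sketch of softmax and categorical cross-entropy in plain Python. The scores in `logits` are made up, and the two functions are illustrative stand-ins rather than the Keras implementations:

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_label, probs, eps=1e-12):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_label, probs))

logits = [0.5, 3.0, 0.1, 1.0]                # made-up scores for the four classes
probs = softmax(logits)
label = [0, 1, 0, 0]                         # true class is the second column

print([round(p, 3) for p in probs])          # one probability per class
print(round(sum(probs), 6))                  # the probabilities sum to one
print(round(categorical_crossentropy(label, probs), 3))
```

The loss is small when the true class receives a high probability and grows quickly as that probability approaches zero.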

In [19]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

dims = train.shape[1]
print(dims, 'dims')
print("Building model.....")

model = Sequential()
model.add(Dense(64, input_dim=dims, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(4, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])


6 dims
Building model.....
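
As a rough check on the size of this network, the number of trainable parameters can be worked out by hand: each dense layer has one weight per input-unit pair plus one bias per unit, and dropout layers add none. The sketch below assumes the layer sizes from the cell above (6 inputs, 64, 64, 4); `dense_params` is a hypothetical helper, and the total can be compared against what `model.summary()` reports:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# input features, two hidden layers, four output classes
layer_sizes = [6, 64, 64, 4]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)        # [448, 4160, 260]
print(sum(per_layer))   # 4868
```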


In [20]:
model.fit(train, train_labels,
          epochs=2,
          batch_size=128)



Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x11d4ace80>

In [21]:
score = model.evaluate(test, test_labels, batch_size=128)



In [22]:
score

[0.12766741422224046, 0.956481]

As you can see, the model trained well and the results carried over to the test data: on the held-out 20% it reaches a loss of about 0.128 and an accuracy of about 95.6%.
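
A single accuracy figure does not show which particle types the model gets right. One way to look closer is to turn each softmax output back into a class with an argmax and count correct predictions per class. The sketch below does this in plain Python on made-up numbers standing in for `model.predict(test)` and `test_labels`; the particle names follow the PDG numbering scheme for the four ID codes, and the `argmax` helper is hypothetical:

```python
from collections import Counter

# Column order of the one-hot labels, with the particle each PDG code denotes.
classes = [(-11, "positron"), (211, "pion"), (321, "kaon"), (2212, "proton")]

def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])

# Made-up softmax outputs and one-hot labels (illustration only).
predicted_probs = [
    [0.01, 0.90, 0.05, 0.04],
    [0.00, 0.10, 0.20, 0.70],
    [0.02, 0.55, 0.40, 0.03],
    [0.01, 0.05, 0.04, 0.90],
]
true_labels = [
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

correct, total = Counter(), Counter()
for probs, label in zip(predicted_probs, true_labels):
    true_idx, pred_idx = argmax(label), argmax(probs)
    total[true_idx] += 1
    correct[true_idx] += int(pred_idx == true_idx)

for idx in sorted(total):
    code, name = classes[idx]
    print(f"{name} ({code}): {correct[idx]}/{total[idx]} correct")
```

In this toy example the kaon is mistaken for a pion, which the overall accuracy of 3 out of 4 would not reveal on its own.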