# AI Application in HEP Tutorial (Supervised): Classification

In the previous tutorial, we went through a simple regression task where we attempted to use ML algorithms to model the invariant mass of the heavy particle using the jet kinematic info. Another major application of ML in HEP is classification, as quite often the task is to distinguish signal from background.

In this tutorial, we will see how to solve a classification problem in HEP. We will try to identify heavy particle decays with multijet events as the background. Since we introduced the necessary software (SW) tricks and details in the last tutorial, we will skip them this time.

The example in this tutorial considers only one mass point (1 TeV Z'). More signal points are provided for you to practice further as homework.

## Step I: Import libraries and load the data

Everything is the same as the last tutorial except we have to also load the background data. 

In [None]:
import numpy as np
import pandas as pd
from math import *
from tqdm import tqdm
import uproot as r
import awkward as ak
import vector

# Use the following syntax to read a delphes root file and parse the branches to arrays
# You might want to open the test file using ROOT to see the structure. 
# What is the tree name? How are all the branches defined?

signal = r.open("./data/delphes_zprime_1TeV.root")
delphes_tree_signal = signal["Delphes;1"]
background = r.open("./data/delphes_multijet.root")
delphes_tree_background = background["Delphes;1"]

pt_signal = delphes_tree_signal["Jet.PT"].array()
eta_signal = delphes_tree_signal["Jet.Eta"].array()
phi_signal = delphes_tree_signal["Jet.Phi"].array()
m_signal = delphes_tree_signal["Jet.Mass"].array()

pt_background = delphes_tree_background["Jet.PT"].array()
eta_background = delphes_tree_background["Jet.Eta"].array()
phi_background = delphes_tree_background["Jet.Phi"].array()
m_background = delphes_tree_background["Jet.Mass"].array()

# Here we use the vector (https://github.com/scikit-hep/vector) library
# It allows us to use its implemented Lorentz vector functions.

vector.register_awkward()

jet_vec_signal = ak.zip({
  "pt": pt_signal,
  "phi": phi_signal,
  "eta": eta_signal,
  "mass": m_signal,
},with_name="Momentum4D")

jet_vec_background = ak.zip({
  "pt": pt_background,
  "phi": phi_background,
  "eta": eta_background,
  "mass": m_background,
},with_name="Momentum4D")   

jet_vec_select_signal = jet_vec_signal[ak.num(jet_vec_signal) >= 2] 
jet_vec_select_background = jet_vec_background[ak.num(jet_vec_background) >= 2] 

lead_jet_signal = jet_vec_select_signal[:,0] # https://stackoverflow.com/questions/16815928/what-does-mean-on-numpy-arrays
sublead_jet_signal = jet_vec_select_signal[:,1]

dijet_signal = lead_jet_signal + sublead_jet_signal
dijet_mass_signal = dijet_signal.mass

lead_jet_background = jet_vec_select_background[:,0] 
sublead_jet_background = jet_vec_select_background[:,1]

dijet_background = lead_jet_background + sublead_jet_background
dijet_mass_background = dijet_background.mass
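
The dijet mass above comes from adding two `Momentum4D` vectors and reading `.mass`. As a cross-check of what that operation does, here is a minimal sketch in plain Python that converts (pt, eta, phi, mass) to Cartesian components and computes the invariant mass by hand. The helper names `four_momentum` and `invariant_mass` and the toy jets are illustrative assumptions, not part of the tutorial code or of the `vector` library.

```python
import math

def four_momentum(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to Cartesian (E, px, py, pz) in natural units."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz

def invariant_mass(jet1, jet2):
    """Invariant mass of the sum of two jets given as (pt, eta, phi, mass)."""
    e1, px1, py1, pz1 = four_momentum(*jet1)
    e2, px2, py2, pz2 = four_momentum(*jet2)
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    m2 = e**2 - px**2 - py**2 - pz**2
    # Guard against tiny negative values from floating-point rounding
    return math.sqrt(max(m2, 0.0))

# Hypothetical toy event: two massless, back-to-back jets with pt = 500 GeV at eta = 0
toy_mass = invariant_mass((500.0, 0.0, 0.0, 0.0), (500.0, 0.0, math.pi, 0.0))
print(toy_mass)  # close to 1000 GeV, up to floating-point rounding
```

For a resonance decaying to two jets, this is the quantity expected to peak near the resonance mass, which is why it is a natural first variable to compare between signal and background.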

## Step II: Visualizing the data

As usual, let's check the data. Please notice that we now have both the signal and background processes, so we should compare their properties.

**Question:** Which quantities have the best discriminating power?

In [None]:
import matplotlib.pyplot as plt

plt.hist(pt_signal[ak.num(pt_signal) >= 1][:,0],bins=20,range=(0,1000),density=True,
         label='Signal',fill=False,edgecolor='red',linewidth=3,histtype='step');
plt.hist(pt_background[ak.num(pt_background) >= 1][:,0],bins=20,range=(0,1000),density=True,
         label='Background',fill=False,edgecolor='blue',linewidth=3,histtype='step');
plt.xlabel(r'Leading Jet $p_T$ (GeV/c)',fontsize=14)
plt.legend(prop={'size': 10})
plt.show()

plt.hist(dijet_mass_signal,bins=20,range=(0,2000),density=True,
         label='Signal',fill=False,edgecolor='red',linewidth=3,histtype='step');
plt.hist(dijet_mass_background,bins=20,range=(0,2000),density=True,
         label='Background',fill=False,edgecolor='blue',linewidth=3,histtype='step');
plt.xlabel(r'Dijet mass (GeV/$c^2$)',fontsize=14)
plt.legend(prop={'size': 10})
plt.show()

**Homework**: 

**1**: Compare other quantities between signal and background, and identify possible candidates to select (veto) signal (background).

**2**: What variables can be constructed for the di-jet system besides the invariant mass?

## Step III: Save datasets for ML

We can either save various separate files and then combine them when loading the datasets, or combine the datasets before saving them.

**Question**: Can you try the latter approach, combining the datasets first and then saving them as a single CSV?

In [None]:
lead_jet_pt_signal = ak.to_numpy(lead_jet_signal.pt).reshape(1,-1)
lead_jet_eta_signal = ak.to_numpy(lead_jet_signal.eta).reshape(1,-1)
lead_jet_phi_signal = ak.to_numpy(lead_jet_signal.phi).reshape(1,-1)
lead_jet_e_signal = ak.to_numpy(lead_jet_signal.e).reshape(1,-1)
sublead_jet_pt_signal = ak.to_numpy(sublead_jet_signal.pt).reshape(1,-1)
sublead_jet_eta_signal = ak.to_numpy(sublead_jet_signal.eta).reshape(1,-1)
sublead_jet_phi_signal = ak.to_numpy(sublead_jet_signal.phi).reshape(1,-1)
sublead_jet_e_signal = ak.to_numpy(sublead_jet_signal.e).reshape(1,-1)
dijet_mass_signal = ak.to_numpy(dijet_mass_signal).reshape(1,-1)
label_signal = np.ones(sublead_jet_e_signal.shape, dtype = int)

dataset_signal = np.vstack((lead_jet_pt_signal, lead_jet_eta_signal, lead_jet_phi_signal, 
                          lead_jet_e_signal, sublead_jet_pt_signal, sublead_jet_eta_signal, 
                          sublead_jet_phi_signal, sublead_jet_e_signal, dijet_mass_signal, label_signal))

dataset_df_signal = pd.DataFrame(np.transpose(dataset_signal), columns=['jet1_pt', 'jet1_eta', 'jet1_phi', 'jet1_e', 
                                                          'jet2_pt', 'jet2_eta', 'jet2_phi', 'jet2_e', 'mass', 'label']) 

dataset_df_signal.to_csv("dataset_signal.csv")

# Now let's deal with the background

lead_jet_pt_background = ak.to_numpy(lead_jet_background.pt).reshape(1,-1)
lead_jet_eta_background = ak.to_numpy(lead_jet_background.eta).reshape(1,-1)
lead_jet_phi_background = ak.to_numpy(lead_jet_background.phi).reshape(1,-1)
lead_jet_e_background = ak.to_numpy(lead_jet_background.e).reshape(1,-1)
sublead_jet_pt_background = ak.to_numpy(sublead_jet_background.pt).reshape(1,-1)
sublead_jet_eta_background = ak.to_numpy(sublead_jet_background.eta).reshape(1,-1)
sublead_jet_phi_background = ak.to_numpy(sublead_jet_background.phi).reshape(1,-1)
sublead_jet_e_background = ak.to_numpy(sublead_jet_background.e).reshape(1,-1)
dijet_mass_background = ak.to_numpy(dijet_mass_background).reshape(1,-1)

# Attention!!!
label_background = np.zeros(sublead_jet_e_background.shape, dtype = int)

dataset_background = np.vstack((lead_jet_pt_background, lead_jet_eta_background, lead_jet_phi_background, 
                          lead_jet_e_background, sublead_jet_pt_background, sublead_jet_eta_background, 
                          sublead_jet_phi_background, sublead_jet_e_background, dijet_mass_background, label_background))

dataset_df_background = pd.DataFrame(np.transpose(dataset_background), columns=['jet1_pt', 'jet1_eta', 'jet1_phi', 'jet1_e', 
                                                          'jet2_pt', 'jet2_eta', 'jet2_phi', 'jet2_e', 'mass', 'label']) 

dataset_df_background.to_csv("dataset_background.csv")

## Step IV: Train the models

We will again try three different approaches: a linear model, a BDT and a neural network.

### Linear model (logistic regression)

Compared to a linear regression, a logistic regression differs only at the very last step. You do not need to change much when setting up the model.
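
To make the "very last step" concrete, here is a minimal sketch in plain Python: the linear combination of the features is the same as in a linear regression, and the logistic (sigmoid) function then maps it to a probability between 0 and 1. The weights, bias and feature values are hypothetical numbers chosen for illustration, not fitted values from this dataset.

```python
import math

def linear_score(features, weights, bias):
    """Same linear combination as in a linear regression."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights and one standardized event with two features
weights = [0.8, -0.5]
bias = 0.1
event = [1.2, 0.4]

score = linear_score(event, weights, bias)   # linear-regression-like output
probability = sigmoid(score)                 # the extra logistic step
predicted_label = int(probability >= 0.5)    # 1 = signal, 0 = background

print(score, probability, predicted_label)
```

A threshold of 0.5 on the probability corresponds to a threshold of 0 on the linear score, so the decision boundary of a logistic regression is still linear in the features.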

In [None]:
from sklearn import datasets, linear_model

df_signal = pd.read_csv('dataset_signal.csv')
df_background = pd.read_csv('dataset_background.csv')

df_combine = pd.concat([df_signal, df_background])
df_combine.drop(columns="Unnamed: 0", inplace=True)

y = df_combine['label'].copy()
X = df_combine.drop(['label'], axis=1).to_numpy()

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20,
                                                    random_state=42) # https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Call the logistic regression model. It is a class with various useful functions.
log_regr = linear_model.LogisticRegression()

log_regr.fit(X_train, y_train)

#Test the model using the test dataset.
y_pred = log_regr.predict(X_test)

### Performance metric

For classification, we care about four quantities: TN, TP, FP, FN.

**Question:** What are they?
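
As a hint, the four quantities can be counted by hand on a toy example. The sketch below uses plain Python and hypothetical labels (1 = signal, 0 = background); the helper name `confusion_counts` is illustrative. For binary labels, `sklearn.metrics.confusion_matrix` arranges the same counts as `[[TN, FP], [FN, TP]]`, with rows for the true label and columns for the predicted label.

```python
def confusion_counts(y_true, y_pred):
    """Count true negatives, false positives, false negatives, true positives."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1   # signal correctly kept
        elif truth == 0 and pred == 1:
            fp += 1   # background faking signal
        elif truth == 1 and pred == 0:
            fn += 1   # signal lost
        else:
            tn += 1   # background correctly rejected
    return tn, fp, fn, tp

# Hypothetical toy labels and predictions
toy_truth = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tn, fp, fn, tp = confusion_counts(toy_truth, toy_pred)
signal_efficiency = tp / (tp + fn)      # fraction of signal that is kept
background_fake_rate = fp / (fp + tn)   # fraction of background that is kept
print(tn, fp, fn, tp, signal_efficiency, background_fake_rate)
```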

In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.show()

### Boosted Decision Tree

The same as above, we just need to swap the function to call. 

In [None]:
# pandas can calculate the correlation matrix for us and the seaborn library can make 
# very nice heatmaps (https://seaborn.pydata.org/).

X_and_y = df_combine
X_and_y.corr()

import seaborn as sns

sns.heatmap(X_and_y.corr(), cmap = "YlGnBu", annot=True)
X_and_y.corr()['label'].sort_values(ascending=False)

In [None]:
import xgboost as xgb

xgb_cla = xgb.XGBClassifier(random_state=123)

xgb_cla.get_params()
xgb_cla.set_params(n_estimators=10)

xgb_cla.fit(X_train, y_train)

y_pred_bdt = xgb_cla.predict(X_test)

xgb_cla.save_model("resonance_classifier.json")

In [None]:
cm = confusion_matrix(y_test, y_pred_bdt)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.show()

#### Tree visualization

As mentioned before, one great advantage of boosted trees is their interpretability. Now let's see how to use the plotting functions.

In [None]:
import matplotlib
matplotlib.rcParams['figure.figsize'] = (10.0, 8)

xgb.plot_importance(xgb_cla)

In [None]:
# Plot gain instead of weight
xgb.plot_importance(xgb_cla, importance_type="gain")

**Question**: are they as expected?

In [None]:
matplotlib.rcParams['figure.figsize'] = (20.0, 8)

# Plot the first tree
xgb.plot_tree(xgb_cla, num_trees=0)

out_format = 'svg'  # named out_format to avoid shadowing the built-in format()

image = xgb.to_graphviz(xgb_cla)

# Set a different dpi (works only if out_format == 'png')
image.graph_attr = {'dpi':'400'}

image.render('bdt_classifier', format=out_format)

In [None]:
xgb.plot_tree(xgb_cla, num_trees=9, rankdir='LR')

### Deep Neural Network

It is fairly easy as well to build a classifier using NN. We have to modify a few more steps here though. 

**Question**: What needs to be modified?

In [None]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
import torch.optim as optim

df_signal = pd.read_csv('dataset_signal.csv')
df_background = pd.read_csv('dataset_background.csv')

df_combine = pd.concat([df_signal, df_background])
df_combine.drop(columns="Unnamed: 0", inplace=True)

y = df_combine['label'].copy()
X = df_combine.drop(['label'], axis=1).to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42, shuffle = True)
X_train = X_train[:, [i for i in range(0,8)]]

# Save a test dataset which keeps the mass variable to get the mass easily afterwards
X_test_keep_mass = X_test
X_train_raw = X_train
X_test_raw = X_test[:, [i for i in range(0,8)]]
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train_raw)
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

#### Loss function for classification

The loss function applied in regressions is easy to understand, as it represents the "distance" between the predicted and true values. For classification, the loss function relies on information theory.
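
The loss used below is the binary cross entropy. For a true label $y \in \{0, 1\}$ and a predicted signal probability $p$, the per-event loss is $-[y \ln p + (1 - y) \ln(1 - p)]$, averaged over the batch. Here is a minimal sketch in plain Python with hypothetical labels and predictions; the helper name and the clipping constant `eps` are assumptions made for this illustration, not the internals of `nn.BCELoss`.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross entropy between labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so that log(0) never occurs
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical labels with two sets of predictions
labels = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]
mostly_wrong = [0.1, 0.9, 0.2, 0.8]

loss_good = binary_cross_entropy(labels, mostly_right)
loss_bad = binary_cross_entropy(labels, mostly_wrong)
print(loss_good, loss_bad)
```

Confident wrong predictions are penalized much more strongly than confident correct ones are rewarded, which is the behaviour we want from a classification loss.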

In [None]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

class NeuralNetworkClassification(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            # The number of input nodes matches the number of total features
            nn.Linear(8, 16),
            nn.ReLU(),
            # The fully connected layers have to have compatible number of nodes.
            nn.Linear(16, 32),
            nn.ReLU(),
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
            nn.Sigmoid()
        )
    def forward(self, x):
        return self.linear_relu_stack(x)
        
# loss function and optimizer
model = NeuralNetworkClassification().to(device)
loss_fn = nn.BCELoss()  # binary cross entropy
optimizer = optim.Adam(model.parameters(), lr=0.0001)

**Question**: What has been changed compared to the regression model?

In [None]:
X_train = torch.tensor(X_train, dtype=torch.float32).to(device)
y_train = torch.tensor(y_train.values, dtype=torch.float32).reshape(-1, 1).to(device)
X_test = torch.tensor(X_test, dtype=torch.float32).to(device)
X_test_raw = torch.tensor(X_test_raw, dtype=torch.float32).to(device)
y_test = torch.tensor(y_test.values, dtype=torch.float32).reshape(-1, 1).to(device)

In [None]:
import tqdm
import copy

# training parameters
n_epochs = 100   # number of epochs to run
batch_size = 10  # size of each batch
batch_start = torch.arange(0, len(X_train), batch_size) #find the location of each sample for each batch
 
# Hold the best model
best_acc = 0   # start from zero, so any accuracy reached on the test set is an improvement
best_weights = None
history = []

# training loop
for epoch in range(n_epochs):
    print("Epoch " + str(epoch))
    # Set the model to training mode. This is needed because some layers behave differently in training
    # and testing. In our examples both the ReLU and Linear layers behave the same, but it is good practice to set the
    # model to training mode before training starts.
    model.train()
    # Here we are using tqdm (https://tqdm.github.io/) to monitor the process, but the effects can only be seen in a terminal.
    with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
        bar.set_description(f"Epoch {epoch}")
        for start in bar:
            # take a batch
            X_batch = X_train[start:start+batch_size].to(device)
            y_batch = y_train[start:start+batch_size].to(device)
            # forward pass
            y_pred = model(X_batch)
            # calculate the loss
            loss = loss_fn(y_pred, y_batch)
            # backward pass
            # here we set zero_grad() why?
            optimizer.zero_grad()
            loss.backward()
            # update weights using one single step.
            optimizer.step()
            # Why does the following line work?
            acc = (y_pred.round() == y_batch).float().mean()
            bar.set_postfix(
              loss=float(loss),
              acc=float(acc)
            )
    model.eval()
    y_pred = model(X_test)
    acc = (y_pred.round() == y_test).float().mean()
    acc = float(acc)
    history.append(acc)
    print("Accuracy: " + str(acc))
    if acc > best_acc:
        best_acc = acc
        best_weights = copy.deepcopy(model.state_dict())

# restore the weights of the best-performing epoch
model.load_state_dict(best_weights)

In [None]:
y_pred_nn = model(X_test)

In [None]:
# .cpu() does nothing on a CPU and is required when the tensors live on a GPU
cm = confusion_matrix(y_test.cpu().numpy(), y_pred_nn.round().detach().cpu().numpy())
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
print(cm)
print("Background fake rate: " + str(cm[0,1]/(cm[0,1] + cm[0,0])))
print("Signal accuracy: " + str(cm[1,1]/(cm[1,1] + cm[1,0])))
disp.plot()
plt.show()

#### ROC curves

When it comes to classification, you will see the so-called ROC curves very often. 

**Question:** Why do we need ROC curves? What is changing along the curve?
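
As a hint, each point of a ROC curve can be obtained by hand: pick a threshold on the classifier score, call every event above it "signal", and compute the false and true positive rates. The sketch below does this in plain Python for hypothetical toy scores; the helper name `roc_points` is illustrative and is not the algorithm used inside `metrics.roc_curve`.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    n_signal = sum(1 for y in y_true if y == 1)
    n_background = len(y_true) - n_signal
    points = []
    for thr in thresholds:
        # Events with a score at or above the threshold are classified as signal
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= thr)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= thr)
        points.append((fp / n_background, tp / n_signal))
    return points

# Hypothetical toy sample: four background events followed by four signal events
toy_truth = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.3, 0.4, 0.7, 0.35, 0.6, 0.8, 0.9]

toy_points = roc_points(toy_truth, toy_scores, thresholds=[0.2, 0.5, 0.75])
for fpr_value, tpr_value in toy_points:
    print(fpr_value, tpr_value)
```

The confusion matrices shown earlier correspond to a single threshold of 0.5, that is, a single point on this curve.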

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import metrics

fpr, tpr, thresholds = metrics.roc_curve(y_test.cpu().numpy(), y_pred_nn.detach().cpu().numpy())

roc_auc = metrics.auc(fpr, tpr)

display = metrics.RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc, estimator_name='NN Classification')

display.plot()

plt.show()

The classifier looks awesome... but

Does it introduce biases?

**Question:** What background events have been identified as signal?

In [None]:
pred_mask = y_pred_nn.round().detach().cpu().numpy().reshape(-1)
true_mask = y_test.detach().cpu().numpy().reshape(-1)

x_test_signal_mass = X_test_keep_mass[:,8][(true_mask == 1) & (pred_mask == 1)]
x_test_background_mass = X_test_keep_mass[:,8][(true_mask == 0) & (pred_mask == 1)]

plt.hist(x_test_signal_mass,bins=20,range=(0,2000),density=True,
         label='Signal',fill=False,edgecolor='red',linewidth=3,histtype='step');
plt.hist(x_test_background_mass,bins=20,range=(0,2000),density=True,
         label='Background',fill=False,edgecolor='blue',linewidth=3,histtype='step');

plt.xlabel(r'Mass (GeV/$c^2$)',fontsize=14)
plt.legend(prop={'size': 10})
plt.show()

#### Attention!!!

Always be aware of the biases!

ML can learn whatever it wants, and it is possible that it learns in a way we do not appreciate.

**Question**: What can those biases introduce in actual data analyses?


#### Ablation

To mitigate the biases, there are many approaches to try. Let's see what happens if we remove the jet pT and energy variables in the training. 

In [None]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
import torch.optim as optim

df_signal = pd.read_csv('dataset_signal.csv')
df_background = pd.read_csv('dataset_background.csv')

df_combine = pd.concat([df_signal, df_background])
df_combine.drop(columns="Unnamed: 0", inplace=True)

y = df_combine['label'].copy()
X = df_combine.drop(['label','jet1_pt', 'jet2_pt', 'jet1_e', 'jet2_e'], axis=1).to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42, shuffle = True)
X_train = X_train[:, [i for i in range(0,4)]]

X_test_keep_mass = X_test
X_train_raw = X_train
X_test_raw = X_test[:, [i for i in range(0,4)]]
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train_raw)
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

In [None]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

class NeuralNetworkClassification(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            # The number of input nodes matches the number of total features
            nn.Linear(4, 8),
            nn.ReLU(),
            nn.Linear(8, 4),
            nn.ReLU(),
            nn.Linear(4, 1),
            nn.Sigmoid()
        )
    def forward(self, x):
        return self.linear_relu_stack(x)
        
# loss function and optimizer
model = NeuralNetworkClassification().to(device)
loss_fn = nn.BCELoss()  # binary cross entropy
optimizer = optim.Adam(model.parameters(), lr=0.0001)

In [None]:
X_train = torch.tensor(X_train, dtype=torch.float32).to(device)
y_train = torch.tensor(y_train.values, dtype=torch.float32).reshape(-1, 1).to(device)
X_test = torch.tensor(X_test, dtype=torch.float32).to(device)
X_test_raw = torch.tensor(X_test_raw, dtype=torch.float32).to(device)
y_test = torch.tensor(y_test.values, dtype=torch.float32).reshape(-1, 1).to(device)

In [None]:
import tqdm
import copy

# training parameters
n_epochs = 100   # number of epochs to run
batch_size = 10  # size of each batch
batch_start = torch.arange(0, len(X_train), batch_size) #find the location of each sample for each batch
 
# Hold the best model
best_acc = 0   # start from zero, so any accuracy reached on the test set is an improvement
best_weights = None
history = []

# training loop
for epoch in range(n_epochs):
    print("Epoch " + str(epoch))
    # Set the model to training mode. This is needed because some layers behave differently in training
    # and testing. In our examples both the ReLU and Linear layers behave the same, but it is good practice to set the
    # model to training mode before training starts.
    model.train()
    # Here we are using tqdm (https://tqdm.github.io/) to monitor the process, but the effects can only be seen in a terminal.
    with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
        bar.set_description(f"Epoch {epoch}")
        for start in bar:
            # take a batch
            X_batch = X_train[start:start+batch_size].to(device)
            y_batch = y_train[start:start+batch_size].to(device)
            # forward pass
            y_pred = model(X_batch)
            # calculate the loss
            loss = loss_fn(y_pred, y_batch)
            # backward pass
            # here we set zero_grad() why?
            optimizer.zero_grad()
            loss.backward()
            # update weights using one single step.
            optimizer.step()
            # Why does the following line work?
            acc = (y_pred.round() == y_batch).float().mean()
            bar.set_postfix(
              loss=float(loss),
              acc=float(acc)
            )
    model.eval()
    y_pred = model(X_test)
    acc = (y_pred.round() == y_test).float().mean()
    acc = float(acc)
    history.append(acc)
    print("Accuracy: " + str(acc))
    if acc > best_acc:
        best_acc = acc
        best_weights = copy.deepcopy(model.state_dict())

# restore the weights of the best-performing epoch
model.load_state_dict(best_weights)

In [None]:
y_pred_nn = model(X_test)
cm = confusion_matrix(y_test.cpu().numpy(), y_pred_nn.round().detach().cpu().numpy())
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
print(cm)
print("Background fake rate: " + str(cm[0,1]/(cm[0,1] + cm[0,0])))
print("Signal accuracy: " + str(cm[1,1]/(cm[1,1] + cm[1,0])))
disp.plot()
plt.show()

In [None]:
pred_mask = y_pred_nn.round().detach().cpu().numpy().reshape(-1)
true_mask = y_test.detach().cpu().numpy().reshape(-1)

x_test_signal_mass = X_test_keep_mass[:,4][(true_mask == 1) & (pred_mask == 1)]
x_test_background_mass = X_test_keep_mass[:,4][(true_mask == 0) & (pred_mask == 1)]

plt.hist(x_test_signal_mass,bins=20,range=(0,2000),density=True,
         label='Signal',fill=False,edgecolor='red',linewidth=3,histtype='step');
plt.hist(x_test_background_mass,bins=20,range=(0,2000),density=True,
         label='Background',fill=False,edgecolor='blue',linewidth=3,histtype='step');

plt.xlabel(r'Mass (GeV/$c^2$)',fontsize=14)
plt.legend(prop={'size': 10})
plt.show()

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import metrics

fpr, tpr, thresholds = metrics.roc_curve(y_test.cpu().numpy(), y_pred_nn.detach().cpu().numpy())

roc_auc = metrics.auc(fpr, tpr)

display = metrics.RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc, estimator_name='NN Classification')

display.plot()

plt.show()

#### Impacts from training data

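Thinking about it further, we might only care about whether the algorithm can distinguish the signal from the background in a given mass region. Below, both the training and the evaluation are restricted to events with a dijet mass between 800 and 1100 GeV.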

In [None]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
import torch.optim as optim

df_signal = pd.read_csv('dataset_signal.csv')
df_background = pd.read_csv('dataset_background.csv')

df_combine = pd.concat([df_signal, df_background])
df_combine.drop(columns="Unnamed: 0", inplace=True)

y = df_combine['label'].copy()
X = df_combine.drop(['label','jet1_pt', 'jet2_pt', 'jet1_e', 'jet2_e'], axis=1).to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42, shuffle = True)

y_train = y_train[(X_train[:,4] > 800) & (X_train[:,4] < 1100)]
X_train = X_train[(X_train[:,4] > 800) & (X_train[:,4] < 1100)]
X_train_raw = X_train[:, [i for i in range(0,4)]]

X_test_keep_mass = X_test.copy()
X_test_all_mass = X_test[:, [i for i in range(0,4)]]
y_test_all_mass = y_test.copy()

y_test = y_test[(X_test[:,4] > 800) & (X_test[:,4] < 1100)]
X_test = X_test[(X_test[:,4] > 800) & (X_test[:,4] < 1100)]
X_test_raw = X_test[:, [i for i in range(0,4)]]

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train_raw)
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

class NeuralNetworkClassification(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            # The number of input nodes matches the number of total features
            nn.Linear(4, 8),
            nn.ReLU(),
            nn.Linear(8, 4),
            nn.ReLU(),
            nn.Linear(4, 1),
            nn.Sigmoid()
        )
    def forward(self, x):
        return self.linear_relu_stack(x)
        
# loss function and optimizer
model = NeuralNetworkClassification().to(device)
loss_fn = nn.BCELoss()  # binary cross entropy
optimizer = optim.Adam(model.parameters(), lr=0.0001)

X_train = torch.tensor(X_train, dtype=torch.float32).to(device)
y_train = torch.tensor(y_train.values, dtype=torch.float32).reshape(-1, 1).to(device)
X_test = torch.tensor(X_test, dtype=torch.float32).to(device)
X_test_raw = torch.tensor(X_test_raw, dtype=torch.float32).to(device)
y_test = torch.tensor(y_test.values, dtype=torch.float32).reshape(-1, 1).to(device)
X_test_all_mass = torch.tensor(X_test_all_mass, dtype=torch.float32).to(device)

import tqdm
import copy

# training parameters
n_epochs = 100   # number of epochs to run
batch_size = 10  # size of each batch
batch_start = torch.arange(0, len(X_train), batch_size) #find the location of each sample for each batch
 
# Hold the best model
best_acc = 0   # start from zero, so any accuracy reached on the test set is an improvement
best_weights = None
history = []

# training loop
for epoch in range(n_epochs):
    print("Epoch " + str(epoch))
    # Set the model to training mode. This is needed because some layers behave differently in training
    # and testing. In our examples both the ReLU and Linear layers behave the same, but it is good practice to set the
    # model to training mode before training starts.
    model.train()
    # Here we are using tqdm (https://tqdm.github.io/) to monitor the process, but the effects can only be seen in a terminal.
    with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
        bar.set_description(f"Epoch {epoch}")
        for start in bar:
            # take a batch
            X_batch = X_train[start:start+batch_size].to(device)
            y_batch = y_train[start:start+batch_size].to(device)
            # forward pass
            y_pred = model(X_batch)
            # calculate the loss
            loss = loss_fn(y_pred, y_batch)
            # backward pass
            # here we set zero_grad() why?
            optimizer.zero_grad()
            loss.backward()
            # update weights using one single step.
            optimizer.step()
            # Why does the following line work?
            acc = (y_pred.round() == y_batch).float().mean()
            bar.set_postfix(
              loss=float(loss),
              acc=float(acc)
            )
    model.eval()
    y_pred = model(X_test)
    acc = (y_pred.round() == y_test).float().mean()
    acc = float(acc)
    history.append(acc)
    print("Accuracy: " + str(acc))
    if acc > best_acc:
        best_acc = acc
        best_weights = copy.deepcopy(model.state_dict())

# restore the weights of the best-performing epoch
model.load_state_dict(best_weights)

y_pred_nn = model(X_test)
cm = confusion_matrix(y_test.cpu().numpy(), y_pred_nn.round().detach().cpu().numpy())
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
print(cm)
print("Background fake rate: " + str(cm[0,1]/(cm[0,1] + cm[0,0])))
print("Signal accuracy: " + str(cm[1,1]/(cm[1,1] + cm[1,0])))
disp.plot()
plt.show()

y_pred_all_mass_nn = model(X_test_all_mass)
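pred_mask = y_pred_all_mass_nn.round().detach().cpu().numpy().reshape(-1)
# y_test_all_mass is still a pandas Series; convert it so both masks are plain NumPy arrays
true_mask = y_test_all_mass.to_numpy()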

x_test_signal_mass = X_test_keep_mass[:,4][(true_mask == 1) & (pred_mask == 1)]
x_test_background_mass = X_test_keep_mass[:,4][(true_mask == 0) & (pred_mask == 1)]

plt.hist(x_test_signal_mass,bins=20,range=(0,2000),density=True,
         label='Signal',fill=False,edgecolor='red',linewidth=3,histtype='step');
plt.hist(x_test_background_mass,bins=20,range=(0,2000),density=True,
         label='Background',fill=False,edgecolor='blue',linewidth=3,histtype='step');

plt.xlabel(r'Mass (GeV/$c^2$)',fontsize=14)
plt.legend(prop={'size': 10})
plt.show()

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import metrics

fpr, tpr, thresholds = metrics.roc_curve(y_test.cpu().numpy(), y_pred_nn.detach().cpu().numpy())

roc_auc = metrics.auc(fpr, tpr)

display = metrics.RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc, estimator_name='NN Classification')

display.plot()

plt.show()

## Take home message and exercises

Using these two examples, we have introduced two major tasks in HEP where ML can help: regression and classification. Both examples are supervised approaches. We did not introduce unsupervised approaches due to time limits, but these two examples should have already taught you the basic SW skills to go further.

As mentioned several times during this course, ML in HEP is promising, but we should not abandon our physics instincts and knowledge. In particular, for experimentalists, ensuring good control over the performance in data is crucial. Not all problems in HEP are worth solving using ML at the moment, as in certain areas classic methods still retain their power. However, we should all be open-minded and dare to change.

Here is a small project for you to practice more. You will need to acquire some SW knowledge that has not been introduced in this course, but you should know how to find it.

### Comprehensive exercise

**Goal**: design a classifier that:

**1**: Works for the various mass points provided

**2**: Does not sculpt the background invariant mass distribution significantly. 
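
One possible way to check the second goal is to measure, in bins of invariant mass, the fraction of background events that pass the classifier cut: a roughly flat fraction suggests little sculpting, while a peak near the signal mass suggests the classifier has learned the mass. The sketch below is a minimal illustration in plain Python under these assumptions; the helper name `pass_fraction_per_bin`, the bin edges and the toy masses and scores are all hypothetical.

```python
def pass_fraction_per_bin(masses, scores, bin_edges, threshold=0.5):
    """Fraction of events with score >= threshold in each mass bin (None if the bin is empty)."""
    fractions = []
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [s for m, s in zip(masses, scores) if low <= m < high]
        if not in_bin:
            fractions.append(None)
            continue
        passed = sum(1 for s in in_bin if s >= threshold)
        fractions.append(passed / len(in_bin))
    return fractions

# Hypothetical background events: dijet masses in GeV and classifier scores
toy_masses = [300, 450, 600, 750, 900, 950, 1000, 1050, 1300, 1500]
toy_scores = [0.1, 0.2, 0.1, 0.3, 0.8, 0.9, 0.7, 0.6, 0.2, 0.1]

toy_fractions = pass_fraction_per_bin(toy_masses, toy_scores, [0, 800, 1100, 2000])
print(toy_fractions)
```

In this toy case the background passes only in the 800 to 1100 GeV bin, which is the kind of strongly sculpted behaviour the exercise asks you to avoid.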