<hr style="height: 1px;">
<i>This notebook was authored by the 8.S50x Course Team, Copyright 2022 MIT All Rights Reserved.</i>
<hr style="height: 1px;">
<br>

<h1>Project 2 - Part II: Measuring Properties of the W Boson Using Deep Learning</h1>


<a name='section_2_0'></a>
<hr style="height: 1px;">


## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">PROJ2.P2.0 Overview and Expectations</h2>

| [0 - Overview and Expectations](#section_2_0) | [1 - Loading Data and Functions](#section_2_1) | [2 - Extracting the W with a Neural Network](#section_2_2) |

<h3>Overview</h3>

In the first part of this Project, we used just one variable and decorrelated to extract the W boson. However this sample has many variables that allow us to further enhance our W peak by isolating features not like the QCD background. To incorporate these variables, we can use deep learning and output a single discriminator to isolate W. However, we are also going to have to decorrelate our deep learning against the mass, otherwise our neural network will try to sculpt the background to look peaky like the W boson. In Part II of the Project, we will go about a simple approach to add more variables and decorrelate our neural network.

The outline of this notebook is as follows:



- **PROJ2.P2.0:** We outline the expectations for completing this part of the project and describe the grading scheme.


- **PROJ2.P2.1:** We load the data and relevant functions that were previously defined in Part I.



- **PROJ2.P2.2:** We provide guidance for using deep learning to isolate the W signal and make a best-fit plot. We go through the work in detail, but only use a minimal number of features to train the model initially. Thus, we show a minimal working example that you should expand upon. **This is the part of the project that you are expected to submit, once you optimize your approach.**

<h3>Expectations and Grading</h3>

For this open-ended task, you will be expected to develop some procedure, analyze your results, and present your findings. Specifically, you will do the following:
       
1. Submit a pdf of your work on MITx, to be graded by your peers (based on the criteria outlined on MITx).
2. Grade the work of others based on the same criteria.

For full credit on this peer-reviewed checkpoint, we specifically expect you to complete these three tasks (and support your work with thorough explanation):

- Task 1: Develop and apply the neural network.
- Task 2: Explain your approach.
- Task 3: Describe your results and characterize the significance.


<a name='section_2_1'></a>
<hr style="height: 1px;">


## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">PROJ2.P2.1 Loading Data and Functions</h2>

| [0 - Overview and Expectations](#section_2_0) | [1 - Loading Data and Functions](#section_2_1) | [2 - Extracting the W with a Neural Network](#section_2_2) |

<h3>Data</h3>

>description: Boosted Single Jet dataset at 8TeV<br>
>source: https://zenodo.org/record/8035318 <br>
>attribution: Philip Harris (CMS Collaboration), DOI:10.5281/zenodo.8035318 

In [None]:
#>>>RUN: PROJ2.P2.1-runcell00

# NOTE: these files are too large to include in the original repository,
# so you must download them using the options below
#
# Ways to download:
#     1. Copy/paste the link (replace =0 with =1 to download automatically)
#     2. Use the wget commands below (works in Colab, but you may need to install wget if using locally)
#
# Location of files:
#     Move the files to the directory 'data'
#
# Using wget: (works in Colab)
#     Upon downloading, the code below will move them to the appropriate directory

#3GB Data Set: data1
!wget https://www.dropbox.com/s/bcyab2lljie72aj/data.tgz?dl=0
!mv data.tgz?dl=0 data.tgz #rename
!tar -xvf data.tgz #extract the data
!rm data.tgz #clean the downloaded file

#130MB Data Set: data2
!wget https://www.dropbox.com/s/p756oa4mfw17lfw/data.zip?dl=0
!mv data.zip?dl=0 data.zip #rename
!unzip data.zip #extract the data
!rm data.zip #clean the downloaded file

<h3>Importing Libraries</h3>

Before beginning, run the cell below to import the relevant libraries for this notebook.

In [None]:
#>>>RUN: PROJ2.P2.1-runcell01

# pre-requisites: install now if you have not already done so
# uproot High energy physics python file format: https://masonproffitt.github.io/uproot-tutorial/aio.html
!pip install uproot
!pip install lmfit
!pip install mplhep

In [None]:
#>>>RUN: PROJ2.P2.1-runcell02

import uproot
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import os,sys

#!pip install lmfit #install lmfit if you have not done this already
import lmfit as lm

#!pip install mplhep #install mplhep if you have not done this already
# plotting style for High Energy physics 
import mplhep as hep
plt.style.use(hep.style.CMS)

<h3>Setting Default Figure Parameters</h3>

The following code cell sets default values for figure parameters.

In [None]:
#>>>RUN: PROJ2.P2.1-runcell03

#set plot resolution
%config InlineBackend.figure_format = 'retina'

#set default figure parameters
plt.rcParams['figure.figsize'] = (6,6)

medium_size = 12
large_size = 15

plt.rc('font', size=medium_size)          # default text sizes
plt.rc('xtick', labelsize=medium_size)    # xtick labels
plt.rc('ytick', labelsize=medium_size)    # ytick labels
plt.rc('legend', fontsize=medium_size)    # legend
plt.rc('axes', titlesize=large_size)      # axes title
plt.rc('axes', labelsize=large_size)      # x and y labels
plt.rc('figure', titlesize=large_size)    # figure title


<h3>Loading Data and Defining Functions</h3>

As in the first part of the project, we load the data and define necessary functions, below.

In [None]:
#>>>RUN: PROJ2.P2.1-runcell04

#Load the data, if you have not done so in Section 1

wqq    = uproot.open("data/WQQ_s.root")["Tree"]
zqq    = uproot.open("data/ZQQ_s.root")["Tree"]
wqq13  = uproot.open("data/skimh/WQQ_sh.root")["Tree"]
zqq13  = uproot.open("data/skimh/ZQQ_sh.root")["Tree"]
wqq_n  = uproot.open("data/WQQ_8TeV_Jan11_r.root")["Tree"]
zqq_n  = uproot.open("data/ZQQ_8TeV_Jan11_r.root")["Tree"]
qcd    = uproot.open("data/QCD_s.root")["Tree"]
tt     = uproot.open("data/TT.root")["Tree"]
ww     = uproot.open("data/WW.root")["Tree"]
wz     = uproot.open("data/WZ.root")["Tree"]
zz     = uproot.open("data/ZZ.root")["Tree"]
ggh    = uproot.open("data/ggH.root")["Tree"]
data   = uproot.open("data/JetHT_s.root")["Tree"]

In [None]:
#>>>RUN: PROJ2.P2.1-runcell05

def selection(iData):
    '''
    Standard pre-selection
    '''
    #lets apply a trigger selection
    trigger = (iData.arrays('trigger', library="np")["trigger"].flatten() > 0)

    #Now lets require the jet pt to be above a threshold (400 TODO: ASK about units)
    jetpt   = (iData.arrays('vjet0_pt', library="np")["vjet0_pt"].flatten() > 400)

    #Lets apply both jetpt and trigger at the same time
    #standard_trig = (iData.arrays('trigger', library="np")["trigger"].flatten() % 4 > 1) #lets require one of our standard triggers (jet pT > 370 )
    allcuts = np.logical_and.reduce([trigger,jetpt])

    return allcuts
    
def get_weights(iData,weights,sel):
    
    weight = weights[0]
    
    for i in range(1,len(weights)):
        weight *= iData.arrays(weights[i],library="np")[weights[i]][sel]
        
    return weight

def integral(iData,iWeights):
    '''
    This computs the integral of weighted events 
    assuming a selection given by the function selection (see below)
    '''
    
    #perform a selection on the data (
    mask_sel=selection(iData)
    
    #now iterate over the weights not the weights are in the format of [number,variable name 1, variable name 2,...]
    weight  =iWeights[0]
    
    for i0 in range(1,len(iWeights)):
        weightarr = iData.arrays(iWeights[i0], library="np")[iWeights[i0]][mask_sel].flatten()
        
        #multiply the weights
        weight    = weight*weightarr
    
    #now take the integral and return it
    return np.sum(weight)


def scale(iData8TeV,iData13TeV,iWeights):
    '''
    This computes the integral of two selections for two datasets labelled 8TeV and 13TeV,
    but really can be 1 and 2. Then it returns the ratio of the integrals
    '''
    
    int_8TeV  = integral(iData8TeV,iWeights)
    int_13TeV = integral(iData13TeV,iWeights)
    
    return int_8TeV/int_13TeV

In [None]:
#>>>RUN: PROJ2.P2.1-runcell06

def plotDataSim(iVar, iSelection, iVarName, iRange, iYrange=-1):
    
    #Lets Look at the mass
    weights = [1000*18300, "puweight", "scale1fb"]
    mrange = iRange #range for mass histogram [GeV]
    bins=40            #bins for mass histogram
    density = False     #to plot the histograms as a density (integral=1)

    qcdsel      = iSelection(qcd)
    wsel        = iSelection(wqq13)
    zsel        = iSelection(zqq13)
    datasel     = iSelection(data)
    ttsel       = iSelection(tt)
    wwsel       = iSelection(ww)
    wzsel       = iSelection(wz)
    zzsel       = iSelection(zz)
    gghsel      = iSelection(ggh)

    wscale=scale(wqq,wqq13,weights)
    zscale=scale(zqq,zqq13,weights)

    # Getting the masses of selected events
    dataW = data.arrays(iVar, library="np") [iVar][datasel]
    qcdW  = qcd.arrays(iVar, library="np")  [iVar][qcdsel]
    wW    = wqq13.arrays(iVar, library="np")[iVar][wsel]
    zW    = zqq13.arrays(iVar, library="np")[iVar][zsel]
    zzW   = zz   .arrays(iVar, library="np")[iVar][zzsel]
    wzW   = wz   .arrays(iVar, library="np")[iVar][wzsel]
    wwW   = ww   .arrays(iVar, library="np")[iVar][wwsel]
    ttW   = tt   .arrays(iVar, library="np")[iVar][ttsel]
    gghW  = ggh  .arrays(iVar, library="np")[iVar][gghsel]

    #Define the weights for the histograms
    hist_weights = [get_weights(qcd,weights,qcdsel),
                    get_weights(wqq13,weights,wsel)*wscale,
                    get_weights(zqq13,weights,zsel)*zscale,
                    get_weights(zz,weights,zzsel),
                    get_weights(wz,weights,wzsel),
                    get_weights(ww,weights,wwsel),
                    get_weights(tt,weights,ttsel),
                   ]

    #Hint: Provide a list of selected data
    plt.hist([qcdW,wW, zW, zzW, wzW, wwW, ttW],
             color=["royalblue","r", "orange","g", "b", "purple", "cyan",], 
             label=["QCD", "W", "Z", "ZZ", "WZ", "WW", "tt",], 
             weights=hist_weights,
             range=mrange, bins=50, alpha=.6, density=density,stacked=True)

    #Other configurations for the histogram
    counts, bins = np.histogram(dataW, bins=bins, range=mrange, density=density)
    yerr = np.sqrt(counts)#/ np.sqrt(len(dataW)*np.diff(bins))
    binCenters = (bins[1:]+bins[:-1])*.5
    plt.errorbar(binCenters, counts, yerr=yerr,fmt="o",c="k",label="data")
    plt.legend()
    plt.xlabel(iVarName)
    plt.ylabel("Counts")
    if iYrange != -1:
        plt.ylim(0,iYrange)
    plt.show()

<a name='section_2_2'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">PROJ2.P2.2 Extracting the W with a Neural Network - Submit This Section</h2>   

| [0 - Overview and Expectations](#section_2_0) | [1 - Loading Data and Functions](#section_2_1) | [2 - Extracting the W with a Neural Network](#section_2_2) |

<h3>Task 1: Develop and apply the neural network.</h3>

**Steps 1-12 will guide you through a working approach. Run through the cells, then use more variables, change parameters, and make improvements in order to optimize your own approach.**

<h3>Objective</h3>

We will go about a simple approach to add more variables and decorrelate our neural network. This approach is not exhaustive and there are variety of more sophisticated approaches that exist beyond this.

As we have seen there are a variety of ways to design and use deep learning algorithms. In this project, what we would like to do is solve the same decorrelation problem by telling the deep learning algorithm to discriminate signal from background, while simulatneously ensuring that the correlation coefficient of the deep learning discriminator with the mass is small. This is exactly what we did in the main project (without deep learning). However, here we are going to guide the algorithm to do this.

**Decorrelation and Bias**

For this project, we will choose just one way to decorrelate. However, there is a rich literature for decorrelation in artificial Intelligence. This field is broadly referred to as "Ethical-AI" since we are telling the AI to remain unbiased against a feature. In scientific studies this can be a variable, which we wish to remain unbiased against. In many real world applications, we might want to force our AI to not cheat, or preferentially select one group of things over another.

Moreover, there are many more sophisticated ways to remove bias from measurements. This can be done through unbiased samples, new correlation calculations such as distance correlation, or even using adversarial networks to push the deep learning algorithm away from a correlated solution. This project really is just the tip of the iceberg.

**The Idea of this Part: Starting with a Walk-Through**

Below, we are going to do a step by step walk through of a deep learning algorithm that both separates the W from the QCD background and decorrelates. The idea here is to show you *an example* of how to do this. **Ultimately, we would like to you to change the variables, modify the neural network, and see how good a performance you can get.**

We can guarantee you that if you work on this project, you should be able to outperform the basic $\tau_{2}/\tau_{1}$ decorrelated mass measurement with a bigger peak and smaller uncertainty on the mass.


<h3>1. Select Features for Deep Learning</h3>

First of all, we need select a set of features that you would like to use for your deep learning. In the code below, we will perform the base selection and then list all the features. Our goal with this strategy is to show you what is available for deep learning, so that you have an idea of what we can use. We will print alll the variabls using the keys function.

In [None]:
#>>>RUN: PROJ2.P2.2-runcell01

datasel = selection(data)
wsel = selection(wqq13)
zsel = selection(zqq13)
qcdsel = selection(qcd)
wwsel = selection(ww)
zzsel = selection(zz)
ttsel = selection(tt)
wzsel = selection(wz)
gghsel = selection(ggh)

print("All data loaded. Available features:")
print(*data.keys(), sep=" | ")


Now what we are going to do is just select a few variables, in this case $\tau_{1}$, $\tau_{2}$, $\tau_{3}$, and jet btag probability (csv) to start with. We will also ignore a few variables that are correlated with the mass or exist only in the simulation (gen and flavor info). You can see some example code to add A LOT more variables if you like.

More importantly, you might ask why these variables. Well...Ideally we wanted to just have $\tau_{1}$, $\tau_{2}$ and show that you can train an NN to learn how to decorrelate and find the W signal. However, using just $\tau_{1}$, $\tau_{2}$ is actually a very hard machine learning project, so we somewhat randomly are suggesting we use these variables to start with. **Ultimately, your job is to explore the variables, so just view this as a starting point.**

Now our first task will be to make a neural network and see what it does to select the W boson.

**WARNING: the NN is not going to do what you want, and it will be very hard to extract the signal from it. Try adding more variables to `keys`**


In [None]:
#>>>RUN: PROJ2.P2.2-runcell02

# Get features
def keep_key(key):
    kws = [
        "gen", "mass", "msd0", "msd1", "flavor", "mprune", "mtrim", "trig","pt", "eta", "phi"
    ] 
    for kw in kws:
        if kw in key: return False
    return True

#change the keys to add more variables
keys = ["vjet0_t1","vjet0_t2","vjet0_t3","vjet0_csv"]

qcd_samples  = np.stack(list(qcd.arrays(keys, library="np").values()),axis=-1)[qcdsel]
w_samples    = np.stack(list(wqq13.arrays(keys, library="np").values()),axis=-1)[wsel]
data_samples = np.stack(list(data.arrays(keys, library="np").values()),axis=-1)[datasel]
z_samples    = np.stack(list(zqq13.arrays(keys, library="np").values()),axis=-1)[zsel]
zz_samples   = np.stack(list(zz.arrays(keys, library="np").values()),axis=-1)[zzsel]
wz_samples   = np.stack(list(wz.arrays(keys, library="np").values()),axis=-1)[wzsel]
ww_samples   = np.stack(list(ww.arrays(keys, library="np").values()),axis=-1)[wwsel]
tt_samples   = np.stack(list(tt.arrays(keys, library="np").values()), axis=-1)[ttsel]
ggh_samples  = np.stack(list(ggh.arrays(keys, library="np").values()),axis=-1)[gghsel]


print("Used features:")
print(*keys, sep=" | ")


<h3>2. Prepare the Variables for Training</h3>

Now we want to make our variables easy to process and ready to train. In the following, we define a preprocesing script to normalize (or regularize) the inputs so that the ranges are roughly the same and the means are roughly the same. This will make the neural network a little easier to train.

The regularization takes the min and max range of the variables and shifts the sample so that the min value is 0 and the max value is 1. This is common deep learning practice. Shifting the mean to 0 and dividing by the standard deviation is another common one.

Also we are going to make a training sample, which is a merger of W and QCD. Let's go ahead and set this up. 

In [None]:
#>>>RUN: PROJ2.P2.2-runcell03

combined_samples = np.concatenate([w_samples, qcd_samples],axis=0).astype("float32")
maxsamples = combined_samples.max(axis=0)
minsamples = combined_samples.min(axis=0)

def normalize(iSample,iMax,iMin):
    lSample = iSample - iMin / (iMax - iMin)
    return lSample.astype("float32")

combined_samples = normalize(combined_samples,minsamples,maxsamples)
qcd_samples      = normalize(qcd_samples,minsamples,maxsamples)
w_samples        = normalize(w_samples,minsamples,maxsamples)
z_samples        = normalize(z_samples,minsamples,maxsamples)
zz_samples       = normalize(zz_samples,minsamples,maxsamples)
ww_samples       = normalize(ww_samples,minsamples,maxsamples)
wz_samples       = normalize(wz_samples,minsamples,maxsamples)
tt_samples       = normalize(tt_samples,minsamples,maxsamples)
ggh_samples      = normalize(ggh_samples,minsamples,maxsamples)
data_samples     = normalize(data_samples,minsamples,maxsamples)

# plotting features
fig, axes = plt.subplots(4, 1, figsize=(14, 14))
kwargs = {"bins": 50, "density": True, "histtype": "step"}
for i, ax in enumerate(axes.flatten()):
    try:
        bkg_min = qcd_samples[:,i].min() if qcd_samples[:, i].min() > -10 else -1
        sig_min = w_samples[:, i].min() if w_samples[:, i].min() > -10 else -1
        ax.hist(w_samples[:, i],
                range=[sig_min, w_samples[:, i].max()],
                color="r",
                label='W',
                **kwargs)
        ax.hist(qcd_samples[:, i],
                range=[bkg_min, qcd_samples[:, i].max()],
                color="b",
                label='QCD',
                **kwargs)
    except IndexError:
        continue
    ax.set_title("-".join(keys[i].split("_")))
    ax.set_yticks([])
    if i == 0: ax.legend()
    fig.tight_layout()


<h3>3. Extract the Mass</h3>

Now what we are going to do is make a separate dataset that saves the mass. We want to keep this as a separate dataset so that we can manipulate just the mass, independent of the other parameters in our dataset. This just follows the selection we have above. Lets go ahead and extract the mass as a separate array with the same pre-selection.

In [None]:
#>>>RUN: PROJ2.P2.2-runcell04

qcd_mass  = qcd.arrays("vjet0_msd0", library="np")["vjet0_msd0"][qcdsel]
w_mass    = wqq13.arrays("vjet0_msd0", library="np")["vjet0_msd0"][wsel]
z_mass    = zqq13.arrays("vjet0_msd0", library="np")["vjet0_msd0"][zsel]
zz_mass   = zz.arrays("vjet0_msd0", library="np")["vjet0_msd0"][zzsel]
ww_mass   = ww.arrays("vjet0_msd0", library="np")["vjet0_msd0"][wwsel]
wz_mass   = wz.arrays("vjet0_msd0", library="np")["vjet0_msd0"][wzsel]
tt_mass   = tt.arrays("vjet0_msd0", library="np")["vjet0_msd0"][ttsel]
ggh_mass  = ggh.arrays("vjet0_msd0", library="np")["vjet0_msd0"][gghsel]
data_mass = data.arrays("vjet0_msd0", library="np")["vjet0_msd0"][datasel]

# Combined Sample mass
mass   = np.concatenate([w_mass, qcd_mass]).astype("float32")

# Data labels 0 for signal and 1 for background. This is the opposite of the usual convention.
labels = np.concatenate([np.zeros(len(w_samples)), np.ones(len(qcd_samples))])
labels = labels.astype("float32")


<h3>4. Prepare the Data for Torch</h3>

Now, we want to make a torch dataset and data loader to make it easier to process the data. Since we will want to decorrelate against the mass, we will make a custom datasset that outputs the mass. 

In [None]:
#>>>RUN: PROJ2.P2.2-runcell05

import torch
from torch.utils.data import Dataset, DataLoader
torch.random.manual_seed(42) # fix a random seed for reproducibility

class DataSet(Dataset):
    def __init__(self, samples, labels, m=None):
        self.labels = labels
        self.samples = samples
        self.m = m
        if len(samples) != len(labels):
            raise ValueError(
                f"should have the same number of samples({len(samples)}) as there are labels({len(labels)})")

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        # Select sample
        X = self.samples[index]
        y = self.labels[index]
        m = self.m[index] if self.m is not None else self.m
        return X, y, m

dataset = DataSet(samples=combined_samples,labels=labels,m=mass)
traindataset, testdataset = torch.utils.data.random_split(dataset, [int(0.8*len(labels)),len(labels)-int(0.8*len(labels))])

#loaders
trainloader = DataLoader(traindataset,batch_size=4096)
testloader  = DataLoader(testdataset,batch_size=4096)

<h3>5. Define the Neural Network</h3>

Alright time to make a neural network. We will define an MLP that can be used for binary classification. For now we will use 16 hidden parameters. **Remember this is just an example - you may want to use more!**

In [None]:
#>>>RUN: PROJ2.P2.2-runcell06

torch.random.manual_seed(42) # fix a random seed for reproducibility

class MLP(torch.nn.Module):  # Model from utils
    def __init__(self,input_size=10,out_channels=1,name=None):
        """
         DNN Model inherits from torch.torch.nn.Module. Can be initialized with input_size: Number of features per sample.

        This is a class wrapper for a simple DNN model. Creates an instance of torch.torch.nn.Module that has 4 linear layers. Use torchsummary for details.

        Parameters
        ----------
        input_size : int=10
            The number of features to train on.
        out_channels : int=1
            The number of outputs. For binary classification we usually want one output for the "probability" 
            that a given sample is a signal event. If we want to classify samples into QCD, W, and Z, for example, we would use 3 outpute channels.
        name : string=None
            Specifiy a name for the Dtorch.nn.break
        """
        super().__init__()
        self.act     = torch.nn.ReLU()
        self.linear  = torch.nn.Linear(input_size, 16, bias=False)
        self.linear1 = torch.nn.Linear(16,16,)
        self.linear2 = torch.nn.Linear(16, 16)
        self.batchnorm = torch.nn.BatchNorm1d(16)
        self.out = torch.nn.Linear(16, out_channels)
        self.activation = torch.nn.Sigmoid()
        # Defaults
        self.out_channels = out_channels
        self.yhat_val = None
        self.yhat = None
        self.name = name

    def forward(self, x):
        x = self.act(self.linear(x))
        x = self.batchnorm(x)
        x = self.act(self.linear1(x))
        x = self.act(self.linear2(x))
        x = self.out(x)
        x = self.activation(x)
        return x
    
model = MLP(input_size=combined_samples.shape[1])

<h3>6. Training and Testing</h3>

Now, lets define a training and testing function so that we can train our network and evaluate. To get the training to work we will have to back propoagate the loss.

After defining the functions, train the network over 10 epochs.

In [None]:
#>>>RUN: PROJ2.P2.2-runcell07

def train(iNEpoch,iModel,iDataLoader,lossfunc):
    simple_criterion = lossfunc
    simple_optimizer = torch.optim.Adam(iModel.parameters(), lr=0.005) 
    for epoch in range(iNEpoch):
        for batch_idx, pData  in enumerate(iDataLoader):
            simple_optimizer.zero_grad()
            outputs = iModel(pData[0]).flatten()
            loss = simple_criterion(outputs, pData[1])
            loss.backward()
            simple_optimizer.step()    
            current_loss = loss.item()
        if epoch % 1 == 0: print('[%d] loss: %.4f  ' % (epoch + 1,  current_loss))

def test(iModel,iXData):
    with torch.no_grad():
        iModel.eval()
        inputs = torch.tensor(iXData[:][0])
        outputs = iModel(inputs)
        return outputs

In [None]:
#>>>RUN: PROJ2.P2.2-runcell08

train(10,model,trainloader,torch.nn.BCELoss())

<h3>7. Plot the ROC and AUC</h3>

Now plot the ROC and AUC. This will tell us how good our discrimination is. Is it good?

In [None]:
#>>>RUN: PROJ2.P2.2-runcell09

from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve

scores=test(model,testdataset).flatten()

# This is just a plot of the roc curve
auc = roc_auc_score(y_score=scores, y_true=testdataset[:][1])
fpr, tpr, cuts = roc_curve(y_score=scores, y_true=testdataset[:][1])
fig, ax = plt.subplots(1,1,figsize=(4,3),dpi=150)
plt.plot(fpr, tpr,label=f"{auc:.2f}")
plt.xlabel("fpr")
plt.ylabel("tpr")
plt.legend(title="auc")

plt.show()

<h3>8. Determine a Cut</h3>

Now that we have a neural network that does some good discriminiation, what we really want to do is apply this neural network to the whole dataset to see if we can see a bigger W peak that we can fit and extract properties about. That is ultimately our goal at the end.

So in light of that what we first want to do is plot the mass distribution of our testing samples while requireing the NN score to be less than or greater than a value (its likely going to be less than), such that we select more W bosons and less background.

Below, lets go ahead and make the plot.



In [None]:
#>>>RUN: PROJ2.P2.2-runcell10

def plot_hists(cut,scores,test_mass,test_labels,c="w",density=True):
    # sig_bkg
    fig, ax      = plt.subplots(1, 1)
    _,bins,_=plt.hist(test_mass[test_labels == 0],                   bins=80,density=density,histtype="step",label="Signal",color="b",ls='--')
    _,bins,_=plt.hist(test_mass[test_labels == 1],                   bins=bins,density=density,histtype="step",label="Background",color="r",ls='--')
    _,bins,_=plt.hist(test_mass[(test_labels == 1) & (scores > cut)],bins=bins,density=density,histtype="step",label="selected bkg",color="r")
    _,bins,_=plt.hist(test_mass[(test_labels == 0) & (scores > cut)],bins=bins,density=density,histtype="step",label="selected sig",color="b")
    plt.legend(loc='upper right', fontsize=12, ncol=1)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)
    plt.xlim([40, 240])
    plt.yscale("log")
    ax.set_xlabel("Mass [GeV]", fontsize=14)
    ax.set_ylabel("Counts", fontsize=14)
    fig.tight_layout(pad=0)
    return bins,fig, ax


    plt.legend(loc='upper right', fontsize=12, ncol=1)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)
    plt.xlim([40, 240])
    plt.yscale("log")
    ax.set_xlabel("Mass [GeV]", fontsize=14)
    ax.set_ylabel("Counts", fontsize=14)
    fig.tight_layout(pad=0)
    return bins,fig, ax

##plot discriminator
#plot the sculpting (consider function above)
labels=testdataset[:][1]
masses=testdataset[:][2]
scores=scores.detach().numpy()
plt.hist(scores[labels==0],alpha=0.5,density=True,label='sig')
plt.hist(scores[labels==1],alpha=0.5,density=True,label='bkg')
plt.xlabel("NN Discriminator")
plt.ylabel("Events (normalized)")
plt.show()

plot_hists(-0.75,-1*scores,masses,labels,density=False)
plt.show()

Now what exactly is going on with the neural network? You see that as you apply a selection to the NN (solid lines), both your signal and background start to be peaky at the same spot. This makes it very difficult to separate out the two. To see this, lets go ahead and apply an NN selection on the data to see how it looks.


In [None]:
#>>>RUN: PROJ2.P2.2-runcell11

def selectionWNN(iSamples):
    #Pre-selection citeria
    samplesel = selection(iSamples)
    lSamples  = np.stack(list(iSamples.arrays(keys, library="np").values()),axis=-1)
    lSamples  = normalize(lSamples,minsamples,maxsamples)
    dataset   = DataSet(samples=lSamples,labels=np.ones(len(lSamples)),m=np.ones(len(lSamples)))
    lScores   = test(model,dataset).flatten()
    nnsel     = lScores < 0.7
    allcuts   = samplesel & nnsel.numpy()
    return allcuts

plotDataSim("vjet0_msd0", selectionWNN, "Jet Mass", [40,200])

This is where are stuck! What you can see with the above is that our neural network is sculpting the signal mass distribution pretty dramatically. You now have a bump on a bump and you cannot extract the signal from this.

What have we done? Well our NN has learned the mass distribution along with some other features to find W jets. However, we don't want it to learn the mass, since now all samples we select on give us a biased mass. **In the rest of this example, we are going to show you how to unbias this by computing the correlation of our discriminator with the mass and penalizing the learning process.**

<h3>9. Using Correlation as a Loss During Training</h3>

Now finally we need to compute the correlation as a loss. What this means is we will build a function that inherits from pytroch, that takes a prediction and a target, and computes the pearson correlation. You will notice two additional components. First, the pearosn corrlelation we return is the correlation squared, that is to ensure that its always positive and the minimum is at zero correlation. Secondly, you will notice that we clamp on the mass range. This is because we want to ensure the correlation is strongest in the low mass region from 20 to 100 GeV, so we ignore the correlation at high mass and very low masses.

Finally and most importantly, the loss  needs to be differentiable, so check that the gradient is saved so we can backpropagate it later.


In [None]:
#>>>RUN: PROJ2.P2.2-runcell12

import torch.nn as nn

class PearsonLoss(nn.Module):
    def __init__(self,power=1):
        super(PearsonLoss, self).__init__()
        self.ipow     = power
        
    def pearson(self, pred, target):
        target = torch.clamp(target,20,100)#hack for correlation (we will just look corrleation of mass between 20 and 100)
        pred   = pred - pred.mean()
        pred   = (pred / pred.std())
        target = target - target.mean()
        target = (target / target.std())
        pred   = pred.pow(self.ipow)
        target = target.pow(self.ipow)
        ret = torch.mean((pred * target))
        return ret*ret
    
    def forward(self, features, labels, mask=None):
        if mask is not None:
            featuretest=features[mask]
            labeltest=labels[mask]
        else:
            featuretest=features
            labeltest=labels
        return self.pearson(featuretest,labeltest)
    
PLoss = PearsonLoss()
tsin   = torch.tensor(combined_samples[0:100])
tscore = model(tsin)
tmass  = torch.tensor(mass[0:100])
value=PLoss(tscore,tmass)
print("Print check backpropagate:",value.grad_fn)

Now we will write code to train the network, compute the correlation, and add this to the loss with a weight parameter (`lam`, short for lambda), then backpropagate it. The weight parameter lambda is a tunable parameter that will help us to gauge whether our loss is strongly corrleated or not. Let's go ahead and write the training function.

In [None]:
#>>>RUN: PROJ2.P2.2-runcell13

torch.manual_seed(1)

def train_decorr(iNEpoch,iModel,iDataLoader,lossfunc,corrfunc,lam=1):
    simple_criterion = lossfunc
    simple_optimizer = torch.optim.Adam(iModel.parameters(), lr=0.005) 
    for epoch in range(iNEpoch):
        for batch_idx, data  in enumerate(iDataLoader):
            simple_optimizer.zero_grad()
            outputs  = iModel(data[0]).flatten()
            baseloss = simple_criterion(outputs, data[1])
            masscorr = corrfunc(outputs,data[2])
            loss     = baseloss + lam*masscorr
            loss.backward()
            simple_optimizer.step()    
            current_loss = loss.item()
        if epoch % 1 == 0: print('[%d] loss: %.4f  base %.4f corr %.4f' % (epoch + 1,  current_loss, baseloss.item(), masscorr.item()))

model_decorr = MLP(input_size=combined_samples.shape[1])

Now, we can run the decorrelated training, and see what it does.

In [None]:
#>>>RUN: PROJ2.P2.2-runcell14

#in Colab, you may need to increase lam to get a result, try at least lam=25
train_decorr(10,model_decorr,trainloader,torch.nn.BCELoss(),PearsonLoss(),lam=20)

<h3>10. Compute the Decorrelated ROC and AUC</h3>

Ok, so now that we have a new training, let go ahead and look at the performance compared to the previous neural network.

So, we compute the decorrelated ROC and AUC and compare to the previous one


In [None]:
#>>>RUN: PROJ2.P2.2-runcell15

scores_decorr=test(model_decorr,testdataset).flatten()
# This is just a plot of the roc curve
auc_decorr = roc_auc_score(y_score=scores_decorr, y_true=testdataset[:][1])
fpr_decorr, tpr_decorr, cuts = roc_curve(y_score=scores_decorr, y_true=testdataset[:][1])
fig, ax = plt.subplots(1,1,figsize=(4,3),dpi=150)
plt.plot(fpr, tpr,label=f"{auc:.2f}")
plt.plot(fpr_decorr, tpr_decorr,label=f"{auc_decorr:.2f}"+ " decorr")
plt.xlabel("fpr")
plt.ylabel("tpr")
plt.legend(title="auc")
plt.show()

<h3>11. Plot the Discriminator and Plot with the Data</h3>

You can see the performance is not as good, but it's still non--zero. To go a little further, lets plot the discriminantor, cut on it and see how it sculpts our dataset. Ideally, we want our background to be flat.


In [None]:
#>>>RUN: PROJ2.P2.2-runcell16

labels=testdataset[:][1]
masses=testdataset[:][2]
scores_decorr=scores_decorr.detach().numpy()
_,bins,_=plt.hist(scores_decorr[labels==0],alpha=0.5,density=True,label='sig')
plt.hist(scores_decorr[labels==1],alpha=0.5,density=True,label='bkg',bins=bins)
plt.xlabel("NN Discriminator")
plt.ylabel("Events (normalized)")
plt.show()

print(masses[(labels == 0) & (scores_decorr > -1) ])
plot_hists(-0.82,-1*scores_decorr,masses,labels,density=False)
plt.show()

Finally, we write a script to take a sample, run the preselection, get the variables for the NN, normalize them, make a dataset and run the neural network so that you can finally apply the cut and output an array of `True` or `False` to apply the selection. This we can then apply to the data and simulation using the `plotDataSim` function.

In [None]:
#>>>RUN: PROJ2.P2.2-runcell17

def selectionWNN(iSamples):    
    #Pre-selection citeria
    samplesel = selection(iSamples)
    lSamples  = np.stack(list(iSamples.arrays(keys, library="np").values()),axis=-1)
    lSamples  = normalize(lSamples,minsamples,maxsamples)
    dataset   = DataSet(samples=lSamples,labels=np.ones(len(lSamples)),m=np.ones(len(lSamples)))
    lScores   = test(model_decorr,dataset).flatten()
    nnsel     = lScores < 0.9
    allcuts   = samplesel & nnsel.numpy()
    return allcuts

plotDataSim("vjet0_msd0", selectionWNN, "Jet Mass", [40,200])

<h3>11. Fit the W Peak</h3>

Ok great, now lets take our selection on data and run the same fitting that we have doen in the original project. Let's take a look.

In [None]:
#>>>RUN: PROJ2.P2.2-runcell18

datasel     = selectionWNN(data)
dataWNN = data.arrays('vjet0_msd0', library="np")["vjet0_msd0"][datasel]

def fitW(x, p0, p1, p2, p3, p4, p5, a, mu, sigma):
    '''
    Our model is a gaussian on top of 6th order polynomial.
    '''
    
    #Define the polynomial
    pols=[p0, p1, p2, p3, p4, p5]
    poly  = np.polyval(pols,x)
    
    #Define the gaussian
    gauss = np.exp(-((x-mu)**2.)/(2.*sigma**2))
    
    #Stick them together
    y =  poly + a*gauss
    
    return y

def fitMassNN(iMass,iPlot=True):
    #----------------------------
    # Now we get the data histogram so we can fit it
    bins = 28
    mrange = (40,175)
    counts, bins = np.histogram(dataWNN,bins=bins,range=mrange,density=False)

    yvar = counts
    w = 1./np.sqrt(yvar)
    binCenters = (bins[1:]+bins[:-1])*.5
    x,y = binCenters.astype("float32"), counts.astype("float32")

    #Perform the S+B fit 
    #NOTE: you may need to play with the initial parameters to get a good fit
    model = lm.Model(fitW,)
    p = model.make_params(p0=1.0919e-10,
                      p1=1.7249e-06,
                      p2=-4.1204e-04,
                      p3=5.0022e-04,
                      p4=0.63324239,
                      p5=1164.87316,
                      a=269.005533,
                      mu=iMass,
                      sigma=11.701217273469597,)
    p["mu"].vary=False
    
    result_W = model.fit(data=y,
                       params=p,
                       x=x,
                       weights=w,
                       method="leastsq")

    chisqrS = result_W.chisqr
    p["a"].value = 0
    p["a"].vary = False
    p["sigma"].vary = False
    result_Wb = model.fit(data=y,
                       params=p,
                       x=x,
                       weights=w,
                       method="leastsq")
    
    chisqrB = result_Wb.chisqr
    #Print the fit summary
    if iPlot:
        print(result_W.fit_report())
        plt.figure()
        result_W.plot()
        plt.xlabel("mass[GeV]",position=(0.92,0.1))
        plt.ylabel("Entries/bin",position=(0.1,0.84))

    
    return chisqrS-chisqrB

fitMassNN(86)

<h3>12. Mass Measurement</h3>

Finally, we put it all together and perform the mass measurement.

In [None]:
#>>>RUN: PROJ2.P2.2-runcell19

import scipy.stats as stats

chi2sqarr = np.array([])
mxrange = np.arange(80,89,0.1)
for mass in mxrange:
    dchi2=fitMassNN(mass,False)
    chi2sqarr=np.append(chi2sqarr,dchi2)
    
#get the best fit
chi2sqarr-= np.min(chi2sqarr)
argbest   = np.argmin(chi2sqarr)
bestfit   = mxrange[argbest]

#chi2 of 2 degrees of freedom 1 sigma bound
def pvalue(isigma):
    return stats.norm.cdf(isigma)-stats.norm.cdf(-isigma)
OneSigma   = stats.chi2.ppf(pvalue(1),2)

x_up   = np. interp(OneSigma,  chi2sqarr[argbest:], mxrange[argbest:]) 
x_down = np. interp(-OneSigma, -chi2sqarr[:argbest], mxrange[:argbest]) 
print("Best fit: ", bestfit, "+", (x_up-bestfit), "-",(bestfit-x_down)  )

plt.plot(mxrange,chi2sqarr)
plt.axhline(OneSigma,c='red')
plt.xlabel("mass[GeV]",position=(0.92,0.1))
plt.ylabel("$\Delta \chi^{2}$",position=(0.1,0.84))
plt.show()

<h3>Discussion and Next Steps</h3>

So what do we get?

With all of this, you should have performed a mass measurement. We have effectively taught a NN to do the hard work that we did in the project. However, all we have done here is given the neural network the problem and told it to decorrelate. This effectively automates the discovery of critical physics observables, and ultimately gives us better measurements of the parameters that exist!

**Go ahead and play with this project. Revisit steps 1-12 and do some or all of the following: try using more or different variables, change parameters in the NN, change the cuts that are made, change the weight `lam` in the loss function, etc. See if you can improve the decorrelation!**

Ultimately see if you can get a better measurement than what you got on Part 1 of the project. We can guarantee to you that you can, and we can also say this is just the tip of the iceberg. You can start to try to understand with the NN is doing, what are the critical observables? How do you find the data, what else can you explore? Its all embedded in here!

**Finally, remember to complete Tasks 2 and 3, below, where you will explain your approach and describe your results.**

<h3>Task 2: Explain your approach</h3>

**Explain how you approached optimizing your results, here:**

<h3>Task 3: Describe your results and characterize the significance.</h3>

**Describe your results here:**