<hr style="height: 1px;">
<i>This notebook was authored by the 8.S50x Course Team, Copyright 2022 MIT All Rights Reserved.</i>
<hr style="height: 1px;">
<br>

<h1>Lesson 16: Predicting the Momentum of the Tau Using Machine Learning</h1>


<a name='section_16_0'></a>
<hr style="height: 1px;">


## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L16.0 Overview</h2>


<h3>Navigation</h3>

<table style="width:100%">
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_16_1">L16.1 Higgs to Taus</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_16_1">L16.1 Exercises</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_16_2">L16.2 Regression Analysis of Tau Momentum</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_16_2">L16.2 Exercises</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_16_3">L16.3 Reconstructing the Higgs Mass</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_16_3">L16.3 Exercises</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_16_4">L16.4 The Full Mass Regression</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_16_4">L16.4 Exercises</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_16_5">L16.5 Tuning the NN Architecture</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_16_5">L16.5 Exercises</a></td>
    </tr>
</table>

In [None]:
#>>>RUN: L16.0-runcell00

!pip install torch
!pip install imageio
!pip install george
!pip install uproot
!pip install awkward
!pip install pylorentz

In [None]:
#>>>RUN: L16.0-runcell01

import torch                        #https://pytorch.org/docs/stable/torch.html
import torch.nn as nn               #https://pytorch.org/docs/stable/nn.html
from torch.autograd import Variable #https://pytorch.org/docs/stable/autograd.html
import torch.nn.functional as F     #https://pytorch.org/docs/stable/nn.functional.html
import torch.utils.data as Data     #https://pytorch.org/docs/stable/data.html

import matplotlib.pyplot as plt     #https://matplotlib.org/3.5.3/api/_as_gen/matplotlib.pyplot.html
%matplotlib inline

from pylorentz import Momentum4     #https://pypi.org/project/pylorentz/

import numpy as np                  #https://numpy.org/doc/stable/
import imageio                      #https://imageio.readthedocs.io/en/stable/

In [None]:
#>>>RUN: L16.0-runcell02

#set plot resolution
%config InlineBackend.figure_format = 'retina'

#set default figure parameters
plt.rcParams['figure.figsize'] = (9,6)

medium_size = 12
large_size = 15

plt.rc('font', size=medium_size)          # default text sizes
plt.rc('xtick', labelsize=medium_size)    # xtick labels
plt.rc('ytick', labelsize=medium_size)    # ytick labels
plt.rc('legend', fontsize=medium_size)    # legend
plt.rc('axes', titlesize=large_size)      # axes title
plt.rc('axes', labelsize=large_size)      # x and y labels
plt.rc('figure', titlesize=large_size)    # figure title

<a name='section_16_1'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L16.1 Higgs to Taus</h2>  

| [Top](#section_16_0) | [Previous Section](#section_16_0) | [Exercises](#exercises_16_1) | [Next Section](#section_16_2) |


In [None]:
#>>>RUN: L16.1-slides

from IPython.display import IFrame
IFrame(src='https://mitx-8s50.github.io/slides/L19/slides_L19_06.html', width=970, height=550)

<a name='exercises_16_1'></a>

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 16.1.1</span>

Our objective is to reconstruct the tau momentum, and thereby probe the Higgs mass. First, we must consider the tau decay products, some of which are not visible (i.e. neutrinos).

If we write down all possible tau decays, including decays to electrons, muons, and charged and neutral pions, there are as many as 10 different types of decays, all with similar probabilities of happening. Each decay produces neutrinos going in a slightly different direction. Why would machine learning (ML) be a good way to determine the directions of the neutrinos? (Note also that we can simulate this whole process well.)

A) ML can figure out the exact decay.\
B) ML is much faster than a rule-based algorithm, and speed is critical here.\
C) ML can come up with a weighted decision for the probability of each decay and, based on this weight and knowledge of the decays, determine the most likely MET (missing transverse energy).\
D) ML regression can simulate all decays and choose the best.


<a name='section_16_2'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L16.2 Regression Analysis of Tau Momentum</h2>  

| [Top](#section_16_0) | [Previous Section](#section_16_1) | [Exercises](#exercises_16_2) | [Next Section](#section_16_3) |


In [None]:
#>>>RUN: L16.2-slides

from IPython.display import IFrame
IFrame(src='https://mitx-8s50.github.io/slides/L19/slides_L19_07.html', width=970, height=550)

<h3>Data</h3>

>description: dataset of Higgs bosons decaying to tau leptons<br>
>source: https://zenodo.org/record/8035277 <br>
>attribution: Philip Harris (CMS Collaboration), DOI:10.5281/zenodo.8035277 

In [None]:
#>>>RUN: L16.2-runcell01

# NOTE: these files are too large to include in the original repository,
# so you must download them from the sources below
#
# Ways to download:
#     1. Copy/paste the link (replace =0 with =1 to download automatically)
#     2. Use the wget commands below (works in Colab, but you may need to install wget if using locally)
#
# Location of files:
#     Move the files to the directory data/L16
#
# Using wget: (works in Colab)
#     Upon downloading, the code below will move them to the appropriate directory

#get the data
!wget -P data/L16/ https://www.dropbox.com/s/csgx8t35i3un9kr/Regression2.root?dl=0
!mv data/L16/Regression2.root?dl=0 data/L16/Regression2.root

In [None]:
#>>>RUN: L16.2-runcell02

import uproot
from collections import OrderedDict 
reg    = uproot.open("data/L16/Regression2.root")["Tree"]

In [None]:
#>>>RUN: L16.2-runcell03

#what are the inputs
print(reg.keys())
cut=reg['genpt1'].array() >  100
vals=reg['genpt1'].array(library="np")[cut]
np.histogram(vals)

In [None]:
#>>>RUN: L16.2-runcell04

import numpy as np

def plot(iVar,iMin,iMax,iColor,iLabel): 
    mask=(reg[iVar].array() > 0)
    data=reg[iVar].array(library="np")[mask]
    counts, binEdges = np.histogram(data,bins=50,range=(iMin,iMax),density=False)
    binCenters = (binEdges[1:]+binEdges[:-1])*.5
    err = np.sqrt(counts)
    plt.errorbar(binCenters, counts, yerr=err,fmt="o",alpha=0.5,c=iColor,label=iLabel, ms=3)
    plt.xlabel("mass")
    plt.ylabel("N events")
    return binCenters,counts,err

plot("genpt1",0,200,"black","true $p_{T}$")
plot("recopt1",0,200,"red","observed $p_{T}$")
plt.xlabel("$p_{T}$ (GeV)")  # plot() labels the x-axis "mass"; override it for these pT histograms
plt.legend()
plt.show()

In [None]:
#>>>RUN: L16.2-runcell05

#To visualize the whole problem let's make a 2D plot
mask=np.logical_and(reg["genpt1"].array() > 0, reg["recopt1"].array()>0)
x=reg["genpt1"].array(library="np")[mask]
y=reg["recopt1"].array(library="np")[mask]
plt.xlabel("gen $p_{T}$")
plt.ylabel("reco $p_{T}$")
plt.hist2d(x,y,bins=200)
plt.xlim(0,100)
plt.ylim(0,100)
plt.show()


print("Pre Correlation:",np.corrcoef(y.flatten(),x.flatten())[0][1])

In [None]:
#>>>RUN: L16.2-runcell06

#Let's prepare the data to be pytorch friendly
x=torch.from_numpy(reg["recopt1"].array(library="np")[mask].reshape(len(reg["recopt1"].array(library="np")[mask]),1))
y=torch.from_numpy(reg["genpt1"].array(library="np")[mask].reshape(len(reg["genpt1"].array(library="np")[mask]),1))
x, y = Variable(x), Variable(y)
#torch_dataset = Data.TensorDataset(x, y)
#loader = Data.DataLoader(dataset=torch_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2,)

#Now let's make a simple model
torch.manual_seed(1)    # reproducible
net = torch.nn.Sequential(
        torch.nn.Linear(1, 200),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(200, 100),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(100, 1),
    )
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
loss_func = torch.nn.MSELoss()

In [None]:
#>>>RUN: L16.2-runcell07

def makePlot(x,y,prediction,ax,fig,images,t,loss,ymin,ymax):
    # plot and show learning process
    plt.cla()
    ax.set_title('Regression Analysis', fontsize=35)    
    ax.set_xlabel('Mass', fontsize=24)
    ax.set_ylabel('N', fontsize=24)
    ax.hist(prediction.data.numpy(),color="red",bins=20,range=(0,200),alpha=0.5,label='pred')
    ax.hist(y.data.numpy(),color="black" ,bins=20,range=(0,200),alpha=0.5,label='gen')
    ax.hist(x.data.numpy(),color="green",bins=20,range=(0,200),alpha=0.5,label='reco')
    #ax.scatter(x.data.numpy(), y.data.numpy(), color = "orange")
    #ax.plot(x.data.numpy(), prediction.data.numpy(), 'g-', lw=3)
    ax.text(100, 2000, 'Epoch = %d' % t,fontdict={'size': 24, 'color':  'red'})
    ax.text(100, 5000, 'Loss = %.4f' % loss.data.numpy(),fontdict={'size': 24, 'color':  'red'})
    fig.canvas.draw()       # draw the canvas, cache the renderer
    ax.legend()
    image = np.frombuffer(fig.canvas.tostring_rgb(), dtype='uint8')
    image  = image.reshape(fig.canvas.get_width_height()[::-1] + (3,))
    images.append(image)
    
def train(x,y,net,loss_func,opt,nepochs,ymin,ymax):
    images = []
    fig, ax = plt.subplots(figsize=(12,7))
    for epoch in range(nepochs):
        if epoch % 50 == 0: 
            print("epoch:",epoch)
        prediction = net(x)
        loss = loss_func(prediction, y) 
        opt.zero_grad()
        loss.backward() 
        opt.step()  # step the optimizer passed to train(), not the global one
        if epoch % 4 == 0:
            makePlot(x,y,prediction,ax,fig,images,epoch,loss,ymin,ymax)
    return images

In [None]:
#>>>RUN: L16.2-runcell08

from IPython.display import Image
images=train(x,y,net,loss_func,optimizer,150,0,1)
torch.save(net.state_dict(), 'data/L16/tau_pt_simple.pt')
imageio.mimsave('data/L16/reg_1.gif', images, fps=12)
Image(open('data/L16/reg_1.gif','rb').read())

# Note: you may wish to visualize the results as open histograms, rather than filled-in.
# Here, we will keep the same styling as the related video.

In [None]:
#>>>RUN: L16.2-runcell09

true=reg["genpt1"].array(library="np")[mask]
reco=reg["recopt1"].array(library="np")[mask]
pred=net(x)
ratio=np.array(true/reco)
ratiopred=y/pred
plt.hist(ratio,color="red",bins=20,range=(0,3),alpha=0.5,label="true")
plt.hist(ratiopred.data.numpy(),color="blue",bins=20,range=(0,3),alpha=0.5,label="corr")
plt.legend()
plt.show()
print("True Mean: ",ratio.mean(),"True StdDev:",ratio.std())
print("NN Mean: ",ratiopred.data.numpy().mean(),"NN StdDev:",ratiopred.data.numpy().std())

<a name='exercises_16_2'></a>     

| [Top](#section_16_0) | [Restart Section](#section_16_2) | [Next Section](#section_16_3) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 16.2.1</span>

Consider the preceding plot, generated by code cell `L16.2-runcell09`. Before we discuss the neural network prediction, why does the ratio of the generated over reconstructed tau momentum (labeled `true`) exhibit a large tail extending to values bigger than 1?

A) Systematic uncertainties in the measurement contribute to variations in the momentum estimation, leading to a large tail in the reconstructed tau momentum.\
B) The reconstruction process underestimates the total momentum (on average) because it only accounts for the momentum of the visible components, ignoring the momentum of any neutrinos.\
C) The reconstruction process overestimates the total momentum (on average) because it only accounts for the momentum of the visible components, ignoring the momentum of any neutrinos.\
D) The reconstruction process accounts for too many neutrinos, thus overestimating the total momentum (on average).\
E) The reconstruction process accounts for too many neutrinos, thus underestimating the total momentum (on average).

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 16.2.2</span>

The blue histogram generated by code cell `L16.2-runcell09` shows the ratio of the true momentum divided by that found by the neural network. Why does the neural network output yield an average ratio that is much closer to 1, and why is the width (i.e. the standard deviation of the ratio distribution) much smaller? Select all that apply:

A) The neural network brings the average correction to 1 because (on average) it accurately predicts the neutrino momentum.\
B) The neural network brings the average correction to 1 because it learns to always predict the neutrino momentum perfectly.\
C) The width of the ratio distribution is partly an indication of how well the neural network can predict the neutrino momentum.\
D) As the neural network becomes more complex and incorporates additional inputs, the width of the distribution should become narrower, indicating improved prediction of the neutrino momentum.
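
To build intuition for the mean and width of these ratio distributions, here is a minimal toy sketch; it is not the method used in this lesson. It assumes, purely for illustration, a visible momentum fraction that is uniform between 0.3 and 1, and it uses a single least-squares scale factor as a crude stand-in for the trained network. All names and numbers in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_events = 10_000

# Toy inputs (assumptions): flat true pT and a flat visible momentum fraction.
true_pt = rng.uniform(20.0, 100.0, size=n_events)        # stand-in for genpt1 [GeV]
visible_fraction = rng.uniform(0.3, 1.0, size=n_events)  # neutrinos carry the rest
reco_pt = true_pt * visible_fraction                     # stand-in for recopt1 [GeV]

# Crude stand-in for the network: one scale factor that minimizes the
# mean-squared error between scale * reco_pt and true_pt.
scale = np.sum(true_pt * reco_pt) / np.sum(reco_pt**2)
pred_pt = scale * reco_pt

raw_ratio = true_pt / reco_pt    # analogous to the "true" histogram
corr_ratio = true_pt / pred_pt   # analogous to the "corr" histogram

print("raw  mean, std:", raw_ratio.mean(), raw_ratio.std())
print("corr mean, std:", corr_ratio.mean(), corr_ratio.std())
```

In this toy, the corrected ratio sits closer to 1 and has a smaller spread, but the spread does not vanish, because a correction based only on the visible momentum cannot recover the neutrino momentum event by event.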



### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 16.2.3</span>

Code cell `L16.2-runcell05` generated a 2D plot of the correlation between the reconstructed and generated tau momentum and also calculated their correlation coefficient. Complete the code cell below to calculate the correlation coefficient for the momentum predicted by the neural network versus the generated value.

What is the correlation of the neural network output with the true value? How does this compare to the correlation before the NN correction? Enter your answer as a list of numbers with precision 1e-2: `[corr-original, corr-NN]`
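
As a reminder of how `np.corrcoef` behaves, here is a small sketch on synthetic data (the arrays `a` and `b` are hypothetical and unrelated to the tau dataset). It returns a 2x2 correlation matrix, and the off-diagonal element is the Pearson coefficient. Arrays of shape `(N, 1)`, such as those obtained from the tensors in this notebook, should be flattened first; otherwise NumPy treats each row as a separate variable.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical column vectors of shape (N, 1), like tensors converted to NumPy
a = rng.normal(size=(1000, 1))
b = 0.8 * a + 0.2 * rng.normal(size=(1000, 1))

# Flatten to 1-D so each array is one variable with N observations.
corr_matrix = np.corrcoef(a.flatten(), b.flatten())

print(corr_matrix.shape)   # 2x2 matrix
print(corr_matrix[0, 1])   # Pearson correlation between a and b
```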


In [None]:
#>>>EXERCISE: L16.2.3
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

ytmp=y.detach().numpy()
ptmp=pred.detach().numpy()
print("Pre Correlation:", #YOUR CODE HERE)
print("NN Correlation:", #YOUR CODE HERE)

<a name='section_16_3'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L16.3 Reconstructing the Higgs Mass</h2>  

| [Top](#section_16_0) | [Previous Section](#section_16_2) | [Exercises](#exercises_16_3) | [Next Section](#section_16_4) |


In [None]:
#>>>RUN: L16.3-runcell01

#now let's construct the Higgs mass
plot("hmass",0,200,"black","True Higgs Mass")
plot("recohmass",0,200,"red"," Reconstructed")
plt.legend()
plt.show()

In [None]:
#>>>RUN: L16.3-runcell02

#!pip install pylorentz
from pylorentz import Momentum4

#Let's compute the mass on the fly
#Momentum4 calculates the 4-vector, taking as input the mass, eta and phi angles, and the transverse momentum (pT) 
def masscompute(iVec1,iVec2):
    tau_1 = Momentum4.m_eta_phi_pt(iVec1[3], iVec1[1], iVec1[2], iVec1[0])
    tau_2 = Momentum4.m_eta_phi_pt(iVec2[3], iVec2[1], iVec2[2], iVec2[0])
    return (tau_1+tau_2).m
    
def hmass(massfunc):
    mask=(reg["recohmass"].array() > 0)
    varlist=["recopt1","recoeta1","recophi1","recomass1","recopt2","recoeta2","recophi2","recomass2"]
    arr=0
    idx=0
    for x in varlist:
        pArr=reg[x].array(library="np")[mask]
        if idx == 0: 
            arr = pArr
            idx = idx + 1
        else:
            arr=np.vstack((arr,pArr))
    arr = arr.T
    massc = lambda iarr: massfunc(iarr[0:4],iarr[4:8]) 
    hmasses = np.array([massc(p) for p in arr])
    return hmasses

rawmvis=hmass(masscompute)
plt.title('Reconstructed Mass')
plt.hist(rawmvis,bins=50,range=(0,200),color='blue',label="New mass calculation")
#plot("hmass",0,200,"black","True Higgs Mass")
plot("recohmass",0,200,"red","Mass from dataset")
plt.legend()
plt.show()

In [None]:
#>>>RUN: L16.3-runcell03

#Let's compute the mass on the fly
def masscompute(iVec1,iVec2):
    pt1 = torch.tensor([iVec1[0]])
    pt2 = torch.tensor([iVec2[0]])
    corr1 = net(pt1).data.numpy()
    corr2 = net(pt2).data.numpy()
    # Here, we replace the reconstructed momentum with the neural-net corrected value
    tau_1 = Momentum4.m_eta_phi_pt(iVec1[3], iVec1[1], iVec1[2], corr1)
    tau_2 = Momentum4.m_eta_phi_pt(iVec2[3], iVec2[1], iVec2[2], corr2)
    return (tau_1+tau_2).m

rawmass=hmass(masscompute)
plt.hist(rawmass,bins=50,range=(0,200),color='blue',label="Neural net corrected mass")
#plot("hmass",0,200,"black","True Higgs Mass")
plot("recohmass",0,200,"red","Reconstructed mass")
plt.legend()
plt.show()

In [None]:
#>>>RUN: L16.3-runcell06

#Now let's make a simple model
torch.manual_seed(1)    # reproducible
net = torch.nn.Sequential(
        torch.nn.Linear(1, 200),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(200, 100),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(100, 1),
    )
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
loss_func = torch.nn.MSELoss()
x=torch.from_numpy(reg["recopt1"].array(library="np")[mask].reshape(len(reg["recopt1"].array(library="np")[mask]),1))
y=torch.from_numpy(reg["genpt1"].array(library="np")[mask].reshape(len(reg["genpt1"].array(library="np")[mask]),1))
x, y = Variable(x), Variable(y)
ratio=torch.div(y,x)
y=ratio

#Let's compute the mass on the fly
def masscomputeNN(iVec1,iVec2):
    pt1 = torch.tensor([iVec1[0]])
    pt2 = torch.tensor([iVec2[0]])
    corr1 = net(pt1).data.numpy()*iVec1[0]
    corr2 = net(pt2).data.numpy()*iVec2[0]
    tau_1 = Momentum4.m_eta_phi_pt(iVec1[3], iVec1[1], iVec1[2], corr1)
    tau_2 = Momentum4.m_eta_phi_pt(iVec2[3], iVec2[1], iVec2[2], corr2)
    return (tau_1+tau_2).m

#And let's plot the mass instead of the pT
def makePlot(x,y,prediction,ax,fig,images,t,loss,ymin,ymax):
    #compute the mass
    rawmass=hmass(masscomputeNN)
    # plot and show learning process
    plt.cla()
    ax.hist(rawmvis,bins=40,range=(0,250),color='blue',alpha=0.5)#,label='raw')
    ax.hist(rawmass,bins=40,range=(0,250),color='red',alpha=0.5)#,label='regressed')
    ax.text(150, 300, 'Epoch = %d' % t,fontdict={'size': 24, 'color':  'red'})
    ax.text(150, 600, 'Loss = %.4f' % loss.data.numpy(),fontdict={'size': 24, 'color':  'red'})
    ax.set_title('Regression Analysis', fontsize=35)
    ax.set_xlabel('Mass', fontsize=24)
    ax.set_ylabel('N', fontsize=24)
    ax.set_ylim(0,2000)
    fig.canvas.draw()       # draw the canvas, cache the renderer
    #ax.legend()
    image = np.frombuffer(fig.canvas.tostring_rgb(), dtype='uint8')
    image  = image.reshape(fig.canvas.get_width_height()[::-1] + (3,))
    images.append(image)

In [None]:
#>>>RUN: L16.3-runcell07

#NOTE: if training does not complete due to timeout in Colab,
#reduce the number of epochs to 250 and run this cell twice,
#or reduce to 125 and run this cell four times,
#for a total of 500 training epochs
images=train(x,y,net,loss_func,optimizer,500,0,1)
torch.save(net.state_dict(), 'data/L16/tau_pt_ratio.pt')
imageio.mimsave('data/L16/reg2_long.gif', images, fps=12)
Image(open('data/L16/reg2_long.gif','rb').read())

<a name='exercises_16_3'></a>     

| [Top](#section_16_0) | [Restart Section](#section_16_3) | [Next Section](#section_16_4) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 16.3.1</span>

The Higgs boson has a mass of 125 GeV, which means that for a Higgs at rest, the momentum of each tau will be 125/2 = 62.5 GeV. This also means that the visible momentum of each tau will be less than 62.5 GeV. For instance, if the visible components carry half the tau momentum, then the visible momentum is 31.25 GeV. The correction to account for the neutrino will be large or small depending on whether the tau has low or high momentum, respectively. In this problem, we will verify that the neural network has learned this momentum dependence by looking at the correction that it makes for a visible tau momentum of 20 GeV (low), compared to a visible tau momentum of 200 GeV (high).
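
The arithmetic above can be checked with a short sketch. It makes several simplifying assumptions that are not part of the lesson's code: the tau mass is neglected, the two taus are emitted back to back in the transverse plane (so $p_T$ equals the full momentum), and the visible decay products are collinear with the tau. The helper `pair_mass` is a hypothetical name; it uses the massless two-particle relation $m^2 = 2\,p_{T1}\,p_{T2}\,(\cosh\Delta\eta - \cos\Delta\phi)$.

```python
import math

def pair_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles given (pT, eta, phi) of each."""
    m_squared = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m_squared, 0.0))

# Higgs at rest: taus back to back in the transverse plane (eta = 0, delta phi = pi)
full_mass = pair_mass(62.5, 0.0, 0.0, 62.5, 0.0, math.pi)

# Same geometry, but only half of each tau momentum is visible
visible_mass = pair_mass(31.25, 0.0, 0.0, 31.25, 0.0, math.pi)

print("mass from full tau momenta:", full_mass)
print("mass from visible momenta :", visible_mass)
```

Under these assumptions, losing half of each tau momentum to neutrinos halves the reconstructed mass, which is why the visible mass peaks well below 125 GeV before any correction.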

Now, you might ask, how do you get 200 GeV taus as decay products from the Higgs? This can happen because the Higgs is typically **not** produced at rest. Instead, it usually has some non-zero momentum that is a result of the production process inside the proton collision. Basically, other quarks in the proton recoil off the Higgs, giving it momentum. As a result, we can get higher momentum Higgs bosons decaying into higher momentum taus. However, the rate of these types of events rapidly drops off with increasing momentum, with the highest typical Higgs momentum being 50 GeV.
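
To get a feel for the numbers, here is a hedged kinematic sketch. It assumes a massless tau emitted exactly along the Higgs direction of flight, which gives the largest possible lab-frame tau momentum, and it works with total rather than transverse momentum. The function name `max_tau_momentum` is hypothetical.

```python
import math

HIGGS_MASS = 125.0  # GeV

def max_tau_momentum(higgs_momentum):
    """Lab-frame momentum of a massless tau emitted along the Higgs direction of flight."""
    higgs_energy = math.sqrt(higgs_momentum**2 + HIGGS_MASS**2)
    # Boosting a massless particle with rest-frame momentum m_H / 2 along the
    # boost axis gives gamma * (1 + beta) * m_H / 2 = (E_H + p_H) / 2.
    return 0.5 * (higgs_energy + higgs_momentum)

for p_h in (0.0, 50.0, 180.0):
    print(f"Higgs momentum {p_h:6.1f} GeV -> tau momentum up to {max_tau_momentum(p_h):6.1f} GeV")
```

Under these assumptions, a 200 GeV tau requires a Higgs momentum of roughly 180 GeV, far above the typical values quoted above, which is consistent with such events being rare.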

Compute the relative NN correction for a 20 GeV input, compared to the correction for a 200 GeV input (use the state of the network after training for 500 epochs, i.e., after having run `L16.3-runcell07`). Express these results in terms of a fractional correction (the ratio of final over initial momenta), in order to compare the relative scale properly. Report your answer as a list of two numbers with precision 1e-3: `[fractional correction for 20 GeV input, fractional correction for 200 GeV input]`


In [None]:
#>>>EXERCISE: L16.3.1
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

pVal=torch.tensor([20]).float()
corr_20GeV = net(pVal).data.numpy()
print("Fraction Correction [20 GeV Input]: ", #YOUR CODE HERE)

pVal=torch.tensor([200]).float()
corr_200GeV = net(pVal).data.numpy()
print("Fraction Correction [200 GeV Input]: ", #YOUR CODE HERE)

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 16.3.2</span>

In this section, we showed that we could reconstruct the Higgs mass based on tau momenta. We then applied the NN to the reconstructed (observed) tau momenta and used the new NN-corrected momenta to calculate the Higgs mass, which exhibited a nice peak at the expected value.

Before moving on, let's consider sources of bias in this analysis. Select all options below that characterize a source of bias:

A) Our NN is potentially biased because the features are all correlated.\
B) Our NN is potentially biased because we gave it one Higgs sample at a mass of 125 GeV, and so it will assume that all taus, no matter what energy they have, came from a Higgs with a mass of 125 GeV.\
C) Our NN is potentially biased because some variables have clear correlations with our regression target when the tau has a specific energy, but not at all energies.\
D) Our NN is potentially biased because our simulation is not as good at simulating taus with hadronic neutral pion decays.



<a name='section_16_4'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L16.4 The Full Mass Regression</h2>     

| [Top](#section_16_0) | [Previous Section](#section_16_3) | [Exercises](#exercises_16_4) | [Next Section](#section_16_5) |


In [None]:
#>>>RUN: L16.4-slides

from IPython.display import IFrame
IFrame(src='https://mitx-8s50.github.io/slides/L19/slides_L19_08.html', width=970, height=550)

In [None]:
#>>>RUN: L16.4-runcell01

#Let's compute the mass on the fly
def makedataset(iMask,iPart="part1"):
    varlist=[iPart+"pt1",iPart+"eta1",iPart+"phi1",iPart+"id1",iPart+"pt2",iPart+"eta2",iPart+"phi2",iPart+"id2",iPart+"pt3",iPart+"eta3",iPart+"phi3",iPart+"id3",iPart+"pt4",iPart+"eta4",iPart+"phi4",iPart+"id4",iPart+"pt5",iPart+"eta5",iPart+"phi5",iPart+"id5"]
    arr=0
    idx=0
    for x in varlist:
        pArr=reg[x].array(library="np")[iMask]
        if idx == 0: 
            arr = pArr
            idx = idx + 1
        else:
            arr=np.vstack((arr,pArr))
    arr = arr.T
    return arr

mask1=(reg["genpt1"].array(library="np") > 0)
mask2=(reg["recopt1"].array(library="np") > 0)
mask3=(reg["genpt2"].array(library="np") > 0)
mask4=(reg["recopt2"].array(library="np") > 0)
mask = np.logical_and.reduce([mask1,mask2,mask3,mask4])
x=torch.from_numpy(makedataset(mask))
yb=torch.from_numpy(reg["recopt1"].array(library="np")[mask].reshape(len(reg["recopt1"].array(library="np")[mask]),1))
y=torch.from_numpy(reg["genpt1"].array(library="np")[mask].reshape(len(reg["genpt1"].array(library="np")[mask]),1))
ratio=torch.div(y,yb)
y=ratio
x,y = Variable(x),Variable(y)
torch_dataset = Data.TensorDataset(x, y)
#print(x)

In [None]:
#>>>RUN: L16.4-runcell02

ds=makedataset(mask)
colors = ['g','r','b','y','orange']
for i0 in range(5):
    for ipart in range(5):
        plt.scatter(ds[i0,4*ipart+1], ds[i0,4*ipart+2], s=ds[i0,4*ipart]*5000/yb[i0], c=colors[ipart], alpha=0.5)
    plt.xlim(-0.5,0.5)
    plt.ylim(-0.5,0.5)
    plt.xlabel(r"$\eta$")
    plt.ylabel(r"$\phi$")
    plt.text(-0.3,0.4,"Correction Factor "+str(ratio[i0].numpy()[0]))
    plt.show()

In [None]:
#>>>RUN: L16.4-runcell03

#now let's see if we can improve this with something more complicated
torch.manual_seed(1)    # reproducible
net = torch.nn.Sequential(
        torch.nn.Linear(20, 200),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(200, 200),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(200, 50),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(50, 1),
    )
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
loss_func = torch.nn.MSELoss()

In [None]:
#>>>RUN: L16.4-runcell04

p1=torch.from_numpy(makedataset(mask,"part1"))
p2=torch.from_numpy(makedataset(mask,"part2"))

def masscomputeNN(iC1,iC2,iVec1,iVec2):
    corr1 = iC1*iVec1[0]
    corr2 = iC2*iVec2[0]
    tau_1 = Momentum4.m_eta_phi_pt(iVec1[3], iVec1[1], iVec1[2], corr1)
    tau_2 = Momentum4.m_eta_phi_pt(iVec2[3], iVec2[1], iVec2[2], corr2)
    return (tau_1+tau_2).m

#now let's compute the corrected mass on the data set
def hmass(masscomputeNN):
    corr1=net(p1)
    corr2=net(p2)
    varlist=["recopt1","recoeta1","recophi1","recomass1","recopt2","recoeta2","recophi2","recomass2"]
    arr=np.vstack((corr1.data.numpy().T,corr2.data.numpy().T))
    for x in varlist:
        pArr=reg[x].array(library="np")[mask]
        arr=np.vstack((arr,pArr))
    arr = arr.T
    massc = lambda iarr: masscomputeNN(iarr[0],iarr[1],iarr[2:6],iarr[6:10]) 
    hmasses = np.array([massc(p) for p in arr])
    return hmasses

#now update to add history
history_lr = {'loss':[], 'val_loss':[]}
def train(x,y,net,loss_func,opt,nepochs,ymin,ymax):
    images = []
    fig, ax = plt.subplots(figsize=(12,7))
    for epoch in range(nepochs):
        if epoch % 50 == 0: 
            print("epoch:",epoch)
        prediction = net(x)
        loss = loss_func(prediction, y) 
        opt.zero_grad()
        loss.backward() 
        opt.step()  # step the optimizer passed to train(), not the global one
        with torch.no_grad():#disable updating gradient
            if epoch % 50 == 0:
                print('[%d] loss: %.4f ' % (epoch + 1, loss ))
            history_lr['loss'].append(loss)
        if epoch % 5 == 0:
            makePlot(x,y,prediction,ax,fig,images,epoch,loss,ymin,ymax)
    return images

rawmass=hmass(masscomputeNN)
plt.hist(rawmass,bins=50,range=(0,500),color='blue',alpha=0.5,label="Adding the NN")
plt.xlabel("mass(GeV)")
plt.ylabel("N$_{events}$")
plt.show()

In [None]:
#>>>RUN: L16.4-runcell05

#NOTE: if training does not complete due to timeout in Colab,
#reduce the number of epochs to 250 and run this cell twice,
#or reduce to 125 and run this cell four times,
#for a total of 500 training epochs
images=train(x,y,net,loss_func,optimizer,500,0,1)
torch.save(net.state_dict(), 'data/L16/tau_reg_fullpart.pt')
imageio.mimsave('data/L16/full_reg2.gif', images, fps=12)
Image(open('data/L16/full_reg2.gif','rb').read())

In [None]:
#>>>RUN: L16.4-runcell06

tmploss=[]
for i0 in range(len(history_lr['loss'])):
    tmploss.append(history_lr['loss'][i0].detach().numpy())
plt.semilogy(tmploss, label='loss')
plt.legend(loc="upper right")
plt.xlabel('epoch')
plt.ylabel('loss (MSE)')
plt.show()

In [None]:
#>>>RUN: L16.4-runcell07

def plotcorr(iVar,iNN,iMin,iMax,iColor,iLabel,iCorr=True): 
    corr1=iNN(p1)
    data=reg[iVar].array(library="np")[mask]
    if iCorr:
        data=data*corr1.data.numpy().T
    counts, binEdges = np.histogram(data,bins=50,range=(iMin,iMax),density=False)
    binCenters = (binEdges[1:]+binEdges[:-1])*.5
    err = np.sqrt(counts)
    plt.errorbar(binCenters, counts, yerr=err,fmt="o",c=iColor, ms=3,label=iLabel)
    
plotcorr("genpt1" ,net,0,200,"black","gen",False)
plotcorr("recopt1",net,0,200,"red","reco",False)
plotcorr("recopt1",net,0,200,"blue","corrected")
plt.legend()
plt.show()

<a name='exercises_16_4'></a>     

| [Top](#section_16_0) | [Restart Section](#section_16_4) | [Next Section](#section_16_5) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 16.4.1</span>

Let's see how well this works by looking at the correction for the second particle, which was not included in the training. Run the code cell below in order to compare the NN correction with the true correction (use the state of the network after training for 500 epochs, i.e., after having run `L16.4-runcell05`). Report your answer as a list of two numbers with precision 1e-2: `[NN correction, true correction]`

In [None]:
#>>>EXERCISE: L16.4.1

def plotcorrp2(iTrueVar,iRecoVar,iNN,iMin,iMax,iColor,iLabel,iCorr=True): 
    corr1=iNN(p2).detach().numpy()    
    true=reg[iTrueVar].array(library="np")[mask]
    reco=reg[iRecoVar].array(library="np")[mask]
    corr2=true/reco
    counts, binEdges = np.histogram(corr1,bins=50,range=(iMin,iMax),density=False)
    binCenters = (binEdges[1:]+binEdges[:-1])*.5
    err = np.sqrt(counts)
    plt.errorbar(binCenters, counts, yerr=err,fmt="o",c=iColor, ms=3,label="NN")
    counts, binEdges = np.histogram(corr2,bins=50,range=(iMin,iMax),density=False)
    err = np.sqrt(counts)  # recompute the errors for the "True" histogram
    plt.errorbar(binCenters, counts, yerr=err,fmt="o",c="Black", ms=3,label="True")
    print("NN   Mean : ",np.mean(corr1),"\t RMS: ",corr1.std())
    print("True Mean : ",np.mean(corr2),"\t RMS: ",corr2.std())
    
plotcorrp2("genpt2","recopt2" ,net,0,4,"green","gen",False)
plt.legend()
plt.show()

<a name='section_16_5'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L16.5 Tuning the NN Architecture</h2>     

| [Top](#section_16_0) | [Previous Section](#section_16_4) | [Exercises](#exercises_16_5) |


In [None]:
#>>>RUN: L16.5-runcell01

torch.manual_seed(1)    # reproducible

class LSTM(nn.Module):

    def __init__(self, num_classes, input_size, hidden_size, num_layers):
        super(LSTM, self).__init__()
        
        self.num_classes = num_classes
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        
        self.fc1 = nn.Linear(hidden_size, 20)
        self.fc2 = nn.Linear(20, num_classes)

    def forward(self, x):
        h_0 = Variable(torch.zeros(
            self.num_layers, x.size(0), self.hidden_size))
        
        c_0 = Variable(torch.zeros(
            self.num_layers, x.size(0), self.hidden_size))
        
        # Propagate input through LSTM
        ula, (h_out, _) = self.lstm(x, (h_0, c_0))
        
        h_out = h_out.view(-1, self.hidden_size)
        
        out = self.fc1(h_out)
        out = F.relu(out)
        out = self.fc2(out)
        return out

input_size = 4 # features per particle: pt, eta, phi, id
hidden_size = 128 # size of the LSTM hidden state
num_layers = 1 # number of stacked LSTM layers
num_classes = 1 # output values (just 1, the correction)
lstm = LSTM(num_classes, input_size, hidden_size, num_layers)
criterion = torch.nn.MSELoss()    # mean-squared error for regression
optimizer = torch.optim.Adam(lstm.parameters(), lr=0.01)
lstm.train()

In [None]:
#>>>RUN: L16.5-runcell02

def makedatasetrnn(iMask,iPart="part1"):
    arr=makedataset(iMask,iPart)
    return arr.reshape(len(arr),5,4)

x=torch.from_numpy(makedatasetrnn(mask))

In [None]:
#>>>RUN: L16.5-runcell03

p1=torch.from_numpy(makedatasetrnn(mask,"part1"))
p2=torch.from_numpy(makedatasetrnn(mask,"part2"))

#now let's compute the corrected mass on the data set
def hmass(masscomputeNN):
    # use the global mask (also used to build p1 and p2) so that all rows stay aligned
    corr1=lstm(p1)
    corr2=lstm(p2)
    varlist=["recopt1","recoeta1","recophi1","recomass1","recopt2","recoeta2","recophi2","recomass2"]
    arr=np.vstack((corr1.data.numpy().T,corr2.data.numpy().T))
    for x in varlist:
        pArr=reg[x].array(library="np")[mask]
        arr=np.vstack((arr,pArr))
    arr = arr.T
    massc = lambda iarr: masscomputeNN(iarr[0],iarr[1],iarr[2:6],iarr[6:10]) 
    hmasses = np.array([massc(p) for p in arr])
    return hmasses

outmass=hmass(masscomputeNN)
plt.hist(outmass,bins=40,range=(0,250),color='blue')
plt.xlabel("mass")
plt.ylabel("N$_{events}$")
plt.show()

In [None]:
#>>>RUN: L16.5-runcell04

#NOTE: if training does not complete due to timeout in Colab,
#reduce the number of epochs to 250 and run this cell twice,
#or reduce to 125 and run this cell four times,
#for a total of 500 training epochs
history_lr = {'loss':[], 'val_loss':[]}
images=train(x,y,lstm,criterion,optimizer,500,0,1)
torch.save(lstm.state_dict(), 'data/L16/tau_reg_lstm.pt')
imageio.mimsave('data/L16/reg_lstm.gif', images, fps=12)
Image(open('data/L16/reg_lstm.gif','rb').read())


In [None]:
#>>>RUN: L16.5-runcell05

tmploss=[]

for i0 in range(len(history_lr['loss'])):
    tmploss.append(history_lr['loss'][i0].detach().numpy())
plt.semilogy(tmploss, label='loss')
plt.legend(loc="upper right")
plt.xlabel('epoch')
plt.ylabel('loss (MSE)')
plt.show()

In [None]:
#>>>RUN: L16.5-runcell06

def plotcorr(iVar,iNN,iMin,iMax,iColor,iLabel,iCorr=True):
    corr1=iNN(p1)
    data=reg[iVar].array(library="np")[mask]
    if iCorr:
        data=data*corr1.data.numpy().T
    counts, binEdges = np.histogram(data,bins=50,range=(iMin,iMax),density=False)
    binCenters = (binEdges[1:]+binEdges[:-1])*.5
    err = np.sqrt(counts)
    plt.errorbar(binCenters, counts, yerr=err,fmt="o",c=iColor, ms=3,label=iLabel)
    
plotcorr("genpt1" ,lstm,0,200,"black","gen",False)
plotcorr("recopt1",lstm,0,200,"red","reco",False)
plotcorr("recopt1",lstm,0,200,"blue","corrected")
plt.legend()
plt.show()

<a name='exercises_16_5'></a>   

| [Top](#section_16_0) | [Restart Section](#section_16_5) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 16.5.1</span>

Complete the code cell below to compute the correlation coefficient between the LSTM-corrected momentum and the gen momentum. How does this compare to our initial correlation before using the neural network? Use the state of the network after training for 500 epochs, i.e., after having run `L16.5-runcell04`.

Enter your answer as a list of numbers with precision 1e-2: `[corr-original, corr-NN]`
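
One property worth keeping in mind when interpreting these numbers: the Pearson coefficient does not change under a constant positive rescaling. The toy sketch below illustrates this with hypothetical stand-ins for the gen and reco momenta and two made-up corrections: one constant factor, and one event-by-event factor that is assumed to be right up to 10% noise. None of these numbers come from the dataset.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_events = 5_000

# Toy stand-ins (assumptions) for the gen and reco momenta [GeV]
gen_pt = rng.uniform(20.0, 100.0, size=n_events)
reco_pt = gen_pt * rng.uniform(0.3, 1.0, size=n_events)

# Case 1: the same correction factor for every event
constant_corrected = 1.5 * reco_pt

# Case 2: a hypothetical event-by-event correction, right up to 10% noise
per_event_factor = (gen_pt / reco_pt) * rng.normal(1.0, 0.1, size=n_events)
per_event_corrected = per_event_factor * reco_pt

corr_original = np.corrcoef(gen_pt, reco_pt)[0, 1]
corr_constant = np.corrcoef(gen_pt, constant_corrected)[0, 1]
corr_per_event = np.corrcoef(gen_pt, per_event_corrected)[0, 1]

print("original  :", corr_original)
print("constant  :", corr_constant)
print("per event :", corr_per_event)
```

In this toy, the constant factor leaves the correlation unchanged, so any improvement in the correlation must come from a correction that varies from event to event.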

In [None]:
#>>>EXERCISE: L16.5.1
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

pred=lstm(p1)
ytmp=y.detach().numpy().flatten()*(yb.detach().numpy().flatten())
ptmp=pred.detach().numpy().flatten()*(yb.detach().numpy().flatten())

print("Pre Correlation:", #YOUR CODE HERE)
print("NN Correlation:", #YOUR CODE HERE)