<hr style="height: 1px;">
<i>This notebook was authored by the 8.S50x Course Team, Copyright 2022 MIT All Rights Reserved.</i>
<hr style="height: 1px;">
<br>

<h1>Lesson 14: An Example With LHC Data</h1>


<a name='section_14_0'></a>
<hr style="height: 1px;">


## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L14.0 Overview</h2>


<h3>Navigation</h3>

<table style="width:100%">
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_14_1">L14.1 Large Hadron Collider Data</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_14_1">L14.1 Exercises</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_14_2">L14.2 Loading Data and Defining the Network</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_14_2">L14.2 Exercises</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_14_3">L14.3 Training and Testing the Network</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_14_3">L14.3 Exercises</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_14_4">L14.4 Adding a Hidden Layer</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_14_4">L14.4 Exercises</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_14_5">L14.5 Regularization, Batch Normalization, and Dropout</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_14_5">L14.5 Exercises</a></td>
    </tr>
</table>



<h3>Data</h3>

>description: CMS Crystal Shower Shape Data<br>
>source: https://zenodo.org/record/8035308 <br>
>attribution: Rankin, Dylan (CMS Collaboration), DOI:10.5281/zenodo.8035308 

In [None]:
#>>>RUN: L14.0-runcell00

# NOTE: these files are too large to include in the original repository, so you must download them from here:
# https://www.dropbox.com/s/i1dbakzr3pn9twd/xtalTuple_TTbar_PU0.z?dl=0
#
# Ways to download:
#     1. Copy/paste the link (replace =0 with =1 to download automatically)
#     2. Use the wget commands below (works in Colab, but you may need to install wget if using locally)
#
# Location of files:
#     Move the files to the directory data/L14
#
# Using wget: (works in Colab)
#     Upon downloading, the code below will move them to the appropriate directory

#get the data
!wget -P data/L14 https://www.dropbox.com/s/i1dbakzr3pn9twd/xtalTuple_TTbar_PU0.z?dl=0
!mv data/L14/xtalTuple_TTbar_PU0.z?dl=0 data/L14/xtalTuple_TTbar_PU0.z 

In [None]:
#>>>RUN: L14.0-runcell01

#If using notebooks locally, run the following within your conda environment (if not done already)
#conda install pandas

import numpy as np               #https://numpy.org/doc/stable/
import matplotlib.pyplot as plt  #https://matplotlib.org/3.5.3/api/_as_gen/matplotlib.pyplot.html
import h5py                      #https://docs.h5py.org/en/stable/quick.html#quick
import pandas as pd              #https://pandas.pydata.org/docs/user_guide/index.html
import torch                     #https://pytorch.org/docs/stable/torch.html

In [None]:
#>>>RUN: L14.0-runcell02

#set plot resolution
%config InlineBackend.figure_format = 'retina'

#set default figure parameters
plt.rcParams['figure.figsize'] = (9,6)

medium_size = 12
large_size = 15

plt.rc('font', size=medium_size)          # default text sizes
plt.rc('xtick', labelsize=medium_size)    # xtick labels
plt.rc('ytick', labelsize=medium_size)    # ytick labels
plt.rc('legend', fontsize=medium_size)    # legend
plt.rc('axes', titlesize=large_size)      # axes title
plt.rc('axes', labelsize=large_size)      # x and y labels
plt.rc('figure', titlesize=large_size)    # figure title


<a name='section_14_1'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L14.1 Large Hadron Collider Data</h2>  

| [Top](#section_14_0) | [Previous Section](#section_14_0) | [Exercises](#exercises_14_1) | [Next Section](#section_14_2) |


In [None]:
#>>>RUN: L14.1-slides

from IPython.display import IFrame
IFrame(src='https://mitx-8s50.github.io/slides/L14/slides_L14_01.html', width=970, height=550)

<a name='exercises_14_1'></a>     

| [Top](#section_14_0) | [Restart Section](#section_14_1) | [Next Section](#section_14_2) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Ex-14.1.1</span>

The CMS electromagnetic calorimeter (ECAL) is designed to identify photons and electrons. However, other particles can mimic them. In particular, pions can leave large energy deposits in the ECAL. Charged pions produce a charged track and a shower in the calorimeter. These are usually not a problem, since they can be identified by the energy they also deposit in the Hadron Calorimeter behind the ECAL.

Neutral pions, on the other hand, decay into two photons that are nearly collinear. In fact, the two photons are typically so close together that they look like a single photon. The problem we would like to solve is separating neutral pion decays from photons that come directly from the original collision. Suppose we are looking for a process that decays to photons, for example the Higgs decay to two well-separated photons. Selecting the Higgs involves selecting two photons on top of backgrounds from *fake* photons. What could a neural network do to remove fake photons? 

A) Reduce the background by eliminating fake events that are produced from pions. This is done by selecting events that have a large probability of containing TWO real photons.\
B) Do nothing to the background, just help to make suggestions as to what is more likely background.\
C) Generate a weight for each event quantifying the likelihood that it contains real photons. This weight can be used to look for the Higgs.\
D) Reduce the background by eliminating fake events that are produced from pions. This is done by selecting events that have a large probability of containing ONE real photon.
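
To put a rough number on how close the two photons from a neutral pion decay can be, the sketch below evaluates the minimum opening angle, $\theta_{min} = 2\arcsin(m_{\pi^0}/E)$, which occurs when the two photons share the pion energy equally. The barrel radius and crystal width are approximate values assumed here for illustration only, and the helper names are hypothetical (they are not used elsewhere in this notebook).

```python
import math

M_PI0_GEV = 0.1349768     # neutral pion mass in GeV
ECAL_RADIUS_M = 1.29      # approximate barrel ECAL inner radius (assumption)
CRYSTAL_WIDTH_CM = 2.2    # approximate crystal front-face width (assumption)

def min_opening_angle(energy_gev):
    """Minimum angle (rad) between the two photons from pi0 -> gamma gamma."""
    return 2.0 * math.asin(M_PI0_GEV / energy_gev)

def min_separation_cm(energy_gev, radius_m=ECAL_RADIUS_M):
    """Smallest photon separation (cm) at the calorimeter face (arc-length estimate)."""
    return min_opening_angle(energy_gev) * radius_m * 100.0

for energy in (1.0, 5.0, 10.0, 50.0):
    sep = min_separation_cm(energy)
    print(f"E = {energy:5.1f} GeV: theta_min = {min_opening_angle(energy):.4f} rad, "
          f"separation ~ {sep:.2f} cm ({sep / CRYSTAL_WIDTH_CM:.2f} crystal widths)")
```

The separation shrinks roughly as $1/E$, so the higher the pion energy, the more the two photons merge into a single cluster of crystals.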



### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Ex-14.1.2</span>

If our dominant background comes from pions that decay into two nearby photons, what would allow us to discriminate these cases from real photons? Select all that apply:

A) Calorimeter shapes that look like two energy blobs in the cells.\
B) Wider calorimeter shapes.\
C) A single high energy deposit.\
D) Calorimeter shapes that are wider in a specific direction.



<a name='section_14_2'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L14.2 Loading Data and Defining the Network</h2>  

| [Top](#section_14_0) | [Previous Section](#section_14_1) | [Exercises](#exercises_14_2) | [Next Section](#section_14_3) |


In [None]:
#>>>RUN: L14.2-runcell01

import h5py
import pandas as pd


treename = 'l1pf_egm_reg'

VARS = ['pt', 'eta', 'phi', 'energy',
  'e2x2', 'e2x5', 'e3x5', 'e5x5', 'e2x2_div_e2x5', 'e2x2_div_e5x5', 'e2x5_div_e5x5',#7
  'hoE', 'bremStrength', 'ecalIso', 'crystalCount',#4
  'lowerSideLobePt','upperSideLobePt',#2
  'phiStripContiguous0', 'phiStripOneHole0', 'phiStripContiguous3p', 'phiStripOneHole3p',#4
  'sihih','sipip','sigetaeta','sigphiphi','sigetaphi',#5
  'e_m2_m2','e_m2_m1','e_m2_p0','e_m2_p1','e_m2_p2',
  'e_m1_m2','e_m1_m1','e_m1_p0','e_m1_p1','e_m1_p2',
  'e_p0_m2','e_p0_m1','e_p0_p0','e_p0_p1','e_p0_p2',
  'e_p1_m2','e_p1_m1','e_p1_p0','e_p1_p1','e_p1_p2',
  'e_p2_m2','e_p2_m1','e_p2_p0','e_p2_p1','e_p2_p2',#^25
  'h_m1_m1','h_m1_p0','h_m1_p1',
  'h_p0_m1','h_p0_p0','h_p0_p1',
  'h_p1_m1','h_p1_p0','h_p1_p1',#^9
  'gen_match']

filename = 'data/L14/xtalTuple_TTbar_PU0.z'

h5file = h5py.File(filename, 'r') # open read-only
params = h5file[treename][()]

df = pd.DataFrame(params,columns=VARS)

TODROP = [
  'e2x2_div_e2x5', 'e2x2_div_e5x5', 'e2x5_div_e5x5',#7
  'e_m2_m2','e_m2_m1','e_m2_p0','e_m2_p1','e_m2_p2',
  'e_m1_m2','e_m1_m1','e_m1_p0','e_m1_p1','e_m1_p2',
  'e_p0_m2','e_p0_m1','e_p0_p0','e_p0_p1','e_p0_p2',
  'e_p1_m2','e_p1_m1','e_p1_p0','e_p1_p1','e_p1_p2',
  'e_p2_m2','e_p2_m1','e_p2_p0','e_p2_p1','e_p2_p2',#^25
  'h_m1_m1','h_m1_p0','h_m1_p1',
  'h_p0_m1','h_p0_p0','h_p0_p1',
  'h_p1_m1','h_p1_p0','h_p1_p1',#^9
]

df = df.drop(TODROP, axis=1) #remove custom variables

#normalize the shower shapes by energy
for ie in ['e2x2', 'e2x5', 'e3x5', 'e5x5']:
    df[ie] /= df['energy']

#add some labels
df['isPU'] = pd.Series(df['gen_match']==0, index=df.index, dtype='i4')
df['isEG'] = pd.Series(df['gen_match']==1, index=df.index, dtype='i4')

#now select the dataset based on their transverse momentum (pt)
MINPT = 0.5
MAXPT = 100.
df = df.loc[(df['pt']>MINPT) & (MAXPT>df['pt']) & (1.3>abs(df['eta']))]
df = df.fillna(0.) #replace missing values (avoids modifying a slice in place)

#take a fixed number of events
df0 = df[df['gen_match']==0].head(100000)
df1 = df[df['gen_match']==1].head(10000)

df = pd.concat([df0, df1], ignore_index=True)
df = df.sample(frac=1).reset_index(drop=True)
col_names = list(df.columns)

#Now let's check it all
print(df)
print(sum(df['gen_match']==0))
print(sum(df['gen_match']==1))

In [None]:
#>>>RUN: L14.2-runcell02

col_names = list(df.columns)
print(col_names)

fig, axs = plt.subplots(len(col_names),1,figsize=(4,4*len(col_names)))
for ix,ax in enumerate(axs):
    ax.hist(df[col_names[ix]][df['gen_match']==0],bins=np.linspace(np.min(df[col_names[ix]]),np.max(df[col_names[ix]]),20),histtype='step',color='r',density=True)
    ax.hist(df[col_names[ix]][df['gen_match']==1],bins=np.linspace(np.min(df[col_names[ix]]),np.max(df[col_names[ix]]),20),histtype='step',color='b',density=True)
    ax.set_xlabel(col_names[ix])

plt.show()

In [None]:
#>>>RUN: L14.2-runcell03


dataset = df.values

X = dataset[:,4:-3]
#skip the first 4 columns (pt, eta, phi, energy) and the last 3 columns (labels)
ninputs = len(list(df.columns))-3-4

Y = dataset[:,-1:]
#the last column (isEG) will be used as the label

test_frac = 0.3
val_frac = 0.2

alldataset = torch.utils.data.TensorDataset(torch.tensor(X, dtype=torch.float32), torch.tensor(Y, dtype=torch.float32))

#compute the split sizes so that they always sum to the dataset length
n_test = int(len(Y)*test_frac)
n_trainval = len(Y) - n_test
n_val = int(n_trainval*val_frac)
n_train = n_trainval - n_val

torch.random.manual_seed(42) # fix a random seed for reproducibility
testdataset, trainvaldataset = torch.utils.data.random_split(
    alldataset, [n_test, n_trainval])

torch.random.manual_seed(42) # fix a random seed for reproducibility
traindataset, valdataset = torch.utils.data.random_split(
    trainvaldataset, [n_train, n_val])

testloader = torch.utils.data.DataLoader(testdataset,
                                          num_workers=6,
                                          batch_size=500,
                                          shuffle=False)
trainloader = torch.utils.data.DataLoader(traindataset,
                                          num_workers=6,
                                          batch_size=500,
                                          shuffle=True)
valloader = torch.utils.data.DataLoader(valdataset,
                                        num_workers=6,
                                        batch_size=500,
                                        shuffle=False)


In [None]:
#>>>RUN: L14.2-runcell04

class LR_net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(ninputs,1)
        self.output = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.output(x)
        return x
        
torch.random.manual_seed(42)  # fix a random seed for reproducibility

model_lr = LR_net()
print(model_lr)
print('----------')
print(model_lr.state_dict())

<a name='exercises_14_2'></a>     

| [Top](#section_14_0) | [Restart Section](#section_14_2) | [Next Section](#section_14_3) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Ex-14.2.1</span>

Consider the variables that you plotted in `L14.2-runcell02`. Among the following variables, which appear to have high discrimination power? In other words, in which plots is the background (red) distinguishable from the egamma (blue), meaning there is a relatively small overlap between the red and blue histograms? Select all that apply.

A) pt\
B) eta\
C) phi\
D) energy\
E) e2x2\
F) sihih


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Ex-14.2.2</span>

How are training, validation, and testing data used in machine learning model development?

A) Training data is used to evaluate the model's performance, validation data is used to select the best hyperparameters, and testing data is used to train the model.\
B) Training data is used to train the model, validation data is used to tune the hyperparameters, and testing data is used to evaluate the model's performance.\
C) Training data is used to tune the hyperparameters, validation data is used to evaluate the model's performance, and testing data is used to train the model.\
D) All three datasets are used interchangeably to train, tune, and evaluate the model.
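
For reference, the bookkeeping behind the three-way split in `L14.2-runcell03` can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `three_way_split` is hypothetical, it shuffles integer indices rather than a PyTorch dataset, and it reuses the notebook's `test_frac=0.3` and `val_frac=0.2` as defaults.

```python
import random

def three_way_split(n_events, test_frac=0.3, val_frac=0.2, seed=42):
    """Return shuffled index lists (train, val, test) whose sizes sum to n_events."""
    n_test = int(n_events * test_frac)
    n_val = int((n_events - n_test) * val_frac)
    # n_train is whatever remains, so no event is lost to rounding

    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)   # seeded shuffle for reproducibility

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = three_way_split(110000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Taking the training set as the remainder guarantees that the three subsets are disjoint and together cover every event exactly once.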

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Ex-14.2.3</span>

In the one-layer neural network that we have defined in this section (`LR_net` from `L14.2-runcell04`), we are using 19 input features. How many weights does this neural network have? Enter your answer as an integer.

Extra: How is this different from the two-weight model we were using in the last Lesson?
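
As a general aid for questions like this, recall that a fully connected layer such as `torch.nn.Linear(n_in, n_out)` stores one weight per input-output connection plus, by default, one bias per output node. The sketch below counts both for a stack of dense layers. The function name and the layer sizes `[4, 3, 1]` are hypothetical, chosen only for illustration, and are not the sizes used in this Lesson.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of each layer, inputs first, e.g. [4, 3, 1].
    """
    n_weights = 0
    n_biases = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        n_weights += n_in * n_out   # one weight per input-output connection
        n_biases += n_out           # one bias per output node
    return n_weights, n_biases

# Hypothetical layer sizes, chosen only for illustration
weights, biases = count_dense_parameters([4, 3, 1])
print("weights:", weights, "biases:", biases)
```

The same counting can be cross-checked against the shapes printed by `state_dict()` for any model defined in this notebook.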

<a name='section_14_3'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L14.3 Training and Testing the Network</h2>  

| [Top](#section_14_0) | [Previous Section](#section_14_2) | [Exercises](#exercises_14_3) | [Next Section](#section_14_4) |


In [None]:
#>>>RUN: L14.3-runcell01

criterion = torch.nn.BCELoss()
optimizer_lr = torch.optim.Adam(model_lr.parameters(), lr=0.003) 

history_lr = {'loss':[], 'val_loss':[]}

for epoch in range(20):

    current_loss = 0.0 #rezero loss
    
    for i, data in enumerate(trainloader):

        inputs, labels = data
        
        # zero the parameter gradients
        optimizer_lr.zero_grad()

        # forward + backward + optimize (training magic)
        # This will use the pytorch autograd feature to adjust the
        ## parameters of our function to minimize the loss
        outputs = model_lr(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer_lr.step()
        
        # add loss statistics
        current_loss += loss.item()
        
        if i == len(trainloader)-1:
            current_val_loss = 0.0
            with torch.no_grad():#disable updating gradient
                for iv, vdata in enumerate(valloader):
                    val_inputs, val_labels = vdata
                    val_loss = criterion(model_lr(val_inputs), val_labels)
                    current_val_loss += val_loss.item()
            print('[%d, %4d] loss: %.4f  val loss: %.4f' % 
                  (epoch + 1, i + 1, current_loss/float(i+1) , current_val_loss/float(len(valloader))))

            history_lr['loss'].append(current_loss/float(i+1))
            history_lr['val_loss'].append(current_val_loss/float(len(valloader)))
            
print('Finished Training')
torch.save(model_lr.state_dict(), 'data/L14/lr_model.pt')
print(model_lr.state_dict())

In [None]:
#>>>RUN: L14.3-runcell02

plt.semilogy(history_lr['loss'], label='loss')
plt.semilogy(history_lr['val_loss'], label='val_loss')
plt.legend(loc="upper right")
plt.xlabel('epoch')
plt.ylabel('loss (binary crossentropy)')
plt.show()

In [None]:
#>>>RUN: L14.3-runcell03

def train(model,trainloader,valloader,nepochs=100,lr=0.003,l2reg=0.,patience=5,name=None):

    criterion = torch.nn.BCELoss()
    
    #NOTE: l2 regularization is set to 0 by default,
    #but we will address this in later sections
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=l2reg) 

    history = {'loss':[], 'val_loss':[]}

    min_loss = 999999.
    min_epoch = 0
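    #clone the tensors: state_dict() returns references that keep changing during training
    min_model = {k: v.clone() for k, v in model.state_dict().items()}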
    should_stop = False
    
    for epoch in range(nepochs):

        current_loss = 0.0 #rezero loss

        for i, data in enumerate(trainloader):

            inputs, labels = data

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            # This will use the pytorch autograd feature to adjust the
            ## parameters of our function to minimize the loss
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # add loss statistics
            current_loss += loss.item()

            if i == len(trainloader)-1:
                current_val_loss = 0.0
                with torch.no_grad():#disable updating gradient
                    model.eval() #place model in evaluation state
                                ## necessary for some layer types (like dropout)
                    for iv, vdata in enumerate(valloader):
                        val_inputs, val_labels = vdata
                        val_loss = criterion(model(val_inputs), val_labels)
                        current_val_loss += val_loss.item()
                    model.train() #return to training state
                current_loss = current_loss/float(i+1)
                current_val_loss = current_val_loss/float(len(valloader))
                print('[%d, %4d] loss: %.4f  val loss: %.4f' % 
                      (epoch + 1, i + 1, current_loss , current_val_loss))

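                if current_val_loss < min_loss:
                    min_loss = current_val_loss
                    #clone the tensors so the stored best weights are not overwritten by later updates
                    min_model = {k: v.clone() for k, v in model.state_dict().items()}
                    min_epoch = epoch
                elif epoch-min_epoch>=patience:
                    #early stopping: no improvement for `patience` epochs, so restore the best weights
                    model.load_state_dict(min_model)
                    should_stop = True
                    break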

                history['loss'].append(current_loss)
                history['val_loss'].append(current_val_loss)
                
            if should_stop:
                break

    print('Finished Training')
    if name is not None:
        filename_save = 'data/L14/' + name + '.pt'
        torch.save(model.state_dict(), filename_save)
    return history

In [None]:
history_lr = train(model_lr,trainloader,valloader,name='lr_model')

In [None]:
#>>>RUN: L14.3-runcell04

plt.semilogy(history_lr['loss'], label='loss')
plt.semilogy(history_lr['val_loss'], label='val_loss')
plt.legend(loc="upper right")
plt.xlabel('epoch')
plt.ylabel('loss (binary crossentropy)')
plt.show()

In [None]:
#>>>RUN: L14.3-runcell05

def apply(model, testloader):
    with torch.no_grad():
        model.eval()
        outputs = []
        labels = []
        for data in testloader:
            test_inputs, test_labels = data
            outputs.append(model(test_inputs).numpy())
            labels.append(test_labels.numpy())
        model.train()

        Y_test_predict = outputs
        Y_test = labels

    Y_test_predict = np.concatenate(Y_test_predict)
    Y_test = np.concatenate(Y_test)
    
    return Y_test_predict,Y_test

Y_test_predict_lr, Y_test = apply(model_lr, testloader)

print(Y_test_predict_lr.shape)
print(Y_test.shape)

In [None]:
#>>>RUN: L14.3-runcell06

plt.hist(Y_test_predict_lr[Y_test==0],histtype='step',color='r',density=True)
plt.hist(Y_test_predict_lr[Y_test==1],histtype='step',color='b',density=True)
plt.xlabel('Logistic Regression Discriminant')
plt.show()

<a name='exercises_14_3'></a>     

| [Top](#section_14_0) | [Restart Section](#section_14_3) | [Next Section](#section_14_4) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Ex-14.3.1</span>

Which of the following features indicates that training is NOT performing successfully? Select all that apply:

A) The loss for the validation data differs significantly from that for the training data.\
B) The loss as a function of epoch flattens out for both data sets.\
C) The loss as a function of epoch is shifted slightly for the validation data compared to the training data.\
D) The loss as a function of epoch continues to decrease for the training data set, but remains constant for the validation data set.

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Ex-14.3.2</span>

We can approximate the uncertainty on the loss by assuming that the number of events in both datasets follows a Poisson distribution. Using this concept, complete the code below to calculate the statistical disagreement between the training loss and validation loss. Specifically, write a function that returns the difference between the losses in terms of the number of standard deviations.

**Extra:** How significant is the difference after the last epoch? Try plotting this as a function of epoch!

**Note:** The related plot may yield wildly different results, depending on how your network runs. Here we focus on how to write the function, instead of analyzing the output of the plot.

In [None]:
#>>>EXERCISE: L14.3.2
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

def num_stdev(iNTrain,iNVal,loss1_array, loss2_array):
    #convert from list to array
    loss1_array = np.array(loss1_array)
    loss2_array = np.array(loss2_array)
    
    sigma_loss1 = #YOUR CODE HERE (the stdev of the training loss)
    sigma_loss2 = #YOUR CODE HERE (the stdev of the validation loss)
    
    #the combined uncertainty
    sigma_tot = np.sqrt(sigma_loss1**2. + sigma_loss2**2.) 
    
    #the difference in losses
    delta = loss2_array-loss1_array
    
    #calculate the difference in terms of number of standard deviations
    diff = abs(delta/sigma_tot) 
    
    return diff

#plot
#----------------------------------------------------
N_train = len(trainloader)*trainloader.batch_size
N_val   = len(valloader)*valloader.batch_size #the number of rows in the data set
diff_sig = num_stdev(N_train,N_val,history_lr['loss'], history_lr['val_loss'])
print("Significance of last epoch",diff_sig[-1])


plt.plot(np.arange(len(diff_sig)),diff_sig)
plt.xlabel("epoch")
plt.ylabel(r"|val-train|/$\sigma$")
plt.show()

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Ex-14.3.3</span>

A common way to select events based on the final discriminator value from the neural network is to make normalized histograms (i.e., histograms with the same integral), as in the previous examples, and then keep only events above the point where the signal and background histograms cross. For the histogram produced by code cell `L14.3-runcell06`, what fraction of egamma events (blue) and of background events (red) lies above the bin of intersection (you should see this intersection occur at a value of `intersection_bin = 0.20`)?

Report your answer as a list of numbers with precision 1e-2: `[frac EG, frac PU]`


In [None]:
#>>>EXERCISE: L14.3.3
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

#determine fraction of events above intersection
intersection_bin = 0.20
EG_frac = #YOUR CODE HERE
PU_frac = #YOUR CODE HERE

print("EG:", EG_frac)
print("PU:", PU_frac)

>#### Follow-up 14.3.3a (ungraded)
>
>Define a two-weight network, as we had in the last Lesson, and train it on the data. What do the resulting weights look like? Can you make a 1D histogram of the separation? Try varying the smearing parameter: how do things change?
>
>**NOTE:** Be sure to label your classes, functions, and outputs differently. We will continue to use the results from above, so do not get your previous results confused with your results from this follow-up exercise!

In [None]:
#>>>EXERCISE: L14.3.3a
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

class LR_net_2(torch.nn.Module):
    #YOUR CODE HERE

        
model_lr_2 = LR_net_2()
print(model_lr_2)
print('----------')
print(model_lr_2.state_dict())


#-----------------
#TRAIN THE NETWORK
#YOUR CODE HERE

<a name='section_14_4'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L14.4 Adding a Hidden Layer</h2>     

| [Top](#section_14_0) | [Previous Section](#section_14_3) | [Exercises](#exercises_14_4) | [Next Section](#section_14_5) |


In [None]:
#>>>RUN: L14.4-runcell01

class MLP2_net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(ninputs,30)
        self.act1 = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(30,10)
        self.act2 = torch.nn.ReLU()
        self.fc3 = torch.nn.Linear(10,1)
        self.output = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.act1(x)
        x = self.fc2(x)
        x = self.act2(x)
        x = self.fc3(x)
        x = self.output(x)
        return x
    
torch.random.manual_seed(42)  # fix a random seed for reproducibility

model_mlp_2layer = MLP2_net()
print(model_mlp_2layer)

In [None]:
#>>>RUN: L14.4-runcell02

history_mlp_2layer = train(model_mlp_2layer,trainloader,valloader,name='mlp_2layer_model')
Y_test_predict_mlp_2layer, Y_test = apply(model_mlp_2layer, testloader)

In [None]:
#>>>RUN: L14.4-runcell03

plt.semilogy(history_mlp_2layer['loss'], label='loss')
plt.semilogy(history_mlp_2layer['val_loss'], label='val_loss')
plt.legend(loc="upper right")
plt.xlabel('epoch')
plt.ylabel('loss (binary crossentropy)')
plt.show()

plt.hist(Y_test_predict_mlp_2layer[Y_test==0],histtype='step',color='r',density=True)
plt.hist(Y_test_predict_mlp_2layer[Y_test==1],histtype='step',color='b',density=True)
plt.xlabel('MLP (2 hidden layers) Discriminant')
plt.show()

In [None]:
#>>>RUN: L14.4-runcell04

print("Signal:",len(Y_test_predict_mlp_2layer[Y_test==1][Y_test_predict_mlp_2layer[Y_test==1] > 0.10])/len(Y_test_predict_mlp_2layer[Y_test==1]))
print("Bkg:"   ,len(Y_test_predict_mlp_2layer[Y_test==0][Y_test_predict_mlp_2layer[Y_test==0] > 0.10])/len(Y_test_predict_mlp_2layer[Y_test==0]))

In [None]:
#>>>RUN: L14.4-runcell05

def compute_ROC(labels, predicts, npts=101):
    cutvals = np.linspace(0.,1.,num=npts)
    tot0 = float(len(labels[labels==0]))
    tot1 = float(len(labels[labels==1]))
    tpr = []
    fpr = []
    for c in cutvals:
        fpr.append(float(len(predicts[(labels==0) & (predicts>c)]))/tot0)
        tpr.append(float(len(predicts[(labels==1) & (predicts>c)]))/tot1)
    
    return np.array(fpr),np.array(tpr)

mlp_2layer_rocpts = compute_ROC(Y_test,Y_test_predict_mlp_2layer)
lr_rocpts = compute_ROC(Y_test,Y_test_predict_lr)

plt.plot(mlp_2layer_rocpts[0],mlp_2layer_rocpts[1],'g-',label="MLP (2 hidden layers)")
plt.plot(lr_rocpts[0],lr_rocpts[1],'m--',label="Logistic Regression")
plt.title("ROC (Receiver Operating Characteristic) Curve")
plt.xlabel("False Positive Rate (FPR) aka Background Efficiency")
plt.ylabel("True Positive Rate (TPR) aka Signal Efficiency")
plt.legend(loc="lower right")
plt.show()

<a name='exercises_14_4'></a>   

| [Top](#section_14_0) | [Restart Section](#section_14_4) | [Next Section](#section_14_5) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Ex-14.4.1</span>

When we compare the performance of two algorithms, we often like to fix the signal efficiency and look at the change in the background rejection rate. For a fixed signal efficiency of 97%, what is the fractional reduction in the false-positive rate between the logistic and MLP networks (i.e. the difference between the two divided by the value for the logistic)? Report your answer as a number with precision 5e-2.
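
To illustrate what comparing at a fixed signal efficiency means, here is a minimal sketch on made-up ROC points. It uses linear interpolation between neighboring points, which is one possible choice and not necessarily the approach expected in the exercise below. The function name `fpr_at_signal_efficiency` and all toy numbers are hypothetical.

```python
def fpr_at_signal_efficiency(fpr, tpr, sig_eff):
    """Linearly interpolate the false-positive rate at a requested true-positive rate."""
    points = sorted(zip(tpr, fpr))          # order the ROC points by ascending TPR
    for (t_lo, f_lo), (t_hi, f_hi) in zip(points[:-1], points[1:]):
        if t_lo <= sig_eff <= t_hi:
            if t_hi == t_lo:                # flat segment: nothing to interpolate
                return f_lo
            frac = (sig_eff - t_lo) / (t_hi - t_lo)
            return f_lo + frac * (f_hi - f_lo)
    raise ValueError("sig_eff lies outside the range of the ROC points")

# Toy ROC points for two hypothetical classifiers (not results from this notebook)
toy_tpr   = [1.0, 0.98, 0.95, 0.80, 0.0]
toy_fpr_a = [1.0, 0.40, 0.20, 0.05, 0.0]
toy_fpr_b = [1.0, 0.30, 0.10, 0.02, 0.0]

fpr_a = fpr_at_signal_efficiency(toy_fpr_a, toy_tpr, 0.95)
fpr_b = fpr_at_signal_efficiency(toy_fpr_b, toy_tpr, 0.95)
print("FPR of A:", fpr_a, " FPR of B:", fpr_b)
print("Fractional reduction going from A to B:", (fpr_a - fpr_b) / fpr_a)
```

Interpolation avoids depending on whether a scanned threshold happens to land exactly on the requested efficiency, at the cost of assuming the curve is locally linear between points.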

In [None]:
#>>>EXERCISE: L14.4.1
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.


def frac_reduc_fpr(lr_rocpts, mlp_2layer_rocpts, sig_eff=0.97):
    #false-positive-rate (background efficiency)
    #true-positive-rate (signal efficiency):
    lr_fpr = lr_rocpts[0]
    lr_tpr = lr_rocpts[1]
    mlp_2layer_fpr = mlp_2layer_rocpts[0]
    mlp_2layer_tpr = mlp_2layer_rocpts[1]
    
    #find lr_fpr where lr_tpr is closest to sig_eff
    lr_fpr_val = #YOUR CODE HERE
    
    #find mlp_2layer_fpr where mlp_2layer_tpr is closest to sig_eff
    mlp_2layer_fpr_val = #YOUR CODE HERE
    
    #calculate the fractional reduction in false-positive-rate
    frac_red = #YOUR CODE HERE
    
    return frac_red

#find where the signal efficiency is 97%
#find difference in logistic vs. MLP
print("Fractional Reduction in FPR:",frac_reduc_fpr(lr_rocpts, mlp_2layer_rocpts, 0.97))

<a name='section_14_5'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L14.5 Regularization, Batch Normalization, and Dropout</h2>     

| [Top](#section_14_0) | [Previous Section](#section_14_4) | [Exercises](#exercises_14_5) |


In [None]:
#>>>RUN: L14.5-runcell01

class MLP3_net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(ninputs,50)
        self.act1 = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(50,30)
        self.act2 = torch.nn.ReLU()
        self.fc3 = torch.nn.Linear(30,10)
        self.act3 = torch.nn.ReLU()
        self.fc4 = torch.nn.Linear(10,1)
        self.output = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.act1(x)
        x = self.fc2(x)
        x = self.act2(x)
        x = self.fc3(x)
        x = self.act3(x)
        x = self.fc4(x)
        x = self.output(x)
        return x
    
torch.random.manual_seed(42)  # fix a random seed for reproducibility
model_mlp_3layer = MLP3_net()
print(model_mlp_3layer)

In [None]:
#>>>RUN: L14.5-runcell02

# NOTE: we add l2 regularization through the optimizer by setting l2reg=0.0001 (passed to Adam as weight_decay),
# which penalizes large weights. This is already implemented in the train() function, with default value l2reg=0.

#you may choose to load the saved model instead, if you have already trained it
#model_mlp_3layer.load_state_dict(torch.load('data/L14/mlp_3layer_model.pt'))

history_mlp_3layer = train(model_mlp_3layer,trainloader,valloader,l2reg=0.0001,name='mlp_3layer_model')
Y_test_predict_mlp_3layer, Y_test = apply(model_mlp_3layer, testloader)

In [None]:
#>>>RUN: L14.5-runcell03

mlp_2layer_rocpts = compute_ROC(Y_test,Y_test_predict_mlp_2layer)
mlp_3layer_rocpts = compute_ROC(Y_test,Y_test_predict_mlp_3layer)
lr_rocpts = compute_ROC(Y_test,Y_test_predict_lr)

plt.plot(mlp_2layer_rocpts[0],mlp_2layer_rocpts[1],'g-',label="MLP (2 hidden layers)")
plt.plot(mlp_3layer_rocpts[0],mlp_3layer_rocpts[1],'--',color='orange',label="MLP (3 hidden layers)")
plt.plot(lr_rocpts[0],lr_rocpts[1],'m--',label="Logistic Regression")
plt.title("ROC (Receiver Operating Characteristic) Curve")
plt.xlabel("False Positive Rate (FPR) aka Background Efficiency")
plt.ylabel("True Positive Rate (TPR) aka Signal Efficiency")
plt.legend(loc="lower right")
plt.show()

In [None]:
#>>>RUN: L14.5-runcell04

mlp_2layer_rocpts = compute_ROC(Y_test,Y_test_predict_mlp_2layer,101)
mlp_3layer_rocpts = compute_ROC(Y_test,Y_test_predict_mlp_3layer,101)
lr_rocpts = compute_ROC(Y_test,Y_test_predict_lr,101)

plt.plot(1./mlp_2layer_rocpts[0],mlp_2layer_rocpts[1],'g-',label="MLP (2 hidden layers)")
plt.plot(1./mlp_3layer_rocpts[0],mlp_3layer_rocpts[1],'--',color='orange',label="MLP (3 hidden layers)")
plt.plot(1./lr_rocpts[0],lr_rocpts[1],'m--',label="Logistic Regression")
plt.xlim([-1, 2500])
plt.title("ROC (Receiver Operating Characteristic) Curve")
plt.xlabel("1/(Background Efficiency)")
plt.ylabel("True Positive Rate (TPR) aka Signal Efficiency")
plt.legend(loc="upper right")
plt.show()
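
The x-axis in the plot above is the inverse of the background efficiency, often called the background rejection. The sketch below shows the conversion on a handful of made-up ROC points (the `fpr` and `tpr` values are hypothetical, not taken from this dataset) and guards against thresholds where the false positive rate is exactly zero, which would otherwise trigger a divide-by-zero warning.

```python
import numpy as np

# Hypothetical ROC points, for illustration only.
fpr = np.array([0.0, 0.001, 0.01, 0.1, 1.0])   # background efficiency
tpr = np.array([0.0, 0.60, 0.85, 0.97, 1.0])   # signal efficiency

# Background rejection = 1 / FPR, set to infinity where FPR is zero.
rejection = np.full_like(fpr, np.inf)
nonzero = fpr > 0
rejection[nonzero] = 1.0 / fpr[nonzero]

for r, t in zip(rejection, tpr):
    print(f"signal efficiency {t:.2f} -> background rejection {r:.1f}")
```

A larger rejection at the same signal efficiency means fewer background events pass the selection, so curves that sit further to the right on this plot are better.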

In [None]:
#>>>RUN: L14.5-runcell05

class MLP3_BN_net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.bn0 = torch.nn.BatchNorm1d(ninputs)
        self.fc1 = torch.nn.Linear(ninputs,50)
        self.act1 = torch.nn.ReLU()
        self.bn1 = torch.nn.BatchNorm1d(50)
        self.fc2 = torch.nn.Linear(50,30)
        self.act2 = torch.nn.ReLU()
        self.bn2 = torch.nn.BatchNorm1d(30)
        self.fc3 = torch.nn.Linear(30,10)
        self.act3 = torch.nn.ReLU()
        self.bn3 = torch.nn.BatchNorm1d(10)
        self.fc4 = torch.nn.Linear(10,1)
        self.output = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.bn0(x)
        x = self.fc1(x)
        x = self.act1(x)
        x = self.bn1(x)
        x = self.fc2(x)
        x = self.act2(x)
        x = self.bn2(x)
        x = self.fc3(x)
        x = self.act3(x)
        x = self.bn3(x)
        x = self.fc4(x)
        x = self.output(x)
        return x
    
class MLP3_Drop_net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(ninputs,50)
        self.act1 = torch.nn.ReLU()
        self.drop1 = torch.nn.Dropout(0.1)
        self.fc2 = torch.nn.Linear(50,30)
        self.act2 = torch.nn.ReLU()
        self.drop2 = torch.nn.Dropout(0.1)
        self.fc3 = torch.nn.Linear(30,10)
        self.act3 = torch.nn.ReLU()
        self.drop3 = torch.nn.Dropout(0.1)
        self.fc4 = torch.nn.Linear(10,1)
        self.output = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.act1(x)
        x = self.drop1(x)
        x = self.fc2(x)
        x = self.act2(x)
        x = self.drop2(x)
        x = self.fc3(x)
        x = self.act3(x)
        x = self.drop3(x)
        x = self.fc4(x)
        x = self.output(x)
        return x

torch.random.manual_seed(42)  # fix a random seed for reproducibility
model_mlp_3layer_bn = MLP3_BN_net()
print(model_mlp_3layer_bn)

torch.random.manual_seed(42)  # fix a random seed for reproducibility
model_mlp_3layer_drop = MLP3_Drop_net()
print(model_mlp_3layer_drop)
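
Both `torch.nn.Dropout` and `torch.nn.BatchNorm1d` behave differently in training and evaluation mode, which is worth keeping in mind when applying the models above to test data. The sketch below demonstrates this on standalone layers with random inputs; whether the `train()` and `apply()` helpers defined earlier switch modes with `model.train()` and `model.eval()` is an assumption that should be checked against their definitions.

```python
import torch

torch.manual_seed(0)

# --- Dropout: only active in training mode ---
drop = torch.nn.Dropout(0.1)
x = torch.ones(1000)

drop.train()
y_train = drop(x)   # about 10% of entries zeroed, the rest scaled by 1/(1-0.1)
drop.eval()
y_eval = drop(x)    # identity in evaluation mode

kept = y_train[y_train != 0]
print("fraction dropped in train mode:", (y_train == 0).float().mean().item())
print("value of kept entries:", kept[0].item())
print("eval mode is identity:", torch.equal(y_eval, x))

# --- BatchNorm: batch statistics in training, running statistics in evaluation ---
bn = torch.nn.BatchNorm1d(3)
batch = 5.0 * torch.randn(64, 3) + 2.0   # features with mean ~2 and std ~5

bn.train()
out_train = bn(batch)   # normalized with this batch's mean and variance
bn.eval()
out_eval = bn(batch)    # normalized with the running estimates instead

train_mean = out_train.mean(dim=0)
train_std = ((out_train - train_mean) ** 2).mean(dim=0).sqrt()
print("train-mode output mean per feature:", train_mean.tolist())
print("train-mode output std per feature: ", train_std.tolist())
print("eval output differs from train output:", not torch.allclose(out_train, out_eval))
```

In training mode the batch-normalized output has roughly zero mean and unit standard deviation per feature, while in evaluation mode the layer uses its running estimates, which after a single batch are still far from the true statistics.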

In [None]:
#>>>RUN: L14.5-runcell06

history_mlp_3layer_bn = train(model_mlp_3layer_bn,trainloader,valloader,name='mlp_3layer_bn_model')
Y_test_predict_mlp_3layer_bn, Y_test = apply(model_mlp_3layer_bn, testloader)

In [None]:
#>>>RUN: L14.5-runcell07

history_mlp_3layer_drop = train(model_mlp_3layer_drop,trainloader,valloader,name='mlp_3layer_drop_model')
Y_test_predict_mlp_3layer_drop, Y_test = apply(model_mlp_3layer_drop, testloader)

In [None]:
#>>>RUN: L14.5-runcell08

mlp_3layer_rocpts = compute_ROC(Y_test,Y_test_predict_mlp_3layer,101)
mlp_3layer_bn_rocpts = compute_ROC(Y_test,Y_test_predict_mlp_3layer_bn,101)
mlp_3layer_drop_rocpts = compute_ROC(Y_test,Y_test_predict_mlp_3layer_drop,101)

fig, (ax1, ax2) = plt.subplots(1,2,figsize=(12,4))

ax1.plot(mlp_3layer_rocpts[0],mlp_3layer_rocpts[1],'--',color='orange',label="MLP (3 hidden layers)")
ax1.plot(mlp_3layer_bn_rocpts[0],mlp_3layer_bn_rocpts[1],'--',color='brown',label="MLP (3 hidden layers w/ BN)")
ax1.plot(mlp_3layer_drop_rocpts[0],mlp_3layer_drop_rocpts[1],'--',color='cyan',label="MLP (3 hidden layers w/ Dropout)")
ax1.set_title("ROC (Receiver Operating Characteristic) Curve")
ax1.set_xlabel("Bkg Eff")
ax1.set_ylabel("Sig Eff")
ax1.legend(loc="lower right")

ax2.plot(1./mlp_3layer_rocpts[0],mlp_3layer_rocpts[1],'--',color='orange',label="MLP (3 hidden layers)")
ax2.plot(1./mlp_3layer_bn_rocpts[0],mlp_3layer_bn_rocpts[1],'--',color='brown',label="MLP (3 hidden layers w/ BN)")
ax2.plot(1./mlp_3layer_drop_rocpts[0],mlp_3layer_drop_rocpts[1],'--',color='cyan',label="MLP (3 hidden layers w/ Dropout)")
ax2.set_title("ROC (Receiver Operating Characteristic) Curve")
ax2.set_xlabel("1/Bkg Eff")
ax2.set_xlim([-1, 2500])
ax2.set_ylabel("Sig Eff")
ax2.legend(loc="upper right")

plt.show()

<a name='exercises_14_5'></a>   

| [Top](#section_14_0) | [Restart Section](#section_14_5) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Ex-14.5.1</span>

Which of the following statements is true about regularization in machine learning?

A) Regularization is a technique used to increase the complexity of a model to improve its performance on new data.\
B) Regularization is a technique used to prevent overfitting by employing a variety of methods, including adding a penalty term to the loss function.\
C) Regularization is a technique used to reduce the size of the training data to improve generalization performance.\
D) Regularization is a technique used to randomly drop out neurons in a neural network during training.\
E) Regularization is a technique that normalizes the mean and variance of the activations of each layer in a neural network.


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Ex-14.5.2</span>

For a fixed signal efficiency of 97%, what is the fractional reduction in the false positive rate (background efficiency) when going from the 3-hidden-layer model to one that adds either batch normalization or dropout? Complete the code below to do this calculation, then report your answer as a list of numbers with precision 1e-2: `[reduction with batch norm, reduction with dropout]`

**HINT:** Use the function that you defined previously.

In [None]:
#>>>EXERCISE: L14.5.2
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

def frac_reduc_fpr(array1, array2, sig_eff=0.97):
    #false-positive-rate (background efficiency)
    #true-positive-rate (signal efficiency):
    array1_fpr = array1[0]
    array1_tpr = array1[1]
    array2_fpr = array2[0]
    array2_tpr = array2[1]
    
    #find the value of array1_fpr where array1_tpr is closest to sig_eff
    array1_fpr_val = #YOUR CODE HERE
    
    #find the value of array2_fpr where array2_tpr is closest to sig_eff
    array2_fpr_val = #YOUR CODE HERE
    
    #calculate the fractional reduction in false-positive-rate
    frac_red = #YOUR CODE HERE
    
    return frac_red


#find where the signal efficiency is 97%
print("Fractional Reduction from 3Layer to Batch Norm:",frac_reduc_fpr(mlp_3layer_rocpts, mlp_3layer_bn_rocpts, 0.97))
print("Fractional Reduction from 3Layer to Dropout:",frac_reduc_fpr(mlp_3layer_rocpts, mlp_3layer_drop_rocpts, 0.97))

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Ex-14.5.3</span>

Try including both batch normalization *and* dropout. Do you get even better performance? Complete the code below to define and run this new model, then select the best answer from the following:

A) Yes, it's definitely better to combine both regularization methods.\
B) No, combining methods was worse than models using either one separately.\
C) The combined model was maybe better than one model, but not better than both.       

In [None]:
#>>>EXERCISE: L14.5.3
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

class MLP3_BN_Dropout_net(torch.nn.Module):
    #YOUR CODE HERE

    
torch.random.manual_seed(42)  # fix a random seed for reproducibility
model_mlp_3layer_bn_drop = MLP3_BN_Dropout_net()
print(model_mlp_3layer_bn_drop)

history_mlp_3layer_bn_drop = train(model_mlp_3layer_bn_drop,trainloader,valloader,name='mlp_3layer_bn_drop_model',nepochs=100)
Y_test_predict_mlp_3layer_bn_drop, Y_test = apply(model_mlp_3layer_bn_drop, testloader)
mlp_3layer_bn_drop_rocpts = compute_ROC(Y_test,Y_test_predict_mlp_3layer_bn_drop,501)
    
#find where the signal efficiency is 97%
print("Fractional Reduction from Batch Norm to Combined Model:",frac_reduc_fpr(mlp_3layer_bn_rocpts, mlp_3layer_bn_drop_rocpts, 0.97))
print("Fractional Reduction from Dropout to Combined Model:",frac_reduc_fpr(mlp_3layer_drop_rocpts, mlp_3layer_bn_drop_rocpts, 0.97))