<CENTER><img src="images/logos.png" style="width:50%"></CENTER>

# Searching for the Higgs boson


Below are some Feynman diagrams of ways the Higgs boson can be produced according to the Standard Model:

<br>

<CENTER><img src="https://cds.cern.ch/record/2243593/files/Figures_FeynmanHprod.png" style="width:50%"></CENTER>

Just as the Higgs can be produced in several ways, it can also decay in a number of ways, or 'channels'. In this notebook (Notebook 6) and the next (Notebook 7), we will search for the Higgs in two different decay channels.

<div class="alert alert-success">To understand the background to finding the Higgs boson, watch <a href="https://youtu.be/1nHYs-qUymo"> this </a> RAL video by Dr Kristian Harder about the Higgs and the LHC.</div>



### Before we begin:

<div class="alert alert-info">
   
- Some of the datafiles used in this notebook have millions of events - don't be surprised if running certain event loops takes 10 minutes or longer!

- This notebook is designed to give you an idea of how a real physics analysis is set up. Study it carefully to help with your own research!
</div>

<div class="alert alert-success">
    
The two analyses presented in this and the following notebook are inspired by the <a href="http://opendata.atlas.cern/release/2020/documentation/physics/intro.html"> prompts </a> on the ATLAS Open Data website. 

This is a great place to start for ideas for your own research!
</div> 


# Search 1: The H$\rightarrow \gamma \gamma$ channel

<CENTER><img src="./images/higgsFD.png" style="width:30%"></CENTER>

<br>
  
One of the ways the Higgs can decay is to two photons. We call this channel __H&#8594;&gamma;&gamma;__ ("Higgs to gamma gamma").


Of course, there are other ways two photons can be made in the LHC, but if we look at the entire range of invariant masses of these two photons, we should expect there to be more of them around 125 GeV, the mass of the Higgs ("bump hunting").
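
To make the idea of a diphoton invariant mass concrete, here is the standard two-body formula, written as a sketch under the assumption that photons are massless. The symbols are notation introduced for this note: $E_1, E_2$ are the photon energies, $\theta$ is the opening angle between them, and $\Delta\eta$, $\Delta\phi$ are the differences in pseudorapidity and azimuthal angle.

$$m_{\gamma\gamma}^2 = 2E_1E_2\,(1-\cos\theta) = 2\,p_{\mathrm{T},1}\,p_{\mathrm{T},2}\,(\cosh\Delta\eta - \cos\Delta\phi)$$

For photons coming from a Higgs decay, $m_{\gamma\gamma}$ should cluster near the Higgs mass, while photon pairs from other processes are spread smoothly over a wide range.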

<div class="alert alert-success">This analysis is inspired by the prompt on the Open Data website <a href="http://opendata.atlas.cern/release/2020/documentation/physics/YY.html"> here </a>.</div>

**Contents:** <a name="c"></a>
- [Reading in ROOT files](#0.)
- [Preparing histograms](#1.)
- [Selecting events and filling histograms](#2.)
- [Draw plots](#3.)
- [Over to you!](#4.)
- [One more thing we can do](#5.)
---

## 1. Reading in ROOT files <a name="0."></a>

First, import the usual libraries:

In [None]:
import uproot
import hist
from hist import Hist
from TLorentzVector import TLorentzVector

Then in the usual way:

1. Open our datafile containing two-photon ("diphoton" or $\gamma\gamma$) events

2. Retrieve the TTree storing the data

3. Get the tree entries


<div class="alert alert-info">If you need a reminder on how to do this, refer back to the histogramming tutorial in Notebook 3.</div>

### a) Open ROOT file

Use `uproot.open()` to open a sample of diphoton data, stored at

`https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/GamGam/Data/data_A.GamGam.root`

 

In [None]:
f = uproot.open("https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/GamGam/Data/data_A.GamGam.root")

### b) Retrieve TTree

Get the TTree named `mini` from the ROOT file `f`

In [None]:
tree = f["mini"]

### c) Show the branches of the TTree data

`show()` what is stored in the TTree 

In [None]:
tree.show()

[Return to contents](#c)

---

## 2. Preparing histograms <a name="1."></a>

As in previous examples, before we can fill our histograms we need to define the settings of the histograms we intend to draw.

### Set up histogram 

Set up a histogram to be filled with our __diphoton invariant masses__. Divide the histogram into $30$ bins between $105$ and $160\;\mathrm{GeV}$.

<details>
    <summary>Click for a hint: </summary>
    Remember how we separate bins, limits and labels in the .axis.Regular() function?
</details>

In [None]:
my_hist = ####

<details>
    <summary>Answer: </summary>
    
    my_hist = Hist(hist.axis.Regular(30, 105, 160))
</details>
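
To get a feel for what these settings mean, the sketch below works out the bin width and which bin a given mass falls into. It uses plain Python rather than `hist`, and the helper name `bin_index` is made up for this illustration. It assumes the usual convention that each bin includes its lower edge and excludes its upper edge.

```python
def bin_index(value, n_bins=30, low=105.0, high=160.0):
    """Return the 0-based bin a value lands in, or None if it is outside [low, high)."""
    if not (low <= value < high):
        return None
    width = (high - low) / n_bins
    return int((value - low) / width)

# Width of each bin in GeV: (160 - 105) / 30
bin_width = (160.0 - 105.0) / 30

# Bin that would contain a diphoton mass of exactly 125 GeV
higgs_bin = bin_index(125.0)

print("Bin width: %.3f GeV" % bin_width)
print("125 GeV falls in bin", higgs_bin)
```

With bins just under 2 GeV wide, any Higgs signal will be spread over only a few bins around 125 GeV.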

[Return to contents](#c)

---

## 3. Selecting events and filling histograms <a name="2."></a>

Our strategy for filling our diphoton invariant mass histogram is as follows:

1. Loop through each `event` in our `tree`. We can print out how many events have been processed every 100000 events to keep track of progress.

2. In each event, search for _'good quality photons'_ (more on this later).

3. If there are exactly two good quality photons, check that they are _'well isolated'_ (again, more later).

4. If the two photons are well-isolated, extract their 4-momentum from the $p_\rm{T}$, $\eta$, $\phi$ and energy TTree branches, and store it in a TLorentzVector. Make sure to convert their transverse momentum ($p_\rm{T}$) and energy ($E$) from MeV, as stored in the TTree, to GeV, as will be displayed in the histogram.

5. Add the TLorentzVectors of the two photons together

6. Calculate the invariant mass of our two-photon system

7. Check each photon makes up a minimum fraction of the diphoton system invariant mass

8. Fill the histogram with the invariant mass of our two-photon system

<div class="alert alert-info">To simplify our code, we will be writing some custom functions to perform each of the above operations.</div>

### 1) Loop tracker

In [None]:
def trackProgress(n,m):
    """
    Function which prints the event loop progress every m events 
    
    Parameters
    ----------
    n : Number of events processed so far
    
    m : Printout event interval
    
    """
    if n == 0:
        print("Event loop tracker")
        print("------------------")
    
    if(n%m==0):
        print("%d events processed" % n)

### 2) Photon quality

In [None]:
def locateGoodPhotons(dat):
    """
    Function which returns the indices of photons in the event which pass our quality requirements.
    These are:
        - Event passes photon trigger
        - Photon is identified as such, passing 'Tight' requirements 
            - This means we are very sure our photon is indeed a photon, but we might lose some photons that are 
              less obvious in the process. The opposite of this is the 'Loose' requirement, where we are less
              sure that our photon is a photon, but we are less likely to miss real ones.
        - Photon is one of the first two photons in the event: the first (index 0) must have 
          pT > 35 GeV (35000 MeV) and the second (index 1) must have pT > 25 GeV (25000 MeV)
        - Photon is in the 'central' region of ATLAS i.e. it has |eta| < 2.37
        - Photon does not fall in the 'transition region' between ATLAS's inner detector barrel
          and ECal endcap i.e. 1.37 <= |eta| <= 1.52
          
    Parameters
    ----------
    dat : array from TTree for this event
    
    """
    
    # Initialise (set up) the list we want to return. Defining it before the trigger
    # check means an empty list (not None) is returned when the event fails the trigger
    goodphoton_index = [] #Indices (position in list of event's photons) of our good photons
    
    ## Checking the event passes the photon trigger
    trigP = dat["trigP"]
    if trigP:
            
        ## Loop through all the photons in the event
        photon_n = dat["photon_n"]
        for j in range(0,photon_n):
            
            ## Check photon ID
            photon_isTight = dat["photon_isTightID"][j]
            if(photon_isTight):
                photon_pt = dat["photon_pt"][j]
                # Check photon has a large enough pT
                if (j==0 and photon_pt > 35000) or (j==1 and photon_pt > 25000):
                    photon_eta = dat["photon_eta"][j]
                    # Check photon eta is in the 'central' region
                    if (abs(photon_eta) < 2.37):
                  
                        # Exclude "transition region" between ID barrel and ECal endcap
                        if (abs(photon_eta) < 1.37 or abs(photon_eta) > 1.52):

                            goodphoton_index.append(j) # Store photon's index
                    
    return goodphoton_index # Return list of good photon indices

### 3) Photon isolation

In [None]:
def photonIsolation(dat,photon_indices):
    """
    Function which returns True if all photons are well-isolated, otherwise returns False.
    
    A photon is considered 'isolated' if the transverse momentum and transverse energy in the detector, within 
    a particular radius around the photon (variables called 'ptcone30' and 'etcone20'), is below a certain threshold compared to the photon's 
    transverse momentum (don't worry too much about the details!).
    
    Parameters
    ----------
    dat : array from TTree for this event
    
    photon_indices : List containing the indices in the TTree of our photons of interest
    
    """
    
    # Loop through our list of photon indices
    for i in photon_indices:
        photon_ptcone30 = dat["photon_ptcone30"][i]
        photon_pt = dat["photon_pt"][i]
        photon_etcone20 = dat["photon_etcone20"][i]
        
        # If each photon passes isolation requirements...
        if((photon_ptcone30 / photon_pt < 0.065) and 
           (photon_etcone20 / photon_pt < 0.065)):
            continue #...keep the loop going 
        
        # If any fail, break the loop and return False
        else: 
            return False
    
    # If the loop is able to finish, i.e. all photons are well-isolated, return True
    return True
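
The isolation check can also be tried out on its own, without a TTree. The sketch below is a stand-alone illustration: the helper names `is_isolated` and `all_isolated` and the toy photon values are invented here, and only the 0.065 threshold is taken from the function above. It expresses "every photon must pass" with Python's built-in `all()` instead of a loop with an early `return False`.

```python
def is_isolated(pt, ptcone30, etcone20, max_fraction=0.065):
    """True if both cone sums are a small enough fraction of the photon pT."""
    return (ptcone30 / pt < max_fraction) and (etcone20 / pt < max_fraction)

def all_isolated(photons, max_fraction=0.065):
    """True only if every photon in the list passes the isolation check."""
    return all(
        is_isolated(p["pt"], p["ptcone30"], p["etcone20"], max_fraction)
        for p in photons
    )

# Toy photons with made-up values in MeV (not real ATLAS data)
toy_photons = [
    {"pt": 60000.0, "ptcone30": 1000.0, "etcone20": 2000.0},  # both fractions well below 0.065
    {"pt": 40000.0, "ptcone30": 500.0, "etcone20": 3000.0},   # etcone20 / pt = 0.075, too large
]

first_only = all_isolated(toy_photons[:1])
both = all_isolated(toy_photons)

print("First photon isolated:", first_only)
print("Both photons isolated:", both)
```

Because the cut is on a ratio of two quantities stored in the same units, no MeV to GeV conversion is needed here.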

### 4) Extracting four-momentum

In [None]:
def photonFourMomentum(dat, photon_indices):
    """
    Function which returns the 4 momenta of a list of photons in an event as a list of TLorentzVectors
    
    Parameters
    ----------
    dat : array from TTree for this event
    
    photon_indices : List containing the indices in the TTree of our photons of interest
    
    """
    
    photon_four_momenta = []
    
    # Loop through our list of photon indices
    for i in photon_indices:
    
        # Initialise (set up) an empty 4 vector for each photon
        Photon_i = TLorentzVector()
    
        photon_pt = dat["photon_pt"][i]
        photon_eta = dat["photon_eta"][i]
        photon_phi = dat["photon_phi"][i]
        photon_E = dat["photon_E"][i]
        # Retrieve the photon's 4 momentum components from the tree
        # Convert from MeV to GeV where needed by dividing by 1000
        Photon_i.SetPtEtaPhiE(photon_pt/1000., photon_eta, photon_phi, photon_E/1000.)
        
        # Store photon's 4 momentum
        photon_four_momenta.append(Photon_i)
        
        
    return photon_four_momenta

### 5) Sum the 4 momenta of each photon in the event

In [None]:
def sumFourMomentum(four_momenta):
    """
    Function which sums a list of four-momenta, and returns the resultant four-momentum of the system
    
    Parameters
    ----------
    four_momenta : List of TLorentzVectors containing the four-momentum of each object in the system
    
    """
    
    # Initialise (set up) TLorentzVector for our momentum sum
    four_mom_sum = TLorentzVector()
    
    for obj in four_momenta:
        four_mom_sum += obj
        
    return four_mom_sum
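
If you are curious what setting a four-vector from $p_\rm{T}$, $\eta$, $\phi$ and $E$, adding two of them, and asking for the mass involves, here is a minimal plain-Python sketch. `SimpleFourVector` is a hypothetical class written for this illustration only. It is not the `TLorentzVector` module used in this notebook, and it assumes the standard relations $p_x = p_\rm{T}\cos\phi$, $p_y = p_\rm{T}\sin\phi$, $p_z = p_\rm{T}\sinh\eta$.

```python
import math

class SimpleFourVector:
    """Toy four-vector (px, py, pz, E), for illustration only."""

    def __init__(self, px=0.0, py=0.0, pz=0.0, e=0.0):
        self.px, self.py, self.pz, self.e = px, py, pz, e

    @classmethod
    def from_pt_eta_phi_e(cls, pt, eta, phi, e):
        # Convert detector coordinates into Cartesian momentum components
        return cls(pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta), e)

    def __add__(self, other):
        # Four-vectors add component by component
        return SimpleFourVector(self.px + other.px, self.py + other.py,
                                self.pz + other.pz, self.e + other.e)

    def mass(self):
        # m^2 = E^2 - |p|^2; clip tiny negative values caused by rounding
        m_squared = self.e**2 - (self.px**2 + self.py**2 + self.pz**2)
        return math.sqrt(max(m_squared, 0.0))

# Two toy back-to-back photons, each with pT = E = 62.5 GeV at eta = 0
photon_1 = SimpleFourVector.from_pt_eta_phi_e(62.5, 0.0, 0.0, 62.5)
photon_2 = SimpleFourVector.from_pt_eta_phi_e(62.5, 0.0, math.pi, 62.5)

diphoton = photon_1 + photon_2
print("Diphoton mass: %.3f GeV" % diphoton.mass())
```

Each photon on its own is massless, but the pair has a non-zero invariant mass because their momenta cancel while their energies add.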

### Putting it all together!

In [None]:
#Select the measurements you want in your array

sel_events = tree.arrays(["photon_ptcone30","photon_etcone20", "photon_isTightID", "photon_eta", "photon_phi",
                          "photon_n", "photon_E", "photon_pt", "trigP"])

n = 0
for event in sel_events:
       
    #1) Loop progress tracking: Print progress every 100,000 events
    trackProgress(n,100000)
    n += 1
    
    #2) Identify exactly two 'good quality photons'
    goodphoton_indices = locateGoodPhotons(event)
    if len(goodphoton_indices) == 2:
        
        #3) Check our good quality photons are well-isolated
        photons_are_isolated = photonIsolation(event, goodphoton_indices)
        
        if photons_are_isolated:
        
            #4) Convert 4-momentum from MeV to GeV
            photon_four_momenta = photonFourMomentum(event, goodphoton_indices)
            
            #5) Add the 4-momenta together
            Photon_12 = sumFourMomentum(photon_four_momenta)
            
            #6) Calculate the diphoton invariant mass
            inv_mass = Photon_12.M() #Calculated invariant mass
            
            photon_pt = event["photon_pt"]
            #7) Check each photon makes up a minimum fraction of the diphoton system invariant mass
            #   (photon_pt is stored in MeV, so divide by 1000 to match inv_mass, which is in GeV)
            if ((photon_pt[0]/1000./inv_mass) > 0.35) and ((photon_pt[1]/1000./inv_mass) > 0.25):
                
                #8) Fill histogram with invariant mass
                my_hist.fill(inv_mass)
                

[Return to contents](#c)

---

## 4. Draw plots <a name="3."></a>

Finally, we would like to draw our diphoton invariant mass histogram and display our results. 

For this study, we would also like to plot the __error bars__ for each bin to illustrate the (statistical) __uncertainties__ on our measurement. This is done by default when drawing the _histogram_.

In [None]:
import matplotlib.pyplot as plt

my_hist.plot()
plt.show()
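
To see where statistical error bars of this kind come from, here is a small sketch. It assumes simple Poisson counting, where a bin containing $N$ events has an uncertainty of $\sqrt{N}$. This is a common convention and not necessarily exactly how the plotting library computes its error bars. The function names and the toy counts are made up for this illustration.

```python
import math

def poisson_errors(counts):
    """Absolute statistical uncertainty on each bin, assuming sqrt(N) errors."""
    return [math.sqrt(n) for n in counts]

def relative_errors(counts):
    """Uncertainty as a fraction of the bin content (infinite for empty bins)."""
    return [1.0 / math.sqrt(n) if n > 0 else float("inf") for n in counts]

# Made-up bin contents, each four times larger than the last
toy_counts = [100, 400, 1600]

abs_err = poisson_errors(toy_counts)
rel_err = relative_errors(toy_counts)

for n, a, r in zip(toy_counts, abs_err, rel_err):
    print("N = %4d  error = %5.1f  relative error = %.3f" % (n, a, r))
```

Under this assumption, the error bars grow with the number of events, but they shrink relative to the bin content.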

### Some questions to think about...

<div class="alert alert-warning">1. Can we say we have 'found' the Higgs based on these histograms alone? Why/why not?</div>
<br>
<details>
    <summary>Click here for hint: </summary>
    To have "found" the Higgs, we would need to see an obvious bump above the background at 125 GeV, the mass of the Higgs. Do we see this in the histogram we've just made?
</details>

<div class="alert alert-warning">2. What steps could we take to make our search for the Higgs more robust?</div>
<br>
<details>
    <summary>Click here for hint: </summary>
    Take a look in the directory containing our diphoton data. Are there more files available? What would be the effect of adding more files to the analysis?
    
    https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/GamGam/Data/
</details>

[Return to contents](#c)

---

## 5. Over to you! <a name="4."></a>

<details>
    <summary><div class="alert alert-warning">When you have thought about your answers to the questions above, click here to reveal your exercise:</div> </summary><br>
    Repeat the above analysis, this time using all four diphoton datafiles, each filling the same histogram. Is the bump clearer now?
</details>

<div class="alert alert-info"> When completing this exercise, it is recommended to copy/paste any code you're reusing from above into new cells, to keep the example available for reference.
<br>
    
New cells can be added above using `esc` + `a`, below using `esc` + `b`, or using the `Insert` tab at the top of the page.</div> 

<div class="alert alert-info">If you are having trouble, click the boxes below for help with the individual steps</div>

<details>
    <summary> - Loading in multiple ROOT files </summary>
    
    fa = uproot.open("https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/GamGam/Data/data_A.GamGam.root")
    fb = uproot.open("https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/GamGam/Data/data_B.GamGam.root")
    fc = uproot.open("https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/GamGam/Data/data_C.GamGam.root")
    fd = uproot.open("https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/GamGam/Data/data_D.GamGam.root")

    my_files = [fa,fb,fc,fd]
</details>
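
The four file addresses differ only by a single letter, so they can also be built in a loop. The sketch below only constructs the list of addresses. The names `base_url` and `data_urls` are chosen for this illustration, and opening each address with `uproot.open()` is left to you.

```python
# Common part of the four diphoton data file addresses
base_url = "https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/GamGam/Data/"

# Build one address per data-taking period: A, B, C and D
data_urls = [base_url + "data_" + letter + ".GamGam.root" for letter in "ABCD"]

for url in data_urls:
    print(url)
```

Looping over a list like this avoids copying and pasting the same line four times, which makes typos less likely.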

<details>
    <summary>- Setting up fresh histograms: </summary>
    
    my_hist2 = Hist(hist.axis.Regular(30, 105, 160))
    my_hist3 = Hist(hist.axis.Regular(30, 105, 160))
    my_hist4 = Hist(hist.axis.Regular(30, 105, 160))
</details>

<details>
    <summary>- Fill a histogram for each data file. </summary>

This is only one way to do it: fill a histogram for each data file, then add all the histograms together. We have already run over https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/GamGam/Data/data_A.GamGam.root above (this is `fa` now), so we don't need to run over it again. We now need to run over `fb`, `fc` and `fd`. This example runs over `fb` and fills the histogram `my_hist2`. You will need to run over `fc` and `fd` as well, filling histograms `my_hist3` and `my_hist4` respectively. This will take quite a while. If you don't have time, fill as many of the histograms as you can.
    
     
    tree2 = fb["mini"]
    
    sel_events = tree2.arrays(["photon_ptcone30","photon_etcone20", "photon_isTightID", "photon_eta", "photon_phi","photon_n", "photon_E", "photon_pt", "trigP"])

    #Now repeat the analysis loop from above, making sure to change the name of the tree and the histogram
    n = 0
    for event in sel_events:

        #1) Loop progress tracking: Print progress every 100,000 events
        trackProgress(n,100000)
        n += 1

        #2) Identify exactly two 'good quality photons'
        goodphoton_indices = locateGoodPhotons(event)

        if len(goodphoton_indices) == 2:

            #3) Check our good quality photons are well-isolated
            photons_are_isolated = photonIsolation(event, goodphoton_indices)

            if photons_are_isolated:

                #4) Convert 4-momentum from MeV to GeV
                photon_four_momenta = photonFourMomentum(event, goodphoton_indices)

                #5) Add the 4-momenta together
                Photon_12 = sumFourMomentum(photon_four_momenta)

                #6) Calculate the diphoton invariant mass
                inv_mass = Photon_12.M() #Calculated invariant mass
                photon_pt = event["photon_pt"]

                #7) Check each photon makes up a minimum fraction of the diphoton system invariant mass
                #   (photon_pt is stored in MeV, so divide by 1000 to match inv_mass, which is in GeV)
                if ((photon_pt[0]/1000./inv_mass) > 0.35) and ((photon_pt[1]/1000./inv_mass) > 0.25):

                    #8) Fill histogram with invariant mass
                    my_hist2.fill(inv_mass)   
    

    
Feel free to add extra printouts (like above) to help with progress tracking.
</details>

<details>
    <summary>- Adding and drawing the new histogram </summary>

Let's add a few extra formatting options as we go:
    
    
    # If you didn't fill all the histograms, add the histograms you filled
    final_hist = my_hist + my_hist2 + my_hist3 + my_hist4
    
    final_hist.plot(histtype = "fill")
    plt.title("Diphoton invariant mass")
    plt.show()
    
</details>

[Return to contents](#c)

---

## 6.  One more thing we can do <a name="5."></a>

$H\rightarrow\gamma\gamma$ is a rare event, and its signal can be difficult to see over the background of two photons being produced in other ways (as your plots above show!). One way we can make this easier to see is to make a prediction for what this background looks like.

Here we can do a **data-driven** estimate of the background, by fitting the data with a cubic function, the shape we're assuming for the distribution of background diphoton events.

To do this, we can use hist's `plot_ratio` function. We pass it the function we want to fit; it then fits that function to the histogram and also shows the ratio of the histogram to the fitted function. In the example below, we first define the function we want to fit. This is a third-order polynomial, i.e. a cubic function.

In [None]:
#Define a third-order polynomial (with no constant term) to describe the background

def fit_function(x, a, b, c,):
    background = (a * x) +  (b * x * x) + (c * x * x * x)
    return (background)

Now draw our fit on the same histogram data (now shown as points with error bars) using the `plot_ratio` method.

In [None]:
fig = plt.figure(figsize=(10, 8))
main_ax_artists, subplot_ax_artists = my_hist.plot_ratio(fit_function)

You can also use the options available to the `plot_ratio` method to change the appearance of the plot.

In [None]:
fig = plt.figure(figsize=(10, 8))
main_ax_artists, subplot_ax_artists = my_hist.plot_ratio(
    fit_function,
    eb_ecolor="black",
    eb_mfc="black",
    eb_mec="black",
    eb_fmt="o",
    fp_c="red",
    fp_ls="-",
    fp_lw=2,
    fp_alpha=0.8,
)
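
To help read the lower panel of a ratio plot, the sketch below computes a data-to-fit ratio by hand. Everything in it is a toy: the bin centres, counts and fit parameters are invented for illustration and are not results from the data, and `ratio_to_fit` is a made-up helper rather than part of `hist`. The polynomial has the same form as the fit function defined above.

```python
def cubic_background(x, a, b, c):
    """Third-order polynomial with no constant term."""
    return (a * x) + (b * x * x) + (c * x * x * x)

def ratio_to_fit(centres, counts, params):
    """Ratio of the observed count to the fitted background in each bin."""
    return [n / cubic_background(x, *params) for x, n in zip(centres, counts)]

# Invented numbers, chosen so the arithmetic is easy to follow
toy_centres = [110.0, 125.0, 140.0]   # bin centres in GeV
toy_counts = [220.0, 275.0, 280.0]    # events observed in each bin
toy_params = (2.0, 0.0, 0.0)          # pretend fit result: background = 2 * x

ratios = ratio_to_fit(toy_centres, toy_counts, toy_params)

for x, r in zip(toy_centres, ratios):
    print("m = %.0f GeV  data / fit = %.2f" % (x, r))
```

A ratio close to 1 means the data agree with the background shape, while a ratio above 1 in a few neighbouring bins is the kind of excess a bump hunt looks for.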

<div class="alert alert-warning">Is the bump at 125 GeV easier to see now? Compare your plot to the figure from the full ATLAS analysis below, which uses much more data.</div>

<CENTER><img src="images/Higgs.png" style="width:50%"></CENTER>

<details>
    <summary>Answer: </summary>
    Even with much more data, the bump is small! However, sometimes a small bump is all we need - this number of extra events over the expected background was enough for the Higgs to be declared discovered in 2012, earning Peter Higgs and François Englert, two of the theorists who first proposed its theory, the 2013 Nobel Prize in Physics.
</details>



[Return to contents](#c)

---

<div class="alert alert-success">

__Congratulations!__ You've made it through one of the hardest notebooks and successfully discovered the Higgs, a feat we didn't accomplish at CERN until 2012! You're definitely ready for one more Higgs hunting challenge - see you in Notebook 7!
</div>