<CENTER><img src="images/logos.png" style="width:50%"></CENTER>

# Higgs Search 2: The H$\rightarrow$WW channel

This analysis aims to find a signal for the Higgs boson decaying to two W bosons:

<br>

<CENTER><img src="https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/HIGG-2016-07/fig_01a.png" style="width:30%"></CENTER>

<br>

The signal sits on top of its largest background: events from Standard Model (SM) WW diboson production from two quarks:

<CENTER><img src="https://cds.cern.ch/record/1484203/files/fig1b.png" style="width:30%"></CENTER>


<br>

Unlike the previous analysis, we can't do a bump-hunt here, because the combined mass of the decay products is larger than the mass of the particle that's decaying ($m_H$ = 125 GeV, while 2 x $m_W$ = 160 GeV). Instead, we can do a __non-resonant search__. This is done by histogramming some variable of the decay products (in this case a variable called *transverse mass*) for both the data and Monte-Carlo simulations of the background. The simulated background is scaled to account for the fact that we have many more simulated events than real events, and is then subtracted from the data. Any events left over indicate the presence of the Higgs.
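To make the transverse mass more concrete, here is a minimal sketch of the quantity $m_T = \sqrt{E^2 - p_z^2}$, which is what ROOT's `TLorentzVector::Mt()` returns. It assumes that the `TLorentzVector` module used in this notebook follows the same convention. The helper names and the kinematic values are purely illustrative, and other definitions of transverse mass exist.

```python
import math

def four_vector(pt, eta, phi, energy):
    """Return (px, py, pz, E) from collider coordinates (hypothetical helper)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    return (px, py, pz, energy)

def add_vectors(*vectors):
    """Add four-vectors component by component."""
    return tuple(sum(components) for components in zip(*vectors))

def transverse_mass(vector):
    """sqrt(E^2 - pz^2), keeping the sign if the argument is negative."""
    _, _, pz, energy = vector
    mt2 = energy * energy - pz * pz
    return math.sqrt(mt2) if mt2 >= 0 else -math.sqrt(-mt2)

# Made-up values in MeV: two massless leptons plus the MET (eta = 0, E = pT)
lep1 = four_vector(40000.0, 0.3, 0.1, 40000.0 * math.cosh(0.3))
lep2 = four_vector(25000.0, -0.5, 1.2, 25000.0 * math.cosh(-0.5))
met = four_vector(35000.0, 0.0, 0.7, 35000.0)

system = add_vectors(lep1, lep2, met)
print("mT =", transverse_mass(system) / 1000, "GeV")
```

Because the longitudinal momentum is subtracted out, this variable depends only on what happens in the plane transverse to the beam, which is where the MET is measured.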

<div class="alert alert-success">This analysis is inspired by the prompt on the Open Data website <a href="http://opendata.atlas.cern/release/2020/documentation/physics/DL2.html"> here</a>.</div>

**Contents:** <a name="c"></a>
- [Initial setup](#0.)
- [Reading in ROOT files](#1.)
- [Preparing histograms](#2.)
- [Selecting events and filling histograms](#3.)
- [Running your event loop](#4.)
- [Draw final plots](#5.)
- [Optional extra exercises / 'Do your own project' ideas](#6.)
---

## 0. Initial setup <a name="0."></a>

### Importing libraries

Since this is a new notebook, we'll need to import the usual Python libraries:

In [None]:
import uproot
import hist
from hist import Hist
from TLorentzVector import TLorentzVector
import numpy as np
import matplotlib.pyplot as plt

### Defining functions

We'll also be reusing one of the helpful functions we wrote in Notebook 6...

In [None]:
def trackProgress(n,m):
    """
    Function which prints the event loop progress every m events 
    
    Parameters
    ----------
    n : Number of events processed so far
    
    m : Printout event interval
    
    """
    if n == 0:
        print("Event loop tracker")
        print("------------------")
    
    if(n%m==0):
        print("%d events processed" % n)

<div class="alert alert-info">Now, we follow very similar analysis steps to Notebook 6!</div>

[Return to contents](#c)

---

## 1. Reading in ROOT files <a name="1."></a>

### Dilepton data

In [None]:
real = uproot.open("https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/2lep/Data/data_A.2lep.root")
dataTree = real["mini"]
print("Tree contains", len(dataTree["runNumber"].array()), "entries")

### qq$\rightarrow$WW background (MC)

These background events are not real data; they are created using simulations ("Monte-Carlo" in physics-speak!).

In [None]:
bkg = uproot.open("https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/2lep/MC/mc_363492.llvv.2lep.root")
bkgTree = bkg["mini"]
print("Tree contains", len(bkgTree["runNumber"].array()), "entries") 

<div class="alert alert-info">Since many more of these simulated events are created than real data events are recorded, later on we will have to scale the size of the background to account for this.</div>

[Return to contents](#c)

---

## 2. Preparing histograms <a name="2."></a>

In [None]:
#Both histograms share the same binning; the transverse mass is filled in GeV
h_bgs = Hist(hist.axis.Regular(20, 60, 300, label = "Transverse mass $m_{T}$ [GeV]"))
h_dat = Hist(hist.axis.Regular(20, 60, 300, label = "Transverse mass $m_{T}$ [GeV]"))

dataTree.show()

[Return to contents](#c)

---

## 3. Selecting events and filling histograms  <a name="3."></a>

As before, we'll break our code up into separate __functions__ to make it more manageable and readable.

### 1) Rescaling the simulated events

When MC simulation is compared to data, the contribution of each simulated event needs to be scaled ('reweighted') to account for differences in how some objects behave in simulation vs in data, as well as the fact that there are different numbers of events in the MC tree than in the data tree.
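As a rough illustration of the normalisation part of this weight, the sketch below computes the per-event scale (luminosity times cross-section, divided by the sum of MC weights). The function name is hypothetical, and the cross-section and sum-of-weights values are made-up placeholders rather than the values stored in the tree.

```python
def normalisation(lumi_fb, xsec_pb, sum_weights):
    """
    Scale that makes a simulated sample correspond to lumi_fb of data.

    lumi_fb     : integrated luminosity in fb^-1
    xsec_pb     : process cross-section in pb (1 pb = 1000 fb)
    sum_weights : sum of the MC weights of all generated events
    """
    return lumi_fb * (xsec_pb * 1000.0) / sum_weights

# Placeholder inputs: 10 fb^-1 of data, a 2 pb process, 4 million weighted MC events
norm = normalisation(lumi_fb=10.0, xsec_pb=2.0, sum_weights=4.0e6)

# Expected number of real events for this process: lumi * xsec = 20000
expected_events = norm * 4.0e6
print("per-event scale:", norm)
print("expected events:", expected_events)
```

With these placeholder numbers each simulated event counts for much less than one real event, which is exactly the "many more simulated events than real events" effect described above.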

In [None]:
def mcWeights(data,lumi=10):
    """
    When MC simulation is compared to data the contribution of each simulated event needs to be
    scaled ('reweighted') to account for differences in how some objects behave in simulation
    vs in data, as well as the fact that there are different numbers of events in the MC tree than 
    in the data tree.
    
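    Parameters
    ----------
    data : Event record containing the branches read from the tree

    lumi : Integrated luminosity in fb^-1 that the simulation is normalised to; it should
           match the data sample being compared (default 10)
    """
    
    XSection = data["XSection"]
    SumWeights = data["SumWeights"]
    #These values don't change from event to event
    #XSection is stored in pb; the factor of 1000 converts it to fb to match lumi in fb^-1
    norm = lumi*(XSection*1000)/SumWeights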
    
    scaleFactor_ELE = data["scaleFactor_ELE"]
    scaleFactor_MUON = data["scaleFactor_MUON"]
    scaleFactor_LepTRIGGER = data["scaleFactor_LepTRIGGER"]
    scaleFactor_PILEUP = data["scaleFactor_PILEUP"]
    mcWeight = data["mcWeight"]
    #These values do change from event to event
    scale_factors = scaleFactor_ELE*scaleFactor_MUON*scaleFactor_LepTRIGGER*scaleFactor_PILEUP*mcWeight
    
    weight = norm*scale_factors
    return weight

### 2) Selecting 'good leptons'

W bosons do not show up directly in the ATLAS detector: they decay too quickly! Instead, we infer that they were there from their decay products: one 'good quality' lepton and missing transverse energy ('MET') from the non-interacting neutrino.
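What counts as 'good quality' can be split into two independent checks: isolation and detector acceptance. The sketch below mirrors the thresholds used in `goodLeptons()` in this notebook, but the function names are hypothetical and it works on plain numbers rather than on tree branches.

```python
def is_isolated(pt, ptcone30, etcone20, max_fraction=0.1):
    """True if the activity around the lepton is small compared to its pT."""
    return (ptcone30 / pt < max_fraction) and (etcone20 / pt < max_fraction)

def is_central(lep_type, eta):
    """True if the lepton lies in the well-instrumented central region."""
    abs_eta = abs(eta)
    if lep_type == 11:  # Electron: exclude the 'transition region' 1.37 - 1.52
        return abs_eta < 2.37 and not (1.37 <= abs_eta <= 1.52)
    if lep_type == 13:  # Muon
        return abs_eta < 2.5
    return False        # Any other particle type code is rejected

# Made-up leptons: (type, pT [MeV], eta, ptcone30, etcone20)
toy_leptons = [
    (11, 40000.0, 0.50, 1000.0, 1500.0),  # isolated central electron
    (11, 35000.0, 1.45, 500.0, 500.0),    # electron in the transition region
    (13, 30000.0, 2.40, 6000.0, 100.0),   # central muon that fails isolation
]

for lep_type, pt, eta, ptcone30, etcone20 in toy_leptons:
    good = is_isolated(pt, ptcone30, etcone20) and is_central(lep_type, eta)
    print(lep_type, eta, "good" if good else "rejected")
```

Keeping the two checks separate makes it easy to change one threshold at a time when studying how a cut affects the final histogram.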

In [None]:
def goodLeptons(data):
    """
    A function to return the indices of 'good leptons' (electrons or muons) in an event. This follows 
    many of the same steps as locateGoodPhotons() and photonIsolation() in Notebook 6.
    
    Parameters
    ----------
    data : Event record containing the lepton branches read from the tree
    """
    
    #Initialise (set up) the variables we want to return
    goodlepton_index = [] #Indices (position in list of event's leptons) of our good leptons
    
    lep_n = data["lep_n"]
    ##Loop through all the leptons in the event
    for j in range(0,lep_n):
        lep_isTightID = data["lep_isTightID"][j]    
        ##Check lepton ID
        if(lep_isTightID):
            lep_ptcone30 = data["lep_ptcone30"][j]
            lep_pt = data["lep_pt"][j]
            lep_etcone20 = data["lep_etcone20"][j]
            #Check lepton isolation
            #Similar to photonIsolation() in Notebook 6, with different thresholds
            if((lep_ptcone30 / lep_pt < 0.1) and 
               (lep_etcone20 / lep_pt < 0.1)):

                #Only central leptons 
                #Electrons and muons have slightly different eta requirements
                lep_type = data["lep_type"][j]
                lep_eta = data["lep_eta"][j]
                #Electrons: 'Particle type code' = 11
                if lep_type == 11:
                    #Check lepton eta is in the 'central' region and not in "transition region" 
                    if (np.abs(lep_eta) < 2.37) and\
                       (np.abs(lep_eta) < 1.37 or np.abs(lep_eta) > 1.52): 

                        goodlepton_index.append(j) #Store lepton's index

                #Muons: 'Particle type code' = 13
                elif (lep_type == 13) and (np.abs(lep_eta) < 2.5): #Check 'central' region

                    goodlepton_index.append(j) #Store lepton's index


    return goodlepton_index #return list of good lepton indices

### Put it all together in the event loop!

In [None]:
def hWW(data,hist,mode):
    """
    Function which executes the analysis flow for the Higgs production cross-section measurement in the H->WW
    decay channel.
    
    Fills a histogram with mT(llvv) of events which pass the full set of cuts 
    
    Parameters
    ----------
    data : Array of events (data or background) read from the tree
    
    hist : The histogram to be filled with mT(llvv) values
    
    mode : A flag to tell the function if it is looping over 'data' or 'mc'
    """
    
    n = 0
    for event in data:
        
        #############################
        ### Event-level requirements
        #############################
    
        trackProgress(n,100000)
        n += 1
        
        #If event is MC: Reweight it
        if mode.lower() == 'mc': weight = mcWeights(event)
        else: weight = 1
            
        trigE = event["trigE"]
        trigM = event["trigM"]
        #If the event passes either the electron or muon trigger
        if trigE or trigM:
            
            ####Lepton preselections
            goodLeps = goodLeptons(event) #If the datafiles were not already filtered by number of leptons

            ###################################
            ### Individual lepton requirements
            ###################################

            if len(goodLeps) == 2: #Exactly two good leptons...
                lep1 = goodLeps[0] #INDICES of the good leptons
                lep2 = goodLeps[1]
                
                lep_type = event["lep_type"]
                if lep_type[lep1] != lep_type[lep2]: #... with different flavour
                    
                    lep_charge = event["lep_charge"]
                    if lep_charge[lep1] != lep_charge[lep2]: #... and opposite charge...
                        
                        lep_pt = event["lep_pt"]
                        if (lep_pt[lep1] > 22000) and (lep_pt[lep2] > 15000): #pT requirements
                            #Note: TTrees always sort objects in descending pT order
                            
                            lep_phi = event["lep_phi"]
                            #Lepton separation in phi, wrapped into the range [0, pi]
                            dphi_ll = abs(lep_phi[lep1] - lep_phi[lep2])
                            if dphi_ll > np.pi: dphi_ll = 2*np.pi - dphi_ll
                            if dphi_ll < 1.8:

                                #################################
                                ### Dilepton system requirements
                                #################################

                                #Initialise (set up) an empty 4 vector for the dilepton system
                                dilep_four_mmtm = TLorentzVector()

                                #Loop through our list of lepton indices
                                for i in goodLeps:

                                    #Initialise (set up) an empty 4 vector for each lepton
                                    lep_i = TLorentzVector()
                                    
                                    lep_pt = event["lep_pt"][i]
                                    lep_eta = event["lep_eta"][i]
                                    lep_phi = event["lep_phi"][i]
                                    lep_E = event["lep_E"][i]
                                    #Retrieve the lepton's 4 momentum components from the tree
                                    lep_i.SetPtEtaPhiE(lep_pt, lep_eta, lep_phi, lep_E)

                                    #Store lepton's 4 momentum
                                    dilep_four_mmtm += lep_i
                                  
                                # Dilepton system pT > 30 GeV
                                if dilep_four_mmtm.Pt() > 30000:

                                    if (dilep_four_mmtm.M() > 10000) and (dilep_four_mmtm.M() < 55000):

                                        #####################
                                        ### MET requirements
                                        #####################
                                        
                                        met_et = event["met_et"]
                                        met_phi = event["met_phi"]
                                        #Initialise (set up) an empty 4 vector for the event's MET and fill from tree
                                        met_four_mom = TLorentzVector()
                                        met_four_mom.SetPtEtaPhiE(met_et,0,met_phi,met_et)

                                        #MET > 30 GeV
                                        if met_four_mom.Pt() > 30000:

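                                            #Difference in phi between the dilepton system and the MET < pi/2
                                            #(wrapped into the range [0, pi])
                                            dphi_ll_met = abs(dilep_four_mmtm.Phi()-met_four_mom.Phi())
                                            if dphi_ll_met > np.pi: dphi_ll_met = 2*np.pi - dphi_ll_met
                                            if dphi_ll_met < 1.571: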

                                                #####################
                                                ### Full llvv system
                                                #####################
                                                system_four_mom = dilep_four_mmtm + met_four_mom
                                                
                                                #Use the keyword weight to specify the weight of the event
                                                hist.fill(system_four_mom.Mt()/1000, weight=weight)
                                            
                                        

[Return to contents](#c)

---

## 4. Running your event loop <a name="4."></a>

### Data array

Select the measurements you want in your data array

In [None]:
#Data
data = dataTree.arrays(["lep_ptcone30","lep_etcone20", "lep_isTightID", "lep_eta", "photon_phi", "lep_type",
                          "lep_n", "photon_E", "lep_E", "lep_pt", "trigP", "XSection", "SumWeights", "trigE", "trigM",
                          "scaleFactor_ELE", "scaleFactor_MUON", "scaleFactor_PILEUP", "scaleFactor_LepTRIGGER",
                        "mcWeight", "lep_charge","lep_phi", "met_et", "met_phi"])

Run your data loop

In [None]:
#Data
hWW(data,h_dat,'data')

<div class="alert alert-danger">Warning, this may take a long time!</div>

Plot your data

In [None]:
h_dat.plot(histtype = "fill")
plt.show()

### Simulation array

Select the measurements you want in your simulation array

In [None]:
#MC
mcSim = bkgTree.arrays(["lep_ptcone30","lep_etcone20", "lep_isTightID", "lep_eta", "photon_phi", "lep_type",
                          "lep_n", "photon_E", "lep_E", "lep_pt", "trigP", "XSection", "SumWeights", "trigE", "trigM",
                          "scaleFactor_ELE", "scaleFactor_MUON", "scaleFactor_PILEUP", "scaleFactor_LepTRIGGER",
                        "mcWeight", "lep_charge","lep_phi", "met_et", "met_phi"])

Run your MC loop

In [None]:
#MC 
hWW(mcSim,h_bgs,'mc')

<div class="alert alert-danger">Warning, this may take a VERY long time!</div>

Plot your MC backgrounds

In [None]:
#Look at MC histogram

h_bgs.plot(histtype = "fill")
plt.show()
h_bgs.sum()

[Return to contents](#c)

---

## 5. Draw final plots <a name="5."></a>

Draw the data and simulated background histograms on the same canvas

In [None]:
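#Label each histogram so the two can be told apart in the legend
h_dat.plot(histtype = "fill", label = "Data")
h_bgs.plot(histtype = "fill", label = "WW background (MC)")

plt.legend()
plt.show()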

Subtract the two histograms

In [None]:
h_diff = h_dat - h_bgs

Plot the subtracted histogram

In [None]:
h_diff.plot(histtype = "fill")
plt.show()

<div class="alert alert-success">This subtracted histogram is our measured Higgs signal!</div>
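Before reading too much into the subtracted histogram, it helps to compare the excess in each bin with its statistical uncertainty. The sketch below does this for made-up bin contents. It assumes a Poisson uncertainty of $\sqrt{N}$ on the data and neglects the uncertainty on the simulated background, which is a simplification. All numbers and names are illustrative.

```python
import math

# Made-up bin contents, standing in for the data and scaled background histograms
data_counts = [120.0, 95.0, 60.0, 30.0]
bkg_counts = [100.0, 90.0, 61.5, 25.0]

def excess_with_uncertainty(data, bkg):
    """Return a list of (excess, sigma) pairs, one per bin."""
    results = []
    for n_data, n_bkg in zip(data, bkg):
        excess = n_data - n_bkg
        sigma = math.sqrt(n_data)  # Poisson uncertainty on the data only
        results.append((excess, sigma))
    return results

per_bin = excess_with_uncertainty(data_counts, bkg_counts)
for i, (excess, sigma) in enumerate(per_bin):
    print(f"bin {i}: excess = {excess:6.1f} +/- {sigma:4.1f}")
```

An excess that is comparable to or smaller than its uncertainty is compatible with a statistical fluctuation of the background.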

[Return to contents](#c)

---

## 6. Optional extra exercises / 'Do your own project' ideas  <a name="6."></a>

<div class="alert alert-info"> When completing these exercises, it is recommended to copy/paste any code you're reusing from above into new cells, to keep the example available for reference.
<br>
    
New cells can be added above using `esc` + `a`, below using `esc` + `b`, or using the `Insert` tab at the top of the page.</div> 

a) This estimation of the strength of our Higgs signal is actually rather overoptimistic! This is because we are only accounting for one background (albeit the largest one). Another large background to the $H\rightarrow WW$ process comes from top quarks.
- Search in the [Open Data repository](https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020) for some simulated top quark production data.
- Initialise a histogram for your top data, apply your analysis cuts and fill the histogram
- Think of the best way to display your results on a histogram

<details>
    <summary>Click here for plotting hint 1: </summary>
    Sum your background histograms together (similar to the subtraction) and adjust your plot legend, or...
</details>

<details>
    <summary>Click here for plotting hint 2: </summary>
    Recall hist's stacking ability...
</details>

b) The reason for imposing **cuts** in any analysis is to reduce backgrounds while keeping as many of our signal events as possible. We are now considering a new background, the production of top quarks.

Fortunately, processes involving top quarks are easy to spot, because they decay almost exclusively to **bjets** - showers of strongly interacting particles originating from a bottom quark.

- Take a look at the [prompt](http://opendata.atlas.cern/release/2020/documentation/physics/DL2.html) for this analysis on the Open Data website. You'll notice that there's a cut on jets that we've skipped.

    - Return to our $H\rightarrow WW$ analysis, and implement this cut to reduce the contribution from our newly-added top background.

<details>
    <summary>Click here for hint 1: </summary>
    Write a function similar to goodLeptons() that returns the indices of good jets in the event. For each jet in an event only keep it if:
        
    jet_pt > 30 GeV
</details>

<details>
    <summary>Click here for hint 2: </summary>
    Write a function similar to goodLeptons() that returns the indices of good bjets in the event. For each bjet in an event only keep it if:
    
    - jet_MV2c10 > 0.18    
    - jet_pt > 20 GeV
    
- MV2c10 is the ATLAS algorithm that tells us how likely it is that a jet is a bjet. These kinds of algorithms are called __btaggers__.
</details>

<details>
    <summary>Click here for hint 3: </summary>
    Our btagger is another thing that performs differently on real data vs simulated MC data. This means it needs to be added to the multiplication in our mcWeights() function. 
    
The btagging scale factor changes from event-to-event, and is stored in a branch called `scaleFactor_BTAG`
</details>

<details>
    <summary>Click here for hint 4: </summary>
    
In our main event loop, only keep an event if it has:
    - Fewer than 2 good-jets
    - 0 good-bjets 
        
</details>
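Hints 1, 2 and 4 could be combined along the following lines. This is a minimal sketch rather than a reference solution: it assumes that each event exposes `jet_n`, `jet_pt` and `jet_MV2c10` entries, that `jet_pt` is stored in MeV like `lep_pt`, and it uses a plain dictionary as a stand-in for a real event. The function names are hypothetical.

```python
def good_jets(event, pt_min=30000):
    """Indices of jets with pT above 30 GeV (assumed to be stored in MeV)."""
    return [j for j in range(event["jet_n"]) if event["jet_pt"][j] > pt_min]

def good_bjets(event, pt_min=20000, mv2c10_min=0.18):
    """Indices of jets with pT above 20 GeV that also pass the btagger threshold."""
    return [j for j in range(event["jet_n"])
            if event["jet_pt"][j] > pt_min and event["jet_MV2c10"][j] > mv2c10_min]

def passes_jet_requirements(event):
    """Keep the event only if it has fewer than 2 good jets and no good bjets."""
    return len(good_jets(event)) < 2 and len(good_bjets(event)) == 0

# Toy events (made-up values): one with a btagged jet, one without
top_like = {"jet_n": 2, "jet_pt": [45000.0, 22000.0], "jet_MV2c10": [0.05, 0.90]}
ww_like = {"jet_n": 1, "jet_pt": [32000.0], "jet_MV2c10": [0.02]}

print("top-like event kept:", passes_jet_requirements(top_like))
print("WW-like event kept:", passes_jet_requirements(ww_like))
```

The btagging scale factor from hint 3 is not included here; it would still need to be added to the multiplication in `mcWeights()`.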

[Return to contents](#c)

---

<div class="alert alert-success">

__Congratulations!__ You've successfully completed ALL the notebooks! Very few make it this far, so you should be proud of yourself! It's time to take your new scientific skills solo and start doing your own research - use the tips in the blue box below, or any of the Extra exercises / 'Do your own project' prompts from the notebooks as inspiration.
    
Well done!
</div>

<div class="alert alert-info">The two analyses presented in these notebooks are inspired by the <a href="http://opendata.atlas.cern/release/2020/documentation/physics/intro.html"> prompts</a> on the ATLAS Open Data website. 
- This is a great place to start for ideas for your own research!