# Part 4 project: Reconstructing the mass of the Higgs boson

This project builds on the knowledge acquired in Part 4 of the tutorial. The goal is to produce a histogram and estimate the mass of the Higgs boson from simulated data from the ATLAS experiment at CERN.

Let's first import the libraries we need:

In [None]:
# Import basic libraries
import copy # copy variables
import os   # manage paths
import numpy as np  # numerical operations
from scipy import stats # statistical functions
import uproot   # use of root files
import awkward as ak    # nested, variable sized data
import vector   # lorentz vectors
vector.register_awkward()
import matplotlib.pyplot as plt # plotting
import matplotlib as mpl # plotting
import tqdm # progress bars
import atlasopenmagic as atom # ATLAS Open Data package

The Higgs boson is extremely short-lived (about $10^{-22}$ seconds) and immediately decays to other particles. In this project we will study the decay of the Higgs boson to two other elementary particles, the Z bosons.

In turn, the Z bosons are extremely short-lived as well. However, about 3% of the time, a Z boson decays to two electrons (an electron and a positron). Electrons are also elementary particles, but stable ones! Therefore, we will look in the detector for electron signatures, in particular for **four electrons**.

<img src="../img/fig_01d.png" width="400"/>

The dataset we will use comes from the 2024 ATLAS Open Data for Research release and contains simulated event data describing the production of the Higgs boson via gluon fusion and its decay to four leptons via a pair of Z bosons, as shown in the Feynman diagram above.

The dataset we look for is named:  
`mc20_13TeV_MC_PowhegPythia8EvtGen_NNLOPS_nnlo_30_ggH125_ZZ4l`  
and can be found at DOI:[10.7483/OPENDATA.ATLAS.Z2J9.709J](https://opendata.cern.ch/record/80012).

Using the description in the dataset name (e.g. `PowhegPythia8EvtGen_NNLOPS_nnlo_30_ggH125_ZZ4l`), we can look it up in the [Metadata Table](https://opendata.atlas.cern/docs/data/for_research/metadata#metadata-table) and find that its Dataset ID (DSID) is `345060`.

We will use this DSID to get a list of the files in the dataset using `atom`.

In [None]:
urls_sample = atom.get_urls(345060, protocol='https')
urls_sample

Let's use a single file for the moment to speed things up.

In [None]:
filename = urls_sample[0]
filename

Open the tree we're interested in and list the branches within it. You can also inspect the file content in more detail here: https://atlas-physlite-content-opendata.web.cern.ch/.

In [None]:
tree = uproot.open({filename: "CollectionTree"})
tree.show(name_width=50, typename_width=50, interpretation_width=50)

First, we want to create a structure similar to the one we built previously. This time we care only about electrons and muons.

Create a structure including the following variables:
* Electrons
  * pt
  * eta
  * phi
  * m
  * charge
* Muons
  * pt
  * eta
  * phi
  * charge

Then you can `zip` those structures together to create the higher level `events` structure.

In [None]:
# Your code goes here

We will apply object selection logic similar to what we used previously:
- Lepton $p_T$ greater than 20 GeV.
- Lepton $\eta$ within a certain range ($|\eta| < 2.47$).
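As a rough illustration of the two cuts above, here is a minimal NumPy sketch on made-up lepton values. The function name `lepton_mask` and the toy arrays are hypothetical, and the sketch assumes the $p_T$ values are already expressed in GeV; if your branches use a different unit, convert first. In the notebook you would apply the same comparison to the awkward arrays of your `events` structure.

```python
import numpy as np

PT_MIN_GEV = 20.0   # minimum transverse momentum, from the list above
ABS_ETA_MAX = 2.47  # maximum |eta|, from the list above

def lepton_mask(pt_gev, eta):
    """Return a boolean mask that is True for leptons passing both cuts."""
    pt_gev = np.asarray(pt_gev, dtype=float)
    eta = np.asarray(eta, dtype=float)
    return (pt_gev > PT_MIN_GEV) & (np.abs(eta) < ABS_ETA_MAX)

# Toy leptons (made-up values, not taken from the dataset)
toy_pt = np.array([45.0, 12.0, 30.0, 25.0])
toy_eta = np.array([0.3, -1.0, 2.6, -2.4])

toy_mask = lepton_mask(toy_pt, toy_eta)
print(toy_mask)
```

The second toy lepton fails the $p_T$ cut and the third fails the $\eta$ cut, so only the first and last survive.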

Define the appropriate functions:

In [None]:
# Your code goes here

Next we should process the events and apply the selection criteria. Those are:
* Exactly 4 electrons.
* Exactly 0 muons.
* The sum of the electron charges is zero.
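The three criteria above can be sketched on toy events as follows. This is only an illustration in plain Python and NumPy: the function name `passes_event_selection` and the toy dictionaries are hypothetical, and your own function will most likely work on whole awkward arrays rather than on one event at a time.

```python
import numpy as np

def passes_event_selection(electron_charges, n_muons):
    """True if the event has exactly 4 electrons, 0 muons and zero net charge."""
    charges = np.asarray(electron_charges)
    return bool(len(charges) == 4 and n_muons == 0 and charges.sum() == 0)

# Toy events (made-up values, not taken from the dataset)
toy_events = [
    {"electron_charges": [1, -1, 1, -1], "n_muons": 0},  # passes
    {"electron_charges": [1, -1, 1], "n_muons": 0},      # only 3 electrons
    {"electron_charges": [1, 1, 1, -1], "n_muons": 0},   # net charge is +2
    {"electron_charges": [1, -1, 1, -1], "n_muons": 1},  # contains a muon
]

toy_decisions = [
    passes_event_selection(ev["electron_charges"], ev["n_muons"])
    for ev in toy_events
]

# Counting events before and after the selection
n_before = len(toy_events)
n_after = sum(toy_decisions)
print(n_before, n_after)
```

Keeping the two counts side by side is a quick sanity check that the selection is neither empty nor a no-op.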

Define the appropriate function:

<details>
<summary><b>Hint</b></summary>
It might help to look at the <code>processed</code> function in the lecture material.
</details>

In [None]:
# Your code goes here

How many events do we have before and after the selection process?

In [None]:
# Your code goes here

Finally, make the combinations, reconstruct the Higgs candidate and plot its mass in GeV units.
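To make the reconstruction step concrete, here is a hand-written NumPy sketch of the invariant mass of a four-lepton system, computed from $(p_T, \eta, \phi, m)$. The `vector` library imported at the top is meant to handle this arithmetic on awkward arrays, so treat this only as an illustration of what happens underneath. The function name `four_lepton_mass` and the toy kinematics are hypothetical, and all quantities are assumed to be in GeV.

```python
import numpy as np

def four_lepton_mass(pt, eta, phi, mass):
    """Invariant mass of the summed four-momenta of the given leptons."""
    pt, eta, phi, mass = (
        np.asarray(x, dtype=float) for x in (pt, eta, phi, mass)
    )
    # Convert (pt, eta, phi, m) to Cartesian components
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    energy = np.sqrt(px**2 + py**2 + pz**2 + mass**2)
    # Sum the four-momenta, then apply m^2 = E^2 - |p|^2
    m2 = energy.sum() ** 2 - px.sum() ** 2 - py.sum() ** 2 - pz.sum() ** 2
    return float(np.sqrt(max(m2, 0.0)))  # guard against tiny negative rounding

# Toy event: four massless leptons in the transverse plane whose momenta
# cancel, so the invariant mass equals the total energy (4 * 31.25 GeV)
toy_pt = [31.25, 31.25, 31.25, 31.25]
toy_eta = [0.0, 0.0, 0.0, 0.0]
toy_phi = [0.0, np.pi / 2, np.pi, 3 * np.pi / 2]
toy_m = [0.0, 0.0, 0.0, 0.0]

toy_mass = four_lepton_mass(toy_pt, toy_eta, toy_phi, toy_m)
print(toy_mass)
```

The toy configuration is chosen so the answer can be checked by hand: the momenta sum to zero, so the mass is simply the sum of the four energies.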

In [None]:
# Your code goes here

What is the estimated mass of the reconstructed Higgs boson?

<details>
<summary><b>Hint</b></summary>

You can use some <a href="https://en.wikipedia.org/wiki/Wisdom_of_the_crowd">Wisdom of the Crowd</a>, as Francis Galton did in 1906 in Plymouth.

</details>
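One possible reading of the hint is to combine many individual measurements into a single estimate, for example with the mean or the median. The sketch below does this on a toy Gaussian sample. The sample is made up and centred on 125 GeV only because the dataset name mentions `ggH125`; the width and size are arbitrary assumptions, not results. Replace `toy_masses` with your own array of reconstructed masses.

```python
import numpy as np

# Toy stand-in for the reconstructed masses (made-up, in GeV)
rng = np.random.default_rng(seed=0)
toy_masses = rng.normal(loc=125.0, scale=2.0, size=1000)

mean_mass = toy_masses.mean()
median_mass = np.median(toy_masses)
# Standard error of the mean: sample spread divided by sqrt(N)
std_error = toy_masses.std(ddof=1) / np.sqrt(toy_masses.size)

print(f"mean   = {mean_mass:.2f} GeV")
print(f"median = {median_mass:.2f} GeV")
print(f"standard error of the mean = {std_error:.2f} GeV")
```

The median is less sensitive to tails and outliers than the mean, which can matter if the real distribution is not symmetric.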

In [None]:
# Your code goes here

## Bonus exercise

Can you set up a routine to actually process all the events of the dataset?

You will need to open a file, do the calculation, store the result and repeat on the next file and so on.

How many events have you processed now?
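The open, calculate, store, repeat pattern described above might be sketched like this. Both `process_all` and `fake_process_file` are hypothetical names: the fake function stands in for your own per-file routine (opening the file with `uproot`, applying the selections and returning the candidate masses together with the number of events read), and the file names are placeholders for the entries of `urls_sample`.

```python
import numpy as np

def process_all(file_names, process_file):
    """Run process_file on every file and accumulate the results.

    process_file(name) is assumed to return (masses, n_events_read).
    """
    all_masses = []
    n_events_total = 0
    for name in file_names:
        masses, n_events = process_file(name)
        all_masses.append(np.asarray(masses, dtype=float))
        n_events_total += n_events
    if all_masses:
        return np.concatenate(all_masses), n_events_total
    return np.array([]), n_events_total

def fake_process_file(name):
    """Stand-in: pretends each file holds 10 events, 2 of which pass."""
    return [124.0, 126.0], 10

# Placeholder names; in the notebook this would be the list of URLs
toy_files = ["file_a.root", "file_b.root", "file_c.root"]
toy_all_masses, toy_n_total = process_all(toy_files, fake_process_file)
print(len(toy_all_masses), toy_n_total)
```

Storing only the small per-file results, rather than the full event arrays, keeps memory use flat as the number of files grows. Wrapping the loop iterable in `tqdm.tqdm(...)` is one way to get a progress bar.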

In [None]:
# Your code goes here