# Exploration of the $gg \to (h^{\ast} \to) ZZ \to 4\ell$ dataset

In this tutorial, we will take some time to explore and understand the phenomenology of off-shell Higgs production.

In [1]:
# import the packages

import pandas as pd
import numpy as np
import vector
import hist

from physics.simulation import mcfm
from physics.analysis import zz4l, zz2l2v
from physics.hstar import sigstr
from nsbi import carl

import matplotlib, matplotlib.pyplot as plt

## 1. Open the dataset

Use the `mcfm.from_csv(...)` function to open three datasets generated according to the different hypotheses:

1. Signal-only, $|\mathcal{M}_{gg \to h^{\ast} \to ZZ}|^2$.
2. Background-only, $|\mathcal{M}_{gg \to ZZ}|^2$.
3. Signal-background-interference, $|\mathcal{M}_{gg \to h^{\ast} \to ZZ} + \mathcal{M}_{gg \to ZZ}|^2$.

The most complete set of features representing an event in terms of experimental observables, denoted $x$, is the four-momenta of the four leptons, ordered by their $p_{\rm T}$.

$$ x \equiv (E_\ell, {\bf p}_{\ell}),\, \ell = 1,2,3,4$$

Events corresponding to each hypothesis, together with their lepton kinematics, have been prepared for you to open. Each event also carries a real-valued *weight* that corresponds to the value of the differential cross-section of the event:

$$ \frac{d\sigma}{dx} = p(x) \sigma $$
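
To make the role of the weights concrete, here is a minimal sketch using toy NumPy arrays rather than the actual datasets. It assumes the common Monte Carlo convention that the event weights sum to the total cross-section $\sigma$; the arrays `x` and `weights` are hypothetical stand-ins and are not read from the files used in this tutorial.

```python
import numpy as np

# Toy stand-ins for one observable and its per-event weights (hypothetical values).
rng = np.random.default_rng(0)
x = rng.exponential(scale=50.0, size=10_000)            # a pT-like observable in GeV
weights = rng.uniform(0.5, 1.5, size=x.size) * 1e-4      # per-event weights, arbitrary units

# Assumed convention: the weights sum to the total cross-section sigma.
sigma = weights.sum()

# Dividing by sigma turns the weights into per-event probabilities that sum to one.
prob = weights / sigma

# Weighted expectation value of the observable under p(x).
mean_x = np.sum(prob * x)
```

Under this assumption, any weighted histogram integrates to $\sigma$, while dividing the weights by $\sigma$ gives a shape-only (unit-normalised) distribution.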


In [2]:
features_4l = ['l1_pt', 'l1_eta', 'l1_phi', 'l1_energy', 'l2_pt', 'l2_eta', 'l2_phi', 'l2_energy', 'l3_pt', 'l3_eta', 'l3_phi', 'l3_energy', 'l4_pt', 'l4_eta', 'l4_phi', 'l4_energy']
events_sig = mcfm.from_csv(file_path = '/ptmp/mpp/taepa/higgs-offshell-interpretation/data/zz4l/ggZZ_sig/analyzed.csv', kinematics = features_4l, n_rows=1e5)
events_bkg = mcfm.from_csv(file_path = '/ptmp/mpp/taepa/higgs-offshell-interpretation/data/zz4l/ggZZ_bkg/analyzed.csv', kinematics = features_4l, n_rows=1e5)
events_sbi = mcfm.from_csv(file_path = '/ptmp/mpp/taepa/higgs-offshell-interpretation/data/zz4l/ggZZ_sbi/analyzed.csv', kinematics = features_4l, n_rows=1e5)

## 2. Basic histogramming

A binned visualization of the events can be made with histograms, as should be familiar to all of us in HEP. Use the `hist` package to:

1. Define a histogram with 20 bins from $0 \leq p_{\rm T} < 200\, \rm GeV$.
2. Fill it with the leading lepton $p_{\rm T}^{\ell_1}$ as the observable, with weights.

In [5]:
l1pt_axis = hist.axis.Regular(20, 0, 200, label = 'l1pt')
h_l1pt_sig = hist.Hist(l1pt_axis)
h_l1pt_bkg = hist.Hist(l1pt_axis)
h_l1pt_sbi = hist.Hist(l1pt_axis)

h_l1pt_sig.fill( events_sig.kinematics['l1_pt'], weight = events_sig.weights )
h_l1pt_bkg.fill( events_bkg.kinematics['l1_pt'], weight = events_bkg.weights )
h_l1pt_sbi.fill( events_sbi.kinematics['l1_pt'], weight = events_sbi.weights )
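
As a cross-check of what a weighted fill does, the sketch below reproduces the same binning with plain NumPy on toy data. The arrays `pt` and `w` are hypothetical stand-ins for `events_sig.kinematics['l1_pt']` and `events_sig.weights`. Each bin holds the sum of weights, and the sum of squared weights gives the usual estimate of its statistical variance.

```python
import numpy as np

# Toy stand-ins for the leading-lepton pT and the event weights (hypothetical values).
rng = np.random.default_rng(1)
pt = rng.exponential(scale=60.0, size=5_000)   # GeV
w = rng.uniform(0.0, 2.0, size=pt.size)

# 20 bins between 0 and 200 GeV need 21 edges (10 GeV per bin).
edges = np.linspace(0.0, 200.0, 21)

# Bin content: sum of weights of the events falling into each bin.
sumw, _ = np.histogram(pt, bins=edges, weights=w)

# Variance estimate per bin: sum of squared weights.
sumw2, _ = np.histogram(pt, bins=edges, weights=w**2)
stat_err = np.sqrt(sumw2)
```

Events outside the axis range are dropped here, whereas `hist` keeps them in under- and overflow bins. To have `hist` track the sum of squared weights itself, the histogram can be created with `storage=hist.storage.Weight()`.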

Recall that the Lorentz-invariant mass of a four-momentum is given by

$$
m = \sqrt{E^2 - |{\bf{p}}|^2}.
$$

The four-vector $p$ can represent that of any of the 4 leptons, or that of the entire 4-lepton system:

$$
    m_{4\ell} = \sqrt{\left(\sum_i E_i\right)^2 - \left|\sum_i {\bf p_i}\right|^2}
$$

Let's compute this quantity using the convenient four-vector arithmetic provided by the `vector` package. You can check your results against the correct values already saved in the event kinematics.


In [6]:
def m4l(kinematics):
    p_l1 = vector.array({'pt': kinematics['l1_pt'], 'eta': kinematics['l1_eta'], 'phi': kinematics['l1_phi'], 'energy': kinematics['l1_energy']})
    p_l2 = vector.array({'pt': kinematics['l2_pt'], 'eta': kinematics['l2_eta'], 'phi': kinematics['l2_phi'], 'energy': kinematics['l2_energy']})
    p_l3 = vector.array({'pt': kinematics['l3_pt'], 'eta': kinematics['l3_eta'], 'phi': kinematics['l3_phi'], 'energy': kinematics['l3_energy']})
    p_l4 = vector.array({'pt': kinematics['l4_pt'], 'eta': kinematics['l4_eta'], 'phi': kinematics['l4_phi'], 'energy': kinematics['l4_energy']})
    return (p_l1 + p_l2 + p_l3 + p_l4).mass
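
For comparison, the same quantity can be written out by hand from the formula above. This is a sketch in plain NumPy, not the `vector` implementation. The helper name `m4l_numpy` and the `(n_events, 4)` array layout are choices made for this illustration, and the toy event is constructed so that the answer is known in advance.

```python
import numpy as np

def m4l_numpy(pt, eta, phi, energy):
    """Invariant mass of the summed four-momenta; inputs have shape (n_events, 4)."""
    # Convert (pT, eta, phi) to Cartesian momentum components.
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    # Sum energy and momentum over the four leptons of each event.
    e_tot = energy.sum(axis=1)
    px_tot, py_tot, pz_tot = px.sum(axis=1), py.sum(axis=1), pz.sum(axis=1)
    m2 = e_tot**2 - (px_tot**2 + py_tot**2 + pz_tot**2)
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(m2, 0.0))

# Toy event: four massless 50 GeV leptons in the transverse plane, 90 degrees apart.
pt = np.full((1, 4), 50.0)
eta = np.zeros((1, 4))
phi = np.array([[0.0, 0.5 * np.pi, np.pi, 1.5 * np.pi]])
energy = pt * np.cosh(eta)   # massless leptons: E = |p| = pT cosh(eta)

mass = m4l_numpy(pt, eta, phi, energy)
```

In this configuration the momenta cancel exactly, so the invariant mass equals the total energy of 200 GeV up to round-off.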

Now we histogram the 4-lepton invariant mass! Use uniform $20\,\rm GeV$-wide bins over $180 \leq m_{4\ell} < 1000\, \rm GeV$, which gives 41 bins.

In [None]:
# (1000 - 180) / 20 = 41 bins of 20 GeV each
m4l_axis = hist.axis.Regular(41, 180, 1000, label = 'm4l')
h_m4l_sig = hist.Hist(m4l_axis)
h_m4l_bkg = hist.Hist(m4l_axis)
h_m4l_sbi = hist.Hist(m4l_axis)

h_m4l_sig.fill( m4l(events_sig.kinematics), weight = events_sig.weights )
h_m4l_bkg.fill( m4l(events_bkg.kinematics), weight = events_bkg.weights )
h_m4l_sbi.fill( m4l(events_sbi.kinematics), weight = events_sbi.weights )

## 3. Bonus: Interference-only contribution

Obtain the $m_{4\ell}$ distribution corresponding to the interference-only hypothesis:

$$
\color{grey} \left| \mathcal{M}_{\rm S} + \mathcal{M}_{\rm B} \right|^2 = |\mathcal{M}_{\rm S}|^2 +\color{black}  2\Re ( \mathcal{M}^{\dag}_{\rm S} \mathcal{M}_{\rm B} ) \color{grey} + \left| \mathcal{M}_{\rm B} \right|^2
$$

Note that the above three datasets correspond to the terms in grey.
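
One possible approach, sketched here on toy numbers, is to rearrange the identity above and subtract bin by bin: interference = SBI − signal − background. The per-bin complex amplitudes `amp_s` and `amp_b` below are hypothetical and only serve to check the algebra; they are not taken from the datasets.

```python
import numpy as np

# Hypothetical per-bin complex amplitudes standing in for M_S and M_B.
rng = np.random.default_rng(2)
n_bins = 41
amp_s = rng.normal(size=n_bins) + 1j * rng.normal(size=n_bins)
amp_b = rng.normal(size=n_bins) + 1j * rng.normal(size=n_bins)

# The three "grey" terms, which play the role of the three histograms.
h_sig = np.abs(amp_s) ** 2
h_bkg = np.abs(amp_b) ** 2
h_sbi = np.abs(amp_s + amp_b) ** 2

# Interference-only contribution obtained by subtraction.
h_int = h_sbi - h_sig - h_bkg

# Direct evaluation of the "black" term, 2 Re(M_S^dagger M_B), for comparison.
expected = 2.0 * np.real(np.conj(amp_s) * amp_b)
```

With the histograms filled earlier, the same subtraction could be applied to the arrays returned by `.values()`, for example `h_m4l_sbi.values() - h_m4l_sig.values() - h_m4l_bkg.values()`. Unlike the other three terms, the interference is not positive-definite, so negative bins are expected. Because the three samples are generated independently, the difference also carries the statistical fluctuations of all three.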