# Exercise 3 (solved): Higgs analysis with Awkward Arrays

In [None]:
import matplotlib.pyplot as plt
import numpy as np

import awkward as ak
import hist
import vector
vector.register_awkward()

In [None]:
events = ak.from_parquet("data/SMHiggsToZZTo4L.parquet")

<br><br><br>

## Exclusive and inclusive analysis

In [exercise-2.ipynb](exercise-2.ipynb), you did an "exclusive" study of Higgs → ZZ → eeμμ. That is to say, the collision events weren't allowed to have anything else in them, such as another Z boson decaying into electrons or muons, a heavy-flavor quark decaying into electrons or muons, or stray particles misreconstructed as though they were electrons or muons. That gives up on good events, and it also excludes an unknown fraction of events unless we have a way to estimate how often these other phenomena occur in the same event.

<br>

With Awkward Array's combinatorial functions, we now have a way to do "inclusive" analyses.

In [None]:
electron_pairs = ak.combinations(events.electron, 2)
muon_pairs = ak.combinations(events.muon, 2)

In [None]:
two_electrons_and_two_muons = ak.cartesian([electron_pairs, muon_pairs])

<br>

The structure of `two_electrons_and_two_muons` is

```
array of lists of ((electron1, electron2), (muon1, muon2))
```

In [None]:
two_electrons_and_two_muons.type.show()

<br>

How many `two_electrons_and_two_muons` (pairs of pairs) does each event have?

In [None]:
ak.num(two_electrons_and_two_muons)

<br>

How many events have at least one of these combinations? (You could use the [ak.count_nonzero](https://awkward-array.org/doc/main/reference/generated/ak.count_nonzero.html) reducer or the [ak.sum](https://awkward-array.org/doc/main/reference/generated/ak.sum.html) reducer.)

In [None]:
inclusive_candidates = ak.count_nonzero(ak.num(two_electrons_and_two_muons) > 0)
inclusive_candidates

<br>

How many events have exactly two electrons and exactly two muons?

In [None]:
exclusive_candidates = ak.count_nonzero((ak.num(events.electron) == 2) & (ak.num(events.muon) == 2))
exclusive_candidates

<br>

More of the inclusive candidates than the exclusive candidates are not real Higgs → ZZ → eeμμ events (that's what quality cuts are for), but some of the

In [None]:
inclusive_candidates / exclusive_candidates

times more candidates in the inclusive sample are real events.

Not knowing how many would prevent us from being able to measure absolute rates.
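
The link between lepton multiplicities and candidate counts can be reproduced in plain Python. This is a minimal sketch on made-up multiplicities (the `n_electrons` and `n_muons` lists are hypothetical, not taken from the dataset). It assumes that `ak.combinations(..., 2)` yields $\binom{n}{2}$ pairs per event and that `ak.cartesian` multiplies the two pair counts.

```python
from math import comb

# Hypothetical per-event lepton multiplicities (not from the real dataset).
n_electrons = [2, 3, 2, 1, 4]
n_muons = [2, 2, 0, 2, 3]

# Candidates per event: (number of electron pairs) x (number of muon pairs).
candidates_per_event = [
    comb(ne, 2) * comb(nm, 2) for ne, nm in zip(n_electrons, n_muons)
]

# Inclusive: at least one candidate. Exclusive: exactly two electrons and two muons.
inclusive = sum(n > 0 for n in candidates_per_event)
exclusive = sum(ne == 2 and nm == 2 for ne, nm in zip(n_electrons, n_muons))

print(candidates_per_event, inclusive, exclusive)
```

In this toy sample, three events contribute inclusive candidates but only one passes the exclusive requirement, and a single busy event can contribute many candidates at once.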

<br><br><br>

## Four electron (or four muon) final state

### First method

We can't construct candidates for Higgs → ZZ → eeee (or μμμμ) as we did with eeμμ because

In [None]:
electron_pairs1 = ak.combinations(events.electron, 2)
electron_pairs2 = ak.combinations(events.electron, 2)
pairs_of_electron_pairs = ak.cartesian([electron_pairs1, electron_pairs2])

`pairs_of_electron_pairs` has slots for four electrons, but they're filled with electrons that may or may not be distinct.

<br>

Here's a more transparent example:

In [None]:
array = ak.Array([[1, 2, 3, 4, 5]])

In [None]:
pairs1 = ak.combinations(array, 2)
pairs2 = ak.combinations(array, 2)
pairs_of_pairs = ak.cartesian([pairs1, pairs2])

In [None]:
pairs_of_pairs.to_list()

<br>

It is never the case that an electron from the decay of one Z boson is also from the decay of the other Z boson.
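
The size of the overlap in the toy example can be counted directly. The following is a plain-Python sketch that uses `itertools` as a stand-in for `ak.combinations` and `ak.cartesian` (an assumption for illustration; the ordering of results is not claimed to match Awkward Array's).

```python
from itertools import combinations, product

values = [1, 2, 3, 4, 5]

# All unordered pairs, then every (pair, pair) combination, as in the toy example.
pairs = list(combinations(values, 2))
pairs_of_pairs = list(product(pairs, pairs))

# A candidate only makes sense if its four slots hold four distinct objects.
distinct = [(a, b) for a, b in pairs_of_pairs if len(set(a) | set(b)) == 4]

print(len(pairs), len(pairs_of_pairs), len(distinct))
```

Only 30 of the 100 pairs of pairs use four distinct values, and each of those appears twice (once in each order).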

<br>

Instead, we could use charge (+ or ‒) as we had previously used flavor (electron or muon). To do this, we'd have to put e⁺ into a different collection (array) from e⁻.

In [None]:
eplus = events.electron[events.electron.charge > 0]
eminus = events.electron[events.electron.charge < 0]

In [None]:
eplusplus = ak.combinations(eplus, 2)
eminusminus = ak.combinations(eminus, 2)
epairs_of_pairs = ak.cartesian([eplusplus, eminusminus])

<br>

These combinations were built entirely from distinct objects, so none of the particles within a candidate are duplicates.

Demonstrating with a more transparent example:

In [None]:
array = ak.Array([[1, 2, 3, 4, 5]])

In [None]:
evens = array[array % 2 == 0]
odds = array[array % 2 == 1]
evens, odds

In [None]:
pairs1 = ak.combinations(evens, 2)
pairs2 = ak.combinations(odds, 2)
pairs_of_pairs = ak.cartesian([pairs1, pairs2])

In [None]:
pairs_of_pairs.to_list()

<br>

Instead of using [ak.unzip](https://awkward-array.org/doc/main/reference/generated/ak.unzip.html), we can look at each of the slots individually. Tuples are records with fields named `"0"`, `"1"`, etc., so we can see that the first two slots always contain e⁺ and the last two slots always contain e⁻.

In [None]:
assert ak.all(epairs_of_pairs["0", "0"].charge == 1)  # be sure to use quotation marks
assert ak.all(epairs_of_pairs["0", "1"].charge == 1)
assert ak.all(epairs_of_pairs["1", "0"].charge == -1)
assert ak.all(epairs_of_pairs["1", "1"].charge == -1)

<br>

The nested pairs are not the decay products of the two individual Z bosons, since each pair contains like-sign electrons.

In each candidate, there are two possible ways to assign electrons to Z bosons (two _interpretations_):

  1. $Z_1$'s electrons are in fields `"0", "0"` and `"1", "0"`, which would mean that $Z_2$'s electrons are in fields `"0", "1"` and `"1", "1"`.
  2. $Z_1$'s electrons are in fields `"0", "0"` and `"1", "1"`, which would mean that $Z_2$'s electrons are in fields `"0", "1"` and `"1", "0"`.

In [None]:
interpretation1_z1 = epairs_of_pairs["0", "0"] + epairs_of_pairs["1", "0"]
interpretation1_z2 = epairs_of_pairs["0", "1"] + epairs_of_pairs["1", "1"]

interpretation2_z1 = epairs_of_pairs["0", "0"] + epairs_of_pairs["1", "1"]
interpretation2_z2 = epairs_of_pairs["0", "1"] + epairs_of_pairs["1", "0"]

<br>

For each candidate index `i`, either interpretation 1 is correct or interpretation 2 is correct. Different indexes `i` and `j` can have different interpretations.

As a reminder from [exercise-2.ipynb](exercise-2.ipynb), the masses of Z → ee and Z → μμ looked like

![image.png](attachment:0c1588af-636e-44ac-9796-20544da0d829.png)

Since we didn't have any ambiguity about which particle to associate with each Z boson, the only thing to be determined was whether the Z → ee was close to being "on-shell" (mass close to 91 GeV/c$^2$) or the Z → μμ was close to being "on-shell."

<br>

The masses of the two e⁺e⁻ pairs in interpretation 1 are:

In [None]:
hist.Hist.new.Regular(60, 0, 120, name="zmass_1").Regular(60, 0, 120, name="zmass_2").Double().fill(
    zmass_1=ak.flatten(interpretation1_z1.mass),
    zmass_2=ak.flatten(interpretation1_z2.mass),
).plot2d_full();

<br>

And the masses of the two e⁺e⁻ pairs in interpretation 2 are:

In [None]:
hist.Hist.new.Regular(60, 0, 120, name="zmass_1").Regular(60, 0, 120, name="zmass_2").Double().fill(
    zmass_1=ak.flatten(interpretation2_z1.mass),
    zmass_2=ak.flatten(interpretation2_z2.mass),
).plot2d_full();

<br>

The "fog" of low-mass e⁺e⁻ pairs in each interpretation comes from wrong combinations. Put together two particles that don't come from the same decay and their combined mass can land anywhere in a broad distribution of values.
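
The pair-mass formula shows why mismatched pairs spread out. For two leptons treated as massless (an approximation), the invariant mass satisfies $m^2 = 2\,p_{T1}\,p_{T2}\,(\cosh\Delta\eta - \cos\Delta\phi)$, so it depends strongly on the opening angle between them. The helper `pair_mass` below is hypothetical, written only for illustration. In this notebook, adding two electrons and reading `.mass` performs the equivalent calculation through the `vector` library, using the leptons' actual masses.

```python
import math


def pair_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles given (pt, eta, phi)."""
    m_squared = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m_squared, 0.0))  # guard against tiny negative round-off


# Back-to-back leptons, each carrying half the Z mass: about 91.2 GeV.
back_to_back = pair_mass(45.6, 0.0, 0.0, 45.6, 0.0, math.pi)

# The same momenta at a small opening angle give a much smaller mass.
nearly_collinear = pair_mass(45.6, 0.0, 0.0, 45.6, 0.0, 0.1)

print(back_to_back, nearly_collinear)
```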

The plots above also indicate how we can disambiguate interpretations, candidate-by-candidate. A good interpretation will have at least one Z boson close to its on-shell mass of 91 GeV/c$^2$.

In each interpretation, we can find the lowest and highest e⁺e⁻ pair mass.

In [None]:
# Complementary conditions (< and >=) ensure that a tie never puts the same pair in both slots.
interpretation1_zsmall = ak.where(interpretation1_z1.mass < interpretation1_z2.mass, interpretation1_z1, interpretation1_z2)
interpretation1_zbig   = ak.where(interpretation1_z1.mass >= interpretation1_z2.mass, interpretation1_z1, interpretation1_z2)

interpretation2_zsmall = ak.where(interpretation2_z1.mass < interpretation2_z2.mass, interpretation2_z1, interpretation2_z2)
interpretation2_zbig   = ak.where(interpretation2_z1.mass >= interpretation2_z2.mass, interpretation2_z1, interpretation2_z2)

<br>

The lowest and highest e⁺e⁻ pair masses in interpretation 1 are:

In [None]:
hist.Hist.new.Regular(60, 0, 120, name="zmass_small").Regular(60, 0, 120, name="zmass_big").Double().fill(
    zmass_small=ak.flatten(interpretation1_zsmall.mass),
    zmass_big=ak.flatten(interpretation1_zbig.mass),
).plot2d_full();

<br>

The lowest and highest e⁺e⁻ pair masses in interpretation 2 are:

In [None]:
hist.Hist.new.Regular(60, 0, 120, name="zmass_small").Regular(60, 0, 120, name="zmass_big").Double().fill(
    zmass_small=ak.flatten(interpretation2_zsmall.mass),
    zmass_big=ak.flatten(interpretation2_zbig.mass),
).plot2d_full();

<br>

If one interpretation has a highest e⁺e⁻ pair mass close to the on-shell Z mass and the other doesn't, then the former is the more likely interpretation. Just as we used [ak.where](https://awkward-array.org/doc/main/reference/generated/ak.where.html) to select between pair masses to find the lowest and highest per interpretation, we can select between interpretations on a candidate-by-candidate basis.

In [None]:
interpretation1_is_best = interpretation1_zbig.mass > interpretation2_zbig.mass

In [None]:
best_interpretation_zbig = ak.where(interpretation1_is_best, interpretation1_zbig, interpretation2_zbig)
best_interpretation_zsmall = ak.where(interpretation1_is_best, interpretation1_zsmall, interpretation2_zsmall)

We are left with only one plot of the two Z bosons, in the best interpretation per candidate.

In [None]:
hist.Hist.new.Regular(60, 0, 120, name="zmass_small").Regular(60, 0, 120, name="zmass_big").Double().fill(
    zmass_small=ak.flatten(best_interpretation_zsmall.mass),
    zmass_big=ak.flatten(best_interpretation_zbig.mass),
).plot2d_full();

<br><br><br>

Since the above used "biggest mass" instead of "closest to on-shell Z mass," it biases Z candidates to the high end of the Z boson mass distribution. However, it could be reimplemented using

In [None]:
from hepunits import GeV
from particle import Particle

In [None]:
z_onshell = Particle.from_name("Z0").mass / GeV
z_onshell

In [None]:
abs(interpretation1_z1.mass - z_onshell) < abs(interpretation1_z2.mass - z_onshell);

instead of

In [None]:
interpretation1_z1.mass < interpretation1_z2.mass;
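
Carried through to the choice between interpretations, a "closest to on-shell" selection might look like the following sketch. It uses NumPy on made-up flat mass values rather than the real nested arrays (`np.where`-style elementwise logic stands in for `ak.where`), and the `z_onshell` value of 91.19 GeV is hard-coded as an assumption instead of being read from `particle`.

```python
import numpy as np

z_onshell = 91.19  # GeV; assumed value, hard-coded for this sketch

# Hypothetical pair masses (GeV) for three candidates in each interpretation.
interp1_z1 = np.array([90.5, 40.0, 95.0])
interp1_z2 = np.array([30.0, 55.0, 20.0])
interp2_z1 = np.array([60.0, 91.0, 12.0])
interp2_z2 = np.array([70.0, 25.0, 110.0])

# Smallest distance to the on-shell mass within each interpretation.
interp1_dist = np.minimum(abs(interp1_z1 - z_onshell), abs(interp1_z2 - z_onshell))
interp2_dist = np.minimum(abs(interp2_z1 - z_onshell), abs(interp2_z2 - z_onshell))

# Interpretation 1 is best when its closest pair beats interpretation 2's.
interpretation1_is_best = interp1_dist < interp2_dist

print(interpretation1_is_best)
```

For the third hypothetical candidate, the "biggest mass" rule would pick interpretation 2 (110 GeV beats 95 GeV), while the "closest to on-shell" rule picks interpretation 1.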

<br><br><br>

### Second method

The first method used charge (+ or ‒) as a way to reduce candidates, just as the ZZ → eeμμ analysis used flavor (electron or muon).

A more general method, one that doesn't use this discriminator immediately, would be to start with

In [None]:
four_electrons = ak.combinations(events.electron, 4)

and construct interpretations of these generic combinations.

Unlike the first method, `four_electrons` are not pairs of pairs; they are one level of quadruplets.

In [None]:
four_electrons.type.show()

These 4-tuples have fields `"0"`, `"1"`, `"2"`, `"3"`.

Each 4-tuple candidate could have six interpretations:

  1. $Z_1$'s electrons are in fields `"0"` and `"1"`, which would mean that $Z_2$'s electrons are in fields `"2"` and `"3"`.
  2. $Z_1$'s electrons are in fields `"0"` and `"2"`, which would mean that $Z_2$'s electrons are in fields `"1"` and `"3"`.
  3. $Z_1$'s electrons are in fields `"0"` and `"3"`, which would mean that $Z_2$'s electrons are in fields `"1"` and `"2"`.
  4. $Z_1$'s electrons are in fields `"1"` and `"2"`, which would mean that $Z_2$'s electrons are in fields `"0"` and `"3"`.
  5. $Z_1$'s electrons are in fields `"1"` and `"3"`, which would mean that $Z_2$'s electrons are in fields `"0"` and `"2"`.
  6. $Z_1$'s electrons are in fields `"2"` and `"3"`, which would mean that $Z_2$'s electrons are in fields `"0"` and `"1"`.

But interpretation 1 is interpretation 6 if we swap labels $Z_1$ and $Z_2$; similarly for interpretations 2 and 5; similarly for interpretations 3 and 4.

So there are only three interpretations:

  1. $Z_a$'s electrons are in fields `"0"` and `"1"`, which would mean that $Z_b$'s electrons are in fields `"2"` and `"3"`.
  2. $Z_a$'s electrons are in fields `"0"` and `"2"`, which would mean that $Z_b$'s electrons are in fields `"1"` and `"3"`.
  3. $Z_a$'s electrons are in fields `"0"` and `"3"`, which would mean that $Z_b$'s electrons are in fields `"1"` and `"2"`.
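
The count of three can be checked by brute force. This is a plain-Python sketch (the function name `pairings` is hypothetical) that enumerates every way to split the four field names into two unordered pairs.

```python
from itertools import combinations


def pairings(fields):
    """All ways to split four items into two unordered pairs."""
    result = set()
    for first in combinations(fields, 2):
        second = tuple(f for f in fields if f not in first)
        # Sorting the two pairs makes (Za, Zb) and (Zb, Za) the same pairing.
        result.add(tuple(sorted([first, second])))
    return sorted(result)


interpretations = pairings(["0", "1", "2", "3"])
print(interpretations)
```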

We could carry out an analysis similar to the first method's, but with three interpretations instead of two, and at some point filter the candidates by requiring each Z boson to decay to opposite-sign electrons.

Left as an exercise to the reader!

<br><br><br>

Go to [lesson-4.ipynb](lesson-4.ipynb) when we're all done reviewing this exercise.