# Project 1: filling histograms

## Problem statement

**Goal:** Using either the `compute_mass` device function we introduced earlier or a sample of precomputed masses, fill a 120-bin histogram from 0 to 120 $\mbox{GeV}/c^2$. (Ignore overflow and underflow bins.)
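
The bin that a given mass belongs to follows from the histogram's range and bin count. Below is a minimal sketch in plain Python, so it runs without a GPU. The helper name `bin_index` is hypothetical, and returning `-1` for out-of-range values is just one possible way to skip underflow and overflow:

```python
def bin_index(mass, nbins=120, lo=0.0, hi=120.0):
    """Return the bin that `mass` falls in, or -1 if it is outside [lo, hi)."""
    if not (lo <= mass < hi):
        return -1  # underflow, overflow, or NaN: ignored in this project
    index = int((mass - lo) * nbins / (hi - lo))
    return min(index, nbins - 1)  # guard against rounding at the upper edge


print(bin_index(91.5))              # 120 bins, each 1 GeV/c^2 wide
print(bin_index(91.5, nbins=1200))  # 1200 bins, each 0.1 GeV/c^2 wide
print(bin_index(-1.0))              # a negative "no pair" sentinel is skipped
```

The same arithmetic should carry over to the body of a CUDA kernel, where each thread would compute the index for its own event.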

In the first method, create an array with dimensions `(10000000, 120)` and have each thread fill its own row of the two-dimensional array. The columns are histogram bins. (Alternatively, you can think of it as an "array of histograms," one for each CUDA thread.)

Afterward, sum over the rows (axis 0), leaving one total per column, that is, one count per histogram bin.
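
To illustrate the row-per-thread layout on a small scale, here is a CPU-only NumPy sketch (not the CUDA kernel that the project asks for). The sample size, the random stand-in masses, and the name `per_thread` are assumptions for illustration, and a Python loop plays the role of the threads:

```python
import numpy as np

rng = np.random.default_rng(12345)
masses = rng.uniform(-10, 130, size=1000)  # stand-in data; some values are out of range

nbins, lo, hi = 120, 0.0, 120.0
per_thread = np.zeros((len(masses), nbins), dtype=np.int32)  # one row per "thread"

for i, m in enumerate(masses):             # each iteration stands in for one thread
    if lo <= m < hi:                       # ignore underflow and overflow
        j = min(int((m - lo) * nbins / (hi - lo)), nbins - 1)
        per_thread[i, j] += 1              # thread i only ever touches row i

bin_values = per_thread.sum(axis=0)        # sum over rows: one count per bin

# Cross-check against NumPy's own histogram of the same data
expected, _ = np.histogram(masses, bins=nbins, range=(lo, hi))
print(np.array_equal(bin_values, expected))
```

Because no two rows are shared, no synchronization between threads is needed; the cost is the size of the intermediate array.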

Insert your computed `bin_values` into a histogram, like this:

```python
h = Hist.new.Reg(120, 0, 120, label="mass").Double()
h.values()[:] = bin_values

fig, ax = plt.subplots(figsize=(7, 5))
h.plot(ax=ax)
ax.set_xlim(0, 120)
ax.set_yscale("log");
```

The plot should look like this:

![image.png](attachment:53e5e7f5-8812-4f05-84a6-efea4553591c.png)

The method above uses a lot of memory; it would not scale if you had many histograms to fill in the same kernel.
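
To put a number on that, here is a back-of-the-envelope estimate of the intermediate array's size. The 4-byte counter type is an assumption; a narrower or wider dtype scales the result accordingly:

```python
n_threads = 10_000_000   # one row per event, i.e. per CUDA thread
n_bins = 120             # one column per histogram bin
bytes_per_counter = 4    # assumption: np.int32 or np.uint32 counters

total_bytes = n_threads * n_bins * bytes_per_counter
print(f"{total_bytes / 1e9:.1f} GB for a single histogram")
```

The size grows linearly with both the number of bins and the number of histograms filled at once.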

In the second method, create a 1200-bin histogram (also from 0 to 120 $\mbox{GeV}/c^2$) as a one-dimensional array with length `1200`. All of the threads will be filling this one row of histogram bins, so they will have to be filled atomically. Declare the array dtype to be `np.uint32` and use the `nb.cuda.atomic.inc` function, or declare the array dtype to be `np.int32` and use the `nb.cuda.atomic.add` function ([documentation](https://numba.readthedocs.io/en/stable/cuda/intrinsics.html)).
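
One detail worth knowing before choosing between the two functions: according to the Numba documentation linked above, `nb.cuda.atomic.inc(array, idx, val)` resets the element to 0 once it has reached `val`, whereas `nb.cuda.atomic.add(array, idx, val)` simply adds `val`. The plain-Python functions below only emulate those update rules on a single thread. They are not atomic and are not Numba APIs; `emulated_inc` and `emulated_add` are hypothetical names used to show why `inc` needs a large `val` when it serves as a counter:

```python
UINT32_MAX = 0xFFFFFFFF  # largest value a np.uint32 counter can hold


def emulated_inc(array, idx, val):
    """Mimic the update rule of atomic inc: wrap to 0 once array[idx] >= val."""
    old = array[idx]
    array[idx] = 0 if old >= val else old + 1
    return old  # the real intrinsic also returns the previous value


def emulated_add(array, idx, val):
    """Mimic the update rule of atomic add: array[idx] += val."""
    old = array[idx]
    array[idx] = old + val
    return old


wraps = [0, 0, 0]
counts = [0, 0, 0]
added = [0, 0, 0]
for _ in range(5):
    emulated_inc(wraps, 1, 2)            # small val: the bin cycles through 0, 1, 2
    emulated_inc(counts, 1, UINT32_MAX)  # large val: behaves like a plain counter
    emulated_add(added, 1, 1)            # add with val=1: also a plain counter

print(wraps, counts, added)
```

In a real kernel, the atomic versions guarantee that two threads landing in the same bin do not overwrite each other's update.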

Insert your computed `bin_values` into a histogram, like this:

```python
h = Hist.new.Reg(1200, 0, 120, label="mass").Double()
h.values()[:] = bin_values

fig, ax = plt.subplots(figsize=(7, 5))
h.plot(ax=ax)
ax.set_xlim(0, 120)
ax.set_yscale("log");
```

The plot should look like this:

![image.png](attachment:8bc22ba1-86c5-4de8-bc2c-51d52e389ffb.png)

It differs from the first plot because you have 10× as many bins.

**Stretch goal:** How do the speeds of the two methods compare (using precomputed masses, not counting transfer to or from the GPU)? How does the speed scale as you reduce the number of bins?
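
For the timing comparison, one point that is easy to miss is that CUDA kernel launches return before the kernel has finished, so the clock should only be stopped after a synchronization. Below is a minimal sketch of a timing helper, assuming you pass it a zero-argument function that launches your kernel. The name `best_time` is hypothetical, and on a GPU the `sync` argument would be something like `nb.cuda.synchronize`; the usage shown substitutes a CPU stand-in so that it runs anywhere:

```python
import time


def best_time(launch, sync=lambda: None, repeat=5):
    """Return the shortest wall-clock time (in seconds) over `repeat` calls."""
    launch()  # warm-up call, so that JIT compilation is not included in the timing
    sync()
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        launch()
        sync()  # wait for asynchronous work to finish before stopping the clock
        times.append(time.perf_counter() - start)
    return min(times)


# CPU stand-in for a kernel launch, just to exercise the helper
elapsed = best_time(lambda: sum(range(100_000)))
print(f"{elapsed * 1e3:.3f} ms")
```

Taking the minimum over several repetitions reduces the influence of unrelated activity on the machine; remember that the histogram keeps accumulating across repeated launches unless it is reset.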

<br><br><br><br><br>

## Loading data

In [None]:
import math
import matplotlib.pyplot as plt
import numpy as np
import cupy as cp
import numba.cuda
import numba as nb
import uproot
import awkward as ak
from hist import Hist
ak.numba.register_and_check()

To load the raw data (30‒40 seconds) and define `compute_mass`, so that you can fill the histogram on the fly, use this:

In [None]:
with uproot.open("root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root:Events") as tree:
    events_pt, events_eta, events_phi, events_charge = tree.arrays(["Muon_pt", "Muon_eta", "Muon_phi", "Muon_charge"], entry_stop=10000000, how=tuple)

events = ak.to_backend(ak.zip({
    "pt": events_pt,
    "eta": events_eta,
    "phi": events_phi,
    "charge": events_charge,
}), "cuda")

@nb.cuda.jit(extensions=[ak.numba.cuda], device=True)
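def compute_mass(event):
    # Return the invariant mass of the first opposite-charge muon pair in the event.
    for i, mu1 in enumerate(event):
        for mu2 in event[i + 1:]:
            if mu1.charge != mu2.charge:
                return math.sqrt(2*mu1.pt*mu2.pt*(math.cosh(mu1.eta - mu2.eta) - math.cos(mu1.phi - mu2.phi)))
    # No opposite-charge pair found: -1 is below the histogram range (underflow).
    return -1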
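
The expression inside `compute_mass` is the standard invariant-mass formula for two particles whose own masses are neglected (a good approximation for muons at these energies), written in terms of transverse momentum $p_T$, pseudorapidity $\eta$, and azimuthal angle $\phi$:

$$m_{\mu\mu} = \sqrt{2\, p_{T,1}\, p_{T,2} \left[\cosh(\eta_1 - \eta_2) - \cos(\phi_1 - \phi_2)\right]}$$

Assuming `pt` is stored in GeV/$c$, the result is in $\mbox{GeV}/c^2$, matching the histogram axis.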

To load the precomputed masses, use this:

In [None]:
with uproot.open("../data/dimuon_mass.root:tree/mass") as branch:
    mass = cp.asarray(branch.array(library="np"))

<br><br><br><br><br>

## Participation

In [5]:
%%html
<!-- This will only work on the day of the live tutorial, November 2, 2023. -->
<div style="overflow: hidden;"><iframe src="https://app.sli.do/event/qZSuEE7Mv7EGrreMVmoAeq/embed/polls/46174ae7-ba0d-42fa-aaf1-e17091eedb43" width="100%" height="280" scrolling="no" style="border: none;"></iframe></div>

## Your work goes here!

<br><br><br><br><br>