# Event Counting Example

This notebook shows how to open individual ROOT files and then quickly explore their contents with Awkward Array.

To start, we create a list of example files. These are accessed remotely through XRootD, so make sure that you run `voms-proxy-init` before running the notebook. You can do this in a terminal window within the JupyterLab interface.

In [1]:
import awkward as ak
import numpy as np
from coffea.nanoevents import NanoEventsFactory

redirector = "root://cmsxrootd.fnal.gov//"
files = [
    #Z1Jets_NuNu_ZpT_50To150_17
    redirector+"/store/mc/RunIISummer20UL17NanoAODv9/Z1JetsToNuNu_M-50_LHEFilterPtZ-50To150_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_mc2017_realistic_v9-v2/100000/E8AEC1F0-3899-664D-84E2-A775A5D5D2B6.root",
    redirector+"/store/mc/RunIISummer20UL17NanoAODv9/Z1JetsToNuNu_M-50_LHEFilterPtZ-50To150_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_mc2017_realistic_v9-v2/230000/7C6C5ABC-F034-1945-90B9-E4906A6C1988.root",
    #Z1Jets_NuNu_ZpT_150To250_17
    redirector+"/store/mc/RunIISummer20UL17NanoAODv9/Z1JetsToNuNu_M-50_LHEFilterPtZ-150To250_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_mc2017_realistic_v9-v2/100000/E5093A2F-49A7-194C-AB9F-2B66DACB00A2.root",
    redirector+"/store/mc/RunIISummer20UL17NanoAODv9/Z1JetsToNuNu_M-50_LHEFilterPtZ-150To250_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_mc2017_realistic_v9-v2/230000/56A8AA83-6151-B14D-9F80-975164A68B14.root",
    #Z1Jets_NuNu_ZpT_250To400_17
    redirector+"/store/mc/RunIISummer20UL17NanoAODv9/Z1JetsToNuNu_M-50_LHEFilterPtZ-250To400_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_mc2017_realistic_v9-v2/100000/A7E2FD5B-6F80-E242-934D-7C9B3AEC6EE8.root",
    redirector+"/store/mc/RunIISummer20UL17NanoAODv9/Z1JetsToNuNu_M-50_LHEFilterPtZ-250To400_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_mc2017_realistic_v9-v2/100000/CE06E0D0-AD05-9548-B6CA-2C2722C73174.root",
    #Z1Jets_NuNu_ZpT_400ToInf_17
    redirector+"/store/mc/RunIISummer20UL17NanoAODv9/Z1JetsToNuNu_M-50_LHEFilterPtZ-400ToInf_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_mc2017_realistic_v9-v2/100000/1AB5032D-1B75-2241-A309-CC2A872A63FC.root",
    redirector+"/store/mc/RunIISummer20UL17NanoAODv9/Z1JetsToNuNu_M-50_LHEFilterPtZ-400ToInf_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_mc2017_realistic_v9-v2/230000/0289B401-1C1E-1244-922A-0556DAF328E2.root"
]

Issue: coffea.nanoevents.methods.vector will be removed and replaced with scikit-hep vector. Nanoevents schemas internal to coffea will be migrated. Otherwise please consider using that package!.
  from coffea.nanoevents.methods import vector


We can loop over this list of files and print the number of events in each one using coffea's `NanoEventsFactory`. We append the string `":Events"` to each file name to tell the `NanoEventsFactory` which tree in the ROOT file to read.

In [2]:
for f in files:
    events = NanoEventsFactory.from_root(
        f+":Events",
    ).events()
    print(ak.num(events,axis=0).compute())



35889
497627
208207
17997
5070
61090
16356
30391


The expected output is:

35889<br>
497627<br>
208207<br>
17997<br>
5070<br>
61090<br>
16356<br>
30391<br>
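
Since the list holds two files per Z $p_T$ bin, the per-file counts can be combined into per-sample totals. The following is a minimal sketch that only reuses the numbers printed above; the sample labels are shorthand chosen for illustration, not names defined by coffea.

```python
# Event counts printed above, in the same order as the `files` list
counts = [35889, 497627, 208207, 17997, 5070, 61090, 16356, 30391]

# Shorthand labels for the four Z pT bins (illustrative names, two files per bin)
samples = ["ZpT_50To150", "ZpT_150To250", "ZpT_250To400", "ZpT_400ToInf"]

# Add up the two consecutive files that belong to each bin
per_sample = {
    name: counts[2 * i] + counts[2 * i + 1]
    for i, name in enumerate(samples)
}
total = sum(counts)

for name, n in per_sample.items():
    print(f"{name}: {n}")
print(f"All files: {total}")
```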

Now, let's grab one particular file and break down that last cell.

In [3]:
filename = redirector+"/store/mc/RunIISummer20UL17NanoAODv9/Z1JetsToNuNu_M-50_LHEFilterPtZ-250To400_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_mc2017_realistic_v9-v2/100000/A7E2FD5B-6F80-E242-934D-7C9B3AEC6EE8.root"
events = NanoEventsFactory.from_root(filename+":Events").events()
type(events)

dask_awkward.lib.core.Array

The `NanoEventsFactory` object allows us to get an `events` object, which here is a dask-awkward object. This is a delayed object, meaning that we haven't actually loaded the full file into memory. Also, if we ask for some data from it, we won't get floats or bools, but another delayed object. For example, the `ak.num` function, which counts objects, returns another dask-awkward object instead of a number.

Here, we use `axis=0` to count along the outermost axis (i.e., the rows of the array, which here are the events).

In [4]:
x = ak.num(events,axis=0)
print(type(x))
print(x)

<class 'dask_awkward.lib.core.Scalar'>
dask.awkward<numaxis0, type=Scalar, dtype=int64>


In order to get an actual number, we need to compute the object.

In [5]:
y = x.compute()
print(type(y))
print(y)

<class 'numpy.int64'>
5070


It is possible to load the events eagerly instead of building a delayed object, but this is generally suboptimal from a compute perspective. It can be useful, however, when you are exploring new data and need to run many short commands on a small number of events. To do this, set `delayed=False` in the `from_root` call, as follows.

In [6]:
events_eager = NanoEventsFactory.from_root(filename+":Events",delayed=False).events()
type(events_eager)

coffea.nanoevents.methods.base.NanoEventsArray

Otherwise, dask-awkward arrays will usually need a `.compute()` at the end to materialize a result.

An exception to this pattern is metadata. For example, to access the list of branches in the events, one can do `events.fields`, and this command needs no `.compute()`.

In [7]:
print(f"Fields in events are: {events.fields}")

Fields in events are: ['SoftActivityJetNjets2', 'Muon', 'Electron', 'CaloMET', 'HLTriggerFinalPath', 'genTtbarId', 'SoftActivityJetHT', 'PV', 'L1', 'Jet', 'boostedTau', 'LHEScaleWeight', 'DeepMETResponseTune', 'btagWeight', 'LHEPart', 'fixedGridRhoFastjetAll', 'fixedGridRhoFastjetCentralCalo', 'SoftActivityJetHT2', 'SV', 'FsrPhoton', 'LHEPdfWeight', 'Generator', 'RawMET', 'CorrT1METJet', 'L1simulation', 'PuppiMET', 'fixedGridRhoFastjetCentral', 'GenJet', 'Tau', 'SoftActivityJet', 'LHEReweightingWeight', 'fixedGridRhoFastjetCentralNeutral', 'L1PreFiringWeight', 'ChsMET', 'genWeight', 'Pileup', 'luminosityBlock', 'LHEWeight', 'event', 'FatJet', 'MET', 'LHE', 'Flag', 'DeepMETResolutionTune', 'SoftActivityJetHT5', 'fixedGridRhoFastjetCentralChargedPileUp', 'SubGenJetAK8', 'GenIsolatedPhoton', 'run', 'GenJetAK8', 'GenVisTau', 'LowPtElectron', 'GenVtx', 'HTXS', 'L1Reco', 'SubJet', 'RawPuppiMET', 'GenMET', 'SoftActivityJetHT10', 'HLTriggerFirstPath', 'GenDressedLepton', 'SoftActivityJetNjets1

To access a branch in the events array, the syntax is `events.branch_name`. To get more information on a field in the array, one can do `events.field_name?`. For example, to learn more about the Jet array:

In [8]:
events.Jet?

Type:            Array
String form:     dask.awkward<Jet, npartitions=1>
File:            /usr/local/lib/python3.12/site-packages/dask_awkward/lib/core.py
Docstring:       slimmedJets, i.e. ak4 PFJets CHS with JECs applied, after basic selection (pt > 15)
Class docstring:
Partitioned, lazy, and parallel Awkward Array Dask collection.

The class constructor is not intended for users. Instead use
factory functions like :py:func:`~dask_awkward.from_parquet`,
:py:func:`~dask_awkward.from_json`, etc.

Within dask-awkward the ``new_array_object`` factory function is
used for creating new instances.

Now, let's pick out two particular jets and check them out. Here, we look at the event at index 12 (the $13^\text{th}$ event, since indexing starts at zero) and take the first and second jets in that event.

In [9]:
jet1 = events_eager.Jet[12][:1]
jet2 = events_eager.Jet[12][1:2]
print(type(jet1))
print(f"Jet 1's phi: {jet1.phi[0]:.2f}")
print(f"Jet 2's phi: {jet2.phi[0]:.2f}")
print(f"Jet 1's eta: {jet1.eta[0]:.2f}")
print(f"Jet 2's eta: {jet2.eta[0]:.2f}")
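# coffea's NanoEvents behaviors provide HEP-centric methods such as delta_r
print(f"The delta-R between the jets: {jet1.delta_r(jet2)[0]:.2f}")
# Wrap delta-phi into [-pi, pi) so the manual result also holds for jets on either side of phi = +/-pi
dphi = (jet1.phi - jet2.phi + np.pi) % (2 * np.pi) - np.pi
print(f"Manually calculated delta-R between the jets: {np.sqrt(dphi**2 + (jet1.eta - jet2.eta)**2)[0]:.2f}")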

<class 'coffea.nanoevents.methods.nanoaod.JetArray'>
Jet 1's phi: -1.04
Jet 2's phi: 1.30
Jet 1's eta: -0.21
Jet 2's eta: 1.73
The delta-R between the jets: 3.04
Manually calculated delta-R between the jets: 3.04
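
Because $\phi$ is periodic, a manual $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$ calculation needs $\Delta\phi$ wrapped into $[-\pi, \pi)$. The sketch below illustrates this with plain NumPy; the helper names `delta_phi` and `delta_r` are defined here for illustration and are not the coffea implementation. The first call uses the rounded values printed above, and the second uses made-up values on either side of $\phi = \pm\pi$.

```python
import numpy as np

def delta_phi(phi1, phi2):
    """Return phi1 - phi2 wrapped into the interval [-pi, pi)."""
    return (phi1 - phi2 + np.pi) % (2 * np.pi) - np.pi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2) with a wrapped d_phi."""
    return np.hypot(eta1 - eta2, delta_phi(phi1, phi2))

# Rounded (eta, phi) values of the two jets printed above
dr = delta_r(-0.21, -1.04, 1.73, 1.30)
print(f"{dr:.2f}")

# Two made-up jets close to each other across the phi = +/-pi boundary:
# the naive difference would be 6.0, the wrapped one is about 0.28
dr_wrapped = delta_r(0.0, 3.0, 0.0, -3.0)
print(f"{dr_wrapped:.2f}")
```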


We could also go to the event at index 12 and look at the $\phi$ of all of its jets at once.

In [10]:
events_eager[12].Jet.phi

Or, we can look at jets' $\phi$ across (in principle, all) events at once. This is a window into the columnar framework that we explore more in the following notebooks: if we want to access jet $\phi$ in each event, we can refer to all those jets' $\phi$ at once.

Of course, we have 5070 events here, so we can't conveniently print them all at once.

In [11]:
events_eager.Jet.phi