In [1]:
#Set up notebook
!pip3 install awkward uproot
import awkward as ak
import uproot
import numpy as np

Collecting uproot
  Using cached uproot-5.1.1-py3-none-any.whl (340 kB)
Installing collected packages: uproot
Successfully installed uproot-5.1.1


# Reading ROOT Files and Basic Analysis Techniques
In this exercise, we will use `uproot` to open a TFile, learn to navigate its structure, and import a TTree as an awkward array. Once we have read the TTree "Events", we can begin to analyze the data.

## Reading ROOT Files
The library `uproot` handles reading and writing ROOT files in Python. It can read the branches of a TTree into NumPy arrays, pandas DataFrames, or awkward arrays; the output type is chosen through the variable `uproot.default_library`. We will set it to awkward array for this exercise.

In [2]:
uproot.default_library = "ak" 
TFile = uproot.open("SMS_T2tt_mStop250_mLSP75_fastsim_2016_Skim_070650_37_082253_36.root")
print(TFile)

<ReadOnlyDirectory '/' at 0x7fc90c692970>


Printing the TFile object does not give much information. To access its contents we need to use the methods defined by `uproot.ReadOnlyDirectory`. The class reference can be found [here](https://uproot.readthedocs.io/en/latest/uproot.reading.ReadOnlyDirectory.html). To see the contents of `TFile` we call

In [3]:
TFile.keys()

['Events;1', 'Runs;1', 'LuminosityBlocks;1', 'untagged;1']

To obtain the class types and number of branches for these objects we call

In [4]:
TFile.items()

[('Events;1', <TTree 'Events' (860 branches) at 0x7fc90c6929d0>),
 ('Runs;1', <TTree 'Runs' (8 branches) at 0x7fc8c44ea6a0>),
 ('LuminosityBlocks;1',
  <TTree 'LuminosityBlocks' (2 branches) at 0x7fc8c44ea9a0>),
 ('untagged;1', <TObjString 'untagged' at 0x7fc8329ff2e0>)]

We can see that our ROOT file contains a TTree called "Events" with 860 branches, which holds our data. Accessing the TTree does not load all of its data into memory. We can call `.keys()` and `.items()` on the TTree as well to read its branch names. Let's read "Events" by passing the key in square brackets to `TFile` and check the first four branches.

In [5]:
Events = TFile["Events"]
print(Events.keys()[0:4])

['run', 'luminosityBlock', 'event', 'btagWeight_CSVV2']


The contents of these branches can be read as an awkward array by calling `TBranch.array()`. Let's read the branch `Muon_pt` into an awkward array.

In [6]:
Muon_pT = Events["Muon_pt"].array()
Muon_pT

We read `type: 847873 * var * float32` as follows: the array has 847873 entries (events), each entry is a variable-length list (`var`) holding one value per muon in that event, and every value is of type `float32`. The empty entries are the events with no muons. We can also create an awkward array containing multiple branches if convenient. Let's create an awkward array containing the pT, eta, and phi of the muons using `ak.zip`.

In [7]:
muons = ak.zip(
            {"pT": Muon_pT, #key is the name of the array and the value is the associated array
             "eta": Events["Muon_eta"].array(), 
             "phi": Events["Muon_phi"].array(),
             }
        )
muons

We can read a field of our constructed array by passing its key in square brackets.

In [8]:
print(muons["eta"])

[[], [0.723], [-0.927], [0.145, 0.686], [0.33], ..., [...], [0.137], [], [], []]


## Efficiency Calculation
It is often convenient to evaluate multiple selection criteria at once. Let's calculate the efficiency of the following selection: "An event must contain at least one muon with pt > 50 and |eta| < 2.5". Let's compare the time to complete this calculation between a row-wise and a columnar analysis.

In [9]:
%%time
#Row wise analysis
muons_subset = muons[0:10000]
events_pass = 0

#Check each event for at least one muon satisfying quality criteria
for i, event in enumerate(muons_subset["pT"]):
    for j, muon_pT in enumerate(event):
        if (muon_pT > 50) and (np.abs(muons_subset["eta"][i,j]) < 2.5):
            events_pass += 1
            break #go to next event when we find a good muon
            
efficiency = events_pass/len(muons_subset)
print("Efficiency is: ", efficiency)

Efficiency is:  0.0881
CPU times: user 438 ms, sys: 31.7 ms, total: 470 ms
Wall time: 461 ms


Now using awkward array to perform a columnar analysis of the problem:

In [10]:
%%time
#Columnar analysis
selection = ak.any((muons_subset["pT"] > 50) & (np.abs(muons_subset["eta"]) < 2.5), axis = 1)
muons_pass = muons_subset[selection]
efficiency = len(muons_pass)/len(muons_subset)
print("Efficiency is: ", efficiency)

Efficiency is:  0.0881
CPU times: user 2.21 ms, sys: 782 µs, total: 2.99 ms
Wall time: 2.73 ms


You should see the runtime drop from hundreds of milliseconds to a few milliseconds, a reduction of more than a factor of 100. Python for loops are slow because every iteration (here per muon per event) is executed by the interpreter and accesses the data one element at a time, whereas awkward array applies the selection to whole arrays in compiled code. The awkward array syntax is also more readable, and it leaves us with the object `muons_pass`, which contains the events that satisfy the selection criteria.

## Scale Factor Calculation
When calculating MC event weights, you will need to calculate many different scale factors associated with the various objects in each event. For this dataset, let's calculate the event scale factor associated with the generated muons. Each muon in the dataset has a calculated scale factor, but not every muon in `muons` satisfies our quality criteria. We must first find which muons pass our selection criteria and then calculate the total muon scale factor from all those muons per event. Let's collect all the necessary muon variables.

In [11]:
muons = ak.zip(
            {"pt": Events["Muon_pt"].array(),
             "eta": Events["Muon_eta"].array(),
             "is_mu": Events["Muon_Stop0l"].array(),
             "muSF": Events["Muon_LooseSF"].array()
             }
)

847873

We will define a **good** muon as one with $p_T > 5$ and $|\eta| < 2.5$ that is identified as a true muon from generator-level information. First, let's select the good muons. Once we have collected them, let's multiply all of their scale factors within each event.

In [17]:
muons = muons[muons["is_mu"] & (muons["pt"] > 5) & (np.abs(muons["eta"]) < 2.5)]
print(muons)
mu_SF = ak.prod(muons["muSF"], axis=1)
mu_SF

[[], [], [{pt: 12.6, eta: -0.927, is_mu: True, muSF: 0.996}], ..., [], [], []]


We now have an array containing the muon scale factor for each event. Note that events without any good muons are given a scale factor of 1 with this workflow: the product over an empty list is 1, as shown below when calling `ak.prod()`, so there is no need to treat such events separately.

In [21]:
array = ak.Array([[],[]])
ak.prod(array, axis=1)