# ROOT without ROOT!

In this brief tutorial I'll go through opening a ROOT file, performing some basic cuts, saving our results to a dataframe and making some plots.

This tutorial is on GitHub, and can be found at https://github.com/professor-calculus/AlexDataFramesTutorial.git

If you have git lfs installed, you'll get the .root file when you clone the repository. If not, copy it with scp from
YOURUSERNAME@lxplus.cern.ch:/afs/cern.ch/work/a/atittert/public/delphes.root

![alt text](./img.jpg)

First let's load up our dependencies; we won't need many.
Spoiler: ROOT isn't one of them!

In [1]:
import math
import pandas as pd
import uproot
from tqdm import tqdm_notebook as tqdm

Now, let's open the ROOT file from Delphes:

In [2]:
tree = uproot.open('./delphes.root')["Delphes"]

We can also take a look at the branches, and the leaves within each branch, using keys():

In [3]:
tree.keys()

[b'Event',
 b'Event_size',
 b'EventLHEF',
 b'EventLHEF_size',
 b'WeightLHEF',
 b'WeightLHEF_size',
 b'Particle',
 b'Particle_size',
 b'GenJet',
 b'GenJet_size',
 b'Jet',
 b'Jet_size',
 b'Electron',
 b'Electron_size',
 b'Photon',
 b'Photon_size',
 b'Muon',
 b'Muon_size',
 b'MissingET',
 b'MissingET_size',
 b'ScalarHT',
 b'ScalarHT_size']

In [4]:
tree['Jet'].keys()

[b'Jet.fUniqueID',
 b'Jet.fBits',
 b'Jet.PT',
 b'Jet.Eta',
 b'Jet.Phi',
 b'Jet.T',
 b'Jet.Mass',
 b'Jet.DeltaEta',
 b'Jet.DeltaPhi',
 b'Jet.Flavor',
 b'Jet.FlavorAlgo',
 b'Jet.FlavorPhys',
 b'Jet.BTag',
 b'Jet.BTagAlgo',
 b'Jet.BTagPhys',
 b'Jet.TauTag',
 b'Jet.Charge',
 b'Jet.EhadOverEem',
 b'Jet.NCharged',
 b'Jet.NNeutrals',
 b'Jet.Beta',
 b'Jet.BetaStar',
 b'Jet.MeanSqDeltaR',
 b'Jet.PTD',
 b'Jet.FracPt[5]',
 b'Jet.Tau[5]',
 b'Jet.TrimmedP4[5]',
 b'Jet.PrunedP4[5]',
 b'Jet.SoftDroppedP4[5]',
 b'Jet.NSubJetsTrimmed',
 b'Jet.NSubJetsPruned',
 b'Jet.NSubJetsSoftDropped',
 b'Jet.Constituents',
 b'Jet.Particles',
 b'Jet.Area']
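
The names above are printed as bytes (note the `b'...'` prefix). If you'd rather work with ordinary strings, they can be decoded. This is a minimal sketch: `decode_keys` is a hypothetical helper, not part of uproot, and the list is a hand-copied subset of the output above.

```python
def decode_keys(keys):
    """Turn bytes branch names (e.g. b'Jet.PT') into plain str; leave str untouched."""
    return [k.decode("utf-8") if isinstance(k, bytes) else k for k in keys]

# A few names copied by hand from the output above
jet_keys = [b'Jet.PT', b'Jet.Eta', b'Jet.Phi', b'Jet.BTag']
names = decode_keys(jet_keys)
print(names)
```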

Let's define the cut thresholds we'll use and count the number of entries in the tree.

In [5]:
# Cuts: 400 GeV MHT, 900 GeV HT, >=3 jets, >=2 b-jets
mht_min = 400.
ht_min = 900.
njet_min = 3
nbjet_min = 2

total_n_entries = len(tree['ScalarHT.HT'])
print('Tree of {0} entries read in'.format(total_n_entries))

Tree of 2000 entries read in
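
The four thresholds act together: an event is kept only if it clears every one of them. As a rough sketch of that logic, assuming "passes" means greater than or equal to each minimum, the check can be wrapped in a small function. `passes_cuts` is a hypothetical name, and the two example events are made up rather than read from the ROOT file.

```python
# Thresholds copied from the cell above
mht_min = 400.
ht_min = 900.
njet_min = 3
nbjet_min = 2

def passes_cuts(mht, ht, njets, nbjets):
    """Return True only if the event satisfies all four thresholds."""
    return (mht >= mht_min and ht >= ht_min
            and njets >= njet_min and nbjets >= nbjet_min)

# Made-up example events (not taken from the ROOT file)
good_event = passes_cuts(mht=500., ht=1200., njets=4, nbjets=2)
low_mht_event = passes_cuts(mht=350., ht=1200., njets=4, nbjets=2)
print(good_event, low_mht_event)
```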


## Iterating over a ROOT tree

To iterate over the ROOT tree we use the inventively titled 'iterate()' function in uproot.

In [11]:
# Initialise the variables we'll write to:
mht = [] # Missing HT: magnitude of the negative vector sum of jet pT
ht = [] # Scalar HT: Scalar sum of jet pT
njets = [] # Number of jets which pass certain criteria (ID etc)
nbjets = [] # Number of b-tagged jets...
event_passes_bool = [] # Does event pass cuts?
n_eventpass = 0 # Number of events which passed the cuts

branches = ["ScalarHT.HT", "Jet.PT", "Jet.Eta", "Jet.Phi", "Jet.BTag"]
chunks = uproot.iterate('./delphes.root', 'Delphes', branches, outputtype=tuple)
for HT, JetPt, JetEta, JetPhi, JetBtag in tqdm(chunks, desc='Go Go Go!'):
    for HT_i, JetPt_i, JetEta_i, JetPhi_i, JetBtag_i in zip(HT, JetPt, JetEta, JetPhi, JetBtag):
        
        # Reset some variables:
        nJet = 0
        nBJet = 0
        mht_x = 0.
        mht_y = 0.
        
        # Easy one first:
        ht.append(HT_i[0])
        
        # Loop over the jets in the event
        for JetPt_j, JetEta_j, JetPhi_j, JetBtag_j in zip(JetPt_i, JetEta_i, JetPhi_i, JetBtag_i):
            # Only include central jets with decent pT to avoid pileup contributions etc
            if JetPt_j > 40. and abs(JetEta_j) < 2.4:
                nJet += 1
                # MHT is the negative vector sum of jet pT, so both components get a minus sign
                mht_x += -1. * JetPt_j * math.cos(JetPhi_j)
                mht_y += -1. * JetPt_j * math.sin(JetPhi_j)
                
                # Does this jet have a b-tag?
                if JetBtag_j:
                    nBJet += 1
        
        # Missing-HT from its components
        mht_tmp = math.sqrt(mht_x**2 + mht_y**2)
        mht.append(mht_tmp)
        
        njets.append(nJet)
        nbjets.append(nBJet)
        
        # Does this event pass all cuts?
        pass_cuts = True
        if mht_tmp < mht_min: pass_cuts = False
        if HT_i[0] < ht_min: pass_cuts = False
        if nJet < njet_min: pass_cuts = False
        if nBJet < nbjet_min: pass_cuts = False
        event_passes_bool.append(pass_cuts)
        if pass_cuts: n_eventpass += 1
        
percentage = 100.*float(n_eventpass)/float(total_n_entries)
print('{0} of {1} events ({2}%) passed all cuts'.format(n_eventpass, total_n_entries, percentage))

1176 of 2000 events (58.8%) passed all cuts
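
To check the per-jet part of the loop above in isolation, the MHT calculation for a single event can be pulled out into a function. This is a sketch only: it takes MHT as the magnitude of the negative vector sum of pT over jets with pT > 40 and |eta| < 2.4 (the selection used above). `event_mht` is a hypothetical name and the toy jets are invented.

```python
import math

def event_mht(pts, etas, phis, pt_min=40., eta_max=2.4):
    """Magnitude of the negative vector sum of pT over the selected jets."""
    mht_x = 0.
    mht_y = 0.
    for pt, eta, phi in zip(pts, etas, phis):
        # Same jet selection as in the loop above
        if pt > pt_min and abs(eta) < eta_max:
            mht_x -= pt * math.cos(phi)
            mht_y -= pt * math.sin(phi)
    return math.hypot(mht_x, mht_y)

# Toy events, invented for illustration
# Two equal jets back to back: their pT cancels
back_to_back = event_mht([100., 100.], [0.5, -0.5], [0., math.pi])
# One hard jet plus a 30 GeV jet that fails the pT selection
single_jet = event_mht([100., 30.], [0.0, 0.0], [1.0, 2.0])
print(back_to_back, single_jet)
```

The back-to-back event gives an MHT of essentially zero (up to floating-point rounding), while the second event gives about 100, because the soft jet is ignored.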


## Writing the variables to a dataframe
df.head() shows the first 5 rows, one row per event.

In [12]:
df = pd.DataFrame({
    'HT': ht,
    'MHT': mht,
    'NJets': njets,
    'NBJets': nbjets,
    'Passes_Cuts': event_passes_bool,
})
df.head()
#print(df)

|   | HT | MHT | NBJets | NJets | Passes_Cuts |
|---|---|---|---|---|---|
| 0 | 1339.817749 | 486.756073 | 2 | 9 | True |
| 1 | 2539.739258 | 866.369924 | 3 | 8 | True |
| 2 | 2071.848633 | 636.202797 | 4 | 10 | True |
| 3 | 2112.030029 | 372.357679 | 3 | 8 | False |
| 4 | 1994.492554 | 1208.95727 | 5 | 7 | True |
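
With the results in a dataframe, the boolean Passes_Cuts column can be used directly as a mask. The sketch below rebuilds just the five rows shown above by hand (so it runs without the ROOT file) and selects the passing events. `df_head`, `selected` and `pass_fraction` are names chosen for this illustration.

```python
import pandas as pd

# The five rows shown above, typed in by hand
df_head = pd.DataFrame({
    'HT': [1339.817749, 2539.739258, 2071.848633, 2112.030029, 1994.492554],
    'MHT': [486.756073, 866.369924, 636.202797, 372.357679, 1208.95727],
    'NBJets': [2, 3, 4, 3, 5],
    'NJets': [9, 8, 10, 8, 7],
    'Passes_Cuts': [True, True, True, False, True],
})

# Boolean mask: keep only the events that passed
selected = df_head[df_head['Passes_Cuts']]

# The mean of a boolean column is the fraction of True values
pass_fraction = df_head['Passes_Cuts'].mean()
print(len(selected), pass_fraction)
```

Event 3 is the only one of these five that drops out; its MHT of about 372 is below the 400 threshold.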


## Save output
Save the dataframe to a tab-separated .txt file so it can be read back later.

In [10]:
df.to_csv('DataFrame.txt', sep='\t', index=False)
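
To pick the results up again later, the file can be read back with the same separator. The sketch below does a round trip on a tiny made-up dataframe written to a temporary directory, so it does not touch the real DataFrame.txt. `small` and `restored` are names chosen for this illustration.

```python
import os
import tempfile
import pandas as pd

# Tiny stand-in for the real dataframe (values invented for illustration)
small = pd.DataFrame({
    'HT': [1339.8, 2112.0],
    'MHT': [486.8, 372.4],
    'Passes_Cuts': [True, False],
})

# Write to a temporary location with the same options as above
path = os.path.join(tempfile.mkdtemp(), 'DataFrame.txt')
small.to_csv(path, sep='\t', index=False)

# Read it back; sep must match what was used when writing
restored = pd.read_csv(path, sep='\t')
print(restored)
```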