Let's have a look at an output file. It is in ROOT format. You may well write your own output in a different format, but we hope these examples help you inspect the files.

First, we import the ROOT library:

In [1]:
import ROOT


Welcome to JupyROOT 6.18/00


The derived dataset in "NanoAODOutreach" format consists of a "tree" called "Events", and we have to specify that name when opening the file. We can open the file directly from the CERN Open Data portal and then display the first columns and rows of the dataframe:

In [2]:
df = ROOT.RDataFrame('Events', 'root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012B_TauPlusX.root')


In [3]:
display = df.Display()

In [4]:
display.Print()

run    | luminosityBlock | event    | HLT_IsoMu24_eta2p1 | HLT_IsoMu24 | 
194075 | 48              | 14880766 | false              | false       | 
194075 | 48              | 14844046 | true               | true        | 
194075 | 48              | 14872718 | true               | true        | 
194075 | 48              | 14772869 | false              | false       | 
194075 | 48              | 14797077 | false              | false       | 


If you have produced an output file of your own, the tree name includes the directory the tree was written to. For this example output file, it is "aod2nanoaod/Events":

In [5]:
df2 = ROOT.RDataFrame('aod2nanoaod/Events', 'output.root')
display2 = df2.Display()
display2.Print()

run    | luminosityBlock | event      | HLT_IsoMu24_eta2p1 | HLT_IsoMu24 | 
195397 | 817             | 1044388523 | false              | false       | 
195397 | 817             | 1044431067 | false              | false       | 
195397 | 817             | 1044455851 | false              | false       | 
195397 | 817             | 1044443963 | false              | false       | 
195397 | 817             | 1044504251 | false              | false       | 


By default, the Display method shows only the first few columns and five rows. You can list all the column names with:

In [7]:
print(df2.GetColumnNames())

{ "run", "luminosityBlock", "event", "HLT_IsoMu24_eta2p1", "HLT_IsoMu24", "HLT_IsoMu17_eta2p1_LooseIsoPFTau20", "PV_npvs", "PV_x", "PV_y", "PV_z", "nMuon", "Muon_pt", "Muon_eta", "Muon_phi", "Muon_mass", "Muon_charge", "Muon_pfRelIso03_all", "Muon_pfRelIso04_all", "Muon_tightId", "Muon_softId", "Muon_dxy", "Muon_dxyErr", "Muon_dz", "Muon_dzErr", "Muon_jetIdx", "Muon_genPartIdx", "nElectron", "Electron_pt", "Electron_eta", "Electron_phi", "Electron_mass", "Electron_charge", "Electron_pfRelIso03_all", "Electron_dxy", "Electron_dxyErr", "Electron_dz", "Electron_dzErr", "Electron_cutBasedId", "Electron_pfId", "Electron_jetIdx", "Electron_genPartIdx", "nTau", "Tau_pt", "Tau_eta", "Tau_phi", "Tau_mass", "Tau_charge", "Tau_decayMode", "Tau_relIso_all", "Tau_jetIdx", "Tau_genPartIdx", "Tau_idDecayMode", "Tau_idIsoRaw", "Tau_idIsoVLoose", "Tau_idIsoLoose", "Tau_idIsoMedium", "Tau_idIsoTight", "Tau_idAntiEleLoose", "Tau_idAntiEleMedium", "Tau_idAntiEleTight", "Tau_idAntiMuLoose", "Tau_idAntiMuMe
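The column names follow a pattern visible in the listing above: a counter such as nMuon holds the number of objects in the event, and the per-object variables share a prefix such as Muon_. As a plain-Python sketch (not part of ROOT; the helper group_collections and the short column list are hypothetical illustrations), the flat list can be grouped by collection:

```python
def group_collections(columns):
    """Group column names by the prefix before the first underscore.

    Counters such as "nMuon" are attached to their collection ("Muon");
    names without an underscore (e.g. "run", "event") go to "event".
    """
    # Every prefix that appears in front of an underscore, e.g. "Muon", "HLT"
    prefixes = {name.split("_", 1)[0] for name in columns if "_" in name}
    groups = {}
    for name in columns:
        if "_" in name:
            key = name.split("_", 1)[0]
        elif name.startswith("n") and name[1:] in prefixes:
            key = name[1:]  # counter column, e.g. "nMuon" -> "Muon"
        else:
            key = "event"   # event-level quantity
        groups.setdefault(key, []).append(name)
    return groups

# A short excerpt of the column names printed above
cols = ["run", "event", "HLT_IsoMu24", "nMuon", "Muon_pt", "Muon_eta",
        "nElectron", "Electron_pt"]
groups = group_collections(cols)
print(groups["Muon"])  # ['nMuon', 'Muon_pt', 'Muon_eta']
```

With PyROOT, something like [str(c) for c in df2.GetColumnNames()] should give a plain list of strings that this function accepts.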

To display only the columns related to, e.g., electrons, pass a regular expression and the number of rows to Display:

In [9]:
d = df2.Display('.*Electron.*', 10)
d.Print()

nElectron | Electron_pt | Electron_eta | Electron_phi | Electron_mass | Electron_charge | 
0         |             |              |              |               |                 | 
1         | 17.5548f    | -0.493124f   | -0.454396f   | -0.00204195f  | -1              | 
0         |             |              |              |               |                 | 
1         | 19.1973f    | -2.17330f    | 1.45045f     | 0.0172910f    | -1              | 
0         |             |              |              |               |                 | 
0         |             |              |              |               |                 | 
0         |             |              |              |               |                 | 
0         |             |              |              |               |                 | 
0         |             |              |              |               |                 | 
0         |             |              |              |               |                 | 
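Display keeps the columns whose names match the regular expression. The plain-Python sketch below only mimics that selection with the standard re module (using re.search); it illustrates the idea and is not ROOT's implementation. The helper select_columns is hypothetical, and the column list is a short excerpt of the one printed by GetColumnNames() above.

```python
import re

# Excerpt of the column names printed by GetColumnNames() above
columns = ["nMuon", "Muon_pt", "nElectron", "Electron_pt", "Electron_eta",
           "Electron_charge", "nTau", "Tau_pt"]

def select_columns(pattern, names):
    """Return the names matching the regular expression, in their original order."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]

electron_columns = select_columns(".*Electron.*", columns)
print(electron_columns)  # ['nElectron', 'Electron_pt', 'Electron_eta', 'Electron_charge']

# In this sketch, an anchored pattern leaves out the counter column nMuon
print(select_columns("^Muon_", columns))  # ['Muon_pt']
```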

Not all events contain electrons: of the first 10 events shown here, only two have one.
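The count can be checked from the nElectron column printed above. The plain-Python sketch below uses the ten values from that printout and also shows the layout of per-object columns: each row of Electron_pt holds exactly nElectron values, which is why rows with nElectron = 0 are empty. On the full dataset, RDataFrame itself would do the counting (for example, a Filter on nElectron followed by Count), which is not shown here.

```python
# nElectron and Electron_pt of the first 10 events, as printed by Display above
n_electron = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
electron_pt = [[], [17.5548], [], [19.1973], [], [], [], [], [], []]

# Per-object columns hold one entry per object, so their length equals nElectron
assert all(len(pts) == n for pts, n in zip(electron_pt, n_electron))

events_with_electrons = sum(1 for n in n_electron if n > 0)
print(f"{events_with_electrons} of {len(n_electron)} events contain at least one electron")
```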

We can also make some plots. This example selects the events with exactly two muons and plots the transverse momentum (pt) of each muon:

In [10]:
%jsroot on

In [11]:
df = ROOT.RDataFrame('aod2nanoaod/Events', 'output.root')
df = df.Filter('nMuon == 2', 'Only events with exactly two muons')\
       .Define('Muon_pt_1', 'Muon_pt[0]')\
       .Define('Muon_pt_2', 'Muon_pt[1]')
# Book a cut flow report, most important for debugging!
r = df.Report()
# Book the histograms
h1 = df.Histo1D('Muon_pt_1') # Very simple, automatic range detection
h2 = df.Histo1D(('subleading muon', 'Subleading muon pt;p_{T} (GeV);Count', 20, 0, 100), 'Muon_pt_2') # The good old TH1 constructor with name, title and binning
# Nothing has been computed yet: the event loop runs lazily, the first time
# one of the booked results is accessed (here, when the histograms are drawn)
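Filter and Define act event by event. The plain-Python sketch below mimics what this chain computes on a few hypothetical events (the muon pt values are made up for illustration). It is not how RDataFrame works internally: RDataFrame just-in-time compiles the string expressions as C++ and runs them in a single lazy event loop.

```python
# Hypothetical events: each holds the per-muon pt values (GeV) of one event
events = [
    {"nMuon": 2, "Muon_pt": [41.2, 23.5]},
    {"nMuon": 1, "Muon_pt": [30.1]},
    {"nMuon": 3, "Muon_pt": [55.0, 20.3, 8.7]},
    {"nMuon": 2, "Muon_pt": [27.9, 12.4]},
]

# Filter: keep only events with exactly two muons
selected = [ev for ev in events if ev["nMuon"] == 2]

# Define: new per-event columns built from the first and second muon
for ev in selected:
    ev["Muon_pt_1"] = ev["Muon_pt"][0]
    ev["Muon_pt_2"] = ev["Muon_pt"][1]

muon_pt_1 = [ev["Muon_pt_1"] for ev in selected]
muon_pt_2 = [ev["Muon_pt_2"] for ev in selected]
print(muon_pt_1, muon_pt_2)  # [41.2, 27.9] [23.5, 12.4]
```

Calling Muon_pt_2 the "subleading" muon, as the histogram title does, assumes the muons are stored in decreasing pt order.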



In [12]:
c = ROOT.TCanvas()
c.Divide(1, 2)
c.cd(1)
h1.Draw()
c.cd(2)
h2.Draw()
c.Draw()
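The subleading-muon histogram was booked with 20 bins between 0 and 100, i.e. 5 GeV per bin. The plain-Python sketch below reproduces how a fixed-width ROOT histogram assigns a value to a bin: bins are numbered 1 to 20, bin 0 is the underflow and bin 21 the overflow, and each bin includes its lower edge but not its upper edge. The function find_bin is a hypothetical stand-in; in ROOT the equivalent lookup is TH1::FindBin.

```python
def find_bin(x, nbins=20, xlow=0.0, xup=100.0):
    """Bin index for a fixed-width histogram, following ROOT's numbering:
    0 = underflow, 1..nbins = regular bins, nbins + 1 = overflow."""
    if x < xlow:
        return 0
    if x >= xup:
        return nbins + 1
    return 1 + int(nbins * (x - xlow) / (xup - xlow))

bin_width = (100.0 - 0.0) / 20
print(bin_width)  # 5.0
print(find_bin(4.99), find_bin(5.0), find_bin(99.9), find_bin(100.0))  # 1 2 20 21
```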

In [14]:
print('Cut-flow report:')
r.Print()

Cut-flow report:
Only events with exactly two muons: pass=5750       all=10000      -- eff=57.50 % cumulative eff=57.50 %
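The numbers in the report follow from the counts: the efficiency of a filter is the fraction of the events reaching it that pass it, and the cumulative efficiency is the fraction of all events that pass every filter up to that point. With a single filter, both are 5750/10000 = 57.50 %. The plain-Python sketch below (cutflow_report is a hypothetical helper, and its output format is simplified) shows how the two numbers diverge once filters are chained. The second cut in the example is made up.

```python
def cutflow_report(n_total, cuts):
    """Simplified cut-flow report.

    n_total: number of events entering the first filter.
    cuts: list of (name, n_pass) pairs, in the order the filters are applied.
    """
    lines = []
    n_in = n_total  # events reaching the current filter
    for name, n_pass in cuts:
        eff = 100.0 * n_pass / n_in if n_in else 0.0        # relative to this filter's input
        cum = 100.0 * n_pass / n_total if n_total else 0.0  # relative to all events
        lines.append(f"{name}: pass={n_pass} all={n_in} -- eff={eff:.2f} % cumulative eff={cum:.2f} %")
        n_in = n_pass  # the next filter only sees events that passed this one
    return lines

report = cutflow_report(10000, [("Only events with exactly two muons", 5750)])
print(report[0])
# Only events with exactly two muons: pass=5750 all=10000 -- eff=57.50 % cumulative eff=57.50 %

# Chaining a second, hypothetical cut
chained = cutflow_report(10000, [("Only events with exactly two muons", 5750),
                                 ("Hypothetical second cut", 2875)])
print(chained[1])
# Hypothetical second cut: pass=2875 all=5750 -- eff=50.00 % cumulative eff=28.75 %
```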
