# 2. Uproot

<br><br><br><br><br>

## What a ~complete analysis looks like in Uproot/Awkward Array

Instead of starting with small steps, let's first look at where this is going: what a sample analysis looks like with these tools.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import awkward as ak

import uproot
import hist

In [None]:
upfile = uproot.open("root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root")
uptree = upfile["Events"]
uptree.show()

The general strategy is to get arrays in one function call (usually slow, because it has to read the data) and then use them interactively.

In [None]:
muons = uptree.arrays(["Muon_pt", "Muon_eta", "Muon_phi", "Muon_charge"], cut="nMuon >= 2", how="zip", entry_stop=100000)

We've already applied an `nMuon >= 2` cut, but we can define additional cuts.

In [None]:
os_cut = muons[:, "Muon", "charge", 0] != muons[:, "Muon", "charge", 1]
os_cut

Slicing (to be described in more detail later) can remove data and reduce the structure of an array.

In [None]:
mu1 = muons[os_cut, 0, "Muon"]
mu2 = muons[os_cut, 1, "Muon"]
mu1, mu2

Make a histogram and fill it with a calculation from the array. The mini-plot is just the way this histogram type is visualized in Jupyter.

In [None]:
h1 = hist.Hist.new.Reg(120, 0, 120, name="mass").Double()

In [None]:
h1.fill(np.sqrt(2*mu1.pt*mu2.pt*(np.cosh(mu1.eta - mu2.eta) - np.cos(mu1.phi - mu2.phi))))
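
The expression passed to `fill` is the invariant mass of two particles treated as massless, $m^2 = 2\,p_{T1}\,p_{T2}\,(\cosh\Delta\eta - \cos\Delta\phi)$. As a sanity check, here is a minimal sketch that compares it with the mass computed from explicit four-vectors for one made-up pair of muons. The function names and input values are illustrative assumptions, not part of the dataset.

```python
import math

def mass_from_pt_eta_phi(pt1, eta1, phi1, pt2, eta2, phi2):
    """Dimuon mass in the massless approximation (same formula as the fill call)."""
    return math.sqrt(2 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))

def four_vector(pt, eta, phi):
    """(E, px, py, pz) of a massless particle with the given pt, eta, phi."""
    return (
        pt * math.cosh(eta),
        pt * math.cos(phi),
        pt * math.sin(phi),
        pt * math.sinh(eta),
    )

def mass_from_four_vectors(p1, p2):
    """Invariant mass of the sum of two four-vectors."""
    energy, px, py, pz = (a + b for a, b in zip(p1, p2))
    return math.sqrt(max(energy**2 - px**2 - py**2 - pz**2, 0.0))

# Made-up (pt, eta, phi) values for two muons.
first = (30.0, 0.5, 0.1)
second = (25.0, -1.2, 2.9)

m_formula = mass_from_pt_eta_phi(*first, *second)
m_vectors = mass_from_four_vectors(four_vector(*first), four_vector(*second))
print(m_formula, m_vectors)
```

The two numbers agree to floating-point precision, which is why the shorter formula can be applied directly to the `pt`, `eta`, and `phi` arrays.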

Plot it using Matplotlib (for logscale).

In [None]:
h1.plot()
plt.yscale("log")

<br><br><br><br><br>

## What the same analysis looks like in PyROOT

In [None]:
import ROOT
c1 = ROOT.TCanvas()

In [None]:
rootfile = ROOT.TFile.Open("root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root")
roottree = rootfile.Get("Events")

ROOT analyses (before RDataFrame; see below) are based on an event loop. Reading and calculations are done in the loop.

This is not following the "one weird trick." That's why it's slow.
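
Assuming the "one weird trick" refers to the array-at-a-time style used in the Uproot version (one call that runs compiled code over all entries, rather than one Python statement per entry), the contrast can be sketched with NumPy alone. The random values below are stand-ins, not the muon data.

```python
import time
import numpy as np

# Made-up inputs standing in for per-event quantities.
rng = np.random.default_rng(12345)
n = 100000
pt1 = rng.uniform(5, 50, n)
pt2 = rng.uniform(5, 50, n)
deta = rng.normal(0, 1, n)
dphi = rng.uniform(-np.pi, np.pi, n)

# Event-loop style: one Python iteration per entry.
start = time.perf_counter()
mass_loop = np.empty(n)
for i in range(n):
    mass_loop[i] = np.sqrt(2 * pt1[i] * pt2[i] * (np.cosh(deta[i]) - np.cos(dphi[i])))
loop_seconds = time.perf_counter() - start

# Array-at-a-time style: the loop over entries happens inside NumPy.
start = time.perf_counter()
mass_array = np.sqrt(2 * pt1 * pt2 * (np.cosh(deta) - np.cos(dphi)))
array_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.3f} s, array: {array_seconds:.3f} s")
```

Both styles produce the same numbers. The timings depend on the machine, but the per-entry Python loop is typically much slower.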

In [None]:
h2 = ROOT.TH1D("h2", "mass", 120, 0, 120)

for index, event in enumerate(roottree):
    # Analyzing a subsample means breaking out of the loop early.
    if index == 100000:
        break
    # Applying cuts means if-statements.
    if event.nMuon >= 2 and event.Muon_charge[0] != event.Muon_charge[1]:
        mu1_pt = event.Muon_pt[0]
        mu2_pt = event.Muon_pt[1]
        mu1_eta = event.Muon_eta[0]
        mu2_eta = event.Muon_eta[1]
        mu1_phi = event.Muon_phi[0]
        mu2_phi = event.Muon_phi[1]
        h2.Fill(np.sqrt(2*mu1_pt*mu2_pt*(np.cosh(mu1_eta - mu2_eta) - np.cos(mu1_phi - mu2_phi))))

In [None]:
h2.Draw()
c1.SetLogy()
c1.Draw()

<br><br><br><br><br>

## What the same analysis looks like in old C++

By "old C++," I mean `TTree::GetEntry`. This is also a reading + calculating loop over events.

Use `ROOT.gInterpreter.Declare` to define a C++ function in Python that we can use through PyROOT.

In [None]:
ROOT.gInterpreter.Declare('''
void compute(TH1D& h3, TTree& roottree) {
    UInt_t nMuon;
    float Muon_pt[50];
    float Muon_eta[50];
    float Muon_phi[50];
    int32_t Muon_charge[50];

    roottree.SetBranchStatus("*", 0);
    roottree.SetBranchStatus("nMuon", 1);
    roottree.SetBranchStatus("Muon_pt", 1);
    roottree.SetBranchStatus("Muon_eta", 1);
    roottree.SetBranchStatus("Muon_phi", 1);
    roottree.SetBranchStatus("Muon_charge", 1);

    roottree.SetBranchAddress("nMuon", &nMuon);
    roottree.SetBranchAddress("Muon_pt", Muon_pt);
    roottree.SetBranchAddress("Muon_eta", Muon_eta);
    roottree.SetBranchAddress("Muon_phi", Muon_phi);
    roottree.SetBranchAddress("Muon_charge", Muon_charge);
    
    for (int index = 0; index < 100000; index++) {
        roottree.GetEntry(index);
        if (nMuon >= 2 && Muon_charge[0] != Muon_charge[1]) {
            float mu1_pt = Muon_pt[0];
            float mu2_pt = Muon_pt[1];
            float mu1_eta = Muon_eta[0];
            float mu2_eta = Muon_eta[1];
            float mu1_phi = Muon_phi[0];
            float mu2_phi = Muon_phi[1];
            h3.Fill(sqrt(2*mu1_pt*mu2_pt*(cosh(mu1_eta - mu2_eta) - cos(mu1_phi - mu2_phi))));
        }
    }
}
''')

In [None]:
rootfile = ROOT.TFile.Open("root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root")
roottree = rootfile.Get("Events")

h3 = ROOT.TH1D("h3", "mass", 120, 0, 120)

ROOT.compute(h3, roottree)

In [None]:
h3.Draw()
c1.SetLogy()
c1.Draw()

<br><br><br><br><br>

## What the same analysis looks like in modern RDataFrame

This case mixes Python (for organization) with C++ (for speed).

<img src="img/rdataframe-flow.svg" style="width: 800px">

In [None]:
df = ROOT.RDataFrame("Events", "root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root")

# Each node is connected to the previous, in a chain (which can split and recombine).
df_limit = df.Range(100000)
df_2mu = df_limit.Filter("nMuon >= 2")
df_os = df_2mu.Filter("Muon_charge[0] != Muon_charge[1]")

# This node is a big C++ block.
df_mass = df_os.Define("Dimuon_mass", '''
float mu1_pt = Muon_pt[0];
float mu2_pt = Muon_pt[1];
float mu1_eta = Muon_eta[0];
float mu2_eta = Muon_eta[1];
float mu1_phi = Muon_phi[0];
float mu2_phi = Muon_phi[1];
return sqrt(2*mu1_pt*mu2_pt*(cosh(mu1_eta - mu2_eta) - cos(mu1_phi - mu2_phi)));
''')

# This one is an endpoint (action).
h4 = df_mass.Histo1D(("h4", "mass", 120, 0, 120), "Dimuon_mass")

The above only sets up the calculation (compiling the C++ strings). Nothing runs until you call `h4.Draw()`.

In [None]:
h4.Draw()   # <--- This is the line that computes everything.
c1.SetLogy()
c1.Draw()

For more on RDataFrame, see [this tutorial](https://cms-opendata-workshop.github.io/workshop-lesson-root/05-rdataframe/index.html).

<br><br><br><br><br>

## Ways to get data from Uproot

Uproot provides a rather low-level view into a ROOT file, so let's start with terminology.

All of the links below go to [the documentation](https://uproot.readthedocs.io/en/latest/).

<img src="img/terminology.svg" style="width: 800px">

<br><br><br>

### Navigating TDirectories

When you [open](https://uproot.readthedocs.io/en/latest/uproot.reading.open.html) a [TFile](https://uproot.readthedocs.io/en/latest/uproot.reading.ReadOnlyFile.html) in Uproot, you actually get a [TDirectory](https://uproot.readthedocs.io/en/latest/uproot.reading.ReadOnlyDirectory.html) object.

In [21]:
import numpy as np
import awkward as ak
import uproot

In [1]:
directory = uproot.open("data/nesteddirs.root")
directory

<ReadOnlyDirectory '/' at 0x7fd7086d1e80>

That's because it's the [TDirectory](https://uproot.readthedocs.io/en/latest/uproot.reading.ReadOnlyDirectory.html) that shows you all the objects that could be read.

You'll rarely need it, but the [TFile](https://uproot.readthedocs.io/en/latest/uproot.reading.ReadOnlyFile.html) itself is accessible from every object.

In [2]:
file = directory.file
file

<ReadOnlyFile 'data/nesteddirs.root' at 0x7fd7247b6670>

In [3]:
file.file_path

'data/nesteddirs.root'

In [4]:
file.root_version

'6.08/04'

In [5]:
file.uuid

UUID('ac63575a-9ca4-11e7-9607-0100007fbeef')

The [TDirectory](https://uproot.readthedocs.io/en/latest/uproot.reading.ReadOnlyDirectory.html) acts like a Python [Mapping](https://docs.python.org/3/library/collections.abc.html#collections.abc.Mapping), meaning that it has [keys](https://uproot.readthedocs.io/en/latest/uproot.reading.ReadOnlyDirectory.html#keys), [values](https://uproot.readthedocs.io/en/latest/uproot.reading.ReadOnlyDirectory.html#values), and [items](https://uproot.readthedocs.io/en/latest/uproot.reading.ReadOnlyDirectory.html#items), and you can get any value with `directory[key_name]`.

In [6]:
directory.keys()

['one;1',
 'one/two;1',
 'one/two/tree;1',
 'one/tree;1',
 'three;1',
 'three/tree;1']

In [7]:
directory["one"]

<ReadOnlyDirectory '/one' at 0x7fd7086dd3a0>

In [8]:
directory["one/two/tree"]

<TTree 'tree' (20 branches) at 0x7fd7086dd1c0>

In [9]:
directory["one"]["two"]["tree"]

<TTree 'tree' (20 branches) at 0x7fd7086dd1c0>

In [10]:
directory.values()

[<ReadOnlyDirectory '/one' at 0x7fd7086dd3a0>,
 <ReadOnlyDirectory '/one/two' at 0x7fd7086dd4c0>,
 <TTree 'tree' (20 branches) at 0x7fd7086dd1c0>,
 <TTree 'tree' (3 branches) at 0x7fd7247b69a0>,
 <ReadOnlyDirectory '/three' at 0x7fd7086dd430>,
 <TTree 'tree' (1 branches) at 0x7fd727835880>]

In [11]:
directory.items()

[('one;1', <ReadOnlyDirectory '/one' at 0x7fd7086dd3a0>),
 ('one/two;1', <ReadOnlyDirectory '/one/two' at 0x7fd7086dd4c0>),
 ('one/two/tree;1', <TTree 'tree' (20 branches) at 0x7fd7086dd1c0>),
 ('one/tree;1', <TTree 'tree' (3 branches) at 0x7fd7247b69a0>),
 ('three;1', <ReadOnlyDirectory '/three' at 0x7fd7086dd430>),
 ('three/tree;1', <TTree 'tree' (1 branches) at 0x7fd727835880>)]
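
To make the Mapping behavior concrete, here is a toy sketch of a nested container that exposes slash-separated paths through the standard `collections.abc.Mapping` interface. `ToyDirectory` is a hypothetical class for illustration only. It is not how Uproot implements `ReadOnlyDirectory`, and it ignores the `;1` cycle numbers.

```python
from collections.abc import Mapping

class ToyDirectory(Mapping):
    """Hypothetical stand-in for a directory of named objects."""

    def __init__(self, contents):
        self._contents = contents  # name -> object or nested ToyDirectory

    def __getitem__(self, path):
        # Resolve "a/b/c" one level at a time.
        head, _, rest = path.partition("/")
        item = self._contents[head]
        return item[rest] if rest else item

    def __iter__(self):
        # Yield each name, followed by the paths nested under it.
        for name, item in self._contents.items():
            yield name
            if isinstance(item, ToyDirectory):
                for sub in item:
                    yield f"{name}/{sub}"

    def __len__(self):
        return sum(1 for _ in self)

toy = ToyDirectory({
    "one": ToyDirectory({"two": ToyDirectory({"tree": "tree A"}), "tree": "tree B"}),
    "three": ToyDirectory({"tree": "tree C"}),
})

print(list(toy.keys()))
print(toy["one/two/tree"], toy["one"]["two"]["tree"])
```

Defining only `__getitem__`, `__iter__`, and `__len__` is enough: `Mapping` supplies `keys`, `values`, `items`, and `in` on top of them.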

Since you'll likely want to find objects by class name without reading them, there's a fourth method: [classnames](https://uproot.readthedocs.io/en/latest/uproot.reading.ReadOnlyDirectory.html#classnames).

In [12]:
directory.classnames()

{'one;1': 'TDirectory',
 'one/two;1': 'TDirectory',
 'one/two/tree;1': 'TTree',
 'one/tree;1': 'TTree',
 'three;1': 'TDirectory',
 'three/tree;1': 'TTree'}

These methods accept arguments that filter the output (see the documentation). You might need that if you have a file with a lot of histograms in it.

In [13]:
directory.classnames(recursive=False)

{'one;1': 'TDirectory', 'three;1': 'TDirectory'}

In [14]:
directory.keys(filter_classname="TT*")

['one/two/tree;1', 'one/tree;1', 'three/tree;1']
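
The `"TT*"` pattern in the last call is a shell-style glob. As a rough sketch of that kind of filtering (not Uproot's actual implementation), the standard-library `fnmatch` module can be applied to the class-name dictionary printed above. `keys_matching` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# The dictionary printed by directory.classnames() above.
classnames = {
    "one;1": "TDirectory",
    "one/two;1": "TDirectory",
    "one/two/tree;1": "TTree",
    "one/tree;1": "TTree",
    "three;1": "TDirectory",
    "three/tree;1": "TTree",
}

def keys_matching(classnames, pattern):
    """Return the keys whose class name matches a glob pattern."""
    return [key for key, classname in classnames.items() if fnmatchcase(classname, pattern)]

tree_keys = keys_matching(classnames, "TT*")
print(tree_keys)  # ['one/two/tree;1', 'one/tree;1', 'three/tree;1']
```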

<br><br><br>

### Generic objects

ROOT (probably) has thousands of classes. Uproot does not have specialized code to recognize them all.

However, most objects are readable anyway thanks to the [TStreamerInfo](https://uproot.readthedocs.io/en/latest/uproot.streamers.Model_TStreamerInfo.html) in every ROOT file. Here's one with custom classes that Uproot couldn't possibly know about.

In [15]:
directory = uproot.open("data/icecube-supernovae.root")
directory.classnames()

{'config;1': 'TDirectory',
 'config/analysis;1': 'SN_Analysis_Configuration_t',
 'config/detector;1': 'I3Eval_t',
 'config/run;1': 'SN_File_t',
 'sn_all;1': 'TTree',
 'sn_gps;1': 'TTree',
 'sn_range;1': 'TTree',
 'sn_o2rout;1': 'TTree',
 'sn_o2cand;1': 'TTree',
 'sn_omwatch;1': 'TTree',
 'sn_sigsim;1': 'TTree'}

The classes `SN_Analysis_Configuration_t`, `I3Eval_t`, `SN_File_t` were generated from the [TStreamerInfo](https://uproot.readthedocs.io/en/latest/uproot.streamers.Model_TStreamerInfo.html).

In [16]:
directory.streamer_of("config/detector")

<TStreamerInfo for I3Eval_t version 7 at 0x7fd708305610>

In [17]:
directory.file.show_streamers("I3Eval_t")

I3Eval_t::ChannelContainer_t (v1)

Sni3DataArray (v1)

TObject (v1)
    fUniqueID: unsigned int (TStreamerBasicType)
    fBits: unsigned int (TStreamerBasicType)

I3Eval_t (v7): TObject (v1)
    theDataArray: Sni3DataArray* (TStreamerObjectAnyPointer)
    NumberOfChannels: int (TStreamerBasicType)
    NoAvailableSlices: int (TStreamerBasicType)
    AvailableDataSize: int (TStreamerBasicType)
    mGPSCardId: int (TStreamerBasicType)
    mGPSPrescale: int (TStreamerBasicType)
    mGPSEventNo: int (TStreamerBasicType)
    mScalerCardId: int (TStreamerBasicType)
    mScalerStartChannel: int (TStreamerBasicType)
    StartUTC: long (TStreamerBasicType)
    MaxChannels: int (TStreamerBasicType)
    mMaxJitterLogs: int (TStreamerBasicType)
    Channel: I3Eval_t::ChannelContainer_t* (TStreamerObjectAnyPointer)
    ChannelIDMap: map<long,int> (TStreamerSTL)
    BadChannelIDSet: set<long> (TStreamerSTL)
    ChannelID: long* (TStreamerBasicPointer)
    Deadtime: double* (TStreamerBasicPointer)
   

You can read these objects, but they have no specialized methods and all members have to be accessed through [has_member](https://uproot.readthedocs.io/en/latest/uproot.model.Model.html#has-member)/[member](https://uproot.readthedocs.io/en/latest/uproot.model.Model.html#member)/[all_members](https://uproot.readthedocs.io/en/latest/uproot.model.Model.html#all-members).

In [30]:
directory["config/detector"]

<I3Eval_t (version 7) at 0x7fd7247c3a60>

In [18]:
directory["config/detector"].all_members

{'@fUniqueID': 0,
 '@fBits': 50331648,
 'theDataArray': <Sni3DataArray (version 1) at 0x7fd70828b580>,
 'NumberOfChannels': 5160,
 'NoAvailableSlices': -1,
 'AvailableDataSize': 0,
 'mGPSCardId': 0,
 'mGPSPrescale': 20000000,
 'mGPSEventNo': 92824,
 'mScalerCardId': 0,
 'mScalerStartChannel': 0,
 'StartUTC': 272924620173109013,
 'MaxChannels': 5160,
 'mMaxJitterLogs': 20,
 'Channel': <I3Eval_t::ChannelContainer_t (version 1) at 0x7fd70828b790>,
 'ChannelIDMap': <STLMap {46612627560: 896, ..., 281410180683757: 2689} at 0x7fd70828b7f0>,
 'BadChannelIDSet': <STLSet {58348614635591, 60068372029697, ..., 258905191174588} at 0x7fd70828ba00>,
 'ChannelID': array([ 47303335284587,  20579555797555, 106634453247646, ...,
        255380957221937, 107432791511293, 280205879548048]),
 'Deadtime': array([250., 250., 250., ..., 250., 250., 250.]),
 'Efficiency': array([1.  , 1.  , 1.  , ..., 1.35, 1.35, 1.35])}

In [29]:
directory["config/detector"].member("ChannelIDMap")

<STLMap {46612627560: 896, ..., 281410180683757: 2689} at 0x7fd70828b7f0>

If a class has "Unknown" in its name or `isinstance(obj, (uproot.model.UnknownClass, uproot.model.UnknownClassVersion))` is true, that means that it could not be read.

(I don't know of any examples of that at the moment.)

<br><br><br>

### Histograms and graphs

Other classes have specialized interfaces, like histograms and some graphs. You can view the [axis](https://uproot.readthedocs.io/en/latest/uproot.behaviors.TH1.TH1.html#axis) [edges](https://uproot.readthedocs.io/en/latest/uproot.behaviors.TAxis.TAxis.html#edges) and the [values](https://uproot.readthedocs.io/en/latest/uproot.behaviors.TH1.TH1.html#values), but this interface is minimal.

Normally, you'd convert them with one of the following:

   * [to_numpy](https://uproot.readthedocs.io/en/latest/uproot.behaviors.TH1.TH1.html#to-numpy): tuple of arrays (values and edges)
   * [to_boost](https://uproot.readthedocs.io/en/latest/uproot.behaviors.TH1.TH1.html#to-boost): `boost_histogram` object
   * [to_hist](https://uproot.readthedocs.io/en/latest/uproot.behaviors.TH1.TH1.html#to-hist): `hist` object (more fully featured subclass of `boost_histogram`)
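
The `(values, edges)` tuple returned by `to_numpy` has the same layout as the result of `np.histogram`, so a sketch with NumPy alone shows how to work with it. The sample data and the 10-bin axis below are made up for illustration.

```python
import numpy as np

# Made-up sample standing in for histogram contents.
rng = np.random.default_rng(0)
sample = rng.normal(0, 1, 10000)

# Same (values, edges) layout as a 1D to_numpy() result.
values, edges = np.histogram(sample, bins=10, range=(-4, 4))

# There is one more edge than there are bins.
centers = (edges[:-1] + edges[1:]) / 2
widths = np.diff(edges)

print(len(values), len(edges))  # 10 11
```

Bin centers and widths are not stored in the tuple. They are derived from the edges, as in the last two assignments.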

In [34]:
directory = uproot.open("data/hepdata-example.root")
directory.classnames()

{'hpx;1': 'TH1F',
 'hpxpy;1': 'TH2F',
 'hprof;1': 'TProfile',
 'ntuple;1': 'TNtuple'}

In [39]:
directory["hpx"].to_numpy()

(array([2.000e+00, 3.000e+00, 1.000e+00, 1.000e+00, 2.000e+00, 4.000e+00,
        6.000e+00, 1.200e+01, 8.000e+00, 9.000e+00, 1.500e+01, 1.500e+01,
        3.100e+01, 3.500e+01, 4.000e+01, 6.400e+01, 6.400e+01, 8.100e+01,
        1.080e+02, 1.240e+02, 1.560e+02, 1.650e+02, 2.090e+02, 2.620e+02,
        2.970e+02, 3.920e+02, 4.320e+02, 4.660e+02, 5.210e+02, 6.040e+02,
        6.570e+02, 7.880e+02, 9.030e+02, 1.079e+03, 1.135e+03, 1.160e+03,
        1.383e+03, 1.458e+03, 1.612e+03, 1.770e+03, 1.868e+03, 1.861e+03,
        1.946e+03, 2.114e+03, 2.175e+03, 2.207e+03, 2.273e+03, 2.276e+03,
        2.329e+03, 2.325e+03, 2.381e+03, 2.417e+03, 2.364e+03, 2.284e+03,
        2.188e+03, 2.164e+03, 2.130e+03, 1.940e+03, 1.859e+03, 1.763e+03,
        1.700e+03, 1.611e+03, 1.459e+03, 1.390e+03, 1.237e+03, 1.083e+03,
        1.046e+03, 8.880e+02, 7.520e+02, 7.420e+02, 6.730e+02, 5.550e+02,
        5.330e+02, 3.660e+02, 3.780e+02, 2.720e+02, 2.560e+02, 2.000e+02,
        1.740e+02, 1.320e+02, 1.180e+0

In [35]:
directory["hpx"].to_hist()

In [36]:
directory["hpxpy"].to_hist()

In [37]:
directory["hprof"].to_hist()

<br><br><br>

### TTree data

That's what you're here for, most likely.