# Making your analysis more efficient with ROOT: A basic introductory course

The ROOT dataframe tutorial shows you how to analyze datasets using `ROOT::RDataFrame`. The example analysis performs the following steps:

1. Connect a ROOT dataframe to a dataset containing 66 million events recorded by CMS in 2012
2. Filter the events that are relevant for your analysis
3. Compute the invariant mass of the selected dimuon candidates
4. Plot the invariant mass spectrum showing resonances up to the Z mass

Specific questions, which will guide you and improve your understanding of the physics and technology, **are marked in bold.**


## Create a ROOT dataframe

The following ROOT dataframe is connected to a dataset named `Events` in a ROOT file. The file is not stored locally; it is read from a remote server via [XRootD](http://xrootd.org/). The corresponding open data record is [here](http://opendata.web.cern.ch/record/12341).

The dataset `Events` is a `TTree` (to first approximation, a "table") and has the following branches (also referred to as "columns"):

| Branch name | Data type | Description |
|-------------|-----------|-------------|
| `nMuon` | `unsigned int` | Number of muons in this event |
| `Muon_pt` | `float[nMuon]` | Transverse momentum of the muons stored as an array of size `nMuon` |
| `Muon_eta` | `float[nMuon]` | Pseudo-rapidity of the muons stored as an array of size `nMuon` |
| `Muon_phi` | `float[nMuon]` | Azimuth of the muons stored as an array of size `nMuon` |
| `Muon_charge` | `int[nMuon]` | Charge of the muons stored as an array of size `nMuon` and either -1 or 1 |
| `Muon_mass` | `float[nMuon]` | Mass of the muons stored as an array of size `nMuon` |

In [1]:
import ROOT
df = ROOT.RDataFrame("Events", "root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root")

Welcome to JupyROOT 6.18/04


## Run only on a part of the dataset

The full dataset contains half a year of CMS data taking in 2012 with 66 million events. For the purpose of this example, we use the `Range` node to run only on a small part of the dataset. This feature also comes in handy in the development phase of your analysis.

**Feel free to experiment with this parameter! How much data do you need to see all resonances from the [eta meson](https://en.wikipedia.org/wiki/Eta_meson) up to the [Z boson](https://en.wikipedia.org/wiki/W_and_Z_bosons)?**

In [2]:
df_range = df.Range(10000)

## Filter relevant events for this analysis

Physics datasets are often general-purpose datasets and therefore need extensive filtering of the events for the actual analysis. Here, we implement only a simple selection based on the number of muons and their charge to reduce the dataset to the events that are relevant for our study.

**Fill in the correct expressions by replacing the `<...>` parts to select ...**

1. Events with exactly two muons
2. Events with muons of opposite charge

See the table above for the column names and the data types.

In [3]:
df_2mu = df_range.Filter("nMuon == <....>", "Events with exactly two muons")
df_os = df_2mu.Filter("Muon_charge[0] <....> Muon_charge[1]", "Muons with opposite charge")

## What are the cuts doing?

To find out how many events your cuts throw away, we can book another endpoint of the computation graph that reports the efficiency of the cuts.

In [4]:
report = df_os.Report()

## Compute the invariant mass of the dimuon system

Since we want to see the resonances in the mass spectrum, where dimuon events are more likely, we need to compute the invariant mass from the four-vectors of the muon candidates. ROOT provides a [helper](https://root.cern/doc/master/namespaceROOT_1_1VecOps.html) called `InvariantMass` which does this operation for you.
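To make the kinematics behind this step concrete, here is a plain-Python sketch of an invariant mass computation from `(pt, eta, phi, mass)` lists. It is only an illustration under the usual collider conventions (`px = pt cos(phi)`, `py = pt sin(phi)`, `pz = pt sinh(eta)`), not ROOT's implementation. The function name `invariant_mass` and the example muon values are made up for this sketch.

```python
import math

def invariant_mass(pt, eta, phi, mass):
    """Invariant mass of the system formed by all particles in the lists.

    Illustrative helper (hypothetical name), not part of ROOT.
    """
    e_sum = px_sum = py_sum = pz_sum = 0.0
    for pt_i, eta_i, phi_i, m_i in zip(pt, eta, phi, mass):
        # Convert (pt, eta, phi, m) to Cartesian momentum components
        px = pt_i * math.cos(phi_i)
        py = pt_i * math.sin(phi_i)
        pz = pt_i * math.sinh(eta_i)
        # Energy from the relativistic energy-momentum relation
        e = math.sqrt(px * px + py * py + pz * pz + m_i * m_i)
        e_sum += e
        px_sum += px
        py_sum += py
        pz_sum += pz
    # m^2 = E^2 - |p|^2 of the summed four-vector; clamp tiny negative rounding errors
    m2 = e_sum ** 2 - px_sum ** 2 - py_sum ** 2 - pz_sum ** 2
    return math.sqrt(max(m2, 0.0))

# Two made-up back-to-back muons (pt in GeV, mass in GeV)
m_mumu = invariant_mass([45.0, 45.0], [0.0, 0.0], [0.0, math.pi], [0.1057, 0.1057])
```

For two back-to-back muons at `eta = 0`, the momenta cancel and the result is simply the sum of the two energies.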

In [5]:
df_mass = df_os.Define("Dimuon_mass", "InvariantMass(Muon_pt, Muon_eta, Muon_phi, Muon_mass)")

## Make a histogram of the dimuon spectrum

As (almost) always in physics, we have a look at the results in the form of a histogram. Let's book a histogram as one endpoint of our computation graph.

**Where do you expect resonances in the dimuon spectrum? Adjust the plotting range accordingly! Note that the numbers refer to GeV.**
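When choosing the range, it can help to see how `nbins`, `low` and `up` translate into bins. The following plain-Python sketch assumes equal-width bins and uses a hypothetical helper `bin_index` with 0-based indices. ROOT itself numbers the regular bins from 1 and reserves bin 0 and bin `nbins + 1` for underflow and overflow.

```python
def bin_index(x, nbins, low, up):
    """Return the 0-based index of the equal-width bin containing x.

    Returns None if x lies outside [low, up). Illustrative helper, not a ROOT function.
    """
    if not (low <= x < up):
        return None
    width = (up - low) / nbins
    # Guard against floating-point rounding at the upper edge
    return min(int((x - low) / width), nbins - 1)

# With nbins = 100, low = 100 and up = 300, each bin is 2 GeV wide
width = (300 - 100) / 100
inside = bin_index(150.0, 100, 100, 300)
outside = bin_index(50.0, 100, 100, 300)
```

A value below `low` or at or above `up` does not land in any regular bin, so a resonance outside the chosen range will not show up in the plot.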

In [6]:
nbins = 100 # Number of bins
low = 100 # Lower edge of the histogram
up = 300 # Upper edge of the histogram

hist = df_mass.Histo1D(("Dimuon_mass", "Dimuon_mass", nbins, low, up), "Dimuon_mass")

## Create a plot of the dimuon spectrum

Now, the computation graph is set up. Next, we want to set up a plot.

Note that ROOT dataframe runs over the data only at the moment you need the actual result, to reduce the runtime as much as possible. This is the reason this cell can take some time. The notebook magic `%%time` measures the time spent in the full cell.
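The idea of booking a result first and computing it only on first access can be illustrated with a toy class in plain Python. This is a sketch of the general lazy-evaluation pattern, not of ROOT's internals. The names `LazyResult`, `get_value` and `toy_event_loop` are hypothetical.

```python
class LazyResult:
    """Toy stand-in for a booked result: stores how to compute, runs on first access."""

    def __init__(self, compute):
        self._compute = compute
        self._cached = False
        self._value = None

    def get_value(self):
        # Run the expensive computation only once, then reuse the cached value
        if not self._cached:
            self._value = self._compute()
            self._cached = True
        return self._value

loop_runs = []

def toy_event_loop():
    """Pretend event loop that records every time it is executed."""
    loop_runs.append("run")
    return sum(range(10))

booked = LazyResult(toy_event_loop)   # booking: nothing is computed yet
runs_after_booking = len(loop_runs)   # 0
first = booked.get_value()            # first access triggers the loop
second = booked.get_value()           # second access reuses the cached value
runs_after_access = len(loop_runs)    # 1
```

In the same spirit, the cells above only describe the computation, and the time is spent where the histogram is first used.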

In [7]:
%%time

# Set up the canvas
ROOT.gStyle.SetTextFont(42)
c = ROOT.TCanvas("c", "", 800, 700)
c.SetLogx(); c.SetLogy()

# Add the histogram
hist.SetTitle("")
hist.SetStats(False)
hist.GetXaxis().SetTitle("m_{#mu#mu} (GeV)"); hist.GetXaxis().SetTitleSize(0.04)
hist.GetYaxis().SetTitle("N_{Events}"); hist.GetYaxis().SetTitleSize(0.04)
hist.Draw()

# Add labels
label = ROOT.TLatex(); label.SetNDC(True); label.SetTextSize(0.040)
label.DrawLatex(0.100, 0.920, "#bf{CMS Open Data}")
label.DrawLatex(0.760, 0.920, "#sqrt{s} = 8 TeV");

## Look at the plot interactively

For notebooks, ROOT provides a JavaScript front-end for drawing the canvas. Click and drag on an axis to zoom in, and double-click to reset the view.

Don't forget that you can improve the statistics by increasing the number of events given to `Range`.

In [8]:
%jsroot on
c.Draw()

## Inspect the cut-flow

As a last study, we look at the efficiency of the applied cuts.

In [40]:
report.Print()

Events with exactly two muons: pass=2466573    all=5000000    -- eff=49.33 % cumulative eff=49.33 %
Muons with opposite charge: pass=1875947    all=2466573    -- eff=76.05 % cumulative eff=37.52 %
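The two efficiency columns can be reproduced by hand from the `pass` and `all` counts printed above: `eff` is relative to the events entering the cut, while `cumulative eff` is relative to the events entering the first cut. A small sketch, with a hypothetical helper `cut_flow`:

```python
def cut_flow(n_all, cuts):
    """Compute per-cut and cumulative efficiencies in percent.

    n_all: number of events before any cut.
    cuts:  list of (name, n_pass) in the order the cuts are applied.
    Illustrative helper, not part of ROOT.
    """
    rows = []
    n_in = n_all
    for name, n_pass in cuts:
        eff = 100.0 * n_pass / n_in if n_in else 0.0
        cumulative = 100.0 * n_pass / n_all if n_all else 0.0
        rows.append((name, round(eff, 2), round(cumulative, 2)))
        n_in = n_pass  # the next cut only sees events that passed this one
    return rows

rows = cut_flow(
    5000000,
    [
        ("Events with exactly two muons", 2466573),
        ("Muons with opposite charge", 1875947),
    ],
)
for name, eff, cumulative in rows:
    print(f"{name}: eff={eff:.2f} % cumulative eff={cumulative:.2f} %")
```

The second cut keeps 76.05 % of the events it sees, but only 37.52 % of all events survive both cuts.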
