# ROOT dataframe tutorial: Stage 2

The ROOT dataframe tutorial shows you how to analyze datasets using `RDataFrame`. The example analysis performs the following steps:

1. Connect a ROOT dataframe to a dataset containing 66 million events recorded by CMS in 2012
2. Filter the events relevant to your analysis
3. Compute the invariant mass of the selected dimuon candidates
4. Plot the invariant mass spectrum showing resonances up to the Z mass

The notebook runs out-of-the-box. However, you are encouraged to tweak the code to see the effect on the result! 

Specific questions, which will improve your understanding of the technology, **are marked bold.**

## Outline

The full tutorial consists of three stages and shows you how to use ROOT dataframes ...

0. ... in C++
1. ... in Python (beginner)
2. ... in Python (advanced)

## Stage 2

In stage 1 of the tutorial, we saw how to inject C++ code into the computationally expensive parts of the event loop. However, this may not be performant enough for your analysis. In this part of the tutorial, you learn how to make your analysis toolchain even more efficient with precompiled C++ code and the C++ interpreter `cling`.

In [1]:
import ROOT

Welcome to JupyROOT 6.17/01


## Create the ROOT dataframe and filter the dataset

This code is unchanged from the previous stage.

In [2]:
files = ROOT.std.vector("string")()
files.push_back("root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root")
files.push_back("root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012C_DoubleMuParked.root")
df = ROOT.RDataFrame("Events", files)

In [3]:
df_2mu = df.Filter("nMuon == 2", "Events with exactly two muons")
df_os = df_2mu.Filter("Muon_charge[0] != Muon_charge[1]", "Muons with opposite charge")
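Each `Filter` call evaluates its expression once per event and passes on only the events for which the expression is true. As a rough pure-Python analogue (this is not how RDataFrame is implemented; the toy event dictionaries below are made up for illustration), the two selections behave like chained predicates:

```python
# Toy events: each dict mimics the nMuon and Muon_charge columns (hypothetical data).
events = [
    {"nMuon": 2, "Muon_charge": [1, -1]},
    {"nMuon": 2, "Muon_charge": [1, 1]},
    {"nMuon": 3, "Muon_charge": [1, -1, 1]},
    {"nMuon": 1, "Muon_charge": [-1]},
]

def exactly_two_muons(event):
    # Counterpart of the expression "nMuon == 2"
    return event["nMuon"] == 2

def opposite_charge(event):
    # Counterpart of "Muon_charge[0] != Muon_charge[1]"
    return event["Muon_charge"][0] != event["Muon_charge"][1]

# Chained filters: the second predicate only sees events that passed the first one.
df_2mu = [e for e in events if exactly_two_muons(e)]
df_os = [e for e in df_2mu if opposite_charge(e)]

print(len(events), len(df_2mu), len(df_os))  # 4 2 1
```

The order of the filters matters: indexing the second muon is only safe because the first filter already guarantees two muons. In this toy version, applying `opposite_charge` first would raise an `IndexError` on the single-muon event.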

## Integrate optimized C++ code in the analysis

In the following, we place optimized C++ code in an external library, `libAnalysisCode.so`, and call its functions from the event loop.

The library is located in the subdirectory `MyAnalysis`. Go there in a terminal and run the script `build_library.sh`. The library implements only the function `compute_mass`, as used in the previous stages of the tutorial.

**Why is this more performant than jitting the code with `cling`, as done in stage 1 of the tutorial?**
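The source of `compute_mass` is not shown in this notebook. Assuming it computes the standard invariant mass of the two muons from their (pt, eta, phi, mass) columns, a pure-Python sketch of that computation could look as follows. The helper names `four_vector` and `invariant_mass` are hypothetical; the formulas are the usual relations $p_x = p_T\cos\phi$, $p_y = p_T\sin\phi$, $p_z = p_T\sinh\eta$, $E = \sqrt{p^2 + m^2}$ and $m_{\mu\mu} = \sqrt{E^2 - |\vec p|^2}$ for the summed four-vector.

```python
import math

def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to Cartesian components (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz

def invariant_mass(pt, eta, phi, mass):
    """Invariant mass of the first two particles; the arguments are sequences like the Muon_* columns."""
    e1, px1, py1, pz1 = four_vector(pt[0], eta[0], phi[0], mass[0])
    e2, px2, py2, pz2 = four_vector(pt[1], eta[1], phi[1], mass[1])
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

# Two back-to-back muons (massless approximation) with pt = 45.6 GeV at eta = 0
print(round(invariant_mass([45.6, 45.6], [0.0, 0.0], [0.0, math.pi], [0.0, 0.0]), 3))  # 91.2
```

Running such a function per event from Python would be far slower than the compiled C++ version, which is exactly why the tutorial moves this computation into a precompiled library.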

In [4]:
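%%bash
# Compile AnalysisCode.cpp into the optimized (-O3) shared library libAnalysisCode.so, linked against ROOT
g++ -Wall -fPIC -O3 -shared AnalysisCode.cpp -o libAnalysisCode.so `root-config --cflags --libs`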

To use external C++ code in the event loop, you need to make the library known to `cling`. Each library consists of two parts: the header file, which declares the functions, and the shared object file `lib*.so`, which contains the compiled code. We register both with the following code and then use the function directly in the `Define` call.

In [5]:
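# Make the declaration of compute_mass known to cling via the header
ROOT.gInterpreter.Declare('#include "AnalysisCode.hxx"')
# Load the compiled implementation from the shared library
ROOT.gInterpreter.Load("libAnalysisCode.so")
# Call the precompiled function to define a new column
df_mass = df_os.Define("Dimuon_mass", "compute_mass(Muon_pt, Muon_eta, Muon_phi, Muon_mass)")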

## Make a histogram and draw the result

The following code is unchanged from the previous stage of the tutorial.

In [6]:
df_range = df_mass.Range(100000)
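Because `Range(100000)` is attached after the two filters, it counts the entries that reach it, so the event loop stops once 100000 events have passed both selections, not after 100000 events have been read. This is why the report at the end shows `pass=100000` for the last filter. A minimal pure-Python analogue using `itertools.islice` (the event source and the every-third-entry selection are illustrative stand-ins, not the real data):

```python
from itertools import islice

entries_read = 0  # how many entries the toy event source has produced

def read_events(n_entries):
    """Hypothetical event source standing in for the input files."""
    global entries_read
    for entry in range(n_entries):
        entries_read += 1
        yield entry

# Toy selection standing in for the two filters: keeps every third entry.
selected = (entry for entry in read_events(1_000_000) if entry % 3 == 0)

# Like Range(100): stop once 100 entries have passed the upstream selection.
first_passing = list(islice(selected, 100))
print(len(first_passing), entries_read)  # 100 298
```

Since everything is lazy, only 298 of the one million toy entries are ever read, mirroring how the real loop ends early.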

In [7]:
h = df_range.Histo1D(("Dimuon_mass", "Dimuon_mass", 30000, 0.25, 300), "Dimuon_mass")
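The tuple passed to `Histo1D` describes the histogram as (name, title, number of bins, lower edge, upper edge), here 30000 fixed-width bins between 0.25 and 300 GeV. The sketch below works out the bin width and mimics ROOT's bin numbering convention (0 is underflow, `nbins + 1` is overflow); the helper `find_bin` is illustrative, not ROOT code.

```python
# Binning of the histogram model above: (name, title, nbins, xlow, xup)
nbins, xlow, xup = 30000, 0.25, 300.0

bin_width = (xup - xlow) / nbins  # fixed bin width in GeV

def find_bin(x):
    """Bin index for fixed-width bins: 0 = underflow, 1..nbins = regular bins, nbins + 1 = overflow."""
    if x < xlow:
        return 0
    if x >= xup:
        return nbins + 1
    return 1 + int(nbins * (x - xlow) / (xup - xlow))

# Number of regular bins per decade of the log-scaled x axis.
for lo, hi in [(0.25, 2.5), (2.5, 25.0), (25.0, 250.0)]:
    print(f"{lo}-{hi} GeV: {find_bin(hi) - find_bin(lo)} bins")

print(f"bin width: {bin_width * 1000:.2f} MeV")  # bin width: 9.99 MeV
```

Because the bins have a fixed width of about 10 MeV while the x axis is drawn logarithmically, the lowest decade (0.25 to 2.5 GeV) is covered by only about 225 bins, whereas the decade from 25 to 250 GeV gets more than 22000.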

In [8]:
report = df_range.Report()

In [9]:
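# Global style: no statistics box, text font 42 (Helvetica)
ROOT.gStyle.SetOptStat(0); ROOT.gStyle.SetTextFont(42)
# Canvas with logarithmic axes to show resonances spanning several orders of magnitude
c = ROOT.TCanvas("c", "", 800, 700)
c.SetLogx(); c.SetLogy()
# Accessing the histogram result for the first time triggers the event loop
h.SetTitle("")
h.GetXaxis().SetTitle("m_{#mu#mu} (GeV)"); h.GetXaxis().SetTitleSize(0.04)
h.GetYaxis().SetTitle("N_{Events}"); h.GetYaxis().SetTitleSize(0.04)
h.Draw()

# Labels positioned in normalized device coordinates (NDC)
label = ROOT.TLatex(); label.SetNDC(True)
label.SetTextSize(0.040); label.DrawLatex(0.100, 0.920, "#bf{CMS Open Data}")
label.SetTextSize(0.030); label.DrawLatex(0.630, 0.920, "#sqrt{s} = 8 TeV, L_{int} = 11.6 fb^{-1}");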

In [10]:
%jsroot on
c.Draw()

In [11]:
report.Print()

Events with exactly two muons: pass=132030     all=269529     -- eff=48.99 % cumulative eff=48.99 %
Muons with opposite charge: pass=100000     all=132030     -- eff=75.74 % cumulative eff=37.10 %
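For each named filter, `Report` prints how many entries passed, how many were tested, the efficiency of that filter, and the cumulative efficiency relative to all processed entries. Note that `all=269529` is far below the full dataset because the `Range` call ended the loop early. The percentages can be reproduced by hand from the printed counts:

```python
# Counts copied from the report printed above.
all_events = 269529
pass_2mu = 132030
pass_os = 100000

eff_2mu = pass_2mu / all_events       # efficiency of the first filter
eff_os = pass_os / pass_2mu           # relative to the events entering the second filter
cumulative_os = pass_os / all_events  # relative to all processed events

print(f"{eff_2mu:.2%} {eff_os:.2%} {cumulative_os:.2%}")  # 48.99% 75.74% 37.10%
```

The cumulative efficiency is simply the product of the individual efficiencies along the chain of filters.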
