# ROOT dataframe tutorial: Dimuon spectrum

The ROOT dataframe tutorial shows you how to analyze datasets using `RDataFrame`. The example analysis performs the following steps:

1. Connect a ROOT dataframe to a dataset containing 66 million events recorded by CMS in 2012
2. Filter the events that are relevant for your analysis
3. Compute the invariant mass of the selected dimuon candidates
4. Plot the invariant mass spectrum showing resonances up to the Z mass

The notebook runs out of the box. However, you are encouraged to tweak the code to see the effect on the result!

Specific questions that will improve your understanding of the technology **are marked in bold.**


## Using Python with advanced features

In stage 1 of the tutorial, we saw how to inject C++ code into the computationally expensive parts of the event loop. However, this may not be performant enough for your analysis. In this part of the tutorial, you will therefore learn how to make your analysis toolchain even more efficient with precompiled C++ code and the C++ interpreter `cling`.

In [None]:
import ROOT

## Create the ROOT dataframe and filter the dataset

This code is unchanged from the previous stage. **Fill in the missing pieces!**

In [None]:
files = ROOT.std.vector("string")()
files.push_back("root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root")
files.push_back("root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012C_DoubleMuParked.root")
df = ROOT.RDataFrame("Events", files)

In [None]:
df_2mu = df.Filter("do something with nMuon", "Events with exactly two muons")
df_os = df_2mu.Filter("do something with Muon_charge", "Muons with opposite charge")

## Integrate optimized C++ code in the analysis

In the following, we load optimized C++ code from a header file and call its functions in the event loop.

The file `AnalysisCode.hxx` implements the function `compute_mass`. In the cell below, you can have a look at the source code.

In [1]:
%%bash
cat AnalysisCode.hxx

#include "Math/Vector4Dfwd.h"
#include "ROOT/RVec.hxx"

// Use O3 optimization level for just-in-time compilation
#pragma cling optimize(3)

using Vec_t = const ROOT::VecOps::RVec<float>&;
float compute_mass(Vec_t pt, Vec_t eta, Vec_t phi, Vec_t mass) {
    ROOT::Math::PtEtaPhiMVector p1(pt[0], eta[0], phi[0], mass[0]);
    ROOT::Math::PtEtaPhiMVector p2(pt[1], eta[1], phi[1], mass[1]);
    return (p1 + p2).mass();
}
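
To make explicit what `compute_mass` calculates, the following is a minimal pure-Python sketch of the same formula: each muon's (pt, eta, phi, mass) is converted to a Cartesian four-vector, the two four-vectors are added, and the invariant mass of the sum is returned. The helper names `four_vector` and `invariant_mass` are illustrative and not part of `AnalysisCode.hxx`. The sketch is meant for checking single values by hand, not for use in the event loop.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to Cartesian components (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(pt, eta, phi, mass):
    """Invariant mass of the first two particles in the given sequences."""
    e1, px1, py1, pz1 = four_vector(pt[0], eta[0], phi[0], mass[0])
    e2, px2, py2, pz2 = four_vector(pt[1], eta[1], phi[1], mass[1])
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    m2 = e * e - (px * px + py * py + pz * pz)
    # Guard against tiny negative values caused by floating-point rounding
    return math.sqrt(max(m2, 0.0))


# Two back-to-back muons in the transverse plane (toy values, in GeV)
muon_mass = 0.10566
print(invariant_mass([45.6, 45.6], [0.0, 0.0], [0.0, math.pi], [muon_mass, muon_mass]))
```

For two back-to-back muons at `eta = 0`, the result is twice the energy of a single muon, which gives a quick sanity check of the formula.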

Next, we make the function available in the ROOT environment by declaring the header to cling.

**What does `#pragma cling optimize(3)` do, and why is it interesting?**

Note that you could also compile a shared library and load it via cling.

In [None]:
ROOT.gInterpreter.Declare('#include "AnalysisCode.hxx"')

Now, you can call the C++ code in the event loop.

In [None]:
df_mass = df_os.Define("Dimuon_mass", "compute_mass(Muon_pt, Muon_eta, Muon_phi, Muon_mass)")

## Make a histogram and draw the result

The following code is unchanged from the previous stage of the tutorial.

**Can you find the reason why the time spent in the event loop may not decrease in this example, even though the code is now more optimized?**

In [None]:
df_range = df_mass.Range(100000)

In [None]:
nbins = 30000
low = 0.25  # must be > 0 for the log-scale x axis and below the low-mass resonances
up = 300
h = df_range.Histo1D(("Dimuon_mass", "Dimuon_mass", nbins, low, up), "Dimuon_mass")

In [None]:
report = df_range.Report()

In [None]:
%%time
ROOT.gStyle.SetOptStat(0); ROOT.gStyle.SetTextFont(42)
c = ROOT.TCanvas("c", "", 800, 700)
c.SetLogx(); c.SetLogy()
h.SetTitle("")
h.GetXaxis().SetTitle("m_{#mu#mu} (GeV)"); h.GetXaxis().SetTitleSize(0.04)
h.GetYaxis().SetTitle("N_{Events}"); h.GetYaxis().SetTitleSize(0.04)
h.Draw()

label = ROOT.TLatex(); label.SetNDC(True)
label.SetTextSize(0.040); label.DrawLatex(0.100, 0.920, "#bf{CMS Open Data}")
label.SetTextSize(0.030); label.DrawLatex(0.630, 0.920, "#sqrt{s} = 8 TeV, L_{int} = 11.6 fb^{-1}")

In [None]:
%jsroot on
c.Draw()

In [None]:
report.Print()
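
The report printed above is a cut flow: for each named filter it lists how many events entered the filter, how many passed, and the resulting efficiencies. The following pure-Python sketch reproduces that bookkeeping on toy data to show how the numbers relate to each other. The function names `cut_flow` and `format_report`, the toy values, and the two cuts are hypothetical, and the printed layout only approximates what ROOT prints.

```python
def cut_flow(events, cuts):
    """Apply named cuts in order; return one (name, n_pass, n_all) row per cut."""
    rows = []
    remaining = list(events)
    for name, predicate in cuts:
        n_all = len(remaining)  # events that reach this cut
        remaining = [ev for ev in remaining if predicate(ev)]
        rows.append((name, len(remaining), n_all))
    return rows


def format_report(rows):
    """Format rows with per-cut and cumulative efficiency in percent."""
    total = rows[0][2] if rows else 0  # events entering the first cut
    lines = []
    for name, n_pass, n_all in rows:
        eff = 100.0 * n_pass / n_all if n_all else 0.0
        cumulative = 100.0 * n_pass / total if total else 0.0
        lines.append(
            f"{name}: pass={n_pass} all={n_all} "
            f"-- eff={eff:.2f} % cumulative eff={cumulative:.2f} %"
        )
    return lines


# Toy "events": one hypothetical mass value per event (GeV)
toy_masses = [0.5, 3.1, 9.5, 91.2, 150.0]
toy_cuts = [
    ("Mass above 1 GeV", lambda m: m > 1.0),
    ("Mass below 100 GeV", lambda m: m < 100.0),
]

rows = cut_flow(toy_masses, toy_cuts)
for line in format_report(rows):
    print(line)
```

Note that the "all" count of each cut equals the "pass" count of the cut before it, so the order of the filters determines how many events each later filter has to process.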

## Additional tasks

Feel free to adapt the library and add further functionality. For example, try to replace the just-in-time-compiled (jitted) filters with precompiled functions from the library.