# RDataFrame basics: exercise solution

## Task description

The ROOT file available at `http://root.cern/files/teaching/Run2012BC_DoubleMuParked_Muons_reduced20x.root` contains a TTree dataset named `Events`. It is a slimmed-down version of a real CMS OpenData dataset with muon candidates from the 2012 LHC data taking.

The original full dataset is accessible at [DOI: 10.7483/OPENDATA.CMS.YLIC.86ZZ](http://opendata.cern.ch/record/6004) and [DOI: 10.7483/OPENDATA.CMS.M5AD.Y3V3](http://opendata.cern.ch/record/6030).

After opening the dataset with `RDataFrame`:

1. Select events that contain exactly 2 muons.
2. Select events for which the 2 muons have opposite charge.
3. Define a new quantity representing the invariant mass of the two muons. The `RVec` helper function [`InvariantMass(pt_vec, eta_vec, phi_vec, mass_vec)`](https://root.cern.ch/doc/master/group__vecops.html#ga2c531eae910edad48bbf7319cc6d7e58) is useful for this step.
4. Fill a histogram with the invariant mass of the dimuon system and plot the resulting distribution. Tip: the plot will look better using logarithmic scales for the axes.

What resonances do you recognize?

**Note**: if reading the dataset via HTTP takes too long, consider downloading it with a command line tool such as `wget` or `curl` and then reading it from your local disk. The file is roughly 100 MB in size.
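
As an alternative to `wget` or `curl`, the download can also be done from Python. The following is a minimal sketch using only the standard library; the helper name `local_copy` is hypothetical and not part of ROOT or of the original exercise. It assumes the file should be stored in the current working directory under its original name.

```python
import os
import urllib.request

# URL taken from the task description.
REMOTE_URL = "http://root.cern/files/teaching/Run2012BC_DoubleMuParked_Muons_reduced20x.root"


def local_copy(url, dest=None):
    """Return a local path for `url`, downloading the file only if it is missing.

    `local_copy` is an illustrative helper, not a ROOT function.
    """
    if dest is None:
        # Default to the file name at the end of the URL.
        dest = os.path.basename(url)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


# Example usage (downloads roughly 100 MB on the first call only):
# filename = local_copy(REMOTE_URL)
# df = ROOT.RDataFrame("Events", filename)
```

The existence check keeps the cell cheap to re-execute. Note that an interrupted download leaves a partial file behind, which would have to be deleted by hand before retrying.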

## Solution

In [18]:
import ROOT

treename = "Events"
filename = "http://root.cern/files/teaching/Run2012BC_DoubleMuParked_Muons_reduced20x.root"

In [19]:
# Let's take a quick peek into the dataset
ROOT.DisableImplicitMT() # this makes the cell re-executable: Display is not available if multi-threading is enabled
ROOT.RDataFrame(treename, filename).Display().Print()

+-----+-------------+------------+-----------+-------------+----------+-------+
| Row | Muon_charge | Muon_eta   | Muon_mass | Muon_phi    | Muon_pt  | nMuon | 
+-----+-------------+------------+-----------+-------------+----------+-------+
| 0   | -1          | 1.06683f   | 0.105658f | -0.0342727f | 10.7637f | 2     | 
|     | -1          | -0.563787f | 0.105658f | 2.54262f    | 15.7365f |       | 
+-----+-------------+------------+-----------+-------------+----------+-------+
| 1   | -1          | 2.13750f   | 0.105658f | -2.68163f   | 3.43733f | 2     | 
|     | 1           | 0.938457f  | 0.105658f | -3.03828f   | 12.7959f |       | 
+-----+-------------+------------+-----------+-------------+----------+-------+
| 2   | -1          | -1.38087f  | 0.105658f | 2.61392f    | 13.7815f | 2     | 
|     | 1           | 0.559747f  | 0.105658f | -1.75092f   | 13.6462f |       | 
+-----+-------------+------------+-----------+-------------+----------+-------+
| 3   | -1          | -1.05627f  

In [20]:
# Construct a multi-threaded RDF for the actual processing
ROOT.EnableImplicitMT()
df = ROOT.RDataFrame(treename, filename)

# Select only events with exactly two muons
df = df.Filter("nMuon == 2")
# Require opposite charge
df = df.Filter("Muon_charge[0] != Muon_charge[1]")

# Define invariant mass of the dimuon system
df = df.Define("Dimuon_mass", "InvariantMass(Muon_pt, Muon_eta, Muon_phi, Muon_mass)")

# Request histogram of dimuon mass spectrum. Note how we set titles and axis labels in one go
h = df.Histo1D(("Dimuon_mass", "Dimuon mass;m_{#mu#mu} (GeV);N_{Events}", 30000, 0.25, 300), "Dimuon_mass")
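
To make the selection and the `Dimuon_mass` definition less of a black box, here is a pure-Python sketch of the same logic applied to the first three events shown in the `Display()` output. It assumes that `InvariantMass` sums the four-momenta of all particles in the collection and returns the mass of that sum; the function `invariant_mass` below is an illustrative reimplementation, not the ROOT code.

```python
import math

# First three events from the Display() output:
# (charges, pts, etas, phis, masses) for each event.
events = [
    ([-1, -1], [10.7637, 15.7365], [1.06683, -0.563787], [-0.0342727, 2.54262], [0.105658, 0.105658]),
    ([-1, 1], [3.43733, 12.7959], [2.13750, 0.938457], [-2.68163, -3.03828], [0.105658, 0.105658]),
    ([-1, 1], [13.7815, 13.6462], [-1.38087, 0.559747], [2.61392, -1.75092], [0.105658, 0.105658]),
]


def invariant_mass(pts, etas, phis, masses):
    """Mass of the summed four-momentum of particles given as (pt, eta, phi, m)."""
    px = py = pz = energy = 0.0
    for pt, eta, phi, m in zip(pts, etas, phis, masses):
        x = pt * math.cos(phi)
        y = pt * math.sin(phi)
        z = pt * math.sinh(eta)
        px += x
        py += y
        pz += z
        energy += math.sqrt(x * x + y * y + z * z + m * m)
    m2 = energy * energy - (px * px + py * py + pz * pz)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(m2, 0.0))


# Same cuts as the two Filter calls: exactly two muons with opposite charge.
selected = [i for i, (q, *_) in enumerate(events) if len(q) == 2 and q[0] != q[1]]

for i in selected:
    _, pt, eta, phi, m = events[i]
    print(f"event {i}: dimuon mass = {invariant_mass(pt, eta, phi, m):.3f} GeV")
```

Event 0 is rejected because both muons have charge -1, while events 1 and 2 pass both cuts.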

In [21]:
%jsroot on
# A quick plot of the histogram
c = ROOT.TCanvas()
h.Draw() # The event loop runs here! This cell will take a few seconds to execute.
# Switch to logarithmic axes
c.SetLogx()
c.SetLogy()
c.Draw()

Ta-dah! See our [full tutorial](https://root.cern.ch/doc/master/df102__NanoAODDimuonAnalysis_8py.html) for a more detailed plot that labels the various resonances.
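
For the question posed in the task description, it helps to know where to look. The sketch below lists approximate, rounded masses of resonances commonly seen in dimuon spectra and matches a peak position to the closest one on a logarithmic scale. The values are reference numbers and are not extracted from this dataset; the helper `nearest_resonance` is hypothetical. Which peaks are actually visible depends on statistics and binning.

```python
import math

# Approximate resonance masses in GeV (rounded reference values,
# not measured from the histogram).
RESONANCES_GEV = {
    "eta": 0.548,
    "rho/omega": 0.78,
    "phi": 1.019,
    "J/psi": 3.097,
    "psi(2S)": 3.686,
    "Upsilon(1S)": 9.460,
    "Z": 91.19,
}


def nearest_resonance(mass_gev):
    """Return the name of the listed resonance closest to `mass_gev` in log-mass."""
    return min(
        RESONANCES_GEV,
        key=lambda name: abs(math.log(mass_gev / RESONANCES_GEV[name])),
    )


# Example: a peak read off the plot at about 3.1 GeV.
print(nearest_resonance(3.1))
```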