# Analysis of ROOT trees using RDataFrame

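RDataFrame is a modern way to analyze ROOT trees (and more!). The main principle is to express the analysis as a functional chain of transformations (Define, Filter) and actions (Histo1D, Mean, ...) rather than as a hand-written event loop. The chain is lazy: nothing is computed until a result is accessed, and then all booked results are filled in a single pass over the data. String expressions used in the chain are just-in-time compiled by Cling, which leaves room for optimizations. Multi-threading comes out of the box: calling ROOT::EnableImplicitMT() before creating the data frame parallelizes the event loop.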
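To make the idea of a lazy functional chain concrete, here is a toy sketch in plain C++. ToyFrame, Event and their methods are hypothetical and are not part of ROOT; the sketch only mimics the principle that a Filter records work, and that the loop over the events runs once, when a result is requested.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy, NOT the RDataFrame API: it only mimics a lazy functional chain.
struct Event {
    double px, py;
    int ncls;
};

class ToyFrame {
public:
    explicit ToyFrame(std::vector<Event> events) : events_(std::move(events)) {}

    // Record a selection; no event is looked at yet.
    ToyFrame &Filter(std::function<bool(const Event &)> cut) {
        cuts_.push_back(std::move(cut));
        return *this;
    }

    // Request a result: only now run one loop over all events, apply every
    // recorded cut and average the quantity f over the accepted events.
    double Mean(const std::function<double(const Event &)> &f) const {
        double sum = 0.;
        std::size_t n = 0;
        for (const auto &ev : events_) {
            bool pass = true;
            for (const auto &cut : cuts_) {
                if (!cut(ev)) {
                    pass = false;
                    break;
                }
            }
            if (pass) {
                sum += f(ev);
                ++n;
            }
        }
        return n > 0 ? sum / n : 0.;
    }

private:
    std::vector<Event> events_;
    std::vector<std::function<bool(const Event &)>> cuts_;
};
```

In RDataFrame each Filter or Define returns a new node of a computation graph instead of modifying the frame in place, and several results can be filled in the same loop; the toy keeps a single chain and a single result for brevity.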

In [None]:
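{
    // Open the file read-only; the unique_ptr closes it when the scope ends
    std::unique_ptr<TFile> branchreader(TFile::Open("./data/rdfexample.root", "READ"));
    // Print the structure (branches and their types) of the track tree, if it was found
    auto testtree = branchreader ? dynamic_cast<TTree *>(branchreader->Get("tracktree")) : nullptr;
    if (testtree) testtree->Print();
    else std::cerr << "tracktree not found in ./data/rdfexample.root" << std::endl;
}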

Create a new data frame for the analysis of the track tree and list its columns:

In [None]:
ROOT::RDataFrame trackframe("tracktree", "./data/rdfexample.root");
for(const auto &col : trackframe.GetColumnNames()) std::cout << "Found column: " << col << std::endl;

Define new columns for pt, eta and phi, and plot the pt distribution.

In [None]:
// your code here
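// Lambdas computing eta and phi (phi mapped into [0, 2pi)) from the momentum components
auto makeeta = [](double px, double py, double pz) { TVector3 pvec(px, py, pz); return pvec.Eta(); };
auto makephi = [](double px, double py) { TVector2 pvec2(px, py); return TVector2::Phi_0_2pi(pvec2.Phi()); };
// pt is defined via a string expression (JIT-compiled by Cling), eta and phi via the lambdas
auto trackframenew = trackframe.Define("pt", "TMath::Sqrt(px*px+py*py)")
                               .Define("eta", makeeta, {"px", "py", "pz"})
                               .Define("phi", makephi, {"px", "py"});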

auto hpt = trackframenew.Histo1D({"hpt", "track pt", 100, 0., 100.}, "pt");
auto plotpt = new TCanvas("plotpt", "Plot Pt", 640, 480);
plotpt->cd();
gPad->SetLogy();
hpt->Draw("ep");
plotpt->Draw();
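The Define calls above rely on the standard kinematic relations pt = sqrt(px*px + py*py), eta = -ln(tan(theta/2)) = asinh(pz/pt), and phi = atan2(py, px) shifted into [0, 2pi). As a cross-check of what TVector3::Eta and TVector2::Phi_0_2pi return, the relations can be written in plain C++ without ROOT; the function names Pt, Eta and Phi0To2Pi are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Transverse momentum: pt = sqrt(px^2 + py^2)
double Pt(double px, double py) { return std::hypot(px, py); }

// Pseudorapidity: eta = -ln(tan(theta/2)), equivalently asinh(pz / pt).
// Note: this sketch does not guard against pt = 0.
double Eta(double px, double py, double pz) { return std::asinh(pz / Pt(px, py)); }

// Azimuthal angle mapped into [0, 2*pi), the range used for the phi axis above
double Phi0To2Pi(double px, double py) {
    const double twoPi = 2. * std::acos(-1.);
    double phi = std::atan2(py, px);  // result in [-pi, pi]
    if (phi < 0.) phi += twoPi;
    return phi;
}
```

TVector3::Eta returns a large finite value for tracks with zero transverse momentum, whereas this sketch would divide by zero, so the ROOT classes remain the safer choice inside the data frame.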

List the column names again after defining the new columns.

In [None]:
// your code here
for(const auto &col : trackframenew.GetColumnNames()) std::cout << "Found column: " << col << std::endl;

Play a bit with the plotting: monitor different columns, and also try 2D histograms.

In [None]:
// your code here
auto histetaphi = trackframenew.Histo2D({"hEtaPhi", "Eta-phi", 100, -0.9, 0.9, 100, 0., TMath::TwoPi()}, "eta", "phi");
auto plotetaphi = new TCanvas("plotetaphi", "Eta-Phi plot", 640, 480);
plotetaphi->cd();
histetaphi->Draw("colz");
plotetaphi->Draw();

Select tracks with at least 120 TPC clusters and 4 ITS clusters. Draw pt, eta and phi.

In [None]:
// your code here
// Keep tracks with at least 120 TPC clusters and at least 4 ITS clusters
auto goodtracks = trackframenew.Filter("TPCncls >= 120 && ITSncls >= 4");
auto hgoodpt = goodtracks.Histo1D({"hPtGood", "Pt good tracks", 100, 0., 100.}, "pt");
auto hgoodetaphi = goodtracks.Histo2D({"hEtaPhiGood", "Eta-Phi good tracks", 100, -0.9, 0.9, 100, 0, TMath::TwoPi()}, "eta", "phi");

auto plotgood = new TCanvas("plotgood", "Plot good tracks", 1200, 600);
plotgood->Divide(2,1);
plotgood->cd(1);
gPad->SetLogy();
hgoodpt->Draw("ep");
plotgood->cd(2);
hgoodetaphi->Draw("colz");
plotgood->cd();
plotgood->Draw();
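The string passed to Filter is an ordinary boolean expression evaluated track by track. A plain C++ sketch of the same cut, using the thresholds stated in the task (at least 120 TPC and 4 ITS clusters) and the hypothetical names TrackClusters, IsGoodTrack and SelectionEfficiency, also shows how the fraction of accepted tracks follows from counting:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-track record holding only the two cluster counts
struct TrackClusters {
    int TPCncls;
    int ITSncls;
};

// Same logic as the Filter expression: both cluster requirements must hold
bool IsGoodTrack(const TrackClusters &t, int minTPC = 120, int minITS = 4) {
    return t.TPCncls >= minTPC && t.ITSncls >= minITS;
}

// Fraction of tracks passing the selection (0 for an empty input)
double SelectionEfficiency(const std::vector<TrackClusters> &tracks) {
    if (tracks.empty()) return 0.;
    std::size_t nGood = 0;
    for (const auto &t : tracks)
        if (IsGoodTrack(t)) ++nGood;
    return static_cast<double>(nGood) / tracks.size();
}
```

With RDataFrame the same fraction can be obtained from Count(): book auto nGood = goodtracks.Count(); and auto nAll = trackframenew.Count(); first, then compute *nGood / double(*nAll), so that both counts are filled in one event loop.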

Get the mean value and the standard deviation of pt, the number of TPC clusters and the number of ITS clusters. Use the RDataFrame actions Mean and StdDev directly instead of filling histograms first.

In [None]:
// your code here
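// All six results are lazy: they are only booked here and are filled together
// in a single event loop when the first of them is dereferenced below
auto meanpt      = trackframenew.Mean("pt");
auto sigpt       = trackframenew.StdDev("pt");
auto meanTPCncls = trackframenew.Mean("TPCncls");
auto sigTPCncls  = trackframenew.StdDev("TPCncls");
auto meanITSncls = trackframenew.Mean("ITSncls");
auto sigITSncls  = trackframenew.StdDev("ITSncls");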

std::cout << "Pt: mean " << *meanpt << ", sigma " << *sigpt << std::endl;
std::cout << "TPC ncls: mean " << *meanTPCncls << ", sigma " << *sigTPCncls << std::endl;
std::cout << "ITS ncls: mean " << *meanITSncls << ", sigma " << *sigITSncls << std::endl;
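Mean and StdDev use every value of the column directly, without going through a histogram's axis range. One single-pass way to compute both quantities is Welford's algorithm; the sketch below, with a hypothetical RunningStats class, illustrates the idea and uses the N-1 (sample) normalization for the standard deviation. It is not a description of how ROOT implements these actions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Single-pass running mean and standard deviation (Welford's algorithm)
class RunningStats {
public:
    void Fill(double x) {
        ++n_;
        const double delta = x - mean_;
        mean_ += delta / n_;          // update the running mean
        m2_ += delta * (x - mean_);   // accumulate squared deviations
    }
    double Mean() const { return mean_; }
    // Sample standard deviation (N-1 normalization); 0 if fewer than 2 entries
    double StdDev() const { return n_ > 1 ? std::sqrt(m2_ / (n_ - 1)) : 0.; }

private:
    std::size_t n_ = 0;
    double mean_ = 0.;
    double m2_ = 0.;
};
```

The incremental update avoids the loss of precision that the naive sum-of-squares formula suffers from when the mean is large compared with the spread.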

Create a new tree containing the new columns (pt, eta, phi) plus the numbers of ITS and TPC clusters, but not px, py and pz, and write it to your CERNBox.

In [None]:
// your code here
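// Write only the derived kinematics and the cluster counts (px, py, pz are dropped);
// replace the path with one in your own CERNBox (EOS) area
trackframenew.Snapshot("tracktreenew", "/eos/user/m/mfasel/tracktreenew.root",
                       {"pt", "eta", "phi", "TPCncls", "ITSncls"});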
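To write every column except px, py and pz without typing the kept names by hand, the column list can be derived from GetColumnNames(). A minimal sketch in plain C++, with a hypothetical helper DropColumns:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: return all column names except those listed in `drop`,
// preserving the original order.
std::vector<std::string> DropColumns(const std::vector<std::string> &all,
                                     const std::vector<std::string> &drop) {
    std::vector<std::string> kept;
    for (const auto &name : all) {
        if (std::find(drop.begin(), drop.end(), name) == drop.end())
            kept.push_back(name);
    }
    return kept;
}
```

Since GetColumnNames() also lists the columns added with Define, the call trackframenew.Snapshot("tracktreenew", "/eos/user/m/mfasel/tracktreenew.root", DropColumns(trackframenew.GetColumnNames(), {"px", "py", "pz"})); would keep pt, eta, phi, the cluster counts and any other column present in the tree.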