# The optimised computation library of RooFit
Author: Stephan Hageboeck

Here, we test the impact of using a computation library that is optimised for different CPU models.
When RooFit is loaded for the first time, the CPU capabilities are inspected, and a computation library targeted for that CPU is loaded.

We visualise this with a little hack here:

In [1]:
gDebug = 1;

With ROOT's `gDebug` set to 1, ROOT prints what happens behind the scenes. Let's now use a RooFit class, which triggers the loading of the RooFit libraries.

Note that ROOT loads a library with a suffix such as `_AVX2.so` (depending on the CPU where this runs). 

We turn `gDebug` off, and now go on to run some fits.

In [2]:
RooRealVar __var;
gDebug = 0;


RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby
                Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt

In roofitcore/InitUtils.cxx:loadComputeLibrary(): Library libRooBatchCompute_AVX2 was loaded successfully


Info in <TUnixSystem::Load>: loaded library /cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/Thu/x86_64-centos7-gcc8-opt/lib/libRooBatchCompute_AVX2.so, status 0
Info in <TUnixSystem::Load>: loaded library /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev3/Thu/ROOT/HEAD/x86_64-centos7-gcc8-opt/lib/libRooFitCore.so, status 1
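
The selection step seen in the log above can be sketched in plain C++. This is only an illustration of the idea, not RooFit's actual loader: the helper names `bestComputeLibrarySuffix` and `computeLibraryName` are hypothetical, and every suffix other than `AVX2` (the one visible in the log) is an assumption. The sketch relies on the `__builtin_cpu_supports` builtin that GCC and Clang provide on x86 and falls back to a generic name elsewhere.

```cpp
#include <string>

// Hypothetical helper: return a suffix for the most capable instruction set
// the current CPU reports. Only "AVX2" is confirmed by the log above; the
// other suffixes are assumptions made for this sketch.
std::string bestComputeLibrarySuffix() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // make sure the CPU feature data is initialised
    if (__builtin_cpu_supports("avx512f")) return "AVX512";
    if (__builtin_cpu_supports("avx2"))    return "AVX2";
    if (__builtin_cpu_supports("avx"))     return "AVX";
    if (__builtin_cpu_supports("sse4.1"))  return "SSE4.1";
#endif
    return "GENERIC";  // fallback for other compilers and architectures
}

// Hypothetical helper: build the name of the library that would be loaded.
std::string computeLibraryName() {
    return "libRooBatchCompute_" + bestComputeLibrarySuffix();
}
```

Checking the features from most to least capable means the first match is the best one the CPU offers, so a machine without AVX2 still gets a working, if slower, library.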


# Let's run a speed test
Let's test a fit of the form
\[
\mathrm{Model}\left( x\, |\, f, \mu, \sigma, a_1, \ldots, a_5 \right) = f \cdot \mathrm{Gauss}(x\, |\, \mu, \sigma) + (1 - f) \cdot \mathrm{Bernstein}_5(x\, |\, a_1, \ldots, a_5)
\]

### Step 1: Create a background model (Bernstein polynomials)

In [3]:
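// Observable
RooRealVar  x("x", "x", -100, 50);
// Bernstein coefficients: name, title, initial value, lower limit, upper limit
RooRealVar a1("a1", "a1", 0.8, 0.6, 1.2);
RooRealVar a2("a2", "a2", 0.0, -1.0, 1.0);
RooRealVar a3("a3", "a3", 0.09, 0.05, 0.4);
// Caution: the initial value 0.0 of a4 lies outside its range [0.2, 0.8]
RooRealVar a4("a4", "a4", 0.0, 0.2, 0.8);
RooRealVar a5("a5", "a5", 0.09, 0.05, 0.5);
// Fix three coefficients so that only a1 and a5 float in the fit
a4.setConstant();
a3.setConstant();
a2.setConstant();

// The coefficient order matters, so pass them as an ordered RooArgList
RooBernstein bernstein("bernstein", "bernstein PDF 5 coefficients", x, RooArgList(a1, a2, a3, a4, a5));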
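
To make the background shape more concrete, here is a minimal sketch of how a Bernstein polynomial can be evaluated, assuming the textbook definition on the rescaled variable t = (x - xmin) / (xmax - xmin). With five coefficients this is a polynomial of degree four. The function name `bernsteinValue` is hypothetical, the value is not normalised (RooFit takes care of normalisation separately), and this is not the code RooFit runs internally.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: evaluate sum_i c_i * C(n, i) * t^i * (1 - t)^(n - i),
// where n = coefs.size() - 1 and t maps [xmin, xmax] onto [0, 1].
double bernsteinValue(double x, double xmin, double xmax,
                      const std::vector<double>& coefs) {
    assert(!coefs.empty() && xmax > xmin);
    const int n = static_cast<int>(coefs.size()) - 1;
    const double t = (x - xmin) / (xmax - xmin);
    double sum = 0.0;
    double binom = 1.0;  // binomial coefficient C(n, 0)
    for (int i = 0; i <= n; ++i) {
        sum += coefs[i] * binom * std::pow(t, i) * std::pow(1.0 - t, n - i);
        binom = binom * (n - i) / (i + 1);  // update to C(n, i + 1)
    }
    return sum;
}
```

At the lower edge of the range only the first coefficient contributes, and at the upper edge only the last one, which is a quick way to sanity-check a set of coefficients.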

### Step 2: Create a signal model (Gaussian)
Note how newer RooFit versions complain that negative values of `sigma` don't make sense. We can still run the fit, though, because the fitter has ways to recover from undefined regions.

In [4]:
RooRealVar mean("mean", "mean", 10, -100, 50);
RooRealVar sigma("sigma", "sigma", 5, -10, 50);
RooGaussian gauss("gauss", "Gaussian signal model", x, mean, sigma);
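
One way to see why a negative `sigma` is a problem is to look at the textbook normalised Gaussian density, whose normalisation factor is 1 / (sigma * sqrt(2 pi)). The sketch below assumes exactly that formula; `gaussDensity` is a hypothetical helper and is not how RooFit evaluates or normalises `RooGaussian` internally.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: textbook normalised Gaussian density.
// For sigma < 0 the result is negative, so its logarithm is undefined.
double gaussDensity(double x, double mean, double sigma) {
    assert(sigma != 0.0);
    const double twoPi = 2.0 * std::acos(-1.0);
    const double z = (x - mean) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(twoPi));
}
```

A likelihood fit sums the logarithms of the densities, so under this assumption a negative density turns the whole sum into NaN, and the minimiser has to step back into the region where `sigma` is positive.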



### Step 3: Create a sum model

In [5]:
RooRealVar fracSig("sigFrac", "Fraction of signal events", 0.1, 0, 1);
RooAddPdf model("model", "Gauss signal + Bernstein background", RooArgList(gauss, bernstein), RooArgList(fracSig));
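
The sum model and the quantity a fit minimises can be sketched as follows. The sketch assumes that the signal and background densities are already normalised and have been evaluated for every event. The names `mixtureDensity` and `negLogLikelihood` are hypothetical and only illustrate the formula f * Gauss + (1 - f) * Bernstein from above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: combine two normalised densities for one event.
double mixtureDensity(double sigFrac, double sigDensity, double bkgDensity) {
    assert(sigFrac >= 0.0 && sigFrac <= 1.0);
    return sigFrac * sigDensity + (1.0 - sigFrac) * bkgDensity;
}

// Hypothetical helper: negative log-likelihood, i.e. the sum of
// -log(density) over all events. A fit varies the parameters to minimise it.
double negLogLikelihood(const std::vector<double>& sigDensities,
                        const std::vector<double>& bkgDensities,
                        double sigFrac) {
    assert(sigDensities.size() == bkgDensities.size());
    double nll = 0.0;
    for (std::size_t i = 0; i < sigDensities.size(); ++i) {
        nll -= std::log(mixtureDensity(sigFrac, sigDensities[i], bkgDensities[i]));
    }
    return nll;
}
```

Because the sum runs over every event for every step of the minimiser, the per-event cost of evaluating the densities dominates the fit time for large datasets.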

We save the initial parameters of the model so that we can reuse them later.

In [6]:
RooArgSet parameters(a1, a2, a3, a4, a5, mean, sigma, fracSig);
RooArgSet* initialParameters = parameters.snapshot();

# Generate data and fit model
We create a dataset by sampling one million events from the model.

In [7]:
auto dataSet = model.generate(x, 1000000);

# Let's run some fits

### 1. Run fit with classic RooFit computation
We first load the initial parameters (so that all fits start from the same point), and then we fit the model with 2 cores (SWAN doesn't provide more).
Note that the fitter often chooses a parameter point with negative sigma, where the model is undefined. It prints a message, but it recovers and finishes the fit.

After the fit, we print the parameters and the execution time of the cell.

In [None]:
%%time
parameters = *initialParameters;
model.fitTo(*dataSet, RooFit::PrintLevel(-1),
            RooFit::NumCPU(2));
parameters.Print("V");
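
The `%%time` magic only exists inside the notebook. Outside of it, a similar measurement can be sketched with `std::chrono` from the standard library. The helper name `secondsTaken` is hypothetical, and it measures wall-clock time only, whereas `%%time` also reports CPU time.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run a callable once and return the elapsed
// wall-clock time in seconds.
template <typename Callable>
double secondsTaken(Callable&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Callable>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

In a compiled macro one could then wrap a fit in a lambda, for example `secondsTaken([&] { model.fitTo(*dataSet, RooFit::PrintLevel(-1)); })`, and compare the numbers for the different fit configurations.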

### 2. Run same fit with fast batch evaluation
Using `BatchMode(true)`, we switch to the faster batch computation interface.
It has completely redesigned memory access patterns and uses optimised, vectorised math functions,
so it can process events much faster.

Compare the post-fit parameters and execution time with the above.
The parameters are virtually identical, but the fit is *a lot* faster.

In [None]:
%%time
parameters = *initialParameters;
model.fitTo(*dataSet, RooFit::PrintLevel(-1),
            RooFit::NumCPU(2),
            RooFit::BatchMode(true));
parameters.Print("V");
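
The difference between per-event and batch evaluation can be illustrated with a small sketch. It is not RooFit's batch library, only an assumption-laden picture of the access pattern: the names `gaussShape` and `gaussShapeBatch` are hypothetical. The scalar function is called once per event, whereas the batch function runs one tight loop over contiguous arrays, which gives the compiler a chance to vectorise it (whether it does depends on the compiler, its flags and the math library).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar interface (hypothetical): one function call per event.
double gaussShape(double x, double mean, double sigma) {
    const double z = (x - mean) / sigma;
    return std::exp(-0.5 * z * z);
}

// Batch interface (hypothetical): one call evaluates all events.
// Inputs and outputs are contiguous, and the division is hoisted out of the loop.
void gaussShapeBatch(const std::vector<double>& xs, double mean, double sigma,
                     std::vector<double>& out) {
    assert(sigma != 0.0);
    out.resize(xs.size());
    const double invSigma = 1.0 / sigma;
    const double* in = xs.data();
    double* result = out.data();
    const std::size_t n = xs.size();
    for (std::size_t i = 0; i < n; ++i) {
        const double z = (in[i] - mean) * invSigma;
        result[i] = std::exp(-0.5 * z * z);
    }
}
```

Both functions compute the same unnormalised Gaussian shape, so their results should agree up to floating-point rounding. This matches the observation that the post-fit parameters are virtually identical in both modes.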

### 3. Repeat above comparisons with only one CPU

In [None]:
%%time
parameters = *initialParameters;
model.fitTo(*dataSet, RooFit::PrintLevel(-1));

In [None]:
%%time
parameters = *initialParameters;
model.fitTo(*dataSet, RooFit::PrintLevel(-1),
            RooFit::BatchMode(true));

## Plot post-fit model
Let's use jsroot to compare data and post-fit model.
Note that one should always plot the data first, so that RooFit can adjust the normalisation of the PDF to match the data.

If you are viewing this notebook without a running kernel, the live display won't be available.

In [None]:
%jsroot on

TCanvas canv;
auto frame = x.frame();

dataSet->plotOn(frame);
model.plotOn(frame);
model.plotOn(frame, RooFit::Components("gauss"), RooFit::LineColor(kRed), RooFit::Name("signalOnly"));
frame->Draw();

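// Let's build a legend.
// To do that, we first change the auto-generated title of the signalOnly curve.
// The cast yields a null pointer if the object is missing or is not a RooCurve, so we check it:
if (auto signalCurve = dynamic_cast<RooCurve*>(frame->findObject("signalOnly"))) {
    signalCurve->SetTitle("Signal Model");
}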

auto leg = frame->BuildLegend();
leg->Draw();

canv.Draw();