# Simple template fitting using RooFit

<div class="alert alert-block alert-danger">
    <b>Note for contributors:</b> Remember to run <code>Kernel > Restart & Clear output</code> before adding any changes to git!</div>

In this tutorial we will perform template fits with `RooFit`, a statistics module built on top of (and included in) ROOT. For further information see

* [The RooFit section in the ROOT manual](https://root.cern/manual/roofit/) (also contains further links)
* [The RooFit example tutorials](https://root.cern/doc/v624/group__tutorial__roofit.html) - This is usually the first resource to look at (make sure to select the right ROOT version at the top)
    * e.g. relevant for this tutorial: https://root.cern/doc/v624/rf201__composite_8py.html
* [This presentation from 2008](https://root.cern/download/roofit-strasbourg-v10.pdf) - The syntax might be partially outdated, but it's the quickest way to learn about the underlying concepts

<div class="alert alert-block alert-warning">
    <b>Note:</b> This tutorial is for <b>ROOT version 6.24</b>. Starting from ROOT version 6.26 the interface to RooFit became more pythonic.<br>
    Have a look at <a href='004_v2_template_fits_roofit_root626.ipynb'>004_v2_template_fits_roofit_root626.ipynb</a> for a version of this notebook for ROOT >= 6.26
</div>

In [None]:
import ROOT

from ROOT import (
    RooArgSet, RooArgList, RooRealVar, RooGaussian, RooChebychev, RooAddPdf
)

import numpy as np
import pandas as pd

%matplotlib inline

In [None]:
# Create some test data

data = pd.DataFrame(
    np.concatenate((
        np.random.normal(-2, 1, 1000), 
        np.random.normal(3, 2, 1000),
        -5 + 10* np.random.random_sample(2000)
    )),
    columns=["x"]
)

<div class="alert alert-block alert-info">
    <b>Note:</b> We will fit the <b>distribution</b> of the above data, so note the conceptual difference to the <code>x, y</code> data from the tutorial <code>fitting_curves_data</code>.</div>

<div class="alert alert-block alert-success">
<b>Question [medium]:</b> Can you guess what the corresponding histogram to this data will look like and why?</div>

No? Then cheat and look at the histogram:

In [None]:
data.hist(bins=30)

## First try: Fit single gaussian as signal and line as background

In RooFit we need to define abstract quantities like observables as objects. So let's define our example observable `x` as a [`RooRealVar`](https://root.cern/doc/master/classRooRealVar.html) that ranges from -5 to 5:

In [None]:
x = RooRealVar("x", "Example observable", -5, 5)

Also, we need to create a `RooDataSet` for the observed data.

For ROOT versions below 6.26 we have to load it from a ROOT TTree.

We can use [`uproot`](https://uproot.readthedocs.io/) to convert back-and-forth between numpy/pandas and ROOT TTrees:

In [None]:
import uproot

In [None]:
with uproot.recreate("data_df.root") as f:
    f["tree"] = data

The other way round works as well:

In [None]:
with uproot.open("data_df.root") as f:
    data_reloaded = f["tree"].arrays(library="pd").set_index("index")

In [None]:
data_reloaded.x.hist(bins=30)

Now we can load the `TTree` into a `RooDataSet`:

In [None]:
f = ROOT.TFile.Open("data_df.root")
tree = f.Get("tree")
dataset = ROOT.RooDataSet("data", "data", tree, RooArgSet(x))
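
One detail worth keeping in mind: to our knowledge, entries whose value lies outside the range of the observable (here -5 to 5) are skipped when a `RooDataSet` is imported from a tree. The NumPy-only sketch below does not call RooFit; the fixed seed and the variable names are illustrative assumptions. It estimates how many values generated like our test data survive such a range cut:

```python
import numpy as np

# Assumption: fixed seed so that this sketch is reproducible
rng = np.random.default_rng(42)

# Same composition as the test data: two Gaussians plus a flat component
values = np.concatenate((
    rng.normal(-2, 1, 1000),
    rng.normal(3, 2, 1000),
    -5 + 10 * rng.random(2000),
))

# Range of the observable x
x_min, x_max = -5, 5
in_range = (values >= x_min) & (values <= x_max)

n_total = len(values)
n_kept = int(in_range.sum())
print(f"{n_kept} of {n_total} values lie inside [{x_min}, {x_max}]")
```

Most of the lost values come from the upper tail of the wide Gaussian centred at 3. The number of entries reported by `dataset.Print("v")` can be compared against the length of the DataFrame in the same way.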

The Gaussian model needs two further `RooRealVar` instances for its parameters.

Let's define the mean with a starting value of 0 and a range from -1 to 1, and the width with a starting value of 1 and a range from 0.1 to 10:

In [None]:
mean = RooRealVar("mean", "Mean of Gaussian", 0, -1, 1)
sigma = RooRealVar("sigma", "Width of Gaussian", 1, 0.1, 10)

We need to pass the observable and the parameters to the constructor of the Gaussian PDF:

In [None]:
pdf_sig = RooGaussian("gauss", "Gaussian signal model", x, mean, sigma)

For the background we use a straight line. Since RooFit always normalizes PDFs implicitly, we only need one parameter (the slope) for this:

In [None]:
a1 = RooRealVar("a1", "parameter for linear background", 0, -10, 10)
pdf_bkg = RooChebychev("line", "Linear background model", x, a1)
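
To our understanding, `RooChebychev` builds the shape $1 + \sum_i a_i T_i(x')$ from Chebyshev polynomials of the first kind $T_i$, with the range of the observable mapped linearly onto $[-1, 1]$. With a single coefficient this is the straight line $1 + a_1 x'$. The NumPy sketch below evaluates that shape and normalizes it by hand, which illustrates why one parameter is enough. The function name `linear_cheb_pdf` is hypothetical and not part of RooFit:

```python
import numpy as np
from numpy.polynomial import chebyshev

def linear_cheb_pdf(x, a1, x_min=-5.0, x_max=5.0):
    """Normalized first-order Chebyshev shape on [x_min, x_max] (hypothetical helper)."""
    # Map the observable range linearly onto [-1, 1]
    x_scaled = (2 * x - (x_min + x_max)) / (x_max - x_min)
    # T_0 has a fixed coefficient of 1, T_1 gets the free coefficient a1
    shape = chebyshev.chebval(x_scaled, [1.0, a1])
    # T_1 integrates to zero over the range, so the integral is just the range width
    return shape / (x_max - x_min)

# Check the normalization numerically with a midpoint sum
edges = np.linspace(-5, 5, 1001)
mids = 0.5 * (edges[:-1] + edges[1:])
integral = np.sum(linear_cheb_pdf(mids, 0.3)) * (edges[1] - edges[0])
print(f"integral over the range: {integral:.6f}")
```

Note that the line becomes negative somewhere inside the range as soon as $|a_1| > 1$, so not every value in the allowed parameter range from -10 to 10 describes a valid PDF.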

<div class="alert alert-block alert-success">
<b>Question:</b> Why is the model for the straight line called 'Chebychev'?</div>

Now we build a compound PDF from the two simple PDFs using [`RooAddPdf`](https://root.cern/doc/master/classRooAddPdf.html).

<div class="alert alert-block alert-success">
<b>Question:</b> What is a PDF?</div>

We also need parameters for the normalizations of the components. If we pass one parameter fewer than there are PDFs (here: one parameter), the parameters are interpreted as fractions. If we pass one parameter per PDF (here: two parameters), they are interpreted as the absolute numbers of fitted events (yields) for each PDF. Since the yields are what we want to extract in the end, we use the two-parameter form:

In [None]:
n_sig = RooRealVar("nsig", "Number of signal events", 500, 0, 100000)
n_bkg = RooRealVar("nbkg", "Number of background events", 500, 0, 100000)
pdf = RooAddPdf("pdf", "Gaussian Signal + linear Background", RooArgList(pdf_sig, pdf_bkg), RooArgList(n_sig, n_bkg))
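
The two-parameter form turns this into an *extended* likelihood fit: besides the shape, the total yield $n_\mathrm{sig} + n_\mathrm{bkg}$ is compared with the observed number of events through a Poisson term. Below is a minimal NumPy sketch of such an extended negative log-likelihood. It assumes a fixed Gaussian signal shape and a flat background; the helper names, the seed and the fixed shapes are illustrative assumptions, not RooFit internals:

```python
import numpy as np

def gauss_pdf(x, mean, sigma):
    """Gaussian density, normalized on the whole real axis."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def extended_nll(x, n_sig, n_bkg, x_min=-5.0, x_max=5.0):
    """Extended negative log-likelihood, up to a constant term."""
    # Assumption: signal is N(-2, 1), background is flat on [x_min, x_max];
    # the tiny fraction of the Gaussian outside the range is neglected
    density = n_sig * gauss_pdf(x, -2.0, 1.0) + n_bkg / (x_max - x_min)
    # Poisson term (expected total yield) minus the sum of log densities
    return (n_sig + n_bkg) - np.sum(np.log(density))

# Toy sample: 1000 signal and 2000 background events, restricted to the range
rng = np.random.default_rng(1)
sample = np.concatenate((rng.normal(-2, 1, 1000), -5 + 10 * rng.random(2000)))
sample = sample[(sample >= -5) & (sample <= 5)]

for n_sig, n_bkg in [(1000, 2000), (500, 2500), (1000, 3000)]:
    print(n_sig, n_bkg, extended_nll(sample, n_sig, n_bkg))
```

The yields close to the generated ones should give the smallest value. In the one-parameter form only the fraction $n_\mathrm{sig} / (n_\mathrm{sig} + n_\mathrm{bkg})$ is fitted, and the total number of events does not enter the likelihood.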

Now we're ready to fit:

In [None]:
pdf.fitTo(dataset)

Don't be deterred by the amount of output. Let's look at the results:

In [None]:
pdf.getParameters(RooArgSet(x)).Print("v")

And plot:

In [None]:
frame = x.frame()
dataset.plotOn(frame)
pdf.plotOn(frame)

c = ROOT.TCanvas()
frame.Draw()
c.Draw()

<div class="alert alert-block alert-success">
<b>Question 2 [easy]:</b> Why are the results so terrible?</div>

## Second try: Fit two gaussians as signal

In [None]:
mean1 = RooRealVar("mean1", "Mean of Gaussian 1", -3, -5, 0)
sigma1 = RooRealVar("sigma1", "Width of Gaussian 1", 1, 0.01, 10)
gauss1 = RooGaussian("gauss1", "Gaussian signal 1", x, mean1, sigma1)

mean2 = RooRealVar("mean2", "Mean of Gaussian 2", 3, 0, 5)
sigma2 = RooRealVar("sigma2", "Width of Gaussian 2", 1, 0.01, 10)
gauss2 = RooGaussian("gauss2", "Gaussian signal 2", x, mean2, sigma2)

frac1 = RooRealVar("frac1", "Fraction of gaussian 1 in signal", 0.5, 0, 1)
pdf_sig = RooAddPdf("pdf_sig", "Total Signal", gauss1, gauss2, frac1)

# reuse the yield parameters (n_sig, n_bkg) and the background PDF from the first try
pdf = RooAddPdf("pdf", "Gaussian Signal + linear Background", RooArgList(pdf_sig, pdf_bkg), RooArgList(n_sig, n_bkg))
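
Written out, the model of this second try is $n_\mathrm{sig}\,[f_1 G_1(x) + (1 - f_1)\,G_2(x)] + n_\mathrm{bkg}\,\mathrm{line}(x)$, so `frac1` only splits the signal yield between the two Gaussians. The NumPy sketch below spells out that sum. The flat line and all parameter values are assumptions chosen for illustration, not fit results:

```python
import numpy as np

def gauss_pdf(x, mean, sigma):
    """Gaussian density, normalized on the whole real axis."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def model_density(x, n_sig, n_bkg, frac1, x_min=-5.0, x_max=5.0):
    """Expected event density of signal (two Gaussians) plus flat background."""
    # Assumed shapes; unlike RooFit, the Gaussians are not renormalized to the range
    signal = frac1 * gauss_pdf(x, -2.0, 1.0) + (1 - frac1) * gauss_pdf(x, 3.0, 2.0)
    background = np.full(np.shape(x), 1.0 / (x_max - x_min))
    return n_sig * signal + n_bkg * background

# Yields of the individual Gaussians follow from the signal yield and the fraction
n_sig, frac1 = 2000, 0.5
yield_gauss1 = n_sig * frac1
yield_gauss2 = n_sig * (1 - frac1)
print(yield_gauss1, yield_gauss2)
```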

In [None]:
pdf.fitTo(dataset)

In [None]:
pdf.getParameters(ROOT.RooArgSet(x)).Print("v")

In [None]:
from ROOT.RooFit import Components, LineColor, LineStyle, Name

In [None]:
frame = x.frame()

dataset.plotOn(frame)
pdf.plotOn(frame, Name("fit"))

# One has to keep a reference to these RooArgSet objects explicitly.
# There seems to be a bug where Python may delete these objects while ROOT still wants to use them,
# see https://root-forum.cern.ch/t/createintegral-gives-unexpected-result/32627 for a similar problem.
# This should be fixed in ROOT 6.26.
ras_gauss1 = RooArgSet(gauss1)
ras_gauss2 = RooArgSet(gauss2)
ras_pdf_bkg = RooArgSet(pdf_bkg)

pdf.plotOn(frame, Components(ras_gauss1), LineColor(ROOT.kRed), Name("g1"))
pdf.plotOn(frame, Components(ras_gauss2), LineColor(ROOT.kRed+1), Name("g2"))
pdf.plotOn(frame, Components(ras_pdf_bkg), LineStyle(ROOT.kDashed), Name("bkg"))

c = ROOT.TCanvas()

frame.Draw()

legend = ROOT.TLegend(0.9, 1, 0.9, 1)
legend.AddEntry("fit", "Fit")
legend.AddEntry("g1", "Gaussian component")
legend.AddEntry("g2", "Gaussian component")
legend.AddEntry("bkg", "Background")
legend.SetLineWidth(0)
legend.Draw()

c.Draw()

In [None]:
dataset.Print("v")

## Exercise 1

<div class="alert alert-block alert-success">
<b>Exercise 1 [easy]:</b> Fit one gaussian for signal and a linear background model to the following dataset:
</div>

In [None]:
data2 = pd.DataFrame(
    np.concatenate((
        np.random.normal(-2, 1, 1000), 
        -5 + 10* np.random.random_sample(2000)
    )),
    columns=["x"]
)

## Fixing templates from MC

In the previous examples, we simply "knew" that our signal was shaped like a (two) Gaussian(s) and the background was linear.

Usually, however, the situation isn't as simple, and we first have to learn what our signal and background look like by studying MC (simulated) data. Remember that in MC we can always tell signal from background (it's simulated data after all).

Thus, we can first fit our signal and background model to the MC, then fix the parameters. Now we have two 
PDFs $\mathrm{pdf}_\mathrm{sig}$ and $\mathrm{pdf}_\mathrm{bkg}$ and get the signal and background yields by
fitting the data with $\mu_\mathrm{sig}\mathrm{pdf}_\mathrm{sig} + \mu_\mathrm{bkg}\mathrm{pdf}_\mathrm{bkg}$.
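
The last step can be illustrated without RooFit: once both shapes are fixed, only the yields remain free. The NumPy sketch below estimates them with a simple expectation-maximization iteration on the signal fraction. This is a swapped-in technique for illustration only (RooFit minimizes the likelihood with MINUIT instead), and the template shapes, the seed and the function names are assumptions made for this example:

```python
import numpy as np

def gauss_pdf(x, mean, sigma):
    """Gaussian density, normalized on the whole real axis."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def pdf_sig(x):
    # Fixed signal template (assumed shape)
    return gauss_pdf(x, -2.0, 1.0)

def pdf_bkg(x):
    # Fixed background template: 1/3 Gaussian plus 2/3 flat on [-5, 5] (assumed shape)
    return gauss_pdf(x, 2.0, 1.0) / 3 + (2 / 3) / 10.0

def fit_yields(x, n_iter=500):
    """Estimate (mu_sig, mu_bkg) for fixed templates via EM on the signal fraction."""
    s, b = pdf_sig(x), pdf_bkg(x)
    frac = 0.5
    for _ in range(n_iter):
        # Probability for each event to be signal, given the current fraction
        resp = frac * s / (frac * s + (1 - frac) * b)
        frac = resp.mean()
    return frac * len(x), (1 - frac) * len(x)

# Toy data: 3600 background-like and 300 signal-like events, restricted to the range
rng = np.random.default_rng(7)
sample = np.concatenate((
    rng.normal(2, 1, 1200),
    -5 + 10 * rng.random(2400),
    rng.normal(-2, 1, 300),
))
sample = sample[(sample >= -5) & (sample <= 5)]

mu_sig, mu_bkg = fit_yields(sample)
print(f"mu_sig = {mu_sig:.0f}, mu_bkg = {mu_bkg:.0f}")
```

By construction the two yields add up to the number of events in the sample, which is also what the maximum of an extended likelihood with fixed shapes gives.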

Let's write a helper function that creates a `RooDataSet` for `x` from a pandas DataFrame:

In [None]:
x = ROOT.RooRealVar("x", "Example observable", -5, 5)

In [None]:
from tempfile import NamedTemporaryFile

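def to_roodataset(df):
    """Convert a DataFrame with a column "x" into a RooDataSet for the observable x."""
    with NamedTemporaryFile() as f:
        # write the DataFrame to a temporary ROOT file with uproot
        with uproot.recreate(f.name) as uf:
            uf["tree"] = df
        # read the TTree back with ROOT and import it into a RooDataSet
        rf = ROOT.TFile.Open(f.name)
        tree = rf.Get("tree")
        rds = ROOT.RooDataSet("", "", tree, RooArgSet(x))
        # convert tree-based storage (if any) to vector-based storage
        rds.convertToVectorStore()
        return rds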

In [None]:
mc_signal = pd.DataFrame(
    np.random.normal(-2, 1, 1000),
    columns=["x"]
)

mc_bkg = pd.DataFrame(
    np.concatenate((
        np.random.normal(2, 1, 1000), 
        -5 + 10* np.random.random_sample(2000)
    )),
    columns=["x"]
)

data = pd.DataFrame(
    np.concatenate((
        np.random.normal(2, 1, int(1.2*1000)), 
        -5 + 10* np.random.random_sample(int(1.2*2000)),
        np.random.normal(-2, 1, int(0.3*1000))
    )),
    columns=["x"]
)

mc_signal_rds = to_roodataset(mc_signal)
mc_bkg_rds = to_roodataset(mc_bkg)
data_rds = to_roodataset(data)

In [None]:
mc_signal.hist(bins=30)

In [None]:
mc_bkg.hist(bins=30)

In [None]:
mean = ROOT.RooRealVar("mean", "Mean of Gaussian", -2, -5, 0)
sigma = ROOT.RooRealVar("sigma", "Width of Gaussian", 1, 0.01, 10)
pdf_sig = ROOT.RooGaussian("gauss", "Gaussian signal", x, mean, sigma)

In [None]:
pdf_sig.fitTo(mc_signal_rds)

In [None]:
frame = x.frame()
mc_signal_rds.plotOn(frame)
pdf_sig.plotOn(frame)

c = ROOT.TCanvas()
frame.Draw()
c.Draw()

In [None]:
for par in pdf_sig.getParameters(x):
    par.setConstant(True)

In [None]:
pdf_sig.getParameters(x).Print("v")

In [None]:
mean_bkg = ROOT.RooRealVar("mean_bkg", "Mean of Gaussian background component", 2, 0, 5)
sigma_bkg = ROOT.RooRealVar("sigma_bkg", "Width of Gaussian background component", 1, 0.1, 10)
gauss_bkg = ROOT.RooGaussian("gauss_bkg", "Gaussian background component", x, mean_bkg, sigma_bkg)
a0 = ROOT.RooRealVar("a0", "Parameter for linear background component", 0, -10, 10)
line = ROOT.RooChebychev("line", "Linear background component", x, a0)
frac = ROOT.RooRealVar("frac", "Fraction of gaussian in background", 0.5, 0, 1)
pdf_bkg = ROOT.RooAddPdf("pdf_bkg", "Background", gauss_bkg, line, frac)

In [None]:
pdf_bkg.fitTo(mc_bkg_rds)

In [None]:
frame = x.frame()
mc_bkg_rds.plotOn(frame)
pdf_bkg.plotOn(frame)

c = ROOT.TCanvas()
frame.Draw()
c.Draw()

In [None]:
for par in pdf_bkg.getParameters(x):
    par.setConstant(True)

In [None]:
pdf_bkg.getParameters(x).Print("v")

In [None]:
nsig = ROOT.RooRealVar("nsig", "Number of signal events", 1000, 0, 10000)
nbkg = ROOT.RooRealVar("nbkg", "Number of background events", 1000, 0, 10000)
pdf = ROOT.RooAddPdf("pdf", "Signal + background", RooArgList(pdf_sig, pdf_bkg), RooArgList(nsig, nbkg))

In [None]:
pdf.fitTo(data_rds)

In [None]:
frame = x.frame()
data_rds.plotOn(frame)
pdf.plotOn(frame)

c = ROOT.TCanvas()
frame.Draw()
c.Draw()

In [None]:
pdf.getParameters(x).Print("v")
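
As a cross-check, the fitted yields can be compared with the numbers used to generate the toy data above. Here is a short sketch in plain Python; the variable names are illustrative, and the small loss of events outside the range of `x` is neglected:

```python
# Sizes of the MC samples generated above
n_mc_signal = 1000
n_mc_bkg = 1000 + 2000

# Sizes of the signal and background parts of the toy data
n_data_signal = int(0.3 * 1000)
n_data_bkg = int(1.2 * 1000) + int(1.2 * 2000)

# Scale factors of data relative to MC
scale_signal = n_data_signal / n_mc_signal
scale_bkg = n_data_bkg / n_mc_bkg

print(f"expected nsig ~ {n_data_signal} (scale {scale_signal:.1f})")
print(f"expected nbkg ~ {n_data_bkg} (scale {scale_bkg:.1f})")
```

The fitted values of `nsig` and `nbkg` should agree with these numbers within their uncertainties.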

In [None]:
pdf_bkg.getParameters(x).Print("v")

In [None]:
pdf_sig.getParameters(x).Print("v")