### Extended Maximum Likelihood Fit

This notebook shows an example of an extended maximum likelihood fit. In this case we fit a signal plus a background function.
We first perform the extended maximum likelihood fit to determine the best-fit numbers of signal and background events.
We then perform a non-extended maximum likelihood fit, where we fit the fraction of signal events instead. The fraction of background events is just 1 minus the fraction of signal events.
At the end we show that the numbers of signal and background events estimated from the non-extended fit, computed from the fitted fraction and the total number of events, have a smaller error, since the fluctuation of the total number of events is not taken into account.
The under-estimation of the error becomes especially relevant when the corresponding fraction gets close to 1.
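
The size of the effect can be estimated with a first-order error propagation. This is a sketch under stated assumptions rather than part of the fit itself: the fitted fraction $\hat f$ and the total count $N$ are taken to fluctuate independently, and $N$ is taken to be Poisson distributed, so that $\sigma_N^2 = N$. The symbols $\hat f$, $\hat n$ and $N$ are notation introduced here. For a yield $\hat n = \hat f N$:

$$
\sigma_{\hat n}^2 \simeq N^2 \sigma_{\hat f}^2 + \hat f^2 \sigma_N^2 = N^2 \sigma_{\hat f}^2 + \hat f^2 N
$$

The non-extended estimate keeps only the first term. The missing term $\hat f^2 N$ grows with the fraction, which is why the under-estimation matters most for the component whose fraction is close to 1.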

We will use RooFit for the fits. We start by defining the model in a RooFit workspace.

In [1]:
%jsroot on

#### 1. Create RooFit model

We create the two main components of the model: a Gaussian pdf describing the signal and an exponential pdf describing the background.
See for example the material at https://indico.desy.de/getFile.py/access?contribId=9&resId=0&materialId=slides&confId=13610

In [2]:
RooWorkspace w("w"); 
w.factory("Exponential:bkg_pdf(x[0,10], a[-0.5,-2,-0.2])");
w.factory("Gaussian:sig_pdf(x, mass[2,1,3], sigma[0.3,0.1,1.])");


RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby
                Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt



##### 1.a Create model (S+B) for extended likelihood fit
We now create the model pdf. By using the numbers of signal and background events as parameters, we create an extended model, in which the total number of events is also fitted.
RooFit will automatically perform an extended maximum likelihood fit in this case.

In [3]:
w.factory("SUM:model(nsig[2000,0,10000]*sig_pdf, nbkg[5000,0,10000]*bkg_pdf)"); 
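
What "extended" means for the quantity being minimised can be sketched in plain C++, independently of RooFit. The function below is only an illustration under the usual textbook definition, not RooFit's implementation. The names `extendedNLL`, `sigDensity` and `bkgDensity` are hypothetical: the two vectors are assumed to hold the normalised signal and background pdf values at each event.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Extended negative log-likelihood for a two-component model,
// with terms that do not depend on the parameters dropped.
// sigDensity[i] and bkgDensity[i] are the normalised pdf values at event i;
// nsig and nbkg are the yields being fitted.
double extendedNLL(double nsig, double nbkg,
                   const std::vector<double>& sigDensity,
                   const std::vector<double>& bkgDensity)
{
    assert(sigDensity.size() == bkgDensity.size());
    // Poisson term: the expected total number of events
    double nll = nsig + nbkg;
    // Shape term: each event contributes -log(nsig*S(x_i) + nbkg*B(x_i))
    for (std::size_t i = 0; i < sigDensity.size(); ++i) {
        nll -= std::log(nsig * sigDensity[i] + nbkg * bkgDensity[i]);
    }
    return nll;
}
```

The first term penalises yields whose sum differs from the observed number of events. This is what allows the total normalisation to be fitted together with the shape parameters.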

##### 1.b Create model (S+B) for non-extended likelihood fit using fraction of events

We now create a second model pdf, this time using the signal and background fractions. Since they sum to 1, it is enough to include only the signal fraction as a fit parameter.
Fitting this model results in a non-extended maximum likelihood fit.

In [4]:
w.factory("SUM:model_frac(fsig[0.2,0,1]*sig_pdf, bkg_pdf)"); 
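
The relation between the two models can be made explicit. This uses the standard textbook form of the likelihood; the symbols $\nu$, $S$ and $B$ are notation introduced here. Writing $\nu = n_{sig} + n_{bkg}$ and $f_{sig} = n_{sig}/\nu$, the extended likelihood for $N$ observed events factorises as

$$
L_{ext}(\nu, f_{sig}) = \frac{\nu^{N} e^{-\nu}}{N!} \, \prod_{i=1}^{N} \left[ f_{sig}\, S(x_i) + (1 - f_{sig})\, B(x_i) \right]
$$

The first factor depends only on $\nu$ and is maximised at $\hat\nu = N$. The second factor is the non-extended likelihood of `model_frac`. Both fits are therefore expected to give the same best-fit yields and to differ only in the errors, because the non-extended fit treats $N$ as a fixed number.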

##### 1.c Set the input parameters in the model
Set the desired values of nsig and nbkg.

In [5]:
ntot=10000;
fsig=0.1;
w.var("nsig")->setVal(fsig*ntot);
w.var("nbkg")->setVal((1.-fsig)*ntot);
w.var("fsig")->setVal(fsig);

In [6]:
x  = w.var("x"); 
pdf = w.pdf("model");

#### 2. Generation of a toy data set
We now generate a data set using the previously created model. The number of generated events is given by the expected number of events set in the model. If we use the Extended option, the number of events is instead drawn from a Poisson distribution.

In [7]:
data = pdf->generate( *x); 
nevts = data->numEntries(); 
std::cout << "Generated " << nevts << " events" << std::endl;

Generated 10000 events
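
The data set above contains exactly the expected 10000 events. To get a feeling for how much the total would fluctuate when it is Poisson distributed, the following plain C++ sketch draws toy totals with the standard library. It does not use RooFit, and the helper name `drawToyYields` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Draw the total number of events for ntoys toy data sets,
// each with Poisson mean `expected`.
std::vector<long> drawToyYields(double expected, int ntoys, unsigned seed)
{
    assert(expected > 0.0 && ntoys > 0);
    std::mt19937 rng(seed);
    std::poisson_distribution<long> poisson(expected);
    std::vector<long> yields;
    yields.reserve(ntoys);
    for (int i = 0; i < ntoys; ++i) {
        yields.push_back(poisson(rng));
    }
    return yields;
}
```

For an expected total of 10000 events the drawn values scatter around 10000 with a standard deviation of about $\sqrt{10000} = 100$. This is the fluctuation that the non-extended fit does not account for.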


#### 3. Plot the data using RooFit (RooPlot class)
We plot the data using the **RooPlot** class from RooFit. The plot object is created with the *frame* function of the variable class. Note that the data are plotted with the number of bins and the range defined in the RooFit variable representing *x* (**RooRealVar** class).

In [8]:
plot = x->frame(RooFit::Title("Gaussian Signal over Exponential Background"));
data->plotOn(plot);
plot->Draw();
gPad->Draw();

Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1


#### 4. Fitting of the data (Extended Likelihood Fit)

We now fit the data using the *fitTo* function of the RooFit pdf class (**RooAbsPdf**). As fitting options, we request that the fit result be saved and that *Minuit2* be used as the minimization algorithm to find the minimum of the negative log-likelihood (NLL).
The fit performed is an unbinned extended maximum likelihood fit.

In [9]:
r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"));
r->Print();

[#1] INFO:Minization -- p.d.f. provides expected number of events, including extended term in likelihood.
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization --  The following expressions will be evaluated in cache-and-track mode: (sig_pdf,bkg_pdf)
Minuit2Minimizer: Minimize with max-calls 2500 convergence for edm < 1 strategy 1
MnSeedGenerator: for initial parameters FCN = -65717.87168228
MnSeedGenerator: Initial state:   - FCN =  -65717.87168228 Edm =      2.12184 NCalls =     23
VariableMetric: start iterating until Edm is < 0.001
VariableMetric: Initial state   - FCN =  -65717.87168228 Edm =      2.12184 NCalls =     23
VariableMetric: Iteration #   0 - FCN =  -65717.87168228 Edm =      2.12184 NCalls =     23
VariableMetric: Iteration #   1 - FCN =  -65720.02706458 Edm =    0.0678746 NCalls =     34
VariableMetric: Iteration #   2 - FCN =   -65720.0990768 Edm =    0.0198736 NCalls =     46
VariableMetric: Iteration #   3 - FCN = 

Info in <Minuit2>: Minuit2Minimizer::Hesse : Hesse is valid - matrix is accurate


#### 5. Plot the result of the fit

After fitting, we plot the resulting pdf normalized to the observed data. On the same plot we show the two components separately, the signal (in red) and the background (in blue).

In [18]:
pdf->plotOn(plot);
//draw the two separate pdf's
pdf->plotOn(plot, RooFit::Components("bkg_pdf"), RooFit::LineStyle(kDashed) );
pdf->plotOn(plot, RooFit::Components("sig_pdf"), RooFit::LineColor(kRed), RooFit::LineStyle(kDashed) );
plot->Draw();
gPad->Draw();

[#1] INFO:Plotting -- RooAbsPdf::plotOn(model_frac) directly selected PDF components: (bkg_pdf)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model_frac) indirectly selected PDF components: ()
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model_frac) directly selected PDF components: (sig_pdf)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model_frac) indirectly selected PDF components: ()


### Non-Extended Likelihood Fit

We now perform a non-extended likelihood fit using the signal fraction.

In [11]:
pdf = w.pdf("model_frac");

In [12]:
r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"),RooFit::Extended(false));
r->Print();

[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization --  The following expressions will be evaluated in cache-and-track mode: (sig_pdf,bkg_pdf)
Minuit2Minimizer: Minimize with max-calls 2000 convergence for edm < 1 strategy 1
MnSeedGenerator: for initial parameters FCN = 16383.52264794
MnSeedGenerator: Initial state:   - FCN =   16383.52264794 Edm =     0.335798 NCalls =     11
VariableMetric: start iterating until Edm is < 0.001
VariableMetric: Initial state   - FCN =   16383.52264794 Edm =     0.335798 NCalls =     11
VariableMetric: Iteration #   0 - FCN =   16383.52264794 Edm =     0.335798 NCalls =     11
VariableMetric: Iteration #   1 - FCN =   16383.29532758 Edm =     0.011745 NCalls =     21
VariableMetric: Iteration #   2 - FCN =   16383.26767864 Edm =  0.000174271 NCalls =     31
VariableMetric: After Hessian   - FCN =   16383.26767864 Edm =    0.0001913 NCalls =     56
VariableMetric: Iteration #   3 - FCN =   16383.2676786

Info in <Minuit2>: Minuit2Minimizer::Hesse : Hesse is valid - matrix is accurate


In [19]:
pdf->plotOn(plot,RooFit::LineColor(kGreen));
plot->Draw();
gPad->Draw();

#### Comparison of Extended vs Non-Extended fit

We compare the numbers of signal and background events obtained from the extended and non-extended fits.
In particular, it is interesting to note the different errors that are obtained.

In [20]:
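// Convert the fitted signal fraction into event counts using the observed total nevts
nsignal = w.var("fsig")->getVal()*nevts;
err_nsignal = w.var("fsig")->getError()*nevts;
nbackg = (1.-w.var("fsig")->getVal())*nevts;
// The background fraction is 1 - fsig, so it has the same absolute error as fsig
err_backg = w.var("fsig")->getError()*nevts;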

In [21]:
std::cout << "From normal    ML Fit : nsignal = " << nsignal << " +/- " << err_nsignal << std::endl;
std::cout << "From extended  ML Fit : nsignal = " << w.var("nsig")->getVal() << " +/- " << w.var("nsig")->getError() << std::endl;

From normal    ML Fit : nsignal = 959.582 +/- 68.7739
From extended  ML Fit : nsignal = 959.665 +/- 69.3883


In [22]:
std::cout << "From normal    ML Fit : nbackg = " << nbackg << " +/- " << err_backg << std::endl;
std::cout << "From extended  ML Fit : nbackg = " << w.var("nbkg")->getVal() << " +/- " << w.var("nbkg")->getError() << std::endl;

From normal    ML Fit : nbackg = 9040.42 +/- 68.7739
From extended  ML Fit : nbackg = 9040.53 +/- 113.535
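
The difference between the two background errors can be checked with a simple first-order error propagation. The sketch below assumes that the total number of events is Poisson distributed and fluctuates independently of the fitted fraction. Under that assumption the fluctuation of the total adds a term $f^2 N$ to the variance of a yield $n = f N$. The helper name `yieldErrorWithPoisson` is hypothetical and is not part of RooFit.

```cpp
#include <cassert>
#include <cmath>

// First-order error on a yield n = f * N when both the fitted fraction f
// and the total number of events N fluctuate (N assumed Poisson and
// independent of f). errFromFraction is the error obtained with N held
// fixed, i.e. sigma_f * N.
double yieldErrorWithPoisson(double fraction, double errFromFraction, double ntot)
{
    assert(ntot > 0.0);
    // var(n) ~ (N * sigma_f)^2 + f^2 * var(N), with var(N) = N
    return std::sqrt(errFromFraction * errFromFraction + fraction * fraction * ntot);
}
```

With the numbers printed above, `yieldErrorWithPoisson(0.904042, 68.7739, 10000.)` gives about 113.6 for the background, close to the 113.535 from the extended fit. For the signal, `yieldErrorWithPoisson(0.0959582, 68.7739, 10000.)` gives about 69.4, compared with 69.3883. The correction is small for the signal because its fraction is small, and large for the background because its fraction is close to 1.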
