# rf403_weightedevts
Data and categories: using weights in unbinned datasets



**Author:** 07/2008 - Wouter Verkerke  
<i><small>This notebook tutorial was automatically generated with <a href= "https://github.com/root-project/root/blob/master/documentation/doxygen/converttonotebook.py">ROOTBOOK-izer</a> from the macro found in the ROOT repository  on Thursday, August 29, 2019 at 02:52 AM.</small></i>

In [1]:
%%cpp -d
#include "RooRealVar.h"
#include "RooDataSet.h"
#include "RooDataHist.h"
#include "RooGaussian.h"
#include "RooConstVar.h"
#include "RooFormulaVar.h"
#include "RooGenericPdf.h"
#include "RooPolynomial.h"
#include "RooChi2Var.h"
#include "RooMinimizer.h"
#include "TCanvas.h"
#include "TAxis.h"
#include "RooPlot.h"
#include "RooFitResult.h"

In [2]:
%%cpp -d
// This is a workaround to make sure the namespace is used inside functions
using namespace RooFit;

Create observable and unweighted dataset
 -------------------------------------------------------------------------------

Declare observable

In [3]:
RooRealVar x("x", "x", -10, 10);
x.setBins(40);


RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby
                Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt



Construct a uniform pdf

In [4]:
RooPolynomial p0("px", "px", x);

Sample 1000 events from pdf

In [5]:
RooDataSet *data = p0.generate(x, 1000);

Calculate weight and make dataset weighted
 -----------------------------------------------------------------------------------

Construct formula to calculate (fake) weight for events

In [6]:
RooFormulaVar wFunc("w", "event weight", "(x*x+10)", x);

Add column with variable w to previously generated dataset

In [7]:
RooRealVar *w = (RooRealVar *)data->addColumn(wFunc);

The dataset `data` now has two observables (x, w) and 1000 entries

In [8]:
data->Print();

RooDataSet::pxData[x,w] = 1000 entries


Instruct dataset wdata to interpret w as an event weight rather than as an observable

In [9]:
RooDataSet wdata(data->GetName(), data->GetTitle(), data, *data->get(), 0, w->GetName());

The dataset `wdata` now has one observable (x), 1000 entries, and a sum of weights of ~43k

In [10]:
wdata.Print();

RooDataSet::pxData[x,weight:w] = 1000 entries (43238.9 weighted)
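
The sum of weights printed above can be cross-checked by hand: for x uniform on [-10, 10], the mean of w = x*x + 10 is 100/3 + 10 ≈ 43.3, so 1000 events should carry a total weight near 43,300. The sketch below does this arithmetic in plain C++, independent of RooFit; the helper name `expectedSumOfWeights` is hypothetical and exists only for illustration.

```cpp
#include <cassert>

// Expected sum of weights for n events with x uniform on [lo, hi]
// and per-event weight w(x) = x*x + offset.
// Hypothetical helper for illustration; not part of RooFit.
double expectedSumOfWeights(int n, double lo, double hi, double offset)
{
   assert(hi > lo);
   // Mean of x^2 for a uniform distribution on [lo, hi]:
   // (hi^3 - lo^3) / (3 * (hi - lo))
   const double meanX2 = (hi * hi * hi - lo * lo * lo) / (3.0 * (hi - lo));
   return n * (meanX2 + offset);
}
```

Calling `expectedSumOfWeights(1000, -10, 10, 10)` gives about 43333, consistent with the 43238.9 reported for this particular random sample.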


Unbinned ML fit to weighted data
 ---------------------------------------------------------------

Construct quadratic polynomial pdf for fitting

In [11]:
RooRealVar a0("a0", "a0", 1);
RooRealVar a1("a1", "a1", 0, -1, 1);
RooRealVar a2("a2", "a2", 1, 0, 10);
RooPolynomial p2("p2", "p2", x, RooArgList(a0, a1, a2), 0);

Fit quadratic polynomial to weighted data

Note: a plain maximum likelihood fit to weighted data does in general
       NOT result in correct error estimates, unless individual
       event weights represent Poisson statistics themselves.

 Fit with 'wrong' errors

In [12]:
RooFitResult *r_ml_wgt = p2.fitTo(wdata, Save());

       While the estimated values of the parameters will always be calculated taking the weights into account,
       there are multiple ways to estimate the errors of the parameters. You are advised to make an
       explicit choice for the error calculation:
           - Either provide SumW2Error(true), to calculate a sum-of-weights-corrected HESSE error matrix
             (error will be proportional to the number of events in MC).
           - Or provide SumW2Error(false), to return errors from original HESSE error matrix
             (which will be proportional to the sum of the weights, i.e., a dataset with <sum of weights> events).
       If you want the errors to reflect the information contained in the provided simulation, choose true.
       If you want the errors to reflect the precision you would be able to obtain with an unweighted dataset
       with <sum of weights> events, choose false.
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization


A first-order correction to the estimated parameter errors in an
 (unbinned) ML fit can be obtained by calculating the
 covariance matrix as

$$V' = V \, C^{-1} \, V$$

 where $V$ is the covariance matrix calculated from a fit
 to $-\log L = -\sum_i w_i \log f(x_i)$ and $C$ is the covariance
 matrix calculated from $-\log L' = -\sum_i w_i^2 \log f(x_i)$
 (i.e. the weights are applied squared).
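
To make the matrix product V C^-1 V concrete, here is a minimal standalone sketch for two fit parameters, written in plain C++ without RooFit. The type alias `Mat2` and the functions `multiply`, `inverse`, and `correctedCovariance` are hypothetical names introduced for this illustration; RooFit performs the equivalent computation internally when `SumW2Error(true)` is requested.

```cpp
#include <array>
#include <cassert>

// 2x2 matrix type for a two-parameter fit (hypothetical helper type).
using Mat2 = std::array<std::array<double, 2>, 2>;

// Plain matrix product a * b.
Mat2 multiply(const Mat2 &a, const Mat2 &b)
{
   Mat2 r{};
   for (int i = 0; i < 2; ++i)
      for (int j = 0; j < 2; ++j)
         for (int k = 0; k < 2; ++k)
            r[i][j] += a[i][k] * b[k][j];
   return r;
}

// Closed-form inverse of a non-singular 2x2 matrix.
Mat2 inverse(const Mat2 &m)
{
   const double det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
   assert(det != 0.0);
   Mat2 r{};
   r[0][0] = m[1][1] / det;
   r[0][1] = -m[0][1] / det;
   r[1][0] = -m[1][0] / det;
   r[1][1] = m[0][0] / det;
   return r;
}

// Corrected covariance V' = V C^-1 V, where v comes from the fit with
// weights w_i and c from the fit with squared weights w_i^2.
Mat2 correctedCovariance(const Mat2 &v, const Mat2 &c)
{
   return multiply(multiply(v, inverse(c)), v);
}
```

A useful sanity check is the case of a constant weight k: then V equals the unweighted covariance divided by k, C equals it divided by k squared, and V' reduces to the unweighted covariance, so a uniform rescaling of the weights does not change the corrected errors.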

 A fit in this mode can be performed as follows:

In [13]:
RooFitResult *r_ml_wgt_corr = p2.fitTo(wdata, Save(), SumW2Error(kTRUE));

[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
 **********
 **   10 **SET PRINT           1
 **********
 **********
 **   11 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 a1          -4.85603e-03  4.03459e-03   -1.00000e+00  1.00000e+00
     2 a2           9.86514e-02  2.41310e-03    0.00000e+00  1.00000e+01
 **********
 **   12 **SET ERR         0.5
 **********
 **********
 **   13 **SET PRINT           1
 **********
 **********
 **   14 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **   15 **MIGRAD        1000           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=119682 FROM MIGRAD    STATUS=INITIATE        4 CALLS           5 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX  

Plot weighted data and fit result
 ---------------------------------------------------------------

Construct plot frame

In [14]:
RooPlot *frame = x.frame(Title("Unbinned ML fit, binned chi^2 fit to weighted data"));

Plot data using sum-of-weights-squared errors rather than Poisson errors

In [15]:
wdata.plotOn(frame, DataError(RooAbsData::SumW2));
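
For reference, the sum-of-weights-squared error on a bin is the square root of the sum of the squared weights of the events falling in it. The sketch below shows that arithmetic in plain C++; the function names `binContent` and `binSumW2Error` are hypothetical and not part of the RooFit API.

```cpp
#include <cmath>
#include <vector>

// Bin content: the sum of the weights of the events in the bin.
double binContent(const std::vector<double> &weights)
{
   double sum = 0.0;
   for (double w : weights)
      sum += w;
   return sum;
}

// Sum-of-weights-squared error on the bin: sqrt(sum of w_i^2).
double binSumW2Error(const std::vector<double> &weights)
{
   double sumW2 = 0.0;
   for (double w : weights)
      sumW2 += w * w;
   return std::sqrt(sumW2);
}
```

With unit weights this reduces to sqrt(N), the familiar Gaussian approximation of the Poisson error. For three events with weights 10, 12, and 20, the bin content is 42 but the error is sqrt(644) ≈ 25.4, much larger than sqrt(42) ≈ 6.5.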

Overlay result of 2nd order polynomial fit to weighted data

In [16]:
p2.plotOn(frame);

ML fit of pdf to equivalent unweighted dataset
 -----------------------------------------------------------------------------------------

Construct a pdf with the same shape as p0 after weighting

In [17]:
RooGenericPdf genPdf("genPdf", "x*x+10", x);

Sample a dataset with the same number of events as data

In [18]:
RooDataSet *data2 = genPdf.generate(x, 1000);

[#1] INFO:NumericIntegration -- RooRealIntegral::init(genPdf_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(genPdf_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)


Sample a dataset whose number of events matches the sum of weights of data (about 43k)

In [19]:
RooDataSet *data3 = genPdf.generate(x, 43000);

[#1] INFO:NumericIntegration -- RooRealIntegral::init(genPdf_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(genPdf_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)


Fit the 2nd order polynomial to both unweighted datasets and save the results for comparison

In [20]:
RooFitResult *r_ml_unw10 = p2.fitTo(*data2, Save());
RooFitResult *r_ml_unw43 = p2.fitTo(*data3, Save());

[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
 **********
 **   22 **SET PRINT           1
 **********
 **********
 **   23 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 a1          -4.85553e-03  3.00247e-02   -1.00000e+00  1.00000e+00
     2 a2           9.86520e-02  2.98987e-02    0.00000e+00  1.00000e+01
 **********
 **   24 **SET ERR         0.5
 **********
 **********
 **   25 **SET PRINT           1
 **********
 **********
 **   26 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **   27 **MIGRAD        1000           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=2766.64 FROM MIGRAD    STATUS=INITIATE        6 CALLS           7 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX 

Chi2 fit of pdf to binned weighted dataset
 ------------------------------------------------------------------------------------

Construct binned clone of unbinned weighted dataset

In [21]:
RooDataHist *binnedData = wdata.binnedClone();
binnedData->Print("v");

DataStore pxData_binned (Generated From px_binned)
  Contains 40 entries
  Observables: 
    1)  x = -1.26187  L(-10 - 10) B(40)  "x"
Binned Dataset pxData_binned (Generated From px_binned)
  Contains 40 bins with a total weight of 43238.9
  Observables:     1)  x = -1.26187  L(-10 - 10) B(40)  "x"


Perform chi2 fit to binned weighted dataset using sum-of-weights-squared errors

 NB: Within the usual approximations of a chi2 fit, a chi2 fit to weighted
 data using sum-of-weights-squared errors does give correct error
 estimates
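
As a rough illustration of the quantity being minimized, the sketch below evaluates a chi2 in which each bin's squared residual is divided by that bin's sum of squared weights. It is plain C++ with a hypothetical function name `chi2SumW2`; the expected bin yields are passed in directly, and skipping bins with zero error is an assumption made here, not a statement about how `RooChi2Var` treats such bins.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// chi2 = sum over bins of (content_i - expected_i)^2 / sumW2_i,
// where sumW2_i is the sum of squared event weights in bin i.
double chi2SumW2(const std::vector<double> &content,
                 const std::vector<double> &sumW2,
                 const std::vector<double> &expected)
{
   assert(content.size() == sumW2.size() && content.size() == expected.size());
   double chi2 = 0.0;
   for (std::size_t i = 0; i < content.size(); ++i) {
      if (sumW2[i] <= 0.0)
         continue; // assumption: bins without a defined error are skipped
      const double diff = content[i] - expected[i];
      chi2 += diff * diff / sumW2[i];
   }
   return chi2;
}
```

Because the denominator grows with the square of the weights, a bin filled by a few heavily weighted events contributes less to the chi2 than its content alone would suggest.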

In [22]:
RooChi2Var chi2("chi2", "chi2", p2, *binnedData, DataError(RooAbsData::SumW2));
RooMinimizer m(chi2);
m.migrad();
m.hesse();

 **********
 **   40 **SET PRINT           1
 **********
 **********
 **   41 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 a1          -1.13263e-03  4.02065e-03   -1.00000e+00  1.00000e+00
     2 a2           9.75516e-02  2.36614e-03    0.00000e+00  1.00000e+01
 **********
 **   42 **SET ERR           1
 **********
 **********
 **   43 **SET PRINT           1
 **********
 **********
 **   44 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **   45 **MIGRAD        1000           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=32.3464 FROM MIGRAD    STATUS=INITIATE        6 CALLS           7 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO

Plot chi^2 fit result on frame as well

In [23]:
RooFitResult *r_chi2_wgt = m.save();
p2.plotOn(frame, LineStyle(kDashed), LineColor(kRed));

Compare fit results of chi2 and ML fits to (un)weighted data
 ---------------------------------------------------------------------------------------------------------------

Note that the ML fit to 1K weighted events gives a result closer to that of the ML fit to 43K unweighted events
 than to that of the ML fit to 1K unweighted events, whereas the reference chi^2 fit with SumW2 errors gives a result closer to
 that of an unbinned ML fit to 1K unweighted events.
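
When reading the comparison, it helps to recall that statistical errors scale roughly as 1/sqrt(N), so the 1K and 43K unweighted fits should differ in their errors by about sqrt(43) ≈ 6.6. The sketch below computes that factor in plain C++; the function name `expectedErrorRatio` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Expected ratio of parameter errors between a small and a large sample,
// assuming errors scale as 1/sqrt(N).
double expectedErrorRatio(double nSmall, double nLarge)
{
   assert(nSmall > 0.0 && nLarge > 0.0);
   return std::sqrt(nLarge / nSmall);
}
```

In the output printed by the comparison cell, the unweighted fits give error ratios of 2.69e-02 / 4.02e-03 ≈ 6.7 for a1 and 1.67e-02 / 2.37e-03 ≈ 7.0 for a2, in line with this estimate.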

In [24]:
cout << "==> ML Fit results on 1K unweighted events" << endl;
r_ml_unw10->Print();
cout << "==> ML Fit results on 43K unweighted events" << endl;
r_ml_unw43->Print();
cout << "==> ML Fit results on 1K weighted events with a summed weight of 43K" << endl;
r_ml_wgt->Print();
cout << "==> Corrected ML Fit results on 1K weighted events with a summed weight of 43K" << endl;
r_ml_wgt_corr->Print();
cout << "==> Chi2 Fit results on 1K weighted events with a summed weight of 43K" << endl;
r_chi2_wgt->Print();

new TCanvas("rf403_weightedevts", "rf403_weightedevts", 600, 600);
gPad->SetLeftMargin(0.15);
frame->GetYaxis()->SetTitleOffset(1.8);
frame->Draw();

==> ML Fit results on 1K unweighted events

  RooFitResult: minimized FCN value: 2766.49, estimated distance to minimum: 7.63714e-07
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter    FinalValue +/-  Error   
  --------------------  --------------------------
                    a1    8.9440e-03 +/-  2.69e-02
                    a2    1.0129e-01 +/-  1.67e-02

==> ML Fit results on 43K unweighted events

  RooFitResult: minimized FCN value: 118892, estimated distance to minimum: 2.21254e-06
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter    FinalValue +/-  Error   
  --------------------  --------------------------
                    a1   -1.1326e-03 +/-  4.02e-03
                    a2    9.7552e-02 +/-  2.37e-03

==> ML Fit results on 1K weighted events with a summed weight of 43K

  RooFitRe

Draw all canvases 

In [25]:
%jsroot on
gROOT->GetListOfCanvases()->Draw()