Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RF] Range not considered when computing integral of RooParamHistFunc #7182

Closed
lmoneta opened this issue Feb 11, 2021 · 4 comments
Closed

[RF] Range not considered when computing integral of RooParamHistFunc #7182

lmoneta opened this issue Feb 11, 2021 · 4 comments

Comments

@lmoneta
Copy link
Member

lmoneta commented Feb 11, 2021

The range is not used in RooParamHIstFunc::analyticalIntegralWN, see https://root.cern/doc/master/RooParamHistFunc_8cxx_source.html#l00247
as reported in https://root-forum.cern.ch/t/createintegral-giving-wrong-results/43508.

Simple code to reproduce:

void testParamHistFunc() {                                                                                                                                
                                                                                                                                                          
   RooRealVar x("x","x",0,10);                                                                                                                            
   TH1D h1("h1","h1",20,0,10);                                                                                                                            
   TF1 f1("f1","gaus"); f1.SetParameters(1,5,1);                                                                                                          
   h1.FillRandom("f1",1000);                                                                                                                              
                                                                                                                                                          
   RooDataHist dh("dh","dh",x,&h1);                                                                                                                       
   RooParamHistFunc phf("phf","",x,dh);                                                                                                                   
   phf.Print("t");                                                                                                                                        
   x.setRange("R1",0,5);                                                                                                                                  
                                                                                                                                                                                                                
   auto ig1n = phf.createIntegral(x,x);                                                                                                                   
   std::cout << ig1n->getVal() << std::endl;                                                                                                              
   auto igr1 = phf.createIntegral(x,x,"R1");                                                                                                              
   std::cout << igr1->getVal() << std::endl;                                                                                                              
                                                                                                            
}        
@lmoneta lmoneta added the bug label Feb 11, 2021
@github-actions github-actions bot added this to Needs triage in Triage Feb 11, 2021
@lmoneta lmoneta self-assigned this Feb 11, 2021
@lmoneta
Copy link
Member Author

lmoneta commented Feb 11, 2021

The same problem exists also for the RooHIstFunc.
It has been reported already here
https://sft.its.cern.ch/jira/browse/ROOT-7413?filter=-2
using createRunningIntegral with a possible solution that needs to be investigated

guitargeek added a commit to guitargeek/root that referenced this issue Feb 12, 2021
The logic for summing over histogram bins in different ranges used in
RooHistPdf is also implemented in RooParamHistFunc. This means the
range is now considered when computing integrals of RooParamHistFunc.

RooParamHistFunc allows you to scale the counts in each bin with a
parameter. The interface of RooDataHist::sum was extended with a
function parameter to inject the logic of scaling the bin weight
depending on the bin index.

This commit partly fixes issue root-project#7182. We still need to implement the
range feature in RooHistFunc.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 12, 2021
The logic for summing over histogram bins in different ranges used in
RooHistPdf is also implemented in RooParamHistFunc. This means the
range is now considered when computing integrals of RooParamHistFunc.

RooParamHistFunc allows you to scale the counts in each bin with a
parameter. The interface of RooDataHist::sum was extended with a
function parameter to inject the logic of scaling the bin weight
depending on the bin index.

This commit partly fixes issue root-project#7182. We still need to implement the
range feature in RooHistFunc.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 12, 2021
The logic for summing over histogram bins in different ranges used in
RooHistPdf is also implemented in RooParamHistFunc. This means the
range is now considered when computing integrals of RooParamHistFunc.

RooParamHistFunc allows you to scale the counts in each bin with a
parameter. The interface of RooDataHist::sum was extended with a
function parameter to inject the logic of scaling the bin weight
depending on the bin index.

This commit partly fixes issue root-project#7182. We still need to implement the
range feature in RooHistFunc.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 12, 2021
The logic for summing over histogram bins in different ranges used in
RooHistPdf is also implemented in RooParamHistFunc. This means the
range is now considered when computing integrals of RooParamHistFunc.

RooParamHistFunc allows you to scale the counts in each bin with a
parameter. The interface of RooDataHist::sum was extended with a
function parameter to inject the logic of scaling the bin weight
depending on the bin index.

This commit partly fixes issue root-project#7182. We still need to implement the
range feature in RooHistFunc.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 12, 2021
The logic for summing over histogram bins in different ranges used in
RooHistPdf is also implemented in RooParamHistFunc. This means the
range is now considered when computing integrals of RooParamHistFunc.

RooParamHistFunc allows you to scale the counts in each bin with a
parameter. The interface of RooDataHist::sum was extended with a
function parameter to inject the logic of scaling the bin weight
depending on the bin index.

This commit partly fixes issue root-project#7182. We still need to implement the
range feature in RooHistFunc.
@etejedor etejedor removed this from Needs triage in Triage Feb 16, 2021
guitargeek added a commit to guitargeek/root that referenced this issue Feb 18, 2021
The logic for summing over histogram bins in different ranges used in
RooHistPdf is also implemented in RooParamHistFunc. This means the
range is now considered when computing integrals of RooParamHistFunc.

RooParamHistFunc allows you to scale the counts in each bin with a
parameter. The interface of RooDataHist::sum was extended with a
function parameter to inject the logic of scaling the bin weight
depending on the bin index.

This commit partly fixes issue root-project#7182. We still need to implement the
range feature in RooHistFunc.
guitargeek added a commit that referenced this issue Feb 19, 2021
The logic for summing over histogram bins in different ranges used in
RooHistPdf is also implemented in RooParamHistFunc. This means the
range is now considered when computing integrals of RooParamHistFunc.

RooParamHistFunc allows you to scale the counts in each bin with a
parameter. The interface of RooDataHist::sum was extended with a
function parameter to inject the logic of scaling the bin weight
depending on the bin index.

This commit partly fixes issue #7182. We still need to implement the
range feature in RooHistFunc.
@guitargeek
Copy link
Contributor

guitargeek commented Feb 22, 2021

While working on this issue, I noticed that also RooHistPdf is also not bug free. Here is what you can do to get strange results with it:

// g++ $(root-config --cflags) -o testHistPdf testHistPdf.cc $(root-config --libs) -lRooFitCore -lRooFit -g

#include "RooRealVar.h"
#include "RooHistPdf.h"
#include "RooDataHist.h"
#include "TH2D.h"
#include "TF2.h"

using namespace RooFit;

int main(int argc, char const *argv[]) {

   RooRealVar x("x","x",0, 10);
   RooRealVar y("y","y",0.05);
   TH2D h2("h2","h2",20,0,10, 30, 0, 10);
   TF2 f2("f2","y < 0.1");
   h2.FillRandom("f2",1000);

   RooArgSet argSet{x, y};

   RooDataHist dh("dh","dh",argSet,&h2);
   RooHistPdf phf("phf","",argSet,dh);
   x.setRange("R1",0,5);
   y.setRange("R1",0,10);
   auto int1 = phf.createIntegral(argSet,argSet);
   std::cout << int1->getVal() << std::endl;
   auto int2 = phf.createIntegral(argSet,argSet, "R1");
   std::cout << int2->getVal() << std::endl;

   auto int3 = phf.createIntegral(x,x);
   std::cout << int3->getVal() << std::endl;
   auto int4 = phf.createIntegral(x,x,"R1");
   std::cout << int4->getVal() << std::endl;

  return 0;
}

Obviously the integral of a constant function if you only take half of the x range should be half of the value you get for the full range, even if there is an additional variable in the slice set. However, this is what the program above outputs in ROOT master:

1
0.493
1
0.164333

The first 3 integrals are correct, but the final one (custom range for x and slice of y) gives the wrong result (0.493 expected).

I thought about how the integration should be done correctly for all of the RooHistPdf, RooHistFunc and RooParamHistFunc family. Some updates to the code to fix all the issues can be found in PR #7271.

guitargeek added a commit to guitargeek/root that referenced this issue Feb 22, 2021
After a0fa4fa, this fixes the remaining analytical integral problems
with the RooHistPdf/RooHistFunc/RooParamHistFunc family that are
reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation:
usage of RooHistPdf is far more common than RooHistFunc and its
analytical integration code supports more special cases. So one can
simply take the analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

As the documentation of RooDataHist::sum() states, the bin volume is
"the M-dimensional bin volume, (M = N(sumSet))". So for the bin volume,
one has take the product of the bin widths of each dimension in the sum
arg set, and apparently not the slice dimenstions.

This seems to be wrongly implemented in two overloads of
`RooDataHist::sum()`. The second overload (separate sumSet and sliceSet
but no custom ranges) calculates the bin volume of the sliceSet as
opposed to the sumSet. The third overload with the custom ranges
calculates the full volume considering all arguments, not only
the sumSet.

After implementing this correctly, `RooHistFunc::analyticalIntegral()`
gives the correct results, but `RooHistPdf` is broken in the case
where argSet is only a strict subset and there are no custom ranges
(second overload of `RooDataHist::sum()`). This can be fixed by
disabling a normalization of 1 over the bin volume that was only
enabled in this case for reasons that are not clear to me.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin normalization capabilities of `RooDataHist::sum()`
are also used in `RooParamHistFunc` now.

Further improvements could be the reduction of code duplication and
implementation of unit tests for the various analytical integral
configurations.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 22, 2021
After a0fa4fa, this fixes the remaining analytical integral problems
with the RooHistPdf/RooHistFunc/RooParamHistFunc family that are
reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation:
usage of RooHistPdf is far more common than RooHistFunc and its
analytical integration code supports more special cases. So one can
simply take the analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

As the documentation of RooDataHist::sum() states, the bin volume is
"the M-dimensional bin volume, (M = N(sumSet))". So for the bin volume,
one has take the product of the bin widths of each dimension in the sum
arg set, and apparently not the slice dimenstions.

This seems to be wrongly implemented in two overloads of
`RooDataHist::sum()`. The second overload (separate sumSet and sliceSet
but no custom ranges) calculates the bin volume of the sliceSet as
opposed to the sumSet. The third overload with the custom ranges
calculates the full volume considering all arguments, not only
the sumSet.

After implementing this correctly, `RooHistFunc::analyticalIntegral()`
gives the correct results, but `RooHistPdf` is broken in the case
where argSet is only a strict subset and there are no custom ranges
(second overload of `RooDataHist::sum()`). This can be fixed by
disabling a normalization of 1 over the bin volume that was only
enabled in this case for reasons that are not clear to me.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin normalization capabilities of `RooDataHist::sum()`
are also used in `RooParamHistFunc` now.

Further improvements could be the reduction of code duplication and
implementation of unit tests for the various analytical integral
configurations.
@guitargeek guitargeek self-assigned this Feb 22, 2021
guitargeek added a commit to guitargeek/root that referenced this issue Feb 23, 2021
After a0fa4fa, this fixes the remaining analytical integral problems
with the RooHistPdf/RooHistFunc/RooParamHistFunc family that are
reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation:
usage of RooHistPdf is far more common than RooHistFunc and its
analytical integration code supports more special cases. So one can
simply take the analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

As the documentation of RooDataHist::sum() states, the bin volume is
"the M-dimensional bin volume, (M = N(sumSet))". So for the bin volume,
one has take the product of the bin widths of each dimension in the sum
arg set, and apparently not the slice dimenstions.

This seems to be wrongly implemented in two overloads of
`RooDataHist::sum()`. The second overload (separate sumSet and sliceSet
but no custom ranges) calculates the bin volume of the sliceSet as
opposed to the sumSet. The third overload with the custom ranges
calculates the full volume considering all arguments, not only
the sumSet.

After implementing this correctly, `RooHistFunc::analyticalIntegral()`
gives the correct results, but `RooHistPdf` is broken in the case
where argSet is only a strict subset and there are no custom ranges
(second overload of `RooDataHist::sum()`). This can be fixed by
disabling a normalization of 1 over the bin volume that was only
enabled in this case for reasons that are not clear to me.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin normalization capabilities of `RooDataHist::sum()`
are also used in `RooParamHistFunc` now.

Further improvements could be the reduction of code duplication and
implementation of unit tests for the various analytical integral
configurations.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 25, 2021
After a0fa4fa, this fixes the remaining analytical integral problems
with the RooHistPdf/RooHistFunc/RooParamHistFunc family that are
reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation:
usage of RooHistPdf is far more common than RooHistFunc and its
analytical integration code supports more special cases. So one can
simply take the analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

As the documentation of RooDataHist::sum() states, the bin volume is
"the M-dimensional bin volume, (M = N(sumSet))". So for the bin volume,
one has take the product of the bin widths of each dimension in the sum
arg set, and apparently not the slice dimenstions.

This seems to be wrongly implemented in two overloads of
`RooDataHist::sum()`. The second overload (separate sumSet and sliceSet
but no custom ranges) calculates the bin volume of the sliceSet as
opposed to the sumSet. The third overload with the custom ranges
calculates the full volume considering all arguments, not only
the sumSet.

After implementing this correctly, `RooHistFunc::analyticalIntegral()`
gives the correct results, but `RooHistPdf` is broken in the case
where argSet is only a strict subset and there are no custom ranges
(second overload of `RooDataHist::sum()`). This can be fixed by
disabling a normalization of 1 over the bin volume that was only
enabled in this case for reasons that are not clear to me.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin normalization capabilities of `RooDataHist::sum()`
are also used in `RooParamHistFunc` now.

Further improvements could be the reduction of code duplication and
implementation of unit tests for the various analytical integral
configurations.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 25, 2021
After a0fa4fa, this fixes the remaining analytical integral problems
with the RooHistPdf/RooHistFunc/RooParamHistFunc family that are
reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation:
usage of RooHistPdf is far more common than RooHistFunc and its
analytical integration code supports more special cases. So one can
simply take the analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

As the documentation of RooDataHist::sum() states, the bin volume is
"the M-dimensional bin volume, (M = N(sumSet))". So for the bin volume,
one has take the product of the bin widths of each dimension in the sum
arg set, and apparently not the slice dimenstions.

This seems to be wrongly implemented in two overloads of
`RooDataHist::sum()`. The second overload (separate sumSet and sliceSet
but no custom ranges) calculates the bin volume of the sliceSet as
opposed to the sumSet. The third overload with the custom ranges
calculates the full volume considering all arguments, not only
the sumSet.

After implementing this correctly, `RooHistFunc::analyticalIntegral()`
gives the correct results, but `RooHistPdf` is broken in the case
where argSet is only a strict subset and there are no custom ranges
(second overload of `RooDataHist::sum()`). This can be fixed by
disabling a normalization of 1 over the bin volume that was only
enabled in this case for reasons that are not clear to me.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin normalization capabilities of `RooDataHist::sum()`
are also used in `RooParamHistFunc` now.

Further improvements could be the reduction of code duplication and
implementation of unit tests for the various analytical integral
configurations.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 26, 2021
After a0fa4fa, this fixes the remaining analytical integral problems
with the RooHistPdf/RooHistFunc/RooParamHistFunc family that are
reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation:
usage of RooHistPdf is far more common than RooHistFunc and its
analytical integration code supports more special cases. So one can
simply take the analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

As the documentation of RooDataHist::sum() states, the bin volume is
"the M-dimensional bin volume, (M = N(sumSet))". So for the bin volume,
one has take the product of the bin widths of each dimension in the sum
arg set, and apparently not the slice dimenstions.

This seems to be wrongly implemented in two overloads of
`RooDataHist::sum()`. The second overload (separate sumSet and sliceSet
but no custom ranges) calculates the bin volume of the sliceSet as
opposed to the sumSet. The third overload with the custom ranges
calculates the full volume considering all arguments, not only
the sumSet.

After implementing this correctly, `RooHistFunc::analyticalIntegral()`
gives the correct results, but `RooHistPdf` is broken in the case
where argSet is only a strict subset and there are no custom ranges
(second overload of `RooDataHist::sum()`). This can be fixed by
disabling a normalization of 1 over the bin volume that was only
enabled in this case for reasons that are not clear to me.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin normalization capabilities of `RooDataHist::sum()`
are also used in `RooParamHistFunc` now.

Further improvements could be the reduction of code duplication and
implementation of unit tests for the various analytical integral
configurations.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 26, 2021
After a0fa4fa, this fixes the remaining analytical integral problems
with the RooHistPdf/RooHistFunc/RooParamHistFunc family that are
reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation:
usage of RooHistPdf is far more common than RooHistFunc and its
analytical integration code supports more special cases. So one can
simply take the analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

As the documentation of RooDataHist::sum() states, the bin volume is
"the M-dimensional bin volume, (M = N(sumSet))". So for the bin volume,
one has take the product of the bin widths of each dimension in the sum
arg set, and apparently not the slice dimenstions.

This seems to be wrongly implemented in two overloads of
`RooDataHist::sum()`. The second overload (separate sumSet and sliceSet
but no custom ranges) calculates the bin volume of the sliceSet as
opposed to the sumSet. The third overload with the custom ranges
calculates the full volume considering all arguments, not only
the sumSet.

After implementing this correctly, `RooHistFunc::analyticalIntegral()`
gives the correct results, but `RooHistPdf` is broken in the case
where argSet is only a strict subset and there are no custom ranges
(second overload of `RooDataHist::sum()`). This can be fixed by
disabling a normalization of 1 over the bin volume that was only
enabled in this case for reasons that are not clear to me.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin normalization capabilities of `RooDataHist::sum()`
are also used in `RooParamHistFunc` now.

Further improvements could be the reduction of code duplication and
implementation of unit tests for the various analytical integral
configurations.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 28, 2021
After a0fa4fa, this fixes the remaining analytical integral problems
with the RooHistPdf/RooHistFunc/RooParamHistFunc family that are
reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation:
usage of RooHistPdf is far more common than RooHistFunc and its
analytical integration code supports more special cases. So one can
simply take the analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

As the documentation of RooDataHist::sum() states, the bin volume is
"the M-dimensional bin volume, (M = N(sumSet))". So for the bin volume,
one has take the product of the bin widths of each dimension in the sum
arg set, and apparently not the slice dimenstions.

This seems to be wrongly implemented in two overloads of
`RooDataHist::sum()`. The second overload (separate sumSet and sliceSet
but no custom ranges) calculates the bin volume of the sliceSet as
opposed to the sumSet. The third overload with the custom ranges
calculates the full volume considering all arguments, not only
the sumSet.

After implementing this correctly, `RooHistFunc::analyticalIntegral()`
gives the correct results, but `RooHistPdf` is broken in the case
where argSet is only a strict subset and there are no custom ranges
(second overload of `RooDataHist::sum()`). This can be fixed by
disabling a normalization of 1 over the bin volume that was only
enabled in this case for reasons that are not clear to me.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin normalization capabilities of `RooDataHist::sum()`
are also used in `RooParamHistFunc` now.

Further improvements could be the reduction of code duplication and
implementation of unit tests for the various analytical integral
configurations.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 28, 2021
After a0fa4fa, this fixes the remaining analytical integral problems
with the RooHistPdf/RooHistFunc/RooParamHistFunc family that are
reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation:
usage of RooHistPdf is far more common than RooHistFunc and its
analytical integration code supports more special cases. So one can
simply take the analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

As the documentation of RooDataHist::sum() states, the bin volume is
"the M-dimensional bin volume, (M = N(sumSet))". So for the bin volume,
one has take the product of the bin widths of each dimension in the sum
arg set, and apparently not the slice dimenstions.

This seems to be wrongly implemented in two overloads of
`RooDataHist::sum()`. The second overload (separate sumSet and sliceSet
but no custom ranges) calculates the bin volume of the sliceSet as
opposed to the sumSet. The third overload with the custom ranges
calculates the full volume considering all arguments, not only
the sumSet.

After implementing this correctly, `RooHistFunc::analyticalIntegral()`
gives the correct results, but `RooHistPdf` is broken in the case
where argSet is only a strict subset and there are no custom ranges
(second overload of `RooDataHist::sum()`). This can be fixed by
disabling a normalization of 1 over the bin volume that was only
enabled in this case for reasons that are not clear to me.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin normalization capabilities of `RooDataHist::sum()`
are also used in `RooParamHistFunc` now.

Further improvements could be the reduction of code duplication and
implementation of unit tests for the various analytical integral
configurations.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 28, 2021
After a0fa4fa, this fixes the remaining analytical integral problems
with the RooHistPdf/RooHistFunc/RooParamHistFunc family that are
reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation:
usage of RooHistPdf is far more common than RooHistFunc and its
analytical integration code supports more special cases. So one can
simply take the analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

As the documentation of RooDataHist::sum() states, the bin volume is
"the M-dimensional bin volume, (M = N(sumSet))". So for the bin volume,
one has take the product of the bin widths of each dimension in the sum
arg set, and apparently not the slice dimenstions.

This seems to be wrongly implemented in two overloads of
`RooDataHist::sum()`. The second overload (separate sumSet and sliceSet
but no custom ranges) calculates the bin volume of the sliceSet as
opposed to the sumSet. The third overload with the custom ranges
calculates the full volume considering all arguments, not only
the sumSet.

After implementing this correctly, `RooHistFunc::analyticalIntegral()`
gives the correct results, but `RooHistPdf` is broken in the case
where argSet is only a strict subset and there are no custom ranges
(second overload of `RooDataHist::sum()`). This can be fixed by
disabling a normalization of 1 over the bin volume that was only
enabled in this case for reasons that are not clear to me.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin normalization capabilities of `RooDataHist::sum()`
are also used in `RooParamHistFunc` now.

Further improvements could be the reduction of code duplication and
implementation of unit tests for the various analytical integral
configurations.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 28, 2021
After a0fa4fa, this fixes the remaining analytical integral problems
with the RooHistPdf/RooHistFunc/RooParamHistFunc family that are
reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation:
usage of RooHistPdf is far more common than RooHistFunc and its
analytical integration code supports more special cases. So one can
simply take the analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

As the documentation of RooDataHist::sum() states, the bin volume is
"the M-dimensional bin volume, (M = N(sumSet))". So for the bin volume,
one has take the product of the bin widths of each dimension in the sum
arg set, and apparently not the slice dimenstions.

This seems to be wrongly implemented in two overloads of
`RooDataHist::sum()`. The second overload (separate sumSet and sliceSet
but no custom ranges) calculates the bin volume of the sliceSet as
opposed to the sumSet. The third overload with the custom ranges
calculates the full volume considering all arguments, not only
the sumSet.

After implementing this correctly, `RooHistFunc::analyticalIntegral()`
gives the correct results, but `RooHistPdf` is broken in the case
where argSet is only a strict subset and there are no custom ranges
(second overload of `RooDataHist::sum()`). This can be fixed by
disabling a normalization of 1 over the bin volume that was only
enabled in this case for reasons that are not clear to me.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin normalization capabilities of `RooDataHist::sum()`
are also used in `RooParamHistFunc` now.

Further improvements could be the reduction of code duplication and
implementation of unit tests for the various analytical integral
configurations.
guitargeek added a commit to guitargeek/root that referenced this issue Feb 28, 2021
After a0fa4fa, this fixes the remaining analytical integral problems
with the RooHistPdf/RooHistFunc/RooParamHistFunc family that are
reported in issue root-project#7182 and Jira ticket ROOT-7413.

To use the same code for all three classes, the logic of
RooDataHist::sum() that is already used for RooHistPdf needs to be
changed/corrected in the following ways:
guitargeek added a commit to guitargeek/root that referenced this issue Feb 28, 2021
After a0fa4fa, this commit fixes the remaining analytical integral problems
with the RooHistFunc reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation: usage of
RooHistPdf is far more common than RooHistFunc and its analytical
integration code supports more special cases. So one can simply take the
hopefully bug free analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

Code duplication is avoided by moving the integration code into static
functions in `RooHistPdf` that `RooHistFunc` can also use.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin volume normalization capabilities of
`RooDataHist::sum()` are also used in `RooParamHistFunc` now.
guitargeek added a commit to guitargeek/root that referenced this issue Mar 1, 2021
After a0fa4fa, this commit fixes the remaining analytical integral problems
with the RooHistFunc reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation: usage of
RooHistPdf is far more common than RooHistFunc and its analytical
integration code supports more special cases. So one can simply take the
hopefully bug free analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

Code duplication is avoided by moving the integration code into static
functions in `RooHistPdf` that `RooHistFunc` can also use.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin volume normalization capabilities of
`RooDataHist::sum()` are also used in `RooParamHistFunc` now.
guitargeek added a commit to guitargeek/root that referenced this issue Mar 1, 2021
A new testRooParamHistFunc was introduced. The analytic integration of a
RooParamHistFunc is tested both for trivial and non-trivial parameters,
since the integration over subranges was problematic (as reported in
issue root-project#7182).
guitargeek added a commit to guitargeek/root that referenced this issue Mar 9, 2021
After a0fa4fa, this commit fixes the remaining analytical integral problems
with the RooHistFunc reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation: usage of
RooHistPdf is far more common than RooHistFunc and its analytical
integration code supports more special cases. So one can simply take the
hopefully bug free analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

Code duplication is avoided by moving the integration code into static
functions in `RooHistPdf` that `RooHistFunc` can also use.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin volume normalization capabilities of
`RooDataHist::sum()` are also used in `RooParamHistFunc` now.
guitargeek added a commit that referenced this issue Mar 10, 2021
After a0fa4fa, this commit fixes the remaining analytical integral problems
with the RooHistFunc reported in issue #7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation: usage of
RooHistPdf is far more common than RooHistFunc and its analytical
integration code supports more special cases. So one can simply take the
hopefully bug free analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

Code duplication is avoided by moving the integration code into static
functions in `RooHistPdf` that `RooHistFunc` can also use.

With all these changes, the problems reported in GitHub issue #7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin volume normalization capabilities of
`RooDataHist::sum()` are also used in `RooParamHistFunc` now.
guitargeek added a commit to guitargeek/root that referenced this issue Mar 10, 2021
After a0fa4fa, this commit fixes the remaining analytical integral problems
with the RooHistFunc reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation: usage of
RooHistPdf is far more common than RooHistFunc and its analytical
integration code supports more special cases. So one can simply take the
hopefully bug free analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

Code duplication is avoided by moving the integration code into static
functions in `RooHistPdf` that `RooHistFunc` can also use.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin volume normalization capabilities of
`RooDataHist::sum()` are also used in `RooParamHistFunc` now.
guitargeek added a commit that referenced this issue Mar 10, 2021
After a0fa4fa, this commit fixes the remaining analytical integral problems
with the RooHistFunc reported in issue #7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation: usage of
RooHistPdf is far more common than RooHistFunc and its analytical
integration code supports more special cases. So one can simply take the
hopefully bug free analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

Code duplication is avoided by moving the integration code into static
functions in `RooHistPdf` that `RooHistFunc` can also use.

With all these changes, the problems reported in GitHub issue #7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin volume normalization capabilities of
`RooDataHist::sum()` are also used in `RooParamHistFunc` now.
@guitargeek
Copy link
Contributor

guitargeek commented Mar 11, 2021

The RooParamHistFunc story continues.

It was noticed that the range is still not considered in the integral when you clone the integral just like in this example:

void testRooParamHistFunc2() {
   using namespace RooFit;

   RooRealVar x("x","x",0,10);
   x.setRange("R1",0,5);
   TF1 f1("f1","1");

   TH1D h1("h1","h1",10,0,10);
   h1.FillRandom("f1",50);
   TH1D h2("h2","h2",10,0,10);
   h2.FillRandom("f1",50);

   RooDataHist dh1("dh1","dh1",x,&h1);
   RooDataHist dh2("dh2","dh2",x,&h2);

   RooParamHistFunc ph("ph","",x,dh1);

   RooUniform uni("uni", "uni", RooArgList(x));
   RooRealVar frac("frac", "frac", 0.5, 0.0, 1.0);
   RooRealSumPdf model{"model", "model", ph, uni, frac};

   auto fitResult = ph.fitTo(dh2, PrintLevel(0), Save());

   auto integral = model.createIntegral(x,x,"R1");
   auto integralClone = static_cast<RooAbsReal*>(integral->cloneTree());

   cout << integral->getVal(x) << endl;
   cout << integralClone->getVal(x) << endl;
}

So this issue can't be closed yet.

manolismih pushed a commit to manolismih/root that referenced this issue Mar 12, 2021
After a0fa4fa, this commit fixes the remaining analytical integral problems
with the RooHistFunc reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation: usage of
RooHistPdf is far more common than RooHistFunc and its analytical
integration code supports more special cases. So one can simply take the
hopefully bug free analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

Code duplication is avoided by moving the integration code into static
functions in `RooHistPdf` that `RooHistFunc` can also use.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin volume normalization capabilities of
`RooDataHist::sum()` are also used in `RooParamHistFunc` now.
guitargeek added a commit to guitargeek/root that referenced this issue Mar 16, 2021
A new testRooParamHistFunc was introduced. The analytic integration of a
RooParamHistFunc is tested both for trivial and non-trivial parameters,
since the integration over subranges was problematic (as reported in
issue root-project#7182).
@guitargeek guitargeek added this to Issues in Fixed in 6.24/00 via automation Mar 16, 2021
@guitargeek
Copy link
Contributor

Fixed in master by #7198, #7271, and #7478.

Fixed in 6.24 by the #7198, #7448, and #7479. The fist PR is the same as for master because that was before the 6.24 branch was created, and the other two PRs are backports.

guitargeek added a commit that referenced this issue Mar 23, 2021
A new testRooParamHistFunc was introduced. The analytic integration of a
RooParamHistFunc is tested both for trivial and non-trivial parameters,
since the integration over subranges was problematic (as reported in
issue #7182).
nicknagi pushed a commit to nicknagi/root that referenced this issue Mar 30, 2021
The logic for summing over histogram bins in different ranges used in
RooHistPdf is also implemented in RooParamHistFunc. This means the
range is now considered when computing integrals of RooParamHistFunc.

RooParamHistFunc allows you to scale the counts in each bin with a
parameter. The interface of RooDataHist::sum was extended with a
function parameter to inject the logic of scaling the bin weight
depending on the bin index.

This commit partly fixes issue root-project#7182. We still need to implement the
range feature in RooHistFunc.
nicknagi pushed a commit to nicknagi/root that referenced this issue Mar 30, 2021
After a0fa4fa, this commit fixes the remaining analytical integral problems
with the RooHistFunc reported in issue root-project#7182 and Jira ticket ROOT-7413.

The starting point of this commit is the following observation: usage of
RooHistPdf is far more common than RooHistFunc and its analytical
integration code supports more special cases. So one can simply take the
hopefully bug free analytical integration code from RooHistPdf and
substitute it for the existing code in RooHistFunc.

However, there is one change that needs to be made when copy-pasting:
the `correctForBinSize` in `RooDataHist::sum()` needs to be enabled
because integrating the pdf and the function described by the histogram
shape is slightly different:

 - RooHistPdf: the integral of an empirical PDF is simply the integral
   of histogram counts divided by a normalization factor
 - RooHistFunc/RooParamHistFunc: here, the bin counts need to be
   multiplied by the bin volume

Code duplication is avoided by moving the integration code into static
functions in `RooHistPdf` that `RooHistFunc` can also use.

With all these changes, the problems reported in GitHub issue root-project#7182
and Jira ticket ROOT-7413 are fixed.

Furthermore, the bin volume normalization capabilities of
`RooDataHist::sum()` are also used in `RooParamHistFunc` now.
nicknagi pushed a commit to nicknagi/root that referenced this issue Mar 30, 2021
A new testRooParamHistFunc was introduced. The analytic integration of a
RooParamHistFunc is tested both for trivial and non-trivial parameters,
since the integration over subranges was problematic (as reported in
issue root-project#7182).
@guitargeek guitargeek changed the title Range not considered when computing integral of RooParamHistFunc [RF] Range not considered when computing integral of RooParamHistFunc Jun 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

No branches or pull requests

3 participants