-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RF] Determine asymptotically correct uncertainties for extended unbinned maximum likelihood fits #14751
[RF] Determine asymptotically correct uncertainties for extended unbinned maximum likelihood fits #14751
Conversation
Can one of the admins verify this patch? |
@phsft-bot build |
Starting build on |
Build failed on ROOT-ubuntu2004/python3. Failing tests: |
roofit/roofitcore/src/FitHelpers.cxx
Outdated
@@ -66,7 +67,19 @@ constexpr int extendedFitDefault = 2; | |||
/// \param[in] data The dataset that was used for the fit. | |||
int calcAsymptoticCorrectedCovariance(RooAbsReal &pdf, RooMinimizer &minimizer, RooAbsData const &data) | |||
{ | |||
// Calculated corrected errors for weighted likelihood fits | |||
RooFormulaVar logpdf("logpdf", "-log(pdf)", "-log(@0)", pdf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just use RooAbsPdf::getLogVal()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe RooDerivative would not take that as an argument, so I do not think this is possible directly for my use case. I am happy if someone with more internal knowledge on roofit has a concrete suggestion how to get a RooAbsPdf which is the log of an existing pdf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I have changed the sign here -log() -> +log(). While the overall sign does not matter, the relative sign to the extended term is important. Since I derive +log(Nexpected), the sign here needs to be positive as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for helping out, @langenbruch!
I have a few change requests. Also, is it possible to implement a unit test here?
https://github.com/root-project/root/blob/master/roofit/roofitcore/test/testTestStatistics.cxx
Test Results 10 files 10 suites 2d 4h 41m 26s ⏱️ Results for commit 7eda224. ♻️ This comment has been updated with latest results. |
Thanks for the review @guitargeek , I implemented the requests. Please see attached the adapted tutorial rf611_weightedfits.C to demonstrate the extended functionality, which could also replace the existing one. I would prefer to not write any additional unit tests. |
bd24dcc
to
bac0431
Compare
bac0431
to
5f21d08
Compare
@phsft-bot build |
Starting build on |
Thank you very much @langenbruch for the tutorial and the PR updates! I will take it from here, integrating the tutorial and maybe writing some tests, and then I'll merge the PR. |
if (dynamic_cast<RooDataHist const *>(&data)) { | ||
oocoutW(&pdf, InputArguments) | ||
<< "RooAbsPdf::fitTo(" << pdf.GetName() | ||
<< ") WARNING: Asymptotic error correction is requested for a binned data set. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @langenbruch
Thank you for the PR. I don't agree with this statement. A chi2 fit can give a more correct parameter estimate, but it can also give more biased results, especially in bins with low statistics. We could instead warn that other methods for estimating uncertainty (bootstrapping using toys) are preferable.
Also, can you please elaborate more why you think the asymptotic procedure does not work for a binned data set. A binned data set is statistically equivalent to a weighted unbinned data set. It is not more a question on the number of bins to be large enough in order to be in the asymptotic regime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear @lmoneta , first of all let me say that I am fine with removing the second part of the sentence about the chi2 fit, as my main point is that I do not want people to use the code for a binned fit. I do not know in detail how roofit handles binned fits internally, and do not want to assume responsibility for people using the code for things it was not intended to do.
On your comment about chi2 fits being potentially biased at low statistics I have to say that I do not disagree in general, but it is slightly besides the point. The asymptotically correct method relies on asymptotic statistics, ie. it is valid in the high statistics limit. In this limit I believe you will indeed get the usual chi2 expression when evaluating the expectation values. The asymptotically correct method is not a method of coverage correction at low statistics, for this a proper Neyman construction would be required (both in the weighted and the unweighted case). The point is that the previous Sumw2 method does not provide asymptotically correct intervals for unbinned fits, even in the high statistics limit (as you can easily see when you run the example I implemented).
for (std::size_t k = 0; k < floated.size(); k++) { | ||
for (std::size_t l = 0; l < floated.size(); l++) { | ||
num(k, l) += data.weightSquared() * diffs[k] * diffs[l] / (prob * prob); | ||
num(k, l) += data.weightSquared() * (diffs[k] + diffs_expected[k]) * (diffs[l] + diffs_expected[l]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @langenbruch! To you somewhere have a paper to quote for this formula, or some notes? I can't find it in https://arxiv.org/abs/1911.01303, which is currently the reference cited in the RooFit documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear @guitargeek, the approach is exactly the same as in my paper, performing a Taylor expansion of the logarithmic likelihood (see Sec. 2.1), and then calculating the appropriate expectation values. Since the calculation for the extended term may not be completely straightforward, I wrote down the explicit steps in the attached PDF.
extended_weighted_fits.pdf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the effort in writing up the document with the calculations! We should make the document more visible from the ROOT documentation, in case someone is interested in understanding the details. Would that be okay for you?
I don't think GitHub is not the best place to upload this. Can we put it on the ROOT website in some folder to link from the RooAbsPdf doxygen? Or you would prefer something else?
Before, the epsilon was scaled with the parameter range. This makes the RooDerivative also work in case of unbounded parameters. The alternative would have been to consider the range only if the parameter is bounded, but this makes the behavior more unpredictable.
Added correct treatment of extended term in asymptotically correct method for uncertainty determination in the presence of weights. This improvement will allow for extended unbinned maximum likelihood fits to use the asymptotically correct method. - Added extended term to asymptotically correct method to allow for weighted extended maximum likelihood fits - Usage of logarithmic PDFs for derivatives may have some numerical benefits - Fixed RooDerivative to not use the parameter range in the step size calculation in case the parameter supplied has no range. This now allows to use the method (and RooDerivative in general) for parameters that do not specify a range. - Added warning in case of binned fits. In this case a chi2 fit using the std. treatment of weights should be preferrable.
Also restructure the code to make the tutorial more flexible.
5f21d08
to
7eda224
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much again for your contribution Christoph! I'm merging it in time for the next ROOT release.
This Pull request:
Added correct treatment of extended term in asymptotically correct method for uncertainty determination in the presence of weights. This improvement will allow for extended unbinned maximum likelihood fits to use the asymptotically correct method.
Changes or fixes:
-Added extended term to asymptotically correct method to allow for weighted extended maximum likelihood fits
-Usage of logarithmic PDFs for derivatives may have some numerical benefits
-Fixed RooDerivative to not use the parameter range in the step size calculation in case the parameter supplied has no range. This now allows to use the method (and RooDerivative in general) for parameters that do not specify a range.
-Added warning in case of binned fits. In this case a chi2 fit using the std. treatment of weights should be preferrable.
Checklist:
This PR fixes #
resolves https://its.cern.ch/jira/browse/ROOT-10827
resolves https://its.cern.ch/jira/browse/ROOT-10866
related #11660
note #11112 can not be fixed since the resulting minuit covariance matrix is not positive definite (set PrintLevel(0) to see this). Calculations with the inverse Hessians are thus not reliable. High stats fit show the expected behaviour.