Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RF] Determine asymptotically correct uncertainties for extended unbinned maximum likelihood fits #14751

Merged
merged 5 commits into from
May 22, 2024

Conversation

langenbruch
Copy link
Contributor

@langenbruch langenbruch commented Feb 16, 2024

This Pull request:

Added correct treatment of extended term in asymptotically correct method for uncertainty determination in the presence of weights. This improvement will allow for extended unbinned maximum likelihood fits to use the asymptotically correct method.

Changes or fixes:

-Added extended term to asymptotically correct method to allow for weighted extended maximum likelihood fits
-Usage of logarithmic PDFs for derivatives may have some numerical benefits
-Fixed RooDerivative to not use the parameter range in the step size calculation in case the parameter supplied has no range. This now allows to use the method (and RooDerivative in general) for parameters that do not specify a range.
-Added warning in case of binned fits. In this case a chi2 fit using the std. treatment of weights should be preferrable.

Checklist:

  • tested changes locally

This PR fixes #
resolves https://its.cern.ch/jira/browse/ROOT-10827
resolves https://its.cern.ch/jira/browse/ROOT-10866
related #11660
note #11112 can not be fixed since the resulting minuit covariance matrix is not positive definite (set PrintLevel(0) to see this). Calculations with the inverse Hessians are thus not reliable. High stats fit show the expected behaviour.

@phsft-bot
Copy link
Collaborator

Can one of the admins verify this patch?

@lmoneta
Copy link
Member

lmoneta commented Feb 16, 2024

@phsft-bot build

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default
How to customize builds

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-ubuntu2004/python3.
Running on root-ubuntu-2004-1.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@@ -66,7 +67,19 @@ constexpr int extendedFitDefault = 2;
/// \param[in] data The dataset that was used for the fit.
int calcAsymptoticCorrectedCovariance(RooAbsReal &pdf, RooMinimizer &minimizer, RooAbsData const &data)
{
// Calculated corrected errors for weighted likelihood fits
RooFormulaVar logpdf("logpdf", "-log(pdf)", "-log(@0)", pdf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just use RooAbsPdf::getLogVal()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe RooDerivative would not take that as an argument, so I do not think this is possible directly for my use case. I am happy if someone with more internal knowledge on roofit has a concrete suggestion how to get a RooAbsPdf which is the log of an existing pdf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I have changed the sign here -log() -> +log(). While the overall sign does not matter, the relative sign to the extended term is important. Since I derive +log(Nexpected), the sign here needs to be positive as well.

Copy link
Contributor

@guitargeek guitargeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for helping out, @langenbruch!

I have a few change requests. Also, is it possible to implement a unit test here?
https://github.com/root-project/root/blob/master/roofit/roofitcore/test/testTestStatistics.cxx

Copy link

github-actions bot commented Feb 17, 2024

Test Results

    10 files      10 suites   2d 4h 41m 26s ⏱️
 2 637 tests  2 637 ✅ 0 💤 0 ❌
24 906 runs  24 906 ✅ 0 💤 0 ❌

Results for commit 7eda224.

♻️ This comment has been updated with latest results.

@langenbruch
Copy link
Contributor Author

langenbruch commented Feb 18, 2024

Thank you so much for helping out, @langenbruch!

I have a few change requests. Also, is it possible to implement a unit test here? https://github.com/root-project/root/blob/master/roofit/roofitcore/test/testTestStatistics.cxx

Thanks for the review @guitargeek , I implemented the requests. Please see attached the adapted tutorial rf611_weightedfits.C to demonstrate the extended functionality, which could also replace the existing one. I would prefer to not write any additional unit tests.
rf611_weightedfits.tar.gz

@guitargeek
Copy link
Contributor

@phsft-bot build

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default
How to customize builds

@guitargeek
Copy link
Contributor

Thank you very much @langenbruch for the tutorial and the PR updates! I will take it from here, integrating the tutorial and maybe writing some tests, and then I'll merge the PR.

if (dynamic_cast<RooDataHist const *>(&data)) {
oocoutW(&pdf, InputArguments)
<< "RooAbsPdf::fitTo(" << pdf.GetName()
<< ") WARNING: Asymptotic error correction is requested for a binned data set. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @langenbruch
Thank you for the PR. I don't agree with this statement. A chi2 fit can give a more correct parameter estimate, but it can also give more biased results, especially in bins with low statistics. We could instead warn that other methods for estimating uncertainty (bootstrapping using toys) are preferable.
Also, can you please elaborate more why you think the asymptotic procedure does not work for a binned data set. A binned data set is statistically equivalent to a weighted unbinned data set. It is not more a question on the number of bins to be large enough in order to be in the asymptotic regime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @lmoneta , first of all let me say that I am fine with removing the second part of the sentence about the chi2 fit, as my main point is that I do not want people to use the code for a binned fit. I do not know in detail how roofit handles binned fits internally, and do not want to assume responsibility for people using the code for things it was not intended to do.

On your comment about chi2 fits being potentially biased at low statistics I have to say that I do not disagree in general, but it is slightly besides the point. The asymptotically correct method relies on asymptotic statistics, ie. it is valid in the high statistics limit. In this limit I believe you will indeed get the usual chi2 expression when evaluating the expectation values. The asymptotically correct method is not a method of coverage correction at low statistics, for this a proper Neyman construction would be required (both in the weighted and the unweighted case). The point is that the previous Sumw2 method does not provide asymptotically correct intervals for unbinned fits, even in the high statistics limit (as you can easily see when you run the example I implemented).

for (std::size_t k = 0; k < floated.size(); k++) {
for (std::size_t l = 0; l < floated.size(); l++) {
num(k, l) += data.weightSquared() * diffs[k] * diffs[l] / (prob * prob);
num(k, l) += data.weightSquared() * (diffs[k] + diffs_expected[k]) * (diffs[l] + diffs_expected[l]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @langenbruch! To you somewhere have a paper to quote for this formula, or some notes? I can't find it in https://arxiv.org/abs/1911.01303, which is currently the reference cited in the RooFit documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @guitargeek, the approach is exactly the same as in my paper, performing a Taylor expansion of the logarithmic likelihood (see Sec. 2.1), and then calculating the appropriate expectation values. Since the calculation for the extended term may not be completely straightforward, I wrote down the explicit steps in the attached PDF.
extended_weighted_fits.pdf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the effort in writing up the document with the calculations! We should make the document more visible from the ROOT documentation, in case someone is interested in understanding the details. Would that be okay for you?

I don't think GitHub is not the best place to upload this. Can we put it on the ROOT website in some folder to link from the RooAbsPdf doxygen? Or you would prefer something else?

@guitargeek guitargeek added this to the 6.32/00 milestone May 13, 2024
guitargeek and others added 2 commits May 22, 2024 03:17
Before, the epsilon was scaled with the parameter range.

This makes the RooDerivative also work in case of unbounded parameters.

The alternative would have been to consider the range only if the
parameter is bounded, but this makes the behavior more unpredictable.
Added correct treatment of extended term in asymptotically correct
method for uncertainty determination in the presence of weights. This
improvement will allow for extended unbinned maximum likelihood fits to
use the asymptotically correct method.

  - Added extended term to asymptotically correct method to allow for
    weighted extended maximum likelihood fits

  - Usage of logarithmic PDFs for derivatives may have some numerical
    benefits

  - Fixed RooDerivative to not use the parameter range in the step size
    calculation in case the parameter supplied has no range. This now
    allows to use the method (and RooDerivative in general) for
    parameters that do not specify a range.

  - Added warning in case of binned fits. In this case a chi2 fit using
    the std. treatment of weights should be preferrable.
Copy link
Contributor

@guitargeek guitargeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much again for your contribution Christoph! I'm merging it in time for the next ROOT release.

@guitargeek guitargeek merged commit f228412 into root-project:master May 22, 2024
13 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants