
Toys seems to give strange results #892

Closed · darousso opened this issue Jun 4, 2020 · 41 comments

Labels
question (Further information is requested) · user request (Request coming from a pyhf user)

Comments

@darousso commented Jun 4, 2020

Question

I have noticed that the toy-based hypotest seems to give some strange values even though the asymptotics gives regular-looking values.

In general, the upper-sigma cls_exp values all seem to be saturated at 1 for all signal points, and this does not improve when increasing the number of toys.

An example signal point model is the following:

{
    "channels": [
        {
            "name": "channel1",
            "samples": [
                {
                    "data": [1.364790054231882],
                    "modifiers": [
                        {"data": None, "name": "lumi", "type": "lumi"},
                        {"data": None, "name": "mu_Sig", "type": "normfactor"},
                        {
                            "data": {"hi": 1.228925751097454, "lo": 0.7710742489025461},
                            "name": "ucs_StopRHadron_1100_100000",
                            "type": "normsys",
                        },
                    ],
                    "name": "StopRHadron_1100_100000",
                },
                {
                    "data": [0.43],
                    "modifiers": [
                        {
                            "data": [0.16],
                            "name": "staterror_channel1",
                            "type": "staterror",
                        }
                    ],
                    "name": "Bkg",
                },
            ],
        }
    ],
    "measurements": [
        {
            "config": {
                "parameters": [
                    {
                        "auxdata": [1.0],
                        "bounds": [[0.5, 1.5]],
                        "inits": [1.0],
                        "name": "lumi",
                        "sigmas": [0.017],
                    }
                ],
                "poi": "mu_Sig",
            },
            "name": "meas",
        }
    ],
    "observations": [{"data": [0.0], "name": "channel1"}],
    "version": "1.0.0",
}

The following results are obtained:

>>> pyhf.infer.hypotest(1, [ndata] + model.config.auxdata, model, return_expected_set=True, calctype='asymptotics')

cls_obs = 0.15568651504095096
cls_exp_vec = [0.029689999899319162, 0.0865343296156578, 0.2282261734784785, 0.49778548932756395, 0.8050249023058168]

>>> pyhf.infer.hypotest(1, [ndata] + model.config.auxdata, model, return_expected_set=True, calctype='toybased', ntoys=10000)

cls_obs = 0.25570635410240594
cls_exp_vec = [0.25637368514886794, 0.2772397094430993, 1.0, 1.0, 1.0]

>>> pyhf.infer.hypotest(1, [ndata] + model.config.auxdata, model, return_expected_set=True, calctype='toybased', ntoys=2000000)

cls_obs = 0.25505163552228566
cls_exp_vec = [0.25517989834993693, 0.2721046189699792, 1.0, 1.0, 1.0]

Do let me know if you need more information about environment or the actual files I am running (it is in an ipynb)

@matthewfeickert added the question (Further information is requested) and user request (Request coming from a pyhf user) labels Jun 4, 2020
@matthewfeickert (Member) commented Jun 4, 2020

@darousso Thanks for your question and for using pyhf. As you're asking about a feature that hasn't been officially released yet and isn't in the public API, it would be helpful to get as much information as possible from you. Can you please give us the following information?

  • Version of pyhf used (here we need to know the branch and commit number that you're on as you're not using an official release)
  • A runnable script that reproduces the above results. (If this is easier to give us through a Gist (public or private) that is fine too).

As we haven't made a release with pseudoexperiment generation support yet it would be good to understand this now so that if there are issues we can resolve them before we publicly release this. So your question is very important and we appreciate you bringing it up (also for being a bold beta tester 🚀).

@darousso (Author) commented Jun 5, 2020

@matthewfeickert

Thanks for your super quick response!

I downloaded it via pip3 install --user git+https://github.com/scikit-hep/pyhf.git@toycalc

If I do pip3 show pyhf I get:

Name: pyhf
Version: 0.4.2.dev55
Summary: (partial) pure python histfactory implementation
Home-page: https://github.com/scikit-hep/pyhf
Author: Lukas Heinrich, Matthew Feickert, Giordon Stark
Author-email: lukas.heinrich@cern.ch, matthew.feickert@cern.ch, gstark@cern.ch
License: Apache
Location: /var/clus/usera/drousso/.local/lib/python3.6/site-packages
Requires: scipy, click, tqdm, jsonschema, jsonpatch, pyyaml
Required-by:

The Python notebook has been attached as a txt file since GitHub is not letting me upload it (sorry).
Toys Debugging.txt

Thank you so much for helping look into this! And also thank you for developing pyhf (it is so much easier to use than HistFitter...)!

@matthewfeickert (Member) commented Jun 5, 2020

I downloaded it via pip3 install --user git+https://github.com/scikit-hep/pyhf.git@toycalc

Ah okay great. This actually has all the information we need, along with your notebook which you gave us. We can start debugging from here — thanks!

This might take some time to better understand the actual behavior, but once we get the pseudoexperiment generation support into master we can point you to a new dev release that you can install from TestPyPI. 👍

@matthewfeickert (Member):

Python Notebook has been attached as a txt file since github is not letting me upload it

@darousso sorry, can you actually send us the notebook another way, or put it somewhere that we can access through CERN? Unfortunately some of the cell formatting got strangely messed up when I renamed the .txt to .ipynb and opened it.

@darousso (Author) commented Jun 5, 2020

@matthewfeickert Sorry, I am not exactly sure how best to attach it, so I have just directly sent it to your CERN email. Let me know if that's alright!

@matthewfeickert (Member):

I have just directly sent it to your CERN email.

Perfect. Thanks!

@matthewfeickert (Member) commented Jun 5, 2020

Okay, I've just ripped the spec out and made everything a runnable Python script that I put in this Gist. This can serve as a debugging source, and I am able to replicate your results with the spec, running with JAX as you were:

# Asymptotics

cls_obs: [0.15568652]
cls_exp: [[0.02969   ]
 [0.08653433]
 [0.22822617]
 [0.49778549]
 [0.8050249 ]]

# Toys

cls_obs: [0.24197901]
cls_exp: [[0.25022422]
 [0.26773903]
 [1.        ]
 [1.        ]
 [1.        ]]

@annmwang commented Jun 8, 2020

Our analysis is also looking at using toys with pyhf, so I'm interested in this answer as well!

@matthewfeickert (Member):

This will definitely be addressed along the way to the release of v0.5.0. We can't promise full support for anything though until it is in an official release, but v0.5.0 should be the next release that comes out, and the aim of the dev team is to have this be done sooner rather than later.

@kratsg kratsg added this to To do in v0.5.0 via automation Jun 23, 2020
@darousso (Author):

Out of curiosity, in what timeframe is v0.5.0 planned to be released?

@matthewfeickert (Member):

Out of curiosity, in what timeframe is v0.5.0 planned to be released?

We don't have set timeframes, but it is a priority for us to get out. The 3 of us have been a bit swamped at the moment with preparing various software tutorials and conferences and there are some additional PRs that will need to go in as well before we can release v0.5.0 for technical reasons. This hasn't been abandoned though — this is just one of the last pieces in the chain.

@cranmer commented Jul 20, 2020

Note: very small numbers so asymptotics may not set in yet.

The systematic variations ucs_StopRHadron_1100_100000 are both less than the nominal, which may cause some strange behavior.

Is there a nice way to plot a histogram of the test statistic from the toys?

Is there a nice way to plot the values of the parameters being used to generate the toys (vs. mu)?

Does the aux data vary in the Toy generation?

@matthewfeickert matthewfeickert added this to To do in v0.6.0 via automation Jul 23, 2020
@matthewfeickert matthewfeickert removed this from To do in v0.5.0 Jul 23, 2020
@darousso (Author) commented Sep 8, 2020

Hello,

Sorry, am a bit confused with the roadmaps, has the fix been implemented in the current release?

-David

@matthewfeickert (Member) commented Sep 8, 2020

has the fix been implemented in the current release?

No, the related PR hasn't been merged.

am a bit confused with the roadmaps

This is scheduled for minor release v0.6.0.

@darousso (Author) commented Sep 8, 2020

Ah perfect, thanks!

@darousso (Author) commented Oct 5, 2020

(Sorry to bother again, I have just learned that our analysis intends on requesting EB in December, would the minor release be out by then?)

@matthewfeickert (Member):

(Sorry to bother again, I have just learned that our analysis intends on requesting EB in December, would the minor release be out by then?)

hi @darousso, thanks for checking in. Yes, that is very much our plan.

We also now have an announcement mailing list that you can join to be updated on upcoming releases and general announcements: https://groups.google.com/group/pyhf-announcements/subscribe. If you join you'll be able to know a few days in advance of when you can expect v0.6.0 on PyPI.

@lukasheinrich (Contributor) commented Oct 29, 2020

Hi @darousso, I think this is actually expected/understood. Can you try to plot a Brazil band for your setup? Specifically with sufficiently high mu (say up to mu=5). At some point the upper sigmas should come down.

@matthewfeickert (Member):

@darousso following up on what @lukasheinrich is pointing to, this is probably related to the discussion in PR #966.

@kratsg (Contributor) commented Oct 29, 2020

Below I show Brazil bands for scans of µ \in [0.1, 5, 50]:

asymptotics

[screenshot taken 2020-10-28, 11:35 PM]

toys (1k toys per µ)

[screenshot taken 2020-10-28, 11:41 PM]

@lukasheinrich (Contributor) commented Oct 29, 2020

Thanks @kratsg, these look reasonable. I should create a "learn notebook" to explain what the issue is. In short, since half of the test statistic distribution under the signal hypothesis is a delta function, CL_s+b can be either 1.0 or in the interval [0.5, 0.0], so CLs values can exhibit this type of discontinuity.

In any case, for an upper limit you're interested in the region where CLs ~ 0.05, so µ between 2 and 4 in the above plot.
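The discontinuity can be illustrated with a numpy-only sketch (the distributions below are made up for illustration, not from a real fit): when half of the signal-hypothesis test-statistic mass sits in a delta function at zero, CLs is exactly 1 at q = 0 and jumps to roughly 0.5 for any q > 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up toy distributions: under signal+background, half the qtilde values
# sit in a delta function at 0 (fits with muhat > mu), the rest spread above 0
ntoys = 100_000
sb_teststats = np.where(
    rng.uniform(size=ntoys) < 0.5, 0.0, rng.exponential(size=ntoys)
)
b_teststats = rng.exponential(scale=2.0, size=ntoys)


def cls(q_obs):
    # p-values as right-tail fractions of the empirical toy distributions
    p_sb = np.mean(sb_teststats >= q_obs)
    p_b = np.mean(b_teststats >= q_obs)
    return p_sb / p_b


# At q_obs = 0 every toy passes the cut, so CLs == 1 exactly; for any
# q_obs > 0 the delta function drops out and CLs jumps to roughly 0.5
print(cls(0.0), cls(1e-6))
```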

@lukasheinrich (Contributor) commented Oct 29, 2020

hi @darousso,

There'll be more explanation, but generally your setup seems OK. Using the additions in #1160, this is what your example looks like.

Existing comparison between toys and yours (slightly apples to oranges), with clip=False.

Adjusting asymptotics to make an apples-to-apples comparison:

VERSION 1: clip=True, inclusive_pvalue=False

VERSION 2: clip=True, inclusive_pvalue=True

@lukasheinrich (Contributor):

the fixes in #1126 should also make these plots look nicer

@darousso (Author):

Hi everyone, thank you so much for doing this!! Really sorry for the late reply; I have been trying my best to play around with things on my side with the new stuff.

Sorry, I am quite new to limit setting, so I am a bit confused by the above discussion. I am assuming from the above that for the hypotest the issue lies with the fact that the curves are not constrained to be 1 at mu=0, which creates this saturation, and that this is fixed with the "inclusive_pvalue=False" parameter in the learn_toys_funky branch? What does this parameter do? (The hypotest function won't accept the "clip" parameter for toys; I am assuming that is just for asymptotics?) Is inclusive_pvalue the only thing I need to change in the new setup?

@kratsg kratsg moved this from To do to In progress in v0.6.0 Jan 20, 2021
@kratsg (Contributor) commented Feb 4, 2021

As #1162 is merged, all that's left is #1160 / #993 (for the clipping).

Sorry, I am quite new to limit setting so I am a bit confused by the above discussion, I am assuming from the above that for the hypotest, the issue lies with the fact that curves are not constrained to be 1 at mu=0, which creates this saturation which is fixed with this "inclusive_pvalue=False" parameter in the learn_toys_funky branch? What does this parameter do? (The hypotest function won't accept the "clip" parameter for toys, I am assuming that is just for asymptotics?) Is the inclusive_pvalue the only thing I need to change in the new setup?

The fundamental issue is that for the test statistics here, the distribution of the test statistic under the signal+background hypothesis is either a delta function (giving 1.0) or in the interval [0.5, 0.0], which is discontinuous, and that is what you're seeing. A separate annoying issue is that the p-values may not be ordered the same way as the test statistics when computing intervals (#1162), so this needed to get fixed at the same time.

We'll be trying to flesh out/clarify the API when 0.6.0 is released and we finish up all the remaining PRs.

@matthewfeickert matthewfeickert added this to To do in v0.6.1 via automation Feb 11, 2021
@matthewfeickert matthewfeickert removed this from In progress in v0.6.0 Feb 11, 2021
@matthewfeickert (Member) commented Feb 11, 2021

I'm moving this to v0.6.1: even though PR #993 mostly resolves it, PR #1160 should be discussed more to determine which parts are needed, so for the time being it has been moved to v0.6.1. PR #1160 also has a stub of a learn notebook on clipping.

@matthewfeickert matthewfeickert removed this from To do in v0.6.1 Mar 5, 2021
@matthewfeickert matthewfeickert added this to To do in v0.6.2 via automation Mar 5, 2021
@darousso (Author) commented Mar 6, 2021

Hi @kratsg and @matthewfeickert, thank you so much for all your help on this front, and my sincerest apologies for the late reply!

I have tried running things with the latest pyhf version (as well as with the learn_toys_funky branch back in November, when there was the inclusive_pvalue option). I definitely see an improvement over last time (back then I couldn't even get an exclusion curve), but I am still noticing some strange behaviour with the toys calculator: it looks like the upper sigma bands have collapsed and the observed curve is hugging the expected. Would you happen to know what I may be doing incorrectly? (I notice the clipping option is only available for asymptotics?)

output = pyhf.infer.hypotest(
    mupoi,
    data,
    model,
    return_expected_set=return_expected_set,
    calctype=calctype,
    ntoys=ntoys,
)

Once again, my sincerest apologies if I am screwing up something really simple...

(I realize I shouldn't be putting ATLAS-specific plots here even if the results are already public. It is here: https://gitlab.cern.ch/drousso/dvjetsanalysis/-/blob/e9a82569b44a28b32d1f9668f3afc4f541a89641/DVMuonLimits/TestCondorMET_100000/Exclusion_Plot.png )

@obrandt1:

Dear @matthewfeickert et al.,
just a friendly question: have you had time to look into this issue already?
Many thanks,
Oleg

@matthewfeickert matthewfeickert added this to To do in v0.6.3 via automation Jun 21, 2021
@matthewfeickert matthewfeickert removed this from To do in v0.6.2 Jun 21, 2021
@darousso (Author):

Hello,

Sorry for bugging you again; I thought it might be more useful if I actually provided some example workspaces and a description of the behaviour comparing pyhf and HistFitter in a couple of slides.

https://indico.cern.ch/event/1059464/contributions/4452621/attachments/2281721/3883412/pyhf%20histfitter%20toys%20comparison%202021%2007%2020.pdf

Are the asymmetric bands expected behaviour, making this instead a bug in HistFitter?

-David

@matthewfeickert matthewfeickert removed this from To do in v0.6.3 Aug 25, 2021
@matthewfeickert matthewfeickert added this to To do in v0.6.4 via automation Aug 25, 2021
@matthewfeickert (Member) commented Aug 25, 2021

Note to devs: this Issue needs to be the focus of v0.6.4. I think as soon as this is resolved we should move forward with releasing v0.6.4 and then move everything else left on the board to v0.7.0 and v0.7.1.

@matthewfeickert (Member):

@darousso With PR #1610 in, I can try to revisit this in the back half of the week once I'm out of workshops. As was mentioned, the asymptotic approximations may not be valid here, so it probably makes sense to do a comparison between HistFitter and pyhf pseudo-experiments and to reproduce @kratsg's plots from earlier (#892 (comment)) with the updated code.

@matthewfeickert matthewfeickert removed this from To do in v0.6.4 Oct 22, 2021
@matthewfeickert matthewfeickert added this to To do in v0.7.0 via automation Oct 22, 2021
@hepstr commented Feb 23, 2022

Dear authors, this problem seems to persist in pyhf 0.6.3. Has there been any progress on it?
We have an analysis at an advanced stage that is impacted by it.

@matthewfeickert (Member):

Dear authors, this problem seems to persist in pyhf 0.6.3. Has there been any progress on it? We have an analysis at an advanced stage that is impacted by it.

This needs to get revisited for the v0.7.0 release. Can you share a minimal reproducible example of how your analysis is being affected, and/or check whether your problem still exists with the dev release on TestPyPI?

@hepstr commented Mar 1, 2022

Minimal reproducible example:

import json
import matplotlib.pyplot as plt
import pyhf
from pyhf.contrib.viz import brazil
pyhf.set_backend('jax')

j = json.loads('{"channels": [{"name": "channel1", "samples": [{"data": [10.0, 0.2], "modifiers": [{"data": null, "name": "lumi", "type": "lumi"}, {"data": null, "name": "mu_sig", "type": "normfactor"}, {"data": {"hi": 1.1, "lo": 0.9}, "name": "bsm_uncerts", "type": "normsys"}], "name": "bsm_signal"}, {"data": [0.62, 1.94], "modifiers": [{"data": [0.31, 0.97], "name": "staterror_channel1", "type": "staterror"}, {"data": {"hi_data": [0.64, 2.033], "lo_data": [0.6, 1.847]}, "name": "non_linearity", "type": "histosys"}], "name": "bkg"}]}], "measurements": [{"config": {"parameters": [{"auxdata": [1.0], "bounds": [[0.5, 1.5]], "inits": [1.0], "name": "lumi", "sigmas": [0.017]}], "poi": "mu_sig"}, "name": "meas"}], "observations": [{"data": [0, 0], "name": "channel1"}], "version": "1.0.0"}')
ws = pyhf.Workspace(j)
model = ws.model()
data = ws.data(model)

poi_vals = [0.0, 0.1, 0.2, 0.3, 0.5, 1.0]
results = [
    pyhf.infer.hypotest(
        test_poi, data, model, test_stat="qtilde", return_expected_set=True, calctype='toybased', ntoys=10000
    )
    for test_poi in poi_vals
]

fig, ax = plt.subplots()
fig.set_size_inches(7, 5)
brazil.plot_results(poi_vals, results, ax=ax)
fig.savefig("analysis_example.pdf")
plt.close(fig)

The resulting plot looks like the attached one.

[attached plot: analysis_example]

@hepstr commented Mar 1, 2022

…and running the exact same example as above with the current dev version of pyhf, 0.6.4.dev77, yields the following, very different result:

[attached plot: analysis_example, 0.6.4.dev77]

@hepstr commented Mar 4, 2022

Dear @matthewfeickert, all,
is there any insight into this? We may need to find an alternative to pyhf if we cannot resolve this, but that would of course be a pain.

Thank you,
Stefan

@alexander-held (Member):

The difference pointed out above bisects to 9fbbbf9 (#1639) and affects the asymptotic calculation; see the examples below, done with asymptotics:

5ea4e0a:
[attached plot: analysis_example at 5ea4e0a]

9fbbbf9:
[attached plot: analysis_example at 9fbbbf9]

@matthewfeickert (Member) commented Mar 4, 2022

Cf. also Issue #1720. We have regression fixing to do.

@alexander-held (Member):

As an intermediate solution, you can switch your staterror to shapesys in the workspace. The shapesys modifier does not seem to be affected. That does however change your constraint term from Gaussian to Poisson (#760).
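A minimal sketch of that swap on a workspace fragment (the sample below mirrors the bkg sample from the example above; in a real workspace you would edit the JSON or use pyhf's patch tooling):

```python
# Background sample fragment with a staterror modifier, as in the example above
sample = {
    "data": [0.62, 1.94],
    "modifiers": [
        {"data": [0.31, 0.97], "name": "staterror_channel1", "type": "staterror"}
    ],
    "name": "bkg",
}

# Rewrite staterror -> shapesys; both take absolute per-bin uncertainties,
# but shapesys uses a Poisson rather than Gaussian constraint term
for modifier in sample["modifiers"]:
    if modifier["type"] == "staterror":
        modifier["type"] = "shapesys"
        modifier["name"] = modifier["name"].replace("staterror", "shapesys")

print(sample["modifiers"][0]["type"])  # → shapesys
```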

@kratsg (Contributor) commented Aug 30, 2022

This should now be fixed in master. Can we get confirmation?

@kratsg kratsg removed this from To do in v0.7.0 Sep 25, 2022
@kratsg (Contributor) commented Sep 20, 2023

We've gotten no confirmation. Given that nobody has run into issues with the fixes provided by @lukasheinrich, this will be closed.

@kratsg kratsg closed this as completed Sep 20, 2023