
Toys seems to give strange results #892

Closed · darousso opened this issue Jun 4, 2020 · 41 comments

Labels
question (Further information is requested) · user request (Request coming from a pyhf user)

Comments

@darousso commented Jun 4, 2020

Question

I have noticed that the toy-based hypotest seems to give some strange values even though the asymptotics gives regular-looking values.

In general, the upper-sigma cls_exp values all seem to be saturated at 1 for all signal points, and this does not improve when increasing the number of toys.

An example signal point model is the following:

{
    "channels": [
        {
            "name": "channel1",
            "samples": [
                {
                    "data": [1.364790054231882],
                    "modifiers": [
                        {"data": None, "name": "lumi", "type": "lumi"},
                        {"data": None, "name": "mu_Sig", "type": "normfactor"},
                        {
                            "data": {"hi": 1.228925751097454, "lo": 0.7710742489025461},
                            "name": "ucs_StopRHadron_1100_100000",
                            "type": "normsys",
                        },
                    ],
                    "name": "StopRHadron_1100_100000",
                },
                {
                    "data": [0.43],
                    "modifiers": [
                        {
                            "data": [0.16],
                            "name": "staterror_channel1",
                            "type": "staterror",
                        }
                    ],
                    "name": "Bkg",
                },
            ],
        }
    ],
    "measurements": [
        {
            "config": {
                "parameters": [
                    {
                        "auxdata": [1.0],
                        "bounds": [[0.5, 1.5]],
                        "inits": [1.0],
                        "name": "lumi",
                        "sigmas": [0.017],
                    }
                ],
                "poi": "mu_Sig",
            },
            "name": "meas",
        }
    ],
    "observations": [{"data": [0.0], "name": "channel1"}],
    "version": "1.0.0",
}

The following results are obtained:

>>> pyhf.infer.hypotest(1, [ndata] + model.config.auxdata, model, return_expected_set=True, calctype='asymptotics')

cls_obs = 0.15568651504095096
cls_exp_vec = [0.029689999899319162, 0.0865343296156578, 0.2282261734784785, 0.49778548932756395, 0.8050249023058168]

>>> pyhf.infer.hypotest(1, [ndata] + model.config.auxdata, model, return_expected_set=True, calctype='toybased', ntoys=10000)

cls_obs = 0.25570635410240594
cls_exp_vec = [0.25637368514886794, 0.2772397094430993, 1.0, 1.0, 1.0]

>>> pyhf.infer.hypotest(1, [ndata] + model.config.auxdata, model, return_expected_set=True, calctype='toybased', ntoys=2000000)

cls_obs = 0.25505163552228566
cls_exp_vec = [0.25517989834993693, 0.2721046189699792, 1.0, 1.0, 1.0]

Do let me know if you need more information about environment or the actual files I am running (it is in an ipynb)

@matthewfeickert added the question (Further information is requested) and user request (Request coming from a pyhf user) labels Jun 4, 2020
@matthewfeickert (Member) commented Jun 4, 2020

@darousso Thanks for your question and for using pyhf. As you're asking about a feature that hasn't been officially released yet and isn't in the public API, it would be helpful to get as much information as possible from you. Can you please give us the following information?

  • Version of pyhf used (here we need to know the branch and commit number that you're on as you're not using an official release)
  • A runnable script that reproduces the above results. (If this is easier to give us through a Gist (public or private) that is fine too).

As we haven't made a release with pseudoexperiment generation support yet it would be good to understand this now so that if there are issues we can resolve them before we publicly release this. So your question is very important and we appreciate you bringing it up (also for being a bold beta tester 🚀).

@darousso (Author) commented Jun 5, 2020

@matthewfeickert

Thanks for your super quick response!

I downloaded it via pip3 install --user git+https://github.com/scikit-hep/pyhf.git@toycalc

If I do pip3 show pyhf I get:

Name: pyhf
Version: 0.4.2.dev55
Summary: (partial) pure python histfactory implementation
Home-page: https://github.com/scikit-hep/pyhf
Author: Lukas Heinrich, Matthew Feickert, Giordon Stark
Author-email: lukas.heinrich@cern.ch, matthew.feickert@cern.ch, gstark@cern.ch
License: Apache
Location: /var/clus/usera/drousso/.local/lib/python3.6/site-packages
Requires: scipy, click, tqdm, jsonschema, jsonpatch, pyyaml
Required-by:

The Python notebook has been attached as a txt file since GitHub is not letting me upload it (sorry).
Toys Debugging.txt

Thank you so much for helping look into this! And also thank you for developing pyhf (it is so much easier to use than HistFitter...)!

@matthewfeickert (Member) commented Jun 5, 2020

I downloaded it via pip3 install --user git+https://github.com/scikit-hep/pyhf.git@toycalc

Ah okay great. This actually has all the information we need, along with your notebook which you gave us. We can start debugging from here — thanks!

This might take some time to better understand the actual behavior, but once we get the pseudoexperiment generation support into master we can point you to a new dev release that you can install from TestPyPI. 👍

@matthewfeickert (Member):

Python Notebook has been attached as a txt file since github is not letting me upload it

@darousso sorry, can you actually send us the notebook another way, or put it somewhere that we can access through CERN? Unfortunately some of the cell formatting got strangely messed up when I renamed the .txt to .ipynb and opened it.

@darousso (Author) commented Jun 5, 2020

@matthewfeickert Sorry, I am not exactly sure how best to attach it, so I have just directly sent it to your CERN email. Let me know if that's alright!

@matthewfeickert (Member):

I have just directly sent it to your CERN email.

Perfect. Thanks!

@matthewfeickert (Member) commented Jun 5, 2020

Okay, I've just ripped the spec out and made everything a runnable Python script that I put in this Gist. This can serve as a debugging source, and I am able to replicate your results with the spec, running with JAX as you were:

# Asymptotics

cls_obs: [0.15568652]
cls_exp: [[0.02969   ]
 [0.08653433]
 [0.22822617]
 [0.49778549]
 [0.8050249 ]]

# Toys

cls_obs: [0.24197901]
cls_exp: [[0.25022422]
 [0.26773903]
 [1.        ]
 [1.        ]
 [1.        ]]

@annmwang commented Jun 8, 2020

Our analysis is also looking at using toys with pyhf, so I'm interested in this answer as well!

@matthewfeickert (Member):

This will definitely be addressed along the way to the release of v0.5.0. We can't promise full support for anything though until it is in an official release, but v0.5.0 should be the next release that comes out, and the aim of the dev team is to have this be done sooner rather than later.

@kratsg kratsg added this to To do in v0.5.0 via automation Jun 23, 2020
@darousso (Author):

Out of curiosity, in what timeframe is v0.5.0 planned to be released?

@matthewfeickert (Member):

Out of curiosity, in what timeframe is v0.5.0 planned to be released?

We don't have set timeframes, but it is a priority for us to get out. The 3 of us have been a bit swamped at the moment with preparing various software tutorials and conferences and there are some additional PRs that will need to go in as well before we can release v0.5.0 for technical reasons. This hasn't been abandoned though — this is just one of the last pieces in the chain.

@cranmer commented Jul 20, 2020

Note: very small numbers so asymptotics may not set in yet.

The systematic variations ucs_StopRHadron_1100_100000 are both less than the nominal, which may cause some strange behavior.

Is there a nice way to plot a histogram of the test statistic from the toys?

Is there a nice way to plot the values of the parameters being used to generate the toys (vs. mu)?

Does the aux data vary in the Toy generation?

@matthewfeickert matthewfeickert added this to To do in v0.6.0 via automation Jul 23, 2020
@matthewfeickert matthewfeickert removed this from To do in v0.5.0 Jul 23, 2020
@darousso (Author) commented Sep 8, 2020

Hello,

Sorry, am a bit confused with the roadmaps, has the fix been implemented in the current release?

-David

@matthewfeickert (Member) commented Sep 8, 2020

has the fix been implemented in the current release?

No, the related PR hasn't been merged.

am a bit confused with the roadmaps

This is scheduled for minor release v0.6.0.

@darousso (Author) commented Sep 8, 2020

Ah perfect, thanks!

@darousso (Author) commented Oct 5, 2020

(Sorry to bother again, I have just learned that our analysis intends on requesting EB in December, would the minor release be out by then?)

@matthewfeickert (Member):

(Sorry to bother again, I have just learned that our analysis intends on requesting EB in December, would the minor release be out by then?)

hi @darousso, thanks for checking in. Yes, that is very much our plan.

We also now have an announcement mailing list that you can join to be updated on upcoming releases and general announcements: https://groups.google.com/group/pyhf-announcements/subscribe. If you join you'll be able to know a few days in advance of when you can expect v0.6.0 on PyPI.

@lukasheinrich (Contributor) commented Oct 29, 2020

Hi @darousso, I think this is actually expected/understood. Can you try to plot a Brazil band for your setup? Specifically with sufficiently high mu (say up to mu=5). At some point the upper sigmas should come down.

@matthewfeickert (Member):

@darousso following up on what @lukasheinrich is pointing to, this is probably related to the discussion in PR #966.

@kratsg (Contributor) commented Oct 29, 2020

Below I show Brazil bands for scans of µ \in [0.1, 5, 50]:

asymptotics

[screenshot taken 2020-10-28, 11:35 PM]

toys (1k toys per µ)

[screenshot taken 2020-10-28, 11:41 PM]

@lukasheinrich (Contributor) commented Oct 29, 2020

Thanks @kratsg, these look reasonable. I should create a "learn notebook" to explain what the issue is. In short, since half of the test statistic distribution under the signal hypothesis is a delta function, CL_s+b can be either 1.0 or in the interval [0.5, 0.0], so CLs values can exhibit this type of discontinuity.

In any case, for an upper limit you're interested in the region where CLs ~ 0.05, so µ between 2 and 4 in the above plot.
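The discontinuity can be illustrated with a numpy-only sketch (the distributions below are made up for illustration, not from a real fit): when half of the signal-hypothesis test-statistic mass sits in a delta function at zero, CLs is exactly 1 at q = 0 and jumps to roughly 0.5 for any q > 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up toy distributions: under signal+background, half the qtilde values
# sit in a delta function at 0 (fits with muhat > mu), the rest spread above 0
ntoys = 100_000
sb_teststats = np.where(
    rng.uniform(size=ntoys) < 0.5, 0.0, rng.exponential(size=ntoys)
)
b_teststats = rng.exponential(scale=2.0, size=ntoys)


def cls(q_obs):
    # p-values as right-tail fractions of the empirical toy distributions
    p_sb = np.mean(sb_teststats >= q_obs)
    p_b = np.mean(b_teststats >= q_obs)
    return p_sb / p_b


# At q_obs = 0 every toy passes the cut, so CLs == 1 exactly; for any
# q_obs > 0 the delta function drops out and CLs jumps to roughly 0.5
print(cls(0.0), cls(1e-6))
```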

@lukasheinrich (Contributor) commented Oct 29, 2020

hi @darousso,

There'll be more explanation, but generally your setup seems OK. Using the additions in #1160, this is what your example looks like.

Existing comparison between toys and yours (slightly apples to oranges), with clip=False.

Adjusting asymptotics to make an apples-to-apples comparison:

VERSION 1: clip=True, inclusive_pvalue=False

VERSION 2: clip=True, inclusive_pvalue=True

@lukasheinrich (Contributor):

the fixes in #1126 should also make these plots look nicer

@darousso (Author):

Hi everyone, thank you so much for doing this!! Really sorry for the late reply; I have been trying my best to play around with things on my side with the new stuff.

Sorry, I am quite new to limit setting, so I am a bit confused by the above discussion. I am assuming from the above that for the hypotest the issue lies with the fact that the curves are not constrained to be 1 at mu=0, which creates this saturation, and that this is fixed with the "inclusive_pvalue=False" parameter in the learn_toys_funky branch? What does this parameter do? (The hypotest function won't accept the "clip" parameter for toys; I am assuming that is just for asymptotics?) Is inclusive_pvalue the only thing I need to change in the new setup?

@kratsg kratsg moved this from To do to In progress in v0.6.0 Jan 20, 2021
@kratsg (Contributor) commented Feb 4, 2021

As #1162 is merged, all that's left is #1160 / #993 (for the clipping).

Sorry, I am quite new to limit setting so I am a bit confused by the above discussion, I am assuming from the above that for the hypotest, the issue lies with the fact that curves are not constrained to be 1 at mu=0, which creates this saturation which is fixed with this "inclusive_pvalue=False" parameter in the learn_toys_funky branch? What does this parameter do? (The hypotest function won't accept the "clip" parameter for toys, I am assuming that is just for asymptotics?) Is the inclusive_pvalue the only thing I need to change in the new setup?

The fundamental issue is that for the test statistics here, the distribution of the test statistic under the signal+background hypothesis is either a delta function (giving 1.0) or in the interval [0.5, 0.0], which is discontinuous, and that is what you're seeing. A separate annoying issue is that the p-values may not be ordered the same way as the test statistics when computing intervals (#1162), so this needed to get fixed at the same time.

We'll be trying to flesh out/clarify the API when 0.6.0 is released and we finish up all the remaining PRs.

@matthewfeickert matthewfeickert added this to To do in v0.6.1 via automation Feb 11, 2021
@matthewfeickert matthewfeickert removed this from In progress in v0.6.0 Feb 11, 2021
@matthewfeickert (Member) commented Feb 11, 2021

I'm moving this to v0.6.1: even though PR #993 mostly resolves it, PR #1160 should be discussed more to determine which parts are needed, so for the time being it has been moved to v0.6.1. PR #1160 also has a stub of a learn notebook on clipping.

@matthewfeickert matthewfeickert removed this from To do in v0.6.1 Mar 5, 2021
@matthewfeickert matthewfeickert added this to To do in v0.6.2 via automation Mar 5, 2021
@darousso (Author) commented Mar 6, 2021

Hi @kratsg and @matthewfeickert, thank you so much for all your help on this front, and my sincerest apologies for the late reply!

I have tried running things with the latest pyhf version (as well as with the learn_toys_funky branch back in November, when there was the inclusive_pvalue option). I definitely see an improvement over last time (back then I couldn't even get an exclusion curve), but I am still noticing some strange behaviour with the toys calculator: it looks like the upper sigma bands have collapsed and the observed curve is hugging the expected. Would you happen to know what I may be doing incorrectly? (I notice the clipping option is only available for asymptotics?)

output = pyhf.infer.hypotest(
    mupoi,
    data,
    model,
    return_expected_set=return_expected_set,
    calctype=calctype,
    ntoys=ntoys,
)

Once again, my sincerest apologies if I am screwing up something really simple...

(I realize I shouldn't be putting ATLAS-specific plots here even if the results are already public. It is here: https://gitlab.cern.ch/drousso/dvjetsanalysis/-/blob/e9a82569b44a28b32d1f9668f3afc4f541a89641/DVMuonLimits/TestCondorMET_100000/Exclusion_Plot.png )

@obrandt1:

Dear @matthewfeickert et al.,
just a friendly question: have you had time to look into this issue already?
Many thanks,
Oleg

@matthewfeickert matthewfeickert added this to To do in v0.6.3 via automation Jun 21, 2021
@matthewfeickert matthewfeickert removed this from To do in v0.6.2 Jun 21, 2021
@darousso (Author):

Hello,

Sorry for bugging you again; I thought it might be more useful if I actually provided some example workspaces and a description of the behaviour comparing pyhf and HistFitter in a couple of slides.

https://indico.cern.ch/event/1059464/contributions/4452621/attachments/2281721/3883412/pyhf%20histfitter%20toys%20comparison%202021%2007%2020.pdf

Are the asymmetric bands expected behaviour, making this instead a bug in HistFitter?

-David

@matthewfeickert matthewfeickert removed this from To do in v0.6.3 Aug 25, 2021
@matthewfeickert matthewfeickert added this to To do in v0.6.4 via automation Aug 25, 2021
@matthewfeickert (Member) commented Aug 25, 2021

Note to devs: this Issue needs to be the focus of v0.6.4. I think as soon as this is resolved we should move forward with releasing v0.6.4 and then move everything else left on the board to v0.7.0 and v0.7.1.

@matthewfeickert (Member):

@darousso With PR #1610 in, I can try to revisit this in the back half of the week once I'm out of workshops. As was mentioned, the asymptotic approximations may not be valid here, so it probably makes sense to do a comparison between HistFitter and pyhf pseudo-experiments and to reproduce @kratsg's plots from earlier (#892 (comment)) with the updated code.

@matthewfeickert matthewfeickert removed this from To do in v0.6.4 Oct 22, 2021
@matthewfeickert matthewfeickert added this to To do in v0.7.0 via automation Oct 22, 2021
@hepstr commented Feb 23, 2022

Dear authors, this problem seems to persist in pyhf 0.6.3. Has there been any progress on it?
We have an analysis at an advanced stage that is impacted by it.

@matthewfeickert (Member):

Dear authors, this problem seems to persist in pyhf 0.6.3. Has there been any progress on it? We have an analysis at an advanced stage that is impacted by it.

This needs to get revisited for the v0.7.0 release. Can you share a minimal reproducible example of how your analysis is being affected, and/or check whether your problem still exists with the dev release on TestPyPI?

@hepstr commented Mar 1, 2022

Minimal reproducible example:

import json
import matplotlib.pyplot as plt
import pyhf
from pyhf.contrib.viz import brazil
pyhf.set_backend('jax')

j = json.loads('{"channels": [{"name": "channel1", "samples": [{"data": [10.0, 0.2], "modifiers": [{"data": null, "name": "lumi", "type": "lumi"}, {"data": null, "name": "mu_sig", "type": "normfactor"}, {"data": {"hi": 1.1, "lo": 0.9}, "name": "bsm_uncerts", "type": "normsys"}], "name": "bsm_signal"}, {"data": [0.62, 1.94], "modifiers": [{"data": [0.31, 0.97], "name": "staterror_channel1", "type": "staterror"}, {"data": {"hi_data": [0.64, 2.033], "lo_data": [0.6, 1.847]}, "name": "non_linearity", "type": "histosys"}], "name": "bkg"}]}], "measurements": [{"config": {"parameters": [{"auxdata": [1.0], "bounds": [[0.5, 1.5]], "inits": [1.0], "name": "lumi", "sigmas": [0.017]}], "poi": "mu_sig"}, "name": "meas"}], "observations": [{"data": [0, 0], "name": "channel1"}], "version": "1.0.0"}')
ws = pyhf.Workspace(j)
model = ws.model()
data = ws.data(model)

poi_vals = [0.0, 0.1, 0.2, 0.3, 0.5, 1.0]
results = [
    pyhf.infer.hypotest(
        test_poi, data, model, test_stat="qtilde", return_expected_set=True, calctype='toybased', ntoys=10000
    )
    for test_poi in poi_vals
]

fig, ax = plt.subplots()
fig.set_size_inches(7, 5)
brazil.plot_results(poi_vals, results, ax=ax)
fig.savefig("analysis_example.pdf")
plt.close(fig)

The resulting plot looks like the attached one.

[attached plot: analysis_example]

@hepstr commented Mar 1, 2022

…and running the exact same example as above with the current dev version of pyhf, 0.6.4.dev77, yields the following, very different result:

[attached plot: analysis_example, 0.6.4.dev77]

@hepstr commented Mar 4, 2022

Dear @matthewfeickert, all,
is there any insight into this? We may need to find an alternative to pyhf if we cannot resolve this, but that would of course be a pain.

Thank you,
Stefan

@alexander-held (Member):

The difference pointed out above bisects to 9fbbbf9 (#1639) and affects the asymptotic calculation; see the examples below, done with asymptotics:

5ea4e0a:
[attached plot: analysis_example at 5ea4e0a]

9fbbbf9:
[attached plot: analysis_example at 9fbbbf9]

@matthewfeickert (Member) commented Mar 4, 2022

Cf. also Issue #1720. We have regression fixing to do.

@alexander-held (Member):

As an intermediate solution, you can switch your staterror to shapesys in the workspace. The shapesys modifier does not seem to be affected. That does however change your constraint term from Gaussian to Poisson (#760).
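A minimal sketch of that swap on a workspace fragment (the sample below mirrors the bkg sample from the example above; in a real workspace you would edit the JSON or use pyhf's patch tooling):

```python
# Background sample fragment with a staterror modifier, as in the example above
sample = {
    "data": [0.62, 1.94],
    "modifiers": [
        {"data": [0.31, 0.97], "name": "staterror_channel1", "type": "staterror"}
    ],
    "name": "bkg",
}

# Rewrite staterror -> shapesys; both take absolute per-bin uncertainties,
# but shapesys uses a Poisson rather than Gaussian constraint term
for modifier in sample["modifiers"]:
    if modifier["type"] == "staterror":
        modifier["type"] = "shapesys"
        modifier["name"] = modifier["name"].replace("staterror", "shapesys")

print(sample["modifiers"][0]["type"])  # → shapesys
```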

@kratsg (Contributor) commented Aug 30, 2022

This should now be fixed in master. Can we get confirmation?

@kratsg kratsg removed this from To do in v0.7.0 Sep 25, 2022
@kratsg (Contributor) commented Sep 20, 2023

We've gotten no confirmation. Given that nobody has run into issues with the fixes provided by @lukasheinrich, this will be closed.

@kratsg kratsg closed this as completed Sep 20, 2023