Toys giving mysterious results #1161

Closed
kpachal opened this issue Oct 30, 2020 · 18 comments · Fixed by #1162
Labels: question (Further information is requested), user request (Request coming from a pyhf user)

kpachal commented Oct 30, 2020

Hi pyhf friends!

I'm playing with the newly supported toys in pyhf v0.5.4.dev3 - thank you so much for introducing them. I'm doing a mu scan to obtain an upper limit in a single bin region with 0 observed events. An example script is given at the bottom.

If I run my scan using asymptotics (see line 51 of the script),
val = pyhf.infer.hypotest(mu_seed[0], data, model, qtilde=True, return_expected_set=True,return_tail_probs=True)
I get this plot of the result:
[plot: result_mu — CLs vs mu scan, asymptotics]

Which, other than being a better-than-zero result, is fine. If I run using toys (line 53 of the script),
val = pyhf.infer.hypotest(mu_seed[0], data, model, calctype='toybased', ntoys=int(sys.argv[1]),return_expected_set=True,return_tail_probs=True)
I get this output instead:

[plot: result_mu — CLs vs mu scan, toys]

Which ...... yeah. 😬

I had a poke around to see what might be going on, and the underlying s-plus-b and b-only distributions look super weird. For that run of toys (10k), here are the two distributions, with the CLb and CLs+b values that were calculated from them, for one tested mu value of 0.552:

[plot: bonly_0p552 — b-only test statistic distribution at mu = 0.552]

[plot: splusb_0p552 — s+b test statistic distribution at mu = 0.552]

There's a lot of repetition of exactly the same values, and what looks like some kind of default upper value in the b-only distribution, though what that value is changes with mu.

Any idea what might be going on here? I've filed this as a bug but it could also be that I'm just using this wrong.

Many thanks for your help, and again for adding the toys!

Cheers,
Kate

==================

Example script:

#!/usr/bin/env python

from multiprocessing import Pool

import sys
import pyhf

import matplotlib.pyplot as plt
import pyhf.contrib.viz.brazil

nSig = 4.166929245
errSig = 0
nBkg = 0.11
errBkgUp = 0.20
errBkgDown = 0.11

model_json = {
    "channels": [
        { "name": "SR_combined",
          "samples": [
            { "name": "signal",
              "data": [nSig],
              "modifiers": [ { "name": "mu", "type": "normfactor", "data": None},
              ]
            },
            { "name": "background",
              "data": [nBkg],
              "modifiers": [ {"name": "uncorr_bkguncrt", "type": "normsys", "data": {"hi": errBkgUp, "lo": errBkgDown}} ]
            }
          ]
        }
    ],
    "observations": [
        { "name": "SR_combined", "data": [0.0] }
    ],
    "measurements": [
        { "name": "Measurement", "config": {"poi": "mu", "parameters": []} }
    ],
    "version": "1.0.0"
}

ws = pyhf.Workspace(model_json)
model = ws.model()
data = ws.data(model)

def getCLS(mu_seed):
    
    # Asymptotics
    #val = pyhf.infer.hypotest(mu_seed[0], data, model, qtilde=True, return_expected_set=True,return_tail_probs=True) 
    # Toys
    val = pyhf.infer.hypotest(mu_seed[0], data, model, calctype='toybased', ntoys=int(sys.argv[1]),return_expected_set=True,return_tail_probs=True)
    return val

pool = Pool(processes=40)

final_norms = [1.7,1.8,1.9,2.0,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3.0,3.2,3.4,3.6]
muInput = [i/nSig for i in final_norms]
nSeeds = 1

results = {}
mu_seedList = []
for seed in range(nSeeds):
    for mu in muInput:
        mu_seedList.append( [mu, seed] )

print("Seed; mu value; nSignal events; CLs")
results = pool.map(getCLS, mu_seedList)
result_plottable = []
CLsb_list = []
CLb_list = []
for result in results :
  plottable = ((result[0],result[-1]))
  result_plottable.append(plottable)
  CLsb_list.append(result[1][0])
  CLb_list.append(result[1][1])

fig, ax = plt.subplots()
fig.set_size_inches(5, 5)
ax.set_xlabel(r"$\mu$ (POI)")
ax.set_ylabel(r"$\mathrm{CL}_{s}$")
pyhf.contrib.viz.brazil.plot_results(ax, muInput, result_plottable)
plt.plot(muInput, CLsb_list, 'bo-')
plt.plot(muInput, CLb_list, 'ro-')
plt.savefig("result_mu.eps")

lukasheinrich commented Oct 30, 2020

hm, I wonder whether there is an issue with the way the bands are computed or plotted ... I mean certainly the ordering of the percentiles is not being respected (why is the green band not inside the yellow?!). (Note to us: this could be a useful check to do before returning to the user, but I don't really see how this could ever happen ... yet apparently it does.)

I'd suggest focusing on a single µ value first; just dumping something here.

clearly the s.pvalue at 1.5 should be smaller than the b.pvalue, so I'm looking at that.

the nominal expected / observed look OK. That there is a discreteness to the test statistics is normal, since they correspond to individual counts and become noticeable in the low-stats regime.
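
To make the discreteness point concrete, here is a minimal numpy sketch (not pyhf code; the yields are just the numbers from the script above): with an expected background of only 0.11 events, roughly 90% of the b-only toys contain zero events, and the s+b toys populate only a few low counts, so any test statistic computed from them can only take a handful of distinct values.

import numpy as np

rng = np.random.default_rng(0)
nBkg, nSig, mu = 0.11, 4.166929245, 0.552   # yields and one tested mu from the script above

for label, lam in [("b-only", nBkg), ("s+b", nBkg + mu * nSig)]:
    toys = rng.poisson(lam, size=10_000)
    # number of toys per observed event count: almost all the probability sits on a few integers
    print(label, np.bincount(toys))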


lukasheinrich commented Oct 30, 2020

I think the culprit might be the test statistic value at 5, which is a delta peak (the other peaks have more internal structure, so their p-values are better behaved), as Kate points out:

[image: test statistic distribution with the delta peak at 5]

My hunch is that there are slight numeric differences in the values associated with "zero" observed events. But since we create the "expected test statistics" from the bkg-only sample, the positions from which we take the p-value are tuned to the b-only distribution, while the s+b distribution has undefined behavior with respect to those differences.

rounding to 6 decimal places in the test stat

s.samples = np.round(s.samples,6)
b.samples = np.round(b.samples,6)

seems to reinstate the true order of CLs values

edit: or rather it's probably enough to round the test stat cut values
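
As an illustration of why the rounding matters (a hypothetical numpy sketch, not the actual pyhf internals): if the test-statistic cut value and the toy samples differ only at the level of floating-point noise, a strict comparison can move an entire delta peak in or out of the tail, while rounding both sides to a common precision makes the comparison stable.

import numpy as np

samples = np.full(1000, 5.0)   # a delta peak in the toy test-statistic distribution
cut = 5.0 + 1e-12              # nominally the same value, shifted by numerical noise

# strict comparison: the whole peak falls just below the cut and drops out of the tail
print((samples >= cut).mean())  # 0.0

# rounding both sides first keeps the peak in the tail
print((np.round(samples, 6) >= np.round(cut, 6)).mean())  # 1.0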

@lukasheinrich

@kpachal: I just looked in more detail at your workspace. Your bkg systematic is configured such that the -1 and +1 settings both point downwards. Is this intended?


kpachal commented Oct 30, 2020

Hi Lukas,

Oh - no, definitely not intentional!! I was trying to use this asymmetric uncertainty because having a single component with size 0.2 was causing crashes, so I thought that splitting it up and making it never negative would fix things. But I guess it's implemented wrong? What about it means they both point downwards?


lukasheinrich commented Oct 30, 2020

the way it's in your workspace you have

nBkg = 0.11
errBkgUp = 0.20
errBkgDown = 0.11

which translates to:

nBkg = 0.11
nBkg_up = 0.20 * nBkg = 0.022
nBkg_dn = 0.11 * nBkg = 0.0121

i.e. the up/dn values in the spec are multiplicative, leading to both variations being smaller than the nominal value. This doesn't solve your issue yet, but I wanted to point it out.
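
A quick numerical check of the point above (plain Python; the symmetric replacement values are hypothetical): the normsys hi/lo entries are multiplicative factors on the nominal yield, so both settings in the original spec push the background down, while factors that bracket 1.0 give the intended up/down variation.

nBkg = 0.11

# as written in the original workspace: both variations end up below the nominal yield
print(0.20 * nBkg, 0.11 * nBkg)  # 0.022, 0.0121

# a symmetric 20% uncertainty expressed as multiplicative factors (hypothetical values)
errBkgUp, errBkgDown = 1.20, 0.80
print(errBkgUp * nBkg, errBkgDown * nBkg)  # 0.132, 0.088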


kpachal commented Oct 30, 2020

Ooooh ok thank you very much! That makes it weirder that it was crashing with a single uncertainty of size 0.2 🤨 but will definitely investigate


lukasheinrich commented Oct 30, 2020

hi @kpachal: can you test w/ the branch fix_for_kate? I think the plot should look much saner now

if you check #1162 you see what a workspace looks like w/ a symmetric 10% uncert (e.g. up: 1.1, dn: 0.9)


kpachal commented Nov 2, 2020

Hi Lukas, thanks a lot! I did give it a shot yesterday with that branch and the results unfortunately still look strange to me. I then tested the exact same code you use in #1162 (as far as I can tell without a direct copy/paste) and don't get the same plot you show. I got this:
[image: resulting CLs scan]

with this code:

import pyhf

nSig = 4.166929245
errSig = 0
nBkg = 0.1
errBkgUp = 1.1
errBkgDown = 0.9

model_json = {
    "channels": [
        {
            "name": "SR_combined",
            "samples": [
                {
                    "name": "signal",
                    "data": [nSig],
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": None},
                    ],
                },
                {
                    "name": "background",
                    "data": [nBkg],
                    "modifiers": [
                        {
                            "name": "uncorr_bkguncrt",
                            "type": "normsys",
                            "data": {"hi": errBkgUp, "lo": errBkgDown},
                        }
                    ],
                },
            ],
        }
    ],
    "observations": [{"name": "SR_combined", "data": [0.0]}],
    "measurements": [
        {"name": "Measurement", "config": {"poi": "mu", "parameters": []}}
    ],
    "version": "1.0.0",
}

ws = pyhf.Workspace(model_json)
model = ws.model()
data = ws.data(model)

import numpy
mutests = numpy.linspace(0,2.5,11)
results = [pyhf.infer.hypotest(mu, data, model, calctype='toybased', return_expected_set=True, ntoys=10000) for mu in mutests]

import matplotlib.pyplot as plt
from pyhf.contrib.viz.brazil import plot_results
f,ax = plt.subplots()
plot_results(ax,mutests,results)
plt.savefig("lukas_version.eps")

I'm pretty sure the branch installed correctly, but it is then a little weird that I don't get the same thing you do with it. I'll keep poking at it.


kpachal commented Nov 2, 2020

Update: I think I still am getting the delta function effect for the highest value in the b-only plot, so maybe I have just failed to set this up as I should have...

[screenshot 2020-11-02 09:43:46: b-only test statistic distribution]


kpachal commented Nov 3, 2020

Hi @lukasheinrich , just to confirm, I checked that I did indeed pick up the changes from the fix_for_kate branch properly and the results still look as above for me.

@lukasheinrich

hi @kpachal, I put some updates into #1162. tl;dr: adding a lumi uncertainty becomes important, I think

@lukasheinrich

this should now be definitively fixed by #1162


kpachal commented Nov 9, 2020

Thank you very much @lukasheinrich !! I'll be testing it today, very excited for it 😄


kpachal commented Nov 9, 2020

Just wanted to say thanks again - I have tested the modified code and the results do make more sense to me. To report a few here: running the signal point I've demonstrated before, with the full-size background uncertainty and no signal uncertainty, I get this result:

[screenshot 2020-11-09 19:23:14: CLs scan, full background uncertainty, no signal uncertainty]

which looks very similar to the example you have in the MR and I think very reasonable.

With a 100% signal uncertainty (as we have at this point in reality) I get an unusual-looking result with 10k toys:
[screenshot 2020-11-09 19:23:51: CLs scan with 100% signal uncertainty]

The giant green band is not necessarily wrong; I need to break this down and look more closely at the test statistic distributions, and in fact a mu limit > 1 is what we expect to see at this point. So it's funny looking but possibly OK, and I will do some more tests.

Thanks again!! This is a huge improvement.


lukasheinrich commented Nov 9, 2020

hi @kpachal can you scan towards higher µ (given that the uncertainties are higher, the upper limit / 0.05 crossing point will be weaker) ... maybe up to µ=5. That the values saturate all the way to CLs=1 has something to do w/ #1160 #892; see e.g. the plot at the bottom of #892 (comment), where the green band saturates up to µ=0.5 or so.


kpachal commented Nov 10, 2020

Hi Lukas,

Sure - see attached, up to 4.5. Pretty solidly over the observed limit here, but we're still looking pretty green 😉

[screenshot 2020-11-10 08:41:13: CLs scan up to mu = 4.5]

I also attach some b-only and s+b distributions I made with an earlier plot (mu value 0.864):
[plots: splusb_0p864, bonly_0p864 — s+b and b-only test statistic distributions at mu = 0.864]
There's one tall spike at zero behind the red line in s+b.

Here's the model I'm using now:

nSig = 4.166929245
errSig = 4.166929245
nBkg = 0.11
errBkgUp = 2.0
errBkgDown = 0.001

model_json = {
    "channels": [
        { "name": "SR_combined",
          "samples": [
            { "name": "signal",
              "data": [nSig],
              "modifiers": [ { "name": "mu", "type": "normfactor", "data": None},
                              {"name": "uncorr_siguncrt", "type": "shapesys", "data": [errSig]} 
              ]
            },
            { "name": "background",
              "data": [nBkg],
              "modifiers": [ {"name": "uncorr_bkguncrt", "type": "normsys", "data": {"hi": errBkgUp, "lo": errBkgDown}} ]
            }
          ]
        }
    ],
    "observations": [
        { "name": "SR_combined", "data": [0.0] }
    ],
    "measurements": [
        { "name": "Measurement", "config": {"poi": "mu", "parameters": []} }
    ],
    "version": "1.0.0"
}

I did check the documentation when setting up that signal uncertainty, but it's still possible I've messed this one up too and made it 400% ...


lukasheinrich commented Nov 10, 2020

the Poissonian constraint of shapesys is probably not what you want, since for low counts the constraint term is asymmetric. Most people use normsys (with a Gaussian constraint) a la

                        {
                            "data": {
                                "hi": 1.5,
                                "lo": 0.5
                            },
                            "name": "uncorr_siguncrt",
                            "type": "normsys"
                        }

for a 50% uncert. But in this regime the interaction between constraint terms and the main likelihood becomes tricky, because adding a 100% uncertainty in this manner would mean that 15% of the time you predict a signal yield of 0, so a scenario indistinguishable from b-only is only a 1-sigma pull away. I'm sure the stat committee has some recommendation here (or previous analyses do).
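
A rough back-of-envelope for the "15% of the time" statement, under the simplifying assumption (mine, not a statement about pyhf's interpolation code) that a 100% normsys-style uncertainty behaves like a multiplicative factor 1 + alpha with alpha drawn from a unit Gaussian:

from scipy.stats import norm

# probability that a unit-Gaussian pull drives the multiplicative factor to <= 0,
# i.e. a predicted signal yield of zero
print(norm.cdf(-1.0))  # ~0.159

The exact fraction depends on the interpolation scheme used for the modifier; this is only meant to show where a number of that size comes from.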

FWIW this is my plot with normsys and a 50% uncert on signal and 10% on bkg

notice the cell where the expected data for the -1, 0, +1 settings of the signal uncertainty is shown (2.1, 4.1, 6.1) for bkg = 0.1 and signal 4 +/- 2 (i.e. 4 x 0.5 + 0.1, 4 x 1.0 + 0.1, 4 x 1.5 + 0.1)

note also that I have a 1% lumi uncertainty

[image: CLs scan with normsys 50% signal and 10% background uncertainties plus 1% lumi uncertainty]

{
    "channels": [
        {
            "name": "SR_combined",
            "samples": [
                {
                    "data": [
                        4
                    ],
                    "modifiers": [
                        {
                            "data": null,
                            "name": "lumi",
                            "type": "lumi"
                        },
                        {
                            "data": null,
                            "name": "mu",
                            "type": "normfactor"
                        },
                        {
                            "data": {
                                "hi": 1.5,
                                "lo": 0.5
                            },
                            "name": "uncorr_siguncrt",
                            "type": "normsys"
                        }

                    ],
                    "name": "signal"
                },
                {
                    "data": [
                        0.1
                    ],
                    "modifiers": [
                        {
                            "data": null,
                            "name": "lumi",
                            "type": "lumi"
                        },
                        {
                            "data": {
                                "hi": 1.1,
                                "lo": 0.9
                            },
                            "name": "uncorr_bkguncrt",
                            "type": "normsys"
                        }
                    ],
                    "name": "background"
                }
            ]
        }
    ],
    "measurements": [
        {
            "config": {
                "parameters": [
                    {
                        "auxdata": [
                            1.0
                        ],
                        "bounds": [
                            [
                                0.95,
                                1.05
                            ]
                        ],
                        "inits": [
                            1.0
                        ],
                        "name": "lumi",
                        "sigmas": [
                            0.01
                        ]
                    },
                    {
                        "bounds": [
                            [
                                0.0,
                                10.0
                            ]
                        ],
                        "inits": [
                            1.0
                        ],
                        "name": "mu"
                    }
                ],
                "poi": "mu"
            },
            "name": "Measurement"
        }
    ],
    "observations": [
        {
            "data": [
                0
            ],
            "name": "SR_combined"
        }
    ],
    "version": "1.0.0"
}


kpachal commented Nov 10, 2020

Hi Lukas,

Thanks for the tip on the uncertainty format! I'll dig around and see if I can find a stat forum recommendation for how to handle 100% uncertainties.

Cheers,
Kate

@matthewfeickert added the question and user request labels on Nov 10, 2020