Toys giving mysterious results #1161
hm I wonder whether there is an issue with the way the bands are computed or plotted .. I mean certainly the ordering of the percentiles is not being observed (why is the green band not inside the yellow one?!). (Note to us: this could be a useful check to do before returning to the user, but I don't really see how this could ever happen.. but apparently it does.) I'd suggest to focus on a single µ value first. Just dumping something here: clearly the s.pvalue at 1.5 should be smaller than the b.pvalue, so I'm looking at that. The nominal expected / observed look ok. That there is a discreteness to the test statistics is normal since they correspond to individual counts and become noticeable in the low-stats regime.
I think the culprit might be the test statistic value at 5, which is a delta peak (the other peaks have more internal structure so the p-values are better behaved). As Kate points out: my hunch is that there are slight numeric differences in the values associated with "zero" observed events. But since we create the "expected test statistics" from the bkg-only sample, the positions from which we take the p-value are tuned to the b-only distribution, while the s+b distribution has undefined behavior towards those differences. Rounding to 6 decimal places in the test stat
seems to reinstate the true order of CLs values. Edit: or rather it's probably enough to round the test stat cut values.
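Roughly what I mean by the rounding, as a standalone sketch (this is just the idea, not the pyhf internals):

```python
import numpy as np

# Sketch of the rounding idea, not pyhf internals: round both the toy test
# statistics and the cut value before taking the empirical p-value, so that
# float-level noise around a delta peak cannot flip which toys count as
# "at least as extreme" and scramble the ordering of the CLs values.
def empirical_pvalue(toy_teststats, cut_value, decimals=6):
    toys = np.round(np.asarray(toy_teststats), decimals)
    cut = np.round(cut_value, decimals)
    return np.mean(toys >= cut)
```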
@kpachal: I just looked in more detail at your workspace. Your bkg systematic is configured such that the -1 and +1 settings both point downwards. Is this intended?
Hi Lukas, Oh - no, definitely not intentional!! I was trying to use this asymmetric uncertainty because having a single component with size 0.2 was causing crashes, so I thought that splitting it up and making it never negative would fix things. But I guess it's implemented wrong? What about it means they both point downwards?
the way it's in your workspace you have
which translates to:
i.e. the up/down values in the spec are multiplicative, leading to both values being smaller than the nominal value. This doesn't solve your issue yet, but I wanted to point this out.
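For illustration (made-up numbers, not the values from your workspace), the difference between the two configurations for a normsys modifier would be:

```python
# Illustrative only: with normsys the "hi"/"lo" entries are multiplicative
# factors on the nominal yield, so if both are below 1.0 the +1 and -1 settings
# of the nuisance parameter both scale the background down.
both_point_down = {
    "name": "bkg_uncert",
    "type": "normsys",
    "data": {"hi": 0.8, "lo": 0.9},  # nominal*0.8 and nominal*0.9 -- both downward
}

# a genuinely two-sided 20% uncertainty would instead be
two_sided = {
    "name": "bkg_uncert",
    "type": "normsys",
    "data": {"hi": 1.2, "lo": 0.8},
}
```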
Ooooh ok thank you very much! That makes it weirder that it was crashing with a single uncertainty of size 0.2 🤨 but will definitely investigate
Hi Lukas, thanks a lot! I did give it a shot yesterday with that branch and the results unfortunately still look strange to me. I then tested the exact same code you use in #1162 (as far as I can tell without a direct copy/paste) and don't get the same plot you show. I got this: with this code:
I'm pretty sure the branch installed correctly, but it is then a little weird that I don't get the same thing you do with it. I'll keep poking at it.
Hi @lukasheinrich, just to confirm, I checked that I did indeed pick up the changes from the branch.
this should now be definitively fixed by #1162
Thank you very much @lukasheinrich!! I'll be testing it today, very excited for it 😄
hi @kpachal can you scan towards higher µ (given that the uncertainties are higher, the upper limit / 0.05 crossing point will be weaker).. maybe up to µ=5. That the values saturate all the way to CLs=1 has something to do with #1160 #892; see e.g. the plot at the bottom of #892 (comment), where the green band saturates up to µ=0.5 or so.
the poissonian constraint of
for 50% uncertainty. But in this regime the interaction between constraint terms and the main likelihood becomes tricky, because adding a 100% uncertainty in this manner would mean that 15% of the time you predict a signal yield of 0, and so a scenario indistinguishable from b-only is only a 1-sigma pull away. I'm sure the stat committee has some recommendation here (or previous analyses). FWIW this is my plot with normsys and 50% uncertainty on signal and 10% on bkg. Notice the cell where the expected data for the -1, 0, +1 settings of the signal uncertainty is shown (2.1, 4.1, 6.1) for bkg=0.1 and signal 4 +/- 2. Note also that I have a 1% lumi uncertainty.
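For reference, a single-bin spec along those lines would look roughly like this (a sketch with illustrative numbers matching the 4 +/- 2 signal and 0.1 background above; the lumi uncertainty is written here as a plain normsys, and all names are made up):

```python
import pyhf

# Sketch only -- numbers are illustrative, not the exact workspace from the issue:
# a single-bin model with signal 4 +/- 2 (50% normsys), background 0.1 with a 10%
# normsys, and a 1% lumi-style normsys on the signal.
spec = {
    "channels": [
        {
            "name": "SR",
            "samples": [
                {
                    "name": "signal",
                    "data": [4.0],
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": None},
                        {"name": "sig_uncert", "type": "normsys", "data": {"hi": 1.5, "lo": 0.5}},
                        {"name": "lumi_uncert", "type": "normsys", "data": {"hi": 1.01, "lo": 0.99}},
                    ],
                },
                {
                    "name": "background",
                    "data": [0.1],
                    "modifiers": [
                        {"name": "bkg_uncert", "type": "normsys", "data": {"hi": 1.1, "lo": 0.9}},
                    ],
                },
            ],
        }
    ]
}
model = pyhf.Model(spec)
```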
Hi Lukas, Thanks for the tip on the uncertainty format! I'll dig around and see if I can find a stat forum recommendation for how to handle 100% uncertainties. Cheers, |
Hi pyhf friends!
I'm playing with the newly supported toys in pyhf v0.5.4.dev3 - thank you so much for introducing them. I'm doing a mu scan to obtain an upper limit in a single bin region with 0 observed events. An example script is given at the bottom.
If I run my scan using asymptotics (see line 51 of the script),
val = pyhf.infer.hypotest(mu_seed[0], data, model, qtilde=True, return_expected_set=True, return_tail_probs=True)
I get this plot of the result:
Which, other than being a better-than-zero result, is fine. If I run using toys (line 53 of the script),
val = pyhf.infer.hypotest(mu_seed[0], data, model, calctype='toybased', ntoys=int(sys.argv[1]), return_expected_set=True, return_tail_probs=True)
I get this output instead:
Which ...... yeah. 😬
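For context, a stripped-down version of the scan (not my full script, and with a made-up stand-in single-bin model rather than my actual workspace) looks roughly like this:

```python
import numpy as np
import pyhf

# Stripped-down sketch of the scan (not the attached script): a stand-in
# single-bin model with 0 observed events, scanned over mu with both calculators.
model = pyhf.simplemodels.hepdata_like(
    signal_data=[4.0], bkg_data=[0.1], bkg_uncerts=[0.02]
)
data = [0.0] + model.config.auxdata

for mu in np.linspace(0.1, 2.0, 20):
    asymp = pyhf.infer.hypotest(
        float(mu), data, model, qtilde=True,
        return_expected_set=True, return_tail_probs=True,
    )
    toys = pyhf.infer.hypotest(
        float(mu), data, model, calctype='toybased', ntoys=10000,
        return_expected_set=True, return_tail_probs=True,
    )
```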
I had a poke around to see what might be going on and the underlying s-plus-b and b-only distributions look super weird. For that run of toys (10k), here are the two distributions, with the CLb and CLs+b values calculated from them, for one tested mu value of 0.552:
There's a ton of repetition of the exact same values and what looks like some kind of default upper value in the b-only distribution, but what it is changes with mu.
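In case it's useful for reproducing, this is roughly how I'm pulling those distributions out (a sketch assuming the ToyCalculator interface in this dev version; the stand-in model here is just for illustration):

```python
import pyhf

# Sketch, assuming the ToyCalculator interface in this dev version: build the
# calculator directly to look at the raw s+b and b-only toy ensembles at one
# tested mu, instead of going through hypotest.
model = pyhf.simplemodels.hepdata_like(
    signal_data=[4.0], bkg_data=[0.1], bkg_uncerts=[0.02]
)
data = [0.0] + model.config.auxdata

calc = pyhf.infer.calculators.ToyCalculator(data, model, ntoys=10000)
teststat = calc.teststatistic(0.552)
sig_plus_bkg_dist, bkg_only_dist = calc.distributions(0.552)
print(sig_plus_bkg_dist.pvalue(teststat), bkg_only_dist.pvalue(teststat))
```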
Any idea what might be going on here? I've filed this as a bug but it could also be that I'm just using this wrong.
Many thanks for your help, and again for adding the toys!
Cheers,
Kate
==================
Example script: