
fix: Report expected p-values based on their quantiles under background hypotheses #1162

Merged (31 commits) on Jan 20, 2021

Conversation

@lukasheinrich (Contributor) commented on Oct 30, 2020

Description

Fixes issue reported by @kpachal

This changes the way we report expected limits.

We used to:

  • find quantiles in the test statistic distribution [q1, q2, ..., q5]
  • report CLs/p-values for those test statistics [p1, p2, ..., p5]

However, this can lead to orderings in which the p-values are not monotonic, because the relationship between CLs and the test statistic value is not necessarily monotonic.

We now:

  • compute a sample of CLs values based on test statistics sampled from toys
  • report expected CLs/p-values as the quantiles of that CLs sample [p1, p2, ..., p5]

This is also what ROOT does. It might raise some future questions regarding #966 but is fine for now.
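As a standalone illustration of the difference (a minimal sketch, not pyhf internals: cls_of is a made-up, deliberately non-monotonic mapping from test statistic to CLs), quantiles of the CLs sample are ordered by construction, while CLs evaluated at test-statistic quantiles need not be:

import numpy as np

# stand-in for test statistics sampled from background-only toys
rng = np.random.default_rng(0)
bkg_teststats = rng.chisquare(df=1, size=10_000)

def cls_of(teststat):
    # hypothetical, deliberately non-monotonic CLs(test statistic) mapping
    return np.exp(-teststat) * (1 + 0.4 * np.sin(8 * teststat))

quantiles = [2.275, 15.865, 50.0, 84.135, 97.725]  # -2 sigma ... +2 sigma

# old approach: quantiles in test-stat space, then map each one to CLs
old_band = cls_of(np.percentile(bkg_teststats, quantiles))

# new approach: map every toy to CLs first, then take quantiles of that sample
new_band = np.percentile(cls_of(bkg_teststats), quantiles)

diffs = np.diff(old_band)
print(bool(np.all(diffs <= 0) or np.all(diffs >= 0)))  # ordering not guaranteed
print(bool(np.all(np.diff(new_band) >= 0)))            # True: sample quantiles are ordered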

import pyhf
import json
import numpy as np
import matplotlib.pyplot as plt
import pyhf.contrib.viz.brazil as brazil

w = pyhf.Workspace(json.loads('''
{
    "channels": [
        {
            "name": "SR_combined",
            "samples": [
                {
                    "data": [4],
                    "modifiers": [
                        {"data": null,"name": "lumi","type": "lumi"},
                        {"data": null,"name": "mu","type": "normfactor"}
                    ],
                    "name": "signal"
                },
                {
                    "data": [0.1],
                    "modifiers": [
                        {"data": null,"name": "lumi","type": "lumi"},
                        {"data": {"hi": 2.0,"lo": 0.5},"name": "uncorr_bkguncrt","type": "normsys"}
                    ],
                    "name": "background"
                }
            ]
        }
    ],
    "measurements": [
        {
            "config": {
                "parameters": [
                    {"auxdata": [1.0],"bounds": [[0.5,1.5]],"inits": [1.0],"name": "lumi","sigmas": [0.01]},
                    {"bounds": [[0.0,10.0]],"inits": [1.0],"name": "mu"}
                ],
                "poi": "mu"
            },
            "name": "Measurement"
        }
    ],
    "observations": [{"data": [0],"name": "SR_combined"}
    ],
    "version": "1.0.0"
}
'''))
m = w.model()
d = w.data(m)

# scan the POI and compute observed + expected CLs with the toy-based calculator
mutests = np.linspace(0.5, 1.0, 4)
results = [
    pyhf.infer.hypotest(
        mu, d, m, return_expected_set=True, calctype='toybased', ntoys=15000
    )
    for mu in mutests
]

fig, ax = plt.subplots()
brazil.plot_results(ax, mutests, results)
plt.semilogy()
plt.ylim(1e-3, 1)
plt.xlim(0, 1.35)

produces this plot:

[figure: toy-based CLs Brazil band over the mu scan]

ReadTheDocs build: https://pyhf.readthedocs.io/en/fix_for_kate/api.html#inference

For the PR Assignees:

  • Summarize commit messages into a comprehensive review of the PR:
    • Add two ROOT validation scripts for toys
    • Use the same initialization parameters for b and s+b toys
    • Leave (expected) p-value calculation to the calculators
      • Add pvalues and expected_pvalues to the calculator API (see the sketch below)
    • Toy-based calculator returns expected values based on CLs distribution quantiles

Co-authored-by: Matthew Feickert <matthew.feickert@cern.ch>
Co-authored-by: Giordon Stark <kratsg@gmail.com>
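A minimal sketch of that calculator-level API for toys (assuming ToyCalculator usage mirrors the AsymptoticCalculator example shown in review below; the model and toy count are illustrative):

import pyhf

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)
data = [51, 48] + model.config.auxdata
mu_test = 1.0

calc = pyhf.infer.calculators.ToyCalculator(data, model, ntoys=1000)
teststat = calc.teststatistic(mu_test)
sig_plus_bkg_dist, bkg_dist = calc.distributions(mu_test)

# new API: p-value calculation now lives on the calculator
CLsb, CLb, CLs = calc.pvalues(teststat, sig_plus_bkg_dist, bkg_dist)
# expected bands from quantiles of the CLs sample (the change in this PR)
CLsb_exp, CLb_exp, CLs_exp = calc.expected_pvalues(sig_plus_bkg_dist, bkg_dist)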

@kratsg (Contributor) commented on Oct 30, 2020

Tests need to get fixed. We should probably also add a test that asserts the correct behavior.
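For example, a test along these lines could assert that the expected CLs band comes out ordered (a sketch reusing the hypotest call from the description; the model and toy count are illustrative):

import numpy as np
import pyhf

def test_expected_cls_band_is_ordered():
    model = pyhf.simplemodels.hepdata_like(
        signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
    )
    data = [51, 48] + model.config.auxdata
    _, expected_set = pyhf.infer.hypotest(
        1.0, data, model, return_expected_set=True, calctype='toybased', ntoys=2000
    )
    # quantiles of the CLs sample: the band must be ordered from -2 sigma to +2 sigma
    band = [float(value) for value in expected_set]
    assert np.all(np.diff(band) >= 0)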

@lukasheinrich (Contributor, Author) commented on Oct 30, 2020

It's not completely fixed though: the green band is not within the yellow one. The numerics are still an open issue.

[figure: Brazil band with the 1-sigma band not fully contained in the 2-sigma band]

@lukasheinrich (Contributor, Author) commented on Nov 4, 2020

I added a validation script StandardHypoTestDemo.py, which roughly follows StandardHypoTestDemo.C except that it is configured for exclusion instead of discovery.

With this we can cross-check toys against ROOT.

For the workspace 'xmlimport_input_bkg.json' (which is xmlimport_input_bkg except with the data set to the background expectation) it looks like this:

[figure: ROOT StandardHypoTestDemo output for the workspace]

https://root.cern.ch/doc/v614/StandardHypoTestDemo_8C.html

The pyhf code is:

[figure: screenshot of the pyhf cross-check code]
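The screenshot itself is not recoverable here, but a pyhf call consistent with the description (a toy-based exclusion-style hypotest on that workspace) would look roughly like this sketch; the file name comes from the comment above and the toy count is illustrative:

import json
import pyhf

with open("xmlimport_input_bkg.json") as spec_file:
    workspace = pyhf.Workspace(json.load(spec_file))
model = workspace.model()
data = workspace.data(model)

# toy-based hypothesis test at a fixed signal strength
CLs_obs, CLs_exp_set = pyhf.infer.hypotest(
    1.0, data, model, return_expected_set=True, calctype='toybased', ntoys=5000
)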

@lukasheinrich (Contributor, Author) commented:

OK, some more important points:

  • in this regime the lumi uncertainty becomes important, so one must
    • add a lumi modifier to each sample
    • add a lumi sigma to the parameter config

(a quick check of this wiring is sketched after the workspace below)

Once this is done, the delta peak gets broadened (left: ROOT, right: pyhf toys):

[figure: test statistic distributions before and after adding the lumi constraint]

The Brazil band still doesn't look too great, but it's looking better. Trying with more toys:

[figure: Brazil band with more toys]

This is the workspace

{
    "channels": [
        {
            "name": "SR_combined",
            "samples": [
                {
                    "data": [
                        1
                    ],
                    "modifiers": [
                        {
                            "data": null,
                            "name": "lumi",
                            "type": "lumi"
                        },
                        {
                            "data": null,
                            "name": "mu",
                            "type": "normfactor"
                        }
                    ],
                    "name": "signal"
                },
                {
                    "data": [
                        0.1
                    ],
                    "modifiers": [
                        {
                            "data": null,
                            "name": "lumi",
                            "type": "lumi"
                        },
                        {
                            "data": {
                                "hi": 1.1,
                                "lo": 0.9
                            },
                            "name": "uncorr_bkguncrt",
                            "type": "normsys"
                        }
                    ],
                    "name": "background"
                }
            ]
        }
    ],
    "measurements": [
        {
            "config": {
                "parameters": [
                    {
                        "auxdata": [
                            1.0
                        ],
                        "bounds": [
                            [
                                0.5,
                                1.5
                            ]
                        ],
                        "inits": [
                            1.0
                        ],
                        "name": "lumi",
                        "sigmas": [
                            0.1
                        ]
                    },
                    {
                        "bounds": [
                            [
                                0.0,
                                10.0
                            ]
                        ],
                        "inits": [
                            1.0
                        ],
                        "name": "mu"
                    }
                ],
                "poi": "mu"
            },
            "name": "Measurement"
        }
    ],
    "observations": [
        {
            "data": [
                0
            ],
            "name": "SR_combined"
        }
    ],
    "version": "1.0.0"
}
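As referenced above, a quick check that the lumi pieces are wired up (a sketch; the file name workspace.json is illustrative and assumes the spec above has been saved to disk):

import json
import pyhf

with open("workspace.json") as spec_file:
    workspace = pyhf.Workspace(json.load(spec_file))
model = workspace.model()

# the lumi parameter should now exist and carry the configured sigma
print("lumi" in model.config.parameters)      # expect True
print(model.config.param_set("lumi").sigmas)  # expect [0.1]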

@lgtm-com (bot) commented on Nov 4, 2020

This pull request introduces 2 alerts when merging a8c2f29 into 81c9adb - view on LGTM.com

new alerts:

  • 2 for Unused local variable

@lgtm-com (bot) commented on Nov 6, 2020

This pull request introduces 2 alerts when merging d4d1898 into a3b34a5 - view on LGTM.com

new alerts:

  • 2 for Unused local variable

@lgtm-com (bot) commented on Nov 8, 2020

This pull request introduces 4 alerts when merging 63eaa6d into e4011ff - view on LGTM.com

new alerts:

  • 3 for Variable defined multiple times
  • 1 for Unused local variable

@codecov (bot) commented on Nov 8, 2020

Codecov Report

Merging #1162 (26ada72) into master (0e71f2f) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master    #1162   +/-   ##
=======================================
  Coverage   97.48%   97.48%           
=======================================
  Files          63       63           
  Lines        3733     3740    +7     
  Branches      530      531    +1     
=======================================
+ Hits         3639     3646    +7     
  Misses         55       55           
  Partials       39       39           
Flag        Coverage Δ
unittests   97.48% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                  Coverage Δ
src/pyhf/infer/intervals.py     100.00% <ø> (ø)
src/pyhf/infer/__init__.py      100.00% <100.00%> (ø)
src/pyhf/infer/calculators.py   100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@lukasheinrich lukasheinrich changed the title fix: unify inits feat: report expected p-values based on their quantiles under background hypotheses Nov 9, 2020
@lukasheinrich lukasheinrich changed the title feat: report expected p-values based on their quantiles under background hypotheses fix: report expected p-values based on their quantiles under background hypotheses Nov 9, 2020
@lukasheinrich (Contributor, Author) commented on Nov 9, 2020

Leaving this for later.

This is now the plot with ROOT:

[figure: ROOT Brazil band]

and this with pyhf:

[figure: pyhf Brazil band]

which should be equivalent within the toy sample variance (20k toys).

@kratsg kratsg added the feat/enhancement New feature or request label Jan 19, 2021
@matthewfeickert matthewfeickert added the tests pytest label Jan 20, 2021
@matthewfeickert (Member) left a comment
Thanks for the nice rebasing work @kratsg.

So a concern that I have here, that I myself contributed to when working on this weeks ago, is that we're introducing inconsistent return types for CLs-like values now. Previously, all CLs-like / p-value-like values were 0-d tensors

# Ensure that all CL values are 0-d tensors

However, now we have things like hypotest returning 0-d tensors while pvalues and expected_pvalues return floats for NumPy but 0-d tensors for all other backends, even though all of the examples show CLs-like / p-value-like values being returned.

Example: Run the docstring example from this PR for expected_pvalues for various backends and note the return type of CLs_exp_band[0].

import pyhf

for backend in ["numpy", "pytorch", "jax"]:
    print(f"\nbackend: {backend}")
    pyhf.set_backend(backend)
    model = pyhf.simplemodels.hepdata_like(
        signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
    )
    observations = [51, 48]
    data = observations + model.config.auxdata
    mu_test = 1.0
    asymptotic_calculator = pyhf.infer.calculators.AsymptoticCalculator(
        data, model, test_stat="qtilde"
    )
    _ = asymptotic_calculator.teststatistic(mu_test)
    sig_plus_bkg_dist, bkg_dist = asymptotic_calculator.distributions(mu_test)
    CLsb_exp_band, CLb_exp_band, CLs_exp_band = asymptotic_calculator.expected_pvalues(
        sig_plus_bkg_dist, bkg_dist
    )
    print(f"CLs expected band: {CLs_exp_band}")
    print(
        f"of type: {type(CLs_exp_band[0])} and shape {pyhf.tensorlib.shape(CLs_exp_band[0])}"
    )

gives


backend: numpy
CLs expected band: [0.0026062609501074576, 0.01382005356161206, 0.06445320535890459, 0.23525643861460702, 0.573036205919389]
of type: <class 'numpy.float64'> and shape ()

backend: pytorch
CLs expected band: [tensor(0.0026), tensor(0.0138), tensor(0.0645), tensor(0.2353), tensor(0.5730)]
of type: <class 'torch.Tensor'> and shape ()

backend: jax
CLs expected band: [DeviceArray(0.00260626, dtype=float64), DeviceArray(0.01382005, dtype=float64), DeviceArray(0.0644532, dtype=float64), DeviceArray(0.23525643, dtype=float64), DeviceArray(0.57303619, dtype=float64)]
of type: <class 'jax.interpreters.xla._DeviceArray'> and shape ()

We should try to come to a consensus on what the return type for a CLs-like / p-value-like value should be. From looking back at PR #944 and Issue #714, I think that CLs-like values should be 0-d tensors, as that lets us sidestep this difference in backend behavior (I believe this is the motivation for the current behavior). While conceptually it makes sense to have a p-value-like quantity just be a float to emphasize its scalar nature, having it be a 0-d tensor still makes it appear to a user to have scalar-like behavior.
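A follow-up along the lines proposed here might normalize returns through the tensorlib (a sketch of the idea, not the eventual fix):

import pyhf

def as_0d_tensor(pvalue):
    # wrap any scalar-like p-value in a 0-d tensor of the active backend,
    # so every backend returns the same type
    return pyhf.tensorlib.astensor(pvalue)

pyhf.set_backend("numpy")
wrapped = as_0d_tensor(0.05)
print(type(wrapped), pyhf.tensorlib.shape(wrapped))  # 0-d array, shape ()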

The validation notebooks here are fine for now and we can revise them when we address Issue #1241.

@kratsg all the notebook stuff is great though. 👍

Review threads: docs/api.rst (resolved), src/pyhf/infer/__init__.py (outdated, resolved), src/pyhf/infer/calculators.py (outdated, resolved)
@matthewfeickert (Member) left a comment

The CLs-like / p-value-like return type problem has now moved to Issue #1268 and will be resolved in a follow-up PR.

@matthewfeickert matthewfeickert added the API Changes the public API label Jan 20, 2021
@kratsg kratsg merged commit fed49ca into master Jan 20, 2021
@kratsg kratsg deleted the fix_for_kate branch January 20, 2021 19:44
Labels: API (Changes the public API), docs (Documentation related), feat/enhancement (New feature or request), fix (A bug fix), tests (pytest)
Development

Successfully merging this pull request may close these issues:

  • Update ToDo on readme to indicate that we've added support for toys
  • Toys giving mysterious results