
Shapesys modifier correlated between samples #1967

Open · 1 task done
alexander-held opened this issue Aug 28, 2022 · 2 comments · Fixed by #1977
Labels: bug (Something isn't working)

Comments

@alexander-held (Member)

Summary

Is there a meaningful way to correlate shapesys modifiers across samples? If not, models where this is done should be flagged as invalid.

This is somewhat related to #1899.

OS / Environment

n/a

Steps to Reproduce

spec = {
    "channels": [
        {
            "name": "SR",
            "samples": [
                {
                    "data": [50],
                    "modifiers": [
                        {
                            "data": [9],
                            "name": "abc",
                            "type": "shapesys",
                        },
                        {
                            "data": None,
                            "name": "Signal strength",
                            "type": "normfactor",
                        },
                    ],
                    "name": "Signal",
                },
                {
                    "data": [150],
                    "modifiers": [
                        {
                            "data": [7],
                            "name": "abc",
                            "type": "shapesys",
                        }
                    ],
                    "name": "Background",
                },
            ],
        }
    ],
    "measurements": [{"config": {"parameters": [], "poi": ""}, "name": "meas"}],
    "observations": [{"data": [160], "name": "SR"}],
    "version": "1.0.0",
}

import pyhf

ws = pyhf.Workspace(spec)
model = ws.model()
data = ws.data(model)
pyhf.set_backend("numpy", "minuit")
fit_result = pyhf.infer.mle.fit(data, model, return_uncertainties=True)
for par_name, par_res in zip(model.config.par_names(), fit_result):
    print(f"{par_name}: {par_res[0]:.3f} +/- {par_res[1]:.3f}")

File Upload (optional)

No response

Expected Results

The script above prints

Signal strength: 0.200 +/- 0.291
abc[0]: 1.000 +/- 0.047

as the result. Changing the shapesys data for the signal sample has no impact on the result at all, while changing it for the background sample does change the result.

I believe only a single Poisson rate is being set, since there is just one parameter controlling both modifiers and that single parameter is what gets constrained. The parameter does seem to scale both samples correctly, but there is only a single constraint term. I do not know whether it would be more sensible to create one constraint term per sample while keeping the parameter effect correlated, or to catch this scenario and raise a warning.
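One way to make this visible (a minimal sketch, reusing the spec from the reproducer above and the same pyhf API calls it uses) is to inspect the model's parameters and auxiliary data; a single auxdata entry corresponds to a single Poisson constraint term backing both correlated shapesys modifiers:

# Sketch: check that only one auxdata entry backs the two correlated shapesys
# modifiers. Assumes `spec` from the reproducer above is already defined.
import pyhf

model = pyhf.Workspace(spec).model()
print(model.config.par_names())  # one gamma parameter, e.g. ['Signal strength', 'abc[0]']
print(model.config.auxdata)      # a single entry -> a single Poisson constraint term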

Actual Results

No warnings are raised about the model being potentially invalid.

pyhf Version

pyhf-0.7.0rc2.dev30

Code of Conduct

  • I agree to follow the Code of Conduct
@alexander-held added the bug (Something isn't working) and needs-triage (Needs a maintainer to categorize and assign) labels on Aug 28, 2022
@matthewfeickert removed the needs-triage (Needs a maintainer to categorize and assign) label on Sep 7, 2022
@kratsg (Contributor)

kratsg commented Dec 5, 2023

Re-opening due to a user request from Luis's talk during the 2023 pyhf workshop.

@kratsg kratsg reopened this Dec 5, 2023
@alexander-held (Member, Author)

After thinking some more about this following a talk at the pyhf workshop (https://indico.cern.ch/event/1294577/contributions/5677127/), I think there is a meaningful way to correlate these modifiers across samples. Conceptually, this would be similar to staterror, but with some important differences in behavior.

A staterror term in a bin only needs a single float to keep track of the auxdata (essentially the constraint term width for the Gaussian). This is because the per-sample uncertainties are all combined in quadrature, and all samples then vary with that total MC statistical uncertainty.
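As a rough illustration (a sketch of the combination described above, not pyhf's actual implementation, borrowing the yields and uncertainties from the example spec purely for concreteness):

# Sketch: how a shared staterror combines per-sample MC uncertainties in one bin.
import math

nominal = [50.0, 150.0]  # per-sample yields in the bin
mc_unc = [9.0, 7.0]      # per-sample MC statistical uncertainties

total_unc = math.sqrt(sum(u**2 for u in mc_unc))  # combined in quadrature
rel_width = total_unc / sum(nominal)              # single Gaussian constraint width

print(f"one gamma parameter, Gaussian constraint width {rel_width:.4f}")  # ~0.057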

For shapesys, the way I am thinking about this would be to not combine uncertainties per sample in the same way, but only to correlate the nuisance parameter. That would be similar to e.g. histosys in a single bin, but histosys always uses a unit Gaussian constraint, so the relevant auxdata is the same no matter which histosys modifiers across samples are correlated; what is sample-specific there is the data in the histosys modifier itself. For a shapesys, the (Poisson) constraint term width would differ per sample, so we would need sample-specific auxdata to track that. As far as I know, this does not currently exist conceptually within pyhf.
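For concreteness, a sketch of how the per-sample Poisson constraint strengths would differ, using the shapesys convention that the auxdata is (nominal / uncertainty)**2 and the numbers from the example spec above:

# Sketch: per-sample shapesys auxdata, (nominal / uncertainty)**2, for the example
# spec. With correlated modifiers each sample would need its own auxdata entry,
# which pyhf does not track today.
samples = {"Signal": (50.0, 9.0), "Background": (150.0, 7.0)}

for name, (nominal, unc) in samples.items():
    aux = (nominal / unc) ** 2
    print(f"{name}: (nominal/unc)**2 = {aux:.1f}")
# Signal: ~30.9, Background: ~459.2 -> the constraint widths differ per sample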
