Pruning of gammas #662

alexander-held · 2019-12-01T03:00:40Z

Description

HistFactory includes a threshold setting for the size of statistical uncertainties related to predicted distributions, for example

<StatErrorConfig RelErrorThreshold="0.01" ConstraintType="Poisson" />

taken from here.
For consistency with other frameworks, it would be convenient if that feature was supported in pyhf and no gammas would be created in bins where the total relative statistical uncertainty of the model per bin is below the threshold.

Is your feature request related to a problem? Please describe.

A small inconsistency is introduced when comparing a fit performed within pyhf to a fit performed with other frameworks that support this pruning. The difference generally should be small, as the purpose of the pruning is to remove parameters that do not matter. Excluding all inconsistencies helps track down the differences in fit results when they are observed.

Describe the solution you'd like

I would like gammas to be created only for bins above the threshold.

Describe alternatives you've considered

An alternative would be manually pruning of the model created by pyhf.

Relevant Issues and Pull Requests

None that I am aware of

Additional context

None

The text was updated successfully, but these errors were encountered:

lukasheinrich · 2019-12-01T05:28:03Z

@alexander-held can you remind me of what ROOT does? does it just not create those pars, or just set them constant (the latter would be easier for us to implement)

alexander-held · 2019-12-02T17:44:45Z

ROOT creates these parameters and sets them to constant, this is implemented in HistoToWorkspaceFactoryFast::createStatConstraintTerms(), a few lines down towards the end of the function.

lukasheinrich · 2019-12-02T17:52:48Z

I think in this case it's quite straight forward using #653

kratsg · 2019-12-02T21:47:33Z

We'd have to know which parameters to set to constant, but it probably shouldn't be strictly up to pyhf to determine how to "prune" (is it really pruning if it's still involved in the fit, but fixed?)

alexander-held · 2019-12-02T22:06:32Z

There is a default threshold of 5% in ROOT here in StatErrorConfig, though I think setting the default to 0% is a good idea. In general I think it would be convenient to have some function for the user to overwrite the default threshold, and the threshold is then applied when the likelhood is built.

lukasheinrich · 2019-12-02T22:25:18Z

Since this is part of the HiFa spec, we should add it. possibly on a
channel-by-channel basis

{
"name": "mychannel",
"samples": [ .... ],
"config": {
  "staterror": {"type": "poisson", "relthresh": 0.05}
}

lukasheinrich · 2019-12-02T22:26:12Z

(though thinking of it, it's not really a channel-level setting, singe the NPs that are affected by it can be shared, but it could just be part of the reduction of the paramsets)

kratsg · 2019-12-03T05:09:45Z

(though thinking of it, it's not really a channel-level setting, singe the NPs that are affected by it can be shared, but it could just be part of the reduction of the paramsets)

I've been thinking more. Part of my problem is that adding something like this is a bit magical. What should be done is to add the StatErrorConfig as part of the sample (which is... a bit annoying to do since we only want one copy of the configuration).

The stat error isn't really that shared, although how does it work if channel A ttbar has a high staterr and channel B ttbar has a low staterr -- but I guess it's a per-channel basis, but it can be shared and agh...

Can we perhaps make pruning part of something more higher-level than rather as part of the fit/model-building procedure? e.g. the pruning should perhaps be done on the spec, before loading it into a workspace/model...

lukasheinrich · 2019-12-03T09:58:44Z

no they are per-channel (but shared across samples) so what I said above, that they can be shared across channels is wrong (I miixed this up with shapesys)..

alexander-held · 2019-12-03T14:22:50Z

I don't think the spec needs to show what ends up pruned and what not. The pruning can be applied when building the likelihood. To achieve that, the threshold either needs to be in the spec, or specified via a function when calling pyhf (though that would make it different from ROOT, where it is in the spec).

If there are multiple channels with very different stat. uncertainties, still a single threshold is in principle enough, it might cause everything in one channels to be pruned but not in another. I guess it is possible to assign different thresholds per channel, although I am not sure why that would be desired.

There are setups where people want to split gammas between samples, though then that would be done upstream in the framework putting together the workspace (via shapesys), so I think this is unrelated to this issue.

alexander-held · 2020-11-17T12:17:26Z

I just ran into this limitation in another context. I have two ROOT workspaces that I want to combine and edit with pyhf. To achieve that, I convert to json, run pyhf combine, and convert back to xmls. The input workspaces have the 1% pruning threshold specified, and this information is lost during conversion to json. Some staterror modifiers consequently get dropped from the model config (I think because the default 5% threshold is applied) and will be pruned in subsequent fits of the combined ROOT workspace. As a workaround, it is possible to manually edit the xmls to recover the behavior.

It was not clear to me at first that this was the explanation for what I was seeing, so I thought it would be good to put it here in case other people run into the same issue.

alexander-held · 2024-07-18T22:01:42Z

Here's an example for how to work around this with a list of parameters that can be held constant in fits:

fixed = model.config.suggested_fixed()
idx = 0
for i, parameter in enumerate(model.config.par_order):
    if "staterror_" in parameter:
        for width in model.config.param_set(parameter).width():
            if width < 0.01:
                fixed[idx] = True
            idx += 1
    else:
        idx += model.config.param_set(parameter).n_parameters

alexander-held mentioned this issue Dec 1, 2019

Translate analysis examples into new specifications, provide feedback, iterating as necessary iris-hep/project-milestones#10

Open

alexander-held mentioned this issue Nov 17, 2020

add staterror config to spec to have faithful roundtrip #760

Open

alexander-held mentioned this issue Jan 28, 2021

Pruning modifiers per sample and channel #1279

Open

alexander-held mentioned this issue Jun 6, 2022

Error when changing bins in VR scikit-hep/cabinetry#340

Closed

lhenkelm mentioned this issue Oct 28, 2022

Let measurement config specify a sequence of booleans in ['parameter']['fixed'] field #2053

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pruning of gammas #662

Pruning of gammas #662

alexander-held commented Dec 1, 2019

lukasheinrich commented Dec 1, 2019

alexander-held commented Dec 2, 2019 •

edited

Loading

lukasheinrich commented Dec 2, 2019

kratsg commented Dec 2, 2019

alexander-held commented Dec 2, 2019

lukasheinrich commented Dec 2, 2019

lukasheinrich commented Dec 2, 2019

kratsg commented Dec 3, 2019

lukasheinrich commented Dec 3, 2019

alexander-held commented Dec 3, 2019

alexander-held commented Nov 17, 2020 •

edited

Loading

alexander-held commented Jul 18, 2024

Pruning of gammas #662

Pruning of gammas #662

Comments

alexander-held commented Dec 1, 2019

Description

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Relevant Issues and Pull Requests

Additional context

lukasheinrich commented Dec 1, 2019

alexander-held commented Dec 2, 2019 • edited Loading

lukasheinrich commented Dec 2, 2019

kratsg commented Dec 2, 2019

alexander-held commented Dec 2, 2019

lukasheinrich commented Dec 2, 2019

lukasheinrich commented Dec 2, 2019

kratsg commented Dec 3, 2019

lukasheinrich commented Dec 3, 2019

alexander-held commented Dec 3, 2019

alexander-held commented Nov 17, 2020 • edited Loading

alexander-held commented Jul 18, 2024

alexander-held commented Dec 2, 2019 •

edited

Loading

alexander-held commented Nov 17, 2020 •

edited

Loading