
Pruning of gammas #662

Open

alexander-held opened this issue Dec 1, 2019 · 12 comments
@alexander-held
Member

Description

HistFactory includes a threshold setting for the size of statistical uncertainties related to predicted distributions, for example

<StatErrorConfig RelErrorThreshold="0.01" ConstraintType="Poisson" /> 

taken from a HistFactory XML configuration.
For consistency with other frameworks, it would be convenient if that feature was supported in pyhf and no gammas would be created in bins where the total relative statistical uncertainty of the model per bin is below the threshold.
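For illustration, the per-bin thresholding decision could be sketched as follows. This is a minimal standalone sketch of the logic, not pyhf's or ROOT's actual implementation; the function name and data layout are hypothetical:

```python
import math

def gammas_to_keep(sample_yields, sample_stat_uncs, rel_error_threshold=0.01):
    """Decide per bin whether a gamma (staterror) parameter is needed.

    sample_yields: list of per-sample lists of bin yields
    sample_stat_uncs: matching per-sample lists of absolute stat. uncertainties
    Returns one boolean per bin: True if the total relative statistical
    uncertainty of the model in that bin exceeds the threshold.
    """
    n_bins = len(sample_yields[0])
    keep = []
    for i in range(n_bins):
        total_yield = sum(s[i] for s in sample_yields)
        # uncertainties of independent samples add in quadrature
        total_unc = math.sqrt(sum(u[i] ** 2 for u in sample_stat_uncs))
        rel_unc = total_unc / total_yield if total_yield > 0 else 0.0
        keep.append(rel_unc > rel_error_threshold)
    return keep

# two samples, three bins: only the last bin has a large relative uncertainty
yields = [[100.0, 50.0, 10.0], [20.0, 30.0, 5.0]]
uncs = [[1.0, 0.2, 2.0], [0.5, 0.1, 1.0]]
print(gammas_to_keep(yields, uncs, rel_error_threshold=0.01))
# [False, False, True]
```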

Is your feature request related to a problem? Please describe.

A small inconsistency is introduced when comparing a fit performed with pyhf to a fit performed with other frameworks that support this pruning. The difference should generally be small, as the purpose of the pruning is to remove parameters that do not matter. Eliminating such inconsistencies helps track down differences in fit results when they are observed.

Describe the solution you'd like

I would like gammas to be created only for bins above the threshold.

Describe alternatives you've considered

An alternative would be manual pruning of the model created by pyhf.

Relevant Issues and Pull Requests

None that I am aware of

Additional context

None

@lukasheinrich
Contributor

@alexander-held can you remind me of what ROOT does? Does it just not create those parameters, or does it create them and set them constant (the latter would be easier for us to implement)?

@alexander-held
Member Author

alexander-held commented Dec 2, 2019

ROOT creates these parameters and sets them to constant, this is implemented in HistoToWorkspaceFactoryFast::createStatConstraintTerms(), a few lines down towards the end of the function.

@lukasheinrich
Contributor

I think in this case it's quite straightforward using #653

@kratsg
Contributor

kratsg commented Dec 2, 2019

We'd have to know which parameters to set to constant, but it probably shouldn't be strictly up to pyhf to determine how to "prune" (is it really pruning if it's still involved in the fit, but fixed?)

@alexander-held
Member Author

There is a default threshold of 5% in ROOT's StatErrorConfig, though I think setting the default to 0% is a good idea. In general I think it would be convenient to have some way for the user to override the default threshold, with the threshold then applied when the likelihood is built.

@lukasheinrich
Contributor

Since this is part of the HiFa spec, we should add it, possibly on a channel-by-channel basis:

```json
{
  "name": "mychannel",
  "samples": [ .... ],
  "config": {
    "staterror": {"type": "poisson", "relthresh": 0.05}
  }
}
```
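If the spec grew such a block, a model builder could read the threshold back out along these lines. This is illustrative only; the `config` block and `relthresh` key are the proposal above, not part of the current HiFa spec:

```python
# hypothetical channel spec carrying the proposed staterror config
channel = {
    "name": "mychannel",
    "samples": [],
    "config": {"staterror": {"type": "poisson", "relthresh": 0.05}},
}

# fall back to a 0% threshold when no config is given, so nothing is pruned
staterror_config = channel.get("config", {}).get("staterror", {})
relthresh = staterror_config.get("relthresh", 0.0)
print(relthresh)
# 0.05
```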

@lukasheinrich
Contributor

(though thinking of it, it's not really a channel-level setting, since the NPs that are affected by it can be shared, but it could just be part of the reduction of the paramsets)

@kratsg
Contributor

kratsg commented Dec 3, 2019

> (though thinking of it, it's not really a channel-level setting, since the NPs that are affected by it can be shared, but it could just be part of the reduction of the paramsets)

I've been thinking more. Part of my problem is that adding something like this is a bit magical. What should be done is to add the StatErrorConfig as part of the sample (which is... a bit annoying to do since we only want one copy of the configuration).

The stat error isn't really shared that widely, although how does it work if ttbar in channel A has a high staterror and ttbar in channel B has a low one? I guess it is applied on a per-channel basis, but within a channel it can be shared across samples.

Can we perhaps make pruning part of something higher-level, rather than part of the fit/model-building procedure? E.g. the pruning could be done on the spec, before loading it into a workspace/model...

@lukasheinrich
Contributor

No, they are per-channel (but shared across samples), so what I said above, that they can be shared across channels, is wrong (I mixed this up with shapesys).

@alexander-held
Member Author

I don't think the spec needs to show what ends up pruned and what not. The pruning can be applied when building the likelihood. To achieve that, the threshold either needs to be in the spec, or specified via a function when calling pyhf (though that would make it different from ROOT, where it is in the spec).

If there are multiple channels with very different stat. uncertainties, a single threshold is in principle still enough; it might cause everything in one channel to be pruned but nothing in another. I guess it is possible to assign different thresholds per channel, although I am not sure why that would be desired.

There are setups where people want to split gammas between samples, though then that would be done upstream in the framework putting together the workspace (via shapesys), so I think this is unrelated to this issue.

@alexander-held
Member Author

alexander-held commented Nov 17, 2020

I just ran into this limitation in another context. I have two ROOT workspaces that I want to combine and edit with pyhf. To achieve that, I convert to json, run pyhf combine, and convert back to xmls. The input workspaces have the 1% pruning threshold specified, and this information is lost during conversion to json. Some staterror modifiers consequently get dropped from the model config (I think because the default 5% threshold is applied) and will be pruned in subsequent fits of the combined ROOT workspace. As a workaround, it is possible to manually edit the xmls to recover the behavior.

It was not clear to me at first that this was the explanation for what I was seeing, so I thought it would be good to put it here in case other people run into the same issue.

@alexander-held
Member Author

Here's an example of how to work around this with a list of parameters that can be held constant in fits:

```python
fixed = model.config.suggested_fixed()
idx = 0  # index into the flat parameter vector
for parameter in model.config.par_order:
    if "staterror_" in parameter:
        # fix individual gammas whose relative uncertainty is below 1%
        for width in model.config.param_set(parameter).width():
            if width < 0.01:
                fixed[idx] = True
            idx += 1
    else:
        # skip over all parameters of this (non-staterror) modifier
        idx += model.config.param_set(parameter).n_parameters
```
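To check that the flattened indexing in the loop above lines up with the parameter ordering, it can be exercised against a small stand-in for `model.config` (the classes and values below are hypothetical mocks, not pyhf objects; in a real session the resulting `fixed` list would then be passed as `fixed_params` to e.g. `pyhf.infer.mle.fit`):

```python
# Minimal stand-ins mimicking model.config's par_order / param_set /
# suggested_fixed interface, just to trace the index bookkeeping.
class FakeParamSet:
    def __init__(self, widths):
        self._widths = widths
        self.n_parameters = len(widths)

    def width(self):
        return self._widths

class FakeConfig:
    def __init__(self, params):
        # params: {parameter name: per-parameter widths}, in parameter order
        self.par_order = list(params)
        self._params = {name: FakeParamSet(w) for name, w in params.items()}

    def param_set(self, name):
        return self._params[name]

    def suggested_fixed(self):
        return [False] * sum(p.n_parameters for p in self._params.values())

config = FakeConfig({
    "mu": [1.0],
    "staterror_channel1": [0.005, 0.02, 0.5],  # first gamma below 1%
})

fixed = config.suggested_fixed()
idx = 0
for parameter in config.par_order:
    if "staterror_" in parameter:
        for width in config.param_set(parameter).width():
            if width < 0.01:
                fixed[idx] = True
            idx += 1
    else:
        idx += config.param_set(parameter).n_parameters

print(fixed)
# [False, True, False, False]
```

Only the first gamma (width 0.005, below the 1% threshold) ends up fixed; `mu` and the other gammas stay floating.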
