-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pruning of gammas #662
Comments
@alexander-held can you remind me of what ROOT does? does it just not create those pars, or just set them constant (the latter would be easier for us to implement) |
ROOT creates these parameters and sets them to constant, this is implemented in HistoToWorkspaceFactoryFast::createStatConstraintTerms(), a few lines down towards the end of the function. |
I think in this case it's quite straight forward using #653 |
We'd have to know which parameters to set to constant, but it probably shouldn't be strictly up to pyhf to determine how to "prune" (is it really pruning if it's still involved in the fit, but fixed?) |
There is a default threshold of 5% in ROOT here in StatErrorConfig, though I think setting the default to 0% is a good idea. In general I think it would be convenient to have some function for the user to overwrite the default threshold, and the threshold is then applied when the likelhood is built. |
Since this is part of the HiFa spec, we should add it. possibly on a
|
(though thinking of it, it's not really a channel-level setting, singe the NPs that are affected by it can be shared, but it could just be part of the reduction of the paramsets) |
I've been thinking more. Part of my problem is that adding something like this is a bit magical. What should be done is to add the StatErrorConfig as part of the sample (which is... a bit annoying to do since we only want one copy of the configuration). The stat error isn't really that shared, although how does it work if channel A ttbar has a high staterr and channel B ttbar has a low staterr -- but I guess it's a per-channel basis, but it can be shared and agh... Can we perhaps make pruning part of something more higher-level than rather as part of the fit/model-building procedure? e.g. the pruning should perhaps be done on the spec, before loading it into a workspace/model... |
no they are per-channel (but shared across samples) so what I said above, that they can be shared across channels is wrong (I miixed this up with shapesys).. |
I don't think the spec needs to show what ends up pruned and what not. The pruning can be applied when building the likelihood. To achieve that, the threshold either needs to be in the spec, or specified via a function when calling pyhf (though that would make it different from ROOT, where it is in the spec). If there are multiple channels with very different stat. uncertainties, still a single threshold is in principle enough, it might cause everything in one channels to be pruned but not in another. I guess it is possible to assign different thresholds per channel, although I am not sure why that would be desired. There are setups where people want to split gammas between samples, though then that would be done upstream in the framework putting together the workspace (via shapesys), so I think this is unrelated to this issue. |
I just ran into this limitation in another context. I have two ROOT workspaces that I want to combine and edit with pyhf. To achieve that, I convert to json, run It was not clear to me at first that this was the explanation for what I was seeing, so I thought it would be good to put it here in case other people run into the same issue. |
Here's an example for how to work around this with a list of parameters that can be held constant in fits: fixed = model.config.suggested_fixed()
idx = 0
for i, parameter in enumerate(model.config.par_order):
if "staterror_" in parameter:
for width in model.config.param_set(parameter).width():
if width < 0.01:
fixed[idx] = True
idx += 1
else:
idx += model.config.param_set(parameter).n_parameters |
Description
HistFactory includes a threshold setting for the size of statistical uncertainties related to predicted distributions, for example
taken from here.
For consistency with other frameworks, it would be convenient if that feature was supported in pyhf and no gammas would be created in bins where the total relative statistical uncertainty of the model per bin is below the threshold.
Is your feature request related to a problem? Please describe.
A small inconsistency is introduced when comparing a fit performed within pyhf to a fit performed with other frameworks that support this pruning. The difference generally should be small, as the purpose of the pruning is to remove parameters that do not matter. Excluding all inconsistencies helps track down the differences in fit results when they are observed.
Describe the solution you'd like
I would like gammas to be created only for bins above the threshold.
Describe alternatives you've considered
An alternative would be manually pruning of the model created by pyhf.
Relevant Issues and Pull Requests
None that I am aware of
Additional context
None
The text was updated successfully, but these errors were encountered: