differences to pyhf JSON #30

cburgard · 2023-12-04T11:13:53Z

cburgard
Dec 4, 2023
Maintainer

The histfactory_dist distribution layout is intentionally designed to be as close as possible to the pyhf JSON standard.
However, there are some differences.
I will use this thread to highlight the (remaining) differences and argue why they exist. I'd be happy to receive feedback on whether you think these are good reasons, or whether this should be changed.
If you spot more differences than I list here (it's possible I overlooked some), please feel free to post them as replies so we can discuss!

axes
pyhf usually doesn't define names for the observables, since they are not required for the math. However, because HS3 uses named parameters and observables for everything, the axes need to have variable names. Syntactically, this uses the same definitions as the histograms used in the data section (see also below).
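As a rough illustration of what a named axis could look like, here is a small Python sketch. The key names `name`, `min`, `max` and `nbins`, as well as the observable name `obs_x_channel1`, are assumptions made for this example and not a statement of the normative schema; the only point is that each axis carries a variable name.

```python
# Hypothetical axis definition: the key names are assumptions for illustration.
axes = [{"name": "obs_x_channel1", "min": 0.0, "max": 10.0, "nbins": 5}]


def bin_edges(axis):
    """Return the nbins + 1 equidistant bin edges described by one axis."""
    width = (axis["max"] - axis["min"]) / axis["nbins"]
    return [axis["min"] + i * width for i in range(axis["nbins"] + 1)]


# The observable names are read directly off the axes.
observable_names = [axis["name"] for axis in axes]
edges = bin_edges(axes[0])
```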

Staterror
The staterror relies on the template data defining their errors, rather than having the errors listed explicitly.
Listing them explicitly would make staterror completely identical to shapesys (see below), so there is no need to support this case.
Having a modifier that relies on the errors in the template histograms makes it easier to edit the histogram stack (adding or removing entries) without having to recalculate the numbers in the modifier definition.
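To make the editing argument concrete, here is a minimal Python sketch of how the per-bin relative uncertainty could be derived from the template histograms alone. It assumes that every sample in the stack takes part in the staterror and that the errors are combined in quadrature; the key names `contents` and `errors` and the function `relative_stat_errors` are hypothetical.

```python
import math

# Hypothetical template histograms of one channel; key names are assumptions.
samples = [
    {"name": "signal", "data": {"contents": [10.0, 20.0], "errors": [1.0, 2.0]}},
    {"name": "background", "data": {"contents": [90.0, 80.0], "errors": [3.0, 4.0]}},
]


def relative_stat_errors(samples):
    """Per-bin relative uncertainty of the summed stack (errors added in quadrature)."""
    nbins = len(samples[0]["data"]["contents"])
    result = []
    for i in range(nbins):
        total = sum(s["data"]["contents"][i] for s in samples)
        variance = sum(s["data"]["errors"][i] ** 2 for s in samples)
        result.append(math.sqrt(variance) / total if total > 0 else 0.0)
    return result


full_stack = relative_stat_errors(samples)
# Removing a sample needs no change to any modifier definition:
# the numbers are simply recomputed from the remaining templates.
background_only = relative_stat_errors(samples[1:])
```

With explicitly listed errors, the second call would instead require editing the modifier by hand.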

Different list of modifiers
In principle, the list of modifiers in HistFactory is well-defined. However, there are a few caveats. For example, in HS3, we plan long-term to factorize the definition of constraint terms. This means that the modifiers will eventually stop referring to a type of constraint term, because that will be defined externally via the likelihood itself.
As a consequence, shapesys and shapefactor, for example, will be the same modifier, as their only difference lies in the constraint term. Thus, for now, the list of modifiers is reduced to:

normfactor
normsys (= overallsys, but different from normfactor even without constraint, because the nominal value of alpha is 0)
shapefactor (same as shapesys)
staterror (special case of shapefactor, since the constraint is auto-generated from the error contents of the template histograms, see above)
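The distinction between normfactor and normsys can be sketched in a few lines of Python. This is only an illustration: the piecewise-exponential interpolation used for normsys is an assumption (other interpolation schemes exist), and the names `REDUCED_MODIFIER`, `normfactor` and `normsys` as functions are hypothetical helpers for this example.

```python
# Mapping of the names mentioned above onto the reduced list.
REDUCED_MODIFIER = {
    "normfactor": "normfactor",
    "normsys": "normsys",
    "overallsys": "normsys",
    "shapefactor": "shapefactor",
    "shapesys": "shapefactor",
    "staterror": "staterror",
}


def normfactor(mu, nominal):
    """Plain scaling: the prediction is unchanged at mu = 1."""
    return [mu * n for n in nominal]


def normsys(alpha, nominal, hi, lo):
    """Scaling by an interpolated factor: the prediction is unchanged at alpha = 0.

    Assumption: piecewise-exponential interpolation between the lo and hi factors.
    """
    factor = hi ** alpha if alpha >= 0 else lo ** (-alpha)
    return [factor * n for n in nominal]


nominal = [10.0, 20.0]
unchanged_by_normfactor = normfactor(1.0, nominal)
unchanged_by_normsys = normsys(0.0, nominal, hi=1.1, lo=0.9)
```

Even without any constraint term, the two remain different modifiers because their nominal parameter values (1 versus 0) and their functional forms differ.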

Modifier syntax

All modifiers store their numbers in a struct data for consistency.
In order to allow people to use the same implementations to read the template histograms (= "embedded data") as they do to read the data histograms in the data section, some additional syntactical elements are added, such as another layer with values (this is needed in the data section to distinguish binned from unbinned data by adding a type key).
Shape modifiers also have another layer inside data, called vals. This is for consistency, since data is always a dictionary (for all modifiers).
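A hedged sketch of what this could look like for a single sample, written as a Python dictionary. Only the keys `data` and `vals` are taken from the description above; all other key names (`name`, `type`, `modifiers`, `contents`, `errors`, `hi`, `lo`) and all values are assumptions for illustration.

```python
# Hypothetical sample entry: only "data" and "vals" come from the text above,
# every other key name and all numbers are assumptions.
sample = {
    "name": "background",
    "data": {"contents": [90.0, 80.0], "errors": [3.0, 4.0]},
    "modifiers": [
        {"name": "bkg_norm", "type": "normsys", "data": {"hi": 1.1, "lo": 0.9}},
        {"name": "bkg_shape", "type": "shapefactor", "data": {"vals": [0.1, 0.2]}},
    ],
}


def modifier_numbers(modifier):
    """Return the numbers of a modifier, which always live in a dictionary."""
    numbers = modifier["data"]
    if not isinstance(numbers, dict):
        raise TypeError("modifier data is expected to be a dictionary")
    return numbers


all_numbers = [modifier_numbers(m) for m in sample["modifiers"]]
```

Because data is a dictionary in every case, a reader needs only one code path, regardless of the modifier type.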

Potential future change: factorization
Right now, histfactory_dist is a combined pdf of multiple channels. However, there is very little added information in the simultaneous pdf collecting the channels. It would be feasible to split this up into several parts, one pdf per channel, which would allow easier editing and merging of different types of pdfs in the files. In this case, each individual channel could be conceived as a function rather than as a distribution, which would make it much easier to construct stacked models, adding histfactory predictions to other, non-histfactory predictions as signal and background models contributing to the same region.
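The stacking idea can be sketched in Python as follows. This is purely illustrative and not a proposal for the actual syntax: the functions `histfactory_channel`, `other_background` and `stack`, their parameters and all numbers are hypothetical.

```python
def histfactory_channel(mu):
    """Hypothetical per-channel function: expected yields per bin."""
    signal = [5.0, 10.0]
    background = [50.0, 40.0]
    return [mu * s + b for s, b in zip(signal, background)]


def other_background(slope):
    """Hypothetical non-histfactory prediction for the same two bins."""
    return [2.0 + slope * i for i in range(2)]


def stack(*predictions):
    """Add the per-bin yields of several predictions contributing to the same region."""
    return [sum(bins) for bins in zip(*predictions)]


# Both predictions are plain functions of their parameters, so they can be added
# before a single distribution is built on top of the sum.
expected = stack(histfactory_channel(mu=1.0), other_background(slope=1.0))
```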

Replies: 0 comments
