The histfactory_dist distribution layout is intentionally designed to be as close as possible to the pyhf JSON standard.
However, there are some differences.
I will use this thread to highlight the (remaining) differences and argue why they exist. I'd be happy to receive feedback on whether you think these are good reasons, or if this should be changed.
If you spot more differences than I list here (it's possible I overlooked some), please feel free to post them as replies so we can discuss!
axes
pyhf usually doesn't define the names of the observables. They are not required for the math, but since HS3 uses named parameters and observables for everything, the axes need to have variable names. Syntactically, this uses the same definitions as the histograms used in data (see also below).
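To make the named-axis idea concrete, here is a minimal sketch of what a channel with one named, regularly binned axis might look like, written as a Python dict. The key names (`axes`, `name`, `min`, `max`, `nbins`) and the observable name `obs_x` are assumptions for illustration, not a statement of the final syntax.

```python
import json

# Hypothetical channel fragment. The key names "axes", "name", "min",
# "max" and "nbins" are assumptions made for this sketch.
channel = {
    "name": "channel1",
    "type": "histfactory_dist",
    "axes": [{"name": "obs_x", "min": 0.0, "max": 10.0, "nbins": 5}],
}


def bin_edges(axis):
    """Return the nbins + 1 edges of a regularly binned axis."""
    width = (axis["max"] - axis["min"]) / axis["nbins"]
    return [axis["min"] + i * width for i in range(axis["nbins"] + 1)]


# Because every axis carries a name, edges can be looked up by observable.
edges = {axis["name"]: bin_edges(axis) for axis in channel["axes"]}
print(json.dumps(edges))
```

The point of the name is that the same observable can be referenced from the data section without relying on positional conventions.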
Staterror
The staterror modifier relies on the template data defining their errors, rather than having the errors listed explicitly.
Listing them explicitly would make staterror completely identical to shapesys (see below), so there is no need to support this case.
Having a modifier that relies on the errors in the template histograms allows easier editing of the histogram stack (adding or removing entries) without having to recalculate the numbers in the modifier definition.
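As a rough illustration of why this helps, the sketch below derives the per-bin relative uncertainty from the template errors of whatever samples are currently in the stack. It assumes the usual HistFactory convention of adding the sample errors in quadrature and dividing by the summed contents. The key names `contents` and `errors` and the sample names are assumptions for this example.

```python
import math

# Hypothetical template histograms; "contents" and "errors" are assumed keys.
samples = [
    {"name": "bkg1", "contents": [100.0, 50.0], "errors": [3.0, 4.0]},
    {"name": "bkg2", "contents": [20.0, 30.0], "errors": [4.0, 3.0]},
]


def staterror_rel_uncertainties(samples):
    """Per-bin relative stat. uncertainty of the summed templates.

    Errors are added in quadrature over the samples and divided by the
    total bin content (assumed convention for this sketch).
    """
    n_bins = len(samples[0]["contents"])
    rel = []
    for b in range(n_bins):
        total = sum(s["contents"][b] for s in samples)
        err = math.sqrt(sum(s["errors"][b] ** 2 for s in samples))
        rel.append(err / total if total > 0 else 0.0)
    return rel


full_stack = staterror_rel_uncertainties(samples)
# Removing a sample needs no manual update of any modifier numbers:
reduced_stack = staterror_rel_uncertainties(samples[:1])
```

With explicitly listed errors, dropping `bkg2` would require recomputing and editing the numbers by hand. Here they simply follow from the remaining templates.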
Different list of modifiers
In principle, the list of modifiers in HistFactory is well-defined. However, there are a few caveats. For example, in HS3, we plan long-term to factorize the definition of constraint terms. This means that the modifiers will, eventually, stop referring to a type of constraint term, because that will be defined externally via the likelihood itself.
This means that, for example, shapesys and shapefactor will be the same modifier, as their only difference lies in the constraint term. Thus, for now, the list of modifiers is reduced to:
normfactor
normsys (= overallsys, but different from normfactor even without constraint, because the nominal value of alpha is 0)
shapefactor (same as shapesys)
staterror (special case of shapefactor, since the constraint is auto-generated from the error contents of the template histograms, see above)
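The distinction between the entries in this list can be sketched without any constraint terms. The functions below are a simplified illustration, not the HS3 or pyhf implementation: normsys is shown with an exponential interpolation between assumed `hi` and `lo` factors, which is one common choice and is an assumption here.

```python
def normfactor(mu):
    """Free overall scale; the nominal value of mu is 1."""
    return mu


def normsys(alpha, hi, lo):
    """Overall scale driven by alpha, whose nominal value is 0.

    Exponential interpolation between lo (alpha = -1) and hi (alpha = +1)
    is assumed for this sketch.
    """
    return hi ** alpha if alpha >= 0 else lo ** (-alpha)


def shapefactor(gammas, nominal):
    """Independent multiplicative factor per bin (same form as shapesys)."""
    return [g * n for g, n in zip(gammas, nominal)]


nominal = [10.0, 20.0]
# At their nominal parameter values, all modifiers leave the yields unchanged.
at_nominal = [
    normfactor(1.0),
    normsys(0.0, hi=1.1, lo=0.9),
    shapefactor([1.0, 1.0], nominal),
]
```

Even with the constraint stripped away, `normfactor(0.0)` gives 0 while `normsys(0.0, ...)` gives 1, which is why the two stay separate in the list. shapesys and shapefactor, by contrast, share the same per-bin function.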
Modifier syntax
All modifiers store their numbers in a struct called data, for consistency.
In order to allow people to use the same implementations to read the template histograms (= "embedded data") as they do to read the data histograms in the data section, some additional syntactical elements are added, such as another layer with values (this is needed to distinguish binned from unbinned data in the data section by adding a type key).
Shape modifiers also have another layer inside data, called vals. This is for consistency, since data is always a dictionary (for all modifiers).
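A small sketch of what this consistency buys a reader implementation follows. Only the `data` struct and the `vals` layer come from the description above. The modifier names, the `hi` and `lo` keys, and the use of `shapesys` as the type string are assumptions made for illustration.

```python
# Hypothetical modifier entries. Only "data" and "vals" are taken from the
# post; all other keys and names are assumptions for this sketch.
modifiers = [
    {"name": "mu", "type": "normfactor"},
    {"name": "lumi_unc", "type": "normsys", "data": {"hi": 1.1, "lo": 0.9}},
    {"name": "shape_unc", "type": "shapesys", "data": {"vals": [0.1, 0.2]}},
]


def modifier_numbers(modifier):
    """Return the numbers of a modifier as a dictionary.

    Since "data" is always a dictionary, no per-type branching on
    list-versus-dict is needed; a modifier without numbers yields {}.
    """
    return modifier.get("data", {})


numbers = {m["name"]: modifier_numbers(m) for m in modifiers}
```

If shape modifiers stored a bare list in `data`, every reader would need a special case for them. The extra `vals` layer avoids that.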
Potential future change: factorization
Right now, histfactory_dist is a combined pdf of multiple channels. However, there is very little added information in the simultaneous pdf collecting the channels. It would be very feasible to split this up into several parts, one pdf per channel, which would allow easier editing and merging of different types of pdfs in the files. In this case, each individual channel could be conceived as a function rather than as a distribution, which would make it much easier to construct stacked models, adding histfactory predictions to other, non-histfactory predictions as signal and background models contributing to the same region.