Introduction

Measurements in High Energy Physics (HEP) rely on determining the compatibility of observed collision events with theoretical predictions. The relationship between them is often formalised in a statistical model $f(\bm{x}|\fullset)$ describing the probability of data $\bm{x}$ given model parameters $\fullset$. Given observed data, the likelihood $\mathcal{L}(\fullset)$ then serves as the basis to test hypotheses on the parameters $\fullset$. For measurements based on binned data (histograms), the $\HiFa{}$ family of statistical models has been widely used both in Standard Model measurements intro-HIGG-2013-02 and in searches for new physics intro-ATLAS-CONF-2018-041. In this package, a declarative, plain-text format for describing $\HiFa{}$-based likelihoods is presented that is targeted at reinterpretation and long-term preservation in analysis data repositories such as HEPData intro-Maguire:2017ypu.

HistFactory

Statistical models described using $\HiFa{}$ intro-Cranmer:1456844 center around the simultaneous measurement of disjoint binned distributions (channels) observed as event counts $\channelcounts$. For each channel, the overall expected event rate¹ is the sum over a number of physics processes (samples). The sample rates may be subject to parametrised variations, both to express the effect of free parameters $\freeset$² and to account for systematic uncertainties as a function of constrained parameters $\constrset$. The degree to which the latter can cause a deviation of the expected event rates from the nominal rates is limited by constraint terms. In a frequentist framework these constraint terms can be viewed as auxiliary measurements with additional global observable data $\auxdata$, which paired with the channel data $\channelcounts$ completes the observation $\bm{x} = (\channelcounts,\auxdata)$. In addition to the partition of the full parameter set into free and constrained parameters $\fullset = (\freeset,\constrset)$, a separate partition $\fullset = (\poiset,\nuisset)$ will be useful in the context of hypothesis testing, where a subset of the parameters are declared parameters of interest $\poiset$ and the remaining ones as nuisance parameters $\nuisset$.

$$f(\bm{x}|\fullset) = f(\bm{x}|\overbrace{\freeset}^{\llap{\text{free}}},\underbrace{\constrset}_{\llap{\text{constrained}}}) = f(\bm{x}|\overbrace{\poiset}^{\rlap{\text{parameters of interest}}},\underbrace{\nuisset}_{\rlap{\text{nuisance parameters}}})$$

Thus, the overall structure of a $\HiFa{}$ probability model is a product of the analysis-specific model term describing the measurements of the channels and the analysis-independent set of constraint terms:

$$\begin{aligned} f(\channelcounts, \auxdata \,|\,\freeset,\constrset) = \underbrace{\color{blue}{\prod_{c\in\mathrm{\,channels}} \prod_{b \in \mathrm{\,bins}_c}\textrm{Pois}\left(n_{cb} \,\middle|\, \nu_{cb}\left(\freeset,\constrset\right)\right)}}_{\substack{\text{Simultaneous measurement}\\ \text{of multiple channels}}} \underbrace{\color{red}{\prod_{\singleconstr \in \constrset} c_{\singleconstr}(a_{\singleconstr} |\, \singleconstr)}}_{\substack{\text{constraint terms}\\ \text{for }\unicode{x201C}\text{auxiliary measurements}\unicode{x201D}}}, \end{aligned}$$

where within a certain integrated luminosity we observe $n_{cb}$ events given the expected rate of events $\nu_{cb}(\freeset,\constrset)$ as a function of unconstrained parameters $\freeset$ and constrained parameters $\constrset$. The latter have corresponding one-dimensional constraint terms $c_\singleconstr(a_\singleconstr|\,\singleconstr)$ with auxiliary data $a_\singleconstr$ constraining the parameter $\singleconstr$. The event rates $\nu_{cb}$ are defined as

$$\nu_{cb}\left(\fullset\right) = \sum_{s\in\mathrm{\,samples}} \nu_{scb}\left(\freeset,\constrset\right) = \sum_{s\in\mathrm{\,samples}}\underbrace{\left(\prod_{\kappa\in\,\bm{\kappa}} \kappa_{scb}\left(\freeset,\constrset\right)\right)}_{\text{multiplicative modifiers}}\, \Bigg(\nu_{scb}^0\left(\freeset, \constrset\right) + \underbrace{\sum_{\Delta\in\bm{\Delta}} \Delta_{scb}\left(\freeset,\constrset\right)}_{\text{additive modifiers}}\Bigg)\,.$$

The total rates are the sum over sample rates $\nu_{scb}$, each determined from a nominal rate $\nu_{scb}^0$ and a set of multiplicative and additive rate modifiers, denoted $\bm{\kappa}(\fullset)$ and $\bm{\Delta}(\fullset)$ respectively. Each modifier is a function of one or more (usually a single) model parameters. Starting from constant nominal rates, one can derive the per-bin event rate modification by iterating over all sample rate modifications as shown in eqn:sample_rates.
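As a rough illustration of the rate formula, the sketch below computes the expected rate of a single bin from nominal sample rates and already-evaluated modifier values. The function names and the numbers are hypothetical and chosen only for this example; in a real model each modifier value would itself be a function of the parameters.

```python
from math import prod


def sample_rate(nominal, multiplicative, additive):
    """Rate of one sample in one bin: (product of kappas) * (nu0 + sum of Deltas)."""
    return prod(multiplicative) * (nominal + sum(additive))


def bin_rate(samples):
    """Total expected rate of one bin: sum of the modified sample rates."""
    return sum(sample_rate(**s) for s in samples)


# Hypothetical inputs: modifier values already evaluated at some parameter point.
samples = [
    # signal sample scaled by a single free normalisation factor of 1.5
    {"nominal": 10.0, "multiplicative": [1.5], "additive": []},
    # background sample with two multiplicative and one additive modifier
    {"nominal": 50.0, "multiplicative": [1.02, 0.98], "additive": [2.0]},
]

nu_cb = bin_rate(samples)
print(nu_cb)
```

A sample without modifiers keeps its nominal rate, because the empty product is 1 and the empty sum is 0.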

As summarised in tab:modifiers_and_constraints, rate modifications are defined in $\HiFa{}$ for bin b, sample s, channel c. Each modifier is represented by a parameter ϕ ∈ {γ, α, λ, μ}. By convention bin-wise parameters are denoted with γ and interpolation parameters with α. The luminosity λ and scale factors μ affect all bins equally. For constrained modifiers, the implied constraint term is given as well as the necessary input data required to construct it. σ_b corresponds to the relative uncertainty of the event rate, whereas δ_b is the event rate uncertainty of the sample relative to the total event rate ν_b = ∑_sν_sb⁰.

Modifiers implementing uncertainties are paired with a corresponding default constraint term on the parameter limiting the rate modification. The available modifiers may affect only the total number of expected events of a sample within a given channel, i.e. only change its normalisation, while holding the distribution of events across the bins of a channel, i.e. its “shape”, invariant. Alternatively, modifiers may change the sample shapes. Here $\HiFa{}$ supports correlated and uncorrelated bin-by-bin shape modifications. In the former, a single nuisance parameter affects the expected sample rates within the bins of a given channel, while the latter introduces one nuisance parameter for each bin, each with its own constraint term. For the correlated shape and normalisation uncertainties, $\HiFa{}$ makes use of interpolating functions, f_p and g_p, constructed from a small number of evaluations of the expected rate at fixed values of the parameter α³. For the remaining modifiers, the parameter directly affects the rate.
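The text does not fix a particular form for the interpolating functions, so the following is only a sketch under an assumption: a piecewise-linear form for the additive f_p and a piecewise-exponential form for the multiplicative g_p, two commonly used choices. The function and argument names are hypothetical. Both satisfy the identity requirement at α = 0 and reproduce the supplied values at α = ±1.

```python
def f_p(alpha, delta_down, delta_up):
    """Additive interpolation (assumed piecewise linear).

    delta_down and delta_up are the rate shifts at alpha = -1 and alpha = +1.
    """
    if alpha >= 0:
        return alpha * delta_up
    return -alpha * delta_down


def g_p(alpha, kappa_down, kappa_up):
    """Multiplicative interpolation (assumed piecewise exponential).

    kappa_down and kappa_up are the scale factors at alpha = -1 and alpha = +1.
    """
    if alpha >= 0:
        return kappa_up ** alpha
    return kappa_down ** (-alpha)


# Hypothetical variations: a shift of +3 / -2 events and a +10% / -10% scaling.
print(f_p(0.5, -2.0, 3.0))
print(g_p(-0.5, 0.9, 1.1))
```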

Modifiers and Constraints

Description	Modification	Constraint Term $c_\singleconstr$	Input
Uncorrelated Shape	κ_scb(γ_b) = γ_b	∏_b Pois(r_b = σ_b⁻² | ρ_b = σ_b⁻² γ_b)	σ_b
Correlated Shape	Δ_scb(α) = f_p(α | Δ_{scb, α = −1}, Δ_{scb, α = 1})	Gaus(a = 0 | α, σ = 1)	Δ_{scb, α = ±1}
Normalisation Unc.	κ_scb(α) = g_p(α | κ_{scb, α = −1}, κ_{scb, α = 1})	Gaus(a = 0 | α, σ = 1)	κ_{scb, α = ±1}
MC Stat. Uncertainty	κ_scb(γ_b) = γ_b	∏_b Gaus(a_{γ_b} = 1 | γ_b, δ_b)	δ_b² = ∑_s δ_sb²
Luminosity	κ_scb(λ) = λ	Gaus(l = λ₀ | λ, σ_λ)	λ₀, σ_λ
Normalisation	κ_scb(μ_b) = μ_b	(none)	(none)
Data-driven Shape	κ_scb(γ_b) = γ_b	(none)	(none)
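To show how the Poisson terms and the constraint terms combine, the sketch below evaluates the logarithm of the model for a hypothetical single channel with two bins. It assumes one signal sample scaled by a free factor `mu`, and one background sample carrying a normalisation uncertainty (parameter `alpha`, with an assumed piecewise-exponential interpolation) and an uncorrelated shape uncertainty (parameters `gammas`). All names and numbers are illustrative.

```python
from math import lgamma, log, pi


def log_poisson(n, rate):
    """log Pois(n | rate); lgamma permits non-integer auxiliary data."""
    return n * log(rate) - rate - lgamma(n + 1.0)


def log_gauss(a, mean, sigma):
    """log Gaus(a | mean, sigma)."""
    return -0.5 * ((a - mean) / sigma) ** 2 - log(sigma) - 0.5 * log(2.0 * pi)


def log_model(mu, alpha, gammas, data):
    """Log of the main Poisson terms times the constraint terms."""
    # Normalisation uncertainty on the background (assumed interpolation form).
    kappa = data["kappa_up"] ** alpha if alpha >= 0 else data["kappa_down"] ** (-alpha)
    total = log_gauss(0.0, alpha, 1.0)  # Gaus(a = 0 | alpha, sigma = 1)
    rows = zip(data["observed"], data["signal"], data["background"], data["sigma"], gammas)
    for n, s, b, sigma_b, gamma_b in rows:
        nu = mu * s + kappa * gamma_b * b        # expected rate in this bin
        total += log_poisson(n, nu)              # main measurement
        tau = sigma_b ** -2                      # auxiliary count r_b = sigma_b^-2
        total += log_poisson(tau, tau * gamma_b) # Pois(r_b | sigma_b^-2 gamma_b)
    return total


# Hypothetical inputs; observed counts equal the nominal prediction at mu = 1.
data = {
    "observed": [55.0, 70.0],
    "signal": [5.0, 10.0],
    "background": [50.0, 60.0],
    "sigma": [0.1, 0.2],
    "kappa_up": 1.1,
    "kappa_down": 0.9,
}
nominal = log_model(1.0, 0.0, [1.0, 1.0], data)
shifted = log_model(1.0, 1.0, [1.0, 1.0], data)
print(nominal, shifted)
```

Because the observed counts here match the nominal prediction, moving any constrained parameter away from its nominal value lowers the log-likelihood, both through the Poisson terms and through the constraint term.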

Given the likelihood $\mathcal{L}(\fullset)$, constructed from observed data in all channels and the implied auxiliary data, measurements in the form of point and interval estimates can be defined. The majority of the parameters are nuisance parameters, i.e. parameters that are not the main target of the measurement but are necessary to correctly model the data. A small subset of the unconstrained parameters may be declared as parameters of interest, for which measurements and hypothesis tests are performed, e.g. using profile likelihood methods intro-Cowan:2010js. The tab:symbol_summary table provides a summary of all the notation introduced in this documentation.
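As a minimal sketch of a profile likelihood method, consider a hypothetical single-bin counting experiment with one parameter of interest `mu` and one Gaussian-constrained nuisance parameter `alpha`. The linear response of the background to `alpha`, the constants, and the ternary-search minimiser are all assumptions made to keep the example self-contained; they are not a statement about how any particular implementation performs the fit.

```python
from math import log

# Hypothetical inputs: signal yield, background yield, relative background uncertainty.
S, B, REL_UNC = 10.0, 50.0, 0.1


def nll(mu, alpha, n):
    """Negative log-likelihood up to parameter-independent constants."""
    nu = mu * S + B * (1.0 + REL_UNC * alpha)
    return nu - n * log(nu) + 0.5 * alpha ** 2


def profile_alpha(mu, n, lo=-5.0, hi=5.0, iters=200):
    """Minimise over alpha at fixed mu by ternary search (the NLL is convex in alpha)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if nll(mu, m1, n) < nll(mu, m2, n):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def q(mu, n):
    """Profile likelihood ratio test statistic, -2 ln lambda(mu)."""
    # In this toy model the global minimum is known analytically:
    # nu = n and alpha = 0 minimise both terms at once.
    mu_hat, alpha_hat = (n - B) / S, 0.0
    return 2.0 * (nll(mu, profile_alpha(mu, n), n) - nll(mu_hat, alpha_hat, n))


n_obs = 65.0
print(q(0.0, n_obs), q(1.5, n_obs))
```

Profiling lets the nuisance parameter absorb part of the discrepancy between data and the tested hypothesis, so the test statistic is never larger than the one obtained with `alpha` held at its nominal value.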

Symbol Notation

Symbol	Name
$f(\bm{x} | \fullset)$	model
$\mathcal{L}(\fullset)$	likelihood
$\bm{x} = \{\channelcounts, \auxdata\}$	full dataset (including auxiliary data)
$\channelcounts$	channel data (or event counts)
$\auxdata$	auxiliary data
$\nu(\fullset)$	calculated event rates
$\fullset = \{\freeset, \constrset\} = \{\poiset, \nuisset\}$	all parameters
$\freeset$	free parameters
$\constrset$	constrained parameters
$\poiset$	parameters of interest
$\nuisset$	nuisance parameters
$\bm{\kappa}(\fullset)$	multiplicative rate modifier
$\bm{\Delta}(\fullset)$	additive rate modifier
$c_\singleconstr(a_\singleconstr | \singleconstr)$	constraint term for constrained parameter $\singleconstr$
$\sigma_\singleconstr$	relative uncertainty in the constrained parameter

Declarative Formats

While flexible enough to describe a wide range of LHC measurements, the design of the $\HiFa{}$ specification is sufficiently simple to admit a declarative format that fully encodes the statistical model of the analysis. This format defines the channels, all associated samples, their parameterised rate modifiers and implied constraint terms as well as the measurements. Additionally, the format represents the mathematical model, leaving the implementation of the likelihood minimisation to be analysis-dependent and/or language-dependent. Originally XML was chosen as a specification language to define the structure of the model while introducing a dependence on $\Root{}$ to encode the nominal rates and required input data of the constraint terms intro-Cranmer:1456844. Using this specification, a model can be constructed and evaluated within the $\RooFit{}$ framework.

This package introduces an updated form of the specification based on the ubiquitous plain-text JSON format and its schema-language JSON Schema. Described in more detail in sec:likelihood, this schema fully specifies both structure and necessary constrained data in a single document and thus is implementation independent.
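To give a feel for what such a single-document description might look like, the sketch below builds a small model description as a Python dictionary and serialises it to JSON. The key names, modifier type strings and values are assumptions made for illustration only; the authoritative definition is the JSON Schema described in sec:likelihood.

```python
import json

# Hypothetical document: one channel with two bins, two samples,
# the observed counts, and one measurement naming the parameter of interest.
spec = {
    "channels": [
        {
            "name": "signal_region",
            "samples": [
                {
                    "name": "signal",
                    "data": [5.0, 10.0],  # nominal rates per bin
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {
                    "name": "background",
                    "data": [50.0, 60.0],
                    "modifiers": [
                        {
                            "name": "bkg_norm",
                            "type": "normsys",
                            "data": {"hi": 1.1, "lo": 0.9},  # values at alpha = +1 / -1
                        }
                    ],
                },
            ],
        }
    ],
    "observations": [{"name": "signal_region", "data": [55.0, 70.0]}],
    "measurements": [{"name": "example", "config": {"poi": "mu", "parameters": []}}],
    "version": "1.0.0",
}

text = json.dumps(spec, indent=2)  # plain-text form suitable for archiving
restored = json.loads(text)        # any JSON-capable language can read it back
print(text)
```

Since the document carries both the structure and the numerical inputs, it can be read back without any dependence on the software that produced it.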

Additional Material

Footnotes

¹ Here rate refers to the number of events expected to be observed within a given data-taking interval defined through its integrated luminosity. It often appears as the input parameter to the Poisson distribution, hence the name “rate”.
² These free parameters frequently include the signal strength of a given process, i.e. its cross-section normalised to a particular reference cross-section such as that expected from the Standard Model or a given BSM scenario.
³ This is usually constructed from the nominal rate and measurements of the event rate at α = ±1, where the value of the modifier at α = ±1 must be provided and the value at α = 0 corresponds to the identity operation of the modifier, i.e. f_p(α = 0) = 0 and g_p(α = 0) = 1 for additive and multiplicative modifiers respectively. See Section 4.1 in intro-Cranmer:1456844.

Bibliography

bib/docs.bib
