Skip to content

Latest commit

 

History

History
178 lines (142 loc) · 7.67 KB

File metadata and controls

178 lines (142 loc) · 7.67 KB

Statistics Distributions of Uncertainty

.. todo::

  Intro

.. todo::

  - Uniform
  - Normal/Truncated normal
  - Lognormal
  - Gamma
  - Beta
  - Ensemble
  - Etc.

Log-normal distribution

A random variable X has a log-normal distribution with parameters \mu and \sigma, denoted X\sim \mathrm{Lognorm}(\mu, \sigma^2), if and only if its logarithm Y=\log(X) has a normal distribution with mean \mu and variance \sigma^2. Equivalently, a random variable Y satisfies Y\sim \mathcal{N}(\mu, \sigma^2) if and only if its exponential X = \exp(Y) satisfies X \sim \mathrm{Lognorm}(\mu, \sigma^2).

Defining a log-normal distribution to simulate parameter uncertainty

A lognormal distribution may be an appropriate model of uncertainty for any positive, unbounded quantity X. Rates and relative risks are common model parameters for which a lognormal distribution may be appropriate. If the literature reports a value v for X and an asymmetric 95% confidence interval (a,b) such that v is close to the geometric mean of the endpoints (that is, v \approx \sqrt{ab}, which is to the left of the interval's midpoint), this is a good indication that the uncertainty in X can be modeled by a lognormal distribution with geometric mean v and central 95% interval (a,b).

A lognormal distribution with parameters \mu and \sigma has geometric mean e^\mu, which is also equal to the median of the distribution. Below is Python code to construct a scipy.stats lognormal distribution with a specified geometric mean (median) e^\mu = v = median and approximate central 95% interval (a,b) (where a = lower and b = upper). The interval (a,b) will be exactly the central 95% interval of the distribution if and only if v = \sqrt{ab}.

import numpy as np
from scipy import stats

def lognorm_from_median_lower_upper(median, lower, upper, quantile_ranks=(0.025,0.975)):
  """Returns a frozen lognormal distribution with the specified median, such that
  the values (lower, upper) are approximately equal to the quantiles with ranks
  (quantile_ranks[0], quantile_ranks[1]). More precisely, if q0 and q1 are
  the quantiles of the returned distribution with ranks quantile_ranks[0]
  and quantile_ranks[1], respectively, then q1/q0 = upper/lower. If the
  quantile ranks are symmetric about 0.5, lower and upper will coincide with
  q0 and q1 precisely when median^2 = lower*upper.
  """
  # Let Y ~ Norm(mu, sigma^2) and X = exp(Y), where mu = log(median)
  # so X ~ Lognorm(s=sigma, scale=exp(mu)) in scipy's notation.
  # We will determine sigma from the two specified quantiles lower and upper.

  # mean (and median) of the normal random variable Y = log(X)
  mu = np.log(median)
  # quantiles of the standard normal distribution corresponding to quantile_ranks
  stdnorm_quantiles = stats.norm.ppf(quantile_ranks)
  # quantiles of Y = log(X) corresponding to the quantiles (lower, upper) for X
  norm_quantiles = np.log([lower, upper])
  # standard deviation of Y = log(X) computed from the above quantiles for Y
  # and the corresponding standard normal quantiles
  sigma = (norm_quantiles[1] - norm_quantiles[0]) / (stdnorm_quantiles[1] - stdnorm_quantiles[0])
  # Frozen lognormal distribution for X = exp(Y)
  # (s=sigma is the shape parameter; the scale parameter is exp(mu), which equals the median)
  return stats.lognorm(s=sigma, scale=median)

Note that since the log-normal distribution has only two parameters \mu and \sigma, using three parameters v= median, a= lower, and b= upper to specify the distribution results in an overdetermined system if the specified median v is not exactly equal to the geometric mean of a and b. In ths case, the above code actually determines the distribution's two parameters from the median v and the ratio b/a, and the returned distribution's central 95% interval will only be approximately equal to (a,b).

If the confidence interval (a,b) was in fact generated from a log-normal distribution with the reported value v being an estimate of the geometric mean e^\mu of X, then any discrepancy between v and \sqrt{ab} would be due to rounding errors. In this case, the above function incorporates all available data and generally results in a better fit than using only one of the two estimated quantiles a or b. See Nathaniel's notebook investigating different lognormal distributions for the :ref:`CIFF acute malnutrition project <2019_concept_model_vivarium_ciff_sam>`.

.. todo::

  Investigate more fully what the above algorithm does when there is no
  lognormal distribution matching the three parameters ``median``, ``lower`` and
  ``upper`` for the specified quantile ranks.

  As noted in the docstring, here's a brief description of what the algorithm
  does. Let :math:`v=` ``median``, :math:`a=` ``lower``, and :math:`b=`
  ``upper``, and let :math:`(p_0, p_1) =` ``(quantile_ranks[0],
  quantile_ranks[1])`` be the specified quantile ranks. Then the returned
  lognormal distribution has median :math:`v` and quantiles :math:`a'` and
  :math:`b'` of ranks :math:`p_0` and :math:`p_1` such that :math:`b'/a' = b/a`.

.. todo::

  - Relative risk
  - Mean difference
  - Proportion
  - Cost estimate
  - Etc.

.. todo::

  - How to handle very asymmetric confidence intervals
  - How to handle uncertainty in data source(s) rather than statistical uncertainty from a single high quality data source?
    - Ex: combining multiple estimates from published papers with their own statistical uncertainty
  - How to handle uncertaity when extrapolating a subnataional estimate to a national estimate?
  - How to handle uncertainty distribution in the case of joint distributions