# Exercise 7.1: $\sqrt{N}$ Upper Limits
 >__Created__:  29 Sep. 2017 Harrison B. Prosper
 
 >__Updated__:  25 Jun. 2018, adapted for ESHEP 18, HBP


In this exercise, we shall determine the relative
frequency with which statements of the form  $$N + \sqrt{N} > \theta$$
are true in an ensemble of 10,000 experiments, each associated with a _different_ mean count $\theta$.
We assume that each experiment yields a _single_ count $N$. Note that in the real world, unless the phenomenon being
investigated does not exist (in which case the mean count is zero), it is highly unlikely that every
experiment in a random collection of experiments would be associated with _exactly_ the same mean count.

We shall simulate such an ensemble of experiments by
sampling their _mean_ counts from a uniform density,
$\textrm{uniform}(0, b) = 1 \, / \, b$,
with upper limit $b = 3$, so that the mean counts $\theta$ lie between 0 and 3 and average $b \, / \, 2 = 1.5$.

__TRandom3__ will be used to generate the sequences of random numbers. The simulation has two parameters:
   * $N_\textrm{exp}$: number of experiments
   * $b$: range of uniform density
   
Each experiment obtains a count $N$. The statement $$N + \sqrt{N} > \theta$$ is either _True_ or _False_, where $\theta$ is the mean count for that experiment. Ordinarily, we do not know the mean count $\theta$ associated with an experiment. However, in a simulated world we typically do. Therefore, we can determine whether or not each statement is true. In the limit of an infinitely large ensemble of experiments, the relative frequency with which statements of the form $N + \sqrt{N} > \theta$ are true is called the __coverage__ probability. Note: the latter is a property of the *ensemble* to which the statements belong and *not* a property of any given statement. Consequently, if a given statement is *imagined* to be embedded in a different ensemble, then, in general, the coverage probability will change. This is an example of the *reference class problem*. Absolute probabilities do not exist; all are conditional.
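
To make the dependence on the ensemble concrete, the probability that the statement is true can be computed exactly for a *fixed* mean count $\theta$ by summing Poisson probabilities over the counts for which $N + \sqrt{N} > \theta$. The following is a minimal sketch using only the Python standard library; the function name `coverage_at` and the truncation `nmax` are illustrative choices, not part of the original exercise, and the truncation is adequate only for small $\theta$.

```python
from math import exp, sqrt

def coverage_at(theta, nmax=100):
    """Exact P(N + sqrt(N) > theta) for N ~ Poisson(theta), summed up to nmax."""
    term = exp(-theta)           # Poisson probability of N = 0
    total = 0.0
    for n in range(nmax + 1):
        if n + sqrt(n) > theta:  # the statement is true for this count
            total += term
        term *= theta / (n + 1)  # recurrence: P(n + 1) = P(n) * theta / (n + 1)
    return total

# the probability of a true statement differs from one mean count to another
for theta in (0.5, 1.5, 2.5):
    print("theta = %.1f  coverage = %.3f" % (theta, coverage_at(theta)))
```

Since a count of $N = 0$ gives $N + \sqrt{N} = 0$, the statement is always false for such experiments, which is why the probability is low when $\theta$ is small. An ensemble dominated by small mean counts would therefore have a different coverage probability from one dominated by larger mean counts.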

__The Frequentist Principle__ The goal of frequentist analyses is to guarantee the following: over an (infinite) ensemble of statements, *which could be about different things*, a minimum fraction, CL, of these statements are true. The CL is called the __confidence level__. The clever thing is to invent procedures in which the CL is specified _a priori_. For example, for a Gaussian random variable $x$ with standard deviation $\sigma$, statements of the form $\mu \in [x - \sigma, x + \sigma]$, where $\mu$ is the mean of the Gaussian, which can vary from one experiment to the next, are true 68.3% of the time.
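
The Gaussian example can be checked numerically. The sketch below, which uses only the Python standard library, computes the one-sigma probability from the error function and then simulates an ensemble in which every experiment has a different mean $\mu$. The seed, the number of trials, and the range from which $\mu$ is drawn are arbitrary assumptions made for illustration.

```python
import random
from math import erf, sqrt

# exact probability that a Gaussian variate lies within one sigma of its mean
exact = erf(1.0 / sqrt(2.0))

rng = random.Random(12345)   # arbitrary seed, for reproducibility only
n_trials = 20000
sigma = 1.0
hits = 0
for _ in range(n_trials):
    mu = rng.uniform(-10.0, 10.0)    # a different mean for every experiment
    x = rng.gauss(mu, sigma)         # the single observation of that experiment
    if x - sigma <= mu <= x + sigma:
        hits += 1                    # the statement about mu is true

fraction = hits / float(n_trials)
print("exact = %.4f  simulated = %.4f" % (exact, fraction))
```

The fraction of true statements should agree with the exact value to within statistical fluctuations, even though the statements concern different values of $\mu$.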


In [1]:
import os, sys
import ROOT
%jsroot off

Welcome to JupyROOT 6.12/06


In [2]:
Nexp = 10000  # number of experiments/statements
b    = 3.0    # range of uniform distribution
ran  = ROOT.TRandom3() # This has a cycle of 2^19937 - 1 ~ 10^6001

__Model the experiments__

In the following, we use a Python programming construct called a __list comprehension__ to create one Python list from another. The syntax is
```python
    alist = [ expression for item in another_list ]
```
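
As a small illustration of the construct (the list `counts` is made-up data, not output from the exercise), each element of the new list is obtained by evaluating the expression on one element of the old list; an optional `if` clause filters the elements.

```python
from math import sqrt

counts = [0, 1, 4, 9]   # made-up counts, for illustration only

# one output element per input element
upper_limits = [n + sqrt(n) for n in counts]
print(upper_limits)     # [0.0, 2.0, 6.0, 12.0]

# an optional condition keeps only some of the elements
nonzero = [n for n in counts if n > 0]
print(nonzero)          # [1, 4, 9]
```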


In [3]:
def performExperiments(Nexp, b, ran):
    from math import sqrt
    
    # generate Nexp mean values, uniformly distributed between 0 and b
    theta = [ran.Uniform(0, b) for i in range(Nexp)]
    
    # generate Nexp experimental outcomes
    N  = [ran.Poisson(mean) for mean in theta]

    # compute upper limits
    U = [x + sqrt(x) for x in N]

    return (theta, N, U)        
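
For readers without a ROOT installation, the same ensemble can be sketched with the Python standard library alone. This is not the notebook's method: `random.Random` replaces `TRandom3`, and the Poisson sampler is Knuth's multiplication method, which is only practical for small means such as those used here. The names `poisson_knuth` and `perform_experiments_stdlib` and the seed are hypothetical choices made for this sketch.

```python
import random
from math import exp, sqrt

def poisson_knuth(mean, rng):
    """Draw one Poisson count using Knuth's multiplication method (small means only)."""
    limit = exp(-mean)
    k = 0
    prod = rng.random()
    while prod > limit:      # keep multiplying uniforms until the product drops below e^(-mean)
        k += 1
        prod *= rng.random()
    return k

def perform_experiments_stdlib(n_exp, b, rng):
    # mean counts, one per experiment, uniform between 0 and b
    theta = [rng.uniform(0.0, b) for _ in range(n_exp)]
    # one observed count per experiment
    counts = [poisson_knuth(mean, rng) for mean in theta]
    # upper limits N + sqrt(N)
    upper = [n + sqrt(n) for n in counts]
    return theta, counts, upper

rng = random.Random(2018)    # arbitrary seed
theta_s, counts_s, upper_s = perform_experiments_stdlib(10000, 3.0, rng)
print(len(theta_s), counts_s[:5])
```

Because a different generator is used, the individual numbers will not match those produced by ROOT; only the statistical properties of the ensemble are expected to agree.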

In [4]:
theta, N, U = performExperiments(Nexp, b, ran)

K   = 10
fmt = ' %5.2f' * K
# print() with a single string argument works in both Python 2 and Python 3
print('theta ' + fmt % tuple(theta[:K]))
print('N     ' + fmt % tuple(N[:K]))
print('U     ' + fmt % tuple(U[:K]))

theta   3.00  0.49  0.85  2.84  0.69  1.45  2.87  2.23  1.62  2.22
N       3.00  0.00  5.00  2.00  1.00  1.00  6.00  3.00  1.00  3.00
U       4.73  0.00  7.24  3.41  2.00  2.00  8.45  4.73  2.00  4.73


__Analyze results of experiments__ 

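The relative frequency of true statements is $p = k \, / \, n$, where $k$ is the number of true statements among the $n$ experiments. A rough measure of its uncertainty is $\sqrt{n p (1 - p)} \, / \, n$.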
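
The uncertainty estimate follows from treating $k$ as a binomial count with variance $n p (1 - p)$. Written out, with illustrative values $n = 10^4$ and $p \approx 0.6$ (assumed here only to indicate the order of magnitude):

$$\delta p = \frac{\sqrt{\textrm{Var}(k)}}{n} = \frac{\sqrt{n p (1 - p)}}{n} = \sqrt{\frac{p (1 - p)}{n}} \approx \sqrt{\frac{0.6 \times 0.4}{10^4}} \approx 0.005.$$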

In [5]:
def computeCoverage(theta, U): 
    from math import sqrt
    
    # number of experiments
    n = len(theta)
    
    # flag each statement U > theta as True or False
    t  = [ U[i] > theta[i] for i in range(n) ]
    
    # compute coverage fraction (i.e., fraction of true statements)
    k  = sum(t)
    p  = float(k)/n
    
    # since we have k true statements out of n, this is a binomial
    # problem with variance n*p*(1-p). Therefore, a rough estimate
    # of the uncertainty in p is
    dp = sqrt(n*p*(1-p))/n
    
    return (p, dp)

In [6]:
results = computeCoverage(theta, U)
print("coverage: %8.3f %8.3f" % results)

coverage:    0.617    0.005
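
As a cross-check on the simulated value, the expected coverage of this particular ensemble can be estimated without random numbers by averaging the exact Poisson probability of a true statement over the uniform density of mean counts. The sketch below uses a simple midpoint rule; the function names, the truncation `nmax`, and the number of integration steps are assumptions made for illustration.

```python
from math import exp, sqrt

def coverage_given_mean(theta, nmax=100):
    """Exact P(N + sqrt(N) > theta) for N ~ Poisson(theta), summed up to nmax."""
    term = exp(-theta)            # Poisson probability of N = 0
    total = 0.0
    for n in range(nmax + 1):
        if n + sqrt(n) > theta:
            total += term
        term *= theta / (n + 1)   # P(n + 1) = P(n) * theta / (n + 1)
    return total

def ensemble_coverage(b, steps=3000):
    """Average of coverage_given_mean over uniform(0, b), using the midpoint rule."""
    width = b / float(steps)
    total = sum(coverage_given_mean((i + 0.5) * width) for i in range(steps))
    return total / steps

expected = ensemble_coverage(3.0)
print("expected coverage: %8.3f" % expected)
```

For $b = 3$ the average comes out at about 0.614, so the simulated value of $0.617 \pm 0.005$ agrees with it to well within one standard deviation.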
