# **Data Analysis in High Energy Physics: Exercise 1.5 $p$-values**

**Find the number of standard deviations corresponding to $p$-values of 10%, 5%, and 1% for a Gaussian distribution. Consider both one-sided and two-sided $p$-values.**

For a two-sided $p$-value of a Gaussian with mean zero and standard deviation $\sigma$,

$\displaystyle p(x) = P(\left|X\right| \geq x) = 1-\text{erf}\left(\frac{x}{\sqrt{2}\sigma}\right) \equiv \text{erfc}\left(\frac{x}{\sqrt{2}\sigma}\right)$,

setting $x = n\sigma$ gives

$\displaystyle p(n \sigma) = P(\left|X\right| \geq n \sigma) = 1-\text{erf}\left(\frac{n}{\sqrt{2}}\right)$,

thus,

$\displaystyle \text{erf}\left(\frac{n}{\sqrt{2}}\right) = 1 - p(n \sigma)$.

At this point the analytic route ends: the Gaussian integral over finite bounds (the error function) has no closed form in terms of elementary functions, and neither does its inverse, so $n$ must be evaluated numerically.

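```cpp
// p-values to convert into numbers of standard deviations
double pvalues[3]={0.10, 0.05, 0.01};
// Gaussian width, assigned below for the one-sided case
double sigma;
```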

Inverting with the inverse complementary error function, $n = \sqrt{2}\,\text{erfc}^{-1}(p)$:

```cpp
// Two-sided: erfc(n / sqrt(2)) = p  =>  n = sqrt(2) * erfc^{-1}(p)
for(const double &p : pvalues){
    double n = TMath::Sqrt(2.)*TMath::ErfcInverse(p);
    std::cout << n << " standard deviations corresponds to a two-sided p-value of " << p << std::endl;
}
```

and, equivalently, with the inverse error function, $n = \sqrt{2}\,\text{erf}^{-1}(1-p)$:

```cpp
// Two-sided: erf(n / sqrt(2)) = 1 - p  =>  n = sqrt(2) * erf^{-1}(1 - p)
for(const double &p : pvalues){
    double n = TMath::Sqrt(2.)*TMath::ErfInverse(1-p);
    std::cout << n << " standard deviations corresponds to a two-sided p-value of " << p << std::endl;
}
```

Both cells print the same values, as they must, since $\text{erfc}(z) = 1 - \text{erf}(z)$.
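For reference, these are the standard two-sided Gaussian quantiles, rounded to three decimals, so the cells above should print approximately:

$$
\begin{aligned}
p = 0.10 &\;\Rightarrow\; n = \sqrt{2}\,\text{erfc}^{-1}(0.10) \approx 1.645,\\
p = 0.05 &\;\Rightarrow\; n = \sqrt{2}\,\text{erfc}^{-1}(0.05) \approx 1.960,\\
p = 0.01 &\;\Rightarrow\; n = \sqrt{2}\,\text{erfc}^{-1}(0.01) \approx 2.576.
\end{aligned}
$$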

A one-sided $p$-value is the probability of obtaining a value at least as extreme as the one observed on only one side of the distribution: $\displaystyle P\left(X \geq x\right)$ for the right tail, or $\displaystyle P\left(X \leq -x\right)$ for the left tail (with $x > 0$). Because the Gaussian is symmetric about its mean, each tail holds half of the two-sided probability, so a one-sided $p$-value is $1/2$ of the corresponding two-sided $p$-value. For a standard Gaussian ($\sigma = 1$),

$$
\begin{split}
    p(x) = P\left(X \geq x\right) &= 1-\Phi(x) = 1 - \frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^{x} e^{-t^2/2}\,dt\\
    &= 1 - \frac{1}{2}\left(1+\text{erf}\left(\frac{x}{\sqrt{2}}\right)\right)\\
    &= \frac{1}{2}\left(1-\text{erf}\left(\frac{x}{\sqrt{2}}\right)\right) = \frac{1}{2}\text{erfc}\left(\frac{x}{\sqrt{2}}\right)
\end{split}
$$

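For a Gaussian of width $\sigma$ the argument becomes $x/(\sqrt{2}\sigma)$, so for $x = n \sigma$,

$\displaystyle \text{erf}\left(\frac{n}{\sqrt{2}}\right) = 1 - 2\,p(n \sigma)$,

i.e. $n = \sqrt{2}\,\text{erf}^{-1}(1-2p) = \sqrt{2}\,\text{erfc}^{-1}(2p)$, independent of $\sigma$.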

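```cpp
sigma = 1.;  // any width works: n = x/sigma does not depend on sigma

// One-sided: (1/2) erfc(x / (sqrt(2) sigma)) = p  =>  x = sqrt(2) sigma erfc^{-1}(2p)
for(const double &p : pvalues){
    double x = TMath::Sqrt(2.)*sigma*TMath::ErfcInverse(2*p);
    double n = x/sigma;
    std::cout << n << " standard deviations corresponds to a one-sided p-value of " << p << std::endl;
}

std::cout << std::endl;

// Equivalent form: erf(x / (sqrt(2) sigma)) = 1 - 2p
for(const double &p : pvalues){
    double x = TMath::Sqrt(2.)*sigma*TMath::ErfInverse(1-2*p);
    double n = x/sigma;
    std::cout << n << " standard deviations corresponds to a one-sided p-value of " << p << std::endl;
}
```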
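Because a one-sided $p$ corresponds to a two-sided $2p$, the one-sided thresholds are the two-sided ones evaluated at twice the $p$-value; in particular, the one-sided 5% threshold equals the two-sided 10% one. Rounded to three decimals, the standard one-sided values are:

$$
\begin{aligned}
p = 0.10 &\;\Rightarrow\; n = \sqrt{2}\,\text{erfc}^{-1}(0.20) \approx 1.282,\\
p = 0.05 &\;\Rightarrow\; n = \sqrt{2}\,\text{erfc}^{-1}(0.10) \approx 1.645,\\
p = 0.01 &\;\Rightarrow\; n = \sqrt{2}\,\text{erfc}^{-1}(0.02) \approx 2.326.
\end{aligned}
$$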

### Sanity Check

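The two-sided $p$-values for exactly $1\sigma$ through $5\sigma$ should map back to $n = 1, \dots, 5$:

```cpp
// Two-sided p-values P(|X| >= n sigma) for n = 1, ..., 5
double checkvalues[5]={0.317310507863, 0.045500263896, 0.002699796063, 0.000063342484, 0.000000573303};
```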

```cpp
for(const double &p : checkvalues){
    double n = TMath::Sqrt(2.)*TMath::ErfcInverse(p);
    std::cout << n << " standard deviations corresponds to a two-sided p-value of " << p << std::endl;
}
```

$\checkmark$