# Simulating permutation test $P$-values

Suppose that if the null hypothesis is true, the probability distribution of the
data is invariant under some group $\mathcal{G}$, for instance, the reflection group or the symmetric (i.e., permutation) group.

For any pre-specified test statistic $T$, we can estimate a $P$-value by generating uniformly distributed random elements of the orbit of the data under the action of the group (see [Mathematical Foundations](./math-foundations.ipynb) if these notions are unfamiliar).

Suppose we generate $K$ random elements of the orbit.
Let $x_0$ denote the original data; let $\{\pi_k\}_{k=1}^K$ denote IID random elements of
$\mathcal{G}$, and let $x_k = \pi_k(x_0)$, $k=1, \ldots, K$, denote the corresponding $K$ random elements of the
orbit of $x_0$ under $\mathcal{G}$.

An unbiased estimate of the $P$-value, assuming that the random elements are generated uniformly at random (see [Algorithms for Pseudo-Random Sampling](./permute-sample.ipynb) for caveats), is

\begin{equation*} 
  \hat{P} = \frac{\#\{ k>0: T(\pi_k(x_0)) \ge T(x_0)\}}{K}.
\end{equation*}

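Conditional on $x_0$, the events $\{T(\pi_k(x_0)) \ge T(x_0)\}$, $k = 1, \ldots, K$, are IID with probability
$P$ of occurring, so the number of such events has a binomial distribution with parameters $K$ and $P$, and $\hat{P}$ is an unbiased estimate of $P$.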
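As an illustration of $\hat{P}$, here is a minimal sketch for one particular setting that the text does not prescribe: a two-sample problem in which $\mathcal{G}$ is the symmetric group acting on the pooled data and $T$ is the difference in group means. The function names `diff_in_means` and `p_hat` and the data are hypothetical, chosen only for this example. The data are integer-valued so that the comparison $T(\pi_k(x_0)) \ge T(x_0)$ is not affected by floating-point rounding when ties occur.

```python
import numpy as np

def diff_in_means(x, n_treat):
    """Test statistic T: mean of the first n_treat entries minus mean of the rest."""
    return x[:n_treat].mean() - x[n_treat:].mean()

def p_hat(x0, n_treat, K, rng):
    """Fraction of K random permutations of x0 with T at least as large as T(x0)."""
    x0 = np.asarray(x0, dtype=float)
    t0 = diff_in_means(x0, n_treat)
    hits = 0
    for _ in range(K):
        xk = rng.permutation(x0)  # a uniformly random permutation of the pooled data
        if diff_in_means(xk, n_treat) >= t0:
            hits += 1
    return hits / K

rng = np.random.default_rng(12345)
# Made-up, integer-valued data: first 4 entries are "treatment", last 4 "control".
x0 = np.array([12, 9, 15, 11, 7, 8, 6, 10])
estimate = p_hat(x0, n_treat=4, K=2000, rng=rng)
print(estimate)
```

The estimate depends on the seed and on $K$; it is only as good as the pseudo-random permutations behind it.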

Another estimate of $P$, which can also be interpreted as an exact $P$-value for
a randomized test (as discussed below), is

\begin{equation*} 
  \hat{P}' = \frac{\#\{ k \ge 0: T(\pi_k(x_0)) \ge T(x_0)\}}{K+1},
\end{equation*}

where $\pi_0$ is the identity permutation.

The reasoning behind this choice is that, if the null hypothesis is true, the original data are one of the equally likely elements of the orbit of the data, exactly as likely as the elements generated from it.
Thus there are really $K+1$ values that are equally likely if the null is true,
rather than $K$: nature provided one more random permutation, the original data. 
The estimate $\hat{P}'$ is never smaller than $1/(K+1)$.
Some practitioners like this because it never estimates the $P$-value to be zero.
There are other reasons for preferring it, discussed below.
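The estimate $\hat{P}'$ can be computed from the observed test statistic and the $K$ simulated values. The sketch below assumes those values have already been computed; the function name `p_hat_prime` and the numbers are hypothetical.

```python
import numpy as np

def p_hat_prime(t_obs, t_sim):
    """(1 + #{k >= 1: t_sim[k] >= t_obs}) / (K + 1).

    The leading 1 counts the identity permutation pi_0, for which
    T(pi_0(x0)) = T(x0) always satisfies the inequality.
    """
    t_sim = np.asarray(t_sim)
    K = t_sim.size
    return (1 + np.count_nonzero(t_sim >= t_obs)) / (K + 1)

t_sim = [0.5, 2.0, 1.0, 3.0]       # K = 4 made-up simulated values of T
print(p_hat_prime(2.0, t_sim))     # two simulated values are >= 2.0, so (1 + 2) / 5
print(p_hat_prime(10.0, t_sim))    # none are >= 10.0, so the result is 1 / (K + 1)
```

The second call shows the lower bound: even when no simulated value reaches the observed one, the estimate is $1/(K+1)$ rather than zero.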

The estimate $\hat{P}'$ of $P$ is generally biased, however, since $\hat{P}$ is unbiased and 

\begin{equation*} 
   \hat{P}' = \frac{K\hat{P} + 1}{K+1} = \frac{K}{K+1} \hat{P} + \frac{1}{K+1},
\end{equation*}

so

\begin{equation*} 
   \mathbb{E} \hat{P}' = \frac{K}{K+1} P + \frac{1}{K+1} =
   P  + (1-P) \frac{1}{K+1} \ge P, \quad \text{with equality only if } P = 1.
\end{equation*}
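The expectation of $\hat{P}'$ can be checked numerically. The sketch below assumes, as stated earlier, that the number of simulated values at least as large as $T(x_0)$ is binomial with parameters $K$ and $P$, and sums over that distribution in exact rational arithmetic. The function name `expected_p_hat_prime` and the values $P = 1/20$, $K = 99$ are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from math import comb

def expected_p_hat_prime(P, K):
    """Exact mean of (J + 1) / (K + 1) when J ~ Binomial(K, P)."""
    P = Fraction(P)
    return sum(
        comb(K, j) * P**j * (1 - P)**(K - j) * Fraction(j + 1, K + 1)
        for j in range(K + 1)
    )

P, K = Fraction(1, 20), 99
mean = expected_p_hat_prime(P, K)
bias = mean - P

# The closed form derived above: P + (1 - P) / (K + 1).
assert mean == P + (1 - P) / (K + 1)
print(mean, bias)  # 119/2000 and 19/2000
```

For these values the bias is $(1-P)/(K+1) = 19/2000$, which shrinks like $1/K$ as the number of simulated elements grows.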
  