# Appendix D: Predictive Distributions

The purpose of this appendix is to discuss some of the predictive distributions arising from various choices of pre-match features.

## Bernoulli distribution

We start by assuming that each match played by some given team, say team A, within a given season has an outcome governed
by a Bernoulli distribution with constant but unknown win probability $\theta$. For simplicity, we shall omit the usual subscripts that
denote the particular team. We now suppose that the team has played $n$ matches, winning $w$ of them and losing 
$\ell=n-w$. 
Note that a draw may be regarded as half-a-win and half-a-loss
(see [Appendix C](./C_regression_models.ipynb#Bernoulli-distribution "Regression Models: Bernoulli distribution")),
so that $w$ and $\ell$ are adjusted 'counts' that need no longer be integers.

### Bernoulli data likelihood

The joint likelihood of the particular sequence of matches won and lost is therefore
\begin{eqnarray}
p(w,n-w\mid\theta) & = & \theta^w\,(1-\theta)^{n-w}\,.
\end{eqnarray}
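
As a minimal sketch of how the adjusted counts and the likelihood above might be evaluated, the following uses hypothetical helper names (`adjusted_counts`, `bernoulli_likelihood`) and a made-up match record; neither is taken from the main text.

```python
def adjusted_counts(wins, draws, losses):
    """Return (w, n): the adjusted win count and the number of matches,
    counting each draw as half a win and half a loss."""
    n = wins + draws + losses
    w = wins + 0.5 * draws
    return w, n


def bernoulli_likelihood(theta, w, n):
    """Likelihood theta^w * (1 - theta)^(n - w) for 0 < theta < 1.
    The adjusted count w may be fractional."""
    return theta ** w * (1.0 - theta) ** (n - w)


# Hypothetical record (an assumption for illustration): 6 wins, 2 draws, 2 losses.
w, n = adjusted_counts(6, 2, 2)
likelihood_at_half = bernoulli_likelihood(0.5, w, n)
```

Because the exponents are real-valued, nothing in the likelihood requires $w$ to be an integer, which is why the half-win treatment of draws carries through unchanged.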

### Bernoulli prior distribution

We seek a non-informative prior distribution for $\theta$. 
Following Box and Tiao [[1]](#Citations "Citation [1]: ???"), 
we desire a transformation $\phi(\theta)$ such that the likelihood plotted as a function of $\phi$ retains
an approximately constant shape for some fixed $n$ as $w$ varies. A uniform prior for $\phi$ then induces a
non-informative prior for $\theta$. In general, it turns out that the relevant prior is usually inversely
proportional to the square-root of the variance. For the Bernoulli distribution, the variance of a single
outcome is $\theta\,(1-\theta)$, so the non-informative prior is
\begin{eqnarray}
p(\theta) ~\propto~ \frac{1}{\sqrt{\theta\,(1-\theta)}}
& ~\Rightarrow~ & \theta~\sim~\mathtt{Beta}\left(\frac{1}{2},\frac{1}{2}\right)
\,.
\end{eqnarray}
This is Jeffreys' prior.
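
As a sketch of the change of variables, assume the transformation is the arcsine square-root, a standard choice for the Bernoulli case (it is stated here as an assumption rather than derived). A uniform prior on $\phi$ then gives
\begin{eqnarray}
\phi(\theta) ~=~ \arcsin\sqrt{\theta}
& ~\Rightarrow~ & \frac{d\phi}{d\theta} ~=~ \frac{1}{2\sqrt{\theta\,(1-\theta)}}\,,
\\
p(\theta) ~=~ p(\phi)\,\left|\frac{d\phi}{d\theta}\right|
& ~\propto~ & \frac{1}{\sqrt{\theta\,(1-\theta)}}\,.
\end{eqnarray}
The normalising constant is $B(\frac{1}{2},\frac{1}{2})=\Gamma(\frac{1}{2})^2/\Gamma(1)=\pi$.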

### Bernoulli posterior distribution

Multiplying the likelihood by the normalised prior, we obtain the joint density
\begin{eqnarray}
p(\theta,w,n-w) & ~=~ & \frac{\theta^{w-\frac{1}{2}}\,(1-\theta)^{n-w-\frac{1}{2}}}{B(\frac{1}{2},\frac{1}{2})}\,,
\end{eqnarray}
where $B(\alpha,\beta)$ is the Beta function.
Integrating out $\theta$ then gives
\begin{eqnarray}
p(w,n-w) & ~=~ & \int_0^1 p(\theta,w,n-w)\,d\theta
\\&~=~&
\int_{0}^{1}\frac{\theta^{w-\frac{1}{2}}\,(1-\theta)^{n-w-\frac{1}{2}}}{B(\frac{1}{2},\frac{1}{2})}\,d\theta
~=~
\frac{B(w+\frac{1}{2},n-w+\frac{1}{2})}{B(\frac{1}{2},\frac{1}{2})}\,.
\end{eqnarray}

The posterior distribution is therefore given by
\begin{eqnarray}
p(\theta\mid w,n-w) & ~=~ & \frac{p(\theta,w,n-w)}{p(w,n-w)}
~=~\frac{\theta^{w-\frac{1}{2}}\,(1-\theta)^{n-w-\frac{1}{2}}}
{B(w+\frac{1}{2},n-w+\frac{1}{2})}\,,
\\
\Rightarrow \theta\mid w,n-w & ~\sim~ & \mathtt{Beta}\left(w+\frac{1}{2},n-w+\frac{1}{2}\right)\,.
\end{eqnarray}
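
The posterior can be checked numerically. The sketch below, with hypothetical function names and an assumed example record of $w=7$ adjusted wins from $n=10$ matches, evaluates the Beta density via log-Gamma functions from the Python standard library and confirms by a simple midpoint rule that it integrates to approximately one.

```python
import math


def posterior_params(w, n):
    """Beta posterior parameters (a, b) under the Beta(1/2, 1/2) prior."""
    return w + 0.5, n - w + 0.5


def beta_pdf(theta, a, b):
    """Beta(a, b) density at 0 < theta < 1, using log-Gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm
                    + (a - 1.0) * math.log(theta)
                    + (b - 1.0) * math.log(1.0 - theta))


def midpoint_integral(f, steps=20000):
    """Midpoint-rule approximation of the integral of f over (0, 1)."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))


# Assumed example: w = 7 adjusted wins from n = 10 matches.
a, b = posterior_params(7.0, 10)
total = midpoint_integral(lambda t: beta_pdf(t, a, b))
```

The midpoint rule avoids evaluating the density at the endpoints $\theta=0$ and $\theta=1$, where the logarithms are undefined.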

### Bernoulli predictive distribution

Suppose the result of the $(n+1)$-th match is now $X$, where $X=1$ denotes a win and $X=0$ a loss. Then the likelihood of this result is
\begin{eqnarray}
p(X=x\mid\theta) & ~=~ & \theta^{x}\,(1-\theta)^{1-x}\,.
\end{eqnarray}
Hence, the predictive probability of this result is given by
\begin{eqnarray}
p(X=x\mid w,n-w) & ~=~ & \int_0^1 p(X=x\mid\theta)\,p(\theta\mid w,n-w)\,d\theta
\\&~=~&
\int_0^1\frac{\theta^{w+x-\frac{1}{2}}\,(1-\theta)^{n-w-x+\frac{1}{2}}}
{B(w+\frac{1}{2},n-w+\frac{1}{2})}\,d\theta
~=~
\frac{B(w+x+\frac{1}{2},n-w+1-x+\frac{1}{2})}
{B(w+\frac{1}{2},n-w+\frac{1}{2})}
\,.
\end{eqnarray}

In terms of the Gamma function, $\Gamma(\cdot)$, this becomes
\begin{eqnarray}
p(X=x\mid w,n-w) & ~=~ & 
\frac{\Gamma(w+x+\frac{1}{2})\,\Gamma(n-w+1-x+\frac{1}{2})}
     {\Gamma(n+2)}\,
\frac{\Gamma(n+1)}
     {\Gamma(w+\frac{1}{2})\,\Gamma(n-w+\frac{1}{2})}\,.
\end{eqnarray}
For a loss, i.e. $X=0$, the respective probability reduces to
\begin{eqnarray}
p(X=0\mid w,n-w) & ~=~ & \frac{n-w+\frac{1}{2}}{n+1}\,,
\end{eqnarray}
using the recurrence relation $\Gamma(z+1)=z\,\Gamma(z)$.
The corresponding probability of a win is therefore
\begin{eqnarray}
p(X=1\mid w,n-w) & ~=~ & \frac{w+\frac{1}{2}}{n+1}\,,
\end{eqnarray}
such that
\begin{eqnarray}
p(X=x\mid w,n-w) & ~=~ & \frac{(w+\frac{1}{2})^x\,(n-w+\frac{1}{2})^{1-x}}{n+1}\,.
\end{eqnarray}
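
As a hedged numerical check, the sketch below compares the ratio of Beta functions with the closed form for a few assumed $(w,n)$ pairs. The function names are hypothetical and only the Python standard library is used.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def predictive_beta_ratio(x, w, n):
    """Predictive probability as a ratio of Beta functions (x is 0 or 1)."""
    a, b = w + 0.5, n - w + 0.5
    return math.exp(log_beta(a + x, b + 1 - x) - log_beta(a, b))


def predictive_closed_form(x, w, n):
    """Closed form (w + 1/2)^x * (n - w + 1/2)^(1 - x) / (n + 1)."""
    return (w + 0.5) ** x * (n - w + 0.5) ** (1 - x) / (n + 1)


# Assumed test cases: no data, integer counts, and fractional (draw-adjusted) counts.
for w, n in [(0.0, 0), (7.0, 10), (2.5, 4)]:
    for x in (0, 1):
        assert math.isclose(predictive_beta_ratio(x, w, n),
                            predictive_closed_form(x, w, n),
                            rel_tol=1e-9)
```

The two forms agree for fractional $w$ as well, since the Gamma recurrence does not require integer arguments.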


Note that the denominator of $n+1$ corresponds to assuming $n$ observed matches plus a single prior pseudo-match.
Similarly, the numerator $w+\frac{1}{2}$ for a win corresponds to $w$ observed wins plus a prior pseudo-draw, 
i.e. half-a-win.
Likewise, the observed $\ell=n-w$ losses have been *smoothed* by a prior pseudo-draw, i.e. half-a-loss.
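
As an illustration with made-up numbers, suppose a team has 6 wins, 2 draws and 2 losses, so that $w=7$, $\ell=3$ and $n=10$. Then
\begin{eqnarray}
p(X=1\mid w,n-w) & ~=~ & \frac{7+\frac{1}{2}}{10+1} ~\approx~ 0.682\,,
\\
p(X=0\mid w,n-w) & ~=~ & \frac{3+\frac{1}{2}}{10+1} ~\approx~ 0.318\,,
\end{eqnarray}
compared with the unsmoothed win frequency $w/n=0.7$. Before any matches have been played, i.e. $n=w=0$, both probabilities equal $\frac{1}{2}$.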

## Citations

[1] Box and Tiao???.