In [None]:
from nbmetalog import nbmetalog as nbm


In [None]:
nbm.print_metadata()


# Goal

Consider the scenario described in [gene_drive_scenario.ipynb](gene_drive_scenario.ipynb).

Here, we apply the method of maximum likelihood estimation to determine the most likely value of $n$ given an observation of $k$ gene values after fixation.


# Calculating Likelihood

Denote the likelihood of a population size $n$ given our observations as $\mathcal{L}(n|\mathbb{X}_1=x_1, \dots, \mathbb{X}_k=x_k)$ or $\mathcal{L}$ for short.
Because our observations are independent, we can calculate likelihood as a product of probability densities,

$\begin{align*}
\mathcal{L}
&= \prod_{i=1}^k n x_i^{n-1}.
\end{align*}$

Applying a logarithmic transformation for convenience,

$\begin{align*}
\log\mathcal{L}
&= \sum_{i=1}^k \log( n x_i^{n-1} ) \\
&= (n-1) \sum_{i=1}^k \log( x_i ) + k \log(n)
\end{align*}$
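
As a minimal sketch, the simplified log-likelihood above can be evaluated numerically and compared against the direct sum of log densities. The function name `log_likelihood` and the sample values in `xs` are hypothetical, chosen only for illustration.

```python
import math


def log_likelihood(n, xs):
    """Evaluate (n - 1) * sum(log x_i) + k * log(n) for observations xs."""
    k = len(xs)
    return (n - 1) * sum(math.log(x) for x in xs) + k * math.log(n)


# Hypothetical observations, for illustration only.
xs = [0.9, 0.8, 0.95]
n = 4

# Direct sum of log densities log(n * x_i^(n - 1)), before simplification.
direct = sum(math.log(n * x ** (n - 1)) for x in xs)

# The simplified and direct forms should agree up to floating-point error.
assert math.isclose(log_likelihood(n, xs), direct)
```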


# Finding Likelihood Extrema

To maximize $\log\mathcal{L}$ with respect to $n$, solve for $n$ where $\frac{\mathrm{d}}{\mathrm{d}n}\log\mathcal{L} = 0$,

$\begin{align*}
0
&= \frac{\mathrm{d}}{\mathrm{d}n}\log\mathcal{L} \\
&= \frac{\mathrm{d}}{\mathrm{d}n} \Big( (n-1) \sum_{i=1}^k \log( x_i ) + k \log(n) \Big)\\
&= \sum_{i=1}^k \log( x_i ) + k/n\\
-k/n &= \sum_{i=1}^k \log( x_i )\\
-k &= n\sum_{i=1}^k \log( x_i )\\
n &= -\frac{k}{\sum_{i=1}^k \log( x_i )}.
\end{align*}$

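Note that $\forall i$, $x_i \leq 1$ so $\log( x_i ) \leq 0$ and $\sum_{i=1}^k \log( x_i ) \leq 0$.
Provided at least one $x_i < 1$, the sum is strictly negative, so the solution for $n$ is positive and well defined.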
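
The closed-form solution can be sketched in a few lines of Python. The function name `mle_n` and its input checks are assumptions made for this illustration, not part of the original notebook; the sketch assumes every observation lies in $(0, 1]$.

```python
import math


def mle_n(xs):
    """Return n = -k / sum(log x_i) for observations xs in (0, 1]."""
    if not xs or any(not 0 < x <= 1 for x in xs):
        raise ValueError("need at least one observation, all in (0, 1]")
    log_sum = sum(math.log(x) for x in xs)
    if log_sum == 0:
        # Every observation equals 1, so the solution is undefined.
        raise ValueError("sum of logs is zero; estimate is undefined")
    return -len(xs) / log_sum


# Hypothetical observations whose logs sum to -1, so n = 3 / 1 = 3.
xs = [math.exp(-0.5), math.exp(-0.25), math.exp(-0.25)]
assert math.isclose(mle_n(xs), 3.0)
```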


# Proving the Extremum is a Maximum

To check that $n = -\frac{k}{\sum_{i=1}^k \log( x_i )}$ maximizes $\log\mathcal{L}$ rather than minimizing it, we must show that $\frac{\mathrm{d}^2}{\mathrm{d}n^2} \log\mathcal{L} < 0$ at this point.

$\begin{align*}
0 &\stackrel{?}{>} \frac{\mathrm{d}^2}{\mathrm{d}n^2} \log\mathcal{L}|_{n = -\frac{k}{\sum_{i=1}^k \log( x_i )}}\\
&\stackrel{?}{>} \frac{\mathrm{d}}{\mathrm{d}n} \Big( \sum_{i=1}^k \log( x_i ) + k n^{-1} \Big) \Big|_{n = -\frac{k}{\sum_{i=1}^k \log( x_i )}}\\
&\stackrel{?}{>} -kn^{-2} |_{n = -\frac{k}{\sum_{i=1}^k \log( x_i )}}\\
&\stackrel{?}{>} -k/n^{2} |_{n = -\frac{k}{\sum_{i=1}^k \log( x_i )}}\\
&\stackrel{?}{>} -k\\
&\stackrel{?}{<} k.
\end{align*}$

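The step from $-k/n^{2}$ to $-k$ multiplies both sides by $n^2 > 0$, which preserves the direction of the inequality.
Because $k$ is our count of 1 or more replicate observations, we have $0 \stackrel{\checkmark}{<} k$.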
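
The concavity argument can also be spot-checked numerically. The sketch below uses hypothetical observations and compares $\log\mathcal{L}$ at the stationary point against nearby and distant values of $n$; it is a sanity check under those assumed inputs, not a substitute for the derivation.

```python
import math

# Hypothetical observations, for illustration only.
xs = [0.9, 0.8, 0.95]
k = len(xs)
log_sum = sum(math.log(x) for x in xs)

# Stationary point of the log-likelihood.
n_hat = -k / log_sum


def log_likelihood(n):
    """Evaluate (n - 1) * sum(log x_i) + k * log(n) for the fixed xs."""
    return (n - 1) * log_sum + k * math.log(n)


def second_derivative(n):
    """Second derivative of the log-likelihood with respect to n."""
    return -k / n ** 2


# The second derivative is negative at the stationary point.
assert second_derivative(n_hat) < 0

# The log-likelihood is lower on both sides of the stationary point.
for factor in (0.5, 0.9, 1.1, 2.0):
    assert log_likelihood(factor * n_hat) < log_likelihood(n_hat)
```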


# Result

We have derived the maximum likelihood estimator for $n$ given $k$ observations of fixed gene magnitude $x_1, x_2, \dots, x_k$ as

$\hat{n}_\mathrm{mle} = -\frac{k}{\sum_{i=1}^k \log( x_i )}$.
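
As a rough check of the estimator, one can simulate observations with density $n x^{n-1}$ on $(0, 1]$ and see whether the estimate lands near the true $n$. The sketch below assumes inverse-transform sampling (the CDF is $x^n$, so $U^{1/n}$ with $U$ uniform has this density); the values of `n_true`, `k`, and the seed are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)  # arbitrary seed, for reproducibility only

n_true = 5   # hypothetical true population size
k = 10_000   # hypothetical number of replicate observations

# Inverse-transform sampling: U ** (1 / n) has density n * x^(n - 1).
# Using 1 - random.random() keeps U in (0, 1], which avoids log(0).
xs = [(1.0 - random.random()) ** (1.0 / n_true) for _ in range(k)]

# Maximum likelihood estimate, n = -k / sum(log x_i).
n_hat = -k / sum(math.log(x) for x in xs)

print(f"true n = {n_true}, estimated n = {n_hat:.3f}")
```

With many replicates the estimate should fall close to `n_true`; with few replicates it can vary considerably.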
