In [1]:
from nbmetalog import nbmetalog as nbm


In [2]:
nbm.print_metadata()


context: ci
hostname: 7ab68eaf880c
interpreter: 3.8.12 (default, Jan 15 2022, 18:39:47)  [GCC 7.5.0]
nbcellexec: 2
nbname: maximum_likelihood_popsize_estimator_mean_square_error
nbpath: /opt/hereditary-stratigraph-concept/binder/popsize/maximum_likelihood_popsize_estimator_mean_square_error.ipynb
revision: null
session: e05d9128-34ee-43d5-896d-35a501eb45c6
timestamp: 2022-12-01T11:26:00Z00:00


IPython==7.16.1
keyname==0.4.1
yaml==5.3.1
nbmetalog==0.2.6
re==2.2.1
ipython_genutils==0.2.0
logging==0.5.1.2
zmq==22.3.0
json==2.0.9
ipykernel==5.5.3


# Goal

Compute the mean square error of the maximum likelihood estimator for population size, given $k$ observations of fixed gene magnitude.


# Derivation

Take $\boldsymbol{X}_i$ as a random variable representing a single observation of fixed gene magnitude.
Take $\boldsymbol{X}$ as $\prod_{i=1}^{k} \boldsymbol{X}_i$.

From the definition of mean square error,

$\begin{align*}
\mathrm{MSE}( \hat{n}_\mathrm{mle} )
&=
E\Big[ \Big(n-\hat{n}_\mathrm{mle} \Big)^2 \Big]\\
&=
E\Big[ \Big(n - \frac{-k}{\log(\boldsymbol{X})} \Big)^2 \Big]\\
&=
E\Big[ \Big(n + \frac{k}{\log(\boldsymbol{X})} \Big)^2 \Big]\\
&=
E\Big[ n^2 + \frac{2kn}{\log(\boldsymbol{X})} + \frac{k^2}{\log^2(\boldsymbol{X})} \Big]\\
&=
E\Big[ n^2 \Big] + E\Big[\frac{2kn}{\log(\boldsymbol{X})}\Big] + E\Big[\frac{k^2}{\log^2(\boldsymbol{X})} \Big]\\
&=
n^2 + 2kn \times E\Big[\frac{1}{\log(\boldsymbol{X})}\Big] + k^2 \times E\Big[\frac{1}{\log^2(\boldsymbol{X})} \Big]\\
&=
n^2 + 2kn \int_0^1 \frac{p(x)}{\log(x)} \, \mathrm{d}x + k^2 \int_0^1 \frac{p(x)}{\log^2(x)} \, \mathrm{d}x.
\end{align*}$

From [extrema_product_probability_density_function.ipynb](extrema_product_probability_density_function.ipynb) we have

$$
p(x)
= \frac{(-1)^{k+1} x^{n-1} n^{k} \log^{k-1}(x)}{(k-1)!}.
$$
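
As a quick sanity check, the density above should integrate to one over $(0, 1)$. The sketch below evaluates it with a plain midpoint rule using only the Python standard library. The function names `product_pdf` and `midpoint_integral` are illustrative and are not taken from the linked notebook, and the $(n, k)$ pairs are arbitrary test points.

```python
import math


def product_pdf(x, n, k):
    """Density p(x) of the product of k observations, as given above."""
    sign = (-1) ** (k + 1)
    return (
        sign * x ** (n - 1) * n ** k * math.log(x) ** (k - 1)
        / math.factorial(k - 1)
    )


def midpoint_integral(f, num_steps=100_000):
    """Approximate the integral of f over (0, 1) with the midpoint rule."""
    h = 1.0 / num_steps
    return h * sum(f((i + 0.5) * h) for i in range(num_steps))


for n, k in [(2, 3), (5, 4), (10, 6)]:
    total = midpoint_integral(lambda x: product_pdf(x, n, k))
    print(f"n={n}, k={k}: integral of p(x) ~ {total:.6f}")
```

Each printed value should be close to 1 if the density is correctly normalized.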

So,

$\begin{align*}
\mathrm{MSE}( \hat{n}_\mathrm{mle} )
&=
n^2 + 2kn \int_0^1 \frac{p(x)}{\log(x)} \, \mathrm{d}x + k^2 \int_0^1 \frac{p(x)}{\log^2(x)} \, \mathrm{d}x\\
&=
n^2 + 2kn \int_0^1 \frac{\frac{(-1)^{k+1} x^{n-1} n^{k} \log^{k-1}(x)}{(k-1)!}}{\log(x)} \, \mathrm{d}x + k^2 \int_0^1 \frac{\frac{(-1)^{k+1} x^{n-1} n^{k} \log^{k-1}(x)}{(k-1)!}}{\log^2(x)} \, \mathrm{d}x\\
&=
n^2 + \frac{2(-1)^{k+1} n^{k+1} k}{(k-1)!} \int_0^1 x^{n-1} \log^{k-2}(x) \, \mathrm{d}x + \frac{(-1)^{k+1} k^2 n^{k}}{(k-1)!} \int_0^1  x^{n-1} \log^{k-3}(x) \, \mathrm{d}x.
\end{align*}$

SymPy can perform this integration, but very slowly.
[WolframAlpha](https://www.wolframalpha.com/input?i=%5Cint+x%5E%7Bn-1%7D+%5Clog%5E%7Br%7D%28x%29+%5C%2C+%5Cmathrm%7Bd%7Dx) gives the antiderivative

$\begin{align*}
\int x^{n-1} \log^{r}(x) \, \mathrm{d}x
&=
\frac{ \Gamma(r+1, -n\log(x)) \log^r(x)}{n(-n \log(x))^r}\\
&=
\frac{ (-1)^{r} \Gamma(r+1, -n\log(x)) }{n^{r+1}}.
\end{align*}$

Evaluated between 0 and 1, this becomes

$\begin{align*}
\int_0^1 x^{n-1} \log^{r}(x) \, \mathrm{d}x
&=
\frac{ (-1)^{r} \Gamma(r+1, -n\log(x)) }{n^{r+1}} \Big|_0^1\\
&=
\frac{ (-1)^{r} \Gamma(r+1, -n\log(1)) }{n^{r+1}} - \frac{ (-1)^{r} \Gamma(r+1, -n\log(0)) }{n^{r+1}}\\
&=
\frac{ (-1)^{r} \Gamma(r+1, 0) }{n^{r+1}} - \frac{ (-1)^{r} \Gamma(r+1, \infty) }{n^{r+1}}\\
&=
\frac{ (-1)^{r} \Gamma(r+1) }{n^{r+1}} - \frac{ (-1)^{r} 0 }{n^{r+1}}\\
&=
\frac{ (-1)^{r} \Gamma(r+1) }{n^{r+1}}.
\end{align*}$
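
The closed form can be spot-checked numerically. The following is a minimal sketch using a midpoint rule and `math.gamma` from the standard library. The helper names are illustrative, and the $(n, r)$ pairs are arbitrary test points with non-negative integer $r$, for which the integral converges.

```python
import math


def closed_form(n, r):
    """(-1)^r * Gamma(r + 1) / n^(r + 1), the definite integral derived above."""
    return (-1) ** r * math.gamma(r + 1) / n ** (r + 1)


def numeric_integral(n, r, num_steps=200_000):
    """Midpoint-rule estimate of the integral of x^(n-1) * log(x)^r over (0, 1)."""
    h = 1.0 / num_steps
    total = 0.0
    for i in range(num_steps):
        x = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x ** (n - 1) * math.log(x) ** r
    return h * total


for n, r in [(3, 2), (2, 3), (5, 0), (4, 1)]:
    print(f"n={n}, r={r}: numeric={numeric_integral(n, r):.8f}, "
          f"closed form={closed_form(n, r):.8f}")
```

The two columns should agree to several decimal places, including the sign flip for odd $r$.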

Applying this integration,

$\begin{align*}
\mathrm{MSE}( \hat{n}_\mathrm{mle} )
&=
n^2 + \frac{2(-1)^{k+1} n^{k+1} k}{(k-1)!} \int_0^1 x^{n-1} \log^{k-2}(x) \, \mathrm{d}x + \frac{(-1)^{k+1} k^2 n^{k}}{(k-1)!} \int_0^1  x^{n-1} \log^{k-3}(x) \, \mathrm{d}x\\
&=
n^2 + \frac{2(-1)^{k+1} n^{k+1} k}{(k-1)!} \frac{ (-1)^{k-2} \Gamma(k-2+1) }{n^{k-2+1}} + \frac{(-1)^{k+1} k^2 n^{k}}{(k-1)!} \frac{ (-1)^{k-3} \Gamma(k-3+1) }{n^{k-3+1}}\\
&=
n^2 + \frac{2(-1)^{k+1} n^{k+1} k}{(k-1)!} \frac{ (-1)^{k} \Gamma(k-1) }{n^{k-1}} + \frac{(-1)^{k+1} k^2 n^{k}}{(k-1)!} \frac{ (-1)^{k-1} \Gamma(k-2) }{n^{k-2}}\\
&=
n^2 + \frac{2(-1)^{2k+1} n^2 k}{(k-1)!} \Gamma(k-1) + \frac{(-1)^{2k} k^2 n^2}{(k-1)!} \Gamma(k-2)\\
&=
n^2 - \frac{2n^2 k}{(k-1)!} \Gamma(k-1) + \frac{k^2 n^2}{(k-1)!} \Gamma(k-2)\\
&=
n^2 - \frac{2n^2 k}{(k-1)!} (k-2)! + \frac{k^2 n^2}{(k-1)!} (k-3)!\\
&=
n^2 - \frac{2n^2 k}{(k-1)} + \frac{k^2 n^2}{(k-1)(k-2)}\\
&=
n^2\Big(1 - \frac{2k}{(k-1)} + \frac{k^2}{(k-1)(k-2)}\Big)\\
&=
n^2\frac{(k-1)(k-2) - 2k(k-2) + k^2}{(k-1)(k-2)}\\
&=
n^2\frac{k^2-3k+2 - 2k^2+4k + k^2}{(k-1)(k-2)}\\
&=
n^2\frac{k+2}{(k-1)(k-2)}.
\end{align*}$
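
A Monte Carlo simulation offers an independent check on this closed form. The sketch below assumes each $\boldsymbol{X}_i$ has density $n x^{n-1}$ on $(0, 1)$, which is the $k = 1$ case of $p(x)$ above, so it can be sampled as $U^{1/n}$ with $U$ uniform on $(0, 1)$. The function names, seed, and parameter values are illustrative choices. The comparison only makes sense for $k > 2$, and the simulated estimate is noisy for small $k$ because the squared error has infinite variance when $k \leq 4$.

```python
import math
import random


def mle_popsize(log_product, k):
    """Maximum likelihood estimate n_hat = -k / log(X)."""
    return -k / log_product


def simulated_mse(n, k, num_replicates=200_000, seed=1):
    """Estimate E[(n - n_hat)^2] by simulation (assumed sampling scheme)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_replicates):
        # log(X) = sum_i log(X_i); with X_i = U^(1/n), log(X_i) = log(U) / n.
        # 1 - rng.random() lies in (0, 1], which avoids log(0).
        log_product = sum(math.log(1.0 - rng.random()) for _ in range(k)) / n
        total += (n - mle_popsize(log_product, k)) ** 2
    return total / num_replicates


def closed_form_mse(n, k):
    """n^2 * (k + 2) / ((k - 1) * (k - 2)), valid for k > 2."""
    return n ** 2 * (k + 2) / ((k - 1) * (k - 2))


n, k = 100, 10
print("simulated:  ", simulated_mse(n, k))
print("closed form:", closed_form_mse(n, k))
```

With these settings the simulated value should land within a few percent of the closed form; increasing `num_replicates` tightens the agreement.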


# Literature Review

[(Terelius et al., 2012)](#terelius2012distributed) give the normalized MSE for the maximum likelihood estimator as

$\begin{align*}
E\Big[ \Big(\frac{n-\hat{n}_\mathrm{mle}}{n} \Big)^2 \Big]
&=
\frac{k^{2}+ k-2}{(k-1)^{2}(k-2)}.
\end{align*}$

So, mean square error follows as

$\begin{align*}
E\Big[ \Big(n-\hat{n}_\mathrm{mle} \Big)^2 \Big]
&= n^2 \frac{k^{2}+ k-2}{(k-1)^{2}(k-2)}\\
&= n^2 \frac{(k-1)(k+2)}{(k-1)^{2}(k-2)}\\
&\stackrel{\checkmark}{=} n^2 \frac{k+2}{(k-1)(k-2)}.
\end{align*}$
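
The factorization step can be confirmed with exact rational arithmetic. This is a small sketch using `fractions.Fraction` from the standard library; the function names are illustrative.

```python
from fractions import Fraction


def normalized_mse_terelius(k):
    """Normalized MSE in the form quoted above: (k^2 + k - 2) / ((k - 1)^2 (k - 2))."""
    return Fraction(k ** 2 + k - 2, (k - 1) ** 2 * (k - 2))


def normalized_mse_derived(k):
    """Normalized MSE in the simplified form: (k + 2) / ((k - 1)(k - 2))."""
    return Fraction(k + 2, (k - 1) * (k - 2))


# Both expressions are undefined at k = 1 and k = 2, so start at k = 3.
assert all(
    normalized_mse_terelius(k) == normalized_mse_derived(k)
    for k in range(3, 1000)
)
print(normalized_mse_derived(3))  # 5/2
```

Because `Fraction` reduces to lowest terms, equality here is exact rather than subject to floating-point tolerance.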


# Result

For $k > 2$, the mean square error of the maximum likelihood estimator for population size given $k$ observations of fixed gene magnitude is

$\begin{align*}
n^2 \frac{k^{2}+ k-2}{(k-1)^{2}(k-2)}.
\end{align*}$


# References

<a
   id="terelius2012distributed"
   href="http://dx.doi.org/10.1109/CDC.2012.6425912">
H. Terelius, D. Varagnolo and K. H. Johansson, "Distributed size estimation of dynamic anonymous networks," 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 2012, pp. 5221-5227, doi: 10.1109/CDC.2012.6425912.
</a>
