In [1]:
from IPython.display import display
import math
import mpmath
from nbmetalog import nbmetalog as nbm
import os
import sympy


In [2]:
nbm.print_metadata()


context: ci
hostname: bd28f571e2ee
interpreter: 3.8.10 (default, May 26 2023, 14:05:08)  [GCC 9.4.0]
nbcellexec: 2
nbname: mildest_extrema_popsize_estimator_expected_value
nbpath: /opt/hereditary-stratigraph-concept/binder/popsize/mildest_extrema_popsize_estimator_expected_value.ipynb
revision: null
session: 43544a01-ee12-4f66-8057-6e860fb1333b
timestamp: 2023-11-05T01:04:00Z00:00


IPython==7.16.1
keyname==0.4.1
yaml==5.3.1
mpmath==1.2.1
nbmetalog==0.2.6
sympy==1.5.1
re==2.2.1
ipython_genutils==0.2.0
logging==0.5.1.2
zmq==22.3.0
json==2.0.9
ipykernel==5.5.3


# Goal

Derive the expected value for the mildest extrema estimator for population size $\hat{n}_\mathrm{mue}$.


# Derivation

From [mildest_extrema_popsize_estimator.ipynb](mildest_extrema_popsize_estimator.ipynb), we have

$$
\hat{n}_\mathrm{mue} = \frac{
    \log \Big( 1 - \big(\frac{1}{2}\big)^{1/k} \Big)
}{\log( \min(x_1, x_2, \ldots, x_k) )},
$$

where $p(x_i) = n x_i^{n-1}$ for $x_i \in [0,1]$ and $p(x_i) = 0$ otherwise.

From [mildest_extrema_popsize_estimator.ipynb](mildest_extrema_popsize_estimator.ipynb), we also have

$$
p(x_\min) =
k n x_\min^{n-1} \Big(1 - x_\min^n \Big)^{k-1}.
$$

Working from the definition of expected value,

$\begin{align*}
E(\hat{n}_\mathrm{mue})
&= E \Big(\frac{
    \log \Big( 1 - \big(\frac{1}{2}\big)^{1/k} \Big)
}{\log( \min(x_1, x_2, \ldots, x_k) )} \Big)\\
&= E \Big(\frac{
    \log \Big( 1 - \big(\frac{1}{2}\big)^{1/k} \Big)
}{\log( x_\min )} \Big)\\
&= \int_0^1 \frac{
    \log \Big( 1 - \big(\frac{1}{2}\big)^{1/k} \Big)
}{\log x_\min } \times k n x_\min^{n-1} \Big(1 - x_\min^n \Big)^{k-1} \, \mathrm{d}x_\min\\
&= k n \log \Big( 1 - \big(\frac{1}{2}\big)^{1/k} \Big) \int_0^1 \frac{
1
}{\log x_\min } \times x_\min^{n-1} \Big(1 - x_\min^n \Big)^{k-1} \, \mathrm{d}x_\min\\
&= k n \log \Big( 1 - \big(\frac{1}{2}\big)^{1/k} \Big) \int_0^1 \frac{
x^{n-1}
}{\log x } \Big(1 - x^n \Big)^{k-1} \, \mathrm{d}x.
\end{align*}$

To derive a general form for the integral at hand, we will use computer algebra to evaluate it for the first few values of $k$ and then extrapolate.


In [3]:
def compute_integral(*, k: int,) -> sympy.Expr:
    """Symbolically integrate x^(n-1) (1 - x^n)^(k-1) / log(x) over [0, 1]."""
    x = sympy.Symbol('x', nonnegative=True, real=True,)
    n = sympy.Symbol('n', nonnegative=True, real=True,)

    # integrand left over after factoring constants out of the expected value
    integrand = x ** (n-1) * (1 - x**n)**(k-1) / sympy.log(x)
    result = sympy.integrate(
        integrand,
        (x, 0, 1,),
    ).simplify()
    return result


In [4]:
compute_integral(k=1,)


Integral(x**(n - 1)/log(x), (x, 0, 1))

In [5]:
compute_integral(k=2,)


-log(2)

In [6]:
# disabled in CI due to compute intensity
if 'CI' not in os.environ:
    display(compute_integral(k=3,))


In [7]:
# disabled in CI due to compute intensity
if 'CI' not in os.environ:
    display(compute_integral(k=4,))


In [8]:
# disabled in CI due to compute intensity
if 'CI' not in os.environ:
    display(compute_integral(k=5,))


In [9]:
# disabled in CI due to compute intensity
if 'CI' not in os.environ:
    display(compute_integral(k=6,))


The integral appears to equal the logarithm of the following finite product, whose factors are related to infinite products for $\pi/2$, $e$ and $e^\gamma$,

$$
\prod_{i=1}^{k} i^{(-1)^{i+1} \times {k-1 \choose i-1}}.
$$

See <https://oeis.org/A122214> and <http://oeis.org/A122215>.
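
A sketch of why this product would arise, assuming $k \geq 2$ and $n > 0$. It relies on the Frullani-type identity $\int_0^1 \frac{x^{a-1} - x^{b-1}}{\log x} \, \mathrm{d}x = \log(a/b)$ for $a, b > 0$, which is a standard result brought in here rather than something taken from the source notebook. Writing $c_i = (-1)^{i+1} {k-1 \choose i-1}$, the binomial theorem gives $\big(1 - x^n\big)^{k-1} = \sum_{i=1}^{k} c_i x^{n(i-1)}$, and $\sum_{i=1}^{k} c_i = 0$ whenever $k \geq 2$, so

$$
\begin{aligned}
\int_0^1 \frac{x^{n-1}}{\log x} \Big(1 - x^n \Big)^{k-1} \, \mathrm{d}x
&= \int_0^1 \sum_{i=1}^{k} c_i \frac{x^{ni-1}}{\log x} \, \mathrm{d}x\\
&= \sum_{i=1}^{k} c_i \int_0^1 \frac{x^{ni-1} - x^{n-1}}{\log x} \, \mathrm{d}x\\
&= \sum_{i=1}^{k} c_i \log\Big(\frac{ni}{n}\Big)\\
&= \log\Big( \prod_{i=1}^{k} i^{(-1)^{i+1} \times {k-1 \choose i-1}} \Big).
\end{aligned}
$$

The second line subtracts $x^{n-1} \sum_i c_i = 0$ from the integrand, which is what makes each term integrable near $x = 1$. For $k = 2$ this gives $\log(1/2) = -\log 2$, matching the computer algebra output. For $k = 1$ the coefficients do not sum to zero and the integral diverges at $x = 1$, which is consistent with sympy leaving that case unevaluated. Note also that $n$ cancels, so the integral depends only on $k$.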

So, we have

$\begin{align*}
E(\hat{n}_\mathrm{mue})
&= k n \log \Big( 1 - \big(\frac{1}{2}\big)^{1/k} \Big) \log\Big( \prod_{i=1}^{k} i^{(-1)^{i+1} \times {k-1 \choose i-1}} \Big).
\end{align*}$
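
The identification of the integral with the logarithm of the product can be spot-checked numerically. The sketch below is not part of the original notebook: `integral_midpoint` and `log_product` are hypothetical helper names, and a plain midpoint rule in pure Python stands in for symbolic integration. It samples only a few values of $k \geq 2$ with $n \geq 1$, so it supports the extrapolated pattern rather than proving it.

```python
import math


def integral_midpoint(n: float, k: int, num_steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the integral over [0, 1] of
    x^(n-1) (1 - x^n)^(k-1) / log(x).

    Midpoints never touch x = 0 or x = 1, where log(x) is undefined or zero.
    """
    h = 1.0 / num_steps
    total = 0.0
    for j in range(num_steps):
        x = (j + 0.5) * h
        total += x ** (n - 1) * (1.0 - x ** n) ** (k - 1) / math.log(x)
    return total * h


def log_product(k: int) -> float:
    """Logarithm of prod_{i=1..k} i ** ((-1)**(i+1) * comb(k-1, i-1)),
    computed as a sum of logs."""
    return sum(
        (-1) ** (i + 1) * math.comb(k - 1, i - 1) * math.log(i)
        for i in range(1, k + 1)
    )


# compare the two quantities for the values of k tried symbolically above;
# n = 2 is an arbitrary choice for illustration
for k in range(2, 7):
    print(k, integral_midpoint(2.0, k), log_product(k))
```

Repeating the comparison with a different `n` is a quick way to see that the integral does not depend on $n$.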


# Expected Value as $k$ Increases


In [10]:
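def calculate_coefficient(*, k: int,) -> mpmath.mpf:
    """Coefficient c such that E(n_hat) = c * n, per the derivation above."""
    # product of i ** ((-1)**(i+1) * comb(k-1, i-1)) for i = 1..k
    prod_term = math.prod(
        mpmath.mpf(i) ** (
            mpmath.mpf(-1)**(i+1)
            * mpmath.mpf(math.comb(k-1, i-1))
        )
        for i in range(1, k+1)
    )
    # 1 - (1/2)**(1/k)
    power_term = - mpmath.mpf(0.5)**mpmath.mpf(1/k) + 1
    return k * mpmath.log(prod_term) * mpmath.log(power_term)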


In [11]:
for k in range(1,62,5,):
    print(k, calculate_coefficient(k=k,))


1 0.0
6 1.15816185364435
11 1.08774719366714
16 1.06288548654699
21 1.04979052268657
26 1.04156228852897
31 1.03584557960134
36 1.03160698617298
41 1.02831796253132
46 1.02567845711025
51 1.02350469889704
56 1.02167743222656
61 -11233.5139442267


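The coefficient multiplying $n$ to yield $E(\hat{n}_\mathrm{mue})$ appears to converge toward 1 as $k$ increases. The computation becomes unstable past $k = 60$, likely due to numerical error: around that point the binomial exponents in the product grow larger than $2^{53}$ and can no longer be represented exactly at the default precision.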
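
One way to probe whether the erratic value is a precision artifact is to recompute the coefficient with more working digits. The sketch below is an assumption-laden alternative, not the notebook's method: it swaps mpmath for the standard-library `decimal` module so that it has no dependencies, sums logarithms instead of multiplying large powers, and uses a hypothetical function name `calculate_coefficient_decimal` with an arbitrary default of 100 digits.

```python
import math
from decimal import Decimal, localcontext


def calculate_coefficient_decimal(k: int, digits: int = 100) -> float:
    """Coefficient k * log(prod) * log(1 - (1/2)**(1/k)), evaluated with
    `digits` significant decimal digits and rounded to float at the end."""
    with localcontext() as ctx:
        ctx.prec = digits
        # log of the product, as an alternating binomial-weighted sum of logs;
        # math.comb returns exact integers, so only the logs are rounded
        log_prod = sum(
            (
                (-1) ** (i + 1) * math.comb(k - 1, i - 1) * Decimal(i).ln()
                for i in range(1, k + 1)
            ),
            Decimal(0),
        )
        power_term = 1 - Decimal(2) ** (Decimal(-1) / Decimal(k))
        return float(k * log_prod * power_term.ln())


# same grid of k values as the table above
for k in range(1, 62, 5):
    print(k, calculate_coefficient_decimal(k))
```

If the extra digits matter, the output for small $k$ should agree with the table above while the entry for $k = 61$ should fall back in line with its neighbors.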


# Result

We obtain the expected value of the mildest extrema estimator as

$\begin{align*}
E(\hat{n}_\mathrm{mue})
&= k n \log \Big( 1 - \big(\frac{1}{2}\big)^{1/k} \Big) \log\Big( \prod_{i=1}^{k} i^{(-1)^{i+1} \times {k-1 \choose i-1}} \Big).
\end{align*}$

The bias of this estimator appears to approach 0 as $k$ increases, although numerical issues at large $k$ leave this uncertain.
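
As an independent sanity check, the ratio $E(\hat{n}_\mathrm{mue}) / n$ can be estimated by simulation and compared with the coefficient table above. This is a minimal sketch under the assumption that each $x_i$ is the maximum of $n$ independent uniform draws on $[0, 1]$, so that $x_i^n$ is itself uniform and $\hat{n}_\mathrm{mue} / n$ does not depend on $n$. The function name `simulate_coefficient`, the trial count, and the seed are all hypothetical choices for illustration.

```python
import math
import random


def simulate_coefficient(k: int, num_trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of E(n_hat) / n for the mildest extrema estimator.

    Assumes x_i ** n is uniform on (0, 1], so log(x_min) = log(min_i u_i) / n
    and n_hat / n = log(1 - (1/2)**(1/k)) / log(min_i u_i).
    """
    rng = random.Random(seed)
    numerator = math.log(1.0 - 0.5 ** (1.0 / k))
    total = 0.0
    for _ in range(num_trials):
        # 1 - random() lies in (0, 1], which keeps log() away from zero input
        u_min = min(1.0 - rng.random() for _ in range(k))
        total += numerator / math.log(u_min)
    return total / num_trials


print(simulate_coefficient(6))
```

The estimate is noisy, and for small $k$ the estimator is heavy-tailed, so agreement with the table should only be expected to a couple of decimal places.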
