## Comparing $\widehat \theta_{V}$ with $\widehat \theta_{S}$

In this notebook, we will

1. Generate $N=1000$ samples of size $n=10$ from $B(10,0.5)$
2. For each sample, calculate:
    - $\overline X$
    - $\widehat \theta_{S}$
    - $\widehat \theta_{V}$
3. Average each measure over the 1000 samples to estimate:
    - $E(\overline X)$
    - $E(\widehat \theta_{S})$
    - $E(\widehat \theta_{V})$

After completing the workflow, we should see that:

- $E(\overline X) \simeq \mu = 10(0.5) = 5$
- $E(\widehat \theta_{S}) \simeq \sigma^{2} = 10(0.5)(0.5) = 2.5$
- $E(\widehat \theta_{V}) \simeq \frac{n-1}{n}\sigma^{2} = \frac{9}{10} \cdot 10(0.5)(0.5) = 2.25$
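
This section does not restate the two variance estimators. Judging from the expected values listed above, they are presumably the usual sample variance with divisor $n-1$ and its counterpart with divisor $n$; the definitions below are that assumption written out, not something quoted from the notebook's helper module.

$$
\begin{aligned}
  \widehat \theta_{S} &= \frac{1}{n-1} \sum_{i=1}^{n} (X_{i} - \overline X)^{2}, \\
  \widehat \theta_{V} &= \frac{1}{n} \sum_{i=1}^{n} (X_{i} - \overline X)^{2}.
\end{aligned}
$$

Under these definitions $\widehat \theta_{V} = \frac{n-1}{n} \widehat \theta_{S}$ for every sample, which is where the factor $\frac{n-1}{n}$ in the expected value of $\widehat \theta_{V}$ comes from.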

In [1]:
from scipy.stats import binom
import pandas as pd
import numpy as np
from util.pointestimation import (get_sample, get_samples,
                                  get_description, get_table1)

# The distribution: binomial with 10 trials and success probability 0.5

a_dist = binom(n=10, p=0.5)

Let us first generate a random sample $X$ from $B(10,0.5)$ to illustrate the first step.

In [2]:
get_sample(a_dist, n=10)  # a sample of size 10 from B(10,0.5)

|   | X  | obs |
|---|----|-----|
| 0 | 1  | 5   |
| 1 | 2  | 4   |
| 2 | 3  | 6   |
| 3 | 4  | 5   |
| 4 | 5  | 5   |
| 5 | 6  | 5   |
| 6 | 7  | 5   |
| 7 | 8  | 6   |
| 8 | 9  | 5   |
| 9 | 10 | 7   |


As expected, the observations cluster around the mean $\mu = 5$, so let us move on.
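
The helper `get_sample` comes from the notebook's own `util.pointestimation` module, whose source is not shown here. As a rough idea of what such a helper might do, here is a minimal sketch; the function name `draw_sample` and the `seed` parameter are hypothetical, and only the column names `X` and `obs` are taken from the output above.

```python
import pandas as pd
from scipy.stats import binom


def draw_sample(dist, n, seed=None):
    """Hypothetical stand-in for get_sample: n draws from a frozen distribution."""
    # rvs() on a frozen scipy distribution returns an array of random variates
    obs = dist.rvs(size=n, random_state=seed)
    # Number the observations 1..n, mirroring the X column in the output above
    return pd.DataFrame({"X": range(1, n + 1), "obs": obs})


a_sample = draw_sample(binom(n=10, p=0.5), n=10, seed=0)
print(a_sample)
```

Passing a seed makes the draw reproducible, which is convenient when rerunning the notebook.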

Let us now generate $N=1000$ samples of size $n=10$ from $B(10,0.5)$.
We will calculate $\overline X$, $\widehat \theta_{S}$ and $\widehat \theta_{V}$ for each sample.
The returned table contains these three measures for each sample, with the sample mean $\overline X$ stored in the column `E(X)`.

In [5]:
samples: pd.DataFrame = get_samples(a_dist,
                                    n=10,
                                    N=1000)

samples.head()

|   | E(X) | Theta(S) | Theta(V) |
|---|------|----------|----------|
| 0 | 4.1  | 0.766667 | 0.69     |
| 1 | 4.1  | 2.766667 | 2.49     |
| 2 | 4.5  | 1.388889 | 1.25     |
| 3 | 5.7  | 1.566667 | 1.41     |
| 4 | 5.0  | 2.0      | 1.8      |
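
Since the implementation of `get_samples` is not shown, the following is only a sketch of how such a table could be built with NumPy. It assumes that `Theta(S)` is the variance with divisor $n-1$ (`ddof=1`) and `Theta(V)` the variance with divisor $n$ (`ddof=0`), which is consistent with the rows above; the name `simulate_estimates` and the `seed` parameter are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import binom


def simulate_estimates(dist, n, N, seed=None):
    """Hypothetical stand-in for get_samples: one row of estimates per sample."""
    # Draw all N samples at once; each row is one sample of size n
    draws = dist.rvs(size=(N, n), random_state=seed)
    return pd.DataFrame({
        "E(X)": draws.mean(axis=1),             # sample mean
        "Theta(S)": draws.var(axis=1, ddof=1),  # divisor n - 1
        "Theta(V)": draws.var(axis=1, ddof=0),  # divisor n
    })


estimates = simulate_estimates(binom(n=10, p=0.5), n=10, N=1000, seed=0)
print(estimates.head())
print(estimates.mean())
```

Drawing an `(N, n)` array and reducing along `axis=1` avoids an explicit Python loop over the samples.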


We now have 1000 estimates from each of our estimators.
So let us check the average of each estimator to see how close it is to $\mu=5$ and $\sigma^{2} = 2.5$.

As a sanity check, we know that the sample mean is an unbiased estimator of the population mean.
Does $E(\overline X) \simeq 5$?

In [6]:
samples["E(X)"].mean()

5.0198

That is reassuring!
Next, does $E(\widehat \theta_{S}) \simeq 2.5$?

In [7]:
samples["Theta(S)"].mean()

2.525911111111111

Again, we can conclude that they are approximately equal.
And finally, does $E(\widehat \theta_{V}) \simeq 2.25$?

In [8]:
samples["Theta(V)"].mean()

2.27332

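Yes, it does: the average is close to $2.25$ and noticeably below $\sigma^{2} = 2.5$, which illustrates the bias of $\widehat \theta_{V}$.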
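
One way to put a number on "approximately equal" is to compare each gap with its Monte Carlo standard error. The sketch below is self-contained and simulates its own draws with NumPy rather than reusing the notebook's `samples` table, so its figures will differ from the outputs above; the variable names are illustrative, and the variance definitions (`ddof=1` and `ddof=0`) are an assumption about the two estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 10
draws = rng.binomial(n=10, p=0.5, size=(N, n))  # N samples of size n from B(10, 0.5)

estimates = {
    "mean": draws.mean(axis=1),
    "theta_s": draws.var(axis=1, ddof=1),  # divisor n - 1
    "theta_v": draws.var(axis=1, ddof=0),  # divisor n
}
targets = {"mean": 5.0, "theta_s": 2.5, "theta_v": 2.25}

z_scores = {}
for name, values in estimates.items():
    # Standard error of the average of N independent estimates
    se = values.std(ddof=1) / np.sqrt(N)
    z_scores[name] = (values.mean() - targets[name]) / se
    print(f"{name}: average={values.mean():.4f}, se={se:.4f}, z={z_scores[name]:.2f}")
```

A gap of no more than a few standard errors is what we would expect if the average really does estimate the target value.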

Finally, we can show that this phenomenon is just a consequence of the formulas we used to generate the estimates of $\sigma^{2}$.

If, as we have seen,

$$
\begin{aligned}
  E(\widehat \theta_{V}) = \frac{n-1}{n}\sigma^{2},
\end{aligned}
$$

then

$$
\begin{aligned}
  \sigma^{2} =\frac{n}{n-1} E(\widehat \theta_{V}).
\end{aligned}
$$

But we also know that 

$$
E(\widehat \theta_{S}) = \sigma^{2},
$$

so it can be seen that

$$
E(\widehat \theta_{S}) = \frac{n}{n-1} E(\widehat \theta_{V}).
$$

Can we confirm this from our samples?

With $n=10$, the factor is $\frac{n}{n-1} = \frac{10}{9}$, so ...

In [11]:
# Compare with a tolerance: exact == on floating-point values is fragile
np.isclose(samples["Theta(S)"].mean(), (10/9) * samples["Theta(V)"].mean())

True
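
This check should succeed on any run, not only approximately. If the two estimators differ only in their divisor ($n-1$ versus $n$, an assumption about the helper's definitions), the relation holds for each individual sample, and therefore also for the column averages, up to floating-point rounding. A self-contained sketch with NumPy, using simulated draws rather than the notebook's `samples` table:

```python
import numpy as np

rng = np.random.default_rng(0)
draws = rng.binomial(n=10, p=0.5, size=(1000, 10))  # 1000 samples of size 10
n = draws.shape[1]

theta_s = draws.var(axis=1, ddof=1)  # divisor n - 1
theta_v = draws.var(axis=1, ddof=0)  # divisor n

# The relation holds row by row, not just on average
per_sample_match = np.allclose(theta_s, n / (n - 1) * theta_v)
# ... so it also holds for the averages over all samples
means_match = np.isclose(theta_s.mean(), n / (n - 1) * theta_v.mean())
print(per_sample_match, means_match)
```

Comparing with `np.allclose` and `np.isclose` rather than `==` keeps the check robust to rounding in the last few digits.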

**END**.