## Parameter Estimation 

Imagine that we have a population and are interested in some characteristic of it, or that we believe the population follows a specific distribution and want to estimate its parameters. In a perfect world, we would perform a census and calculate the characteristic (or parameter) of interest directly. In the real world, a census is often infeasible and sometimes impossible. Instead, we obtain a subset of the population (a sample), and under certain sampling conditions we can use it to estimate these characteristics.


### Parameters, Statistics, Estimators


A **parameter** is some constant (usually unknown) that is a characteristic of the population. A **statistic** is a random variable that is a function of the observed data. It is important to note that a statistic is computed from the data alone and is not a function of the parameter of interest. An **estimator** is a statistic used to estimate some characteristic (parameter) of the population.

In order to fit a probability law to data, we have to estimate parameters associated with the probability law from the data. For instance, the normal/Gaussian distribution involves two parameters, $\mu$ (mean) and $\sigma$ (standard deviation), so if we believed that our data followed a normal distribution and either or both are unknown, we would need to provide some estimator for $\mu$ and $\sigma$.
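As a minimal sketch of what "providing an estimator" can look like in practice, the snippet below simulates data from a normal distribution and estimates $\mu$ and $\sigma$ with the sample mean and sample standard deviation. The helper name `fit_normal`, the seed, and the true values ($\mu = 10$, $\sigma = 2$) are illustrative assumptions, not part of the text above.

```python
import random
import statistics

def fit_normal(data):
    """Return (mu_hat, sigma_hat) for data assumed to come from a normal law.

    Hypothetical helper: uses the sample mean and the sample standard
    deviation (n - 1 divisor) as the estimators.
    """
    mu_hat = statistics.mean(data)
    sigma_hat = statistics.stdev(data)
    return mu_hat, sigma_hat

# Simulated data with assumed true parameters mu = 10, sigma = 2.
rng = random.Random(0)
sample = [rng.gauss(10.0, 2.0) for _ in range(5000)]

mu_hat, sigma_hat = fit_normal(sample)
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```

With a sample this large, both estimates should land close to the values used to generate the data, though they will not match exactly.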


### Bias, Variance, and Mean Squared Error


Now, if we have an estimate, how do we know if this estimate is any good? 

For instance, we may ask ourselves: on average, how far is our estimate from the actual value? One may also ask: If I run this experiment many times, how will each calculated estimate change?

The **bias** of an estimate refers to the systematic deviation of the estimate $\hat \theta$ from its actual value $\theta$: $\text{bias}({\hat \theta})= E[{\hat \theta}] - \theta$. If $\text{bias}({\hat \theta})=0$, then we say that $\hat \theta$ is unbiased.

The **variance** of our estimate, $Var({\hat \theta}) = E[({\hat \theta} - E[{\hat \theta}])^{2}]$, tells us the spread of our estimate.

One measure of the overall size of the error of our estimate is the **mean squared error**, $\text{MSE}({\hat \theta}) = E[({\hat \theta} - \theta)^{2}]$.

We can decompose the mean squared error into bias and variance by adding and subtracting $E[{\hat \theta}]$ inside the square:

$$E[({\hat \theta} - \theta)^{2}] = E[({\hat \theta} - E[{\hat \theta}] + E[{\hat \theta}] - \theta)^{2}] = E[({\hat \theta} - E[{\hat \theta}])^{2} + 2({\hat \theta} - E[{\hat \theta}])(E[{\hat \theta}] - \theta) + (E[{\hat \theta}] - \theta)^{2}]$$

$$=Var({\hat \theta}) + 2E[({\hat \theta} - E[{\hat \theta}])(E[{\hat \theta}] - \theta)] + \text{bias}({\hat \theta})^{2} = Var({\hat \theta}) + \text{bias}({\hat \theta})^{2}$$

The cross term vanishes because $E[{\hat \theta}] - \theta$ is a constant and can be pulled out of the expectation: $E[({\hat \theta} - E[{\hat \theta}])(E[{\hat \theta}] - \theta)] = (E[{\hat \theta}] - \theta) E[{\hat \theta} - E[{\hat \theta}]] = (E[{\hat \theta}] - \theta)(E[{\hat \theta}] - E[{\hat \theta}]) = 0$.
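To see the decomposition numerically, here is a small simulation sketch. It uses a deliberately biased estimator, $0.9 \bar X$, for the mean of a normal population. The shrinkage factor $0.9$, the population values, and the function name `simulate_shrunken_mean` are assumptions chosen purely for illustration. The empirical MSE, variance, and bias are computed over the simulated replicates.

```python
import random
import statistics

def simulate_shrunken_mean(theta=5.0, sigma=2.0, n=10, reps=5000, seed=1):
    """Return `reps` replicates of the (biased) estimator 0.9 * sample mean."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        estimates.append(0.9 * statistics.fmean(sample))
    return estimates

theta = 5.0
estimates = simulate_shrunken_mean(theta=theta)

# Empirical versions of bias, variance, and MSE over the replicates.
mean_est = statistics.fmean(estimates)
bias = mean_est - theta
variance = statistics.fmean([(e - mean_est) ** 2 for e in estimates])
mse = statistics.fmean([(e - theta) ** 2 for e in estimates])

print(f"variance + bias^2 = {variance + bias ** 2:.6f}")
print(f"MSE               = {mse:.6f}")
```

The two printed numbers agree up to floating-point rounding, since the same algebraic identity holds for the empirical averages. Under these assumed settings the theoretical bias is $-0.5$ and the theoretical variance is $0.81 \cdot 4 / 10 = 0.324$.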


### Example 1: Coin Flips (With Replacement)

Suppose that we have a coin that lands heads with probability $p$, a fixed but unknown value, and we want to learn more about $p$.


We can conduct an experiment and toss the coin $n$ times and get $X$ heads. $X$ is a discrete random variable that takes on values between $0$ and $n$. We can give an estimate, call it $\hat p$, for $p$. A reasonable estimate for $p$ would be the proportion of heads that come up in the $n$ tosses.

Let $X_{i}$ be an indicator random variable:

$$X_{i} = \begin{cases}
1 & \text{if the i-th toss is heads}\\
0 & \text{otherwise}
\end{cases}$$

An important fact from probability is that $X_i$ is a Bernoulli random variable with probability $p$ (since it takes on the value $1$ when the i-th toss is heads).

Then we can define $\hat p$ as follows:

$${\hat p} = \frac{X_{1}+ ... + X_{n}}{n} = {\bar X}$$

Notice that the sum of $n$ independent Bernoulli random variables with the same probability $p$ follows a binomial distribution with parameters $n$ and $p$. In other words, $X_{1} + ... + X_{n} \sim Bin(n,p)$. Since each $X_{i}$ is a Bernoulli random variable, they share a common variance, which we abbreviate as $Var(X)$ (the variance of a single toss, not of the total count): $Var(X_{1}) = ... = Var(X_{n}) = Var(X) = p(1-p)$.

Then, we can easily find its expectation and standard error:

$$E[{\hat p}] = E[\bar{ X}] = \frac{1}{n} E[X_{1}+ ... + X_{n}] = \frac{1}{n} np = p$$

$$Var({\hat p}) = Var(\bar X) = Var\left( \frac{X_{1}+ ... + X_{n}}{n} \right) = \frac{1}{n^{2}} Var(X_{1}+ ... + X_{n}) = \frac{nVar(X)}{n^{2}} =\frac{n(p)(1-p)}{n^{2}} = \frac{p(1-p)}{n}$$

Then the standard error of $\hat p$ is:

$$SE(\hat p) = \sqrt{\frac{p(1-p)}{n}}$$

Since we do not know the true value $p$, we substitute $\hat p$ for $p$ and write:

$${\widehat {SE}}(\hat p) = \sqrt{\frac{{\hat p}(1-{\hat p})}{n}}$$


The ${\widehat {SE}}$ is written to emphasize that this is an estimate of the standard error.


---

To make this a bit more concrete, let's say that we tossed the coin $1000$ times and found that we had $467$ heads.

Then, our ${\hat p} = {\bar X} = 0.467$ and the corresponding estimated SE is ${\widehat {SE}}(\hat p) = \sqrt{\frac{{\hat p}(1-{\hat p})}{n}} = \sqrt{\frac{0.467(1-0.467)}{1000}} \approx 0.0158$.
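The arithmetic is easy to reproduce. The sketch below computes $\hat p$ and ${\widehat {SE}}(\hat p)$ directly from the formula $\sqrt{{\hat p}(1-{\hat p})/n}$. The function name `proportion_estimate` is a hypothetical helper introduced only for this example.

```python
import math

def proportion_estimate(heads, n):
    """Return (p_hat, estimated standard error) for `heads` successes in n tosses."""
    p_hat = heads / n
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se_hat

p_hat, se_hat = proportion_estimate(467, 1000)
print(f"p_hat = {p_hat:.3f}, estimated SE = {se_hat:.4f}")
# p_hat = 0.467, estimated SE = 0.0158
```

Note that $0.467 \cdot 0.533 / 1000 \approx 0.000249$ is the estimated variance; the standard error is its square root.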


---

What is the MSE of this estimate ${\hat p}$?

Notice that $\text{bias}({\hat p}) = E({\hat p}) - p = 0$ and $Var({\hat p}) = \frac{p(1-p)}{n}$. 

This means that $\text{MSE}({\hat p}) = \frac{p(1-p)}{n}$. As we take $n \rightarrow \infty$, then $\text{MSE}({\hat p}) \rightarrow 0$.
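A quick Monte Carlo sketch can illustrate how the MSE shrinks as $n$ grows. The value $p = 0.3$, the number of replicates, and the helper name `empirical_mse` are assumptions made for this illustration; the simulated MSE is only an approximation of $\frac{p(1-p)}{n}$.

```python
import random

def empirical_mse(p, n, reps=2000, seed=0):
    """Approximate E[(p_hat - p)^2] by repeating the n-toss experiment `reps` times."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        heads = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = heads / n
        total += (p_hat - p) ** 2
    return total / reps

p = 0.3  # assumed true probability of heads, used only for the simulation
results = {}
for n in (10, 100, 1000):
    results[n] = (empirical_mse(p, n), p * (1 - p) / n)
    print(f"n = {n:4d}: simulated MSE = {results[n][0]:.6f}, p(1-p)/n = {results[n][1]:.6f}")
```

Each tenfold increase in $n$ should cut the MSE by roughly a factor of ten, in line with the $1/n$ rate in the formula.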

---

### Example 2: Urns (Without Replacement)

Example 1 assumes independence between tosses, but what if the observations are dependent? Consider an urn with $N$ balls, each either white or red, and let $q$ be the proportion of white balls. We draw $n$ balls one at a time without replacement.

Let $Y_{i}$ be an indicator random variable:

$$Y_{i} = \begin{cases}
1 & \text{if the i-th ball is white}\\
0 & \text{otherwise}
\end{cases}$$

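If we want to estimate $q$ with $\hat q = \frac{Y_{1} + ... + Y_{n}}{n}$, then what are $E({\hat q})$ and $Var({\hat q})$?

Remember that linearity of expectation does not require independence. Each draw is marginally equally likely to be any ball in the urn, so $E[Y_{i}] = q$ for every $i$, and we have that

$$E[{\hat q}] =E\left[ \frac{Y_{1} + ... + Y_{n}}{n} \right]= \frac{1}{n}\sum_{i=1}^{n} E[Y_{i}] = q$$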

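The variance $Var({\hat q}) = Var({\bar Y})$ is trickier because $Cov(Y_{i}, Y_{j}) \neq 0$ for $i \neq j$.

Notice that although the draws are not independent, every $Y_{i}$ has the same marginal distribution, so $E[Y_{i}] = E[Y_{j}] = E[Y]$ and $Var(Y_{i}) = Var(Y_{j}) = Var(Y)$, where $Y$ denotes a single draw from the urn. Here $E[Y] = q$ and $Var(Y) = q(1-q)$.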

To compute the covariance we need $E[Y_{i} Y_{j}]$ for $i \neq j$. Let $a_{1}, ..., a_{m}$ denote the distinct values in the population and let $n_{s}$ be the number of population members with value $a_{s}$, so that $\sum_{s=1}^{m} n_{s} = N$ (for the urn, $m = 2$ with values $1$ and $0$). Using the definition of expectation and the multiplication rule:

$$E[Y_{i} Y_{j}] = \sum_{s=1}^{m} \sum_{t=1}^{m} a_{s} a_{t} P(Y_{j} = a_{s} \cap Y_{i} = a_{t})= \sum_{s=1}^{m} \sum_{t=1}^{m} a_{s} a_{t} P(Y_{i} = a_{t} | Y_{j} = a_{s}) P(Y_{j} = a_{s})$$

$$=\sum_{s=1}^{m}  a_{s} P(Y_{j} = a_{s}) \sum_{t=1}^{m} a_{t} P(Y_{i} = a_{t} | Y_{j} = a_{s})$$

Because the draws are made without replacement, the conditional probability is

$$P(Y_{i} = a_{t}  | Y_{j} = a_{s}) = \begin{cases}
\frac{n_{t}}{N-1} &  \text{if } s \neq t \\
\frac{n_{t}-1}{N-1} &  \text{if } s = t \end{cases}$$

which means that we can break up the inner summation as follows:

$$\sum_{t=1}^{m} a_{t} P(Y_{i} = a_{t} | Y_{j} = a_{s}) =\sum_{t \neq s} a_{t} \frac{n_{t}}{N-1} + a_{s}\frac{n_{s}-1}{N-1} =\sum_{t \neq s} a_{t} \frac{n_{t}}{N-1} + a_{s}\frac{n_{s}}{N-1} -a_{s}\frac{1}{N-1} = \sum_{t=1}^{m}  a_{t} \frac{n_{t}}{N-1}-a_{s}\frac{1}{N-1}$$

Back to the main problem:

$$E[Y_{i} Y_{j}] =\sum_{s=1}^{m}  a_{s} P(Y_{j} = a_{s}) \sum_{t=1}^{m} a_{t} P(Y_{i} = a_{t} | Y_{j} = a_{s})=\sum_{s=1}^{m}  a_{s} P(Y_{j} = a_{s})  \left[\sum_{t=1}^{m}  a_{t} \frac{n_{t}}{N-1}-a_{s}\frac{1}{N-1} \right]$$

$$= \sum_{s=1}^{m}  a_{s} \frac{n_{s}}{N} \left[\sum_{t=1}^{m}  a_{t} \frac{n_{t}}{N-1}-a_{s}\frac{1}{N-1} \right]$$

Using $\sum_{s=1}^{m} a_{s} n_{s} = N E[Y]$ and $\sum_{s=1}^{m} a_{s}^{2} n_{s} = N E[Y^{2}]$:

$$=\frac{1}{N(N-1)} \left[\sum_{s=1}^{m} \sum_{t=1}^{m} a_{s} n_{s} a_{t} n_{t} - \sum_{s=1}^{m} a_{s}^{2} n_{s} \right] = \frac{1}{N(N-1)} \left(NE[Y] \cdot NE[Y] - NE[Y^{2}] \right)= \frac{1}{N(N-1)} \left(N^2  (E[Y])^{2} - N(Var(Y) + (E[Y])^2)\right)$$

$$=\frac{N(N-1)(E[Y])^{2}}{N(N-1)} - \frac{N \, Var(Y)}{N(N-1)} = (E[Y])^{2} - \frac{Var(Y)}{N-1}$$

Then, we know that

$$Cov(Y_{i} , Y_{j} ) =E[Y_{i} Y_{j}] -E[Y_{i}]E[ Y_{j}] = (E[Y])^{2} - \frac{Var(Y)}{N-1}-(E[Y])^{2} = - \frac{Var(Y)}{N-1}$$
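For the urn, the covariance can be checked exactly, since $Y_{i} Y_{j} = 1$ only when both balls are white. The sketch below uses exact rational arithmetic. The urn size $N = 20$ with $K = 8$ white balls is an arbitrary illustrative choice, and `urn_covariance` is a hypothetical helper name.

```python
from fractions import Fraction

def urn_covariance(N, K):
    """Exact Cov(Y_i, Y_j), i != j, when drawing without replacement
    from an urn holding K white balls out of N."""
    q = Fraction(K, N)
    # E[Y_i Y_j] = P(both draws are white) by the multiplication rule.
    e_yi_yj = Fraction(K, N) * Fraction(K - 1, N - 1)
    return e_yi_yj - q * q

N, K = 20, 8
q = Fraction(K, N)
cov = urn_covariance(N, K)
formula = -q * (1 - q) / (N - 1)  # -Var(Y) / (N - 1) with Var(Y) = q(1 - q)

print(cov, formula)  # -6/475 -6/475
```

The covariance is negative: drawing a white ball leaves fewer white balls behind, which makes another white draw slightly less likely.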

Then, to find $Var({\hat q}) = Var\left(\frac{\sum_{i=1}^{n} Y_{i}}{n} \right)$, we expand the variance of the sum into variances and covariances:

$$Var\left(\frac{\sum_{i=1}^{n} Y_{i}}{n} \right) = \frac{1}{n^{2}} Var(Y_{1} + ... +Y_{n})=\frac{1}{n^{2}} \left[\sum_{i=1}^{n}Var(Y_{i}) +\sum_{i\neq j } Cov(Y_{i}, Y_{j}) \right] = \frac{1}{n^{2}} \left[n Var(Y_{i}) + n(n-1)Cov(Y_{i}, Y_{j}) \right]$$

$$=\frac{n Var(Y)}{n^{2}} -\frac{n (n-1)Var(Y)}{n^{2} (N-1)} = Var(Y)\left[\frac{1}{n} - \frac{n-1}{n(N-1)} \right]=\frac{ Var(Y)}{n}\left[\frac{(N-1)}{(N-1)} - \frac{n-1}{(N-1)} \right] = \frac{ Var(Y)}{n}\left[1 - \frac{n-1}{N-1} \right] = \frac{q(1-q)}{n}\left[1 - \frac{n-1}{N-1} \right]$$
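As a sanity check on the variance formula, the number of white balls in a sample drawn without replacement follows a hypergeometric distribution, so $E[{\hat q}]$ and $Var({\hat q})$ can be computed exactly by summing over its probability mass function. The sketch below does this with exact fractions. The values $N = 20$, $K = 8$ white balls, and $n = 5$ are illustrative assumptions, and `exact_moments_q_hat` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def exact_moments_q_hat(N, K, n):
    """Exact mean and variance of q_hat = (number of white balls) / n
    for n draws without replacement from K white balls out of N."""
    total = comb(N, n)
    mean = Fraction(0)
    second_moment = Fraction(0)
    # k ranges over the feasible numbers of white balls in the sample.
    for k in range(max(0, n - (N - K)), min(n, K) + 1):
        prob = Fraction(comb(K, k) * comb(N - K, n - k), total)
        q_hat = Fraction(k, n)
        mean += prob * q_hat
        second_moment += prob * q_hat ** 2
    return mean, second_moment - mean ** 2

N, K, n = 20, 8, 5
q = Fraction(K, N)
mean, var = exact_moments_q_hat(N, K, n)
formula = q * (1 - q) / n * (1 - Fraction(n - 1, N - 1))

print(mean, var, formula)  # 2/5 18/475 18/475
```

Because the arithmetic is exact, the enumerated variance matches the closed form term for term rather than approximately.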

---

What is the MSE of ${\hat q}$?

Again, we find that ${\hat q}$ is an unbiased estimate of $q$ since $E({\hat q}) - q = 0$.
Furthermore, from the above calculation $Var({\hat q}) = \frac{q(1-q)}{n}\left[1 - \frac{n-1}{N-1} \right]$.

Then $\text{MSE}({\hat q}) = \frac{q(1-q)}{n}\left[1 - \frac{n-1}{N-1} \right]$.

---

Note:

Comparing $\hat p$ and $\hat q$, we see that if $p=q$, their variances differ by a factor of $\left[1 - \frac{n-1}{N-1} \right]$. This term is called the **finite population correction factor**.

### Example 3:  Estimating $\mu$ of a Normal Distribution

Imagine that we have some population that follows the normal distribution with mean $\mu$ and variance $\sigma^{2}$, and we want to learn more about $\mu$. We take a random sample of size $n$ from this population and assume that $X_{1},..., X_{n} \overset{iid}{\sim} N(\mu, \sigma^{2})$.

A sample estimate of $\mu$ would be ${\hat \mu}={\bar X} = \frac{X_{1} + ... + X_{n}}{n}$.

Let's observe its expectation and variance:

$$E[{\bar X}] = E\left[ \frac{X_{1} + ... + X_{n}}{n} \right] = \frac{E[X_{1}] + ... + E[X_{n}]}{n} = \frac{n\mu}{n} = \mu$$


Using the fact that the $X_{i}$ are iid:

$$Var({\bar X}) = Var\left( \frac{X_{1} + ... + X_{n}}{n} \right) = \frac{Var(X_{1} )+ ... + Var(X_{n})}{n^{2}} =\frac{n \sigma^{2}}{n^{2}}=\frac{\sigma^{2}}{n}$$

Since ${\bar X}$ is unbiased, its MSE equals its variance, $\frac{\sigma^{2}}{n}$.
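The $\frac{\sigma^{2}}{n}$ result can be illustrated with a short simulation sketch: draw many samples of size $n$, compute ${\bar X}$ for each, and look at the spread of those sample means. The parameter values ($\mu = 3$, $\sigma = 2$, $n = 25$), the seed, and the helper name `sample_means` are assumptions for illustration only.

```python
import random

def sample_means(mu, sigma, n, reps, seed=0):
    """Return `reps` sample means, each from n iid N(mu, sigma^2) draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

mu, sigma, n = 3.0, 2.0, 25
means = sample_means(mu, sigma, n, reps=4000)

# Average and spread of the simulated sample means.
avg = sum(means) / len(means)
var = sum((m - avg) ** 2 for m in means) / (len(means) - 1)

print(f"average of sample means  = {avg:.4f} (mu = {mu})")
print(f"variance of sample means = {var:.4f} (sigma^2 / n = {sigma ** 2 / n:.4f})")
```

The simulated variance is itself an estimate, so it should be near $\sigma^{2}/n = 0.16$ but will not equal it exactly.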

### Example 4: Estimating $\sigma^{2}$ of a Normal Distribution

Now, let's say we wanted to learn more about the variance $\sigma^{2}$. One sample estimate would be ${\hat \sigma}^{2} = \frac{\sum_{i=1}^{n} (X_{i} - {\bar X})^{2}}{n-1}$.

Let's find the expectation of ${\hat \sigma}^{2}$:

$$E({\hat \sigma}^{2}) = E\left[ \frac{\sum_{i=1}^{n} (X_{i} - {\bar X})^{2} }{n-1} \right]=\frac{1}{n-1}E\left[\sum_{i=1}^{n} (X_{i} - {\bar X})^{2} \right]= \frac{E(\sum ( X_{i}^{2} - 2X_{i} {\bar X} + {\bar X}^{2} ))}{n-1} = \frac{\sum_{i=1}^{n} E(X_{i}^{2}) - 2n E({\bar X}^{2}) +n E({\bar X}^{2})}{n-1}$$

$$=\frac{n(\mu^{2} + \sigma^{2}) - n (\mu^{2} + \sigma^{2}/n)}{n-1}=\frac{n\mu^{2} + n \sigma^{2} - n\mu^{2} - \sigma^{2}}{n-1} = \frac{(n-1)\sigma^{2}}{n-1} = \sigma^{2}$$
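The expansion above leans on three facts that are easy to miss. Written out, using only quantities already defined in Example 3:

$$\sum_{i=1}^{n} X_{i} {\bar X} = {\bar X} \sum_{i=1}^{n} X_{i} = n {\bar X}^{2}, \qquad E[X_{i}^{2}] = Var(X_{i}) + (E[X_{i}])^{2} = \sigma^{2} + \mu^{2}, \qquad E[{\bar X}^{2}] = Var({\bar X}) + (E[{\bar X}])^{2} = \frac{\sigma^{2}}{n} + \mu^{2}$$

The first identity turns the cross term $-2\sum_{i} X_{i} {\bar X}$ into $-2n{\bar X}^{2}$, and the other two supply the expectations substituted in the second line.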

Rice gives the following theorem: The distribution of $\frac{(n-1){\hat \sigma}^{2}}{ \sigma^{2}}$ is the chi-square distribution with $n-1$ degrees of freedom.

Using this fact, we can get the variance of $\hat \sigma^{2}$:

$$Var\left(\frac{(n-1){\hat \sigma}^{2}}{ \sigma^{2}} \right) = \frac{(n-1)^{2}}{\sigma^{4}} Var({\hat \sigma}^{2}) = 2(n-1)$$

$$Var({\hat \sigma}^{2}) = \frac{2\sigma^{4}}{n-1}$$
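Both results for ${\hat \sigma}^{2}$ (expectation $\sigma^{2}$ and variance $\frac{2\sigma^{4}}{n-1}$) can be illustrated with a simulation sketch. The settings $\mu = 0$, $\sigma = 2$, $n = 10$, the seed, and the helper names `sample_variance` and `simulate_s2` are assumptions for this example; the variance formula relies on the data being normal.

```python
import random

def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def simulate_s2(mu, sigma, n, reps, seed=0):
    """Return `reps` replicates of the sample variance from iid normal samples."""
    rng = random.Random(seed)
    return [
        sample_variance([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]

mu, sigma, n = 0.0, 2.0, 10
s2 = simulate_s2(mu, sigma, n, reps=20000)

mean_s2 = sum(s2) / len(s2)
var_s2 = sum((v - mean_s2) ** 2 for v in s2) / (len(s2) - 1)

print(f"mean of s^2     = {mean_s2:.4f} (sigma^2 = {sigma ** 2:.4f})")
print(f"variance of s^2 = {var_s2:.4f} (2 sigma^4 / (n - 1) = {2 * sigma ** 4 / (n - 1):.4f})")
```

Both simulated quantities are Monte Carlo approximations, so small discrepancies from the theoretical values are expected.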

### Extra Exercises


1. Another way we could have estimated $\sigma^{2}$ was by using the estimator $\frac{\sum_{i=1}^{n} (X_{i} - {\bar X})^{2}}{n}$. Find $E\left[\frac{\sum_{i=1}^{n} (X_{i} - {\bar X})^{2}}{n} \right]$ and show that this estimator is biased. This gives some reasoning as to why we chose to use ${\hat \sigma}^{2} = \frac{\sum_{i=1}^{n} (X_{i} - {\bar X})^{2}}{n-1}$.


2. Use [Jensen's inequality](https://en.wikipedia.org/wiki/Jensen%27s_inequality) to show that ${\hat \sigma} = \sqrt{\frac{\sum_{i=1}^{n} (X_{i} - {\bar X})^{2}}{n-1} }$ is a biased estimator for the standard deviation $\sigma$ of the population. Hint [here.](https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation)