## Point Estimators 

- how to estimate population-level parameters by using a sample of observations drawn from the population

- if $Y_1,Y_2, \ldots, Y_n$ is a sample from a population distribution described by $p_Y(y)$ or $f_Y(y)$, then the observations $Y_1,Y_2, \ldots, Y_n$ contain valuable information about characteristics of the population distribution, for example, the population mean $\mu=E(Y)$, the population variance $\sigma^2=Var(Y)$, and so on.

- the estimation question is relevant because the parameters associated with a population distribution (or distributions) are **usually unknown.**

For example, suppose a physician observes a random sample of $n=20$ undergraduate students at a college in Texas and records


$Y$ = the number of colds each student has had within the last 10 months.


As a population-level model, he decides to use $Y\sim Poisson(\lambda)$, where $\lambda=E(Y)$ is the mean of the population.

Now, there are over 20,000 undergraduate students at this college. Therefore, to determine the value of $\lambda$ exactly, the physician would have to observe all of the more than 20,000 students.

Because it is generally not possible to **sample the entire population** in practice (especially for large populations, which may number in the millions or billions), we turn to the problem of **parameter estimation**.


### Parameter Estimation

Suppose $Y_1,Y_2, \ldots, Y_n$ is an iid sample from a population distribution described by $p_Y(y)$ or $f_Y(y)$, and let $\theta \in \mathbb{R}$ denote a population-level parameter that is unknown.  

- In the previous example, we could write $Y\sim Poisson(\theta)$, where $\theta=E(Y).$


- Assumption: the population-level parameter $\theta$ is unknown but fixed (it is not random).



A point estimator of $\theta$ is any statistic $\hat{\theta}=T(Y_1,Y_2, \ldots, Y_n)$ that is used to estimate $\theta$.

Because a point estimator $\hat{\theta}$ is a statistic, it is random and has its own (sampling) distribution. 

In the Texas college example, a point estimator of $\theta$ based on the $n=20$ students is $\bar{Y}=\frac{1}{20}\sum_{i=1}^{20} Y_i,$ the sample mean.
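As a rough illustration of the claim that $\bar{Y}$ is random and has its own sampling distribution, the sketch below draws many hypothetical samples of $n=20$ students and computes $\bar{Y}$ for each one. The value `theta = 2.0`, the seed, and the number of repetitions `reps` are assumptions chosen for illustration; they do not come from the physician's data.

```python
import numpy as np

# Assumed values for illustration only; in practice theta is unknown.
theta = 2.0   # hypothetical Poisson mean (colds per student)
n = 20        # sample size from the example
reps = 5000   # number of repeated hypothetical samples

rng = np.random.default_rng(0)

# Each row is one hypothetical sample of n students.
samples = rng.poisson(lam=theta, size=(reps, n))

# The point estimator Y-bar, evaluated once per sample.
y_bar = samples.mean(axis=1)

print("first five estimates:", y_bar[:5])
print("mean of the estimates:", y_bar.mean())
print("variance of the estimates:", y_bar.var(ddof=1))
print("theory, Var(Y-bar) = theta / n:", theta / n)
```

Under these assumptions the estimates vary from sample to sample, their average should land near `theta`, and their variance should be close to `theta / n`, since $Var(\bar{Y})=Var(Y)/n$ and the variance of a Poisson distribution equals its mean.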

### Bias

Suppose $\hat{\theta}=T(Y_1,Y_2, \ldots, Y_n)$ is a point estimator for the population-level parameter $\theta$.


- we call $\hat{\theta}$ an unbiased estimator of $\theta$ if $E(\hat{\theta})=\theta$

	
- if $E(\hat{\theta})\neq \theta$ we say that $\hat{\theta}$ is biased

	
- the **bias** of a point estimator $\hat{\theta}$ is 

    
\begin{equation*}
B(\hat{\theta})	= E(\hat{\theta})-\theta 
\end{equation*}

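- an **unbiased estimator is correct on average**: its sampling distribution is centered at $\theta$, although an individual estimate can still differ from $\theta$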
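The definition of bias above can be approximated by simulation: draw many samples, apply the estimator to each, and compare the average estimate with $\theta$. The sketch below is one way to do this. The helper `estimate_bias`, the sampler `draw_sample`, the value `theta = 2.0`, and the deliberately biased estimator that divides by $n+1$ are all hypothetical names and choices made for illustration.

```python
import numpy as np

def estimate_bias(estimator, sampler, theta, reps=20000):
    """Monte Carlo approximation of B(theta_hat) = E(theta_hat) - theta."""
    estimates = np.array([estimator(sampler()) for _ in range(reps)])
    return estimates.mean() - theta

rng = np.random.default_rng(1)
theta = 2.0   # assumed true parameter (unknown in practice)
n = 20        # sample size

def draw_sample():
    # One hypothetical sample of size n from Poisson(theta)
    return rng.poisson(lam=theta, size=n)

# Sample mean: unbiased for theta, so the approximate bias should be near 0.
bias_mean = estimate_bias(np.mean, draw_sample, theta)

# Deliberately biased estimator: divide the sum by n + 1 instead of n.
# Its exact bias is n/(n+1) * theta - theta = -theta / (n + 1).
bias_shrunk = estimate_bias(lambda y: y.sum() / (n + 1), draw_sample, theta)

print("approximate bias of the sample mean:", bias_mean)
print("approximate bias of sum / (n + 1):  ", bias_shrunk)
print("exact bias of sum / (n + 1):        ", -theta / (n + 1))
```

The Monte Carlo values are themselves random, so they only approximate the true bias; increasing `reps` tightens the approximation.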

 Let $Y_1, Y_2, \ldots, Y_n$ be a random sample with $E(Y_i)=\mu$ and $Var(Y_i)=\sigma^2$. It can be shown that
    
 \begin{equation*}
 {S'}^2=\frac{1}{n}\sum_{i=1}^n {(Y_i -\bar{Y})}^2
 \end{equation*}
  
is a biased estimator of $\sigma^2$ and that  
  
 \begin{equation*}
 {S}^2=\frac{1}{n-1}\sum_{i=1}^n {(Y_i -\bar{Y})}^2
 \end{equation*} 
 
is an unbiased estimator of $\sigma^2$.
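
The result quoted above follows from a short calculation, sketched here under the assumption that the $Y_i$ are independent (as in an iid random sample). Since $E(Y_i^2)=\sigma^2+\mu^2$ and $E(\bar{Y}^2)=Var(\bar{Y})+\mu^2=\sigma^2/n+\mu^2$,

\begin{align*}
E\left[\sum_{i=1}^n (Y_i-\bar{Y})^2\right]
&= E\left[\sum_{i=1}^n Y_i^2 - n\bar{Y}^2\right] \\
&= n(\sigma^2+\mu^2) - n\left(\frac{\sigma^2}{n}+\mu^2\right) \\
&= (n-1)\sigma^2 .
\end{align*}

Dividing by $n$ gives $E({S'}^2)=\frac{n-1}{n}\sigma^2$, so $B({S'}^2)=-\sigma^2/n$. Dividing by $n-1$ gives $E(S^2)=\sigma^2$, so $B(S^2)=0$.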
  
Let's check this experimentally...

In [1]:
import numpy as np

In [2]:
# parameters for the binomial trials
p = 0.7
n_trials = 100
real_var = n_trials * p * (1 - p)  # Var(Y) for Y ~ Binomial(n_trials, p)

In [3]:
# Simulates n draws; each draw is the number of successes in n_trials trials with success probability p
def simulate(n_trials=100, p=0.7, n=10):
    return np.random.binomial(n_trials, p, size=n)

In [4]:
simulate()

array([75, 79, 71, 67, 72, 68, 71, 75, 74, 70])

In [5]:
num_samples = 10000
sum_biased = 0
biased_overestimates = 0  # number of times the biased estimator exceeds real_var
sum_unbiased = 0
unbiased_overestimates = 0  # number of times the unbiased estimator exceeds real_var
for k in range(num_samples):
    sample = simulate(n=60)
    sample_var_biased = sample.var()
    sample_var_unbiased = sample.var(ddof=1)
    sum_biased += sample_var_biased
    if sample_var_biased > real_var:
        biased_overestimates += 1
    sum_unbiased += sample_var_unbiased
    if sample_var_unbiased > real_var:
        unbiased_overestimates += 1

print("For biased estimator:")
print("Average of the estimator:", sum_biased/num_samples)
print("Percentage of times it overestimates:", biased_overestimates/num_samples*100,"%")
print()
print("For unbiased estimator:")
print("Average of the estimator:", sum_unbiased/num_samples)
print("Percentage of times it overestimates:", unbiased_overestimates/num_samples*100,"%")
print()
print("real variance:", real_var)

For biased estimator:
Average of the estimator: 20.6581243055556
Percentage of times it overestimates: 44.06 %

For unbiased estimator:
Average of the estimator: 21.008262005649765
Percentage of times it overestimates: 47.22 %

real variance: 21.000000000000004
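
The averages printed above can be compared with theory. With $n=60$ observations per sample and $\sigma^2=21$, the biased estimator has expectation $\frac{n-1}{n}\sigma^2 = \frac{59}{60}\cdot 21 = 20.65$, which is close to the 20.658 observed. The sketch below repeats the experiment in vectorized form and prints the theoretical values next to the simulated ones. The seed and the use of `np.random.default_rng` are choices made here for reproducibility, not part of the cells above.

```python
import numpy as np

# Same settings as the simulation above
n_trials, p = 100, 0.7
n = 60                 # observations per sample
num_samples = 10000    # number of simulated samples
real_var = n_trials * p * (1 - p)

rng = np.random.default_rng(42)  # assumed seed, for reproducibility only

# Each row is one sample of n binomial observations.
samples = rng.binomial(n_trials, p, size=(num_samples, n))

# Average each estimator over all simulated samples.
avg_biased = samples.var(axis=1).mean()            # divides by n
avg_unbiased = samples.var(axis=1, ddof=1).mean()  # divides by n - 1

# Theoretical expectations of the two estimators
expected_biased = (n - 1) / n * real_var
expected_unbiased = real_var

print("biased:   simulated", avg_biased, "theory", expected_biased)
print("unbiased: simulated", avg_unbiased, "theory", expected_unbiased)
```

Note that in the output above even the unbiased estimator overestimates less than half of the time. Unbiasedness is a statement about the mean of the sampling distribution, and the sampling distribution of a variance estimator is right-skewed, so its median sits below its mean.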


In [None]:
help(np.var)

A good resource for Python basics is *Data Science from Scratch* by Joel Grus:
<img src="Book_cover_DS_scratch.png" alt="Drawing" style="width: 200px;"/>

Github: 
https://github.com/joelgrus/data-science-from-scratch   