# Point Estimation

## Overview

This section covers point estimation, in contrast to the interval estimation we will encounter later. We have already seen and used point estimates: the sample mean and the sample standard deviation are point estimates of the corresponding population parameters. We will also look at two desirable properties
of point estimators, namely bias and consistency. In general, we would like an estimator to be unbiased, meaning that its expected value equals the true parameter.
Consistency means that the estimator converges to the true parameter as the sample size increases.

## Point estimation

Point estimation is all about providing a single _best guess_ of some quantity of interest [1]. This quantity
of interest could be a population parameter, a parameter in a parametric model, a PDF or a prediction for a future
value of some random variable [1].

We will denote the unknown parameter with $\theta$ and its point estimate with $\hat{\theta}$.
The former is considered fixed but unknown. The latter depends on the available data and is therefore
a random variable.

Thus, $\hat{\theta}$ is a function of the available data $X_1,\dots,X_n$, i.e.

$$\hat{\theta}=g(X_1,\dots,X_n)$$
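
To make the point that $\hat{\theta}$ is itself random, here is a minimal sketch that evaluates one estimator on several simulated samples. The choice of the sample mean as $g$, the normal population and the values of `theta` and `n` are illustrative assumptions, not part of the text above.

```python
import random

def sample_mean(sample):
    """Point estimator g(X_1, ..., X_n): here, the sample mean."""
    return sum(sample) / len(sample)

rng = random.Random(0)   # fixed seed so the sketch is reproducible
theta = 5.0              # true mean; known only because we simulate the data
n = 50                   # assumed sample size

# Draw three independent samples of size n from N(theta, 2^2)
# and evaluate the same estimator on each of them.
estimates = [
    sample_mean([rng.gauss(theta, 2.0) for _ in range(n)])
    for _ in range(3)
]
print(estimates)
```

Each run of the estimator returns a different number, even though $\theta$ never changes.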

Below we summarise some of the desired properties that point estimators should exhibit; see also [1].

### Bias

The bias of an estimator $\hat{\theta}$ is defined as [1]

----
**Definition.**



$$Bias(\hat{\theta}) = E\left[\hat{\theta}\right] - \theta$$


----

Thus for an unbiased estimator, we have that

$$E\left[\hat{\theta}\right] - \theta =0 ~~ \text{that is} ~~ E\left[\hat{\theta}\right] = \theta$$
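
The bias can be approximated by simulation when the true parameter is known. The sketch below compares two estimators of the variance, one dividing by $n$ and one dividing by $n-1$. The normal population, the sample size and the number of replications are assumptions chosen for illustration.

```python
import random

rng = random.Random(1)   # fixed seed for reproducibility
sigma2 = 4.0             # true variance of N(0, 2^2), known because we simulate
n = 5                    # assumed (small) sample size
reps = 20000             # number of simulated samples

def var_divide_by_n(x):
    """Variance estimator that divides the sum of squares by n."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

biased, unbiased = [], []
for _ in range(reps):
    x = [rng.gauss(0.0, 2.0) for _ in range(n)]
    v = var_divide_by_n(x)
    biased.append(v)
    unbiased.append(v * n / (n - 1))   # rescale so the divisor is n - 1

# Monte Carlo approximation of E[theta_hat] - theta for each estimator
bias_biased = sum(biased) / reps - sigma2
bias_unbiased = sum(unbiased) / reps - sigma2
print(f"divide by n:     bias ~ {bias_biased:.3f}")
print(f"divide by n - 1: bias ~ {bias_unbiased:.3f}")
```

For this setup the first estimator has theoretical bias $-\sigma^2/n = -0.8$, while the second is unbiased, so its simulated bias should be close to zero.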

----
**Remark.**


The mean squared error ($MSE$) can be used to assess the quality of a point estimate. It is given by

$$MSE=E_{\theta}\left[(\hat{\theta}-\theta)^2\right]$$

which can be written as [1]

$$MSE=Bias^2(\hat{\theta}) + Var\left[\hat{\theta}\right]$$

----
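
The decomposition of the $MSE$ into squared bias and variance can be checked numerically. The sketch below uses a deliberately biased estimator, the sample mean scaled by 0.9. This estimator, the population and all numerical values are hypothetical choices made only for the illustration.

```python
import random

rng = random.Random(2)   # fixed seed for reproducibility
theta, sigma, n, reps = 2.0, 1.0, 10, 20000

def shrunken_mean(x, c=0.9):
    """Hypothetical biased estimator: the sample mean scaled by c."""
    return c * sum(x) / len(x)

est = [
    shrunken_mean([rng.gauss(theta, sigma) for _ in range(n)])
    for _ in range(reps)
]

mean_est = sum(est) / reps
bias = mean_est - theta                              # approximates E[theta_hat] - theta
var = sum((e - mean_est) ** 2 for e in est) / reps   # approximates Var[theta_hat]
mse = sum((e - theta) ** 2 for e in est) / reps      # approximates E[(theta_hat - theta)^2]

print(f"bias^2 + var = {bias ** 2 + var:.6f}")
print(f"MSE          = {mse:.6f}")
```

The two printed values agree up to floating-point rounding, because the same identity holds for the simulated averages when the variance is computed with divisor `reps`.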

### Consistency

A reasonable requirement for an estimator is that, as we collect more and more data, it converges to
the true parameter $\theta$. Consistency quantifies this requirement [1].

----
**Definition.**

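A point estimator $\hat{\theta}$ of a parameter $\theta$ is _consistent_ if it converges in probability to $\theta$ as the sample size $n \rightarrow \infty$, i.e.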

$$\hat{\theta} \rightarrow_{P} \theta$$

----
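
Convergence in probability means that $P(|\hat{\theta} - \theta| > \epsilon)$ tends to zero for every $\epsilon > 0$. The following sketch estimates this probability by simulation for the sample mean of Uniform(0, 1) data. The population, the tolerance `eps` and the sample sizes are assumptions picked for the example.

```python
import random

rng = random.Random(3)   # fixed seed for reproducibility
theta = 0.5              # true mean of Uniform(0, 1)
eps = 0.05               # assumed tolerance
reps = 1000              # simulated samples per sample size

def prob_far(n):
    """Monte Carlo estimate of P(|sample mean - theta| > eps) for sample size n."""
    far = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - theta) > eps:
            far += 1
    return far / reps

probs = {n: prob_far(n) for n in (10, 100, 1000)}
for n, p in probs.items():
    print(f"n = {n:5d}: estimated P(|theta_hat - theta| > eps) = {p:.3f}")
```

The estimated probability shrinks as $n$ grows, which is the behaviour the definition asks for.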

### Standard error and sampling distribution

Computing $\hat{\theta}$ on different samples yields different values, so $\hat{\theta}$ has a distribution of its own. This distribution
is called the sampling distribution. The standard deviation of $\hat{\theta}$ is called the standard error, denoted by $se$:

$$se = \sqrt{Var\left[\hat{\theta}\right]}$$
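
As an illustration, the sampling distribution of the sample mean can be approximated by repeated sampling. For independent observations with standard deviation $\sigma$, the standard error of the sample mean is $\sigma/\sqrt{n}$. The sketch below compares that value with the spread of simulated estimates; the normal population and the values of `sigma`, `n` and `reps` are assumptions.

```python
import math
import random

rng = random.Random(4)   # fixed seed for reproducibility
mu, sigma, n, reps = 0.0, 2.0, 25, 5000

# Approximate the sampling distribution: one sample mean per simulated sample
means = [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

grand_mean = sum(means) / reps
# Standard deviation of the simulated estimates approximates the standard error
se_sim = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / (reps - 1))
se_theory = sigma / math.sqrt(n)

print(f"simulated se   = {se_sim:.4f}")
print(f"sigma/sqrt(n)  = {se_theory:.4f}")
```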

We also denote the estimated standard error with $\hat{se}$. The following theorem associates bias, standard error and consistency.

----
**Theorem.** 

If $Bias(\hat{\theta}) \rightarrow 0$ and $se \rightarrow 0$ as the sample size $n \rightarrow \infty$ then $\hat{\theta}$
is consistent.


For a proof of the theorem see [1].

----

Finally, let us introduce the following definition [1].

----
**Definition.** 

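An estimator is _asymptotically normal_ if, as the sample size $n \rightarrow \infty$,

$$\frac{\hat{\theta} - \theta}{se} \rightarrow_{D} N(0,1)$$

where $\rightarrow_{D}$ denotes convergence in distribution.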

----
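
A rough way to see asymptotic normality in practice is to standardise the estimator and check how often it falls within $\pm 1.96$, which a standard normal does about 95% of the time. The sketch below does this for the sample mean of Exponential(1) data, a skewed population chosen as an assumption for the example, along with `n` and `reps`.

```python
import math
import random

rng = random.Random(5)   # fixed seed for reproducibility
theta = 1.0              # mean of Exponential(rate=1); its standard deviation is also 1
n, reps = 100, 4000
se = 1.0 / math.sqrt(n)  # standard error of the sample mean: sigma / sqrt(n)

# Standardised estimates (theta_hat - theta) / se, one per simulated sample
z = [
    (sum(rng.expovariate(1.0) for _ in range(n)) / n - theta) / se
    for _ in range(reps)
]

frac_within = sum(abs(v) <= 1.96 for v in z) / reps
print(f"fraction of |z| <= 1.96: {frac_within:.3f}")
```

A fraction close to 0.95 is consistent with the standardised estimator being approximately $N(0,1)$, although a single summary like this is only a sanity check and not a formal test of normality.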

## Summary

This section reviewed some of the properties that point estimators should satisfy. In particular, we saw that a point estimator should ideally be
unbiased and consistent. The bias of an estimator is defined as the difference between the expected value of the estimator
and the true parameter value. Hence, it can only be calculated if we know the latter. Consistency means that the estimator converges to
the true parameter as the sample size increases. These two properties are linked via the standard error $se$: if both the bias and the standard error tend to zero as the sample size grows, the estimator is consistent.

## References

1. Larry Wasserman, _All of Statistics. A Concise Course in Statistical Inference_, Springer 2003.