## $$Statistics \ Cheat \ Sheet \ - \ Estimation$$

**Mean Square Error**, $MSE = E[(\hat \theta - \theta)^2]$

**Bias of $\hat \theta$**, $Bias(\hat \theta) = E(\hat \theta) - \theta$

* When $X$ is a binomial rv with parameters $n$ and $p$, the sample proportion $\hat p =  X/n$ is an unbiased estimator of $p$.
* For a rv uniformly distributed on $[0, \theta]$, the midpoint of the interval of positive density is the expected value, i.e. $E(X_i) = \theta /2 \implies E(\overline X) = \theta/2 \implies E(2 \overline X) = \theta$. Thus $2 \overline X$ is unbiased for $\theta$.
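
The bias and MSE definitions above can be checked by simulation. The following is a minimal sketch, assuming a uniform distribution on $[0, \theta]$; the function name `simulate_estimator` and the values of `theta`, `n`, `reps` and `seed` are illustrative choices, not part of the original notes.

```python
import random
import statistics


def simulate_estimator(theta=10.0, n=20, reps=20000, seed=0):
    """Approximate the bias and MSE of 2 * (sample mean) for Uniform(0, theta)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = []
    for _ in range(reps):
        sample = [rng.uniform(0.0, theta) for _ in range(n)]
        estimates.append(2.0 * statistics.fmean(sample))
    # Bias: average estimate minus the true parameter.
    bias = statistics.fmean(estimates) - theta
    # MSE: average squared distance from the true parameter.
    mse = statistics.fmean([(est - theta) ** 2 for est in estimates])
    return bias, mse


bias, mse = simulate_estimator()
print(f"simulated bias = {bias:.4f}, simulated MSE = {mse:.4f}")
```

Since the estimator is unbiased, its MSE equals its variance, $4 \cdot \theta^2/(12n) = \theta^2/(3n)$, so the simulated MSE should land close to that value and the simulated bias close to 0.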

**Probability Limit** $plim_{n \to \infty}\hat \theta = \theta$, i.e. for every $\epsilon \gt 0$, $P(|\hat \theta - \theta| \gt \epsilon) \to 0 \ as \ n \to \infty$. An estimator with this property is called **consistent**.

**MVUE**
* Normal Distribution: $\hat \mu = \overline X$

**Sampling Distribution**
$$  \text{Sampling Distribution of $f(X_1,\ldots,X_n)$ = Characteristics of $X$ + Properties of $f(\cdot)$ + Sampling Method for $\{X_i\}_{i=1}^n$  } $$

**Guidelines**
* If $E(\hat{\theta}_1 - \theta) = 0 $ and  $E(\hat{\theta}_2 - \theta) \neq 0 $ then the unbiased estimator $\hat{\theta}_1$ is preferred to $\hat{\theta}_2$ 

* If $E(\hat{\theta}_1 - \theta) = E(\hat{\theta}_2 - \theta) = 0 $ then the estimator with the smallest variance is preferred. 

#### Method of Moments
**kth population moment** or **kth moment of the distribution** $f(x)$ is $E(X^k)$

**kth sample moment** is $(1/n)\sum_{i=1}^nX_i^k$
$$ \hat{E}(X^k)= n^{-1}\sum_{i=1}^nX_i^k $$ 
$$ \begin{aligned}  \rho(X,Y) &= \frac{E\left[(X-E(X))(Y-E(Y))\right]}{ \sqrt{V(X)} \sqrt{V(Y)} } \\ 
\hat{\rho}(X,Y) &= \frac{n^{-1}\sum_{i=1}^n(X_i-\overline{X}_n)(Y_i - \overline{Y}_n)}{[n^{-1}\sum_{i=1}^n(X_i -\overline{X}_n)^2]^{1/2}[n^{-1}\sum_{i=1}^n(Y_i -\overline{Y}_n)^2]^{1/2}} \end{aligned} $$

* Moment estimator of $\lambda$ for an exponential distribution, $\hat \lambda = 1/\overline X$
* Estimators of parameters $\alpha$ and $\beta$ for a gamma distribution
$$\hat \alpha = \frac{\overline X^2}{(1/n)\sum X_i^2 - \overline X^2}, \hat \beta = \frac{(1/n)\sum X_i^2 - \overline X^2}{\overline X}$$
* Estimators of parameters $p$ and $r$ for a negative binomial distribution
$$\hat p = \frac{\overline X}{(1/n)\sum X_i^2 - \overline X^2}, \hat r = \frac{\overline X^2}{(1/n)\sum X_i^2 - \overline X^2 - \overline X}$$
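
As a rough illustration of the gamma formulas above, the sketch below computes the first two sample moments and plugs them in. The function name `gamma_moment_estimates` and the four data values are made up for the example.

```python
import statistics


def gamma_moment_estimates(data):
    """Method-of-moments estimates (alpha_hat, beta_hat) for a gamma sample."""
    n = len(data)
    mean = statistics.fmean(data)                   # first sample moment
    second_moment = sum(x * x for x in data) / n    # second sample moment
    spread = second_moment - mean ** 2              # (1/n) sum X_i^2 - mean^2
    alpha_hat = mean ** 2 / spread
    beta_hat = spread / mean
    return alpha_hat, beta_hat


sample = [2.0, 4.0, 6.0, 8.0]  # illustrative data only
alpha_hat, beta_hat = gamma_moment_estimates(sample)
print(alpha_hat, beta_hat)
```

A quick sanity check is that $\hat \alpha \hat \beta = \overline X$, which mirrors the population identity $E(X) = \alpha \beta$ used to derive the estimators.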

#### Maximum Likelihood Estimator
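If $l(\theta) = f(x_1,...,x_n;\theta)$ is the likelihood function, then the maximum likelihood estimator is the value of $\theta$ that maximizes it. Because $\ln$ is increasing, maximizing the log-likelihood gives the same answer
$$\hat \theta _{ML} = argmax_\theta[l(\theta)] = argmax_\theta[ln\{l(\theta)\}]$$ 
When $l(\theta)$ is differentiable, find the MLE by solving
$$\frac{d}{d\theta}ln[l(\theta)] = 0$$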

MLE makes much stronger assumptions than the MOM in order to estimate $\theta$: you have to "know" the distribution of $X_i$ up to the parameter $\theta$, i.e. you have to know $f(\cdot | \theta)$ in order to maximize it.


* Estimator of $\lambda$ for exponential distribution, $\hat \lambda = 1/\overline X$. <BR>
    This is not unbiased as $E(1/\overline X) \neq 1/E(\overline X)$
* Estimators of $\mu$ and $\sigma^2$ for a normal distribution
    $$\hat \mu = \overline X, \ \hat \sigma^2 = \frac{\sum (X_i - \overline X)^2}{n}$$
* Estimator of $\lambda$ when the number of events $X_i$ in a two-dimensional region $R_i$ with area $a(R_i)$ has a Poisson distribution with parameter $\lambda a(R_i)$, where $\lambda$ is the expected number of events per unit area
$$ \hat \lambda = \frac{\sum X_i}{\sum a(R_i)}$$
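
The closed-form estimators in this list can be sanity-checked numerically. The sketch below is only an illustration: it assumes the exponential log-likelihood $n \ln \lambda - \lambda \sum x_i$, and the function names and data values are hypothetical.

```python
import math


def exp_log_likelihood(lam, data):
    """Log-likelihood of an exponential sample with rate lam."""
    return len(data) * math.log(lam) - lam * sum(data)


def exp_mle(data):
    """Closed-form MLE of the exponential rate: 1 / (sample mean)."""
    return len(data) / sum(data)


def normal_mle(data):
    """Closed-form MLEs (mu_hat, sigma2_hat) for a normal sample.

    Note the divisor n (not n - 1) in the variance estimate.
    """
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat


data = [0.5, 1.2, 0.3, 2.0, 1.0]  # illustrative data only
lam_hat = exp_mle(data)

# The log-likelihood at lam_hat should be at least as large as at nearby rates.
for factor in (0.5, 0.9, 1.1, 2.0):
    trial = lam_hat * factor
    print(factor, exp_log_likelihood(trial, data) <= exp_log_likelihood(lam_hat, data))

print(normal_mle(data))
```

Comparing against a handful of trial rates is not a proof of optimality, but it is a cheap way to catch an algebra slip in a hand-derived MLE.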

In situations where calculus cannot be applied, the following can be tried:
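* Estimator for a uniform distribution $U[a,b]$, $\hat b = max(X_i)$ and $\hat a = min(X_i)$ - This is obtained by maximizing the likelihood $(1/(b-a))^n$, which is largest when the interval $[a,b]$ is as short as possible while still containing every observation
* Estimator for a hypergeometric distribution, $\hat N = [Mn/x]$, the largest integer less than or equal to the calculated value. Here the estimator is derived by taking the ratio of the likelihood at $N$ to the likelihood at $N-1$ and finding where that ratio drops below 1.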

#### Confidence Intervals
* For a normal distribution with mean $\mu$ and standard deviation $\sigma$, the **95% confidence interval (CI) for** $\mu$ is given by
$$\overline x - 1.96 \cdot \frac{\sigma}{\sqrt n} \lt \mu \lt \overline x + 1.96 \cdot \frac{\sigma}{\sqrt n}$$
* A **100(1 - $\alpha$)% CI** for the $\mu$ of a normal distribution with standard deviation $\sigma$ is
$$\overline x - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt n} \lt \mu \lt \overline x + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt n}$$
where $z_{\alpha/2}$ is the right critical value satisfying $P(Z \gt z_{\alpha/2}) = \alpha/2$ and the half-width $z_{\alpha/2} \cdot \frac{\sigma}{\sqrt n}$ is also called the **bound on the error of estimation, B** <BR>
This can also be expressed in words as <BR>
    point estimate of $\mu \pm (z$ critical value) (standard error of the mean)
* A **Large Sample (n $\gt$ 40) CI for $\mu$** with CI level approximately $100(1 - \alpha)$% is given by 
    $$\overline x \pm z_{\alpha/2} \cdot \frac{s}{\sqrt n}$$
    where the sample standard deviation $s$ replaces the unknown $\sigma$
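
The z-based intervals above can be computed with Python's standard library. This is a minimal sketch: the function name `z_interval` and the example numbers are assumptions for illustration, and `sd` stands for $\sigma$ when it is known or for $s$ in the large-sample case.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sd, n, confidence=0.95):
    """Two-sided z interval for a mean: xbar +/- z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1.0 - confidence
    # Right critical value: P(Z > z) = alpha / 2.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sd / math.sqrt(n)
    return xbar - half_width, xbar + half_width


low, high = z_interval(xbar=80.0, sd=2.0, n=25)  # illustrative values
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

`NormalDist().inv_cdf(0.975)` returns the familiar 1.96 critical value to more decimal places, so the result agrees with the 95% formula up to rounding.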
    
**Sample Size**
$$n=\biggl(2z_{\alpha/2} \cdot \frac{\sigma}{w}\biggr)^2$$
where $w$ is the desired width and 
$$w = 2z_{\alpha/2} \sigma/\sqrt n$$
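
A short sketch of the sample size formula follows. Rounding up to the next whole number is an assumption added here, since $n$ must be an integer and rounding down would give an interval wider than $w$; the function name and the example values are illustrative.

```python
import math
from statistics import NormalDist


def sample_size_for_width(sigma, width, confidence=0.95):
    """Smallest n whose z interval has total width at most `width`."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n_exact = (2.0 * z * sigma / width) ** 2
    return math.ceil(n_exact)  # round up so the width target is met


n_needed = sample_size_for_width(sigma=25.0, width=10.0)
print(n_needed)
```

Because $n$ grows with the square of $\sigma / w$, halving the desired width roughly quadruples the required sample size.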

* To estimate $\mu$ within an amount $B$ with $100(1 - \alpha)$% confidence, sample size needed is
$$n=\biggl(z_{\alpha/2} \cdot \frac{\sigma}{B}\biggr)^2$$
* The **100(1 - $\alpha$)% CI** can be rewritten as
$$- z_{\alpha/2} \lt \frac{\overline x - \mu}{\frac{\sigma}{\sqrt n}} \lt z_{\alpha/2}$$
If $h(X_1,...,X_n;\mu) = (\overline X - \mu)/(\sigma/\sqrt n)$, using parameter $\theta = \mu$, then for any $\alpha$ between 0 and 1 there are constants $a$ and $b$ satisfying
$$P(a \lt h(X_1,...,X_n;\theta) \lt b) = 1 - \alpha$$
Isolating $\theta$, the probability statement can be written as
$$P(l(X_1,...,X_n) \lt \theta \lt u(X_1,...,X_n)) = 1 - \alpha$$
where $l(x_1,...,x_n)$ and $u(x_1,...,x_n)$ are the lower and the upper confidence limits.

* The probability statement for **95% CI** for $\lambda$ of an **exponential distribution** is given by
$$P(a \lt 2 \lambda \sum X_i \lt b) = .95$$
where RV $h(X_1,...,X_n;\lambda) = 2 \lambda \sum X_i$ has a **chi-squared** distribution with $2n$ degrees of freedom. Here the constants $a$ and $b$ are the lower and upper 2.5% points of that distribution, found using the chi-squared distribution table. Isolating $\lambda$ gives the interval $\bigl(a / (2 \sum x_i), \ b / (2 \sum x_i)\bigr)$.
* The probability statement for **95% CI** for the mean $\mu = 1/\lambda$ of an **exponential distribution** is given by
$$P\biggl(\frac{2 \sum X_i}{b} \lt \frac{1}{\lambda} \lt \frac{2 \sum X_i}{a}\biggr) = .95$$

#### T Distribution RV
If the distribution is not known, we can still use the Central Limit theorem for large values of n to assume a normal distribution. Then the **100(1 - $\alpha$)% CI** can be written as
$$P\biggl(- z_{\alpha/2} \lt \frac{\overline X - \mu}{\frac{\sigma}{\sqrt n}} \lt z_{\alpha/2}\biggr) \approx 1 - \alpha$$

When the population is normal and the standard deviation $\sigma$ is not known, it is replaced by the sample standard deviation $S$. The standardized variable
$$T = \frac{\overline X - \mu}{S/\sqrt n}$$
has a t distribution with $n - 1$ degrees of freedom $(df)$ and 
$$P(-t_{\alpha/2, n - 1} \lt T \lt t_{\alpha/2, n - 1}) = 1 - \alpha$$
The **100(1 - $\alpha$)% CI for $\mu$** is
$$\overline x \pm t_{\alpha/2, n - 1} \cdot s/\sqrt n$$
An **upper confidence bound** is $\overline x + t_{\alpha, n - 1} \cdot s/\sqrt n$ and **lower confidence bound** is $\overline x - t_{\alpha, n - 1} \cdot s/\sqrt n$

* Use the R function **qt** to find t critical values, e.g. `qt(1 - alpha/2, df = n - 1)` returns $t_{\alpha/2, n - 1}$
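
The t interval can be sketched in Python as well. The standard library has no t quantile function, so this example takes the critical value as an argument, to be looked up in a t table or computed elsewhere. The function name `t_interval` and the data are illustrative; 2.262 is the tabulated value of $t_{0.025, 9}$.

```python
import math
import statistics


def t_interval(data, t_crit):
    """Two-sided t interval for a mean: xbar +/- t_crit * s / sqrt(n).

    t_crit must be the critical value t_{alpha/2, n-1} for len(data) - 1 df.
    """
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


data = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0, 5.1, 4.9]  # illustrative data
T_CRIT_95_DF9 = 2.262  # t_{0.025, 9} from a standard t table
low, high = t_interval(data, T_CRIT_95_DF9)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

For a one-sided bound, the same function shape works with $t_{\alpha, n - 1}$ in place of $t_{\alpha/2, n - 1}$, keeping only the relevant endpoint.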

#### Tolerance Interval
A tolerance interval for capturing at least $k\%$ of the values in a normal population distribution with a confidence level of 95% has the form
$$\overline x \pm \text{(tolerance critical value)} \cdot s$$

#### Fisher's Transform
$$r' = (0.5)ln|(1 + r) / (1 - r)|$$
where $r$ is the Pearson correlation

**Standard Error**
$$SE = \frac{1}{\sqrt{n - 3}}$$

**95% CI**
$$r' \pm 1.96 \times SE$$
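
The interval above is on the transformed scale. A minimal sketch of the full calculation follows; the final back-transformation with `tanh` (the inverse of Fisher's transform) is an addition not spelled out in these notes, and the function name `fisher_ci` and the example values are illustrative.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Approximate CI for a correlation via Fisher's transform."""
    r_prime = math.atanh(r)            # equals 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)        # standard error on the transformed scale
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    low_t, high_t = r_prime - z * se, r_prime + z * se
    # Map the endpoints back to the correlation scale (assumed extra step).
    return math.tanh(low_t), math.tanh(high_t)


low, high = fisher_ci(r=0.5, n=28)  # illustrative values
print(f"95% CI for the correlation: ({low:.3f}, {high:.3f})")
```

The back-transformed interval is not symmetric around $r$, which reflects the skewed sampling distribution of a correlation coefficient.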