Statistics
Basic Properties
- $E(X) = ∑ x p(x)$
- $Var(X) = ∑ (x-μ)^2p(x)$
- X is around $E(X)$, give or take $SD(X)$
- $E(aX + bY) = aE(X) + bE(Y)$
- $Var(aX + bY) = a^2Var(X) + b^2Var(Y)$ if $X$ and $Y$ are independent
- $Var(X) = E(X^2) - [E(X)]^2$
- $Cov(X_1, X_2) = E(X_1X_2) - E(X_1)E(X_2)$
- $P(A \cap B) = P(A)P(B)$ if $A$ and $B$ are independent
- An RV is centered when $E(X)=0$; any RV can be centered via $Y = X - E(X)$, with the SD and variance unaffected
- In $X = μ + ε$, $μ$ is the unknown constant of interest, and $ε$ represents random measurement error.
- if $X$, $Y$ are independent:
- $M_{X+Y}(t) = M_X(t)M_Y(t)$
- $E(XY)=E(X)E(Y)$; the converse holds if $X$ and $Y$ are bivariate normal, and extends to the multivariate normal
Approximations
Law of Large Numbers
Let $X_1, X_2, …, X_n$ be IID, with expectation $μ$ and variance $σ^2$. Then $\overline{X}_n = \frac{1}{n}∑_{i=1}^n X_i \xrightarrow[n→∞]{} μ$. Let $x_1, x_2, …, x_n$ be realisations of the random variables $X_1, X_2, …, X_n$; then $\overline{x}_n = \frac{1}{n}∑_{i=1}^n x_i \xrightarrow[n→∞]{} μ$
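A quick numerical check of the LLN (a minimal sketch; numpy and the Exponential(1) distribution, chosen only because its $μ = 1$, are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential(1) has expectation mu = 1; the sample mean
# should approach 1 as n grows, per the LLN.
for n in [10, 1_000, 100_000]:
    x = rng.exponential(scale=1.0, size=n)
    print(n, x.mean())
```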
Central Limit Theorem
Let $S_n = ∑_{i=1}^n X_i$ where $X_1, X_2, …, X_n$ are IID with expectation $μ$ and variance $σ^2$. Then $\frac{S_n - nμ}{\sqrt{n}σ} \xrightarrow[n→∞]{} \mathcal{N}(0,1)$
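A sketch of the CLT under the same assumed Exponential(1) draws, for which $μ = σ = 1$; the standardised sums should be close to $\mathcal{N}(0,1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 10_000
mu, sigma = 1.0, 1.0                      # Exponential(1): mu = sigma = 1

# reps independent copies of S_n, standardised as (S_n - n*mu)/(sqrt(n)*sigma)
s = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
z = (s - n * mu) / (np.sqrt(n) * sigma)
print(z.mean(), z.std())                  # approx 0 and 1, as for N(0, 1)
```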
Distributions
Poisson($λ$)
$E(X) = Var(X) = λ$
Normal $X ∼ \mathcal{N}(μ, σ^2)$
$f(x) = \frac{1}{\sqrt{2π}σ} \exp\left(-\frac{(x-μ)^2}{2σ^2}\right), -∞<x<∞$
- When $μ = 0$, $f(x)$ is an even function, and $E(X^k) = 0$ where $k$ is odd
- $Y = \frac{X-E(X)}{SD(X)}$ has the standard normal distribution
Gamma $Γ$
$g(t) = \frac{λ^α}{Γ(α)}t^{α-1}e^{-λt}, t ≥ 0$
$μ_1 = \frac{α}{λ}, μ_2 = \frac{α(α+1)}{λ^2}$
$χ^2$ Distribution
Let $\mathcal{Z} ∼ \mathcal{N}(0,1)$, $\mathcal{U} = \mathcal{Z}^2$ has a $χ^2$ distribution with 1 d.f.
$f_{\mathcal{U}}(u) = \frac{1}{\sqrt{2π}} u^{-1/2} e^{-u/2}, u ≥ 0$
$χ_1^2 ∼ Γ(α=\frac{1}{2}, λ=\frac{1}{2})$
Let $U_1, U_2, …, U_n$ be IID $χ_1^2$; then $V=∑_{i=1}^n U_i$ is $χ_n^2$, with $n$ degrees of freedom, and $V ∼ Γ(α=\frac{n}{2}, λ=\frac{1}{2})$
$E(χ_n^2) = n, Var(χ_n^2) = 2n$
$M(t) = \left(1 - 2t\right)^{-n/2}$
t-distribution
Let $\mathcal{Z} ∼ \mathcal{N}(0,1)$, $\mathcal{U}_n ∼ χ_n^2$ be independent, $t_n = \frac{\mathcal{Z}}{\sqrt{U_n / n}}$ has a t-distribution with n d.f.
$f(t) = \frac{Γ((n+1)/2)}{\sqrt{nπ}\,Γ(n/2)}\left(1 + \frac{t^2}{n} \right)^{-(n+1)/2}$
- t is symmetric about 0
- $t_n \xrightarrow[n→∞]{} \mathcal{Z}$
F-distribution
Let $U ∼ χ_m^2, V ∼ χ_n^2$ be independent, $W = \frac{U/m}{V/n}$ has an F distribution with (m,n) d.f.
If $X ∼ t_n$, then $X^2 = \frac{\mathcal{Z}^2/1}{\mathcal{U}_n/n}$ has an F distribution with (1,n) d.f. The F density has support $w ≥ 0$.
For $n > 2$, $E(W) = \frac{n}{n-2}$
Sampling
Let $X_1, X_2, …, X_n$ be IID $\mathcal{N}(μ, σ^2)$.
$\text{sample mean, } \overline{X} = \frac{1}{n}∑_{i=1}^n X_i$
$\text{sample variance, } S^2 = \frac{1}{n-1}∑_{i=1}^n\left(X_i-\overline{X}\right)^2$
Properties of $\overline{X}$ and $S^2$
- $\overline{X}$ and $S^2$ are independent
- $\overline{X} ∼ \mathcal{N}(μ, \frac{σ^2}{n})$
- $\frac{(n-1)S^2}{σ^2} ∼ χ_{n-1}^2$
- $\frac{\overline{X} - μ}{S/\sqrt{n}} ∼ t_{n-1}$
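A simulation sketch of these properties (the parameter values are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 20_000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)          # sample variance S^2 with 1/(n-1)

print(xbar.var(), sigma**2 / n)     # Var(Xbar) = sigma^2 / n
u = (n - 1) * s2 / sigma**2         # should be chi^2 with n-1 d.f.
print(u.mean(), u.var())            # approx n-1 = 9 and 2(n-1) = 18
```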
Survey Sampling
In a population of size $N$, we are interested in a variable $x$. The $i$th individual has fixed value $x_i$.
$\text{mean of population} = μ = \frac{1}{N}∑_{i=1}^N x_i$
$\text{total of population} = τ = ∑_{i=1}^N x_i = μN$
$\text{SD of population} = σ$
$σ^2 = \frac{1}{N}∑_{i=1}^N\left(x_i-μ\right)^2 = \frac{1}{N}∑_{i=1}^N x_i^2 - μ^2$
Dichotomous case
The population members have value 0 or 1. Let $p$ be the proportion of members with value 1. Then $μ = p, σ^2 = p(1-p)$
Simple Random Sampling (SRS)
Assume $n$ random draws are made without replacement, so the draws are not independent. (This is accounted for below.)
Lemma A
The draws $X_i$ have the same distribution. Denote by $ξ_1, ξ_2, …, ξ_m$ the distinct values assumed by members of the population, and let $n_j$ be the number of members with value $ξ_j$.
$P(X_i =ξ_j) = \frac{n_j}{N}$
$E(X_i) = μ, Var(X_i) = σ^2$
Lemma B
For $i ≠ j$, $Cov(X_i, X_j) = - \frac{σ^2}{N-1}$
We use sample mean $\overline{X}$ to estimate $μ$:
$E(\overline{X}) = μ$ from Lemma A, and
$Var(\overline{X}) = \frac{σ^2}{n} \left(\frac{N-n}{N-1}\right)$ from Lemma B, where $\frac{N-n}{N-1}$ is the finite population correction factor.
In a 0-1 population, let $\hat{p}$ be the proportion of 1s in the sample:
$E(\hat{p}) = p, \quad SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}\cdot\frac{N-n}{N-1}}$
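A sketch checking $SD(\hat{p})$ with the correction factor on a hypothetical 0-1 population ($N$, $n$ and $p$ are assumed values):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, p = 1_000, 100, 0.3
population = np.zeros(N)
population[: int(N * p)] = 1          # 0-1 population, proportion p of 1s

# repeated simple random samples drawn without replacement
phat = np.array([rng.choice(population, size=n, replace=False).mean()
                 for _ in range(10_000)])
fpc = (N - n) / (N - 1)
print(phat.std(), np.sqrt(p * (1 - p) / n * fpc))   # matches SD(p-hat)
```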
Estimation Problem
Let $X_1, X_2, …, X_n$ be random draws with replacement. Then $\overline{X}$ is an estimator of $μ$, and the observed value of $\overline{X}$, $\overline{x}$, is an estimate of $μ$.
Standard Error (SE)
Since $E(\overline{X}) = μ$, the estimator is unbiased.
The error in a particular estimate $\overline{x}$ is unknown, but on average its size is about $SD(\overline{X}) = \frac{σ}{\sqrt{n}}$
The standard error of $\overline{X}$ is defined to be $SD(\overline{X})$
An unbiased estimator for $σ^2$ is $s^2 = \frac{1}{n-1}∑_{i=1}^n(X_i - \overline{X})^2$
| Parameter | Estimator | SE | Estimated SE |
| --- | --- | --- | --- |
| $μ$ | $\overline{X}$ | $\frac{σ}{\sqrt{n}}$ | $\frac{s}{\sqrt{n}}$ |
| $p$ | $\hat{p}$ | $\sqrt{\frac{p(1-p)}{n}}$ | $\sqrt{\frac{\hat{p}(1-\hat{p})}{n-1}}$ |
Without Replacement
The variance of $\overline{X}$ is multiplied by $\frac{N-n}{N-1}$, so the SE is multiplied by $\sqrt{\frac{N-n}{N-1}}$. Also, $s^2$ is biased for $σ^2$: $E\left(\frac{N-1}{N}s^2\right) = σ^2$, but $N$ is normally large, so the bias is negligible.
Confidence Interval
An approximate $1-α$ CI for $μ$ is
$\left(\overline{x} - z_{α/2}\frac{s}{\sqrt{n}},\ \overline{x} + z_{α/2}\frac{s}{\sqrt{n}}\right)$
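A coverage-check sketch for this interval (normal data and $α = 0.05$ are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 5.0, 2.0, 100
z = 1.96                                   # z_{alpha/2} for alpha = 0.05

covered = 0
for _ in range(10_000):
    x = rng.normal(mu, sigma, size=n)
    half = z * x.std(ddof=1) / np.sqrt(n)  # z_{alpha/2} * s / sqrt(n)
    covered += x.mean() - half <= mu <= x.mean() + half
print(covered / 10_000)                    # close to 0.95
```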
Measurement Error
Let $X_1, X_2, …, X_n$ be independent measurements of an unknown constant $μ$: $X_i = μ + ε_i$.
The errors $ε_i$ are IID with expectation 0 and variance $σ^2$; $x_i = μ + e_i$, where $x_i$ and $e_i$ are realisations of the RVs. Then $\overline{x}$ is an estimate of $μ$, with SE $\frac{σ}{\sqrt{n}}$.
Biased Measurements
Let $X = μ + ε$, where $E(ε) = 0$, $Var(ε) = σ^2$
Suppose $X$ is used to measure an unknown constant $a$, $a ≠ μ$: $X = a + (μ - a) + ε$, where $μ-a$ is the bias.
Mean square error (MSE) is $E((X-a)^2) = σ^2 + (μ - a)^2$
With $n$ IID measurements, $\overline{X} = μ + \overline{ε}$ and
$E((\overline{X} - a)^2) = \frac{σ^2}{n} + \left(μ - a\right)^2$
$\text{MSE} = \text{SE}^2 + \text{bias}^2$, hence $\sqrt{\text{MSE}}$ is a good measure of the accuracy of the estimate $\overline{x}$ of $a$.
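A minimal sketch of the decomposition $\text{MSE} = \text{SE}^2 + \text{bias}^2$ (all parameter values are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
a, mu, sigma, n = 10.0, 10.5, 1.0, 25     # bias mu - a = 0.5

# many realisations of Xbar, each measuring a with bias mu - a
xbar = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)
mse = np.mean((xbar - a) ** 2)
print(mse, sigma**2 / n + (mu - a) ** 2)  # MSE vs SE^2 + bias^2
```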
Estimation of a Ratio
Consider a population of $N$ members with two recorded characteristics: $(x_1, y_1), (x_2, y_2), …, (x_N, y_N)$. Let $r = \frac{μ_y}{μ_x}$.
An obvious estimator of r is $R = \frac{\overline{Y}}{\overline{X}}$
$Cov(\overline{X},\overline{Y}) = \frac{σ_{xy}}{n}$, where
$σ_{xy} := \frac{1}{N}∑_{i=1}^N(x_i-μ_x)(y_i-μ_y)$ is the population covariance.
Properties
With SRS, the approximate variance of $R = \overline{Y}/\overline{X}$ is $Var(R) ≈ \frac{1}{μ_x^2}\left(r^2σ_{\overline{X}}^2 + σ_{\overline{Y}}^2 - 2rσ_{\overline{X}\overline{Y}}\right) = \frac{1}{n}\frac{N-n}{N-1}\frac{1}{μ_x^2}\left(r^2σ_x^2 + σ_y^2 - 2rσ_{xy}\right)$, where $σ_{\overline{X}\overline{Y}} = Cov(\overline{X},\overline{Y})$
Population correlation coefficient: $ρ = \frac{σ_{xy}}{σ_xσ_y}$
$E(R) ≈ r + \frac{1}{n}\left(\frac{N-n}{N-1}\right)\frac{1}{μ_x^2}\left(rσ_x^2-ρ\sigma_xσ_y\right)$
The sample covariance is $s_{xy} = \frac{1}{n-1}∑_{i=1}^n\left(X_i - \overline{X}\right)\left(Y_i - \overline{Y}\right)$
Ratio Estimates
$\overline{Y}_R = \frac{μ_x}{\overline{X}}\overline{Y} = μ_xR$
$Var(\overline{Y}_R) ≈ \frac{1}{n}\frac{N-n}{N-1}(r^2σ_x^2 + σ_y^2 -2rρ\sigma_xσ_y)$
$E(\overline{Y}_R) - μ_y ≈ \frac{1}{n}\frac{N-n}{N-1}\frac{1}{μ_x}\left(rσ_x^2 -ρ\sigma_xσ_y\right)$
The bias is of order $\frac{1}{n}$, small compared to the standard error, which is of order $\frac{1}{\sqrt{n}}$.
$\overline{Y}_R$ is better than $\overline{Y}$, having smaller variance, when $ρ > \frac{1}{2}\left(\frac{C_x}{C_y}\right)$, where $C_x = σ_x/μ_x$ and $C_y = σ_y/μ_y$ are the coefficients of variation
Variance of $\overline{Y}_R$ can be estimated by
$s_{\overline{Y}_R}^2 = \frac{1}{n}\frac{N-n}{N-1}\left(R^2s_x^2+s_y^2-2Rs_{xy}\right)$
An approximate $1-α$ C.I. for $μ_y$ is $\overline{Y}_R ± z_{α/2}s_{\overline{Y}_R}$
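A sketch of the ratio estimate and its estimated SE on a hypothetical finite population (the population model and sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 5_000, 100

# hypothetical finite population with correlated characteristics x and y
x = rng.gamma(5.0, 2.0, size=N)
y = 3.0 * x + rng.normal(0.0, 2.0, size=N)
mu_x = x.mean()

idx = rng.choice(N, size=n, replace=False)    # simple random sample
xs, ys = x[idx], y[idx]
R = ys.mean() / xs.mean()
ybar_R = mu_x * R                             # ratio estimate of mu_y

s_xy = np.cov(xs, ys, ddof=1)[0, 1]           # sample covariance
var_R = (1 / n) * (N - n) / (N - 1) * (
    R**2 * xs.var(ddof=1) + ys.var(ddof=1) - 2 * R * s_xy)
print(ybar_R, np.sqrt(var_R), y.mean())       # estimate, est. SE, true mu_y
```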
Estimation
Let $X_1, X_2, …, X_n$ be IID random variables with density $f(x|θ)$, where $θ ∈ \mathbb{R}^p$ is an unknown constant. Realisations $x_1, x_2, …, x_n$ will be used to estimate $θ$; the estimate is a realisation of an RV $\hat{θ}$, the estimator. The bias and SE are:
$\text{bias} = E(\hat{θ}) - θ, SE = SD(\hat{θ})$
Moments
Let $X_1, X_2, …, X_n$ be IID with the same distribution as $X$.
$\hat{μ}_k = \frac{1}{n}∑_{i=1}^n X_i^k$ is an estimator of $μ_k$, where $μ_k$ is the $k$th moment. An estimate is also denoted $\hat{μ}_k$.
Method of Moments
To estimate $θ$, express it as a function of the moments, $θ = g(μ_1, μ_2, …)$, and plug in the sample moments: $\hat{θ} = g(\hat{μ}_1,\hat{μ}_2,…)$
The bias and SE of an estimate still depend on the unknown value of the constant. Suppose 1.67 and 0.38 are estimates of $λ$ and $α$. Data are generated from $Γ(1.67, 0.38)$, and the MOM estimators are written $\widehat{1.67}$ and $\widehat{0.38}$. Because the sample size is large, $(\hat{λ} - λ, \hat{α}-α) ≈ (\widehat{1.67} - 1.67, \widehat{0.38} - 0.38)$
Monte Carlo is used to generate many realisations of $\widehat{1.67}$ via the $Γ(1.67,0.38)$ distribution. With 10,000 realisations,
$\text{bias}(1.67) = E_{1.67,0.38}(\widehat{1.67} - 1.67) ≈ 0.09$
$SE(1.67) = SD_{1.67,0.38}(\widehat{1.67}) ≈ 0.35$
and the bias-corrected estimate of $λ$ is $1.67 - 0.09 = 1.58$, reported as $1.58 ± 0.35$
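A sketch of this Monte Carlo scheme for $\hat{λ}$ (the sample size $n$ and the number of realisations are assumptions; note numpy parameterises the gamma by shape $α$ and scale $1/λ$):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, alpha, n = 1.67, 0.38, 500        # assumed sample size n

def mom(x):
    # Gamma MOM: lambda-hat = xbar/sigma-hat^2, alpha-hat = xbar^2/sigma-hat^2
    xbar, s2 = x.mean(), x.var()       # variance with 1/n, as in the notes
    return xbar / s2, xbar**2 / s2

lam_hats = np.array([mom(rng.gamma(alpha, 1 / lam, size=n))[0]
                     for _ in range(10_000)])
print(lam_hats.mean() - lam)           # Monte Carlo estimate of the bias
print(lam_hats.std())                  # Monte Carlo estimate of the SE
```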
$\overline{X} \xrightarrow[n→∞]{} α/λ$ and $\hat{σ}^2 \xrightarrow[n→∞]{} α/λ^2$, so the MOM estimators are consistent (asymptotically unbiased).
$\text{Poisson}(λ)$: $\text{bias} = 0, SE ≈ \sqrt{\frac{\overline{x}}{n}}$
$N(μ, σ^2)$: $μ = μ_1$, $σ^2 = μ_2 - μ_1^2$
$Γ(λ, α)$: $\hat{λ} = \frac{\hat{μ}_1}{\hat{μ}_2-\hat{μ}_1^2}=\frac{\overline{X}}{\hat{σ}^2}, \hat{α} = \frac{\hat{μ}_1^2}{\hat{μ}_2-\hat{μ}_1^2}=\frac{\overline{X}^2}{\hat{σ}^2}$
Maximum Likelihood Estimator (MLE)
Let $\{f(⋅ | θ) : θ ∈ Θ\}$ be an (identifiable) parametric family
Suppose $X_1, X_2, …, X_n$ are IID with density $f(⋅|θ_0)$, where $θ_0 ∈ Θ$ is an unknown constant; we want to estimate $θ_0$ using realisations $x_1, x_2, …, x_n$.
$P(X_1=x_1, X_2=x_2, …, X_n=x_n) = ∏_{i=1}^n f(x_i|θ)$ for a discrete distribution.
$θ \mapsto L(θ) = ∏_{i=1}^n f(x_i|θ)$ is the likelihood function.
The maximum likelihood (ML) estimate of $θ_0$ is the number that maximises the likelihood over $θ$.
The estimate is a realisation of the ML estimator $\hat{θ}_0$, which can also be found by maximising $L(θ) = ∏_{i=1}^n f(X_i|θ)$
The bias and SE are:
$\text{bias} = E_{θ_0}(\hat{θ}_0)-θ_0, \quad SE = SD(\hat{θ}_0)$
Poisson Case
$L(λ) = ∏_{i=1}^n\frac{λ^{x_i}e^{-λ}}{x_i!} = \frac{λ^{∑_{i=1}^n x_i}e^{-nλ}}{∏_{i=1}^n x_i!}$
$l(λ) = ∑_{i=1}^n x_i\log λ - nλ - ∑_{i=1}^n\log x_i!$
ML estimate of $λ_0$ is $\overline{x}$. ML estimator is $\hat{λ}_0 = \overline{X}$
Normal case
$l(μ, σ) = -n\log σ - \frac{n\log 2π}{2} - \frac{∑_{i=1}^n\left(X_i-μ\right)^2}{2σ^2}$
$\frac{∂ l}{∂ μ} = \frac{∑_{i=1}^n\left(X_i - μ\right)}{σ^2} = 0 \implies \hat{μ} = \overline{X}$
$\frac{∂ l}{∂ σ} = \frac{∑_{i=1}^n\left(X_i-μ\right)^2}{σ^3} - \frac{n}{σ} = 0 \implies \hat{σ}^2 = \frac{1}{n}∑_{i=1}^n\left(X_i-\overline{X}\right)^2$
Gamma case
$l(α, λ) = nα\log λ + (α-1)∑_{i=1}^n\log X_i - λ∑_{i=1}^n X_i - n\log Γ(α)$
$\frac{∂ l}{∂ α} = n\log λ + ∑_{i=1}^n\log X_i - n\frac{Γ'(α)}{Γ(α)}$
$\frac{∂ l}{∂ λ} = \frac{nα}{λ} - ∑_{i=1}^n X_i$
$\hat{λ} = \frac{\hat{α}}{\overline{X}}$
The bias and SE are estimated through Monte Carlo and bootstrap methods.
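A sketch of a nonparametric bootstrap SE for the gamma ML estimates, with scipy's numerical `gamma.fit` standing in for Newton-type maximisation and simulated data standing in for observations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(2.0, 0.5, size=200)       # stand-in for observed data

def gamma_mle(data):
    # scipy fits shape a and scale = 1/lambda numerically; loc pinned at 0
    a, _, scale = stats.gamma.fit(data, floc=0)
    return a, 1.0 / scale               # (alpha-hat, lambda-hat)

alpha_hat, lam_hat = gamma_mle(x)

# resample the data with replacement and re-fit to see the estimators' spread
boot = np.array([gamma_mle(rng.choice(x, size=x.size, replace=True))
                 for _ in range(500)])
print((alpha_hat, lam_hat), boot.std(axis=0))   # estimates and bootstrap SEs
```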
Multinomial Case
$f(x_1, …, x_r) = {n \choose {x_1, x_2, …, x_r}} ∏_{i=1}^r p_i^{x_i}$
where $x_i$ is the number of times value $i$ occurs (not the number of trials), and $x_1, x_2, …, x_r$ are non-negative integers summing to $n$. $∀ i$:
$E(X_i) = np_i, Var(X_i)=np_i(1-p_i)$
$Cov(X_i,X_j) = -np_ip_j, ∀ i ≠ j$
$l(p) = K + ∑_{i=1}^{r-1}x_i\log p_i + x_r\log(1-p_1-…-p_{r-1})$
$\frac{∂ l}{∂ p_i} = \frac{x_i}{p_i} - \frac{x_r}{p_r} = 0 \text{ assuming MLE exists}$
$\frac{x_i}{\hat{p}_i} = \frac{x_r}{\hat{p}_r} \implies \hat{p}_i = \frac{x_i}{c}, \quad c=\frac{x_r}{\hat{p}_r}$
$∑_{i=1}^r\hat{p}_i = ∑_{i=1}^r\frac{x_i}{c} = 1 \implies c = ∑_{i=1}^r x_i = n \implies \hat{p}_i = \frac{x_i}{n}$
This is the same as the MOM estimator.
MLE vs MOM
- ML estimates have smaller SEs than MOM estimates
- In some cases the bias and SE must be computed numerically via methods like Newton-Raphson, together with Monte Carlo and bootstrap methods
Hardy-Weinberg Equilibrium
Let a locus have two alleles A and a, where the proportion of $a$ in the population is $θ$.
Assuming the population is large and mating is random, then in the next generation the number of $a$ alleles an individual carries is the sum of 2 Bernoulli($θ$) RVs, i.e. $Bin(2,θ)$, and the total number of $a$ alleles in a sample of $n$ individuals is $Bin(2n,θ)$
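Since the total count of $a$ alleles is $Bin(2n, θ)$, the ML estimate is the observed fraction of $a$ alleles, $\hat{θ} = x/(2n)$; a minimal sketch with hypothetical genotype counts:

```python
# hypothetical genotype counts (AA, Aa, aa) for n individuals
n_AA, n_Aa, n_aa = 342, 500, 187
n = n_AA + n_Aa + n_aa

# each aa individual carries two a alleles, each Aa carries one
x = n_Aa + 2 * n_aa               # X ~ Bin(2n, theta) under the model
theta_hat = x / (2 * n)           # ML estimate of theta
print(theta_hat)
```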
CIs in MLE
When the sample size is large, $\hat{θ}_0$ is approximately normal. In the normal model, exact CIs are available:
$\frac{\overline{X} - μ}{s/\sqrt{n}} ∼ t_{n-1}$
Given the realisations $\overline{x}$ and $s$,
$\left(\overline{x} - t_{n-1,α/2}\frac{s}{\sqrt{n}},\ \overline{x} + t_{n-1,α/2}\frac{s}{\sqrt{n}}\right)$
is the exact $1-α$ CI for $μ$.
$\frac{n\hat{σ}^2}{σ^2} ∼ χ_{n-1}^2$
$\left(\frac{n\hat{σ}^2}{χ_{n-1,α/2}^2}, \frac{n\hat{σ}^2}{χ_{n-1,1-α/2}^2}\right)$
is the exact $1-α$ CI for $σ^2$.
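A sketch computing this interval with scipy's chi-square quantiles, where $χ_{n-1,α/2}^2$ is read as the upper $α/2$ point (data simulated as a stand-in):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma, n, alpha = 2.0, 30, 0.05
x = rng.normal(0.0, sigma, size=n)       # stand-in sample

sig2_hat = x.var()                       # MLE of sigma^2, with 1/n
lo = n * sig2_hat / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
hi = n * sig2_hat / stats.chi2.ppf(alpha / 2, df=n - 1)
print(lo, hi, sigma**2)                  # exact 95% CI and the true sigma^2
```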