## $$Statistics \ Cheat \ Sheet \ - \ RV$$

#### 1A. Discrete Random Variables
**Probability Mass Function (pmf)**, $p(x)$ for a random variable $X$ is given by 
$$p(x) = p_X(x) = P(X = x) = P(\{w \in \Omega:X(w) = x\})$$ where $x$ is a possible value of the discrete RV $X$, i.e. it belongs to the set of real numbers $$\{x \in R :p(x) \gt 0\}$$ the $w$ are the outcomes in $\Omega$ that $X$ maps to $x$, and $(X = x)$ is the corresponding event.<BR>
Also,
    $$p(x) \geq 0$$ and
    $$\sum_xp(x) = 1$$

#### 1B. Continuous Random Variables
The **probability density function (pdf)** of a continuous RV $X$ is a function $f(x)$ such that, for any two numbers $a \leq b$,
$$P(a \leq X \leq b) = \int_a^bf(x)\ dx$$ where 
$$f(x) \geq 0$$ for all $x$ and 
$$\int_{-\infty}^{\infty}f(x) \ dx = 1$$ is the area under the entire graph of $f(x)$. 

$f(x)dx$ is the probability that $X$ is in an infinitesimal range around $x$ of width $dx$.

Also, $$P(a \leq X \leq b) = P(a \lt X \lt b) = P(a \lt X \leq b) = P(a \leq X \lt b)$$

**Continuous RV at any single value**
$$P(X = c) = \int_c^cf(x) \ dx = \lim_{\epsilon \to 0} \int_{c - \epsilon}^{c + \epsilon} f(x) \ dx = 0$$

#### 2. Distribution of Probability Mass Function (PMF) 
The distribution of the pmf is the distribution of $X$ across all possible outcomes, i.e. $P(X=x_1), P(X=x_2),...,P(X=x_n)$ <BR>

**Bernoulli Random Variable**<BR>
For a Bernoulli RV, possible outcomes are $\{0,1\}$ and so the pmf distribution will be the distribution for $P(X=0)$ and $P(X=1)$. If $P(X=1)=\alpha$, then $P(X=0) = 1 - \alpha$. Hence
$$p(x; \alpha) = \begin{cases}
1 - \alpha, & \ \text{if x = 0} \\
\alpha, & \ \text{if x = 1} \\
0, & \ \text{otherwise}
\end{cases}$$

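**Geometric Distributions**
$$p(x) = \begin{cases}
(1 - p)^xp, & \ x = 0,1,2,3,... \\
0, & \ \text{otherwise}
\end{cases}$$
where $x$ is the number of failures before the first success. Equivalently, counting the trials up to and including the first success,
$$p(x) = \begin{cases}
(1 - p)^{x-1}p, & \ x = 1,2,3,... \\
0, & \ \text{otherwise}
\end{cases}$$
where $x$ is the number of trials needed to get the first success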
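
As a quick numerical check of the two geometric forms above, the following Python sketch evaluates both pmfs and confirms that each sums to approximately 1. The function names `geom_pmf_failures` and `geom_pmf_trials` and the value `p = 0.3` are illustrative assumptions, not part of these notes.

```python
def geom_pmf_failures(x, p):
    """P(X = x) where X counts failures before the first success (x = 0, 1, 2, ...)."""
    return (1 - p) ** x * p if x >= 0 else 0.0


def geom_pmf_trials(x, p):
    """P(X = x) where X counts trials up to and including the first success (x = 1, 2, ...)."""
    return (1 - p) ** (x - 1) * p if x >= 1 else 0.0


p = 0.3  # assumed success probability, chosen only for illustration

# Truncated sums; the neglected tail (1 - p)^200 is negligible.
total_failures = sum(geom_pmf_failures(x, p) for x in range(0, 200))
total_trials = sum(geom_pmf_trials(x, p) for x in range(1, 201))

# Shifting the argument by one maps one form onto the other.
same_after_shift = all(
    geom_pmf_failures(x, p) == geom_pmf_trials(x + 1, p) for x in range(0, 50)
)
print(total_failures, total_trials, same_after_shift)
```

The two forms describe the same experiment; they differ only in whether the successful trial itself is counted.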

#### 3. Probability Density Function (PDF) 
**Continuous RV with uniform distribution**
$$f(x; A,B) = \begin{cases}
\frac{1}{B - A}, & \ A \leq x \leq B \\
0, & \ otherwise
\end{cases}$$

For n independent RVs, each uniform on $[0, \theta]$, the joint pdf is
$$f(x_1,...,x_n; \theta) = \begin{cases}
\frac{1}{\theta^n}, & \ 0 \leq x_1 \leq \theta,...,0 \leq x_n \leq \theta \\
0, & \ otherwise
\end{cases}$$

#### 4. Cumulative Distribution Function (CDF) 
**Discrete RV with pmf p(x)**
$$F(x) = P(X \leq x) = \sum_{y:y \leq x}p(y)$$
$$F(x) = \begin{cases}
0, & \ -\infty \lt x \lt x_1 \\
p(x_1), & \ x_1 \leq x \lt x_2 \\
p(x_1) + p(x_2), & \ x_2 \leq x \lt x_3 \\
\vdots & \ \vdots \\
p(x_1) + ... + p(x_n), & \ x_n \leq x \lt \infty
\end{cases}$$

**Bernoulli Random Variable**
$$F(x; \alpha) = \begin{cases}
0, & \  x \lt 0 \\
1 - \alpha, & \  0 \leq x \lt 1 \\
1, & \ x \geq 1
\end{cases}$$

**Geometric Distributions**
$$F(x) = P(X \leq x) = \begin{cases}
0, & \ x \lt 1 \\
1 - (1 - p)^{[x]}, & \ x \geq 1 
\end{cases}$$
where $[x]$ is the largest integer $\leq x$ (this is the cdf of the form with $x = 1,2,3,...$, i.e. $x$ counts trials)

**Continuous RV**
$$F(x) = P(X \leq x) = \int_{-\infty}^x f(y) \ dy $$
$$1 - F(x) = P(X \gt x) = \int_x^{\infty} f(y) \ dy $$
If the distribution is symmetric about 0,
$$P(X \lt -x) = P(X \gt x)$$

**Continuous RV with uniform distribution**
$$F(x) = \int_{-\infty}^xf(y) \ dy = \int_A^x \frac{1}{B - A} dy = \frac{x - A}{B - A}$$
$$F(x) = \begin{cases}
0, & \ x \lt A \\
\frac{x - A}{B - A}, & \ A \leq x \lt B \\
1, & \ x \geq B
\end{cases}$$
$\implies$ For any $Z \sim U[0,1]$ and $a \in (0,1), P(Z \lt a) = a$
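
The piecewise cdf of the uniform distribution can be sketched in Python as follows. The function name `uniform_cdf` and the example arguments are assumptions made for illustration.

```python
def uniform_cdf(x, A, B):
    """CDF of a continuous uniform distribution on [A, B], with A < B."""
    if x < A:
        return 0.0
    if x >= B:
        return 1.0
    return (x - A) / (B - A)


# For Z ~ U[0, 1] and a in (0, 1), P(Z < a) = a.
print(uniform_cdf(0.25, 0.0, 1.0))

# P(a <= X <= b) = F(b) - F(a) for X ~ U[2, 6].
print(uniform_cdf(5.0, 2.0, 6.0) - uniform_cdf(3.0, 2.0, 6.0))
```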

#### 5. Obtaining pmf from cdf 
For any two numbers a and b , $a \leq b$,
$$P(a \leq X \leq b) = F(b) - F(a-)$$ where "a-" is the largest possible X value that is strictly less than a.<BR>
If only possible values are integers and if a and b are integers, then
    $$P(a \leq X \leq b) = F(b) - F(a - 1)$$ and 
    $$P(X = a) = F(a) - F(a - 1)$$
    $$P(X \geq a) = 1 - F(a - 1) = 1 - P(X \leq a - 1)$$
    
In general, the pmf is the size of the jump of the cdf at $x$:
$$p(x) = F(x) - \lim_{u \to x^-}F(u)$$
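
For an integer-valued RV, recovering probabilities from the cdf amounts to differencing it. The sketch below uses a fair six-sided die as an assumed example; `die_cdf`, `pmf_from_cdf` and `prob_between` are hypothetical helper names.

```python
import math
from fractions import Fraction


def die_cdf(x):
    """CDF of a fair six-sided die: F(x) = P(X <= x)."""
    k = min(max(math.floor(x), 0), 6)
    return Fraction(k, 6)


def pmf_from_cdf(cdf, a):
    """P(X = a) for an integer-valued RV, via F(a) - F(a - 1)."""
    return cdf(a) - cdf(a - 1)


def prob_between(cdf, a, b):
    """P(a <= X <= b) for integers a <= b, via F(b) - F(a - 1)."""
    return cdf(b) - cdf(a - 1)


print(pmf_from_cdf(die_cdf, 3))     # P(X = 3)
print(prob_between(die_cdf, 2, 4))  # P(2 <= X <= 4)
print(1 - die_cdf(4))               # P(X > 4)
```

Exact fractions are used so that the differences are not blurred by floating-point rounding.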

#### 6. Obtaining pdf from cdf 
$$P(X \gt a) = 1 - F(a)$$
$$P(a \leq X \leq b) = F(b) - F(a)$$

**Fundamental Theorem Of Calculus**<BR>
If $X$ is a continuous rv with pdf $f(x)$ and cdf $F(x)$, then at every $x$ at which the derivative $F'(x)$ exists
$$ F'(x) = \frac{d}{dx} F(x) = f(x)$$

#### 7. Percentile of a continuous distribution
The **(100p)th percentile** of the distribution of a continuous RV $X$, denoted by $\eta(p)$, is defined by 
$$p = F(\eta(p)) = \int_{-\infty}^{\eta(p)}f(y) \ dy$$
where p is a number between 0 and 1 <BR>
Also, this is the same as the **pth quantile** of $X$ which is the value $q_p$ such that $P(X \leq q_p) = p$
    
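In R, the **qnorm** function returns quantiles of a normal distribution; other distributions have their own quantile functions (for example **qunif** for the uniform)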

**Deciles** divide the distribution into tenths. For example, the $3^{rd}$ decile is the 3/10 or 0.3 quantile

**Quartiles** divide the distribution into fourths. For example, the $3^{rd}$ quartile is the 3/4 or 0.75 quantile, i.e. the 75th percentile
    
The **median** of a continuous RV is the 50th percentile 
$$.5 = F(\eta(.5)) = F(\tilde\mu)$$
Equivalently, median can be calculated by solving for x in
$$P(X \leq x) = 0.5$$
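
A rough Python counterpart of a normal quantile lookup is `statistics.NormalDist.inv_cdf` from the standard library. The sketch below also inverts the uniform cdf by hand; `uniform_quantile` and the interval $[2, 6]$ are assumptions chosen for illustration.

```python
from statistics import NormalDist

# Standard normal quantiles via the inverse cdf.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
q_975 = standard_normal.inv_cdf(0.975)        # 97.5th percentile
median_normal = standard_normal.inv_cdf(0.5)  # 50th percentile


def uniform_quantile(p, A, B):
    """Solve p = F(x) = (x - A) / (B - A) for x, with 0 < p < 1."""
    return A + p * (B - A)


third_quartile = uniform_quantile(0.75, 2.0, 6.0)  # 0.75 quantile of U[2, 6]
median_uniform = uniform_quantile(0.5, 2.0, 6.0)   # median of U[2, 6]
print(q_975, median_normal, third_quartile, median_uniform)
```

For the uniform case the median coincides with the midpoint $(A + B)/2$, as expected for a symmetric distribution.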

#### 8. Expected or Mean value 
**Discrete RV**
$$E(X) = \mu_X = \sum_{x \in D}x . p(x)$$
where X is a discrete RV with set of possible values D and pmf p(x) <BR>
If $p(x_i)=p(x_j) \ \forall \ i,j$, i.e. all $k$ possible values are equally likely, then
    $$E(X) = p(x)\sum_{i = 1}^kx_i = \frac{1}{k} \sum_{i = 1}^kx_i$$
    where 1/k is the probability of each of the k equally likely values <BR>
$$\mu_{h(X)} = E[h(X)] = \sum_Dh(x).p(x)$$

**DRV geometric series**
$$E(X) = \sum_D x. p(x) = \sum_{x=1}^{\infty}xp(1-p)^{x-1} = p\sum_{x=1}^{\infty}x(1-p)^{x-1} = p \sum_{x=1}^{\infty}\biggl[ - \frac{d}{dp}(1-p)^x \biggr] = -p \frac{d}{dp}\biggl[\frac{1-p}{p}\biggr] = p \cdot \frac{1}{p^2} = \frac{1}{p}$$
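
The closed form $1/p$ can be checked numerically with a truncated sum. This is only a sanity check under an assumed value `p = 0.25`; the cutoff of 500 terms is arbitrary.

```python
p = 0.25  # assumed success probability, for illustration only

# Truncated version of sum_{x >= 1} x * p * (1 - p)^(x - 1).
partial_sum = sum(x * p * (1 - p) ** (x - 1) for x in range(1, 501))

print(partial_sum, 1 / p)
```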

**DRV harmonic series**
$$\mu = E(X) = \sum_{x=1}^{\infty}x.\frac{k}{x^2} = k\sum_{x=1}^{\infty}\frac{1}{x} = \infty$$

**Continuous RV**
$$\mu_X = E(X) = \int_{-\infty}^{\infty}x.f(x) \ dx$$
$$\mu_{h(X)} = E[h(X)] = \int_{-\infty}^{\infty}h(x).f(x) \ dx$$

**Uniform Distribution**
$$E(X) = \frac{A + B}{2}$$

#### 9. Rules of Expected Value
For any linear function $h(X) = aX + b $,
$$E(aX + b) = a. E(X) + b$$
$$\mu_{aX + b} = a. \mu_X + b$$

$$E(X + Y) = E(X) + E(Y)$$
$$E(aX + bY) = aE(X) + bE(Y)$$

If $X$ and $Y$ are independent,
$$E(XY) = E(X)E(Y)$$

#### 10. Variance of X
**Discrete RV**
$$V(X) = \sigma_X^2 = \sum_D(x - \mu)^2 . p(x) = E[(X- \mu)^2]$$
$$V(X) = \sigma_X^2 = \biggl[\sum_D x^2 .p(x)\biggl] - \mu^2 = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2$$
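
The definition and the shortcut formula for the variance give the same number. A minimal sketch, assuming a fair six-sided die as the example pmf:

```python
from fractions import Fraction

# Fair six-sided die: an assumed example pmf, p(x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())                            # E(X)
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())    # E[(X - mu)^2]
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2   # E(X^2) - mu^2

print(mean, var_definition, var_shortcut)  # 7/2 35/12 35/12
```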

**Continuous RV**
$$V(X) = \sigma_X^2 = \int_{-\infty}^{\infty}(x - \mu)^2 . f(x) \ dx = E[(X- \mu)^2]$$

**Uniform Distribution**
$$V(X) = \frac{(B - A)^2}{12}$$

#### 11. Standard Deviation of X
$$\sigma_X = \sqrt{\sigma_X^2}$$

#### 12.Rules of Variance & Standard Deviation
$$V[h(X)] = \sigma_{h(X)}^2 = \sum_D\{h(x) - E[h(X)]\}^2 .p(x)$$
For any linear function $h(X) = aX + b $,
$$Variance, \ V(aX + b) = \sigma_{aX + b}^2 = a^2. \sigma_X^2 = a^2 . Var(X)$$

$$Var(aX + bY) = a^2Var(X) + b^2Var(Y) + 2ab Cov(X,Y)$$
$$Var(aX - bY) = a^2Var(X) + b^2Var(Y) - 2ab Cov(X,Y)$$
If $X$ and $Y$ are independent,
$$Var(X + Y)  = Var(X) + Var(Y) = Var(X - Y)$$

$$Standard \ Deviation, \ \sigma_{aX + b} = |a|. \sigma_X$$
$E[(X - a)^2]$ is a minimum when $$a = \mu = E(X)$$

Variance of any constant is zero and if a random variable has zero variance, then it is essentially constant.

#### 13. Joint Distributions

**Discrete RV**
$$P(X_i, Y_j) = P[(X = X_i) \cap (Y = Y_j)]$$
where 
    $$p(x,y) \geq 0$$ and
    $$\sum_x\sum_yp(x,y) = 1$$
For any 2 dimensional set A,
$$P[(X,Y) \in A] = \sum\sum_{(x,y) \in A}p(x,y)$$

**Continuous RV**
For continuous $X$ and $Y$, $P(X = x, Y = y) = 0$ at every single point, so probabilities come from a joint pdf $f(x,y)$ where 
    $$f(x,y) \geq 0$$ and
    $$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(x,y) \ dx \ dy = 1$$
For any 2 dimensional set A,
$$P[(X,Y) \in A] = \int_A\int f(x,y) \ dx \ dy$$

If A is the two dimensional rectangle $\{(x,y):a \leq x \leq b, c \leq y \leq d\}$ , then
$$P[(X,Y) \in A] = P(a \leq X \leq b, c \leq Y \leq d) = \int_a^b\int_c^d f(x,y) \ dy \ dx$$

**CDF**
$$F_{X,Y}(x,y) = P(X \leq x, Y \leq y)$$
If $X$ and $Y$ are continuous random variables with joint density $f(x, y)$ over the range $[a, b] \times [c, d]$
$$F(x,y) = \int_a^x\int_c^y f(u,v) \ dv \ du$$
If $X$ and $Y$ are discrete with joint pmf $p(x,y)$
$$F(x,y) = \sum_{x_i \leq x}\sum_{y_j \leq y} p(x_i,y_j)$$

**PDF**<BR>
To get joint pdf from joint cdf,
$$f_{X,Y}(x,y) = \frac{\partial^2}{\partial x \partial y}F_{X,Y}(x,y)$$

Example of a joint pdf:
$$f(x,y) = \begin{cases}
24xy, & \ 0 \leq x \leq 1, 0 \leq y \leq 1, x + y \leq 1 \\
0, & \ otherwise
\end{cases}$$
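
One way to convince yourself that this example is a valid joint pdf is to integrate it numerically over its triangular support. The sketch below uses a midpoint rule on an assumed grid of `n = 200` cells per axis; cells cut by the line $x + y = 1$ get half weight.

```python
n = 200        # grid resolution per axis (an arbitrary choice)
h = 1.0 / n    # cell width
total = 0.0

for i in range(n):
    for j in range(n):
        x, y = (i + 0.5) * h, (j + 0.5) * h   # cell midpoint
        if i + j + 1 < n:        # cell lies entirely inside the triangle
            weight = 1.0
        elif i + j + 1 == n:     # cell is cut in half by the line x + y = 1
            weight = 0.5
        else:                    # cell lies outside the support
            continue
        total += weight * 24.0 * x * y * h * h

print(total)  # close to 1
```

The integer comparison on `i + j` avoids floating-point ambiguity for midpoints that fall exactly on the boundary.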

**Expected Value**
$$E(XY) = \sum_{i=1}^{N_X} \sum_{j=1}^{N_Y} X_iY_j\cdot P(X_i,Y_j) $$
$$E[h(X,Y)] = \sum_x\sum_yh(x,y) \cdot p(x,y) \ \ \text{(discrete)}$$
$$E[h(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}h(x,y) \cdot f(x,y) \ dx \ dy \ \ \text{(continuous)}$$

#### 14. Conditional Probability Density Function
Let $X$ and $Y$ be two continuous rv’s with joint pdf $f(x, y)$ and marginal pdf of $X$ as $f_X(x)$. Then for any $X$ value $x$ for which $f_X(x) \gt 0$, the conditional probability density function of $Y$ given that $X = x$ is
$$f_{Y|X}(y|x) = \frac{f(x,y)}{f_X(x)} \ \ -\infty \lt y \lt \infty$$
If $X$ and $Y$ are discrete, replacing pdf’s by pmf’s in this definition gives the conditional probability mass function of $Y$ when $X = x$.

This can also be written as
$$ P(Y = Y_j | X = X_i)  = \frac{P(X_i, Y_j)}{P(X = X_i)}$$

$$\implies P(X_i, Y_j) = P(Y = Y_j | X = X_i) \cdot P(X = X_i)$$

**Conditional Expectation**
$$E(Y|X = X_i) = \sum_{j = 1}^{N_Y} Y_j \cdot P(Y = Y_j | X = X_i) \ \ \text{(discrete)}$$
$$E(Y|X = x) = \int_{-\infty}^{\infty}y \cdot f_{Y|X}(y|x)dy \ \ \text{(continuous)}$$

**Properties of Conditional Expectations**
- $E[c(X)|X] = c(X)$
- $E[a(X)Y + b(X)|X] = a(X)E(Y|X) + b(X)$
- If $X$ and $Y$ are independent, $E(Y|X) = E(Y)$

**Conditional Variance**
$$Var(Y|X = x) = E(Y^2|x) - [E(Y|x)]^2$$

#### 15. Marginal Probability
**Discrete RV**
$$p_X(x) = \sum_{y:p(x,y) \gt 0}p(x,y) = \sum_jp_{X,Y}(x_i, y_j) = p(x,y_1) + p(x, y_2) + ...$$
Here $x$ is held fixed and the sum runs over all possible values of $y$.
$$p_Y(y) = \sum_{x:p(x,y) \gt 0}p(x,y) = \sum_ip_{X,Y}(x_i, y_j) = p(x_1,y) + p(x_2,y) + ...$$
Here $y$ is held fixed and the sum runs over all possible values of $x$.

**Continuous RV**
$$f_X(x) = \int_{-\infty}^{\infty}f(x,y) dy \ \ for -\infty \lt x \lt \infty$$
$$f_Y(y) = \int_{-\infty}^{\infty}f(x,y) dx \ \ for -\infty \lt y \lt \infty$$
$$f_Y(y) = \int_{-\infty}^{\infty}f_{Y|X}(y|x)f_X(x) dx \ \ for -\infty \lt y \lt \infty$$

**CDF**<BR>
If $X$ and $Y$ jointly take values on $[a,b] \times [c,d]$
$$F_X(x) = F(x,d), F_Y(y) = F(b,y)$$
If $d$ is $\infty$ then this becomes a limit 
$$F_X(x) = \lim_{y \to \infty}F(x,y), F_Y(y) = \lim_{x \to \infty}F(x,y)$$

**Expectation**
$$E(X) = \sum_x x p_X(x) = \sum_x x \sum_yp_{X,Y}(x, y)$$
$$E(Y) = \sum_y y p_Y(y) = \sum_y y \sum_xp_{X,Y}(x, y)$$

#### 16. Independent RV
**Discrete**
$$p(x,y) = p_X(x) \cdot p_Y(y)$$
For discrete variables, independence means the probability in every cell of the joint table must be the product of the marginal probability of its row and the marginal probability of its column
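
The cell-by-cell check can be sketched in Python: compute both marginals from the joint table, then compare each cell with the product of its marginals. The joint pmf below is an assumed example, not data from these notes.

```python
from fractions import Fraction
from itertools import product

# Assumed joint pmf p(x, y) on a 2 x 2 grid, for illustration only.
joint = {
    (0, 0): Fraction(3, 10), (0, 1): Fraction(3, 10),
    (1, 0): Fraction(1, 5),  (1, 1): Fraction(1, 5),
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginals: hold one variable fixed and sum over the other.
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Independence requires p(x, y) = p_X(x) * p_Y(y) in every cell.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x, y in product(xs, ys))
print(p_x, p_y, independent)
```

A single failing cell is enough to rule out independence, which is why the check uses `all`.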

**Continuous**
If $X$ and $Y$ are independent, then
$$f_{X|Y}(x|y) = f_X(x)$$
$$\implies f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)} = f_X(x)$$
$$\implies f(x,y) = f_X(x) \cdot f_Y(y)$$

To be independent, $f(x, y)$ must have the form $g(x) \cdot h(y)$ and the region of positive density must be a rectangle whose sides are parallel to the coordinate axes.
$$P(a \leq X \leq b, c \leq Y \leq d) = P(a \leq X \leq b) \cdot P(c \leq Y \leq d)$$
Equivalently,
$$F(x,y) = F_X(x)F_Y(y)$$

#### 17. Law of iterated expectations

$$ E(Y) = E[E(Y|X)] = E_X[E(Y|X=x)]$$
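
The identity can be verified on a small discrete example: compute $E(Y)$ directly, then as the $p_X$-weighted average of the conditional means. The joint pmf and the helper names `marginal_x` and `cond_exp_y_given_x` are assumptions for illustration.

```python
from fractions import Fraction

# Assumed joint pmf p(x, y), for illustration only.
joint = {
    (0, 1): Fraction(1, 10), (0, 2): Fraction(3, 10),
    (1, 1): Fraction(2, 5),  (1, 2): Fraction(1, 5),
}

xs = sorted({x for x, _ in joint})


def marginal_x(x):
    """p_X(x): sum the joint pmf over y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def cond_exp_y_given_x(x):
    """E(Y | X = x) = sum_y y * p(x, y) / p_X(x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / marginal_x(x)


e_y_direct = sum(y * p for (_, y), p in joint.items())
e_y_iterated = sum(cond_exp_y_given_x(x) * marginal_x(x) for x in xs)
print(e_y_direct, e_y_iterated)  # 3/2 3/2
```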

#### 18. Covariance

Covariance measures the amount of **linear dependence** between two random variables. 
$$Cov(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - E(X)E(Y)$$
$$=\sum_x\sum_y(x - \mu_X)(y - \mu_Y)p(x,y) = \biggl(\sum_x\sum_y xyp(x,y)\biggl) - \mu_X\mu_Y$$
$$=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x - \mu_X)(y - \mu_Y)f(x,y) \ dx \ dy =\biggl(\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}xyf(x,y) \ dx \ dy \biggr) - \mu_X\mu_Y $$

$$Cov(X, X) = E[(X - \mu_X)^2] = V(X)$$
If $X$ and $Y$ are independent, 
$$Cov(X,Y) = 0$$

Also,
$$Cov(aX + b, cY + d) = acCov(X,Y)$$
$$Cov(X, Y+Z) = Cov(X,Y) + Cov(X,Z)$$

**Cauchy-Schwarz Inequality**
$$|Cov(X,Y)| \leq sd(X)sd(Y)$$

#### 19. Correlation
$$\rho_{X,Y} = \frac{Cov(X,Y)}{\sigma_X \cdot \sigma_Y}$$

**Rules of Correlation**<BR>
* If $a$ and $c$ are either both positive or both negative,
$$Corr(aX + b, cY + d) = Corr(X,Y)$$
* If $ac \lt 0$ ,
$$Corr(aX + b, cY + d) = -Corr(X,Y)$$
* For any two rv’s X and Y, 
$$-1 \leq  Corr(X, Y) \leq 1$$

* If $X$ and $Y$ are independent, then $\rho = 0$, but $\rho = 0$ does not imply independence.<BR>
* $\rho = 1 \ or \ -1$ iff $Y = aX + b$ for some numbers $a$ and $b$ with $a \neq 0$.
* For descriptive purposes, the relationship will be described as strong if $| \rho | \geq .8$, moderate if $.5 \lt |\rho | \lt .8$, and weak if $| \rho| \leq .5$.
* $\rho$ is the covariance of the standardizations of $X$ and $Y$
* $\rho$ is dimensionless (it's a ratio)
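
The point that $\rho = 0$ does not imply independence can be illustrated with a standard example, assumed here for the sketch: $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The covariance is zero even though $Y$ is completely determined by $X$.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1} and Y = X^2 (an assumed example).
joint = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

e_x = sum(x * p for (x, _), p in joint.items())
e_y = sum(y * p for (_, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov = e_xy - e_x * e_y   # Cov(X, Y) = E(XY) - E(X)E(Y)

# Independence would need p(x, y) = p_X(x) * p_Y(y); check the cell (0, 1).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y1 = sum(p for (_, y), p in joint.items() if y == 1)
p_joint_01 = joint.get((0, 1), Fraction(0))

print(cov, p_joint_01, p_x0 * p_y1)  # 0 0 2/9
```

Since the covariance is zero, so is the correlation, yet the cell $(0, 1)$ has probability 0 while the product of its marginals is 2/9.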

#### 20. Multinomial Distribution
$$p(x_1, x_2,...,x_r) = \begin{cases}
\frac{n!}{(x_1!)(x_2!)...(x_r!)} p_1^{x_1}...p_r^{x_r}, & \ x_i = 0,1,2..., with \ x_1 +...+x_r = n \\
0, & \ otherwise
\end{cases}$$
The case $r = 2$ gives the binomial distribution, with $X_1$ = number of successes and $X_2 = n - X_1$ = number of failures.
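
A minimal Python sketch of the multinomial pmf, with a check that two categories reproduce the binomial pmf. The function name `multinomial_pmf` and the value `p_success = 0.4` are assumptions for illustration.

```python
from math import comb, factorial, prod


def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_r = x_r) for n = sum(counts) trials."""
    n = sum(counts)
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)   # exact: the running quotient stays an integer
    return coefficient * prod(p ** x for p, x in zip(probs, counts))


# Two categories reduce to the binomial pmf: n = 5 trials, 2 successes.
p_success = 0.4  # assumed value for illustration
via_multinomial = multinomial_pmf([2, 3], [p_success, 1 - p_success])
via_binomial = comb(5, 2) * p_success ** 2 * (1 - p_success) ** 3
print(via_multinomial, via_binomial)
```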

**PMF**
$$p(x_1, x_2,...,x_n) = P(X_1 = x_1, X_2 = x_2,...,X_n = x_n)$$

**PDF**
$$P(a_1 \leq X_1 \leq b_1,...,a_n \leq X_n \leq b_n) = \int_{a_1}^{b_1}...\int_{a_n}^{b_n}f(x_1,...,x_n)dx_n...dx_1$$

#### 21. Distribution of Sample Mean
Let $X_1, X_2, . . . , X_n$ be a random sample from a distribution with mean value $\mu$ and standard deviation $\sigma$. Then

$$E(\overline X) = \mu_{\overline X} = \mu$$
$$V(\overline X) = \sigma_{\overline X}^2 = \sigma^2/n$$
$$\sigma_{\overline X} = \sigma/\sqrt n$$

In addition, with $T_o = X_1 + . . . + X_n$ (the sample total), 
$$E(T_o) = n\mu, V(T_o) = n\sigma^2 , and \  \sigma_{T_o} = \sqrt n \sigma$$

The values of $\overline{X}_n - E(X)$ are generally closer to zero as the sample size increases. 

To quantify how "fast" $\overline{X}_n - E(X)$ converges to zero, we usually find a sequence of numbers $\{a_n\}_{n=1}^\infty$ so that 

$$a_n(\overline{X}_n -E(X))$$

neither *converges* to zero nor *diverges* to $\infty$ 

It turns out that under very general conditions that sequence is $a_n = \sqrt{n}$ .

#### 22. I.I.D. Observations 

* *independent observations*: $ P \big( [X_i= a] \cap [X_j = b]\big) = P(X_i = a)P(X_j = b)$ for all $i\neq j$

* *identically distributed*: The distribution of the values of each observation is the same, which implies that for all $i \neq j$

$$ E(X_i) = E(X_j) = E(X)  \;\; \text{ and } \;\; V(X_i) = V(X_j) = V(X)$$

#### 23. Central Limit Theorem
Let $X_1, X_2,…, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then if $n$ is sufficiently large, $\overline X$ has approximately a normal distribution with $\mu_{\overline X} = \mu$ and $\sigma_{\overline X}^2 = \sigma^2/n$, and $T_o$ also has approximately a normal distribution with $\mu_{T_o} = n\mu, \sigma^2_{T_o} = n\sigma^2$. The larger the value of $n$, the better the approximation.

If
$$T_o = X_1 + X_2 + ... + X_n \ \ (n=1,2,...),$$
$$\lim_{n \to \infty} P \biggl(a \leq \frac{T_o - n \mu}{\sigma\sqrt{n}} \leq b \biggr) = \frac{1}{\sqrt{2 \pi}} \int_a^b e^{-u^2/2}du$$
where $\frac{T_o - n \mu}{\sigma\sqrt{n}}$ is the standardized random variable


**Lindeberg–Lévy CLT** <BR>
if
* $E(X) = \mu $ and $-\infty < \mu  < \infty$, i.e. $E(X)$ has to be finite

* $V(X) = \sigma^2 < \infty $, i.e. $V(X)$ has to be finite

* $\{X_i\}_{i=1}^n$ are i.i.d observations 

then, 

$$ \sqrt{n}(\overline{X}_n - \mu) \stackrel{d}{\longrightarrow} N(0,\sigma^2) $$
or of more practical use, for n "large" 
$$ \overline{X}_n  \sim N \left(\mu,\frac{\sigma^2}{n} \right) $$
where $\sim$ means " is approximately distributed as" 
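
A small simulation makes the approximation tangible. The sketch below draws repeated samples from $U[0, 1]$ (mean 1/2, variance 1/12) and compares the empirical mean and variance of $\overline{X}_n$ with $\mu$ and $\sigma^2/n$. The sample size, number of replications and seed are assumptions chosen only for illustration.

```python
import random

random.seed(0)            # fixed seed so the run is repeatable
n = 30                    # sample size (assumed)
replications = 20000      # number of simulated samples (assumed)
mu, sigma2 = 0.5, 1 / 12  # mean and variance of U[0, 1]

sample_means = [
    sum(random.random() for _ in range(n)) / n
    for _ in range(replications)
]

mean_of_means = sum(sample_means) / replications
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / replications

# Share of sample means within 1.96 standard errors of mu (about 0.95 under normality).
se = (sigma2 / n) ** 0.5
coverage = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / replications

print(mean_of_means, var_of_means, sigma2 / n, coverage)
```

The variance of the sample mean shrinks like $1/n$, so its standard deviation shrinks like $1/\sqrt{n}$, which matches the scaling sequence $a_n = \sqrt{n}$.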

**Law of Large Numbers (LLN)**<BR>
**If**: $E(X_i) < \infty$, **and**  $f(X_1,X_2, \ldots,X_n) = n^{-1}\sum_{i=1}^n X_i \; $ **and** $\{X_{i}\}_{i=1}^n$ is an i.i.d random sample **then**    

$$ \left| n^{-1}\sum_{i=1}^n X_i - E(X) \right| \stackrel{p}{\rightarrow} 0 $$

**Weak Law of Large Numbers** <BR>
For any $\epsilon \gt 0$
$$\lim_{n \to \infty}P(|\overline X_n - \mu| \lt \epsilon) = 1$$
$$\overline X_n \xrightarrow{P} \mu \ as \ n \to \infty$$
    
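By the CLT, the CDFs of the standardized sample mean $Z_n = \sqrt{n}(\overline X_n - \mu)/\sigma$ converge to the standard normal CDF $\Phi(z)$:
$$\lim_{n \to \infty} F_{Z_n}(z) = \Phi(z)$$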

#### 24. Linear Combination RV
$$Y = a_1X_1 + ... + a_nX_n = \sum_{i = 1}^na_iX_i$$
where $X_1, . . . , X_n$ is a collection of $n$ random variables and $a_1, . . . , a_n$ are $n$ numerical constants

Taking $a_1 = a_2 =... = a_n = 1$ gives
$$Y = X_1 + ... + X_n = T_o$$
and $a_1 = a_2 =... = a_n = 1/n$ yields
$$Y = \frac{1}{n}X_1 + ... + \frac{1}{n}X_n = \frac{1}{n}T_o = \overline X$$

Here $X_i$ ’s don't have to be independent or identically distributed. All the $X_i$ ’s could have different distributions and therefore different mean values and variances.<BR>
    
**Expectation**<BR>
Whether or not the $X_i$ ’s are independent, 
    $$E(a_1X_1 + a_2X_2 + . . . + a_nX_n) = a_1E(X_1) + a_2E(X_2) + ... + a_nE(X_n) = a_1\mu_1 + ... + a_n\mu_n$$
    $$E\biggl(\sum_{i=1}^n a_i X_i \biggl) = \sum_{i=1}^n a_i E(X_i)$$

**Variance**<BR>
The random variables ${X_1, ... , X_n}$ are **pairwise uncorrelated random variables** if each variable in the set is uncorrelated with every other variable in the set. That is, $Cov(X_i , X_j) = 0$, for all $i \neq j$.
    
If $X_1, . . . , X_n$ are independent,
$$V(a_1X_1 + a_2X_2 + ... + a_nX_n) = a_1^2V(X_1) + a_2^2V(X_2) + ... + a_n^2V(X_n) = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + ... + a_n^2\sigma_n^2$$
and 
$$\sigma_{a_1X_1 + ... + a_nX_n}= \sqrt{a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + ... + a_n^2\sigma_n^2}$$

If $X_1, . . . , X_n$ are pairwise uncorrelated,
$$V\biggl(\sum_{i=1}^n a_i X_i \biggr) = \sum_{i=1}^n a_i^2 V(X_i)$$
For any $X_1, . . . , X_n$,
$$V(a_1X_1 + a_2X_2 + ... + a_nX_n) = \sum_{i = 1}^n\sum_{j = 1}^na_ia_jCov(X_i, X_j)$$
$$V(a_1X_1 + a_2X_2) = a_1^2V(X_1) + a_2^2V(X_2) + 2a_1a_2Cov(X_1, X_2)$$
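
The general double-sum formula translates directly into code. In this sketch the covariance matrix is an assumed example with $V(X_1) = 4$, $V(X_2) = 9$ and $Cov(X_1, X_2) = 2$; `variance_of_linear_combination` is a hypothetical helper name.

```python
def variance_of_linear_combination(a, cov):
    """V(sum_i a_i X_i) = sum_i sum_j a_i a_j Cov(X_i, X_j)."""
    n = len(a)
    return sum(a[i] * a[j] * cov[i][j] for i in range(n) for j in range(n))


# Assumed covariance matrix: V(X_1) = 4, V(X_2) = 9, Cov(X_1, X_2) = 2.
cov = [[4.0, 2.0],
       [2.0, 9.0]]

v_sum = variance_of_linear_combination([1, 1], cov)    # 4 + 9 + 2 * 2
v_diff = variance_of_linear_combination([1, -1], cov)  # 4 + 9 - 2 * 2
print(v_sum, v_diff)
```

The diagonal entries contribute the $a_i^2 V(X_i)$ terms and each off-diagonal pair appears twice, which is where the factor 2 in the two-variable formula comes from.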

For random sample ($X_i$ 's iid) with $a_1 = a_2 =... = a_n = 1/n$, 
$$E(\overline X) = \mu, V(\overline X) = \sigma^2/n$$

**Special Case**<BR>
If $n=2$ and $a_1 = 1$ and $a_2 = -1$, then
$$E(X_1 - X_2) = E(X_1) - E(X_2)$$ for any two rv’s $X_1$ and $X_2$, and 
$$V(X_1 - X_2) = V(X_1) + V(X_2)$$ if $X_1$ and $X_2$ are independent rv’s.

#### 25. Higher Order Moments - Skewness and Kurtosis

The third moment of the standardized RV $Z = (X - \mu)/\sigma$ is used to measure skewness
$$E(Z^3) = \frac{E[(X - \mu)^3]}{\sigma^3}$$

The fourth moment of the standardized RV $Z = (X - \mu)/\sigma$ is used to measure kurtosis
$$E(Z^4) = \frac{E[(X - \mu)^4]}{\sigma^4}$$
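
Both quantities are standardized moments, so one helper covers them. The sketch below assumes a fair six-sided die as the example distribution; `standardized_moment` is a hypothetical helper name.

```python
import math

# Fair six-sided die as an assumed example distribution.
pmf = {x: 1 / 6 for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))


def standardized_moment(k):
    """E(Z^k) with Z = (X - mu) / sigma."""
    return sum(((x - mu) / sigma) ** k * p for x, p in pmf.items())


skewness = standardized_moment(3)   # 0 for a symmetric distribution
kurtosis = standardized_moment(4)   # 303/175 for the fair die
print(skewness, kurtosis)
```

The die is symmetric about its mean, so its skewness vanishes up to floating-point rounding.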