# Expectation

**Definition**

The expected value (also called the mean or first moment) of a random variable $X$ is defined as:

- Discrete case

$\mathbb{E}(X) = \mu_X = \sum_x xf_X(x)$, provided that $\mathbb{E}(|X|) = \sum_x |x|f_X(x) < \infty$

- Continuous case

$\mathbb{E}(X) = \mu_X = \int xdF_X(x) = \int xf_X(x)dx$, provided that $\mathbb{E}(|X|) = \int |x|f_X(x)dx < \infty$
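
As a small illustration of the discrete case, the sketch below computes $\mathbb{E}(X)$ for a fair six-sided die. The die and the helper name `expectation` are assumptions made for this example; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Hypothetical example: pmf of a fair six-sided die, f_X(x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf):
    """Return E(X) = sum_x x * f_X(x) for a finite discrete pmf."""
    return sum(x * p for x, p in pmf.items())

mean = expectation(pmf)
print(mean)  # 7/2
```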

---

**Theorem (the rule of the lazy statistician)**

Let $Y = r(X)$. Then:

- Discrete case

$\mathbb{E}(Y) = \mathbb{E}(r(X)) = \sum_x r(x)f_X(x)$

- Continuous case

$\mathbb{E}(Y) = \mathbb{E}(r(X)) = \int r(x)dF_X(x) = \int r(x)f_X(x)dx$
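
The point of the rule is that the distribution of $Y$ is never needed. The sketch below checks this on an assumed toy case, $X$ uniform on $\{-2,-1,0,1,2\}$ with $r(x) = x^2$, by computing $\mathbb{E}(Y)$ both ways.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical example: X uniform on {-2, -1, 0, 1, 2} and Y = r(X) = X^2.
f_X = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}

def r(x):
    return x * x

# Route 1 (lazy): E(r(X)) = sum_x r(x) f_X(x), without finding the pmf of Y.
lazy = sum(r(x) * p for x, p in f_X.items())

# Route 2 (direct): build the pmf of Y first, then apply the definition of E(Y).
f_Y = defaultdict(Fraction)
for x, p in f_X.items():
    f_Y[r(x)] += p
direct = sum(y * p for y, p in f_Y.items())

print(lazy, direct)  # 2 2
```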

---

**Theorem (the rule of the lazy statistician, extended)**

Let $Z = r(X, Y)$. Then:

- Discrete case

$\mathbb{E}(Z) = \mathbb{E}(r(X, Y)) = \sum_{x,y} r(x,y)f_{XY}(x,y)$

- Continuous case

$\mathbb{E}(Z) = \mathbb{E}(r(X, Y)) = \int \int r(x,y)dF_{XY}(x,y) = \int \int r(x,y)f_{XY}(x,y)dxdy$

---

**Definition**

The $k^{th}$ moment of $X$ is $\mathbb{E}(X^k)$, provided that $\mathbb{E}(|X|^k) < \infty$

---

**Theorem**

If the $k^{th}$ moment exists ($\mathbb{E}(|X|^k) < \infty$) and $j < k$, then the $j^{th}$ moment exists.
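
The reason is a pointwise bound, sketched here for the continuous case with $0 < j < k$ (an assumption on the exponents that the statement leaves implicit). If $|x| \leq 1$ then $|x|^j \leq 1$, and otherwise $|x|^j \leq |x|^k$, so $|x|^j \leq 1 + |x|^k$ for every $x$. Therefore:

$\mathbb{E}(|X|^j) = \int |x|^j f_X(x)dx \leq \int (1 + |x|^k) f_X(x)dx = 1 + \mathbb{E}(|X|^k) < \infty$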

---

**Definition**

The $k^{th}$ central moment of $X$ is $\mathbb{E}((X-\mathbb{E}(X))^k) = \mathbb{E}((X-\mu_X)^k)$

### Properties of Expectations

**Theorem**

Let $X_1,...,X_n$ be random variables and $a_1,...,a_n$ be constants. Then:

$\mathbb{E}(\sum_i a_i X_i) = \sum_i a_i\mathbb{E}(X_i)$

---

**Theorem**

Let $X_1,...,X_n$ be independent random variables. Then:

$\mathbb{E}(\prod_i X_i) = \prod_i \mathbb{E}(X_i)$
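
Both properties can be checked exactly on a small case. The sketch below assumes two independent fair dice (a made-up example) and evaluates every expectation as $\sum_{x,y} r(x,y)f_{XY}(x,y)$; the helper `expect` is a name introduced here.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X and Y are independent fair dice, so the joint pmf
# factorises as f_XY(x, y) = f_X(x) * f_Y(y) = 1/36.
faces = range(1, 7)
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def expect(r):
    """E(r(X, Y)) = sum over (x, y) of r(x, y) * f_XY(x, y)."""
    return sum(r(x, y) * p for (x, y), p in joint.items())

E_X = expect(lambda x, y: x)
E_Y = expect(lambda x, y: y)

# Linearity: E(2X + 3Y) = 2 E(X) + 3 E(Y). This does not need independence.
assert expect(lambda x, y: 2 * x + 3 * y) == 2 * E_X + 3 * E_Y

# Product rule: E(XY) = E(X) E(Y). This relies on independence.
E_XY = expect(lambda x, y: x * y)
assert E_XY == E_X * E_Y
print(E_X, E_XY)  # 7/2 49/4
```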

### Variance and Covariance

**Definition**

Let $X$ be a random variable with mean $\mu$. The variance of $X$ is defined by:

- Discrete case

$\mathbb{V}(X) = \sigma_X^2 = \mathbb{E}((X - \mu)^2) = \sum_x (x - \mu)^2 f_X(x)$

- Continuous case

$\mathbb{V}(X) = \sigma_X^2 = \mathbb{E}((X - \mu)^2) = \int (x - \mu)^2 dF_X(x) = \int (x - \mu)^2 f_X(x)dx$

---

**Definition**

The standard deviation of $X$ is:

$\sigma_X = \sqrt{\mathbb{V}(X)}$

---

**Theorem**

Variance has the following properties:

- $\mathbb{V}(X) = \mathbb{E}(X^2) - \mu^2$

- If $a$ and $b$ are constants, then $\mathbb{V}(aX + b) = a^2\mathbb{V}(X)$

- If $X_1,...,X_n$ are independent and $a_1,...,a_n$ are constants, then $\mathbb{V}(\sum_i a_i X_i) = \sum_i a_i^2 \mathbb{V}(X_i)$
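
The first two properties can be verified exactly for a fair die. This is a sketch under that assumed example; `expect` and `variance` are helper names introduced here, and `variance` deliberately uses the definition $\mathbb{E}((X-\mu)^2)$ so that the shortcut formula is tested rather than assumed.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(r, pmf):
    """E(r(X)) = sum_x r(x) f_X(x)."""
    return sum(r(x) * p for x, p in pmf.items())

def variance(r, pmf):
    """V(r(X)) computed from the definition E((r(X) - mean)^2)."""
    m = expect(r, pmf)
    return expect(lambda x: (r(x) - m) ** 2, pmf)

mu = expect(lambda x: x, pmf)
var_X = variance(lambda x: x, pmf)

# Shortcut formula: V(X) = E(X^2) - mu^2.
assert var_X == expect(lambda x: x * x, pmf) - mu ** 2

# Affine transformation: V(aX + b) = a^2 V(X); the shift b drops out.
a, b = 3, 10
assert variance(lambda x: a * x + b, pmf) == a ** 2 * var_X
print(var_X)  # 35/12
```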

---

**Definition**

Let $X$ and $Y$ be random variables with means $\mu_X$ and $\mu_Y$, and standard deviations $\sigma_X$ and $\sigma_Y$. The covariance between $X$ and $Y$ is defined as:

$Cov(X,Y) = \mathbb{E}((X-\mu_X)(Y-\mu_Y))$

---

**Definition**

Let $X$ and $Y$ be random variables with means $\mu_X$ and $\mu_Y$, and standard deviations $\sigma_X$ and $\sigma_Y$. The correlation between $X$ and $Y$ is defined as:

$\rho = \rho_{XY} = \rho(X,Y) = \frac{Cov(X,Y)}{\sigma_X\sigma_Y}$

---

**Theorem**

The covariance satisfies:

$Cov(X,Y) = \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y)$

---

**Theorem**

The correlation satisfies:

$-1\leq \rho(X,Y)\leq 1$

---

**Theorem**

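If $Y = aX + b$ for constants $a \neq 0$ and $b$, then $\rho(X,Y) = 1$ if $a > 0$ and $\rho(X,Y) = -1$ if $a < 0$. The converse holds only in a weaker form: if $|\rho(X,Y)| = 1$, then $Y = aX + b$ with probability 1 for some constants $a \neq 0$ and $b$.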
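
A quick numerical check of the forward direction, assuming $X$ is a fair die (an illustrative choice). The function `correlation` is a hypothetical helper that evaluates $\rho(X, aX+b)$ from the definitions; the final division is done in floating point, so the comparison uses a tolerance.

```python
import math
from fractions import Fraction

# Hypothetical example: X is a fair die and Y = aX + b.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def correlation(a, b):
    """rho(X, aX + b), computed from the definitions of covariance and variance."""
    def E(r):
        return sum(r(x) * p for x, p in pmf.items())

    mu_x = E(lambda x: x)
    mu_y = E(lambda x: a * x + b)
    cov = E(lambda x: (x - mu_x) * (a * x + b - mu_y))
    var_x = E(lambda x: (x - mu_x) ** 2)
    var_y = E(lambda x: (a * x + b - mu_y) ** 2)
    return float(cov) / math.sqrt(float(var_x) * float(var_y))

print(math.isclose(correlation(2, 5), 1.0))    # True
print(math.isclose(correlation(-3, 1), -1.0))  # True
```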

---

**Theorem**

If $X$ and $Y$ are independent, then $Cov(X,Y) = \rho(X,Y) = 0$. The converse is not true in general.
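
A standard counterexample for the converse, sketched with assumed values: take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The covariance is zero even though $Y$ is a deterministic function of $X$.

```python
from fractions import Fraction

# Hypothetical example: X uniform on {-1, 0, 1} and Y = X^2.
# Y is a deterministic function of X, so the two are clearly dependent.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(r):
    return sum(r(x, y) * p for (x, y), p in joint.items())

E_X = expect(lambda x, y: x)
E_Y = expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - E_X * E_Y  # Cov(X,Y) = E(XY) - E(X)E(Y)

# Independence would require P(X=0, Y=0) = P(X=0) P(Y=0).
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_x0_y0 = joint.get((0, 0), Fraction(0))

print(cov)                     # 0
print(p_x0_y0 == p_x0 * p_y0)  # False
```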

---

**Theorem**

$\mathbb{V}(X+Y) = \mathbb{V}(X) + \mathbb{V}(Y) + 2Cov(X,Y)$

$\mathbb{V}(X-Y) = \mathbb{V}(X) + \mathbb{V}(Y) - 2Cov(X,Y)$

For random variables $X_1,...,X_n$ and constants $a_1,...,a_n$:

$\mathbb{V}(\sum_i a_i X_i) = \sum_i a_i^2 \mathbb{V}(X_i) + 2\sum\sum_{i<j} a_i a_j Cov(X_i, X_j)$
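
To see the covariance term at work, the sketch below uses an assumed joint pmf for two positively dependent binary variables (the probabilities are made up for illustration) and compares the variances of $X+Y$ and $X-Y$, computed from the definition, with the formulas.

```python
from fractions import Fraction

# Hypothetical joint pmf of two dependent binary variables (values are illustrative).
joint = {
    (0, 0): Fraction(4, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(4, 10),
}

def expect(r):
    return sum(r(x, y) * p for (x, y), p in joint.items())

def var(r):
    m = expect(r)
    return expect(lambda x, y: (r(x, y) - m) ** 2)

var_X = var(lambda x, y: x)
var_Y = var(lambda x, y: y)
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# V(X + Y) and V(X - Y) computed directly, then compared with the formulas.
assert var(lambda x, y: x + y) == var_X + var_Y + 2 * cov
assert var(lambda x, y: x - y) == var_X + var_Y - 2 * cov
print(var_X, var_Y, cov)  # 1/4 1/4 3/20
```

Because the covariance is positive here, the sum is more variable than it would be under independence and the difference is less variable.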

### Sample Mean and Variance

**Definition**

If $X_1,...,X_n$ are random variables, the sample mean is defined by:

$\overline{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$

---

**Definition**

If $X_1,...,X_n$ are random variables, the sample variance is defined by:

$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X}_n)^2$

---

**Theorem**

Let $X_1,...,X_n$ be IID and let $\mu = \mathbb{E}(X_i)$ and $\sigma^2 = \mathbb{V}(X_i)$. Then:

- $\mathbb{E}(\overline{X}_n) = \mu$

- $\mathbb{V}(\overline{X}_n) = \frac{\sigma^2}{n}$

- $\mathbb{E}(S_n^2) = \sigma^2$
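
All three identities can be confirmed exactly by enumerating every possible sample. The sketch assumes a tiny population, $X_i$ uniform on $\{0, 1, 5\}$ with $n = 3$, chosen only so that the $3^3$ equally likely samples can be listed; `sample_mean` and `sample_variance` are helper names introduced here.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: X_i uniform on {0, 1, 5}; n = 3 IID draws.
values = (0, 1, 5)
n = 3

mu = Fraction(sum(values), len(values))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in values) / len(values)

def sample_mean(xs):
    return Fraction(sum(xs), len(xs))

def sample_variance(xs):
    """S_n^2 with the n - 1 denominator."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# All 3^3 samples are equally likely, so expectations are plain averages.
samples = list(product(values, repeat=n))
means = [sample_mean(s) for s in samples]
E_mean = sum(means) / len(samples)
V_mean = sum((m - E_mean) ** 2 for m in means) / len(samples)
E_S2 = sum(sample_variance(s) for s in samples) / len(samples)

assert E_mean == mu
assert V_mean == sigma2 / n
assert E_S2 == sigma2
print(mu, sigma2)  # 2 14/3
```

Replacing `len(xs) - 1` with `len(xs)` in `sample_variance` makes the last assertion fail, which is the usual motivation for the $n-1$ denominator.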

### Multivariate Mean and Variance

**Definition**

If $X = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix}$, then $\mathbb{E}(X) = \begin{bmatrix} \mathbb{E}(X_1) \\ \vdots \\ \mathbb{E}(X_n) \end{bmatrix}$

---

**Definition**

If $X = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix}$, then $\mathbb{V}(X) = 
\begin{bmatrix}
\mathbb{V}(X_1) & Cov(X_1, X_2) & \cdots & Cov(X_1, X_n) \\
Cov(X_2, X_1) & \mathbb{V}(X_2) & \cdots & Cov(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
Cov(X_n, X_1) & Cov(X_n, X_2) & \cdots & \mathbb{V}(X_n)
\end{bmatrix}$

---

**Lemma**

If $a$ is a vector and $X$ is a random vector with mean $\mu$ and variance $\Sigma$, then $\mathbb{E}(a^T X) = a^T\mu$ and $\mathbb{V}(a^T X) = a^T\Sigma a$. If $A$ is a matrix, then $\mathbb{E}(AX) = A\mu$ and $\mathbb{V}(AX) = A\Sigma A^T$.
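
The vector case of the lemma can be checked without any linear algebra library. The sketch below assumes a two-dimensional random vector with a made-up joint pmf and a fixed vector $a = (2, -3)^T$, then compares $\mathbb{E}(a^T X)$ and $\mathbb{V}(a^T X)$, computed directly, with $a^T\mu$ and $a^T\Sigma a$.

```python
from fractions import Fraction

# Hypothetical joint pmf of a random vector X = (X_1, X_2) (values are illustrative).
joint = {
    (0, 0): Fraction(4, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(4, 10),
}

def expect(r):
    return sum(r(x) * p for x, p in joint.items())

d = 2
mu = [expect(lambda x, i=i: x[i]) for i in range(d)]
# Sigma[i][j] = Cov(X_i, X_j); the diagonal holds the variances.
Sigma = [[expect(lambda x, i=i, j=j: (x[i] - mu[i]) * (x[j] - mu[j]))
          for j in range(d)] for i in range(d)]

a = [2, -3]

def aTX(x):
    """The scalar a^T x."""
    return sum(ai * xi for ai, xi in zip(a, x))

# Left-hand sides: mean and variance of a^T X, computed directly.
mean_direct = expect(aTX)
var_direct = expect(lambda x: (aTX(x) - mean_direct) ** 2)

# Right-hand sides: a^T mu and a^T Sigma a.
aT_mu = sum(ai * mi for ai, mi in zip(a, mu))
aT_Sigma_a = sum(a[i] * Sigma[i][j] * a[j] for i in range(d) for j in range(d))

assert mean_direct == aT_mu
assert var_direct == aT_Sigma_a
print(aT_mu, aT_Sigma_a)  # -1/2 29/20
```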

### Conditional Expectation

**Definition**

The conditional expectation of $X$ given $Y=y$ is:

- Discrete case

$\mathbb{E}(X|Y=y) = \sum_x x f_{X|Y}(x|y)$

- Continuous case

$\mathbb{E}(X|Y=y) = \int x f_{X|Y}(x|y)dx$

---

**Definition**

If $r(x,y)$ is a function of $x$ and $y$, the conditional expectation of $r(X,Y)$ given $Y=y$ is:

- Discrete case

$\mathbb{E}(r(X,Y)|Y=y) = \sum_x r(x,y) f_{X|Y}(x|y)$

- Continuous case

$\mathbb{E}(r(X,Y)|Y=y) = \int r(x,y) f_{X|Y}(x|y)dx$

---

**Theorem (the rule of iterated expectations)**

For random variables $X$ and $Y$, assuming the expectations exist, we have that:

$\mathbb{E}(\mathbb{E}(Y|X)) = \mathbb{E}(Y)$ and $\mathbb{E}(\mathbb{E}(X|Y)) = \mathbb{E}(X)$

More generally, for any function $r(x,y)$, we have that:

$\mathbb{E}(\mathbb{E}(r(X,Y)|X)) = \mathbb{E}(r(X,Y))$
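
A two-stage experiment makes the rule concrete. In the sketch below (an assumed example) $X$ is a fair coin, and given $X = x$ the variable $Y$ is uniform on $\{1, 2\}$ if $x = 0$ and on $\{1,\dots,6\}$ if $x = 1$. Averaging the conditional means over $X$ gives the same answer as computing $\mathbb{E}(Y)$ from the joint pmf.

```python
from fractions import Fraction

# Hypothetical two-stage experiment: X is a fair coin (0 or 1); given X = x,
# Y is uniform on {1, 2} if x = 0 and on {1, ..., 6} if x = 1.
f_X = {0: Fraction(1, 2), 1: Fraction(1, 2)}
support = {0: range(1, 3), 1: range(1, 7)}
f_Y_given_X = {x: {y: Fraction(1, len(ys)) for y in ys} for x, ys in support.items()}

def cond_mean(x):
    """E(Y | X = x) = sum_y y f_{Y|X}(y | x)."""
    return sum(y * p for y, p in f_Y_given_X[x].items())

# Outer expectation over X of the inner conditional mean.
iterated = sum(cond_mean(x) * px for x, px in f_X.items())

# E(Y) computed directly from the joint pmf f_XY(x, y) = f_X(x) f_{Y|X}(y | x).
direct = sum(y * f_X[x] * p for x in f_X for y, p in f_Y_given_X[x].items())

print(cond_mean(0), cond_mean(1), iterated, direct)  # 3/2 7/2 5/2 5/2
```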

---

**Definition**

The conditional variance is defined as:

- Discrete case

$\mathbb{V}(Y|X=x) = \sum_y (y - \mathbb{E}(Y|X=x))^2 f_{Y|X}(y|x)$

- Continuous case

$\mathbb{V}(Y|X=x) = \int (y - \mathbb{E}(Y|X=x))^2 f_{Y|X}(y|x)dy$

---

**Theorem**

For random variables $X$ and $Y$,

$\mathbb{V}(Y) = \mathbb{E}(\mathbb{V}(Y|X)) + \mathbb{V}(\mathbb{E}(Y|X))$
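
The decomposition splits $\mathbb{V}(Y)$ into an average within-group variance and a variance of the group means. The sketch below verifies it exactly on an assumed two-stage example ($X$ a fair coin; $Y$ uniform on $\{1,2\}$ or on $\{1,\dots,6\}$ depending on $X$); the names `within` and `between` are labels chosen here for the two terms.

```python
from fractions import Fraction

# Hypothetical two-stage experiment (illustrative): X is a fair coin; given X = x,
# Y is uniform on {1, 2} if x = 0 and on {1, ..., 6} if x = 1.
f_X = {0: Fraction(1, 2), 1: Fraction(1, 2)}
f_Y_given_X = {
    0: {y: Fraction(1, 2) for y in range(1, 3)},
    1: {y: Fraction(1, 6) for y in range(1, 7)},
}

def mean(pmf):
    return sum(v * p for v, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((v - m) ** 2 * p for v, p in pmf.items())

# Conditional mean and variance of Y for each value x.
cond_mean = {x: mean(f_Y_given_X[x]) for x in f_X}
cond_var = {x: var(f_Y_given_X[x]) for x in f_X}

# E(V(Y|X)): average within-group variance.
within = sum(cond_var[x] * f_X[x] for x in f_X)
# V(E(Y|X)): variance of the group means.
grand_mean = sum(cond_mean[x] * f_X[x] for x in f_X)
between = sum((cond_mean[x] - grand_mean) ** 2 * f_X[x] for x in f_X)

# Marginal pmf of Y, then V(Y) from the definition.
f_Y = {}
for x, px in f_X.items():
    for y, p in f_Y_given_X[x].items():
        f_Y[y] = f_Y.get(y, Fraction(0)) + px * p

assert var(f_Y) == within + between
print(within, between, var(f_Y))  # 19/12 1 31/12
```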

### Moment Generating Functions

**Definition**

The moment generating function (MGF) of $X$, also called the Laplace transform of $X$, is defined by:

$\psi_X(t) = \mathbb{E}(e^{tX}) = \int e^{tx}dF_X(x)$, where $t\in\mathbb{R}$

- Assuming differentiation and expectation can be interchanged, $\psi'(0) = \left[\frac{d}{dt}\mathbb{E}(e^{tX})\right]_{t=0} = \left[\mathbb{E}(\frac{d}{dt}e^{tX})\right]_{t=0} = \left[\mathbb{E}(Xe^{tX})\right]_{t=0} = \mathbb{E}(X)$

- $\psi^{(k)}(0) = \mathbb{E}(X^k)$

---

**Lemma**

Properties of the MGF:

- If $Y=aX+b$, then $\psi_Y(t) = e^{bt}\psi_X(at)$

- If $X_1,...,X_n$ are independent and $Y = \sum_i X_i$, then $\psi_Y(t) = \prod_i \psi_i(t)$, where $\psi_i$ is the MGF of $X_i$.
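
These properties are easy to test numerically on a Bernoulli variable, whose MGF is $\psi(t) = 1 - p + pe^t$. The sketch below assumes $p = 0.3$, $n = 5$ and an arbitrary evaluation point $t = 0.7$; it checks the product rule against the Binomial pmf, approximates $\psi'(0)$ with a central finite difference (a numerical stand-in, not an exact derivative), and checks the affine rule.

```python
import math

# Hypothetical example: X_i ~ Bernoulli(p), whose MGF is psi(t) = 1 - p + p e^t.
p, n = 0.3, 5

def psi_bernoulli(t):
    return 1 - p + p * math.exp(t)

def psi_binomial(t):
    """MGF of Y = X_1 + ... + X_n computed from the Binomial(n, p) pmf."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) * math.exp(t * k)
               for k in range(n + 1))

# Sum of independent variables: psi_Y(t) is the product of the individual MGFs.
t = 0.7
assert math.isclose(psi_binomial(t), psi_bernoulli(t) ** n)

# psi'(0) = E(X): central finite difference approximating the derivative at 0.
h = 1e-6
deriv_at_0 = (psi_bernoulli(h) - psi_bernoulli(-h)) / (2 * h)
assert math.isclose(deriv_at_0, p, rel_tol=1e-6)

# Affine rule: for Y = aX + b, psi_Y(t) = e^{bt} psi_X(at).
a, b = 2.0, 1.0
psi_Y = (1 - p) * math.exp(t * b) + p * math.exp(t * (a + b))  # E(e^{tY}) directly
assert math.isclose(psi_Y, math.exp(b * t) * psi_bernoulli(a * t))
```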

---

**Theorem**

Let $X$ and $Y$ be random variables. If $\psi_X(t) = \psi_Y(t)$ for all $t$ in an open interval around 0, then $X$ and $Y$ are equal in distribution.