# Conditional Expectation

**Video lecture: [https://youtu.be/RKtSO9hmb1A](https://youtu.be/RKtSO9hmb1A)**

Given a probability space $(\Omega, \mathcal{F_o}, P)$, a sigma field $\mathcal{F} \subset \mathcal{F_o}$ and a random variable $X \in \mathcal{F_o}$ (that is, $X$ is $\mathcal{F_o}$-measurable) with $E|X| \lt\infty$, we define the **conditional expectation** of $X$ given $\mathcal{F}$, $E[X|\mathcal{F}]$, to be any random variable $Y$ that satisfies:

1. $Y \in \mathcal{F}$
2. for all $E \in \mathcal{F}$, $\int_{E}XdP = \int_{E}YdP$

We write $E[X|\mathcal{F}]$ for any random variable that satisfies these two conditions; each one is called a **version** of the conditional expectation. Versions are unique up to null sets:

**Uniqueness:** If $Y'$ also satisfies conditions 1 and 2 of the definition, then $Y$ and $Y'$ are equal almost surely.

_Proof:_

Fix $\epsilon \gt 0$ and let $A = \{ Y - Y' \ge \epsilon\}$. Since both $Y$ and $Y'$ are $\mathcal{F}$-measurable, $A \in \mathcal{F}$. Applying condition 2 to both $Y$ and $Y'$ gives:

$$
0=\int_{A}(X - X) dP = \int_{A}(Y - Y') dP \ge \epsilon P(A)
$$

which implies $P(A) = 0$. Since $\epsilon$ was arbitrary, taking the union over $\epsilon = 1/n$ gives $P(Y - Y' \gt 0) = 0$. By symmetry, exchanging the roles of $Y$ and $Y'$ gives $P(Y' - Y \gt 0) = 0$, so $Y = Y'$ a.s.

$\blacksquare$

## Radon-Nikodym (RN) Theorem

Let $\mu$ and $\nu$ be $\sigma$-finite measures on $(\Omega, \mathcal{F})$. If $\nu \ll \mu$, then there is an $\mathcal{F}$-measurable function $f$ such that for all $A \in \mathcal{F}$:

$$
\int_{A}f d\mu = \nu(A)
$$

$f$ is usually denoted $d\nu/d\mu$ and called the **Radon-Nikodym derivative**.

- **$\sigma$-finite**: $\mu$ is said to be $\sigma$-finite if there is an increasing sequence of sets $A_n \in \mathcal{F}$ that satisfies $A_n \uparrow \Omega$ and $\mu(A_n) \lt \infty$ for all $n$.
- **$\nu$ absolutely continuous with respect to $\mu$ ($\nu \ll \mu$)**: $\mu(A) = 0$ implies $\nu(A) = 0$.

**Existence of probability density function:** Let $\mu$ be Lebesgue measure. Any probability measure $\nu$ that is absolutely continuous w.r.t. $\mu$ has a probability density function, namely the Radon-Nikodym derivative $d\nu/d\mu$.

**Existence of conditional expectation:** 

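First assume $X \ge 0$. Let $\mu=P$ (restricted to $\mathcal{F}$) and $\nu(A) = \int_{A}XdP$ for all $A \in \mathcal{F}$. Then $\nu$ is a finite measure (verify countable additivity over disjoint sets; finiteness follows from $E|X| \lt \infty$) and $\nu\ll\mu$ (whenever $P(A) = 0$, we must have $\nu(A) = 0$). By the RN theorem, for all $A \in \mathcal{F}$: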

$$
\int_{A}XdP = \nu(A) = \int_{A} \frac{d\nu}{d\mu} dP
$$

Since $d\nu/d\mu$ is $\mathcal{F}$-measurable, the RN derivative is a version of the conditional expectation $E[X|\mathcal{F}]$.

For general $X$, write $X = X^+ - X^-$. The conditional expectations of the positive and negative parts exist by the argument above, and $E[X^+|\mathcal{F}] - E[X^-|\mathcal{F}]$ is a version of the conditional expectation of $X$ itself.

## Theorem: $\sigma$-field generated by partitions

**Video lecture: [https://youtu.be/YdDtsZ5OT2w](https://youtu.be/YdDtsZ5OT2w)**

Suppose $\Omega_1,\Omega_2,...$ is a finite or countably infinite partition of $\Omega$ into disjoint sets, with $P(\Omega_i) \gt 0$ for all $i$. Let $\mathcal{F} = \sigma(\Omega_1, \Omega_2,...)$, then a version of $E[X|\mathcal{F}]$ is:

$$
E[X|\mathcal{F}]_\omega = \sum_{i=1}^{\infty}c_i\mathbf{1}_{\omega \in \Omega_i}
$$

where $c_i = \frac{E[X; \Omega_i]}{P(\Omega_i)} = \frac{\int_{\Omega_i}XdP}{P(\Omega_i)}$.

_Proof:_

Each $\mathbf{1}_{\Omega_i}$ is $\mathcal{F}$-measurable, so the sum, being a countable sum of measurable functions, is $\mathcal{F}$-measurable. To verify the second condition ($\int_{A}{\sum_{i=1}^{\infty}c_i\mathbf{1}_{\omega \in \Omega_i}}dP = \int_{A}{X}dP$ for all $A \in \mathcal{F}$), note that any event $A \in \mathcal{F}$ is a countable union of disjoint sets from the partition, $A = \Omega_{i(1)} \cup \Omega_{i(2)} \cup \dots$. Therefore:

$$
\begin{align}
&\int_{A}{\sum_{i=1}^{\infty}c_i\mathbf{1}_{\omega \in \Omega_i}}dP \\
&= \int{\mathbf{1}_{\Omega_{i(1)}}\Big(\sum_{j=1}^{\infty}c_j\mathbf{1}_{\Omega_j}\Big)}dP + \int{\mathbf{1}_{\Omega_{i(2)}}\Big(\sum_{j=1}^{\infty}c_j\mathbf{1}_{\Omega_j}\Big)}dP + \dots \\
&= \int{c_{i(1)}\mathbf{1}_{\Omega_{i(1)}}}dP + \int{c_{i(2)}\mathbf{1}_{\Omega_{i(2)}}}dP + \dots
\end{align}
$$

It therefore suffices to prove, for each $i$:

$$
\begin{aligned}
\int_{\Omega_i}c_i dP &= \int_{\Omega_i}X dP \\
c_i \int_{\Omega_i} dP &= \frac{E[X; \Omega_i]}{P(\Omega_i)} P(\Omega_i) = \int_{\Omega_i}X dP = E[X;\Omega_i]
\end{aligned}
$$

$\blacksquare$
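The formula above can be checked by brute force on a finite probability space. The sketch below is only an illustration: the fair die, the choice $X(\omega) = \omega$, the partition, and the helper names `cond_exp` and `integral` are all assumptions made for this example. It uses exact fractions and verifies condition 2 of the definition on every event in $\mathcal{F}$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example: a fair six-sided die with X(omega) = omega.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w) for w in omega}

# Assumed partition generating F.
partition = [{1, 2}, {3, 4, 5}, {6}]

def cond_exp(X, partition, prob):
    """Return E[X|F] as a dict omega -> c_i, where c_i = E[X; Omega_i] / P(Omega_i)."""
    Y = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)
        c = sum(X[w] * prob[w] for w in block) / p_block
        for w in block:
            Y[w] = c
    return Y

def integral(Z, A, prob):
    """Integral of Z over the event A with respect to P."""
    return sum(Z[w] * prob[w] for w in A)

Y = cond_exp(X, partition, prob)

# Every event in F is a union of blocks; check condition 2 on each of them.
for r in range(len(partition) + 1):
    for blocks in combinations(partition, r):
        A = set().union(*blocks)
        assert integral(X, A, prob) == integral(Y, A, prob)
```

In this example `Y` takes the values $3/2$, $4$ and $6$ on the three blocks, which are the block averages of $X$.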

## Conditional Probability

$$
P(A|\mathcal{F}) = E[\mathbf{1}_{A}|\mathcal{F}]
$$

If we let $\mathcal{F} = \sigma(B, B^c)$ with $0 \lt P(B) \lt 1$, then by the theorem above, the following is a version of $P(A|\mathcal{F})$:

$$
P(A|\mathcal{F}) = \frac{P(A\cap B)}{P(B)}\mathbf{1}_{B} + \frac{P(A\cap B^c)}{P(B^c)}\mathbf{1}_{B^c}
$$

If the outcome $\omega$ of the experiment falls in $B$, then

$$
P(A|\mathcal{F})_{\omega} = \frac{P(A\cap B)}{P(B)}
$$

This is exactly where elementary conditional probability comes from: the value of $P(A|\mathcal{F})$ on $B$ is what elementary probability books define as $P(A|B)$:

$$
P(A|B) = \frac{P(A\cap B)}{P(B)}
$$
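As a small worked example (the die and the events below are chosen here purely for illustration), roll a fair six-sided die and take $A = \{2,4,6\}$ and $B = \{1,2,3\}$. Then $P(A \cap B) = 1/6$, $P(A \cap B^c) = 1/3$ and $P(B) = P(B^c) = 1/2$, so:

$$
P(A|\mathcal{F}) = \frac{1/6}{1/2}\mathbf{1}_{B} + \frac{1/3}{1/2}\mathbf{1}_{B^c} = \frac{1}{3}\mathbf{1}_{B} + \frac{2}{3}\mathbf{1}_{B^c}
$$

Integrating over $\Omega$ gives $\frac{1}{3}\cdot\frac{1}{2} + \frac{2}{3}\cdot\frac{1}{2} = \frac{1}{2} = P(A)$, as condition 2 of the definition requires, and the value on $B$ is the elementary $P(A|B) = 1/3$.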

## Theorem: Bayes' Formula

For any $A \in \mathcal{F_o}$ with $P(A) \gt 0$ and any $B \in \mathcal{F}$:

$$
P(B|A) = \frac{\int_{B}{P(A|\mathcal{F})}dP}{\int_{\Omega}{P(A|\mathcal{F})}dP}
$$

If $\mathcal{F}$ is a $\sigma$-field generated by a partition $B_1, B_2, \dots$, then this formula reduces to the usual Bayes' formula:

$$
P(B_i|A) = \frac{P(A|B_i)P(B_i)}{\sum_{j=1}^{\infty}P(A|B_j)P(B_j)}
$$

_Proof:_

$$
\int_{B}P(A|\mathcal{F})dP = \int_{B}E(\mathbf{1}_{A}|\mathcal{F})dP = P(A\cap B)
$$

$$
P(B|A)\int_{\Omega}{P(A|\mathcal{F})}dP = \frac{P(B\cap A)}{P(A)}P(A) = P(A\cap B)
$$

$\blacksquare$
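The partition form of Bayes' formula is easy to evaluate numerically. The following sketch assumes a hypothetical three-set partition with made-up priors $P(B_i)$ and likelihoods $P(A|B_i)$; the function name `posterior` is also an assumption of this example.

```python
from fractions import Fraction

# Hypothetical numbers: priors P(B_i) for a three-set partition
# and likelihoods P(A|B_i).
prior = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihood = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]

def posterior(prior, likelihood):
    """Return the list of P(B_i|A) computed with Bayes' formula."""
    # P(A and B_i) = P(A|B_i) P(B_i), the integral of P(A|F) over B_i.
    joint = [l * p for l, p in zip(likelihood, prior)]
    # P(A), the integral of P(A|F) over Omega.
    total = sum(joint)
    return [j / total for j in joint]

post = posterior(prior, likelihood)
```

With these numbers $P(A) = 1/5$ and the posteriors are $1/4$, $1/3$ and $5/12$, which sum to one.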

## Theorem: Conditional expectation with PDF

**Video lecture: [https://www.youtube.com/watch?v=IJoNHV_Mpbw](https://www.youtube.com/watch?v=IJoNHV_Mpbw)**

Suppose $X,Y$ have joint pdf $f(x,y)$ with $\int{f(x,y)}dx \gt 0$ for all $y$. If $E|g(X)| \lt \infty$, then $E[g(X)|Y] = h(Y)$ where:

$$
h(y) = \frac{\int{g(x)f(x,y)}dx}{\int{f(x,y)}dx}
$$

**Lemma**

If $f$ is measurable, then for each fixed $x$ the section $f_x: y \mapsto f(x, y)$ is measurable.

_Proof of lemma:_

First we claim that the function $T_x: y \mapsto (x,y)$ is measurable. This implies that $f_x = f \circ T_x$ is measurable, as a composition of measurable functions.

To prove that $T_x$ is measurable, note that $T_x^{-1}$ takes the cross-section of a two-dimensional Borel set at $x$: defining $B_x = \{y: (x,y) \in B\}$ for $B \in \mathcal{B}^2$, we have $T_x^{-1}(B) = B_x$. Preimages commute with complements and unions:



$$
\begin{align}
& T_x^{-1}(B^c) = \mathbf{R} - T_x^{-1}(B)  \quad \text{ implies } \quad  (B^c)_x = (B_x)^c \\
& T_x^{-1}(\cup_i{B_i}) = \cup_i{T_x^{-1}(B_i)}  \quad  \text{ implies }  \quad   (\cup B_i)_x = \cup (B_i)_x
\end{align}
$$

Let $\mathbb{B} = \{B: B \in \mathcal{B}^2 \text{ such that } B_x \in \mathcal{B} \}$. 

- If $B \in \mathbb{B}$, then $(B^c)_x = (B_x)^c \in \mathcal{B}$ implies $B^c \in \mathbb{B}$.
- If $B_i \in \mathbb{B}$, then $(\cup_i{B_i})_x = \cup_i(B_i)_x \in\mathcal{B}$ implies $\cup_i{B_i} \in \mathbb{B}$.

Therefore $\mathbb{B}$ is a $\sigma$-field. It contains every Borel rectangle $B_1 \times B_2$, because $(B_1 \times B_2)_x$ is $B_2$ if $x \in B_1$ and $\emptyset$ otherwise. Since the rectangles generate $\mathcal{B}^2$, we have $\mathcal{B}^2 \subset \mathbb{B}$, which shows $T_x^{-1}(B) = B_x \in \mathcal{B}$ for all $B \in \mathcal{B}^2$.

$\blacksquare$

**Lemma**

$y \mapsto \int{f(x,y)}dx$ is measurable.

_Proof of lemma:_

First assume $f \ge 0$. Since $f$ is measurable, $y \mapsto f(x,y)$ is measurable for each $x$ by the lemma above. Define $f_x^{n}(y) = \frac{[2^nf(x,y)]}{2^n}\wedge{n}$, where $[\cdot]$ denotes the integer part. The $f_x^{n}(y)$ are simple functions that converge upwards to $f(x,y)$.

For the simple functions $f_x^{n}(y)$, the map $y \mapsto \int{f_x^{n}(y)}dx$ is measurable. By the monotone convergence theorem, $\int{f_x^{n}(y)}dx$ converges to $\int{f(x, y)}dx$ for every $y$. A limit of measurable functions is measurable, therefore $y \mapsto \int{f(x,y)}dx$ is measurable.

For arbitrary $f$, write $f = f^{+} - f^{-}$ to finish the proof.

$\blacksquare$

_Proof of theorem:_

First, by the lemma above, $h$ is measurable, so $h(Y)$ is $\sigma(Y)$-measurable and condition 1 of the definition of conditional expectation is met. To prove the second condition, take $A \in \sigma(Y)$, so that $A = \{\omega: Y(\omega) \in B\}$ for some $B \in \mathcal{B}$. Then:

$$
\begin{align}
E[h(Y);A] &= \int_{B}\int{h(y)f(x,y)}dxdy \\
&= \int_{B}{h(y)\int{f(x,y)}dx}dy \\
&= \int_{B}{\int{g(x)f(x,y)}dx}dy \\
&= E[g(X)\mathbf{1}_{Y(\omega) \in B}] = E[g(X);A]
\end{align}
$$

$\blacksquare$
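As an illustrative worked example (the density below is an assumption chosen for easy integration, not part of the theorem), take $f(x,y) = x + y$ on $[0,1]^2$ and $g(x) = x$. For $y \in [0,1]$:

$$
h(y) = \frac{\int_0^1{x(x+y)}dx}{\int_0^1{(x+y)}dx} = \frac{\frac{1}{3} + \frac{y}{2}}{\frac{1}{2} + y} = \frac{2 + 3y}{3 + 6y}
$$

so $E[X|Y] = \frac{2+3Y}{3+6Y}$. As a sanity check, $E[h(Y)] = \int_0^1 h(y)(\frac{1}{2}+y)dy = \int_0^1(\frac{1}{3}+\frac{y}{2})dy = \frac{7}{12}$, which matches $E[X] = \int_0^1 x(x+\frac{1}{2})dx = \frac{7}{12}$.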

## Properties of conditional expectation

**Video lecture: [https://youtu.be/Uk-HBqB7HnA](https://youtu.be/Uk-HBqB7HnA)**


1. Linearity: $E[aX+bY|\mathcal{F}] = aE[X|\mathcal{F}] + bE[Y|\mathcal{F}]$
2. Monotonicity: $X \le Y$ implies $E[X|\mathcal{F}] \le E[Y|\mathcal{F}]$
3. Chebyshev's inequality: Suppose $\phi: \mathbf{R} \to \mathbf{R}$ and $\phi \ge 0$. Let $A\in\mathcal{B}$ and $i_A = \inf\{\phi(x): x \in A\}$, then $i_AP(X\in A | \mathcal{F}) \le E[\phi(X)|\mathcal{F}]$
4. Markov inequality: For any $a \gt 0$, $a^2P(|X| \ge a | \mathcal{F}) \le E[X^2|\mathcal{F}]$
5. Monotone convergence: If $X_n \ge 0$ and $X_n\uparrow X$ with $E|X| \lt \infty$, then $E[X_n|\mathcal{F}] \uparrow E[X|\mathcal{F}]$
6. Jensen's inequality: If $\phi$ is convex and $E|X| \lt \infty$ and $E|\phi(X)| \lt \infty$ then $\phi(E[X|\mathcal{F}]) \le E[\phi(X)|\mathcal{F}]$
7. Cauchy-Schwarz inequality: $E[XY|\mathcal{F}]^2 \le E[X^2|\mathcal{F}]E[Y^2|\mathcal{F}]$
8. $E[E[X|\mathcal{F}]] = E[X]$
9. Given $\mathcal{F_1} \subset \mathcal{F_2}$ then: $E[E[X|\mathcal{F_1}]|\mathcal{F_2}] = E[X|\mathcal{F_1}]$
10. Given $\mathcal{F_1} \subset \mathcal{F_2}$ then: $E[E[X|\mathcal{F_2}]|\mathcal{F_1}] = E[X|\mathcal{F_1}]$
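Properties 8 to 10 can be checked exactly on a finite space with nested partitions. The sketch below is a hedged illustration only: the fair die, $X(\omega) = \omega^2$, the two partitions, and the helpers `cond_exp` and `expectation` are assumptions made for this example.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die with X(omega) = omega^2.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w * w) for w in omega}

coarse = [{1, 2, 3}, {4, 5, 6}]    # generates F_1
fine = [{1}, {2, 3}, {4, 5}, {6}]  # generates F_2, which contains F_1

def cond_exp(Z, partition):
    """Conditional expectation given the sigma-field generated by a partition."""
    out = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)
        avg = sum(Z[w] * prob[w] for w in block) / p_block
        out.update({w: avg for w in block})
    return out

def expectation(Z):
    return sum(Z[w] * prob[w] for w in omega)

E1 = cond_exp(X, coarse)  # E[X|F_1]
E2 = cond_exp(X, fine)    # E[X|F_2]

tower_a = cond_exp(E1, fine)    # E[E[X|F_1]|F_2], property 9
tower_b = cond_exp(E2, coarse)  # E[E[X|F_2]|F_1], property 10

assert tower_a == E1
assert tower_b == E1
assert expectation(E1) == expectation(X)  # property 8
```

In both orders of conditioning the coarser $\sigma$-field wins, which is the content of properties 9 and 10.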

## Theorem

If $X\in \mathcal{F}$, $E|Y| \lt \infty$ and $E|XY| \lt \infty$ then $E[XY|\mathcal{F}] = XE[Y|\mathcal{F}]$

_Proof:_

$XE[Y|\mathcal{F}]$ is $\mathcal{F}$-measurable as a product of $\mathcal{F}$-measurable random variables, so we only need to check the second condition. We first check the case $X = \mathbf{1}_A$ with $A\in\mathcal{F}$. For any $E \in \mathcal{F}$:

$$
\begin{align}
\int_{E} E[\mathbf{1}_{A}Y|\mathcal{F}]dP &= \int_{E} \mathbf{1}_{A}YdP \\
&= \int_{E \cap A}YdP \\
&= \int_{E \cap A}E[Y|\mathcal{F}]dP \\
&= \int_{E}\mathbf{1}_{A}E[Y|\mathcal{F}]dP
\end{align}
$$

By linearity, the identity also holds when $X$ is a simple random variable. For $X \ge 0$ and $Y \ge 0$, take simple random variables $X_n \uparrow X$ and apply the monotone convergence theorem to both sides. Finally, for general $X$ and $Y$, write $X = X^+-X^-$ and $Y = Y^+ - Y^-$ and use linearity.

$\blacksquare$
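A quick numerical check of this "taking out what is known" identity, under assumed data: the die, the partition, and the particular $X$ and $Y$ below are hypothetical, with $X$ chosen constant on each block so that $X \in \mathcal{F}$.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}
partition = [{1, 2}, {3, 4}, {5, 6}]  # assumed partition generating F

# X is constant on each block, hence F-measurable; Y is arbitrary.
block_values = [Fraction(2), Fraction(-1), Fraction(5)]
X = {w: v for block, v in zip(partition, block_values) for w in block}
Y = {w: Fraction(w) for w in omega}

def cond_exp(Z):
    """Conditional expectation given the sigma-field generated by `partition`."""
    out = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)
        avg = sum(Z[w] * prob[w] for w in block) / p_block
        out.update({w: avg for w in block})
    return out

lhs = cond_exp({w: X[w] * Y[w] for w in omega})  # E[XY|F]
EY = cond_exp(Y)                                 # E[Y|F]
rhs = {w: X[w] * EY[w] for w in omega}           # X E[Y|F]

assert lhs == rhs
```

Both sides equal $3$, $-7/2$ and $55/2$ on the three blocks.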

## Theorem: Mean square error and Orthogonality

Suppose $X \in L^2(\mathcal{F_o})$, where $L^2(\mathcal{F_o}) = \{Y \in \mathcal{F_o}: E[Y^2] \lt \infty\}$, and note that $L^2(\mathcal{F}) \subset L^2(\mathcal{F_o})$.

Then $E[X|\mathcal{F}]$ is the random variable $Y \in L^2(\mathcal{F})$ that minimizes $E[(X-Y)^2]$.

_Proof:_

If $Y \in L^2(\mathcal{F})$, let $Z = E[X|\mathcal{F}] - Y$, then:

$$
\begin{align}
E[(X-Y)^2] &= E[(X-E[X|\mathcal{F}] + Z)^2] \\
&= E[(X-E[X|\mathcal{F}])^2] + 2E[Z(X-E[X|\mathcal{F}])] + E[Z^2]
\end{align}
$$

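The middle term is zero, because $Z \in \mathcal{F}$ and the theorem above gives $ZE[X|\mathcal{F}] = E[ZX|\mathcal{F}]$:

$$
E[Z(X-E[X|\mathcal{F}])] = E[ZX] - E[ZE[X|\mathcal{F}]] = E[ZX] - E[E[ZX|\mathcal{F}]] = 0
$$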

Since $E[Z^2] \ge 0$, $E[(X-Y)^2]$ is minimized when $Z = 0$, that is, when $Y = E[X|\mathcal{F}]$.

$\blacksquare$
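The decomposition in the proof can be verified exactly on a small example. In this sketch the die, the partition, and the competitor `Y` are all hypothetical choices; it checks that the cross term vanishes and that the excess mean square error of `Y` equals $E[Z^2]$.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}
partition = [{1, 2, 3}, {4, 5, 6}]  # assumed partition generating F
X = {w: Fraction(w) for w in omega}

def expectation(Z):
    return sum(Z[w] * prob[w] for w in omega)

def cond_exp(Z):
    """Conditional expectation given the sigma-field generated by `partition`."""
    out = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)
        avg = sum(Z[w] * prob[w] for w in block) / p_block
        out.update({w: avg for w in block})
    return out

def mse(Y):
    """Mean square error E[(X - Y)^2]."""
    return expectation({w: (X[w] - Y[w]) ** 2 for w in omega})

best = cond_exp(X)  # E[X|F]

# Hypothetical competitor in L^2(F): any function constant on each block.
Y = {w: Fraction(1) if w <= 3 else Fraction(6) for w in omega}
Z = {w: best[w] - Y[w] for w in omega}

cross = expectation({w: Z[w] * (X[w] - best[w]) for w in omega})
gap = mse(Y) - mse(best)
z_sq = expectation({w: Z[w] ** 2 for w in omega})

assert cross == 0   # orthogonality of X - E[X|F] to Z
assert gap == z_sq  # the excess error is exactly E[Z^2]
```

Here `mse(best)` is $2/3$ and `mse(Y)` is $5/3$, so the competitor pays exactly $E[Z^2] = 1$ on top of the minimum.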