# Joint and Conditional Probability, Bayes' Theorem and Law of Total Probability {#sec-fundamental-laws}

## Overview

Random variables, see [chapter @sec-random-variables], are very often analyzed together with other random variables, so we need tools for working with several of them at once.
In this chapter, we look into some fundamental concepts of probability theory, namely <a href="https://en.wikipedia.org/wiki/Joint_probability_distribution">joint</a> and conditional probability and <a href="https://en.wikipedia.org/wiki/Bayes'_theorem">Bayes' theorem</a>.
We will also introduce the law of <a href="https://en.wikipedia.org/wiki/Law_of_total_probability">total probability</a>.


## Joint probability

Suppose we roll two dice and want to compute the probability that the first die shows $x$ and the second shows $y$.
This is called the joint probability, and it is denoted by $P(X=x, Y=y)$, or simply $P(X,Y)$. Specifically, given two random variables $X$ and $Y$ defined on the same probability space, the joint probability distribution is the corresponding probability distribution over all possible pairs of outcomes [1]. Referring back to the set theory from chapter [Basic Set Theory @sec-basic-set-theory], the
joint probability corresponds to the intersection of the events $\{X=x\}$ and $\{Y=y\}$. If $X$ and $Y$ are independent, then

$$P(X,Y)=P(X)P(Y)$$
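As a minimal sketch, assuming two fair six-sided dice, the following Python enumerates the 36 equally likely outcomes, computes joint and marginal probabilities with exact fractions, and checks that the joint probability factorizes as above. The helper names `joint`, `marginal_first` and `marginal_second` are illustrative, not from any library.

```python
from fractions import Fraction
from itertools import product

# Two fair six-sided dice: 36 equally likely ordered outcomes (a, b).
faces = range(1, 7)
omega = list(product(faces, faces))
p_outcome = Fraction(1, len(omega))

def joint(x, y):
    """P(X = x, Y = y): the first die shows x and the second shows y."""
    return sum(p_outcome for a, b in omega if a == x and b == y)

def marginal_first(x):
    """P(X = x), summing the joint probability over the second die."""
    return sum(joint(x, y) for y in faces)

def marginal_second(y):
    """P(Y = y), summing the joint probability over the first die."""
    return sum(joint(x, y) for x in faces)

# Independent dice: P(X = x, Y = y) = P(X = x) P(Y = y) for every pair.
independent = all(joint(x, y) == marginal_first(x) * marginal_second(y)
                  for x, y in omega)

print(joint(3, 5))   # 1/36
print(independent)   # True
```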

The joint distribution can also be defined for any number of random variables. It encodes both the marginal distributions, i.e., the distributions of the individual random variables, and the conditional probability distributions, which describe how the outcomes of one random variable are distributed given information about the outcomes of the other random variable(s).

If both $X$ and $Y$ are continuous, we denote their joint PDF by $f_{X,Y}(x,y)$. The joint CDF, see [chapter @sec-cumulative-distribution-function], is defined in a similar way
as in the one-dimensional case:

$$F_{X,Y}(x,y)=P(X \leq x, Y \leq y)=\int_{-\infty}^{x}\int_{-\infty}^{y} f_{X,Y}(u,v)dvdu$$
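To make the double integral concrete, the sketch below assumes a hypothetical joint PDF $f_{X,Y}(x,y)=x+y$ on the unit square (zero elsewhere), approximates $F_{X,Y}(x,y)$ with a two-dimensional midpoint rule, and compares it with the closed form $F_{X,Y}(x,y)=xy(x+y)/2$, valid for $0 \leq x, y \leq 1$. The density and the function names are our own choices for illustration.

```python
def f_xy(u, v):
    """Hypothetical joint PDF: f(u, v) = u + v on the unit square, 0 elsewhere."""
    if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
        return u + v
    return 0.0

def joint_cdf(x, y, n=200):
    """Approximate F(x, y) = P(X <= x, Y <= y) for 0 <= x, y <= 1.

    The density is zero below 0, so the lower limits -inf can be replaced by 0.
    A 2D midpoint rule with n x n cells approximates the double integral.
    """
    hx, hy = x / n, y / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * hx
        for j in range(n):
            v = (j + 0.5) * hy
            total += f_xy(u, v)
    return total * hx * hy

x, y = 0.5, 0.8
print(joint_cdf(x, y))        # numerical approximation
print(x * y * (x + y) / 2)    # closed form xy(x + y)/2, approximately 0.26
```

Because this particular integrand is linear, the midpoint rule is exact up to floating-point rounding; for a general density the grid size `n` controls the accuracy.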

By the same token, for two discrete random variables the joint PMF is $p_{X,Y}(x,y)=P(X = x, Y = y)$, and the joint CDF is obtained by summing it:

$$F_{X,Y}(x, y)=P(X \leq x, Y \leq y)=\sum_{u \leq x}\sum_{v \leq y} p_{X,Y}(u,v)$$
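A small sketch of the discrete case, assuming a hypothetical joint PMF on $\{0,1\}\times\{0,1\}$ stored as a Python dictionary: the joint CDF at $(x, y)$ is the sum of the PMF over all points $(u,v)$ with $u \leq x$ and $v \leq y$. The table values and the name `joint_cdf` are illustrative.

```python
from fractions import Fraction

# Hypothetical joint PMF of two discrete random variables, stored as {(x, y): p}.
pmf = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 2),
}
assert sum(pmf.values()) == 1  # a valid PMF sums to one

def joint_cdf(x, y):
    """F(x, y) = P(X <= x, Y <= y): sum the PMF over all (u, v) with u <= x, v <= y."""
    return sum((p for (u, v), p in pmf.items() if u <= x and v <= y), Fraction(0))

print(pmf[(1, 0)])       # P(X = 1, Y = 0) = 1/4
print(joint_cdf(1, 0))   # P(X <= 1, Y <= 0) = 1/8 + 1/4 = 3/8
```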

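If the joint PDF is known, we can marginalize it, i.e., integrate out the other variable, to obtain the PDF of either $X$ or $Y$. For $X$,

$$f_X(x) = \int_{-\infty}^{\infty}f_{X,Y}(x,y)\,dy$$

and similarly $f_Y(y) = \int_{-\infty}^{\infty}f_{X,Y}(x,y)\,dx$. In the discrete case the integral is replaced by a sum, $P(X=x) = \sum_{y} P(X=x, Y=y)$.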
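As a numerical sketch, assuming a hypothetical joint PDF $f_{X,Y}(x,y)=x+y$ on the unit square (zero elsewhere), integrating out $y$ gives the marginal $f_X(x)=x+\tfrac{1}{2}$ for $0 \leq x \leq 1$. The code approximates this integral with the midpoint rule and compares it with the closed form; the density and function names are illustrative.

```python
def f_xy(x, y):
    """Hypothetical joint PDF: f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0) else 0.0

def marginal_x(x, n=1000):
    """Approximate f_X(x) = integral of f(x, y) dy with the midpoint rule.

    The density vanishes outside [0, 1], so integrating y over [0, 1] suffices.
    """
    h = 1.0 / n
    return sum(f_xy(x, (j + 0.5) * h) for j in range(n)) * h

for x in (0.0, 0.25, 0.9):
    print(x, marginal_x(x), x + 0.5)  # numerical vs. exact marginal x + 1/2
```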

## Conditional probability

Very often we want to calculate the probability of an event $E_1$ given that another event $E_2$ has occurred.
Knowing that $E_2$ has occurred generally changes how likely $E_1$ is. Notice that this does not necessarily mean
that the probability of $E_1$ becomes smaller. We denote the conditional probability of event
$E_1$ given $E_2$ by $P(E_1|E_2)$ and we have the following definition:


----
**Definition: Conditional Probability**


The conditional probability of $E_1$ given $E_2$ is defined as [5]:

$$P(E_1|E_2) = \frac{P(E_1 \bigcap E_2)}{P(E_2)}=\frac{P(E_1, E_2)}{P(E_2)}$$

----

The definition above assumes that $P(E_2) \neq 0$. Notice also that if the events $E_1$ and $E_2$ are independent, then

$$P(E_1|E_2) = P(E_1)$$
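The definition can be checked by brute force on two fair dice. In this sketch (the predicates and helper names are illustrative), conditioning on the first die showing a 6 raises the probability that the sum is at least 10 from $1/6$ to $1/2$, whereas the event "the second die is even" is independent of the first die, so conditioning leaves its probability unchanged.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely ordered outcomes (first, second).
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform distribution on omega; `event` is a predicate."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(e1, e2):
    """P(E1 | E2) = P(E1 and E2) / P(E2); undefined when P(E2) = 0."""
    p_e2 = prob(e2)
    if p_e2 == 0:
        raise ValueError("P(E2) must be non-zero")
    return prob(lambda w: e1(w) and e2(w)) / p_e2

def high_sum(w):      # E1: the two dice sum to at least 10
    return w[0] + w[1] >= 10

def first_six(w):     # E2: the first die shows a 6
    return w[0] == 6

def second_even(w):   # an event that does not depend on the first die
    return w[1] % 2 == 0

print(prob(high_sum), cond_prob(high_sum, first_six))        # 1/6 1/2
print(prob(second_even), cond_prob(second_even, first_six))  # 1/2 1/2
```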


The idea behind the definition above is that if the event $E_2$ occurred, then the relevant sample space becomes $E_2$ rather than $\Omega$ [5].
<a href="https://en.wikipedia.org/wiki/Bayes'_theorem">Bayes' theorem</a> can also be used to calculate this probability.

In addition, rearranging the definition above gives the multiplication law, which is very useful when we want to calculate the probabilities of intersections.


----
**Multiplication Law**

$$P(E_1 \bigcap E_2) = P(E_1|E_2)P(E_2)$$

----
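For example, when drawing two cards without replacement from a standard 52-card deck, the multiplication law gives $P(\text{both aces}) = P(\text{second ace}|\text{first ace})P(\text{first ace}) = \frac{3}{51}\cdot\frac{4}{52} = \frac{1}{221}$. The sketch below computes this and confirms it by enumerating all ordered pairs of distinct cards; labelling cards 0 to 3 as the aces is an arbitrary assumption.

```python
from fractions import Fraction
from itertools import permutations

# A 52-card deck; cards 0-3 are taken to be the four aces (an arbitrary labelling).
deck = range(52)
aces = set(range(4))

# Multiplication law: P(first ace and second ace) = P(second ace | first ace) P(first ace)
p_first_ace = Fraction(len(aces), 52)
p_second_given_first = Fraction(len(aces) - 1, 51)
by_law = p_second_given_first * p_first_ace

# Brute-force check: enumerate all ordered draws of two distinct cards.
draws = list(permutations(deck, 2))
by_count = Fraction(sum(1 for a, b in draws if a in aces and b in aces), len(draws))

print(by_law, by_count)   # 1/221 1/221
```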


### Law of total probability

Let $E_1, E_2,\dots, E_n$ be such that their union makes up the entire sample space $\Omega$, i.e., $\bigcup_{i=1}^{n}E_i = \Omega$. Also let $E_i \bigcap E_j = \emptyset$ for $i \neq j$.
Assume also that $P(E_i)>0$ for all $i$. Then, for any event $A$ [5]:

$$P(A) = \sum_{i=1}^n P(A|E_i)P(E_i)$$

The law of total probability is very useful when $P(A)$ is hard to calculate directly, but $P(A|E_i)$ and $P(E_i)$ are straightforward to obtain.
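As a minimal sketch with hypothetical numbers, suppose a coin is picked at random from a bag holding one fair coin and one biased coin that lands heads with probability $9/10$. The events "fair coin picked" and "biased coin picked" partition the sample space, and the law of total probability gives $P(\text{heads}) = \frac{1}{2}\cdot\frac{1}{2} + \frac{9}{10}\cdot\frac{1}{2} = \frac{7}{10}$.

```python
from fractions import Fraction

# Hypothetical setup: one fair coin and one biased coin (heads with prob. 9/10),
# one of them picked uniformly at random. The two "which coin" events partition omega.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}          # P(E_i)
p_heads_given = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}  # P(A | E_i)

assert sum(prior.values()) == 1  # the E_i must cover the sample space

# Law of total probability: P(A) = sum_i P(A | E_i) P(E_i)
p_heads = sum(p_heads_given[e] * prior[e] for e in prior)
print(p_heads)   # 7/10
```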




### Bayes' rule

Another important law in probability theory is <a href="https://en.wikipedia.org/wiki/Bayes'_theorem">Bayes' theorem</a>.
Bayes' theorem allows us to calculate the probability of an event based on prior knowledge of conditions that might be related to the event. Specifically, for two events $A$ and $B$ with $P(B) > 0$, Bayes' theorem states

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
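A short worked sketch with hypothetical numbers: a bag holds one fair coin and one biased coin that lands heads with probability $9/10$, and one is picked at random. Let $A$ be "the biased coin was picked" and $B$ be "the toss shows heads". Then $P(B) = \frac{9}{10}\cdot\frac{1}{2} + \frac{1}{2}\cdot\frac{1}{2} = \frac{7}{10}$, and Bayes' theorem gives $P(A|B) = \frac{(9/10)(1/2)}{7/10} = \frac{9}{14}$.

```python
from fractions import Fraction

# Hypothetical setup: A = "biased coin picked", B = "heads".
p_a = Fraction(1, 2)               # P(A): each coin is equally likely to be picked
p_b_given_a = Fraction(9, 10)      # P(B | A): the biased coin lands heads 9/10 of the time
p_b_given_not_a = Fraction(1, 2)   # P(B | not A): the fair coin
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)   # P(B), by total probability

# Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_b, p_a_given_b)   # 7/10 9/14
```

Observing heads raises the probability that the biased coin was picked from $1/2$ to $9/14$.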

Bayesian inference is heavily based on Bayes' theorem. Adopting a Bayesian interpretation of probability, Bayes' theorem expresses 
how a degree of belief, expressed as a probability, should rationally change to account for the availability of related evidence.

We can also state Bayes' theorem for a collection of mutually disjoint events that span the sample space $\Omega$. Specifically (see [4]):

----
**Bayes' Rule**

Let $A$ and $B_1, \dots, B_n$ be events such that $B_i \bigcap B_j = \emptyset$ for $i \neq j$ and $\bigcup_{i=1}^{n} B_i = \Omega$. Assume also that $P(B_i)>0$ for all $i$ and that $P(A)>0$.
Then

$$P(B_j|A) = \frac{P(A|B_j)P(B_j)}{\sum_{i=1}^{n}P(A|B_i)P(B_i)}$$

----
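The rule translates directly into a small function. This is a minimal sketch: the name `bayes_posterior` is our own, and the diagnostic-test numbers are hypothetical (1% prevalence, $P(\text{positive}|\text{condition}) = 0.95$, $P(\text{positive}|\text{no condition}) = 0.05$). The denominator is exactly the law of total probability applied to the partition $B_1, \dots, B_n$.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return [P(B_j | A) for each j] given priors P(B_i) and likelihoods P(A | B_i).

    Assumes the B_i are mutually disjoint, cover the sample space, and P(B_i) > 0.
    """
    if abs(sum(priors) - 1) > 1e-12:
        raise ValueError("priors must sum to 1 over a partition")
    evidence = sum(lik * p for lik, p in zip(likelihoods, priors))  # P(A), the denominator
    if evidence == 0:
        raise ValueError("P(A) must be non-zero")
    return [lik * p / evidence for lik, p in zip(likelihoods, priors)]

# Hypothetical diagnostic test: B_1 = "has condition", B_2 = "does not", A = "test positive".
priors = [Fraction(1, 100), Fraction(99, 100)]        # P(B_1), P(B_2)
likelihoods = [Fraction(95, 100), Fraction(5, 100)]   # P(A | B_1), P(A | B_2)

posterior = bayes_posterior(priors, likelihoods)
print(posterior[0])   # 19/118, roughly 0.16
```

With these numbers, a positive result raises the probability of having the condition from $0.01$ to only about $0.16$, because the condition is rare to begin with.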

So far we have discussed some important laws of probability theory. Let's work through some theoretical examples to solidify them.

### Examples

## Summary

In this chapter we looked into some fundamental laws of probability theory that are frequently used in practice. Specifically,
we discussed the joint and conditional probability of two events and introduced Bayes' theorem. We also presented the law
of total probability and the multiplication law.

The next chapter introduces the concept of a random variable. A <a href="https://en.wikipedia.org/wiki/Random_variable">random variable</a> allows us to link sample spaces to data.

## References

1. <a href="https://en.wikipedia.org/wiki/Joint_probability_distribution">Joint probability</a>
2. <a href="https://en.wikipedia.org/wiki/Law_of_total_probability">Law of total probability</a>
3. <a href="https://en.wikipedia.org/wiki/Bayes'_theorem">Bayes' theorem</a>
4. Larry Wasserman, _All of Statistics. A Concise Course in Statistical Inference_, Springer 2003.
5. John A. Rice, _Mathematical Statistics and Data Analysis_, 2nd Edition, Duxbury Press, 1995.