# Joint and Conditional Probability

## Overview

In this chapter, we look at some fundamental concepts of probability theory, namely <a href="https://en.wikipedia.org/wiki/Joint_probability_distribution">joint</a> and conditional probability.
We also introduce the law of <a href="https://en.wikipedia.org/wiki/Law_of_total_probability">total probability</a> and Bayes' theorem.


## Joint and conditional probability

### Joint probability

Let's say we toss two dice and we want to compute the probability that the first die shows $X$ and the second shows $Y$. In other words, we 
want to compute the joint probability $P(X,Y)$. Specifically, given two random variables $X$ and $Y$ defined on the same probability space, the joint probability distribution is the probability distribution over all possible pairs of outputs [1]. 


The joint distribution can just as well be considered for any given number of random variables. The joint distribution encodes the marginal distributions, i.e. the distributions of each of the individual random variables and the conditional probability distributions, which deal with how the outputs of one random variable are distributed when given information on the outputs of the other random variable(s).
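
To illustrate how a joint distribution encodes its marginals, here is a minimal sketch that builds the joint distribution of two dependent variables and sums out one of them. The choice of variables (the first die and the maximum of two fair dice) and all names are illustrative assumptions, not taken from the text above.

```python
from collections import defaultdict
from fractions import Fraction

# Joint distribution of (X, M): X is the first die, M is the maximum of two fair dice.
joint = defaultdict(Fraction)
for first in range(1, 7):
    for second in range(1, 7):
        joint[(first, max(first, second))] += Fraction(1, 36)

# Marginals: sum the joint probabilities over the other variable.
marginal_x = defaultdict(Fraction)
marginal_m = defaultdict(Fraction)
for (x, m), p in joint.items():
    marginal_x[x] += p
    marginal_m[m] += p

print(marginal_x[3])  # 1/6
print(marginal_m[6])  # 11/36
```

Exact fractions are used instead of floats so that the marginals can be compared without rounding error.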





Since the two dice are independent, the joint probability factorizes:

$$P(X,Y)=P(X)P(Y)$$
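
The product rule can be checked numerically by enumerating the 36 equally likely outcomes of two fair dice. This is only a sketch: the helper `prob` and the variable names are assumptions introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (first die, second die), each with probability 1/36.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate over outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# Check P(X = x, Y = y) == P(X = x) * P(Y = y) for every pair of faces.
independent = all(
    prob(lambda o: o == (x, y)) == prob(lambda o: o[0] == x) * prob(lambda o: o[1] == y)
    for x, y in outcomes
)
print(independent)  # True
```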

### Conditional probability

The conditional probability of an event $E_1$ given an event $E_2$ is defined as follows


----
**Definition: Conditional Probability**


The conditional probability of $E_1$ given $E_2$ is defined as 

$$P(E_1|E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)}$$

----

The definition above assumes that $P(E_2)>0$. Notice also that if the events $E_1$ and $E_2$ are independent, then 

$$P(E_1|E_2) = P(E_1)$$
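
As a small sketch of the definition, the snippet below computes conditional probabilities for two fair dice by counting outcomes. The events and helper names are illustrative assumptions. The first pair of events is dependent, the second is independent.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair dice as a set of ordered pairs.
outcomes = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a subset of the sample space."""
    return Fraction(len(event), len(outcomes))

def conditional(e1, e2):
    """P(E1 | E2) = P(E1 ∩ E2) / P(E2), defined only when P(E2) > 0."""
    if prob(e2) == 0:
        raise ValueError("P(E2) must be positive")
    return prob(e1 & e2) / prob(e2)

first_even = {o for o in outcomes if o[0] % 2 == 0}
sum_is_8 = {o for o in outcomes if sum(o) == 8}
second_is_3 = {o for o in outcomes if o[1] == 3}

print(conditional(sum_is_8, first_even), prob(sum_is_8))        # 1/6 5/36
print(conditional(second_is_3, first_even), prob(second_is_3))  # 1/6 1/6
```

Conditioning on the first die being even changes the probability that the sum is 8, but leaves the probability of the second die showing 3 unchanged.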

In addition, from the definition above we obtain the multiplication law


----
**Multiplication Law**

$$P(E_1 \cap E_2) = P(E_1|E_2)P(E_2)$$

----

The multiplication law is very useful when we want to calculate the probabilities of intersections.
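
For instance, the multiplication law gives the probability of drawing two aces in a row from a shuffled deck without replacement. The sketch below uses a hypothetical encoding of the deck (cards labelled 0 to 51, with labels 0 to 3 standing for the aces) and checks the result by brute force.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical deck: 52 cards labelled 0..51, with labels 0..3 standing for the aces.
deck = range(52)
aces = set(range(4))

# Multiplication law with E2 = "first card is an ace", E1 = "second card is an ace".
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
p_both = p_second_ace_given_first * p_first_ace

# Brute-force check over all ordered pairs of distinct cards.
pairs = list(permutations(deck, 2))
p_both_enumerated = Fraction(
    sum(1 for a, b in pairs if a in aces and b in aces), len(pairs)
)

print(p_both, p_both == p_both_enumerated)  # 1/221 True
```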

### Law of total probability

Let $E_1, E_2,\dots, E_n$ be events whose union makes up the entire sample space $\Omega$, i.e. $\bigcup_{i=1}^{n}E_i = \Omega$. Also let $E_i \cap E_j = \emptyset$ for $i \neq j$.
Let also $P(E_i)>0$ for all $i$. Then, for any event $A$,

$$P(A) = \sum_{i=1}^n P(A|E_i)P(E_i)$$

The law of total probability is useful when computing $P(A)$ directly is not obvious, but computing $P(A|E_i)$ and $P(E_i)$ is more straightforward. 
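
A minimal sketch of the law for two fair dice follows. The partition (the value of the first die) and the event $A$ (the sum is at least 10) are choices made for this illustration only. The result is compared against direct enumeration.

```python
from fractions import Fraction
from itertools import product

# Partition of the sample space: E_i = "the first die shows i", for i = 1..6.
p_e = {i: Fraction(1, 6) for i in range(1, 7)}

# P(A | E_i) for A = "the sum is at least 10": count the qualifying second-die values.
p_a_given_e = {
    i: Fraction(sum(1 for j in range(1, 7) if i + j >= 10), 6)
    for i in range(1, 7)
}

# Law of total probability: P(A) = sum_i P(A | E_i) P(E_i).
p_a = sum(p_a_given_e[i] * p_e[i] for i in p_e)

# Direct check by enumerating all 36 outcomes.
outcomes = list(product(range(1, 7), repeat=2))
p_a_direct = Fraction(sum(1 for o in outcomes if sum(o) >= 10), len(outcomes))

print(p_a, p_a_direct)  # 1/6 1/6
```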




### Bayes' rule

Another important law in probability theory is <a href="https://en.wikipedia.org/wiki/Bayes'_theorem">Bayes' theorem</a>.
Bayes' law allows us to calculate the probability of an event, based on prior knowledge of conditions that might be related to the event. Specifically, for two events $A$ and $B$ with $P(B)>0$, applying the multiplication law to $P(A \cap B)$ in both orders gives $P(A|B)P(B) = P(B|A)P(A)$, and hence 

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$

Bayesian inference is heavily based on Bayes' theorem. Adopting a Bayesian interpretation of probability, Bayes' theorem expresses 
how a degree of belief, expressed as a probability, should rationally change to account for the availability of related evidence.

We can also state Bayes' theorem for a collection of mutually disjoint events that cover the sample space $\Omega$. Specifically, see [4],

----
**Bayes' Rule**

Let $A$ and $B_1, \dots, B_n$ be events such that the $B_i$ are mutually disjoint, $\bigcup_{i=1}^{n} B_i = \Omega$ and $P(A)>0$. Let also $P(B_i)>0$ for all $i$.
Then

$$P(B_j|A) = \frac{P(A|B_j)P(B_j)}{\sum_{i=1}^{n}P(A|B_i)P(B_i)}$$

----
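
The rule can be sketched for two fair dice as follows. The partition $B_j$ (the first die shows $j$) and the observed event $A$ (the sum is at least 10) are assumptions chosen for illustration. The code computes the posterior probability of each value of the first die once $A$ has been observed.

```python
from fractions import Fraction

# Partition: B_j = "first die shows j", each with prior P(B_j) = 1/6.
prior = {j: Fraction(1, 6) for j in range(1, 7)}

# Likelihood P(A | B_j) for A = "the sum of the two dice is at least 10".
likelihood = {
    j: Fraction(sum(1 for k in range(1, 7) if j + k >= 10), 6)
    for j in range(1, 7)
}

# Denominator: P(A) via the law of total probability.
evidence = sum(likelihood[j] * prior[j] for j in prior)

# Bayes' rule: P(B_j | A) = P(A | B_j) P(B_j) / P(A).
posterior = {j: likelihood[j] * prior[j] / evidence for j in prior}

print(posterior[6])             # 1/2
print(sum(posterior.values()))  # 1
```

Although every face of the first die is equally likely a priori, observing a large sum shifts all of the probability mass to the faces 4, 5 and 6.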

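The sample space of two dice can also be enumerated directly in Python. The following cells compute the distribution of the sum of two fair dice:

```python
from collections import defaultdict

# Sample space: map every outcome (i, j) of the two dice to the sum i + j.
omega = {(i, j): i + j for i in range(1, 7) for j in range(1, 7)}

# Invert the mapping: for each sum, collect the outcomes that produce it.
dinv = defaultdict(list)
for outcome, total in omega.items():
    dinv[total].append(outcome)

# All 36 outcomes are equally likely, so P(sum = s) = (number of outcomes with sum s) / 36.
X = {total: len(outcomes) / 36. for total, outcomes in dinv.items()}
X  # displayed as the notebook cell output
```

```
{2: 0.027777777777777776,
 3: 0.05555555555555555,
 4: 0.08333333333333333,
 5: 0.1111111111111111,
 6: 0.1388888888888889,
 7: 0.16666666666666666,
 8: 0.1388888888888889,
 9: 0.1111111111111111,
 10: 0.08333333333333333,
 11: 0.05555555555555555,
 12: 0.027777777777777776}
```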

## Summary

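In this chapter we looked at some fundamental laws of probability theory that are frequently used in practice: joint and conditional probability, the multiplication law, the law of total probability and Bayes' rule.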

## References

1. <a href="https://en.wikipedia.org/wiki/Joint_probability_distribution">Joint probability</a>
2. <a href="https://en.wikipedia.org/wiki/Law_of_total_probability">Law of total probability</a>
3. <a href="https://en.wikipedia.org/wiki/Bayes'_theorem">Bayes' theorem</a>
4. Larry Wasserman, _All of Statistics. A Concise Course in Statistical Inference_, Springer 2003.