%run ../../common/import_all.py

from common.setup_notebook import set_css_style, setup_matplotlib, config_ipython
config_ipython()
setup_matplotlib()
set_css_style()

# Joint, Marginal and Conditional probability

## The joint probability

The joint probability of two or more events is the probability that they all happen together. If $X$, $Y$, $Z$, ... are the random variables describing them, their joint probability is written as

$$
P(X, Y, Z, \ldots)
$$

or as 

$$
P(X \cap Y \cap Z \cap \ldots)
$$

### The case of independent variables

If the variables are independent, their joint probability reduces to the product of their probabilities: $P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^n P(X_i)$.
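
As a minimal sketch of this product rule, the snippet below builds the joint probability of two independent fair dice. The dice and the names `p_die` and `joint` are assumptions made for illustration only; exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two independent fair six-sided dice.
faces = range(1, 7)
p_die = {face: Fraction(1, 6) for face in faces}

# Under independence, the joint probability is the product of the single ones.
joint = {(a, b): p_die[a] * p_die[b] for a, b in product(faces, faces)}

p_double_six = joint[(6, 6)]   # P(X1=6, X2=6) = 1/6 * 1/6
total = sum(joint.values())    # a joint distribution must sum to 1

print(p_double_six, total)
```

Every one of the 36 pairs gets the same probability here only because both dice are fair; with biased dice the same product construction applies, but the entries differ.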

## The marginal probability


<figure style="float:left;">
  <img src="../../imgs/joint-marg.png" width="500" align="left" style="margin:0px 50px"/>
  <figcaption>Image by IkamusumeFan (Own work) [<a href="http://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a>], <a href="https://commons.wikimedia.org/wiki/File%3AMultivariate_normal_sample.svg">via Wikimedia Commons</a></figcaption>
</figure>


If we have the joint probability of two or more random variables, the marginal probability of each is the probability of that variable over its own space of events; it expresses the probability of the variable when the value of the other one is not known. It is calculated by summing the joint probability over the space of events of the other variable. More specifically, given the joint probability $P(X=x, Y=y)$,

$$
P(X=x) = \sum_y P(X=x, Y=y) \ .
$$

The illustration shows points sampled from a joint probability distribution (the black dots), together with the corresponding marginal probabilities.
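
For a discrete case, the sum over $y$ can be sketched on a small table. The joint distribution below (weather against train punctuality) is entirely hypothetical, and `marginal` is a helper written for this example, not a library function.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution P(X, Y): X is the weather, Y the train status.
joint = {
    ("sun", "on_time"): Fraction(4, 10),
    ("sun", "late"): Fraction(1, 10),
    ("rain", "on_time"): Fraction(2, 10),
    ("rain", "late"): Fraction(3, 10),
}

def marginal(joint, axis):
    """Sum the joint probability over every variable except the one at `axis`."""
    result = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for outcome, prob in joint.items():
        result[outcome[axis]] += prob
    return dict(result)

p_x = marginal(joint, 0)  # P(X=x) = sum over y of P(X=x, Y=y)
p_y = marginal(joint, 1)  # P(Y=y) = sum over x of P(X=x, Y=y)

print(p_x)
print(p_y)
```

Each marginal sums to 1 on its own, since summing it is the same as summing the whole joint table.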

## The conditional probability

The conditional probability expresses the probability that an event occurs given that another one has occurred. With $Y$ being the (variable related to the) event that has occurred and $X$ the (variable related to the) event whose probability of occurrence we are interested in, it is defined as

$$
P(X | Y) = \frac{P(X, Y)}{P(Y)} \ ,
$$

that is, as the ratio of the joint probability of the two to the probability of $Y$, which must be non-zero for the definition to make sense.
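
The definition can be tried out on a small discrete table. The joint distribution below is hypothetical, and `conditional` is an illustrative helper that simply divides the joint probability by the probability of the conditioning event.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X, Y): X is the weather, Y the train status.
joint = {
    ("sun", "on_time"): Fraction(4, 10),
    ("sun", "late"): Fraction(1, 10),
    ("rain", "on_time"): Fraction(2, 10),
    ("rain", "late"): Fraction(3, 10),
}

def conditional(joint, x, y):
    """P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y), defined only when P(Y=y) > 0."""
    # P(Y=y) is the marginal obtained by summing the joint over all values of X.
    p_y = sum(p for (_, y_val), p in joint.items() if y_val == y)
    if p_y == 0:
        raise ValueError("the conditioning event has zero probability")
    return joint.get((x, y), Fraction(0)) / p_y

p_rain_given_late = conditional(joint, "rain", "late")  # (3/10) / (4/10)
p_sun_given_late = conditional(joint, "sun", "late")    # (1/10) / (4/10)

print(p_rain_given_late, p_sun_given_late)
```

For a fixed $Y$, the conditional probabilities over all values of $X$ sum to 1, so $P(X | Y)$ is itself a probability distribution over $X$.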

In the case of more than two variables we can write the joint probability as

$$
P(X_1, X_2, \ldots, X_n) = P(X_1 | X_2, \ldots, X_n) P(X_2, \ldots, X_n)
$$

and can repeat the process to isolate them one by one, obtaining

$$
P\left(\bigcap_{i=1}^n X_i\right) = \prod_{i=1}^n P\left(X_i \,\Big|\, \bigcap_{j=i+1}^{n} X_j\right) \ ,
$$

which is known as the [chain rule](https://en.wikipedia.org/wiki/Chain_rule_(probability)).
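
The chain rule can be checked numerically on a small example. The sketch below assumes three binary variables with an arbitrary, made-up joint distribution; `tail_prob` and `chain_rule` are names introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over three binary variables (X1, X2, X3):
# arbitrary positive weights, normalised so that the probabilities sum to 1.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
outcomes = list(product([0, 1], repeat=3))
joint = {o: Fraction(w, sum(weights)) for o, w in zip(outcomes, weights)}

def tail_prob(tail):
    """P of the last len(tail) variables taking the values in `tail`,
    obtained by marginalising over the leading variables."""
    k = len(tail)
    if k == 0:
        return Fraction(1)  # conditioning on nothing
    return sum(p for o, p in joint.items() if o[len(o) - k:] == tuple(tail))

def chain_rule(outcome):
    """Product over i of P(X_i | X_{i+1}, ..., X_n)."""
    result = Fraction(1)
    for i in range(len(outcome)):
        # P(X_i | tail) = P(X_i, tail) / P(tail)
        result *= tail_prob(outcome[i:]) / tail_prob(outcome[i + 1:])
    return result

# The product of conditionals should reproduce the joint for every outcome.
all_match = all(chain_rule(o) == joint[o] for o in outcomes)
print(all_match)
```

All weights are strictly positive, so no conditioning event has zero probability and every ratio is well defined.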

### In the case of independence

If $X$ and $Y$ are independent, the joint probability factorises as $P(X, Y) = P(X) P(Y)$, so we have

$$
P(X | Y) = \frac{P(X, Y)}{P(Y)} = \frac{P(X) P(Y)}{P(Y)} = P(X) \ ,
$$

which confirms what the definition itself suggests: when two variables are independent, the occurrence of one does not influence the probability of the other.
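
A small numerical check of this identity, under the assumption of two independent variables with made-up marginals (the names `p_x`, `p_y` and the values are illustrative only):

```python
from fractions import Fraction

# Hypothetical marginals of two independent variables X and Y.
p_x = {"heads": Fraction(3, 5), "tails": Fraction(2, 5)}
p_y = {"a": Fraction(1, 4), "b": Fraction(3, 4)}

# Independence: the joint probability is the product of the marginals.
joint = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

# Recover P(Y=y) from the joint by summing over x, then apply the definition
# P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y).
p_y_from_joint = {y: sum(joint[(x, y)] for x in p_x) for y in p_y}
cond = {(x, y): joint[(x, y)] / p_y_from_joint[y] for x in p_x for y in p_y}

# Conditioning on Y leaves the probability of X unchanged.
independent = all(cond[(x, y)] == p_x[x] for x in p_x for y in p_y)
print(independent)
```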