In [2]:
from __future__ import division  # must be the first statement in the cell
import numpy as np

### Bayes' Theorem for Random Variables

Let's reason about Bayes' Theorem as a computational procedure.

We start by observing $Y = y$.

* Based on our prior knowledge, $X$ takes the value $x$ with probability $P(X=x) = p_X(x)$. Now that we have observed $Y = y$, this belief changes. We can represent the change as a weighting of the original probability $p_X(x)$ by a factor, which happens to be the likelihood $p_{Y|X}(y|x)$. Hence, our updated measure of how plausible $X = x$ is becomes:

  $\alpha(x|y) \triangleq p_X(x)p_{Y|X}(y|x)$
  This is equivalent to creating a new table $\alpha(\cdot | y)$ indexed by $x$, but note that its entries are no longer guaranteed to sum to 1.
  
* Consequently, the next step is re-normalization. Dividing each entry by the sum of all entries gives a valid probability distribution:
  $p_{X|Y}(x | y) = \frac{\alpha(x|y)}{\sum_{x^{\prime}}\alpha(x^{\prime}|y)}$

To sum it up:

$p_{X|Y}(x|y) = \frac{p_X(x)p_{Y|X}(y|x)}{\sum_{x^{\prime}}p_X(x^{\prime})p_{Y|X}(y|x^{\prime})}$

So the problem reduces to computing the posterior probability distribution $p_{X|Y}(x|y)$, and the formula above, the famous Bayes' theorem, tells us how to do it.
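
The two steps above (weight the prior by the likelihood, then re-normalize) can be sketched in NumPy. This is a minimal illustration rather than code from the original notebook: the function name `posterior_from_prior` and the example numbers are made up, with `prior[i]` standing for $p_X(x_i)$ and `likelihood[i]` for $p_{Y|X}(y|x_i)$ at the observed $y$.

```python
import numpy as np

def posterior_from_prior(prior, likelihood):
    """Return the table p_{X|Y}(. | y) for one fixed, observed y."""
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    # Step 1: weight each prior probability by the likelihood.
    # The result, alpha(. | y), is generally not normalized.
    alpha = prior * likelihood
    total = alpha.sum()
    if total == 0:
        raise ValueError("the observed y has zero probability under this model")
    # Step 2: re-normalize so that the entries sum to 1.
    return alpha / total

# Hypothetical numbers: X takes three values, y is the observed value of Y.
prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.4, 0.5]
posterior = posterior_from_prior(prior, likelihood)
print(posterior)
```

Note that the denominator `total` is the same for every $x$, which is why it can be computed once after the weighting step.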

### Maximum A Posteriori (MAP) Estimation

Often we are interested in reporting which value of $X$ has the highest posterior probability. This value is called the _maximum a posteriori_ (MAP) estimate of $X$ given $Y = y$. It is denoted by $\hat{x}_{MAP}(y)$ and is given by:

$\hat{x}_{MAP}(y) = \operatorname*{arg\,max}_x p_{X|Y}(x|y)$

A note on notation: the prefix $\arg$ means that we are interested in the argument of the function $p_{X|Y}(x|y)$ that maximizes the posterior probability, i.e., the value of $x$, rather than the maximum probability itself.
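
The difference between the maximum and the arg max is easy to see in code. The sketch below uses a made-up posterior table and made-up labels `"a"`, `"b"`, `"c"` for the values of $X$; neither comes from the original notebook.

```python
import numpy as np

# Hypothetical posterior table p_{X|Y}(. | y) over three labels.
x_values = ["a", "b", "c"]
posterior = np.array([0.2, 0.5, 0.3])

# np.argmax returns the index of the largest entry, not the entry itself.
map_index = int(np.argmax(posterior))
x_map = x_values[map_index]

print(x_map)            # the MAP estimate of X
print(posterior.max())  # its posterior probability
```

If several values tie for the highest posterior probability, `np.argmax` returns the first one, so the MAP estimate is not necessarily unique. Since the normalizing sum does not depend on $x$, applying `np.argmax` to the unnormalized table $\alpha(\cdot|y)$ gives the same answer.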

### Checking for independence


In [18]:
# Joint probability distribution of W (rows) and I (columns)
prob_W_I = np.array([[1/2, 0], [0, 1/6], [0, 1/3]])
# We can generate marginal probabilities by summing out the other variable
prob_W = prob_W_I.sum(axis=1)
prob_I = prob_W_I.sum(axis=0)

print(prob_W_I)

[[ 0.5         0.        ]
 [ 0.          0.16666667]
 [ 0.          0.33333333]]


In [15]:
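# If W and I were independent, the outer product of the marginals
# would reproduce the joint table
test_prob_W_I = np.outer(prob_W, prob_I)
print(test_prob_W_I)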

[[ 0.25        0.25      ]
 [ 0.08333333  0.08333333]
 [ 0.16666667  0.16666667]]
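
The product of the marginals differs from the joint table printed earlier (for example, 0.25 versus 0.5 in the top-left entry), so $W$ and $I$ are not independent. Rather than comparing tables by eye, the check can be automated. The helper below is a sketch and not part of the original notebook; the name `is_independent` is made up, and `np.allclose` is used so that floating-point rounding does not produce false negatives.

```python
import numpy as np

def is_independent(joint):
    """Check whether a 2-D joint table equals the outer product of its marginals."""
    joint = np.asarray(joint, dtype=float)
    row_marginal = joint.sum(axis=1)  # sum over columns
    col_marginal = joint.sum(axis=0)  # sum over rows
    product = np.outer(row_marginal, col_marginal)
    # Compare with a tolerance instead of exact equality.
    return bool(np.allclose(joint, product))

# The joint table of W and I from the cell above, written with float literals.
joint_W_I = np.array([[1 / 2.0, 0], [0, 1 / 6.0], [0, 1 / 3.0]])
print(is_independent(joint_W_I))
```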


In [1]:
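from __future__ import division  # must be the first statement in the cell
import numpy as np

# Joint probability distribution of X (rows) and Y (columns)
prob_X_Y = np.array([[1/4, 1/4], [1/12, 1/12], [1/6, 1/6]])
# We can generate marginal probabilities by summing out the other variable
prob_X = prob_X_Y.sum(axis=1)
prob_Y = prob_X_Y.sum(axis=0)

print(prob_X_Y)
# If X and Y are independent, the outer product of the marginals equals the joint table
test_prob_X_Y = np.outer(prob_X, prob_Y)
print(test_prob_X_Y)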

### Conditional Independence
The definition of conditional independence follows from the general definition of independence. Recall that $X \perp Y$ if, for all $x$ and $y$,

$p_{X, Y}(x, y) = p_X(x)p_Y(y)$

Similarly, $X$ and $Y$ are conditionally independent given $Z$ if:
