In [6]:
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as pp
%matplotlib inline

# 10.1

## (a) graph structure
The graph structure encodes the dependencies between the variables. (A)-->(B) means that B depends on A, i.e. $p(a,b) = p(a)\ p(b|a)$.

## (b) conditional independence
If two random variables (a, b) depend on the same variable (z) and nothing else, they become independent once z is observed.

$p(a,b,z) = p(z)\ p(a|z)\ p(b|z)$

and therefore

$p(a|b,z) = p(a|z)$

and

$p(b|a,z) = p(b|z)$

(having observed z, additionally observing a does not change the probability of b, and vice versa)

Since the graph structure gives us the dependencies between the random variables, it is sufficient to store only the explicitly defined conditional probabilities instead of having to assume dependencies between all of the variables, i.e. a fully connected graph.
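The common-cause statement can be checked numerically. The following is a minimal sketch with made-up probability tables (the numbers and the names `p_z`, `p_a_given_z`, `p_b_given_z` are assumptions, not part of the exercise): it builds the joint from the factorization and compares $p(a|b,z)$ with $p(a|z)$.

```python
import itertools

# Hypothetical tables for the model z -> a, z -> b (all variables binary).
p_z = [0.3, 0.7]                         # p(z)
p_a_given_z = [[0.9, 0.1], [0.4, 0.6]]   # p(a|z), indexed [z][a]
p_b_given_z = [[0.2, 0.8], [0.5, 0.5]]   # p(b|z), indexed [z][b]

def joint(a, b, z):
    # p(a,b,z) = p(z) p(a|z) p(b|z)
    return p_z[z] * p_a_given_z[z][a] * p_b_given_z[z][b]

def cond_a_given_bz(a, b, z):
    # p(a|b,z) = p(a,b,z) / sum_a' p(a',b,z)
    return joint(a, b, z) / sum(joint(ai, b, z) for ai in (0, 1))

# Largest deviation between p(a|b,z) and p(a|z) over all configurations.
max_gap = max(
    abs(cond_a_given_bz(a, b, z) - p_a_given_z[z][a])
    for a, b, z in itertools.product((0, 1), repeat=3)
)

# Without observing z, a and b are in general dependent: p(a,b) != p(a) p(b).
p_ab = sum(joint(1, 1, z) for z in (0, 1))
p_a = sum(joint(1, b, z) for b in (0, 1) for z in (0, 1))
p_b = sum(joint(a, 1, z) for a in (0, 1) for z in (0, 1))
marginal_gap = abs(p_ab - p_a * p_b)

print("max |p(a|b,z) - p(a|z)| =", max_gap)
print("|p(a,b) - p(a)p(b)|     =", marginal_gap)
```

With these tables the first gap is zero up to rounding, while the second is clearly non-zero, which is the difference between conditional and marginal independence.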

## (c) topological sorting
Elements inside parentheses can be permuted:
(BEF)(A)(DH)(CG)
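This grouping can be reproduced mechanically. The sketch below assumes the parent sets that can be read off the factorization in (d) and repeatedly collects all nodes whose parents have already been placed; the function name `topological_layers` is only illustrative.

```python
# Parent sets read off the factorization in (d).
parents = {
    "A": ["F"], "B": [], "C": ["F", "H"], "D": ["A", "E"],
    "E": [], "F": [], "G": ["A", "H"], "H": ["A", "B"],
}

def topological_layers(parents):
    """Group nodes into layers; a node is placed once all its parents are placed."""
    placed, layers = set(), []
    remaining = set(parents)
    while remaining:
        layer = sorted(n for n in remaining
                       if all(p in placed for p in parents[n]))
        if not layer:
            raise ValueError("graph contains a cycle")
        layers.append(layer)
        placed.update(layer)
        remaining.difference_update(layer)
    return layers

layers = topological_layers(parents)
print("".join("(" + "".join(layer) + ")" for layer in layers))  # (BEF)(A)(DH)(CG)
```

Any ordering that lists the layers in sequence, with arbitrary order inside each layer, is a valid topological sort.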

## (d) joint distribution factorization of the graph
$p(A,B,C,D,E,F,G,H) = p(B) p(E) p(F) \quad p(A|F) \quad p(D|A,E) p(H|A,B) \quad p(C|F,H) p(G|A,H)$
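As a rough illustration of the storage saving described in (b), the sketch below counts the free parameters of this factorization. It assumes that all eight variables are binary, which the exercise does not state.

```python
# Parent sets read off the factorization above.
parents = {
    "A": ["F"], "B": [], "C": ["F", "H"], "D": ["A", "E"],
    "E": [], "F": [], "G": ["A", "H"], "H": ["A", "B"],
}

# Assumption: every variable is binary. A node with k parents then needs
# 2**k numbers, one p(x=1 | parent configuration) per configuration.
factored = sum(2 ** len(ps) for ps in parents.values())

# A full joint table over n binary variables has 2**n entries,
# one of which is fixed by normalization.
full = 2 ** len(parents) - 1

print("factorized:", factored)  # 21
print("full joint:", full)      # 255
```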

## (e) Markov blanket of node A
Markov blanket of node X: (X) + (parent nodes) + (child nodes) + (parents of child nodes)

For A these are all nodes except for C: (A) + (F) + (D,G,H) + (B,E) = AB_DEFGH
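The same result can be obtained programmatically. This is a small sketch that follows the definition above (including the node itself) and again assumes the parent sets from the factorization in (d); `markov_blanket` is an illustrative helper name.

```python
# Parent sets read off the factorization in (d).
parents = {
    "A": ["F"], "B": [], "C": ["F", "H"], "D": ["A", "E"],
    "E": [], "F": [], "G": ["A", "H"], "H": ["A", "B"],
}

def markov_blanket(node, parents):
    """Node + parents + children + parents of children, as defined above."""
    children = {n for n, ps in parents.items() if node in ps}
    co_parents = {p for child in children for p in parents[child]}
    return {node} | set(parents[node]) | children | co_parents

blanket = markov_blanket("A", parents)
print("".join(sorted(blanket)))                  # ABDEFGH
print("".join(sorted(set(parents) - blanket)))   # C
```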

## (f) naive Bayes classifier
In the naive Bayes approach we discard relationships between the effect variables and treat them as conditionally independent, as if they depended only on the cause variable.

For instance, in email spam detection we would treat the word occurrence frequencies as independent of each other given the cause variable (spam, not spam). The graphical model would have one root node (spam, not spam) with one child node per word. Given the word frequencies in the vector $x$, we would determine $p(\text{spam}|x)$ like this:

$p(\text{spam}|x) = \frac{p(\text{spam})\ p(x|\text{spam})}{p(x)} \overset{\text{conditionally independent } x_i}{\propto} p(\text{spam})\ \prod_i p(x_i|\text{spam})$

$p(\text{not spam}|x) = \frac{p(\text{not spam})\ p(x|\text{not spam})}{p(x)} \overset{\text{conditionally independent } x_i}{\propto} p(\text{not spam})\ \prod_i p(x_i|\text{not spam})$

The denominator $p(x)$ may be discarded, because it's the same for $p(\text{spam}|x)$ and $p(\text{not spam}|x)$, i.e. has no effect on a specific email's classification as spam or not spam.
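A toy version of this classifier might look as follows. The priors, the three-word vocabulary and the per-word probabilities are invented for illustration, and the features are simplified to binary word presence rather than frequencies.

```python
import math

# Hypothetical model parameters (made-up numbers, not estimated from data).
prior = {"spam": 0.4, "not spam": 0.6}
# p(word present | class); simplifying assumption: binary presence features.
p_word = {
    "spam":     {"free": 0.8, "meeting": 0.1, "winner": 0.6},
    "not spam": {"free": 0.2, "meeting": 0.7, "winner": 0.05},
}

def posterior(present_words):
    """Return p(class | x) under the naive Bayes factorization."""
    log_score = {}
    for c in prior:
        # log p(c) + sum_i log p(x_i | c)
        s = math.log(prior[c])
        for word, p1 in p_word[c].items():
            s += math.log(p1 if word in present_words else 1.0 - p1)
        log_score[c] = s
    # Normalizing over the classes plays the role of dividing by p(x).
    m = max(log_score.values())
    unnorm = {c: math.exp(s - m) for c, s in log_score.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}

post = posterior({"free", "winner"})
print(post)
```

Working with log probabilities avoids numerical underflow when the product runs over many words; for classification alone, comparing the unnormalized scores would suffice.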

# 10.2 Software

In [29]:
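# water sprinkler example from K. Murphy
def p(c=(0, 1), r=(0, 1), s=(0, 1), w=(0, 1)):
    # Each argument is either a single value (variable fixed to it) or a
    # sequence of values to sum over; the default sums over both states.
    if isinstance(c, int): c = [c]
    if isinstance(r, int): r = [r]
    if isinstance(s, int): s = [s]
    if isinstance(w, int): w = [w]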
   
    # variable syntax: p_...x := p(x|...)
    p_c = [0.5, 0.5]  # p(c)
    p_cs = [[0.5, 0.5], [0.9, 0.1]]  # p(s|c), indexed [c][s]
    p_cr = [[0.8, 0.2], [0.2, 0.8]]  # p(r|c), indexed [c][r]
    p_rsw = [[[1.0, 0.0], [0.1, 0.9]], [[0.1, 0.9], [0.01, 0.99]]]  # p(w|r,s), indexed [r][s][w]
 
    p_sum = 0.0
    for ci in c:
        for ri in r:
            for si in s:
                for wi in w:
                    p_sum += p_c[ci] * p_cs[ci][si] * p_cr[ci][ri] * p_rsw[ri][si][wi]

    return p_sum

def example():
    print("p(s=1|w=1) =", p(s=1, w=1)/p(w=1))
    print("p(r=1|w=1) =", p(r=1, w=1)/p(w=1))

example()

p(s=1|w=1) = 0.4297635605006954
p(r=1|w=1) = 0.7079276773296246
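As a cross-check, the same joint distribution can be built as a four-dimensional array and marginalized by slicing. This is a sketch using `np.einsum`; the array names are illustrative and the tables are copied from the function above.

```python
import numpy as np

p_c = np.array([0.5, 0.5])                        # p(c)
p_s_c = np.array([[0.5, 0.5], [0.9, 0.1]])        # p(s|c), indexed [c, s]
p_r_c = np.array([[0.8, 0.2], [0.2, 0.8]])        # p(r|c), indexed [c, r]
p_w_rs = np.array([[[1.0, 0.0], [0.1, 0.9]],
                   [[0.1, 0.9], [0.01, 0.99]]])   # p(w|r,s), indexed [r, s, w]

# joint[c, r, s, w] = p(c) p(s|c) p(r|c) p(w|r,s)
joint = np.einsum("c,cs,cr,rsw->crsw", p_c, p_s_c, p_r_c, p_w_rs)

# Fix w=1 (and s=1 or r=1) by indexing, sum out everything else.
p_w1 = joint[:, :, :, 1].sum()
p_s1_given_w1 = joint[:, :, 1, 1].sum() / p_w1
p_r1_given_w1 = joint[:, 1, :, 1].sum() / p_w1

print("p(s=1|w=1) =", p_s1_given_w1)
print("p(r=1|w=1) =", p_r1_given_w1)
```

Both values should agree with the printed output above up to floating-point rounding.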
