In [1]:
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as pp
%matplotlib inline

# 10.1

## (a) graph structure
The graph structure encodes the dependencies between the variables: an edge (A)-->(B) means that B depends on A, i.e. $p(a,b) = p(a)\ p(b|a)$.

## (b) conditional independence
If two random variables (a, b) depend on the same variable (z) and nothing else, they become independent once z is observed.

$p(a,b,z) = p(z)\ p(a|z)\ p(b|z)$

and therefore

$p(a|b,z) = p(a|z)$

and

$p(b|a,z) = p(b|z)$

(having observed (z), additionally observing (a) doesn't influence the likelihood of (b) occurring and vice versa)
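A quick numerical check of this, as a sketch with made-up conditional probability tables (the numbers are assumptions, not from the exercise): building $p(z,a,b) = p(z)\ p(a|z)\ p(b|z)$ as a NumPy array shows that $p(a|b,z) = p(a|z)$ holds, while $a$ and $b$ are still dependent as long as $z$ is unobserved.

```python
import numpy as np

# Hypothetical CPTs for binary a, b with a common cause z.
# Rows are indexed by z, columns by a (or b).
p_z = np.array([0.3, 0.7])
p_a_given_z = np.array([[0.9, 0.1], [0.2, 0.8]])
p_b_given_z = np.array([[0.6, 0.4], [0.1, 0.9]])

# joint[z, a, b] = p(z) p(a|z) p(b|z)
joint = p_z[:, None, None] * p_a_given_z[:, :, None] * p_b_given_z[:, None, :]

# z observed: p(a|b,z) = p(a,b,z) / p(b,z), compared with p(a|z)
p_a_given_bz = joint / joint.sum(axis=1, keepdims=True)   # indexed [z, a, b]
cond_indep = np.allclose(p_a_given_bz, p_a_given_z[:, :, None])

# z unobserved: p(a|b) = p(a,b) / p(b), compared with p(a)
p_ab = joint.sum(axis=0)                                  # indexed [a, b]
p_a = p_ab.sum(axis=1)
p_a_given_b = p_ab / p_ab.sum(axis=0, keepdims=True)
marg_indep = np.allclose(p_a_given_b, p_a[:, None])

print("p(a|b,z) == p(a|z):", cond_indep)   # True
print("p(a|b)   == p(a):  ", marg_indep)   # False
```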

Since the graph structure encodes the dependencies between the random variables, it suffices to store only the explicitly defined conditional probabilities instead of having to assume dependencies between all of the variables, i.e. a fully connected graph.
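To make the storage argument concrete, here is a small sketch that counts free parameters for the eight-node graph whose factorization is given in (d), assuming every variable is binary (the exercise does not fix the number of states, so this is an assumption).

```python
# Parents of each node, read off the factorization in (d).
parents = {
    "A": ["F"], "B": [], "C": ["F", "H"], "D": ["A", "E"],
    "E": [], "F": [], "G": ["A", "H"], "H": ["A", "B"],
}

def n_free_params(parents, k=2):
    """Free CPT parameters, assuming every variable has k states."""
    # each of the k**(#parents) parent configurations needs k - 1 numbers
    return sum((k - 1) * k ** len(ps) for ps in parents.values())

full_joint = 2 ** len(parents) - 1   # unrestricted table over 8 binary variables
print("factorized:", n_free_params(parents))  # 21
print("full joint:", full_joint)              # 255
```

A binary node with $m$ binary parents needs $2^m$ numbers, which is where the savings come from.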

## (c) topological sorting
One valid topological order (nodes inside the same parentheses can be permuted freely):
(BEF)(A)(DH)(CG)
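This ordering can be reproduced with Kahn's algorithm, sketched below with the parent sets read off the factorization in (d). Ties are broken alphabetically here, which is an arbitrary choice; other tie-breaks give other valid orderings.

```python
from collections import deque

# Parent sets read off the factorization in (d)
parents = {
    "A": ["F"], "B": [], "C": ["F", "H"], "D": ["A", "E"],
    "E": [], "F": [], "G": ["A", "H"], "H": ["A", "B"],
}

def topological_order(parents):
    """Kahn's algorithm: repeatedly emit a node whose parents were all emitted."""
    children = {v: [] for v in parents}
    indegree = {v: len(ps) for v, ps in parents.items()}
    for v, ps in parents.items():
        for u in ps:
            children[u].append(v)
    queue = deque(sorted(v for v, d in indegree.items() if d == 0))  # root nodes
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:      # all parents of v are placed
                queue.append(v)
    if len(order) != len(parents):
        raise ValueError("graph contains a cycle")
    return order

order = topological_order(parents)
print(order)  # ['B', 'E', 'F', 'A', 'D', 'H', 'C', 'G']
```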

## (d) joint distribution factorization of the graph
$p(A,B,C,D,E,F,G,H) = p(B) p(E) p(F) \quad p(A|F) \quad p(D|A,E) p(H|A,B) \quad p(C|F,H) p(G|A,H)$
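As a sanity check of this factorization, the sketch below fills the conditional tables with random numbers (hypothetical values, assuming binary variables) and verifies that the product of the eight factors sums to 1 over all $2^8$ assignments, as a joint distribution must.

```python
import itertools
import random

# Parent sets read off the factorization above
parents = {
    "A": ["F"], "B": [], "C": ["F", "H"], "D": ["A", "E"],
    "E": [], "F": [], "G": ["A", "H"], "H": ["A", "B"],
}

# Hypothetical CPTs for binary variables: cpt[v][parent values] = p(v=1 | parents)
rng = random.Random(0)
cpt = {
    v: {pv: rng.random() for pv in itertools.product((0, 1), repeat=len(ps))}
    for v, ps in parents.items()
}

def joint(x):
    """p(x) as the product of the factors p(x_v | x_parents(v))."""
    prob = 1.0
    for v, ps in parents.items():
        p1 = cpt[v][tuple(x[u] for u in ps)]
        prob *= p1 if x[v] == 1 else 1.0 - p1
    return prob

nodes = list(parents)
total = sum(joint(dict(zip(nodes, vals)))
            for vals in itertools.product((0, 1), repeat=len(nodes)))
print(total)  # 1.0 up to floating-point rounding
```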

## (e) Markov blanket of node A
Markov blanket of node X: (parent nodes) + (child nodes) + (other parents of the child nodes, i.e. co-parents)

For A: (F) + (D, G, H) + (E via D, B via H; H is also a co-parent via G) = {B, D, E, F, G, H}. Together with A itself this is every node except C.
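The blanket can also be read off programmatically; a minimal sketch reusing the parent sets from (d):

```python
# Parent sets read off the factorization in (d)
parents = {
    "A": ["F"], "B": [], "C": ["F", "H"], "D": ["A", "E"],
    "E": [], "F": [], "G": ["A", "H"], "H": ["A", "B"],
}

def markov_blanket(x, parents):
    """Parents, children and co-parents (other parents of the children) of x."""
    children = {v for v, ps in parents.items() if x in ps}
    co_parents = {u for c in children for u in parents[c]}
    return (set(parents[x]) | children | co_parents) - {x}

print(sorted(markov_blanket("A", parents)))  # ['B', 'D', 'E', 'F', 'G', 'H']
```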

## (f) naive Bayes classifier
In the naive Bayes approach we discard the relationships between the effect variables (features) and treat them as conditionally independent given the cause variable (class), as if each of them depended only on the cause.

For instance, in email spam detection we would treat the word occurrence frequencies as independent of each other given the class variable (spam, not spam). The graphical model would have one root node (spam, not spam) with one child node per word. Given the word frequencies in a vector $x$, we would determine $p(\text{spam}|x)$ as follows:

$p(\text{spam}|x) = \frac{p(\text{spam})\ p(x|\text{spam})}{p(x)} \overset{\text{conditionally independent } x_i}{\propto} p(\text{spam})\ \prod_i p(x_i|\text{spam})$

$p(\text{not spam}|x) = \frac{p(\text{not spam})\ p(x|\text{not spam})}{p(x)} \overset{\text{conditionally independent } x_i}{\propto} p(\text{not spam})\ \prod_i p(x_i|\text{not spam})$

The denominator $p(x)$ may be discarded because it is the same for $p(\text{spam}|x)$ and $p(\text{not spam}|x)$, so it does not affect whether a particular email is classified as spam or not spam.
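A minimal Bernoulli naive Bayes sketch of this computation. The vocabulary, the prior and the per-word probabilities are made up for illustration, and modelling each $x_i$ as "word occurs or not" (rather than a count) is an assumption. Scores are accumulated in log space to avoid underflow when there are many words.

```python
import math

# Hypothetical model: prior p(class) and, per class, the probability that each
# word occurs in an email. All numbers are made up.
prior = {"spam": 0.4, "not spam": 0.6}
p_word = {
    "spam":     {"free": 0.60, "winner": 0.40, "meeting": 0.05},
    "not spam": {"free": 0.10, "winner": 0.01, "meeting": 0.30},
}

def log_score(cls, x):
    """log p(cls) + sum_i log p(x_i | cls); x maps word -> 1 (occurs) or 0."""
    s = math.log(prior[cls])
    for word, present in x.items():
        q = p_word[cls][word]
        s += math.log(q if present else 1.0 - q)
    return s

def posterior(x):
    """Normalize the scores; this is the only place where p(x) enters."""
    scores = {c: log_score(c, x) for c in prior}
    m = max(scores.values())                       # shift for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}

x = {"free": 1, "winner": 1, "meeting": 0}
post = posterior(x)
print(max(post, key=post.get), post)
```

Since classification only needs the argmax, the normalization in `posterior` could be skipped; it is kept to show where $p(x)$ would come in.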

# 10.2 Software

In [135]:
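# water sprinkler example from K. Murphy
def p(c=(0, 1), r=(0, 1), s=(0, 1), w=(0, 1)):
    """Sum of the joint p(c,r,s,w) over the given values; a variable left at
    its default (0, 1) is summed out, i.e. marginalized."""
    if isinstance(c, int): c = [c]
    if isinstance(r, int): r = [r]
    if isinstance(s, int): s = [s]
    if isinstance(w, int): w = [w]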
   
    # variable syntax: p_...x := p(x|...)
    p_c = [0.5, 0.5]  # p(c)
    p_cs = [[0.5, 0.5], [0.9, 0.1]]  # p(s|c)
    p_cr = [[0.8, 0.2], [0.2, 0.8]]  # p(r|c)
    p_rsw = [[[1.0, 0.0], [0.1, 0.9]], [[0.1, 0.9], [0.01, 0.99]]]  # p(w|rs)
 
    p_sum = 0.0
    for ci in c:
        for ri in r:
            for si in s:
                for wi in w:
                    p_sum += p_c[ci] * p_cs[ci][si] * p_cr[ci][ri] * p_rsw[ri][si][wi]

    return p_sum

def example():
    print("p(s=1|w=1) =", p(s=1, w=1) / p(w=1))
    print("p(r=1|w=1) =", p(r=1, w=1) / p(w=1))
    print("p(s=1|w=1,r=1) = ", p(s=1,w=1,r=1) / p(w=1,r=1))
    
    return
example()

p(s=1|w=1) = 0.4297635605006954
p(r=1|w=1) = 0.7079276773296246
p(s=1|w=1,r=1) =  0.19449901768172886
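The same queries can be answered by building the full joint table once with `np.einsum` and slicing it. This sketch reuses the CPT values from the cell above; the array names are my own.

```python
import numpy as np

# CPTs from the cell above, indexed [condition..., variable]
p_c = np.array([0.5, 0.5])                        # p(c)
p_s_c = np.array([[0.5, 0.5], [0.9, 0.1]])        # p(s|c), [c, s]
p_r_c = np.array([[0.8, 0.2], [0.2, 0.8]])        # p(r|c), [c, r]
p_w_rs = np.array([[[1.0, 0.0], [0.1, 0.9]],
                   [[0.1, 0.9], [0.01, 0.99]]])   # p(w|r,s), [r, s, w]

# Full joint table J[c, r, s, w] = p(c) p(s|c) p(r|c) p(w|r,s)
J = np.einsum("c,cs,cr,rsw->crsw", p_c, p_s_c, p_r_c, p_w_rs)

p_w1 = J[..., 1].sum()                            # p(w=1)
ps1_w1 = J[:, :, 1, 1].sum() / p_w1               # p(s=1 | w=1)
pr1_w1 = J[:, 1, :, 1].sum() / p_w1               # p(r=1 | w=1)
ps1_w1r1 = J[:, 1, 1, 1].sum() / J[:, 1, :, 1].sum()  # p(s=1 | w=1, r=1)
print(ps1_w1, pr1_w1, ps1_w1r1)
```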


# 10.3

## (a) graph
<img src="files/graph10-3a.jpg">

## (b) explaining away / selection bias
Selection bias is the phenomenon where selecting (conditioning on) a subpopulation can make otherwise independent variables dependent within that subpopulation.

(see the code cells at the end of the document for the implementation and the resulting comparisons)
### in the graph above

Both (B) and (E) cause (A), but they are marginally independent:

$$P(B \mid E) = P(B)$$

But if we know the state of their effect (A), they become dependent:

$$P(B \mid E, A) \not = P(B \mid A)$$
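Why conditioning on the common effect couples (B) and (E) follows directly from the factorization. A short derivation (the radio node R in the code cell further down depends only on E and sums to one, so it is omitted):

$$P(B \mid E) = \frac{\sum_a P(B)\,P(E)\,P(a \mid B,E)}{P(E)} = P(B) \sum_a P(a \mid B,E) = P(B)$$

$$P(B \mid E, A) = \frac{P(B)\,P(E)\,P(A \mid B,E)}{\sum_b P(b)\,P(E)\,P(A \mid b,E)} = \frac{P(B)\,P(A \mid B,E)}{\sum_b P(b)\,P(A \mid b,E)}$$

The second expression still contains $E$ through $P(A \mid B,E)$, so in general it differs from $P(B \mid A)$, in which $E$ has been summed out.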

### another example
For example, let the population be all humans and let the binary variables indicate whether a person ...

    (h) have good hearing
    (l) speak your language
    (u) understand a word you're saying

with a V-structure (both edges point into (u)), i.e. (h)-->(u) and (l)-->(u):
    
             (h) (l)
               \ /
                V
               (u)
           
        Figure 1: ASCII Art

Normally, whether someone (h) has good hearing and whether they (l) speak your language are independent. But within the subpopulation of people who (u) understand what you're saying, the posterior probability $p(h=1 \mid l=1,u=1)$ is lower than $p(h=1 \mid u=1)$: (h) and (l) become dependent because both are possible explanations of the evidence (u), and once (l) is known to hold it already explains most of (u).

*Hearing well* now depends on *speaking your language* in the subpopulation of *people who understand what you're saying*.
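The exact numbers behind this claim can be computed from the same (illustrative) probabilities used in the `# extra example` code cell at the end of the document; a NumPy sketch:

```python
import numpy as np

# Same numbers as in the "extra example" cell: p(h), p(l), p(u|l,h)
p_h = np.array([0.1, 0.9])
p_l = np.array([0.95, 0.05])
p_u_lh = np.array([[[1.0, 0.0], [0.8, 0.2]],
                   [[0.1, 0.9], [0.001, 0.999]]])     # indexed [l, h, u]

J = p_l[:, None, None] * p_h[None, :, None] * p_u_lh  # J[l, h, u] = p(l, h, u)

ph1 = J[:, 1, :].sum()                                # p(h=1)
ph1_l1 = J[1, 1, :].sum() / J[1, :, :].sum()          # p(h=1 | l=1)
ph1_u1 = J[:, 1, 1].sum() / J[:, :, 1].sum()          # p(h=1 | u=1)
ph1_l1u1 = J[1, 1, 1] / J[1, :, 1].sum()              # p(h=1 | l=1, u=1)
print(f"{ph1:.4f} {ph1_l1:.4f} {ph1_u1:.4f} {ph1_l1u1:.4f}")  # 0.9000 0.9000 0.9796 0.9090
```

Without observing (u), learning (l) leaves $p(h=1)$ at 0.9; after observing (u=1), learning (l=1) lowers it from about 0.98 to about 0.91.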

In [17]:
# the graph in (a)
def p(b=[0,1],e=[0,1],a=[0,1],r=[0,1]):
    if isinstance(b, int): b = [b]
    if isinstance(e, int): e = [e]
    if isinstance(a, int): a = [a]
    if isinstance(r, int): r = [r]
   
    # variable syntax: p_...x := p(x|...)
    p_b = [0.99, 0.01]               # p(b)
    p_e = [1.-10e-6, 10e-6]          # p(e)
    p_er = [[1.0, 0.0], [0.0, 1.0]]  # p(r|e): the radio reports an earthquake iff one occurred
    p_bea = [[[0.999, 0.001], [0.59, 0.41]], [[0.05, 0.95], [0.02, 0.98]]]  # p(a|b,e)

    p_sum = 0.0
    for bi in b:
        for ei in e:
            for ai in a:
                for ri in r:
                    p_sum += p_b[bi] * p_e[ei] * p_er[ei][ri] * p_bea[bi][ei][ai]

    return p_sum

def example():
    print("p(b=1) =", p(b=1))
    print("p(b=1|e=1) =", p(b=1,e=1) / p(e=1))
    print()
    
    for b in [1,0]:
        for e in [1,0]:
            for a in [1,0]:
                pb_a = p(b=b,a=a) / p(a=a)
                pb_ea = p(b=b,e=e,a=a) / p(e=e,a=a)
                sign = "< " if pb_ea < pb_a else ">="
                
                print("p(b=%i|e=%i,a=%i)" % (b,e,a), sign, "p(b=%i|a=%i)" % (b,a))

                
example()

p(b=1) = 0.01
p(b=1|e=1) = 0.010000000000000002

p(b=1|e=1,a=1) <  p(b=1|a=1)
p(b=1|e=1,a=0) <  p(b=1|a=0)
p(b=1|e=0,a=1) >= p(b=1|a=1)
p(b=1|e=0,a=0) >= p(b=1|a=0)
p(b=0|e=1,a=1) >= p(b=0|a=1)
p(b=0|e=1,a=0) >= p(b=0|a=0)
p(b=0|e=0,a=1) <  p(b=0|a=1)
p(b=0|e=0,a=0) <  p(b=0|a=0)
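The cell above only prints comparisons. A small NumPy sketch with the same CPT values gives the actual posteriors for the first line, showing how strongly the earthquake explains the alarm away:

```python
import numpy as np

# Same CPT values as in the cell above
p_b = np.array([0.99, 0.01])                          # p(b)
p_e = np.array([1.0 - 10e-6, 10e-6])                  # p(e)
p_a_be = np.array([[[0.999, 0.001], [0.59, 0.41]],
                   [[0.05, 0.95], [0.02, 0.98]]])     # p(a|b,e), indexed [b, e, a]

# The radio node r depends only on e and sums to 1, so it drops out of these queries.
J = p_b[:, None, None] * p_e[None, :, None] * p_a_be  # J[b, e, a] = p(b, e, a)

pb1_a1 = J[1, :, 1].sum() / J[:, :, 1].sum()          # p(b=1 | a=1)
pb1_e1a1 = J[1, 1, 1] / J[:, 1, 1].sum()              # p(b=1 | e=1, a=1)
print(f"p(b=1|a=1)     = {pb1_a1:.4f}")               # about 0.9053
print(f"p(b=1|e=1,a=1) = {pb1_e1a1:.4f}")             # about 0.0236
```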


In [18]:
# extra example
def p(h=[0,1],l=[0,1],u=[0,1]):
    if isinstance(h, int): h = [h]
    if isinstance(l, int): l = [l]
    if isinstance(u, int): u = [u]
   
    # variable syntax: p_...x := p(x|...)
    p_h = [0.1, 0.9]
    p_l = [0.95, 0.05]
    p_lhu = [[[1.0, 0.0], [0.8, 0.2]], [[0.1, 0.9], [0.001, 0.999]]]
 
    p_sum = 0.0
    for li in l:
        for hi in h:
            for ui in u:
                p_sum += p_h[hi] * p_l[li] * p_lhu[li][hi][ui]

    return p_sum

def example():
    # this might make it clearer:
    # before observing (u), the variables (l) and (h) are independent: p(h|l) == p(h), 
    #   because p(h,l) = p(h)*p(l)
    #   and thus p(h|l) = p(h,l)/p(l) = p(h) p(l)/p(l) = p(h)
    print("not having observed u:")
    print("p(h=1) =", p(h=1))
    print("p(h=1|l=1) =", p(h=1,l=1) / p(l=1))
    # but if we additionally observe their effect (u): p(h|l,u) != p(h|u), 
    #   because p(h,l|u) != p(h|u)*p(l|u)
    #   and thus p(h|l,u) = p(h,l,u)/p(l,u) = p(h) p(l) p(u|h,l) / p(l,u)
    print("\nhaving observed u the probability for h=1 goes down if l=1:\n")

    # p(h|l,u) <= p(h|u),  if h == l
    # p(h|l,u) >= p(h|u),  if h != l
    # no matter what the effect u is
    for h in [1,0]:
        for l in [1,0]:
            for u in [1,0]:
                ph_u = p(h=h,u=u) / p(u=u)
                ph_lu = p(h=h,l=l,u=u) / p(l=l,u=u)
                sign = "< " if ph_lu < ph_u else ">="

                print("p(h=%i|l=%i,u=%i)" % (h,l,u), sign, "p(h=%i|u=%i)" % (h,u))
                
    return
example()

not having observed u:
p(h=1) = 0.9
p(h=1|l=1) = 0.9

having observed u the probability for h=1 goes down if l=1:

p(h=1|l=1,u=1) <  p(h=1|u=1)
p(h=1|l=1,u=0) <  p(h=1|u=0)
p(h=1|l=0,u=1) >= p(h=1|u=1)
p(h=1|l=0,u=0) >= p(h=1|u=0)
p(h=0|l=1,u=1) >= p(h=0|u=1)
p(h=0|l=1,u=0) >= p(h=0|u=0)
p(h=0|l=0,u=1) <  p(h=0|u=1)
p(h=0|l=0,u=0) <  p(h=0|u=0)
