### Bayesian Network Inference Library

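We will be using the pomegranate library for Bayes net inference. The code below uses the classic (pre-1.0) pomegranate API (`DiscreteDistribution`, `ConditionalProbabilityTable`, `Node`, `BayesianNetwork`). pomegranate 1.0 replaced that API, so install a 0.x release to run it.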

  * Installation instructions https://pomegranate.readthedocs.io/en/latest/install.html
  * Tutorial / documentation https://pomegranate.readthedocs.io/en/latest/BayesianNetwork.html
  
In the tutorial / documentation, ignore the parts about "initializing a Bayesian network based completely on data" and the sections on "Probability," "Prediction," and "Fitting." Instead, see the example below for how to compute the probability distribution over a node in the graph given evidence.

Just to make sure things are working, first load in the Monty Hall code from the tutorial and answer the question about whether or not a contestant should take Monty up on his offer to switch doors.

The Monty Hall problem:  
  * A prize is placed at random behind door A, door B, or door C
  * The guest chooses one of those three doors
  * Monty then opens one of the other two doors, revealing that it does not contain the prize
     * Monty will never open the door that the guest chose
     * Monty will never open the door with the prize
  * Monty then gives the guest the chance to switch to the remaining unopened door
     
Suppose, for example, the guest chooses door A. Monty then reveals that door B does not contain the prize. Which of the following is the case?

  * This information is of no use to the guest. It is now equally likely that the prize is behind A or behind C, so the guest has no incentive to switch from A to C
  * It is now more likely that the prize is behind C, so the guest should switch to C
  * It is now more likely that the prize is behind A, so the guest should not switch
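As a library-free cross-check of the rules above, the sketch below simulates the game many times and compares how often "stay" and "switch" win. It assumes, as the CPT in the next cell does, that the prize and the guest's choice are uniformly random and that Monty picks uniformly when he has two doors he may open. The function names `play` and `win_rate` are hypothetical helpers, not part of pomegranate.

```python
import random

DOORS = ["A", "B", "C"]

def play(switch, rng):
    """Play one round under the rules above; return True if the guest wins."""
    prize = rng.choice(DOORS)   # prize placed uniformly at random
    guest = rng.choice(DOORS)   # guest picks uniformly at random
    # Monty opens a door that is neither the guest's choice nor the prize
    monty = rng.choice([d for d in DOORS if d != guest and d != prize])
    if switch:
        # Move to the one door that is neither the guest's nor the one Monty opened
        guest = next(d for d in DOORS if d != guest and d != monty)
    return guest == prize

def win_rate(switch, trials=50_000, seed=0):
    """Fraction of simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print(f"stay:   {win_rate(False):.3f}")
print(f"switch: {win_rate(True):.3f}")
```

Because both strategies consume the random stream identically, the same seed replays the same games, so the two win rates for one seed always add up to 1.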


---------------------------------------

![Monty](MontyPicture.GIF)

---------------------------------------

In [None]:
from pomegranate import *

# The three "random variables" are
#    guest -- which door the guest chooses (doors are A, B, and C)
#    prize -- which door the prize is behind
#    monty -- which door Monty opens.  This depends on both guest and prize:
#               Monty never opens the door the guest chose and never opens the
#               door with the prize (if the guest didn't choose it)
#             Each CPT row below is [guest, prize, monty, P(monty | guest, prize)].
#             So the first three rows say: if the guest chooses A and the prize
#               is behind A, Monty opens B or C with equal probability

# Notice the pattern for building networks:
#   -- Build the distributions: DiscreteDistribution for nodes without parents,
#          ConditionalProbabilityTable for nodes with parents.  The CPT for monty needs 27
#          entries, since there are 9 possible combinations of parent values and three
#          possible values the monty variable can take.
#   -- Wrap each distribution in a Node, add the nodes and edges to a
#          BayesianNetwork, then call bake() to finalize the model.

guestdist = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})
prizedist = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})
montydist = ConditionalProbabilityTable(
        [['A', 'A', 'A', 0.0],
         ['A', 'A', 'B', 0.5],
         ['A', 'A', 'C', 0.5],
         
         ['A', 'B', 'A', 0.0],
         ['A', 'B', 'B', 0.0],
         ['A', 'B', 'C', 1.0],
         
         ['A', 'C', 'A', 0.0],
         ['A', 'C', 'B', 1.0],
         ['A', 'C', 'C', 0.0],
         
         ['B', 'A', 'A', 0.0],
         ['B', 'A', 'B', 0.0],
         ['B', 'A', 'C', 1.0],
         
         ['B', 'B', 'A', 0.5],
         ['B', 'B', 'B', 0.0],
         ['B', 'B', 'C', 0.5],
         
         ['B', 'C', 'A', 1.0],
         ['B', 'C', 'B', 0.0],
         ['B', 'C', 'C', 0.0],
         
         ['C', 'A', 'A', 0.0],
         ['C', 'A', 'B', 1.0],
         ['C', 'A', 'C', 0.0],
         
         ['C', 'B', 'A', 1.0],
         ['C', 'B', 'B', 0.0],
         ['C', 'B', 'C', 0.0],
         
         ['C', 'C', 'A', 0.5],
         ['C', 'C', 'B', 0.5],
         ['C', 'C', 'C', 0.0]], [guestdist, prizedist])

s1 = Node(guestdist, name="guest")
s2 = Node(prizedist, name="prize")
s3 = Node(montydist, name="monty")

model = BayesianNetwork("Monty Hall Problem")
model.add_states(s1, s2, s3)
model.add_edge(s1, s3)
model.add_edge(s2, s3)
model.bake()
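Typing all 27 CPT rows by hand is error-prone. The sketch below generates the same rows from the two rules (Monty never opens the guest's door or the prize door, and picks uniformly among the doors he may open). `monty_cpt_rows` is a hypothetical helper; because it yields the same list in the same order as the literal table above, it should be possible to pass `monty_cpt_rows()` as the first argument to `ConditionalProbabilityTable` instead, though that substitution is an assumption not tested here against pomegranate.

```python
from itertools import product

DOORS = ["A", "B", "C"]

def monty_cpt_rows(doors=DOORS):
    """Build [guest, prize, monty, P(monty | guest, prize)] rows from the rules."""
    rows = []
    # guest varies slowest, then prize, matching the hand-written table
    for guest, prize in product(doors, repeat=2):
        # Monty may open any door that is neither the guest's nor the prize's
        allowed = [d for d in doors if d != guest and d != prize]
        for monty in doors:
            p = 1.0 / len(allowed) if monty in allowed else 0.0
            rows.append([guest, prize, monty, p])
    return rows

rows = monty_cpt_rows()
print(len(rows))   # 27: 9 parent combinations x 3 values of monty
print(rows[:3])    # the guest = A, prize = A block
```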

In [None]:
# With no evidence, what is the probability that the contestant wins the prize?
# predict_proba returns one distribution per node.
model.predict_proba({})

In [None]:
# The result above is a list with one distribution per node (state), in the same
# order as model.states.  How do you get the names of the nodes?
#  This gives the ordered list of node names -- the names used to construct the nodes

[s.name for s in model.states]

In [None]:
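# For example, this is the distribution over the values of the monty variable
# (index 2, the third node added), given no evidence.  .parameters[0] is a dict
# mapping each value to its probability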
model.predict_proba({})[2].parameters[0]

In [None]:
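# Helper function: return the distribution (a dict mapping value -> probability)
# over the node named nodeName, given an evidence dict such as {'guest': 'A'}.
# Ask only about unobserved nodes: predict_proba returns an observed node's
# value itself rather than a distribution.

def probDist(nodeName, model, evidence):
    def nodeIndex(model, nodeName):
        return [s.name for s in model.states].index(nodeName)
    return model.predict_proba(evidence)[nodeIndex(model, nodeName)].parameters[0]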

In [None]:
# Easier way to get at Monty distribution
probDist('monty', model, {})

In [None]:
# Add evidence -- does choosing a door change belief about where the prize is?
probDist('prize', model, {'guest': 'A'})

In [None]:
##  Suppose the guest chooses A, and Monty opens B.
##  Monty offers the guest the chance to switch from A to C.  Should she?

probDist('prize', model, {"guest": 'A', "monty": 'B'})
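To see where these numbers come from, the posterior over the prize can be computed exactly by enumeration: sum the joint P(guest) P(prize) P(monty | guest, prize) over every assignment consistent with the evidence, then normalize. The sketch below does this with exact fractions and no library; `p_monty` and `posterior_prize` are hypothetical helpers. Since this network is a small tree, its results should agree with the pomegranate output above up to floating-point rounding.

```python
from fractions import Fraction
from itertools import product

DOORS = ["A", "B", "C"]
THIRD = Fraction(1, 3)   # uniform prior for both guest and prize

def p_monty(monty, guest, prize):
    """P(monty | guest, prize), following the same rules as the CPT."""
    allowed = [d for d in DOORS if d != guest and d != prize]
    return Fraction(1, len(allowed)) if monty in allowed else Fraction(0)

def posterior_prize(evidence):
    """P(prize | evidence) by enumerating the full joint distribution."""
    weights = {d: Fraction(0) for d in DOORS}
    for guest, prize, monty in product(DOORS, repeat=3):
        values = {"guest": guest, "prize": prize, "monty": monty}
        # Skip joint assignments that contradict the evidence
        if any(values[k] != v for k, v in evidence.items()):
            continue
        weights[prize] += THIRD * THIRD * p_monty(monty, guest, prize)
    total = sum(weights.values())   # P(evidence), the normalizing constant
    return {d: w / total for d, w in weights.items()}

print(posterior_prize({"guest": "A"}))
print(posterior_prize({"guest": "A", "monty": "B"}))
```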

----------------------------------------------------------------

### Second Example:  Typical Noisy Sensor

* The variable **water** is the amount of water in my basement.  This variable takes values **{none, some, lots}**
* I have a water detector **waterDetector** that is either **on** or **off**
  * It is supposed to be **on** if and only if **water** is either **some** or **lots**
  * However, it sometimes fails to alert (is **off** when **water** is either **some** or **lots**)
  * It also sometimes false alarms (is **on** when **water** is **none**)

This is what I discovered by observing the basement over time:
* On any given day, the probability of **water** is **(.98, .015, .005)** for values **(none, some, lots)**
* The probability of a false alarm is **P(waterDetector = on | water = none) = 0.01**
* The probability of the sensor missing water depends on the water level:
  * **P(waterDetector = off | water = some) = .10**
  * **P(waterDetector = off | water = lots) = .005**


The network has two nodes, **water** and **waterDetector**, and **water** is the parent of **waterDetector**. In the code below, the detector node is named **waterdetector**.
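A two-node network defines the full joint distribution through the chain rule, P(water, detector) = P(water) P(detector | water). The library-free sketch below builds that joint table from the numbers above. The "on" probabilities for some/lots water and the "off" probability for no water are the complements of the stated rates. The dictionary names are hypothetical and unrelated to pomegranate.

```python
# Prior P(water) and CPT P(waterdetector | water), from the numbers above
P_WATER = {"none": 0.98, "some": 0.015, "lots": 0.005}
P_DETECTOR_GIVEN_WATER = {
    "none": {"off": 0.99, "on": 0.01},
    "some": {"off": 0.10, "on": 0.90},
    "lots": {"off": 0.005, "on": 0.995},
}

# Chain rule for this network: P(water, detector) = P(water) * P(detector | water)
joint = {
    (w, s): P_WATER[w] * P_DETECTOR_GIVEN_WATER[w][s]
    for w in P_WATER
    for s in ("off", "on")
}

for (w, s), p in joint.items():
    print(f"P(water={w}, waterdetector={s}) = {p:.6f}")
print(f"total = {sum(joint.values()):.6f}")   # a valid joint sums to 1
```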

In [None]:
from pomegranate import *
waterdist = DiscreteDistribution({'none': .98, 'some': .015, 'lots': .005})
waterdetectordist = ConditionalProbabilityTable(
        [['none', 'off', 0.99],
         ['none', 'on', 0.01],
         
         ['some', 'off', 0.10],
         ['some', 'on', 0.90],
         
         ['lots', 'off', 0.005],
         ['lots', 'on', 0.995]
        ], [waterdist])

water = Node(waterdist, name="water")
waterdetector = Node(waterdetectordist, name="waterdetector")

wmodel = BayesianNetwork("Water Sensor")
wmodel.add_states(water, waterdetector)
wmodel.add_edge(water, waterdetector)
wmodel.bake()

With no further information, what is the probability that there is some or lots of water in my basement?

In [None]:
# Compute probabilities with no additional evidence.  predict_proba returns a list of
# distributions over node values, in the order the nodes were added (water at [0],
# waterdetector at [1]); probDist looks up the right entry by name.

d = probDist('water', wmodel, {})
d['some'] + d['lots']

With no evidence, what is the probability that my water detector is displaying ON?

In [None]:
probDist('waterdetector', wmodel, {})['on']
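This marginal follows from the law of total probability, P(on) = Σ_w P(water = w) P(on | water = w). The short hand computation below uses the numbers from the problem statement (the dictionary names are hypothetical). It should match the pomegranate value above up to floating-point rounding.

```python
P_WATER = {"none": 0.98, "some": 0.015, "lots": 0.005}
P_ON_GIVEN_WATER = {"none": 0.01, "some": 0.90, "lots": 0.995}

# Law of total probability: weight each P(on | w) by the prior P(w) and sum
p_on = sum(P_WATER[w] * P_ON_GIVEN_WATER[w] for w in P_WATER)
print(f"P(waterdetector = on) = {p_on:.6f}")
```

Most of this mass comes from false alarms and true detections in roughly comparable amounts (0.0098 from **none**, 0.0135 from **some**, about 0.005 from **lots**).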

Suppose I learn that the water detector is **on**. How does that affect my beliefs about the basement water level?

In [None]:
d = probDist('water', wmodel, {})
d2 = probDist('water', wmodel, {'waterdetector': 'on'})

print(f"Distribution with no evidence: {d}")
print(f"Distribution with water detector on is {d2}")

print("Change in belief that water level is NONE when water detector is ON: {:.0%}".format((d2['none'] - d['none'])/d['none']))
print("Change in belief that water level is SOME when water detector is ON: {:.0%}".format((d2['some'] - d['some'])/d['some']))
print("Change in belief that water level is LOTS when water detector is ON: {:.0%}".format((d2['lots'] - d['lots'])/d['lots']))
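Observing the child node updates beliefs about the parent through Bayes' rule, P(w | on) = P(on | w) P(w) / P(on). The library-free sketch below computes that posterior and the same relative changes. `posterior_water_given_on` is a hypothetical helper, and its results should agree with `probDist('water', wmodel, {'waterdetector': 'on'})` up to rounding.

```python
P_WATER = {"none": 0.98, "some": 0.015, "lots": 0.005}
P_ON_GIVEN_WATER = {"none": 0.01, "some": 0.90, "lots": 0.995}

def posterior_water_given_on():
    """Bayes' rule: P(w | on) = P(on | w) P(w) / P(on)."""
    unnormalized = {w: P_ON_GIVEN_WATER[w] * P_WATER[w] for w in P_WATER}
    p_on = sum(unnormalized.values())   # normalizing constant P(on)
    return {w: u / p_on for w, u in unnormalized.items()}

post = posterior_water_given_on()
for w in P_WATER:
    change = (post[w] - P_WATER[w]) / P_WATER[w]
    print(f"{w}: prior {P_WATER[w]:.4f} -> posterior {post[w]:.4f} ({change:+.0%})")
```

Even though a false alarm is unlikely for any single dry day, dry days are so common that **none** still keeps about a third of the posterior mass.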

Suppose instead I go to the basement and observe that there is no water.
How does that affect my belief about whether the water detector is on?

In [None]:
d = probDist('waterdetector', wmodel, {})
d2 = probDist('waterdetector', wmodel, {'water': 'none'})

print(f"Distribution with no evidence: {d}")
print(f"Distribution given no water: {d2}")
print("Change in belief that water detector is ON when water is NONE: {:.0%}".format((d2['on'] - d['on'])/d['on']))
print("Change in belief that water detector is OFF when water is NONE: {:.0%}".format((d2['off'] - d['off'])/d['off']))
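Observing the parent is the easy direction: once **water** is known, the detector's distribution is just the matching row of its CPT. The sketch below makes that explicit and recomputes the relative changes without pomegranate (dictionary names are hypothetical).

```python
P_WATER = {"none": 0.98, "some": 0.015, "lots": 0.005}
P_DETECTOR_GIVEN_WATER = {
    "none": {"off": 0.99, "on": 0.01},
    "some": {"off": 0.10, "on": 0.90},
    "lots": {"off": 0.005, "on": 0.995},
}

# Prior over the detector: marginalize out the water level
prior = {
    s: sum(P_WATER[w] * P_DETECTOR_GIVEN_WATER[w][s] for w in P_WATER)
    for s in ("off", "on")
}
# Observing the parent (water = none) simply selects that row of the CPT
given_none = P_DETECTOR_GIVEN_WATER["none"]

for s in ("on", "off"):
    change = (given_none[s] - prior[s]) / prior[s]
    print(f"{s}: {prior[s]:.4f} -> {given_none[s]:.4f} ({change:+.1%})")
```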


Suppose I have no information about the water level and the detector.  What is the probability that there is either some or lots of water, and that the water detector is off?

In [None]:
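d = probDist('water', wmodel, {})
d2 = probDist('waterdetector', wmodel, {})
probwater = d['some'] + d['lots']
probdetectoroff = d2['off']
# water and waterdetector are NOT independent, so the joint probability is not the
# product of the two marginals.  Use the chain rule instead:
#   P(water in {some, lots}, off) = P(off) * P(water in {some, lots} | off)
d3 = probDist('water', wmodel, {'waterdetector': 'off'})
probjoint = probdetectoroff * (d3['some'] + d3['lots'])

In [None]:
print("Prob some or lots of water: {:.2%}".format(probwater))
print("Prob water detector off: {:.2%}".format(probdetectoroff))
print("Prob some or lots of water AND water detector OFF: {:.4%}".format(probjoint))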
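Because **waterdetector** depends on **water**, the probability that the basement is wet *and* the detector is off has to come from the joint distribution: Σ over w in {some, lots} of P(w) P(off | w). The library-free sketch below computes that directly from the problem's numbers and, for comparison, the product of the two marginals, which would only be correct if the variables were independent. The variable names are hypothetical.

```python
P_WATER = {"none": 0.98, "some": 0.015, "lots": 0.005}
P_OFF_GIVEN_WATER = {"none": 0.99, "some": 0.10, "lots": 0.005}

wet = ("some", "lots")
# Correct: sum the joint P(w) * P(off | w) over the wet water levels
p_joint = sum(P_WATER[w] * P_OFF_GIVEN_WATER[w] for w in wet)

# Only valid if water and detector were independent: product of the marginals
p_wet = sum(P_WATER[w] for w in wet)
p_off = sum(P_WATER[w] * P_OFF_GIVEN_WATER[w] for w in P_WATER)
p_product = p_wet * p_off

print(f"P(wet, off) from the chain rule:        {p_joint:.6f}")
print(f"P(wet) * P(off), assuming independence: {p_product:.6f}")
```

The independence shortcut overstates the answer by more than a factor of ten, because a wet basement makes the detector much less likely to be off.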