# An Introduction to Bayes Nets and Exact Inferencing

## References 

The information contained in this notebook can be derived from the following sources:
 - Sheldon Ross, "A First Course in Probability, 7th Edition", Pearson Prentice Hall, 2006 
 - Zoubin Ghahramani, "Learning Dynamic Bayesian Networks," Department of Computer Science at University of Toronto, 1997

## Bayesians vs. Frequentists

In statistics, there are several interpretations of probability to consider. Two common ones are the Bayesian (conditional) and Frequentist approaches. There are many conceptual ways to understand the distinction, but it is instructive to start with the definition of each.
<br>
>$\textbf{Frequentist:}$ A Frequentist defines probability in terms of relative frequency. For a given sample space, S, the probability that an event E occurs is defined as: $$\lim_{n \rightarrow \infty} \frac{n(E)}{n} = P(E),$$ where $n(E)$ is the number of times E occurs in $n$ trials. In this case, the interpretation depends on the relative frequency of the given event over an unbounded number of trials, and on the $\it{assumption}$ that this frequency converges to a constant value. This assumption must be taken as an axiom of the interpretation, since there is no mathematically rigorous way to prove that a second set of trials won't produce a different limiting value for an arbitrary S.
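
The limiting relative frequency can be illustrated with a small simulation. This is a minimal sketch, assuming a fair six-sided die simulated with Python's standard `random` module; the helper names `relative_frequency` and `is_four` are hypothetical and chosen only for illustration.

```python
import random

def relative_frequency(event, trials, seed=0):
    """Return n(E)/n for `trials` simulated rolls of one fair six-sided die."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return hits / trials

def is_four(roll):
    """Event E: the die shows a 4."""
    return roll == 4

# The estimate should settle near 1/6 as the number of trials grows.
for n in (100, 10000, 200000):
    print(n, relative_frequency(is_four, n))
```

A finite simulation can only suggest convergence; it cannot prove it, which is why the text treats convergence as an assumption.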

<br>
>$\textbf{Bayesian:}$ A Bayesian uses information from other variables in the sample space, S, to determine an event's probability. More formally, for an event F with $P(F) > 0$, this can be defined as: $$P(E|F) = \frac{P(E \cap F)}{P(F)}.$$ This allows Bayesians to infer the probability of an event given ancillary information about the system.

<br>
At face value, these two interpretations aren't in direct opposition. However, they have very different practical implications. Consider the following scenario. You roll two dice. One stays on the table, showing a 3; the other rolls off the table and under a chair. You want to know the probability that you rolled a 7. A Frequentist would conduct a set of trials and compute $$\lim_{n \rightarrow \infty} \frac{n(E)}{n}.$$ In this case, they would find that the probability is 1/6, assuming a fair pair of dice (there are 6 ways to sum to 7 with a pair of dice, out of 36 possible outcomes). A Bayesian, on the other hand, would use the fact that one die shows a 3 and infer that the other must show a 4 for the sum to equal 7. Since the other die has a 1/6 chance of landing on 4, the probability is again 1/6. The two answers happen to coincide for a sum of 7, but the Bayesian calculation conditions on the observed die rather than on repeated trials.
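
The dice scenario can be checked by counting outcomes directly. The sketch below assumes two fair dice and applies $P(E|F) = P(E \cap F)/P(F)$ by enumeration; the helper names (`prob`, `sum_is`, `first_is_three`) are hypothetical. A sum of 8 is included as a contrast, since there the observed die does change the answer.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda o: True):
    """P(event | given), computed by counting equally likely outcomes."""
    conditioned = [o for o in outcomes if given(o)]
    favourable = [o for o in conditioned if event(o)]
    return Fraction(len(favourable), len(conditioned))

def sum_is(total):
    """Build the event 'the two dice sum to `total`'."""
    return lambda o: o[0] + o[1] == total

def first_is_three(o):
    """Evidence F: the die left on the table shows a 3."""
    return o[0] == 3

print(prob(sum_is(7)))                  # 1/6
print(prob(sum_is(7), first_is_three))  # 1/6
print(prob(sum_is(8)))                  # 5/36
print(prob(sum_is(8), first_is_three))  # 1/6
```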

<br>
It turns out that the Bayesian interpretation is vastly more powerful in the context of machine learning. It allows us to start from a $\it{prior}$ distribution and use real-world observations to compute the corresponding $\it{posterior}$ distribution. It also allows us to decompose an arbitrary joint probability distribution as a product of conditional probabilities.
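
As a sketch of that decomposition (the symbols $X_1, \dots, X_n$ are notation assumed here, not taken from the sources above), the chain rule of probability gives:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i | X_1, \dots, X_{i-1}).$$

A Bayes net simplifies each factor by conditioning only on a node's parents. For the five-node network built in the cells below, with variables P, S, C, X, and D, the node functions correspond to:

$$P(P, S, C, X, D) = P(P)\,P(S)\,P(C|P,S)\,P(X|C)\,P(D|C).$$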

In [2]:
from bayesian.bbn import build_bbn
import numpy as np

In [3]:
# DEFINE BBN NODES AND CPTs
def Pollution_Node(P):
    '''Pollution'''
    if P == 'high':
        return 0.1
    elif P == 'low':
        return 0.9


def Smoker_Node(S):
    '''Smoker'''
    if S is True:
        return 0.3
    elif S is False:
        return 0.7


def Cancer_Node(P, S, C):
    '''Cancer: returns P(C | P, S).'''
    # Keys are three characters: P == 'high', S, C, each encoded as 't' or 'f'.
    table = dict()
    table['ttt'] = 0.05
    table['ttf'] = 0.95
    table['tft'] = 0.02
    table['tff'] = 0.98
    table['ftt'] = 0.03
    table['ftf'] = 0.97
    table['fft'] = 0.001
    table['fff'] = 0.999
    key = ''
    key = key + 't' if P == 'high' else key + 'f'
    key = key + 't' if S else key + 'f'
    key = key + 't' if C else key + 'f'
    return table[key]


def Xray_Node(C, X):
    '''X-ray'''
    table = dict()
    table['tt'] = 0.9
    table['tf'] = 0.1
    table['ft'] = 0.2
    table['ff'] = 0.8
    key = ''
    key = key + 't' if C else key + 'f'
    key = key + 't' if X else key + 'f'
    return table[key]


def Dyspnoeia_Node(C, D):
    '''Dyspnoeia'''
    table = dict()
    table['tt'] = 0.65
    table['tf'] = 0.35
    table['ft'] = 0.3
    table['ff'] = 0.7
    key = ''
    key = key + 't' if C else key + 'f'
    key = key + 't' if D else key + 'f'
    return table[key]

In [5]:
g = build_bbn(
    Pollution_Node, Smoker_Node, Cancer_Node, Xray_Node, Dyspnoeia_Node,
    domains={'P': ['low', 'high']})
g.q()

+------+-------+----------+
| Node | Value | Marginal |
+------+-------+----------+
| C    | False | 0.988370 |
| C    | True  | 0.011630 |
| D    | False | 0.695929 |
| D    | True  | 0.304070 |
| P    | high  | 0.100000 |
| P    | low   | 0.900000 |
| S    | False | 0.700000 |
| S    | True  | 0.300000 |
| X    | False | 0.791859 |
| X    | True  | 0.208141 |
+------+-------+----------+
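
The marginals in this table can be reproduced by hand, which helps show what the exact inference is computing. The sketch below does not use the `bayesian` library; it copies the CPT values from the node functions into plain dictionaries (the dictionary names are hypothetical) and sums over the parent variables.

```python
# CPT values copied from Pollution_Node, Smoker_Node, Cancer_Node and Xray_Node.
p_pollution = {'high': 0.1, 'low': 0.9}
p_smoker = {True: 0.3, False: 0.7}
p_cancer_true = {('high', True): 0.05, ('high', False): 0.02,
                 ('low', True): 0.03, ('low', False): 0.001}
p_xray_true = {True: 0.9, False: 0.2}  # P(X=True | C)

# P(C=True): sum the joint over both parents of C.
p_c = sum(p_pollution[p] * p_smoker[s] * p_cancer_true[(p, s)]
          for p in p_pollution for s in p_smoker)

# P(X=True): sum over the single parent C.
p_x = p_xray_true[True] * p_c + p_xray_true[False] * (1 - p_c)

print(round(p_c, 6))  # 0.01163
print(round(p_x, 6))  # 0.208141
```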


In [6]:
g.q(P='high')

+------+-------+----------+
| Node | Value | Marginal |
+------+-------+----------+
| C    | False | 0.971000 |
| C    | True  | 0.029000 |
| D    | False | 0.689850 |
| D    | True  | 0.310150 |
| P    | low   | 0.000000 |
| P*   | high* | 1.000000 |
| S    | False | 0.700000 |
| S    | True  | 0.300000 |
| X    | False | 0.779700 |
| X    | True  | 0.220300 |
+------+-------+----------+


In [7]:
g.q(D=True)

+------+-------+----------+
| Node | Value | Marginal |
+------+-------+----------+
| C    | False | 0.975139 |
| C    | True  | 0.024861 |
| D    | False | 0.000000 |
| D*   | True* | 1.000000 |
| P    | high  | 0.101999 |
| P    | low   | 0.898001 |
| S    | False | 0.692966 |
| S    | True  | 0.307034 |
| X    | False | 0.782597 |
| X    | True  | 0.217403 |
+------+-------+----------+
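
The updated cancer probability in this query is an application of Bayes' rule, $P(C|D) = P(D|C)P(C)/P(D)$. The following sketch recomputes it from the prior marginal $P(C=\text{True}) = 0.01163$ reported by the no-evidence query and the values in `Dyspnoeia_Node`; the variable names are hypothetical.

```python
p_c = 0.01163           # prior P(C=True), from the no-evidence query
p_d_given_c = 0.65      # P(D=True | C=True)
p_d_given_not_c = 0.30  # P(D=True | C=False)

# Total probability of the evidence, P(D=True).
p_d = p_d_given_c * p_c + p_d_given_not_c * (1 - p_c)

# Bayes' rule: posterior probability of cancer given dyspnoea.
p_c_given_d = p_d_given_c * p_c / p_d

print(round(p_c_given_d, 6))  # 0.024861
```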


In [8]:
g.q(S=True)

+------+-------+----------+
| Node | Value | Marginal |
+------+-------+----------+
| C    | False | 0.968000 |
| C    | True  | 0.032000 |
| D    | False | 0.688800 |
| D    | True  | 0.311200 |
| P    | high  | 0.100000 |
| P    | low   | 0.900000 |
| S    | False | 0.000000 |
| S*   | True* | 1.000000 |
| X    | False | 0.777600 |
| X    | True  | 0.222400 |
+------+-------+----------+


In [9]:
g.q(C=True, S=True)

+------+-------+----------+
| Node | Value | Marginal |
+------+-------+----------+
| C    | False | 0.000000 |
| C*   | True* | 1.000000 |
| D    | False | 0.350000 |
| D    | True  | 0.650000 |
| P    | high  | 0.156250 |
| P    | low   | 0.843750 |
| S    | False | 0.000000 |
| S*   | True* | 1.000000 |
| X    | False | 0.100000 |
| X    | True  | 0.900000 |
+------+-------+----------+
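
Observing cancer and smoking together shifts the belief about pollution, even though pollution has no parents in the network. A minimal sketch of that calculation, using values copied from `Pollution_Node` and `Cancer_Node` (variable names hypothetical): because S is observed, $P(S)$ cancels and only the prior on P and the cancer CPT remain.

```python
p_high = 0.1             # P(P='high')
p_low = 0.9              # P(P='low')
p_c_high_smoker = 0.05   # P(C=True | P='high', S=True)
p_c_low_smoker = 0.03    # P(C=True | P='low',  S=True)

# Unnormalised posterior weight for each pollution level.
w_high = p_high * p_c_high_smoker
w_low = p_low * p_c_low_smoker

p_high_given_evidence = w_high / (w_high + w_low)
print(round(p_high_given_evidence, 6))  # 0.15625
```

The X and D rows in the table equal the raw CPT entries (0.9 and 0.65) because both nodes depend only on C, which is fixed by the evidence.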


In [11]:
g.q(D=True,S=True,P='high',X=True)

+------+-------+----------+
| Node | Value | Marginal |
+------+-------+----------+
| C    | False | 0.660870 |
| C    | True  | 0.339130 |
| D    | False | 0.000000 |
| D*   | True* | 1.000000 |
| P    | low   | 0.000000 |
| P*   | high* | 1.000000 |
| S    | False | 0.000000 |
| S*   | True* | 1.000000 |
| X    | False | 0.000000 |
| X*   | True* | 1.000000 |
+------+-------+----------+
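
With every other node observed, the posterior on cancer reduces to a two-way comparison: the prior $P(C|P,S)$ multiplied by the likelihoods of the X-ray and dyspnoea findings, then normalised. The sketch below recomputes the table's value from the CPT entries in the node functions; the variable names are hypothetical.

```python
p_c = 0.05  # P(C=True | P='high', S=True)

# Likelihood of the evidence X=True, D=True under each value of C.
like_c = 0.9 * 0.65      # P(X=True | C=True)  * P(D=True | C=True)
like_not_c = 0.2 * 0.30  # P(X=True | C=False) * P(D=True | C=False)

# Normalise over the two values of C.
posterior = p_c * like_c / (p_c * like_c + (1 - p_c) * like_not_c)
print(round(posterior, 6))  # 0.33913
```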
