# Bayesian Network for Thyroid Nodule Identification

In [1]:
from pyBN import *
bn = read_bn('data/TN.bn')

{'V': ['Microcalcification', 'Cystic Aspect', 'Cancer', 'Do Operation', 'Keep Watching'], 'E': {'Microcalcification': ['Cancer'], 'Cystic Aspect': ['Cancer'], 'Cancer': ['Do Operation', 'Keep Watching'], 'Do Operation': [], 'Keep Watching': []}, 'F': {'Microcalcification': {'values': ['No', 'Yes'], 'parents': [], 'cpt': [0.999, 0.001]}, 'Cystic Aspect': {'values': ['No', 'Yes'], 'parents': [], 'cpt': [0.998, 0.002]}, 'Cancer': {'values': ['No', 'Yes'], 'parents': ['Cystic Aspect', 'Microcalcification'], 'cpt': [0.999, 0.001, 0.71, 0.29, 0.06, 0.94, 0.05, 0.95]}, 'Do Operation': {'values': ['No', 'Yes'], 'parents': ['Cancer'], 'cpt': [0.95, 0.05, 0.1, 0.9]}, 'Keep Watching': {'values': ['No', 'Yes'], 'parents': ['Cancer'], 'cpt': [0.99, 0.01, 0.3, 0.7]}}}


In [2]:
print(bn.V)
print(bn.E)

['Microcalcification', 'Cystic Aspect', 'Cancer', 'Do Operation', 'Keep Watching']
{'Microcalcification': ['Cancer'], 'Cystic Aspect': ['Cancer'], 'Cancer': ['Do Operation', 'Keep Watching'], 'Do Operation': [], 'Keep Watching': []}


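As you can see, we have a Bayesian network with 5 nodes and 4 directed edges: 'Microcalcification' and 'Cystic Aspect' are the parents of 'Cancer', which in turn is the parent of both 'Do Operation' and 'Keep Watching'.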
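To make the structure concrete, the marginal probability of cancer can be computed by hand from the priors and CPT in the dictionary printed by read_bn. This is a minimal plain-Python sketch, not a pyBN call. It assumes the 'Cancer' CPT is laid out with 'Cancer' varying fastest, then 'Cystic Aspect', then 'Microcalcification', which matches the strides the Factor reports further down.

```python
# Plain-Python sketch (not pyBN): P(Cancer) by enumerating its two root parents.
# Numbers are copied from the network dictionary printed by read_bn.
# Assumed CPT layout: flat index = cancer*1 + cystic*2 + micro*4 (0 = 'No', 1 = 'Yes').
p_cystic = [0.998, 0.002]  # P(Cystic Aspect = No), P(Cystic Aspect = Yes)
p_micro = [0.999, 0.001]   # P(Microcalcification = No), P(Microcalcification = Yes)
cancer_cpt = [0.999, 0.001, 0.71, 0.29, 0.06, 0.94, 0.05, 0.95]


def p_cancer(c):
    """Marginal P(Cancer = c): sum of P(cy) * P(m) * P(c | cy, m) over both parents."""
    total = 0.0
    for cy in (0, 1):
        for m in (0, 1):
            total += p_cystic[cy] * p_micro[m] * cancer_cpt[c + 2 * cy + 4 * m]
    return total


print(round(p_cancer(1), 5))                # → 0.00252
print(round(p_cancer(0) + p_cancer(1), 6))  # → 1.0
```

Because both findings are rare a priori, the marginal probability of cancer stays around 0.25%, even though the CPT entries conditioned on Microcalcification = 'Yes' are 0.94 and 0.95.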

In [3]:
alarm_factor = Factor(bn,'Cancer')

Now that we have a Factor for the 'Cancer' node, we can explore its properties. Every factor has the following attributes:

    *self.bn* : a BayesNet object

    *self.var* : a string
        The random variable to which this Factor belongs
    
    *self.scope* : a list
        The RV, and its parents (the RVs involved in the
        conditional probability table)
    
    *self.card* : a dictionary, where
        key = an RV in self.scope, and
        val = integer cardinality of the key (i.e. how
            many possible values it has)
    
    *self.stride* : a dictionary, where
        key = an RV in self.scope, and
        val = integer stride (i.e. how many rows in the 
            CPT until the NEXT value of RV is reached)
    
    *self.cpt* : a flat (1-D) numpy array
        The probability values for self.var conditioned
        on its parents, laid out according to self.stride

In [4]:
print(alarm_factor.bn)
print(alarm_factor.var)
print(alarm_factor.scope)
print(alarm_factor.card)
print(alarm_factor.stride)
print(alarm_factor.cpt)

<pyBN.classes.bayesnet.BayesNet object at 0x10a9d5c18>
Cancer
['Cancer', 'Cystic Aspect', 'Microcalcification']
{'Cancer': 2, 'Cystic Aspect': 2, 'Microcalcification': 2}
{'Cancer': 1, 'Cystic Aspect': 2, 'Microcalcification': 4}
[ 0.999  0.001  0.71   0.29   0.06   0.94   0.05   0.95 ]
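The stride values explain how the flat cpt array is indexed: the position of an assignment is the sum of stride × value index over the variables in scope, with the first variable in scope varying fastest. The sketch below reproduces the printed strides and looks up one entry. compute_strides and flat_index are hypothetical helper names, not part of pyBN, and value index 0/1 is assumed to mean 'No'/'Yes' as in the network's values lists.

```python
# Sketch of stride-based addressing for the printed 'Cancer' factor.
# Helper names are hypothetical; scope, card and cpt are copied from the output above.
scope = ['Cancer', 'Cystic Aspect', 'Microcalcification']
card = {'Cancer': 2, 'Cystic Aspect': 2, 'Microcalcification': 2}
cpt = [0.999, 0.001, 0.71, 0.29, 0.06, 0.94, 0.05, 0.95]


def compute_strides(scope, card):
    """First variable in scope gets stride 1 and varies fastest."""
    stride, step = {}, 1
    for rv in scope:
        stride[rv] = step
        step *= card[rv]
    return stride


def flat_index(assignment, stride):
    """Map {rv: value index} to a position in the flat CPT."""
    return sum(stride[rv] * val for rv, val in assignment.items())


stride = compute_strides(scope, card)
print(stride)  # → {'Cancer': 1, 'Cystic Aspect': 2, 'Microcalcification': 4}

# P(Cancer = Yes | Cystic Aspect = No, Microcalcification = Yes)
i = flat_index({'Cancer': 1, 'Cystic Aspect': 0, 'Microcalcification': 1}, stride)
print(i, cpt[i])  # → 5 0.94
```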


Every factor also has the following methods:

    *multiply_factor* :
        Multiply two factors together. The factor
        multiplication algorithm used here is adapted
        from the Koller and Friedman textbook
        (Probabilistic Graphical Models).

    *sumover_var* :
        Sum over every variable except *rv*, holding each
        value of *rv* fixed. Thus, you end up with a 1-D
        factor whose scope is ONLY *rv* and whose length
        equals the cardinality of *rv*.

    *sumout_var_list* :
        Remove a collection of rvs from the factor
        by summing out (i.e. calling sumout_var) over
        each rv.

    *sumout_var* :
        Remove the passed-in *rv* from the factor by
        summing over its values, separately for each
        instantiation of the remaining variables.

    *maxout_var* :
        Remove *rv* from the factor by keeping, for each
        instantiation of the remaining variables, the
        maximum value over all values of *rv*.

    *reduce_factor_by_list* :
        Reduce the factor by a list of [rv, val]
        evidence pairs.

    *reduce_factor* :
        Condition the factor by eliminating any sets of
        values that don't align with a given [rv, val]

    *to_log* :
        Convert probabilities to log space from
        normal space.

    *from_log* :
        Convert probabilities from log space to
        normal space.

    *normalize* :
        Make each conditional distribution sum to one: for
        every instantiation of the parents, rescale the
        probabilities of *self.var*.

Here is an example of factor multiplication, performed in both orders:

In [5]:
import numpy as np
f1 = Factor(bn,'Cancer')
f2 = Factor(bn,'Microcalcification')
f1.multiply_factor(f2)

f3 = Factor(bn,'Microcalcification')
f4 = Factor(bn,'Cancer')
f3.multiply_factor(f4)

print(np.round(f1.cpt,3))
print('\n',np.round(f3.cpt,3))

[ 0.998  0.001  0.709  0.29   0.     0.001  0.     0.001]

 [ 0.998  0.001  0.709  0.29   0.     0.001  0.     0.001]
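Both orders give the same CPT because factor multiplication is commutative. The sketch below shows one straightforward way to multiply stride-indexed factors: iterate over every joint assignment of the union scope and multiply the matching entries. It is a plain-Python illustration, not pyBN's implementation (which the text says is adapted from Koller and Friedman); the multiply helper and the result scope order (first factor's variables, then any new ones) are assumptions.

```python
import math
from itertools import product


def compute_strides(scope, card):
    """First variable in scope gets stride 1 and varies fastest."""
    stride, step = {}, 1
    for rv in scope:
        stride[rv] = step
        step *= card[rv]
    return stride


def multiply(scope1, cpt1, scope2, cpt2, card):
    """Multiply two flat, stride-indexed factors over the union of their scopes."""
    # Assumed result order: scope1, then any variables that appear only in scope2.
    scope = scope1 + [rv for rv in scope2 if rv not in scope1]
    s, s1, s2 = (compute_strides(sc, card) for sc in (scope, scope1, scope2))
    out = [0.0] * math.prod(card[rv] for rv in scope)
    for values in product(*(range(card[rv]) for rv in scope)):
        a = dict(zip(scope, values))  # one joint assignment {rv: value index}
        i = sum(s[rv] * a[rv] for rv in scope)
        i1 = sum(s1[rv] * a[rv] for rv in scope1)
        i2 = sum(s2[rv] * a[rv] for rv in scope2)
        out[i] = cpt1[i1] * cpt2[i2]
    return scope, out


card = {'Cancer': 2, 'Cystic Aspect': 2, 'Microcalcification': 2}
cancer = (['Cancer', 'Cystic Aspect', 'Microcalcification'],
          [0.999, 0.001, 0.71, 0.29, 0.06, 0.94, 0.05, 0.95])
micro = (['Microcalcification'], [0.999, 0.001])

scope, cpt = multiply(*cancer, *micro, card)
print(scope)  # → ['Cancer', 'Cystic Aspect', 'Microcalcification']
print([round(p, 3) for p in cpt])
# → [0.998, 0.001, 0.709, 0.29, 0.0, 0.001, 0.0, 0.001]
```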


Here is "sumover_var":

In [6]:
f = Factor(bn,'Cancer')
print(f.cpt)
print(f.scope)
print(f.stride)
f.sumover_var('Microcalcification')
print('\n',f.cpt)
print(f.scope)
print(f.stride)

[ 0.999  0.001  0.71   0.29   0.06   0.94   0.05   0.95 ]
['Cancer', 'Cystic Aspect', 'Microcalcification']
{'Cancer': 1, 'Cystic Aspect': 2, 'Microcalcification': 4}

 [ 2.  2.]
['Microcalcification']
{'Microcalcification': 1}
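Each output entry of sumover_var is the total of all CPT rows in which 'Microcalcification' takes that value. Each value of 'Microcalcification' is paired with two instantiations of 'Cystic Aspect', and each of those contributes a full conditional distribution over 'Cancer' that sums to one, so both totals are 2. A plain-Python sketch of the operation, using a hypothetical sumover helper that recovers a variable's value index in row i as (i // stride) % card:

```python
def sumover(cpt, stride, card, rv):
    """Marginalize onto rv: out[k] totals every entry in which rv has value index k."""
    out = [0.0] * card[rv]
    for i, p in enumerate(cpt):
        k = (i // stride[rv]) % card[rv]  # value index of rv in row i
        out[k] += p
    return out


cpt = [0.999, 0.001, 0.71, 0.29, 0.06, 0.94, 0.05, 0.95]
stride = {'Cancer': 1, 'Cystic Aspect': 2, 'Microcalcification': 4}
card = {'Cancer': 2, 'Cystic Aspect': 2, 'Microcalcification': 2}

print([round(p, 3) for p in sumover(cpt, stride, card, 'Microcalcification')])  # → [2.0, 2.0]
print([round(p, 3) for p in sumover(cpt, stride, card, 'Cancer')])              # → [1.819, 2.181]
```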


Here is a look at "sumout_var", which is essentially the opposite of "sumover_var":

In [7]:
f = Factor(bn,'Cancer')
f.sumout_var('Cystic Aspect')
print(f.stride)
print(f.scope)
print(f.card)
print(f.cpt)

{'Cancer': 1, 'Microcalcification': 2.0}
['Cancer', 'Microcalcification']
{'Cancer': 2, 'Microcalcification': 2}
[ 1.709  0.291  0.11   1.89 ]
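sumout_var('Cystic Aspect') adds together rows that differ only in 'Cystic Aspect' (for example 0.999 + 0.71 = 1.709) and recomputes the strides for the smaller scope. Below is a plain-Python sketch of that bookkeeping; the sumout helper is hypothetical, and it returns integer strides where pyBN printed 2.0.

```python
def sumout(scope, card, stride, cpt, rv):
    """Eliminate rv by summing over its values for each assignment of the rest."""
    new_scope = [v for v in scope if v != rv]
    new_stride, size = {}, 1
    for v in new_scope:  # first remaining variable still varies fastest
        new_stride[v] = size
        size *= card[v]
    out = [0.0] * size
    for i, p in enumerate(cpt):
        # Position of row i in the reduced table: rv's contribution is dropped.
        j = sum(new_stride[v] * ((i // stride[v]) % card[v]) for v in new_scope)
        out[j] += p
    return new_scope, new_stride, out


scope = ['Cancer', 'Cystic Aspect', 'Microcalcification']
card = {'Cancer': 2, 'Cystic Aspect': 2, 'Microcalcification': 2}
stride = {'Cancer': 1, 'Cystic Aspect': 2, 'Microcalcification': 4}
cpt = [0.999, 0.001, 0.71, 0.29, 0.06, 0.94, 0.05, 0.95]

new_scope, new_stride, out = sumout(scope, card, stride, cpt, 'Cystic Aspect')
print(new_scope)                   # → ['Cancer', 'Microcalcification']
print(new_stride)                  # → {'Cancer': 1, 'Microcalcification': 2}
print([round(p, 3) for p in out])  # → [1.709, 0.291, 0.11, 1.89]
```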


Additionally, you can sum out a LIST of variables with "sumout_var_list". Notice how summing OUT every variable in the scope except for ONE variable is equivalent to summing OVER that ONE variable:

In [8]:
f = Factor(bn,'Cancer')
print(f.cpt)
f.sumout_var_list(['Microcalcification','Cystic Aspect'])
print(f.scope)
print(f.stride)
print(f.cpt)

f1 = Factor(bn,'Cancer')
print('\n',f1.cpt)
f1.sumover_var('Cancer')
print(f1.scope)
print(f1.stride)
print(f1.cpt)

[ 0.999  0.001  0.71   0.29   0.06   0.94   0.05   0.95 ]
['Cancer']
{'Cancer': 1}
[ 1.819  2.181]

 [ 0.999  0.001  0.71   0.29   0.06   0.94   0.05   0.95 ]
['Cancer']
{'Cancer': 1}
[ 1.819  2.181]
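The equivalence is easy to check in plain Python: summing out variables one at a time and summing over the one that remains both add up exactly the same CPT entries. In this sketch, sumout_list is a hypothetical stand-in for sumout_var_list that, as described in the methods list, applies a single-variable sum-out once per variable.

```python
def sumout(scope, card, stride, cpt, rv):
    """Eliminate rv by summing over its values; returns (scope, stride, cpt)."""
    new_scope = [v for v in scope if v != rv]
    new_stride, size = {}, 1
    for v in new_scope:
        new_stride[v] = size
        size *= card[v]
    out = [0.0] * size
    for i, p in enumerate(cpt):
        j = sum(new_stride[v] * ((i // stride[v]) % card[v]) for v in new_scope)
        out[j] += p
    return new_scope, new_stride, out


def sumout_list(scope, card, stride, cpt, rvs):
    """Sum out each variable in rvs in turn."""
    for rv in rvs:
        scope, stride, cpt = sumout(scope, card, stride, cpt, rv)
    return scope, stride, cpt


def sumover(cpt, stride, card, rv):
    """Keep only rv: out[k] totals every entry in which rv has value index k."""
    out = [0.0] * card[rv]
    for i, p in enumerate(cpt):
        out[(i // stride[rv]) % card[rv]] += p
    return out


scope = ['Cancer', 'Cystic Aspect', 'Microcalcification']
card = {'Cancer': 2, 'Cystic Aspect': 2, 'Microcalcification': 2}
stride = {'Cancer': 1, 'Cystic Aspect': 2, 'Microcalcification': 4}
cpt = [0.999, 0.001, 0.71, 0.29, 0.06, 0.94, 0.05, 0.95]

_, _, by_sumout = sumout_list(scope, card, stride, cpt,
                              ['Microcalcification', 'Cystic Aspect'])
by_sumover = sumover(cpt, stride, card, 'Cancer')
print([round(p, 3) for p in by_sumout])   # → [1.819, 2.181]
print([round(p, 3) for p in by_sumover])  # → [1.819, 2.181]
```

The two results can differ in the last floating-point digit because the additions happen in a different order, so any comparison should use a tolerance.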


Furthermore, you can use "maxout_var" to take the maximum values over a variable in the factor. This is a fundamental operation in Max-Sum Variable Elimination for MAP inference. Notice that the maxed-out variable is removed from the scope: for each instantiation of the remaining variables, only the largest value over the eliminated variable is kept.

In [9]:
f = Factor(bn,'Cancer')
print(f.scope)
print(f.cpt)
f.maxout_var('Microcalcification')
print('\n', f.scope)
print(f.cpt)

['Cancer', 'Cystic Aspect', 'Microcalcification']
[ 0.999  0.001  0.71   0.29   0.06   0.94   0.05   0.95 ]

 ['Cancer', 'Cystic Aspect']
[ 0.999  0.94   0.71   0.95 ]
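Max-out pairs up the same rows as sum-out but keeps the larger entry instead of adding them; for example, the first output entry is max(0.999, 0.06) = 0.999. A plain-Python sketch with a hypothetical maxout helper:

```python
def maxout(scope, card, stride, cpt, rv):
    """Eliminate rv by keeping the maximum entry for each assignment of the rest."""
    new_scope = [v for v in scope if v != rv]
    new_stride, size = {}, 1
    for v in new_scope:  # first remaining variable still varies fastest
        new_stride[v] = size
        size *= card[v]
    out = [float('-inf')] * size
    for i, p in enumerate(cpt):
        j = sum(new_stride[v] * ((i // stride[v]) % card[v]) for v in new_scope)
        out[j] = max(out[j], p)
    return new_scope, out


scope = ['Cancer', 'Cystic Aspect', 'Microcalcification']
card = {'Cancer': 2, 'Cystic Aspect': 2, 'Microcalcification': 2}
stride = {'Cancer': 1, 'Cystic Aspect': 2, 'Microcalcification': 4}
cpt = [0.999, 0.001, 0.71, 0.29, 0.06, 0.94, 0.05, 0.95]

new_scope, out = maxout(scope, card, stride, cpt, 'Microcalcification')
print(new_scope)  # → ['Cancer', 'Cystic Aspect']
print(out)        # → [0.999, 0.94, 0.71, 0.95]
```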


You can also use "reduce_factor" to reduce a factor based on evidence. This is different from "sumover_var" because "reduce_factor" does not sum over anything; it simply removes any parent-child instantiations that are not consistent with the evidence. In principle no normalization is needed, because the CPT should already be normalized over the [rv, val] evidence, but it is done anyway to correct for rounding. This function is essential whenever users pass evidence to an inference query.

In [10]:
f = Factor(bn, 'Cancer')
print(f.scope)
print(f.cpt)
f.reduce_factor('Microcalcification','Yes')
print('\n', f.scope)
print(f.cpt)

['Cancer', 'Cystic Aspect', 'Microcalcification']
[ 0.999  0.001  0.71   0.29   0.06   0.94   0.05   0.95 ]

 ['Cancer', 'Cystic Aspect']
[ 0.06  0.94  0.05  0.95]
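Reducing on Microcalcification = 'Yes' keeps only the rows in which that variable's value index is 1 (positions 4 to 7 here, since its stride is 4). A plain-Python sketch with a hypothetical reduce_cpt helper; it maps the value name to an index through the node's values list (['No', 'Yes'] in the network dictionary) and, unlike pyBN as described above, does not renormalize afterwards.

```python
def reduce_cpt(scope, card, stride, cpt, values, rv, val):
    """Keep only the CPT rows consistent with the evidence rv = val."""
    k = values[rv].index(val)  # e.g. 'Yes' -> 1
    new_scope = [v for v in scope if v != rv]
    # Filtering preserves order, so the surviving rows are already laid out
    # with the remaining variables' strides (first variable still fastest).
    new_cpt = [p for i, p in enumerate(cpt) if (i // stride[rv]) % card[rv] == k]
    return new_scope, new_cpt


scope = ['Cancer', 'Cystic Aspect', 'Microcalcification']
card = {'Cancer': 2, 'Cystic Aspect': 2, 'Microcalcification': 2}
stride = {'Cancer': 1, 'Cystic Aspect': 2, 'Microcalcification': 4}
cpt = [0.999, 0.001, 0.71, 0.29, 0.06, 0.94, 0.05, 0.95]
values = {rv: ['No', 'Yes'] for rv in scope}

new_scope, new_cpt = reduce_cpt(scope, card, stride, cpt, values,
                                'Microcalcification', 'Yes')
print(new_scope)  # → ['Cancer', 'Cystic Aspect']
print(new_cpt)    # → [0.06, 0.94, 0.05, 0.95]
```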


Another piece of functionality is the capability to convert the factor probabilities to and from log space. This is important for MAP inference, since the sum of log-probabilities equals the log of the product of the probabilities: products turn into sums, and very small probabilities do not underflow.

In [11]:
f = Factor(bn,'Cancer')
print(f.cpt)
f.to_log()
print(np.round(f.cpt,2))
f.from_log()
print(f.cpt)

[ 0.999  0.001  0.71   0.29   0.06   0.94   0.05   0.95 ]
[-0.   -6.91 -0.34 -1.24 -2.81 -0.06 -3.   -0.05]
[ 0.999  0.001  0.71   0.29   0.06   0.94   0.05   0.95 ]
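The printed log values are natural logarithms (for example ln 0.001 ≈ -6.91). The sketch below mimics to_log and from_log with Python's math module and shows why log space suits MAP inference: a product of probabilities becomes a sum of logs, and because the logarithm is monotonic, the assignment that maximizes the product also maximizes the sum. The joint over the three upper nodes uses the priors and CPT from the network dictionary; the joint and log_joint helpers are hypothetical.

```python
import math
from itertools import product

p_cystic = [0.998, 0.002]  # P(Cystic Aspect)
p_micro = [0.999, 0.001]   # P(Microcalcification)
cpt = [0.999, 0.001, 0.71, 0.29, 0.06, 0.94, 0.05, 0.95]  # P(Cancer | parents)

log_cpt = [math.log(p) for p in cpt]       # analogous to to_log()
restored = [math.exp(x) for x in log_cpt]  # analogous to from_log()
print([round(x, 2) for x in log_cpt])
# → [-0.0, -6.91, -0.34, -1.24, -2.81, -0.06, -3.0, -0.05]


def joint(c, cy, m):
    """P(Cancer=c, Cystic Aspect=cy, Microcalcification=m) as a product."""
    return p_cystic[cy] * p_micro[m] * cpt[c + 2 * cy + 4 * m]


def log_joint(c, cy, m):
    """The same quantity as a sum of log-probabilities."""
    return math.log(p_cystic[cy]) + math.log(p_micro[m]) + log_cpt[c + 2 * cy + 4 * m]


assignments = list(product((0, 1), repeat=3))  # (c, cy, m), 0 = 'No', 1 = 'Yes'
print(max(assignments, key=lambda a: joint(*a)))      # → (0, 0, 0)
print(max(assignments, key=lambda a: log_joint(*a)))  # → (0, 0, 0)
```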


Lastly, we have normalization. This function does most of its work behind the scenes, cleaning up the factor probabilities after multiplication or reduction. Still, it is an important function that users should be aware of.

In [12]:
f = Factor(bn, 'Cancer')
print(f.cpt)
f.cpt[0]=20
f.cpt[1]=20
f.cpt[4]=0.94
f.cpt[7]=0.15
print(f.cpt)
f.normalize()
print(f.cpt)

[ 0.999  0.001  0.71   0.29   0.06   0.94   0.05   0.95 ]
[ 20.    20.     0.71   0.29   0.94   0.94   0.05   0.15]
[ 0.5         0.5         0.70999996  0.29000004  0.5         0.5
  0.25000025  0.74999975]
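In the output, each consecutive pair (the two values of 'Cancer' for one parent instantiation) is rescaled to sum to one: [20, 20] becomes [0.5, 0.5] and [0.05, 0.15] becomes [0.25, 0.75]. A plain-Python sketch of that behaviour, assuming (as the strides show) that self.var has stride 1 so each conditional distribution occupies a contiguous block; normalize_cpt is a hypothetical name.

```python
def normalize_cpt(cpt, card_var):
    """Rescale each contiguous block of card_var entries to sum to one."""
    out = []
    for start in range(0, len(cpt), card_var):
        block = cpt[start:start + card_var]  # one parent instantiation
        total = sum(block)
        out.extend(p / total for p in block)
    return out


edited = [20.0, 20.0, 0.71, 0.29, 0.94, 0.94, 0.05, 0.15]
print([round(p, 6) for p in normalize_cpt(edited, 2)])
# → [0.5, 0.5, 0.71, 0.29, 0.5, 0.5, 0.25, 0.75]
```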
