# Day 23 notebook

The objectives of this notebook are to practice:

* sampling from a Bayesian network
* computing the log probability of a configuration for a Bayesian network

## Modules used for this assignment

In [2]:
# standard library modules
import random  # for sampling
import math    # for log

# course modules
import graph         # for directed graph implementations 
import random_sample # for sample_categorical

## A Bayesian network class
In this activity, we will fill in the implementation of a simple Bayesian network class, which will support modeling a set of discrete random variables, with conditional probability distributions (CPDs) represented by simple tables (implemented as dictionaries).  As Bayesian networks are directed acyclic graphs and often require graph algorithms for their computations, this class inherits from some of the graph classes that we developed earlier in the semester.  The class is given below; the two methods to implement, `log_probability` and `sample`, are marked with `### YOUR CODE HERE`.
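
Each CPD table in this class is a plain dictionary: one key per combination of parent values (a tuple), mapped to a list of probabilities over the variable's possible values. The sketch below illustrates what a well-formed table looks like, using a hypothetical `check_cpd` helper that is not part of the class. It assumes the tuple keys list parent values in the same order as the parent lists passed in.

```python
import itertools
import math


def check_cpd(cpd, possible_values, parent_possible_values):
    """Hypothetical helper (not part of BayesianNetwork): validate a CPD table.

    cpd: dict mapping tuples of parent values to lists of probabilities
    possible_values: the values the child variable can take
    parent_possible_values: one list of possible values per parent, in the
        same order as the entries of the tuple keys
    """
    # Every combination of parent values needs exactly one row; for a variable
    # with no parents, itertools.product() yields just the empty tuple.
    expected_keys = set(itertools.product(*parent_possible_values))
    if set(cpd) != expected_keys:
        return False
    for distribution in cpd.values():
        # One probability per possible value, and each row must sum to 1.
        if len(distribution) != len(possible_values):
            return False
        if not math.isclose(sum(distribution), 1.0):
            return False
    return True


# The flight_status CPD from the simple network defined later in this notebook
flight_status_cpd = {("United", "sun"): [0.8, 0.2],
                     ("United", "rain"): [0.5, 0.5],
                     ("United", "snow"): [0.1, 0.9],
                     ("Delta", "sun"): [0.9, 0.1],
                     ("Delta", "rain"): [0.4, 0.6],
                     ("Delta", "snow"): [0.2, 0.8]}
print(check_cpd(flight_status_cpd, ["on-time", "delayed"],
                [["United", "Delta"], ["sun", "rain", "snow"]]))  # True
print(check_cpd({(): [0.7, 0.2]}, ["United", "Delta"], []))      # False: row sums to 0.9
```

A parentless variable is just the degenerate case: its table has the single key `()`.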

In [3]:
class BayesianNetwork(graph.AdjacencyListDirectedGraph, graph.VertexLabeledDirectedGraph):
    def __init__(self, random_variable_names):
        """Inits a BayesianNetwork
        
        The initial network will have a set of variables/vertices labeled by the given names
        but without any edges or CPDs.  set_cpd should be called for each variable to 
        specify the edges and CPDs of the network.
        """
        # Call the parent graph class constructor, giving it the number of vertices
        super().__init__(len(random_variable_names))
        
        # Set the labels of the vertices to the names of the variables
        for i, name in enumerate(random_variable_names): self.set_vertex_label(i, name)
        
        # The cpds member variable is a list of the conditional probability distributions
        # for the variables.  self.cpds[i] will be the CPD for the ith random variable.
        self.cpds = [None] * len(random_variable_names)
        
        # The possible_values member variable is a list of lists of the possible values
        # that each variable may take.  self.possible_values[i] will be a list of the 
        # possible values for the ith random variable.
        self.possible_values = [None] * len(random_variable_names)
    
    def set_cpd(self, name, parent_names, possible_values, cpd=None):
        """Sets the CPD for a random variable in the network.
        
        The CPDs for the parents of the named random variable should be set prior to this call.

        Args:
            name: the name (a string) of the random variable
            parent_names: a list of the names of the parents of the random variable 
            possible_values: a list of the values that this random variable may take on
            cpd: the CPD for the random variable.  The CPD should be a dictionary mapping
                tuples of parent variable values (in the same order as parent_names) to
                conditional probability distributions over the possible values.  Each conditional
                probability distribution should be a list of probabilities for a categorical
                distribution over the possible values.  If the random variable has no parents,
                the CPD dictionary should have a single entry, with the key being the empty tuple
                and the value being the distribution over the variable's possible values.
        """
        # convert from variable names to vertex indices
        i = self.vertex_index(name)
        parent_indices = [self.vertex_index(parent_name) for parent_name in parent_names]
        
        # add edges to the graph
        for j in parent_indices:
            self.add_edge(j, i)

        # check that the CPDs for the parents have been set already
        # this is important because we need to know the possible values for the parents
        # in order to encode the CPD for this random variable
        assert None not in [self.possible_values[j] for j in parent_indices], "Parent CPDs need to be set first"
        
        # Set the possible values for this random variable
        self.possible_values[i] = possible_values[:]

        # Set the CPD for this random variable, converting it to use encoded parent values
        assert cpd is not None, "A CPD must be given for each random variable"
        self.cpds[i] = {self.encode_values(parent_values, parent_indices): distribution[:]
                        for parent_values, distribution in cpd.items()}

    def encode_values(self, values, variable_indices=None):
        """Encodes variable values by their indices into the possible values lists.
        
        Args:
            values: a list of variable values
            variable_indices: a list of variable indices corresponding to the values list.  If this
                is None, it is assumed that values is a configuration of all variables in the network.
        Returns:
            A tuple of encoded variable values.
        """
        indices = variable_indices if variable_indices is not None else range(len(values))
        return tuple(self.possible_values[i].index(value) for i, value in zip(indices, values))

    def decode_values(self, encoded_values):
        """Decodes variable values from indices (the inverse of encode_values).
        
        Args:
            encoded_values: a list of encoded variable values, one per variable in the network.
        Returns:
            A tuple of variable values.
        """
        return tuple(self.possible_values[i][encoded_values[i]] for i in range(len(encoded_values)))

    def log_probability(self, values):
        """Computes the log joint probability of a single configuration of values for the random variables.
        
        Args:
            values: a list of observed values for the random variables, one value per variable, with
                the ith entry giving the value for the ith random variable (vertex)
        Returns:
            The natural log of the joint probability of the configuration.
        """
        ### YOUR CODE HERE      
        encoded_values = self.encode_values(values)
        log_prob = 0
        for i, value in enumerate(encoded_values):
            parent_values = tuple(encoded_values[j] for j in self.parents(i))
            log_prob += math.log(self.cpds[i][parent_values][value])
        return log_prob
        ###

    def sample(self):
        """Samples a single configuration of values for the random variables in the network.
        
        Returns:
            A tuple of sampled values for the random variables, one value per variable, with
            the ith entry giving the value for the ith random variable (vertex).
        """
        ###
        ### YOUR CODE HERE
        values = [None] * self.num_vertices()
        for i in self.topological_order():
            parent_values = tuple(values[j] for j in self.parents(i))
            values[i] = random_sample.sample_categorical(self.cpds[i][parent_values])
        return self.decode_values(values)
        ###
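
`encode_values` and `decode_values` translate between value names and their positions in each variable's possible-values list. The standalone sketch below mirrors that mapping with plain module-level functions (`encode` and `decode` are illustrative names, not part of the notebook) for the simple network's three variables.

```python
# Possible values for the simple network's variables, in vertex order
# (airline, weather, flight_status).
possible_values = [["United", "Delta"],
                   ["sun", "rain", "snow"],
                   ["on-time", "delayed"]]


def encode(values, variable_indices=None):
    """Replace each value by its position in that variable's possible-values list."""
    # With no indices given, assume one value per variable, in vertex order.
    indices = variable_indices if variable_indices is not None else range(len(values))
    return tuple(possible_values[i].index(v) for i, v in zip(indices, values))


def decode(encoded):
    """Inverse of encode for a full configuration (one index per variable)."""
    return tuple(possible_values[i][k] for i, k in enumerate(encoded))


config = ("Delta", "snow", "on-time")
encoded = encode(config)
print(encoded)                              # (1, 2, 0)
print(decode(encoded) == config)            # True
print(encode(("United", "rain"), [0, 1]))   # (0, 1): just the parents of flight_status
print(encode((), []))                       # (): the key used by parentless variables
```

Because the CPD keys are stored in encoded form, the integer index that `sample_categorical` produces for a parent can be used directly in the child's lookup key, which is how the `sample` method above uses it.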


## Example Bayesian networks

### A simple Bayesian network

Here is a simple Bayesian network that we will use for testing.

In [4]:
simple_random_variables = ["airline", "weather", "flight_status"]
simple_network = BayesianNetwork(simple_random_variables)

simple_network.set_cpd("airline",
                        [], ["United", "Delta"],
                       {(): [     0.7,     0.3]})
simple_network.set_cpd("weather",
                        [], ["sun", "rain", "snow"],
                       {(): [  0.5,    0.3,    0.2]})
simple_network.set_cpd("flight_status",
                        ["airline", "weather"], ["on-time", "delayed"],
                       {("United",      "sun"): [      0.8,       0.2],
                        ("United",     "rain"): [      0.5,       0.5],
                        ("United",     "snow"): [      0.1,       0.9],
                        ( "Delta",      "sun"): [      0.9,       0.1],
                        ( "Delta",     "rain"): [      0.4,       0.6],
                        ( "Delta",     "snow"): [      0.2,       0.8]})

The VertexLabeledDirectedGraph class now has a `plot` method that we can use to visualize this Bayesian network and verify that we have set it up correctly.

In [5]:
simple_network.plot()

### A lac operon Bayesian network
We will use this simple Bayesian network class to model the lac operon regulatory network, which we discussed in the lectures.  The CPDs specified are the same as those in the Day 23 quiz.

In [6]:
lac_operon_random_variables = ["L", "I", "G", "C", "lacI-unbound", "CAP-bound", "Z"]
lac_operon_network = BayesianNetwork(lac_operon_random_variables)

lac_operon_network.set_cpd("L",
                            [], ["absent", "present"],
                           {(): [     0.9,       0.1]})

lac_operon_network.set_cpd("I",
                            [], ["absent", "present"],
                           {(): [     0.1,       0.9]})

lac_operon_network.set_cpd("G",
                            [], ["absent", "present"],
                           {(): [     0.5,       0.5]})

lac_operon_network.set_cpd("C",
                            [], ["absent", "present"],
                           {(): [     0.1,       0.9]})

lac_operon_network.set_cpd("lacI-unbound",
                            [      "L",       "I"], ["true", "false"],
                           {( "absent",  "absent"): [   0.9,     0.1],
                            ( "absent", "present"): [   0.1,     0.9],
                            ("present",  "absent"): [   0.9,     0.1],
                            ("present", "present"): [   0.9,     0.1]})

lac_operon_network.set_cpd("CAP-bound",
                            [     "G",        "C"], ["true", "false"],
                           {( "absent",  "absent"): [   0.1,     0.9],
                            ( "absent", "present"): [   0.9,     0.1],
                            ("present",  "absent"): [   0.1,     0.9],
                            ("present", "present"): [   0.2,     0.8]})

lac_operon_network.set_cpd("Z",
                            ["lacI-unbound", "CAP-bound"], ["absent", "low", "high"],
                           {(        "true",     "false"): [     0.1,   0.8,    0.1],
                            (        "true",      "true"): [     0.1,   0.1,    0.8],
                            (       "false",     "false"): [     0.8,   0.1,    0.1],
                            (       "false",      "true"): [     0.8,   0.1,    0.1]})

In [7]:
lac_operon_network.plot()

## PROBLEM 1: Computing the log probability of a configuration for a Bayesian network (2 POINTS)

Implement the `log_probability` method of the `BayesianNetwork` class above, which computes the log joint probability of a single configuration of the random variables in the network.  By "configuration" we are referring to a set of observed values for the random variables, one value per variable.  A product of many probabilities can underflow to zero in floating point, so compute the log probability directly as a sum of the logs of the relevant CPD entries rather than taking the log of the product.
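
As a worked check, the joint probability of `('United', 'rain', 'on-time')` in the simple network factorizes into one CPD entry per variable: P(airline=United) · P(weather=rain) · P(on-time | United, rain) = 0.7 · 0.3 · 0.5. The sketch below sums the logs of those three numbers by hand, then shows the underflow that summing logs avoids.

```python
import math

# CPD entries for ('United', 'rain', 'on-time') in the simple network:
# P(airline=United), P(weather=rain), P(flight_status=on-time | United, rain)
factors = [0.7, 0.3, 0.5]
log_prob = sum(math.log(p) for p in factors)
print(round(log_prob, 2))  # -2.25, matching the first test below

# Why sum logs: a long product of small probabilities underflows to 0.0 ...
product = 1.0
for _ in range(400):
    product *= 0.1
print(product)             # 0.0

# ... while the equivalent sum of logs stays an ordinary float.
print(400 * math.log(0.1))  # about -921.03
```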

*Hint: for each variable, you will need to look up the entry in its CPD that corresponds to the values of its parents.  Note that the DirectedGraph class has a `parents` method that allows you to obtain the indices of the parents of a given vertex (variable).*
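
The hint boils down to: for each variable, build the key from its parents' values, pick that CPD row, and add the log of the entry for the variable's own value. A minimal sketch of that loop, assuming a hypothetical plain-dict representation of the simple network (not the class above) in which each variable stores its parent list explicitly:

```python
import math

# Hypothetical plain-dict version of the simple network: for each variable, its
# parents (in the order used by the CPD keys), its possible values, and its CPD
# keyed by tuples of parent *values* (not encoded indices).
network = {
    "airline": {"parents": [], "values": ["United", "Delta"],
                "cpd": {(): [0.7, 0.3]}},
    "weather": {"parents": [], "values": ["sun", "rain", "snow"],
                "cpd": {(): [0.5, 0.3, 0.2]}},
    "flight_status": {"parents": ["airline", "weather"],
                      "values": ["on-time", "delayed"],
                      "cpd": {("United", "sun"): [0.8, 0.2],
                              ("United", "rain"): [0.5, 0.5],
                              ("United", "snow"): [0.1, 0.9],
                              ("Delta", "sun"): [0.9, 0.1],
                              ("Delta", "rain"): [0.4, 0.6],
                              ("Delta", "snow"): [0.2, 0.8]}},
}


def log_probability(network, config):
    """Sum of log P(x_i | parents of x_i) over all variables; config maps names to values."""
    total = 0.0
    for name, node in network.items():
        # Look up the CPD row selected by this variable's parent values ...
        parent_values = tuple(config[p] for p in node["parents"])
        distribution = node["cpd"][parent_values]
        # ... then add the log of the entry for this variable's own value.
        total += math.log(distribution[node["values"].index(config[name])])
    return total


config = {"airline": "Delta", "weather": "sun", "flight_status": "on-time"}
print(round(log_probability(network, config), 2))  # -2.0
```

Unlike sampling, this sum does not depend on the order in which variables are visited. Keeping the parent list explicit also makes the key order unambiguous; the class's `log_probability` builds its key from `self.parents(i)`, so it relies on that method returning parents in the same order as `parent_names` was given to `set_cpd`.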

In [8]:
# test log_probability (simple network)
simple_config1 = ('United', 'rain', 'on-time')
assert round(simple_network.log_probability(simple_config1), 2) == -2.25
simple_config2 = ('Delta', 'sun', 'on-time')
assert round(simple_network.log_probability(simple_config2), 2) == -2.0
print("SUCCESS: log_probability (simple) passed all tests")

SUCCESS: log_probability (simple) passed all tests


In [9]:
# test log_probability (lac operon)
config1 = ('absent', 'present', 'present', 'present', 'false', 'false', 'absent')
assert round(lac_operon_network.log_probability(config1), 2) == -1.56
config2 = ('absent', 'present', 'absent', 'present', 'true', 'false', 'low')
assert round(lac_operon_network.log_probability(config2), 2) == -5.84
print("SUCCESS: log_probability (lac operon) passed all tests")

SUCCESS: log_probability (lac operon) passed all tests


## PROBLEM 2: Sample a configuration from a Bayesian network (2 POINTS)

Implement the `sample` method of the `BayesianNetwork` class above, which samples a single configuration from the joint probability distribution represented by the network.  This should be accomplished by sampling a single value for each variable from its CPD, given values for its parents.  This requires that the values for the parents of a random variable be sampled before sampling that variable.  To do this, you should traverse the vertices (random variables) of the network in a *topological* order, which is an ordering of the vertices such that all parents of a vertex come before it in the ordering.  You should use the `topological_order` method of the base `DirectedGraph` class to obtain such an ordering.  You should also use the `random_sample.sample_categorical` function (which we have used many times before) to obtain a sample from a discrete probability distribution.
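
This procedure is often called ancestral sampling. The standalone sketch below pairs it with Kahn's algorithm as one way to compute a topological order; the course's `topological_order` method may be implemented differently. `sample_categorical_sketch` is a hypothetical stand-in for `random_sample.sample_categorical`, assumed (as in the `sample` method above) to return the index of the sampled value.

```python
import random
from collections import deque


def topological_order(parents):
    """Kahn's algorithm; parents[i] lists the parent indices of vertex i."""
    n = len(parents)
    children = [[] for _ in range(n)]
    in_degree = [len(p) for p in parents]
    for i, ps in enumerate(parents):
        for j in ps:
            children[j].append(i)
    # Start from the vertices with no parents; release a child once all its parents are placed.
    queue = deque(i for i in range(n) if in_degree[i] == 0)
    order = []
    while queue:
        j = queue.popleft()
        order.append(j)
        for i in children[j]:
            in_degree[i] -= 1
            if in_degree[i] == 0:
                queue.append(i)
    assert len(order) == n, "graph has a cycle"
    return order


def sample_categorical_sketch(probabilities):
    """Hypothetical stand-in for random_sample.sample_categorical: returns an index."""
    u = random.random()
    cumulative = 0.0
    for k, p in enumerate(probabilities):
        cumulative += p
        if u < cumulative:
            return k
    return len(probabilities) - 1  # guard against rounding in the cumulative sum


def ancestral_sample(parents, cpds):
    """cpds[i] maps tuples of encoded parent values to a distribution over variable i."""
    values = [None] * len(parents)
    for i in topological_order(parents):
        # Every parent of i already has a value, since it comes earlier in the order.
        key = tuple(values[j] for j in parents[i])
        values[i] = sample_categorical_sketch(cpds[i][key])
    return values


# Simple network with flight_status deliberately placed at index 0, so plain
# vertex order would not work: 0 = flight_status, 1 = airline, 2 = weather.
parents = [[1, 2], [], []]
cpds = [
    # flight_status, keyed by (encoded airline, encoded weather)
    {(0, 0): [0.8, 0.2], (0, 1): [0.5, 0.5], (0, 2): [0.1, 0.9],
     (1, 0): [0.9, 0.1], (1, 1): [0.4, 0.6], (1, 2): [0.2, 0.8]},
    {(): [0.7, 0.3]},       # airline: United, Delta
    {(): [0.5, 0.3, 0.2]},  # weather: sun, rain, snow
]
print(topological_order(parents))  # [1, 2, 0]
random.seed(0)
print(ancestral_sample(parents, cpds))  # three encoded values, one per variable
```

Visiting vertices in index order here would try to sample `flight_status` before its parents have values, so the key lookup would fail; the topological order rules that out.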

In [10]:
# test for sample (simple)
random.seed(42)
assert simple_network.sample() == ('United', 'rain', 'on-time')
random.seed(1)
assert simple_network.sample() == ('Delta', 'sun', 'on-time')
print("SUCCESS: sample (simple) passed all tests")

SUCCESS: sample (simple) passed all tests


In [11]:
# test for sample (lac operon)
random.seed(42)
assert lac_operon_network.sample() == ('absent', 'present', 'absent', 'present', 'false', 'true', 'low')
random.seed(1)
assert lac_operon_network.sample() == ('absent', 'present', 'present', 'present', 'false', 'false', 'absent')
print("SUCCESS: sample (lac operon) passed all tests")

SUCCESS: sample (lac operon) passed all tests
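
The seeded tests above pin down individual samples. A complementary sanity check is to compare sample frequencies with an exact marginal. The sketch below is independent of the class and of `random_sample`: it samples the simple network with the standard library's `random.choices` and compares the fraction of on-time flights with P(on-time), obtained by summing the joint over airline and weather (0.582 for these CPDs).

```python
import random

# Simple network CPDs (same numbers as above), keyed by value names
airline_cpd = {"United": 0.7, "Delta": 0.3}
weather_cpd = {"sun": 0.5, "rain": 0.3, "snow": 0.2}
on_time_cpd = {("United", "sun"): 0.8, ("United", "rain"): 0.5, ("United", "snow"): 0.1,
               ("Delta", "sun"): 0.9, ("Delta", "rain"): 0.4, ("Delta", "snow"): 0.2}


def sample_simple():
    # Parents first (airline, weather), then the child: a topological order.
    airline = random.choices(list(airline_cpd), weights=list(airline_cpd.values()))[0]
    weather = random.choices(list(weather_cpd), weights=list(weather_cpd.values()))[0]
    on_time = random.random() < on_time_cpd[(airline, weather)]
    return airline, weather, "on-time" if on_time else "delayed"


# Exact marginal P(on-time): sum the joint over airline and weather.
exact = sum(airline_cpd[a] * weather_cpd[w] * on_time_cpd[(a, w)]
            for a in airline_cpd for w in weather_cpd)

random.seed(0)
n = 100_000
estimate = sum(sample_simple()[2] == "on-time" for _ in range(n)) / n
print(round(exact, 3))  # 0.582
print(estimate)         # close to 0.582; the exact digits depend on the seed
```

With 100,000 samples the standard error of the estimate is roughly 0.0016, so a discrepancy of more than about 0.01 would point to a bug in the sampler rather than chance.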
