## Probability Distribution Tables Representation and Inference

**COMP9418-19T3, W02 Tutorial**

- Instructor: Gustavo Batista
- School of Computer Science and Engineering, UNSW Sydney
- Questions by Gustavo Batista with inputs of Jeremy Gillen
- Last update: 22nd September 2019, 18:00
$$
% macros
\newcommand{\indep}{\perp \!\!\!\perp}
$$

In this week's tutorial, we will design a data structure for probability table representation and implement four operations over this representation. This code will be used in the next tutorials to perform inference over graphical models.

## Technical prerequisites

You will need specific packages installed to run this notebook.

If you are using ``conda``'s default
[full installation](https://www.anaconda.com/distribution),
these requirements should all be satisfied already.

We will use the [tabulate](https://pypi.org/project/tabulate/) library to print probability tables for debugging. If you don't have it installed, use the command ```pip install tabulate```.

Let's import some useful modules for later use.

In [None]:
# Make division default to floating-point, saving confusion
from __future__ import division
from __future__ import print_function

# ordered dictionaries are useful for keeping ordered sets of variables
from collections import OrderedDict as odict
# combinatorics
from itertools import product
# table formatting for screen output
from tabulate import tabulate

## Representing probability tables

We will represent the distributions of variables using probability tables. For example, here is the joint distribution of three random variables, $X$, $Y$, and $Z$, each taking values in $\{0,1\}$.

  | X | Y | Z | p(X,Y,Z) |
  |---|---|---|----------|
  | 0 | 0 | 0 | 0 | 
  | 0 | 0 | 1 | 1/12 | 
  | 0 | 1 | 0 | 1/12 | 
  | 0 | 1 | 1 | 1/6 | 
  | 1 | 0 | 0 | 1/12 | 
  | 1 | 0 | 1 | 1/6 | 
  | 1 | 1 | 0 | 1/6 | 
  | 1 | 1 | 1 | 1/4 | 

Another example is a table that represents a conditional distribution for, say, $p(Z|X,Y)$.

  | X | Y | Z | p(Z &#124; X,Y)        |
  |---|---|---|---------------------------|
  | 0 | 0 | 0 | 0 | 
  | 0 | 0 | 1 | 1 | 
  | 0 | 1 | 0 | 1/3 | 
  | 0 | 1 | 1 | 2/3 | 
  | 1 | 0 | 0 | 1/3 | 
  | 1 | 0 | 1 | 2/3 | 
  | 1 | 1 | 0 | 2/5 | 
  | 1 | 1 | 1 | 3/5 | 
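
The two tables above are consistent with each other: dividing each joint entry by the marginal $p(X,Y)$ recovers the conditional table. The following is a minimal sketch of that check. The names `joint` and `conditional` are illustrative and not part of the tutorial code, and `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction as F
from itertools import product

# Joint table p(X,Y,Z) from above, keyed by (x, y, z)
joint = {
    (0, 0, 0): F(0),     (0, 0, 1): F(1, 12),
    (0, 1, 0): F(1, 12), (0, 1, 1): F(1, 6),
    (1, 0, 0): F(1, 12), (1, 0, 1): F(1, 6),
    (1, 1, 0): F(1, 6),  (1, 1, 1): F(1, 4),
}

conditional = {}
for x, y in product((0, 1), repeat=2):
    # p(x, y) is obtained by summing the joint over both values of Z
    p_xy = joint[(x, y, 0)] + joint[(x, y, 1)]
    for z in (0, 1):
        # p(z | x, y) = p(x, y, z) / p(x, y)
        conditional[(x, y, z)] = joint[(x, y, z)] / p_xy

for key, value in conditional.items():
    print(key, value)
```

Each printed value should agree with the corresponding row of the $p(Z|X,Y)$ table.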

We will use the term **factor** to denote a probability table, joint or conditional. 

The natural question is how we represent tables like these in Python. One possible convention is to store these tables in dictionaries. However, we should note that to define a factor table completely, we need to specify two pieces of information:

1. The domain of the factor, i.e., which variables belong to the factor;

2. The probabilities associated with each possible combination of variable values in the factor domain.

Therefore, we will use a nested dictionary with two keys. The key 'dom' maps to a tuple of the variables in the factor domain, and the key 'table' maps to an ordered dictionary holding the probability table, keyed by tuples of variable values.

For instance, the table:

| S      | P(S)   |
|:------:|:------:|
| summer | 0.5    |
| winter | 0.5    |

is represented as:

In [None]:
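s_prob = {
    'dom': ('S',),       # one-element tuple; without the trailing comma this would be the string 'S'
    'table': odict([
        ((0,), 0.5),     # S = summer
        ((1,), 0.5),     # S = winter
    ])
}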

Likewise, the table

|   S    | T    | P(T&#124;S)    |
|:------:|:----:|:---------:|
| summer |  hot | 0.7       |
| summer | cold | 0.3       |
| winter | hot  | 0.3       |
| winter | cold | 0.7       |

is represented as:

In [None]:
t_prob = {
    'dom': ('S', 'T'), 
    'table': odict([
        ((0, 0), 0.7),
        ((0, 1), 0.3),
        ((1, 0), 0.3),
        ((1, 1), 0.7),    
    ])
}

### Exercise

It is your turn. Represent the table:

|  S     |  T   |  W   | P(W&#124;S,T)   |
|:------:|:----:|:----:|:----------:|
| summer |  hot |  sun |   0.86     |
| summer |  hot | rain |   0.14     |
| summer | cold |  sun |   0.67     |
| summer | cold | rain |   0.33     |
| winter |  hot |  sun |   0.67     |
| winter |  hot | rain |   0.33     |
| winter | cold |  sun |   0.43     |
| winter | cold | rain |   0.57     |


In [None]:
w_prob = {
    'dom': None,       # Insert the factor domain here: 1 line
    'table': odict([
                       # Insert the probability values here: 8 lines
    ])
}

In [None]:
#Answer

w_prob = {
    'dom': ('S', 'T', 'W'), 
    'table': odict([
        ((0, 0, 0), 0.86),
        ((0, 0, 1), 0.14),
        ((0, 1, 0), 0.67),
        ((0, 1, 1), 0.33),
        ((1, 0, 0), 0.67),
        ((1, 0, 1), 0.33),
        ((1, 1, 0), 0.43),
        ((1, 1, 1), 0.57),
    ])
}

We also need to specify the domain of each variable. Since this information does not belong to any particular table, but to all variables together, we will use a separate dictionary to store it.

In [None]:
outcomeSpace = dict(
    S=(0,1),
    T=(0,1),
    W=(0,1),
)

Notice that the variable domains do not need to be restricted to the values 0 and 1. We can specify larger domains with more values, or even strings (such as 'summer' and 'winter', as in the exercise). We are using binary values for now since they are more convenient to type.
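
As an illustration of a non-binary encoding, the sketch below stores $P(T|S)$ with string values. The names `outcomeSpaceStr` and `t_prob_str` are hypothetical and are not used elsewhere in this tutorial.

```python
from collections import OrderedDict as odict

# Hypothetical string-valued outcome space (the tutorial itself uses 0/1)
outcomeSpaceStr = dict(
    S=('summer', 'winter'),
    T=('hot', 'cold'),
)

# P(T|S) keyed by the string values instead of 0/1
t_prob_str = {
    'dom': ('S', 'T'),
    'table': odict([
        (('summer', 'hot'), 0.7),
        (('summer', 'cold'), 0.3),
        (('winter', 'hot'), 0.3),
        (('winter', 'cold'), 0.7),
    ])
}

# Table keys are tuples of values, in the same order as 'dom'
print(t_prob_str['table'][('winter', 'cold')])
```

The only requirement is that the keys of the table use the same values that appear in the outcome space.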

We need to implement four basic operations over factors:
    
1. Factor join: given two factors $f_1$ and $f_2$ with at least one variable in common, join these factors, creating a new factor $f$. The domain of the new factor has all variables in $dom(f_1) \cup dom(f_2)$.

2. Factor marginalization: given a factor $f$, eliminate one variable $v \in dom(f)$ by summing over all values of $v$.

3. Evidence observation: given a variable $X$ and a value $x$, set the evidence $X=x$. This means that the variable $X$ has been observed as having the value $x$. Consequently, the join and marginalization operations will restrict themselves to $x$ and ignore the remaining values of $X$.

4. Factor normalization: normalize the entries in a given factor so that all entries sum up to one.
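
In symbols, the four operations can be sketched as follows. The notation is an assumption made for this summary only: $\mathbf{b}$ denotes the values of the variables shared by $f_1$ and $f_2$, $\mathbf{a}$ and $\mathbf{c}$ the values of the variables that appear in only one of them, and $\mathbf{u}$ the values of the variables in $dom(f)$ other than $v$ (or all of $dom(f)$ in the normalization).

$$
\begin{aligned}
\text{join:} \quad & f(\mathbf{a}, \mathbf{b}, \mathbf{c}) = f_1(\mathbf{a}, \mathbf{b}) \, f_2(\mathbf{b}, \mathbf{c}) \\
\text{marginalization:} \quad & f'(\mathbf{u}) = \sum_{v} f(v, \mathbf{u}) \\
\text{evidence } X = x\text{:} \quad & \text{only entries with } X = x \text{ are enumerated} \\
\text{normalization:} \quad & f'(\mathbf{u}) = \frac{f(\mathbf{u})}{Z}, \qquad Z = \sum_{\mathbf{u}'} f(\mathbf{u}')
\end{aligned}
$$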

Before we start, let's define a function to print out factors nicely to the screen. This function will help us to debug our code. For this task, we will use the [tabulate library](https://pypi.org/project/tabulate/).

In [None]:
def printFactor(f):
    """
    argument 
    `f`, a factor to print on screen
    """
    # Create an empty list that we will fill in with the probability table entries
    table = list()
    
    # Iterate over all keys and probability values in the table
    for key, item in f['table'].items():
        # Convert the tuple to a list to be able to manipulate it
        k = list(key)
        # Append the probability value to the list with key values
        k.append(item)
        # Append an entire row to the table
        table.append(k)
    # dom is used as table header. We need it converted to list
    dom = list(f['dom'])
    # Append a 'Pr' to indicate the probability column
    dom.append('Pr')
    print(tabulate(table,headers=dom,tablefmt='orgtbl'))
    
#####################################
# Test code
#####################################

printFactor(w_prob)

Let's start with a helper function that will simplify our code. The subroutine **prob** will return the probability associated with a given factor entry.

### Exercise

It is your turn. Implement the following function:

In [None]:
def prob(factor, *entry):
    """
    argument 
    `factor`, a dictionary of domain and probability table,
    `entry`, a list of values, one for each variable, in the same order as specified in the factor domain.
    
    Returns p(entry)
    """

    return None                      # insert your code here, 1 line

#####################################
# Test code
#####################################

print(prob(t_prob, 0,0))
print(prob(t_prob, 0,1))
print(prob(t_prob, 1,0))
print(prob(t_prob, 1,1))

In [None]:
#Answer

def prob(factor, *entry):
    """
    argument 
    `factor`, a dictionary of domain and probability values,
    `entry`, a list of values, one for each variable in the same order as specified in the factor domain.
    
    Returns p(entry)
    """

    return factor['table'][entry]     # insert your code here, 1 line

#####################################
# Test code
#####################################

print(prob(t_prob, 0,0))
print(prob(t_prob, 0,1))
print(prob(t_prob, 1,0))
print(prob(t_prob, 1,1))

If you implemented the prob function correctly, you should see the following output:

```
0.7
0.3
0.3
0.7
```

## Observing Evidence

Observing a value $x$ for a variable $X$ will limit the join and marginalization operations. These operations will only iterate over the observed value of $X$, ignoring the remaining ones. 

To achieve such a result, we will use a simple trick: we will replace the domain tuple of the variable $X$ by a tuple with a single entry $(x)$. 
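
To see why this trick works, recall that the operations enumerate variable values with `itertools.product`. The sketch below is illustrative only; `outcome_space` and `observed` are names chosen for this example and are not the tutorial's `outcomeSpace`.

```python
from itertools import product

outcome_space = {'S': (0, 1), 'T': (0, 1)}

# Without evidence: every combination of S and T is enumerated
print(list(product(outcome_space['S'], outcome_space['T'])))

# With evidence S=1: the domain of S shrinks to the single-entry tuple (1,)
observed = dict(outcome_space)   # shallow copy, the original stays untouched
observed['S'] = (1,)
print(list(product(observed['S'], observed['T'])))
```

The second enumeration contains only the combinations in which `S` equals 1, so any loop built on it ignores the unobserved values automatically.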

Let's implement the evidence function. We will avoid messing with the outcomeSpace dictionary defined above by creating a copy of this dictionary. We will modify and return the copy.

### Exercise

It is your turn. Implement the evidence function.

In [None]:
def evidence(var, e, outcomeSpace):
    """
    argument 
    `var`, a valid variable identifier.
    `e`, the observed value for var.
    `outcomeSpace`, dictionary with the domain of each variable
    
    Returns dictionary with a copy of outcomeSpace with var = e
    """
    newOutcomeSpace = None              # Make a copy of outcomeSpace by calling its copy() method. 1 line
    newOutcomeSpace[var] = None         # Replace the domain of variable var with a tuple with a single element e. 1 line
    return newOutcomeSpace

#####################################
# Test code
#####################################

print(evidence('S', 0, outcomeSpace))
print(evidence('T', 1, outcomeSpace))
print(evidence('W', 0, outcomeSpace))

In [None]:
# Answer

def evidence(var, e, outcomeSpace):
    """
    argument 
    `var`, a valid variable identifier.
    `e`, the observed value for var.
    `outcomeSpace`, dictionary with the domain of each variable
    
    Returns dictionary with a copy of outcomeSpace with var = e
    """    
    newOutcomeSpace = outcomeSpace.copy()      # Make a copy of outcomeSpace by calling its copy() method. 1 line
    newOutcomeSpace[var] = (e,)                # Replace the domain of variable var with a tuple with a single element e. 1 line
    return newOutcomeSpace

#####################################
# Test code
#####################################

print(evidence('S', 0, outcomeSpace))
print(evidence('T', 1, outcomeSpace))
print(evidence('W', 0, outcomeSpace))

If you implemented your code correctly, you should see the following output:

```
{'S': (0,), 'T': (0, 1), 'W': (0, 1)}
{'S': (0, 1), 'T': (1,), 'W': (0, 1)}
{'S': (0, 1), 'T': (0, 1), 'W': (0,)}
```

## Factor Join Operation

The central operation of inference is factor multiplication, or join. This operation combines two factors into a single factor whose domain is the union of their domains. It must carefully match the values of the shared variables to produce the correct output.
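
As a small worked example of this matching, the sketch below multiplies $P(S)$ and $P(T|S)$ by hand using plain dictionaries. It is not the `join` function of this tutorial, only an illustration of the idea; the names `p_s`, `p_t_given_s` and `joint` are chosen for this example.

```python
from itertools import product

# P(S) keyed by (s,) and P(T|S) keyed by (s, t), as in the tables above
p_s = {(0,): 0.5, (1,): 0.5}
p_t_given_s = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.3, (1, 1): 0.7}

joint = {}
for s, t in product((0, 1), (0, 1)):
    # The shared variable S must take the same value in both factors
    joint[(s, t)] = p_s[(s,)] * p_t_given_s[(s, t)]

for key, value in joint.items():
    print(key, value)
```

Each entry of the result is the product of one entry from each factor, chosen so that both agree on the value of `S`.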

### Exercise

Let's implement the function join. We will provide most of the code for you. You will need to fill in a few gaps to complete the implementation. To simplify the code, we use the [product iterator provided by itertools](https://docs.python.org/2/library/itertools.html). In summary, this operator will generate all the possible combinations of variable values for a given factor.

In [None]:
def join(f1, f2, outcomeSpace):
    """
    argument 
    `f1`, first factor to be joined.
    `f2`, second factor to be joined.
    `outcomeSpace`, dictionary with the domain of each variable
    
    Returns a new factor with a join of f1 and f2
    """
    
    # First, we need to determine the domain of the new factor. It will be the union of the domains of f1 and f2
    # It is important to eliminate repetitions. The list comprehension also keeps the variable order deterministic
    common_vars = list(f1['dom']) + [var for var in f2['dom'] if var not in f1['dom']]
    
    # We will build a table from scratch, starting with an empty list. Later on, we will transform the list into an odict
    table = list()
    
    # Here is where the magic happens. The product iterator will generate all combinations of variable values 
    # as specified in outcomeSpace. Therefore, it will naturally respect observed values
    for entries in product(*[outcomeSpace[node] for node in common_vars]):
        
        # We need to map the entries to the domain of the factors f1 and f2
        entryDict = dict(zip(common_vars, entries))
        f1_entry = (entryDict[var] for var in f1['dom'])
        f2_entry = (entryDict[var] for var in f2['dom'])
        
        #########################
        # Insert your code here #
        #########################
        p1 = None        # Use the function prob to calculate the probability in factor f1 for entry f1_entry. 1 line.
        p2 = None        # Use the function prob to calculate the probability in factor f2 for entry f2_entry. 1 line.
        
        # Create a new table entry with the multiplication of p1 and p2
        table.append((entries, p1 * p2))
    return {'dom': tuple(common_vars), 'table': odict(table)}

#####################################
# Test code
#####################################

printFactor(join(s_prob, t_prob, outcomeSpace))
o = evidence('S', 1, outcomeSpace)
print()
printFactor(join(s_prob, t_prob, o))

In [None]:
# Answer

def join(f1, f2, outcomeSpace):
    """
    argument 
    `f1`, first factor to be joined.
    `f2`, second factor to be joined.
    `outcomeSpace`, dictionary with the domain of each variable
    
    Returns a new factor with a join of f1 and f2
    """
    
    # First, we need to determine the domain of the new factor. It will be the union of the domains of f1 and f2
    # It is important to eliminate repetitions. The list comprehension also keeps the variable order deterministic
    common_vars = list(f1['dom']) + [var for var in f2['dom'] if var not in f1['dom']]
    
    # We will build a table from scratch, starting with an empty list. Later on, we will transform the list into an odict
    table = list()
    
    # Here is where the magic happens. The product iterator will generate all combinations of variable values 
    # as specified in outcomeSpace. Therefore, it will naturally respect observed values
    for entries in product(*[outcomeSpace[node] for node in common_vars]):
        
        # We need to map the entries to the domain of the factors f1 and f2
        entryDict = dict(zip(common_vars, entries))
        f1_entry = (entryDict[var] for var in f1['dom'])
        f2_entry = (entryDict[var] for var in f2['dom'])
        
        # Insert your code here
        p1 = prob(f1, *f1_entry)           # Use the function prob to calculate the probability in factor f1 for entry f1_entry 
        p2 = prob(f2, *f2_entry)           # Use the function prob to calculate the probability in factor f2 for entry f2_entry 
        
        # Create a new table entry with the multiplication of p1 and p2
        table.append((entries, p1 * p2))
    return {'dom': tuple(common_vars), 'table': odict(table)}

#####################################
# Test code
#####################################

printFactor(join(s_prob, t_prob, outcomeSpace))
o = evidence('S', 1, outcomeSpace)
print()
printFactor(join(s_prob, t_prob, o))

If you implemented the join operation correctly, you should see the following output:

```
|   S |   T |   Pr |
|-----+-----+------|
|   0 |   0 | 0.35 |
|   0 |   1 | 0.15 |
|   1 |   0 | 0.15 |
|   1 |   1 | 0.35 |

|   S |   T |   Pr |
|-----+-----+------|
|   1 |   0 | 0.15 |
|   1 |   1 | 0.35 |
```

## Factor Marginalization Operation

Marginalization is the operation that eliminates a given variable $X$ from a factor $f$ by summing over all possible values of $X$ in $f$. The marginalize function will return a new factor $f'$ to avoid messing with existing factors. The new factor $f'$ will have the same domain as $f$, but with the elimination of the variable $X$ ($dom(f') = dom(f) - \{X\}$).
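
For a concrete picture, the sketch below sums out $S$ from the joint factor over $(S, T)$ obtained in the join section. It groups entries by the remaining variable, which is a different strategy from the `marginalize` function developed below but yields the same table; the names `joint` and `marginal_t` are chosen for this example.

```python
# Joint factor over (S, T), taken from the expected output of the join section
joint = {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.15, (1, 1): 0.35}

# Sum out S: entries that agree on T are added together
marginal_t = {}
for (s, t), p in joint.items():
    marginal_t[(t,)] = marginal_t.get((t,), 0) + p

for key, value in marginal_t.items():
    print(key, value)
```

The resulting factor has domain $\{T\}$ and each entry is the sum of the two joint entries that share the same value of $T$.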

### Exercise

Let's implement the marginalize function. We will provide most of the code, and you will fill in a few gaps.

In [None]:
def marginalize(f, var, outcomeSpace):
    """
    argument 
    `f`, factor to be marginalized.
    `var`, variable to be summed out.
    `outcomeSpace`, dictionary with the domain of each variable
    
    Returns a new factor f' with dom(f') = dom(f) - {var}
    """    
    
    # Let's make a copy of f domain and convert it to a list. We need a list to be able to modify its elements
    new_dom = list(f['dom'])
    
    #########################
    # Insert your code here #
    #########################
    None                           # Remove var from the list new_dom by calling the method remove(). 1 line
    table = None                   # Create an empty list for table. We will fill in table from scratch. 1 line
    for entries in product(*[outcomeSpace[node] for node in new_dom]):
        s = None                   # Initialize the summation variable s. 1 line
        
        
        # We need to iterate over all possible outcomes of the variable var
        for val in outcomeSpace[var]:
            # To modify the tuple entries, we will need to convert it to a list
            entriesList = list(entries)
            # We need to insert the value of var in the right position of entriesList
            entriesList.insert(f['dom'].index(var), val)
            

            #########################
            # Insert your code here #
            #########################
            
            p = None                             # Calculate the probability of factor f for entriesList. 1 line
            s = None                             # Sum over all values of var by accumulating the sum in s. 1 line
            
        # Create a new table entry with the accumulated sum s
        table.append((entries, s))
    return {'dom': tuple(new_dom), 'table': odict(table)}

#####################################
# Test code
#####################################

t_s_joint_prob = join(s_prob, t_prob, outcomeSpace)
f = marginalize(t_s_joint_prob, 'S', outcomeSpace)
printFactor(f)

In [None]:
# Answer

def marginalize(f, var, outcomeSpace):
    """
    argument 
    `f`, factor to be marginalized.
    `var`, variable to be summed out.
    `outcomeSpace`, dictionary with the domain of each variable
    
    Returns a new factor f' with dom(f') = dom(f) - {var}
    """    
    
    # Let's make a copy of f domain and convert it to a list. We need a list to be able to modify its elements
    new_dom = list(f['dom'])
    
    #########################
    # Insert your code here #
    #########################
    new_dom.remove(var)            # Remove var from the list new_dom by calling the method remove(). 1 line
    table = list()                 # Create an empty list for table. We will fill in table from scratch. 1 line
    for entries in product(*[outcomeSpace[node] for node in new_dom]):
        s = 0                      # Initialize the summation variable s. 1 line

        # We need to iterate over all possible outcomes of the variable var
        for val in outcomeSpace[var]:
            # To modify the tuple entries, we will need to convert it to a list
            entriesList = list(entries)
            # We need to insert the value of var in the right position of entriesList
            entriesList.insert(f['dom'].index(var), val)
            
            p = prob(f, *entriesList)     # Calculate the probability of factor f for entriesList. 1 line
            s = s + p                            # Sum over all values of var by accumulating the sum in s. 1 line
            
        # Create a new table entry with the accumulated sum s
        table.append((entries, s))
    return {'dom': tuple(new_dom), 'table': odict(table)}

#####################################
# Test code
#####################################

t_s_joint_prob = join(s_prob, t_prob, outcomeSpace)
f = marginalize(t_s_joint_prob, 'S', outcomeSpace)
printFactor(f)

If you implemented the marginalize operation correctly, you should see the following output:

```
|   T |   Pr |
|-----+------|
|   0 |  0.5 |
|   1 |  0.5 |
```

## Factor Normalization Operation

Factor normalization is useful when we make inferences using evidence, since the resulting factor may not sum to one. In that case, we renormalize the factor so that it represents a probability distribution. Normalization is a simple operation: we sum over all entries, obtaining the value $Z$, and divide each factor entry by $Z$.
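
As a worked example, take the factor obtained in the join section after observing $S=1$, whose entries are 0.15 and 0.35. The sketch below normalizes it with plain dictionaries; the names `unnormalized`, `Z` and `normalized` are chosen for this illustration.

```python
# Entries of the join of P(S) and P(T|S) restricted to the evidence S=1
unnormalized = {(1, 0): 0.15, (1, 1): 0.35}

# Z is the sum of all entries; dividing by Z makes the entries sum to one
Z = sum(unnormalized.values())
normalized = {key: p / Z for key, p in unnormalized.items()}

for key, p in normalized.items():
    print(key, round(p, 6))
```

Here $Z = 0.5$, and the normalized entries coincide with the winter rows of the $P(T|S)$ table, as expected for $P(T \mid S=1)$.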

### Exercise

It is your turn. This time you will code the normalization function entirely. We have provided a stub for you.

In [None]:
def normalize(f):
    """
    argument 
    `f`, factor to be normalized.
    
    Returns a new factor f' as a copy of f with entries that sum up to 1
    """ 
    return None

#####################################
# Test code
#####################################

o = evidence('W', 1, outcomeSpace)
t_s_joint_prob = join(s_prob, t_prob, o)
t_s_w_joint_prob = join(t_s_joint_prob, w_prob, o)
printFactor(normalize(t_s_w_joint_prob))

In [None]:
# Answer

def normalize(f):
    """
    argument 
    `f`, factor to be normalized.
    
    Returns a new factor f' as a copy of f with entries that sum up to 1
    """ 
    # z is the normalization constant: the sum of all entries in f
    z = sum(f['table'].values())
    # Build a new table so that f itself is left untouched
    table = [(k, p / z) for k, p in f['table'].items()]
    return {'dom': f['dom'], 'table': odict(table)}

#####################################
# Test code
#####################################

o = evidence('W', 1, outcomeSpace)
t_s_joint_prob = join(s_prob, t_prob, o)
t_s_w_joint_prob = join(t_s_joint_prob, w_prob, o)
printFactor(normalize(t_s_w_joint_prob))

The expected output for the normalize function in the test case is the following:

```
|   S |   T |   W |       Pr |
|-----+-----+-----+----------|
|   0 |   0 |   1 | 0.141007 |
|   0 |   1 |   1 | 0.142446 |
|   1 |   0 |   1 | 0.142446 |
|   1 |   1 |   1 | 0.574101 |
```

We have reached the end of this tutorial. We now have all the tools we need to start performing inference on graphical models. You can also use this code to check the results of your calculations in the theory part of this tutorial. For instance, in **Question 6**, we asked for the joint probability table $P(T,S,W)$. We can calculate this with a single line of Python code.

In [None]:
printFactor(join(join(t_prob, s_prob, outcomeSpace), w_prob, outcomeSpace))
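
As an independent cross-check of that table, the joint can also be computed directly from the chain rule $P(S,T,W) = P(S)\,P(T|S)\,P(W|S,T)$. The sketch below does not rely on the functions defined in this notebook; the dictionary names are chosen for this illustration and the numbers are copied from the tables above.

```python
from itertools import product

# Tables copied from the tutorial, keyed by value tuples
p_s = {(0,): 0.5, (1,): 0.5}
p_t_given_s = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.3, (1, 1): 0.7}
p_w_given_st = {
    (0, 0, 0): 0.86, (0, 0, 1): 0.14,
    (0, 1, 0): 0.67, (0, 1, 1): 0.33,
    (1, 0, 0): 0.67, (1, 0, 1): 0.33,
    (1, 1, 0): 0.43, (1, 1, 1): 0.57,
}

# Chain rule: P(S,T,W) = P(S) * P(T|S) * P(W|S,T)
joint = {
    (s, t, w): p_s[(s,)] * p_t_given_s[(s, t)] * p_w_given_st[(s, t, w)]
    for s, t, w in product((0, 1), repeat=3)
}

for key, p in joint.items():
    print(key, round(p, 4))

# A valid joint distribution sums to one (up to floating-point error)
print(abs(sum(joint.values()) - 1.0) < 1e-9)
```

If the values printed here differ from the output of the nested `join` call, one of the implementations above likely has a bug.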