# Cluster Graph Belief Propagation

## About this Notebook

This notebook demonstrates the cluster graph belief propagation algorithm, implemented in Python 3.5 **without any libraries for probability calculations** so that no important processing step is hidden. The [pandas library](http://pandas.pydata.org/) is used only to model tabular distributions; additional functions provide the computations that are relevant for probability theory.

The code here does not define any classes. Instead, it relies on Python's basic functional programming features to emphasize the data transformations performed by the algorithm. This notebook is therefore not a library, but a tutorial on how Cluster Graph Belief Propagation works in practice and how it could be implemented with basic Python code.

Since the functional approach calls *reduce()* several times and *reduce()* is no longer a built-in function in Python 3, we import it from *functools*.

In [15]:
from functools import reduce

## Modeling Factors

The easiest way to represent factors for this example is as distribution tables over a set of variables. The values do not have to be probabilities, since the algorithm also works for Markov random fields. Below is an example of what such a table over the variables A and B could look like.

| a | b | $\phi$(A=a, B=b) |
|---|---|---          |
| $a_0$ | $b_0$ | 10  |
| $a_0$ | $b_1$ | 0.1 |
| $a_1$ | $b_0$ | 0.1 |
| $a_1$ | $b_1$ | 5   |
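
As a minimal sketch, the example factor above could be written down directly as a pandas dataframe. The layout assumed here (one column per variable plus a column named `value`) and the name `phi_ab` are illustrative choices for this sketch.

```python
import pandas as pd

# The example factor phi(A, B): one row per joint assignment of A and B.
# The column name 'value' and the variable name phi_ab are assumptions
# made for this illustration.
phi_ab = pd.DataFrame({
    'A': [0, 0, 1, 1],
    'B': [0, 1, 0, 1],
    'value': [10.0, 0.1, 0.1, 5.0],
})

# The entries are unnormalized: they are not required to sum to 1.
total = phi_ab['value'].sum()
print(phi_ab)
print(total)
```

The large values for equal assignments express that A and B prefer to agree, without the table being a probability distribution.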

### Variables

For this example, we use the following discrete (in this case binary) random variables:

In [1]:
variables = {
    'A': [0, 1],
    'B': [0, 1],
    'C': [0, 1],
    'D': [0, 1]
}

### Creating Tables
To create and work with such tables, we define a function that creates an empty table over a set of variables. The additional parameters will come in handy later.

In [3]:
import pandas as pd

def make_empty_table(variables, subset_keys = None, fill_value = 0):
    """Creates an empty table-dataframe with rows for all variables.
    
        variables (dict): lists of values for every variable
        subset_keys: list of variables to include in the table
        fill_value: initialization value
        
        returns: empty table-dataframe with a row for every variable combination
    """
    
    # if the subset list is explicitly empty: return a table with a single entry
    if subset_keys == []:
        return pd.DataFrame(fill_value, index = [0], columns = ['value'])
    
    # filter variable subset
    if subset_keys:
        variables = {key: variables[key] for key in subset_keys}
    
    # create a new pandas dataframe
    # one row for every combination of variable values in the subset
    # (by taking the cartesian product)
    varnames = sorted(variables.keys())
    varvalues = [variables[var] for var in varnames]
    i = pd.MultiIndex.from_product(varvalues, names = varnames)
    df = pd.DataFrame(fill_value, index = i, columns = ['value']).reset_index()
    return df
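
To illustrate the cartesian product used above without pandas, the same row enumeration can be sketched with `itertools.product` from the standard library. The variable set below is a made-up example, and the names `rows` and `expected_rows` are only used for this sketch.

```python
from itertools import product

# A hypothetical variable set with different cardinalities
variables = {'A': [0, 1], 'B': [0, 1], 'C': [0, 1, 2]}

# Sort the names so the column order is deterministic
varnames = sorted(variables)

# One tuple per combination of values; the last variable varies fastest
rows = list(product(*(variables[v] for v in varnames)))

# The number of rows is the product of the variable cardinalities
expected_rows = 1
for v in varnames:
    expected_rows *= len(variables[v])

print(len(rows), expected_rows)
print(rows[:4])
```

The table size grows multiplicatively with every added variable, which is why factors are usually kept over small subsets of the variables.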

Because the tables are ordinary pandas dataframes, the function also supports more complex use cases. Here are some examples.

Note: the numbers in the first column are the row indices that pandas generates automatically.

In [4]:
make_empty_table(variables, ['A', 'B'])

|   | A | B | value |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 1 | 0 |


In [5]:
make_empty_table(
    variables = {
        'A': ['a0', 'a1', 'a2'],
        'B': [0, 1],
        'C': [0, 1, 2]}, 
    subset_keys =['A', 'B'],
    fill_value = 42)

|   | A | B | value |
|---|---|---|---|
| 0 | a0 | 0 | 42 |
| 1 | a0 | 1 | 42 |
| 2 | a1 | 0 | 42 |
| 3 | a1 | 1 | 42 |
| 4 | a2 | 0 | 42 |
| 5 | a2 | 1 | 42 |


In [6]:
make_empty_table(variables, [], 1)

|   | value |
|---|---|
| 0 | 1 |


### Accessing Values

Typical queries on distributions look like **P(A=0, B=1)**, but pandas dataframes are not designed for such descriptive access. To make reading and writing values more intuitive, we need additional wrapper functions.

#### Reading Variable Names

The first step is to access the actual random variables in a table-dataframe:

In [7]:
def column_varnames(columns):
    """Extract the variable names from a columns object.
    
        columns: a dataframe-columns object
        
        returns: a list of variable names
    """
    
    return sorted([var for var in columns if var != 'value'])



def table_varnames(tab):
    """Extract the variable names from a table-dataframe.
    
        tab: a table-dataframe
        
        returns: a list of variable names
    """
    
    return column_varnames(tab.columns)

Here are some examples of the usage:

In [9]:
tmp_tab = make_empty_table(variables, ['A', 'B'])
tmp_tab

|   | A | B | value |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 1 | 0 |


In [10]:
table_varnames(tmp_tab)

['A', 'B']

In [11]:
tmp_tab.columns

Index(['A', 'B', 'value'], dtype='object')

In [12]:
column_varnames(tmp_tab.columns)

['A', 'B']

#### Selecting Rows

The next step is to obtain a reference to the matching values in the pandas dataframe. We want to access all rows that contain a given assignment, so we first need to determine for every row whether to include it. Since a variable assignment is naturally a set of key-value pairs, we represent it as a dictionary.

In [18]:
def get_row_bools(tab, assignment):
    """For a given table and assignment, check which rows fit.
    
        tab: table-dataframe
        assignment (dict): variable assignment
        
        returns: table with booleans, indicating if the rows fit the assignment
    """
    
    # look at the table column-wise and check for every column, which rows to
    # select for the given assignment
    row_bools_each_column = [
        tab[v] == assignment[v]
        for v in table_varnames(tab)
        if v in assignment.keys()
    ]
    
    # reduce the list for all columns with AND, yields the booleans for the
    # complete assignment
    return reduce(lambda x,y: x & y, row_bools_each_column)

Again, here is an example of what the function returns for a given table:

In [19]:
tmp_tab

|   | A | B | value |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 1 | 0 |


In [20]:
get_row_bools(tmp_tab, assignment={'A': 0, 'B': 1})

0    False
1     True
2    False
3    False
dtype: bool

It is also possible to specify incomplete assignments, resulting in a set of rows being selected:

In [22]:
get_row_bools(tmp_tab, assignment={'A': 0})

0     True
1     True
2    False
3    False
Name: A, dtype: bool
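
One edge case is worth noting: if the assignment mentions none of the table's variables, the list of column masks is empty and *reduce()* without an initial value raises a `TypeError`. A minimal sketch of one possible workaround follows; the function name `get_row_bools_safe` is hypothetical and not part of the notebook, and selecting all rows in that case is an assumption about the desired behaviour.

```python
from functools import reduce
import pandas as pd

# A small stand-in table with the same layout as tmp_tab
tab = pd.DataFrame({
    'A': [0, 0, 1, 1],
    'B': [0, 1, 0, 1],
    'value': [0, 0, 0, 0],
})

def get_row_bools_safe(tab, assignment):
    """Hypothetical variant: an assignment without any matching
    variable selects every row instead of raising an error."""
    varnames = sorted(v for v in tab.columns if v != 'value')
    masks = [tab[v] == assignment[v] for v in varnames if v in assignment]
    # An all-True mask serves as the initial value (neutral element of AND)
    all_rows = pd.Series(True, index=tab.index)
    return reduce(lambda x, y: x & y, masks, all_rows)

partial = get_row_bools_safe(tab, {'A': 0}).tolist()
empty = get_row_bools_safe(tab, {}).tolist()
print(partial)  # [True, True, False, False]
print(empty)    # [True, True, True, True]
```

Note that the returned series no longer carries the column name of a single mask, so its printed form differs slightly from the output shown above.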

#### Actual Access

Now we can define a function that writes a value for a given variable assignment. It uses the pandas label indexer **.loc[]**, which also accepts a boolean mask for selecting rows.

In [28]:
def set_assignment(tab, assignment, value):
    """Write values in table for variable assignment.
        
        tab: the table-dataframe
        assignment (dict): the variable assignment
        value: the value to put into all selected rows
    """
    tab.loc[get_row_bools(tab, assignment), 'value'] = value
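
To show what happens inside this function, here is a minimal standalone sketch of boolean-mask indexing with `.loc`, written without the helper functions. The names `tab`, `mask` and `selected` are only used for this illustration. The same indexer can also be used to read the selected values back.

```python
import pandas as pd

# A small stand-in table over A and B, initialized with zeros
tab = pd.DataFrame({
    'A': [0, 0, 1, 1],
    'B': [0, 1, 0, 1],
    'value': [0, 0, 0, 0],
})

# Boolean mask for the assignment A=0, B=1 (one entry per row)
mask = (tab['A'] == 0) & (tab['B'] == 1)

# Writing: .loc[row_mask, column] assigns to all selected rows in place
tab.loc[mask, 'value'] = 42

# Reading: the same indexer returns the selected values as a Series
selected = tab.loc[mask, 'value']
print(tab['value'].tolist())  # [0, 42, 0, 0]
print(selected.tolist())      # [42]
```

Because the assignment happens in place, the function above returns nothing and the caller's table is modified directly.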

In [29]:
tmp_tab

|   | A | B | value |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 1 | 0 |


In [30]:
set_assignment(tmp_tab, {'A': 0, 'B': 1}, 42)
tmp_tab

|   | A | B | value |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 42 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 1 | 0 |


In [31]:
set_assignment(tmp_tab, {'A': 1}, -1)
tmp_tab

|   | A | B | value |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 42 |
| 2 | 1 | 0 | -1 |
| 3 | 1 | 1 | -1 |
