# Cluster Graph Belief Propagation

## About this notebook

This notebook demonstrates the cluster graph belief propagation algorithm, implemented in Python 3.5 **without any libraries for probability calculations** so that no important step of the information processing is hidden. The [pandas library](http://pandas.pydata.org/) is used to model tabular distributions in general, and additional functions are added to provide the computations that are relevant for probability theory.



## Modeling Distributions

The easiest way to represent distributions for this example is as joint distribution tables over a set of variables. The values do not have to be probabilities, since the algorithm also works for Markov random fields. Below is an example of what such a table over the variables A and B could look like.

| a | b | $\phi$(A=a, B=b) |
|---|---|---          |
| $a_0$ | $b_0$ | 10  |
| $a_0$ | $b_1$ | 0.1 |
| $a_1$ | $b_0$ | 0.1 |
| $a_1$ | $b_1$ | 5   |
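
As a minimal sketch, the table above could be held in a pandas DataFrame with one column per variable and a `value` column. The names `phi_ab`, `total` and `normalized` are illustrative assumptions and not part of the notebook. Summing the entries shows that they are not normalized, which is acceptable for a factor of a Markov random field.

```python
import pandas as pd

# One row per joint assignment of A and B; 'value' holds phi(A=a, B=b).
phi_ab = pd.DataFrame({
    'A': [0, 0, 1, 1],
    'B': [0, 1, 0, 1],
    'value': [10, 0.1, 0.1, 5],
})

# The entries sum to roughly 15.2 rather than 1, so this is an
# unnormalized factor and not a probability distribution.
total = phi_ab['value'].sum()

# Dividing every entry by the total would yield a joint probability table.
normalized = phi_ab.assign(value=phi_ab['value'] / total)
print(normalized)
```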

For this example, we use the following (binary) variables:

In [13]:
variables = {
    'A': [0, 1],
    'B': [0, 1],
    'C': [0, 1],
    'D': [0, 1]
}

### Creating Tables
To create and work with several of these tables, we define a function that creates an empty table over a set of variables.

In [23]:
import pandas as pd

def make_empty_table(variables, subset_keys = None, fill_value = 0):
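    """Creates an empty tabular distribution with rows for all combinations of variables.
    
    Args:
        variables (dict): maps every variable name to the list of its values
        subset_keys (list): names of the variables to include in the table;
            None includes all variables, [] yields a table with a single entry
        fill_value: value that every entry is initialized with
        
    Returns:
        pandas.DataFrame: one column per included variable plus a 'value' column
    """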
    # if list of variables to pick is set but empty: return table with only one entry
    if subset_keys == []:
        return pd.DataFrame(fill_value, index = [0], columns = ['value'])
    
    # filter variable subset
    if subset_keys:
        variables = {key: variables[key] for key in subset_keys}
    
    # create a new pandas dataframe with rows for every combination of variable values in the subset
    # (by taking the cartesian product)
    varnames = sorted(variables.keys())
    i = pd.MultiIndex.from_product([variables[var] for var in varnames], names = varnames)
    return pd.DataFrame(fill_value, index = i, columns = ['value']).reset_index()
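
The row layout that results from the cartesian product can be sketched with the standard library alone. The snippet is an illustration and not part of the notebook's implementation; `rows` and `expected_count` are names chosen for this example. It enumerates every combination of values for the sorted variable names, so the number of rows equals the product of the variables' cardinalities.

```python
from itertools import product

variables = {'A': [0, 1], 'B': [0, 1], 'C': [0, 1], 'D': [0, 1]}

# Sort the names so that the column order is deterministic.
varnames = sorted(variables)

# Every tuple is one joint assignment; the last variable changes fastest,
# which is the same order pd.MultiIndex.from_product produces.
rows = list(product(*(variables[name] for name in varnames)))

# The row count is the product of the number of values per variable.
expected_count = 1
for name in varnames:
    expected_count *= len(variables[name])

print(varnames)
print(rows[:3])
print(len(rows), expected_count)
```

For four binary variables this gives 2 · 2 · 2 · 2 = 16 rows, so a joint table grows exponentially with the number of variables it covers.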

Because the tables are built on pandas DataFrames, some more complex use cases are supported as well. Here are some examples.

Note: the numbers in the first column are the row indices from pandas.

In [29]:
make_empty_table(variables, ['A', 'B'])

|   | A | B | value |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 1 | 0 |


In [30]:
make_empty_table(
    variables = {
        'A': ['a0', 'a1', 'a2'],
        'B': [0, 1],
        'C': [0, 1, 2]},
    subset_keys = ['A', 'B'],
    fill_value = 42)

|   | A | B | value |
|---|---|---|---|
| 0 | a0 | 0 | 42 |
| 1 | a0 | 1 | 42 |
| 2 | a1 | 0 | 42 |
| 3 | a1 | 1 | 42 |
| 4 | a2 | 0 | 42 |
| 5 | a2 | 1 | 42 |


In [33]:
make_empty_table(variables, [], 1)

|   | value |
|---|---|
| 0 | 1 |
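
One way to read the empty-subset case: the cartesian product over no variables contains exactly one element, the empty assignment, so a table over no variables has a single row. The following standard-library check is a sketch of that reasoning and is independent of the function above; `empty_assignments` is a name chosen for this example.

```python
from itertools import product

# The product over zero value lists yields one empty tuple, not zero tuples.
empty_assignments = list(product())
print(empty_assignments)
print(len(empty_assignments))
```

With a fill value of 1, such a single-entry table would behave as a neutral element if tables are multiplied entry by entry, which is presumably why 1 is passed in the call above.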
