# Approximate Inference

In this notebook, you will first work through the steps of three sampling algorithms for Bayesian networks: Forward Sampling, Likelihood-Weighted Sampling, and Gibbs Sampling. We will use <b>pgmpy</b> to build the model.

First of all, we need to import the necessary functions:

In [None]:
import numpy as np
import pandas as pd

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD, DiscreteFactor

## Setting up our model

We will use the classic (extended) student example, which has the following graph structure:

<img src="images/students_bn.png" style="width:200px" />

We need to encode the graph structure and the Conditional Probability Distributions (CPDs) of the model.

### Set the structure

First of all, we need to specify that we are constructing a Bayesian network and give its set of directed edges, as follows:

In [None]:
nodes = ['C', 'D', 'I', 'G', 'S', 'L', 'J', 'H']
G = BayesianModel([('D', 'G'), ('I', 'G'), ('I', 'S'), ('G', 'L'),
                  ('C','D'), ('G','H'), ('J','H'), ('S','J'), ('L','J')])
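
The edge list above fixes how the joint distribution factorizes: each variable contributes one conditional distribution given its parents. Written out for this graph (a direct reading of the edges, shown only as a reference for the CPDs defined next):

$$P(C,D,I,G,S,L,J,H) = P(C)\,P(D\mid C)\,P(I)\,P(G\mid D,I)\,P(S\mid I)\,P(L\mid G)\,P(J\mid L,S)\,P(H\mid G,J)$$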

### Set up the Conditional Probability Distribution families

Once the structure has been defined, we encode the respective CPDs as follows:

In [None]:
c_cpd = TabularCPD('C', 2, [[0.2], [0.8]])
d_cpd = TabularCPD('D', 2, [[0.3, 0.4], 
                            [0.7, 0.6]],
                   evidence=['C'], evidence_card=[2])
i_cpd = TabularCPD('I', 3, [[0.5], [0.3], [0.2]])
g_cpd = TabularCPD('G', 3, [[0.1, 0.2, 0.1, 0.1, 0.2, 0.3],
                            [0.1, 0.3, 0.3, 0.2, 0.2, 0.3],
                            [0.8, 0.5, 0.6, 0.7, 0.6, 0.4]],
                   evidence=['D', 'I'], evidence_card=[2, 3])
s_cpd = TabularCPD('S', 2, [[0.1, 0.2, 0.7],
                            [0.9, 0.8, 0.3]],
                   evidence=['I'], evidence_card=[3])
l_cpd = TabularCPD('L', 2, [[0.1, 0.4, 0.8],
                            [0.9, 0.6, 0.2]],
                   evidence=['G'], evidence_card=[3])
j_cpd = TabularCPD('J', 2, [[0.1, 0.5, 0.4, 0.6],
                            [0.9, 0.5, 0.6, 0.4]],
                   evidence=['L', 'S'], evidence_card=[2, 2])
h_cpd = TabularCPD('H', 3, [[0.7, 0.3, 0.5, 0.3, 0.2, 0.4],
                            [0.1, 0.3, 0.4, 0.4, 0.6, 0.3],
                            [0.2, 0.4, 0.1, 0.3, 0.2, 0.3]],
                   evidence=['G', 'J'], evidence_card=[3, 2])

G.add_cpds(c_cpd, d_cpd, i_cpd, g_cpd, s_cpd, l_cpd, j_cpd, h_cpd)
print('Is the model right?',G.check_model())

## Forward Sampling

Forward sampling takes advantage of an **ancestral ordering** of the Bayesian network to sample each CPD one by one. Following that order, whenever you need to sample a variable $X_i$, all of its parents $\mathbf{PA}_i$ have already been sampled (i.e., $\mathbf{pa}_i$ is known), so we just need to sample from the corresponding distribution, $P(X_i|\mathbf{pa}_i)$.

Thus, the first step will be to obtain an ancestral ordering:

In [None]:
"""
Find the ancestral ordering of a BN model
"""
def ancestral_ordering(model):
    nodes = model.nodes()
    ancestors = {}
    ordering = [] # the ordering will be a list
    # go through all the nodes
    for n in nodes:
        n_ancestors = model._get_ancestors_of(n) # returns ancestors + node n
        # if the node has no parent, we can introduce it directly into the ordering
        # if the node does have parent(s), we save the set of parents
        ############################
        ###### YOUR CODE HERE ######
        ############################
    while len(ancestors) >= 1: # while there are nodes left to be introduced into the ordering
        act_dict = ancestors.copy()
        for n,ancs in act_dict.items():
            # if all the stored predecessors of a node are already in the ordering, we can introduce it too
            if ancs.issubset(ordering):
                ordering.append(n)
                del ancestors[n]
                
    return ordering

Now, we can use this function to get an ancestral ordering for our model $G$:

In [None]:
ancestral_ordering(G)

Check it yourself!
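
One way to check the result: an ancestral ordering is a topological order of the DAG, so every parent must appear before each of its children. The sketch below is not the notebook's `ancestral_ordering`; it applies Kahn's algorithm to the plain edge list, and the helper name `topological_order` is an assumption made for this illustration.

```python
from collections import deque

# Edge list copied from the model definition above, as (parent, child) pairs.
edges = [('D', 'G'), ('I', 'G'), ('I', 'S'), ('G', 'L'),
         ('C', 'D'), ('G', 'H'), ('J', 'H'), ('S', 'J'), ('L', 'J')]

def topological_order(edges):
    """Return one ancestral (topological) ordering of a DAG given as (parent, child) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    children = {n: [] for n in nodes}
    n_parents = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        n_parents[child] += 1

    # Start from the root nodes (those without parents).
    ready = deque(n for n in nodes if n_parents[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # A child becomes ready once all of its parents have been placed.
        for child in children[node]:
            n_parents[child] -= 1
            if n_parents[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("the graph contains a cycle")
    return order

order = topological_order(edges)
position = {n: i for i, n in enumerate(order)}
# Validity check: every parent precedes each of its children.
is_valid = all(position[p] < position[c] for p, c in edges)
print(order, is_valid)
```

Several orderings are usually valid for the same graph, so the check on `position` is more informative than comparing two orderings element by element.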

Following the ancestral ordering, we just need to sample distributions. Let's implement a function for this process of sampling a single distribution:

In [None]:
def sample_cpd(probs):
    rval = np.random.random()
    ############################
    ###### YOUR CODE HERE ######
    ############################
    x = # index i where rval < accumulated sum of vector probs
    return x

Calling this function returns the index of the sampled value:

In [None]:
[sample_cpd([0.1,0.4,0.3,0.2]) for _ in range(10)]
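
A useful sanity check for any sampler of this kind is that the empirical frequencies approach the input probabilities. The sketch below shows one possible realization of the cumulative-sum idea using only the standard library; `sample_index` is a hypothetical stand-in for illustration, not the notebook's `sample_cpd`, and the number of draws is arbitrary.

```python
import bisect
import itertools
import random
from collections import Counter

def sample_index(probs, rng=random):
    """Draw an index i with probability probs[i] using the cumulative sum."""
    cumulative = list(itertools.accumulate(probs))  # running totals of probs
    rval = rng.random()                             # uniform in [0, 1)
    # First position whose cumulative sum is strictly greater than rval.
    idx = bisect.bisect_right(cumulative, rval)
    # Guard against rounding: the last running total may fall slightly below 1.
    return min(idx, len(probs) - 1)

rng = random.Random(0)  # fixed seed so the run is reproducible
probs = [0.1, 0.4, 0.3, 0.2]
n_draws = 20000
counts = Counter(sample_index(probs, rng) for _ in range(n_draws))
freqs = [counts[i] / n_draws for i in range(len(probs))]
print(freqs)  # should be close to probs
```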

Now, we just need to carry out the process of sampling the CPD of each node, one at a time. Following the ancestral ordering, for each node, we select the correct distribution within the CPD to sample and get a sample using the previous function:

In [None]:
def forward_sampling(model,aord=None):
    if aord is None:
        aord = ancestral_ordering(model) # Find the ancestral ordering, if not given
    sampled_vals = {}
    for n in aord: # for each node (CPD), following the ancestral ord.
        cpd = model.get_cpds(n).copy()
        cpd_vars = cpd.variables
        if len(cpd_vars) > 1: # does node 'n' have any parents? If so, identify the correct distribution to sample
            red_vals = []
            for v in cpd_vars:
                if v != n:
                    red_vals.append((v,sampled_vals[v]))
            cpd.reduce(red_vals)
        probs = np.ravel(cpd.get_values())
        sampled_vals[n] = sample_cpd(probs) # sample the CPD of node 'n'
    return sampled_vals

Let's get a sample:

In [None]:
for _ in range(10):
    print(forward_sampling(G))
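
Samples like these are typically used to estimate probabilities by counting. As a minimal sketch, the code below forward-samples only the sub-network $I \rightarrow S$, with the CPD values copied from the model above, and compares the estimated $P(S=0)$ with the exact value. The helper names `draw` and `forward_sample_is` and the sample size are assumptions made for this illustration; the code does not use pgmpy.

```python
import random

# CPDs copied from the model above: P(I) and P(S | I), indexed as p_s_given_i[i][s].
p_i = [0.5, 0.3, 0.2]
p_s_given_i = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]]

def draw(probs, rng):
    """Sample an index according to probs (thin wrapper around random.choices)."""
    return rng.choices(range(len(probs)), weights=probs)[0]

def forward_sample_is(rng):
    """Sample I first, then S from the distribution P(S | I) selected by the sampled I."""
    i = draw(p_i, rng)
    s = draw(p_s_given_i[i], rng)
    return {'I': i, 'S': s}

rng = random.Random(0)  # fixed seed so the run is reproducible
n_samples = 20000
samples = [forward_sample_is(rng) for _ in range(n_samples)]

# Monte Carlo estimate: fraction of samples in which S = 0.
estimate = sum(smp['S'] == 0 for smp in samples) / n_samples
# Exact marginal: 0.5*0.1 + 0.3*0.2 + 0.2*0.7
exact = sum(p_i[i] * p_s_given_i[i][0] for i in range(3))
print(estimate, exact)
```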

## Likelihood-weighted Sampling

As you know, forward sampling has limitations when answering conditional probability queries, in which the values of a few variables are known. One might be tempted to simply fix those values and sample the rest of the variables in the same way as before. However, we need to account for the probability of observing the given values (the evidence) together with the rest of the sampled values.

Thus, LW sampling is similar to FS in that it samples, one by one and following an ancestral ordering, the CPDs of all the unobserved variables $X_i$. The difference is that, alongside sampling, we calculate the probability of the evidence under the sampled configuration.
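
In symbols, and writing the evidence as $\mathbf{E}=\mathbf{e}$ (notation introduced here for illustration), the weight attached to one sample is the product of the CPD entries of the observed variables, each evaluated at the parent values of that sample. A conditional query is then estimated as a weighted frequency over $M$ samples:

$$w = \prod_{E_j \in \mathbf{E}} P(e_j \mid \mathbf{pa}_j), \qquad \hat{P}(y \mid \mathbf{e}) = \frac{\sum_{m=1}^{M} w^{(m)}\,\mathbb{1}[y^{(m)} = y]}{\sum_{m=1}^{M} w^{(m)}}$$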

In [None]:
def likelihood_weighted_sampling(model, evidence=None, aord=None):
    if evidence is None:
        evidence = {} # avoid a mutable default argument
    if aord is None:
        aord = ancestral_ordering(model) # Find the ancestral ordering, if not given
    wei = 1 # we accumulate here the weight (probability)
    sampled_vals = evidence.copy()
    for n in aord: # for each node (CPD), following the ancestral ord.
        cpd = model.get_cpds(n).copy()
        cpd_vars = cpd.variables
        if len(cpd_vars) > 1: # does node 'n' have any parents? If so, identify the correct distribution to sample
            red_vals = []
            for v in cpd_vars:
                if v != n:
                    red_vals.append((v,sampled_vals[v]))
            cpd.reduce(red_vals)
        probs = np.ravel(cpd.get_values())
        # if the node 'n' is already observed (evidence), just update the weight of the sample
        # otherwise, sample the CPD of node 'n'
        ############################
        ###### YOUR CODE HERE ######
        ############################
    return sampled_vals, wei


Let's get a sample:

In [None]:
for _ in range(10):
    print(likelihood_weighted_sampling(G, {"D":0,"I":1}))
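
To see how the weights are used, consider a minimal sketch on the sub-network $I \rightarrow S$ only, with CPD values copied from the model above and the evidence $S=0$ chosen for illustration. The unobserved $I$ is sampled from its prior, the observed $S$ contributes only a weight, and the weighted frequencies are compared with the exact posterior from Bayes' rule. Variable names and the sample size are assumptions; the code does not use pgmpy.

```python
import random

# CPDs copied from the model above: P(I) and P(S | I), indexed as p_s_given_i[i][s].
p_i = [0.5, 0.3, 0.2]
p_s_given_i = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]]
observed_s = 0  # evidence chosen for this illustration: S = 0

rng = random.Random(0)  # fixed seed so the run is reproducible
n_samples = 20000
weighted_counts = [0.0, 0.0, 0.0]
for _ in range(n_samples):
    i = rng.choices(range(3), weights=p_i)[0]  # I is unobserved: sample it
    weight = p_s_given_i[i][observed_s]        # S is observed: only update the weight
    weighted_counts[i] += weight

# Weighted frequencies estimate P(I | S = 0).
total = sum(weighted_counts)
lw_estimate = [w / total for w in weighted_counts]

# Exact posterior by Bayes' rule, for comparison.
joint = [p_i[i] * p_s_given_i[i][observed_s] for i in range(3)]
exact_posterior = [j / sum(joint) for j in joint]
print(lw_estimate, exact_posterior)
```

Counting the samples without their weights would simply reproduce the prior $P(I)$, which is why the weight has to be carried along with each sample.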

## Gibbs Sampling

Gibbs sampling is based on the more general Markov Chain Monte Carlo (MCMC) approach. There is a set of states among which you transition according to a probabilistic transition matrix. In the long run, if a few basic conditions are met, the samples that you obtain (the states that you visit) can be treated as samples from the probability distribution of interest.

In the specific case of Bayesian networks, a state is a complete sample (an assignment of values to all the variables). At each step, you remove the current value of one variable and sample it again given the values of all the others. Because every other variable is treated as observed, only the variables in the Markov blanket need to be taken into account. In a BN, this means that we only need the CPD of the variable being sampled and the CPDs in which it appears as a parent.
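
Written as a formula, with $\mathbf{x}_{-i}$ denoting the current values of all variables except $X_i$ and $\mathrm{Ch}(X_i)$ the children of $X_i$ (both notations are introduced here for illustration), the distribution to sample at each step is:

$$P(x_i \mid \mathbf{x}_{-i}) \propto P(x_i \mid \mathbf{pa}_i) \prod_{X_j \in \mathrm{Ch}(X_i)} P(x_j \mid \mathbf{pa}_j)$$

The right-hand side is an unnormalized factor over $X_i$, so it has to be normalized before sampling.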

In this specific case, we code it so that we obtain a number of samples:

In [None]:
"""
Gibbs Sampling from Bayesian networks
"""
def GibbsSampling(model, nsample=1):
    initial_state = {}
    for n in model.nodes(): # We start from a randomly chosen state
        initial_state[n] = np.random.choice(model.get_cardinality(n),size=1)[0]
    new_state = initial_state.copy()
    
    sample = []
    count = 0
    while count < nsample: # For each of the samples that we want...
        for n,val in initial_state.items(): # for each of the nodes (order doesn't matter)
            # we obtain the product of all the CPDs that consider the node 'n'
            phi = model.get_cpds(n).to_factor().copy()
            for cn in model.get_children(n):
                phi.product(model.get_cpds(cn), inplace=True)

            phi_vars = phi.variables
            if len(phi_vars) > 1: # if there are several variables in the resulting factor
                red_vals = []
                for v in phi_vars:
                    if v != n:
                        red_vals.append((v,new_state[v]))
                # identify the correct distribution to sample
                ############################
                ###### YOUR CODE HERE ######
                ############################

            phi.normalize() # (it is a factor, we need to normalize!)
            new_state[n] = sample_cpd(np.ravel(phi.values)) # sample the distribution
        
        count+=1
        initial_state = new_state.copy() # the sample that we just generated will be, next, the 'previous sample'
        sample.append(new_state.copy())

    return sample

Let's get a sample:

In [None]:
GibbsSampling(G,10)
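
The same update can be followed by hand on a very small case. The sketch below runs Gibbs sampling on the sub-network $I \rightarrow S$ only, with CPD values copied from the model above, and estimates $P(S=0)$ from the visited states. The helper `draw`, the burn-in length, and the number of samples are illustrative assumptions and have not been tuned; the code does not use pgmpy.

```python
import random

# CPDs copied from the model above: P(I) and P(S | I), indexed as p_s_given_i[i][s].
p_i = [0.5, 0.3, 0.2]
p_s_given_i = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]]

def draw(weights, rng):
    """Sample an index in proportion to the (possibly unnormalized) weights."""
    return rng.choices(range(len(weights)), weights=weights)[0]

rng = random.Random(0)  # fixed seed so the run is reproducible
state = {'I': rng.randrange(3), 'S': rng.randrange(2)}  # random initial state
burn_in, n_samples = 1000, 20000  # illustrative values, not tuned
chain = []
for step in range(burn_in + n_samples):
    # Resample I given S: P(I | s) is proportional to P(I) * P(s | I).
    state['I'] = draw([p_i[i] * p_s_given_i[i][state['S']] for i in range(3)], rng)
    # Resample S given I: S has no children, so only its own CPD is involved.
    state['S'] = draw(p_s_given_i[state['I']], rng)
    if step >= burn_in:  # discard the first states, which depend on the initial state
        chain.append(dict(state))

# The exact marginal is P(S=0) = 0.5*0.1 + 0.3*0.2 + 0.2*0.7 = 0.25.
gibbs_estimate = sum(smp['S'] == 0 for smp in chain) / n_samples
print(gibbs_estimate)
```

Consecutive states of the chain are correlated, so for the same number of samples the estimate is usually somewhat noisier than one obtained from independent forward samples.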

# Native pgmpy functions for sampling
All these functions are already implemented in `pgmpy`.

Let's look at how to use them. We can start again with Forward Sampling:

In [None]:
from pgmpy.sampling import BayesianModelSampling

inference = BayesianModelSampling(G)
inference.forward_sample(size=10)

Likelihood-weighted sampling is available in the same `BayesianModelSampling` class. The evidence of the conditional probability query is passed as a list of `State` objects:

In [None]:
from pgmpy.factors.discrete import State
evidence = [State('D', 0), State('J', 1)]
inference.likelihood_weighted_sample(evidence=evidence, size=10)

Finally, we can also use MCMC-based algorithms such as Gibbs sampling. The `GibbsSampling` class implements this method and offers two functions that return the sample in different formats. First, `sample` returns it as a table:

In [None]:
from pgmpy.sampling import GibbsSampling # note: this name replaces the GibbsSampling function defined earlier in this notebook
gibbs_sampler = GibbsSampling(G)
gibbs_sampler.sample(size=10)

And `generate_sample` returns the sample as a generator object whose items are the individual samples, each expressed with `State` objects that denote the value assigned to each variable of the model:

In [None]:
sample = gibbs_sampler.generate_sample(size=10)
[inst for inst in sample]