# Exact Inference

In this notebook you will work through the steps of the Variable Elimination (sum-product) algorithm for computing marginal probability distributions. We will use <b>pgmpy</b> to build the model.

First of all, we need to import the necessary functions:

In [None]:
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import itertools as it

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD, DiscreteFactor

from pgmpy.inference import VariableElimination, BeliefPropagation

## Setting up our model

We will use the classic (extended) student example, which has the following graph structure:

<img src="images/students_bn.png" style="width:200px" />

We need to encode the graph structure and the families of Conditional Probability Distributions (CPDs).

### Set the structure

First, we create a Bayesian network by specifying its set of directed edges as follows:

In [None]:
nodes = ['C', 'D', 'I', 'G', 'S', 'L', 'J', 'H']
G = BayesianModel([('D', 'G'), ('I', 'G'), ('I', 'S'), ('G', 'L'),
                  ('C','D'), ('G','H'), ('J','H'), ('S','J'), ('L','J')])

### Set up the Conditional Probability Distribution families

Once the structure has been defined, we encode the corresponding CPDs as follows:

In [None]:
c_cpd = TabularCPD('C', 2, [[0.2], [0.8]])
d_cpd = TabularCPD('D', 2, [[0.3, 0.4], 
                            [0.7, 0.6]],
                   evidence=['C'], evidence_card=[2])
i_cpd = TabularCPD('I', 3, [[0.5], [0.3], [0.2]])
g_cpd = TabularCPD('G', 3, [[0.1, 0.2, 0.1, 0.1, 0.2, 0.3],
                            [0.1, 0.3, 0.3, 0.2, 0.2, 0.3],
                            [0.8, 0.5, 0.6, 0.7, 0.6, 0.4]],
                   evidence=['D', 'I'], evidence_card=[2, 3])
s_cpd = TabularCPD('S', 2, [[0.1, 0.2, 0.7],
                            [0.9, 0.8, 0.3]],
                   evidence=['I'], evidence_card=[3])
l_cpd = TabularCPD('L', 2, [[0.1, 0.4, 0.8],
                            [0.9, 0.6, 0.2]],
                   evidence=['G'], evidence_card=[3])
j_cpd = TabularCPD('J', 2, [[0.1, 0.5, 0.4, 0.6],
                            [0.9, 0.5, 0.6, 0.4]],
                   evidence=['L', 'S'], evidence_card=[2, 2])
h_cpd = TabularCPD('H', 3, [[0.7, 0.3, 0.5, 0.3, 0.2, 0.4],
                            [0.1, 0.3, 0.4, 0.4, 0.6, 0.3],
                            [0.2, 0.4, 0.1, 0.3, 0.2, 0.3]],
                   evidence=['G', 'J'], evidence_card=[3, 2])

G.add_cpds(c_cpd, d_cpd, i_cpd, g_cpd, s_cpd, l_cpd, j_cpd, h_cpd)
print('Is the model right?',G.check_model())

## Variable Elimination

The library <b>pgmpy</b> implements this algorithm, so let us use it to work through a few concepts that we have studied in the theory lessons.

First of all, we need to tell the library that we want to use Variable Elimination over the graph $\mathcal{G}$:

In [None]:
inference = VariableElimination(G)


As we explained, the induced graph is the undirected graph that we produce as we execute VE: two variables are connected if they appear together in some factor (original or intermediate) during the elimination. The width of the induced graph is defined as the number of variables in its largest clique minus one.

The width of the induced graph determines the time complexity of VE, which is exponential in this width. However, the induced graph that we end up with (and, therefore, its width) depends on the elimination ordering followed.
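To make the definition concrete, here is a minimal from-scratch sketch (our own illustration, not pgmpy's implementation) that simulates elimination on the moralized student graph with networkx. Eliminating a variable connects all of its current neighbours (fill-in edges), and the width is the largest number of neighbours any variable has at the moment it is eliminated. The names `EDGES`, `moral_graph` and `induced_width` are hypothetical helpers; for the orderings examined below it should agree with pgmpy's `induced_width`.

```python
from itertools import combinations

import networkx as nx

# Directed edges of the student network (same as the model above)
EDGES = [('D', 'G'), ('I', 'G'), ('I', 'S'), ('G', 'L'), ('C', 'D'),
         ('G', 'H'), ('J', 'H'), ('S', 'J'), ('L', 'J')]


def moral_graph(edges):
    """Undirected moral graph: drop edge directions and connect co-parents."""
    dag = nx.DiGraph(edges)
    moral = dag.to_undirected()
    for node in dag.nodes:
        moral.add_edges_from(combinations(dag.predecessors(node), 2))
    return moral


def induced_width(edges, ordering):
    """Width of the induced graph obtained by eliminating variables in `ordering`."""
    graph = moral_graph(edges)
    width = 0
    for var in ordering:
        neighbours = list(graph.neighbors(var))
        # Eliminating var creates a clique of len(neighbours) + 1 variables,
        # whose width is len(neighbours)
        width = max(width, len(neighbours))
        graph.add_edges_from(combinations(neighbours, 2))  # fill-in edges
        graph.remove_node(var)
    return width


print(induced_width(EDGES, ('C', 'D', 'I', 'H', 'J', 'L', 'S', 'G')))  # 3
print(induced_width(EDGES, ('G', 'D', 'C', 'I', 'S', 'L', 'H', 'J')))  # 5
```

The second ordering reaches width 5 because eliminating $G$ first joins its five neighbours ($D$, $I$, $L$, $H$, $J$) into a single clique.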

Let us compute the width of the induced graph for every possible ordering of the variables (8! = 40,320 orderings) using pgmpy's `induced_width`. Go for a coffee: this might take a while.

In [None]:
indwidth = {}

# Induced width of each of the 8! possible elimination orderings
for perm in it.permutations(nodes):
    indwidth[perm] = inference.induced_width(perm)

print('The lowest induced width is:', min(indwidth.values()))


Note that this minimum width is achieved by several orderings:


In [None]:
print("Induced width of all the",len(list(indwidth.values())),"possible orderings")
print("Number of orderings with width=3:",len(np.where(np.array(list(indwidth.values()))==3)[0]))
print("Width of all the orderings:")
print(list(indwidth.values()))

In [None]:
print("Three examples of orderings with the minimum induced width (3):")
bOrd = np.argsort(list(indwidth.values()))[:3]
for i in bOrd:
    print(list(indwidth.keys())[i])


Let us see a few examples in detail:


In [None]:
pos = {'C': (0, 0), 'D': (0, -2), 'I': (2, -2), 'G': (1, -3), 'S': (3, -3), 
       'L': (1, -4), 'J': (2, -5), 'H': (0, -6)} 

orden = list(indwidth.keys())[bOrd[2]]
print("- The ordering",orden,"produces the following graph:") 
ig = inference.induced_graph(orden)
nx.draw(ig, with_labels=True, pos=pos, font_weight='bold',node_color='#cccccc', node_size=1000)
plt.show()
print("   + this graph has induced width:",inference.induced_width(orden))

In [None]:
orden = ('C', 'D', 'I', 'H', 'J', 'L', 'S', 'G')
print("\n- One of the most obvious orderings",orden,"produces the following graph:")
ig = inference.induced_graph(orden)
nx.draw(ig, with_labels=True, pos=pos, font_weight='bold',node_color='#cccccc', node_size=1000)
plt.show()
print("   + this graph has induced width:",inference.induced_width(orden))


Two different orderings produce the same induced graph, with width 3. Note, however, that different induced graphs can also have the same width, since only the largest clique is taken into account.


In [None]:
orden = ('G', 'D', 'C', 'I', 'S', 'L', 'H', 'J')
print("\n- A bad ordering",orden,"produces the following graph:")
ig = inference.induced_graph(orden)
nx.draw(ig, with_labels=True, pos=pos, font_weight='bold',node_color='#cccccc', node_size=1000)
plt.show()
print("   + this graph has induced width:",inference.induced_width(orden))


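The last ordering is a poor choice because the first variable it eliminates, $G$, is the most densely connected one: in the moralized graph $G$ is adjacent to $D$, $I$, $L$, $H$ and $J$, so eliminating it first produces a clique of six variables (width 5).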
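Enumerating all 8! orderings does not scale to larger networks. A common alternative is a greedy heuristic such as min-neighbours: repeatedly eliminate the variable with the fewest neighbours in the current graph. The sketch below is a hypothetical helper (`min_neighbours_ordering` is not a pgmpy function) built on networkx; greedy heuristics carry no optimality guarantee in general, although on this network it reaches the minimum width of 3.

```python
from itertools import combinations

import networkx as nx

EDGES = [('D', 'G'), ('I', 'G'), ('I', 'S'), ('G', 'L'), ('C', 'D'),
         ('G', 'H'), ('J', 'H'), ('S', 'J'), ('L', 'J')]


def moral_graph(edges):
    """Undirected moral graph: drop edge directions and connect co-parents."""
    dag = nx.DiGraph(edges)
    moral = dag.to_undirected()
    for node in dag.nodes:
        moral.add_edges_from(combinations(dag.predecessors(node), 2))
    return moral


def min_neighbours_ordering(edges):
    """Greedy min-neighbours ordering and the induced width it achieves."""
    graph = moral_graph(edges)
    ordering, width = [], 0
    while graph.number_of_nodes() > 0:
        # Variable with the fewest neighbours (ties: first in node order)
        var = min(graph.nodes, key=graph.degree)
        neighbours = list(graph.neighbors(var))
        width = max(width, len(neighbours))
        graph.add_edges_from(combinations(neighbours, 2))  # fill-in edges
        graph.remove_node(var)
        ordering.append(var)
    return ordering, width


ordering, width = min_neighbours_ordering(EDGES)
print(ordering, width)
```

Here the heuristic starts with $C$ (the only variable with a single neighbour) and never creates a clique of more than four variables.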

<hr/>

## Operations of VE sum-product
There are two main operations in the Sum-Product VE algorithm: Marginalization (Sum) and Product.

Let us have a look at the <b>Marginalization</b> operation. VE works on factors (the parameterization used for undirected graphs) rather than on CPDs, so we first have to convert the CPDs into factors. Factors in pgmpy already provide a `marginalize` method, which takes as parameter the list of variables to marginalize out and, by default, modifies the factor in place:


In [None]:
phi_j = G.get_cpds('J').to_factor()
print("The CPD of J|L,S converted to a factor:")
print(phi_j)

phi_j.marginalize(['S'])
print("The factor over J,L after marginalizing out the variable S:")
print(phi_j)
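The same marginalization can be reproduced with plain NumPy, which makes explicit that summing out a variable means summing a table along that variable's axis. This is a sketch under an assumption: we rebuild $\phi(J,L,S)$ from the CPD values with axes ordered $(J, L, S)$, taking the four CPD columns to be the assignments $(L,S) = (0,0), (0,1), (1,0), (1,1)$.

```python
import numpy as np

# phi(J, L, S) from the CPD of J|L,S, with axes (J, L, S).
# Assumption: CPD columns are ordered (L,S) = (0,0), (0,1), (1,0), (1,1).
phi_jls = np.array([[0.1, 0.5, 0.4, 0.6],
                    [0.9, 0.5, 0.6, 0.4]]).reshape(2, 2, 2)

# Marginalizing out S = summing along the S axis (axis 2)
tau_jl = phi_jls.sum(axis=2)   # a factor over (J, L)
print(tau_jl)
```

Note that $\tau(J,L)$ is not a distribution over $J$: for each value of $L$ its entries sum to 2, one unit for each value of $S$ that was summed out.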


Let us have a look at the <b>Product</b> operation. This is also a method of pgmpy factors (`product`), which takes as parameter another factor to multiply with. Here we compute the product of the factor $\phi(L,G)$ (the CPD of $L|G$) and $\phi(J,L)$ (the result of the previous marginalization):


In [None]:
phi_l = G.get_cpds('L').to_factor()
print(phi_l)
print(phi_j)

phi_l.product(phi_j)
print("Product of factors:")
print(phi_l)
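In NumPy terms, a factor product multiplies entries whose assignments agree on the shared variables. A sketch with `np.einsum`, assuming the axis layouts $(J, L)$ for $\tau$ (the $S$-summed CPD of $J|L,S$, computed by hand from its values) and $(L, G)$ for the CPD of $L|G$:

```python
import numpy as np

tau_jl = np.array([[0.6, 1.0],
                   [1.4, 1.0]])        # tau(J, L): S summed out of phi(J, L, S)
phi_lg = np.array([[0.1, 0.4, 0.8],
                   [0.9, 0.6, 0.2]])   # CPD of L|G as a factor, axes (L, G)

# Factor product: entries are multiplied where the assignments agree on the
# shared variable L. Every index appears in the output, so nothing is summed.
psi_jlg = np.einsum('jl,lg->jlg', tau_jl, phi_lg)
print(psi_jlg.shape)   # (2, 2, 3): a factor over J, L, G
```

The result has one axis per variable in the union of the two scopes, which is why repeated products make factors (and the cost of VE) grow.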


Now that we are familiar with the two operations that VE needs in <b>pgmpy</b>, let us compute the following marginal:
$$p(l)=\sum_{c,d,i,g,s,j,h}p(c,d,i,g,s,l,j,h)$$

To do so, we first have to provide an elimination ordering. Let us use $[C,D,I,H,J,S,G]$, which contains every variable except the query variable $L$.

$$p(l) = \sum_g p(l|g) \sum_s \sum_j p(j|s,l) \sum_h p(h|g,j) \sum_i p(i)p(s|i)\sum_d p(g|d,i)\sum_c p(d|c)p(c)$$

The first step is, therefore, to marginalize out $C$. To do so, we multiply all the factors that include $C$ before marginalizing $C$ out:


In [None]:
tau1 = G.get_cpds('C').to_factor()
print("Involved factors:")
print(tau1)
print(G.get_cpds('D').to_factor())

tau1.product(G.get_cpds('D').to_factor())
print("Product of involved factors:")
print(tau1)
tau1.marginalize(['C'])
print("C is marginalized out from the previous factor:")
print(tau1)


The next step is to marginalize out $D$. First we need the product of all the factors that involve $D$: the CPD of $G|D,I$ and the result of the previous marginalization:


In [None]:
print("Factor of G (involves D):")
phi_g = G.get_cpds('G').to_factor()
print(phi_g)
# Multiply tau1(D) by the factor over G,D,I and sum out D
tau1.product(phi_g)
tau1.marginalize(['D'])
print("D is marginalized out from the product of the two previous factors:")
print(tau1)


To marginalize out $I$, three factors are involved: the marginal of $I$, the CPD of $S|I$ and the factor produced in the previous step:


In [None]:
print("Involved factors:")
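phi_i = G.get_cpds('I').to_factor()
phi_s = G.get_cpds('S').to_factor()
print(phi_i)
print(phi_s)
# Multiply tau1(G,I) by the marginal of I and the factor over S,I, then sum out I
tau1.product(phi_i)
tau1.product(phi_s)
tau1.marginalize(['I'])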
print("I is marginalized out from the product of the three previous factors:")
print(tau1)


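In the following case, $H$ is involved in only one factor, so no product is required and the operation reduces to a marginalization. Since that factor is the CPD of $H|G,J$, summing out $H$ yields a factor over $G,J$ whose entries are all 1: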


In [None]:
print("Involved factor:")
print(G.get_cpds('H').to_factor())
tau2 = G.get_cpds('H').to_factor()
tau2.marginalize(['H'])
print("H is marginalized out from the previous factor:")
print(tau2)


Note that we now have two induced factors, $\tau_1$ and $\tau_2$.

The next marginalization, that of $J$, involves only the factor $\tau_2$ and the CPD of $J|L,S$:


In [None]:
print("Involved factor:")
print(G.get_cpds('J').to_factor())
tau2.product(G.get_cpds('J').to_factor())
tau2.marginalize(['J'])
print("J is marginalized out from the product of the two previous factors:")
print(tau2)


To marginalize out $S$ we combine (product) both induced factors, $\tau_1$ and $\tau_2$:


In [None]:
tau1.product(tau2)
tau1.marginalize(['S'])
print("S is marginalized out from the product of the two previously induced factors:")
print(tau1)


At this point, only two variables remain: $G$ and $L$. To marginalize out $G$, we multiply the previous induced factor by the CPD of $L|G$:


In [None]:
print("Involved factor:")
print(G.get_cpds('L').to_factor())
tau1.product(G.get_cpds('L').to_factor())
tau1.marginalize(['G'])
print("G is marginalized out from the product of the two previous factors:")
print(tau1)


Thus, we obtain the marginal probability distribution of $L$, the query that we posed above. Since all the factors we started from are CPDs, the result already sums to 1 and no normalization is required.

Let us compare our result with that of the function implemented in <b>pgmpy</b>:


In [None]:
phi_query = inference.query(['L'])
print(phi_query)
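As a cross-check that does not depend on pgmpy, the whole elimination can be written with `np.einsum`, one call per eliminated variable in the ordering $[C,D,I,H,J,S,G]$, and compared with a brute-force sum over all the other variables. This is a sketch: the arrays copy the CPD values defined above, and reshaping each table to (child, first parent, second parent) assumes the first evidence variable varies slowest across the CPD columns.

```python
import numpy as np

# CPD tables as arrays with axes (child, parent1, parent2).
# Assumption: CPD columns enumerate parent assignments with the first listed
# parent varying slowest, so a C-order reshape recovers the axes.
p_c = np.array([0.2, 0.8])                                  # P(C)
p_d_c = np.array([[0.3, 0.4],
                  [0.7, 0.6]])                              # P(D|C), axes (D, C)
p_i = np.array([0.5, 0.3, 0.2])                             # P(I)
p_g_di = np.array([[0.1, 0.2, 0.1, 0.1, 0.2, 0.3],
                   [0.1, 0.3, 0.3, 0.2, 0.2, 0.3],
                   [0.8, 0.5, 0.6, 0.7, 0.6, 0.4]]).reshape(3, 2, 3)  # (G, D, I)
p_s_i = np.array([[0.1, 0.2, 0.7],
                  [0.9, 0.8, 0.3]])                         # P(S|I), axes (S, I)
p_l_g = np.array([[0.1, 0.4, 0.8],
                  [0.9, 0.6, 0.2]])                         # P(L|G), axes (L, G)
p_j_ls = np.array([[0.1, 0.5, 0.4, 0.6],
                   [0.9, 0.5, 0.6, 0.4]]).reshape(2, 2, 2)  # (J, L, S)
p_h_gj = np.array([[0.7, 0.3, 0.5, 0.3, 0.2, 0.4],
                   [0.1, 0.3, 0.4, 0.4, 0.6, 0.3],
                   [0.2, 0.4, 0.1, 0.3, 0.2, 0.3]]).reshape(3, 3, 2)  # (H, G, J)

# Variable elimination with the ordering [C, D, I, H, J, S, G]
tau_d = np.einsum('c,dc->d', p_c, p_d_c)                # sum out C -> tau(D)
tau_gi = np.einsum('d,gdi->gi', tau_d, p_g_di)          # sum out D -> tau(G, I)
tau_gs = np.einsum('gi,i,si->gs', tau_gi, p_i, p_s_i)   # sum out I -> tau(G, S)
tau_gj = p_h_gj.sum(axis=0)                             # sum out H -> tau(G, J)
tau_gls = np.einsum('gj,jls->gls', tau_gj, p_j_ls)      # sum out J -> tau(G, L, S)
tau_gl = np.einsum('gs,gls->gl', tau_gs, tau_gls)       # sum out S -> tau(G, L)
p_l = np.einsum('lg,gl->l', p_l_g, tau_gl)              # sum out G -> p(L)

# Brute force: einsum sums every index that does not appear after '->'
p_l_brute = np.einsum('c,dc,i,gdi,si,lg,jls,hgj->l',
                      p_c, p_d_c, p_i, p_g_di, p_s_i, p_l_g, p_j_ls, p_h_gj)

print(p_l)
print(p_l_brute)
```

Both arrays should agree and sum to 1. Every intermediate array in the elimination has at most three axes, whereas the brute-force call iterates over all 2·2·3·3·2·2·2·3 = 864 joint assignments.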

<hr />
Exercises:

- Try $p(S)$

- Try $p(J)$

## Inference with evidence

If we want to run queries where evidence about a subset of variables is available, we need a third operation: reduction.

Let us have a look at the <b>Reduction</b> operation. Factors in pgmpy also provide a `reduce` method, which takes as parameter a list of (variable, value) pairs to reduce on:


In [None]:
phi_j = G.get_cpds('J').to_factor()
print("The CPD of J|L,S converted to a factor:")
print(phi_j)

phi_j.reduce([('S', 0)])
print("The factor over J,L after reducing the variable S to value 0:")
print(phi_j)
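In array terms, reduction is slicing: we keep only the entries consistent with the evidence and the evidence variable's axis disappears. A NumPy sketch, assuming the axes of $\phi(J,L,S)$ are ordered $(J, L, S)$ with $L$ varying slower than $S$ across the CPD columns:

```python
import numpy as np

phi_jls = np.array([[0.1, 0.5, 0.4, 0.6],
                    [0.9, 0.5, 0.6, 0.4]]).reshape(2, 2, 2)   # axes (J, L, S)

# Reduction to S=0: keep only the entries with S=0; the S axis disappears
phi_jl_s0 = phi_jls[:, :, 0]   # a factor over (J, L)
print(phi_jl_s0)
```

Unlike marginalization, reduction does not add entries up, so for each value of $L$ the result is still a distribution over $J$ (the CPD column for $S=0$).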


Now that we are familiar with the three necessary operations in <b>pgmpy</b>, let us compute the following conditional marginal:
$$p(l\mid C=0,J=0)=\frac{1}{\Theta}\sum_{d,i,g,s,h}p(C=0,D=d,I=i,G=g,S=s,L=l,J=0,H=h)$$
where $\Theta = p(C=0,J=0)$ is the normalization constant.

To do so, we first have to provide an ordering. The evidence variables $C$ and $J$ are not eliminated (they are fixed to their observed values), so we use the ordering $[D,I,H,S,G]$.

$$p(l\mid C=0,J=0) = \frac{1}{\Theta}\sum_g p(l|g) \sum_s p(J=0|s,l) \sum_h p(h|g,J=0) \sum_i p(i)p(s|i)\sum_d p(g|d,i) p(d|C=0)p(C=0)$$

The first step is, therefore, to reduce the factors that include $C$ to the evidence $C=0$ and take their product. Since reduction and product commute, we can either reduce both factors and then multiply them, or multiply them first and then reduce the result:


In [None]:
# Multiply the factors over C and over D,C, then reduce the product to C=0
tau1 = G.get_cpds('C').to_factor()
tau1.product(G.get_cpds('D').to_factor())
tau1.reduce([('C', 0)])

print(tau1)


The first marginalization, in this case, is that of $D$. As before, we multiply the resulting factor over $D$ by the CPD of $G|D,I$ and marginalize out $D$:


In [None]:
tau1.product(G.get_cpds('G').to_factor())
tau1.marginalize(['D'])
print("The product of the previous factor and that over G,D,I after marginalization of D:")
print(tau1)


The marginalization of $I$ is exactly as before:


In [None]:
tau1.product(G.get_cpds('I').to_factor())
tau1.product(G.get_cpds('S').to_factor())
tau1.marginalize(['I'])
print("The product of the previous factor, that over S,I and the marginal of I after marginalization of I:")
print(tau1)


The marginalization of $H$, however, is preceded by reducing its factor to the evidence $J=0$. The CPD of $J|L,S$ is also reduced to $J=0$ before it is multiplied in:


In [None]:
tau2 = G.get_cpds('H').to_factor()
tau2.reduce([('J', 0)])
tau2.marginalize(['H'])
print("The factor over H,G,J after reducing J=0 and marginalizing out H:")
print(tau2)

aux = G.get_cpds('J').to_factor()
aux.reduce([('J', 0)])
tau2.product(aux)
print("The product of the previous factor and that over J,L,S after reduction of J=0:")
print(tau2)


Finally, $S$ and $G$ are marginalized out exactly as before. Note, however, that in this case normalization is required: because of the reduction, the resulting factor sums to $p(C=0,J=0)$ rather than to 1:


In [None]:
tau1.product(tau2)
tau1.marginalize(['S'])

tau1.product(G.get_cpds('L').to_factor())
tau1.marginalize(['G'])
print("G is marginalized out from the product of the two previous factors:")
print(tau1)
tau1.normalize()
print("And, after normalization:")
print(tau1)


Thus, we obtain the marginal probability distribution of $L$ given $C=0$ and $J=0$, the query that we posed above.

Let us compare our result with that of the function implemented in <b>pgmpy</b>:


In [None]:
phi_query = inference.query(['L'], evidence = {'C': 0, 'J': 0})
print(phi_query)
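The evidence query can be cross-checked with NumPy as well. In this sketch the array names and the (child, parents) axis layout are our own assumptions, mirroring the CPDs defined above with the first parent varying slowest across the CPD columns. Evidence is applied by slicing the arrays at $C=0$ and $J=0$, the remaining variables are summed out in the ordering $[D,I,H,S,G]$, and the result is normalized by $\Theta = p(C=0,J=0)$.

```python
import numpy as np

p_c = np.array([0.2, 0.8])                                  # P(C)
p_d_c = np.array([[0.3, 0.4],
                  [0.7, 0.6]])                              # P(D|C), axes (D, C)
p_i = np.array([0.5, 0.3, 0.2])                             # P(I)
p_g_di = np.array([[0.1, 0.2, 0.1, 0.1, 0.2, 0.3],
                   [0.1, 0.3, 0.3, 0.2, 0.2, 0.3],
                   [0.8, 0.5, 0.6, 0.7, 0.6, 0.4]]).reshape(3, 2, 3)  # (G, D, I)
p_s_i = np.array([[0.1, 0.2, 0.7],
                  [0.9, 0.8, 0.3]])                         # P(S|I), axes (S, I)
p_l_g = np.array([[0.1, 0.4, 0.8],
                  [0.9, 0.6, 0.2]])                         # P(L|G), axes (L, G)
p_j_ls = np.array([[0.1, 0.5, 0.4, 0.6],
                   [0.9, 0.5, 0.6, 0.4]]).reshape(2, 2, 2)  # (J, L, S)
p_h_gj = np.array([[0.7, 0.3, 0.5, 0.3, 0.2, 0.4],
                   [0.1, 0.3, 0.4, 0.4, 0.6, 0.3],
                   [0.2, 0.4, 0.1, 0.3, 0.2, 0.3]]).reshape(3, 3, 2)  # (H, G, J)

# Reduction: slice every table that mentions C or J at the evidence values
p_c0 = p_c[0]                 # P(C=0), a scalar
p_d_c0 = p_d_c[:, 0]          # P(D | C=0), axis (D,)
p_j0_ls = p_j_ls[0]           # P(J=0 | L, S), axes (L, S)
p_h_gj0 = p_h_gj[:, :, 0]     # P(H | G, J=0), axes (H, G)

# Eliminate D, I, H, S, G in that order
tau_gi = np.einsum('d,gdi->gi', p_c0 * p_d_c0, p_g_di)       # sum out D
tau_gs = np.einsum('gi,i,si->gs', tau_gi, p_i, p_s_i)        # sum out I
tau_g = p_h_gj0.sum(axis=0)                                  # sum out H
tau_gl = np.einsum('gs,g,ls->gl', tau_gs, tau_g, p_j0_ls)    # sum out S
unnormalized = np.einsum('lg,gl->l', p_l_g, tau_gl)          # sum out G
theta = unnormalized.sum()                                   # = p(C=0, J=0)
p_l_given = unnormalized / theta

# Brute force: keep C, L, J in the output, then condition on C=0, J=0
joint_clj = np.einsum('c,dc,i,gdi,si,lg,jls,hgj->clj',
                      p_c, p_d_c, p_i, p_g_di, p_s_i, p_l_g, p_j_ls, p_h_gj)
p_l_brute_ev = joint_clj[0, :, 0] / joint_clj[0, :, 0].sum()

print(p_l_given)
print(p_l_brute_ev)
```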

<hr />
Exercises:

- Try $p(S|D=1,J=1)$


- Try $p(J|H=1,G=1)$

<hr/>

## MAP inference: max-sum VE

Sometimes we want to find the most probable assignment of values to a set of variables, that is, MAP inference. This is carried out by the max-sum version of variable elimination, which uses max-marginalization and the sum of factors as its operations. The function in <b>pgmpy</b> for the query
$$\arg\max_{g,i} p(G=g,I=i)$$
is: 


In [None]:
print('MAP query, argmax_{g,i} P(G=g, I=i):')
print(inference.map_query(['G', 'I']))

Similarly, evidence can be introduced into a MAP query. The query $$\arg\max_{g,i} p(G=g,I=i|S=0)$$
would be launched as follows: 

In [None]:
print('MAP query, argmax_{g,i} P(G=g, I=i|S=0):')
print(inference.map_query(['G', 'I'], evidence = {'S': 0}))
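The quantity in the two MAP queries above can also be computed by brute force: build the marginal table $p(G,I)$, multiply it by $p(S=0|I)$ for the evidence, and take the argmax. This is our own NumPy sketch of the formulas, not pgmpy's algorithm; it copies the CPD values defined earlier and assumes the $(G, D, I)$ axis layout for the CPD of $G$. The descendants $L$, $J$ and $H$ can be ignored because their CPDs sum to 1 when they are summed out.

```python
import numpy as np

p_c = np.array([0.2, 0.8])                                  # P(C)
p_d_c = np.array([[0.3, 0.4],
                  [0.7, 0.6]])                              # P(D|C), axes (D, C)
p_i = np.array([0.5, 0.3, 0.2])                             # P(I)
p_g_di = np.array([[0.1, 0.2, 0.1, 0.1, 0.2, 0.3],
                   [0.1, 0.3, 0.3, 0.2, 0.2, 0.3],
                   [0.8, 0.5, 0.6, 0.7, 0.6, 0.4]]).reshape(3, 2, 3)  # (G, D, I)
p_s_i = np.array([[0.1, 0.2, 0.7],
                  [0.9, 0.8, 0.3]])                         # P(S|I), axes (S, I)

# Marginal p(G, I): sum out C and D
p_gi = np.einsum('c,dc,gdi,i->gi', p_c, p_d_c, p_g_di, p_i)
g, i = np.unravel_index(np.argmax(p_gi), p_gi.shape)
map_gi = (int(g), int(i))
print('argmax_{g,i} P(G=g, I=i):', map_gi)              # (2, 0)

# Evidence S=0: p(G, I, S=0) = p(G, I) P(S=0 | I); normalizing would not move the argmax
p_gi_s0 = p_gi * p_s_i[0]
g, i = np.unravel_index(np.argmax(p_gi_s0), p_gi_s0.shape)
map_gi_s0 = (int(g), int(i))
print('argmax_{g,i} P(G=g, I=i | S=0):', map_gi_s0)     # (2, 2)
```

Observing $S=0$ moves the most probable value of $I$ from 0 to 2, since $p(S=0|I)$ is largest for $I=2$.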

<hr/>

## Operations of VE max-sum

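The operations in this case are analogous, but slightly different. Marginalization is carried out by taking the maximum value instead of summing values up, and factors are combined by a sum instead of a product. The sum appears because max-sum works with the logarithms of the factors: $\log(\phi_1\phi_2)=\log\phi_1+\log\phi_2$, and since the logarithm is monotonically increasing, maximizing the logarithm gives the same assignment as maximizing the product.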
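A small NumPy sketch of this equivalence: adding log-factors and then maximizing gives the logarithm of what multiplying and then maximizing gives. The factors are $P(D|C)$ (axes $(D, C)$) and $P(C)$ from the model above; maximizing over $C$ is purely illustrative.

```python
import numpy as np

phi_dc = np.array([[0.3, 0.4],
                   [0.7, 0.6]])   # P(D|C) as a factor, axes (D, C)
phi_c = np.array([0.2, 0.8])      # P(C)

# Max-product: multiply the factors, then max-marginalize C
max_prod = (phi_dc * phi_c[np.newaxis, :]).max(axis=1)

# Max-sum: add the log-factors (aligned on the shared variable C), then max over C
max_sum = (np.log(phi_dc) + np.log(phi_c)[np.newaxis, :]).max(axis=1)

print(max_prod)          # one value per state of D
print(np.exp(max_sum))   # the same values, recovered from log space
```

Working in log space also avoids numerical underflow when many small probabilities would otherwise be multiplied together.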

The sum of two factors adds up the values of those assignments that agree on the variables the two factors share:

In [None]:
tau1 = G.get_cpds('D').to_factor()

print("Factor over D,C")
print(tau1)
print("Factor over C")
print(G.get_cpds('C').to_factor())
tau1.sum(G.get_cpds('C').to_factor())
print("Sum of both previous factors:")
print(tau1)

Max-marginalization eliminates one variable from a factor, as ordinary marginalization does. The difference is that, for each combination of values of the remaining variables, it keeps the maximum value over the eliminated variable instead of the sum (pgmpy's `maximize`, like `marginalize`, modifies the factor in place by default):

In [None]:
print("Factor over S,I")
tau1=G.get_cpds('S').to_factor()
print(tau1)
print("Max-marginalization of the previous factor (marginalizing out S):")
tau1.maximize(['S'])
print(tau1)
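In NumPy, max-marginalization is a `max` along the eliminated variable's axis. This sketch also records the `argmax`, which max-product/max-sum VE needs later to trace back the maximizing assignment; that bookkeeping is our own addition and is not part of the pgmpy cell above. The axis layout $(S, I)$ for the CPD of $S|I$ is assumed.

```python
import numpy as np

phi_si = np.array([[0.1, 0.2, 0.7],
                   [0.9, 0.8, 0.3]])   # CPD of S|I as a factor, axes (S, I)

# Max-marginalizing S: for each value of I keep the largest entry over S
tau_i = phi_si.max(axis=0)

# Which value of S attains each maximum; stored for the traceback step
best_s = phi_si.argmax(axis=0)

print(tau_i)
print(best_s)
```

Once the remaining variables have been assigned their maximizing values, looking up `best_s` at the chosen value of $I$ recovers the maximizing value of $S$.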