# Bayesian networks

In this notebook you will get to know the <b>pgmpy</b> library (https://github.com/pgmpy/pgmpy) for constructing Bayesian networks.

Let us start with pgmpy. First of all, we need to import the necessary classes:


In [None]:
from pgmpy.factors.discrete import TabularCPD
from pgmpy.models import BayesianModel  # newer pgmpy releases name this class BayesianNetwork

## Setting up your model

We will use the following DAG structure:

<img src="images/olympicsTrials.png" style="width:200px" />

A Bayesian network (BN) is composed of a DAG structure and a set of conditional probability distributions (CPDs). We need to encode both elements.

### Set the structure

First of all, we specify that we are constructing a Bayesian network with the following set of edges:

In [None]:
olympic_model = BayesianModel([('Genetics', 'Trials'),
                             ('Practice', 'Trials'),
                             ()]) #### YOUR CODE HERE ####

With this simple code, we specify that there are four variables and that the directed edges are <i>Genetics => Trials</i>, <i>Practice => Trials</i>, and <i>Trials => Offer</i>.

The corresponding factorization follows directly from the graph, since each variable is conditioned only on its parents:

$$p(G,P,T,O)=p(G)p(P)p(T|G,P)p(O|T)$$
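
The factorization can be read off the edge list mechanically: every variable contributes one factor, conditioned on its parents. The following plain-Python sketch does not use pgmpy; the names `parents` and `factorization` are illustrative only.

```python
# Edge list of the DAG, written as (parent, child)
edges = [('Genetics', 'Trials'), ('Practice', 'Trials'), ('Trials', 'Offer')]

# Collect the parents of every node (roots get an empty list)
parents = {}
for parent, child in edges:
    parents.setdefault(parent, [])
    parents.setdefault(child, []).append(parent)

# One factor per node: p(node) for roots, p(node|parents) otherwise
factors = [
    f"p({node}|{','.join(pa)})" if pa else f"p({node})"
    for node, pa in parents.items()
]
factorization = ''.join(factors)
print(factorization)
```

The factors may be printed in a different order than in the formula above, but the product is the same.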

### Set up the conditional probability distributions (CPDs)

Once the structure has been defined, we encode the respective probability distributions.

The <i>Genetics</i> and <i>Practice</i> variables do not have any parents, so their distributions are marginal probability distributions. <i>Genetics</i> takes two possible values with the following probability distribution:

In [None]:
genetics_cpd = TabularCPD(
                variable = 'Genetics',
                variable_card = 2,
                values = [[.2], [.8]])  # one row per state, a single column (no parents)


<i>Practice</i> also takes two possible values, with probabilities $0.7$ and $0.3$, respectively:

In [None]:
practice_cpd = TabularCPD(
                variable = 'Practice', 
                variable_card = ,#### YOUR CODE HERE ####
                values = )       #### YOUR CODE HERE ####


<i>Offer</i> and <i>Trials</i> do have parents and the corresponding distributions are conditional probability distributions. <i>Trials</i> takes three possible values and has both <i>Genetics</i> and <i>Practice</i> as parents:


In [None]:
trials_cpd = TabularCPD(
                        variable = 'Trials', 
                        variable_card = 3,
                        values = [[.50, .80, .80, .90],
                                  [.30, .15, .10, .08],
                                  [.20, .05, .10, .02]],
                        evidence = ['Genetics', 'Practice'],
                        evidence_card = [2,2])
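
Each column of `values` is a distribution over <i>Trials</i> for one combination of parent states. The plain-Python sketch below makes the column layout explicit, assuming pgmpy's convention that the last variable in `evidence` varies fastest; `parent_configs` and `p_trials` are illustrative names, not part of pgmpy.

```python
from itertools import product

trials_values = [[.50, .80, .80, .90],
                 [.30, .15, .10, .08],
                 [.20, .05, .10, .02]]

# Assumed column order: (Genetics, Practice) = (0,0), (0,1), (1,0), (1,1)
parent_configs = list(product(range(2), range(2)))

p_trials = {}  # maps (genetics, practice) -> [p(T=0), p(T=1), p(T=2)]
for col, config in enumerate(parent_configs):
    column = [row[col] for row in trials_values]
    # Each column is a distribution over Trials, so it must sum to 1
    assert abs(sum(column) - 1.0) < 1e-9
    p_trials[config] = column

for config, column in p_trials.items():
    print(config, column)
```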


<i>Offer</i> takes two possible values and has <i>Trials</i> as its only parent. The corresponding conditional probability distribution table is:

T | O | $p(O \mid T)$
--|---|------
1 | 1 | 0.95
1 | 2 | 0.05
2 | 1 | 0.80
2 | 2 | 0.20
3 | 1 | 0.50
3 | 2 | 0.50


In [None]:
offer_cpd = TabularCPD(
                    variable = 'Offer',
                    variable_card = 2,
                    values = [[], #### YOUR CODE HERE ####
                              []],
                    evidence = ['Trials'],
                    evidence_card = [3])


Once the CPDs are defined, we only have to add them to the model:


In [None]:
olympic_model.add_cpds(genetics_cpd, practice_cpd, offer_cpd, trials_cpd)


Let us examine our model:


In [None]:
olympic_model.get_cpds()

## Using the model

We have already built our model. It is time to use it!

We can find <b>active trails</b> in the model, which show how probabilistic influence flows. For example, when no variable is observed, the nodes reachable from <i>Genetics</i> through an active trail are:

In [None]:
olympic_model.active_trail_nodes('Genetics')


However, if the variable <i>Offer</i> is observed, the nodes reachable from <i>Genetics</i> through an active trail are:


In [None]:
olympic_model.active_trail_nodes('Genetics', observed='Offer')
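
To see where these answers come from, the rule for active trails can be checked by hand on this small graph. The sketch below is plain Python and is not how pgmpy implements it: it enumerates every simple path and applies the usual rule (a non-collider blocks the path when observed; a collider blocks it unless the collider or one of its descendants is observed). The function names are hypothetical.

```python
edges = [('Genetics', 'Trials'), ('Practice', 'Trials'), ('Trials', 'Offer')]
nodes = sorted({n for edge in edges for n in edge})
children = {n: {c for p, c in edges if p == n} for n in nodes}
neighbours = {n: children[n] | {p for p, c in edges if c == n} for n in nodes}

def descendants(node):
    """All nodes reachable from `node` by following directed edges."""
    found, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def simple_paths(start, end, path=None):
    """Yield every path from start to end, ignoring edge direction."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for nxt in neighbours[start]:
        if nxt not in path:
            yield from simple_paths(nxt, end, path)

def is_active(path, observed):
    """Check every intermediate node of the path against the blocking rule."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edges and (nxt, mid) in edges
        if collider:
            if not ({mid} | descendants(mid)) & observed:
                return False  # unobserved collider blocks the trail
        elif mid in observed:
            return False      # observed chain/fork node blocks the trail
    return True

def reachable_by_active_trail(start, observed=()):
    observed = set(observed)
    return {n for n in nodes if n not in observed
            and any(is_active(p, observed) for p in simple_paths(start, n))}

print(reachable_by_active_trail('Genetics'))
print(reachable_by_active_trail('Genetics', observed={'Offer'}))
```

With nothing observed, <i>Practice</i> is not reachable because <i>Trials</i> is an unobserved collider; observing <i>Offer</i>, a descendant of <i>Trials</i>, opens that trail.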


We may also want to find the local <b>independencies</b> in the model associated with the variable <i>Genetics</i>:


In [None]:
olympic_model.local_independencies('Genetics')


For the variable <i>Trials</i>, the list of local independencies is empty, because all of its non-descendants (<i>Genetics</i> and <i>Practice</i>) are its parents:


In [None]:
olympic_model.local_independencies('Trials')
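
Local independencies follow from the local Markov property: a variable is independent of its non-descendants given its parents. The plain-Python sketch below recomputes them for this graph without pgmpy; `local_independent_set` is a hypothetical helper name.

```python
edges = [('Genetics', 'Trials'), ('Practice', 'Trials'), ('Trials', 'Offer')]
nodes = ['Genetics', 'Practice', 'Trials', 'Offer']
parents = {n: {p for p, c in edges if c == n} for n in nodes}
children = {n: {c for p, c in edges if p == n} for n in nodes}

def descendants(node):
    """All nodes reachable from `node` by following directed edges."""
    found, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def local_independent_set(node):
    """Non-descendants of `node`, excluding its parents and itself."""
    return set(nodes) - {node} - descendants(node) - parents[node]

for n in nodes:
    independent = sorted(local_independent_set(n))
    given = sorted(parents[n])
    print(f"{n}: independent of {independent} given {given}")
```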


We can also list all the independencies present in our model as follows:


In [None]:
olympic_model.get_independencies()


Note that some of them may appear more than once, probably because the method derives the independencies of each variable in turn.


## Making inference

Later in this course, we will learn different approaches for inference in PGMs. For now, let us use the approach known as <i>Variable Elimination</i> to observe how the different reasoning patterns work.

We first create the inference object. Probabilities can be propagated even when no evidence is observed:


In [None]:
from pgmpy.inference import VariableElimination
olympic_infer = VariableElimination(olympic_model)


We can get probability distributions that are not explicitly spelled out in our graph, such as the marginal probability distribution of the variable <i>Offer</i>:


In [None]:
prob_offer = olympic_infer.query(variables = ['Offer'])
print(prob_offer['Offer'])


or the marginal probability distribution of the variable <i>Trials</i>:


In [None]:
prob_trials = olympic_infer.query(variables = ['Trials'])
print(prob_trials['Trials'])
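
Both marginals can be reproduced by hand, which shows what variable elimination does: sum out <i>Genetics</i> and <i>Practice</i> to obtain $p(T)$, then sum out <i>Trials</i> to obtain $p(O)$. This is a plain-Python sketch with the CPD values copied from the text above, not pgmpy's implementation; the 0-based state indices and the `[t][o]` layout of `p_offer_given_trials` are choices made here for illustration.

```python
p_genetics = [.2, .8]
p_practice = [.7, .3]
# p_trials[(g, pr)][t] = p(Trials=t | Genetics=g, Practice=pr)
p_trials = {(0, 0): [.50, .30, .20], (0, 1): [.80, .15, .05],
            (1, 0): [.80, .10, .10], (1, 1): [.90, .08, .02]}
# p_offer_given_trials[t][o], one row per line pair of the T/O table
p_offer_given_trials = [[.95, .05], [.80, .20], [.50, .50]]

# Eliminate Genetics and Practice: p(T) = sum_{g,pr} p(g) p(pr) p(T | g, pr)
p_t = [sum(p_genetics[g] * p_practice[pr] * p_trials[(g, pr)][t]
           for g in range(2) for pr in range(2))
       for t in range(3)]

# Eliminate Trials: p(O) = sum_t p(t) p(O | t)
p_o = [sum(p_t[t] * p_offer_given_trials[t][o] for t in range(3))
       for o in range(2)]

print([round(x, 4) for x in p_t])
print([round(x, 5) for x in p_o])
```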


However, the most common use is to propagate the observation of some variables. We can calculate the posterior probability distribution of the variable <i>Offer</i> given that the observed individual does not have favorable genetics:


In [None]:
prob_offer_bad_genes = olympic_infer.query(
                                        variables = ['Offer'], 
                                        evidence = {'Genetics':1})
print(prob_offer_bad_genes['Offer'])


The probability of obtaining an offer increases when the individual has favorable genetics and practices:


In [None]:
prob_offer_good_genes_did_practice = olympic_infer.query(
                                        variables = ['Offer'], 
                                        evidence = {}) #### YOUR CODE HERE ####
print(prob_offer_good_genes_did_practice['Offer'])
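
The effect of the evidence can be verified by hand. The plain-Python sketch below conditions on the root variables and sums out <i>Trials</i>; it assumes that state 0 of <i>Genetics</i> and <i>Practice</i> is the favorable one and that state 1 of <i>Offer</i> means receiving an offer, which is how the queries in this notebook read. `offer_given` is a hypothetical helper.

```python
p_practice = [.7, .3]
# p_trials[(g, pr)][t] = p(Trials=t | Genetics=g, Practice=pr)
p_trials = {(0, 0): [.50, .30, .20], (0, 1): [.80, .15, .05],
            (1, 0): [.80, .10, .10], (1, 1): [.90, .08, .02]}
# p_offer_given_trials[t][o]
p_offer_given_trials = [[.95, .05], [.80, .20], [.50, .50]]

def offer_given(genetics, practice=None):
    """p(Offer | Genetics[, Practice]), summing out what is not observed."""
    if practice is None:
        # Practice is independent of Genetics, so its prior is used
        weights = p_practice
    else:
        weights = [1.0 if state == practice else 0.0 for state in range(2)]
    p_t = [sum(weights[pr] * p_trials[(genetics, pr)][t] for pr in range(2))
           for t in range(3)]
    return [sum(p_t[t] * p_offer_given_trials[t][o] for t in range(3))
            for o in range(2)]

print(offer_given(genetics=1))              # unfavorable genetics
print(offer_given(genetics=0, practice=0))  # favorable genetics and practice
```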


These two queries are examples of <b>causal reasoning</b>, which goes from causes to effects.

We can also reason upstream, from effects to causes, as in <b>evidential reasoning</b>. For example, evidence about a great performance at the Olympic trials affects the probability distribution of the <i>Genetics</i> variable:


In [None]:
prob_good_genes_if_amazing_olympic_trials = olympic_infer.query(
                                        variables = ['Genetics'], 
                                        evidence = {'Trials':2})
print(prob_good_genes_if_amazing_olympic_trials['Genetics'])
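
Evidential reasoning is an application of Bayes' rule: $p(G \mid T) \propto p(G)\sum_P p(P)\,p(T \mid G, P)$. The plain-Python sketch below evaluates it for a great performance (`Trials` state 2), with the CPD values copied from the text above and state 0 of <i>Genetics</i> taken as the favorable one; it is a hand calculation, not pgmpy code.

```python
p_genetics = [.2, .8]
p_practice = [.7, .3]
# p_trials[(g, pr)][t] = p(Trials=t | Genetics=g, Practice=pr)
p_trials = {(0, 0): [.50, .30, .20], (0, 1): [.80, .15, .05],
            (1, 0): [.80, .10, .10], (1, 1): [.90, .08, .02]}

great = 2  # index of the "great performance" state of Trials

# Unnormalised posterior: p(g) * sum_pr p(pr) * p(T=great | g, pr)
unnormalised = [
    p_genetics[g] * sum(p_practice[pr] * p_trials[(g, pr)][great]
                        for pr in range(2))
    for g in range(2)
]
evidence_prob = sum(unnormalised)  # p(T=great)
posterior = [u / evidence_prob for u in unnormalised]

print("prior    :", p_genetics)
print("posterior:", [round(x, 4) for x in posterior])
```

The posterior probability of favorable genetics is higher than its prior of 0.2, as expected after observing a great performance.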


Finally, <b>intercausal reasoning</b> concerns two variables that are parents of a third variable (the v-structure <i>Genetics => Trials <= Practice</i>). If we only have evidence about one of the parents, that evidence has no effect on the probability distribution of the other parent, because the two are marginally independent.

Once the variable <i>Trials</i> is also observed, both parents become dependent and the evidence about <i>Practice</i> does affect the posterior probability distribution of <i>Genetics</i>:

In [None]:
prob_good_genes_if_no_practice = olympic_infer.query(
                                        variables = ['Genetics'], 
                                        evidence = {'Practice':1})
print(prob_good_genes_if_no_practice['Genetics'])

prob_good_genes_if_no_practice_and_great_perf = olympic_infer.query(
                                        variables = ['Genetics'], 
                                        evidence = {'Practice':1,'Trials':2})
print(prob_good_genes_if_no_practice_and_great_perf['Genetics'])
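
The same comparison can be made by brute-force enumeration of the joint distribution. In this plain-Python sketch, `posterior_genetics` is a hypothetical helper; <i>Offer</i> is left out because it is an unobserved descendant of <i>Trials</i> and sums out to 1. State 1 of <i>Practice</i> is taken to mean no practice and state 2 of <i>Trials</i> a great performance, as in the queries above.

```python
p_genetics = [.2, .8]
p_practice = [.7, .3]
# p_trials[(g, pr)][t] = p(Trials=t | Genetics=g, Practice=pr)
p_trials = {(0, 0): [.50, .30, .20], (0, 1): [.80, .15, .05],
            (1, 0): [.80, .10, .10], (1, 1): [.90, .08, .02]}

def posterior_genetics(practice=None, trials=None):
    """p(Genetics | evidence) by summing the joint over consistent states."""
    weights = [0.0, 0.0]
    for g in range(2):
        for pr in range(2):
            for t in range(3):
                if practice is not None and pr != practice:
                    continue  # inconsistent with the Practice evidence
                if trials is not None and t != trials:
                    continue  # inconsistent with the Trials evidence
                weights[g] += p_genetics[g] * p_practice[pr] * p_trials[(g, pr)][t]
    total = sum(weights)
    return [w / total for w in weights]

only_practice = posterior_genetics(practice=1)
practice_and_trials = posterior_genetics(practice=1, trials=2)
print([round(x, 4) for x in only_practice])        # equals the prior
print([round(x, 4) for x in practice_and_trials])  # shifted by the v-structure
```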


As one can imagine, if someone performs great in the Olympic trials without practice, favorable genetics become a more likely explanation.


<hr />

## Exercises

- What is the probability of a regular-performance trial for someone who practices but does not have favorable genetics?

- What is the probability of receiving an offer given only good genetics? And given bad genetics?

- What is the probability that someone who achieved a great performance in the trials without favorable genetics did practice?