# Chapter 1: Bayesian Network Fundamentals
## Representing independencies

In [1]:
import pgmpy as pgm

In [3]:
from pgmpy.independencies import IndependenceAssertion

In [6]:
# X and Y are independent
assertion1 = IndependenceAssertion('X', 'Y')

In [7]:
assertion1

(X _|_ Y)

In [9]:
# Conditional assertion (conditional independence)
assertion2 = IndependenceAssertion('X', 'Y', 'Z')

In [10]:
assertion2

(X _|_ Y | Z)

With the list `[...]` representation, we can express other assertions of the form $(X\bot Y, Z\ |\ A, B)$

In [11]:
# A more general assertion over lists of variables
assertion3 = IndependenceAssertion('X', ['Y', 'Z'], ['A', 'B'])

In [12]:
assertion3

(X _|_ Y, Z | A, B)

### ```Independencies``` class

An ```Independencies``` object is used to represent a __set__ of assertions in Bayesian or Markov networks.

In [13]:
from pgmpy.independencies import Independencies

In [14]:
# Create an empty object then add IndependenceAssertion to it
independencies = Independencies()
independencies.get_assertions()

[]

In [19]:
# Add assertions to the object. The same assertion can be added multiple times (NOT hashed),
# so the repeats in the output below presumably come from running this cell several times
independencies.add_assertions(assertion1, assertion2)

In [20]:
independencies.get_assertions()

[(X _|_ Y), (X _|_ Y | Z), (X _|_ Y), (X _|_ Y | Z), (X _|_ Y), (X _|_ Y | Z)]

In [22]:
independencies = Independencies(assertion1, assertion2, assertion3)

In [23]:
independencies.get_assertions()

[(X _|_ Y), (X _|_ Y | Z), (X _|_ Y, Z | A, B)]

### Representing joint probability distributions

In [24]:
from pgmpy.factors import JointProbabilityDistribution as Joint

In [25]:
distribution = Joint(['coin1', 'coin2'], [2, 2], [0.25, 0.25, 0.25, 0.25])

In [26]:
print(distribution)

coin1    coin2      P(coin1,coin2)
-------  -------  ----------------
coin1_0  coin2_0            0.2500
coin1_0  coin2_1            0.2500
coin1_1  coin2_0            0.2500
coin1_1  coin2_1            0.2500


In [30]:
# coin1 and coin2 are independent
distribution.check_independence(['coin1'], ['coin2'])

True
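
To make the check above concrete, here is a minimal pure-Python sketch of what an independence test on a two-variable joint table amounts to: comparing every joint entry with the product of the two marginals. The helper names `marginal` and `are_independent` are hypothetical and are not part of pgmpy; the row order is assumed to match the printout above.

```python
from itertools import product

# Joint table for two binary variables, in the row order of the printout:
# (coin1_0, coin2_0), (coin1_0, coin2_1), (coin1_1, coin2_0), (coin1_1, coin2_1)
joint = dict(zip(product(range(2), range(2)), [0.25, 0.25, 0.25, 0.25]))

# A second, hypothetical table in which the two coins always agree
dependent = dict(zip(product(range(2), range(2)), [0.5, 0.0, 0.0, 0.5]))


def marginal(table, axis):
    """Sum the joint table over the other variable (hypothetical helper)."""
    result = {}
    for states, p in table.items():
        result[states[axis]] = result.get(states[axis], 0.0) + p
    return result


def are_independent(table, tol=1e-9):
    """True if P(a, b) equals P(a) * P(b) for every pair of states."""
    p_first, p_second = marginal(table, 0), marginal(table, 1)
    return all(abs(p - p_first[a] * p_second[b]) <= tol
               for (a, b), p in table.items())


print(are_independent(joint))      # True
print(are_independent(dependent))  # False
```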

## Conditional probability distribution (CPD)

In [31]:
from pgmpy.factors import TabularCPD

In [32]:
quality = TabularCPD(variable='Quality', variable_card=3, values=[[0.3], [0.5], [0.2]])

In [33]:
print(quality)

+-----------+-----+
| Quality_0 | 0.3 |
+-----------+-----+
| Quality_1 | 0.5 |
+-----------+-----+
| Quality_2 | 0.2 |
+-----------+-----+


In [35]:
quality.variables

['Quality']

In [36]:
quality.cardinality

array([3])

In [37]:
quality.values

array([ 0.3,  0.5,  0.2])

In [38]:
quality.variable_card

3

In [44]:
type(quality.values)

numpy.ndarray

In [45]:
location = TabularCPD(variable='Location', variable_card = 2, values = [[0.6], [0.4]])

In [46]:
print(location)

+------------+-----+
| Location_0 | 0.6 |
+------------+-----+
| Location_1 | 0.4 |
+------------+-----+


In [47]:
# Define a conditional distribution with two evidence variables
cost = TabularCPD(variable='Cost', variable_card=2,
                  values=[[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],
                          [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]],
                  evidence=['Q', 'L'], evidence_card=[3, 2])

In [48]:
print(cost)

+--------+-----+-----+-----+-----+-----+------+
| Q      | Q_0 | Q_0 | Q_1 | Q_1 | Q_2 | Q_2  |
+--------+-----+-----+-----+-----+-----+------+
| L      | L_0 | L_1 | L_0 | L_1 | L_0 | L_1  |
+--------+-----+-----+-----+-----+-----+------+
| Cost_0 | 0.8 | 0.6 | 0.1 | 0.6 | 0.6 | 0.05 |
+--------+-----+-----+-----+-----+-----+------+
| Cost_1 | 0.2 | 0.4 | 0.9 | 0.4 | 0.4 | 0.95 |
+--------+-----+-----+-----+-----+-----+------+
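
The printed table suggests how the flat `values` rows are laid out: the last evidence variable (L) changes fastest across the columns. The sketch below, in plain Python, shows that column arithmetic and checks that every column sums to 1. The helpers `column_index` and `is_normalized` are hypothetical names, and the layout is inferred from the printout above rather than taken from pgmpy's source.

```python
# Values of P(Cost | Q, L), copied from the cell above
values = [[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],
          [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]]
evidence_card = [3, 2]  # Q has 3 states, L has 2 states


def column_index(evidence_states, evidence_card):
    """Map evidence states such as [q, l] to a column (last variable fastest)."""
    index = 0
    for state, card in zip(evidence_states, evidence_card):
        index = index * card + state
    return index


def is_normalized(values, tol=1e-9):
    """Each column is a distribution over the variable, so it must sum to 1."""
    return all(abs(sum(column) - 1.0) <= tol for column in zip(*values))


col = column_index([1, 0], evidence_card)  # Q_1, L_0
print(col, [row[col] for row in values])   # 2 [0.1, 0.9]
print(is_normalized(values))               # True
```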


## Graph theory
### "Late for school" graph implementation.
In this example, the graph (page 17 of the book) has 6 nodes. Each node (a random variable) has two discrete states, {yes, no}.

In [49]:
from pgmpy.models import BayesianModel

In [50]:
model = BayesianModel()

In [51]:
# Add 'rain' and 'traffic_jam' nodes to the model
model.add_nodes_from(['rain', 'traffic_jam'])

In [52]:
# Add an edge between these nodes
model.add_edge('rain', 'traffic_jam')

In [53]:
# pgmpy automatically adds missing nodes when we add an edge
model.add_edge('accident', 'traffic_jam')

In [54]:
model.nodes()

['accident', 'traffic_jam', 'rain']

In [55]:
model.edges()

[('accident', 'traffic_jam'), ('rain', 'traffic_jam')]

In a Bayesian network, each node has an associated CPD. The following code defines some tabular CPDs to associate with the model:

In [58]:
# TabularCPD was already imported above (from pgmpy.factors import TabularCPD)
cpd_rain = TabularCPD(variable='rain', variable_card=2, values=[[0.4], [0.6]])

In [59]:
cpd_accident = TabularCPD('accident', 2, [[0.2], [0.8]])

In [60]:
cpd_traffic_jam = TabularCPD('traffic_jam', 2, 
                            [[0.9, 0.6, 0.7, 0.1],
                            [0.1, 0.4, 0.3, 0.9]],
                            evidence=['rain', 'accident'],
                            evidence_card=[2,2])

In [61]:
print(cpd_traffic_jam)

+---------------+------------+------------+------------+------------+
| rain          | rain_0     | rain_0     | rain_1     | rain_1     |
+---------------+------------+------------+------------+------------+
| accident      | accident_0 | accident_1 | accident_0 | accident_1 |
+---------------+------------+------------+------------+------------+
| traffic_jam_0 | 0.9        | 0.6        | 0.7        | 0.1        |
+---------------+------------+------------+------------+------------+
| traffic_jam_1 | 0.1        | 0.4        | 0.3        | 0.9        |
+---------------+------------+------------+------------+------------+


pgmpy uses the variable name of each TabularCPD to attach it to the corresponding node, so we only need to add the CPDs to the model.

In [62]:
model.add_cpds(cpd_rain, cpd_accident, cpd_traffic_jam)

In [64]:
model.get_cpds()

[<TabularCPD representing P(rain:2) at 0x7fc088cf3e90>,
 <TabularCPD representing P(accident:2) at 0x7fc088cf3a50>,
 <TabularCPD representing P(traffic_jam:2 | rain:2, accident:2) at 0x7fc088d16110>]

In [65]:
print(model.get_cpds()[0])

+--------+-----+
| rain_0 | 0.4 |
+--------+-----+
| rain_1 | 0.6 |
+--------+-----+


In [66]:
model.add_node('long_queues')

In [68]:
model.add_edge('traffic_jam', 'long_queues')

In [69]:
cpd_long_queues = TabularCPD('long_queues', 2, [[0.9, 0.2], [0.1, 0.8]],
                            evidence=['traffic_jam'],
                            evidence_card=[2])

In [70]:
model.add_cpds(cpd_long_queues)

In [71]:
model.add_nodes_from(['getting_up_late', 'late_for_school'])

In [72]:
model.add_edges_from([('getting_up_late', 'late_for_school'), 
                      ('traffic_jam', 'late_for_school')])

In [73]:
cpd_getting_up_late = TabularCPD('getting_up_late', 2, [[0.6], [0.4]])

In [74]:
cpd_late_for_school = TabularCPD('late_for_school', 2, [[0.9, 0.45, 0.8, 0.1],
                                                       [0.1, 0.55, 0.2, 0.9]],
                                evidence=['getting_up_late', 'traffic_jam'],
                                evidence_card=[2,2])

In [75]:
model.add_cpds(cpd_getting_up_late, cpd_late_for_school)

In [76]:
model.get_cpds()

[<TabularCPD representing P(rain:2) at 0x7fc088cf3e90>,
 <TabularCPD representing P(accident:2) at 0x7fc088cf3a50>,
 <TabularCPD representing P(traffic_jam:2 | rain:2, accident:2) at 0x7fc088d16110>,
 <TabularCPD representing P(long_queues:2 | traffic_jam:2) at 0x7fc088d16050>,
 <TabularCPD representing P(getting_up_late:2) at 0x7fc088cf37d0>,
 <TabularCPD representing P(late_for_school:2 | getting_up_late:2, traffic_jam:2) at 0x7fc088d16390>]

In [77]:
# Check if all associated CPDs are consistent
model.check_model()

True

We can also __remove__ an unwanted CPD with:
```python
model.remove_cpds('late_for_school')
```

### Reasoning pattern in Bayesian networks
Reasoning with a Bayesian network is somewhat like asking "what if" questions. Given information about A (accident), we would like to know $P(J|A=True)$, where J is the traffic jam.
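
As a worked example of such a query, the sketch below computes the distribution of `traffic_jam` given a state of `accident` by summing out `rain`, using the CPD values defined earlier. It is plain Python, not pgmpy inference. Two assumptions: `rain` and `accident` are marginally independent (there is no edge between them and nothing else is observed), so $P(rain|accident) = P(rain)$; and the notebook does not say which state index means "True", so the function takes the state index directly.

```python
# CPD values copied from the cells above
p_rain = [0.4, 0.6]                    # P(rain_0), P(rain_1)
p_jam_given = [[0.9, 0.6, 0.7, 0.1],   # traffic_jam_0
               [0.1, 0.4, 0.3, 0.9]]   # traffic_jam_1
# Column order: (rain_0, acc_0), (rain_0, acc_1), (rain_1, acc_0), (rain_1, acc_1)


def p_jam_given_accident(accident_state):
    """P(traffic_jam | accident) = sum over rain of P(rain) * P(jam | rain, accident)."""
    result = []
    for jam_row in p_jam_given:
        total = 0.0
        for rain_state, p_r in enumerate(p_rain):
            column = rain_state * 2 + accident_state
            total += p_r * jam_row[column]
        result.append(round(total, 6))
    return result


print(p_jam_given_accident(0))  # [0.78, 0.22]
print(p_jam_given_accident(1))  # [0.3, 0.7]
```

The two results differ, which is the sense in which information about the accident flows to the traffic jam.
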
### d-Separation, active trail in BN
Given information about A, information flows to J and changes the likelihood of J. To find out which nodes can affect which, we need the definitions of d-separation and of an active trail, together with an algorithm for checking whether a trail is active. If there is no active trail between two nodes given the observed variables, the nodes are d-separated (conditionally independent), and no information flows between them.

In [78]:
# Check whether there is an active trail between two nodes in pgmpy
model.is_active_trail('accident', 'rain')

False

In [80]:
# 'accident' and 'rain' become dependent once 'traffic_jam' is observed
model.is_active_trail('accident', 'rain', observed='traffic_jam')

True

In [81]:
model.is_active_trail('getting_up_late', 'rain')

False

In [82]:
model.is_active_trail('getting_up_late', 'rain', observed='late_for_school')

True

In [83]:
model.is_active_trail('accident', 'rain', observed='late_for_school')

True
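
To show what such a check involves, here is a sketch of the standard textbook reachability procedure for active trails, written in plain Python on the edge list of the model above. It is not pgmpy's implementation, and the function name `has_active_trail` is hypothetical. The key rule is that a v-structure (two arrows meeting at a node) only lets information through when that node, or one of its descendants, is observed.

```python
def has_active_trail(edges, start, end, observed=()):
    """Return True if an active trail connects start and end given observed."""
    if isinstance(observed, str):
        observed = [observed]
    observed = set(observed)

    parents, children = {}, {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
        parents.setdefault(v, set()).add(u)

    # Phase 1: collect the observed nodes together with all their ancestors
    ancestors, stack = set(), list(observed)
    while stack:
        node = stack.pop()
        if node not in ancestors:
            ancestors.add(node)
            stack.extend(parents.get(node, ()))

    # Phase 2: walk the graph, remembering the direction of arrival at each node
    visited, reachable = set(), set()
    stack = [(start, 'up')]  # 'up' = arrived from a child, 'down' = from a parent
    while stack:
        node, direction = stack.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in observed:
            reachable.add(node)
        if direction == 'up' and node not in observed:
            stack.extend((p, 'up') for p in parents.get(node, ()))
            stack.extend((c, 'down') for c in children.get(node, ()))
        elif direction == 'down':
            if node not in observed:
                stack.extend((c, 'down') for c in children.get(node, ()))
            if node in ancestors:  # v-structure activated by the evidence
                stack.extend((p, 'up') for p in parents.get(node, ()))
    return end in reachable


edges = [('rain', 'traffic_jam'), ('accident', 'traffic_jam'),
         ('traffic_jam', 'long_queues'),
         ('getting_up_late', 'late_for_school'),
         ('traffic_jam', 'late_for_school')]

print(has_active_trail(edges, 'accident', 'rain'))                          # False
print(has_active_trail(edges, 'accident', 'rain', observed='traffic_jam'))  # True
```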

## Relating graphs and distributions
### IMAP
A graph G is called an I-map (independency map) of a probability distribution D if the set of independence assertions encoded by G, denoted by I(G), is a subset of the set of independencies that hold in D, denoted by I(D).
By this definition, a distribution can have many I-maps, each capturing some subset of I(D). The graph that captures exactly the independencies of the distribution, I(G) = I(D), is known as the __Perfect Map__.
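
The subset relation can be illustrated with plain Python sets. In this sketch the independence sets are written out by hand for the two-coin distribution used earlier (they are assumptions for illustration, not computed from a graph or a table), and the names `assertion`, `is_imap` and `is_perfect_map` are hypothetical.

```python
def assertion(x, y, given=()):
    """Represent (x _|_ y | given) so that the order of x and y does not matter."""
    return (frozenset([x, y]), frozenset(given))


def is_imap(graph_independencies, dist_independencies):
    """G is an I-map of D when I(G) is a subset of I(D)."""
    return graph_independencies <= dist_independencies


def is_perfect_map(graph_independencies, dist_independencies):
    """G is a perfect map of D when I(G) equals I(D)."""
    return graph_independencies == dist_independencies


# Two independent fair coins: the only independence in D
i_d = {assertion('coin1', 'coin2')}

i_g_no_edge = {assertion('coin1', 'coin2')}  # graph with no edge
i_g_edge = set()                             # graph coin1 -> coin2 asserts nothing

print(is_imap(i_g_no_edge, i_d), is_perfect_map(i_g_no_edge, i_d))  # True True
print(is_imap(i_g_edge, i_d), is_perfect_map(i_g_edge, i_d))        # True False
```

Both graphs are I-maps, but only the graph without an edge captures every independence in the distribution.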

### CPD representations
So far, we have only dealt with discrete variables whose CPDs can be written as tables. In practice, however, many random variables are real-valued, and even for discrete variables a table is not always a good choice. Here are some cases where a tabular CPD is a poor fit:
- Deterministic CPDs: the outcome is a deterministic function of the parent nodes (e.g. logic gates).
- Context-specific CPDs (Tree CPD or Rule CPD): similar to the previous case, for some values of a parent the outcome is certain. The book's example adds another variable, 'flat_tire', to the Late for School BN. If 'flat_tire' = True, then 'late_for_school' is certainly True. Adding 'flat_tire' as an ordinary parent doubles the size of the table unnecessarily.
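
Both points can be sketched in plain Python. The first helper expands a deterministic function (an AND gate here, chosen as an example) into a full table of 0s and 1s, showing how little information such a table carries. The second counts the columns of a tabular CPD, which is the product of the parent cardinalities. The names `deterministic_cpd` and `table_columns` are hypothetical.

```python
from itertools import product
from math import prod


def deterministic_cpd(func, evidence_card, variable_card):
    """Expand a deterministic function of the parents into a 0/1 table."""
    columns = []
    for states in product(*(range(card) for card in evidence_card)):
        outcome = func(*states)
        columns.append([1.0 if k == outcome else 0.0
                        for k in range(variable_card)])
    # Transpose so that each row corresponds to one state of the variable
    return [list(row) for row in zip(*columns)]


def table_columns(evidence_card):
    """Number of columns in a tabular CPD with the given parent cardinalities."""
    return prod(evidence_card)


and_gate = deterministic_cpd(lambda a, b: a & b, [2, 2], 2)
print(and_gate)                  # [[1.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

print(table_columns([2, 2]))     # 4: getting_up_late, traffic_jam
print(table_columns([2, 2, 2]))  # 8: after adding a binary flat_tire parent
```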

#### Tree CPD
Instead of adding the Flat Tyre (T) variable to the network as an ordinary parent, T -> L <- J, a Tree CPD treats T as the parent of J within the tree. When T is False, the distribution of L is determined as before; when T is True, the distribution of L is determined without the effect of J (J is skipped). Fig. 1.9 on page 29 of the book illustrates this idea. However, the pgmpy developers abandoned the TreeCPD implementation because the code was not good and TreeCPD was not used anywhere else.
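
Since pgmpy offers no TreeCPD here, the idea can only be sketched in plain Python. The function below branches on `flat_tire` first and consults the original `late_for_school` table only when there is no flat tyre. Several things are assumptions for illustration: the function name `p_late`, the convention that state 1 means "True" or "late", and the choice to skip both remaining parents on the flat-tyre branch.

```python
# P(late_for_school | getting_up_late, traffic_jam), copied from the model above
late_table = [[0.9, 0.45, 0.8, 0.1],   # late_for_school_0
              [0.1, 0.55, 0.2, 0.9]]   # late_for_school_1


def p_late(flat_tire, getting_up_late, traffic_jam):
    """Tree-structured lookup: test flat_tire at the root, then fall back to the table."""
    if flat_tire == 1:
        # Assumed convention: state 1 = True, so the student is certainly late
        return [0.0, 1.0]
    column = getting_up_late * 2 + traffic_jam
    return [row[column] for row in late_table]


print(p_late(1, 0, 0))  # [0.0, 1.0]
print(p_late(0, 1, 0))  # [0.8, 0.2]
```

This stores the four original columns plus one leaf, instead of the eight columns a full table with three binary parents would need.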