# Chapter 1: Bayesian Network Fundamentals

* 싸이그래머 / QGM - pgmpy [1]
* 김무성

# Contents

* Probability theory
* Installing tools 
* Representing independencies using pgmpy 
* Representing joint probability distributions using pgmpy 
* Conditional probability distribution 
* Graph theory 
* Bayesian models 
* Relating graphs and distributions 
* CPD representations
* Summary

# Probability theory

* Random variable 
* Independence and conditional independence

#### References
* [4] stanford-pgm/slides/1.1.2-Intro-distributions - http://spark-university.s3.amazonaws.com/stanford-pgm/slides/1.1.2-Intro-distributions.pdf
* [5] stanford-pgm/slides/1.1.3-Intro-factors - http://spark-university.s3.amazonaws.com/stanford-pgm/slides/1.1.3-Intro-factors.pdf

## Random variable 

A random variable is a way of representing an attribute of the outcome.

Formally, a random variable X is a function that maps the set of possible outcomes Ω to some set E, which is represented as follows:

X : Ω → E

Random variables can either be discrete or continuous.

For any event whose outcome is represented by some random variable (X), we can assign some value to each of the possible outcomes of X, which represents how probable it is.

This is known as the probability distribution of the random variable and is denoted by P(X).
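
As a minimal sketch of these definitions, the plain-Python snippet below treats a random variable as a function on a sample space and derives its distribution by counting. The two-dice sample space and the names `omega`, `X`, and `distribution` are illustrative assumptions, not part of the original material.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space Ω: all ordered outcomes of rolling two fair dice (assumed example).
omega = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable X: Ω → E, mapping an outcome to the sum of the two dice."""
    return outcome[0] + outcome[1]

def distribution(rv, sample_space):
    """P(X): probability of each value of rv, assuming equally likely outcomes."""
    counts = Counter(rv(w) for w in sample_space)
    total = len(sample_space)
    return {value: Fraction(n, total) for value, n in counts.items()}

p_x = distribution(X, omega)
print(p_x[7])             # probability that the sum is 7
print(sum(p_x.values()))  # the probabilities of a distribution sum to 1
```

Exact fractions are used here only to keep the arithmetic free of floating-point noise.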

## Independence and conditional independence

#### References
* [6] stanford-pgm/slides/2.1.4-Repn-Ind-conditional-independence - http://spark-university.s3.amazonaws.com/stanford-pgm/slides/2.1.4-Repn-Ind-conditional-independence.pdf
* [8] stanford-pgm/slides/2.1.2-Repn-BNs-patterns - http://spark-university.s3.amazonaws.com/stanford-pgm/slides/2.1.2-Repn-BNs-patterns.pdf
* [9] stanford-pgm/slides/2.1.3-Repn-BNs-flow-influence - http://spark-university.s3.amazonaws.com/stanford-pgm/slides/2.1.3-Repn-BNs-flow-influence.pdf

#### joint probability distribution

P(X1, X2, ..., Xn)

#### Independence

<img src="figures/cap1.1.png" width=600 />

#### conditional independence

Q ⊥ N | C
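
A statement of this form can be checked directly against a joint table. The sketch below uses three hypothetical binary variables X, Y, and Z (stand-ins with made-up numbers, not the Q, N, and C of the example) and tests both marginal and conditional independence from the definitions P(x, y) = P(x) P(y) and P(x, y | z) = P(x | z) P(y | z).

```python
from itertools import product

# Hypothetical binary variables X, Y, Z: X and Y each depend only on Z.
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}  # p_x_given_z[z][x]
p_y_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # p_y_given_z[z][y]

# Joint table P(X, Y, Z) keyed by (x, y, z).
joint = {
    (x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
    for x, y, z in product((0, 1), repeat=3)
}

def marginal(table, keep):
    """Sum out every position of the key tuple that is not listed in keep."""
    out = {}
    for key, p in table.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

def independent(table, tol=1e-9):
    """Check X ⊥ Y: P(x, y) == P(x) P(y) for all x, y."""
    p_xy = marginal(table, (0, 1))
    p_x, p_y = marginal(table, (0,)), marginal(table, (1,))
    return all(abs(p_xy[(x, y)] - p_x[(x,)] * p_y[(y,)]) < tol for x, y in p_xy)

def conditionally_independent(table, tol=1e-9):
    """Check X ⊥ Y | Z: P(x, y, z) P(z) == P(x, z) P(y, z) for all x, y, z."""
    p_xz, p_yz = marginal(table, (0, 2)), marginal(table, (1, 2))
    p_only_z = marginal(table, (2,))
    return all(
        abs(p * p_only_z[(z,)] - p_xz[(x, z)] * p_yz[(y, z)]) < tol
        for (x, y, z), p in table.items()
    )

print(independent(joint))                # False
print(conditionally_independent(joint))  # True
```

The conditional check is written in the multiplied-out form so that it never divides by P(z).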

# Installing tools 

* IPython 
* pgmpy 

### Note: Docker setup

1. Everyone needs a working Docker environment (on Linux, a system Docker installation; on Mac and Windows, Docker Toolbox). A related guide is at https://gist.github.com/haje01/0fb6d63bf065c9831256
2. Download the practice image
    - <font color="blue">docker pull psygrammer/pgmpy_jupyter</font>
3. Run the container
    - <font color="blue">docker run -d -p 8888:8888 -e GRANT_SUDO=yes --name run_pgmpy psygrammer/pgmpy_jupyter</font>
        - (Note) See the following document, which lists the container run options - https://github.com/jupyter/docker-stacks/tree/master/scipy-notebook
        - The command above grants sudo rights (recommended) and gives the container a name (optional). Its general form is:
        - docker run -d -p [host_port]:8888 -e GRANT_SUDO=yes --name [container_name] psygrammer/pgmpy_jupyter
    

##### This image was built by following the steps described in the IPython and pgmpy sections below.

## IPython  

## pgmpy

* pgmpy is a Python library to work with Probabilistic Graphical models.
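* When these notes were written pgmpy was not on PyPI, so it had to be built manually from source. Recent releases can be installed with `pip install pgmpy`, but the code in these notes follows the API of the older version, and newer releases may differ (for example, the discrete factor classes now live under `pgmpy.factors.discrete`).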

# Representing independencies using pgmpy 

* To represent independencies, pgmpy has two classes, namely 
    - IndependenceAssertion and 
    - Independencies.

#### IndependenceAssertion class

In [1]:
# Firstly we need to import IndependenceAssertion
from pgmpy.independencies import IndependenceAssertion
# Each assertion is in the form of [X, Y, Z] 
# meaning X is independent of Y given Z.

In [2]:
assertion1 = IndependenceAssertion('X', 'Y')
assertion1

(X _|_ Y)

In [7]:
# To represent conditional assertions, 
# we just need to add a third argument 
# to IndependenceAssertion:
assertion2 = IndependenceAssertion('X', 'Y', 'Z')
assertion2

(X _|_ Y | Z)

In [8]:
# IndependenceAssertion also allows us to represent assertions 
# conditioned on several variables, such as (X ⊥ Y | A, B). 
# To do this, we just need to pass 
# a list of random variables as the third argument:
assertion3 = IndependenceAssertion('X', 'Y', ['A', 'B'])
assertion3

(X _|_ Y | B, A)

#### Independencies class

In [None]:
# Moving on to the Independencies class, 
# an Independencies object is used to represent a set of assertions. 
# Often, in the case of Bayesian or Markov networks,
# we have more than one assertion corresponding to a given model, 
# and to represent these independence assertions for the models, 
# we generally use the Independencies object. 
# Let's take a few examples:

In [9]:
from pgmpy.independencies import Independencies
# There are multiple ways to create an Independencies object, we
# could either initialize an empty object or initialize with some
# assertions.

In [10]:
independencies = Independencies() # Empty object

In [11]:
independencies.get_assertions()

[]

In [12]:
independencies.add_assertions(assertion1, assertion2)

In [13]:
independencies.get_assertions()

[(X _|_ Y), (X _|_ Y | Z)]

In [15]:
# We can also directly initialize Independencies 
# in these two ways:
independencies = Independencies(assertion1, assertion2)
independencies

(X _|_ Y)
(X _|_ Y | Z)

In [16]:
independencies = Independencies(['X', 'Y'],
                                ['A', 'B', 'C'])
independencies

(X _|_ Y)
(A _|_ B | C)

In [17]:
independencies.get_assertions()

[(X _|_ Y), (A _|_ B | C)]

# Representing joint probability distributions using pgmpy 

#### JointProbabilityDistribution class

In [18]:
# We can also represent joint probability distributions 
# using pgmpy's JointProbabilityDistribution class. 
# Let's say we want to represent the joint distribution 
# over the outcomes of tossing two fair coins. 
# So, in this case, the probability of all the possible outcomes 
# would be 0.25, which is shown as follows:
from pgmpy.factors import JointProbabilityDistribution as Joint

In [19]:
distribution = Joint(['coin1', 'coin2'],
                                 [2, 2],
                                 [0.25, 0.25, 0.25, 0.25])

In [20]:
print(distribution)

╒═════════╤═════════╤══════════════════╕
│ coin1   │ coin2   │   P(coin1,coin2) │
╞═════════╪═════════╪══════════════════╡
│ coin1_0 │ coin2_0 │           0.2500 │
├─────────┼─────────┼──────────────────┤
│ coin1_0 │ coin2_1 │           0.2500 │
├─────────┼─────────┼──────────────────┤
│ coin1_1 │ coin2_0 │           0.2500 │
├─────────┼─────────┼──────────────────┤
│ coin1_1 │ coin2_1 │           0.2500 │
╘═════════╧═════════╧══════════════════╛


In [None]:
# We can also conduct independence queries 
# over these distributions in pgmpy:
distribution.check_independence('coin1', 'coin2')
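
No output is recorded for this cell. As a rough sketch of what such a query has to establish (plain Python, not pgmpy's implementation), the two coins are independent when every joint entry equals the product of its marginals:

```python
from itertools import product

# Joint distribution over two fair coins, keyed by (coin1, coin2) state.
joint = {(c1, c2): 0.25 for c1, c2 in product((0, 1), repeat=2)}

# Marginals obtained by summing out the other coin.
p_coin1 = {s: sum(p for (c1, _), p in joint.items() if c1 == s) for s in (0, 1)}
p_coin2 = {s: sum(p for (_, c2), p in joint.items() if c2 == s) for s in (0, 1)}

# coin1 ⊥ coin2 holds when P(c1, c2) == P(c1) P(c2) for every pair of states.
coins_independent = all(
    abs(joint[(c1, c2)] - p_coin1[c1] * p_coin2[c2]) < 1e-12
    for c1, c2 in joint
)
print(coins_independent)  # True
```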

# Conditional probability distribution 

* Representing CPDs using pgmpy 

<img src="figures/cap1.2.png" width=600 />

Let's begin by representing the marginal distribution of the quality of food with Q. As we mentioned earlier, it can be categorized into three values {good, bad, average}. For example, P(Q) can be represented in the tabular form as follows:

<img src="figures/cap1.3.png" />

Similarly, let's say P(L) is the probability distribution of the location of the restaurant. Its CPD can be represented as follows:

<img src="figures/cap1.4.png" />

As the cost of restaurant C depends on both the quality of food Q and its location L, we will be considering P(C | Q, L), which is the conditional distribution of C, given Q and L:

<img src="figures/cap1.5.png" />

## Representing CPDs using pgmpy 

In [21]:
# Let's first see how to represent the tabular CPD 
# using pgmpy for variables that have no conditional variables:

from pgmpy.factors import TabularCPD
# To create a TabularCPD object we need to pass three
# arguments: the variable name, its cardinality (the number
# of states of the random variable), and the probability values
# corresponding to each state.

In [22]:
quality = TabularCPD(variable='Quality',
                    variable_card=3,
                    values=[[0.3], [0.5], [0.2]])

In [23]:
print(quality)

╒════════════════╤═════╕
│ ['Quality', 0] │ 0.3 │
├────────────────┼─────┤
│ ['Quality', 1] │ 0.5 │
├────────────────┼─────┤
│ ['Quality', 2] │ 0.2 │
╘════════════════╧═════╛


In [24]:
quality.variables

OrderedDict([('Quality',
              [State(var='Quality', state=0),
               State(var='Quality', state=1),
               State(var='Quality', state=2)])])

In [25]:
quality.cardinality

array([3])

In [26]:
quality.values

array([ 0.3,  0.5,  0.2])

In [None]:
# You can see here that the values of the CPD are a 1D array instead of a 2D array, which you passed as an argument.
# Actually, pgmpy internally stores the values of the TabularCPD as a flattened numpy array. 
# We will see the reason for this in the next chapter.

In [27]:
location = TabularCPD(variable='Location',
                                 variable_card=2,
                                 values=[[0.6], [0.4]])

In [28]:
print(location)

╒═════════════════╤═════╕
│ ['Location', 0] │ 0.6 │
├─────────────────┼─────┤
│ ['Location', 1] │ 0.4 │
╘═════════════════╧═════╛


In [29]:
# However, when we have conditional variables, 
# we also need to specify them and the cardinality of those variables. 
# Let's define the TabularCPD for the cost variable:
cost = TabularCPD(
                         variable='Cost',
                         variable_card=2,
                         values=[[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],
                                 [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]],
                         evidence=['Q', 'L'],
                         evidence_card=[3, 2])

In [30]:
print(cost)

╒═════════════╤════════════╤════════════╤════════════╤════════════╤════════════╤════════════╕
│ L           │ ['L', '0'] │ ['L', '0'] │ ['L', '0'] │ ['L', '1'] │ ['L', '1'] │ ['L', '1'] │
├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤
│ Q           │ ['Q', '0'] │ ['Q', '1'] │ ['Q', '2'] │ ['Q', '0'] │ ['Q', '1'] │ ['Q', '2'] │
├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤
│ ['Cost', 0] │ 0.8        │ 0.6        │ 0.1        │ 0.6        │ 0.6        │ 0.05       │
├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤
│ ['Cost', 1] │ 0.2        │ 0.4        │ 0.9        │ 0.4        │ 0.4        │ 0.95       │
╘═════════════╧════════════╧════════════╧════════════╧════════════╧════════════╧════════════╛
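
To see how these three tables combine, the sketch below computes the marginal P(Cost) by summing P(Q) P(L) P(Cost | Q, L) over all states of Q and L. It is plain Python rather than pgmpy, and it rests on two assumptions: that Q and L have no parents (so their joint is the product of their marginals), and that the columns of the cost CPD are laid out as in the printout above (L = 0 for the first three columns, with Q cycling through 0, 1, 2).

```python
# Values copied from the TabularCPD definitions above.
p_quality = [0.3, 0.5, 0.2]   # P(Q)
p_location = [0.6, 0.4]       # P(L)
row_cost0 = [0.8, 0.6, 0.1, 0.6, 0.6, 0.05]   # P(Cost=0 | Q, L)
row_cost1 = [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]   # P(Cost=1 | Q, L)

# p_cost_given[(loc, q)][c] = P(Cost=c | Q=q, L=loc).
# Assumed layout: column index = 3 * loc + q, as read off the printed table.
p_cost_given = {
    (loc, q): (row_cost0[3 * loc + q], row_cost1[3 * loc + q])
    for loc in range(2) for q in range(3)
}

# Marginalize: P(Cost=c) = sum over q, loc of P(q) * P(loc) * P(c | q, loc).
p_cost = [
    sum(p_quality[q] * p_location[loc] * p_cost_given[(loc, q)][c]
        for loc in range(2) for q in range(3))
    for c in range(2)
]
print([round(p, 3) for p in p_cost])  # [0.532, 0.468]
```

If a given pgmpy version orders the columns differently, only the index expression `3 * loc + q` needs to change.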


# Graph theory  

* Nodes and edges 
* Walk, paths, and trails

## Nodes and edges 

<img src="figures/cap1.6.png" width=600/>

## Walk, paths, and trails

<img src="figures/cap1.7.png" width=600 />

# Bayesian models 

* Representation 
* Factorization of a distribution over a network 
* Implementing Bayesian networks using pgmpy 
* Reasoning pattern in Bayesian networks 
* D-separation

<img src="figures/cap1.8.png" width=600 />

## Representation 

<img src="figures/cap1.9.png" width=600 />

## Factorization of a distribution over a network 

<img src="figures/cap1.10.png" width=600 />

<img src="figures/cap1.11.png" width=600 />

<img src="figures/cap1.12.png" width=600 />

<img src="figures/cap1.13.png" width=600 />

## Implementing Bayesian networks using pgmpy 

* Bayesian model representation 

<img src="figures/cap1.14.png" width=600 />

### Bayesian model representation 

In [31]:
# In pgmpy, we can initialize an empty BN or a model with nodes and edges. 
# We can initialize an empty model as follows:
from pgmpy.models import BayesianModel
model = BayesianModel()

In [32]:
# We can now add nodes and edges to this network:
model.add_nodes_from(['rain', 'traffic_jam'])
model.add_edge('rain', 'traffic_jam')

In [33]:
# If we add an edge whose endpoint nodes are not yet 
# present in the model, 
# pgmpy automatically adds those nodes to the model.
model.add_edge('accident', 'traffic_jam')

In [34]:
model.nodes()

['rain', 'traffic_jam', 'accident']

In [35]:
model.edges()

[('rain', 'traffic_jam'), ('accident', 'traffic_jam')]

In [36]:
# In the case of a Bayesian network, each of the nodes has an associated CPD with it. 
# So, let's define some tabular CPDs to associate with the model:
from pgmpy.factors import TabularCPD

In [37]:
cpd_rain = TabularCPD('rain', 2, [[0.4], [0.6]])

In [38]:
cpd_accident = TabularCPD('accident', 2, [[0.2], [0.8]])

In [39]:
cpd_traffic_jam = TabularCPD(
                            'traffic_jam', 2,
                            [[0.9, 0.6, 0.7, 0.1],
                             [0.1, 0.4, 0.3, 0.9]],
                            evidence=['rain', 'accident'],
                            evidence_card=[2, 2])

In [40]:
# Here, we defined three CPDs. We now need to associate them with our model. 
# To associate them with the model, we just need to use the add_cpds method 
# and pgmpy automatically figures out which CPD is for which node:
model.add_cpds(cpd_rain, cpd_accident, cpd_traffic_jam)

In [41]:
model.get_cpds()

[<TabularCPD representing P(rain:2) at 0x7fc2718b4e48>,
 <TabularCPD representing P(accident:2) at 0x7fc2718b4eb8>,
 <TabularCPD representing P(traffic_jam:2 | rain:2, accident:2) at 0x7fc28e7f24e0>]
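
These three CPDs already determine a joint distribution over rain, accident, and traffic_jam through the factorization P(rain, accident, traffic_jam) = P(rain) P(accident) P(traffic_jam | rain, accident). The plain-Python sketch below evaluates that product. The column layout of the traffic_jam table is an assumption (first-listed evidence variable changing fastest, as in the printed cost CPD), and other pgmpy versions may order columns differently.

```python
from itertools import product

# CPD values copied from the cells above (state indices 0 and 1).
p_rain = [0.4, 0.6]
p_accident = [0.2, 0.8]
tj_values = [[0.9, 0.6, 0.7, 0.1],
             [0.1, 0.4, 0.3, 0.9]]

def p_traffic_jam(tj, rain, accident):
    """P(traffic_jam | rain, accident), assuming column index = rain + 2 * accident."""
    return tj_values[tj][rain + 2 * accident]

def joint(rain, accident, tj):
    """Chain rule for this network: P(r, a, t) = P(r) P(a) P(t | r, a)."""
    return p_rain[rain] * p_accident[accident] * p_traffic_jam(tj, rain, accident)

total = sum(joint(r, a, t) for r, a, t in product((0, 1), repeat=3))
print(joint(0, 0, 0))  # 0.4 * 0.2 * 0.9
print(total)           # sums to 1 up to floating-point rounding
```

The first printed entry uses column 0 of the table, so it does not depend on the assumed column order.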

In [42]:
# Now, let's add the remaining variables and their CPDs:
model.add_node('long_queues')

In [43]:
model.add_edge('traffic_jam', 'long_queues')

In [44]:
cpd_long_queues = TabularCPD('long_queues', 2,
                                         [[0.9, 0.2],
                                          [0.1, 0.8]],
                                         evidence=['traffic_jam'],
                                         evidence_card=[2])

In [45]:
model.add_cpds(cpd_long_queues)

In [46]:
model.add_nodes_from(['getting_up_late',
                                  'late_for_school'])


In [47]:
model.add_edges_from(
                      [('getting_up_late', 'late_for_school'),
                       ('traffic_jam', 'late_for_school')])

In [48]:
cpd_getting_up_late = TabularCPD('getting_up_late', 2,
                                             [[0.6], [0.4]])

In [49]:
cpd_late_for_school = TabularCPD(
                                  'late_for_school', 2,
                                  [[0.9, 0.45, 0.8, 0.1],
                                   [0.1, 0.55, 0.2, 0.9]],
                                  evidence=['getting_up_late',
                                            'traffic_jam'],
                                  evidence_card=[2, 2])

In [50]:
model.add_cpds(cpd_getting_up_late, cpd_late_for_school)

In [51]:
model.get_cpds()

[<TabularCPD representing P(rain:2) at 0x7fc2718b4e48>,
 <TabularCPD representing P(accident:2) at 0x7fc2718b4eb8>,
 <TabularCPD representing P(traffic_jam:2 | rain:2, accident:2) at 0x7fc28e7f24e0>,
 <TabularCPD representing P(long_queues:2 | traffic_jam:2) at 0x7fc28e7f2668>,
 <TabularCPD representing P(getting_up_late:2) at 0x7fc28e7f2710>,
 <TabularCPD representing P(late_for_school:2 | getting_up_late:2, traffic_jam:2) at 0x7fc27187f0b8>]

In [52]:
# Additionally, pgmpy also provides a check_model method 
# that checks whether the model 
# and all the associated CPDs are consistent:
model.check_model()

True
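
One condition that any such consistency check has to cover is that each column of a CPD table is a valid distribution, that is, it sums to 1. The helper below, `columns_sum_to_one`, is a hypothetical stand-in written in plain Python to illustrate that single condition. It is not pgmpy's implementation, and `check_model` may verify more than this.

```python
def columns_sum_to_one(values, tol=1e-9):
    """Return True if every column of a CPD table sums to 1.

    values is a list of rows, one row per state of the variable, with one
    column per combination of parent states.
    """
    n_columns = len(values[0])
    return all(
        abs(sum(row[col] for row in values) - 1.0) < tol
        for col in range(n_columns)
    )

# Tables copied from three of the CPDs defined above.
cpd_tables = {
    'rain': [[0.4], [0.6]],
    'traffic_jam': [[0.9, 0.6, 0.7, 0.1], [0.1, 0.4, 0.3, 0.9]],
    'late_for_school': [[0.9, 0.45, 0.8, 0.1], [0.1, 0.55, 0.2, 0.9]],
}
results = {name: columns_sum_to_one(table) for name, table in cpd_tables.items()}
print(results)

# A deliberately broken table: the first column sums to 1.1.
print(columns_sum_to_one([[0.9, 0.5], [0.2, 0.5]]))  # False
```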

In [53]:
# If a wrong CPD has been associated with the model 
# and we want to remove it, we can use the remove_cpds method. 
# Let's say we want to remove the CPD associated with the variable late_for_school; 
# we can simply do the following:
model.remove_cpds('late_for_school')

In [54]:
model.get_cpds()

[<TabularCPD representing P(rain:2) at 0x7fc2718b4e48>,
 <TabularCPD representing P(accident:2) at 0x7fc2718b4eb8>,
 <TabularCPD representing P(traffic_jam:2 | rain:2, accident:2) at 0x7fc28e7f24e0>,
 <TabularCPD representing P(long_queues:2 | traffic_jam:2) at 0x7fc28e7f2668>,
 <TabularCPD representing P(getting_up_late:2) at 0x7fc28e7f2710>]

## Reasoning pattern in Bayesian networks 

<img src="figures/cap1.15.png" width=600 />

## D-separation

* Direct connection 
* Indirect connection

### Direct connection 

### Indirect connection

<img src="figures/cap1.16.png" width=600 />
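
One indirect connection commonly discussed under d-separation is the common effect (v-structure), which the traffic model above contains as rain → traffic_jam ← accident. The sketch below illustrates it numerically in plain Python: rain and accident are independent with no evidence, but become dependent once traffic_jam is observed. The CPD values are copied from the model. The column layout of the traffic_jam table (first-listed evidence variable changing fastest) and the helper names `joint` and `prob` are assumptions.

```python
# CPD values copied from the model above (state indices 0 and 1).
p_rain = [0.4, 0.6]
p_accident = [0.2, 0.8]
tj_values = [[0.9, 0.6, 0.7, 0.1],
             [0.1, 0.4, 0.3, 0.9]]

def joint(rain, accident, tj):
    # Assumed column layout: column index = rain + 2 * accident.
    return p_rain[rain] * p_accident[accident] * tj_values[tj][rain + 2 * accident]

def prob(query, given):
    """P(query | given); both are dicts mapping 'rain', 'accident', 'tj' to a state."""
    def mass(fixed):
        # Total probability of all assignments that agree with fixed.
        return sum(
            joint(r, a, t)
            for r in (0, 1) for a in (0, 1) for t in (0, 1)
            if all({'rain': r, 'accident': a, 'tj': t}[k] == v
                   for k, v in fixed.items())
        )
    return mass({**query, **given}) / mass(given)

# Without evidence on traffic_jam, accident tells us nothing about rain.
print(prob({'rain': 0}, {}), prob({'rain': 0}, {'accident': 0}))
# Once traffic_jam is observed, accident changes our belief about rain.
print(prob({'rain': 0}, {'tj': 0}), prob({'rain': 0}, {'tj': 0, 'accident': 0}))
```

The first line prints two (numerically) equal values, while the second prints two clearly different ones, which is the behaviour the v-structure rule predicts.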

# Relating graphs and distributions

* IMAP
* IMAP to factorization

### IMAP

<img src="figures/cap1.17.png" />

### IMAP to factorization

<img src="figures/cap1.18.png" width=600 />

<img src="figures/cap1.19.png" width=600 />

# CPD representations

* Deterministic CPDs
* Context-specific CPDs

### Deterministic CPDs

<img src="figures/cap1.20.png" width=600 />

<img src="figures/cap1.21.png" width=600 />

<img src="figures/cap1.22.png" />
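
In a deterministic CPD the child's value is a function of its parents, so every column of the table puts probability 1 on a single state and 0 on the others. The sketch below builds such a table in plain Python. The helper `deterministic_cpd` and the logical-OR example are hypothetical illustrations and are not taken from the figures above.

```python
from itertools import product

def deterministic_cpd(func, parent_cards, child_card):
    """Build a CPD table for a child that is a deterministic function of its parents.

    Rows are indexed by child state; columns follow itertools.product order
    over the parent states (last parent changes fastest).
    """
    assignments = list(product(*(range(card) for card in parent_cards)))
    table = [[0.0] * len(assignments) for _ in range(child_card)]
    for col, parents in enumerate(assignments):
        table[func(*parents)][col] = 1.0  # all mass on the single implied state
    return assignments, table

# Hypothetical example: a binary child that is the logical OR of two binary parents.
assignments, table = deterministic_cpd(lambda a, b: int(a or b), [2, 2], 2)
print(assignments)
for row in table:
    print(row)
```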

### Context-specific CPDs

* Tree CPD
* Rule CPD

#### Tree CPD

<img src="figures/cap1.23.png" width=600 />

<img src="figures/cap1.24.png" width=600 />

In [55]:
# Now, let's see how we can implement this using pgmpy:
from pgmpy.factors import TreeCPD, Factor

In [56]:
tree_cpd = TreeCPD([
                      ('B', Factor(['A'], [2], [0.8, 0.2]), '0'),
                      ('B', 'C', '1'),
                      ('C', Factor(['A'], [2], [0.1, 0.9]), '0'),
                      ('C', 'D', '1'),
                      ('D', Factor(['A'], [2], [0.9, 0.1]), '0'),
                      ('D', Factor(['A'], [2], [0.4, 0.6]), '1')])
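
The point of a tree CPD is that a lookup only consults the variables along one branch. The plain-Python sketch below mirrors the structure of the `TreeCPD` above as nested tuples and dicts (an assumed encoding, with integer states instead of the string labels) and walks it for a given context.

```python
# An internal node is (variable, {state: subtree});
# a leaf is the distribution [P(A=0), P(A=1)].
tree = ('B', {
    0: [0.8, 0.2],
    1: ('C', {
        0: [0.1, 0.9],
        1: ('D', {
            0: [0.9, 0.1],
            1: [0.4, 0.6],
        }),
    }),
})

def lookup(node, context):
    """Walk from the root to a leaf, following the branch chosen by context."""
    while isinstance(node, tuple):
        variable, branches = node
        node = branches[context[variable]]
    return node

print(lookup(tree, {'B': 0}))                  # [0.8, 0.2]; C and D are never consulted
print(lookup(tree, {'B': 1, 'C': 1, 'D': 0}))  # [0.9, 0.1]
```

A full table for A given B, C, and D would need 8 columns, while this tree stores only 4 leaf distributions.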

#### Rule CPD

<img src="figures/cap1.25.png"  />

In [57]:
# Let's see the code implementation using pgmpy:
from pgmpy.factors import RuleCPD
rule = RuleCPD('A', {('A_0', 'B_0'): 0.8,
                                ('A_1', 'B_0'): 0.2,
                                ('A_0', 'B_1', 'C_0'): 0.4,
                                ('A_1', 'B_1', 'C_0'): 0.6,
                                ('A_0', 'B_1', 'C_1'): 0.9,
                                ('A_1', 'B_1', 'C_1'): 0.1})
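
A rule CPD can be read as a list of (context, distribution) pairs in which each context mentions only the parent assignments that matter. The sketch below re-expresses the rules above in plain Python and looks up P(A | B, C). The list encoding and the helper `p_a_given` are assumptions made for illustration, not pgmpy's internal representation.

```python
# Rules copied from the RuleCPD above, grouped as (context, [P(A=0), P(A=1)]).
rules = [
    ({'B': 0}, [0.8, 0.2]),
    ({'B': 1, 'C': 0}, [0.4, 0.6]),
    ({'B': 1, 'C': 1}, [0.9, 0.1]),
]

def p_a_given(assignment):
    """Return [P(A=0), P(A=1)] from the first rule whose context matches."""
    for context, dist in rules:
        if all(assignment.get(var) == state for var, state in context.items()):
            return dist
    raise KeyError('no rule covers %r' % (assignment,))

print(p_a_given({'B': 0, 'C': 1}))  # [0.8, 0.2]: C is irrelevant when B = 0
print(p_a_given({'B': 1, 'C': 0}))  # [0.4, 0.6]
```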


# Summary

# References

* [1] Mastering Probabilistic Graphical Models using Python - http://www.amazon.com/Mastering-Probabilistic-Graphical-Models-Python/dp/1784394688
* [2] Probabilistic Graphical Models - https://www.coursera.org/course/pgm
* [3] stanford-pgm/slides/Section-1-Introduction - http://spark-university.s3.amazonaws.com/stanford-pgm/slides/Section-1-Introduction-Combined.pdf
* [4] stanford-pgm/slides/1.1.2-Intro-distributions - http://spark-university.s3.amazonaws.com/stanford-pgm/slides/1.1.2-Intro-distributions.pdf
* [5] stanford-pgm/slides/1.1.3-Intro-factors - http://spark-university.s3.amazonaws.com/stanford-pgm/slides/1.1.3-Intro-factors.pdf
* [6] stanford-pgm/slides/2.1.4-Repn-Ind-conditional-independence - http://spark-university.s3.amazonaws.com/stanford-pgm/slides/2.1.4-Repn-Ind-conditional-independence.pdf
* [7] stanford-pgm/slides/2.1.1-Repn-BNs-semantics - http://spark-university.s3.amazonaws.com/stanford-pgm/slides/2.1.1-Repn-BNs-semantics.pdf
* [8] stanford-pgm/slides/2.1.2-Repn-BNs-patterns - http://spark-university.s3.amazonaws.com/stanford-pgm/slides/2.1.2-Repn-BNs-patterns.pdf
* [9] stanford-pgm/slides/2.1.3-Repn-BNs-flow-influence - http://spark-university.s3.amazonaws.com/stanford-pgm/slides/2.1.3-Repn-BNs-flow-influence.pdf