In [1]:
from tabulate import tabulate
import pgmini
pgmini.__version__

'0.1.0'

# Module 1

For module 1, we offer code to help build DAGs as well as tabular CPDs. 

In [2]:
from pgmini import m1

## Directed Acyclic Graphs (DAGs)

Next we build a DAG for the example in Fig 1.1 (a) of Koller and Friedman's textbook. 

Note that when the DAG is rendered as a string, nodes under the column _child_ are listed in a topological order.

In [3]:
fig1_1_a = m1.DAG(
    ['Season', 'Flu', 'Hayfever', 'Muscle-Pain', 'Congestion'],
    [
        ['Season', 'Flu'],
        ['Season', 'Hayfever'],
        ['Flu', 'Muscle-Pain'],
        ['Flu', 'Congestion'],
        ['Hayfever', 'Congestion'],
    ]
)
fig1_1_a

+---------------+-------------+
| parents       | child       |
+===============+=============+
|               | Season      |
+---------------+-------------+
| Season        | Hayfever    |
+---------------+-------------+
| Season        | Flu         |
+---------------+-------------+
| Flu, Hayfever | Congestion  |
+---------------+-------------+
| Flu           | Muscle-Pain |
+---------------+-------------+

You can access the topological order:

In [4]:
fig1_1_a.topo

('Season', 'Hayfever', 'Flu', 'Congestion', 'Muscle-Pain')
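
To make the notion of a topological order concrete, here is a minimal standalone sketch using Kahn's algorithm. It does not use pgmini and is not necessarily how `m1.DAG` computes `topo`; the function name `topological_order` is hypothetical and chosen for illustration only.

```python
from collections import deque

def topological_order(nodes, edges):
    """Return one topological order of a DAG using Kahn's algorithm."""
    children = {u: [] for u in nodes}
    in_degree = {u: 0 for u in nodes}
    for parent, child in edges:
        children[parent].append(child)
        in_degree[child] += 1

    # Start from the nodes that have no parents.
    ready = deque(u for u in nodes if in_degree[u] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in children[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:  # every parent of v has been placed
                ready.append(v)

    if len(order) != len(nodes):
        raise ValueError("the graph has a directed cycle")
    return tuple(order)

nodes = ['Season', 'Flu', 'Hayfever', 'Muscle-Pain', 'Congestion']
edges = [
    ('Season', 'Flu'), ('Season', 'Hayfever'),
    ('Flu', 'Muscle-Pain'), ('Flu', 'Congestion'),
    ('Hayfever', 'Congestion'),
]
order = topological_order(nodes, edges)
print(order)
```

A DAG usually admits several topological orders, so the order printed by this sketch need not coincide with the one reported by `fig1_1_a.topo`; what matters is that every parent appears before its children.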

The DAG implementation also ships a few data structures representing relations that are key to the Bayesian networks (BNs) built upon these DAGs.

See, for example, _descendants_ and _ancestors_.

In [5]:
print(tabulate([[u, " ".join(vs)] for u, vs in fig1_1_a.descendants.items()], ['node', 'descendant']))

node         descendant
-----------  -----------------------------------
Season       Hayfever Muscle-Pain Flu Congestion
Flu          Congestion Muscle-Pain
Hayfever     Congestion
Muscle-Pain
Congestion


In [6]:
print(tabulate([[v, " ".join(us)] for v, us in fig1_1_a.ancestors.items()], ['node', 'ancestors']))

node         ancestors
-----------  -------------------
Season
Flu          Season
Hayfever     Season
Muscle-Pain  Season Flu
Congestion   Season Hayfever Flu
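
The two relations above can be reproduced with a plain graph traversal. The sketch below is a standalone illustration under that assumption, not the pgmini implementation; the helper `reachable` is a hypothetical name.

```python
def reachable(start, neighbours):
    """Collect every node reachable from `start` by following `neighbours`."""
    seen = set()
    stack = list(neighbours[start])
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(neighbours[u])
    return seen

nodes = ['Season', 'Flu', 'Hayfever', 'Muscle-Pain', 'Congestion']
edges = [
    ('Season', 'Flu'), ('Season', 'Hayfever'),
    ('Flu', 'Muscle-Pain'), ('Flu', 'Congestion'),
    ('Hayfever', 'Congestion'),
]

children = {u: [] for u in nodes}
parents = {u: [] for u in nodes}
for p, c in edges:
    children[p].append(c)
    parents[c].append(p)

# Descendants follow child links; ancestors follow parent links.
descendants = {u: reachable(u, children) for u in nodes}
ancestors = {u: reachable(u, parents) for u in nodes}

print(sorted(descendants['Flu']))       # ['Congestion', 'Muscle-Pain']
print(sorted(ancestors['Congestion']))  # ['Flu', 'Hayfever', 'Season']
```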


You can also enumerate trails between any two nodes. Be mindful that trail enumeration is not a polynomial-time algorithm: the number of trails can grow exponentially with the size of the graph.

In [25]:
for trail in fig1_1_a.enumerate_trails('Muscle-Pain', 'Congestion'):
    print(' -- '.join(trail))

Muscle-Pain -- Flu -- Season -- Hayfever -- Congestion
Muscle-Pain -- Flu -- Congestion
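
For intuition, trail enumeration can be sketched as a depth-first search over the undirected skeleton of the DAG. The code below is a standalone sketch, not pgmini's `enumerate_trails`; it assumes that a trail never revisits a node, which is what keeps the enumeration finite.

```python
def enumerate_trails(edges, source, target):
    """Yield every trail from source to target, ignoring edge direction."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)

    def extend(trail):
        last = trail[-1]
        if last == target:
            yield tuple(trail)
            return
        for nxt in neighbours.get(last, []):
            if nxt not in trail:  # assumption: a trail never revisits a node
                yield from extend(trail + [nxt])

    yield from extend([source])

edges = [
    ('Season', 'Flu'), ('Season', 'Hayfever'),
    ('Flu', 'Muscle-Pain'), ('Flu', 'Congestion'),
    ('Hayfever', 'Congestion'),
]
for trail in enumerate_trails(edges, 'Muscle-Pain', 'Congestion'):
    print(' -- '.join(trail))
```

Every partial trail is extended through each unvisited neighbour, which is why the running time is tied to the number of trails rather than to the size of the graph.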


## Tabular Conditional Probability Distributions

A conditional probability distribution (CPD) is a _local probability model_ involving a few random variables (rvs). Some of these rvs are condition_ed on_, and the remaining ones have their distribution specified given the former. We refer to the former as _parents_ and the latter as _children_, in allusion to the graphical representation of a CPD as a small DAG. In general, a CPD can have multiple children, but in pgmini.m1 that's an unnecessary complication, so we develop CPDs that each allow for a single child.

There are many ways to represent CPDs, and some of these will be covered later on in the course. For now, in pgmini.m1, all we need is a CPD represented explicitly as a table (in code, we use a _tensor_, but we typically visualise the tensor as a table by flattening most of its axes). These _tabular CPDs_ are sufficient for modelling interactions between discrete random variables taking on values in finite sample spaces (that is, spaces of finitely many named categories).

Here we demo the use of our _TabularCPD_. The internal representation is a dense numpy array with one axis for each rv in the CPD: the leading axes are for the parent rvs, and the last axis is for the child rv. We also provide helpers to manage the mapping between named outcomes (or categories) and 0-based integer identifiers (so we can efficiently index the underlying table).
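
The axis layout described above can be pictured with a small standalone sketch. It is not the pgmini implementation: a nested list stands in for the dense array so that the example needs no dependencies, and the names `season_ids`, `flu_ids` and `lookup` are hypothetical.

```python
# Map named outcomes to 0-based integer identifiers.
season_ids = {name: i for i, name in enumerate(['Spring', 'Summer', 'Autumn', 'Winter'])}
flu_ids = {name: i for i, name in enumerate(['no', 'yes'])}

# The leading axis indexes the parent (Season); the last axis indexes the child (Flu).
flu_given_season = [
    [0.4, 0.6],  # Spring
    [0.9, 0.1],  # Summer
    [0.6, 0.4],  # Autumn
    [0.1, 0.9],  # Winter
]

def lookup(table, ids):
    """Follow one integer index per axis: parents first, child last."""
    cell = table
    for i in ids:
        cell = cell[i]
    return cell

p = lookup(flu_given_season, [season_ids['Winter'], flu_ids['yes']])
print(p)  # 0.9

# Each row is a distribution over the child, so it sums to one.
rows_ok = all(abs(sum(row) - 1.0) < 1e-9 for row in flu_given_season)
print(rows_ok)  # True
```

With more parents the table simply gains one leading axis per parent, and `lookup` receives one identifier per axis.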

The mapping from named outcomes to 0-based identifiers is realised by _OutcomeSpace_. As we have quite a few rvs, we organise their outcome spaces in a small dictionary, indexed by the name of the random variable each one is associated with.

In [7]:
outcome_spaces = {
    'Season': m1.OutcomeSpace(['Spring', 'Summer', 'Autumn', 'Winter']), 
    'Flu': m1.OutcomeSpace(['no', 'yes']), 
    'Hayfever': m1.OutcomeSpace(['no', 'yes']), 
    'Muscle-Pain': m1.OutcomeSpace(['no', 'yes']), 
    'Congestion': m1.OutcomeSpace(['no', 'yes']), 
}

Below we demo the key functionality of _OutcomeSpace_.

In [8]:
outcome_spaces['Season'].outcomes

('Spring', 'Summer', 'Autumn', 'Winter')

In [9]:
outcome_spaces['Season']['Summer']

1

In [10]:
outcome_spaces['Season'].outcomes[1]

'Summer'

In [11]:
for outcome in outcome_spaces['Season']:
    print(outcome)

Spring
Summer
Autumn
Winter


The _OutcomeSpace_ class also offers a class method to help us obtain outcomes in the cross-product space of various outcome spaces. See below:

In [12]:
generator = m1.OutcomeSpace.enumerate_joint_outcomes(
    outcome_spaces['Season'], 
    outcome_spaces['Flu'], 
    outcome_spaces['Hayfever']
)
print(tabulate(generator, headers=['Season', 'Flu', 'Hayfever']))

Season    Flu    Hayfever
--------  -----  ----------
Spring    no     no
Spring    no     yes
Spring    yes    no
Spring    yes    yes
Summer    no     no
Summer    no     yes
Summer    yes    no
Summer    yes    yes
Autumn    no     no
Autumn    no     yes
Autumn    yes    no
Autumn    yes    yes
Winter    no     no
Winter    no     yes
Winter    yes    no
Winter    yes    yes
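
The cross-product enumeration can be mimicked with the standard library. The sketch below assumes the same ordering as the table above (the rightmost variable changes fastest) and is only an illustration, not the code behind `enumerate_joint_outcomes`.

```python
from itertools import product

season = ('Spring', 'Summer', 'Autumn', 'Winter')
flu = ('no', 'yes')
hayfever = ('no', 'yes')

# product iterates the rightmost argument fastest.
joint = list(product(season, flu, hayfever))

print(len(joint))  # 16
print(joint[0])    # ('Spring', 'no', 'no')
print(joint[-1])   # ('Winter', 'yes', 'yes')
```

The number of joint outcomes is the product of the sizes of the individual outcome spaces, here 4 x 2 x 2 = 16, so it grows quickly as more rvs are added.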


In [13]:
print(tabulate(m1.OutcomeSpace.enumerate_joint_outcomes(*outcome_spaces.values()), headers=outcome_spaces.keys()))

Season    Flu    Hayfever    Muscle-Pain    Congestion
--------  -----  ----------  -------------  ------------
Spring    no     no          no             no
Spring    no     no          no             yes
Spring    no     no          yes            no
Spring    no     no          yes            yes
Spring    no     yes         no             no
Spring    no     yes         no             yes
Spring    no     yes         yes            no
Spring    no     yes         yes            yes
Spring    yes    no          no             no
Spring    yes    no          no             yes
Spring    yes    no          yes            no
Spring    yes    no          yes            yes
Spring    yes    yes         no             no
Spring    yes    yes         no             yes
Spring    yes    yes         yes            no
Spring    yes    yes         yes            yes
Summer    no     no          no             no
Summer    no     no          no             yes
Summer    no     no          yes 

Below we demo the use of _TabularCPD_.

In [14]:
cSeason = m1.TabularCPD([], 'Season', outcome_spaces, [0.25, 0.25, 0.25, 0.25])
cSeason

  Season=Spring    Season=Summer    Season=Autumn    Season=Winter
---------------  ---------------  ---------------  ---------------
           0.25             0.25             0.25             0.25

In [15]:
cFlu = m1.TabularCPD(
    ['Season'], 'Flu', outcome_spaces,
    [[0.4, 0.6], [0.9, 0.1], [0.6, 0.4], [0.1, 0.9]]
)
cFlu

Season      Flu=no    Flu=yes
--------  --------  ---------
Spring         0.4        0.6
Summer         0.9        0.1
Autumn         0.6        0.4
Winter         0.1        0.9

In [16]:
cFlu.prob({'Season': 'Winter', 'Flu': 'yes'})

0.9

In [17]:
cHayfever = m1.TabularCPD(
    ['Season'], 'Hayfever', outcome_spaces,
    [[0.1, 0.9], [0.4, 0.6], [0.7, 0.3], [0.9, 0.1]]
)
cHayfever

Season      Hayfever=no    Hayfever=yes
--------  -------------  --------------
Spring              0.1             0.9
Summer              0.4             0.6
Autumn              0.7             0.3
Winter              0.9             0.1

In [18]:
cHayfever.prob({'Season': 'Winter', 'Hayfever': 'yes'})

0.1

In [19]:
cMusclePain = m1.TabularCPD(['Flu'], 'Muscle-Pain', outcome_spaces, [[0.5, 0.5], [0.1, 0.9]])
cMusclePain

Flu      Muscle-Pain=no    Muscle-Pain=yes
-----  ----------------  -----------------
no                  0.5                0.5
yes                 0.1                0.9

In [20]:
cCongestion = m1.TabularCPD(
    ['Flu', 'Hayfever'], 'Congestion', outcome_spaces, 
    [[[0.5, 0.5], [0.2, 0.8]], [[0.1, 0.9], [0.2, 0.8]]]
)
cCongestion

Flu    Hayfever      Congestion=no    Congestion=yes
-----  ----------  ---------------  ----------------
no     no                      0.5               0.5
no     yes                     0.2               0.8
yes    no                      0.1               0.9
yes    yes                     0.2               0.8

Suppose we observe an assignment of all rvs. Here is how we can get probabilities from each local probability model:

In [21]:
obs = {'Season': 'Winter', 'Flu': 'yes', 'Hayfever': 'no', 'Muscle-Pain': 'yes', 'Congestion': 'yes'}

In [22]:
for cpd in [cSeason, cFlu, cHayfever, cMusclePain, cCongestion]:
    print(f"P({cpd.child}|{' '.join(cpd.parents)})={cpd.prob(obs)}")

P(Season|)=0.25
P(Flu|Season)=0.9
P(Hayfever|Season)=0.9
P(Muscle-Pain|Flu)=0.9
P(Congestion|Flu Hayfever)=0.9
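
In a Bayesian network, the joint probability of a complete assignment is the product of the local probabilities, one per CPD. The standalone sketch below multiplies the five values printed above; it copies them by hand rather than calling pgmini.

```python
import math

# Local probabilities read off the output above for the observed assignment.
local_probs = {
    'P(Season)': 0.25,
    'P(Flu|Season)': 0.9,
    'P(Hayfever|Season)': 0.9,
    'P(Muscle-Pain|Flu)': 0.9,
    'P(Congestion|Flu,Hayfever)': 0.9,
}

# The joint probability factorises as the product of the local terms.
joint = math.prod(local_probs.values())
print(round(joint, 6))  # 0.164025
```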
