# PGM Advanced Topics

This section describes some advanced uses of PGMs (and compiled PGMs).

## PGM name ##

In CK it is possible to give a PGM a name.

In [1]:
from ck.pgm import PGM

pgm = PGM('cancer')

print(pgm.name)

cancer


## State names ##

The states of a random variable can also be given names, supplied as either a tuple or a list. State names can be a mix of types: int, str, bool, float, None. By default, the state names are the integers 0, 1, ..., $n - 1$ for a random variable with $n$ states.

In [2]:
pollution = pgm.new_rv('pollution', ('low', 'high'))
smoker = pgm.new_rv('smoker', ('true', 'false'))
cancer = pgm.new_rv('cancer', ('true', 'false'))
xray = pgm.new_rv('xray', ('positive', 'negative'))
dyspnoea = pgm.new_rv('dyspnoea', ('true', 'false'))


Remember that a random variable behaves like a list of indicators...

In [3]:
len(pollution)

2

In [4]:
list(pollution)

[Indicator(rv_idx=0, state_idx=0), Indicator(rv_idx=0, state_idx=1)]

It is possible to directly access the indicators of a random variable...

In [5]:
pollution.indicators

(Indicator(rv_idx=0, state_idx=0), Indicator(rv_idx=0, state_idx=1))

We also have access to the state names of a random variable.

In [6]:
pollution.states

('low', 'high')

It is possible to access a random variable's indicators by state index or state name...

In [7]:
pollution[0]  # access by state index using square brackets

Indicator(rv_idx=0, state_idx=0)

In [8]:
pollution('low')  # access by state name using round brackets

Indicator(rv_idx=0, state_idx=0)
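The two access styles can be pictured with a toy stand-in written in plain Python. This is only a sketch of the idea, not CK's implementation: the names `ToyIndicator` and `ToyRV` are hypothetical. One plausible reason for having two distinct syntaxes is that a state name can itself be an integer, so an index lookup and a name lookup need to be told apart.

```python
from typing import NamedTuple, Sequence


class ToyIndicator(NamedTuple):
    rv_idx: int
    state_idx: int


class ToyRV:
    """Illustrative stand-in for a random variable; not CK's implementation."""

    def __init__(self, rv_idx: int, states: Sequence):
        self.states = tuple(states)
        self.indicators = tuple(
            ToyIndicator(rv_idx, i) for i in range(len(self.states))
        )

    def __len__(self) -> int:
        return len(self.indicators)

    def __getitem__(self, state_idx: int) -> ToyIndicator:
        # Square brackets: look up by state index.
        return self.indicators[state_idx]

    def __call__(self, state) -> ToyIndicator:
        # Round brackets: look up by state name.
        return self.indicators[self.states.index(state)]


toy = ToyRV(0, ('low', 'high'))
print(toy[0] == toy('low'))
```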

State names are also used when pretty-printing indicators.

In [9]:
pgm.indicator_str(cancer('true'), smoker('false'))

'cancer=true, smoker=false'

## Random variable index and offset ##

Every random variable has an index, which is its location in the PGM array of random variables.

In [10]:
print(pollution.idx, smoker.idx, cancer.idx, xray.idx, dyspnoea.idx)

0 1 2 3 4


Iterating over the PGM's `rvs` array yields the random variables in index order.

In [11]:
print([str(rv) for rv in pgm.rvs])

['pollution', 'smoker', 'cancer', 'xray', 'dyspnoea']


The `offset` of a random variable is the sum of the lengths of all random variables with a lower index. This is useful when the indicators of a PGM are laid out in random variable order: the indicators of a random variable `rv` then occupy positions `rv.offset` to `rv.offset + len(rv) - 1`, inclusive.

In [12]:
print(pollution.offset, smoker.offset, cancer.offset, xray.offset, dyspnoea.offset)

0 2 4 6 8
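The offsets printed above follow directly from the definition. Here is a minimal sketch in plain Python that assumes only the state counts of the five random variables (the names `num_states` and `offsets` are illustrative and not part of CK):

```python
from itertools import accumulate

# Number of states of each random variable, in PGM index order.
num_states = {'pollution': 2, 'smoker': 2, 'cancer': 2, 'xray': 2, 'dyspnoea': 2}

# The offset of a random variable is the running total of the lengths
# of all random variables with a lower index.
offsets = dict(zip(num_states, accumulate(num_states.values(), initial=0)))

for name, offset in offsets.items():
    last = offset + num_states[name] - 1
    print(f'{name}: indicators {offset} to {last}')
```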


## Advanced use of WMCProgram ##

Let's add some factors to the PGM and compile it to a `WMCProgram`.

In [13]:
pgm.new_factor(pollution).set_cpt().set_cpd((), (0.9, 0.1))
pgm.new_factor(smoker).set_cpt().set_cpd((), (0.3, 0.7))
pgm.new_factor(cancer, pollution, smoker).set_cpt().set(
    ((0, 0), (0.03,  0.97)),
    ((1, 0), (0.05,  0.95)),
    ((0, 1), (0.001, 0.999)),
    ((1, 1), (0.02,  0.98)),
)
pgm.new_factor(xray, cancer).set_cpt().set(
    (0, (0.9, 0.1)),
    (1, (0.2, 0.8)),
)
pgm.new_factor(dyspnoea, cancer).set_cpt().set(
    (0, (0.65, 0.35)),
    (1, (0.3,  0.7)),
)


<ck.pgm.CPTPotentialFunction at 0x1e96ebc64b0>

In [14]:
from ck.pgm_circuit.wmc_program import WMCProgram
from ck.pgm_compiler import DEFAULT_PGM_COMPILER as pgm_compiler

wmc = WMCProgram(pgm_compiler(pgm))

State names can make probability queries more intuitive.

In [15]:
wmc.probability(cancer('true'), condition=smoker('false'))

0.0029000000000000002
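As a check, this value can be derived by hand from the conditional probability table of `cancer` by summing over the states of `pollution`, which has no factor linking it to `smoker` other than the `cancer` factor itself. The derivation assumes that the parent states passed to `set` are listed in the order (pollution, smoker).

$$
P(\text{cancer}=\text{true} \mid \text{smoker}=\text{false})
= \sum_{p} P(\text{pollution}=p)\, P(\text{cancer}=\text{true} \mid \text{pollution}=p,\ \text{smoker}=\text{false})
= 0.9 \times 0.001 + 0.1 \times 0.02
= 0.0029
$$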

A `WMCProgram` uses a `ProbabilityMixin` to provide many additional queries based on probabilities.

For example, consider a marginal distribution, which is returned as a numpy array...

In [16]:
wmc.marginal_distribution(cancer)

array([0.01163, 0.98837])

Let's print that more readably...

In [17]:
for state, pr in zip(cancer.states, wmc.marginal_distribution(cancer)):
    print(f'P({cancer}={state}) is {pr}')

P(cancer=true) is 0.01163
P(cancer=false) is 0.98837


MAP (maximum a posteriori) calculations are also possible using functionality from `ProbabilityMixin`.

In [18]:
pr, states = wmc.map(cancer, xray, condition=smoker('true'))
pgm.indicator_str(cancer[states[0]], xray[states[1]]) + f' with probability {pr}'

'cancer=false, xray=negative with probability 0.7744'
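For a model this small, the MAP result can be cross-checked by scoring every joint assignment directly. The sketch below is plain Python and does not use CK. It assumes the factor values defined earlier, with the parent states of `cancer` ordered as (pollution, smoker), and it relies on `xray` depending on `smoker` only through `cancer`. The names `pr_cancer`, `pr_xray`, `scores` and `best` are illustrative.

```python
from itertools import product

# P(cancer | smoker=true), summing over pollution:
# 0.9 * 0.03 + 0.1 * 0.05 for cancer=true.
pr_cancer = {'true': 0.9 * 0.03 + 0.1 * 0.05}
pr_cancer['false'] = 1.0 - pr_cancer['true']

# P(xray | cancer), copied from the factor definition.
pr_xray = {
    'true': {'positive': 0.9, 'negative': 0.1},
    'false': {'positive': 0.2, 'negative': 0.8},
}

# Score every joint assignment to (cancer, xray) and keep the best one.
scores = {
    (c, x): pr_cancer[c] * pr_xray[c][x]
    for c, x in product(pr_cancer, ('positive', 'negative'))
}
best = max(scores, key=scores.get)
print(best, scores[best])
```

Exhaustive scoring grows exponentially with the number of queried variables, so it is only practical as a sanity check on small queries.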

Many other probabilistic calculations are possible.

In [19]:
print('correlation =', wmc.correlation(cancer[0], smoker[0]))
print('total_correlation =', wmc.total_correlation(cancer, smoker))
print('entropy =', wmc.entropy(cancer))
print('conditional entropy =', wmc.conditional_entropy(cancer, smoker))
print('joint entropy =', wmc.joint_entropy(cancer, smoker))
print('mutual information =', wmc.mutual_information(cancer, smoker))
print('covariant normalised mutual information =', wmc.covariant_normalised_mutual_information(cancer, smoker))
print('uncertainty =', wmc.uncertainty(cancer, smoker))
print('symmetric uncertainty =', wmc.symmetric_uncertainty(cancer, smoker))
print('information quality ratio =', wmc.iqr(cancer, smoker))



correlation = 0.12438070141828558
total_correlation = 0.11027571817587367
entropy = 0.09141503487673329
conditional entropy = 0.08133417625362895
joint entropy = 0.9626250754843215
mutual information = 0.010080858623104302
covariant normalised mutual information = 0.03551641048843064
uncertainty = 0.011438741319017601
symmetric uncertainty = 0.020727453734215563
information quality ratio = 0.010472258493819478
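The entropy-based values above appear consistent with Shannon entropy measured in bits (base-2 logarithms). The following plain Python sketch, which does not use CK, recomputes four of them from the joint distribution of `cancer` and `smoker`. It assumes the factor values defined earlier with `pollution` summed out; the helper `entropy` and the variable names are illustrative.

```python
from math import log2

# Joint distribution over (cancer, smoker), derived from the factors
# with pollution summed out. Keys are (cancer state, smoker state).
pr_smoker = {'true': 0.3, 'false': 0.7}
pr_cancer_given_smoker = {
    'true': 0.9 * 0.03 + 0.1 * 0.05,    # P(cancer=true | smoker=true)
    'false': 0.9 * 0.001 + 0.1 * 0.02,  # P(cancer=true | smoker=false)
}
joint = {}
for s, pr_s in pr_smoker.items():
    pr_c = pr_cancer_given_smoker[s]
    joint['true', s] = pr_s * pr_c
    joint['false', s] = pr_s * (1.0 - pr_c)


def entropy(dist):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in dist if p > 0)


pr_cancer = [
    joint['true', 'true'] + joint['true', 'false'],
    joint['false', 'true'] + joint['false', 'false'],
]

h_cancer = entropy(pr_cancer)
h_smoker = entropy(pr_smoker.values())
h_joint = entropy(joint.values())
mutual_information = h_cancer + h_smoker - h_joint
conditional_entropy = h_joint - h_smoker

print(h_cancer, conditional_entropy, h_joint, mutual_information)
```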


There are two underlying methods used for many probabilistic queries.

The first is `wmc` which provides the weight of worlds matching given indicators.

In [20]:
wmc.wmc(cancer[0], smoker[0])

0.0096

The second is `z` which returns the summed weight of all possible worlds.

In this case `z` is 1 because every factor is a conditional probability table, but that is not always the case for a PGM.

In [21]:
wmc.z

1.0
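To make the two quantities concrete, the sketch below enumerates all 32 possible worlds of this PGM in plain Python, without CK. It assumes the factor values defined earlier, with the parent states of `cancer` ordered as (pollution, smoker). The function names `weight` and `brute_force_wmc` are illustrative and are not part of CK.

```python
from itertools import product

# Factor tables copied from the PGM; every state index is 0 or 1.
pr_pollution = (0.9, 0.1)
pr_smoker = (0.3, 0.7)
pr_cancer = {  # keyed by (pollution, smoker)
    (0, 0): (0.03, 0.97), (1, 0): (0.05, 0.95),
    (0, 1): (0.001, 0.999), (1, 1): (0.02, 0.98),
}
pr_xray = {0: (0.9, 0.1), 1: (0.2, 0.8)}        # keyed by cancer
pr_dyspnoea = {0: (0.65, 0.35), 1: (0.3, 0.7)}  # keyed by cancer


def weight(p, s, c, x, d):
    """Weight of one world: the product of all factor values."""
    return (pr_pollution[p] * pr_smoker[s] * pr_cancer[p, s][c]
            * pr_xray[c][x] * pr_dyspnoea[c][d])


def brute_force_wmc(**fixed):
    """Sum the weights of all worlds that agree with the fixed state indexes."""
    names = ('p', 's', 'c', 'x', 'd')
    total = 0.0
    for world in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, world))
        if all(assignment[name] == idx for name, idx in fixed.items()):
            total += weight(**assignment)
    return total


z = brute_force_wmc()          # no constraints: all worlds
w = brute_force_wmc(c=0, s=0)  # worlds matching cancer[0] and smoker[0]
print(w, z, w / z)
```

Dividing a matching weight by `z` gives a probability, and a conditional probability is the ratio of two matching weights. Enumeration is exponential in the number of random variables, which is the cost that compiling the PGM is meant to avoid.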

## Extra PGM methods ##


A PGM and its related objects also have other useful methods.

Here are the factors of the PGM...

In [22]:
pgm.factors

(<ck.pgm.Factor at 0x1e96ebc5d30>,
 <ck.pgm.Factor at 0x1e94febac30>,
 <ck.pgm.Factor at 0x1e96ebc6d80>,
 <ck.pgm.Factor at 0x1e96ebc6e10>,
 <ck.pgm.Factor at 0x1e96ebc6300>)

In [23]:
for factor in pgm.factors:
    print(factor)

('pollution')
('smoker')
('cancer', 'pollution', 'smoker')
('xray', 'cancer')
('dyspnoea', 'cancer')


We can iterate over all the factors connected to a random variable...

In [24]:
for factor in pollution.factors():
    print(factor)

('pollution')
('cancer', 'pollution', 'smoker')


We can get the Markov blanket of a random variable, which is the set of random variables directly connected to it by a factor.

In [25]:
pollution.markov_blanket()

{<ck.pgm.RandomVariable at 0x1e94feba030>,
 <ck.pgm.RandomVariable at 0x1e94febb020>}

In [26]:
for rv in pollution.markov_blanket():
    print(rv)

smoker
cancer
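The same result can be reproduced from the factor scopes alone. This is a minimal sketch in plain Python, not CK's implementation; it represents each factor by the tuple of variable names printed earlier, and the function name `markov_blanket` here is a standalone helper rather than the CK method.

```python
# Factor scopes, as printed when iterating over the factors of the PGM.
factor_scopes = [
    ('pollution',),
    ('smoker',),
    ('cancer', 'pollution', 'smoker'),
    ('xray', 'cancer'),
    ('dyspnoea', 'cancer'),
]


def markov_blanket(rv, scopes):
    """Every other variable that shares at least one factor with rv."""
    return {other for scope in scopes if rv in scope for other in scope} - {rv}


print(sorted(markov_blanket('pollution', factor_scopes)))
print(sorted(markov_blanket('cancer', factor_scopes)))
```

Because the result is a set, its iteration order is not guaranteed; sorting, as done here, gives a stable printout.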
