Illustration of a Bayesian belief network with 5 nodes, using lung cancer data. The conditional probabilities are given.

In [1]:
!pip install pgmpy

Collecting pgmpy
  Downloading https://files.pythonhosted.org/packages/a3/0e/d9fadbfaa35e010c04d43acd3ae9fbefec98897dd7d61a6b7eb5a8b34072/pgmpy-0.1.14-py3-none-any.whl (331kB)

In [2]:
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

In [4]:
# Define the network structure: each tuple is a directed edge (parent, child)
cancer_model = BayesianModel([('Pollution', 'Cancer'),
                              ('Smoker', 'Cancer'),
                              ('Cancer', 'XRay'),
                              ('Cancer', 'Dyspnoea')])

print('Bayesian network nodes are: ')
print('\t',cancer_model.nodes())
print('Bayesian network edges are: ')
print('\t',cancer_model.edges())

Bayesian network nodes are: 
	 ['Pollution', 'Cancer', 'Smoker', 'XRay', 'Dyspnoea']
Bayesian network edges are: 
	 [('Pollution', 'Cancer'), ('Cancer', 'XRay'), ('Cancer', 'Dyspnoea'), ('Smoker', 'Cancer')]


In [9]:
# Create the conditional probability tables (CPDs).
# Rows of `values` are the states of `variable`; each column must sum to 1.

cpd_poll = TabularCPD(variable='Pollution', variable_card=2,
                      values=[[0.9],[0.1]])
cpd_smoke = TabularCPD(variable='Smoker', variable_card=2,
                       values=[[0.3],[0.7]])
cpd_cancer = TabularCPD(variable='Cancer', variable_card=2,
                        values=[[0.03,0.05,0.001,0.02],
                                [0.97,0.95,0.999,0.98]],
                        evidence=['Smoker','Pollution'],
                        evidence_card=[2,2])
cpd_xray = TabularCPD(variable='XRay', variable_card=2,
                      values=[[0.9,0.2],[0.1,0.8]],
                      evidence=['Cancer'],
                      evidence_card=[2])
cpd_dysp = TabularCPD(variable='Dyspnoea', variable_card=2,
                      values=[[0.65,0.3],[0.35,0.7]],
                      evidence=['Cancer'],
                      evidence_card=[2])
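The column order of `values` in `cpd_cancer` follows the order of `evidence`: the first listed parent (Smoker) varies slowest and the last (Pollution) varies fastest. The sketch below uses plain Python, without pgmpy, to list which parent state combination each column corresponds to. The names `cancer_values` and `columns` are illustrative only, and the states are left as integer indices because the notebook does not name them.

```python
from itertools import product

# Values copied from cpd_cancer: row 0 is Cancer(0), row 1 is Cancer(1).
cancer_values = [[0.03, 0.05, 0.001, 0.02],
                 [0.97, 0.95, 0.999, 0.98]]
evidence = ['Smoker', 'Pollution']
evidence_card = [2, 2]

# itertools.product varies the last variable fastest, which matches the
# column layout described above.
columns = list(product(*(range(card) for card in evidence_card)))

for col, states in enumerate(columns):
    label = ', '.join(f'{name}({state})' for name, state in zip(evidence, states))
    print(f'column {col}: {label} -> P(Cancer(0)) = {cancer_values[0][col]}')
```

Under this layout, the column for Smoker(1), Pollution(1) is the last one.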

In [10]:
#Associating the parameters with the model structure

cancer_model.add_cpds(cpd_poll, cpd_smoke, cpd_cancer, cpd_xray, cpd_dysp)
print('Model generated by adding conditional probability distributions(cpds)')



Model generated by adding conditional probability distributions(cpds)


In [11]:
#Checking if the cpds are valid for the model

print('Checking for correctness of model: ', end="")
print(cancer_model.check_model())

Checking for correctness of model: True
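Among other things, `check_model()` verifies that the probabilities in every CPD column sum to 1. A minimal sketch of that single check in plain Python follows; `columns_sum_to_one` is a hypothetical helper written for this illustration, not a pgmpy function, and it does not cover the other checks pgmpy performs.

```python
import math

def columns_sum_to_one(values, tol=1e-9):
    """Return True if every column of a CPD table sums to 1 within tol."""
    return all(math.isclose(sum(column), 1.0, abs_tol=tol)
               for column in zip(*values))

# CPD tables copied from the notebook.
cpd_tables = {
    'Pollution': [[0.9], [0.1]],
    'Smoker': [[0.3], [0.7]],
    'Cancer': [[0.03, 0.05, 0.001, 0.02], [0.97, 0.95, 0.999, 0.98]],
    'XRay': [[0.9, 0.2], [0.1, 0.8]],
    'Dyspnoea': [[0.65, 0.3], [0.35, 0.7]],
}

results = {name: columns_sum_to_one(table) for name, table in cpd_tables.items()}
print(results)
```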


In [26]:
print('All independencies are as follows: \n', cancer_model.get_independencies())

print("Displaying CPDs")
print(cancer_model.get_cpds('Pollution'))
print(cancer_model.get_cpds('Smoker'))
print(cancer_model.get_cpds('Cancer'))
print(cancer_model.get_cpds('XRay'))
print(cancer_model.get_cpds('Dyspnoea'))

All independencies are as follows: 
 (Pollution ⟂ Smoker)
(Pollution ⟂ XRay, Dyspnoea | Cancer)
(Pollution ⟂ Dyspnoea | XRay, Cancer)
(Pollution ⟂ XRay, Dyspnoea | Smoker, Cancer)
(Pollution ⟂ XRay | Dyspnoea, Cancer)
(Pollution ⟂ Dyspnoea | XRay, Smoker, Cancer)
(Pollution ⟂ XRay | Smoker, Dyspnoea, Cancer)
(Smoker ⟂ Pollution)
(Smoker ⟂ XRay, Dyspnoea | Cancer)
(Smoker ⟂ Dyspnoea | XRay, Cancer)
(Smoker ⟂ XRay | Dyspnoea, Cancer)
(Smoker ⟂ XRay, Dyspnoea | Pollution, Cancer)
(Smoker ⟂ Dyspnoea | XRay, Pollution, Cancer)
(Smoker ⟂ XRay | Dyspnoea, Pollution, Cancer)
(XRay ⟂ Smoker, Pollution, Dyspnoea | Cancer)
(XRay ⟂ Dyspnoea, Pollution | Smoker, Cancer)
(XRay ⟂ Smoker, Dyspnoea | Pollution, Cancer)
(XRay ⟂ Smoker, Pollution | Dyspnoea, Cancer)
(XRay ⟂ Dyspnoea | Smoker, Pollution, Cancer)
(XRay ⟂ Pollution | Smoker, Dyspnoea, Cancer)
(XRay ⟂ Smoker | Dyspnoea, Pollution, Cancer)
(Dyspnoea ⟂ XRay, Smoker, Pollution | Cancer)
(Dyspnoea ⟂ Smoker, Pollution | XRay, Cancer)
(Dyspnoe

In [13]:
# Inference with the Bayesian network

# Build a variable-elimination inference engine for the model.
cancer_infer = VariableElimination(cancer_model)

In [19]:
print("\nInferencing with Bayesian Network")

print("\nProbability of Cancer given Smoker")
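# Evidence maps each variable to a state index; 'Smoker': 1 selects the
# Smoker state whose prior probability is 0.7 in cpd_smoke.
q = cancer_infer.query(variables=['Cancer'], evidence={'Smoker': 1})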
print(q)

Finding Elimination Order: : 100%|██████████| 3/3 [00:00<00:00, 845.40it/s]
Eliminating: Pollution: 100%|██████████| 3/3 [00:00<00:00, 349.91it/s]


Inferencing with Bayesian Network

Probability of Cancer given Smoker
+-----------+---------------+
| Cancer    |   phi(Cancer) |
+===========+===============+
| Cancer(0) |        0.0029 |
+-----------+---------------+
| Cancer(1) |        0.9971 |
+-----------+---------------+
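The value 0.0029 can be checked by hand. With Smoker fixed at state 1, Pollution is summed out: P(Cancer(0) | Smoker(1)) is the sum over p of P(Pollution(p)) * P(Cancer(0) | Smoker(1), Pollution(p)). This works because Pollution and Smoker are independent root nodes in the network. A plain-Python sketch follows, with illustrative variable names and the numbers copied from `cpd_poll` and `cpd_cancer`:

```python
# Prior over Pollution states 0 and 1, from cpd_poll.
p_pollution = [0.9, 0.1]

# P(Cancer(0) | Smoker(1), Pollution(p)) for p = 0, 1.
# These are the last two entries of the first row of cpd_cancer.
p_cancer0_given_smoker1 = [0.001, 0.02]

# Marginalize Pollution out.
p_cancer0 = sum(pp * pc for pp, pc in zip(p_pollution, p_cancer0_given_smoker1))
p_cancer1 = 1.0 - p_cancer0

print(round(p_cancer0, 4), round(p_cancer1, 4))
```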





In [18]:
print("\nProbability of Cancer given Smoker, Pollution")
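# With both parents of Cancer observed, the answer is read directly from the
# Smoker(1), Pollution(1) column of cpd_cancer.
q = cancer_infer.query(variables=['Cancer'], evidence={'Smoker': 1, 'Pollution': 1})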
print(q)

Finding Elimination Order: : 100%|██████████| 2/2 [00:00<00:00, 280.23it/s]
Eliminating: Dyspnoea: 100%|██████████| 2/2 [00:00<00:00, 227.22it/s]


Probability of Cancer given Smoker, Pollution
+-----------+---------------+
| Cancer    |   phi(Cancer) |
+===========+===============+
| Cancer(0) |        0.0200 |
+-----------+---------------+
| Cancer(1) |        0.9800 |
+-----------+---------------+
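Both query results can be cross-checked without pgmpy by brute-force enumeration of the joint distribution, which is feasible here because the network has only 2^5 = 32 joint states. The sketch below is not how `VariableElimination` works internally; it is a slower reference method, and the helper names `joint` and `query` are hypothetical.

```python
from itertools import product

# CPD tables copied from the notebook, indexed as table[child_state][column].
p_pollution = [0.9, 0.1]
p_smoker = [0.3, 0.7]
cancer = [[0.03, 0.05, 0.001, 0.02], [0.97, 0.95, 0.999, 0.98]]
xray = [[0.9, 0.2], [0.1, 0.8]]
dysp = [[0.65, 0.3], [0.35, 0.7]]

NAMES = ['Pollution', 'Smoker', 'Cancer', 'XRay', 'Dyspnoea']

def joint(p, s, c, x, d):
    """Joint probability of one full assignment, via the chain rule."""
    # The Cancer column index is 2 * s + p because Smoker is listed first
    # in the evidence of cpd_cancer and therefore varies slowest.
    return (p_pollution[p] * p_smoker[s] * cancer[c][2 * s + p]
            * xray[x][c] * dysp[d][c])

def query(variable, evidence):
    """Return the posterior over a binary variable given evidence (a dict)."""
    totals = [0.0, 0.0]
    for states in product(range(2), repeat=len(NAMES)):
        assignment = dict(zip(NAMES, states))
        if all(assignment[name] == value for name, value in evidence.items()):
            totals[assignment[variable]] += joint(*states)
    norm = sum(totals)
    return [t / norm for t in totals]

print(query('Cancer', {'Smoker': 1}))
print(query('Cancer', {'Smoker': 1, 'Pollution': 1}))
```

The two posteriors agree with the tables above to four decimal places. The same `query` helper accepts evidence on any subset of the five variables.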



