In this notebook, we show examples of using the structure learning algorithms in pgmpy. pgmpy currently implements three main algorithms:
1. PC with stable and parallel variants.
2. Hill-Climb Search
3. Exhaustive Search

For PC, the following conditional independence tests can be used:
1. Chi-Square test (https://en.wikipedia.org/wiki/Chi-squared_test)
2. Pearsonr (https://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression)
3. G-squared (https://en.wikipedia.org/wiki/G-test)
4. Log-likelihood (https://en.wikipedia.org/wiki/G-test)
5. Freeman-Tukey (Read, Campbell B. "Freeman-Tukey chi-squared goodness-of-fit statistics." Statistics & probability letters 18.4 (1993): 271-278.)
6. Modified Log-likelihood
7. Neyman (https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma)
8. Cressie Read (Cressie, Noel, and Timothy RC Read. "Multinomial goodness‐of‐fit tests." Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464)
9. Power Divergence (Cressie, Noel, and Timothy RC Read. "Multinomial goodness‐of‐fit tests." Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.)
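
Several of the tests listed above are members of the Cressie-Read power-divergence family and differ only in a single parameter, lambda. The sketch below is a plain-Python illustration of that relationship and is not pgmpy's implementation. The function name, the dictionary keys, and the toy counts are made up for this example, and the lambda values are the standard ones from the Cressie and Read paper cited above.

```python
import math

# Standard lambda values that recover the named statistics (illustrative labels).
LAMBDAS = {
    "pearson": 1.0,              # Chi-square
    "log_likelihood": 0.0,       # G-squared / log-likelihood ratio
    "freeman_tukey": -0.5,
    "mod_log_likelihood": -1.0,  # modified log-likelihood
    "neyman": -2.0,
    "cressie_read": 2.0 / 3.0,
}


def power_divergence(observed, expected, lambda_):
    """Power-divergence statistic for strictly positive observed/expected counts."""
    pairs = list(zip(observed, expected))
    if lambda_ == 0.0:
        # Limit lambda -> 0: the log-likelihood ratio (G) statistic.
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs)
    if lambda_ == -1.0:
        # Limit lambda -> -1: the modified log-likelihood statistic.
        return 2.0 * sum(e * math.log(e / o) for o, e in pairs)
    total = sum(o * ((o / e) ** lambda_ - 1.0) for o, e in pairs)
    return 2.0 * total / (lambda_ * (lambda_ + 1.0))


# Toy counts (hypothetical): three cells with equal expected frequency.
observed = [10, 20, 30]
expected = [20, 20, 20]
stats = {name: power_divergence(observed, expected, lam) for name, lam in LAMBDAS.items()}
for name, value in stats.items():
    print(f"{name:>20}: {value:.4f}")
```

In a conditional independence test, the expected counts would come from the product of the marginals within each configuration of the conditioning variables, and the statistic would be compared against a chi-square distribution.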

For Hill-Climb Search and Exhaustive Search, the following scoring methods can be used:
1. K2 Score
2. BDeu Score
3. BIC Score
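
To make the idea of a structure score concrete, here is a minimal sketch of a BIC-style local score for one discrete variable given a candidate parent set: the maximum-likelihood log-likelihood minus a penalty that grows with the number of free parameters. The helper `local_bic` and the toy data are hypothetical. Cardinalities are taken from the observed values, which is an assumption of this sketch and may differ from how pgmpy counts states.

```python
import math
from collections import Counter


def local_bic(rows, var, parents):
    """BIC-style local score of `var` given `parents` for a list of dict rows."""
    n = len(rows)
    # Counts of (parent configuration, variable state) and of parent configurations.
    joint = Counter((tuple(r[p] for p in parents), r[var]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)

    # Log-likelihood under maximum-likelihood conditional probabilities.
    log_lik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())

    # Free parameters: (states of var - 1) per parent configuration.
    var_card = len({r[var] for r in rows})
    num_parent_configs = 1
    for p in parents:
        num_parent_configs *= len({r[p] for r in rows})
    penalty = 0.5 * math.log(n) * num_parent_configs * (var_card - 1)
    return log_lik - penalty


# Toy data (hypothetical): B is an exact copy of A, so A should be a useful parent.
rows = [{"A": a, "B": a} for a in (0, 1) for _ in range(4)]
score_with_parent = local_bic(rows, "B", ["A"])
score_without_parent = local_bic(rows, "B", [])
print(score_with_parent, score_without_parent)
```

A search procedure prefers the parent set with the higher score; here the dependence of `B` on `A` outweighs the extra parameter penalty.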

## Generate some data

In [1]:
from itertools import combinations

import networkx as nx
import numpy as np
from sklearn.metrics import f1_score

from pgmpy.estimators import PC, HillClimbSearch, ExhaustiveSearch
from pgmpy.estimators import K2Score
from pgmpy.utils import get_example_model
from pgmpy.sampling import BayesianModelSampling

In [2]:
model = get_example_model('alarm')
samples = BayesianModelSampling(model).forward_sample(size=int(1e3))
samples.head()

Generating for node: CVP: 100%|██████████| 37/37 [00:00<00:00, 150.14it/s]         


|  | MINVOLSET | VENTMACH | DISCONNECT | VENTTUBE | INTUBATION | PULMEMBOLUS | SHUNT | PAP | FIO2 | KINKEDTUBE | ... | HRBP | LVFAILURE | HISTORY | HYPOVOLEMIA | STROKEVOLUME | CO | BP | LVEDVOLUME | PCWP | CVP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NORMAL | NORMAL | True | ZERO | NORMAL | False | NORMAL | NORMAL | NORMAL | False | ... | HIGH | False | False | False | LOW | NORMAL | LOW | NORMAL | NORMAL | NORMAL |
| 1 | NORMAL | NORMAL | False | LOW | NORMAL | False | NORMAL | NORMAL | LOW | False | ... | HIGH | True | True | False | LOW | LOW | LOW | LOW | LOW | LOW |
| 2 | HIGH | HIGH | False | HIGH | NORMAL | False | NORMAL | NORMAL | NORMAL | False | ... | HIGH | False | False | False | NORMAL | HIGH | LOW | NORMAL | NORMAL | NORMAL |
| 3 | NORMAL | LOW | False | ZERO | ONESIDED | False | HIGH | NORMAL | NORMAL | False | ... | HIGH | False | False | False | NORMAL | HIGH | HIGH | NORMAL | NORMAL | NORMAL |
| 4 | NORMAL | NORMAL | False | LOW | ONESIDED | False | HIGH | NORMAL | NORMAL | False | ... | HIGH | False | False | False | NORMAL | HIGH | HIGH | NORMAL | LOW | NORMAL |
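
Forward sampling draws each variable in topological order from its conditional distribution given the parent values already sampled. The following standard-library sketch illustrates the idea on a two-node network. The function, the variable names, and the probabilities are hypothetical, and this is not how `BayesianModelSampling` is implemented internally.

```python
import random


def forward_sample_sketch(order, parents, cpds, size, seed=0):
    """Draw `size` rows; `order` must list every parent before its children."""
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        row = {}
        for var in order:
            # Look up the distribution for the parent values sampled so far.
            key = tuple(row[p] for p in parents[var])
            states, weights = zip(*cpds[var][key].items())
            row[var] = rng.choices(states, weights=weights)[0]
        samples.append(row)
    return samples


# Hypothetical network Rain -> Wet, where Wet deterministically copies Rain.
order = ["Rain", "Wet"]
parents = {"Rain": [], "Wet": ["Rain"]}
cpds = {
    "Rain": {(): {True: 0.3, False: 0.7}},
    "Wet": {
        (True,): {True: 1.0, False: 0.0},
        (False,): {True: 0.0, False: 1.0},
    },
}
toy_samples = forward_sample_sketch(order, parents, cpds, size=200)
print(toy_samples[:3])
```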


In [3]:
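# Function to evaluate the learned model structures.
def get_f1_score(estimated_model, true_model):
    # Only skeletons are compared: edge directions are dropped before scoring.
    nodes = list(estimated_model.nodes())
    # nx.to_numpy_matrix was removed in networkx 3.0; nx.to_numpy_array replaces it.
    est_adj = nx.to_numpy_array(estimated_model.to_undirected(), nodelist=nodes, weight=None)
    true_adj = nx.to_numpy_array(true_model.to_undirected(), nodelist=nodes, weight=None)

    # Flatten both adjacency matrices and score them as binary edge labels.
    f1 = f1_score(np.ravel(true_adj), np.ravel(est_adj))
    print("F1-score for the model skeleton: ", f1)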
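
To see what this metric measures without any dependencies, the sketch below computes the same skeleton F1 directly from edge lists. It assumes both graphs share the same node set and have no self-loops. Under those assumptions the value matches the matrix-based version, because each undirected edge appears twice in a symmetric adjacency matrix and true negatives do not enter F1. The function name and the toy edges are hypothetical.

```python
def skeleton_f1(estimated_edges, true_edges):
    """F1 score between two edge lists, ignoring edge direction."""
    est = {frozenset(edge) for edge in estimated_edges}
    true = {frozenset(edge) for edge in true_edges}
    true_positives = len(est & true)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(est)
    recall = true_positives / len(true)
    return 2 * precision * recall / (precision + recall)


# Toy example: one edge recovered (with reversed direction), one spurious, one missed.
true_edges = [("A", "B"), ("B", "C")]
estimated_edges = [("B", "A"), ("A", "C")]
toy_f1 = skeleton_f1(estimated_edges, true_edges)
print(toy_f1)
```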

## Learn the model structure using PC

In [4]:
est = PC(data=samples)
estimated_model = est.estimate(variant='stable', max_cond_vars=4)
get_f1_score(estimated_model, model)

  warn("Reached maximum number of allowed conditional variables. Exiting")
Working for n conditional variables: 4: 100%|██████████| 4/4 [00:23<00:00,  5.82s/it]

F1-score for the model skeleton:  0.7887323943661972





In [5]:
est = PC(data=samples)
estimated_model = est.estimate(variant='orig', max_cond_vars=4)
get_f1_score(estimated_model, model)

Working for n conditional variables: 4: 100%|██████████| 4/4 [00:28<00:00,  7.15s/it]

F1-score for the model skeleton:  0.7887323943661972
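
The two runs above differ only in the `variant` argument. As a rough illustration of what that distinction usually means in the PC literature, the sketch below implements the skeleton phase with a pluggable independence oracle. In the stable variant, the neighbor sets used to build conditioning sets are frozen at the start of each level, so the result does not depend on the order in which edges are tested. All names here are hypothetical, the oracle stands in for a statistical test, and the treatment of `max_cond_vars` as an inclusive upper bound is an assumption of this sketch rather than a statement about pgmpy.

```python
from itertools import combinations


def pc_skeleton(nodes, is_independent, max_cond_vars=4, stable=True):
    """Skeleton phase of PC using an independence oracle (x, y, cond_set) -> bool."""
    adj = {n: set(nodes) - {n} for n in nodes}  # start from the complete graph
    sepsets = {}
    for size in range(max_cond_vars + 1):
        # Stable variant: conditioning sets come from a per-level snapshot.
        snapshot = {n: set(neighbors) for n, neighbors in adj.items()}
        for x in nodes:
            for y in sorted(adj[x]):
                source = snapshot if stable else adj
                candidates = sorted(source[x] - {y})
                for cond in combinations(candidates, size):
                    if is_independent(x, y, set(cond)):
                        # Remove the edge and remember the separating set.
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepsets[frozenset((x, y))] = set(cond)
                        break
    edges = {frozenset((x, y)) for x in nodes for y in adj[x]}
    return edges, sepsets


def chain_oracle(x, y, cond):
    """Independencies of the chain A -> B -> C: A and C are independent given B."""
    return {x, y} == {"A", "C"} and "B" in cond


skeleton, sepsets = pc_skeleton(["A", "B", "C"], chain_oracle, max_cond_vars=2)
print(sorted(sorted(edge) for edge in skeleton))
```

On a graph this small both variants agree. Differences can appear when tests on finite data are noisy and the removal of one edge changes which conditioning sets are tried for another.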





## Learn the model structure using Hill-Climb Search

In [6]:
scoring_method = K2Score(data=samples)
est = HillClimbSearch(data=samples)
estimated_model = est.estimate(scoring_method=scoring_method, max_indegree=4, max_iter=int(1e4))
get_f1_score(estimated_model, model)

  1%|          | 61/10000 [00:36<1:39:54,  1.66it/s]

F1-score for the model skeleton:  0.8076923076923076
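
Hill-climb search is a greedy local search over DAGs: at each step it evaluates the single-edge additions, removals, and reversals that keep the graph acyclic, applies the one that improves the score most, and stops when nothing improves. The sketch below shows that loop with a toy scoring function. Every name in it is hypothetical, it omits options such as `max_indegree`, and it is not pgmpy's implementation.

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed edge set contains no cycle."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    frontier = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while frontier:
        node = frontier.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    frontier.append(child)
    return visited == len(nodes)


def neighbors(nodes, edges):
    """Yield every edge set reachable by one add, remove, or flip operation."""
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            yield edges - {(u, v)}               # remove u -> v
            yield (edges - {(u, v)}) | {(v, u)}  # flip to v -> u
        elif (v, u) not in edges:
            yield edges | {(u, v)}               # add u -> v


def hill_climb(nodes, score, max_iter=100, epsilon=1e-9):
    """Greedy search from the empty graph; `score` maps a frozenset of edges to a float."""
    current = frozenset()
    current_score = score(current)
    for _ in range(max_iter):
        best, best_score = None, current_score
        for candidate in neighbors(nodes, current):
            candidate = frozenset(candidate)
            if not is_acyclic(nodes, candidate):
                continue
            candidate_score = score(candidate)
            if candidate_score > best_score + epsilon:
                best, best_score = candidate, candidate_score
        if best is None:  # local optimum reached
            break
        current, current_score = best, best_score
    return current


# Toy score (hypothetical): reward edges of a known target DAG, penalize all others.
target = {("A", "B"), ("B", "C")}


def toy_score(edges):
    return len(edges & target) - len(edges - target)


learned = hill_climb(["A", "B", "C"], toy_score)
print(sorted(learned))
```

With a data-driven score such as K2, BDeu, or BIC in place of `toy_score`, the same loop can stop at a local optimum that differs from the true structure, which is one reason the F1 score above is below 1.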



