BIF (Bayesian Interchange Format) is a legacy format for representing Bayesian networks. For a long period starting in the late 1990s, BIF was the only standard format for Bayesian networks. Nowadays, there are emerging formats and modeling languages, such as [Stan](http://www.stat.columbia.edu/~gelman/research/published/stan-paper-revision-feb2015.pdf), the [UAI format](https://www.cs.huji.ac.il/project/PASCAL/fileFormat.php), and [PMML](https://dmg.org/pmml/v4-4-1/BayesianNetwork.html).

I have been working with BIF-represented models for a while, and during that time I wrote this BN class to import BIF files into pyAgrum and pymc3 for sampling and inference. I hope it helps you.

## Version is important!
This code was written for older versions of pymc3 and pyAgrum, so it is not compatible with their current versions.

The compatible versions are Python 3.8, pymc3 3.9.3, pyAgrum 0.18.0, Theano 1.0.5, and arviz 0.11.0.

I recommend setting up a separate virtualenv or conda environment for running these scripts.

## Conda environment setup

    conda create -n pymc3 python=3.8.13 pip
    conda activate pymc3
    pip install arviz==0.11.0 pyagrum==0.18.0 pymc3==3.9.3
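
Since the scripts depend on exact versions, it can help to check the environment before running them. The following is a minimal sketch, not part of this repository: `check_versions` and `PINNED` are hypothetical names, and the distribution names passed to `importlib.metadata` are assumptions that may need adjusting to match what pip reports.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the compatibility list above.
# The distribution names are assumptions; adjust them if pip reports others.
PINNED = {
    "pymc3": "3.9.3",
    "pyAgrum": "0.18.0",
    "Theano": "1.0.5",
    "arviz": "0.11.0",
}


def check_versions(pinned, lookup=version):
    """Return {package: (expected, found)} for each mismatch.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package matches. The `lookup` parameter exists only so the check can be exercised without the packages installed.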


The BIF models are stored in the `models` directory:

    $ ls models
    alarm.bif   cancer.bif      hailfinder.bif  mildew.bif      water.bif
    andes.bif   child.bif       hepar2.bif      munin.bif       win95pts.bif
    asia.bif    diabetes.bif    insurance.bif   pathfinder.bif
    barley.bif  earthquake.bif  link.bif        pigs.bif

Activate the conda environment before running the scripts:

    $ conda activate pymc3

Running `bnSample.py` without arguments prints its usage:

    $ bnSample.py
    Usage: bnSample.py model_name <csv|netCDF> sample_size
    sample will be saved in samples/model_name/ as CSV files or samples/model_name.nc as netCDF file
    actual sampled size may be slightly larger than sample_size
    in which case, discard some initial samples for better convergence

To sample from the Bayesian network defined by models/asia.bif, with CSV output and a target sample size of 100000:

    $ bnSample.py asia csv 100000
    Multiprocess sampling (16 chains in 16 jobs)
    BinaryGibbsMetropolis: [asia, tub, smoke, lung, bronc, either, xray, dysp]
    Sampling 16 chains for 1_000 tune and 6_251 draw iterations (16_000 + 100_016 draws total) took 59 seconds.
    The number of effective samples is smaller than 25% for some parameters.
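
The run above draws 100016 samples for a requested 100000, which is the "slightly larger" case mentioned in the usage text. One rounding rule that reproduces these numbers is sketched below. The rule and the name `draws_per_chain` are assumptions chosen to match the log, not taken from `bnSample.py`.

```python
def draws_per_chain(sample_size, chains):
    # Assumed rule: integer-divide the target across chains, then add one draw.
    return sample_size // chains + 1


per_chain = draws_per_chain(100_000, 16)
total = per_chain * 16
surplus = total - 100_000
print(per_chain, total, surplus)  # 6251 100016 16
```

Under this assumption the surplus equals one draw per chain, so discarding the first draw of each chain would bring the total back to the requested size.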

The results are stored in the `samples/asia` directory:

    $ ls samples/
    asia

Each chain is written to its own CSV file:

    $ ls samples/asia
    chain-0.csv   chain-12.csv  chain-15.csv  chain-3.csv  chain-6.csv  chain-9.csv
    chain-10.csv  chain-13.csv  chain-1.csv   chain-4.csv  chain-7.csv
    chain-11.csv  chain-14.csv  chain-2.csv   chain-5.csv  chain-8.csv
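
To work with the per-chain files directly, they can be read back and concatenated. The sketch below uses only the standard library and rests on assumptions that should be checked against the actual files: that each `chain-N.csv` begins with a header row of variable names, and that every following row is one draw. `load_chains` and `chain_index` are hypothetical helper names.

```python
import csv
from pathlib import Path


def chain_index(path):
    # "chain-12.csv" -> 12, so chains sort numerically rather than alphabetically.
    return int(path.stem.split("-")[1])


def load_chains(sample_dir, discard=0):
    """Read every chain-*.csv in sample_dir and return (header, rows).

    The first `discard` rows of each chain are dropped.
    Assumes every file starts with the same header row.
    """
    header, rows = None, []
    for path in sorted(Path(sample_dir).glob("chain-*.csv"), key=chain_index):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            file_header = next(reader)
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path} has a different header")
            chain_rows = list(reader)
        rows.extend(chain_rows[discard:])
    return header, rows
```

For example, `load_chains("samples/asia", discard=1)` would drop the first draw of every chain, in line with the usage text's advice to discard some initial samples.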

Running `sampleCount.py` without arguments prints its usage:

    $ sampleCount.py
    Usage: sampleCount.py dataset_name [dataset2_name] ...

Count the unique samples in the asia dataset:

    $ sampleCount.py asia
    Counting asia....
    31 uniques, sorting by frequency....Sort Complete
    Writing to count.txt.gz

An ASCII count file is stored in `samples/asia/count.txt.gz`. The binary value tuples, their counts, and their probabilities under the Bayesian network are stored in `samples/asia/proj` as NumPy arrays (`numpy.ndarray`).

    $ ls samples/asia
    chain-0.csv   chain-13.csv  chain-2.csv  chain-6.csv  count.txt.gz
    chain-10.csv  chain-14.csv  chain-3.csv  chain-7.csv  proj
    chain-11.csv  chain-15.csv  chain-4.csv  chain-8.csv
    chain-12.csv  chain-1.csv   chain-5.csv  chain-9.csv

The `proj` directory contains three NumPy files:

    $ ls samples/asia/proj
    all.count.npy  all.npy  all.prob.npy
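
The counting step reports unique value tuples sorted by frequency. A minimal sketch of that kind of tally is shown below. It illustrates the idea only and is not the code in `sampleCount.py`; `count_configurations` is a hypothetical name and the three-variable rows are made-up illustration data.

```python
from collections import Counter


def count_configurations(rows):
    """Return (configuration, count) pairs sorted by descending count."""
    counts = Counter(tuple(row) for row in rows)
    return counts.most_common()


# Made-up draws over three binary variables, for illustration only.
rows = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
]
for configuration, count in count_configurations(rows):
    print(configuration, count)
```

Rows are converted to tuples because lists are not hashable and cannot be used as `Counter` keys.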

Now we can run JSDivergences.py to compute the Jensen-Shannon (JS) divergence between the sample and the model distribution. It reads its data from all.count.npy and all.prob.npy.

    $ JSDivergences.py asia
    RawJSD, RealJSD, SampleCoverage = 0.005578738820948832 , 0.15180886603046445 , 0.9351579002977749
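
For reference, the standard JS divergence between an empirical distribution (normalized counts) and a model distribution can be sketched as follows. This is the textbook definition with base-2 logarithms, which is an assumption. How `JSDivergences.py` defines RawJSD, RealJSD, and SampleCoverage is not described here, so this sketch should not be read as reproducing those numbers. The counts and probabilities are made-up illustration values.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs) of two discrete distributions."""

    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution; positive wherever p or q is positive.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Made-up counts and model probabilities for three configurations.
counts = [6, 3, 1]
probs = [0.5, 0.3, 0.2]
total_count = sum(counts)
empirical = [c / total_count for c in counts]
print(js_divergence(empirical, probs))
```

With base-2 logarithms the result lies between 0 (identical distributions) and 1 (distributions with disjoint support).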