# Exercise 3- Bayesian Networks and the Darwiche Compiler

In this exercise, we will learn the parameters of a Bayesian Network and perform inference. 
We will then compare our results to those of the Darwiche compiler.

In the event of a persistent problem, do not hesitate to contact the course instructors under
- paul.kahlmeyer@uni-jena.de

### Submission

- Deadline of submission:
        13.11.2022
- Submission on [moodle page](https://moodle.uni-jena.de/course/view.php?id=34630)

### Help
In case you cannot solve a task, you can use the saved values within the `help` directory:
- Load arrays with [Numpy](https://numpy.org/doc/stable/reference/generated/numpy.load.html)
```
np.load('help/array_name.npy')
```
- Load functions, classes and other objects with [Dill](https://dill.readthedocs.io/en/latest/dill.html)
```
import dill
with open('help/some_func.pkl', 'rb') as f:
    func = dill.load(f)
```

to continue working on the other tasks.

# Bayesian Networks

## Dataset
In this exercise we will used a discretized version of the [Pima Indians Dataset](https://www.kaggle.com/uciml/pima-indians-diabetes-database/version/1). In its original form, nine attributes were collected:

- `Pregnancies` : Number of times pregnant 
- `Glucose` : Plasma glucose concentration a 2 hours in an oral glucose tolerance test
- `BloodPressure` : Diastolic blood pressure (mm Hg)
- `SkinThickness` : Triceps skin fold thickness (mm)
- `Insulin` : 2-Hour serum insulin (mu U/ml)
- `BMI` : Body mass index (weight in kg/(height in m)^2)
- `DiabetesPedigreeFunction` : Value indicating the presence of diabetes in the family
- `Age` : Age
- `Outcome`:  1- Diabetes, 0- No diabetes

For this exercise, we will use a binarized version of this dataset. Each attribute has been labeled
- `0` below average
- `1` above average

The dataset is stored as `dataset.csv`. 


### Task 1
Load the dataset.

In [34]:
# TODO: load dataset
import numpy as np
import pandas as pd
from enum import Enum
data = pd.read_csv("dataset.csv").to_numpy()

class Columns(Enum):
    Pregnancies = 0
    Glucose = 1
    BloodPressure = 2
    SkinThickness = 3
    Insulin = 4
    BMI = 5
    DiabetesPedigreeFunction = 6
    Age = 7
    Outcome = 8

## Binary Bayesnet

A Bayesian network is a multivariate categorical where certain conditional independencies hold that are specified by a directed acyclic graph (DAG) on the variables.
Namely, any variable $x_i$ is conditionally independent from its non-descendants given the values of its parents.
This means that the joint distribution factorizes over each node given its parents.

Consider the following example of three variables:

<div>
<img src="images/bayesnet_example.png" width="200"/>
</div>

Here we have that 
\begin{equation}
p(x_0, x_1, x_2) = p(x_0)p(x_2|x_0)p(x_1|x_0,x_2)
\end{equation}

Such a categorical distribution can be represented by **Conditional Probability Tables** (CPTs), that hold the distributions $p(x_i| \text{parents}(x_i))$. 

In our example we would have three CPTs:

| $x_0$ | $p(x_0)$|
| :- | -: | 
| $0$ | $p(x_0=0)$|
| $1$ | $p(x_0=1)$|

| $x_0$ | $x_2$| $p(x_2|x_0)$|
| :- | :- | -: | 
| $0$ | $0$ | $p(x_2=0|x_0=0)$ | 
| $0$ | $1$ | $p(x_2=1|x_0=0)$ | 
| $1$ | $0$ | $p(x_2=0|x_0=1)$ | 
| $1$ | $1$ | $p(x_2=1|x_0=1)$ | 

| $x_0$ | $x_2$| $x_1$| $p(x_1|x_0, x_2)$|
| :- | :- | :- | -: | 
| $0$ | $0$ | $0$ | $p(x_1=0|x_0=0, x_2=0)$ | 
| $0$ | $0$ | $1$ | $p(x_1=0|x_0=0, x_2=1)$ | 
| $0$ | $1$ | $0$ | $p(x_1=0|x_0=1, x_2=0)$ | 
| $0$ | $1$ | $1$ | $p(x_1=0|x_0=1, x_2=1)$ | 
| $1$ | $0$ | $0$ | $p(x_1=1|x_0=0, x_2=0)$ | 
| $1$ | $0$ | $1$ | $p(x_1=1|x_0=0, x_2=1)$ | 
| $1$ | $1$ | $0$ | $p(x_1=1|x_0=1, x_2=0)$ | 
| $1$ | $1$ | $1$ | $p(x_1=1|x_0=1, x_2=1)$ | 


### Task 2

For our dataset, we have the adjacency matrix of such a DAG stored in `adj.npy`.

An entry $A_{ij}$ is 1, if there is an edge from $x_i$ to $x_j$. 

Load and display the adjacency matrix.

In [17]:
# TODO: load and display adjacency matrix
adj = np.load("adj.npy")

### Task 3

Implement the following `BinaryBayeNet` class.

Then create and fit a Bayesnet on the Diabetes dataset and calculate the [loglikelihood](https://en.wikipedia.org/wiki/Likelihood_function).

In [47]:
import itertools
from typing import Self


class BinaryBayesNet:

    def __init__(self, A: np.ndarray, prob_tables: dict[int, tuple[np.ndarray, np.ndarray, np.ndarray]] = {}) -> None:
        '''
        Bayesian Network of binary categorical variables.

        @Params:
            A...            adjacency matrix of the DAG
            prob_tables...  probability tables of the nodes, dictionary where
                            key = node index
                            value = tuple (sample space, probs, indices of sample space)
                            .fit(X) will estimate those prob_tables
        '''
        self.A = A
        self.n = self.A.shape[0]
        self.domain = [0, 1]
        self.prob_tables = prob_tables
        self.sample_space = np.array(list(itertools.product(self.domain, repeat=self.n)))

    def fit(self, dataset: np.ndarray, pseudo_obs: int = 0) -> Self:
        '''
        Calculates the CPTs for the Bayesian Network.

        @Params:
            dataset... Nxd matrix, binary vectors as rows
            pseudo_obs... pseudo observations that are added for laplace regularization
        '''

        self.prob_tables = {}
        for i in range(self.n):
            # make prob table for i
            parents = self.parents(i)
            # marginalize on i and parents
            marg = dataset[:, [i] + parents]
            # sample space for the parents
            parents_sample_space = np.array(list(itertools.product(self.domain, repeat=len(parents))))
            # build prob_table
            probs = np.empty(len(parents_sample_space))
            for entry_index, parents_entry in enumerate(parents_sample_space):
                # condition on parents
                fit_condition = np.all(marg[:, 1:] == parents_entry, axis=1)
                cond = marg[fit_condition, 0]
                total = len(cond)
                # count how many 1s and 0s are in conditioned data set
                ones_count = np.count_nonzero(cond)
                zero_count = len(cond) - ones_count
                # laplace regularization adds each sample c times. This is c * 2^(n - k) entries where k is the number of set indices (k = 1 + len(parents)).
                laplace_amount = pseudo_obs * 2**(self.n - len(parents) - 1)
                # 1 index less is set if i is not set, but the parents still are
                # p(x_i = 0 | x_pa(i) = parents_entry)
                probs[entry_index] = (zero_count + laplace_amount) / (total + 2 * laplace_amount)
            self.prob_tables[i] = (parents_sample_space, probs, parents)
        return self

    def proba(self, X: np.ndarray) -> np.ndarray:
        '''
        Calculates the probabilities of samples X.

        @Params:
            X... numpy array with samples as rows

        @Returns:
            numpy array with p(x)
        '''

        return np.array([self.single_prob(x) for x in X])

    def single_prob(self, x: np.ndarray) -> np.floating:
        # self.prob_tables[i][0] has the sample space, self.prob_tables[i][2] has the parent indices of i
        # np.where(np.all(self.prob_tables[i][0] == x[self.prob_tables[i][2]], axis=1))[0][0] is the index of the entry of the observed parents
        find_entry = lambda i: np.where(np.all(self.prob_tables[i][0] == x[self.prob_tables[i][2]], axis=1))[0][0]
        # p(x_i = 0 | x_pa(i)) for all indices
        prob_0s = np.array([self.prob_tables[i][1][find_entry(i)] for i in range(self.n)])
        # p(x) = x * p_0 + (1 - x) * (1 - p_0)
        return np.prod(x * prob_0s + (1 - x) * (1 - prob_0s))

    def parents(self, node: int) -> list[int]:
        return list(np.argwhere(self.A[:, node] == 1)[:, 0])


# TODO: fit Bayesian Network + calculate loglikelihood
bayesnet: BinaryBayesNet = BinaryBayesNet(adj).fit(data)
likelyhood = bayesnet.proba(data)


### Task 4

Use the bayesian network to answer the following queries:

1. What is the posterior marginal distribution of `Diabetes` if we observe a high `BMI` and a high `Glucose` level?
2. What is the prior marginal distribution of `Pregnancies` and `Glucose`?
3. What is the MAP hypothesis of `Age` given a high `BloodPressure`?

In [53]:
# TODO: answer inference queries
import utils
prob_table = np.c_[bayesnet.sample_space, bayesnet.proba(bayesnet.sample_space)]
task_1 = utils.posterior_marginal(prob_table, [Columns.Outcome.value], [Columns.BMI.value, Columns.Glucose.value], [1, 1])
task_2 = utils.prior_marginal(prob_table, [Columns.Pregnancies.value, Columns.Glucose.value])
task_3 = utils.max_a_posteriori(prob_table, [Columns.Age.value], [Columns.BloodPressure.value], [1])
print(f"task_1:\n{task_1}\ntask_2:\n{task_2}\ntask_3:\n{task_3}")

task_1:
[[0.         0.68899732]
 [1.         0.31100268]]
task_2:
[[0.         0.         0.3114246 ]
 [0.         1.         0.25682388]
 [1.         0.         0.26939603]
 [1.         1.         0.16235549]]
task_3:
[0.]


# Darwiche Compiler

The [Darwiche Compiler](http://reasoning.cs.ucla.edu/c2d/) compiles a logical CNF formula into an arithmetic circuit.

For our bayesian network, we can create the CNF out of the network polynomial and compile it using the `utils.to_circuit` function.

With the following tasks we will create this CNF step-by-step.

## State Monomials

First we need monomials of the form $\theta_{i:z_i}$ that are true if node $i$ takes the value $z_i$.


### Task 5
Create a dictionary, which holds these monomials in the format
- key = monomial index, an ID for the monomial
- value = tuple (node index, node value).

Example: 
```
6 : (2, 1)
```
stands for the monomial $\theta_{2:1}$ which we name with index 6.

*Important:*

Later we want to encode a negative index as the negation of the monomial. Therefore we need to **start counting at 1** (so that we have a different negative value).


In [None]:
# TODO: create monomials

## Conditional Monomials
Next we need monomials of the form $\theta_{i, pa(i): z_i, z_{pa(i)}}$, that are true if node $i$ takes the value $z_i$ and the parent nodes of $i$, namely $pa(i)$ take the values $z_{pa(i)}$.

### Task 6
Create a dictionary, which holds these monomials and their probability in the format
- key = monomial index, an ID for the monomial
- value = tuple (probability, tuple of tuples (node idx, node value)).

Example:

```
24 : (0.3, ((8, 0), (1, 1)))
```

stands for the monomial $\theta_{1, 8 : 1, 0}$ with the probability $p(x_1=1|x_8=0) = 0.3$ which we name with index 24.

*Important:*

In order for the compiler to work, the tuple of tuples **first has to specify the parents and then the actual node**.

In [None]:
# TODO: create monomials

## CNF of the network polynomial

Let $\theta_{I:Z}$ be the monomial that stands for the node indices $I$ being set to values $Z$.
Then the CNF of the network polynomial is made from the following disjunctions:

1. For each node $i$
    - $\theta_{i:0}\vee\theta_{i:1}$
    - $\neg\theta_{i:0}\vee\neg\theta_{i:1}$
  
2. For each combination of a node $i$ and its parents $pa(i)$
    - $\neg\theta_{i,pa(i) : z_i, z_{pa(i)}}\vee\theta_{i:z_i}$
    - $\neg\theta_{i,pa(i) : z_i, z_{pa(i)}}\vee\theta_{j:z_j}$ for each $j\in pa(i)$
    - $\theta_{i,pa(i) : z_i, z_{pa(i)}}\vee\neg\theta_{i:z_i}\vee\bigvee_{j\in pa(i)}\neg\theta_{j:z_j}$
    

### Task 7

Create the CNF as a list of lists, where each list represents a disjunction with the monomial indices (negative indices stand for negated monomials).

Example:

```
[[1 , 2], [-5 , -6]]
```
would stand for the CNF

$\left(\theta_1\vee\theta_2\right) \wedge \left(\neg\theta_5\vee\neg\theta_6\right)$.

In [None]:
# TODO: create CNF list

### Task 8

Use the `utils.to_circuit` function to compile the CNF into an arithmetic circuit.

As a sanity check compare the probabilities of the arithmetic circuit that you obtain with `.eval` to those of your Bayesnet.

*Note:* `.eval` takes $x\in\mathbb{R}^n$ and outputs $p(x)$.

In [None]:
# TODO: convert into arithmetic circuit

# TODO: sanity check - are probabilities correct?