# Introduction to Custom Classes

In this notebook, we provide a brief overview of how to use the helper classes (*BayesNet* and *Variable*). 

First, we import our custom classes:

In [1]:
import numpy as np
from bayesian_network import BayesNet, Variable
from typing import Iterator

Consider the following Bayesian Network (all variables are binary):

<img width='30%' src='bn.svg'>

The conditional probability tables are given as:

<table style="float: left;margin:5px;"><tr><th>P(A)</th><th>$a_0$<br></th><th>$a_1$</th></tr><tr><td>-</td><td>0.2</td><td>0.8</td></tr></table>

<table style="float: left;margin:5px;"><tr><th>P(B | A)</th><th>$a_0$<br></th><th>$a_1$</th></tr><tr><td>$b_0$</td><td>0.9</td><td>0.2</td></tr><tr><td>$b_1$</td><td>0.1</td><td>0.8</td></tr></table>

<table style="float: left;margin:5px;"><tr><th rowspan="2">P(D | A, B)</th><th colspan="2">$a_0$<br></th><th colspan="2">$a_1$</th></tr><tr><td>$b_0$</td><td>$b_1$</td><td>$b_0$</td><td>$b_1$</td></tr><tr><td>$d_0$<br></td><td>0.1</td><td>0.2</td><td>0.01</td><td>0.8</td></tr><tr><td>$d_1$</td><td>0.9</td><td>0.8</td><td>0.99</td><td>0.2</td></tr></table>

<table style="float: left;margin:5px;"><tr><th>P(C|D)</th><th>$d_0$<br></th><th>$d_1$</th></tr><tr><td>$c_0$</td><td>0.95</td><td>0.15</td></tr><tr><td>$c_1$</td><td>0.05</td><td>0.85</td></tr></table>

<table style="float: left;margin:5px;"><tr><th>P(E | C)</th><th>$c_0$</th><th>$c_1$</th></tr><tr><td>$e_0$</td><td>0.9</td><td>0.4</td></tr><tr><td>$e_1$</td><td>0.1</td><td>0.6</td></tr></table>

### Creating a BayesNet

Let's create a *BayesNet* object representing the above Bayesian Network:

In [5]:
_A_, _B_, _C_, _D_, _E_ = 0, 1, 2, 3, 4

A = np.array([0.2, 0.8])
B_A = np.array([[0.9, 0.2], [0.1, 0.8]])
C_D = np.array([[0.95, 0.15], [0.05, 0.85]])
D_AB = np.array([[[0.1, 0.2], [0.01, 0.8]], [[0.9, 0.8], [0.99, 0.2]]])
E_C = np.array([[0.9, 0.4], [0.1, 0.6]])
              
bayes_net = BayesNet(
    (A, [_A_]),
    (B_A, [_B_, _A_]),
    (C_D, [_C_, _D_]),
    (D_AB, [_D_, _A_, _B_]),
    (E_C, [_E_, _C_])
)

The constructor takes an arbitrary number of tuples, one per variable in the Bayesian Network. Each tuple holds one NumPy array and one list of integers:
 - The NumPy array represents the (conditional) probability distribution table of the variable. The table holds the probability distribution(s) over the variable in the first dimension (dimension 0); the additional dimensions encode all possible assignments to the variable's parents.
 - The integer list maps the dimensions in the probability table to variable IDs (i.e., it gives the semantics of the table). The first entry in the list always corresponds to the variable ID of the current variable; the following ones are the variable's parents.

For example, above, we first defined `_A_`, `_B_` etc. to have readable names for the numeric variable IDs. For describing the conditional probability table `P(B | A)` denoted by NumPy array `B_A`, we simply mapped the 2 table dimensions to the variables using `[_B_,_A_]`.
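As a quick sanity check on this convention, every table should sum to 1 along dimension 0 for each assignment of the parents. The sketch below uses only NumPy and re-declares two of the tables from above; the helper name `is_valid_cpt` is our own illustration and not part of `bayesian_network`.

```python
import numpy as np

def is_valid_cpt(table: np.ndarray) -> bool:
    """Check that the table holds one distribution per parent assignment along axis 0."""
    # Entries must be non-negative, and each slice along axis 0 must sum to 1.
    return bool(np.all(table >= 0) and np.allclose(table.sum(axis=0), 1.0))

B_A = np.array([[0.9, 0.2], [0.1, 0.8]])             # axes: [B, A]
D_AB = np.array([[[0.1, 0.2], [0.01, 0.8]],
                 [[0.9, 0.8], [0.99, 0.2]]])          # axes: [D, A, B]

print(is_valid_cpt(B_A), is_valid_cpt(D_AB))
print(D_AB[1, 0, 1])  # P(D = d_1 | A = a_0, B = b_1)
```

Indexing follows the integer list: for `[_D_, _A_, _B_]`, the first index selects the value of $D$, the second the value of $A$, and the third the value of $B$.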

From this specification, the *BayesNet* object creates a set of *Variable* objects representing the random variables, their parents and children, and their probability distributions.

### Variable objects

To access the *Variable* object representing $A$, we can write:

In [9]:
bayes_net[_A_]

<bayesian_network.Variable at 0x22138156390>

We can use the *Variable* objects to get some basic information on the random variable. 

Each *Variable* object has the following attributes:

- **id**: The id of the variable. Type: int
- **parents**: A set containing the ids of the parent variables. Type: set of ints
- **children**: A set containing the ids of the child variables. Type: set of ints
- **num_values**: Number of values this variable can take. Type: int 
- **pdt**: The (conditional) probability distribution table. **Note:** It has a separate dimension for each variable in the Bayesian Network, ordered by ids. The size of each dimension corresponds to the number of possible values; non-parent variables have a dimension size of 1. Type: np.ndarray

In [12]:
variable = bayes_net[1]
print(f'id: {variable.id}\nparents: {variable.parents}\nchildren: {variable.children}\nnum_values: {variable.num_values}')

id: 1
parents: frozenset({0})
children: {3}
num_values: 2


### Distribution tables
We can also access an expanded and sorted version of the conditional distribution table:

In [15]:
variable = bayes_net[1]
print(variable.pdt)

print('\nvariable.pdt.shape =', variable.pdt.shape)

[[[[[0.9]]]


  [[[0.1]]]]



 [[[[0.2]]]


  [[[0.8]]]]]

variable.pdt.shape = (2, 2, 1, 1, 1)


Note that the **dimensions** of this (conditional) probability distribution table are **sorted by variable id**, and **singleton dimensions** of size 1 are inserted **for non-parent variables**. Here, variable 1 can take 2 values and its parent, variable 0, can also take 2 values, so the shape starts with two dimensions of size 2. Variables 2, 3, and 4 are not parents of variable 1, so the remaining dimensions are of size 1.

This design makes computations and broadcasting a lot easier. For instance, as we will see later, it makes computing the full joint distribution easy.
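To make the layout concrete, here is a minimal sketch of how such an expanded table could be derived from a raw table and its integer list. It is an assumption about the idea, not the actual code inside `bayesian_network`; the function name `expand_table` is hypothetical.

```python
from typing import Sequence

import numpy as np

def expand_table(table: np.ndarray, ids: Sequence[int], num_variables: int) -> np.ndarray:
    """Reorder a table's axes by variable id and add singleton axes for all other variables."""
    order = np.argsort(ids)                       # axis permutation that sorts the ids
    sorted_table = np.transpose(table, order)     # axes are now in ascending id order
    shape = [1] * num_variables                   # start with singleton axes everywhere
    for axis, var_id in enumerate(sorted(ids)):
        shape[var_id] = sorted_table.shape[axis]  # real size for the variable and its parents
    return sorted_table.reshape(shape)

B_A = np.array([[0.9, 0.2], [0.1, 0.8]])          # axes: [B, A]
expanded = expand_table(B_A, [1, 0], num_variables=5)
print(expanded.shape)
print(expanded[1, :, 0, 0, 0])                    # P(B | A = a_1)
```

Because the axes are sorted by id, the first index of the expanded table refers to $A$ and the second to $B$, which is the transpose of the raw `B_A` table.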

### Topological Sort
Furthermore, we want to use the *BayesNet* object to **iterate over the random variables** of the represented Bayesian Network **in a topological ordering**. This is required, for example, when we want to sample an event from the network. Your first task in Problem Set 2 is to add this functionality to the `BayesNet` class by implementing its `__iter__` method. Check out *Problem 1.ipynb* for the details of this task, then come back and **copy your implementation to the cell below**.

In [19]:
def __iter__(self) -> Iterator[Variable]:
    """
    Iterates over all variables in the Bayesian network in topological ordering, i.e.,
    for an edge from a variable X to a variable Y, X is yielded before Y.
    Since a Bayesian network is a directed acyclic graph, a topological ordering always exists.

    :yields: variable after variable according to the network's topology.
    """
    
    # list of topologically sorted variables to be returned
    sorted_variables = list()  
    # all variables in the network encoded as a dictionary (keys: variable ids, values: Variable objects)
    variables = self.nodes  
    
    # remove this placeholder!
    # YOUR CODE HERE
    raise NotImplementedError()

    for node in sorted_variables:
        yield node

# bind the implemented iterator method to the BayesNet
BayesNet.__iter__ = __iter__

In such an ordering, no variable appears before its parents. Feel free to verify your implementation by comparing it to the figure at the top of this notebook (variable id `0` is $A$, `1` is $B$, and so on).

In [22]:
# iterate over all variables in a topological ordering
for variable in bayes_net:
    print(variable.id)

NotImplementedError: 
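The ordering property can also be checked mechanically rather than by eye. The following is a small standalone sketch: the helper `is_topological_order` is hypothetical, and the `parents` dictionary is written out by hand from the tables at the top of the notebook rather than read from the *BayesNet* object.

```python
from typing import Dict, Iterable, Set

def is_topological_order(order: Iterable[int], parents: Dict[int, Set[int]]) -> bool:
    """Return True if every variable appears exactly once and after all of its parents."""
    seen: Set[int] = set()
    for var_id in order:
        # Reject duplicates and variables whose parents have not all been seen yet.
        if var_id in seen or not parents[var_id] <= seen:
            return False
        seen.add(var_id)
    return seen == set(parents)

# Parent sets of the example network: A=0, B=1, C=2, D=3, E=4
parents = {0: set(), 1: {0}, 2: {3}, 3: {0, 1}, 4: {2}}

print(is_topological_order([0, 1, 3, 2, 4], parents))
print(is_topological_order([0, 1, 2, 3, 4], parents))
```

Once `__iter__` is implemented, the list `[variable.id for variable in bayes_net]` could be passed as `order`.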

### Computing the full joint distribution

In [25]:
fjdt = 1

for variable in bayes_net:
    fjdt = fjdt * variable.pdt
    print(variable.id, fjdt)

print(fjdt.shape, fjdt.sum())

NotImplementedError: 

We can simply multiply all conditional distribution tables: because every table uses the same axis order (one dimension per variable, sorted by id), the variables line up, and NumPy broadcasts across the singleton dimensions.
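The broadcasting step can be seen in isolation on just two variables. This sketch uses plain NumPy and the tables for $A$ and $B$ from above, with the axes arranged by hand instead of taken from the *Variable* objects.

```python
import numpy as np

A = np.array([0.2, 0.8])                    # P(A), axes: [A]
B_A = np.array([[0.9, 0.2], [0.1, 0.8]])    # P(B | A), axes: [B, A]

# Put both tables on the common axis order [A, B]; P(A) gets a singleton B axis.
pdt_A = A.reshape(2, 1)
pdt_B = B_A.T                               # axes: [A, B]

joint_AB = pdt_A * pdt_B                    # broadcasting yields P(A, B), shape (2, 2)
print(joint_AB)
print(joint_AB.sum())                       # a joint distribution sums to 1
print(joint_AB.sum(axis=0))                 # marginal P(B), obtained by summing out A
```

The same multiplication extends to all five variables once each table has one axis per variable.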

### Sampling given evidence
To obtain the conditional probability distribution of a variable given the evidence variables' values, call the *Variable* object (as if it were a function), and pass a sample with the evidence variables' values set as desired.

To demonstrate this, let us first create an uninitialized sample vector to hold the sampled value of each variable. Since it gives a value for each random variable in our world, it must have one entry per variable in the Bayesian Network:

In [None]:
sample = np.empty(len(bayes_net), np.int64)
# sample = [?, ?, ?, ?, ?]

Now, let's get the probability distribution $P(A)$ to sample a value for random variable $A$. We can do this by passing the sample to the *Variable* object, i.e.,

`bayes_net[_A_](sample)`.

In general, this type of call will look up the distribution of a variable conditioned on the evidence in the sample. Since $A$ has no parents, all the values in `sample` will be ignored, so we left them uninitialized above. We get the correct distribution for $A$ (given at the top of the notebook):

In [None]:
distribution = bayes_net[_A_](sample)
print("P(A) =", distribution)

We provide a function that can sample a value from such a distribution, called `sample_categorical`:

In [None]:
from utils import sample_categorical
a = sample_categorical(distribution)
print("a =", a)
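For readers without access to `utils`, a stand-in with the same purpose can be written with NumPy's random generator. This is only a sketch of what such a function might do; the name `sample_categorical_sketch` and its `rng` parameter are assumptions, and the provided `sample_categorical` may be implemented differently.

```python
import numpy as np

def sample_categorical_sketch(distribution: np.ndarray, rng: np.random.Generator) -> int:
    """Draw one index i with probability distribution[i]."""
    return int(rng.choice(len(distribution), p=distribution))

rng = np.random.default_rng(seed=0)
P_A = np.array([0.2, 0.8])

# Repeated draws should reproduce the distribution approximately.
draws = [sample_categorical_sketch(P_A, rng) for _ in range(10_000)]
print("fraction of a_1:", np.mean(draws))    # should be close to 0.8
```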

To sample further variables, we would update the value of $A$ in the sample:

In [17]:
sample[_A_] = a
# sample = [a, ?, ?, ?, ?]


Now that we have a value for random variable $A$, we can get the distribution $P(B \mid A = a)$ by again passing the (still incomplete) sample to the variable object of $B$:

In [None]:
distribution = bayes_net[_B_](sample)
print("P(B|A=a) =", distribution)

Again, all values for non-parent variables in the sample will be ignored, but the value of $a$ determines which column in the $P(B \mid A)$ table at the top of the notebook is returned.
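The lookup described here can be pictured as indexing into the expanded table. The sketch below is an assumption about how such a call might work, not the actual `Variable.__call__` implementation; `conditional_distribution` is a hypothetical helper, and the expanded table for $B$ is written out by hand.

```python
import numpy as np

def conditional_distribution(pdt: np.ndarray, var_id: int, sample: np.ndarray) -> np.ndarray:
    """Select the distribution over `var_id` given the parent values stored in `sample`."""
    index = []
    for axis, size in enumerate(pdt.shape):
        if axis == var_id:
            index.append(slice(None))         # keep the variable's own axis
        elif size == 1:
            index.append(0)                   # singleton axis: the sample value is ignored
        else:
            index.append(int(sample[axis]))   # parent: pick its value from the sample
    return pdt[tuple(index)]

# Expanded P(B | A) with axes ordered by id: [A, B, C, D, E]
pdt_B = np.array([[0.9, 0.1], [0.2, 0.8]]).reshape(2, 2, 1, 1, 1)

sample = np.zeros(5, dtype=np.int64)
sample[0] = 1                                 # evidence: A = a_1
print(conditional_distribution(pdt_B, var_id=1, sample=sample))
```

Only the entry for the parent $A$ influences the result; changing the entries for $C$, $D$, or $E$ returns the same distribution.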