# Bayesian Network: Basics

A **Bayesian (belief) network** is a powerful model for representing uncertainty about the world, including the **dependencies** between random variables (events) and the corresponding **(conditional) probabilities**.

A Bayesian network has a number of **advantages** for representing knowledge about an uncertain "world".

- Because the model encodes dependencies among all variables, it readily **handles situations where some data entries are missing**.
- A Bayesian network can be used to learn causal relationships, and hence can be used to **gain an understanding of a problem** and to **predict the consequences of intervention**.
- Because the model has both causal and probabilistic semantics, it is an ideal representation for **combining prior knowledge** (which often comes in causal form) and data.
- Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for **avoiding overfitting to data**.

In this tutorial, we will introduce the basics of Bayesian networks.

## The "Alarm World"

Let us consider the following "*alarm world*": you have installed an **alarm** system in your house to protect against **burglary**. However, New Zealand frequently has **earthquakes**, and the alarm can occasionally be set off by an earthquake as well. In addition, the alarm can be set off by mistake with a very small probability. You have two neighbours, **John** and **Mary**, who might call you if they hear the alarm from your house while you are away. They might also call you about other issues even if they do not hear the alarm. However, John and Mary do not know each other, so they will not communicate with each other about calling you.

<img src="img/alarm.png" width=300></img>

## Random Variables

In this "alarm world", there are five binary **random variables**.

- $B$: Whether a **b**urglar breaks into the house or not.
- $E$: Whether there is an **e**arthquake or not.
- $A$: Whether the **a**larm is set off or not.
- $J$: Whether your neighbour **J**ohn calls you or not.
- $M$: Whether your neighbour **M**ary calls you or not.

## Causal Dependencies

From domain knowledge, we have the following **causal dependencies** between the random variables.

- The alarm can be set off by a burglar.
- The alarm can be set off by an earthquake.
- Whether a burglar breaks into the house is independent of whether there is an earthquake.
- John might call you if he hears the alarm.
- Mary might call you if she hears the alarm.
- Since John and Mary do not communicate, **given the state of the alarm**, whether John calls is independent of whether Mary calls.
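In probability notation (a restatement of the two independence statements above using the variables $B, E, A, J, M$, not notation from the original text), the burglary/earthquake independence is a plain (marginal) independence, while the John/Mary independence holds only conditionally on $A$:

$$P(B, E) = P(B)\,P(E)$$

$$P(J, M \mid A) = P(J \mid A)\,P(M \mid A)$$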

## Directed Acyclic Graph

Based on the above domain knowledge, we can represent the random variables in the alarm world and their (in)dependencies by the following **Directed Acyclic Graph (DAG)**. 

<img src="img/alarm-dag.png" width=150></img>

We can see that each **node** represents a random variable, and each **directed edge** represents a causal dependency between the variables. For example, the directed edge $(B, A)$ means that the burglary variable is a cause of the alarm variable, and the alarm variable is an effect of the burglary variable.
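To make the structure concrete, here is a minimal Python sketch that stores the DAG as a map from each node to its parents (incoming neighbours) and uses Kahn's algorithm to confirm that the graph really is acyclic. The names `PARENTS`, `edges` and `topological_order` are hypothetical illustrations, not part of the original tutorial.

```python
from collections import deque

# The alarm-world DAG, stored as node -> list of parents.
# A directed edge (X, Y) in the figure means X is a parent of Y.
PARENTS = {
    "B": [],
    "E": [],
    "A": ["B", "E"],
    "J": ["A"],
    "M": ["A"],
}


def edges(parents):
    """Return the directed edges as (parent, child) pairs."""
    return [(p, child) for child, ps in parents.items() for p in ps]


def topological_order(parents):
    """Kahn's algorithm: return the nodes so that every parent comes before
    its children, or raise ValueError if the graph contains a cycle."""
    children = {node: [] for node in parents}
    indegree = {node: len(ps) for node, ps in parents.items()}
    for p, child in edges(parents):
        children[p].append(child)

    # Start from nodes with no parents (here B and E).
    queue = deque(node for node, d in indegree.items() if d == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # If some node was never reached, it lies on a cycle.
    if len(order) != len(parents):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


print(edges(PARENTS))
print(topological_order(PARENTS))
```

Storing parents rather than children is a deliberate choice: it mirrors how each node's conditional probabilities are indexed by the values of its parents.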

## (Conditional) Probabilities

The causal dependencies are qualitative. For quantitative reasoning, we need the (conditional) probabilities for each random variable in the DAG.

Again, from domain knowledge, we have the following probabilities.

- A burglar breaks into the house with probability 0.1%.
- An earthquake occurs with probability 0.2%.
- If there is both a burglar and an earthquake, the alarm is set off with probability 95%.
- If there is a burglar but no earthquake, the alarm is set off with probability 94%.
- If there is an earthquake but no burglar, the alarm is set off with probability 29%.
- If there is neither a burglar nor an earthquake, the alarm is set off by mistake with probability 0.1%.
- If the alarm is set off, John hears it and calls you with probability 90%.
- If the alarm is not set off, John calls you about other issues with probability 5%.
- If the alarm is set off, Mary hears it and calls you with probability 70%.
- If the alarm is not set off, Mary calls you about other issues with probability 1%.
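The probabilities listed above can be written down directly as lookup tables. This is a minimal sketch, assuming each binary variable is encoded as `True`/`False` and each table stores only $P(X = \text{True} \mid \text{parents})$, with the `False` case obtained as the complement; the names `P_B`, `P_E`, `P_A`, `P_J`, `P_M` and `prob` are hypothetical.

```python
# P(B = True) and P(E = True): B and E have no parents.
P_B = 0.001
P_E = 0.002

# P(A = True | B, E), keyed by the pair (B, E).
P_A = {
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}

# P(J = True | A) and P(M = True | A), keyed by the value of A.
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}


def prob(p_true, value):
    """Probability that a binary variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


# Probability that the alarm does NOT go off given a burglar and no earthquake.
print(prob(P_A[(True, False)], False))
```

Only one number per row needs to be stored because the two outcomes of a binary variable must sum to 1.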

Attaching these probabilities to the nodes of the DAG gives the following **Bayesian network** for the alarm world.

<img src="img/alarm-bn.png" width=500></img>
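With the network in place, the probability of any full assignment of the five variables is the product of one table entry per node, each conditioned on that node's parents: $P(B)\,P(E)\,P(A \mid B, E)\,P(J \mid A)\,P(M \mid A)$. This standard chain-rule factorization is not spelled out in the original text. The minimal Python sketch below applies it to the numbers above; the helper names `prob` and `joint` are hypothetical, and the True/False encoding is an assumption.

```python
from itertools import product

# Alarm-world tables: P(X = True | parents), using the numbers above.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}


def prob(p_true, value):
    """Probability that a binary variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as the product of each node's
    table entry given its parents."""
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a)
            * prob(P_J[a], j) * prob(P_M[a], m))


# Alarm rings and both neighbours call, but no burglar and no earthquake.
p = joint(False, False, True, True, True)
print(f"{p:.6f}")

# Sanity check: the probabilities of all 2**5 = 32 assignments sum to 1.
total = sum(joint(*values) for values in product([True, False], repeat=5))
print(round(total, 10))
```

By hand, the first case is $0.999 \times 0.998 \times 0.001 \times 0.9 \times 0.7 \approx 0.000628$.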

## Summary

From the above example, we can see that to define a Bayesian network, we need to define

1. A **Directed Acyclic Graph (DAG)**, where each **node** represents a **random variable** in the world, and each **directed edge** represents a **causal dependency** between two random variables.
2. A **Conditional Probability Table (CPT)** for each node $X$ in the graph. The table gives $P(X\ |\ parents(X))$ for every combination of values of $parents(X)$, the parents (incoming neighbours) of $X$ in the graph, which are the direct causes of $X$.
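Taken together, these two ingredients define the full joint distribution over the variables $X_1, \dots, X_n$ of the network through the standard Bayesian-network factorization (a well-known result, stated here in the notation above rather than taken from the original text):

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i\ |\ parents(X_i))$$

For the alarm world, this reads $P(B, E, A, J, M) = P(B)\,P(E)\,P(A\ |\ B, E)\,P(J\ |\ A)\,P(M\ |\ A)$.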

---

More tutorials can be found [here](https://github.com/meiyi1986/tutorials).

[Yi Mei's homepage](https://meiyi1986.github.io/)