# Foundations of Data Science (GDW) 2023



# Exercise XI: Graphical Models

This week's exercise is about (probabilistic) graphical models, (P)GMs for short.

## Part 1: Monty Hall
Execute the code below to download and display the accompanying image:

In [None]:
from PIL import Image
from IPython.display import display
import urllib.request
from io import BytesIO 

url = 'https://brilliant-staff-media.s3-us-west-2.amazonaws.com/tiffany-wang/gWotbuEdYC.png'
# Use a separate name for the response so that `url` keeps holding the address.
with urllib.request.urlopen(url) as response:
    img = Image.open(BytesIO(response.read()))

display(img)


Consider the following problem:

Suppose you are on a game show, and you are given the choice of three doors: behind one door is a car; behind the others, goats.

You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

### Task 1.1
Think about the problem and come up with a justified answer.

*write your answer here*

Thankfully, we can also model the given problem with a Bayesian network.
The graph has three nodes, each a random variable whose value is a door number:
- Guest: the door selected by the guest
- Prize: the door hiding the prize (car)
- Monty: the door Monty chooses to open

For this, we first install the `pgmpy` library.

*Note: Other libraries can be used for this as well, each with a different feature set:
`pomegranate`, `pyAgrum` and `bnlearn` (an R package on CRAN).*

In [None]:
!pip install pgmpy

Let us now create a probabilistic model for the problem.

In [None]:
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD

# Defining the network structure
edgelist = [("Guest", "Monty"), ("Prize", "Monty")]
model = BayesianNetwork(edgelist)

# Defining the conditional probability distribution tables:
# Documentation can be found here: https://pgmpy.org/factors/discrete.html
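# Guest and Prize are uniform over the three doors; 1/3 (rather than 0.33) makes each column sum to exactly 1.
cpd_guest = TabularCPD("Guest", 3, [[1/3], [1/3], [1/3]])
cpd_prize = TabularCPD("Prize", 3, [[1/3], [1/3], [1/3]])
# Rows are Monty's door (0, 1, 2); columns run over (Guest, Prize) pairs, with Prize varying fastest.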
cpd_monty = TabularCPD("Monty", 3, [[0, 0, 0, 0, 0.5, 1, 0, 1, 0.5],
                                    [0.5, 0, 1, 0, 0, 0, 1, 0, 0.5],
                                    [0.5, 1, 0, 1, 0.5, 0, 0, 0, 0],
                                   ],
                       evidence=["Guest", "Prize"],
                       evidence_card=[3, 3],
                      )

# Associating the CPDs with the network structure.
model.add_cpds(cpd_guest, cpd_prize, cpd_monty)
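
The nine columns of Monty's table can be hard to read by eye. As a minimal sketch, the same numbers can be derived from the rule "Monty opens a door that is neither the guest's pick nor the prize door, choosing uniformly when two doors qualify". The helper name `monty_cpd_columns` is hypothetical and not part of `pgmpy`; the column order assumes the first evidence variable (Guest) varies slowest.

```python
def monty_cpd_columns(n_doors=3):
    """Map each (guest, prize) pair to the distribution over the door Monty opens."""
    columns = {}
    for guest in range(n_doors):
        for prize in range(n_doors):
            # Doors Monty is allowed to open for this (guest, prize) combination.
            allowed = [d for d in range(n_doors) if d != guest and d != prize]
            columns[(guest, prize)] = [
                1 / len(allowed) if d in allowed else 0.0 for d in range(n_doors)
            ]
    return columns


columns = monty_cpd_columns()
# One row per Monty state; columns ordered (0,0), (0,1), (0,2), (1,0), ... as (Guest, Prize).
rows = [[columns[(g, p)][m] for g in range(3) for p in range(3)] for m in range(3)]
for row in rows:
    print(row)
```

The printed rows can be compared entry by entry with the values passed to `TabularCPD` for `"Monty"`.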

We can also check the model for validity issues:

In [None]:
# check model validity
model.check_model()

Now we infer the posterior distribution of the prize location, given that the guest picked door No. 1 (index 0) and Monty opened door No. 3 (index 2):

In [None]:
# Inferring the posterior probability
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
posterior_prize = infer.query(["Prize"], evidence={"Guest": 0, "Monty": 2})
print(posterior_prize)

*write your answers here*
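
As a cross-check on the inference result, the game can also be simulated directly. The following is a minimal Monte Carlo sketch using only the standard library; the function name `play`, the seed and the number of rounds are arbitrary choices for illustration, and it assumes Monty picks uniformly at random whenever two doors are available to him.

```python
import random


def play(switch, rng):
    """Play one round with three doors; return True if the guest wins the car."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    guest = rng.choice(doors)
    # Monty opens a door that is neither the guest's pick nor the prize door.
    monty = rng.choice([d for d in doors if d != guest and d != prize])
    if switch:
        # Move to the only door that is still closed and not the current pick.
        guest = next(d for d in doors if d != guest and d != monty)
    return guest == prize


rng = random.Random(0)  # fixed seed so that repeated runs give the same estimates
n_rounds = 100_000
stay_rate = sum(play(False, rng) for _ in range(n_rounds)) / n_rounds
switch_rate = sum(play(True, rng) for _ in range(n_rounds)) / n_rounds
print(f"win rate when staying:   {stay_rate:.3f}")
print(f"win rate when switching: {switch_rate:.3f}")
```

The two estimates should be close to the corresponding entries of the posterior over `Prize` computed by `VariableElimination`.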

## Part 2: Conditional Independence
The structure of a Bayesian network encodes the conditional independences of the modelled distribution.
Every variable is conditionally independent of its non-descendants given its parents. This statement is
hard to apply directly. Instead, we provide an algorithm to test for conditional independence.

1. Construct the ancestral graph: keep all variables that occur in the question, their parents, their parents' parents, and so on. (You obtain a reduced version of the original network.)
2. Moralize the graph by connecting all parents of a common child with undirected edges; if a child has more than two parents, connect every pair of them.
3. Replace all directed edges with undirected edges.
4. Remove the given variables and their edges from the graph.
5. Two variables are conditionally independent if they are not connected by any path in the resulting graph.
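
The five steps above can be sketched in a few lines of plain Python. This is an illustrative sketch, not a library routine: the function name `conditionally_independent` is hypothetical, the input is an edge list of `(parent, child)` tuples like the one used for plotting, and the example graph `X -> Z <- Y`, `Z -> W` is made up for demonstration.

```python
from collections import defaultdict
from itertools import combinations


def conditionally_independent(edges, x, y, given=()):
    """Moral-graph test: is x independent of y given `given` in the DAG `edges`?"""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    # Step 1: ancestral graph of every variable that occurs in the question.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])

    # Steps 2 and 3: connect co-parents and drop all edge directions.
    neighbours = defaultdict(set)
    for child in keep:
        for parent in parents[child]:
            neighbours[parent].add(child)
            neighbours[child].add(parent)
        for p, q in combinations(parents[child], 2):
            neighbours[p].add(q)
            neighbours[q].add(p)

    # Step 4: the given variables are removed by never walking through them.
    blocked = set(given)

    # Step 5: depth-first search for any remaining path from x to y.
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for nxt in neighbours[node] - blocked - seen:
            seen.add(nxt)
            stack.append(nxt)
    return True


example_edges = [("X", "Z"), ("Y", "Z"), ("Z", "W")]
print(conditionally_independent(example_edges, "X", "Y"))         # no evidence
print(conditionally_independent(example_edges, "X", "Y", ["Z"]))  # common child observed
print(conditionally_independent(example_edges, "X", "W", ["Z"]))  # chain blocked at Z
```

In the example, `X` and `Y` are only linked through their common child `Z`, so observing `Z` (or its descendant `W`) adds the moral edge between them, while observing `Z` cuts the only route from `X` to `W`.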

Given the structure of the Bayesian network below, use the algorithm above to answer the questions
on conditional independence.

In [None]:
import networkx as nx
import matplotlib.pyplot as plt

edgelist = [("A", "C"), ("B", "C"), ("C", "E"), ("C", "D"), ("D", "F"), ("F", "G")]
labels = {"A": "A", "B": "B", "C": "C", "D": "D", "E": "E", "F": "F", "G": "G"}
G = nx.DiGraph(edgelist)
pos = nx.planar_layout(G)
nx.draw(G, pos, with_labels=False, node_size=800, node_color='#aaf4d9')
nx.draw_networkx_labels(G, pos, labels, font_size=22)
plt.show()

### Task 2.1
For each question, construct the reduced undirected graph by following the algorithm above.

- Are A and B conditionally independent given D and F?
- Are D and E conditionally independent given C?
- Are D and E conditionally independent given A and B?

*write your answer here*