# Monty Hall problem (fragments of code originally from pgmpy.org)

First make sure that all necessary libraries are installed and imported (more information about pgmpy, including tutorials, can be found on https://pgmpy.org/):

In [None]:
!pip install pgmpy

In [None]:
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

import networkx as nx
import matplotlib.pyplot as plt

---
The Monty Hall problem was just explained to you. Recall that there are three variables:

C, your choice of door

H, door opened by host

P, door with real prize

Each of these variables has 3 values (0,1,2) representing the three different doors.

---

First, define the Bayesian network structure by passing the list of directed edges, each written as a (parent, child) pair:

In [None]:
# Define the network structure:
# DIY: replace the (directed) edges ("Vi","Vj") with relevant edges among C, H and P
model = BayesianNetwork([("V1", "V2"), ("V3", "V4"), ("V5","V6"), ("V7","V8")])

# Draw the just defined structure:
bngraph = nx.DiGraph(model.edges())
nx.draw(bngraph, with_labels=True)
plt.show()
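
Each edge is a (parent, child) pair, so the edge list also fixes which variables each CPD must later be conditioned on. A minimal plain-Python sketch of that bookkeeping follows. The rain/sprinkler network and the helper name parents_from_edges are hypothetical and not part of this exercise:

```python
from collections import defaultdict

# Hypothetical toy network, unrelated to the exercise: each tuple is (parent, child).
edges = [("Rain", "WetGrass"), ("Sprinkler", "WetGrass")]

def parents_from_edges(edge_list):
    """Map every node to the list of its parents, in edge-list order."""
    parents = defaultdict(list)
    for parent, child in edge_list:
        parents[parent]  # touch the key so root nodes appear with an empty list
        parents[child].append(parent)
    return dict(parents)

parents = parents_from_edges(edges)
for node, node_parents in parents.items():
    print(node, "<-", node_parents)
```

Nodes that end up with an empty parent list are the ones whose CPD needs no evidence.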

When the network structure is done, you can define the associated (conditional) probability distributions using TabularCPD. To explain its syntax, consider the general form for a 2-valued variable V1 with 2-valued parents V2...Vn:

TabularCPD("V1", 2, [ [p(V1=0|V2=0,...,Vn=0), p(V1=0|V2=0,...,Vn=1), ..., p(V1=0|V2=1,...,Vn=1)],
                      [p(V1=1|V2=0,...,Vn=0), p(V1=1|V2=0,...,Vn=1), ..., p(V1=1|V2=1,...,Vn=1)] ],
           evidence=["V2",...,"Vn"],
           evidence_card=[2,...,2]
          )

The first argument is the variable's name, followed by its cardinality (the number of possible values). Next comes a list of lists with probabilities. The evidence argument lists the parents, and evidence_card gives their cardinalities in the same order. Each inner list corresponds to one value v of V1 (one row of the printed CPD) and contains the probability of v for every combination of parent values (one column per combination). The order in which you list the parents in evidence determines the column order: the last parent listed changes fastest. The probabilities in each column must sum to 1. A variable without parents has a single column, e.g. [[p(V1=0)], [p(V1=1)]].
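
To see which parent configuration each column of the probability lists refers to, you can enumerate the configurations yourself. The sketch below assumes the ordering shown in the general form above (first parent changes slowest, last parent changes fastest). The helper name cpd_column_order and the cardinalities are made up for illustration:

```python
from itertools import product

def cpd_column_order(evidence, evidence_card):
    """List the parent configurations in column order (last parent changes fastest)."""
    ranges = [range(card) for card in evidence_card]
    return [dict(zip(evidence, states)) for states in product(*ranges)]

# Hypothetical example: two parents, "V2" with 2 values and "V3" with 3 values.
columns = cpd_column_order(["V2", "V3"], [2, 3])
for index, config in enumerate(columns):
    print(index, config)
```

Every inner list of a CPD with these parents would then need six entries, one per printed configuration.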

Let's add and check the CPDs one by one:


In [None]:
# DIY: define the CPD for variable "C"         
cpd_c = TabularCPD("C", 3, [ [ ], 
                             [ ], 
                             [ ]  ],
                   evidence=[],
                   evidence_card=[] )


# Associate the CPD with the network structure:
model.add_cpds(cpd_c)

# Print the CPDs defined so far
for i in model.get_cpds():
    print(i)

In [None]:
# DIY: define the CPD for variable "P"
cpd_p = TabularCPD("P", 3, [ [ ], 
                             [ ], 
                             [ ]  ],
                   evidence=[],
                   evidence_card=[] )

# Associate the CPD with the network structure:
model.add_cpds(cpd_p)

# Print the CPDs defined so far
for i in model.get_cpds():
    print(i)

In [None]:
# DIY: define the CPD for variable "H" (also fill in evidence and evidence_card with H's parents)

cpd_h = TabularCPD("H", 3, [ [ ], 
                             [ ], 
                             [ ]  ],
                   evidence=[],
                   evidence_card=[] )


# Associate the CPD with the network structure:
model.add_cpds(cpd_h)

# Print the CPDs defined so far
for i in model.get_cpds():
    print(i)

In [None]:
# Check the model structure and the associated CPDs: returns True if the model is valid, otherwise raises an exception
model.check_model()
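
If check_model() complains, a common cause is a CPD column whose probabilities do not sum to 1. You can test a table by hand before passing it to TabularCPD. This is a plain-Python sketch of that single check, not pgmpy's implementation. The helper name columns_sum_to_one and the example numbers are made up:

```python
import math

def columns_sum_to_one(values, tol=1e-9):
    """Return True if every column of a list-of-rows probability table sums to 1."""
    n_columns = len(values[0])
    if any(len(row) != n_columns for row in values):
        return False  # ragged table: all rows must have the same length
    return all(
        math.isclose(sum(row[j] for row in values), 1.0, abs_tol=tol)
        for j in range(n_columns)
    )

# Hypothetical 2-valued variable with one 2-valued parent (numbers are made up).
good_table = [[0.9, 0.2],
              [0.1, 0.8]]
bad_table = [[0.9, 0.2],
             [0.2, 0.8]]  # first column sums to 1.1
print(columns_sum_to_one(good_table))
print(columns_sum_to_one(bad_table))
```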

Now that the model is specified, we can compute probabilities from it. We can enter evidence into the network:
- the door of your choice (e.g. C=0 for the first door; change below if you want), and 
- the door opened by the host (e.g. H=2 for the third door; change below if you want). 

Then we compute the posterior distribution over P given the evidence to determine which door is most likely to have the real prize behind it.

Are you going to switch doors?

In [None]:
# Inferring the posterior over P given C and H
infer = VariableElimination(model)
posterior_p = infer.query(["P"], evidence={"C": 0, "H": 2})
print(posterior_p)

We can also enter evidence for P and H and compute a posterior for C. What is the interpretation of this posterior?

In [None]:
# Inferring the posterior over C given P and H
posterior_c = infer.query(["C"], evidence={"P": 1, "H": 2})
print(posterior_c)
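
Once you have your own results, both queries can be cross-checked without pgmpy by enumerating the joint distribution directly. The sketch below assumes the standard rules: your choice and the prize location are uniform and independent, and the host opens a door that is neither your choice nor the prize, picking uniformly at random when two such doors exist. All helper names are made up for illustration:

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)

def host_prob(h, c, p):
    """P(H=h | C=c, P=p): the host opens a door that is neither chosen nor the prize."""
    allowed = [d for d in DOORS if d != c and d != p]
    return Fraction(1, len(allowed)) if h in allowed else Fraction(0)

def joint(c, p, h):
    """P(C=c, P=p, H=h), assuming uniform and independent C and P."""
    return Fraction(1, 3) * Fraction(1, 3) * host_prob(h, c, p)

def posterior(target, evidence):
    """Posterior over `target` ("C", "P" or "H") given a dict of observed values."""
    scores = {d: Fraction(0) for d in DOORS}
    for c, p, h in product(DOORS, repeat=3):
        world = {"C": c, "P": p, "H": h}
        if all(world[var] == val for var, val in evidence.items()):
            scores[world[target]] += joint(c, p, h)
    total = sum(scores.values())  # probability of the evidence
    return {d: s / total for d, s in scores.items()}

post_p = posterior("P", {"C": 0, "H": 2})
post_c = posterior("C", {"P": 1, "H": 2})
print("P | C=0, H=2:", post_p)
print("C | P=1, H=2:", post_c)
```

If your CPDs encode the same rules, the values printed by VariableElimination should agree with these fractions.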