# CSC421 Assignment 4 - Part I Discrete Bayesian Networks (5 points) #
### Author: George Tzanetakis 

This notebook is based on the supporting material for topics covered in **Chapter 14 Probabilistic Reasoning** from the book *Artificial Intelligence: A Modern Approach.* 

This part relies on the code in the provided notebook `probability.ipynb`.

```
Misunderstanding of probability may be the greatest of all impediments
to scientific literacy.

Gould, Stephen Jay
```



In [1]:
from probability import *
from utils import print_table
from notebook import psource, pseudocode, heatmap
from timeit import default_timer as timer

## Introduction 

In this part of assignment 4 we will be exploring Discrete Bayesian Networks (DBNs) and answering queries with exact and approximate inference methods. We will be using the following network: 

<img src="dispnea.png">

## Question 4.1A (Minimum) (CSC421 - 1 point, CSC581C - 0 points)

Using the conventions for DBNs used in probability.ipynb (from the AIMA authors), encode the dyspnea network shown above. Once you have constructed the Bayesian network, display the CPT for the Lung Cancer node (using the API provided, not just showing the numbers).


In [10]:
dyspnea = BayesNet([
    ('Asia', '', 0.01),
    ('Smoker', '', 0.5),
    ('TB', ['Asia'], {True: 0.05, False: 0.01}),
    ('Cancer', ['Smoker'], {True: 0.1, False: 0.01}),
    ('Bronchitis', ['Smoker'], {True: 0.6, False: 0.03}),
    ('Either', ['TB', 'Cancer'], {(True, True): 1, (True, False): 1, (False, True): 1, (False, False): 0}),
    ('Xray', ['Either'], {True: 0.98, False: 0.05}),
    ('Dyspnea', ['Either', 'Bronchitis'], {(True, True): 0.9, (True, False): 0.7, (False, True): 0.8, (False, False): 0.1})
])

dyspnea.variable_node('Cancer').cpt

{(True,): 0.1, (False,): 0.01}

## Question 4.1B (Minimum) (CSC421 - 1 point, CSC581C - 0 points)

Answer using exact inference with enumeration the following query: given that a
patient has been in Asia and has a positive xray, what is the likelihood of having dyspnea?

Write down using markdown the expression that corresponds to this query and the corresponding 
numbers from the CPT. Calculate the result using a calculator. 

Write code for the same query using *enumeration_ask* and confirm that the result is the same for the same query. 

P(Dyspnea = true | Asia = true, Xray = true)
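
One way to write the query out in full, assuming the CPT values encoded in the `BayesNet` cell above. Lowercase letters $a$, $x$, $d$ denote the value true for Asia, Xray and Dyspnea; $s$, $b$, $t$, $c$, $e$ range over both values of Smoker, Bronchitis, TB, Cancer and Either. The constant $P(a)$ is absorbed into the normalizer $\alpha$.

$$
P(d \mid a, x) = \alpha \sum_{s} P(s) \sum_{b} P(b \mid s) \sum_{t} P(t \mid a) \sum_{c} P(c \mid s) \sum_{e} P(e \mid t, c)\, P(x \mid e)\, P(d \mid e, b)
$$

Because Either is a deterministic OR of TB and Cancer, $P(e \mid a, s) = 1 - (1 - 0.05)(1 - P(c \mid s))$, which is $0.145$ for smokers and $0.0595$ for non-smokers. Summing out Bronchitis gives $P(d \mid e, s) = 0.6 \cdot 0.9 + 0.4 \cdot 0.7 = 0.82$ and $P(d \mid \neg e, s) = 0.6 \cdot 0.8 + 0.4 \cdot 0.1 = 0.52$ for smokers, and $0.706$ and $0.121$ for non-smokers. Then:

$$
\begin{aligned}
P(d, x \mid a) &= 0.5\,(0.145 \cdot 0.98 \cdot 0.82 + 0.855 \cdot 0.05 \cdot 0.52) + 0.5\,(0.0595 \cdot 0.98 \cdot 0.706 + 0.9405 \cdot 0.05 \cdot 0.121) \approx 0.092804 \\
P(x \mid a) &= 0.5\,(0.145 \cdot 0.98 + 0.855 \cdot 0.05) + 0.5\,(0.0595 \cdot 0.98 + 0.9405 \cdot 0.05) \approx 0.145093 \\
P(d \mid a, x) &= \frac{0.092804}{0.145093} \approx 0.6396
\end{aligned}
$$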

In [16]:
enumeration_ask('Dyspnea', {'Asia': True, 'Xray': True}, dyspnea)[True]

0.6396226028223374
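
As a cross-check that does not depend on the `probability` module, the same query can be answered by summing the full joint distribution directly. This is a minimal sketch, not the AIMA implementation: the `network` dictionary, `joint_probability` and `brute_force_ask` are hypothetical names introduced here, and the CPT values are copied from the `BayesNet` cell above.

```python
from itertools import product

# CPTs copied from the BayesNet cell above: each entry is
# (parents, {parent values: P(var = True | parent values)}).
network = {
    'Asia':       ((), {(): 0.01}),
    'Smoker':     ((), {(): 0.5}),
    'TB':         (('Asia',), {(True,): 0.05, (False,): 0.01}),
    'Cancer':     (('Smoker',), {(True,): 0.1, (False,): 0.01}),
    'Bronchitis': (('Smoker',), {(True,): 0.6, (False,): 0.03}),
    'Either':     (('TB', 'Cancer'), {(True, True): 1, (True, False): 1,
                                      (False, True): 1, (False, False): 0}),
    'Xray':       (('Either',), {(True,): 0.98, (False,): 0.05}),
    'Dyspnea':    (('Either', 'Bronchitis'), {(True, True): 0.9, (True, False): 0.7,
                                              (False, True): 0.8, (False, False): 0.1}),
}

def joint_probability(assignment):
    """Probability of one complete assignment: one CPT entry per node."""
    p = 1.0
    for var, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[parent] for parent in parents)]
        p *= p_true if assignment[var] else 1 - p_true
    return p

def brute_force_ask(query_var, evidence):
    """Return P(query_var = True | evidence) by summing over all hidden variables."""
    hidden = [v for v in network if v != query_var and v not in evidence]
    totals = {True: 0.0, False: 0.0}
    for query_value in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, query_var: query_value,
                          **dict(zip(hidden, values))}
            totals[query_value] += joint_probability(assignment)
    return totals[True] / (totals[True] + totals[False])

posterior = brute_force_ask('Dyspnea', {'Asia': True, 'Xray': True})
print(round(posterior, 6))  # 0.639623
```

The sum visits 2^5 assignments of the hidden variables for each value of the query, which is fine for eight nodes but grows exponentially with network size.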

## Question 4.1C (Expected) 1 point 

Answer the same query using variable elimination, i.e. the function *elimination_ask*. Compare the timing of *enumeration_ask* and *elimination_ask* for this query using %%timeit.

In [19]:
elimination_ask('Dyspnea', {'Asia': True, 'Xray': True}, dyspnea)[True]

0.6396226028223374

In [42]:
start = timer()
enumeration_ask('Dyspnea', {'Asia': True, 'Xray': True}, dyspnea)[True]
end = timer()
print(f'enumeration: {end - start}')

start = timer()
elimination_ask('Dyspnea', {'Asia': True, 'Xray': True}, dyspnea)[True]
end = timer()
print(f'elimination: {end - start}')

enumeration: 0.0020841000000473286
elimination: 0.004639200000042365
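
A single `timer()` pair measures one call, so the result is sensitive to noise such as caching and other processes. A hedged sketch of a more stable measurement with the standard `timeit` module follows; `best_time_per_call` and `stand_in_workload` are hypothetical names introduced for illustration, and the stand-in function only exists so the block runs on its own.

```python
import timeit

def best_time_per_call(func, *args, repeat=5, number=100):
    """Best average seconds per call over `repeat` batches of `number` calls."""
    batches = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(batches) / number

# Stand-in workload so the sketch is self-contained; in the notebook you would
# pass enumeration_ask or elimination_ask with the query arguments instead.
def stand_in_workload(n):
    return sum(i * i for i in range(n))

seconds = best_time_per_call(stand_in_workload, 1000)
print(f'stand-in workload: {seconds:.2e} s per call')
```

Taking the minimum over batches is a common convention because the slower batches mostly reflect interference from the rest of the system rather than the code under test.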


## QUESTION 4.1D (Expected) 1 point

Answer the same query using approximate inference, with both rejection sampling and likelihood weighting. Compare the runtime of the two approximate inference algorithms and the two exact inference algorithms for this query.

In [72]:
start = timer()
rejection_sampling('Dyspnea', {'Asia': True, 'Xray': True}, dyspnea)[True]
end = timer()
print(f'rejection sampling: {end - start}')

rejection sampling: 0.12401279999994586


In [73]:
start = timer()
likelihood_weighting('Dyspnea', {'Asia': True, 'Xray': True}, dyspnea, 2000)[True]
end = timer()
print(f'likelihood weighting: {end - start}')

likelihood weighting: 0.024634999999989304


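Algorithms ranked from fastest to slowest (single-run wall-clock time, in seconds):
1. enumeration: 0.0020841000000473286
2. elimination: 0.004639200000042365
3. likelihood weighting: 0.024634999999989304
4. rejection sampling: 0.12401279999994586

These are single measurements, so the ordering of the two exact methods, which differ by only a few milliseconds, may not be stable across runs. Likelihood weighting was run with N = 2000 samples while rejection sampling used its default sample count, so the two sampling timings are not directly comparable.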
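
To see why the two sampling methods behave so differently on this query, the following self-contained sketch reimplements both in plain Python. It is an illustration under stated assumptions, not the AIMA code: `network`, `rejection_sample` and `likelihood_weight` are hypothetical names, and the CPT values are copied from the `BayesNet` cell above. With the evidence Asia = true (prior 0.01) and Xray = true, only about 0.01 × 0.145 ≈ 0.15% of prior samples are consistent with the evidence, so rejection sampling discards almost everything, while likelihood weighting uses every sample.

```python
import random

# Same CPTs as the BayesNet cell above, listed in topological order:
# (parents, {parent values: P(var = True | parent values)}).
network = {
    'Asia':       ((), {(): 0.01}),
    'Smoker':     ((), {(): 0.5}),
    'TB':         (('Asia',), {(True,): 0.05, (False,): 0.01}),
    'Cancer':     (('Smoker',), {(True,): 0.1, (False,): 0.01}),
    'Bronchitis': (('Smoker',), {(True,): 0.6, (False,): 0.03}),
    'Either':     (('TB', 'Cancer'), {(True, True): 1, (True, False): 1,
                                      (False, True): 1, (False, False): 0}),
    'Xray':       (('Either',), {(True,): 0.98, (False,): 0.05}),
    'Dyspnea':    (('Either', 'Bronchitis'), {(True, True): 0.9, (True, False): 0.7,
                                              (False, True): 0.8, (False, False): 0.1}),
}

def rejection_sample(query_var, evidence, n, rng):
    """Sample every node from the prior; keep only samples matching the evidence."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        sample = {}
        for var, (parents, cpt) in network.items():
            p_true = cpt[tuple(sample[p] for p in parents)]
            sample[var] = rng.random() < p_true
        if all(sample[var] == value for var, value in evidence.items()):
            counts[sample[query_var]] += 1
    kept = counts[True] + counts[False]
    estimate = counts[True] / kept if kept else float('nan')
    return estimate, kept

def likelihood_weight(query_var, evidence, n, rng):
    """Fix the evidence nodes and weight each sample by the evidence likelihood."""
    weights = {True: 0.0, False: 0.0}
    for _ in range(n):
        sample, weight = dict(evidence), 1.0
        for var, (parents, cpt) in network.items():
            p_true = cpt[tuple(sample[p] for p in parents)]
            if var in evidence:
                weight *= p_true if evidence[var] else 1 - p_true
            else:
                sample[var] = rng.random() < p_true
        weights[sample[query_var]] += weight
    return weights[True] / (weights[True] + weights[False])

rng = random.Random(0)  # fixed seed so repeated runs give the same estimates
evidence = {'Asia': True, 'Xray': True}
estimate_lw = likelihood_weight('Dyspnea', evidence, 20000, rng)
estimate_rs, kept = rejection_sample('Dyspnea', evidence, 20000, rng)
print(f'likelihood weighting: {estimate_lw:.3f}')
print(f'rejection sampling:   {estimate_rs:.3f} from {kept} kept samples')
```

The number of kept samples reported by `rejection_sample` shows how little of the sampling effort contributes to the estimate when the evidence is unlikely.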

## QUESTION 4.1E (Advanced) 1 point 

A Naive Bayes classifier can be viewed as a Bayesian network. Classification then amounts to setting all the feature variables as evidence and querying the probability of the class. Express the Bernoulli Naive Bayes classifier you implemented in the previous assignment as a Bayesian network using the probability.ipynb conventions. Once you have a DBN, express and solve the classification problem as a query and repeat all the previous steps for this problem. More specifically, answer the query and show the results using exact inference by enumeration, exact inference by variable elimination, approximate inference by rejection sampling, and approximate inference by likelihood weighting.


In [17]:
# polarity      awful    bad    boring    dull    effective    enjoyable    great    hilarious
# ----------  -------  -----  --------  ------  -----------  -----------  -------  -----------
# pos           0.034  0.28      0.054   0.025        0.154        0.096    0.485        0.132
# neg           0.122  0.545     0.175   0.101        0.086        0.054    0.32         0.059

review = BayesNet([
    ('Positive', '', 0.5),
    ('Negative', '', 0.5),
    ('Aweful', ['Positive', 'Negative'], {(True, True): 0.0, (True, False): 0.034, (False, True): 0.122, (False, False): 0.0}),
    ('Bad', ['Positive', 'Negative'], {(True, True): 0.0, (True, False): 0.28, (False, True): 0.545, (False, False): 0.0}),
    ('Boring', ['Positive', 'Negative'], {(True, True): 0.0, (True, False): 0.054, (False, True): 0.175, (False, False): 0.0}),
    ('Dull', ['Positive', 'Negative'], {(True, True): 0.0, (True, False): 0.025, (False, True): 0.101, (False, False): 0.0}),
    ('Effective', ['Positive', 'Negative'], {(True, True): 0.0, (True, False): 0.154, (False, True): 0.086, (False, False): 0.0}),
    ('Enjoyable', ['Positive', 'Negative'], {(True, True): 0.0, (True, False): 0.096, (False, True): 0.054, (False, False): 0.0}),
    ('Great', ['Positive', 'Negative'], {(True, True): 0.0, (True, False): 0.485, (False, True): 0.32, (False, False): 0.0}),
    ('Hilarious', ['Positive', 'Negative'], {(True, True): 0.0, (True, False): 0.132, (False, True): 0.059, (False, False): 0.0}),
])

for method in [enumeration_ask, elimination_ask, rejection_sampling, likelihood_weighting]:
    print(method.__name__)
    print(f'Positive (Great, Hilarious): {method("Positive", {"Great": True, "Hilarious": True}, review)[True]}')
    print(f'Positive (Aweful, Bad): {method("Positive", {"Aweful": True, "Bad": True}, review)[True]}')
    print(f'Negative (Great, Hilarious): {method("Negative", {"Great": True, "Hilarious": True}, review)[True]}')
    print(f'Negative (Aweful, Bad): {method("Negative", {"Aweful": True, "Bad": True}, review)[True]}\n')

enumeration_ask
Positive (Great, Hilarious): 0.7722557297949337
Positive (Aweful, Bad): 0.1252466780686752
Negative (Great, Hilarious): 0.22774427020506635
Negative (Aweful, Bad): 0.8747533219313248

elimination_ask
Positive (Great, Hilarious): 0.7722557297949337
Positive (Aweful, Bad): 0.1252466780686752
Negative (Great, Hilarious): 0.22774427020506635
Negative (Aweful, Bad): 0.8747533219313248

rejection_sampling
Positive (Great, Hilarious): 0.7751196172248804
Positive (Aweful, Bad): 0.1746987951807229
Negative (Great, Hilarious): 0.22026431718061673
Negative (Aweful, Bad): 0.8873873873873874

likelihood_weighting
Positive (Great, Hilarious): 0.7700235141288507
Positive (Aweful, Bad): 0.1253805393908918
Negative (Great, Hilarious): 0.2310604322888647
Negative (Aweful, Bad): 0.8752757621931632
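
The exact results above can be checked by applying Bayes' rule directly to the observed words. This is a minimal sketch with hypothetical names (`likelihood`, `prior`, `posterior`); the probabilities are copied from the table in the cell above, and the equal priors match the 0.5 used in the network. Words that are not part of the evidence are marginalized out, so they contribute a factor of 1 and can be omitted.

```python
# P(word present | class), copied from the table in the cell above.
likelihood = {
    'pos': {'awful': 0.034, 'bad': 0.28, 'great': 0.485, 'hilarious': 0.132},
    'neg': {'awful': 0.122, 'bad': 0.545, 'great': 0.32, 'hilarious': 0.059},
}
prior = {'pos': 0.5, 'neg': 0.5}

def posterior(observed_words):
    """P(class | observed words are present), normalized over both classes."""
    scores = {}
    for label in prior:
        score = prior[label]
        for word in observed_words:
            score *= likelihood[label][word]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

great_hilarious = posterior(['great', 'hilarious'])
awful_bad = posterior(['awful', 'bad'])
print({label: round(p, 4) for label, p in great_hilarious.items()})
print({label: round(p, 4) for label, p in awful_bad.items()})
```

The values agree with the `enumeration_ask` and `elimination_ask` outputs, which suggests that the two-node class encoding (`Positive` and `Negative` with zero-probability rows) behaves like a single binary class variable for these queries.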



## QUESTION 4.1F (ADVANCED) (CSC421 - 0 points, CSC581C - 1 point)

This question requires that you have completed the previous question 4.1E. Compare the four inference algorithms on DBNs, as well as the standard Bernoulli Naive Bayes classifier you implemented in assignment 3, in terms of two aspects: classification accuracy and run-time. For both the accuracy comparison and the run-time comparison (using %%timeit), use the training set of positive and negative reviews for testing.

In [9]:
# YOUR CODE GOES HERE 

## QUESTION 4.1G (ADVANCED) (CSC421 - 0 points, CSC581C - 1 point)

Encode both the Dyspnea DBN and the Naive Bayes network from the previous question as DBNs using the *pgmpy* package for Probabilistic Graphical Models in Python. Answer the same queries you did above using variable elimination.

http://pgmpy.org/

In [None]:
# YOUR CODE GOES HERE 