By: Noah Crowley

Case ID: nwc17

## The Problem

As a student who lives on campus at CWRU, I hear the fire alarm all too regularly. It would be helpful to know the most likely cause given just the information that I, as a resident, have access to. It would be especially helpful to know whether there had been an actual fire.

My body of evidence variables is limited. I can only really know the following:

- Whether the fire department showed up
  - Fd ∈ {True, False}
- Whether we were allowed back inside quickly
  - Quick ∈ {True, False}
- If smoke is seen coming from any of the windows
  - Smoke ∈ {True, False}
  
And the most common possibilities that I know of are as follows:

- Someone made a simple mistake, such as making easy mac without water or spraying too much air freshener
  - Mistake ∈ {True, False}
- It was all just a fire drill (this does not rule out another cause occurring at the same time)
  - Drill ∈ {True, False}
- An actual fire happened that could have hurt people
  - Fire ∈ {True, False}

This DAG illustrates the situation more graphically:

<img src="images/exercise_1_dag.png" width="500">
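The same structure can be written down as plain data. This is only a sketch: the edge list is copied from the model definition used later in this write-up, and `parents_of` is a hypothetical helper introduced here for illustration.

```python
# Edges of the DAG, written as (parent, child) pairs: causes point at evidence.
EDGES = [
    ("Mistake", "Fd"), ("Fire", "Fd"),
    ("Mistake", "Quick"), ("Drill", "Quick"),
    ("Mistake", "Smoke"), ("Fire", "Smoke"),
]


def parents_of(edges):
    """Group parents by child, keeping the order in which the edges are listed."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, []).append(parent)
    return parents


PARENTS = parents_of(EDGES)
for child, parent_list in PARENTS.items():
    print(f"{child} <- {', '.join(parent_list)}")
```

Each evidence variable has exactly two parents, which is why each conditional probability table below only needs four columns.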

## Defining Probabilities

While I am no expert on these matters, I will trust my intuition to set the prior and conditional probabilities. These numbers could later be replaced with estimates from an actual data set.

For now, my priors are as follows:

$$
\begin{align}
P(Mistake) & = 0.75 \\
P(Drill) & = 0.35 \\
P(Fire) & = 0.05 \\
\end{align}
$$

I will also assume that the three causes are mutually independent, so that every joint probability of these variables equals the product of the individual probabilities. That is:

$$
\begin{align}
P(Mistake, Fire) & = P(Mistake)P(Fire) \\
P(Mistake, Drill) & = P(Mistake)P(Drill) \\
P(Fire, Drill) & = P(Fire)P(Drill) \\
P(Mistake, Fire, Drill) & = P(Mistake)P(Fire)P(Drill)
\end{align}
$$
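As a quick sanity check on this assumption, the joint prior over all eight combinations of causes can be built by multiplying marginals. This is a minimal sketch using the priors above; the names `PRIORS` and `joint_prior` are my own and not part of any library.

```python
from itertools import product

# Prior probability that each cause is True, as stated above.
PRIORS = {"Mistake": 0.75, "Drill": 0.35, "Fire": 0.05}


def joint_prior(assignment):
    """P(assignment) under independence: the product of the marginals."""
    p = 1.0
    for cause, value in assignment.items():
        p *= PRIORS[cause] if value else 1.0 - PRIORS[cause]
    return p


causes = list(PRIORS)
table = {
    values: joint_prior(dict(zip(causes, values)))
    for values in product([True, False], repeat=len(causes))
}

total = sum(table.values())
print(f"sum over all 8 combinations = {total:.6f}")
print(f"P(no cause at all)          = {table[(False, False, False)]:.6f}")
```

The eight products sum to 1, as a valid joint distribution must, even though the three priors themselves add up to more than 1. That is fine because the causes are not mutually exclusive.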

Finally, I can produce the following tables, which give the probability that each evidence variable is True for every combination of its parents:

| Variable | Value | Value | Value | Value |
| :--- | ---: | ---: | ---: | ---: |
| __Mistake__ | **True** | **True** | **False** | **False** |
| __Fire__ | **True** | **False** | **True** | **False** |
| $P(Fd = True \mid Mistake, Fire)$ | 0.99 | 0.95 | 0.99 | 0.00 |

| Variable | Value | Value | Value | Value |
| :--- | ---: | ---: | ---: | ---: |
| __Mistake__ | **True** | **True** | **False** | **False** |
| __Drill__ | **True** | **False** | **True** | **False** |
| $P(Quick = True \mid Mistake, Drill)$ | 0.15 | 0.35 | 0.99 | 0.00 |

| Variable | Value | Value | Value | Value |
| :--- | ---: | ---: | ---: | ---: |
| __Mistake__ | **True** | **True** | **False** | **False** |
| __Fire__ | **True** | **False** | **True** | **False** |
| $P(Smoke = True \mid Mistake, Fire)$ | 0.90 | 0.15 | 0.85 | 0.00 |

## Utilization of Probability Theory

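In theory, the posterior follows in five steps: marginalize over the unobserved causes, apply the definition of conditional probability, use the independence of the three causes, use the conditional independence of the evidence variables given the causes, and finally drop the causes that are not parents of each evidence variable: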

$$
\begin{align}
P(Fire \mid Fd, Quick, Smoke) & = \sum_{Mistake} \sum_{Drill} P(Fire, Mistake, Drill \mid Fd, Quick, Smoke) \\
                              & = \sum_{Mistake} \sum_{Drill} \dfrac{P(Fire, Mistake, Drill, Fd, Quick, Smoke)}{P(Fd, Quick, Smoke)}\\
                              & = \dfrac
                      {P(Fire) \sum_{Mistake} P(Mistake) \sum_{Drill} P(Drill) P(Fd, Quick, Smoke \mid Fire, Mistake, Drill)}
                      {\sum_{Fire}P(Fire) \sum_{Mistake} P(Mistake) \sum_{Drill} P(Drill) P(Fd, Quick, Smoke \mid Fire, Mistake, Drill)}\\
                              & = \dfrac
                      {P(Fire) \times \sum_{Mistake} P(Mistake) \times \sum_{Drill} P(Drill) \times P(Fd \mid Fire, Mistake, Drill) \times P(Quick \mid Fire, Mistake, Drill) \times P(Smoke \mid Fire, Mistake, Drill)}
                      {\sum_{Fire}P(Fire) \times \sum_{Mistake} P(Mistake) \times \sum_{Drill} P(Drill) \times P(Fd \mid Fire, Mistake, Drill) \times P(Quick \mid Fire, Mistake, Drill) \times P(Smoke \mid Fire, Mistake, Drill)}\\
                              & = \dfrac
                      {P(Fire) \times \sum_{Mistake} P(Mistake) \times P(Fd \mid Fire, Mistake) \times P(Smoke \mid Fire, Mistake) \times \sum_{Drill} P(Drill) \times P(Quick \mid Mistake, Drill)}{\sum_{Fire}P(Fire) \times \sum_{Mistake} P(Mistake) \times P(Fd \mid Fire, Mistake) \times P(Smoke \mid Fire, Mistake) \times \sum_{Drill} P(Drill) \times P(Quick \mid Mistake, Drill)}
\end{align}
$$

To make this easier, I will take the numerator and turn it into a function of the value of Fire and the evidence variables:

$$
f(Fire, Fd, Quick, Smoke) = P(Fire) \times \sum_{Mistake} P(Mistake) \times P(Fd \mid Fire, Mistake) \times P(Smoke \mid Fire, Mistake) \times \sum_{Drill} P(Drill) \times P(Quick \mid Mistake, Drill)
$$

So that now my posterior simplifies:

$$
P(Fire \mid Fd, Quick, Smoke) = \dfrac{f(Fire, Fd, Quick, Smoke)}{\sum_{Fire} f(Fire, Fd, Quick, Smoke)}
$$
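The function $f$ translates almost line for line into plain Python. The following is a sketch rather than a definitive implementation: the dictionaries simply restate the priors and tables above, and the names `prob`, `f`, and `posterior_fire` are my own choices.

```python
# Prior probability that each cause is True.
PRIOR = {"Mistake": 0.75, "Drill": 0.35, "Fire": 0.05}

# P(evidence = True | parents), copied from the tables above.
P_FD = {(True, True): 0.99, (True, False): 0.95,      # key: (Mistake, Fire)
        (False, True): 0.99, (False, False): 0.00}
P_QUICK = {(True, True): 0.15, (True, False): 0.35,   # key: (Mistake, Drill)
           (False, True): 0.99, (False, False): 0.00}
P_SMOKE = {(True, True): 0.90, (True, False): 0.15,   # key: (Mistake, Fire)
           (False, True): 0.85, (False, False): 0.00}


def prob(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def f(fire, fd, quick, smoke):
    """Unnormalized posterior weight of Fire = fire given the evidence."""
    total = 0.0
    for mistake in (True, False):
        # Inner sum over Drill; it only involves the Quick table.
        quick_term = sum(
            prob(PRIOR["Drill"], drill) * prob(P_QUICK[(mistake, drill)], quick)
            for drill in (True, False)
        )
        total += (prob(PRIOR["Mistake"], mistake)
                  * prob(P_FD[(mistake, fire)], fd)
                  * prob(P_SMOKE[(mistake, fire)], smoke)
                  * quick_term)
    return prob(PRIOR["Fire"], fire) * total


def posterior_fire(fd, quick, smoke):
    """P(Fire = True | evidence): f normalized over both values of Fire."""
    numerator = f(True, fd, quick, smoke)
    return numerator / (numerator + f(False, fd, quick, smoke))


print(posterior_fire(True, True, True))
```

Because the sum over Drill is nested inside the sum over Mistake, the code mirrors the factorization in the formula instead of enumerating the full joint distribution.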

To actually compute the value for the case where all three evidence variables are True, it is easiest to first compute $f(Fire, Fd, Quick, Smoke)$ and $f(\bar{Fire}, Fd, Quick, Smoke)$ separately:

$$
\begin{align}
f(Fire, Fd, Quick, Smoke) & = 0.05 \times (
0.75 \times 0.99 \times 0.90 \times (0.35 \times 0.15 + 0.65 \times 0.35) + 
0.25 \times 0.99 \times 0.85 \times (0.35 \times 0.99 + 0.65 \times 0.00)
)\\
                          & = 0.05 \times (0.66825 \times (0.0525 + 0.2275) + 0.210375 \times (0.3465 + 0))\\
                          & = 0.05 \times (0.66825 \times 0.28 + 0.210375 \times 0.3465)\\
                          & = 0.05 \times (0.18711 + 0.0728949375)\\
                          & = 0.05 \times (0.2600049375)\\
                          & \approx 0.013
\end{align}
$$

$$
\begin{align}
f(\bar{Fire}, Fd, Quick, Smoke) & = 0.95 \times (
0.75 \times 0.95 \times 0.15 \times (0.35 \times 0.15 + 0.65 \times 0.35) +
0.25 \times 0 \times 0 \times (0.35 \times 0.99 + 0.65 \times 0.00)
)\\
                                & = 0.95 \times (0.106875 \times (0.0525 + 0.2275) + 0 \times (0.3465 + 0))\\
                                & = 0.95 \times (0.106875 \times 0.28 + 0 \times 0.3465)\\
                                & = 0.95 \times (0.029925 + 0)\\
                                & = 0.95 \times 0.029925\\
                                & \approx 0.028
\end{align}
$$

$$
\begin{align}
P(Fire \mid Fd, Quick, Smoke) & = \dfrac{f(Fire, Fd, Quick, Smoke)}{\sum_{Fire} f(Fire, Fd, Quick, Smoke)} \\ \\
                              & \approx \dfrac{0.013}{0.013 + 0.028}\\
                              & = \dfrac{0.013}{0.041}\\
                              & \approx 0.317
\end{align}
$$

## Using Python

While I trust my mathematical skills, I would rather not compute every value by hand. I would like to generate the posteriors from whatever I observe the next time a fire alarm goes off, which is a job for a short program. I will write the model up in Python using the pgmpy library.

In [64]:
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD as tcpd

bayes_model = BayesianModel([
    ('Mistake', 'Fd'), ('Fire', 'Fd'), ('Mistake', 'Quick'), ('Drill', 'Quick'), ('Mistake', 'Smoke'), ('Fire', 'Smoke')
])

# Priors are listed as [P(False), P(True)]: state 0 is False and state 1 is True.
prior_mistake = tcpd(variable = 'Mistake', variable_card = 2, values = [[0.25, 0.75]])
prior_drill = tcpd(variable = 'Drill', variable_card = 2, values = [[0.65, 0.35]])
prior_fire = tcpd(variable = 'Fire', variable_card = 2, values = [[0.95, 0.05]])

cpd_fd = tcpd(
    variable = 'Fd', variable_card = 2, evidence = ['Mistake', 'Fire'], evidence_card = [2, 2],
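    # Columns follow (Mistake, Fire) = (0, 0), (0, 1), (1, 0), (1, 1).
    values = [
        [1.00, 0.01, 0.05, 0.01],  # P(Fd = 0 | Mistake, Fire)
        [0.00, 0.99, 0.95, 0.99]   # P(Fd = 1 | Mistake, Fire)
    ]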
)

cpd_quick = tcpd(
    variable = 'Quick', variable_card = 2, evidence = ['Mistake', 'Drill'], evidence_card = [2, 2],
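    # Columns follow (Mistake, Drill) = (0, 0), (0, 1), (1, 0), (1, 1).
    values = [
        [1.00, 0.01, 0.65, 0.85],  # P(Quick = 0 | Mistake, Drill)
        [0.00, 0.99, 0.35, 0.15]   # P(Quick = 1 | Mistake, Drill)
    ]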
)

cpd_smoke = tcpd(
    variable = 'Smoke', variable_card = 2, evidence = ['Mistake', 'Fire'], evidence_card = [2, 2],
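    # Columns follow (Mistake, Fire) = (0, 0), (0, 1), (1, 0), (1, 1).
    values = [
        [1.00, 0.15, 0.85, 0.10],  # P(Smoke = 0 | Mistake, Fire)
        [0.00, 0.85, 0.15, 0.90]   # P(Smoke = 1 | Mistake, Fire)
    ]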
)

bayes_model.add_cpds(prior_mistake, prior_drill, prior_fire, cpd_fd, cpd_quick, cpd_smoke)
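The column order of the conditional tables is the easiest thing to get wrong here. As I understand pgmpy's `TabularCPD` layout, the columns enumerate the evidence states with the last listed evidence variable changing fastest; that is an assumption worth checking against the installed version. The sketch below uses only the standard library, and `cpd_columns` is a hypothetical helper, not a pgmpy function.

```python
from itertools import product


def cpd_columns(evidence, cards):
    """Evidence states in assumed column order: first variable slowest, last fastest."""
    return [
        dict(zip(evidence, states))
        for states in product(*(range(card) for card in cards))
    ]


columns = cpd_columns(["Mistake", "Fire"], [2, 2])

# Second row of cpd_fd above, i.e. P(Fd = 1 | column).
fd_true_row = [0.00, 0.99, 0.95, 0.99]

for column, p in zip(columns, fd_true_row):
    print(column, "->", p)
```

Reading the printed pairs against the first probability table confirms the mapping: for example, Mistake = 1 and Fire = 0 lines up with 0.95.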

To make sure that the CPDs are consistent with the graph and that their probabilities sum to 1, I should check my Bayesian model.

In [65]:
bayes_model.check_model()

True

## Checking Theory with Python

Now that I've got my model into a computer program, I want to check that it agrees with the hand calculation above.

In [66]:
from pgmpy.inference import VariableElimination as proc

inference_result = proc(bayes_model)
print(inference_result.query(['Fire'], {'Fd' : 1, 'Quick' : 1, 'Smoke' : 1}) ['Fire'])

╒════════╤═════════════╕
│ Fire   │   phi(Fire) │
╞════════╪═════════════╡
│ Fire_0 │      0.6862 │
├────────┼─────────────┤
│ Fire_1 │      0.3138 │
╘════════╧═════════════╛


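This value, 0.3138, is close to my hand-computed value, 0.317. The small gap comes from rounding the two values of $f$ to three decimal places before dividing; without that rounding, the hand calculation gives the same 0.3138. This gives me confidence that I entered the values into the model correctly.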

Now I can create a quick function that I can run to get the probability of any cause given my evidence:

In [67]:
def get_cause_posterior(cause_name, fd, quick, smoke):
    """Return P(cause_name = True | evidence) from the model built above."""
    # pgmpy states are integers: 0 for False, 1 for True.
    evidence = {'Fd' : int(bool(fd)), 'Quick' : int(bool(quick)), 'Smoke' : int(bool(smoke))}

    query_result = inference_result.query([cause_name], evidence)

    # Index 1 holds the probability that the cause is True.
    return query_result[cause_name].values[1]

And, lastly, I will check that my function gives me the correct result:

In [68]:
print(get_cause_posterior('Fire', True, True, True))

0.31379584000608274


And it appears to be working!
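For an independent cross-check that does not rely on pgmpy at all, the same posterior can be obtained by brute-force enumeration of the full joint distribution. This is a sketch under the same priors and tables as above; `CPT`, `joint`, and `cause_posterior_by_enumeration` are names I made up for this example.

```python
from itertools import product

# Prior probability that each cause is True.
PRIOR = {"Mistake": 0.75, "Drill": 0.35, "Fire": 0.05}

# For each evidence variable: (parents, P(evidence = True | parent values)).
CPT = {
    "Fd": (("Mistake", "Fire"), {(True, True): 0.99, (True, False): 0.95,
                                 (False, True): 0.99, (False, False): 0.00}),
    "Quick": (("Mistake", "Drill"), {(True, True): 0.15, (True, False): 0.35,
                                     (False, True): 0.99, (False, False): 0.00}),
    "Smoke": (("Mistake", "Fire"), {(True, True): 0.90, (True, False): 0.15,
                                    (False, True): 0.85, (False, False): 0.00}),
}


def prob(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(causes, evidence):
    """P(causes, evidence): priors times each evidence term given its parents."""
    p = 1.0
    for name, value in causes.items():
        p *= prob(PRIOR[name], value)
    for name, value in evidence.items():
        parents, table = CPT[name]
        p *= prob(table[tuple(causes[parent] for parent in parents)], value)
    return p


def cause_posterior_by_enumeration(cause_name, fd, quick, smoke):
    """P(cause_name = True | evidence), summing the joint over all 8 cause combinations."""
    evidence = {"Fd": bool(fd), "Quick": bool(quick), "Smoke": bool(smoke)}
    names = list(PRIOR)
    matching = total = 0.0
    for values in product([True, False], repeat=len(names)):
        causes = dict(zip(names, values))
        p = joint(causes, evidence)
        total += p
        if causes[cause_name]:
            matching += p
    return matching / total


for cause in PRIOR:
    print(cause, round(cause_posterior_by_enumeration(cause, True, True, True), 4))
```

Enumeration touches every combination of causes, so it scales poorly compared with variable elimination, but with only three binary causes it is a cheap way to confirm the library's answer.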