# Binary Hidden Markov Model Filtering

In this notebook assignment, you will implement filtering on a simple HMM with binary states and evidence variables, walking through an example step-by-step.

You will also learn how an HMM can be represented as two matrices, so that both stages of filtering reduce to efficient matrix operations.

You will implement all of your code in the `BinaryHMM` class located in `mae6710/binhmm.py`.

## Some Preliminary Settings and Imports 

In [None]:
# Autoreload your code when it changes, without having to restart the kernel
%load_ext autoreload
%autoreload 2

import numpy as np
from matplotlib import pyplot as plt

from mae6710.binhmm import BinaryHMM

## Set up the Model

Let's take an example of whether your advisor hates you, which is the hidden variable. The evidence is whether they sent you a curt email. We will assume that their mood doesn't change throughout the day and that they send you one email per day. 

---
### Initial Probabilities - $P(X_1)$

Initially, we do not know anything about whether they hate you on Day 1 of the semester.

$$
P(X_1=hates) = 0.5
$$
And therefore
$$
P(X_1=\neg hates) = 0.5
$$

In other words, we start with a uniform belief at time $t=1$. From here on, we abbreviate these as $P(X_i=h)$ and $P(X_i=\neg h)$.

---
### Transition Model - $P(X_{i+1}|X_i)$


The good thing about this advisor is that they don't hold a grudge. Or, more likely, they don't remember stuff from one day to the next. This is lucky, because if they hate you at time $t=i$, you can't say much about $t=i+1$. It's a complete toss-up what will happen the next day.  

Formally

$$
P(X_{i+1}=h | X_i=h) = 0.5 \quad \text{and therefore}  \quad P(X_{i+1}=\neg h | X_i=h) = 0.5 \\
$$



If they don't hate you at $t=i$, your odds are actually better the next day. The chance of them hating you all of a sudden is small, a mere 10%. 


$$
P(X_{i+1}=h | X_i=\neg h) = 0.1 \quad \text{and therefore}  \quad  P(X_{i+1}=\neg h | X_i=\neg h) = 0.9 
$$

---
### Evidence (Sensor) Model - $P(e_i|X_i)$

The only way for us to guess if they hate us or not is based on whether they sent us a curt email. From talking to other students, we discover that there's a pattern to their email sending conditioned on their feelings for us. If they hate us, they are almost certain to send us a curt email:

$$
P(e_i=curt | X_i=h) = 0.99 \quad \text{and therefore}  \quad  P(e_i=\neg curt | X_i=h) = 0.01 \\
$$

Again, we will shorten to $P(e_i=c)$.

It turns out that if they don't hate us, they will still usually send a curt email, just because they're insanely busy. In fact that will happen 70% of the time they don't hate us.

$$
P(e_i=c | X_i=\neg h) = 0.7 \quad \text{and therefore}  \quad  P(e_i=\neg c | \ X_i=\neg h) = 0.3 \\
$$


### (a) Implement the initialization of your HMM class using the information above.

In [None]:
startprob = None
transmat = None
emissionprob = None

# Set up matrices representing the above model and store them in the above variables

# YOUR CODE HERE
raise NotImplementedError()

The current belief $B(X_1)$ should be 50%-50%.

In [None]:
model = BinaryHMM("Hates", startprob, transmat, emissionprob)
model.print_belief()

assert (model._belief == [0.5, 0.5]).all()

# Time Passage / Dynamics

You wait a day, and now have a new belief based on your Transition Model. This is $B^*(X_2)$, or the **intermediate** belief after time has passed, but **before** you got today's email. 

This time step belief update is given by

$$
B^*(X_{t+1}) = \sum_{x_t} B(x_t)P(X_{t+1} | x_t)
$$

### (b) Show your calculation of this updated belief using the data listed above. 

YOUR ANSWER HERE

### (c) Now implement the `transition` function of the `BinaryHMM` class. 

Note that this calculation can be compactly represented as a dot product of the transition matrix and the previous day's belief. 
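As a rough illustration of that note, the sketch below computes the time update twice, first as the explicit sum and then as a single matrix-vector product. The numbers are made up and are deliberately not the advisor model. The layout `trans[i, j] = P(X_{t+1}=j | X_t=i)` is an assumption; if your `BinaryHMM` stores the matrix the other way around, the transpose is not needed.

```python
import numpy as np

# Made-up numbers for illustration only (NOT the advisor model above).
# Assumed layout: trans[i, j] = P(X_{t+1} = j | X_t = i), so each row sums to 1.
trans = np.array([[0.7, 0.3],
                  [0.2, 0.8]])
belief = np.array([0.9, 0.1])  # B(X_t)

# Explicit sum over the previous state x_t, term by term.
predicted_loop = np.zeros(2)
for i in range(2):        # previous state x_t
    for j in range(2):    # next state X_{t+1}
        predicted_loop[j] += belief[i] * trans[i, j]

# The same computation as one matrix-vector product.
predicted = trans.T @ belief

print(predicted_loop)
print(predicted)
```

Both lines should print the same distribution, and it should still sum to 1 because each row of `trans` does.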


In [None]:
model.print_belief()

In [None]:
# Test your implementation to see that it results in the correct belief
model.transition()
model.print_belief()



# Evidence Observation

Now you got your daily email, and what do you know? It's curt. 

You want to update your belief based on the evidence and get $B(X_2)$.

This evidence-based belief update re-weights each belief value by the likelihood of the evidence given that value (the Bayesian "flip"). In other words, you take your intermediate belief $B^*(X_{t+1})$, multiply each value by the probability of seeing the specific evidence given that value, and then normalize by the total probability of the evidence, $P(e_{t+1})$, so that the result sums to 1.

Formally:

$$
B(X_{t+1}) = \frac{B^*(X_{t+1})P(e_{t+1} | X_{t+1})}{P(e_{t+1})}
$$

Remember that the denominator is just the sum of the numerator over all values of $X_{t+1}$. 
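The re-weight-and-normalize step can be sketched in a few lines of NumPy. Again the numbers are made up and are not the advisor model. The layout `emission[i, k] = P(e=k | X=i)` and the variable names are assumptions for this illustration, not necessarily what `BinaryHMM` uses.

```python
import numpy as np

# Made-up numbers for illustration only (NOT the advisor model above).
# Assumed layout: emission[i, k] = P(e = k | X = i), so each row sums to 1.
emission = np.array([[0.8, 0.2],
                     [0.3, 0.7]])
predicted = np.array([0.65, 0.35])  # intermediate belief B*(X_{t+1})
observed = 0                        # column index of the evidence value we saw

# Numerator: re-weight each belief value by the likelihood of the evidence.
unnormalized = predicted * emission[:, observed]

# Denominator: P(e_{t+1}), the sum of the numerator over both states.
evidence_prob = unnormalized.sum()

posterior = unnormalized / evidence_prob  # B(X_{t+1})
print(posterior)
```

Note that only one column of the emission matrix is used per update: the one matching the evidence actually observed.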

### (d) Show your calculation of this updated belief using the information listed above. 

YOUR ANSWER HERE

### (e) Now implement the `evidence` function of the `BinaryHMM` class.

In [None]:
model.evidence(True)
model.print_belief()



### (f) Analyze these results and comment

Did the evidence sway the outcome much? Why or why not? How did the transition model and the evidence affect the observed probabilities?  

YOUR ANSWER HERE

Let's see what happens when we run this for twenty days, with you receiving a curt email every day.

In [None]:
# Reset the model
model = BinaryHMM("Hates", startprob, transmat, emissionprob)

belief_seq = []
for i in range(20):
    
    model.transition()
    belief_seq.append(model._belief[0])

    model.evidence(True)
    belief_seq.append(model._belief[0])

plt.plot(belief_seq, 'r-')
plt.gca().set_ylim(0, .4)

With every day that passes, the probability of hatred goes down ($B^*$, before evidence), and with every curt email it goes back up. Eventually, your belief should settle into an oscillation between about 0.21 and 0.27. It converges because the evidence is the same every day. 
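
As a quick check of those two numbers, using only the model values given earlier, write $b = B(X_t=h)$ for the belief right after a curt email. One day of waiting followed by another curt email maps $b$ to $b'$:

$$
b^* = 0.5\,b + 0.1\,(1-b) = 0.1 + 0.4\,b, \qquad
b' = \frac{0.99\,b^*}{0.99\,b^* + 0.7\,(1-b^*)} = \frac{0.099 + 0.396\,b}{0.729 + 0.116\,b}
$$

At the steady state $b' = b$, which gives $0.116\,b^2 + 0.333\,b - 0.099 = 0$. The root in $[0, 1]$ is $b \approx 0.27$, and the corresponding pre-email belief is $b^* = 0.1 + 0.4\,b \approx 0.21$.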

What would happen if we get a long email all of a sudden? Try to guess before running the code.

In [None]:
model.transition()
belief_seq.append(model._belief[0])

model.evidence(False)
belief_seq.append(model._belief[0])

plt.plot(belief_seq, '-r')
plt.gca().set_ylim(0, .4)


You should see that, because a long email is so unlikely when the advisor hates you (0.01), even a single long email drastically reduces the belief $B(X_t=h)$ to less than 0.01. Numerically, the drop happens because the likelihood of that event (long email given hate) is multiplied by the already low probability of the hidden state (hating).
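
As a rough check of that figure, take the pre-email belief from the converged run, $B^*(X_t=h) \approx 0.21$, and apply the evidence update with the long-email likelihoods ($0.01$ given hate, $0.3$ given no hate):

$$
B(X_t=h) \approx \frac{0.01 \times 0.21}{0.01 \times 0.21 + 0.3 \times 0.79} = \frac{0.0021}{0.2391} \approx 0.009
$$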

Now experiment with different sequences and models of the system. 

What happens if, after one long email, responses go back to being curt? 

What if the advisor does hold grudges? How would you model that? 

What if they are likely to send long emails if they don't hate you? 

How do these changes affect the dynamics of the inference system?

Enjoy, and remember: your advisor is just busy!

![Average time spent composing one e-mail](http://phdcomics.com/comics/archive/phd072508s.gif)
