<a href="https://colab.research.google.com/github/jrg94/CSE5522/blob/lab3/lab3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CSE 5522 - Lab 3
By Jeremy Grifski

In this lab, we'll take a look at Hidden Markov Models (HMMs) applied to Eisner's Ice Cream Problem.

## Part 1: Viterbi Algorithm

Implement the Viterbi algorithm for HMMs for Eisner's Ice Cream Problem (predict whether each day is hot or cold based on the number of ice creams eaten). Remember that the Viterbi algorithm computes the most likely sequence of hidden states for a given sequence of observations.

Your solution should be able to handle variable-length sequences (3 to 5 observations).

[This zip file has observation probabilities, transition probabilities, and test data for evaluation](https://osu.instructure.com/courses/76815/files/18485497/download). Please read the probabilities and observations from the files rather than hard-coding them. (This is so that we can test with different data/probabilities.)

In the observation and transition probability files, each row corresponds to a value of the variable of interest, and each column corresponds to a value of the conditioning variable. For example, P(2|H) is in the 3rd row (including header), 3rd column (including row label). The columns sum to 1.

The test data has one line per sequence. When a sequence is shorter than five observations, the trailing columns are filled with zeros.
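Because valid observations are 1, 2, or 3 ice creams, a 0 can only be padding. A minimal sketch of recovering the real sequence from a padded row follows; `strip_padding` is a hypothetical helper name, and the sample rows are copied from the `testData.csv` preview in step 1.2.

```python
def strip_padding(row):
    """Return the observations in `row` up to (not including) the first 0."""
    observations = []
    for value in row:
        if value == 0:  # 0 is never a real observation, so it marks padding
            break
        observations.append(int(value))
    return observations


# Rows copied from the testData.csv preview (SeqNumber column omitted).
for padded in ([2, 3, 3, 2, 3], [2, 3, 2, 2, 0], [2, 1, 1, 0, 0]):
    print(padded, "->", strip_padding(padded))
```

Stopping at the first zero, rather than filtering out every zero, keeps the helper from silently repairing a malformed row that has a zero in the middle.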

**1.0**: Let's set up the environment for data loading.

In [0]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

**1.1**: Now, we'll need to load all the data from the CSV files.

In [0]:
observation_dataframe = pd.read_csv("observationProbs.csv")
test_dataframe = pd.read_csv("testData.csv")
transition_dataframe = pd.read_csv("transitionProbs.csv")

**1.2**: Let's now take a peek at our data.

In [6]:
display(
    observation_dataframe.shape,
    observation_dataframe.head(), 
    test_dataframe.shape, 
    test_dataframe.head(),
    transition_dataframe.shape,
    transition_dataframe.head()
)

(3, 3)

Unnamed: 0,P(x|...),C,H
0,1,0.6407,0.0002
1,2,0.1481,0.5341
2,3,0.2122,0.4657


(10, 6)

Unnamed: 0,SeqNumber,Obs1,Obs2,Obs3,Obs4,Obs5
0,1,2,3,3,2,3
1,2,2,3,2,2,0
2,3,3,1,3,3,1
3,4,2,1,1,0,0
4,5,1,1,1,2,3


(3, 4)

Unnamed: 0,P(x|...),C,H,START
0,C,0.86,0.07,0.5
1,H,0.07,0.86,0.5
2,STOP,0.07,0.07,0.0


**1.3**: With our data loaded, we can begin to construct our M and C matrices. 

In [14]:
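# One row per observation slot. The test file pads every sequence to 5 observations,
# and SeqNumber is the only non-observation column (hence the - 1).
m_height = test_dataframe.shape[1] - 1
# One column per hidden state (C, H); the label column is excluded.
m_width = observation_dataframe.shape[1] - 1
m = np.zeros(shape=(m_height, m_width))
c = np.zeros(shape=(m_height - 1, m_width))
display(m, c)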

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

**1.4**: At this point, we can initialize the first row of the M matrix using the following formula: M<sub>1, k</sub> = π<sub>k</sub>B<sub>k,y<sub>1</sub></sub>. Here, π<sub>k</sub> is the prior probability of state k, and B<sub>k,y<sub>1</sub></sub> is the emission probability of the first observation y<sub>1</sub> in state k.

We can get π from the START column of our transition matrix. Meanwhile, we can get B from our observation data by looking up the first observation of a sequence in our test data.

In [32]:
COND_PROB_LABEL = "P(x|...)"
HOT_LABEL = "H"
COLD_LABEL = "C"
START_LABEL = "START"

p_hot_given_start = transition_dataframe.loc[transition_dataframe[COND_PROB_LABEL] == HOT_LABEL, START_LABEL].iloc[0]
p_cold_given_start = transition_dataframe.loc[transition_dataframe[COND_PROB_LABEL] == COLD_LABEL, START_LABEL].iloc[0]

# The observation value 2 is hard-coded here: it is Obs1 of the first test sequence.
p_two_scoops_given_hot = observation_dataframe.loc[observation_dataframe[COND_PROB_LABEL] == 2, HOT_LABEL].iloc[0]
p_two_scoops_given_cold = observation_dataframe.loc[observation_dataframe[COND_PROB_LABEL] == 2, COLD_LABEL].iloc[0]

display(
    p_hot_given_start,
    p_cold_given_start,
    p_two_scoops_given_hot,
    p_two_scoops_given_cold
)

0.5

0.5

0.5341

0.1481
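The initialization above extends naturally to the remaining rows. The following is a sketch, not the lab's reference solution, of how `m` and `c` could be filled and then backtracked. It assumes `c` stores the best predecessor state for each day, and it ignores the STOP transition (0.07 from either state in the preview, so it would not change which path wins). The probabilities are inlined from the previews in step 1.2 only to keep the example self-contained; the lab itself requires reading them from the CSV files. `STATES`, `prior`, `transition`, `emission`, and `viterbi` are illustrative names.

```python
import numpy as np

STATES = ["C", "H"]  # column order of m and c, matching the CSV headers

# Values copied from the previews in step 1.2.
prior = np.array([0.5, 0.5])                    # P(state | START)
transition = np.array([[0.86, 0.07],            # transition[j, k] = P(next = k | current = j)
                       [0.07, 0.86]])
emission = np.array([[0.6407, 0.1481, 0.2122],  # emission[k, y - 1] = P(y | state k)
                     [0.0002, 0.5341, 0.4657]])


def viterbi(observations):
    """Return (most likely state sequence, its probability) for observations in 1-3."""
    n = len(observations)
    m = np.zeros((n, len(STATES)))                 # m[t, k]: best path probability ending in k on day t
    c = np.zeros((n - 1, len(STATES)), dtype=int)  # c[t - 1, k]: best predecessor of k on day t

    # First row: M[1, k] = pi[k] * B[k, y1]
    m[0] = prior * emission[:, observations[0] - 1]
    for t in range(1, n):
        # scores[j, k] = m[t - 1, j] * P(k | j)
        scores = m[t - 1][:, None] * transition
        c[t - 1] = scores.argmax(axis=0)
        m[t] = scores.max(axis=0) * emission[:, observations[t] - 1]

    # Backtrack from the most probable final state.
    path = [int(m[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(c[t - 1, path[-1]]))
    path.reverse()
    return [STATES[k] for k in path], float(m[-1].max())


print(viterbi([2, 3, 3, 2, 3]))  # first test sequence from the preview
print(viterbi([2, 1, 1]))        # fourth test sequence with padding removed
```

Sizing `m` and `c` from the actual sequence length, instead of a fixed height of 5, is one way to handle the variable-length requirement without special-casing the zero padding inside the recurrence.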

## Part 2: Likelihood Sampling



Using the same network, implement likelihood sampling for approximate inference. For any test sequence, draw n complete sequences of the hidden states, where n can range from 10 to 100000. The goal is to approximate the likelihood of all possible sequences.

Assuming the Viterbi sequence is "correct", how long (how many samples) does it take the sampler to converge so that you get the highest match between samples and the Viterbi sequence?

How do I sample a sequence? In essence, pick a length (3, 4, or 5), using the same length as the test sequence you are evaluating. Then, sample each weather-day (Hot/Cold) according to the distribution given by the transition network. You will need to sample Day 1 before sampling Day 2, for example. You will then have a complete sampled sequence of length 3, 4, or 5. The weight of that sequence sample will be the product of the observation probabilities given the sample (why?). You can then judge by the overall weight which weather sequence is the most likely. Does the best string match your Viterbi answer?
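The procedure described above can be sketched as follows. This is one possible reading, not the required implementation: it sums the weights of identical sampled sequences and reports the sequence with the largest total. The probabilities are inlined from the previews in step 1.2 for self-containedness (the lab asks for them to be read from files), with STOP removed and the transitions renormalized as the note at the end of this part asks. `PRIOR`, `TRANSITION`, `EMISSION`, `sample_state`, and `likelihood_weighting` are hypothetical names.

```python
import random
from collections import defaultdict

# Values copied from the previews in step 1.2, with STOP dropped and the
# remaining transition probabilities renormalized (0.86 / 0.93 and 0.07 / 0.93).
PRIOR = {"C": 0.5, "H": 0.5}
TRANSITION = {"C": {"C": 0.86 / 0.93, "H": 0.07 / 0.93},  # TRANSITION[current][next]
              "H": {"C": 0.07 / 0.93, "H": 0.86 / 0.93}}
EMISSION = {"C": {1: 0.6407, 2: 0.1481, 3: 0.2122},       # EMISSION[state][observation]
            "H": {1: 0.0002, 2: 0.5341, 3: 0.4657}}


def sample_state(distribution, rng):
    """Draw one state from a {state: probability} mapping."""
    states = list(distribution)
    return rng.choices(states, weights=[distribution[s] for s in states])[0]


def likelihood_weighting(observations, n_samples, seed=0):
    """Return {state sequence: summed weight} over n_samples sampled sequences."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(n_samples):
        # Day 1 comes from the prior; each later day depends on the previous day.
        state = sample_state(PRIOR, rng)
        sequence = [state]
        weight = EMISSION[state][observations[0]]
        for y in observations[1:]:
            state = sample_state(TRANSITION[state], rng)
            sequence.append(state)
            weight *= EMISSION[state][y]  # weight = product of observation probabilities
        totals["".join(sequence)] += weight
    return dict(totals)


totals = likelihood_weighting([2, 3, 3, 2, 3], n_samples=10000)
best = max(totals, key=totals.get)
print(best, totals[best] / sum(totals.values()))
```

Rerunning with different values of `n_samples` and `seed` and comparing `best` against the Viterbi sequence is one way to explore the convergence question posed above.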

Note: Technically, in the original problem there is the probability of sampling STOP given either HOT or COLD.  For this section of the homework, please just remove the STOP probability and renormalize the other two probabilities so that they sum to one.
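As a small illustration of the renormalization in the note above, the sketch below drops the STOP entry from one column of the transition table and rescales what remains. `renormalize_without_stop` is a hypothetical helper name, and the column values are copied from the `transitionProbs.csv` preview in step 1.2.

```python
def renormalize_without_stop(column):
    """Drop the STOP entry from one transition column and rescale the rest to sum to 1."""
    kept = {state: p for state, p in column.items() if state != "STOP"}
    total = sum(kept.values())
    return {state: p / total for state, p in kept.items()}


# P(next | C) column copied from the transitionProbs.csv preview.
from_cold = {"C": 0.86, "H": 0.07, "STOP": 0.07}
print(renormalize_without_stop(from_cold))
```

With the previewed values, the two remaining probabilities become 0.86 / 0.93 and 0.07 / 0.93.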