# Day 18 notebook

The objectives of this notebook are to practice using a hidden Markov model (HMM) to

* simulate sequences
* calculate the (log) joint probability of a sequence and path of hidden states
* predict a most probable hidden path of states for a sequence 

In [1]:
# Modules used in this activity
import random  # used by sample_categorical
import math    # for log

## A `HiddenMarkovModel` class
In this activity we will implement a hidden Markov model as a class.  You will be implementing three methods of this class:

* one to simulate sequences from the hidden Markov model (`simulate`),
* another that computes the (log) joint probability of a hidden state path and observed sequence (`log_joint_probability`), 
* and one that computes the Viterbi dynamic programming matrix for an observed sequence (`viterbi_matrix`).

As in the Markov chain activity, each state is represented by a single character, and a path of states is a string of state characters.  An observed sequence is likewise a string of characters.  The transition probability matrix, initial probabilities, and emission probability matrix are indexed by integers: a state's index is its position within the model's state string, and an observed character's index is its position within the model's character string.  For your convenience, methods are provided that convert a string of state or observed characters to a list of indices and vice versa.
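
To make the index convention concrete, here is a minimal stand-alone sketch. The helper names `encode` and `decode` are hypothetical and exist only for this illustration; the class provides its own `encode_*` and `decode_*` methods. The alphabets match the casino example used later in the notebook.

```python
# Hypothetical stand-alone illustration of the index encoding; the helper
# names below are assumptions and are not part of the notebook's class.
states = "FL"       # state alphabet: index 0 is 'F', index 1 is 'L'
chars = "123456"    # observed alphabet: index 0 is '1', ..., index 5 is '6'

def encode(alphabet, string):
    """Map each character to its index within the alphabet string."""
    return [alphabet.index(ch) for ch in string]

def decode(alphabet, indices):
    """Map each index back to its character in the alphabet string."""
    return "".join(alphabet[i] for i in indices)

path_indices = encode(states, "FFL")   # [0, 0, 1]
roll_indices = encode(chars, "626")    # [5, 1, 5]
round_trip = decode(states, path_indices)   # "FFL"
```

Note that the die face `'6'` has index 5, not 6, so an index is what you use to look up a column of the emission matrix, while the character is what appears in the sequence string.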

The `HiddenMarkovModel` class that we implement will not explicitly represent a begin state.  Instead, it represents the probability of starting in each state with an `initial_probs` list of probabilities, as in the `MarkovModel` class.  In addition, we will not represent an end state in our class.
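
With no explicit begin or end state, the joint probability of a hidden path $\pi = \pi_1 \ldots \pi_L$ and an observed sequence $x = x_1 \ldots x_L$ can be written as follows. This is a sketch in the conditional-probability notation of the class docstrings; the symbol $L$ for the sequence length is a notational choice made here.

$$
P(x, \pi) = P(\pi_1)\, P(x_1 \mid \pi_1) \prod_{i=2}^{L} P(\pi_i \mid \pi_{i-1})\, P(x_i \mid \pi_i)
$$

The first factor comes from `initial_probs`, the transition factors from `transition_prob_matrix`, and the emission factors from `emission_prob_matrix`. There is no final factor for a transition into an end state.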

Parameters for the occasionally dishonest casino described in the lecture and textbook are provided below and are used to construct an instance of the `HiddenMarkovModel` class.

In [3]:
class HiddenMarkovModel:
    def __init__(self, states, chars, 
                 transition_prob_matrix, initial_probs, emission_prob_matrix):
        """Initializes a HiddenMarkovModel
        
        Models represented by this class do not explicitly represent a begin state and do
        not allow for an end state.
        
        Args:
            states: a string giving the characters representing the hidden states
                of the model (1 character per state)
            chars: a string giving the set of characters possibly emitted by the
                states of the model
            transition_prob_matrix: a list of lists of probabilities representing a
                transition probability matrix. transition_prob_matrix[s][t] should equal 
                P(pi_i = t | pi_{i-1} = s). Row s is thus the conditional probability 
                distribution P(pi_i | pi_{i-1} = s). The indices in this matrix correspond 
                to the indices of the states in the states argument
            initial_probs: a list of probabilities representing the initial state 
                probabilities. Entry s of this list is P(pi_1 = s), i.e., the probability that
                the first hidden state in the chain is s.  The indices of this list correspond to the
                indices of the states in the states argument.
            emission_prob_matrix: a list of lists of probabilities representing an emission
                probability matrix.  emission_prob_matrix[s][c] should equal 
                P(X_i = c | pi_i = s), i.e., the probability of state s emitting character c. 
                Row s is thus the conditional probability distribution P(X_i | pi_i = s).
                The row indices of this matrix correspond to the indices of the states in
                the states argument.  The column indices of the matrix correspond to the 
                indices of the characters in the chars argument.
        
        """
        self.states = states
        self.chars = chars
        self.transition_prob_matrix = transition_prob_matrix
        self.initial_probs = initial_probs
        self.emission_prob_matrix = emission_prob_matrix
        
        # Precompute log-transformations of the model parameters
        # to avoid computing these many times
        self.log_transition_prob_matrix = log_transform_matrix(transition_prob_matrix)
        self.log_initial_probs = log_transform_vector(initial_probs)
        self.log_emission_prob_matrix = log_transform_matrix(emission_prob_matrix)
    
    def encode_states(self, state_sequence):
        """Encodes a string of state characters as a list of indices of the states."""
        return [self.states.index(char) for char in state_sequence]

    def decode_states(self, indices):
        """Decodes a sequence of state indices into a string of the state characters."""
        return "".join(self.states[index] for index in indices)

    def encode_sequence(self, sequence):
        """Encodes a string of observed characters as a list of indices of the characters."""
        return [self.chars.index(char) for char in sequence]

    def decode_sequence(self, indices):
        """Decodes a sequence of observed character indices into a string of characters."""
        return "".join(self.chars[index] for index in indices)
    
    def simulate(self, length):
        """Simulates a sequence of hidden states and emitted characters of
        the given length from this HMM.
        
        Args:
            length: the length of the sequence to simulate
        Returns:
            A tuple of the form (hidden_state_string, char_string) where hidden_state_string is a
            string of state characters and char_string is a string of observed characters.
        """
        ###
        ### YOUR CODE HERE
        state_indices = [None] * length
        char_indices = [None] * length
        for i in range(length):
            state_probs = self.transition_prob_matrix[state_indices[i - 1]] if i > 0 else self.initial_probs
            state_indices[i] = sample_categorical(state_probs)
            char_indices[i] = sample_categorical(self.emission_prob_matrix[state_indices[i]])
            
        return (self.decode_states(state_indices), self.decode_sequence(char_indices))
        ###
        
    def log_joint_probability(self, hidden_state_string, char_string):
        """Calculates the (natural) log joint probability of a path of hidden states
        and an observed sequence given this HMM.
        
        Args:
            hidden_state_string: a string representing the sequence of hidden states (pi)
            char_string: a string representing the sequence of observed characters (X)
        Returns:
            log(P(hidden_states, observed_chars))
        """
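        # zip() below would silently truncate to the shorter input, so check lengths first
        if len(hidden_state_string) != len(char_string):
            raise ValueError("hidden_state_string and char_string must have equal length")
        state_indices = self.encode_states(hidden_state_string)
        char_indices = self.encode_sequence(char_string)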

        ###
        ### YOUR CODE HERE
        log_p = 0.0
        last_state_index = None
        for state_index, char_index in zip(state_indices, char_indices):
            if last_state_index is None:
                log_p += self.log_initial_probs[state_index]
            else:
                log_p += self.log_transition_prob_matrix[last_state_index][state_index]
            log_p += self.log_emission_prob_matrix[state_index][char_index]
            last_state_index = state_index
        return log_p
        ###
        
    def most_probable_path(self, char_string):
        """Computes a most probable path of hidden states for the observed sequence."""
        V = self.viterbi_matrix(char_string)
        return self.viterbi_traceback(V)
        
    def viterbi_matrix(self, char_string):
        """Computes the (log-transformed) Viterbi dynamic programming matrix V for
        the given observed sequence.

        Args:
            char_string: a string representing the sequence of observed characters (X)
        Returns:
            A matrix (list of lists) representing the Viterbi dynamic programming matrix,
            with rows corresponding to states and columns corresponding to positions in the
            sequence.
        """
        char_indices = self.encode_sequence(char_string)
        
        # Initialize the Viterbi dynamic programming matrix.
        # The entry V[k][i] corresponds to the subproblem V_k(i+1), where i is a
        # 0-based index (e.g., V[k][0] corresponds to the subproblem of the most
        # probable path of the prefix of length 1). Because we do not explicitly
        # represent the begin or end states, we do not explicitly store the
        # initialization values described in the textbook and lecture.
        V = matrix(len(self.states), len(char_string))
        if not char_string: return V
        
        # initialization (first position in sequence)
        for ell in range(len(self.states)):    # loop over hidden state indices
            V[ell][0] = (self.log_initial_probs[ell] + 
                         self.log_emission_prob_matrix[ell][char_indices[0]])

        # main fill stage
        for i in range(1, len(char_string)):    # loop over positions
            for ell in range(len(self.states)): # loop over hidden state indices
                ###
                ### YOUR CODE HERE
                V[ell][i] = (self.log_emission_prob_matrix[ell][char_indices[i]] + 
                             max(V[k][i - 1] + self.log_transition_prob_matrix[k][ell]
                                 for k in range(len(self.states))))
                ###

        return V
    
    def viterbi_traceback(self, V):
        """Computes a most probable path given a (log) Viterbi dynamic programming matrix.
        
        Uses a traceback procedure that does not require traceback pointers.  In the case of
        ties, this traceback prefers the state with the largest index.
        
        Args:
            V: A matrix (list of lists) representing the Viterbi dynamic programming matrix
               containing log-transformed values.
        Returns:
            A string representing a most probable sequence of hidden states
        """
        L = len(V[0])               # deduce the length of the sequence from # columns in V
        if L == 0: return ""        # empty string base case
        state_indices = [None] * L  # initialize hidden state path
        
        # determine the state at the last position in a most probable path
        max_prob, max_state = max((V[k][L - 1], k) for k in range(len(self.states)))
        state_indices[L - 1] = max_state
        
        # traceback from this last state by redoing the recurrence calculation at each step.
        # the emission probabilities are not included in the calculations because they are
        # irrelevant for determining the maximizing state
        for i in range(L - 1, 0, -1):
            max_prob, max_state = max((V[k][i - 1] + self.log_transition_prob_matrix[k][max_state], k)
                                      for k in range(len(self.states)))
            state_indices[i - 1] = max_state
            
        # return string representation of hidden state path
        return self.decode_states(state_indices)

def log_transform_vector(v):
    """Returns a new vector (a list) with log-transformed values"""
    return list(map(math.log, v))

def log_transform_matrix(m):
    """Returns a new matrix (a list of lists) with log-transformed values"""
    return list(map(log_transform_vector, m))

def round_matrix(m, digits=2):
    """Returns a new matrix (a list of lists) with rounded values"""
    return [round_vector(v, digits) for v in m]
    
def round_vector(v, digits=2):
    """Returns a new vector (a list) with rounded values"""
    return [round(x, digits) for x in v]

def matrix(num_rows, num_cols, initial_value=None):
    """Constructs a matrix (a list of lists)"""
    return [[initial_value] * num_cols for i in range(num_rows)]

# Using the class above, we construct an HMM for the occasionally dishonest casino example
# described in the lecture and textbook
casino_states = "FL"     # F = fair die, L = loaded die
casino_chars = "123456"  # the six sides of the die
casino_initial_probs = [0.5, 0.5]
casino_transition_prob_matrix = [
    [0.95, 0.05],
    [0.10, 0.90]
]

casino_emission_prob_matrix = [
    [ 1/6,  1/6,  1/6,  1/6,  1/6, 1/6],
    [1/10, 1/10, 1/10, 1/10, 1/10, 1/2]
]
casino_hmm = HiddenMarkovModel(casino_states, 
                               casino_chars, 
                               casino_transition_prob_matrix, 
                               casino_initial_probs,
                               casino_emission_prob_matrix)

Below is a function that you will need to use in implementing the `simulate` method.

In [4]:
def sample_categorical(distribution):
    """Randomly sample from a categorical distribution (a discrete distribution over K categories).
    
    Args:
        distribution: a list of probabilities representing a discrete distribution over K categories.
    Returns:
        The index of the category sampled.
    """
    r = random.random()
    for i, prob in enumerate(distribution):
        if r < prob:
            return i
        else:
            r -= prob
    # in case we encounter floating point issues return the last index
    return len(distribution) - 1

## PROBLEM 1: Simulate a hidden state path and sequence from an HMM (1 POINT)

Implement the `simulate` method of the `HiddenMarkovModel` class.  You should call the `sample_categorical` function provided to you above to sample each state and emission.  *IMPORTANT IMPLEMENTATION NOTE:* you should simulate the random variables in the following order so that you may pass the tests: $\pi_1, x_1, \pi_2, x_2, \ldots, \pi_L, x_L$.
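
Before relying on the sampler inside `simulate`, it can help to convince yourself that it draws each index with the intended frequency. The sketch below is an optional sanity check, not part of the graded problems. It repeats the notebook's `sample_categorical` so that it runs on its own, and the seed and number of draws are arbitrary choices.

```python
import random
from collections import Counter

def sample_categorical(distribution):
    """Copy of the notebook's sampler so this sketch is self-contained."""
    r = random.random()
    for i, prob in enumerate(distribution):
        if r < prob:
            return i
        r -= prob
    # fall back to the last index if floating point error leaves r >= 0
    return len(distribution) - 1

# Emission distribution of the loaded die from the casino example
loaded_die = [1/10, 1/10, 1/10, 1/10, 1/10, 1/2]

random.seed(0)          # arbitrary seed, chosen only for repeatability
num_draws = 10_000      # arbitrary sample size
counts = Counter(sample_categorical(loaded_die) for _ in range(num_draws))

# Empirical frequency of each face index; index 5 (face '6') should be near 0.5
frequencies = [counts[i] / num_draws for i in range(len(loaded_die))]
print(frequencies)
```

Because each call consumes exactly one random number, the order in which you call the sampler for states and emissions determines the simulated output for a given seed, which is why the tests require the interleaved order stated above.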

In [5]:
# tests for simulate
random.seed(8)
assert casino_hmm.simulate(1) == ('F', '6')
random.seed(8)
assert casino_hmm.simulate(2) == ('FF', '65')
random.seed(8)
assert casino_hmm.simulate(4) == ('FFFL', '6523')
random.seed(8)
assert casino_hmm.simulate(10) == ('FFFLLLLFFF', '6523556226')
random.seed(17)
assert casino_hmm.simulate(10) == ('LLLLFFFFFF', '6362322665')
print("SUCCESS: simulate passed all tests!")

SUCCESS: simulate passed all tests!


## PROBLEM 2: Calculate the (log) joint probability of a hidden path of states and an observed sequence given a hidden Markov model (1 POINT)

Implement the `log_joint_probability` method of the `HiddenMarkovModel` class.  To avoid numerical issues, be sure to implement this as a sum of log-transformed probability parameters from the model.  If you instead take the logarithm of the product of the probabilities, the product will underflow to zero for long sequences, and its logarithm will be undefined.
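
The numerical issue is easy to reproduce in isolation. The following sketch uses 400 made-up probabilities of 0.1 each (an illustrative assumption, not values from the casino model) and compares the two approaches.

```python
import math

probs = [0.1] * 400     # hypothetical per-position probabilities

# Approach 1: multiply first, take the log afterwards.
product = 1.0
for p in probs:
    product *= p        # falls below the smallest positive float and becomes 0.0

try:
    log_of_product = math.log(product)
except ValueError:
    log_of_product = None   # math.log(0.0) raises ValueError

# Approach 2: sum the logs. Each term is a modest negative number,
# so the running total stays comfortably within floating point range.
log_sum = sum(math.log(p) for p in probs)

print(product, log_of_product, log_sum)
```

The sum of logs stays finite (about -921 here), whereas the product collapses to exactly zero well before the loop finishes.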

In [6]:
# tests for log_joint_probability
assert round(casino_hmm.log_joint_probability('F', '6'), 2) == -2.48
assert round(casino_hmm.log_joint_probability('L', '1'), 2) == -3.0
assert round(casino_hmm.log_joint_probability('FL', '35'), 2) == -7.78
assert round(casino_hmm.log_joint_probability('LF', '24'), 2) == -7.09
assert round(casino_hmm.log_joint_probability('LL', '24'), 2) == -5.4
assert round(casino_hmm.log_joint_probability('LFL', '246'), 2) == -10.78
assert round(casino_hmm.log_joint_probability('FFFLLLLFFF', '6523556226'), 2) == -24.86
assert round(casino_hmm.log_joint_probability('FL' * 100, '16' * 100), 2) == -776.71
print("SUCCESS: log_joint_probability passed all tests!")

SUCCESS: log_joint_probability passed all tests!


## PROBLEM 3: Computing the Viterbi dynamic programming matrix (1 POINT)

Implement the `viterbi_matrix` method of the `HiddenMarkovModel` class, which computes the (log) Viterbi dynamic programming matrix given an observed sequence.  You do not need to keep track of traceback pointers.  Provided is a traceback method that does not require traceback pointers.

Your Viterbi matrix should use log-transformed values to avoid numerical issues.  Note that the `HiddenMarkovModel` class precomputes log-transformed parameters, which you should use for convenience and efficiency.
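
For reference, one way to write the log-space recurrence is sketched below, using the conditional-probability notation of the class docstrings and 1-based positions. The symbols $V_\ell(i)$, $k$, and $\ell$ are notational choices made here and may differ from the lecture or textbook.

$$
V_\ell(1) = \log P(\pi_1 = \ell) + \log P(x_1 \mid \pi_1 = \ell)
$$

$$
V_\ell(i) = \log P(x_i \mid \pi_i = \ell) + \max_{k} \left[ V_k(i-1) + \log P(\pi_i = \ell \mid \pi_{i-1} = k) \right], \qquad i = 2, \ldots, L
$$

Because the matrix in the class is 0-indexed by position, the value $V_\ell(i)$ is stored in `V[ell][i - 1]`.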

In [7]:
# tests for viterbi_matrix
assert round_matrix(casino_hmm.viterbi_matrix('6')) == [[-2.48], 
                                                        [-1.39]]
assert round_matrix(casino_hmm.viterbi_matrix('1')) == [[-2.48], 
                                                        [-3.0]]
assert round_matrix(casino_hmm.viterbi_matrix('16')) == [[-2.48, -4.33], 
                                                         [-3.0, -3.79]]
assert round_matrix(casino_hmm.viterbi_matrix('165')) == [[-2.48, -4.33, -6.17], 
                                                          [-3.0, -3.79, -6.2]]
assert round_matrix(casino_hmm.viterbi_matrix('666661111')) == [
    [-2.48, -4.33, -6.17, -7.08, -7.88, -8.67, -10.52, -12.36, -14.2],
    [-1.39, -2.18, -2.98, -3.78, -4.58, -6.99,   -9.4,  -11.8, -14.21]]
assert round_matrix(casino_hmm.viterbi_matrix('4631262516')) == [
    [-2.48, -4.33, -6.17, -8.01, -9.86, -11.7, -13.54, -15.39, -17.23, -19.07],
    [-3.0, -3.79, -6.2, -8.61, -11.02, -11.82, -14.22, -16.63, -19.04, -19.84]]
print("SUCCESS: viterbi_matrix passed all tests!")

SUCCESS: viterbi_matrix passed all tests!


### Exploration activity: how well does the most probable path predict the true path of hidden states?

Now that you have successfully implemented the Viterbi algorithm, try simulating a large number of sequences from the casino HMM and see how well the most probable path (obtained by calling the `most_probable_path` method) matches the true path of hidden states.  How accurate is the most probable path in predicting the truth when the model we are using for prediction is the same as the model used for simulation?  
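
One possible way to quantify agreement is the fraction of positions at which the predicted path matches the true path. The helper below is a hypothetical sketch: the name `path_accuracy` and this per-position definition of accuracy are assumptions, and other measures are equally reasonable.

```python
def path_accuracy(true_path, predicted_path):
    """Fraction of positions at which two equal-length state strings agree.

    Hypothetical helper for the exploration activity; not part of the
    HiddenMarkovModel class.
    """
    if len(true_path) != len(predicted_path):
        raise ValueError("paths must have the same length")
    if not true_path:
        raise ValueError("paths must be non-empty")
    matches = sum(t == p for t, p in zip(true_path, predicted_path))
    return matches / len(true_path)

# Two made-up paths that differ only at the fourth position
example = path_accuracy("FFFLLLLFFF", "FFFFLLLFFF")   # 9 of 10 positions agree
print(example)
```

With the notebook's objects, you could pass the first element of the tuple returned by `casino_hmm.simulate` as `true_path` and the result of `casino_hmm.most_probable_path` on the second element as `predicted_path`, then average over many simulated sequences.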

In [None]:
###
### YOUR CODE HERE
###


###
### Your thoughts here
###
