# Hidden Markov Model: TensorFlow Implementation
- Author: Alejandro Meza


## Introduction

"The Hidden Markov Model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities." (http://jedlik.phy.bme.hu/~gerjanos/HMM/node4.html)


A hidden Markov model (HMM) is a Markov model in which the observations are dependent on a latent (or "hidden") Markov process (referred to as X). An HMM requires that there be an observable process Y whose outcomes depend on the outcomes of X in a known way. Since X cannot be observed directly, the goal is to learn about the state of X by observing Y.
(https://en.wikipedia.org/wiki/Hidden_Markov_model)

Basically, we have a sequence of observations that depend on some hidden states. The hidden state at step t depends only on the hidden state at step t-1 (the previous state). The goal is to find the sequence of hidden states that is most probable given the observations we have seen.
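The two assumptions above (each hidden state depends only on the previous one, and each observation depends only on the current hidden state) can be written compactly. The block below is a sketch in standard HMM notation; the subscripts and the sequence length $T$ are notation introduced here, not taken from the sources quoted above.

```latex
% Joint probability of hidden states X_1..X_T and observations Y_1..Y_T
P(X_{1:T}, Y_{1:T}) = P(X_1)\,\prod_{t=2}^{T} P(X_t \mid X_{t-1})\,\prod_{t=1}^{T} P(Y_t \mid X_t)

% Decoding: the most probable hidden sequence given the observations
\hat{X}_{1:T} = \arg\max_{X_{1:T}} P(X_{1:T} \mid Y_{1:T})
```

The three factors correspond to the initial distribution, the transition distribution, and the observation (emission) distribution described in the next section.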


## Important concepts


- States: Represents different situations or conditions in the model.

- Observations: Represents the observable outcomes associated with each state.

- Initial distribution: Specifies the probabilities of starting in each state.

- Transition distribution: Defines the probabilities of transitioning from one state to another.

- Observation distribution/emission probability: Describes the likelihood of observing a particular outcome given the current hidden state.

## Real case application

One real-world application of the hidden Markov model is the following:

Hidden Markov Models (HMMs) have been widely used in Natural Language Processing (NLP), particularly for Part-of-Speech (POS) tagging. POS tagging involves assigning grammatical categories (such as noun, verb, adjective, etc.) to each word in a sentence. Here's how HMMs are applied in POS tagging:

- Model Representation: in the context of POS tagging, an HMM can be represented with hidden states corresponding to POS tags (e.g., noun, verb) and observable events corresponding to words in a sentence.

- Hidden States: hidden states in the HMM represent the underlying POS tags. For example, you may have hidden states like "Noun," "Verb," "Adjective," etc.

- Observations: observations are the words in a given sentence. Each word is associated with a specific POS tag. The goal is to infer the most likely sequence of hidden states (POS tags) given the observed sequence of words.

- Transition Probabilities: transition probabilities in the HMM represent the likelihood of transitioning from one POS tag to another. For example, the probability of transitioning from a noun to a verb.


- Emission Probabilities: emission probabilities represent the likelihood of observing a specific word given a particular POS tag. For example, the probability of observing the word "run" given the POS tag "Verb."
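As a rough illustration of where such transition and emission probabilities could come from, the sketch below estimates them by simple counting over a hand-tagged corpus. The three sentences and their tags are made up for this example, and the names `tagged_sentences`, `normalize`, `transition` and `emission` are hypothetical. Real taggers use much larger corpora and usually some form of smoothing.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each sentence is a list of (word, POS tag) pairs.
tagged_sentences = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB")],
]

transition_counts = defaultdict(Counter)  # tag -> Counter of following tags
emission_counts = defaultdict(Counter)    # tag -> Counter of words

for sentence in tagged_sentences:
    tags = [tag for _, tag in sentence]
    for word, tag in sentence:
        emission_counts[tag][word] += 1
    for prev_tag, next_tag in zip(tags, tags[1:]):
        transition_counts[prev_tag][next_tag] += 1


def normalize(counts):
    """Turn each row of counts into relative frequencies that sum to 1."""
    return {
        key: {item: n / sum(row.values()) for item, n in row.items()}
        for key, row in counts.items()
    }


transition = normalize(transition_counts)
emission = normalize(emission_counts)

print(transition["NOUN"])       # relative frequency of each tag after a noun
print(emission["VERB"]["run"])  # relative frequency of "run" among verbs
```

In this toy corpus every noun is followed by a verb, so the estimated transition probability from "NOUN" to "VERB" is 1.0, which shows why real systems need more data and smoothing.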

With these concepts in place, let's see how we can solve two different problems using **TensorFlow**.

## **USE CASE 1**  -Categorical Data-

## SOLVING THE WIKIPEDIA EXAMPLE

Consider two friends, Alice and Bob, who live far apart from each other and who talk together daily over the telephone about what they did that day. Bob is only interested in three activities: walking in the park, shopping, and cleaning his apartment. The choice of what to do is determined exclusively by the weather on a given day. Alice has no definite information about the weather, but she knows general trends. Based on what Bob tells her he did each day, Alice tries to guess what the weather must have been like.

Alice believes that the weather operates as a discrete Markov chain. There are two states, "Rainy" and "Sunny", but she cannot observe them directly, that is, they are hidden from her. On each day, there is a certain chance that Bob will perform one of the following activities, depending on the weather: "walk", "shop", or "clean". Since Bob tells Alice about his activities, those are the observations. The entire system is that of a hidden Markov model (HMM).

Alice knows the general weather trends in the area, and what Bob likes to do on average. In other words, the parameters of the HMM are known. They can be represented as follows in Python:

(https://en.wikipedia.org/wiki/Hidden_Markov_model)

In [None]:
# REQUIREMENTS

# Hidden states (the weather). This is the hidden Markov process X.
states = ("Rainy", "Sunny")

# Observations that are visible (Bob's activities): Y
observations = ("walk", "shop", "clean")

# Weather trends
start_probability = {"Rainy": 0.6, "Sunny": 0.4}

transition_probability = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}

emission_probability = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

So, in our case, Alice cannot observe the weather, but she knows the activities that Bob performs.

Here, we have two hidden states: "Rainy" and "Sunny." For each hidden state, there is a set of emission probabilities associated with the observable activities ("walk", "shop", "clean"). Each entry in the emission_probability dictionary provides the probability of observing a particular activity given the current hidden state.
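Before building the model, it can help to confirm that the dictionaries really describe probability distributions. The sketch below repeats the parameters from the cell above and checks that every row is non-negative and sums to 1. The helper `is_distribution` and the variables `all_rows`, `all_valid` and `p_clean_given_rainy` are names introduced for this example only.

```python
import math

# Parameters repeated from the cell above so this snippet runs on its own.
start_probability = {"Rainy": 0.6, "Sunny": 0.4}
transition_probability = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emission_probability = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}


def is_distribution(probs):
    """True when all values are non-negative and sum to 1 (within tolerance)."""
    values = list(probs.values())
    return all(p >= 0 for p in values) and math.isclose(sum(values), 1.0)


# Every row (initial, per-state transitions, per-state emissions) must be valid.
all_rows = [
    start_probability,
    *transition_probability.values(),
    *emission_probability.values(),
]
all_valid = all(is_distribution(row) for row in all_rows)
print(all_valid)

# Reading one entry: the probability that Bob cleans given that it is rainy.
p_clean_given_rainy = emission_probability["Rainy"]["clean"]
print(p_clean_given_rainy)
```

A check like this catches typos in hand-written probability tables before they are passed to a distribution object.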


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/HMMGraph.svg/1280px-HMMGraph.svg.png" alt="Colab Image" width="400" height="300">

## Definition of our system
Now, let's solve the problem!

In [None]:
import tensorflow_probability as tfp  # We are using a different module from tensorflow this time
import tensorflow as tf

In [None]:
tfd = tfp.distributions  # making a shortcut for later on

# 1) Define states and observations
states = ("Rainy", "Sunny")
activities = ("walk", "shop", "clean")

# Initial distribution: probability for each state
start_probability = {"Rainy": 0.6, "Sunny": 0.4}
initial_distribution = tfd.Categorical(probs=[start_probability[state] for state in states])

# 2) Transition distribution: probability of moving from one state to another
transition_probability = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
transition_distribution_probs = [[transition_probability[from_state][to_state] for to_state in states] for from_state in states]
transition_distribution = tfd.Categorical(probs=transition_distribution_probs)


# 3) Observation distribution: values associated with each state and observation.
#The probability of Y given a hidden X
emission_probability = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}
observation_distribution_probs = [[emission_probability[state][activity] for activity in activities] for state in states]
observation_distribution = tfd.Categorical(probs=observation_distribution_probs)

## Test our system

After defining our Markov system, it's time to test it by sampling a sequence of hidden states and observations from it!

In [None]:
# Generate a sample sequence
num_steps = 10  # specify the length of the sequence

# Sample the first hidden state (as an index into `states`) from the initial distribution
state_index = int(initial_distribution.sample().numpy())
state_indices = [state_index]

# Sample each next state from the transition row of the current state
for _ in range(1, num_steps):
    # sample() draws one candidate next state per row (shape [2]);
    # keep the draw that belongs to the current state's row
    state_index = int(transition_distribution.sample().numpy()[state_index])
    state_indices.append(state_index)
hidden_states = [states[i] for i in state_indices]

# Sample each observation from the emission row of that step's hidden state
observations = [activities[int(observation_distribution.sample().numpy()[i])] for i in state_indices]

# Print the sampled sequence in a readable format
print("Hidden States:")
print(hidden_states)

print("\nObserved States:")
print(observations)

Hidden States:
['Rainy', 'Sunny', 'Rainy', 'Sunny', 'Rainy', 'Rainy', 'Rainy', 'Rainy', 'Rainy', 'Sunny']

Observed States:
['clean', 'clean', 'shop', 'shop', 'shop', 'shop', 'clean', 'clean', 'clean', 'clean']
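Sampling goes from hidden states to observations, while Alice's actual problem goes the other way: given Bob's activities, which weather sequence is most likely? A common way to answer this is the Viterbi algorithm. The block below is a minimal pure-Python sketch of it, not part of the TensorFlow code above. The function name `viterbi` and the three-day observation sequence are chosen for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[s] holds the probability and path of the best sequence ending in s.
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for s in states:
            # Pick the predecessor that maximizes the probability of reaching s.
            prob, prev = max((best[p][0] * trans_p[p][s], p) for p in states)
            new_best[s] = (prob * emit_p[s][o], best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


states = ("Rainy", "Sunny")
start_probability = {"Rainy": 0.6, "Sunny": 0.4}
transition_probability = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emission_probability = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

# Illustrative input: what Bob reported on three consecutive days.
bob_activities = ("walk", "shop", "clean")
best_prob, best_path = viterbi(
    bob_activities, states, start_probability,
    transition_probability, emission_probability,
)
print(best_path)  # prints ['Sunny', 'Rainy', 'Rainy']
print(best_prob)  # approximately 0.01344
```

For long sequences the products become very small, so practical implementations usually work with log-probabilities instead of raw probabilities.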


And that's it! We have made a successful implementation of a hidden Markov model using TensorFlow ^^

## **USE CASE 2**  -Continuous Data-

We are going to predict the temperature under the following assumptions:
- Cold days are encoded by a 0 and hot days are encoded by a 1.
- The first day in our sequence has an 80% chance of being hot (and a 20% chance of being cold), matching `probs=[0.2, 0.8]` in the code below.
- A cold day has a 30% chance of being followed by a hot day.
- A hot day has a 20% chance of being followed by a cold day.
- On each day the temperature is normally distributed, with mean 0 and standard deviation 5 on a cold day, and mean 15 and standard deviation 10 on a hot day.

I used the following material for this example:
-  https://www.tensorflow.org/probability/api_docs/python/tfp/distributions/HiddenMarkovModel  
- https://colab.research.google.com/drive/15Cyy2H7nT40sGR7TBN5wBvgTd57mVKay#forceEdit=true&sandboxMode=true&scrollTo=ssOcn-nIOCcV

## Definition of our system

In [None]:
tfd = tfp.distributions

# 1) Initial distribution: [P(cold), P(hot)] for the first day
initial_distribution = tfd.Categorical(probs=[0.2, 0.8])

# 2) Transition distribution: probability of moving from one state to another
transition_distribution = tfd.Categorical(probs=[[0.7, 0.3],  # cold: cold, hot
                                                 [0.2, 0.8]])  # hot: cold, hot

# 3) Observation distribution: temperature given the hidden state.
# loc holds the mean and scale the standard deviation for [cold, hot]
emission_probability = tfd.Normal(loc=[0., 15.], scale=[5.0, 10.0])

## Test our system

In [None]:
# num_steps: length of the sequence, i.e. how many days the model covers
NUMBER_STATES_TO_PREDICT = 7

model = tfd.HiddenMarkovModel(
    initial_distribution=initial_distribution,
    transition_distribution=transition_distribution,
    observation_distribution=emission_probability,
    num_steps=NUMBER_STATES_TO_PREDICT)

In [None]:
mean = model.mean()  # expected temperature for each of the 7 days

# TensorFlow 2 runs eagerly, so the tensor can be converted to a NumPy array
# directly; wrapping the call in tf.compat.v1.Session() is not needed
print(mean.numpy())

[11.999999 10.500001  9.75      9.375     9.1875    9.09375   9.046875]
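To see where these numbers come from, the expected temperature on each day can be recomputed by hand: propagate the probability of being cold or hot from day to day, then weight the two state means (0 and 15) by those probabilities. The pure-Python sketch below uses the same parameters as the code above; the variable names are chosen for this example.

```python
# Same parameters as the TensorFlow model above.
initial = [0.2, 0.8]               # [P(cold), P(hot)] on day 1
transition = [[0.7, 0.3],          # cold: cold, hot
              [0.2, 0.8]]          # hot: cold, hot
state_means = [0.0, 15.0]          # mean temperature on a cold / hot day
num_steps = 7

expected_temperatures = []
marginal = initial
for _ in range(num_steps):
    # Expected temperature: state means weighted by today's state probabilities.
    expected_temperatures.append(sum(p * m for p, m in zip(marginal, state_means)))
    # Tomorrow's state probabilities: today's probabilities times the transition matrix.
    marginal = [
        sum(marginal[i] * transition[i][j] for i in range(2))
        for j in range(2)
    ]

print(expected_temperatures)
```

Up to floating-point rounding this reproduces the output above (12, 10.5, 9.75, ...). The values approach 9 because the chain settles into its stationary distribution of 40% cold and 60% hot days, and 0.6 × 15 = 9.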


And that's it, we have worked through another good example of this interesting model. If you want to learn more, please check these online resources:

- https://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
- https://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf
- https://www.youtube.com/watch?v=fX5bYmnHqqE