# Machine Learning Fundamentals Project
## Sentiment Analysis with Hidden Markov Model

Here is an overview of everyone who contributed to the project:

|First name|Last name|Master program|Contribution|
|----------|---------|--------------|-------------|
|Anna Lena Katharina|Braun|IMLEX|25%|
|Aryaman|Sharma|IMLEX|25%|
|Raffael|Rizzo|IMLEX|25%|
|Shani|Israelov|IMLEX|25%|

# Imports

In [21]:
!pip install hmmlearn==0.2.6
from hmmlearn import hmm

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
from tqdm import tqdm
from matplotlib import pyplot as plt  # show graph

from sklearn.model_selection import GroupShuffleSplit
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, precision_score, recall_score, \
    f1_score, roc_auc_score



In [22]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In this notebook we use the NER dataset to understand Hidden Markov Models (HMMs) and, at the same time, to construct a POS tagger.

# Load data

## Data Description 

Imagine a dense jungle of words, where language constructs form dense thickets and
sentences rise like towering trees. Named Entity Recognition is the art of slicing
through these thickets, identifying and categorizing the entities that hide within. For
our group project, we will use the Hidden Markov Model to perform Named Entity
Recognition.


Kaggle hosts a popular dataset for this objective, with the convenient
name of Named Entity Recognition (NER). It can be accessed under this link: https://www.kaggle.com/datasets/debasisdotcom/name-entity-recognition-ner-dataset. 

The NER dataset consists of a collection of annotated sentences designed to
train and evaluate machine learning models for the task of named entity recognition.
It contains labeled sentences from various sources, like news articles, and spans
multiple domains such as sports, politics, and entertainment. The goal of an NER model
trained on this dataset is to learn how to identify and classify named entities within
a given text.


The NER dataset consists of four columns:


* **Sentence (or ID):** This column contains
a unique identifier for each sentence in
the dataset. The identifier distinguishes and references individual sentences in the
NER dataset.
* **Word:** This column contains individual words, also called tokens, which form the
sentence. Hence, each row represents a single word from the sentence.
* **POS (Part of Speech):** This column contains the part of speech tag for the
corresponding word.
* **Tag:** This column represents the named entity tag associated with the word. The
tags are usually in the IOB format (Inside, Outside, Beginning) and are categorized
into different classes like B-PER (beginning of a person’s name), I-PER (inside a
person’s name), B-ORG (beginning of an organization name), I-ORG (inside an
organization name), B-LOC (beginning of a location name), I-LOC (inside a location
name), and O (outside any named entity).
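
To make the IOB scheme concrete, here is a minimal sketch that groups IOB-tagged tokens into entity spans. The function name `extract_entities` and the toy sentence are our own illustrative assumptions and are not part of the dataset or of any library.

```python
from typing import List, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity (and closes any open one)
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity
            current_tokens.append(token)
        else:
            # 'O' (or an I- tag without a matching open entity) closes the entity
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical toy sentence, not taken from the dataset
toy_tokens = ["Angela", "Merkel", "visited", "Paris", "."]
toy_tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(extract_entities(toy_tokens, toy_tags))
# [('Angela Merkel', 'PER'), ('Paris', 'LOC')]
```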

In [23]:
from google.colab import drive

drive.mount('/content/drive')

data = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/data/NER dataset.csv", encoding='latin1')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [24]:
data = data.fillna(method="ffill")
data = data.rename(columns={'Sentence #': 'sentence'})
data.head(5)

Unnamed: 0,sentence,Word,POS,Tag
0,Sentence: 1,Thousands,NNS,O
1,Sentence: 1,of,IN,O
2,Sentence: 1,demonstrators,NNS,O
3,Sentence: 1,have,VBP,O
4,Sentence: 1,marched,VBN,O


# Data pre-processing
If you want to do some pre-processing (lowercase any words, remove stop words, replace numbers/names by a unique NUM/NAME token, etc.) you can do it here in the pipeline.

Note: you could create a new dataset `data_pre_processed = pre_processing(data)` to keep both versions and compare the effect of your pre-processing.

In [25]:
def pre_processing(df):
    # Select the columns to apply lowercase transformation
    columns_to_lowercase = ['sentence', 'Word']
    
    # Apply lowercase transformation to the selected columns
    df[columns_to_lowercase] = df[columns_to_lowercase].applymap(str.lower)
    
    # Return the modified DataFrame
    return df

# Example usage:
data = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/data/NER dataset.csv", encoding='latin1')
data = data.fillna(method="ffill")
data = data.rename(columns={'Sentence #': 'sentence'})

# Apply pre-processing to the DataFrame
data = pre_processing(data)
data.head(5)


Unnamed: 0,sentence,Word,POS,Tag
0,sentence: 1,thousands,NNS,O
1,sentence: 1,of,IN,O
2,sentence: 1,demonstrators,NNS,O
3,sentence: 1,have,VBP,O
4,sentence: 1,marched,VBN,O


First, let's collect the unique words and the unique POS tags in the dataset. We will use them to construct the HMM later.

In [26]:
tags = list(set(data.POS.values))  # Unique POS tags in the dataset
words = list(set(data.Word.values))  # Unique words in the dataset
len(tags), len(words)

(42, 31817)

### We have 42 different tags and 31,817 different words, so the HMM that we construct will have the following properties
- The hidden states of this HMM correspond to the POS tags, so we will have 42 hidden states.
- The observations for this HMM correspond to the words of the sentences.

#### Before constructing the HMM, we will split the data into train and test.

In [27]:
y = data.POS
X = data.drop('POS', axis=1)

gs = GroupShuffleSplit(n_splits=2, test_size=.33, random_state=42)
train_ix, test_ix = next(gs.split(X, y, groups=data['sentence']))

data_train = data.loc[train_ix]
data_test = data.loc[test_ix]

In [28]:
data_train.head(5)

Unnamed: 0,sentence,Word,POS,Tag
24,sentence: 2,families,NNS,O
25,sentence: 2,of,IN,O
26,sentence: 2,soldiers,NNS,O
27,sentence: 2,killed,VBN,O
28,sentence: 2,in,IN,O


In [13]:
data_test.head(5)

Unnamed: 0,sentence,Word,POS,Tag
0,sentence: 1,thousands,NNS,O
1,sentence: 1,of,IN,O
2,sentence: 1,demonstrators,NNS,O
3,sentence: 1,have,VBP,O
4,sentence: 1,marched,VBN,O


Now let's encode the POS tags and words as integer IDs, which we will use to build the HMM.

In [29]:
dfupdate = data_train.sample(frac=.15, replace=False, random_state=42)
dfupdate.Word = 'UNKNOWN'
data_train.update(dfupdate)
words = list(set(data_train.Word.values))
# Convert words and tags into numbers
word2id = {w: i for i, w in enumerate(words)}
tag2id = {t: i for i, t in enumerate(tags)}
id2tag = {i: t for i, t in enumerate(tags)}
len(tags), len(words)

(42, 25100)

In your theory classes you might have seen that Hidden Markov Models can be learned from the observations alone by using the Baum-Welch algorithm.
Although Baum-Welch can learn hidden states from the words, we cannot map those learned states back to the POS tags. So for this exercise we skip the Baum-Welch algorithm and create the HMM directly from the labeled data.

For creating the HMM we should build the following three parameters. 
- `startprob_`
- `transmat_`
- `emissionprob_`

To construct these parameters, let's first create some count matrices that will help us compute them.

In [30]:
count_tags = dict(data_train.POS.value_counts())  # Total number of POS tags in the dataset
# Now let's create the tags to words count
count_tags_to_words = data_train.groupby(['POS']).apply(
    lambda grp: grp.groupby('Word')['POS'].count().to_dict()).to_dict()
# We shall also collect the counts for the first tags in the sentence
count_init_tags = dict(data_train.groupby('sentence').first().POS.value_counts())

# Create a matrix that stores how often each tag is followed by each other tag
count_tags_to_next_tags = np.zeros((len(tags), len(tags)), dtype=int)
sentences = list(data_train.sentence)
pos = list(data_train.POS)
for i in tqdm(range(len(sentences)), position=0, leave=True):
    if (i > 0) and (sentences[i] == sentences[i - 1]):
        prevtagid = tag2id[pos[i - 1]]
        nexttagid = tag2id[pos[i]]
        count_tags_to_next_tags[prevtagid][nexttagid] += 1

100%|██████████| 702936/702936 [00:00<00:00, 878326.88it/s]


Now let's build the parameter matrices.

In [31]:
startprob = np.zeros((len(tags),))
transmat = np.zeros((len(tags), len(tags)))
emissionprob = np.zeros((len(tags), len(words)))
num_sentences = sum(count_init_tags.values())
sum_tags_to_next_tags = np.sum(count_tags_to_next_tags, axis=1)
for tag, tagid in tqdm(tag2id.items(), position=0, leave=True):
    floatCountTag = float(count_tags.get(tag, 0))
    startprob[tagid] = count_init_tags.get(tag, 0) / num_sentences
    for word, wordid in word2id.items():
        emissionprob[tagid][wordid] = count_tags_to_words.get(tag, {}).get(word, 0) / floatCountTag
    for tag2, tagid2 in tag2id.items():
        transmat[tagid][tagid2] = count_tags_to_next_tags[tagid][tagid2] / sum_tags_to_next_tags[tagid]

100%|██████████| 42/42 [00:00<00:00, 64.31it/s]
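
The count-and-normalize construction above can be checked by hand on a tiny corpus. The following is a minimal sketch under stated assumptions: the toy corpus, the `toy_*` names and the helper `normalize_rows` are hypothetical and not part of the notebook. Unlike a plain division, the helper leaves rows with a zero sum as all zeros instead of producing NaN values.

```python
import numpy as np

# Hypothetical toy corpus: each sentence is a list of (word, tag) pairs
toy_corpus = [
    [("dogs", "NNS"), ("bark", "VBP")],
    [("cats", "NNS"), ("sleep", "VBP")],
    [("bark", "VBP"), ("dogs", "NNS")],
]
toy_tags = ["NNS", "VBP"]
toy_words = ["dogs", "cats", "bark", "sleep"]
t2i = {t: i for i, t in enumerate(toy_tags)}
w2i = {w: i for i, w in enumerate(toy_words)}

start_counts = np.zeros(len(toy_tags))
trans_counts = np.zeros((len(toy_tags), len(toy_tags)))
emit_counts = np.zeros((len(toy_tags), len(toy_words)))

for sent in toy_corpus:
    start_counts[t2i[sent[0][1]]] += 1           # tag of the first word
    for word, tag in sent:
        emit_counts[t2i[tag], w2i[word]] += 1     # tag -> word
    for (_, prev_tag), (_, next_tag) in zip(sent, sent[1:]):
        trans_counts[t2i[prev_tag], t2i[next_tag]] += 1  # tag -> next tag


def normalize_rows(counts: np.ndarray) -> np.ndarray:
    """Divide each row by its sum; rows that sum to zero stay all-zero."""
    sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, sums, out=np.zeros_like(counts), where=sums > 0)


toy_startprob = start_counts / start_counts.sum()
toy_transmat = normalize_rows(trans_counts)
toy_emissionprob = normalize_rows(emit_counts)

print(toy_startprob)     # two of three sentences start with NNS
print(toy_transmat)      # NNS is always followed by VBP and vice versa
print(toy_emissionprob)  # each row sums to 1
```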


# Task 1: 

Similar to how we built the hidden state transition probability matrix above, you will build the transition probability matrix between the words. With this matrix, write a function that calculates the log likelihood of a given sentence.

# Our result:

In the code snippet underneath, we first initialize the word transition count matrix, which is a matrix of zeros with the shape (len(words), len(words)). This matrix represents the counts of transitions from one word to another in the training data. We then iterate through the words in the training data (data_train.Word) and update the count_words_to_next_words matrix. If two consecutive words belong to the same sentence, the count of the transition from the first word to the second word is incremented in the count_words_to_next_words matrix.

After that, we normalize the counts in the count_words_to_next_words matrix by dividing each row by its sum, creating a probability matrix. Each element in this matrix, which we call the word_transition_matrix, represents the probability of transitioning from one word to another.

We then implement the calculate_log_likelihood function that takes a sentence (as a list of words) and the word_transition_matrix as input. This function calculates the log-likelihood of the given sentence based on the word transition probabilities. The log-likelihood is the sum of the log probabilities of all transitions between consecutive words in the sentence. If a transition has a probability of zero, the log-likelihood is set to -np.inf.

Finally, we calculate the log likelihood of a sample sentence and the given sentences using the calculate_log_likelihood function. We split the words in the sentences and pass them to the function along with the word_transition_matrix. The resulting log likelihood for each sentence is then printed.

These steps are important for calculating the log likelihood of sentences based on word transition probabilities. By creating a matrix of word transition probabilities, we can analyze how likely it is for one word to follow another in a given sentence. The log likelihood can be used to compare different sentences or to evaluate the plausibility of a given sentence based on the training data.

In [32]:
import numpy as np
from typing import List

# Step 1: Initialize the word transition count matrix
count_words_to_next_words = np.zeros((len(words), len(words)), dtype=int)

# Step 2: Iterate through the training data and update the count_words_to_next_words matrix
words_list = list(data_train.Word)
for i in range(1, len(words_list)):
    if sentences[i] == sentences[i - 1]:
        prev_word_id = word2id[words_list[i - 1]]
        next_word_id = word2id[words_list[i]]
        count_words_to_next_words[prev_word_id][next_word_id] += 1

# Step 3: Normalize the counts to create the word_transition_matrix
word_transition_matrix = count_words_to_next_words / np.sum(count_words_to_next_words, axis=1, keepdims=True)

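def calculate_log_likelihood(sentence: List[str], word_transition_matrix) -> float:
    # Words that are not in the training vocabulary fall back to the 'UNKNOWN'
    # token instead of raising a KeyError
    unknown_id = word2id['UNKNOWN']
    log_likelihood = 0.0
    for i in range(1, len(sentence)):
        prev_word_id = word2id.get(sentence[i - 1].lower(), unknown_id)
        next_word_id = word2id.get(sentence[i].lower(), unknown_id)
        prob = word_transition_matrix[prev_word_id][next_word_id]
        # A zero (or undefined) transition probability makes the sentence impossible
        log_likelihood += np.log(prob) if prob > 0 else -np.inf
    return log_likelihood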

# Example usage
log_likelihood = calculate_log_likelihood(["This", "is", "a", "test", "sentence"], word_transition_matrix)
print(log_likelihood)

sentences = [
    "This is a protest about how the new law is not in the interest of the people",
    "The international conference will continue as planned on Friday",
    "Who are you ?",
    "You are not me",
    "Do you expect to be happy to work late"
]

for sentence in sentences:
    words_list = sentence.split()
    log_likelihood = calculate_log_likelihood(words_list, word_transition_matrix)
    print(f"Log likelihood for the sentence '{sentence}': {log_likelihood}")



-inf
Log likelihood for the sentence 'This is a protest about how the new law is not in the interest of the people': -70.96470273464142
Log likelihood for the sentence 'The international conference will continue as planned on Friday': -37.092219354590156
Log likelihood for the sentence 'Who are you ?': -15.9902178018753
Log likelihood for the sentence 'You are not me': -13.89888346686548
Log likelihood for the sentence 'Do you expect to be happy to work late': -35.11892547319604
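
A dense `len(words) x len(words)` matrix needs a lot of memory, because most word pairs never occur. As a sketch of an alternative, and not the approach used in this notebook, the same bigram log likelihood can be computed from nested counters. The function names and the toy corpus below are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def build_bigram_counts(corpus: List[List[str]]) -> Dict[str, Counter]:
    """Count word -> next-word transitions within each sentence."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in corpus:
        for prev_word, next_word in zip(sent, sent[1:]):
            counts[prev_word][next_word] += 1
    return counts


def bigram_log_likelihood(sentence: List[str], counts: Dict[str, Counter]) -> float:
    """Sum of log P(next | prev) over consecutive word pairs; -inf if a pair was never seen."""
    total = 0.0
    for prev_word, next_word in zip(sentence, sentence[1:]):
        row = counts.get(prev_word)
        if not row or row[next_word] == 0:
            return -math.inf
        total += math.log(row[next_word] / sum(row.values()))
    return total


# Hypothetical toy corpus
toy_sentences = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
]
toy_counts = build_bigram_counts(toy_sentences)

# P(dog | the) = 2/3 and P(barks | dog) = 1/2, so the result is log(1/3)
print(bigram_log_likelihood(["the", "dog", "barks"], toy_counts))
# "cat barks" never occurs in the toy corpus
print(bigram_log_likelihood(["the", "cat", "barks"], toy_counts))  # -inf
```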




#### Now we will continue with constructing the HMM.

We will use the hmmlearn implementation to initialize the HMM Model

In [33]:
model = hmm.MultinomialHMM(n_components=len(tags), algorithm='viterbi', random_state=42)
model.startprob_ = startprob
model.transmat_ = transmat
model.emissionprob_ = emissionprob

# Our solution

To use the HMM to predict the POS tags, we have to prepare the test set, because some of the words in the test data might not appear in the training data. Here is how we did this (see the code underneath):

We first process the test data by replacing any word not present in the training data with 'UNKNOWN'. This is done to ensure that the HMM model can handle words it has not seen before. We create a list of words in the test data called word_test.

Next, we create an empty list called samples, which will store the word IDs for the words in the test data. We iterate through the word_test list and append the word ID for each word to the samples list. This allows us to feed the test data to our HMM model in a numerical format.

Then, we want to calculate the lengths of sentences in the test data, as this information is needed when decoding the HMM model's predictions. We can use pandas to achieve this by calling the groupby method on data_test to group the data by sentence and then calculating the size of each group using the size() method. The resulting list of sentence lengths is stored in the lengths variable.

Alternatively, we can calculate the sentence lengths without using pandas. We initialize an empty list called lengths and a variable count set to 0. We iterate through the list of sentences in the test data and increment the count if the current sentence is the same as the previous one. If the sentences are different, we append the current count to the lengths list and reset the count to 1. This approach gives us the same sentence length information as the pandas solution.

These steps are crucial for preparing the test data to be used with our HMM model. By replacing unknown words and converting words to their corresponding IDs, we ensure that our model can process the test data. Additionally, calculating sentence lengths allows us to accurately decode the model's predictions and evaluate its performance.
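
One detail is easy to get wrong here: hmmlearn expects `lengths` to follow the row order of the samples and to sum to the number of samples, while pandas `groupby` sorts the group keys by default. As a minimal sketch, the run lengths of consecutive identical sentence IDs can also be computed with the standard library. The helper `run_lengths` and the toy IDs are hypothetical.

```python
from itertools import groupby
from typing import List


def run_lengths(sentence_ids: List[str]) -> List[int]:
    """Length of each run of consecutive identical sentence IDs, in row order."""
    return [sum(1 for _ in group) for _, group in groupby(sentence_ids)]


# Hypothetical IDs: "sentence: 10" sorts before "sentence: 2" as a string,
# so a sorted grouping would not follow the row order
toy_ids = ["sentence: 2", "sentence: 2", "sentence: 2",
           "sentence: 10", "sentence: 10", "sentence: 11"]
toy_lengths = run_lengths(toy_ids)
print(toy_lengths)  # [3, 2, 1]

# Every row must belong to exactly one sentence
assert sum(toy_lengths) == len(toy_ids)
```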


In [18]:
data_test.loc[~data_test['Word'].isin(words), 'Word'] = 'UNKNOWN'
word_test = list(data_test.Word)
samples = []
for i, val in enumerate(word_test):
    samples.append([word2id[val]])

# TODO:
#Approach 1
# Calculate the lengths of sentences in the test data using pandas
# (sort=False keeps the sentences in their original row order)
lengths = data_test.groupby('sentence', sort=False).size().tolist()

#Approach 2
lengths = []
count = 0
sentences = list(data_test.sentence)
for i in tqdm(range(len(sentences)), position=0, leave=True):
    if (i > 0) and (sentences[i] == sentences[i - 1]):
        count += 1
    elif i > 0:
        lengths.append(count)
        count = 1
    else:
        count = 1
# The loop only appends when a new sentence starts, so add the last sentence here
if count > 0:
    lengths.append(count)


100%|██████████| 345639/345639 [00:00<00:00, 357111.81it/s]


Now that we have the HMM ready, let's predict the most probable tag sequence for the test sentences.

In [34]:
pos_predict = model.predict(samples, lengths)
pos_predict

array([28, 38, 28, ..., 17, 19, 18], dtype=int32)

The hmmlearn predict function will give the best probable path for the given sentence using the Viterbi algorithm.

## Task 2: Using the model parameters (startprob_, transmat_, emissionprob_) write the viterbi algorithm from scratch to calculate the best probable path and compare it with the hmmlearn implementation.

Now before using these matrices 

In [None]:
def Viterbi(pi: np.ndarray, a: np.ndarray, b: np.ndarray, obs: List) -> np.ndarray:
    """
    Write the viterbi algorithm from scratch to find the best probable path
    attr:
      pi: initial probabilities
      a: transition probabilities
      b: emission probabilities
      obs: list of observations
    return:
      array of the indices of the best hidden states
    """
    # Write your function here
    pass
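
One possible way to fill in this skeleton is a log-space Viterbi pass, sketched below under stated assumptions. It is not the hmmlearn implementation. The function name `viterbi_log` and the toy numbers are hypothetical, and `obs` is assumed to be a sequence of integer observation IDs for a single sentence.

```python
import numpy as np
from typing import Sequence


def viterbi_log(pi: np.ndarray, a: np.ndarray, b: np.ndarray, obs: Sequence[int]) -> np.ndarray:
    """Most probable hidden state path for one observation sequence (log space)."""
    n_states, n_obs = a.shape[0], len(obs)
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        log_pi, log_a, log_b = np.log(pi), np.log(a), np.log(b)

    delta = np.empty((n_obs, n_states))              # best log score ending in each state
    backptr = np.zeros((n_obs, n_states), dtype=int)  # best previous state
    delta[0] = log_pi + log_b[:, obs[0]]

    for t in range(1, n_obs):
        # scores[i, j]: best score of being in state i at t-1 and moving to state j
        scores = delta[t - 1][:, None] + log_a
        backptr[t] = np.argmax(scores, axis=0)
        delta[t] = scores[backptr[t], np.arange(n_states)] + log_b[:, obs[t]]

    # Follow the back-pointers from the best final state
    path = np.empty(n_obs, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(n_obs - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path


# Hypothetical toy HMM: states 0 = 'healthy', 1 = 'sick';
# observations 0 = 'sleeping', 1 = 'eating', 2 = 'pooping'
toy_pi = np.array([0.6, 0.4])
toy_a = np.array([[0.7, 0.3], [0.4, 0.6]])
toy_b = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
toy_obs = [0, 1, 2]
toy_path = viterbi_log(toy_pi, toy_a, toy_b, toy_obs)
print(toy_path)  # [0 0 1]
```

Working with log probabilities avoids numerical underflow on long sentences. To compare with hmmlearn, the function could be applied sentence by sentence to the corresponding slices of the observation IDs and checked against the matching slices of `pos_predict`.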

### Task 3: Let's try to form our own HMM
In this task you will try to formulate your own HMM. Imagine a toy example that you think closely relates to a Hidden Markov Model.

Steps:
 1. Define your hidden states
 2. Define your observable states
 3. Randomly generate your observations

Below is an example to demonstrate:

- In this toy HMM example, we have two hidden states, 'healthy' and 'sick', which describe the state of a pet. We cannot directly observe whether the pet is 'healthy' or 'sick'.

- The observable states in this formulation are what our pet is doing: sleeping, eating or pooping. Ideally, we want to determine whether the pet is sick using these observable states.


```python
import random

hidden_states = ['healthy', 'sick']
observable_states = ['sleeping', 'eating', 'pooping']
observations = []
for i in range(100):
  observations.append(random.choice(observable_states))
```

# TASK 3

This is how we created our own HMM from scratch.



In this code cell below, we first define our hidden states and observable states. The hidden states correspond to the POS tags, while the observable states represent the unique words in the dataset. These states serve as the foundation for constructing our HMM.

Next, we initialize the probability matrices for our HMM. We use the startprob for initial state probabilities, the transmat for transition probabilities, and the emissionprob for emission probabilities. These matrices are derived from the training data and are essential for modeling the relationship between the hidden states and observable states.

After setting up the HMM's probability matrices, we define a function called generate_observation_sequence that takes the number of observations as input and returns a sequence of words generated by the HMM. This function simulates the process of generating text based on the underlying HMM model.

The function starts by randomly choosing an initial hidden state (POS tag) based on the init_probs distribution. It then selects an observable state (word) based on the emission probabilities for the chosen hidden state. This pair of hidden and observable states represents the starting point for our observation sequence.

We then iterate through the remaining number of observations, selecting the next hidden state based on the transition probabilities from the current hidden state. Subsequently, we choose the next observable state (word) based on the emission probabilities for the selected hidden state. We continue this process until the desired number of observations is reached, creating a sequence of words generated by the HMM.

Finally, we demonstrate the usage of the generate_observation_sequence function by generating a sequence of 10 words. The generated sequence serves as an example of text that can be produced by our HMM based on the probability matrices derived from the training data.

These steps are important for building and simulating an HMM from scratch. By defining the hidden and observable states and initializing the probability matrices, we can create an HMM that models the relationships between POS tags and words in our dataset. The generate_observation_sequence function allows us to observe how the HMM generates text based on these relationships, which can be useful for understanding the underlying structure of the language and generating new sentences.

In [36]:
import numpy as np

# Step 1: Define hidden states and observable states
hidden_states = tags  # POS tags
observable_states = words  # Unique words in the dataset

# Step 2: Initialize probability matrices
num_hidden_states = len(hidden_states)
num_observable_states = len(observable_states)

# Initial state probabilities
init_probs = startprob

# Transition probabilities
trans_probs = transmat

# Emission probabilities
emission_probs = emissionprob

# Step 3: Generate a sequence of observations
def generate_observation_sequence(num_observations: int) -> List[str]:
    hidden_state_sequence = [np.random.choice(hidden_states, p=init_probs)]
    observation_sequence = [np.random.choice(observable_states, p=emission_probs[tag2id[hidden_state_sequence[-1]]])]

    for _ in range(1, num_observations):
        next_hidden_state = np.random.choice(hidden_states, p=trans_probs[tag2id[hidden_state_sequence[-1]]])
        next_observation = np.random.choice(observable_states, p=emission_probs[tag2id[next_hidden_state]])
        
        hidden_state_sequence.append(next_hidden_state)
        observation_sequence.append(next_observation)
    
    return observation_sequence

# Example usage
num_observations = 10
observation_sequence = generate_observation_sequence(num_observations)
print(observation_sequence)



['workers', 'are', 'in', 'the', 'reporter', 'men', '.', '"', 'europe', ',']


Even though we have generated the data randomly, for learning purposes let's try to learn an HMM from this data. For this we have to construct the Baum-Welch algorithm from scratch. Below is the skeleton of the Baum-Welch learning algorithm.

## TASK 4: Complete the forward and backward probs functions in the Baum-Welch algorithm and try it with your formulated HMM.

In [None]:
import numpy as np


def baum_welch(observations, observations_vocab, n_hidden_states):
    """
    Baum-Welch algorithm for estimating the HMM parameters
    :param observations: observations
    :param observations_vocab: observations vocabulary
    :param n_hidden_states: number of hidden states to estimate
    :return: a, b (transition matrix and emission matrix)
    """

    def forward_probs(observations, observations_vocab, n_hidden_states, a_, b_) -> np.array:
        """
        forward pass to calculate alpha
        :param observations: observations
        :param observations_vocab: observation vocabulary
        :param n_hidden_states: number of hidden states
        :param a_: current estimate of the transition matrix
        :param b_: current estimate of the emission matrix
        :return: alpha_ (forward probabilities)
        """
        a_start = 1 / n_hidden_states
        alpha_ = np.zeros((n_hidden_states, len(observations)), dtype=float)
        #TODO complete the forward function to calculate alpha

        return alpha_

    def backward_probs(observations, observations_vocab, n_hidden_states, a_, b_) -> np.array:
        """
        backward pass to calculate beta
        :param observations: observations
        :param observations_vocab: observation vocabulary
        :param n_hidden_states: number of hidden states
        :param a_: current estimate of the transition matrix
        :param b_: current estimate of the emission matrix
        :return: beta_ (backward probabilities)
        """
        beta_ = np.zeros((n_hidden_states, len(observations)), dtype=float)
        beta_[:, -1:] = 1
        # TODO finish the function to calculate backward pass and calculate beta
        return beta_

    def compute_gamma(alfa, beta, observations, vocab, n_samples, a_, b_) -> np.array:
        """

        :param alfa:
        :param beta:
        :param observations:
        :param vocab:
        :param n_samples:
        :param a_:
        :param b_:
        :return:
        """
        # gamma_prob = np.zeros(n_samples, len(observations))
        gamma_prob = np.multiply(alfa, beta) / sum(np.multiply(alfa, beta))
        return gamma_prob

    def compute_sigma(alfa, beta, observations, vocab, n_samples, a_, b_) -> np.array:
        """

        :param alfa:
        :param beta:
        :param observations:
        :param vocab:
        :param n_samples:
        :param a_:
        :param b_:
        :return:
        """
        sigma_prob = np.zeros((n_samples, len(observations) - 1, n_samples), dtype=float)
        denomenator = np.multiply(alfa, beta)
        for i in range(len(observations) - 1):
            for j in range(n_samples):
                for k in range(n_samples):
                    index_in_vocab = np.where(vocab == observations[i + 1])[0][0]
                    sigma_prob[j, i, k] = (alfa[j, i] * beta[k, i + 1] * a_[j, k] * b_[k, index_in_vocab]) / sum(
                        denomenator[:, j])
        return sigma_prob

    # initialize A ,B
    a = np.ones((n_hidden_states, n_hidden_states)) / n_hidden_states
    b = np.ones((n_hidden_states, len(observations_vocab))) / len(observations_vocab)
    for iter in tqdm(range(2000), position=0, leave=True):

        # E-step: calculating sigma and gamma
        alfa_prob = forward_probs(observations, observations_vocab, n_hidden_states, a, b)  #
        beta_prob = backward_probs(observations, observations_vocab, n_hidden_states, a, b)  # , beta_val
        gamma_prob = compute_gamma(alfa_prob, beta_prob, observations, observations_vocab, n_hidden_states, a, b)
        sigma_prob = compute_sigma(alfa_prob, beta_prob, observations, observations_vocab, n_hidden_states, a, b)

        # M-step: calculating the A and B matrices
        a_model = np.zeros((n_hidden_states, n_hidden_states))
        for j in range(n_hidden_states):  # calculate A-model
            for i in range(n_hidden_states):
                for t in range(len(observations) - 1):
                    a_model[j, i] = a_model[j, i] + sigma_prob[j, t, i]
                normalize_a = [sigma_prob[j, t_current, i_current] for t_current in range(len(observations) - 1) for
                               i_current in range(n_hidden_states)]
                normalize_a = sum(normalize_a)
                if normalize_a == 0:
                    a_model[j, i] = 0
                else:
                    a_model[j, i] = a_model[j, i] / normalize_a

        b_model = np.zeros((n_hidden_states, len(observations_vocab)))

        for j in range(n_hidden_states):
            for i in range(len(observations_vocab)):
                indices = [idx for idx, val in enumerate(observations) if val == observations_vocab[i]]
                numerator_b = sum(gamma_prob[j, indices])
                denominator_b = sum(gamma_prob[j, :])
                if denominator_b == 0:
                    b_model[j, i] = 0
                else:
                    b_model[j, i] = numerator_b / denominator_b

        a = a_model
        b = b_model
    return a, b


import random

hidden_states = ['healthy', 'sick']
observable_states = ['sleeping', 'eating', 'pooping']
observable_map = {'sleeping': 0, 'eating': 1, 'pooping': 2}
observations = []
for i in range(100):
    observations.append(observable_map[random.choice(observable_states)])

A, B = baum_welch(observations=observations, observations_vocab=np.array(list(observable_map.values())),
                  n_hidden_states=2)
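
As a hedged sketch of how the two TODOs could be filled in, the standalone functions below compute unscaled forward and backward probabilities with the same conventions as the skeleton: arrays of shape `(n_hidden_states, len(observations))`, a uniform start probability, and `beta` set to 1 in the last column. The names `forward_sketch` and `backward_sketch` and the toy matrices are hypothetical, and the observations are assumed to already be vocabulary indices.

```python
import numpy as np


def forward_sketch(obs_idx, a_, b_) -> np.ndarray:
    """alpha[k, t] = P(o_0..o_t, state_t = k), with a uniform start distribution."""
    n_states, n_obs = a_.shape[0], len(obs_idx)
    alpha = np.zeros((n_states, n_obs))
    alpha[:, 0] = (1.0 / n_states) * b_[:, obs_idx[0]]
    for t in range(1, n_obs):
        # Sum over previous states j of alpha[j, t-1] * a_[j, k], then emit o_t from k
        alpha[:, t] = (alpha[:, t - 1] @ a_) * b_[:, obs_idx[t]]
    return alpha


def backward_sketch(obs_idx, a_, b_) -> np.ndarray:
    """beta[j, t] = P(o_{t+1}..o_{T-1} | state_t = j)."""
    n_states, n_obs = a_.shape[0], len(obs_idx)
    beta = np.zeros((n_states, n_obs))
    beta[:, -1] = 1.0
    for t in range(n_obs - 2, -1, -1):
        # Sum over next states k of a_[j, k] * b_[k, o_{t+1}] * beta[k, t+1]
        beta[:, t] = a_ @ (b_[:, obs_idx[t + 1]] * beta[:, t + 1])
    return beta


# Hypothetical toy parameters for 2 hidden states and 3 observable states
toy_a = np.array([[0.7, 0.3], [0.4, 0.6]])
toy_b = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
toy_obs = [0, 1, 2, 2, 0]

toy_alpha = forward_sketch(toy_obs, toy_a, toy_b)
toy_beta = backward_sketch(toy_obs, toy_a, toy_b)

# Sanity check: sum_k alpha[k, t] * beta[k, t] equals P(observations) for every t
print((toy_alpha * toy_beta).sum(axis=0))
```

Because the probabilities are not rescaled, this version can underflow on very long observation sequences. A scaled or log-space variant would be more robust there.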


In [None]:
#TASK 4: Now try it with your HMM