# Introduction


**What?** Hidden Markov Model (HMM)



# High-level view of methods available in NLP


- The different approaches used to solve NLP problems commonly fall into three categories:
    - **Heuristics**: dictionaries and thesauruses
    - **Machine learning**: Naive Bayes, SVMs, hidden Markov models, conditional random fields
    - **Deep learning**: RNNs, LSTMs, GRUs, CNNs, transformers, autoencoders



# What is a hidden Markov model?


- The hidden Markov model (HMM) is a statistical model that assumes the data are generated by an underlying, unobservable process with hidden states; we observe only the data that process emits, never the states themselves.
- HMMs also make the Markov assumption: each hidden state depends only on the state(s) immediately preceding it (a single previous state in the common first-order case).
- Consider the NLP task of part-of-speech (POS) tagging, which assigns a part-of-speech tag to each word in a sentence.
- Parts of speech such as JJ (adjective) and NN (noun) are the hidden states, while the sentence “natural language processing (nlp)…” is directly observed.
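
The assumptions above can be written as one formula. This is a sketch for the first-order case, which additionally assumes that each observed word depends only on its own hidden tag (the usual HMM emission assumption, not stated explicitly in the list). The symbols are notation introduced here for illustration: $z_t$ is the hidden tag at position $t$, $x_t$ is the observed word, and $T$ is the sentence length.

$$
P(x_{1:T}, z_{1:T}) = P(z_1)\,P(x_1 \mid z_1)\,\prod_{t=2}^{T} P(z_t \mid z_{t-1})\,P(x_t \mid z_t)
$$

Tagging then amounts to finding the tag sequence $z_{1:T}$ that maximizes this product for the given words.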



![image.png](attachment:image.png)
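
To make the hidden-state idea concrete, here is a minimal sketch of Viterbi decoding for a two-tag HMM, written in plain Python. Every probability in it is a made-up illustrative value rather than an estimate from a corpus, and the function name `viterbi` and its argument layout are assumptions chosen for this example, not NLTK's API.

```python
# Toy first-order HMM for POS tagging. All probabilities are made-up
# illustrative values, not estimates from any corpus.
states = ["JJ", "NN"]
start_p = {"JJ": 0.6, "NN": 0.4}
trans_p = {
    "JJ": {"JJ": 0.3, "NN": 0.7},
    "NN": {"JJ": 0.2, "NN": 0.8},
}
emit_p = {
    "JJ": {"natural": 0.8, "language": 0.1, "processing": 0.1},
    "NN": {"natural": 0.1, "language": 0.5, "processing": 0.4},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable tag sequence, its joint probability)."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor state that maximizes the path probability.
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, 0.0), p)
                for p in states
            )
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, prob


tags, prob = viterbi(["natural", "language", "processing"],
                     states, start_p, trans_p, emit_p)
print(tags)  # ['JJ', 'NN', 'NN']
```

With these toy numbers the decoder recovers the adjective-noun-noun reading of the observed words, even though the tags themselves are never observed.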

# Import modules

In [2]:
import nltk
from nltk.util import unique_list

# Implementation

In [14]:
# Load the first 700 tagged sentences of the Brown corpus "adventure" category.
# The corpus must be available locally, e.g. via nltk.download('brown').
corpus = nltk.corpus.brown.tagged_sents(categories='adventure')[:700]
print(len(corpus))

# Hidden states: the distinct POS tags seen in the corpus.
tag_set = unique_list(tag for sent in corpus for (word, tag) in sent)
print(len(tag_set))

# Observation symbols: the distinct words seen in the corpus.
symbols = unique_list(word for sent in corpus for (word, tag) in sent)
print(len(symbols))

trainer = nltk.tag.HiddenMarkovModelTrainer(tag_set, symbols)

# Hold out every 10th sentence for testing and train on the remaining 90%.
train_corpus = []
test_corpus = []
for i in range(len(corpus)):
    if i % 10:
        train_corpus.append(corpus[i])
    else:
        test_corpus.append(corpus[i])

print(len(train_corpus))
print(len(test_corpus))

# Supervised training with the default estimator, then tagging accuracy on
# the held-out sentences. Recent NLTK releases name this method `accuracy`
# and deprecate `evaluate`.
hmm = trainer.train_supervised(train_corpus)
print('%.2f%%' % (100 * hmm.evaluate(test_corpus)))

700
104
1908
630
70
28.76%
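
One plausible contributor to the low accuracy reported above is that `train_supervised`, when called without an estimator, falls back to maximum-likelihood estimates. These assign zero probability to any word that never appeared with a given tag in the training sentences. NLTK's `train_supervised` accepts an `estimator` argument for this purpose, and a smoothed estimator such as one built on `nltk.probability.LidstoneProbDist` is a common choice. The sketch below illustrates the effect in plain Python with made-up counts; `mle_prob` and `lidstone_prob` are hypothetical helper names, and `gamma = 0.1` is an arbitrary value chosen for illustration.

```python
from collections import Counter

# Made-up emission counts for a single tag (e.g. NN); not taken from the
# Brown corpus.
counts = Counter({"language": 5, "processing": 4, "model": 1})
vocab_size = 1908  # number of distinct words, as printed for `symbols`


def mle_prob(word, counts):
    """Maximum-likelihood estimate: relative frequency, zero if unseen."""
    return counts[word] / sum(counts.values())


def lidstone_prob(word, counts, gamma, bins):
    """Lidstone estimate: add gamma to each of the `bins` possible counts."""
    return (counts[word] + gamma) / (sum(counts.values()) + gamma * bins)


unseen_mle = mle_prob("tagger", counts)
unseen_smoothed = lidstone_prob("tagger", counts, 0.1, vocab_size)
print(unseen_mle)        # 0.0: an unseen word zeroes out the whole path
print(unseen_smoothed)   # small but non-zero
```

Because the HMM multiplies emission probabilities along a path, a single zero makes every candidate tag sequence for that sentence equally impossible, which is why smoothing can matter so much on held-out text. Whether it accounts for the result here would need to be checked by retraining with a smoothed estimator.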


# References


- Chopra, Deepti, Nisheeth Joshi, and Iti Mathur. Mastering Natural Language Processing with Python. Packt Publishing Ltd, 2016.
- https://tedboy.github.io/nlps/generated/generated/nltk.tag.HiddenMarkovModelTrainer.train_supervised.html
- https://github.com/PacktPublishing/Mastering-Natural-Language-Processing-with-Python
    
