# HMMs for POS Tagging

- 📺 **Video:** [https://youtu.be/wijpAX_LLXo](https://youtu.be/wijpAX_LLXo)

## Overview
- Apply HMMs to part-of-speech tagging by modeling tags as hidden states and words as emissions.
- Evaluate accuracy on simple sentences.
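
Accuracy for a tagger is usually reported per token. The following is a minimal sketch of that metric, not the lecture's own evaluation code; the function name `token_accuracy` and the gold and predicted tag lists are hypothetical examples.

```python
def token_accuracy(gold_tags, pred_tags):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for gold, pred in zip(gold_tags, pred_tags):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sequences must have equal length")
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)
    return correct / total if total else 0.0

# Hypothetical gold and predicted tags for two short sentences.
gold = [['DET', 'NOUN', 'VERB'], ['DET', 'NOUN', 'VERB']]
pred = [['DET', 'NOUN', 'VERB'], ['DET', 'VERB', 'VERB']]
print(token_accuracy(gold, pred))
```

Here five of the six tokens match, so the printed value is 5/6.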

## Key ideas
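- **States as tags:** each hidden state is a tag, and the transition probabilities capture tag bigram patterns such as DET→NOUN.
- **Emissions as words:** each tag has a probability distribution over word types, which links tags to the observed words.
- **Decoding:** the Viterbi algorithm finds the highest-probability tag sequence by dynamic programming.
- **Supervision:** estimate parameters by counting in a tagged corpus, or learn them without labels using EM (Baum-Welch).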
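
The ideas above can be written compactly. The notation here is an assumption made for illustration ($w_{1:T}$ for the words, $y_{1:T}$ for the tags, $v_t(s)$ for the best log-score of any path ending in tag $s$ at position $t$), and, like the demo below, it omits an explicit end-of-sentence transition. The joint probability of a tagged sentence factorizes as

$$
P(w_{1:T}, y_{1:T}) = P(y_1)\,P(w_1 \mid y_1)\prod_{t=2}^{T} P(y_t \mid y_{t-1})\,P(w_t \mid y_t)
$$

and Viterbi maximizes its logarithm with the recurrence

$$
v_1(s) = \log P(s) + \log P(w_1 \mid s), \qquad
v_t(s) = \max_{s'}\bigl[v_{t-1}(s') + \log P(s \mid s')\bigr] + \log P(w_t \mid s).
$$

The best final score is $\max_s v_T(s)$; the tag sequence is recovered by following the maximizing $s'$ at each step backwards.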

## Demo
Train a tiny POS-tag HMM from labeled examples and decode a new sentence, echoing the lecture (https://youtu.be/IUgnk58HvDQ).

In [1]:
import numpy as np
from collections import defaultdict

train_sentences = [
    (['the', 'cat', 'sleeps'], ['DET', 'NOUN', 'VERB']),
    (['a', 'dog', 'runs'], ['DET', 'NOUN', 'VERB']),
    (['the', 'dog', 'barks'], ['DET', 'NOUN', 'VERB'])
]

states = sorted({tag for _, tags in train_sentences for tag in tags})
words = sorted({w for sent, _ in train_sentences for w in sent} | {'purrs'})  # 'purrs' is unseen in training; adding it to the vocabulary gives it smoothed emission mass
state_to_id = {s: i for i, s in enumerate(states)}
word_to_id = {w: i for i, w in enumerate(words)}

start_counts = np.zeros(len(states))
trans_counts = np.zeros((len(states), len(states)))
emiss_counts = np.zeros((len(states), len(words)))

for words_seq, tags_seq in train_sentences:
    start_counts[state_to_id[tags_seq[0]]] += 1
    for i in range(len(tags_seq)):
        emiss_counts[state_to_id[tags_seq[i]], word_to_id[words_seq[i]]] += 1
        if i < len(tags_seq) - 1:
            trans_counts[state_to_id[tags_seq[i]], state_to_id[tags_seq[i+1]]] += 1

start = (start_counts + 1) / (start_counts.sum() + len(states))  # add-one (Laplace) smoothing, as in the next two lines
trans = (trans_counts + 1) / (trans_counts.sum(axis=1, keepdims=True) + len(states))
emiss = (emiss_counts + 1) / (emiss_counts.sum(axis=1, keepdims=True) + len(words))

sentence = ['the', 'cat', 'purrs']
t = len(sentence)
log_start = np.log(start)
log_trans = np.log(trans)
log_emiss = np.log(emiss)

viterbi = np.full((t, len(states)), -np.inf)
backpointer = np.zeros((t, len(states)), dtype=int)

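first_id = word_to_id.get(sentence[0])  # look up the first word instead of hard-coding 'the'
viterbi[0] = log_start + (log_emiss[:, first_id] if first_id is not None else -10.0)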
for i in range(1, t):
    obs_id = word_to_id.get(sentence[i], None)
    for s in range(len(states)):
        obs_log = log_emiss[s, obs_id] if obs_id is not None else -10.0
        scores = viterbi[i-1] + log_trans[:, s] + obs_log
        best = np.argmax(scores)
        viterbi[i, s] = scores[best]
        backpointer[i, s] = best

best_last = np.argmax(viterbi[-1])
best_path = [best_last]
for i in range(t-1, 0, -1):
    best_path.append(backpointer[i, best_path[-1]])

pred_tags = [states[idx] for idx in reversed(best_path)]
print('Sentence:', sentence)
print('Predicted tags:', pred_tags)


Sentence: ['the', 'cat', 'purrs']
Predicted tags: ['DET', 'NOUN', 'VERB']
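
One way to sanity-check a Viterbi implementation is to compare it with exhaustive enumeration on a small problem. The sketch below is self-contained and does not reuse the demo's variables: the function names `viterbi_decode` and `brute_force_decode`, the helper `random_log_rows`, and the randomly generated parameters are all hypothetical, chosen only so the two decoders can be compared.

```python
import itertools
import numpy as np

def viterbi_decode(log_start, log_trans, log_emiss, obs):
    """Return the best state path and its log-score for observation ids `obs`."""
    T, S = len(obs), len(log_start)
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0] = log_start + log_emiss[:, obs[0]]
    for t in range(1, T):
        # cand[prev, cur]: best score ending in prev, extended by prev -> cur
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emiss[:, obs[t]]
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score[-1].max())

def brute_force_decode(log_start, log_trans, log_emiss, obs):
    """Score every possible state path; only feasible for tiny inputs."""
    S = len(log_start)
    best_path, best_score = None, -np.inf
    for path in itertools.product(range(S), repeat=len(obs)):
        s = log_start[path[0]] + log_emiss[path[0], obs[0]]
        for t in range(1, len(obs)):
            s += log_trans[path[t - 1], path[t]] + log_emiss[path[t], obs[t]]
        if s > best_score:
            best_path, best_score = list(path), float(s)
    return best_path, best_score

# Hypothetical parameters: 3 states, 4 word ids, random rows normalized to sum to 1.
rng = np.random.default_rng(0)

def random_log_rows(shape):
    p = rng.random(shape)
    return np.log(p / p.sum(axis=-1, keepdims=True))

log_start = random_log_rows(3)
log_trans = random_log_rows((3, 3))
log_emiss = random_log_rows((3, 4))
obs = [0, 2, 1, 3]

v_path, v_score = viterbi_decode(log_start, log_trans, log_emiss, obs)
b_path, b_score = brute_force_decode(log_start, log_trans, log_emiss, obs)
print(v_path, b_path)
```

Enumeration visits every one of the $S^T$ paths, so it only works as a test oracle for very short sentences; Viterbi reaches the same answer in $O(T S^2)$ time.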


## Try it
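- Modify the demo, for example by changing `sentence` or the add-one smoothing constant, and check whether the predicted tags change.
- Add a tiny dataset or a counter-example, such as a sentence whose correct tags the HMM gets wrong.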


*Links only; we do not redistribute slides or papers.*