# HMMs: Viterbi Algorithm

- 📺 **Video:** [https://youtu.be/Ks7IrsjhqSo](https://youtu.be/Ks7IrsjhqSo)

## Overview
- Decode the most likely hidden state path given observations using the Viterbi algorithm.
- Track backpointers to recover the best sequence efficiently.
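
The decoding objective and the recurrence behind it can be written as follows. The notation is assumed for illustration: $\pi_s$ is the start probability, $a_{s's}$ the transition probability from $s'$ to $s$, and $b_s(x)$ the probability that state $s$ emits observation $x$. In the demo these correspond to `start`, `trans`, and `emiss`.

$$
\hat{y}_{1:T} = \arg\max_{y_{1:T}} \Big[ \log \pi_{y_1} + \log b_{y_1}(x_1) + \sum_{t=2}^{T} \big( \log a_{y_{t-1} y_t} + \log b_{y_t}(x_t) \big) \Big]
$$

$$
v_1(s) = \log \pi_s + \log b_s(x_1), \qquad
v_t(s) = \max_{s'} \big[ v_{t-1}(s') + \log a_{s' s} \big] + \log b_s(x_t)
$$

$$
\mathrm{bp}_t(s) = \arg\max_{s'} \big[ v_{t-1}(s') + \log a_{s' s} \big], \qquad
\hat{y}_T = \arg\max_s v_T(s), \qquad
\hat{y}_{t-1} = \mathrm{bp}_t(\hat{y}_t)
$$

In the demo, $v_t(s)$ is stored in `viterbi[t, s]` and $\mathrm{bp}_t(s)$ in `backpointer[t, s]`, with time 0-indexed.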

## Key ideas
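- **Dynamic programming:** for each (time, state) pair, store the score of the best path that ends in that state at that time, computed from the previous time step's scores.
- **Log-domain:** add log probabilities instead of multiplying probabilities to avoid numerical underflow on long sequences; because log is monotonic, the best path is unchanged.
- **Backpointers:** record the argmax predecessor for each cell, then trace back from the best final state to reconstruct the optimal path.
- **Complexity:** O(T · |S|²) time for T observations and |S| states (each step compares every previous state with every next state), plus O(T · |S|) memory for the score and backpointer tables; brute-force search over all |S|^T paths would be exponential in T.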
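
A quick standalone illustration of why the log domain matters (made-up factors, not part of the lecture): multiplying many small probabilities underflows to zero in float64, while summing their logs stays finite.

```python
import math

# 400 factors of 0.1: the true product is 1e-400, far below the
# smallest positive float64 (about 4.9e-324), so it underflows to 0.0.
probs = [0.1] * 400

product = 1.0
for p in probs:
    product *= p

# Summing logs keeps the same information in a representable range.
log_sum = sum(math.log(p) for p in probs)

print(product)  # 0.0
print(log_sum)  # about -921.03
```

Because log is monotonic, comparing summed log-scores ranks paths exactly as comparing their products would, so Viterbi can work entirely in the log domain.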

## Demo
Run Viterbi decoding on a toy weather HMM with two hidden states (Rainy, Sunny) and three observable activities (walk, shop, clean), mirroring the walkthrough in the lecture (https://youtu.be/mbmk5J--K2w).

```python
import numpy as np

# Toy weather HMM; all parameters are stored as log-probabilities.
states = ['Rainy', 'Sunny']
obs = ['walk', 'shop', 'clean']
start = np.log(np.array([0.6, 0.4]))                          # start[s]
trans = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))            # trans[prev, next]
emiss = np.log(np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]))  # emiss[state, obs]
sequence = ['walk', 'shop', 'clean']
obs_index = {w: i for i, w in enumerate(obs)}

T = len(sequence)
num_states = len(states)
# viterbi[t, s]: best log-score of any path ending in state s at time t
viterbi = np.full((T, num_states), -np.inf)
# backpointer[t, s]: previous state on that best path
backpointer = np.zeros((T, num_states), dtype=int)

# Initialization: start log-probability plus emission of the first observation
viterbi[0] = start + emiss[:, obs_index[sequence[0]]]

# Recursion: extend the best path into each state s at time t
for t in range(1, T):
    for s in range(num_states):
        scores = viterbi[t-1] + trans[:, s]
        best_prev = np.argmax(scores)
        viterbi[t, s] = scores[best_prev] + emiss[s, obs_index[sequence[t]]]
        backpointer[t, s] = best_prev

# Termination and backtrace: start from the best final state, follow backpointers
best_last = np.argmax(viterbi[-1])
best_path = [best_last]
for t in range(T-1, 0, -1):
    best_path.append(backpointer[t, best_path[-1]])

best_path = list(reversed([states[idx] for idx in best_path]))
print('Best state sequence:', best_path)
```

Output:

```
Best state sequence: ['Sunny', 'Rainy', 'Rainy']
```
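
As a cross-check (a sketch, not the lecture's code), the same decoder can be written with one vectorized |S| × |S| max per time step, which makes the O(T · |S|²) cost explicit, and compared against brute-force enumeration of all |S|^T paths. `viterbi_vectorized` and `path_log_score` are hypothetical helper names introduced here.

```python
import itertools
import numpy as np

# Same toy weather HMM as the demo, in log space.
states = ['Rainy', 'Sunny']
obs_index = {'walk': 0, 'shop': 1, 'clean': 2}
start = np.log(np.array([0.6, 0.4]))
trans = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))            # trans[prev, next]
emiss = np.log(np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]))  # emiss[state, obs]
sequence = ['walk', 'shop', 'clean']


def viterbi_vectorized(seq):
    """Return (best path as state indices, its log-score)."""
    x = [obs_index[w] for w in seq]
    v = start + emiss[:, x[0]]
    backptrs = []
    for t in range(1, len(x)):
        # scores[prev, next] = v[prev] + log a(prev -> next): an |S| x |S| table
        scores = v[:, None] + trans
        backptrs.append(scores.argmax(axis=0))  # best prev for each next state
        v = scores.max(axis=0) + emiss[:, x[t]]
    path = [int(v.argmax())]
    for bp in reversed(backptrs):
        path.append(int(bp[path[-1]]))
    return path[::-1], float(v.max())


def path_log_score(path, seq):
    """Joint log-probability log p(path, seq) under the toy HMM."""
    x = [obs_index[w] for w in seq]
    score = start[path[0]] + emiss[path[0], x[0]]
    for t in range(1, len(x)):
        score += trans[path[t - 1], path[t]] + emiss[path[t], x[t]]
    return float(score)


best_path, best_score = viterbi_vectorized(sequence)

# Brute force scores all |S|^T paths: exponential, so only viable for toy inputs.
brute_path = max(itertools.product(range(len(states)), repeat=len(sequence)),
                 key=lambda p: path_log_score(p, sequence))

print([states[i] for i in best_path])        # ['Sunny', 'Rainy', 'Rainy']
print(round(float(np.exp(best_score)), 5))   # 0.01344
```

Both methods agree here; only the Viterbi version stays practical as T grows.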


## Try it
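- Modify the demo: change the start, transition, or emission probabilities (or the observation sequence) and check how the decoded path changes.
- Add a tiny dataset or counter-example, e.g. a case where picking the locally best state at each step disagrees with the Viterbi path.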
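
One possible counter-example, assuming a hypothetical greedy baseline that commits to the locally best state at each step and never backtracks. On the demo's own HMM and sequence, greedy decoding picks Sunny, Sunny, Rainy, whose joint probability is lower than that of the Viterbi path.

```python
import numpy as np

# Same toy weather HMM as the demo, in log space.
states = ['Rainy', 'Sunny']
obs_index = {'walk': 0, 'shop': 1, 'clean': 2}
start = np.log(np.array([0.6, 0.4]))
trans = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))            # trans[prev, next]
emiss = np.log(np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]))  # emiss[state, obs]
sequence = ['walk', 'shop', 'clean']


def greedy_decode(seq):
    """Hypothetical baseline: choose the best next state given only the last choice."""
    x = [obs_index[w] for w in seq]
    path = [int(np.argmax(start + emiss[:, x[0]]))]
    for t in range(1, len(x)):
        local = trans[path[-1]] + emiss[:, x[t]]
        path.append(int(np.argmax(local)))
    return path


def joint_prob(path, seq):
    """Joint probability p(path, seq), computed in log space then exponentiated."""
    x = [obs_index[w] for w in seq]
    logp = start[path[0]] + emiss[path[0], x[0]]
    for t in range(1, len(x)):
        logp += trans[path[t - 1], path[t]] + emiss[path[t], x[t]]
    return float(np.exp(logp))


greedy = greedy_decode(sequence)
viterbi_path = [1, 0, 0]  # Sunny, Rainy, Rainy: the demo's decoded path

print([states[i] for i in greedy])                   # ['Sunny', 'Sunny', 'Rainy']
print(round(joint_prob(greedy, sequence), 5))        # 0.00864
print(round(joint_prob(viterbi_path, sequence), 5))  # 0.01344
```

Greedy decoding commits to Sunny at the second step because it looks locally better, while Viterbi keeps both options alive and discovers that passing through Rainy yields the higher-scoring complete path.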

