# Part-of-Speech Tagging

- 📺 **Video:** [https://youtu.be/Llw6qfeAWDs](https://youtu.be/Llw6qfeAWDs)

## Overview
- Frame POS tagging as sequence labeling that assigns a grammatical tag to each token.
- Highlight supervised approaches with lexical and contextual features.
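
To make the sequence-labeling framing concrete, here is a minimal sketch of the simplest lexical tagger: give every token the tag it carried most often in training. This most-frequent-tag baseline is an illustration swapped in here, not a method taken from the lecture; the toy sentences, the `train_baseline` and `tag_sentence` helpers, and the `NOUN` fallback for unseen words are all assumptions.

```python
from collections import Counter, defaultdict

# Toy tagged sentences (hypothetical data, chosen only for illustration).
tagged_sentences = [
    (["the", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
    (["a", "dog", "runs"], ["DET", "NOUN", "VERB"]),
    (["the", "run", "ends"], ["DET", "NOUN", "VERB"]),
]

def train_baseline(data):
    """Count how often each lowercased word form appears with each tag."""
    counts = defaultdict(Counter)
    for words, tags in data:
        # Sequence labeling: exactly one tag per token.
        assert len(words) == len(tags)
        for word, tag in zip(words, tags):
            counts[word.lower()][tag] += 1
    return counts

def tag_sentence(counts, words, default_tag="NOUN"):
    """Give each token its most frequent training tag; unseen words get default_tag."""
    return [
        counts[w.lower()].most_common(1)[0][0] if w.lower() in counts else default_tag
        for w in words
    ]

counts = train_baseline(tagged_sentences)
baseline_tags = tag_sentence(counts, ["the", "dog", "sleeps"])
print(baseline_tags)  # ['DET', 'NOUN', 'VERB']
```

Because this baseline ignores context, it cannot tag an ambiguous word form differently in different sentences, which is the gap contextual features are meant to close.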

## Key ideas
- **Contextual features:** surrounding words and affixes inform tag decisions.
- **Sequence constraints:** tags follow patterns (DET→NOUN) that models can exploit.
- **Evaluation:** per-token accuracy gives an overall score, and confusion matrices show which tags are mistaken for which.
- **Applications:** POS tags serve as input to parsers and information extraction systems.
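
The sequence-constraint idea can be sketched by counting tag bigrams and turning them into conditional probabilities, so that a pattern like DET→NOUN shows up as a high-probability transition. The tag sequences and the `transition_probs` helper below are hypothetical, and this is only a counting sketch, not a full sequence model.

```python
from collections import Counter

# Hypothetical tag sequences, one list per sentence.
tag_sequences = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB"],
    ["ADJ", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN", "VERB"],
]

def transition_probs(sequences):
    """Estimate P(current tag | previous tag) by relative frequency."""
    bigrams = Counter()
    prev_totals = Counter()
    for tags in sequences:
        # Pad with a start symbol so sentence-initial tags are counted too.
        padded = ["<START>"] + tags
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            prev_totals[prev] += 1
    return {pair: n / prev_totals[pair[0]] for pair, n in bigrams.items()}

probs = transition_probs(tag_sequences)
print(round(probs[("DET", "NOUN")], 3))  # 0.667
print(round(probs[("DET", "ADJ")], 3))   # 0.333
print(probs[("NOUN", "VERB")])           # 1.0
```

A tagger that scores whole tag sequences can combine such transition estimates with per-token evidence; a per-token classifier sees neighboring words but never the neighboring tags.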

## Demo
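Use scikit-learn with hand-crafted features to train a simple POS tagger on toy sentences, mirroring the lecture (https://youtu.be/Ez4Wshp72zk). The report below scores the tagger on its own training tokens, so it checks that the features fit the toy data rather than how well the tagger generalizes.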

In [1]:
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.multiclass import OneVsRestClassifier

# Toy training set: (tokens, tags) pairs with one tag per token.
train_data = [
    (['the', 'cat', 'sleeps'], ['DET', 'NOUN', 'VERB']),
    (['a', 'dog', 'runs'], ['DET', 'NOUN', 'VERB']),
    (['the', 'dog', 'barks'], ['DET', 'NOUN', 'VERB']),
    (['lazy', 'dog', 'sleeps'], ['ADJ', 'NOUN', 'VERB']),
    (['quick', 'cat', 'runs'], ['ADJ', 'NOUN', 'VERB'])
]

def token_features(sent, idx):
    """Build lexical and contextual features for the token at position idx."""
    word = sent[idx]
    prev = sent[idx-1] if idx > 0 else '<START>'
    next_w = sent[idx+1] if idx < len(sent) - 1 else '<END>'
    return {
        'word': word,
        'lower': word.lower(),
        'suffix3': word[-3:],
        'prefix1': word[0],
        'prev': prev,
        'next': next_w
    }

# Flatten the sentences into one feature dict and one label per token.
X_feat, y = [], []
for words, tags in train_data:
    for i in range(len(words)):
        X_feat.append(token_features(words, i))
        y.append(tags[i])

# One-hot encode the string-valued features.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(X_feat)
# One-vs-rest logistic regression: one binary classifier per tag.
clf = OneVsRestClassifier(LogisticRegression(max_iter=2000, random_state=0))
clf.fit(X, y)

# Sanity check only: these predictions are on the training tokens themselves.
pred = clf.predict(X)
print(classification_report(y, pred, digits=3))


              precision    recall  f1-score   support

         ADJ      1.000     1.000     1.000         2
         DET      1.000     1.000     1.000         3
        NOUN      1.000     1.000     1.000         5
        VERB      1.000     1.000     1.000         5

    accuracy                          1.000        15
   macro avg      1.000     1.000     1.000        15
weighted avg      1.000     1.000     1.000        15
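
A classification report gives per-tag scores but does not show which tags are mistaken for which. The sketch below computes token accuracy and a confusion matrix with scikit-learn's `accuracy_score` and `confusion_matrix`; the `gold` and `pred` lists are hypothetical and are not output from the demo above.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

labels = ["ADJ", "DET", "NOUN", "VERB"]

# Hypothetical gold and predicted tags for eight tokens (not demo output).
gold = ["DET", "NOUN", "VERB", "ADJ", "NOUN", "VERB", "DET", "NOUN"]
pred = ["DET", "NOUN", "VERB", "NOUN", "NOUN", "VERB", "DET", "ADJ"]

acc = accuracy_score(gold, pred)
# Rows are gold tags and columns are predicted tags, both in `labels` order.
cm = confusion_matrix(gold, pred, labels=labels)

print(f"token accuracy: {acc:.3f}")  # token accuracy: 0.750
for label, row in zip(labels, cm):
    print(f"{label:>5}", row)
```

The off-diagonal cells carry the diagnosis: here one gold ADJ was predicted as NOUN and one gold NOUN as ADJ, the kind of pairwise confusion a single accuracy number hides.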





## Try it
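- Modify the demo: drop the `word` and `lower` features and check whether the affix and context features alone still separate the tags.
- Add a tiny held-out dataset or a counter-example, such as a word whose tag depends on context (for example, `runs` used as a noun), and compare the predicted tags with the gold tags.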
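
One way to start on these exercises is to tag a sentence the model has not seen. The sketch below is a self-contained variant of the demo, under some assumptions: it wraps `DictVectorizer` and `LogisticRegression` in a scikit-learn pipeline with default multiclass handling, and the `featurize` helper and the held-out sentence are hypothetical. No expected output is shown because the predictions depend on the fitted weights.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Same toy training sentences as the demo.
train_data = [
    (["the", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
    (["a", "dog", "runs"], ["DET", "NOUN", "VERB"]),
    (["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
    (["lazy", "dog", "sleeps"], ["ADJ", "NOUN", "VERB"]),
    (["quick", "cat", "runs"], ["ADJ", "NOUN", "VERB"]),
]

def token_features(sent, idx):
    """Lexical and contextual features for the token at position idx."""
    word = sent[idx]
    return {
        "lower": word.lower(),
        "suffix3": word[-3:],
        "prefix1": word[0],
        "prev": sent[idx - 1] if idx > 0 else "<START>",
        "next": sent[idx + 1] if idx < len(sent) - 1 else "<END>",
    }

def featurize(words):
    """Hypothetical helper: one feature dict per token of a sentence."""
    return [token_features(words, i) for i in range(len(words))]

X_train, y_train = [], []
for words, tags in train_data:
    X_train.extend(featurize(words))
    y_train.extend(tags)

# The pipeline vectorizes feature dicts and classifies them in one object.
tagger = make_pipeline(
    DictVectorizer(),
    LogisticRegression(max_iter=2000, random_state=0),
)
tagger.fit(X_train, y_train)

# Held-out sentence: a word order that never occurs in the training data.
held_out = ["a", "lazy", "cat", "barks"]
predicted = [str(t) for t in tagger.predict(featurize(held_out))]
for word, tag in zip(held_out, predicted):
    print(f"{word}\t{tag}")
```

Feature values that never appeared in training are ignored by `DictVectorizer` at prediction time, so an unseen word is tagged from whatever affix and context features it shares with the training tokens.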


## References
- [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/pdf/1810.04805.pdf)
- [To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks](https://www.aclweb.org/anthology/W19-4302/)
- [GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding](https://arxiv.org/pdf/1804.07461.pdf)
- [What Does BERT Look At? An Analysis of BERT's Attention](https://arxiv.org/abs/1906.04341)
- [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/pdf/1907.11692.pdf)
- [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)
- [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf)
- [UnifiedQA: Crossing Format Boundaries With a Single QA System](https://arxiv.org/abs/2005.00700)
- [Neural Machine Translation of Rare Words with Subword Units](https://arxiv.org/pdf/1508.07909.pdf)
- [Byte Pair Encoding is Suboptimal for Language Model Pretraining](https://arxiv.org/pdf/2004.03720.pdf)
- [Eisenstein 8.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 7.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 7.4](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 7.4.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 7.3](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [TnT - A Statistical Part-of-Speech Tagger](https://arxiv.org/abs/cs/0003055)
- [Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger](https://www.aclweb.org/anthology/W00-1308/)
- [Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?](https://link.springer.com/chapter/10.1007/978-3-642-19400-9_14)
- [Natural Language Processing with Small Feed-Forward Networks](https://www.aclweb.org/anthology/D17-1309.pdf)
- [Eisenstein 10.1-10.2](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 10.3-10.4](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 10.3.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Accurate Unlexicalized Parsing](https://www.aclweb.org/anthology/P03-1054/)
- [Eisenstein 10.5](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 11.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Finding Optimal 1-Endpoint-Crossing Trees](https://www.aclweb.org/anthology/Q13-1002/)
- [Eisenstein 11.3](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)


*Links only; we do not redistribute slides or papers.*