# LM Evaluation

- 📺 **Video:** [https://youtu.be/ImW4vJ5XZQc](https://youtu.be/ImW4vJ5XZQc)

## Overview
This video discusses how to evaluate language models and introduces perplexity as the main metric. It defines the perplexity (PP) of a model on a test set as the geometric mean of the inverse probabilities the model assigns to each word, which equals the exponential of the average negative log-likelihood per word. A lower perplexity means the model is less “surprised” by the test data, and hence is a better model of that data.
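Written out, the definition above takes the following form. The notation is an assumption made here for illustration, not taken from the video: $w_1, \dots, w_N$ are the $N$ test-set words and $P(w_i \mid w_{<i})$ is the probability the model assigns to word $w_i$ given the preceding words.

```latex
\mathrm{PP}(w_1, \dots, w_N)
  = \left( \prod_{i=1}^{N} \frac{1}{P(w_i \mid w_{<i})} \right)^{1/N}
  = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \ln P(w_i \mid w_{<i}) \right)
```

The first form is the geometric mean of the inverse probabilities; the second is the exponential of the average negative log-likelihood. They are equal because the logarithm turns the product into a sum.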

In [None]:
import os, random
random.seed(0)
CI = os.environ.get('CI') == 'true'

## Key ideas
- Perplexity can be read as an effective branching factor: a model with perplexity 50 is, on average, as uncertain as if it had to choose among 50 equally likely words at each step.
- Perplexity is easiest to understand by computing it on a tiny example, which the lecture may do for clarity.
- Cross-entropy is an information-theoretic measure closely tied to log-likelihood: it is the average negative log-likelihood per word. Minimizing cross-entropy is equivalent to maximizing likelihood, and perplexity is 2^(cross-entropy) when logs are base 2 (or e^(cross-entropy) with natural logs).
- Perplexity is an intrinsic evaluation: it measures how well the model predicts data, but not directly how useful the model is for a downstream task.
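The relationships in the list above can be checked numerically. The following is a minimal sketch, not code from the lecture: the helper `perplexity` and the probability lists are hypothetical, chosen only to illustrate the "50 equally likely words" reading and the link between cross-entropy and perplexity.

```python
import math

def perplexity(probs, base=2.0):
    """Return (perplexity, cross_entropy) for the per-token probabilities
    a model assigned to a sequence. `probs` is a hypothetical input here."""
    # Cross-entropy: average negative log-probability per token, in the given base.
    cross_entropy = -sum(math.log(p, base) for p in probs) / len(probs)
    # Perplexity is the log base raised to the cross-entropy.
    return base ** cross_entropy, cross_entropy

# A model that assigns probability 1/50 to every token it sees.
uniform_probs = [1 / 50] * 10
pp_uniform, h_uniform = perplexity(uniform_probs)
print(f"uniform over 50 words: H = {h_uniform:.3f} bits, PP = {pp_uniform:.1f}")

# The choice of base cancels out: natural logs give the same perplexity.
pp_nat, h_nat = perplexity(uniform_probs, base=math.e)
print(f"same model, natural logs: H = {h_nat:.3f} nats, PP = {pp_nat:.1f}")

# A non-uniform case: perplexity equals the geometric mean of inverse probabilities.
mixed_probs = [0.5, 0.25, 0.125, 0.125]
pp_mixed, h_mixed = perplexity(mixed_probs)
geo_mean = math.prod(1 / p for p in mixed_probs) ** (1 / len(mixed_probs))
print(f"mixed: H = {h_mixed:.2f} bits, PP = {pp_mixed:.4f}, geometric mean = {geo_mean:.4f}")
```

Cross-entropy depends on the log base (bits versus nats), but perplexity does not, which is one reason it is convenient to report.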

## Demo

In [None]:
print('Try the exercises below and follow the linked materials.')
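As a starting point for the exercises, here is a small sketch of perplexity on a toy dataset. Everything in it is an assumption made for illustration rather than material from the video: the two toy sentences, the add-one smoothed unigram model, the `<unk>` token for unseen words, and the helper names `unigram_prob` and `perplexity`.

```python
import math
from collections import Counter

# Hypothetical toy corpus: a training text and two test texts.
train_tokens = "the cat sat on the mat the dog sat on the log".split()
test_seen = "the cat sat on the log".split()
test_unseen = "a bird flew over the house".split()

counts = Counter(train_tokens)
vocab = set(train_tokens) | {"<unk>"}  # reserve one slot for unseen words
total = len(train_tokens)

def unigram_prob(word, alpha=1.0):
    """Add-alpha smoothed unigram probability; unseen words map to <unk>."""
    if word not in counts:
        word = "<unk>"
    return (counts[word] + alpha) / (total + alpha * len(vocab))

def perplexity(tokens):
    """exp of the average negative log-likelihood per token."""
    avg_nll = -sum(math.log(unigram_prob(w)) for w in tokens) / len(tokens)
    return math.exp(avg_nll)

pp_seen = perplexity(test_seen)
pp_unseen = perplexity(test_unseen)
print(f"perplexity on familiar text:   {pp_seen:.2f}")
print(f"perplexity on unfamiliar text: {pp_unseen:.2f}")
```

The text made of words the model saw in training should get the lower perplexity, since the model is less "surprised" by it. Smoothing matters here: without it, a single unseen word would have probability zero and the perplexity would be infinite.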

## Try it
- Modify the demo
- Add a tiny dataset or counter-example


## References
- [Eisenstein 6.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 6.2](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 6.3](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 6.4](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [[Blog] Understanding LSTMs](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
- [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/abs/1409.0473)
- [Attention Is All You Need](https://arxiv.org/pdf/1706.03762.pdf)
- [[Blog] The Illustrated Transformer](http://jalammar.github.io/illustrated-transformer/)
- [Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation](https://arxiv.org/abs/2108.12409)
- [The Impact of Positional Encoding on Length Generalization in Transformers](https://arxiv.org/abs/2305.19466)


*Links only; we do not redistribute slides or papers.*