# Nucleus Sampling

- 📺 **Video:** [https://youtu.be/JETxaSaj6_k](https://youtu.be/JETxaSaj6_k)

## Overview
Covers nucleus sampling (top-p sampling), an alternative decoding strategy geared towards more natural and less repetitive text generation. The video likely begins by pointing out the limitations of deterministic decoding such as beam search: for open-ended tasks (like story generation or chat), always taking the highest-probability continuation leads to safe, generic, sometimes dull output, or can get stuck in repetitive loops, a phenomenon called neural text degeneration (Holtzman et al., 2019). To produce varied, human-like text, we often introduce randomness into decoding.

In [None]:
import os, random
random.seed(0)
CI = os.environ.get('CI') == 'true'

## Key ideas
- Basic random sampling from the full softmax often yields incoherent text because the model's probability tail (many low-probability words) can be sampled, introducing nonsense.
- Top-k sampling was introduced to address this: sample only from the k most likely words.
- Nucleus sampling generalizes that idea: at each step, find the smallest set of words whose cumulative probability is ≥ p (e.g. p = 0.9), then sample from that set according to the model's probabilities, renormalized over the set. The set can be just one word (if one word has ≥ 90% probability, we take it deterministically), but when the distribution is more uncertain, the set includes more options.
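
The two truncation rules above can be sketched in a few lines of plain Python. This is a minimal illustration and not a library implementation: the function names (`top_k_filter`, `nucleus_filter`, `sample`) and the two toy distributions are made up for this example, and a real decoder would apply the same idea to the model's softmax output at every generation step.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {tok: prob / total for tok, prob in ranked}

def nucleus_filter(probs, p=0.9):
    """Keep the smallest set of top tokens whose cumulative mass is >= p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

def sample(probs, rng=random):
    """Draw one token in proportion to its (renormalized) probability."""
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical next-token distributions, invented for illustration.
peaked = {"the": 0.92, "a": 0.05, "cat": 0.02, "zebra": 0.01}
flat = {"the": 0.35, "a": 0.25, "cat": 0.20, "dog": 0.12, "zebra": 0.08}

peaked_nucleus = nucleus_filter(peaked, p=0.9)  # a single token survives
flat_nucleus = nucleus_filter(flat, p=0.9)      # four tokens survive
peaked_top3 = top_k_filter(peaked, k=3)         # always exactly three tokens

print(sorted(peaked_nucleus), sorted(flat_nucleus), sorted(peaked_top3))
print(sample(flat_nucleus))
```

The contrast is the point: with a fixed k = 3, top-k keeps three candidates even when the model is 92% sure of one word, whereas the nucleus shrinks to that single word for the peaked distribution and grows to four words for the flatter one.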

## Demo

In [None]:
print('Try the exercises below and follow the linked materials.')

## Try it
- Modify the demo
- Add a tiny dataset or counter-example


## References
- [Attention Is All You Need](https://arxiv.org/pdf/1706.03762.pdf)
- [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
- [Efficient Transformers: A Survey](https://arxiv.org/abs/2009.06732)
- [Rethinking Attention with Performers](https://arxiv.org/abs/2009.14794)
- [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150)
- [The Curious Case of Neural Text Degeneration](https://arxiv.org/abs/1904.09751)


*Links only; we do not redistribute slides or papers.*