truncation-sampling

This repository contains the experiments for the paper Truncation Sampling as Language Model Desmoothing. It compares and evaluates existing truncation sampling methods, including top-p (nucleus) sampling and typical decoding, alongside two new methods: epsilon sampling and our proposed eta sampling.
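
As a quick, self-contained illustration of eta sampling (a minimal sketch, not the code in this repository), the snippet below truncates a next-token distribution at the threshold min(epsilon, sqrt(epsilon) * exp(-entropy)) from the paper and renormalizes; the tensor names and the epsilon value are illustrative.

    import torch

    def eta_truncate(probs, epsilon=2e-4):
        """Zero out tokens below the eta threshold and renormalize.

        probs: 1-D tensor of next-token probabilities (e.g., softmax of logits).
        """
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
        eta = min(epsilon, epsilon ** 0.5 * float(torch.exp(-entropy)))
        truncated = torch.where(probs < eta, torch.zeros_like(probs), probs)
        return truncated / truncated.sum()

    # Example: sample one token from a (stand-in) truncated distribution.
    probs = torch.softmax(torch.randn(50257), dim=-1)
    next_token = torch.multinomial(eta_truncate(probs), num_samples=1)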

Getting started

Start by installing some packages:

    pip install -r requirements.txt

Finding MAUVE-maximizing hyperparameters

In many experiments, we set each method's truncation-severity hyperparameter by maximizing the MAUVE score on open-ended generation from WebText prefixes (in-distribution for the GPT-2 models tested). For those generations and results, and to replicate the experiments, see our fork of the MAUVE paper repository.
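
If you just want to compute a MAUVE score directly rather than through that fork, the public mauve-text package exposes compute_mauve; the sketch below assumes you already have lists of human and model-generated continuations (human_texts and generated_texts are placeholder names), and the parameter values are illustrative.

    import mauve  # pip install mauve-text

    # human_texts, generated_texts: lists of strings, assumed to exist already.
    out = mauve.compute_mauve(
        p_text=human_texts,
        q_text=generated_texts,
        device_id=0,          # GPU used for featurization
        max_text_length=256,  # truncate each text before featurization
        verbose=False,
    )
    print(out.mauve)  # scalar score in (0, 1]; higher means closer to human text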

Human evaluation

In these experiments, we took samples generated by GPT-2 Large from prefixes in the WebText data under different truncation sampling methods and asked humans to state their preferences between them. In particular, to test long-document plausibility, we showed annotators the shared prefix, along with the last 70 words generated by two truncation sampling methods for that prefix (or the real human-written suffix), and asked which suffix more plausibly came from the same document as the prefix. See our paper for more details and a screenshot of our Mechanical Turk study. In the human_eval subdirectory, we provide the results and the analysis scripts for this portion of the study.

Note that we follow the MAUVE authors in skipping low-quality prefixes in the WebText distribution when choosing the set of prefixes to generate from.

See the README.md in the human_eval directory for more details.

Automatic evaluations

Repetition Analysis

To run the repetition analysis (Section 5.4), run

    python src/simple_repetition.py \
        --model_string {gpt2,gpt2_medium,gpt2_large,gpt2_xl} \
        {p,t,e,h}  # top-p, typical, epsilon, eta
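
As a rough picture of what a repetition check can look like (a generic sketch under our own assumptions, not src/simple_repetition.py), the function below flags a generation whose final tokens form a short repeating loop; the period and repeat thresholds are illustrative.

    def ends_in_loop(tokens, max_period=10, min_repeats=3):
        """Return True if `tokens` ends with a block of length <= max_period
        repeated at least min_repeats times in a row."""
        for period in range(1, max_period + 1):
            block = tokens[-period:]
            needed = period * min_repeats
            if len(tokens) >= needed and tokens[-needed:] == block * min_repeats:
                return True
        return False

    print(ends_in_loop([5, 8, 2, 7, 2, 7, 2, 7]))  # True: "2 7" repeats three times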

Entropy Analysis

To run the entropy analysis (Section 5.3), run

    python src/entropy_tv_tradeoff.py
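
The script name suggests measuring the tradeoff between how much a truncation rule reduces entropy and how far it moves the distribution in total variation; the snippet below computes both quantities for a single stand-in distribution under epsilon truncation, as a hedged illustration of that tradeoff rather than a reproduction of the script.

    import torch

    def entropy(p):
        p = p[p > 0]
        return float(-(p * p.log()).sum())

    def truncate_epsilon(p, eps):
        q = torch.where(p < eps, torch.zeros_like(p), p)
        return q / q.sum()

    p = torch.softmax(torch.randn(50257), dim=-1)  # stand-in next-token distribution
    for eps in (1e-5, 1e-4, 1e-3):
        q = truncate_epsilon(p, eps)
        tv = 0.5 * float((p - q).abs().sum())  # total variation distance to the original
        print(f"eps={eps}: entropy drop={entropy(p) - entropy(q):.3f}, TV={tv:.3f}")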

Individual Distribution Analysis

To run the individual distribution analysis (Section 5.4), run

    python src/make_cutoff_plots.py 
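
If you want a quick picture of where different cutoffs land on a single distribution without running the script, the sketch below plots sorted token probabilities and marks the epsilon and eta thresholds; everything here (the random stand-in distribution, the epsilon value, the output filename) is illustrative.

    import torch
    import matplotlib.pyplot as plt

    # Stand-in next-token distribution; in practice this comes from a model.
    probs, _ = torch.sort(torch.softmax(torch.randn(50257), dim=-1), descending=True)

    epsilon = 3e-4
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    eta = min(epsilon, epsilon ** 0.5 * float(torch.exp(-entropy)))

    plt.plot(probs.numpy(), label="sorted token probabilities")
    plt.axhline(epsilon, linestyle="--", label="epsilon cutoff")
    plt.axhline(eta, linestyle=":", label="eta cutoff")
    plt.xscale("log")
    plt.yscale("log")
    plt.legend()
    plt.savefig("cutoff_illustration.png")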

Citation

If this work is useful to you, please cite it with

@inproceedings{hewitt2022truncation,
 author = {Hewitt, John and Manning, Christopher D. and Liang, Percy},
 booktitle = {Findings of the Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP)},
 title = {Truncation Sampling as Language Model Desmoothing},
 url = {https://arxiv.org/pdf/2210.15191.pdf},
 year = {2022}
}
