Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Generalized Entropy Regularization

This library is built on top of fairseq (pytorch).

Generalized entropy regularization can be used with any probabilistic model and data set. Just set the --criterion flag to jensen_cross_entropy and specify --alpha and --beta when running fairseq-train. Specify --use-uniform to use the uniform distribution as the baseline. Otherwise, the unigram distribution with annealing parameter --T will be used (run fairseq-train --help to see all options).


Neural Machine Translation

Preprocess data following examples in the Translation README. A convolutional model can then be trained on IWSLT'14 (De-En) with the following command:

fairseq-train data-bin/ \
			--arch fconv_iwslt_de_en --max-tokens 4000 --update-freq 8 \
            --clip-norm 0.1 --dropout 0.2 --criterion jensen_cross_entropy \
            --lr-scheduler fixed --min-lr 1e-8 --lr 0.5 --alpha 0.5 \
            --beta 0.7 --use-uniform

Likewise, a Transformer can be trained as follows:

fairseq-train data-bin/ \
		    --arch transformer_iwslt_de_en --share-decoder-input-output-embed \
		    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
		    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
		    --dropout 0.3 --weight-decay 0.0001 --max-tokens 4000 \
		    --criterion jensen_cross_entropy  --alpha 0.5 \
            --beta 0.7 --use-uniform

Generation is the same as in the Fairseq Translation README.

Abstractive Summarization

Download the CNN/DailyMail data set according to the BART README. Follow their suggested training method, setting --criterion to jensen_cross_entropy and specifying --alpha, --beta, and --use-uniform (if desired).

Other models can be trained using the same methodology as above. See fairseq documentation for more options.

Requirements and Installation

  • PyTorch version >= 1.2.0
  • Python version >= 3.6
  • For training new models, you'll also need an NVIDIA GPU and NCCL
  • For faster training install NVIDIA's apex library with the --cuda_ext and --deprecated_fused_adam options


git clone
cd entropy_regularization
pip install --editable .


This code is for the paper Generalized Entropy Regularization or: There’s Nothing Special about Label Smoothing featured in ACL 2020. Please cite as:

 title = {Generalized Entropy Regularization or: {T}here's Nothing Special about Label Smoothing},
 author = {Meister, Clara and 
Salesky, Elizabeth and 
Cotterell, Ryan},
 booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
 month = {July},
 year = {2020},
 address = {Seattle, USA},
 publisher = {Association for Computational Linguistics},

Feel free to include the fairseq citation as well; they're awesome.

    title = "fairseq: {A} Fast, Extensible Toolkit for Sequence Modeling",
    author = "Ott, Myle  and
      Edunov, Sergey  and
      Baevski, Alexei  and
      Fan, Angela  and
      Gross, Sam  and
      Ng, Nathan  and
      Grangier, David  and
      Auli, Michael",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics (Demonstrations)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "",
    doi = "10.18653/v1/N19-4009",
    pages = "48--53",


Code for Generalized Entropy Regularization paper



Code of conduct





No releases published


No packages published