BART

Overview

The Bart model was proposed in BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.

According to the abstract,

Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT).
The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme, where spans of text are replaced with a single mask token.
BART is particularly effective when fine tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE.

This model was contributed by sshleifer. The authors' code can be found here.

Usage tips:

BART is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than the left.
Sequence-to-sequence model with an encoder and a decoder. Encoder is fed a corrupted version of the tokens, decoder is fed the original tokens (but has a mask to hide the future words like a regular transformers decoder). A composition of the following transformations are applied on the pretraining tasks for the encoder:
- mask random tokens (like in BERT)
- delete random tokens
- mask a span of k tokens with a single mask token (a span of 0 tokens is an insertion of a mask token)
- permute sentences
- rotate the document to make it start at a specific token

Implementation Notes

Bart doesn't use token_type_ids for sequence classification. Use [BartTokenizer] or [~BartTokenizer.encode] to get the proper splitting.
The forward pass of [BartModel] will create the decoder_input_ids if they are not passed. This is different than some other modeling APIs. A typical use case of this feature is mask filling.
Model predictions are intended to be identical to the original implementation when forced_bos_token_id=0. This only works, however, if the string you pass to [fairseq.encode] starts with a space.
[~generation.GenerationMixin.generate] should be used for conditional generation tasks like summarization, see the example in that docstrings.
Models that load the facebook/bart-large-cnn weights will not have a mask_token_id, or be able to perform mask-filling tasks.

Mask Filling

The facebook/bart-base and facebook/bart-large checkpoints can be used to fill multi-token masks.

from transformers import BartForConditionalGeneration, BartTokenizer

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", forced_bos_token_id=0)
tok = BartTokenizer.from_pretrained("facebook/bart-large")
example_english_phrase = "UN Chief Says There Is No <mask> in Syria"
batch = tok(example_english_phrase, return_tensors="pt")
generated_ids = model.generate(batch["input_ids"])
assert tok.batch_decode(generated_ids, skip_special_tokens=True) == [
    "UN Chief Says There Is No Plan to Stop Chemical Weapons in Syria"
]

Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BART. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

A blog post on Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker.
A notebook on how to finetune BART for summarization with fastai using blurr. 🌎
A notebook on how to finetune BART for summarization in two languages with Trainer class. 🌎
[BartForConditionalGeneration] is supported by this example script and notebook.
[TFBartForConditionalGeneration] is supported by this example script and notebook.
[FlaxBartForConditionalGeneration] is supported by this example script.
An example of how to train [BartForConditionalGeneration] with a Hugging Face datasets object can be found in this forum discussion
Summarization chapter of the 🤗 Hugging Face course.
Summarization task guide

[BartForConditionalGeneration] is supported by this example script and notebook.
[TFBartForConditionalGeneration] is supported by this example script and notebook.
[FlaxBartForConditionalGeneration] is supported by this example script and notebook.
Masked language modeling chapter of the 🤗 Hugging Face Course.
Masked language modeling task guide

A notebook on how to finetune mBART using Seq2SeqTrainer for Hindi to English translation. 🌎
[BartForConditionalGeneration] is supported by this example script and notebook.
[TFBartForConditionalGeneration] is supported by this example script and notebook.
Translation task guide

BartConfig

[[autodoc]] BartConfig - all

BartTokenizer

[[autodoc]] BartTokenizer - all

BartTokenizerFast

[[autodoc]] BartTokenizerFast - all

BartModel

[[autodoc]] BartModel - forward

BartForConditionalGeneration

[[autodoc]] BartForConditionalGeneration - forward

BartForSequenceClassification

[[autodoc]] BartForSequenceClassification - forward

BartForQuestionAnswering

[[autodoc]] BartForQuestionAnswering - forward

BartForCausalLM

[[autodoc]] BartForCausalLM - forward

TFBartModel

[[autodoc]] TFBartModel - call

TFBartForConditionalGeneration

[[autodoc]] TFBartForConditionalGeneration - call

TFBartForSequenceClassification

[[autodoc]] TFBartForSequenceClassification - call

FlaxBartModel

[[autodoc]] FlaxBartModel - call - encode - decode

FlaxBartForConditionalGeneration

[[autodoc]] FlaxBartForConditionalGeneration - call - encode - decode

FlaxBartForSequenceClassification

[[autodoc]] FlaxBartForSequenceClassification - call - encode - decode

FlaxBartForQuestionAnswering

[[autodoc]] FlaxBartForQuestionAnswering - call - encode - decode

FlaxBartForCausalLM

[[autodoc]] FlaxBartForCausalLM - call

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bart.md

bart.md

BART

Overview

Usage tips:

Implementation Notes

Mask Filling

Resources

BartConfig

BartTokenizer

BartTokenizerFast

BartModel

BartForConditionalGeneration

BartForSequenceClassification

BartForQuestionAnswering

BartForCausalLM

TFBartModel

TFBartForConditionalGeneration

TFBartForSequenceClassification

FlaxBartModel

FlaxBartForConditionalGeneration

FlaxBartForSequenceClassification

FlaxBartForQuestionAnswering

FlaxBartForCausalLM

Files

bart.md

Latest commit

History

bart.md

File metadata and controls

BART

Overview

Usage tips:

Implementation Notes

Mask Filling

Resources

BartConfig

BartTokenizer

BartTokenizerFast

BartModel

BartForConditionalGeneration

BartForSequenceClassification

BartForQuestionAnswering

BartForCausalLM

TFBartModel

TFBartForConditionalGeneration

TFBartForSequenceClassification

FlaxBartModel

FlaxBartForConditionalGeneration

FlaxBartForSequenceClassification

FlaxBartForQuestionAnswering

FlaxBartForCausalLM