
v4.3.0: Wav2Vec2, ConvBERT, BORT, Amazon SageMaker

@LysandreJik released this 08 Feb 17:45

Wav2Vec2 from facebook (@patrickvonplaten)

Two new models are released as part of the Wav2Vec2 implementation: Wav2Vec2Model and Wav2Vec2ForMaskedLM, in PyTorch.

Wav2Vec2 is a multi-modal model, combining speech and text. It is the first multi-modal model of its kind to be added to Transformers.

The Wav2Vec2 model was proposed in wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed and Michael Auli.
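As a rough illustration of the PyTorch API, the snippet below runs `Wav2Vec2Model` on one second of synthetic 16 kHz audio. It is a minimal sketch, assuming `transformers` (4.3.0 or later) and PyTorch are installed. To avoid a download, it builds a randomly initialized model from a `Wav2Vec2Config` with made-up, deliberately tiny Transformer sizes, so the outputs carry no meaning. For real use you would load one of the Hub checkpoints with `from_pretrained` instead.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Tiny Transformer on top of the default convolutional feature encoder.
# These sizes are illustrative assumptions, not a released checkpoint.
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
# Randomly initialized; eval() turns off dropout and any training-time masking.
model = Wav2Vec2Model(config).eval()

# One second of fake 16 kHz mono audio, shape (batch, num_samples).
waveform = torch.randn(1, 16_000)

with torch.no_grad():
    outputs = model(waveform)

hidden = outputs.last_hidden_state
# One vector of size hidden_size per ~20 ms frame of audio.
print(hidden.shape)  # torch.Size([1, 49, 32])
```

The frame count follows from the default feature encoder (kernels 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2): the total stride is 320 samples, so 16,000 samples yield 49 frames.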

Compatible checkpoints can be found on the Hub:

Available notebooks:


Future Additions

  • Enable fine-tuning and pretraining for Wav2Vec2
  • Add an example script with a dependency on wav2letter/flashlight
  • Add an encoder-decoder Wav2Vec2 model


ConvBERT

The ConvBERT model was proposed in ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.

Six new models are released as part of the ConvBERT implementation: ConvBertModel, ConvBertForMaskedLM, ConvBertForSequenceClassification, ConvBertForTokenClassification, ConvBertForQuestionAnswering and ConvBertForMultipleChoice. These models are available both in PyTorch and TensorFlow.
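The ConvBERT classes follow the usual Transformers pattern of a base model plus task-specific heads. The sketch below, assuming `transformers` 4.3.0+ and PyTorch are installed, instantiates `ConvBertForSequenceClassification` from a tiny random config and checks the shape of the classification logits. The sizes and `num_labels=3` are illustrative assumptions, not a released checkpoint.

```python
import torch
from transformers import ConvBertConfig, ConvBertForSequenceClassification

# Tiny, randomly initialized configuration (sizes are illustrative only).
config = ConvBertConfig(
    vocab_size=100,
    hidden_size=32,
    embedding_size=32,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=37,
    num_labels=3,
)
model = ConvBertForSequenceClassification(config).eval()

# A batch of 2 sequences of 7 random token ids each, no padding.
input_ids = torch.randint(0, config.vocab_size, (2, 7))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits

# One score per label for each sequence in the batch.
print(logits.shape)  # torch.Size([2, 3])
```

The other heads (`ConvBertForTokenClassification`, `ConvBertForQuestionAnswering`, and so on) are constructed the same way from a config or with `from_pretrained`.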



BORT

The BORT model was proposed in Optimal Subarchitecture Extraction for BERT by Amazon's Adrian de Wynter and Daniel J. Perry. It is an optimal subset of architectural parameters for the BERT architecture, which the authors refer to as "Bort".

The BORT model can be loaded directly into the BERT architecture; therefore, all BERT model heads are available for BORT.
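Because BORT reuses the BERT architecture, its weights can be loaded into any BERT head class, and `from_pretrained` only initializes the new head layers. The sketch below reproduces that mechanism offline, assuming `transformers` and PyTorch are installed: a tiny, randomly initialized `BertModel` saved to a temporary directory stands in for the BORT checkpoint, and four BERT heads are loaded from it. With network access you would pass the Hub id of the BORT checkpoint (assumed here to be `amazon/bort`) instead of the local path.

```python
import tempfile

from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertForQuestionAnswering,
    BertForSequenceClassification,
    BertForTokenClassification,
    BertModel,
)

# Stand-in for a BORT checkpoint: a tiny random BertModel (sizes are arbitrary).
base = BertModel(
    BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=4,
        intermediate_size=37,
    )
)

head_classes = [
    BertForMaskedLM,
    BertForSequenceClassification,
    BertForTokenClassification,
    BertForQuestionAnswering,
]

with tempfile.TemporaryDirectory() as tmp:
    base.save_pretrained(tmp)
    # Each head reuses the saved encoder weights; only the head layers are
    # freshly initialized (transformers logs a warning listing them).
    heads = {cls.__name__: cls.from_pretrained(tmp) for cls in head_classes}

for name, model in heads.items():
    print(name, "wraps", type(model.bert).__name__)
```

The warnings about newly initialized weights are expected: the heads still need fine-tuning on a downstream task.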


Trainer now supports Amazon SageMaker's data parallel library (@sgugger)

When a script using Trainer runs on Amazon SageMaker with SageMaker's data parallelism library enabled, Trainer automatically uses the smdistributed library. All maintained examples have been tested with this functionality. Here is an overview of the SageMaker data parallelism library.

  • When on SageMaker use their env variables for saves #9876 (@sgugger)
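Because Trainer switches to smdistributed on its own, a training script needs no extra calls. To show the moving parts, the sketch below gives (1) the `distribution` setting that, in the SageMaker Python SDK, enables the data parallel library for a training job, and (2) hypothetical helpers that check from inside the job whether the launcher requested it. The environment variable `SM_FRAMEWORK_PARAMS` and the key `sagemaker_distributed_dataparallel_enabled` follow what later Transformers releases inspect; treat them, and the helper names, as assumptions rather than the 4.3.0 implementation.

```python
import importlib.util
import json
import os
from typing import Mapping

# Passed as `distribution=` to a SageMaker estimator (SageMaker Python SDK)
# to enable the data parallel library for the training job.
SMDP_DISTRIBUTION = {"smdistributed": {"dataparallel": {"enabled": True}}}


def sagemaker_dp_requested(environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: did the launcher ask for SageMaker data parallelism?

    Assumes SageMaker exposes its framework parameters as a JSON object in
    the SM_FRAMEWORK_PARAMS environment variable.
    """
    raw = environ.get("SM_FRAMEWORK_PARAMS", "{}")
    try:
        params = json.loads(raw)
    except json.JSONDecodeError:
        return False  # malformed value: treat as "not requested"
    if not isinstance(params, dict):
        return False
    return bool(params.get("sagemaker_distributed_dataparallel_enabled", False))


def sagemaker_dp_available(environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: requested by the launcher and smdistributed is installed."""
    return (
        sagemaker_dp_requested(environ)
        and importlib.util.find_spec("smdistributed") is not None
    )


if __name__ == "__main__":
    print("data parallelism requested:", sagemaker_dp_requested())
    print("data parallelism usable:", sagemaker_dp_available())
```

Checking both the launcher setting and the installed package avoids enabling smdistributed outside SageMaker, where the package is normally absent.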

Community page

A new Community Page has been added to the docs. It contains all the notebooks contributed by the community, as well as some community projects built around Transformers. Feel free to open a PR if you want your project to be showcased!

Additional model architectures

DeBERTa now has more model heads available.
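As an illustration, the sketch below builds two DeBERTa heads, token classification and extractive question answering, from tiny random configs and checks their output shapes. It assumes PyTorch and a Transformers release that ships `DebertaForTokenClassification` and `DebertaForQuestionAnswering` (recent releases do). These notes do not list which heads were added, so treat this pair as an example and the config sizes as arbitrary.

```python
import torch
from transformers import DebertaConfig, DebertaForQuestionAnswering, DebertaForTokenClassification

# Shared tiny sizes (illustrative only, not a released checkpoint).
common = dict(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=37,
)

# Token classification: one score per label for every token.
token_model = DebertaForTokenClassification(DebertaConfig(**common, num_labels=5)).eval()
# Question answering keeps the default num_labels=2, split into start/end logits.
qa_model = DebertaForQuestionAnswering(DebertaConfig(**common)).eval()

input_ids = torch.randint(0, common["vocab_size"], (2, 8))

with torch.no_grad():
    token_logits = token_model(input_ids=input_ids).logits
    qa_out = qa_model(input_ids=input_ids)

print(token_logits.shape)        # torch.Size([2, 8, 5])
print(qa_out.start_logits.shape) # torch.Size([2, 8])
```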

BART, mBART, Marian, Pegasus and Blenderbot now have decoder-only model architectures, so they can be used in decoder-only settings.
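A minimal sketch of the decoder-only usage, assuming `transformers` 4.3.0+ and PyTorch are installed: `BartForCausalLM` builds only the BART decoder stack, with no encoder, and returns next-token logits like any causal language model. The tiny config is an arbitrary illustration; the other listed models expose analogous classes (`MBartForCausalLM`, `MarianForCausalLM`, `PegasusForCausalLM`, `BlenderbotForCausalLM`).

```python
import torch
from transformers import BartConfig, BartForCausalLM

# Tiny, randomly initialized BART config (sizes are illustrative only).
config = BartConfig(
    vocab_size=100,
    d_model=16,
    encoder_layers=2,
    decoder_layers=2,
    encoder_attention_heads=4,
    decoder_attention_heads=4,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
    max_position_embeddings=32,
)
model = BartForCausalLM(config).eval()  # wraps only the decoder

# Ids start at 3 to stay clear of BART's default special tokens (0, 1, 2).
input_ids = torch.randint(3, config.vocab_size, (2, 6))

with torch.no_grad():
    logits = model(input_ids=input_ids).logits  # (batch, seq_len, vocab_size)

# Greedy pick of the next token after each sequence, just to show how the
# causal logits are read.
next_token = logits[:, -1].argmax(dim=-1)
print(logits.shape, next_token.shape)  # torch.Size([2, 6, 100]) torch.Size([2])
```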

Breaking changes


General improvements and bugfixes