v4.13.0: Perceiver, ImageGPT, mLUKE, Vision-Text dual encoders, QDQBert, new documentation frontend
New Model additions
Perceiver
Eight new models are released as part of the Perceiver implementation: `PerceiverModel`, `PerceiverForMaskedLM`, `PerceiverForSequenceClassification`, `PerceiverForImageClassificationLearned`, `PerceiverForImageClassificationFourier`, `PerceiverForImageClassificationConvProcessing`, `PerceiverForOpticalFlow`, and `PerceiverForMultimodalAutoencoding`, in PyTorch.
The Perceiver IO model was proposed in Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch,
Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M.
Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
- Add Perceiver IO by @NielsRogge in #14487
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=perceiver
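As a quick sketch of the new API (the checkpoint name is one of the hub checkpoints linked above; the example assumes network access):

```python
import requests
import torch
from PIL import Image
from transformers import PerceiverFeatureExtractor, PerceiverForImageClassificationLearned

# Load an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# "deepmind/vision-perceiver-learned" is an ImageNet-trained checkpoint from the hub
feature_extractor = PerceiverFeatureExtractor.from_pretrained("deepmind/vision-perceiver-learned")
model = PerceiverForImageClassificationLearned.from_pretrained("deepmind/vision-perceiver-learned")

encoding = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    # Perceiver models take a generic `inputs` tensor rather than `pixel_values`
    logits = model(inputs=encoding.pixel_values).logits

predicted_label = model.config.id2label[logits.argmax(-1).item()]
```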
mLUKE
The mLUKE tokenizer is added. The tokenizer can be used for the multilingual variant of LUKE.
The mLUKE model was proposed in mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka. It's a multilingual extension
of the LUKE model trained on the basis of XLM-RoBERTa.
- Add mLUKE by @Ryou0634 in #14640
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=luke
ImageGPT
Three new models are released as part of the ImageGPT integration: `ImageGPTModel`, `ImageGPTForCausalImageModeling`, and `ImageGPTForImageClassification`, in PyTorch.
The ImageGPT model was proposed in Generative Pretraining from Pixels by Mark
Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever. ImageGPT (iGPT) is a GPT-2-like
model trained to predict the next pixel value, allowing for both unconditional and conditional image generation.
- Add ImageGPT by @NielsRogge in #14240
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=imagegpt
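A minimal sketch of extracting features with the base model (the checkpoint name is one of the hub checkpoints linked above; the example assumes network access):

```python
import requests
import torch
from PIL import Image
from transformers import ImageGPTFeatureExtractor, ImageGPTModel

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# "openai/imagegpt-small" is the smallest of the released checkpoints
feature_extractor = ImageGPTFeatureExtractor.from_pretrained("openai/imagegpt-small")
model = ImageGPTModel.from_pretrained("openai/imagegpt-small")

# The feature extractor downsizes the image and maps each pixel to a color-cluster index
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state  # one hidden state per pixel position
```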
QDQBert
Eight new models are released as part of the QDQBert implementation: `QDQBertModel`, `QDQBertLMHeadModel`, `QDQBertForMaskedLM`, `QDQBertForSequenceClassification`, `QDQBertForNextSentencePrediction`, `QDQBertForMultipleChoice`, `QDQBertForTokenClassification`, and `QDQBertForQuestionAnswering`, in PyTorch.
The QDQBERT model can be referenced in Integer Quantization for Deep Learning Inference: Principles and Empirical
Evaluation by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius
Micikevicius.
- Add QDQBert model and quantization examples of SQUAD task by @shangz-ai in #14066
Semantic Segmentation models
The semantic segmentation models' API is unstable and may change between this version and the next.
The first semantic segmentation models are added. In semantic segmentation, the goal is to predict a class label for every pixel of an image. The models that are added are SegFormer (by NVIDIA) and BEiT (by Microsoft Research). BEiT was already available in the library, but this release includes the model with a semantic segmentation head.
The SegFormer model was proposed in SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo. The model consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on image segmentation benchmarks such as ADE20K and Cityscapes.
The BEiT model was proposed in BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei. Rather than pre-training the model to predict the class of an image (as done in the original ViT paper), BEiT models are pre-trained to predict visual tokens from the codebook of OpenAI’s DALL-E model given masked patches.
- Add SegFormer by @NielsRogge in #14019
- Add BeitForSemanticSegmentation by @NielsRogge in #14096
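As a sketch of per-pixel prediction with the new SegFormer head (the ADE20K-fine-tuned checkpoint below is an example; the example assumes network access):

```python
import requests
import torch
from PIL import Image
from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# An ADE20K-fine-tuned checkpoint from NVIDIA (150 semantic classes)
checkpoint = "nvidia/segformer-b0-finetuned-ade-512-512"
feature_extractor = SegformerFeatureExtractor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    # logits have shape (batch, num_labels, height / 4, width / 4)
    logits = model(**inputs).logits
```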
Vision-text dual encoder
Adds the `VisionTextDualEncoderModel` in PyTorch and Flax, which makes it possible to load any pre-trained vision model (ViT, DeiT, BEiT, CLIP's vision model) and text model (BERT, RoBERTa) in the library for vision-text tasks like CLIP.
This model pairs a vision encoder and a text encoder and adds projection layers to project the embeddings into a shared embedding space, which can then be used to align the two modalities.
- VisionTextDualEncoder by @patil-suraj in #13511
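For instance, pairing a ViT vision encoder with a BERT text encoder might look like this (the two checkpoints are example choices, not prescribed by the release notes):

```python
from transformers import (
    AutoFeatureExtractor,
    AutoTokenizer,
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
)

# Pair any pre-trained vision and text encoders from the library
model = VisionTextDualEncoderModel.from_vision_text_pretrained(
    "google/vit-base-patch16-224", "bert-base-uncased"
)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
processor = VisionTextDualEncoderProcessor(feature_extractor, tokenizer)
# Note: the projection layers are newly initialized, so the model needs
# CLIP-style contrastive fine-tuning before its similarity scores are meaningful.
```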
CodeParrot
CodeParrot, a model trained to generate code, has been open-sourced in the research projects by @lvwerra.
Language model support for ASR
- Add language model support for CTC models by @patrickvonplaten in #14339
Language model boosted decoding is added for all CTC models via https://github.com/kensho-technologies/pyctcdecode and https://github.com/kpu/kenlm.
See https://huggingface.co/patrickvonplaten/wav2vec2-xlsr-53-es-kenlm for more information.
Flax-specific additions
This release adds Flax versions of the vision encoder-decoder model and of GPT-J.
- Add FlaxVisionEncoderDecoderModel by @ydshieh in #13359
- FlaxGPTJ by @patil-suraj in #14396
TensorFlow-specific additions
Vision transformers are here! Convnets are so 2012, now that ML is converging on self-attention as a universal model.
Want to handle real-world tables, where text and data are positioned in a 2D grid? TAPAS is now here for both TensorFlow and PyTorch.
- Tapas tf by @kamalkraj in #13393
Automatic checkpointing and cloud saves to the HuggingFace Hub during training are now live, allowing you to resume training when it's interrupted, even if your initial instance is terminated. This is an area of very active development - watch this space for future developments, including automatic model card creation and more.
- Add model checkpointing to push_to_hub and PushToHubCallback by @Rocketknight1 in #14492
Auto-processors
A new class to automatically select processors is added: `AutoProcessor`. It can be used for all models that require a processor, in both computer vision and audio.
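For illustration, `AutoProcessor` infers the right processor class from the checkpoint's configuration (the two checkpoints below are example choices):

```python
from transformers import AutoProcessor

# A vision-text checkpoint resolves to a CLIP processor...
clip_processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")
# ...and an audio checkpoint resolves to a Wav2Vec2 processor
speech_processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base-960h")
```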
New documentation frontend
A new documentation frontend is out for the `transformers` library! The goal of this documentation is to be better aligned with the rest of our website, and it contains tools to improve readability. The documentation can now be written in Markdown rather than RST.
LayoutLM Improvements
The LayoutLMv2 feature extractor now supports non-English languages, and LayoutXLM gets its own processor.
- LayoutLMv2FeatureExtractor now supports non-English languages when applying Tesseract OCR. by @Xargonus in #14514
- Add LayoutXLMProcessor (and LayoutXLMTokenizer, LayoutXLMTokenizerFast) by @NielsRogge in #14115
Trainer Improvements
You can now take advantage of Ampere hardware with the Trainer:
- `--bf16`: do training or eval in mixed precision of bfloat16
- `--bf16_full_eval`: do eval in full bfloat16
- `--tf32`: control having TF32 mode on/off
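For instance, a hypothetical invocation of a Trainer-based example script using the new flags (script name, task, and paths are placeholders):

```shell
# run_glue.py stands in for any Trainer-based example script
python run_glue.py \
  --model_name_or_path bert-base-cased \
  --task_name mrpc \
  --do_train --do_eval \
  --bf16 --bf16_full_eval \
  --tf32 true \
  --output_dir /tmp/mrpc
```

Note that `--bf16` and `--tf32` require Ampere (or newer) GPUs, and `--tf32` additionally requires a recent PyTorch.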
Improvements and bugfixes
- Replace assertions with RuntimeError exceptions by @ddrm86 in #14186
- Adding `batch_size` support for (almost) all pipelines by @Narsil in #13724
- Remove n_ctx from configs by @thomasw21 in #14165
- Add `BlenderbotTokenizerFast` by @stancld in #13720
- Adding `handle_long_generation` parameters for `text-generation` pipeline. by @Narsil in #14118
- Fix pipeline tests env and fetch by @sgugger in #14209
- Generalize problem_type to all sequence classification models by @sgugger in #14180
- Fixing image segmentation with inference mode. by @Narsil in #14204
- Add a condition for checking labels by @hrxorxm in #14211
- Torch 1.10 by @LysandreJik in #14169
- Add more missing models to models/init.py by @ydshieh in #14177
- Clarify QA examples by @NielsRogge in #14172
- Fixing `image-segmentation` tests. by @Narsil in #14223
- Tensor location is already handled by @Narsil in #14224
- Raising exceptions instead of using assertions for few models by @pdcoded in #14219
- Fix the write problem in trainer.py comment by @wmathor in #14202
- [GPTJ] enable common tests and few fixes by @patil-suraj in #14190
- improving efficiency of mlflow metric logging by @wamartin-aml in #14232
- Fix generation docstring by @qqaatw in #14216
- Fix test_configuration_tie in FlaxEncoderDecoderModelTest by @ydshieh in #14076
- [Tests] Fix DistilHubert path by @anton-l in #14245
- Add PushToHubCallback in main init by @sgugger in #14246
- Fixes Beit training for PyTorch 1.10+ by @sgugger in #14249
- Added Beit model ouput class by @lumliolum in #14133
- Update Transformers to huggingface_hub >= 0.1.0 by @sgugger in #14251
- Add cross attentions to TFGPT2Model by @ydshieh in #14038
- [Wav2Vec2] Adapt conversion script by @patrickvonplaten in #14258
- Put `load_image` function in `image_utils.py` & fix image rotation issue by @mishig25 in #14062
- minimal fixes to run DataCollatorForWholeWordMask with return_tensors="np" and return_tensors="tf" by @dwyatte in #13891
- Adding support for `truncation` parameter on `feature-extraction` pipeline. by @Narsil in #14193
- Fix of issue #13327: Wrong weight initialization for TF t5 model by @dshirron in #14241
- Fixing typo in error message. by @Narsil in #14226
- Pin Keras cause they messed their release by @sgugger in #14262
- Quality explain by @sgugger in #14264
- Add more instructions to the release guide by @sgugger in #14263
- Fixing slow pipeline tests by @Narsil in #14260
- Fixing mishandling of `ignore_labels`. by @Narsil in #14274
- improve rewrite state_dict missing _metadata by @changwangss in #14276
- Removing Keras version pinning by @Rocketknight1 in #14280
- Pin TF until tests are fixed by @sgugger in #14283
- [Hubert Docs] Make sure example uses a fine-tuned model by @patrickvonplaten in #14291
- Add new LFS prune API by @sgugger in #14294
- Remove `DPRPretrainedModel` from docs by @xhlulu in #14300
- Handle long answer needs to be updated. by @Narsil in #14279
- [tests] Fix SegFormer and BEiT tests by @NielsRogge in #14289
- Fix typo on PPLM example README by @Beomi in #14287
- [Marian Conversion] Fix eos_token_id conversion in conversion script by @patrickvonplaten in #14320
- [Tests] Update audio classification tests to support torch 1.10 by @anton-l in #14318
- [TFWav2Vec2Model] Fix input shapes in TFWav2Vec2WeightNormConv1D by @anton-l in #14319
- Fixing tests on master. by @Narsil in #14317
- Fixing mutable default argument in `pipeline`. by @Narsil in #14316
- Changed relative imports to absolute to allow convert_graph_to_onnx.py to run as a script. by @nbertagnolli in #14325
- Expand dynamic supported objects to configs and tokenizers by @sgugger in #14296
- [deepspeed] Enable multiple test runs on single box, defer to DS_TEST_PORT if set by @jeffra in #14331
- Small change to Wav2Vec2 model to support Tensor-Parallelism with DeepSpeed by @RezaYazdaniAminabadi in #14298
- Correct order of overflowing tokens for LayoutLmV2 tokenizer by @Apoorvgarg-creator in #13495
- Update Seq2Seq QA example script to use SQuAD metric. by @karthikrangasai in #14335
- remove an irrelevant test from test_modeling_tf_layoutlm by @ydshieh in #14341
- bump flax version by @patil-suraj in #14343
- Rewrite guides for fine-tuning with Datasets by @stevhliu in #13923
- [Bert2Bert] allow bert2bert + relative embeddings by @patrickvonplaten in #14324
- Support for TF >= 2.7 by @sgugger in #14345
- `BatchFeature`: Convert `List[np.ndarray]` to `np.ndarray` before converting to pytorch tensors by @eladsegal in #14306
- Adding some quality of life for `pipeline` function. by @Narsil in #14322
- Fix fast tokenization problems by @qqaatw in #13930
- Add notebook INC quantization for text classification tasks by @echarlaix in #14293
- enhance rewrite state_dict missing _metadata by @changwangss in #14348
- Fix list index out of range when padding nested empty lists by @qqaatw in #13876
- [testing] solve the port conflict by @stas00 in #14362
- Fix Flax params dtype by @patil-suraj in #13098
- [flax generate] allow passing params to encode by @patil-suraj in #14370
- Experimenting with adding proper get_config() and from_config() methods by @Rocketknight1 in #14361
- Fixing requirements for TF LM models and use correct model mappings by @Rocketknight1 in #14372
- fix loading flax bf16 weights in pt by @patil-suraj in #14369
- [wav2vec2] fix --gradient_checkpointing by @stas00 in #13964
- Adding support for raw python `generator` in addition to `Dataset` for pipelines by @Narsil in #14352
- minor doc fix by @patil-suraj in #14377
- [Wav2Vec2 Example] Improve fine-tuning script by @patrickvonplaten in #14373
- Use `AlbertConverter` for FNet instead of using FNet's own converter by @qqaatw in #14365
- Add support for WMT21 tokenizer in M2M100Tokenizer by @patil-suraj in #14376
- [M2M100Tokenizer] fix _build_translation_inputs by @patil-suraj in #14382
- Raise exceptions instead of using asserts in modeling_openai #12789 by @nbertagnolli in #14386
- [doc] performance and parallelism updates by @stas00 in #14391
- Quick fix to TF summarization example by @Rocketknight1 in #14401
- [Speech2Text2] Enable tokenizers by @patrickvonplaten in #14390
- Fix TFViT by @NielsRogge in #14399
- Fix weight loading issue by @ydshieh in #14016
- Replace BertLayerNorm with LayerNorm by @eldarkurtic in #14385
- [Wav2Vec2] Make sure that gradient checkpointing is only run if needed by @patrickvonplaten in #14407
- Allow per-version configurations by @LysandreJik in #14344
- Fix gradient_checkpointing backward compatibility by @sgugger in #14408
- Add forward method to dummy models by @sgugger in #14419
- Avoid looping when data exhausted by @valentindey in #14413
- Debug doc by @sgugger in #14424
- [Wav2Vec2] Add New Wav2Vec2 Translation by @patrickvonplaten in #14392
- Improve semantic segmentation models by @NielsRogge in #14355
- [Gradient checkpoining] Update Wav2Vec scripts by @falcaopetri in #14036
- [Bart] Fix docs by @patrickvonplaten in #14434
- [WIP] Ensure TF model configs can be converted to proper JSON by @Zahlii in #14415
- Recover Deleted XNLI Instructions by @Helw150 in #14437
- Fix EncoderDecoderModel code example by @NielsRogge in #14441
- Add a post init method to all models by @sgugger in #14431
- Fix finite IterableDataset test on multiple GPUs by @sgugger in #14445
- [Bert, et al] fix early device assignment by @stas00 in #14447
- Add GitPython to quality tools by @LysandreJik in #14459
- [ImageGPT] Small fixes by @NielsRogge in #14460
- [Generation] Allow `inputs_embeds` as an input by @patrickvonplaten in #14443
- Adding support for `hidden_states` and `attentions` in unbatching support. by @Narsil in #14420
- add Tuple as possible type hint for EvalPredictions label_ids by @ameasure in #14473
- Fix dummy objects for quantization by @sgugger in #14478
- Moving pipeline tests from `Narsil` to `hf-internal-testing`. by @Narsil in #14463
- Improve `add-new-pipeline` docs a bit by @stancld in #14485
- [test] add test for --config_overrides by @stas00 in #14466
- Support for Training with BF16 by @JamesDeAntonis in #13207
- fixes some key names for in LayoutLMv2 / LayoutXLM tokenizers by @valentindey in #14493
- Switch from using sum for flattening lists of lists in group_texts by @nbroad1881 in #14472
- [deepspeed] zero inference by @stas00 in #14253
- add cache_dir for tokenizer verification loading by @vmaryasin in #14508
- Fix feature extraction utils import by @LysandreJik in #14515
- [Tests] Improve vision tests by @NielsRogge in #14458
- [CI] clear `~/.cache/torch_extensions` between builds by @stas00 in #14520
- Fix a slow test. by @Narsil in #14527
- added save_directories for _psave_pretrained_pt and _tf, changed model to tf_model and pt_model, enable the notebook to run cleanly from top to bottom without error by @cfregly in #14529
- Quicktour updates by @LysandreJik in #14533
- Fixes by @LysandreJik in #14534
- [flax] unfreeze initial cache in gpt models by @patil-suraj in #14535
- Tokenizers docs: Specify which class contains `__call__` method by @xhlulu in #14379
- Rename ImageGPT by @NielsRogge in #14526
- [Generate] Fix generate with inputs_embeds on GPU by @patrickvonplaten in #14564
- [Flax] token-classification model steps enumerate start from 1 by @kamalkraj in #14547
- Fix sentinel token IDs in data collator for Flax T5 pretraining script by @rahuln in #14477
- Fix backend regex by @sgugger in #14566
- [Flax] Add FlaxBlenderbot by @stancld in #13633
- Add documentation for multi-label classification by @gsnidero in #14168
- use functional interface for softmax in attention by @t-vi in #14198
- Fix mask token handling by @qqaatw in #14364
- [doc] bf16/tf32 guide by @stas00 in #14579
- Rename toctree.yml -> _toctree.yml by @mishig25 in #14594
- Update doc img links by @mishig25 in #14593
- Adds a git pull instruction to the documentation builder by @LysandreJik in #14597
- [Flax] Add FlaxBlenderbotSmall by @stancld in #14576
- Python 3.6 -> Python 3.7 for TF runs by @LysandreJik in #14598
- change tf.math.divide with int(/) in distilbert model by @yis11178 in #14600
- fix #14524 (IndexError when mask prob is too low) by @nikvaessen in #14525
- Improve tokenizer tests by @qqaatw in #13594
- [CI] move env print to util, add pt, nccl versions by @stas00 in #14607
- 2022 is the year of multi-modality by @LysandreJik in #14610
- Fix doc builder by @LysandreJik in #14616
- [trainer] add tf32-mode control by @stas00 in #14606
- Make DefaultDataCollator importable from root by @Rocketknight1 in #14588
- fix a typo by @yuchenlin in #14626
- updated pytorch token-classification readme by @kamalkraj in #14624
- Add Flax example tests by @patil-suraj in #14599
- fix typo by @patil-suraj in #14635
- add flax example tests in CI workflow by @patil-suraj in #14637
- [urls to hub] Replace outdated model tags with their now-canonical pipeline types by @julien-c in #14617
- Update the example of exporting Bart + BeamSearch to ONNX module to resolve comments. by @fatcat-z in #14310
- Add GPTJForQuestionAnswering by @tucan9389 in #14503
- doc: mismatch between pooler/d_output by @guhur in #14641
- fix flax example tests by @patil-suraj in #14643
- Auto processor fix by @LysandreJik in #14623
- Fix syntax for class references by @sgugger in #14644
- Add a job to test the documentation build by @sgugger in #14645
- fix flax examples tests by @patil-suraj in #14646
- Use cross_attention_hidden_size in Encoder-Decoder models by @ydshieh in #14378
- [deepspeed] fix --load_best_model_at_end by @stas00 in #14652
- quick fix SummarizationPipeline error messages by @NouamaneTazi in #14618
- Fix a Bug, trainer_seq2seq.py, in the else branch at Line 172, generation_inputs should be a dict by @TranSirius in #14546
- [trainer] conditional ctx managers into one wrapper by @stas00 in #14663
- Fixing Dataset for TQA + token-classification. by @Narsil in #14658
- fix deprecated tf method by @zoheth in #14671
- Fix doc builder by @LysandreJik in #14676
- [AutoProcessor] Add Wav2Vec2WithLM & small fix #14675 (@patrickvonplaten)
- Added support for other features for already supported models #14358 (@michaelbenayoun)
- Revert "Added support for other features for already supported models" #14679 (@lewtun)
- Convert tutorials #14665 (@sgugger)
- fix: verify jsonlines file in run_translation (#14660) #14661 (@GaurangTandon)
- Improvements to Comet Integration #14680 (@DN6)
- Fixes in init #14681 (@sgugger)
- Revert open-in-colab and add perceiver #14683 (@sgugger)
- Fix wrong checkpoint paths in doc examples #14685 (@ydshieh)
- [bf16 support] tweaks #14580 (@stas00)
- [trainer] support UserDict inputs (torch-nightly) #14688 (@stas00)
- Move pyctcdecode #14686 (@sgugger)
- Make MLuke tokenizer tests slow #14690 (@sgugger)
- Fix doc examples: name '...' is not defined #14687 (@ydshieh)
- Add a job to test doc building (for realsies this time) #14662 (@sgugger)
- Fix Perceiver tests #14703 (@NielsRogge)
- add str hub token to repository when provided else fallback to default #14682 (@philschmid)
- Fix typo in toctree #14704 (@mishig25)
New Contributors
- @hrxorxm made their first contribution in #14211
- @pdcoded made their first contribution in #14219
- @wmathor made their first contribution in #14202
- @wamartin-aml made their first contribution in #14232
- @lumliolum made their first contribution in #14133
- @dwyatte made their first contribution in #13891
- @dshirron made their first contribution in #14241
- @changwangss made their first contribution in #14276
- @xhlulu made their first contribution in #14300
- @Beomi made their first contribution in #14287
- @nbertagnolli made their first contribution in #14325
- @jeffra made their first contribution in #14331
- @RezaYazdaniAminabadi made their first contribution in #14298
- @echarlaix made their first contribution in #14293
- @valentindey made their first contribution in #14413
- @Zahlii made their first contribution in #14415
- @Helw150 made their first contribution in #14437
- @shangz-ai made their first contribution in #14066
- @vmaryasin made their first contribution in #14508
- @cfregly made their first contribution in #14529
- @Xargonus made their first contribution in #14514
- @rahuln made their first contribution in #14477
- @gsnidero made their first contribution in #14168
- @t-vi made their first contribution in #14198
- @JamesDeAntonis made their first contribution in #13207
- @yis11178 made their first contribution in #14600
- @nikvaessen made their first contribution in #14525
- @yuchenlin made their first contribution in #14626
- @Ryou0634 made their first contribution in #14640
- @NouamaneTazi made their first contribution in #14618
- @TranSirius made their first contribution in #14546
- @zoheth made their first contribution in #14671
Full Changelog: v4.12.0...v4.13.0