
v4.23.0: Whisper, Deformable DETR, Conditional DETR, MarkupLM, MSN, `safetensors`

Released by @LysandreJik on 10 Oct · 5141 commits to main since this release

Whisper

The Whisper model was proposed in Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

Whisper is an encoder-decoder Transformer trained on 680,000 hours of labeled (transcribed) audio. The model shows impressive performance and robustness in a zero-shot setting across multiple languages.
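
A minimal transcription sketch, assuming the release-era API and the `openai/whisper-tiny.en` checkpoint (swap in a real recording for the silent placeholder):

```python
import numpy as np
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

# Whisper expects 16 kHz mono audio; one second of silence stands in for a
# real recording (load one with e.g. librosa or the datasets library).
audio = np.zeros(16000, dtype=np.float32)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# The encoder consumes log-Mel features; the decoder generates text autoregressively.
generated_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```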

Deformable DETR

The Deformable DETR model was proposed in Deformable DETR: Deformable Transformers for End-to-End Object Detection by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.

Deformable DETR mitigates the slow convergence issues and limited feature spatial resolution of the original DETR by leveraging a new deformable attention module which only attends to a small set of key sampling points around a reference.
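
A hedged object-detection sketch; the `SenseTime/deformable-detr` checkpoint name follows the Hub, and the final step relies on the standardized post-processing methods described in the overhaul below:

```python
import torch
import requests
from PIL import Image
from transformers import AutoFeatureExtractor, DeformableDetrForObjectDetection

feature_extractor = AutoFeatureExtractor.from_pretrained("SenseTime/deformable-detr")
model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits into boxes, labels, and scores with the standardized
# post-processing method introduced in this release.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = feature_extractor.post_process_object_detection(
    outputs, threshold=0.7, target_sizes=target_sizes
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```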

Conditional DETR

The Conditional DETR model was proposed in Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.

Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than DETR.
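
Usage mirrors the Deformable DETR sketch above; only the classes and checkpoint (assumed to be `microsoft/conditional-detr-resnet-50`) change:

```python
from transformers import AutoFeatureExtractor, ConditionalDetrForObjectDetection

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/conditional-detr-resnet-50")
model = ConditionalDetrForObjectDetection.from_pretrained("microsoft/conditional-detr-resnet-50")
# Preprocessing and post_process_object_detection work as in the Deformable DETR sketch.
```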

Time Series Transformer

The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.

The model is trained in a similar way to how one would train an encoder-decoder Transformer (like T5 or BART) for machine translation; i.e. teacher forcing is used. At inference time, one can autoregressively generate samples, one time step at a time.

⚠️ This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs or slight breaking changes to fix in the future. If you see something strange, file a GitHub issue.
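
A minimal sketch of both modes with random tensors, assuming the release-era configuration and forward signature (all sizes below are illustrative):

```python
import torch
from transformers import TimeSeriesTransformerConfig, TimeSeriesTransformerForPrediction

config = TimeSeriesTransformerConfig(
    prediction_length=24,
    context_length=48,
    num_time_features=1,
    lags_sequence=[1, 2, 3],  # past window must cover context_length + max(lags_sequence)
)
model = TimeSeriesTransformerForPrediction(config)

batch_size = 2
past_length = config.context_length + max(config.lags_sequence)  # 48 + 3 = 51
past_values = torch.randn(batch_size, past_length)
past_time_features = torch.randn(batch_size, past_length, config.num_time_features)
past_observed_mask = torch.ones(batch_size, past_length)

# Training: teacher forcing against the ground-truth future window.
outputs = model(
    past_values=past_values,
    past_time_features=past_time_features,
    past_observed_mask=past_observed_mask,
    future_values=torch.randn(batch_size, config.prediction_length),
    future_time_features=torch.randn(batch_size, config.prediction_length, config.num_time_features),
)
outputs.loss.backward()

# Inference: autoregressively draw sample trajectories, one time step at a time.
predictions = model.generate(
    past_values=past_values,
    past_time_features=past_time_features,
    past_observed_mask=past_observed_mask,
    future_time_features=torch.randn(batch_size, config.prediction_length, config.num_time_features),
)
print(predictions.sequences.shape)  # (batch, num_parallel_samples, prediction_length)
```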

Masked Siamese Networks

The ViTMSN model was proposed in Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.

MSN (Masked Siamese Networks) consists of a joint-embedding architecture that matches the prototypes of masked patches with those of the unmasked patches. With this setup, the method yields excellent performance in the low-shot and extreme low-shot regimes for image classification, outperforming other self-supervised methods such as DINO. For instance, with 1% of ImageNet-1K labels, the method achieves 75.7% top-1 accuracy.
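
A minimal image-classification sketch, assuming the `facebook/vit-msn-small` checkpoint; this checkpoint ships a self-supervised backbone, so the classification head is randomly initialized until fine-tuned:

```python
import torch
import requests
from PIL import Image
from transformers import AutoFeatureExtractor, ViTMSNForImageClassification

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/vit-msn-small")
model = ViTMSNForImageClassification.from_pretrained("facebook/vit-msn-small")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
# Predictions are only meaningful after fine-tuning the head on labeled data.
print(model.config.id2label[logits.argmax(-1).item()])
```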

MarkupLM

The MarkupLM model was proposed in MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.

MarkupLM is BERT, but applied to HTML pages instead of raw text documents. The model incorporates additional embedding layers to improve performance, similar to LayoutLM.

The model can be used for tasks like question answering on web pages or information extraction from web pages. It obtains state-of-the-art results on two important benchmarks: WebSRC and SWDE.
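
A minimal feature-extraction sketch, assuming the `microsoft/markuplm-base` checkpoint; the processor parses the HTML into nodes and XPaths before tokenizing:

```python
from transformers import MarkupLMProcessor, MarkupLMModel

processor = MarkupLMProcessor.from_pretrained("microsoft/markuplm-base")
model = MarkupLMModel.from_pretrained("microsoft/markuplm-base")

html = "<html><body><h1>Welcome</h1><p>Here is my website.</p></body></html>"
# The encoding carries xpath_tags_seq/xpath_subs_seq alongside the usual token
# ids, feeding the model's extra markup embedding layers.
encoding = processor(html, return_tensors="pt")
outputs = model(**encoding)
print(outputs.last_hidden_state.shape)
```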

Security & safety

We are exploring a new serialization format that does not rely on Pickle and that can be leveraged across the three frameworks we support: PyTorch, TensorFlow, and JAX. We use the safetensors library for this.

Support currently covers PyTorch models only and is still experimental.
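
A minimal sketch of the safetensors PyTorch API (the transformers integration is experimental, but the library can be used directly):

```python
import torch
from safetensors.torch import save_file, load_file

# Tensors are stored in a flat, zero-copy format; no pickling is involved,
# so loading an untrusted file cannot execute arbitrary code.
state_dict = {"weight": torch.randn(4, 4), "bias": torch.zeros(4)}
save_file(state_dict, "model.safetensors")

reloaded = load_file("model.safetensors")
assert torch.equal(state_dict["weight"], reloaded["weight"])
```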

Computer vision post-processing methods overhaul

The processors for computer vision have been overhauled to ensure they have consistent naming, input arguments, and outputs.
⚠️ The existing methods superseded by the newly introduced post_process_object_detection, post_process_semantic_segmentation, post_process_instance_segmentation, and post_process_panoptic_segmentation are now deprecated.
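
For example, the new semantic-segmentation method resizes logits back to the original image and returns one class-index map per image (a sketch assuming a SegFormer checkpoint):

```python
import torch
import requests
from PIL import Image
from transformers import AutoFeatureExtractor, SegformerForSemanticSegmentation

checkpoint = "nvidia/segformer-b0-finetuned-ade-512-512"
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

segmentation = feature_extractor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]  # (height, width) of the original image
)[0]
print(segmentation.shape)  # per-pixel class indices at the original resolution
```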

🚨 Breaking changes

The following changes are bugfixes that we have chosen to fix even though they change the resulting behavior. We mark them as breaking changes, so if you are using this part of the codebase, we recommend you take a look at the PRs to understand exactly what changed.

Breaking change for ViT parameter initialization

Breaking change for the top_p argument of the TopPLogitsWarper of the generate method.

Model head additions

OPT and BLOOM now have question answering heads available.
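
A minimal extractive-QA sketch with the new OPT head; `facebook/opt-350m` has no fine-tuned QA head, so the span logits below come from a randomly initialized head and are illustrative only:

```python
import torch
from transformers import AutoTokenizer, OPTForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = OPTForQuestionAnswering.from_pretrained("facebook/opt-350m")

question, context = "Who wrote the report?", "The report was written by Jane."
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the most likely start/end token positions and decode the span.
start = outputs.start_logits.argmax(-1).item()
end = outputs.end_logits.argmax(-1).item()
print(tokenizer.decode(inputs.input_ids[0, start : end + 1]))
```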

Pipelines

There is now a zero-shot object detection pipeline.
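
A hedged sketch backed by an OWL-ViT checkpoint; note that the name of the label argument has varied across versions:

```python
from transformers import pipeline

detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")

predictions = detector(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    candidate_labels=["cat", "remote control", "couch"],  # may be `text_queries` in early versions
)
for pred in predictions:
    print(pred["label"], round(pred["score"], 3), pred["box"])
```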

TensorFlow architectures

The GroupViT model is now available in TensorFlow.
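
A CLIP-style zero-shot classification sketch in TensorFlow, assuming the `nvidia/groupvit-gcc-yfcc` checkpoint:

```python
import tensorflow as tf
import requests
from PIL import Image
from transformers import AutoProcessor, TFGroupViTModel

# Pass from_pt=True if the checkpoint only ships PyTorch weights.
processor = AutoProcessor.from_pretrained("nvidia/groupvit-gcc-yfcc")
model = TFGroupViTModel.from_pretrained("nvidia/groupvit-gcc-yfcc")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="tf",
    padding=True,
)
outputs = model(**inputs)
probs = tf.nn.softmax(outputs.logits_per_image, axis=-1)  # image-text similarity
print(probs.numpy())
```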

Bugfixes and improvements

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @flozi00
    • german autoclass (#19049)
    • correct spelling in README (#19092)
    • german processing (#19121)
    • german training, accelerate and model sharing (#19171)
  • @DeppMeng
    • Add support for conditional detr (#18948)
  • @sayakpaul
    • MSN (Masked Siamese Networks) for ViT (#18815)
    • fix: ckpt paths. (#19159)
    • Add expected output to the sample code for ViTMSNForImageClassification (#19183)
  • @IMvision12
    • Updated hf_argparser.py (#19188)
    • Added tests for yaml and json parser (#19219)
    • Added Type hints for LED TF (#19315)
    • Making ConvBert Tokenizer independent from bert Tokenizer (#19347)
    • Added Type hints for XLM TF (#19333)
  • @ariG23498
    • [TensorFlow] Adding GroupViT (#18020)
    • fix: renamed variable name (#18850)
  • @Mustapha-AJEGHRIR
    • Fix m2m_100.mdx doc example missing labels (#19149)
    • Making camembert independent from roberta, clean (#19337)
    • Make Camembert TF version independent from Roberta (#19364)
  • @D3xter1922
    • removing XLMConfig inheritance from FlaubertConfig (#19326)
    • [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer (#19330)
    • remove RobertaConfig inheritance from MarkupLMConfig (#19404)
  • @srhrshr
    • Frees LongformerTokenizer of the Roberta dependency (#19346)
    • Removes Roberta and Bert config dependencies from Longformer (#19343)
    • removes prophet config dependencies from xlm-prophet (#19400)
  • @sahamrit
    • [WIP] Add ZeroShotObjectDetectionPipeline (#18445) (#18930)
  • @Davidy22
    • Copy BertTokenizer dependency into retribert tokenizer (#19371)
  • @rchan26
    • Remove dependency of Bert from Squeezebert tokenizer (#19403)
    • Remove dependency of Roberta in Blenderbot (#19411)
  • @harry7337
    • Removed Bert and XML Dependency from Herbert (#19410)
  • @Infrared1029
    • Remove Dependency between Bart and LED (slow/fast) (#19408)
  • @Steboss89
    • Add Italian translation for add_new_model.mdx (#18713)