v4.31.0: Llama v2, MusicGen, Bark, MMS, EnCodec, InstructBLIP, Umt5, MRA, ViViT
New models
Llama v2
Llama 2 was proposed in Llama 2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron et al. It builds upon the Llama architecture, adding Grouped Query Attention for efficient inference.
- Add support for Llama 2 by @ArthurZucker in #24891
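As a rough illustration of what Grouped Query Attention means, the sketch below maps each query head to the key/value head it shares. The function name and the consecutive grouping are assumptions made for this example; it is not the library's implementation.

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the index of the key/value head shared by query head `q_head`.

    Hypothetical helper: assumes query heads are grouped consecutively.
    """
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


# 8 query heads sharing 2 key/value heads: each group of 4 query heads
# reads the same keys and values, so the KV cache is 4x smaller.
mapping = [kv_head_for_query_head(h, 8, 2) for h in range(8)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With `num_kv_heads == num_q_heads` this reduces to standard multi-head attention, and with `num_kv_heads == 1` to multi-query attention.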
MusicGen
The MusicGen model was proposed in the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
MusicGen is a single-stage auto-regressive Transformer model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. The text descriptions are passed through a frozen text encoder model to obtain a sequence of hidden-state representations. MusicGen is then trained to predict discrete audio tokens, or audio codes, conditioned on these hidden states. These audio tokens are then decoded using an audio compression model, such as EnCodec, to recover the audio waveform.
Through an efficient token interleaving pattern, MusicGen does not require a self-supervised semantic representation of the text/audio prompts, thus eliminating the need to cascade multiple models to predict a set of codebooks (e.g. hierarchically or upsampling). Instead, it is able to generate all the codebooks in a single forward pass.
- Add Musicgen by @sanchit-gandhi in #24109
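The notes above do not spell out the interleaving pattern. One common choice, assumed here purely for illustration, is a per-codebook delay: codebook k is shifted right by k steps, so a single decoding step can emit one token for every codebook. The helper names and the padding id are hypothetical.

```python
PAD = -1  # hypothetical id for positions that hold no token yet


def apply_delay_pattern(codes, pad=PAD):
    """Shift codebook k right by k steps.

    codes[k][t] is the token of codebook k at audio frame t.
    Returns a grid with num_frames + num_codebooks - 1 columns.
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    total_steps = num_frames + num_codebooks - 1
    delayed = [[pad] * total_steps for _ in range(num_codebooks)]
    for k, row in enumerate(codes):
        for t, token in enumerate(row):
            delayed[k][t + k] = token
    return delayed


def revert_delay_pattern(delayed):
    """Undo apply_delay_pattern and recover the frame-aligned codes."""
    num_codebooks = len(delayed)
    num_frames = len(delayed[0]) - num_codebooks + 1
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


codes = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]
delayed = apply_delay_pattern(codes)
for row in delayed:
    print(row)
# [11, 12, 13, -1, -1]
# [-1, 21, 22, 23, -1]
# [-1, -1, 31, 32, 33]
```

Each column of the delayed grid is what one decoding step would produce under this assumption; reverting the delay gives back the frame-aligned codes that an audio codec can decode.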
Bark
Bark is a transformer-based text-to-speech model proposed by Suno AI in suno-ai/bark.
MMS
The MMS model was proposed in Scaling Speech Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
- Add MMS CTC Fine-Tuning by @patrickvonplaten in #24281
EnCodec
The EnCodec neural codec model was proposed in High Fidelity Neural Audio Compression by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
InstructBLIP
The InstructBLIP model was proposed in InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the BLIP-2 architecture for visual instruction tuning.
- Add InstructBLIP by @NielsRogge in #23460
Umt5
The UMT5 model was proposed in UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.
- [Umt5] Add google's umt5 to transformers by @ArthurZucker in #24477
MRA
The MRA model was proposed in Multi Resolution Analysis (MRA) for Approximate Self-Attention by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, and Vikas Singh.
ViViT
The ViViT model was proposed in ViViT: A Video Vision Transformer by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid. The paper proposes one of the first successful sets of pure-transformer-based models for video understanding.
Python 3.7
Python 3.7 reached end-of-life on June 27, 2023 and is no longer supported by the Python Software Foundation. The last version of Transformers to support it is 4.30.x.
PyTorch 1.9
The last version to support PyTorch 1.9 was 4.30.x. PyTorch 1.9 is now more than two years old, and we want to use features available in PyTorch 1.10 and up, so v4.31 and later do not support it.
RoPE scaling
This release adds RoPE scaling to the LLaMA and GPTNeoX families of models. It allows the models to extrapolate beyond their original maximum sequence length (e.g. 2048 tokens on LLaMA) without fine-tuning. Two strategies are offered:
- Linear scaling
- Dynamic NTK scaling
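To make the two strategies concrete, here is a minimal sketch of how each one changes the rotary angles. The function names, the default base of 10000, and the exact dynamic NTK formula are assumptions made for this example, not a copy of the library code.

```python
def inverse_frequencies(dim, base=10000.0):
    """Standard RoPE inverse frequencies for a head dimension `dim`."""
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


def linear_scaled_angles(position, dim, factor, base=10000.0):
    """Linear scaling: squeeze positions by `factor` before rotating.

    Position p is treated as p / factor, so a context `factor` times
    longer maps back into the position range seen during training.
    """
    scaled_position = position / factor
    return [scaled_position * f for f in inverse_frequencies(dim, base)]


def dynamic_ntk_base(seq_len, max_position, dim, factor, base=10000.0):
    """Dynamic NTK scaling: grow the base only when the input is too long.

    Assumed formula; within the trained length the base is unchanged.
    """
    if seq_len <= max_position:
        return base
    scale = (factor * seq_len / max_position) - (factor - 1)
    return base * scale ** (dim / (dim - 2))


# With factor 2, position 4096 gets the same angles as position 2048 unscaled.
same = linear_scaled_angles(4096, 8, 2.0) == linear_scaled_angles(2048, 8, 1.0)
print(same)  # True

# Short inputs keep the original base; long inputs get a larger one.
print(dynamic_ntk_base(1024, 2048, 64, 2.0))          # 10000.0
print(dynamic_ntk_base(4096, 2048, 64, 2.0) > 10000)  # True
```

Linear scaling changes the angles for every input, while the dynamic variant leaves inputs within the original length untouched and only stretches the frequencies for longer ones.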
Agents
Tools now return a type that is specific to agents. This type can return a serialized version of itself (a string) which either points to a file on disk or holds the object's content. This should make interaction with text-based systems much simpler.
- Tool types by @LysandreJik in #24032
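The idea can be sketched with two small wrapper classes. Every class and method name below is hypothetical and chosen for this illustration; refer to the linked PR for the actual types.

```python
import os
import tempfile


class TextResult:
    """Hypothetical wrapper whose string form is the content itself."""

    def __init__(self, text):
        self._text = text

    def to_raw(self):
        return self._text

    def to_string(self):
        return self._text


class BytesResult:
    """Hypothetical wrapper for binary payloads (e.g. image or audio bytes).

    Its string form is the path of a file holding the payload.
    """

    def __init__(self, data, suffix=".bin"):
        self._data = data
        self._suffix = suffix
        self._path = None

    def to_raw(self):
        return self._data

    def to_string(self):
        # Write the payload once and reuse the same path afterwards.
        if self._path is None:
            fd, self._path = tempfile.mkstemp(suffix=self._suffix)
            with os.fdopen(fd, "wb") as handle:
                handle.write(self._data)
        return self._path


text = TextResult("a cat on a mat")
blob = BytesResult(b"\x00\x01\x02")
print(text.to_string())                    # a cat on a mat
print(os.path.exists(blob.to_string()))    # True
```

A text-only agent can pass around the strings returned by `to_string()`, while code that needs the real object calls `to_raw()`.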
Tied weights load
Models with potentially tied weights dropped some keys from the state dict even when the weights were not tied. This is now fixed. More generally, this release should improve the whole experience of loading a model whose state dict does not exactly match.
Whisper word-level timestamps
This release adds a method of predicting timestamps at the word (or even token) level, by analyzing the cross-attentions and applying dynamic time warping.
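Dynamic time warping finds the cheapest monotonic alignment between two sequences, here tokens and audio frames. The sketch below is a generic textbook version run on a made-up attention matrix. The cost definition (one minus the attention weight) and all names are assumptions for this example and do not reproduce the Whisper code.

```python
def dtw_path(cost):
    """Return the minimum-cost monotonic path through a cost matrix.

    cost[i][j] is the cost of aligning token i with audio frame j.
    The path runs from (0, 0) to (last token, last frame).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j]: best total cost of aligning the first i tokens with the first j frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_previous

    # Walk back from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return path


# Toy cross-attention weights: 2 tokens (rows) over 4 audio frames (columns).
attention = [
    [0.9, 0.8, 0.1, 0.1],
    [0.1, 0.2, 0.9, 0.7],
]
cost = [[1.0 - a for a in row] for row in attention]
path = dtw_path(cost)
print(path)  # [(0, 0), (0, 1), (1, 2), (1, 3)]

# The first frame aligned to each token gives that token's start time.
starts = {}
for token, frame in path:
    starts.setdefault(token, frame)
print(starts)  # {0: 0, 1: 2}
```

Multiplying a start frame by the duration of one frame would turn it into a timestamp in seconds.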
Auto model addition
A new auto model is added, AutoModelForTextEncoding. It is to be used when you want to extract the text encoder from an encoder-decoder architecture.
- [AutoModel] Add AutoModelForTextEncoding by @sanchit-gandhi in #24305
Model deprecation
Transformers is growing quickly, and to ease the maintenance burden on our side we have decided to deprecate models that see little use. These models will never actually disappear from the library, but we will stop testing them and stop accepting PRs that modify them.
The criterion for deprecation was fewer than 1,000 unique downloads in the last 30 days, for models that are at least one year old. The deprecated models are:
- BORT
- M-CTC-T
- MMBT
- RetriBERT
- TAPEX
- Trajectory Transformer
- VAN
Breaking changes
This release fixes an issue with stripped spaces in the T5 family tokenizers. If the fix negatively impacts inference or training with your models, please let us know by opening an issue.
- ⚠️ ⚠️ [T5Tokenize] Fix T5 family tokenizers ⚠️ ⚠️ by @ArthurZucker in #24565
Bugfixes and improvements
- add trust_remote_code option to CLI download cmd by @radames in #24097
- Avoid GPT-2 daily CI job OOM (in TF tests) by @ydshieh in #24106
- [Lllama] Update tokenization code to ensure parsing of the special tokens [core] by @ArthurZucker in #24042
- [bnb] Fix bnb config json serialization by @younesbelkada in #24137
- Correctly build models and import call_context for older TF versions by @Rocketknight1 in #24138
- Generate: PT's top_p enforces min_tokens_to_keep when it is 1 by @gante in #24111
- fix bugs with trainer by @pacman100 in #24134
- [SAM] Fix sam slow test by @younesbelkada in #24140
- [lamaTokenizerFast] Update documentation by @ArthurZucker in #24132
- [BlenderBotSmall] Update doc example by @ArthurZucker in #24092
- [documentation] grammatical fixes in image_classification.mdx by @LiamSwayne in #24141
- Fix typo in streamers.py by @freddiev4 in #24144
- Fix push to hub by @NielsRogge in #24187
- Change ProgressCallback to use dynamic_ncols=True by @gmlwns2000 in #24101
- [i18n]Translated "attention.mdx" to korean by @kihoon71 in #23878
- Generate: force caching on the main model, in assisted generation by @gante in #24177
- Fix device issue in OpenLlamaModelTest::test_model_parallelism by @ydshieh in #24195
- typo: fix typos in CONTRIBUTING.md and deepspeed.mdx by @zsj9509 in #24184
- Generate: detect special architectures when loaded from PEFT by @gante in #24198
- 🌐 [i18n-KO] Translated tasks_summary.mdx to Korean by @kihoon71 in #23977
- 🚨🚨🚨 Replace DataLoader logic for Accelerate in Trainer, remove unneeded tests 🚨🚨🚨 by @muellerzr in #24028
- Fix steps bugs in no trainer examples by @Ethan-yt in #24197
- Remove unnecessary aten::to overhead in llama by @fxmarty in #24203
- Update WhisperForAudioClassification doc example by @ydshieh in #24188
- Finish dataloader integration by @muellerzr in #24201
- Add the number of model test failures to slack CI report by @ydshieh in #24207
- fix: TextIteratorStreamer cannot work with pipeline by @yuanwu2017 in #23641
- Improving error message when using use_safetensors=True. by @Narsil in #24232
- Safely import pytest in testing_utils.py by @amyeroberts in #24241
- fix overflow when training mDeberta in fp16 by @sjrl in #24116
- deprecate use_mps_device by @pacman100 in #24239
- [Time Series] use mean scaler when scaling is a boolean True by @kashif in #24237
- TF: standardize test_model_common_attributes for language models by @gante in #23457
- Generate: GenerationConfig can overwrite attributes at from_pretrained time by @gante in #24238
- Add torch >=1.12 requirement for Tapas by @ydshieh in #24251
- Update urls in warnings for rich rendering by @IvanReznikov in #24136
- Fix how we detect the TF package by @Rocketknight1 in #24255
- Stop storing references to bound methods via tf.function by @Rocketknight1 in #24146
- docs wrt using accelerate launcher with trainer by @pacman100 in #24250
- update FSDP save and load logic by @pacman100 in #24249
- Fix URL in comment for contrastive loss function by @taepd in #24271
- QA doc: import torch before it is used by @ByronHsu in #24228
- Skip some TQAPipelineTests tests in past CI by @ydshieh in #24267
- Adapt Wav2Vec2 conversion for MMS lang identification by @patrickvonplaten in #24234
- Pix2StructImageProcessor requires torch>=1.11.0 by @ydshieh in #24270
- Fix Debertav2 embed_proj by @WissamAntoun in #24205
- Fix bug in slow tokenizer conversion, make it a lot faster by @stephantul in #24266
- Fix check_config_attributes: check all configuration classes by @ydshieh in #24231
- Fix LLaMa beam search when using parallelize by @FeiWang96 in #24224
- remove unused is_decoder parameter in DetrAttention by @JayL0321 in #24226
- [fix] bug in BatchEncoding.__getitem__ by @flybird1111 in #24293
- Fix image segmentation tool bug by @amyeroberts in #23897
- [Docs] Improve docs for MMS loading of other languages by @patrickvonplaten in #24292
- deepspeed init during eval fix by @pacman100 in #24298
- [EnCodec] Changes for 32kHz ckpt by @sanchit-gandhi in #24296
- [Docs] Fix the paper URL for MMS model by @hitchhicker in #24302
- Update tokenizer_summary.mdx (grammar) by @belladoreai in #24286
- Beam search type by @jprivera44 in #24288
- [SwitchTransformers] Fix return values by @ArthurZucker in #24300
- Fix functional TF Whisper and modernize tests by @Rocketknight1 in #24301
- Big TF test cleanup by @Rocketknight1 in #24282
- Fix ner average grouping with no groups by @Narsil in #24319
- Fix ImageGPT doc example by @amyeroberts in #24317
- Add test for proper TF input signatures by @Rocketknight1 in #24320
- Adding ddp_broadcast_buffers argument to Trainer by @TevenLeScao in #24326
- error bug on saving distributed optim state when using data parallel by @xshaun in #24108
- 🌐 [i18n-KO] Fixed tutorial/preprocessing.mdx by @sim-so in #24156
- pin apex to a speicifc commit (for DeepSpeed CI docker image) by @ydshieh in #24351
- Clean up disk sapce during docker image build for transformers-pytorch-gpu by @ydshieh in #24346
- Fix KerasMetricCallback: pass generate_kwargs even if use_xla_generation is False by @Kripner in #24333
- Fix device issue in SwitchTransformers by @ydshieh in #24352
- Update MMS integration docs by @vineelpratap in #24311
- Make AutoFormer work with previous torch version by @ydshieh in #24357
- Fix ImageGPT doctest by @amyeroberts in #24353
- Fix link to documentation in Install from Source by @SoyGema in #24336
- docs: add BentoML to awesome-transformers by @aarnphm in #24344
- [Doc Fix] Fix model name path in the transformers doc for AutoClasses by @riteshghorse in #24329
- Fix the order in GPTNeo's docstring by @qgallouedec in #24358
- Respect explicitly set framework parameter in pipeline by @denis-ismailaj in #24322
- Allow passing kwargs through to TFBertTokenizer by @Rocketknight1 in #24324
- Fix resuming PeftModel checkpoints in Trainer by @llohann-speranca in #24274
- TensorFlow CI fixes by @Rocketknight1 in #24360
- Update tiny models for pipeline testing. by @ydshieh in #24364
- [modelcard] add audio classification to task list by @sanchit-gandhi in #24363
- [Whisper] Make tests faster by @sanchit-gandhi in #24105
- Add a check in ImageToTextPipeline._forward by @ydshieh in #24373
- [Tokenizer doc] Clarification about add_prefix_space by @ArthurZucker in #24368
- style: add BitsAndBytesConfig repr function by @aarnphm in #24331
- Better test name and enable pipeline test for pix2struct by @ydshieh in #24377
- Skip a tapas (tokenization) test in past CI by @ydshieh in #24378
- [Whisper Docs] Nits by @ArthurZucker in #24367
- [GPTNeoX] Nit in config by @ArthurZucker in #24349
- [Wav2Vec2 - MMS] Correct directly loading adapters weights by @patrickvonplaten in #24335
- Add ffmpeg for doc_test_job on CircleCI by @ydshieh in #24397
- byebye Hub connection timeout - Recast by @ydshieh in #24399
- fix type annotation for debug arg by @Bearnardd in #24033
- [Trainer] Fix optimizer step on PyTorch TPU by @cowanmeg in #24389
- Fix gradient checkpointing + fp16 autocast for most models by @younesbelkada in #24247
- Clean up dist import by @muellerzr in #24402
- Check auto mappings could be imported via from transformers by @ydshieh in #24400
- Remove redundant code from TrainingArgs by @muellerzr in #24401
- [ASR pipeline] Check for torchaudio by @sanchit-gandhi in #23953
- TF safetensors reduced mem usage by @Rocketknight1 in #24404
- Skip test_conditional_generation_pt_pix2struct in Past CI (torch < 1.11) by @ydshieh in #24417
- [bnb] Fix bnb serialization issue with new release by @younesbelkada in #24416
- Revert "Fix gradient checkpointing + fp16 autocast for most models" by @younesbelkada in #24420
- Update RayTune doc link for Hyperparameter tuning by @JoshuaEPSamuel in #24422
- TF CI fix for Segformer by @Rocketknight1 in #24426
- Refactor hyperparameter search backends by @alexmojaki in #24384
- Clarify batch size displayed when using DataParallel by @sgugger in #24430
- Save site-packages as cache in CircleCI job by @ydshieh in #24424
- [llama] Fix comments in weights converter by @weimingzha0 in #24436
- [Trainer] Fix .to call on 4bit models by @younesbelkada in #24444
- fix the grad_acc issue at epoch boundaries by @pacman100 in #24415
- Replace python random with torch.rand to enable dynamo.export by @BowenBao in #24434
- Fix typo by @siryuon in #24440
- Fix some TFWhisperModelIntegrationTests by @ydshieh in #24428
- fixes issue when saving fsdp via accelerate's FSDP plugin by @pacman100 in #24446
- Allow dict input for audio classification pipeline by @sanchit-gandhi in #23445
- Improved keras imports by @Rocketknight1 in #24448
- add missing alignment_heads to Whisper integration test by @hollance in #24487
- Update AlbertModel type annotation by @amyeroberts in #24450
- [pipeline] Fix str device issue by @younesbelkada in #24396
- when resume from peft checkpoint, the model should be trainable by @sywangyi in #24463
- deepspeed z1/z2 state dict fix by @pacman100 in #24489
- Update InstructBlipModelIntegrationTest by @ydshieh in #24490
- Update token_classification.md by @condor-cp in #24484
- Add support for for loops in python interpreter by @sgugger in #24429
- [InstructBlip] Add accelerate support for instructblip by @younesbelkada in #24488
- Compute dropout_probability only in training mode by @ydshieh in #24486
- Fix 'local_rank' AttiributeError in Trainer class by @mocobeta in #24297
- Compute dropout_probability only in training mode (SpeechT5) by @ydshieh in #24498
- 🚨🚨 Fix group beam search by @hukuda222 in #24407
- Generate: group_beam_search requires diversity_penalty>0.0 by @gante in #24456
- Generate: min_tokens_to_keep has to be >= 1 by @gante in #24453
- Fix TypeError: Object of type int64 is not JSON serializable by @xiaoli in #24340
- 🌐 [i18n-KO] Translated tflite.mdx to Korean by @0525hhgus in #24435
- use accelerate autocast in jit eval path, since mix precision logic is… by @sywangyi in #24460
- Update hyperparameter_search.py by @pacman100 in #24515
- [T5] Add T5ForQuestionAnswering and MT5ForQuestionAnswering by @sjrl in #24481
- set model to training mode before accelerate.prepare by @sywangyi in #24520
- Find module name in an OS-agnostic fashion by @sgugger in #24526
- Fix LR scheduler based on bs from auto bs finder by @muellerzr in #24521
- [Mask2Former] Remove SwinConfig by @NielsRogge in #24259
- Allow backbones not in backbones_supported - Maskformer Mask2Former by @amyeroberts in #24532
- Finishing tidying keys to ignore on load by @sgugger in #24535
- Add bitsandbytes support for gpt2 models by @DarioSucic in #24504
- Unpin DeepSpeed and require DS >= 0.9.3 by @ydshieh in #24541
- Allow for warn_only selection in enable_full_determinism by @Frank995 in #24496
- Fix typing annotations for FSDP and DeepSpeed in TrainingArguments by @mryab in #24549
- Update PT/TF weight conversion after #24030 by @ydshieh in #24547
- [gpt2-int8] Add gpt2-xl int8 test by @younesbelkada in #24543
- Fix processor init bug if image processor undefined by @amyeroberts in #24554
- [InstructBlip] Add instruct blip int8 test by @younesbelkada in #24555
- Update PT/Flax weight conversion after #24030 by @ydshieh in #24556
- Make PT/Flax tests could be run on GPU by @ydshieh in #24557
- Update masked_language_modeling.md by @condor-cp in #24560
- Fixed OwlViTModel inplace operations by @pasqualedem in #24529
- Update old existing feature extractor references by @amyeroberts in #24552
- Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments" by @sgugger in #24574
- Update some torchscript tests after #24505 by @ydshieh in #24566
- Removal of deprecated vision methods and specify deprecation versions by @amyeroberts in #24570
- Check all objects are equally in the main __init__ file by @ydshieh in #24573
- fix peft ckpts not being pushed to hub by @pacman100 in #24578
- Udate link to RunHouse hardware setup documentation. by @BioGeek in #24590
- Show a warning for missing attention masks when pad_token_id is not None by @hackyon in #24510
- Make (TF) CI faster (test only a subset of model classes) by @ydshieh in #24592
- Speed up TF tests by reducing hidden layer counts by @Rocketknight1 in #24595
- 🌐 [i18n-KO] Translated perplexity.mdx to Korean by @HanNayeoniee in #23850
- Fix loading dataset docs link in run_translation.py example by @SoyGema in #24594
- Generate: multi-device support for contrastive search by @gante in #24635
- Generate: force cache with inputs_embeds forwarding by @gante in #24639
- precompiled_charsmap checking before adding to the normalizers' list for XLNetTokenizerFast conversion. by @shahad-mahmud in #24618
- Fix audio feature extractor deps by @sanchit-gandhi in #24636
- documentation_tests.txt - sort filenames alphabetically by @amyeroberts in #24647
- Update warning messages reffering to post_process_object_detection by @rafaelpadilla in #24649
- Add finetuned_from property in the autogenerated model card by @sgugger in #24528
- Make warning disappear for remote code in pipelines by @sgugger in #24603
- Fix EncodecModelTest::test_multi_gpu_data_parallel_forward by @ydshieh in #24663
- Fix VisionTextDualEncoderIntegrationTest by @ydshieh in #24661
- Add is_torch_mps_available function to utils by @NripeshN in #24660
- Fix model referenced and results in documentation. Model mentioned was inaccessible by @rafaelpadilla in #24609
- Add Nucleotide Transformer notebooks and restructure notebook list by @Rocketknight1 in #24669
- DeepSpeed/FSDP ckpt saving utils fixes and FSDP training args fixes by @pacman100 in #24591
- Avoid import sentencepiece_model_pb2 in utils.__init__.py by @ydshieh in #24689
- Fix integration with Accelerate and failing test by @muellerzr in #24691
- [MT5] Fix CONFIG_MAPPING issue leading it to load umt5 class by @ArthurZucker in #24678
- Fix flaky test_for_warning_if_padding_and_no_attention_mask by @ydshieh in #24706
- Enable conversational pipeline for GPTSw3Tokenizer by @saattrupdan in #24648
- [T5] Adding model_parallel = False to T5ForQuestionAnswering and MT5ForQuestionAnswering by @sjrl in #24684
- Docs: change some input_ids doc reference from BertTokenizer to AutoTokenizer by @gante in #24730
- [Patch-t5-tokenizer] Patches the changes on T5 to make sure previous behaviour is still valide for beginning of words by @ArthurZucker in #24622
- Fix typo in LocalAgent by @jamartin9 in #24736
- fix: Text splitting in the BasicTokenizer by @connor-henderson in #22280
- add gradient checkpointing for distilbert by @jordane95 in #24719
- Skip keys not in the state dict when finding mismatched weights by @sgugger in #24749
- Fix non-deterministic Megatron-LM checkpoint name by @janEbert in #24674
- [InstructBLIP] Fix bos token of LLaMa checkpoints by @NielsRogge in #24492
- Skip some slow tests for doctesting in PRs (Circle)CI by @ydshieh in #24753
- Fix lr scheduler not being reset on reruns by @muellerzr in #24758
- 🐛 Handle empty gen_kwargs for seq2seq trainer prediction_step function by @gkumbhat in #24759
- Allow existing configs to be registered by @sgugger in #24760
- Unpin protobuf in docker file (for daily CI) by @ydshieh in #24761
- Fix eval_accumulation_steps leading to incorrect metrics by @muellerzr in #24756
- Add MobileVitV2 to doctests by @amyeroberts in #24771
- Replacement of 20 asserts with exceptions by @Baukebrenninkmeijer in #24757
- Update default values of bos/eos token ids in CLIPTextConfig by @ydshieh in #24773
- Fix pad across processes dim in trainer and not being able to set the timeout by @muellerzr in #24775
- gpt-bigcode: avoid zero_ to support Core ML by @pcuenca in #24755
- Remove WWT from README by @LysandreJik in #24672
- Rm duplicate pad_across_processes by @muellerzr in #24780
- Revert "Unpin protobuf in docker file (for daily CI)" by @ydshieh in #24800
- Removing unnecessary device=device in modeling_llama.py by @Liyang90 in #24696
- [fix] Change the condition of ValueError in "convert_checkpoint_from_transformers_to_megatron" by @SeongBeomLEE in #24769
- [DOC] Clarify relationshi load_best_model_at_end and save_total_limit by @BramVanroy in #24614
- Fix MobileVitV2 doctest checkpoint by @amyeroberts in #24805
- Skip torchscript tests for MusicgenForConditionalGeneration by @ydshieh in #24782
- Generate: add SequenceBiasLogitsProcessor by @gante in #24334
- Add accelerate version in transformers-cli env by @amyeroberts in #24806
- Remove Falcon docs for the release until TGI is ready by @Rocketknight1 in #24808
- Update setup.py to be compatible with pipenv by @georgiemathews in #24789
- Use _BaseAutoModelClass's register method by @fadynakhla in #24810
- Copy code when using local trust remote code by @sgugger in #24785
- Fixing double use_auth_token.pop (preventing private models from being visible). by @Narsil in #24812
- set correct model input names for gptsw3tokenizer by @DarioSucic in #24788
- Check models used for common tests are small by @sgugger in #24824
- [🔗 Docs] Fixed Incorrect Migration Link by @kadirnar in #24793
- deprecate sharded_ddp training argument by @statelesshz in #24825
- 🌐 [i18n-KO] Translated custom_tools.mdx to Korean by @sim-so in #24580
- Remove unused code in GPT-Neo by @namespace-Pt in #24826
- Add Multimodal heading and Document question answering in task_summary.mdx by @y3sar in #23318
- Fix comments for _merge_heads by @bofenghuang in #24855
- fix broken links in READMEs by @younesbelkada in #24861
- Add TAPEX to the list of deprecated models by @sgugger in #24859
Significant community contributions
The following contributors have made significant changes to the library over the last release: