v4.42.0: Gemma 2, RTDETR, InstructBLIP, LLAVa Next, New Model Adder
New model additions
Gemma-2
The Gemma2 model was proposed in Gemma 2: Improving Open Language Models at a Practical Size by the Gemma2 Team, Google.
Gemma2 is released in two sizes, 9B and 27B parameters, with both pretrained and instruction-tuned checkpoints. The 27B model is trained on 13T tokens and the 9B model on 8T tokens. Compared to the first Gemma release, Gemma2 interleaves local (sliding-window) and global attention layers, and the smaller model is trained with knowledge distillation from a larger teacher.
- Add gemma 2 by @ArthurZucker in #31659
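A minimal usage sketch (the checkpoint id below is illustrative; see the Hub for the released Gemma 2 checkpoints):

```py
from transformers import AutoTokenizer, AutoModelForCausalLM

# illustrative checkpoint id
model_id = "google/gemma-2-9b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```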
RTDETR
The RT-DETR model was proposed in DETRs Beat YOLOs on Real-time Object Detection by Wenyu Lv, Yian Zhao, Shangliang Xu, Jinman Wei, Guanzhong Wang, Cheng Cui, Yuning Du, Qingqing Dang, Yi Liu.
RT-DETR is an object detection model that stands for “Real-Time DEtection Transformer.” This model is designed to perform object detection tasks with a focus on achieving real-time performance while maintaining high accuracy. Leveraging the transformer architecture, which has gained significant popularity in various fields of deep learning, RT-DETR processes images to identify and locate multiple objects within them.
- New model support RTDETR by @SangbumChoi in #29077
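A minimal inference sketch, assuming the `PekingU/rtdetr_r50vd` checkpoint:

```py
import torch
import requests
from PIL import Image
from transformers import RTDetrImageProcessor, RTDetrForObjectDetection

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_r50vd")
model = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# convert raw logits/boxes into labeled detections for the original image size
results = processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3
)
```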
InstructBLIP
The InstructBLIP model was proposed in InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the BLIP-2 architecture for visual instruction tuning.
InstructBLIP uses the same architecture as BLIP-2 with a tiny but important difference: it also feeds the text prompt (instruction) to the Q-Former.
- Add video modality for InstrucBLIP by @zucchini-nlp in #30182
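A minimal image sketch illustrating how the instruction is fed to both the Q-Former and the language model (checkpoint id assumed):

```py
import requests
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

model_id = "Salesforce/instructblip-vicuna-7b"  # illustrative checkpoint
processor = InstructBlipProcessor.from_pretrained(model_id)
model = InstructBlipForConditionalGeneration.from_pretrained(model_id, device_map="auto")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# the text prompt (instruction) is passed to the Q-Former as well as the LLM
inputs = processor(images=image, text="What is unusual about this image?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```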
LLaVa-NeXT-Video
The LLaVa-NeXT-Video model was proposed in LLaVA-NeXT: A Strong Zero-shot Video Understanding Model by Yuanhan Zhang, Bo Li, Haotian Liu, Yong Jae Lee, Liangke Gui, Di Fu, Jiashi Feng, Ziwei Liu, Chunyuan Li. LLaVa-NeXT-Video improves upon LLaVa-NeXT by fine-tuning on a mix of video and image data, thus increasing the model's performance on videos.
LLaVA-NeXT surprisingly shows strong performance at understanding video content in a zero-shot fashion thanks to the AnyRes technique it uses. AnyRes naturally represents a high-resolution image as multiple images, and it generalizes to videos because a video can be treated as a set of frames (similar to the set of images in LLaVa-NeXT). The current version of LLaVA-NeXT-Video applies AnyRes and supervised fine-tuning (SFT) on top of LLaVA-NeXT on video data to achieve better video understanding capabilities. The model is currently SOTA among open-source models on the VideoMME benchmark.
- Add LLaVa NeXT Video by @zucchini-nlp in #31252
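A minimal generation sketch (the checkpoint id and the dummy clip are assumptions; replace the random array with real decoded video frames):

```py
import numpy as np
from transformers import LlavaNextVideoProcessor, LlavaNextVideoForConditionalGeneration

model_id = "llava-hf/LLaVA-NeXT-Video-7B-hf"  # illustrative checkpoint
processor = LlavaNextVideoProcessor.from_pretrained(model_id)
model = LlavaNextVideoForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# a dummy 8-frame clip in (frames, height, width, channels) layout
video = np.random.randint(0, 255, (8, 224, 224, 3), dtype=np.uint8)
prompt = "USER: <video>\nWhy is this video funny? ASSISTANT:"

inputs = processor(text=prompt, videos=video, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(outputs[0], skip_special_tokens=True))
```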
New model adder
A very significant change makes its way into the `transformers` codebase, introducing a new way to add models to `transformers`. We recommend reading the description of the PR below, but here is the gist of it:
The `diff_converter` tool is here to replace our old `# Copied from` statements, while keeping our core `transformers` philosophy:
- single model single file
- explicit code
- standardization of modeling code
- readable and educative code
- simple code
- least amount of modularity
This additionally unlocks the ability to very quickly see the differences between new architectures that get developed. While many architectures are similar, the "single model, single file" policy can obfuscate the changes. With this diff converter, we want to make the changes between architectures very explicit.
- Diff converter v2 by @ArthurZucker in #30868
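For illustration, a hypothetical `diff_my_model.py` of the kind the converter expands (all names here are invented; the exact format is described in the PR):

```py
# diff_my_model.py -- hypothetical sketch of a diff file; the converter expands
# the inherited definitions into a full, standalone modeling_my_model.py
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaModel


class MyModelConfig(LlamaConfig):
    model_type = "my_model"


class MyModelModel(LlamaModel):
    # only the differences from Llama need to be spelled out here;
    # everything else is inherited and inlined by the converter
    config_class = MyModelConfig
```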
Tool-use and RAG model support
We've made major updates to our support for tool-use and RAG models. We can now automatically generate JSON schema descriptions for Python functions which are suitable for passing to tool models, and we've defined a standard API for tool models which should allow the same tool inputs to be used with many different models. Models will need updates to their chat templates to support the new API, and we're targeting the Nous-Hermes, Command-R and Mistral/Mixtral model families for support in the very near future. Please see the updated chat template docs for more information.
If you are the owner of a model that supports tool use, but you're not sure how to update its chat template to support the new API, feel free to reach out to us for assistance with the update, for example on the Hugging Face Discord server. Ping Matt and yell key phrases like "chat templates" and "Jinja" and your issue will probably get resolved.
- Chat Template support for function calling and RAG by @Rocketknight1 in #30621
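A minimal sketch of the schema-generation side (assuming `get_json_schema` as added by the PR; note the Google-style docstring, which the parser relies on):

```py
from transformers.utils import get_json_schema


def get_current_temperature(location: str) -> float:
    """
    Gets the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country".
    """
    return 22.0  # dummy implementation


# generate a JSON schema description suitable for passing to a tool-use model
schema = get_json_schema(get_current_temperature)
print(schema)

# chat templates supporting the new API accept the tools directly, e.g.:
# tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True)
```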
GGUF support
We extend GGUF support to allow fine-tuning GGUF models within the Python/HF ecosystem, before converting them back for use with the GGUF/GGML/llama.cpp libraries.
- Add Qwen2 GGUF loading support by @Isotr0py in #31175
- GGUF: Fix llama 3 GGUF by @younesbelkada in #31358
- Fix llama gguf converter by @SunMarc in #31575
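A minimal loading sketch (the repo and filename are illustrative; requires the `gguf` package):

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

# illustrative repo/file; any supported GGUF checkpoint should work
model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

# the GGUF weights are dequantized to torch tensors on load, so the model
# can be fine-tuned in the HF ecosystem and converted back to GGUF afterwards
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
```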
Trainer improvements
A new optimizer, LOMO, is added to the `Trainer`.
- FEAT / Trainer: LOMO optimizer support by @younesbelkada in #30178
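A minimal sketch enabling it via `TrainingArguments` (assumes `model` and `train_dataset` are defined, and the `lomo-optim` package is installed):

```py
from transformers import TrainingArguments, Trainer

args = TrainingArguments(
    output_dir="lomo-out",
    optim="adalomo",  # or "lomo"; both require the lomo-optim package
    learning_rate=2e-5,
    per_device_train_batch_size=4,
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```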
Quantization improvements
Several improvements have been made to quantization support: a new cache (the quantized KV cache) is added, offering the ability to quantize the cache of generative models, further reducing the memory requirements.
Additionally, the quantization documentation is entirely redone with the aim of helping users choose the best quantization method for their use case.
- Quantized KV Cache by @zucchini-nlp in #30483
- Docs / Quantization: refactor quantization documentation by @younesbelkada in #30942
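A minimal sketch of the quantized cache during generation (checkpoint id illustrative; assumes the `quanto` backend is installed):

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
# the KV cache is quantized on the fly, trading a little compute
# for a much smaller generation-time memory footprint
out = model.generate(
    **inputs,
    max_new_tokens=20,
    cache_implementation="quantized",
    cache_config={"backend": "quanto", "nbits": 4},
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```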
Examples
New instance segmentation examples were added by @qubvel.
Notable improvements
As a notable improvement to the HF vision models that leverage backbones, we enable leveraging HF pretrained model weights as backbones, with the following API:
```py
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=True)
model = MaskFormerForInstanceSegmentation(config)
```
- Enable HF pretrained backbones by @amyeroberts in #31145
Additionally, we thank @Cyrilvallez for diving into our `generate` method and greatly reducing its memory requirements.
- Reduce by 2 the memory requirement in `generate()` 🔥🔥🔥 by @Cyrilvallez in #30536
Breaking changes
Remove ConversationalPipeline and Conversation object
Both the ConversationalPipeline and the Conversation object have been deprecated for a while, and are removed in 4.42, this release.
The `TextGenerationPipeline` is recommended for this use case, and now accepts inputs in the form of the OpenAI API.
- 🚨 Remove ConversationalPipeline and Conversation object by @Rocketknight1 in #31165
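A minimal sketch of the OpenAI-style input (model id illustrative):

```py
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What replaced ConversationalPipeline?"},
]
# OpenAI-style chat messages are now accepted directly by the pipeline
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"])
```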
Remove an accidental duplicate softmax application in FLAVA's attention
Removes a duplicate softmax application in FLAVA's attention. This is likely to change the outputs slightly, hence the 🚨 flag.
- 🚨 FLAVA: Remove double softmax by @amyeroberts in #31322
Idefics2's `ignore_index` attribute of the loss is updated to -100.
- 🚨 [Idefics2] Update ignore index by @NielsRogge in #30898
`out_indices` from timm being updated
Recent updates to timm changed the type of the attribute `model.feature_info.out_indices`. Previously, `out_indices` would reflect the input type passed to the `create_model` call, i.e. either `tuple` or `list`. Now, this value is always a `tuple`.
As lists are more useful and consistent for us -- we cannot save tuples in configs, they must be converted to lists first -- we instead choose to cast `out_indices` to always be a list.
This is potentially a slight breaking change if users are creating models and relying on `out_indices` being a tuple. As this only happens when a new model is created, and not when it's saved and reloaded (because of the config), the impact should be low.
- 🚨 out_indices always a list by @amyeroberts in #30941
Datasets referenced in the quantization config are updated to remove references to datasets with restrictive licenses.
- 🚨 Remove dataset with restrictive license by @echarlaix in #31452
Bugfixes and improvements
- Add fixed resize and pad strategy for object detection by @qubvel in #30742
- Enable dynamic resolution input for Swin Transformer and variants by @the-neural-networker in #30656
- Add TokenClassification for Mistral, Mixtral and Qwen2 by @josephenguehard in #29878
- FIX / Quantization: Fix Dockerfile build by @younesbelkada in #30890
- Add support for torch.compile dynamic shapes by @warner-benjamin in #30560
- LLaVa-Next: Update docs with batched inference by @zucchini-nlp in #30857
- DeformableDETR two stage support bfloat16 by @DonggeunYu in #30907
- add return_token_timestamps to WhisperProcessor by @kamilakesbi in #30812
- Fix num_hidden_layers in initialization of new model in Mamba by @SrGonao in #30403
- separate kwargs in processor (similar to #30193) by @Eric2i in #30905
- fix for custom pipeline configuration by @not-lain in #29004
- Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM by @ylacombe in #28706
- Fix a shape annotation and typos in `mamba` slow forward by @vasqu in #30691
- `tokenizer_class = "AutoTokenizer"` Llava Family by @ArthurZucker in #30912
- Introduce configured_state arg for accelerator_config by @muellerzr in #29781
- Add torch.compile for Mistral by @zhenglongjiepheonix in #30642
- [docs] Spanish translation of model_memory_anatomy.md by @aaronjimv in #30885
- FIX / TST: Fix expected results on Mistral slow test (A10) by @younesbelkada in #30909
- PaliGemma - fix processor with no input text by @hiyouga in #30916
- CI: AMD MI300 tests fix by @mht-sharma in #30797
- Enforce saving at end of training if saving option chosen by @muellerzr in #30160
- fix: center_crop occasionally outputs off-by-one dimension matrix by @mattlbeck in #30934
- [Benchmark] Reuse `optimum-benchmark` by @ydshieh in #30615
- TST / Workflows: Get slack notifications for docker image build by @younesbelkada in #30891
- Fix swin embeddings interpolation by @amyeroberts in #30936
- Fix inhomogeneous shape error in example by @Zantares in #30434
- update ruff version by @ArthurZucker in #30932
- Update build ci image [push-ci-image] by @ArthurZucker in #30933
- Update video-llava docs by @zucchini-nlp in #30935
- Fix low cpu mem usage tests by @SunMarc in #30808
- [doc] Add references to the fine-tuning blog and distil-whisper to Whisper. by @Vaibhavs10 in #30938
- Avoid extra chunk in speech recognition by @jonatanklosko in #29539
- [whisper] only trigger forced ids warning once by @sanchit-gandhi in #30966
- Paligemma - fix slow tests, add bf16 and f16 slow tests by @molbap in #30851
- Finally fix the missing new model failure CI report by @ydshieh in #30968
- legacy to init the slow tokenizer when converting from slow was wrong by @ArthurZucker in #30972
- Generation: get special tokens from model config by @zucchini-nlp in #30899
- [Whisper] Strip prompt before finding common subsequence by @sanchit-gandhi in #27836
- Fix link in Pipeline documentation by @junhl in #30948
- [Mistral and friends] Update MLP by @NielsRogge in #31057
- Paligemma causal attention mask by @molbap in #30967
- Update object detection with latest resize and pad strategies by @qubvel in #30955
- Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size by @kamilakesbi in #30637
- Push ci image by @ArthurZucker in #30982
- test_custom_4d_attention_mask skip with sliding window attn by @poedator in #30833
- Finish adding support for torch.compile dynamic shapes by @warner-benjamin in #30919
- FIX / Docs: Minor changes in quantization docs by @younesbelkada in #30985
- Fix accelerate failing tests by @SunMarc in #30836
- [tests] add `torch.use_deterministic_algorithms` for XPU by @faaany in #30774
- Add a check that warmup_steps is either 0 or >= 1 by @ymoslem in #30764
- Update 4 `MptIntegrationTests` expected outputs by @ydshieh in #30989
- [Port] TensorFlow implementation of Mistral by @ariG23498 in #29708
- Remove deprecated properties in tokenization_nllb.py and tokenization_nllb_fast.py by @ymoslem in #29834
- Bugfix: WandbCallback uploads initial model checkpoint by @mgerstgrasser in #30897
- add prefix space ignored in llama #29625 by @itazap in #30964
- Fix training speed regression introduced by "optimize VRAM for calculating pos_bias in LayoutLM v2, v3" (#26139) by @kkoehncke
- Do not trigger autoconversion if local_files_only by @Wauplin in #31004
- pin `uv==0.1.45` by @ydshieh in #31006
- Perceiver interpolate position embedding by @g1y5x3 in #30979
- [tests] make `test_model_parallelism` device-agnostic by @faaany in #30844
- FIX / TST: Fix expected results on Mistral AWQ test by @SunMarc in #30971
- allow multi-gpu by @ydshieh in #31011
- Fix resume_download future warning by @Wauplin in #31007
- Quantization / TST: Fix remaining quantization tests by @younesbelkada in #31000
- save the list of new model failures by @ydshieh in #31013
- added interpolation for vitmae model in pytorch as well as tf. by @bhuvanmdev in #30732
- Add split special tokens by @itazap in #30772
- Paligemma- fix devices and dtype assignments by @molbap in #31008
- Redirect transformers_agents doc to agents by @aymeric-roucher in #31054
- unpin uv by @ydshieh in #31055
- Follow up: Fix link in dbrx.md by @eitanturok in #30514
- Update feature request label in template by @amyeroberts in #30940
- Fix quanto tests by @SunMarc in #31062
- Fix pad_to_max_length Whisper by @ylacombe in #30787
- skip `test_model_parallelism` for 2 model test classes by @ydshieh in #31067
- use `@main` by @ydshieh in #31065
- Remove `ninja` from docker image build by @ydshieh in #31080
- fix "piano" typo by @clinty in #31027
- Update quicktour.md to fix broken link to Glossary by @apalkk in #31072
- Remove redundant backend checks in training_args.py by @kevint324 in #30999
- fix from_pretrained in offline mode when model is preloaded in cache by @oOraph in #31010
- Remove float64 cast for OwlVit and OwlV2 to support MPS device by @qubvel in #31071
- Fix OWLv2 post_process_object_detection for multiple images by @qubvel in #31082
- Fix typo in trainer.py by @taslimisina in #31048
- [SuperPoint, PaliGemma] Update docs by @NielsRogge in #31025
- Fix failing tokenizer tests by @LysandreJik in #31083
- Watermark: fix tests by @zucchini-nlp in #30961
- Docs / PEFT: Add PEFT API documentation by @younesbelkada in #31078
- Render chat template tojson filter as unicode by @CISC in #31041
- FIX: Add `accelerate` as a hard requirement by @younesbelkada in #31090
- FIX / OPT: Fix OPT multi-GPU training for `OPTForQuestionAnswering` by @younesbelkada in #31092
- skip `test_multi_gpu_data_parallel_forward` for `vit` and `deit` by @ydshieh in #31086
- Fix PretrainedConfig docstring with deprecated resume_download by @albertvillanova in #31014
- Fix DeepSpeed compatibility with weight_norm by @jonnyli1125 in #30881
- TST: Fix instruct-blip tests by @younesbelkada in #31088
- Docs / Quantization: Redirect deleted page by @younesbelkada in #31063
- Deprecate low use models by @amyeroberts in #30781
- Quantized KV cache: update quanto by @zucchini-nlp in #31052
- FEAT: Add mistral v3 conversion script by @younesbelkada in #30981
- Use `HF_HUB_OFFLINE` + fix has_file in offline mode by @Wauplin in #31016
- Improve `transformers-cli env` reporting by @statelesshz in #31003
- Fix env.py in cases where torch is not present by @Rocketknight1 in #31113
- Fix faulty rstrip in module loading by @Rocketknight1 in #31108
- Rm maintainer + migrate by @muellerzr in #31089
- Fix nightly circleci by @ydshieh in #31114
- FIX / Docs: Fix GPTQ expected number of bits by @younesbelkada in #31111
- Add VLM generation default contributor by @gante in #31115
- Add on_optimizer_step to callback options by @dhruvbpai in #31095
- Cleanup docker build by @ydshieh in #31119
- FIX / Quantization: Add extra validation for bnb config by @younesbelkada in #31135
- fix get_scheduler when name is warmup_stable_decay by @zspo in #31128
- Docs / Quantization: Replace all occurrences of `load_in_8bit` with bnb config by @younesbelkada in #31136
- Workflow: Remove `IS_GITHUB_CI` by @younesbelkada in #31147
- helper by @ArthurZucker in #31152
- pytest -rsfE by @ydshieh in #31140
- Fix quantized cache output by @SunMarc in #31143
- Update sam.md by @asifajrof in #31130
- Quantization: Enhance bnb error message by @younesbelkada in #31160
- [trainer] add sanity evaluation option by @SunMarc in #31146
- Add streaming, various fixes by @aymeric-roucher in #30838
- Added description of quantization_config by @vamsivallepu in #31133
- Fix typo: use_safetenstors to use_safetensors by @CharlesCNorton in #31184
- Remove copied froms for deprecated models by @amyeroberts in #31153
- Token healing by @ahmed-moubtahij in #30081
- [`GemmaModel`] fix small typo by @ArthurZucker in #31202
- Fix Cannot convert [array()] to EagerTensor of dtype int64 by @pavi-ninjaac in #31109
- Ignore non-causal mask in more cases with SDPA by @fxmarty in #30138
- SlidingWindowCache: reduce differences to other Cache classes by @gante in #30970
- Fix `test_compile_static_cache` by @ydshieh in #30991
- fix the get_size_with_aspect_ratio in max_size situation by @SangbumChoi in #30902
- Fix typo in utils by @Bojun-Feng in #31169
- Rename sanity_evaluation to eval_on_start by @Qubitium in #31192
- Wrong translation FR : Contents = Contenu by @jadechoghari in #31186
- Cohere: Fix copied from by @younesbelkada in #31213
- Set greater_is_better to False if metric_for_best_model ends with "loss" by @miivanov90 in #31142
- Fix GPU OOM for `mistral.py::Mask4DTestHard` by @ydshieh in #31212
- [docs] Spanish translation of tokenizer_summary.md by @aaronjimv in #31154
- Pass device in Logits Processor's init by @zucchini-nlp in #29804
- Fix sentence fragment within test comments by @DomHudson in #31218
- fix(PatchTST): Wrong dropout used for PretainHead by @maxstrobel in #31117
- Video-LLaVa: handle any number of frames by @zucchini-nlp in #31221
- Add dynamic resolution input/interpolate position embedding to deit by @p-kris10 in #31131
- fix bf16 issue in text classification pipeline by @chujiezheng in #30996
- Fix pipeline tests - torch imports by @amyeroberts in #31227
- Add new line switch before logging ***** Running {description} ***** by @jacklanda in #31225
- add no split modules for xlmrobertaxl by @ManuelFay in #31223
- Fix `MistralIntegrationTest` by @ydshieh in #31231
- Blip: Deprecate `BlipModel` by @younesbelkada in #31235
- Move out common backbone config param validation by @amyeroberts in #31144
- Upload (daily) CI results to Hub by @ydshieh in #31168
- Specify dtype=torch.bool to avoid xla error by @ysulsky in #31191
- Fixing `name 'torch' is not defined` in `bitsandbytes` integration by @jamesbraza in #31243
- Benchmark GitHub Actions workflow by @ydshieh in #31163
- Early labels validation by @amyeroberts in #31240
- doc: add info about wav2vec2 bert in older wav2vec2 models. by @Vaibhavs10 in #31120
- enable deterministic mode for npu by @statelesshz in #31253
- Add missing Flaubert tokenizer tests by @bastrob in #30492
- Fix circular reference issue in CLIPTokenizerFast by @dhaivat1729 in #31075
- Add condition to `benchmark` job in `push-important-models.yml` by @ydshieh in #31259
- Skip failing JetMOE generation tests by @amyeroberts in #31266
- no need for explicit EXTRA_TOKENS in processing_paligemma.py by @grahamannett in #31022
- [`SwitchTransformer`] Significant performance improvement on MoE blocks by @ranggihwang in #31173
- fix loading special_tokens_map_file by @ZhiyuanChen in #31012
- Make mamba use cache by @zucchini-nlp in #31116
- Generation: fix handling of special tokens by @zucchini-nlp in #31254
- Switch from `cached_download` to `hf_hub_download` in remaining occurrences by @Wauplin in #31284
- fix: `str` should be used not `int` when setting env variables by @statelesshz in #31272
- Fix _save_tpu: use _maybe_convert_to_cpu instead of to cpu. by @baoleai in #31264
- fix accelerate tests for roberta xl by @SunMarc in #31288
- Enable dynamic resolution input for Beit by @OmarManzoor in #31053
- Mark MobileNetV1ModelTest::test_batching_equivalence as flaky by @amyeroberts in #31258
- Pipeline VQA: Add support for list of images and questions as pipeline input by @BlacCod in #31217
- Fix SwinLayer / DonutSwinLayer / ClapAudioLayer attention mask device by @gorodnitskiy in #31295
- Update text-to-speech.md by @jaguaryang in #31269
- Fixed Wav2Vec2ProcessorWithLM decoding error by @karicotiza in #31188
- Fix jetmoe model by @Cyrilvallez in #31279
- Extend save_pretrained to offloaded models by @blbadger in #27412
- Implement JSON dump conversion for torch_dtype in TrainingArguments by @junrae6454 in #31224
- interpolation added for TVP. by @bhuvanmdev in #30863
- Rename test_model_common_attributes -> test_model_get_set_embeddings by @amyeroberts in #31321
- Use unused prepare_img() function in dinov2 conversion script by @IbrahimAmin1 in #31335
- docs: fix style by @imba-tjd in #31340
- Fix paligemma inverted mask by @molbap in #31207
- docs/zh: fix style by @imba-tjd in #31334
- Decorators for deprecation and named arguments validation by @qubvel in #30799
- Improve error msg when using bitsandbytes by @SunMarc in #31350
- Fix Cohere CI by @ydshieh in #31263
- Fix gradio tool demos by @aymeric-roucher in #31230
- Fast image processor by @amyeroberts in #28847
- Add french translation of AutoBackbone by @jadechoghari in #31300
- Add support to declare imports for code agent by @JasonZhu1313 in #31355
- Fix idefics cache by @zucchini-nlp in #31377
- [Bug Fix] Renamed loss to losses to suppress UnboundLocalError by @her0e1c1 in #31365
- docs: fix broken link by @imba-tjd in #31370
- backbone_utils - fix relative import by @amyeroberts in #31382
- README underline between badges fix by @novialriptide in #31376
- Update comment in modeling_utils.py by @inf3rnus in #31299
- Use huggingface_hub helper function to split state dict by @SunMarc in #31091
- Change JSON serialization to custom json.dumps by @junrae6454 in #31100
- feat(ci): add trufflehog secrets detection by @McPatate in #31344
- [QoL fix] [Image processing] Add warning on assumption of channel dim and avoid infering when inputs are PIL.Image by @aliencaocao in #31364
- Make chat templates part of ProcessorMixin by @Rocketknight1 in #30744
- add initial design for uniform processors + align model by @molbap in #31197
- Add missing French translation of tutoriel_pipeline.md by @jadechoghari in #31396
- Temporarily pin datasets upper version to fix CI by @albertvillanova in #31407
- Support Clip QKV for MPT by @akakakakakaa in #31307
- Pin datasets<2.20.0 for examples by @amyeroberts in #31417
- Fix MusicGen SDPA by @ylacombe in #31208
- Set seed for M4T retain grad test by @ylacombe in #31419
- Fix SpeechT5 `decoder_attention_mask` shape by @ylacombe in #28071
- Change potential `inputs_embeds` padding `logger.warning` to `logger.warning_once` by @naimenz in #31411
- Remove duplicate image processor in auto map by @amyeroberts in #31383
- Install the tensorflow example requirements in docker by @amyeroberts in #31428
- Remove empty create_and_test_config_common_properties tests by @amyeroberts in #31359
- xpu: support xpu backend from stock pytorch (>=2.4) by @dvrogozh in #31238
- Musicgen special tokens in tensors by @zucchini-nlp in #31420
- Fix Bark logits processors device misplacement by @ylacombe in #31416
- Rename misnamed image processor test files by @amyeroberts in #31430
- Generate: fix `tokenizer` being popped twice by @gante in #31427
- [tests] make `TestDeepSpeedModelZoo` device-agnostic by @faaany in #31402
- Support multiple validation datasets when `dataloader_persistent_workers=True` by @bastienlc in #30627
- Pass datasets trust_remote_code by @albertvillanova in #31406
- simple fix by @tokenizer-decode in #31456
- Fix typing errors in `Qwen2ForTokenClassification` by @kevinhu in #31440
- Agents: Improve python interpreter by @aymeric-roucher in #31409
- Donut: fix `generate` call from local path by @gante in #31470
- Make "tool_use" the default chat template key when tools are passed by @Rocketknight1 in #31429
- Fix single letter stop strings by @Rocketknight1 in #31448
- Update chat template docs and bump Jinja version by @Rocketknight1 in #31455
- Improve `PreTrainedTokenizerFast` loading time when there are many added tokens by @ydshieh in #31404
- Fix documentation typos by @qgallouedec in #31476
- Give more useful `metric_for_best_model` errors by @tomaarsen in #31450
- Update perf_train_gpu_many.md by @remyleone in #31451
- [`GPT2`] Add SDPA support by @vasqu in #31172
- Fix autocast incompatibility in RecurrentGemma by @xplip in #30832
- Use self.config_tester.run_common_tests() by @amyeroberts in #31431
- [tests] rename `test_config_object` to `test_ds_config_object` by @faaany in #31403
- Docs / AQLM: Clarify `torch.compile` support for AQLM by @younesbelkada in #31473
- Mamba: add generative tests by @gante in #31478
- Update object_detection.md by @jajupmochi in #31488
- Add docs on zeroshot image classification prompt templates by @aliencaocao in #31343
- auto-detect device when no device is passed to pipeline by @faaany in #31398
- Fix typo: pas_token_id by @ftnext in #30894
- Fix `wandb` integration with `SetFit` model by @timothepearce in #30021
- Consider inheritance in type checking for tensors by @daemyung in #31378
- Add valid columns check in _remove_unused_columns method by @arthasking123 in #31466
- Fix a teeny-tiny typo in `tokenization_utils_base.py`'s docstring by @sadra-barikbin in #31510
- Fix mismatched ` in doc & other common typos by @jhwei in #31516
- RWKV: enable generation tests by @gante in #31490
- unskip 2 tests in cohere by @ydshieh in #31517
- Revive Nightly/Past CI by @ydshieh in #31159
- Deprecate legacy cache + use cache position by @zucchini-nlp in #31491
- SPLIT PR: add user defined symbols and control symbols by @itazap in #31305
- Removed torch.cuda.empty_cache from train loop. by @FoamoftheSea in #31530
- Update mask_generation.md by @nicholicaron in #31543
- Correct @is_flaky test decoration by @qubvel in #31480
- Add implementation of `spectrogram_batch` by @ravenouse in #27159
- chore: fix typos by @xiaoxianBoy in #31559
- Update git templates by @ArthurZucker in #31539
- Fix the error caused by incorrect use of logger in pipeline by @lanyun1103 in #31565
- Fix bug about add_special_tokens and so on by @hiroshi-matsuda-rit in #31496
- Add Jinja as a requirement with the right version cutoff by @Rocketknight1 in #31536
- Fix doc typo in `TrainingArguments` by @qgallouedec in #31503
- Fix is_torch_xpu_available for torch < 2.3 by @amyeroberts in #31573
- Added version constraint on numpy for version <2.0 by @Resteklicken in #31569
- Siglip: add `_no_split_module` by @zucchini-nlp in #31566
- fix output data type of image classification by @jiqing-feng in #31444
- add preprocessing_num_workers to run_classification.py by @jiahuanluo in #31586
- Improve error message for mismatched copies in code blocks by @molbap in #31535
- Add ViTImageProcessorFast to tests by @amyeroberts in #31424
- docs: move translations to `i18n` by @SauravMaheshkar in #31584
- Removed unnecessary `self.projection` call in `VivitTubeletEmbeddings` by @v-iashin in #31632
- [`GPT-NeoX`] Add SDPA support by @vasqu in #31031
- Update RT-DETR code snippet by @qubvel in #31631
- Llama et al. / FSDP : Fix breaking change in 4.40 for FSDP by @younesbelkada in #31161
- Fix RT-DETR inference with float16 and bfloat16 by @qubvel in #31639
- Fix paligemma detection inference by @molbap in #31587
- Generate: fix assisted generation with `past_key_values` passed as kwargs by @gante in #31644
- Fix dtype casting in swinv2 and swinv2sr to allow non-FP32 inference by @aliencaocao in #31589
- Skip tests properly by @amyeroberts in #31308
- Generation: past kv can be None by @zucchini-nlp in #31051
- Fix ONNX exports for Optimum compatible models by @merveenoyan in #31311
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @josephenguehard
- Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878)
- @vasqu
- @ariG23498
- [Port] TensorFlow implementation of Mistral (#29708)
- @bhuvanmdev
- @SangbumChoi
- @Cyrilvallez
- @ravenouse
- Add implementation of `spectrogram_batch` (#27159)