
fix(hugging-face): update dependency transformers to v4.32.0 #10581

Merged
merged 1 commit into main on Aug 23, 2023

Conversation

renovate[bot]
Contributor

@renovate renovate bot commented Aug 22, 2023

Mend Renovate

This PR contains the following updates:

Package: transformers
Change: 4.31.0 -> 4.32.0

⚠ Dependency Lookup Warnings ⚠

Warnings were logged while processing this repo. Please check the Dependency Dashboard for more information.


Release Notes

huggingface/transformers (transformers)

v4.32.0: IDEFICS, GPTQ Quantization

Compare Source

IDEFICS

The IDEFICS model was proposed in OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh

IDEFICS is the first open state-of-the-art visual language model at the 80B scale!

The model accepts arbitrary sequences of images and text and produces text, similarly to a multimodal ChatGPT.

Blogpost: hf.co/blog/idefics
Playground: HuggingFaceM4/idefics_playground


MPT

MPT has been added and is now officially supported within Transformers. The repositories from MosaicML have been updated to work best with the model integration within Transformers.

GPTQ Integration

GPTQ quantization is now supported in Transformers, through the optimum library. The backend relies on the auto_gptq library, from which we use the GPTQ and QuantLinear classes.

See below for an example of the API, quantizing a model using the new GPTQConfig configuration utility.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_name = "facebook/opt-125m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer, group_size=128, desc_act=False)

# works also with device_map (cpu offload works but not disk offload)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, quantization_config=config)

Most models under the TheBloke namespace with the GPTQ suffix should be supported. For example, to load the GPTQ-quantized model TheBloke/Llama-2-13B-chat-GPTQ, simply run the following (after installing the latest optimum and auto-gptq libraries):

from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

For more information about this feature, we recommend taking a look at the following announcement blogpost: https://huggingface.co/blog/gptq-integration

Pipelines

A new pipeline, dedicated to text-to-audio and text-to-speech models, has been added to Transformers. It currently supports the three text-to-audio models integrated into Transformers: SpeechT5ForTextToSpeech, MusicGen, and Bark.

See below for an example:

from transformers import pipeline

# instantiate a text-to-audio pipeline backed by Bark
pipe = pipeline(model="suno/bark")
output = pipe("Hey it's HuggingFace on the phone!")

audio = output["audio"]
sampling_rate = output["sampling_rate"]

Classifier-Free Guidance sampling

Classifier-Free Guidance sampling is a generation technique developed by EleutherAI and announced in this paper. With this technique, you can increase prompt adherence in generation. You can also set it up with negative prompts, steering your generation away from specific directions. See its docs for usage instructions, and the sketch below.
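
A minimal sketch, assuming the guidance_scale and negative_prompt_ids arguments accepted by generate for CFG (the gpt2 checkpoint and both prompts are placeholders for illustration):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Today, a dragon flew over Paris,", return_tensors="pt")
# tokens of the negative prompt to steer generation away from
negative = tokenizer("A boring weather report:", return_tensors="pt")

# guidance_scale > 1 strengthens adherence to the prompt; 1 disables CFG
outputs = model.generate(
    **inputs,
    guidance_scale=1.5,
    negative_prompt_ids=negative.input_ids,
    max_new_tokens=30,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))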

Task guides

A new task guide covering Visual Question Answering has been added to Transformers.

Model deprecation

We continue the deprecation of models that was introduced in https://github.com/huggingface/transformers/pull/24787.

By deprecating, we indicate that we will stop maintaining such models, but there is no intention of actually removing them and breaking support (they might one day move into a separate repo or onto the Hub, but we would still add the necessary imports to preserve backward compatibility). The main point is that we stop testing those models. This choice is driven by model usage and aims to ease the burden on our CI so that it can focus on more critical aspects of the library.

Translation Efforts

There are ongoing efforts to translate the Transformers documentation into other languages. These efforts are driven by groups independent of Hugging Face, and their work is greatly appreciated, as it further lowers the barrier of entry to ML and Transformers.

If you'd like to kickstart such an effort or help out on an existing one, please feel free to reach out by opening an issue.

Explicit input data format for image processing

The input_data_format argument has been added to image transforms and ImageProcessor methods, allowing the user to explicitly set the data format of the images being processed. This enables processing of images with a non-standard number of channels (e.g. 4) and removes errors that occurred when the data format was inferred but the channel dimension was ambiguous.

import numpy as np
from transformers import ViTImageProcessor

# a 4-channel image in channels_first format: (num_channels, height, width)
img = np.random.randint(0, 256, (4, 6, 3))
image_processor = ViTImageProcessor()
inputs = image_processor(img, image_mean=0, image_std=1, input_data_format="channels_first")

Documentation clarification about efficient inference through torch.scaled_dot_product_attention & Flash Attention

Many users are not aware that it is possible to force torch.scaled_dot_product_attention to dispatch to Flash Attention kernels. This leads to considerable speedups and memory savings, and it is also compatible with quantized models. We decided to make this explicit to users in the documentation.

In a nutshell, one can just run:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")

# convert the model to BetterTransformer
model.to_bettertransformer()

input_text = "Hello my dog is cute and"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

# force dispatch to the Flash Attention kernel
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    outputs = model.generate(**inputs)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

to enable Flash Attention in their model. However, this feature does not support padding yet.

FSDP and DeepSpeed Changes

Users will no longer encounter CPU RAM OOM when using FSDP to train very large models in multi-GPU or multi-node multi-GPU settings.
Users no longer have to pass fsdp_transformer_layer_cls_to_wrap, as the code now uses _no_split_modules by default, which is available for most popular models. DeepSpeed ZeRO-3 init now works properly with the Accelerate launcher + Trainer.

Breaking changes

Default optimizer in the Trainer class

The default optimizer in the Trainer class has been updated to adamw_torch rather than our own adamw_hf, as the official PyTorch optimizer is more robust and fixes some issues.

In order to keep the old behavior, ensure that you pass "adamw_hf" as the optim value in your TrainingArguments, as in the sketch below.
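
A minimal sketch of opting back into the previous optimizer (the output_dir value is a placeholder):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",  # placeholder path
    optim="adamw_hf",     # restore the previous default optimizer
)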

ViVit and EfficientNet rescale bugfix

There was an issue with the definition of value rescaling in ViVit and EfficientNet. These have been fixed, but the fix results in different model outputs for both models. To understand the change and see what needs to be done to obtain the previous results, please take a look at the following PR.

Removing softmax for the image classification EfficientNet class

The EfficientNetForImageClassification model class did not follow conventions and added a softmax to the model logits. This was removed so that it respects the convention set by other models.

In order to obtain the previous results, pass the model logits through a softmax, as sketched below.
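
A minimal sketch, assuming the google/efficientnet-b0 checkpoint as an example and a random placeholder image:

import numpy as np
import torch
from PIL import Image
from transformers import AutoImageProcessor, EfficientNetForImageClassification

checkpoint = "google/efficientnet-b0"  # example checkpoint
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = EfficientNetForImageClassification.from_pretrained(checkpoint)

# random placeholder image in place of real input data
image = Image.fromarray(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# reapply the softmax that the model previously added to its logits
probs = torch.softmax(logits, dim=-1)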

Bug fixes with SPM models

Some SPM models had issues with their management of added tokens. Namely, Llama and T5, among others, were behaving incorrectly. These have been fixed in https://github.com/huggingface/transformers/pull/25224.

An option to obtain the previous behavior was added through the legacy flag, as explained in the PR linked above and sketched below.
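
A minimal sketch, using t5-small as an example checkpoint:

from transformers import AutoTokenizer

# legacy=True restores the previous handling of added tokens
tokenizer = AutoTokenizer.from_pretrained("t5-small", legacy=True)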

Bugfixes and improvements


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Mend Renovate. View repository job log here.

@renovate renovate bot requested a review from hongbo-miao as a code owner August 22, 2023 13:46
@renovate renovate bot temporarily deployed to test August 22, 2023 13:46 Inactive

sonarcloud bot commented Aug 22, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No Coverage information
No Duplication information

@hongbo-miao hongbo-miao merged commit 1582181 into main Aug 23, 2023
76 of 77 checks passed
@hongbo-miao hongbo-miao deleted the renovate/hugging-face-transformers-4.x branch August 23, 2023 03:39