
Can optimum.bettertransformer support the LLaVA model? #1592

Closed
xiaovhua opened this issue Dec 13, 2023 · 1 comment
Labels
bug Something isn't working

Comments

xiaovhua commented Dec 13, 2023

System Info

Local NVIDIA env:
(llava) xuyang@nobisuke:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

Python=3.10.4
Torch==2.0.1+cu117

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model)
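
A fuller, runnable sketch of the snippet above, assuming a Hugging Face LLaVA checkpoint and a transformers version that provides LlavaForConditionalGeneration (the checkpoint id is illustrative, and the transform completing is the behaviour described below, not something guaranteed by the support list):

from transformers import LlavaForConditionalGeneration
from optimum.bettertransformer import BetterTransformer

# Checkpoint id is illustrative; substitute the LLaVA checkpoint being fine-tuned.
model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
model = BetterTransformer.transform(model)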

Expected behavior

Recently, we sought to apply optimum.bettertransformer to LLaVA for fine-tuning. The code ran successfully, and we found that memory usage decreased significantly.

However, at https://huggingface.co/docs/optimum/v1.15.0/bettertransformer/overview we found that LLaVA is not in the list of supported models.

Therefore, we want to confirm: can BetterTransformer currently be used for pre-training or fine-tuning LLaVA?

Collaborator

fxmarty commented Dec 13, 2023

Hi @xiaovhua, in Transformers 4.36 release we started adding native torch.nn.functional.scaled_dot_product_attention support for decoder models (see https://github.com/huggingface/transformers/releases/tag/v4.36.0 & https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention).

Since, for decoder models, we do not use nested tensors and simply rely on SDPA, I will not be adding support for more models in optimum.bettertransformer; instead, I am looking to increase SDPA coverage in Transformers.
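
As a minimal sketch of what the native SDPA path looks like for a supported decoder model in Transformers >= 4.36 (the model id here is illustrative, not taken from this thread):

from transformers import AutoModelForCausalLM

# Model id is illustrative; any decoder checkpoint with SDPA support in 4.36+ works.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="sdpa",  # uses torch.nn.functional.scaled_dot_product_attention
)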

I opened the issue huggingface/transformers#28005 in Transformers to track the support. Please continue the discussion there!

fxmarty closed this as completed Dec 13, 2023