
Can optimum.bettertransformer support the LLaVA model? #1592

Closed
xiaovhua opened this issue Dec 13, 2023 · 1 comment
Labels
bug Something isn't working

Comments

xiaovhua commented Dec 13, 2023

System Info

Local NVIDIA env:
(llava) xuyang@nobisuke:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

Python=3.10.4
Torch==2.0.1+cu117

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model)
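
A fuller, runnable sketch of the snippet above, assuming a Hugging Face LLaVA checkpoint and a transformers version that provides LlavaForConditionalGeneration (the checkpoint id is illustrative, and the transform completing is the behaviour described below, not something guaranteed by the support list):

from transformers import LlavaForConditionalGeneration
from optimum.bettertransformer import BetterTransformer

# Checkpoint id is illustrative; substitute the LLaVA checkpoint being fine-tuned.
model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
model = BetterTransformer.transform(model)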

Expected behavior

Recently, we sought to apply optimum.bettertransformer to LLaVA for fine-tuning. The code ran successfully, and we found that memory usage decreased significantly.

However, at https://huggingface.co/docs/optimum/v1.15.0/bettertransformer/overview we found that LLaVA is not in the list of supported models.

Therefore, we want to confirm: can BetterTransformer currently be used for pre-training or fine-tuning LLaVA?

Collaborator

fxmarty commented Dec 13, 2023

Hi @xiaovhua, in Transformers 4.36 release we started adding native torch.nn.functional.scaled_dot_product_attention support for decoder models (see https://github.com/huggingface/transformers/releases/tag/v4.36.0 & https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention).

Since, for decoder models, we do not use nested tensors and simply rely on SDPA, I will not be adding support for more models in optimum.bettertransformer; instead, I am looking to increase SDPA coverage in Transformers.
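
As a minimal sketch of what the native SDPA path looks like for a supported decoder model in Transformers >= 4.36 (the model id here is illustrative, not taken from this thread):

from transformers import AutoModelForCausalLM

# Model id is illustrative; any decoder checkpoint with SDPA support in 4.36+ works.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="sdpa",  # uses torch.nn.functional.scaled_dot_product_attention
)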

I opened the issue huggingface/transformers#28005 in Transformers to track the support. Please continue the discussion there!

fxmarty closed this as completed Dec 13, 2023