LoftQ does not seem to quantize the base model #1525

Closed
Mr-KenLee opened this issue Mar 4, 2024 · 4 comments
Comments

@Mr-KenLee

System Info

transformers version: 4.37.2
Platform: Ubuntu 18.04.6 LTS
GPU: RTX GeForce 3090 x 2
Python version: 3.10.13
Huggingface_hub version: 0.20.3
Safetensors version: 0.4.2
Accelerate version: 0.26.1
Accelerate config: not found
PyTorch version (GPU?): 2.2.1+cu121 (True)
Tensorflow version (GPU?): not found
Flax version (CPU?/GPU?/TPU?): not found
Jax version: not found
JaxLib version: not found
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no
Peft version: 0.9.0
Trl version: 0.7.11

Who can help?

@pacman100 @stevhliu

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I am attempting to fine-tune my model (Baichuan2-chat-7B) using LoftQ, but the results seem to differ from my expectations. Due to the computational resources available in our lab, I am using two 3090 GPUs for fine-tuning. I followed the method described at https://huggingface.co/docs/peft/en/developer_guides/lora#loftq:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoftQConfig, LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("Baichuan2-7B-Chat", trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16)  # don't quantize here
loftq_config = LoftQConfig(loftq_bits=4)           # set 4bit quantization
lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=loftq_config,
    r=16,
    lora_alpha=8,
    target_modules=["W_pack"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
peft_model = get_peft_model(base_model, lora_config)

However, I have found that LoftQ does not seem to quantize my LLM as expected. The actual GPU memory usage is roughly the same as when loading the model directly (about 8 GB on each GPU), plus only an additional ~2 GB for the LoRA weights.

Could you please help me understand why this is happening? Is there something I am doing incorrectly?
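
For what it's worth, a quick way to check whether the base layers were actually quantized is to count the module types (a rough sketch, assuming bitsandbytes is installed):

import bitsandbytes as bnb
import torch

# Count bitsandbytes 4-bit linear layers vs. regular full-precision linear layers.
n_4bit = sum(isinstance(m, bnb.nn.Linear4bit) for m in peft_model.modules())
n_fp = sum(type(m) is torch.nn.Linear for m in peft_model.modules())
print(f"Linear4bit modules: {n_4bit}, nn.Linear modules: {n_fp}")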

Additionally, there is a discussion thread with a similar issue encountered by someone using the Llama-2-7b model:
https://discuss.huggingface.co/t/fine-tuning-for-llama2-based-model-with-loftq-quantization/66737/7

Expected behavior

After setting up LoftQ, GPU memory usage should drop significantly, comparable to the effect of using load_in_4bit.
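
For comparison, this is the kind of 4-bit loading whose memory footprint I would expect LoftQ to match (a plain bitsandbytes sketch, without LoftQ):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Plain 4-bit NF4 quantization as a memory baseline, no LoftQ involved.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "Baichuan2-7B-Chat",
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)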

@BenjaminBossan
Member

Thanks a lot for reporting. Indeed, what is written in the documentation about how to initialize with LoftQ is incorrect, my bad. The correct way is unfortunately a bit more complicated. Please follow the instructions here:

https://github.com/huggingface/peft/tree/main/examples/loftq_finetuning

In the meantime, I'll be working on updating the docs and unit tests.
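
Roughly, the round trip in that example looks like this (a condensed sketch with placeholder output paths; the script in that folder is the authoritative version and handles the base-model unwrapping somewhat differently):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoftQConfig, LoraConfig, PeftModel, get_peft_model

# Step 1: apply LoftQ initialization on the *non-quantized* model.
base_model = AutoModelForCausalLM.from_pretrained(
    "Baichuan2-7B-Chat", trust_remote_code=True, torch_dtype=torch.bfloat16
)
lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=LoftQConfig(loftq_bits=4),
    r=16,
    lora_alpha=8,
    target_modules=["W_pack"],
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base_model, lora_config)

# Step 2: save the LoftQ-initialized adapter, then the modified base model
# (LoftQ replaces the base weights during initialization).
peft_model.save_pretrained("baichuan2-loftq/adapter")
unwrapped = peft_model.unload()  # strip the LoRA layers but keep the modified base weights
unwrapped.save_pretrained("baichuan2-loftq/base")

# Step 3: reload the just-saved base model, this time actually quantized to 4 bit.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
quantized_base = AutoModelForCausalLM.from_pretrained(
    "baichuan2-loftq/base", quantization_config=bnb_config,
    trust_remote_code=True, device_map="auto",
)

# Step 4: load the LoftQ-initialized adapter on top of the quantized base model.
model = PeftModel.from_pretrained(quantized_base, "baichuan2-loftq/adapter", is_trainable=True)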

@Mr-KenLee
Author

Thank you very much! I will try the script immediately ~

BenjaminBossan added a commit to BenjaminBossan/peft that referenced this issue Mar 4, 2024
Relates to huggingface#1525

Unfortunately, the docs I wrote about how to use LoftQ were incorrect, based on a misunderstanding I had. In reality, getting LoftQ to work is quite a bit more involved, requiring a complete roundtrip: first loading a non-quantized model with LoftQ, saving the LoRA weights and the modified base model, loading the just-stored base model again, this time with quantization, and finally loading the LoftQ-initialized adapter on top. The docs now link to the example which demonstrates how to move through these steps.

github-actions bot commented Apr 3, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@BenjaminBossan
Member

Note that the docs are updated and should now be correct. Also, in PEFT v0.10.0, we released a more convenient way to initialize with LoftQ (docs).
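
In short, the new approach applies LoftQ on an already-quantized model, roughly like this (a sketch; it assumes the base checkpoint is available as safetensors):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
from peft.utils.loftq_utils import replace_lora_weights_loftq

# Load the base model directly in 4 bit this time.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base_model = AutoModelForCausalLM.from_pretrained(
    "Baichuan2-7B-Chat", quantization_config=bnb_config,
    trust_remote_code=True, device_map="auto",
)

# Create a plain LoRA model; no LoftQConfig is needed here.
lora_config = LoraConfig(r=16, lora_alpha=8, target_modules=["W_pack"], task_type="CAUSAL_LM")
peft_model = get_peft_model(base_model, lora_config)

# Replace the LoRA weights in place with LoftQ-initialized ones.
replace_lora_weights_loftq(peft_model)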
