
peft v0.10 takes up too much GPU memory than v0.3.0 #1613

Closed
2 of 4 tasks
azuryl opened this issue Apr 2, 2024 · 14 comments

Comments

@azuryl

azuryl commented Apr 2, 2024

System Info

peft v0.10

Who can help?

@pacman100 @younesbelkada @BenjaminBossan @sayakpaul

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

On a 24 GB GPU, even Bloom 3B with LoRA runs out of GPU memory with peft v0.10 (https://github.com/huggingface/peft/blob/v0.10.0/src/peft/tuners/lora/layer.py), but v0.3.0 (https://github.com/huggingface/peft/blob/v0.3.0/src/peft/tuners/lora.py) has no problem:

0%|                                                                                                                                                                                   | 0/1554 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/post_training.py", line 262, in <module>
    main(args)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/post_training.py", line 213, in main
    trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train
    return inner_training_loop(
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/transformers/trainer.py", line 2767, in compute_loss
    outputs = model(**inputs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/LLMPruner/peft/peft_model.py", line 1129, in forward
    return self.base_model(
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/LLMPruner/models/hf_baichuan/baichuan7B/modeling_baichuan_7B.py", line 610, in forward
    outputs = self.model(
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/LLMPruner/models/hf_baichuan/baichuan7B/modeling_baichuan_7B.py", line 499, in forward
    layer_outputs = decoder_layer(
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/LLMPruner/models/hf_baichuan/baichuan7B/modeling_baichuan_7B.py", line 302, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/LLMPruner/models/hf_baichuan/baichuan7B/modeling_baichuan_7B.py", line 193, in forward
    proj = self.W_pack(hidden_states)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/LLMPruner/peft/tuners/lora/layer.py", line 497, in forward
    result = self.base_layer(x, *args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 24.00 MiB. GPU 0 has a total capacity of 23.64 GiB of which 19.38 MiB is free. Including non-PyTorch memory, this process has 23.62 GiB memory in use. Of the allocated memory 23.38 GiB is allocated by PyTorch, and 48.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Expected behavior

The updated version should not perform worse than the old version: peft v0.10 should not run out of GPU memory on a 24 GB GPU (such as an NVIDIA TITAN RTX) for Bloom 3B, LLaMA 3B, etc.

azuryl changed the title from "peft v0.10 akes up too much GPU memory than v0.3.0" to "peft v0.10 takes up too much GPU memory than v0.3.0" on Apr 2, 2024
@BenjaminBossan
Member

For us to investigate, you have to give us more information. Ideally, you can share the script that produces the error. If that can't be done, share as many details as you can (model, data, hyper params, lib versions, how much memory the old version uses, etc.)
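For reference, peak GPU memory of the two setups can be compared with standard PyTorch counters; a minimal sketch (plain PyTorch, nothing PEFT- or LLM-Pruner-specific):

import torch

# Reset the peak-memory counter before training, then read it back afterwards.
torch.cuda.reset_peak_memory_stats()

# ... run a few training steps here ...

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated GPU memory: {peak_gib:.2f} GiB")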

@azuryl
Author

azuryl commented Apr 2, 2024

For us to investigate, you have to give us more information. Ideally, you can share the script that produces the error. If that can't be done, share as many details as you can (model, data, hyper params, lib versions, how much memory the old version uses, etc.)

https://github.com/horseee/LLM-Pruner?tab=readme-ov-file#1-pruning-discovery-stage--estimation-stage

CUDA_VISIBLE_DEVICES=X python post_training.py --prune_model prune_log/PATH_TO_PRUNE_MODEL/pytorch_model.bin \
      --data_path yahma/alpaca-cleaned \
      --lora_r 8 \
      --num_epochs 2 \
      --learning_rate 1e-4 \
      --batch_size 64 \
      --output_dir tune_log/PATH_TO_SAVE_TUNE_MODEL \
      --wandb_project llama_tune

If you use v0.3.0 there is no problem, but with peft v0.10 it runs out of memory.
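For context, the LoRA setup behind those CLI flags corresponds roughly to a configuration like the sketch below; lora_alpha, lora_dropout, and target_modules are assumptions here, the real values come from post_training.py:

from peft import LoraConfig, get_peft_model  # LLM-Pruner vendors its own copy of these APIs

config = LoraConfig(
    r=8,                                   # --lora_r 8
    lora_alpha=16,                         # assumed; not set on the command line above
    target_modules=["q_proj", "v_proj"],   # placeholder; the script defines its own list
    lora_dropout=0.05,                     # assumed
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)      # `model` here is the loaded pruned base model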

@BenjaminBossan
Member

Thanks for the link. I took a look at the LLM-Pruner repository. From what I can tell, this package does not use PEFT. Instead, it vendors its own PEFT code, which looks like a copy of our code base from a year ago or so + some of their modifications. Checking their requirements, PEFT is not mentioned there.

This means that the PEFT version should not have any influence on LLM-Pruner. In fact, you should be able to uninstall PEFT from your environment and it should still run. Why you get different memory usage when you upgrade PEFT, I don't know, but my best guess is that by upgrading PEFT, you also upgraded other PEFT dependencies, like transformers, and those are the issue.

Also note that the LLM-Pruner code hasn't been updated for more than half a year, so it is quite outdated by now. I would recommend using the package versions from back then if you want to run LLM-Pruner.
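One quick way to see which related packages changed alongside the PEFT upgrade (a small sketch using only the standard library):

from importlib.metadata import version, PackageNotFoundError

# Print the installed versions of PEFT and its main dependencies.
for pkg in ("peft", "transformers", "accelerate", "torch"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")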

@azuryl
Author

azuryl commented Apr 9, 2024

I have replaced LLM-Pruner's vendored PEFT with v0.10 and did not update transformers. My guess is that v0.3.0 only supports LoRA, whereas v0.10 supports other modules such as LoHa, IA3, etc.; v0.10 has a base_layer (https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/layer.py#L39) and v0.3.0 does not (a structural sketch follows after the quoted reply below).

Thanks for the link. I took a look at the LLM-Pruner repository. From what I can tell, this package does not use PEFT. Instead, it vendors its own PEFT code, which looks like a copy of our code base from a year ago or so + some of their modifications. Checking their requirements, PEFT is not mentioned there.

This means that the PEFT version should not have any influence on LLM-Pruner. In fact, you should be able to uninstall PEFT from your environment and it should still run. Why you get different memory usage when you upgrade PEFT, I don't know, but my best guess is that by upgrading PEFT, you also upgraded other PEFT dependencies, like transformers, and those are the issue.

Also note that the LLM-Pruner code hasn't been updated for more than half a year, so it is quite outdated by now. I would recommend using the package versions from back then if you want to run LLM-Pruner.
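To illustrate the structural difference mentioned above, a simplified sketch (not the actual PEFT classes): in v0.3.0 the LoRA Linear subclasses nn.Linear and owns the weight itself, while in v0.10 it wraps the original module as base_layer.

import torch.nn as nn

class LoraLinearV03(nn.Linear):
    """v0.3.0 style: the LoRA layer *is* the nn.Linear and owns `self.weight`."""
    def __init__(self, in_features, out_features, r=8):
        super().__init__(in_features, out_features)
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)

    def forward(self, x):
        return super().forward(x) + self.lora_B(self.lora_A(x))

class LoraLinearV010(nn.Module):
    """v0.10 style: the LoRA layer wraps the original module as `base_layer`."""
    def __init__(self, base_layer: nn.Linear, r=8):
        super().__init__()
        self.base_layer = base_layer  # the original nn.Linear, kept as-is
        self.lora_A = nn.Linear(base_layer.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base_layer.out_features, bias=False)

    def forward(self, x):
        return self.base_layer(x) + self.lora_B(self.lora_A(x))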

@BenjaminBossan
Member

I have replaced LLM-Pruner's vendored PEFT with v0.10

You mean you have replaced their vendored PEFT code with PEFT v0.10.0? I wouldn't expect that to work, because the rest of their code probably makes assumptions about PEFT that don't hold up anymore. Maybe you can create a PR on their repo for them to use a more up-to-date PEFT version, but that probably requires more adjustments on their side and I'm not sure if the repo is still being actively developed.

As it stands, I would not expect code from > 6 months ago with their own custom PEFT code to work with the current PEFT version out of the box.

@azuryl
Author

azuryl commented Apr 9, 2024

I have replaced LLM-Pruner's vendored PEFT with v0.10

You mean you have replaced their vendored PEFT code with PEFT v0.10.0? I wouldn't expect that to work, because the rest of their code probably makes assumptions about PEFT that don't hold up anymore. Maybe you can create a PR on their repo for them to use a more up-to-date PEFT version, but that probably requires more adjustments on their side and I'm not sure if the repo is still being actively developed.

As it stands, I would not expect code from > 6 months ago with their own custom PEFT code to work with the current PEFT version out of the box.

Yes, I have replaced their vendored PEFT with v0.10.0 without modification, and it trains successfully on a 48 GB A6000 GPU but runs out of memory on a 24 GB GPU. They just use PEFT for LoRA and do not modify it.
The only difference is that https://github.com/horseee/LLM-Pruner/blob/main/post_training.py#L33 uses pruned_dict = torch.load(args.prune_model, map_location='cpu') to load the pruned model.

Their vendored PEFT is v0.3.0:
https://github.com/huggingface/peft/blob/v0.3.0/src/peft/tuners/lora.py
https://github.com/horseee/LLM-Pruner/blob/main/LLMPruner/peft/tuners/lora.py

You can also check whether the original LLaMA 7B can be trained on a 24 GB GPU with v0.10; this issue is not related to the pruning.

@azuryl
Author

azuryl commented Apr 12, 2024

hello @BenjaminBossan

I want to port the DoRA module to v0.3.0. Could you tell me how to implement
weight = self.get_base_layer().weight (https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/layer.py#L185)
in v0.3.0? v0.3.0 has no base_layer.

Thank you

@BenjaminBossan
Member

Try self.weight instead.

@azuryl
Author

azuryl commented Apr 17, 2024

Try self.weight instead.
@BenjaminBossan
I have ported DoRA to v0.3.0 and it can run on a 24 GB GPU now, thank you.

And I have a question: how can I set use_dora?

In lora.py I added:

use_dora: bool = field(
    default=False,
    metadata={
        "help": (
            "Enable 'Weight-Decomposed Low-Rank Adaptation' (DoRA). This technique decomposes the updates of the "
            "weights into two parts, magnitude and direction. Direction is handled by normal LoRA, whereas the "
            "magnitude is handled by a separate learnable parameter. This can improve the performance of LoRA, "
            "especially at low ranks. Right now, DoRA only supports linear and Conv2D layers. DoRA introduces a bigger "
            "overhead than pure LoRA, so it is recommended to merge weights for inference. For more information, "
            "see https://arxiv.org/abs/2402.09353."
        )
    },
)

class Linear(nn.Linear, LoraLayer):
    # Lora implemented in a dense layer
    def __init__(
        self,
        adapter_name: str,
        in_features: int,
        out_features: int,
        r: int = 0,
        lora_alpha: int = 1,
        lora_dropout: float = 0.0,
        fan_in_fan_out: bool = False,  # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
        use_rslora: bool = False,
        use_dora: bool = False,
        **kwargs,
    ):

In my tuning.py I set:

config = LoraConfig(
        r=args.lora_r,
        lora_alpha=args.lora_alpha,
        target_modules=args.lora_target_modules.split(","),
        lora_dropout=args.lora_dropout,
        bias="none",
        task_type="CAUSAL_LM",
        use_dora=True,
        use_rslora=args.use_rslora,
    )

But lora.py does not receive the use_dora flag in class Linear(nn.Linear, LoraLayer); I had to manually set use_dora=True in Linear's __init__.

@BenjaminBossan
Member

@azuryl The question you have is not related to the initial issue, could you please move it to our discussion page? Also, please clarify what you mean by:

But lora.py does not receive the use_dora flag in class Linear(nn.Linear, LoraLayer); I had to manually set use_dora=True in Linear's __init__.

There should not be any need to modify the PEFT code just to use DoRA.
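For reference, with a recent PEFT release (v0.9.0 or later) DoRA is enabled purely through the config, roughly like this (base_model and target_modules are placeholders):

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # illustrative
    task_type="CAUSAL_LM",
    use_dora=True,  # no changes to the PEFT source are needed
)
model = get_peft_model(base_model, config)  # `base_model` is the loaded transformer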

@azuryl
Author

azuryl commented Apr 18, 2024

@azuryl The question you have is not related to the initial issue, could you please move it to our discussion page? Also, please clarify what you mean by:

But lora.py does not receive the use_dora flag in class Linear(nn.Linear, LoraLayer); I had to manually set use_dora=True in Linear's __init__.

There should not be any need to modify the PEFT code just to use DoRA.

I mean I am modifying the v0.3.0 version, since it does not have DoRA.

@BenjaminBossan
Member

I mean I am modifying the v0.3.0 version, since it does not have DoRA.

I see. I cannot give you a complete recipe to backport DoRA to PEFT v0.3.0. If you paste the complete error message, I might be able to give some tips. Make sure to update the update_layer method too when you add new arguments.
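To make the "update update_layer too" point concrete, a rough illustration of threading a new flag through an update_layer-style method (illustrative only; the actual v0.3.0 signature may differ):

class LoraLayerSketch:
    """Illustrative only: every new LoraConfig field has to be accepted and stored
    by update_layer as well, not just by Linear.__init__."""

    def __init__(self):
        self.use_dora = {}

    def update_layer(self, adapter_name, r, lora_alpha, lora_dropout,
                     init_lora_weights, use_dora=False):
        # ... existing v0.3.0 logic creating lora_A/lora_B for `adapter_name` ...
        self.use_dora[adapter_name] = use_dora  # remember the flag per adapter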

@azuryl
Author

azuryl commented Apr 29, 2024

I mean I am modifying the v0.3.0 version, since it does not have DoRA.

I see. I cannot give you a complete recipe to backport DoRA to PEFT v0.3.0. If you paste the complete error message, I might be able to give some tips. Make sure to update the update_layer method too when you add new arguments.

In the v0.3.0 lora.py:

def _find_and_replace(self, adapter_name):
        lora_config = self.peft_config[adapter_name]
        loaded_in_8bit = getattr(self.model, "is_loaded_in_8bit", False)
        if loaded_in_8bit and not is_bnb_available():
            raise ImportError(
                "To use Lora with 8-bit quantization, please install the `bitsandbytes` package. "
                "You can install it with `pip install bitsandbytes`."
            )
        is_target_modules_in_base_model = False
        kwargs = {
            "r": lora_config.r,
            "lora_alpha": lora_config.lora_alpha,
            "lora_dropout": lora_config.lora_dropout,
            "fan_in_fan_out": lora_config.fan_in_fan_out,
            "init_lora_weights": lora_config.init_lora_weights,
+           "use_dora": lora_config.use_dora,
+           "use_rslora": lora_config.use_rslora,
        }

And to class LoraConfig(PeftConfig) I added the corresponding use_dora and use_rslora fields (the use_dora field definition is shown in my earlier comment; use_rslora is analogous).

It can be run now

@BenjaminBossan
Member

Okay, great, good luck with your further training. I'll close the issue then; feel free to re-open if something else comes up.
