
peft v0.10 takes up too much GPU memory than v0.3.0 #1613

Closed
2 of 4 tasks
azuryl opened this issue Apr 2, 2024 · 14 comments

Comments

@azuryl

azuryl commented Apr 2, 2024

System Info

peft v0.10

Who can help?

@pacman100 @younesbelkada @BenjaminBossan @sayakpaul

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

On a 24 GB GPU, even Bloom 3B with LoRA runs out of GPU memory with peft v0.10 (https://github.com/huggingface/peft/blob/v0.10.0/src/peft/tuners/lora/layer.py), but v0.3.0 (https://github.com/huggingface/peft/blob/v0.3.0/src/peft/tuners/lora.py) has no problem:

0%|                                                                                                                                                                                   | 0/1554 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/post_training.py", line 262, in <module>
    main(args)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/post_training.py", line 213, in main
    trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train
    return inner_training_loop(
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/transformers/trainer.py", line 2767, in compute_loss
    outputs = model(**inputs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/LLMPruner/peft/peft_model.py", line 1129, in forward
    return self.base_model(
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/LLMPruner/models/hf_baichuan/baichuan7B/modeling_baichuan_7B.py", line 610, in forward
    outputs = self.model(
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/LLMPruner/models/hf_baichuan/baichuan7B/modeling_baichuan_7B.py", line 499, in forward
    layer_outputs = decoder_layer(
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/LLMPruner/models/hf_baichuan/baichuan7B/modeling_baichuan_7B.py", line 302, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/LLMPruner/models/hf_baichuan/baichuan7B/modeling_baichuan_7B.py", line 193, in forward
    proj = self.W_pack(hidden_states)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/delight-gpu/Workspace2/azuryl/LLM-Pruner/LLMPruner/peft/tuners/lora/layer.py", line 497, in forward
    result = self.base_layer(x, *args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 24.00 MiB. GPU 0 has a total capacity of 23.64 GiB of which 19.38 MiB is free. Including non-PyTorch memory, this process has 23.62 GiB memory in use. Of the allocated memory 23.38 GiB is allocated by PyTorch, and 48.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Expected behavior

The updated version should not perform worse than the old version: peft v0.10 should not run out of GPU memory on a 24 GB GPU (such as an NVIDIA TITAN RTX) for Bloom 3B, LLaMA 3B, etc.

azuryl changed the title from "peft v0.10 akes up too much GPU memory than v0.3.0" to "peft v0.10 takes up too much GPU memory than v0.3.0" on Apr 2, 2024
@BenjaminBossan
Member

For us to investigate, you have to give us more information. Ideally, you can share the script that produces the error. If that can't be done, share as many details as you can (model, data, hyper params, lib versions, how much memory the old version uses, etc.)
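For reference, peak GPU memory of the two setups can be compared with standard PyTorch counters; a minimal sketch (plain PyTorch, nothing PEFT- or LLM-Pruner-specific):

import torch

# Reset the peak-memory counter before training, then read it back afterwards.
torch.cuda.reset_peak_memory_stats()

# ... run a few training steps here ...

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated GPU memory: {peak_gib:.2f} GiB")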

@azuryl
Author

azuryl commented Apr 2, 2024

For us to investigate, you have to give us more information. Ideally, you can share the script that produces the error. If that can't be done, share as many details as you can (model, data, hyper params, lib versions, how much memory the old version uses, etc.)

https://github.com/horseee/LLM-Pruner?tab=readme-ov-file#1-pruning-discovery-stage--estimation-stage

CUDA_VISIBLE_DEVICES=X python post_training.py --prune_model prune_log/PATH_TO_PRUNE_MODEL/pytorch_model.bin \
      --data_path yahma/alpaca-cleaned \
      --lora_r 8 \
      --num_epochs 2 \
      --learning_rate 1e-4 \
      --batch_size 64 \
      --output_dir tune_log/PATH_TO_SAVE_TUNE_MODEL \
      --wandb_project llama_tune

If you use v0.3.0 there is no problem, but with peft v0.10 it runs out of memory.
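For context, the LoRA setup behind those CLI flags corresponds roughly to a configuration like the sketch below; lora_alpha, lora_dropout, and target_modules are assumptions here, the real values come from post_training.py:

from peft import LoraConfig, get_peft_model  # LLM-Pruner vendors its own copy of these APIs

config = LoraConfig(
    r=8,                                   # --lora_r 8
    lora_alpha=16,                         # assumed; not set on the command line above
    target_modules=["q_proj", "v_proj"],   # placeholder; the script defines its own list
    lora_dropout=0.05,                     # assumed
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)      # `model` here is the loaded pruned base model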

@BenjaminBossan
Member

Thanks for the link. I took a look at the LLM-Pruner repository. From what I can tell, this package does not use PEFT. Instead, it vendors its own PEFT code, which looks like a copy of our code base from a year ago or so + some of their modifications. Checking their requirements, PEFT is not mentioned there.

This means that the PEFT version should not have any influence on LLM-Pruner. In fact, you should be able to uninstall PEFT from your environment and it should still run. Why you get different memory usage when you upgrade PEFT, I don't know, but my best guess is that by upgrading PEFT, you also upgraded other PEFT dependencies, like transformers, and those are the issue.

Also note that the LLM-Pruner code hasn't been updated for more than half a year, so it is quite outdated by now. I would recommend using the package versions from back then if you want to run LLM-Pruner.
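One quick way to see which related packages changed alongside the PEFT upgrade (a small sketch using only the standard library):

from importlib.metadata import version, PackageNotFoundError

# Print the installed versions of PEFT and its main dependencies.
for pkg in ("peft", "transformers", "accelerate", "torch"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")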

@azuryl
Author

azuryl commented Apr 9, 2024

I have replaced LLM-Pruner's vendored PEFT with v0.10 and did not update transformers. My guess is that v0.3.0 only supports LoRA, whereas v0.10 supports other modules such as LoHa, IA3, etc.; v0.10 has a base_layer (https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/layer.py#L39) and v0.3.0 does not (a structural sketch follows after the quoted reply below).

Thanks for the link. I took a look at the LLM-Pruner repository. From what I can tell, this package does not use PEFT. Instead, it vendors its own PEFT code, which looks like a copy of our code base from a year ago or so + some of their modifications. Checking their requirements, PEFT is not mentioned there.

This means that the PEFT version should not have any influence on LLM-Pruner. In fact, you should be able to uninstall PEFT from your environment and it should still run. Why you get different memory usage when you upgrade PEFT, I don't know, but my best guess is that by upgrading PEFT, you also upgraded other PEFT dependencies, like transformers, and those are the issue.

Also note that the LLM-Pruner code hasn't been updated for more than half a year, so it is quite outdated by now. I would recommend using the package versions from back then if you want to run LLM-Pruner.
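To illustrate the structural difference mentioned above, a simplified sketch (not the actual PEFT classes): in v0.3.0 the LoRA Linear subclasses nn.Linear and owns the weight itself, while in v0.10 it wraps the original module as base_layer.

import torch.nn as nn

class LoraLinearV03(nn.Linear):
    """v0.3.0 style: the LoRA layer *is* the nn.Linear and owns `self.weight`."""
    def __init__(self, in_features, out_features, r=8):
        super().__init__(in_features, out_features)
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)

    def forward(self, x):
        return super().forward(x) + self.lora_B(self.lora_A(x))

class LoraLinearV010(nn.Module):
    """v0.10 style: the LoRA layer wraps the original module as `base_layer`."""
    def __init__(self, base_layer: nn.Linear, r=8):
        super().__init__()
        self.base_layer = base_layer  # the original nn.Linear, kept as-is
        self.lora_A = nn.Linear(base_layer.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base_layer.out_features, bias=False)

    def forward(self, x):
        return self.base_layer(x) + self.lora_B(self.lora_A(x))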

@BenjaminBossan
Member

I have replaced LLM-Pruner's vendored PEFT with v0.10

You mean you have replaced their vendored PEFT code with PEFT v0.10.0? I wouldn't expect that to work, because the rest of their code probably makes assumptions about PEFT that don't hold up anymore. Maybe you can create a PR on their repo for them to use a more up-to-date PEFT version, but that probably requires more adjustments on their side and I'm not sure if the repo is still being actively developed.

As it stands, I would not expect code from > 6 months ago with their own custom PEFT code to work with the current PEFT version out of the box.

@azuryl
Author

azuryl commented Apr 9, 2024

I have replaced LLM-Pruner's vendored PEFT with v0.10

You mean you have replaced their vendored PEFT code with PEFT v0.10.0? I wouldn't expect that to work, because the rest of their code probably makes assumptions about PEFT that don't hold up anymore. Maybe you can create a PR on their repo for them to use a more up-to-date PEFT version, but that probably requires more adjustments on their side and I'm not sure if the repo is still being actively developed.

As it stands, I would not expect code from > 6 months ago with their own custom PEFT code to work with the current PEFT version out of the box.

Yes, I have replaced their vendored PEFT with v0.10.0 without modification, and it trains successfully on a 48 GB A6000 GPU but runs out of memory on a 24 GB GPU. They just use PEFT for LoRA and do not modify it.
The only difference is that https://github.com/horseee/LLM-Pruner/blob/main/post_training.py#L33 uses pruned_dict = torch.load(args.prune_model, map_location='cpu') to load the pruned model.

Their vendored PEFT is v0.3.0:
https://github.com/huggingface/peft/blob/v0.3.0/src/peft/tuners/lora.py
https://github.com/horseee/LLM-Pruner/blob/main/LLMPruner/peft/tuners/lora.py

You can also check whether the original LLaMA 7B can be trained on a 24 GB GPU with v0.10; this issue is not related to the pruning.

@azuryl
Author

azuryl commented Apr 12, 2024

hello @BenjaminBossan

I want to port the DoRA module to v0.3.0. Could you tell me how to implement
weight = self.get_base_layer().weight (https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/layer.py#L185)
in v0.3.0? v0.3.0 has no base_layer.

Thank you

@BenjaminBossan
Member

Try self.weight instead.

@azuryl
Author

azuryl commented Apr 17, 2024

Try self.weight instead.
@BenjaminBossan
I have ported DoRA to v0.3.0 and it can run on a 24 GB GPU now, thank you.

And I have a question: how can I set use_dora?

In lora.py I added:

use_dora: bool = field(
    default=False,
    metadata={
        "help": (
            "Enable 'Weight-Decomposed Low-Rank Adaptation' (DoRA). This technique decomposes the updates of the "
            "weights into two parts, magnitude and direction. Direction is handled by normal LoRA, whereas the "
            "magnitude is handled by a separate learnable parameter. This can improve the performance of LoRA, "
            "especially at low ranks. Right now, DoRA only supports linear and Conv2D layers. DoRA introduces a bigger "
            "overhead than pure LoRA, so it is recommended to merge weights for inference. For more information, "
            "see https://arxiv.org/abs/2402.09353."
        )
    },
)

class Linear(nn.Linear, LoraLayer):
    # Lora implemented in a dense layer
    def __init__(
        self,
        adapter_name: str,
        in_features: int,
        out_features: int,
        r: int = 0,
        lora_alpha: int = 1,
        lora_dropout: float = 0.0,
        fan_in_fan_out: bool = False,  # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
        use_rslora: bool = False,
        use_dora: bool = False,
        **kwargs,
    ):

In my tuning.py I set:

config = LoraConfig(
        r=args.lora_r,
        lora_alpha=args.lora_alpha,
        target_modules=args.lora_target_modules.split(","),
        lora_dropout=args.lora_dropout,
        bias="none",
        task_type="CAUSAL_LM",
        use_dora=True,
        use_rslora=args.use_rslora,
    )

But lora.py does not receive the use_dora flag in class Linear(nn.Linear, LoraLayer); I had to manually set use_dora=True in Linear's __init__.

@BenjaminBossan
Member

@azuryl The question you have is not related to the initial issue, could you please move it to our discussion page? Also, please clarify what you mean by:

But lora.py does not receive the use_dora flag in class Linear(nn.Linear, LoraLayer); I had to manually set use_dora=True in Linear's __init__.

There should not be any need to modify the PEFT code just to use DoRA.
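For reference, with a recent PEFT release (v0.9.0 or later) DoRA is enabled purely through the config, roughly like this (base_model and target_modules are placeholders):

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # illustrative
    task_type="CAUSAL_LM",
    use_dora=True,  # no changes to the PEFT source are needed
)
model = get_peft_model(base_model, config)  # `base_model` is the loaded transformer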

@azuryl
Author

azuryl commented Apr 18, 2024

@azuryl The question you have is not related to the initial issue, could you please move it to our discussion page? Also, please clarify what you mean by:

But lora.py does not receive the use_dora flag in class Linear(nn.Linear, LoraLayer); I had to manually set use_dora=True in Linear's __init__.

There should not be any need to modify the PEFT code just to use DoRA.

I mean I am modifying the v0.3.0 version, since it does not have DoRA.

@BenjaminBossan
Member

I mean I am modifying the v0.3.0 version, since it does not have DoRA.

I see. I cannot give you a complete recipe to backport DoRA to PEFT v0.3.0. If you paste the complete error message, I might be able to give some tips. Make sure to update the update_layer method too when you add new arguments.
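To make the "update update_layer too" point concrete, a rough illustration of threading a new flag through an update_layer-style method (illustrative only; the actual v0.3.0 signature may differ):

class LoraLayerSketch:
    """Illustrative only: every new LoraConfig field has to be accepted and stored
    by update_layer as well, not just by Linear.__init__."""

    def __init__(self):
        self.use_dora = {}

    def update_layer(self, adapter_name, r, lora_alpha, lora_dropout,
                     init_lora_weights, use_dora=False):
        # ... existing v0.3.0 logic creating lora_A/lora_B for `adapter_name` ...
        self.use_dora[adapter_name] = use_dora  # remember the flag per adapter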

@azuryl
Author

azuryl commented Apr 29, 2024

I mean I am modifying the v0.3.0 version, since it does not have DoRA.

I see. I cannot give you a complete recipe to backport DoRA to PEFT v0.3.0. If you paste the complete error message, I might be able to give some tips. Make sure to update the update_layer method too when you add new arguments.

In the v0.3.0 lora.py:

def _find_and_replace(self, adapter_name):
        lora_config = self.peft_config[adapter_name]
        loaded_in_8bit = getattr(self.model, "is_loaded_in_8bit", False)
        if loaded_in_8bit and not is_bnb_available():
            raise ImportError(
                "To use Lora with 8-bit quantization, please install the `bitsandbytes` package. "
                "You can install it with `pip install bitsandbytes`."
            )
        is_target_modules_in_base_model = False
        kwargs = {
            "r": lora_config.r,
            "lora_alpha": lora_config.lora_alpha,
            "lora_dropout": lora_config.lora_dropout,
            "fan_in_fan_out": lora_config.fan_in_fan_out,
            "init_lora_weights": lora_config.init_lora_weights,
+           "use_dora": lora_config.use_dora,
+           "use_rslora": lora_config.use_rslora,
        }

And to class LoraConfig(PeftConfig) I added the corresponding use_dora and use_rslora fields (the use_dora field definition is shown in my earlier comment; use_rslora is analogous).

It can be run now

@BenjaminBossan
Member

Okay, great, good luck with your further training. I'll close the issue then; feel free to re-open if something else comes up.
