peft v0.10 takes up much more GPU memory than v0.3.0 #1613
Comments
For us to investigate, you have to give us more information. Ideally, you can share the script that produces the error. If that can't be done, share as many details as you can (model, data, hyper params, lib versions, how much memory the old version uses, etc.)
https://github.com/horseee/LLM-Pruner?tab=readme-ov-file#1-pruning-discovery-stage--estimation-stage
If you use v0.3.0 there is no problem, but peft v0.10 runs out of memory.
Thanks for the link. I took a look at the LLM-Pruner repository. From what I can tell, this package does not use PEFT. Instead, it vendors its own PEFT code, which looks like a copy of our code base from about a year ago plus some of their modifications. Checking their requirements, PEFT is not mentioned there. This means that the PEFT version should not have any influence on LLM-Pruner. In fact, you should be able to uninstall PEFT from your environment and it should still run. Why you get different memory usage when you upgrade PEFT, I don't know, but my best guess is that by upgrading PEFT, you also upgraded other PEFT dependencies, like transformers, and those are the issue. Also note that the LLM-Pruner code hasn't been updated for more than half a year, so it is quite outdated by now. I would recommend using package versions from back then when you want to run LLM-Pruner.
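A quick way to check which related packages changed alongside a PEFT upgrade, using only the Python standard library (the package list below is just the usual suspects, not an exhaustive dependency set):

```python
from importlib.metadata import PackageNotFoundError, version

# Print the installed version of each package that commonly moves
# together with a PEFT upgrade.
for pkg in ("peft", "transformers", "accelerate", "torch", "bitsandbytes"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```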
I have replaced LLM-Pruner's vendored PEFT with v0.10 and did not update transformers. My guess is that v0.3.0 supports just LoRA, while v0.10 supports other modules such as LoHa, IA3, etc. v0.10 has a base layer (https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/layer.py#L39), which v0.3.0 does not.
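For context, a rough sketch (not the actual PEFT source) of the structural difference the comment refers to: in v0.3.0 the LoRA layer subclasses `nn.Linear` and owns the weight itself, while in v0.10 it wraps the existing module as `base_layer`:

```python
import torch.nn as nn

# v0.3.0 style: the LoRA layer *is* an nn.Linear and holds the weight.
class LinearV03(nn.Linear):
    def __init__(self, in_features: int, out_features: int, r: int = 8, **kwargs):
        super().__init__(in_features, out_features, **kwargs)
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)

# v0.10 style: the LoRA layer *wraps* the original module as base_layer.
class LinearV010(nn.Module):
    def __init__(self, base_layer: nn.Linear, r: int = 8):
        super().__init__()
        self.base_layer = base_layer  # original module, kept as-is
        self.lora_A = nn.Linear(base_layer.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base_layer.out_features, bias=False)
```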
You mean you have replaced their vendored PEFT code with PEFT v0.10.0? I wouldn't expect that to work, because the rest of their code probably makes assumptions about PEFT that don't hold up anymore. Maybe you can create a PR on their repo for them to use a more up-to-date PEFT version, but that probably requires more adjustments on their side and I'm not sure if the repo is still being actively developed. As it stands, I would not expect code from > 6 months ago with their own custom PEFT code to work with the current PEFT version out of the box.
Yes, I have replaced their PEFT with v0.10.0 without modifications, and it runs successfully on a 48 GB A6000 GPU but goes out of memory on a 24 GB GPU. They just use PEFT for LoRA and do not modify PEFT. Their PEFT is v0.3.0. You can try whether the original LLaMA-7B can run on a 24 GB GPU with v0.10; this is unrelated to the pruning.
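One way to put numbers on the difference is to record peak allocated GPU memory around a single training step and compare the figure across the two PEFT versions; a minimal sketch:

```python
import torch

# Reset the peak-memory counter, run one step, then read the peak.
torch.cuda.reset_peak_memory_stats()
# ... run one forward/backward step of the LoRA fine-tuning here ...
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated: {peak_gib:.2f} GiB")
```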
Hello @BenjaminBossan, I want to port the DoRA module to v0.3.0. Thank you.
Try
And I have a question: how can I set `use_dora` in lora.py? I added

```python
use_dora: bool = field(
    default=False,
    metadata={
        "help": (
            "Enable 'Weight-Decomposed Low-Rank Adaptation' (DoRA). This technique decomposes the updates of the "
            "weights into two parts, magnitude and direction. Direction is handled by normal LoRA, whereas the "
            "magnitude is handled by a separate learnable parameter. This can improve the performance of LoRA, "
            "especially at low ranks. Right now, DoRA only supports linear and Conv2D layers. DoRA introduces a bigger "
            "overhead than pure LoRA, so it is recommended to merge weights for inference. For more information, "
            "see https://arxiv.org/abs/2402.09353."
        )
    },
)  # jliu

class Linear(nn.Linear, LoraLayer):
    # Lora implemented in a dense layer
    def __init__(
        self,
        adapter_name: str,
        in_features: int,
        out_features: int,
        r: int = 0,
        lora_alpha: int = 1,
        lora_dropout: float = 0.0,
        fan_in_fan_out: bool = False,  # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
        use_rslora: bool = False,
        use_dora: bool = False,
        **kwargs,
    ):
```

In my tuning.py:

```python
config = LoraConfig(
    r=args.lora_r,
    lora_alpha=args.lora_alpha,
    target_modules=args.lora_target_modules.split(","),
    lora_dropout=args.lora_dropout,
    bias="none",
    task_type="CAUSAL_LM",
    use_dora=True,
    use_rslora=args.use_rslora,
)
```

But lora.py does not receive the `use_dora` flag at `class Linear(nn.Linear, LoraLayer)`; I have to manually set `use_dora=True` in the `Linear` init.
@azuryl The question you have is not related to the initial issue, could you please move it to our discussion page? Also, please clarify what you mean by:
There should not be any need to modify the PEFT code just to use DoRA.
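For reference, with a current PEFT release (v0.9.0 or later) DoRA is enabled purely through the config; a minimal sketch, where `bigscience/bloom-3b` and the hyperparameters are only example choices:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b")
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],  # Bloom's fused attention projection
    task_type="CAUSAL_LM",
    use_dora=True,  # no library changes needed in recent PEFT versions
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```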
I mean I am modifying the v0.3.0 version, since it does not have DoRA.
I see. I cannot give you a complete recipe to backport DoRA to PEFT v0.3.0. If you paste the complete error message, I might be able to give some tips. Make sure to update the
In the v0.3.0 lora.py:

```diff
 def _find_and_replace(self, adapter_name):
     lora_config = self.peft_config[adapter_name]
     loaded_in_8bit = getattr(self.model, "is_loaded_in_8bit", False)
     if loaded_in_8bit and not is_bnb_available():
         raise ImportError(
             "To use Lora with 8-bit quantization, please install the `bitsandbytes` package. "
             "You can install it with `pip install bitsandbytes`."
         )
     is_target_modules_in_base_model = False
     kwargs = {
         "r": lora_config.r,
         "lora_alpha": lora_config.lora_alpha,
         "lora_dropout": lora_config.lora_dropout,
         "fan_in_fan_out": lora_config.fan_in_fan_out,
         "init_lora_weights": lora_config.init_lora_weights,
+        "use_dora": lora_config.use_dora,
+        "use_rslora": lora_config.use_rslora,
     }
```

and the matching `use_dora` / `use_rslora` fields added to `class LoraConfig(PeftConfig)` (as in my earlier comment). It can run now.
Okay, great, good luck with your further training. I'll close the issue then; feel free to re-open if something else comes up.
System Info
peft v0.10
Who can help?
@pacman100 @younesbelkada @BenjaminBossan @sayakpaul
Information

Tasks
An officially supported task in the examples folder

Reproduction
On a 24 GB GPU, even Bloom-3B with LoRA goes out of GPU memory with peft v0.10 (https://github.com/huggingface/peft/blob/v0.10.0/src/peft/tuners/lora/layer.py), but v0.3.0 (https://github.com/huggingface/peft/blob/v0.3.0/src/peft/tuners/lora.py) has no problem.
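A standalone script along these lines would make the report reproducible; this is a sketch under assumed settings (fp16, r=8, sequence length 512, Bloom's `query_key_value` target), to be run once per PEFT version while recording peak memory:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-3b", torch_dtype=torch.float16
).to("cuda")
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)

# One forward/backward step on dummy tokens, then report peak memory.
torch.cuda.reset_peak_memory_stats()
input_ids = torch.randint(0, model.config.vocab_size, (1, 512), device="cuda")
out = model(input_ids=input_ids, labels=input_ids)
out.loss.backward()
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```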
Expected behavior
The updated version should not perform worse than the old version.
peft v0.10 should not run out of GPU memory on a 24 GB GPU (such as an NVIDIA TITAN RTX 24 GB) for Bloom-3B, LLaMA-3B, etc.