
Deepspeed Zero3 LoRA merge error #297

Closed · hendrydong opened this issue Apr 12, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@hendrydong

Hi,

I found that when I use DeepSpeed ZeRO-3, the LoRA merge does not work (the ZeRO-2 case works properly). Could you help me check this?

RuntimeError: The size of tensor a (0) must match the size of tensor b (2048) at non-singleton dimension 1

Full error:


Traceback (most recent call last):

    model.merge_lora_weights()
  File "hf_decoder_model.py", line 415, in merge_lora_weights
    self.get_backend_model().merge_and_unload()
  File "lib/python3.9/site-packages/peft/tuners/lora.py", line 275, in merge_and_unload
    target.weight.data += (
RuntimeError: The size of tensor a (0) must match the size of tensor b (2048) at non-singleton dimension 1

The config:

    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "bf16": {
        "enabled": "auto"
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },
"zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}
@bestpredicts

same error here, any update?

@ericzhou571

same error here, any update?

@ericzhou571

ericzhou571 commented May 11, 2023

Here is the deepspeed config:
{
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

I find this quite strange.

Before training with the Hugging Face Trainer and DeepSpeed ZeRO-3, the LoRA A and B weights are not empty tensors.
[screenshot: LoRA A and B weights before training, non-empty]
After training, whether or not we gather the parameters, all LoRA A and B weights are empty tensors.
[screenshot: LoRA A and B weights after training, empty]
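One way to check whether those tensors are really empty or just ZeRO-3 partitioned is to look at the DeepSpeed attributes on each parameter. This is only a minimal sketch, assuming `model` is the trained PEFT model and relying on DeepSpeed's ZeRO-3 parameter attributes (`ds_tensor`, `ds_shape`):

```python
# Under ZeRO-3, a partitioned parameter has numel() == 0 locally; the local
# shard lives in param.ds_tensor and the full shape is kept in param.ds_shape.
for name, param in model.named_parameters():
    if "lora_A" in name or "lora_B" in name:
        print(name, tuple(param.shape),
              "partitioned:", hasattr(param, "ds_tensor"),
              "full shape:", getattr(param, "ds_shape", None))
```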

@2018211801

same question

@linyubupa

same question

@younesbelkada younesbelkada added the bug Something isn't working label Jun 21, 2023
@RicardoLeeV587

Same problem.
ZeRO-3 is quite useful for me since I only have V100 graphics cards.

@1ytic

1ytic commented Jul 18, 2023

Let me clarify a few points about ZeRO-3 model initialisation with the transformers library.

The from_pretrained() method can handle the ZeRO-3 config if DeepSpeed was initialised beforehand. For example, you create the TrainingArguments(..., deepspeed="ds_config_zero3.json") object before you call from_pretrained(). In this case, the model will be created under the special deepspeed.zero.Init() context.
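A minimal sketch of that order (the model id and config path here are placeholders, not taken from this thread):

```python
from transformers import AutoModelForCausalLM, TrainingArguments

# Building TrainingArguments with a ZeRO-3 config first makes transformers aware
# of DeepSpeed, so the following from_pretrained() runs under deepspeed.zero.Init()
# and loads the weights already partitioned across ranks.
training_args = TrainingArguments(output_dir="out", deepspeed="ds_config_zero3.json")
model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
```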

This context manager partitions the model parameters and stores them in a new param.ds_tensor tensor. The original param.data tensor becomes empty after this initialisation. @ericzhou571 this could probably explain why you are seeing empty tensors.
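If you need the full tensors, you can gather them temporarily. A minimal sketch, assuming `model` holds the partitioned parameters and DeepSpeed is initialised:

```python
import deepspeed

lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
print([tuple(p.shape) for p in lora_params])  # (0,) placeholders while partitioned

# Inside this context the shards are all-gathered and param.data holds the full tensor.
with deepspeed.zero.GatheredParameters(lora_params, modifier_rank=None):
    print([tuple(p.shape) for p in lora_params])  # full LoRA A/B shapes
```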

Things get more complicated when we add LoRA parameters to this model, for example by wrapping the partitioned model with the get_peft_model() function. If the model has a classification head such as score in LlamaForSequenceClassification, that head is copied inside the ModulesToSaveWrapper class, which breaks the forward pass. @hendrydong I haven't used the LMFlow package, but it may be a similar case.
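A minimal sketch of that combination (model id and LoRA hyperparameters are illustrative only):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

# Assume this runs under the ZeRO-3 setup above, so the base model's parameters
# are already partitioned when get_peft_model() is called.
base_model = AutoModelForSequenceClassification.from_pretrained(
    "huggyllama/llama-7b", num_labels=2
)
lora_config = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16, lora_dropout=0.05)
# For SEQ_CLS tasks PEFT also wraps the classification head ("score" on Llama) in a
# ModulesToSaveWrapper, and that copy is made from the empty partitioned weight.
model = get_peft_model(base_model, lora_config)
```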

@RicardoLeeV587

@1ytic
Thanks for your clarification!

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@ANYMS-A

ANYMS-A commented Jun 21, 2024

@1ytic Thanks for your clarification!

Hello, I ran into the same situation as you did. Have you found a solution for combining LoRA and DeepSpeed ZeRO-3 to finetune a model?

@RicardoLeeV587

> @1ytic Thanks for your clarification!
>
> Hello, I ran into the same situation as you did. Have you found a solution for combining LoRA and DeepSpeed ZeRO-3 to finetune a model?

Hi Anyms:

This is a quite old issue, so I am not sure whether there is already a newer, more elegant method to tackle it.

A year ago, I tried first calling "model_wrapped._zero3_consolidated_16bit_state_dict()" and then "get_peft_model_state_dict" on the resulting state dict. This was based on DeepSpeed 0.10 and the PEFT "13e53fc" head.

Even though this method could successfully save a correct "adapter.bin", aggregating all the model slices was time-consuming, so I would not suggest following my solution.
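For reference, a rough sketch of that workaround (variable names are mine; it assumes `engine` is the DeepSpeed engine wrapping the PEFT model and the DeepSpeed 0.10 / old PEFT APIs mentioned above):

```python
import torch
from peft import get_peft_model_state_dict

# Gather the ZeRO-3 shards into one full fp16 state dict (only rank 0 receives it);
# this is the slow, memory-hungry step mentioned above.
full_state_dict = engine._zero3_consolidated_16bit_state_dict()

if full_state_dict is not None:  # rank 0 only
    # Keep only the LoRA adapter weights and save them as adapter.bin.
    adapter_state_dict = get_peft_model_state_dict(engine.module, state_dict=full_state_dict)
    torch.save(adapter_state_dict, "adapter.bin")
```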
