
Deepspeed Zero3 LoRA merge error #297

Closed · hendrydong opened this issue Apr 12, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@hendrydong

Hi,

I found that when I use DeepSpeed ZeRO-3, the LoRA merge does not work (the ZeRO-2 case works properly). Could you help me check this?

RuntimeError: The size of tensor a (0) must match the size of tensor b (2048) at non-singleton dimension 1

Full error:


Traceback (most recent call last):

    model.merge_lora_weights()
  File "hf_decoder_model.py", line 415, in merge_lora_weights
    self.get_backend_model().merge_and_unload()
  File "lib/python3.9/site-packages/peft/tuners/lora.py", line 275, in merge_and_unload
    target.weight.data += (
RuntimeError: The size of tensor a (0) must match the size of tensor b (2048) at non-singleton dimension 1

The config:

    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "bf16": {
        "enabled": "auto"
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },
"zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}
@bestpredicts

same error here, any update?

@ericzhou571

same error here, any update?

@ericzhou571

ericzhou571 commented May 11, 2023

Here is the deepspeed config:
{
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

I find this quite strange.

Before training with the Hugging Face Trainer and DeepSpeed ZeRO-3, the LoRA A and B weights are not empty tensors.
[screenshot: LoRA A and B weights before training, non-empty]
After training, whether or not we gather the parameters, all LoRA A and B weights are empty tensors.
[screenshot: LoRA A and B weights after training, empty]
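One way to check whether those tensors are really empty or just ZeRO-3 partitioned is to look at the DeepSpeed attributes on each parameter. This is only a minimal sketch, assuming `model` is the trained PEFT model and relying on DeepSpeed's ZeRO-3 parameter attributes (`ds_tensor`, `ds_shape`):

```python
# Under ZeRO-3, a partitioned parameter has numel() == 0 locally; the local
# shard lives in param.ds_tensor and the full shape is kept in param.ds_shape.
for name, param in model.named_parameters():
    if "lora_A" in name or "lora_B" in name:
        print(name, tuple(param.shape),
              "partitioned:", hasattr(param, "ds_tensor"),
              "full shape:", getattr(param, "ds_shape", None))
```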

@2018211801

same question

@linyubupa

same question

@younesbelkada younesbelkada added the bug Something isn't working label Jun 21, 2023
@RicardoLeeV587

Same problem.
ZeRO-3 is quite useful for me since I only have V100 graphics cards.

@1ytic

1ytic commented Jul 18, 2023

Let me clarify a few points about ZeRO-3 model initialisation with the transformers library.

The from_pretrained() method can handle the ZeRO-3 config if DeepSpeed was initialised beforehand. For example, you create the TrainingArguments(..., deepspeed="ds_config_zero3.json") object before you call from_pretrained(). In this case, the model will be created under the special deepspeed.zero.Init() context.
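A minimal sketch of that order (the model id and config path here are placeholders, not taken from this thread):

```python
from transformers import AutoModelForCausalLM, TrainingArguments

# Building TrainingArguments with a ZeRO-3 config first makes transformers aware
# of DeepSpeed, so the following from_pretrained() runs under deepspeed.zero.Init()
# and loads the weights already partitioned across ranks.
training_args = TrainingArguments(output_dir="out", deepspeed="ds_config_zero3.json")
model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
```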

This context manager partitions the model parameters and stores them in a new param.ds_tensor tensor. The original param.data tensor becomes empty after this initialisation. @ericzhou571 this could probably explain why you are seeing empty tensors.
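If you need the full tensors, you can gather them temporarily. A minimal sketch, assuming `model` holds the partitioned parameters and DeepSpeed is initialised:

```python
import deepspeed

lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
print([tuple(p.shape) for p in lora_params])  # (0,) placeholders while partitioned

# Inside this context the shards are all-gathered and param.data holds the full tensor.
with deepspeed.zero.GatheredParameters(lora_params, modifier_rank=None):
    print([tuple(p.shape) for p in lora_params])  # full LoRA A/B shapes
```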

Things get more complicated when we add LoRA parameters to this model, for example by wrapping the partitioned model with the get_peft_model() function. If the model has a classification head such as score in LlamaForSequenceClassification, that head is copied inside the ModulesToSaveWrapper class, which breaks the forward pass. @hendrydong I haven't used the LMFlow package, but it may be a similar case.
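A minimal sketch of that combination (model id and LoRA hyperparameters are illustrative only):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

# Assume this runs under the ZeRO-3 setup above, so the base model's parameters
# are already partitioned when get_peft_model() is called.
base_model = AutoModelForSequenceClassification.from_pretrained(
    "huggyllama/llama-7b", num_labels=2
)
lora_config = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16, lora_dropout=0.05)
# For SEQ_CLS tasks PEFT also wraps the classification head ("score" on Llama) in a
# ModulesToSaveWrapper, and that copy is made from the empty partitioned weight.
model = get_peft_model(base_model, lora_config)
```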

@RicardoLeeV587

@1ytic
Thanks for your clarification!

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@ANYMS-A

ANYMS-A commented Jun 21, 2024

@1ytic Thanks for your clarification!

Hello, I ran into the same situation as you did. Have you found a solution for combining LoRA and DeepSpeed ZeRO-3 to finetune a model?

@RicardoLeeV587

> @1ytic Thanks for your clarification!
>
> Hello, I ran into the same situation as you did. Have you found a solution for combining LoRA and DeepSpeed ZeRO-3 to finetune a model?

Hi Anyms:

This is a quite old issue, so I am not sure whether there is already a newer, more elegant method to tackle it.

A year ago, I tried first calling "model_wrapped._zero3_consolidated_16bit_state_dict()" and then "get_peft_model_state_dict" on the resulting state dict. This was based on DeepSpeed 0.10 and the PEFT "13e53fc" head.

Even though this method could successfully save a correct "adapter.bin", aggregating all the model slices was time-consuming, so I would not suggest following my solution.
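For reference, a rough sketch of that workaround (variable names are mine; it assumes `engine` is the DeepSpeed engine wrapping the PEFT model and the DeepSpeed 0.10 / old PEFT APIs mentioned above):

```python
import torch
from peft import get_peft_model_state_dict

# Gather the ZeRO-3 shards into one full fp16 state dict (only rank 0 receives it);
# this is the slow, memory-hungry step mentioned above.
full_state_dict = engine._zero3_consolidated_16bit_state_dict()

if full_state_dict is not None:  # rank 0 only
    # Keep only the LoRA adapter weights and save them as adapter.bin.
    adapter_state_dict = get_peft_model_state_dict(engine.module, state_dict=full_state_dict)
    torch.save(adapter_state_dict, "adapter.bin")
```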
