
Training LoRA with DeepSpeed in half precision (fp16) raises a RuntimeError #2586

@zuitbjc1096

Description


DeepSpeed launch script:
deepspeed --include localhost:0,1 --master_port 22267 fastchat/train/train_lora.py \
    --model_name_or_path \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --data_path \
    --output_dir \
    --num_train_epochs 3 \
    --fp16 True \
    ... \
    --deepspeed /data/xixiaoyan/FastChat0907/FastChat-main/playground/deepspeed_config_s1.json \
    --gradient_checkpointing True \
    --flash_attn False

DeepSpeed config:

{
    "zero_optimization": {
        "stage": 1,
        "allgather_partitions": true,
        "allgather_bucket_size": 5e8,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 5e8,
        "contiguous_gradients": true,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        }
    },

    "contiguous_gradients": true,
    "overlap_comm": true,

    "fp16": {
        "enabled": true
    }
}
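Two things in the config are easy to miss: as originally pasted it ended with a trailing comma (`},` before the final `}`), which is not valid JSON, and `contiguous_gradients` / `overlap_comm` appear both inside `zero_optimization` and at the top level, where DeepSpeed likely does not read them. A minimal sketch of a sanity check for both, plus a check that `fp16.enabled` agrees with the `--fp16 True` flag, is below. `check_ds_config` and `ZERO_KEYS` are hypothetical names, not part of DeepSpeed or FastChat, and the key list only covers the ZeRO options used in this config.

```python
import json
from typing import List

# Options DeepSpeed documents under "zero_optimization" (the subset used in the
# config above). Illustrative only, not an exhaustive list.
ZERO_KEYS = {
    "stage", "allgather_partitions", "allgather_bucket_size", "overlap_comm",
    "reduce_scatter", "reduce_bucket_size", "contiguous_gradients",
    "offload_optimizer",
}


def check_ds_config(text: str, fp16_flag: bool) -> List[str]:
    """Hypothetical helper: list problems found in a DeepSpeed JSON config."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as err:
        # For example, a trailing comma before a closing brace.
        return [f"invalid JSON at line {err.lineno}: {err.msg}"]

    problems = []
    # ZeRO options written at the top level sit outside "zero_optimization".
    for key in sorted(ZERO_KEYS.intersection(cfg)):
        problems.append(f"top-level '{key}' belongs under 'zero_optimization'")

    # The --fp16 command-line flag and fp16.enabled should agree ("auto" is skipped).
    fp16_enabled = cfg.get("fp16", {}).get("enabled", False)
    if fp16_enabled != "auto" and bool(fp16_enabled) != fp16_flag:
        problems.append(f"fp16.enabled={fp16_enabled} but --fp16={fp16_flag}")
    return problems


if __name__ == "__main__":
    with_trailing_comma = '{"fp16": {"enabled": true},}'
    print(check_ds_config(with_trailing_comma, fp16_flag=True))

    abbreviated = """{
        "zero_optimization": {"stage": 1, "contiguous_gradients": true},
        "contiguous_gradients": true,
        "overlap_comm": true,
        "fp16": {"enabled": true}
    }"""
    print(check_ds_config(abbreviated, fp16_flag=True))
```

The check only parses the file with the standard library, so it can run before launching `deepspeed`; it does not validate the config against DeepSpeed itself.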

But I encountered the following error:
  File "/lib/python3.8/site-packages/peft/tuners/lora.py", line 1076, in forward
    self.lora_A[self.active_adapter]
  File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Half
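The traceback ends in `F.linear` inside the LoRA `lora_A` projection, which suggests that the activation and the adapter weight reach that layer with different dtypes (one float32, one float16). The sketch below reproduces that class of error in plain PyTorch on CPU, without PEFT or DeepSpeed, and shows one possible workaround: casting the input to the layer's weight dtype before the call. The names `lora_A`, `try_forward` and `forward_matching_dtype` are hypothetical stand-ins, and this is an assumption about the cause, not a fix confirmed in this issue.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a LoRA "A" projection whose weights are float32
# (nn.Linear's default) and an activation arriving in float16, as under fp16 training.
lora_A = nn.Linear(16, 8, bias=False)
x_half = torch.randn(2, 16).half()


def try_forward(layer: nn.Linear, x: torch.Tensor) -> str:
    """Run the layer and return 'ok' or the RuntimeError message."""
    try:
        layer(x)
        return "ok"
    except RuntimeError as err:
        return f"RuntimeError: {err}"


# Mismatched dtypes raise a RuntimeError; the exact wording depends on the
# PyTorch version and device.
mismatch = try_forward(lora_A, x_half)


def forward_matching_dtype(layer: nn.Linear, x: torch.Tensor) -> torch.Tensor:
    """Possible workaround (assumption): cast the input to the weight's dtype."""
    return layer(x.to(layer.weight.dtype))


out = forward_matching_dtype(lora_A, x_half)
print(mismatch)
print(out.dtype)  # torch.float32
```

Casting the input keeps the adapter in float32; the opposite choice (casting the LoRA weights to float16) or running the forward pass under `torch.autocast` on GPU are other ways to make the dtypes agree, with different precision trade-offs.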
