Train LoRA with DeepSpeed using fp16 (Half), but encounter RuntimeError #2586

DeepSpeed launch script:
deepspeed --include localhost:0,1   --master_port 22267 fastchat/train/train_lora.py \
    --model_name_or_path  \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --data_path  \
    --output_dir  \
    --num_train_epochs 3 \
    --fp16 True \
    ...
    --deepspeed /data/xixiaoyan/FastChat0907/FastChat-main/playground/deepspeed_config_s1.json \
    --gradient_checkpointing True \
    --flash_attn False

DeepSpeed config:
{
  "zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    }
  },
  "contiguous_gradients": true,
  "overlap_comm": true,
  "fp16": {
    "enabled": true
  }
}
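
A hand-edited DeepSpeed config can be sanity-checked before launching. The sketch below is only an illustration using Python's standard `json` module: the helper name `check_ds_config` and the summary keys it returns are hypothetical and not part of DeepSpeed or FastChat. It rejects malformed JSON (for example a trailing comma) and duplicate keys, then reports the precision-related settings.

```python
import json


def check_ds_config(text):
    """Parse a DeepSpeed-style JSON config and summarize precision settings.

    Hypothetical helper for illustration; it is not a DeepSpeed API.
    """

    def reject_duplicates(pairs):
        # json.loads silently keeps the last value for a repeated key,
        # so detect repeats explicitly within each JSON object.
        keys = [key for key, _ in pairs]
        dupes = sorted({key for key in keys if keys.count(key) > 1})
        if dupes:
            raise ValueError(f"duplicate keys: {dupes}")
        return dict(pairs)

    # Raises json.JSONDecodeError (a ValueError) on malformed JSON.
    config = json.loads(text, object_pairs_hook=reject_duplicates)
    zero = config.get("zero_optimization", {})
    return {
        "zero_stage": zero.get("stage"),
        "fp16_enabled": config.get("fp16", {}).get("enabled", False),
        "offload_device": zero.get("offload_optimizer", {}).get("device"),
    }


sample = """
{
  "zero_optimization": {
    "stage": 1,
    "offload_optimizer": {"device": "cpu", "pin_memory": true}
  },
  "fp16": {"enabled": true}
}
"""

print(check_ds_config(sample))
```

Running the same helper on the real config file (read with `open(path).read()`) would show whether the file parses cleanly and whether fp16 is actually enabled in it.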

But I encounter the following error:
  File "/lib/python3.8/site-packages/peft/tuners/lora.py", line 1076, in forward
    self.lora_A[self.active_adapter](self.lora_dropout[self.active_adapter](x))
  File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
**RuntimeError: expected scalar type Float but found Half**
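
For reference, this class of error can be reproduced outside FastChat. The sketch below is a minimal illustration, assuming only that PyTorch is installed; the helper name `linear_dtype_mismatch` is hypothetical. It calls `F.linear` with an input and a weight of different dtypes, which is the same kind of call that fails in the traceback above. The exact error message depends on the PyTorch version and device, so it may not match the text above word for word.

```python
import torch
import torch.nn.functional as F


def linear_dtype_mismatch(input_dtype, weight_dtype):
    """Return the error message raised by F.linear, or None if it succeeds.

    Hypothetical helper for illustration only.
    """
    x = torch.ones(2, 4, dtype=input_dtype)
    weight = torch.ones(3, 4, dtype=weight_dtype)
    try:
        F.linear(x, weight)
    except RuntimeError as exc:
        return str(exc)
    return None


# Half input against a float32 weight: the dtypes disagree, so F.linear raises.
print(linear_dtype_mismatch(torch.float16, torch.float32))

# Matching dtypes: the call succeeds and the helper returns None.
print(linear_dtype_mismatch(torch.float32, torch.float32))
```

In the training run, the equivalent check would be to compare `x.dtype` with the dtype of the LoRA weight (`self.lora_A[self.active_adapter].weight.dtype` in the traceback) just before the failing call.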



