
qwen1.5-72B-Chat DDP+MP fine-tuning raises "input module parameters locate in {'cuda', 'meta'}" #634

Description

@kratorado

Describe the bug

Traceback (most recent call last):
  File "/opt/swift/lib/python3.10/site-packages/swift/cli/sft.py", line 5, in <module>
    sft_main()
  File "/opt/swift/lib/python3.10/site-packages/swift/utils/run_utils.py", line 31, in x_main
    result = llm_x(args, **kwargs)
  File "/opt/swift/lib/python3.10/site-packages/swift/llm/sft.py", line 236, in llm_sft
    trainer.train(training_args.resume_from_checkpoint)
  File "/opt/swift/lib/python3.10/site-packages/swift/trainers/trainers.py", line 50, in train
    res = super().train(*args, **kwargs)
  File "/opt/swift/lib/python3.10/site-packages/transformers/trainer.py", line 1624, in train
    return inner_training_loop(
  File "/opt/swift/lib/python3.10/site-packages/transformers/trainer.py", line 1776, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/opt/swift/lib/python3.10/site-packages/accelerate/accelerator.py", line 1228, in prepare
    result = tuple(
  File "/opt/swift/lib/python3.10/site-packages/accelerate/accelerator.py", line 1229, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/opt/swift/lib/python3.10/site-packages/accelerate/accelerator.py", line 1105, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/opt/swift/lib/python3.10/site-packages/accelerate/accelerator.py", line 1356, in prepare_model
    model = torch.nn.parallel.DistributedDataParallel(
  File "/opt/swift/lib/python3.10/site-packages/swift/llm/utils/utils.py", line 857, in <lambda>
    _old_ddp_init(self, model, *args, **kwargs))
  File "/opt/swift/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 697, in __init__
    self._log_and_throw(
  File "/opt/swift/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1037, in _log_and_throw
    raise err_type(err_msg)
ValueError: DistributedDataParallel's input module must be on the same type of devices, but input module parameters locate in {'cuda', 'meta'}.
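For reference, the check that raises this error sits in DistributedDataParallel.__init__ (the distributed.py frame in the traceback): it collects the device types of all module parameters and refuses to wrap a module whose parameters span more than one device type. Below is a minimal single-process sketch of that check (my own illustration, not ms-swift code), which triggers the same ValueError using the gloo backend and a cpu/meta mix instead of cuda/meta:

import os
import torch
import torch.distributed as dist
import torch.nn as nn

# Single-process process group so that DDP can be constructed at all.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(4, 4)
# Simulate a weight that was never materialized from the meta device, as can
# happen when part of a device_map-loaded model is not moved to a real device.
model.weight = nn.Parameter(torch.empty(4, 4, device="meta"))

# Raises: ValueError: DistributedDataParallel's input module must be on the
# same type of devices, but input module parameters locate in {'cpu', 'meta'}.
ddp = nn.parallel.DistributedDataParallel(model)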

Your hardware and system info
V100-32G * 8
ms-swift==1.7.3

Additional context
Command used:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=2 \
swift sft \
    --model_type qwen1half-72b-chat \
    --sft_type lora \
    --dtype AUTO \
    --output_dir output \
    --dataset ms-bench-mini \
    --train_dataset_sample 1000 \
    --num_train_epochs 3 \
    --max_length 4096 \
    --check_dataset_strategy warning \
    --lora_target_modules ALL \
    --self_cognition_sample 500 \
    --model_name 小黄 'Xiao Huang' \
    --model_author 魔搭 ModelScope

With NPROC_PER_NODE=1 the command runs fine.
The same command also works when switched to the 14b model.
So is this related to GPU memory?
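A quick way to confirm whether some weights were left on the meta device (i.e. never materialized) is to count parameter device types on the loaded model right before the DDP wrap. This is only an illustrative sketch; parameter_device_types and the trainer.model usage are hypothetical, not ms-swift APIs:

import torch

def parameter_device_types(model: torch.nn.Module) -> dict:
    """Count parameters per device type, e.g. {'cuda': 480, 'meta': 3}."""
    counts: dict = {}
    for p in model.parameters():
        counts[p.device.type] = counts.get(p.device.type, 0) + 1
    return counts

# Hypothetical usage: inspect the model before accelerate wraps it in DDP.
# print(parameter_device_types(trainer.model))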
