PPO training: model setup error #2267

@weiliang987644015

Description

@weiliang987644015

Describe the bug
Problem 1: When training the Atom-7B-chat model with PPO, setting `--lora_target_modules ALL \` raises an error. Specifying the module names explicitly does not: `--lora_target_modules o_proj,up_proj,down_proj,v_proj,k_proj,gate_proj,q_proj \`

[screenshots "baocuo", "baocuo2": error tracebacks]
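For context on why the explicit module list works around the failure: a special value like `ALL` is typically expanded into concrete linear-layer names before the LoRA config is built, and a bug in that expansion step would not affect explicitly named modules. The sketch below is a hedged illustration of how such an expansion commonly works, not ms-swift's actual implementation (`find_all_linear_names` is a hypothetical helper):

```python
import torch.nn as nn

def find_all_linear_names(model: nn.Module) -> list[str]:
    """Collect the leaf names of every nn.Linear submodule, the way a
    LoRA setup might resolve a target-modules value of ALL."""
    names = set()
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            # Keep only the last path component, e.g. "layers.0.q_proj" -> "q_proj"
            names.add(name.split(".")[-1])
    return sorted(names)

class ToyBlock(nn.Module):
    """Minimal stand-in for one transformer block of the model."""
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8)
        self.v_proj = nn.Linear(8, 8)
        self.norm = nn.LayerNorm(8)  # non-linear layer, should be skipped

print(find_all_linear_names(ToyBlock()))  # → ['q_proj', 'v_proj']
```

If the error only appears with `ALL`, it is likely raised during this kind of name resolution (for example, on a module type the expansion code does not expect), which is consistent with the explicit list succeeding.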

Problem 2: When training the Atom-7B-chat model with PPO using DDP + MP, training fails with: `AttributeError: 'DistributedDataParallel' object has no attribute 'policy'`

[screenshots "baocuo3", "baocuo4": error tracebacks]
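The traceback above is a common DDP pitfall: `DistributedDataParallel` only forwards `forward()` and registered parameters, so custom attributes such as `policy` on the original model must be reached through the wrapper's `.module`. The sketch below reproduces the lookup failure with a plain stand-in wrapper (labeled as such, since real DDP needs an initialized process group); the `PolicyValueModel` class is hypothetical, standing in for the PPO model that trl-style trainers access via `model.policy`:

```python
import torch.nn as nn

class PolicyValueModel(nn.Module):
    """Hypothetical PPO model exposing a custom `policy` attribute."""
    def __init__(self):
        super().__init__()
        self.policy = nn.Linear(4, 2)  # the attribute the trainer expects

    def forward(self, x):
        return self.policy(x)

class DDPStandIn(nn.Module):
    """Stand-in for DistributedDataParallel: like DDP, it stores the
    wrapped model under `.module` and does not forward custom attributes."""
    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module

model = PolicyValueModel()
wrapped = DDPStandIn(model)

# Direct access fails, matching the reported AttributeError:
assert not hasattr(wrapped, "policy")
# Unwrapping through .module restores access:
assert hasattr(wrapped.module, "policy")
```

In trainer code this usually means the PPO training loop accesses `self.model.policy` without first unwrapping the DDP wrapper (e.g. via `model.module` or an `unwrap_model` helper), which would explain why the error appears only under DDP.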

Your hardware and system info
ms-swift==2.6.0dev0
pytorch==2.4.0+cu121
python==3.11.10
CUDA==12.1
trl==0.11.4
transformers==4.45.2
GPU:H800-80G
