PPO training: model setup error #2267

@weiliang987644015

Description

@weiliang987644015

Describe the bug
Problem 1: When training the Atom-7B-chat model with PPO, setting `--lora_target_modules ALL \` raises an error. Specifying the module names explicitly does not: `--lora_target_modules o_proj,up_proj,down_proj,v_proj,k_proj,gate_proj,q_proj \`

[screenshots "baocuo", "baocuo2": error tracebacks]
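For context on why the explicit module list works around the failure: a special value like `ALL` is typically expanded into concrete linear-layer names before the LoRA config is built, and a bug in that expansion step would not affect explicitly named modules. The sketch below is a hedged illustration of how such an expansion commonly works, not ms-swift's actual implementation (`find_all_linear_names` is a hypothetical helper):

```python
import torch.nn as nn

def find_all_linear_names(model: nn.Module) -> list[str]:
    """Collect the leaf names of every nn.Linear submodule, the way a
    LoRA setup might resolve a target-modules value of ALL."""
    names = set()
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            # Keep only the last path component, e.g. "layers.0.q_proj" -> "q_proj"
            names.add(name.split(".")[-1])
    return sorted(names)

class ToyBlock(nn.Module):
    """Minimal stand-in for one transformer block of the model."""
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8)
        self.v_proj = nn.Linear(8, 8)
        self.norm = nn.LayerNorm(8)  # non-linear layer, should be skipped

print(find_all_linear_names(ToyBlock()))  # → ['q_proj', 'v_proj']
```

If the error only appears with `ALL`, it is likely raised during this kind of name resolution (for example, on a module type the expansion code does not expect), which is consistent with the explicit list succeeding.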

Problem 2: When training the Atom-7B-chat model with PPO using DDP + MP, training fails with: `AttributeError: 'DistributedDataParallel' object has no attribute 'policy'`

[screenshots "baocuo3", "baocuo4": error tracebacks]
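The traceback above is a common DDP pitfall: `DistributedDataParallel` only forwards `forward()` and registered parameters, so custom attributes such as `policy` on the original model must be reached through the wrapper's `.module`. The sketch below reproduces the lookup failure with a plain stand-in wrapper (labeled as such, since real DDP needs an initialized process group); the `PolicyValueModel` class is hypothetical, standing in for the PPO model that trl-style trainers access via `model.policy`:

```python
import torch.nn as nn

class PolicyValueModel(nn.Module):
    """Hypothetical PPO model exposing a custom `policy` attribute."""
    def __init__(self):
        super().__init__()
        self.policy = nn.Linear(4, 2)  # the attribute the trainer expects

    def forward(self, x):
        return self.policy(x)

class DDPStandIn(nn.Module):
    """Stand-in for DistributedDataParallel: like DDP, it stores the
    wrapped model under `.module` and does not forward custom attributes."""
    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module

model = PolicyValueModel()
wrapped = DDPStandIn(model)

# Direct access fails, matching the reported AttributeError:
assert not hasattr(wrapped, "policy")
# Unwrapping through .module restores access:
assert hasattr(wrapped.module, "policy")
```

In trainer code this usually means the PPO training loop accesses `self.model.policy` without first unwrapping the DDP wrapper (e.g. via `model.module` or an `unwrap_model` helper), which would explain why the error appears only under DDP.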

Your hardware and system info
ms-swift==2.6.0dev0
pytorch==2.4.0+cu121
python==3.11.10
CUDA==12.1
trl==0.11.4
transformers==4.45.2
GPU:H800-80G
