swift 2.6.0.dev0: DPO training fails with KeyError: 'prompt_input_ids' #2380

@Betty-J

Description

Command executed:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift rlhf \
    --rlhf_type dpo \
    --model_type internvl2-1b \
    --beta 0.1 \
    --rpo_alpha 0.1 \
    --sft_type lora \
    --dataset local_train \
    --num_train_epochs 20 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --learning_rate 5e-5 \
    --gradient_accumulation_steps 16 \
    --warmup_ratio 0.03 \
    --save_total_limit 2
The error is as follows:
[Screenshot 2024-11-05 13 57 54]

Versions:
ms-swift=2.6.0.dev0
transformers=4.46.1

The data follows this format: {"system": "123", "query": "11111", "response": "22222", "rejected_response": "33333", "history": [["query1", "response1"], ["query2", "response2"]]}
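To rule out malformed records in the local dataset, each line can be checked against the field names in the example record. This is only a rough sketch: the helper `check_record` is hypothetical (not part of ms-swift), and treating `query`, `response`, and `rejected_response` as required while `system` and `history` are optional is an assumption. It is not confirmed to be related to the KeyError.

```python
import json

# Field names come from the example record in this issue. Which of them
# are strictly required is an assumption made for this sketch.
REQUIRED_KEYS = ("query", "response", "rejected_response")


def check_record(line: str) -> list:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []
    for key in REQUIRED_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty field: {key}")

    # "history" is treated as optional; when present it should be a list
    # of [query, response] pairs, as in the example record.
    history = record.get("history", [])
    if not isinstance(history, list) or any(
        not isinstance(turn, list) or len(turn) != 2 for turn in history
    ):
        problems.append("history must be a list of [query, response] pairs")
    return problems
```

Running `check_record` over every line of the dataset file and printing the non-empty results would show whether any record deviates from the documented shape before training starts.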
