
Video DPO training error #8157

@zhanghang-official

Description


Reminder

  • I have read the above rules and searched the existing issues.

System Info

The training configuration is as follows:

### model
model_name_or_path: /raid/zhanghang02/weights/MiniCPM-V-2_6
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true

### method
stage: dpo
do_train: true
finetuning_type: lora
freeze_vision_tower: true
lora_rank: 8
lora_target: all
pref_beta: 0.1
pref_loss: sigmoid  # choices: [sigmoid (dpo), orpo, simpo]

### dataset
dataset: dpo_test_video
template: minicpm_v
cutoff_len: 256
max_samples: 100000
overwrite_cache: true
preprocessing_num_workers: 1
dataloader_num_workers: 1

### output
output_dir: saves/minicpmv/lora/dpo
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 5.0e-6
num_train_epochs: 300.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
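
Since the error below is raised while collating this dataset, the way dpo_test_video is registered matters. It is assumed here to be a ShareGPT-style preference dataset with a videos column in data/dataset_info.json, roughly like the sketch below (the file name and column names are illustrative, not taken from this issue):

```json
"dpo_test_video": {
  "file_name": "dpo_test_video.json",
  "ranking": true,
  "formatting": "sharegpt",
  "columns": {
    "messages": "conversations",
    "chosen": "chosen",
    "rejected": "rejected",
    "videos": "videos"
  }
}
```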

The error is as follows:
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:05<00:00, 1.31s/it]
[INFO|modeling_utils.py:4888] 2025-05-26 16:46:08,068 >> All model checkpoint weights were used when initializing MiniCPMV.

[INFO|modeling_utils.py:4896] 2025-05-26 16:46:08,069 >> All the weights of MiniCPMV were initialized from the model checkpoint at /raid/zhanghang02/weights/MiniCPM-V-2_6.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MiniCPMV for predictions without further training.
[INFO|configuration_utils.py:1093] 2025-05-26 16:46:08,156 >> loading configuration file /raid/zhanghang02/weights/MiniCPM-V-2_6/generation_config.json
[INFO|configuration_utils.py:1140] 2025-05-26 16:46:08,156 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645
}

[INFO|2025-05-26 16:46:08] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled.
[INFO|2025-05-26 16:46:08] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
[INFO|2025-05-26 16:46:08] llamafactory.model.adapter:143 >> Upcasting trainable params to float32.
[INFO|2025-05-26 16:46:08] llamafactory.model.adapter:143 >> Fine-tuning method: LoRA
[INFO|2025-05-26 16:46:08] llamafactory.model.model_utils.misc:143 >> Found linear modules: q_proj,v_proj,up_proj,k_proj,o_proj,down_proj,gate_proj
[INFO|2025-05-26 16:46:08] llamafactory.model.model_utils.visual:143 >> Set vision model not trainable: ['vpm'].
[INFO|2025-05-26 16:46:08] llamafactory.model.model_utils.visual:143 >> Set multi model projector not trainable: resampler.
[INFO|2025-05-26 16:46:08] llamafactory.model.loader:143 >> trainable params: 20,185,088 || all params: 8,119,360,240 || trainable%: 0.2486
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:741] 2025-05-26 16:46:09,007 >> Using auto half precision backend
[INFO|trainer.py:2369] 2025-05-26 16:46:09,246 >> ***** Running training *****
[INFO|trainer.py:2370] 2025-05-26 16:46:09,246 >> Num examples = 109
[INFO|trainer.py:2371] 2025-05-26 16:46:09,246 >> Num Epochs = 300
[INFO|trainer.py:2372] 2025-05-26 16:46:09,246 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2375] 2025-05-26 16:46:09,246 >> Total train batch size (w. parallel, distributed & accumulation) = 1
[INFO|trainer.py:2376] 2025-05-26 16:46:09,246 >> Gradient Accumulation steps = 1
[INFO|trainer.py:2377] 2025-05-26 16:46:09,246 >> Total optimization steps = 32,700
[INFO|trainer.py:2378] 2025-05-26 16:46:09,250 >> Number of trainable parameters = 20,185,088
0%| | 0/32700 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Traceback (most recent call last):
File "/home/zhanghang02/anaconda3/envs/test1/bin/llamafactory-cli", line 8, in
sys.exit(main())
File "/home/zhanghang02/factory/LLaMA-Factory/src/llamafactory/cli.py", line 115, in main
COMMAND_MAP[command]()
File "/home/zhanghang02/factory/LLaMA-Factory/src/llamafactory/train/tuner.py", line 110, in run_exp
_training_function(config={"args": args, "callbacks": callbacks})
File "/home/zhanghang02/factory/LLaMA-Factory/src/llamafactory/train/tuner.py", line 78, in _training_function
run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
File "/home/zhanghang02/factory/LLaMA-Factory/src/llamafactory/train/dpo/workflow.py", line 80, in run_dpo
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
File "/home/zhanghang02/anaconda3/envs/test1/lib/python3.10/site-packages/transformers/trainer.py", line 2171, in train
return inner_training_loop(
File "/home/zhanghang02/anaconda3/envs/test1/lib/python3.10/site-packages/transformers/trainer.py", line 2480, in _inner_training_loop
batch_samples, num_items_in_batch = self.get_batch_samples(epoch_iterator, num_batches)
File "/home/zhanghang02/factory/LLaMA-Factory/src/llamafactory/train/dpo/trainer.py", line 133, in get_batch_samples
return Trainer.get_batch_samples(self, *args, **kwargs)
File "/home/zhanghang02/anaconda3/envs/test1/lib/python3.10/site-packages/transformers/trainer.py", line 5153, in get_batch_samples
batch_samples += [next(epoch_iterator)]
File "/home/zhanghang02/anaconda3/envs/test1/lib/python3.10/site-packages/accelerate/data_loader.py", line 566, in iter
current_batch = next(dataloader_iter)
File "/home/zhanghang02/anaconda3/envs/test1/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 733, in next
data = self._next_data()
File "/home/zhanghang02/anaconda3/envs/test1/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1515, in _next_data
return self._process_data(data, worker_id)
File "/home/zhanghang02/anaconda3/envs/test1/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1550, in _process_data
data.reraise()
File "/home/zhanghang02/anaconda3/envs/test1/lib/python3.10/site-packages/torch/_utils.py", line 750, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/zhanghang02/anaconda3/envs/test1/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 349, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
File "/home/zhanghang02/anaconda3/envs/test1/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
return self.collate_fn(data)
File "/home/zhanghang02/factory/LLaMA-Factory/src/llamafactory/data/collator.py", line 264, in call
return super().call(concatenated_features)
File "/home/zhanghang02/factory/LLaMA-Factory/src/llamafactory/data/collator.py", line 157, in call
mm_inputs = self.template.mm_plugin.get_mm_inputs(
File "/home/zhanghang02/factory/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 1080, in get_mm_inputs
image_bounds = torch.hstack(
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 3 but got size 2 for tensor number 1 in the list.

0%| | 0/32700 [00:00<?, ?it/s]
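
For context, the failing call is the torch.hstack in the MiniCPM-V mm_plugin that pairs the positions of image/video placeholder start tokens with their end tokens to build image_bounds. The minimal sketch below reproduces the same failure mode; the index values are made up, and the idea that truncation (cutoff_len: 256 is very small for video inputs) drops some end markers is a guess, not something confirmed by the log:

```python
import torch

# Hypothetical token positions of image/slice start and end markers in one sample.
# If the sequence is cut between a start marker and its matching end marker, the
# two index tensors end up with different lengths.
starts = torch.tensor([[5], [40], [75]])  # three start markers found
ends = torch.tensor([[30], [65]])         # only two end markers found

# torch.hstack concatenates 2-D tensors along dim 1 and requires dim 0 to match,
# so this raises: RuntimeError: Sizes of tensors must match except in dimension 1.
# Expected size 3 but got size 2 for tensor number 1 in the list.
image_bounds = torch.hstack([starts, ends])
```

If that reading is correct, raising cutoff_len well above 256 (video frames expand into many placeholder tokens) would be the first thing to try.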

Reproduction

Put your message here.

Others

No response

Labels: bug (Something isn't working), pending (This problem is yet to be addressed)
