
Qwen3-VL-8B SFT training: the official fine-tuning script qwen_vl_finetune runs without OOM, but switching to swift fine-tuning hits OOM; both are configured with deepspeed=zero0 #6484

@Furture-day


Setup: 8 GPUs, 140G of GPU memory
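
For context on why 140G can be tight: with ZeRO stage 0 nothing is partitioned, so each GPU holds a full copy of the weights, gradients, and optimizer states. The following is a rough back-of-the-envelope sketch of that static footprint. It assumes a nominal 8e9 parameters, an Adam-style optimizer, and the commonly cited 16 bytes per parameter for mixed-precision training. The real figure depends on the exact parameter count, on the frozen vision tower, and on how each framework stores gradients and master weights, so treat the result as an order of magnitude only.

```python
# Rough per-GPU model-state estimate for full fine-tuning without ZeRO partitioning.
# ASSUMPTIONS (not from the issue): nominal parameter count, Adam optimizer,
# and the usual mixed-precision byte layout. Activations are NOT included.

NUM_PARAMS = 8e9  # assumption: nominal "8B"; the real count differs somewhat

BYTES_PER_PARAM = {
    "bf16_weights": 2,
    "bf16_grads": 2,
    "fp32_master_weights": 4,
    "adam_momentum": 4,
    "adam_variance": 4,
}


def model_state_gb(num_params: float, bytes_per_param: dict) -> float:
    """Return the static model-state size in GB (1 GB = 1e9 bytes)."""
    return num_params * sum(bytes_per_param.values()) / 1e9


static_gb = model_state_gb(NUM_PARAMS, BYTES_PER_PARAM)
print(f"static model states per GPU: ~{static_gb:.0f} GB (before activations)")
```

Under these assumptions the static states alone take most of a 140G card, so even modest differences in activation memory between the two frameworks could decide whether a run fits.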
transformers (official script) configuration:
torchrun --nproc_per_node 8 --rdzv_conf="timeout=7200" --nnodes $VC_WORKER_NUM --node_rank $RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT qwen-vl-finetune/qwenvl/train/train_qwen.py \
  --deepspeed=./qwen-vl-finetune/scripts/zero0.json \
  --model_name_or_path=./data/Qwen3-VL-8B-Instruct \
  --dataset_use=qwen-vl-finetune/qwenvl/data/data_config/sft_test.json \
  --data_flatten=True \
  --tune_mm_vision=False \
  --tune_mm_mlp=True \
  --tune_mm_llm=True \
  --bf16=True \
  --output_dir=$OUTPUT_URL/qwen3vl_sft \
  --num_train_epochs=1 \
  --per_device_train_batch_size=2 \
  --per_device_eval_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --max_pixels=1806336 \
  --min_pixels=50176 \
  --eval_strategy=no \
  --save_strategy=steps \
  --save_steps=10000 \
  --save_total_limit=10 \
  --learning_rate=1e-04 \
  --weight_decay=0 \
  --warmup_ratio=0.03 \
  --max_grad_norm=1 \
  --lr_scheduler_type=cosine \
  --logging_steps=1 \
  --model_max_length=81920 \
  --gradient_checkpointing=True \
  --dataloader_num_workers=8 \
  --run_name=qwen3vl \
  --report_to=tensorboard \
  --data_packing=False

swift configuration:
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
IMAGE_MAX_TOKEN_NUM=1764 \
IMAGE_MIN_TOKEN_NUM=49 \
torchrun --nproc_per_node 8 --rdzv_conf="timeout=7200" --nnodes $VC_WORKER_NUM --node_rank $RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT ./features/qwen3vl/train/train.py \
  --model ./data/Qwen3-VL-8B-Instruct \
  --model_type qwen3_vl \
  --train_type full \
  --freeze_vit true \
  --freeze_llm false \
  --freeze_aligner false \
  --dataset './features/qwen3vl/dataset/dataset_info.json' \
  --split_dataset_ratio 0 \
  --remove_unused_columns false \
  --dataset_shuffle false \
  --train_dataloader_shuffle false \
  --torch_dtype bfloat16 \
  --num_train_epochs 1 \
  --per_device_train_batch_size 2 \
  --learning_rate 1e-4 \
  --eval_steps 5000000 \
  --save_steps 1000 \
  --save_total_limit 40 \
  --modules_to_save embed_tokens lm_head \
  --gradient_accumulation_steps 1 \
  --logging_steps 10 \
  --dataset_num_proc 8 \
  --dataloader_num_workers 8 \
  --max_new_tokens 3000 \
  --output_dir ${OUTPUT_URL} \
  --warmup_ratio 0.01 \
  --gradient_checkpointing True \
  --attn_impl flash_attn \
  --padding_free True \
  --max_length 81920 \
  --use_liger_kernel true \
  --deepspeed ./ms-swift-3.9/swift/llm/ds_config/zero0.json
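
The two commands express the image-size limits in different units: pixels for the official script, visual tokens for swift. A quick sanity check that they describe the same budget is sketched below. It assumes that one visual token covers a 32x32 pixel area (patch size 16 with 2x2 spatial merging), which is my understanding of Qwen3-VL and is not stated in this issue; the helper name `pixels_to_tokens` is hypothetical.

```python
# Check whether max_pixels/min_pixels (official script) match
# IMAGE_MAX_TOKEN_NUM/IMAGE_MIN_TOKEN_NUM (swift).
# ASSUMPTION: one merged visual token covers 32 x 32 pixels.

PIXELS_PER_TOKEN = 32 * 32


def pixels_to_tokens(pixels: int) -> int:
    """Convert a pixel budget into a visual-token budget (floor division)."""
    return pixels // PIXELS_PER_TOKEN


# Values copied from the two launch commands.
official = {"max_pixels": 1806336, "min_pixels": 50176}
swift = {"IMAGE_MAX_TOKEN_NUM": 1764, "IMAGE_MIN_TOKEN_NUM": 49}

max_tokens = pixels_to_tokens(official["max_pixels"])
min_tokens = pixels_to_tokens(official["min_pixels"])

print("max tokens:", max_tokens, "swift:", swift["IMAGE_MAX_TOKEN_NUM"])
print("min tokens:", min_tokens, "swift:", swift["IMAGE_MIN_TOKEN_NUM"])
```

If the assumption holds, the per-image token limits are identical in both runs, which would point away from image resolution as the cause of the difference in memory use.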

Error message:

[Image: screenshot of the error]
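
One way to narrow this down is to list where the two launch commands differ. The sketch below copies a subset of flags from the two commands into plain dicts and prints the ones whose values differ or that exist on one side only. The `diff_flags` helper is hypothetical, and treating same-named flags as equivalent across the two frameworks is an assumption; flags with different names (for example `model_max_length` and `max_length`) are deliberately left unmapped.

```python
# Compare a subset of flags from the two launch commands in this issue.
# ASSUMPTION: a flag with the same name means the same thing in both frameworks.

official = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 1,
    "gradient_checkpointing": True,
    "learning_rate": 1e-4,
    "warmup_ratio": 0.03,
    "logging_steps": 1,
    "save_steps": 10000,
    "model_max_length": 81920,
    "data_flatten": True,
    "data_packing": False,
}

swift = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 1,
    "gradient_checkpointing": True,
    "learning_rate": 1e-4,
    "warmup_ratio": 0.01,
    "logging_steps": 10,
    "save_steps": 1000,
    "max_length": 81920,
    "padding_free": True,
    "use_liger_kernel": True,
    "attn_impl": "flash_attn",
    "modules_to_save": ["embed_tokens", "lm_head"],
}


def diff_flags(a: dict, b: dict) -> dict:
    """Return {flag: (value_in_a, value_in_b)} for flags that differ; None means absent."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


differences = diff_flags(official, swift)
for flag, (official_value, swift_value) in differences.items():
    print(f"{flag}: official={official_value!r} swift={swift_value!r}")
```

Flags that appear only on the swift side, such as `use_liger_kernel`, `attn_impl`, and `modules_to_save`, are natural candidates to toggle one at a time when looking for the source of the extra memory use.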
