Open
Description
Describe the bug
What the bug is and how to reproduce it, preferably with screenshots
# 80GiB * 2
nproc_per_node=8
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
accelerate launch --config_file "./examples/train/multi-gpu/fsdp_qlora/fsdp_offload.json" \
swift/cli/sft.py \
--model Qwen/Qwen2.5-VL-7B-Instruct \
--train_type full \
--dataset test/vlm.jsonl \
--load_from_cache_file true \
--split_dataset_ratio 0 \
--torch_dtype bfloat16 \
--attn_impl flash_attn \
--freeze_vit true \
--freeze_llm true \
--freeze_aligner false \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--learning_rate 5e-6 \
--gradient_accumulation_steps 2 \
--eval_steps -1 \
--save_steps 1000 \
--save_total_limit 10 \
--logging_steps 5 \
--max_length 3072 \
--output_dir output \
--warmup_ratio 0.05 \
--dataloader_num_workers 4 \
--dataset_num_proc 8 \
--padding_free true \
--gradient_checkpointing true
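
For anyone reproducing this, the following is a minimal sketch of the effective global batch size the command above implies. It assumes one training process per GPU listed in `CUDA_VISIBLE_DEVICES` (8 processes); the accelerate config file is not shown here, so the real process count may differ. The variable `effective_batch` is only an illustrative name.

```shell
# Values copied from the command above.
# Assumption: one process per visible GPU, i.e. 8 processes in total.
nproc_per_node=8
per_device_train_batch_size=1
gradient_accumulation_steps=2

# Samples consumed per optimizer step across all processes.
effective_batch=$(( nproc_per_node * per_device_train_batch_size * gradient_accumulation_steps ))
echo "effective global batch size: ${effective_batch}"
```

If only two GPUs are actually in use, as the `# 80GiB * 2` comment may suggest, set `nproc_per_node=2` in the sketch and the result changes accordingly.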
Your hardware and system info
Write your system info here, such as CUDA version, operating system, GPU model, and torch version
torch 2.6.0+cu124
torchao 0.13.0
torchaudio 2.6.0+cu124
torchelastic 0.2.2
torchvision 0.21.0+cu124
tqdm 4.67.1
traitlets 5.14.3
transformers 4.57.0
transformers-stream-generator 0.0.5
Additional context
Add any other context about the problem here