Closed
Description
Describe the bug
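When continuing training on top of an already-trained LoRA with resume_only_model, the step count from the previous run is still loaded, and training ends immediately.
Command: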
nproc_per_node=2
CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=$nproc_per_node \
nohup swift sft \
--model Qwen3-VL-30B-A3B-Instruct \
--dataset  data.jsonl \
--val_dataset eval.jsonl \
--train_type lora \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--learning_rate 1e-5 \
--lora_rank 512 \
--lora_alpha 512 \
--freeze_llm false \
--freeze_vit true \
--gradient_accumulation_steps 2 \
--save_steps 500 \
--save_total_limit 5 \
--logging_steps 5 \
--max_new_tokens 8192 \
--max_length 12800 \
--output_dir outputs/Qwen3-VL-30B-A3B-Instruct/ \
--resume_from_checkpoint 'outputs/Qwen3-VL-30B-A3B-Instruct/v1-20251004-222628/checkpoint-35200' \
--resume_only_model true \
--warmup_steps 100 \
--dataloader_num_workers 0 \
--optim adamw_8bit \
--attn_impl flash_attn \
--do_eval false \
--truncation_strategy delete \
--max_pixels 409600  > ./v2.log 2>&1 &
Train:   0%|          | 0/30255 [00:00<?, ?it/s]
                                                
{'train_runtime': 0.8557, 'train_samples_per_second': 141423.405, 'train_steps_per_second': 35355.559, 'train_loss': 0.0, 'epoch': 0.35, 'global_step/max_steps': '35200/30255', 'percentage': '116.34%', 'elapsed_time': '0s', 'remaining_time': '23h 59m 59s', 'memory(GiB)': 83.59, 'train_speed(iter/s)': 86436.446281}
Train:   0%|          | 0/30255 [00:00<?, ?it/s]
Train:   0%|          | 0/30255 [00:00<?, ?it/s]
Train:   0%|          | 0/30255 [00:00<?, ?it/s]
[INFO:swift] last_model_checkpoint: None
[INFO:swift] best_model_checkpoint: outputs/Qwen3-VL-30B-A3B-Instruct/v1-20251004-222628/checkpoint-35200
[INFO:swift] images_dir: outputs/Qwen3-VL-30B-A3B-Instruct/v2-20251010-191404/images
[INFO:swift] End time of running main: 2025-10-10 21:13:41.979910
transformers 4.38.2
ms-swift: latest version, installed from source
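To make the numbers in the log above concrete: the restored step (35200, from checkpoint-35200) is already larger than the 30255 steps planned for the new run, which matches the reported 116.34%. The snippet below is only a minimal sketch of that arithmetic. The `remaining_steps` helper and the stop condition `global_step < max_steps` are assumptions for illustration, not ms-swift's or transformers' actual code.

```python
# Values copied from the 'global_step/max_steps': '35200/30255' log entry.
global_step = 35200  # step count restored from checkpoint-35200
max_steps = 30255    # total steps planned for the new run

# Reproduce the 'percentage' field from the log.
percentage = global_step / max_steps * 100
print(f"{percentage:.2f}%")  # 116.34%


def remaining_steps(global_step: int, max_steps: int) -> int:
    """Hypothetical helper: steps left if training runs while global_step < max_steps."""
    return max(0, max_steps - global_step)


# With the restored step, nothing is left to train.
print(remaining_steps(global_step, max_steps))  # 0

# If the step counter started from zero, the full run would execute.
print(remaining_steps(0, max_steps))  # 30255
```

Under that assumed stop condition, a run that inherits the old step count has zero steps left, which is consistent with the 0.86 s train_runtime and train_loss of 0.0 shown in the log.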