Open
Description
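When fine-tuning qwen2.5-omni with different loss scale settings, the train loss behaves the same as in a run with no custom loss scale. The eval loss differs from the run with no custom loss scale, but the runs with different scale values are still identical to one another.
Script: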
swift sft \
    --model Qwen2.5-Omni-7B \
    --dataset $train_data \
    --val_dataset $val_dataset \
    --system "You are a helpful assistant." \
    --train_type lora \
    --torch_dtype bfloat16 \
    --num_train_epochs 8 \
    --early_stop_interval 6 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 8 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --freeze_vit true \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 50 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir $model_root \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 8 \
    --external_plugins ms_swift_think_half/plugin_loss_scale_v3.py \
    --loss_scale think_half_v3
Code of plugin_loss_scale_v3.py:
import os, json
from swift.plugin.loss_scale.loss_scale import LossScale, loss_scale_map

# Print on import so you can confirm the file is actually loaded
print("[swift] loading loss_scale plugin: think_half_v3")


class ThinkHalfV3(LossScale):
    # path to regex->weight json; set in __init__ for portability
    loss_scale_config = None

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        cfg = os.path.join(os.path.dirname(os.path.abspath(__file__)), "think_half_plus_empty_ignore.json")
        self.loss_scale_config = cfg
        try:
            type(self).loss_scale_config = cfg
        except Exception:
            pass
        # Print the patterns so you can verify the rules in logs
        try:
            with open(cfg, "r", encoding="utf-8") as f:
                data = json.load(f)
            print("[swift] think_half_v3 patterns loaded:")
            for k, v in data.items():
                print(" -", k, "=>", v)
        except Exception as e:
            print("[swift] think_half_v3 failed to read config:", e)


# Register the custom loss_scale into the global mapping
loss_scale_map["think_half_v3"] = ThinkHalfV3
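One thing that may be worth checking in the plugin above is the order of operations in `__init__`: the config path is assigned only after `super().__init__()` has returned. The sketch below uses a hypothetical `BaseScale` class (not ms-swift's `LossScale`) and a made-up file name `rules.json` to show what happens if a base class reads such an attribute during construction. Whether `LossScale.__init__` actually reads `loss_scale_config` at that point is an assumption that should be verified against the installed ms-swift source.

```python
class BaseScale:
    """Hypothetical stand-in for a base class; NOT ms-swift's LossScale."""

    config_path = None

    def __init__(self):
        # Assumption: the base class looks at config_path while constructing.
        self.loaded_from = self.config_path


class LateConfig(BaseScale):
    def __init__(self):
        super().__init__()                     # base reads config_path here
        type(self).config_path = "rules.json"  # set only after the read


first = LateConfig()
second = LateConfig()

print(first.loaded_from)   # value seen by the first instance
print(second.loaded_from)  # value seen by the second instance
```

In this sketch the first instance is built before the path exists, so it sees `None`; only later instances pick up the path. If the real base class behaves similarly, assigning the path as a class attribute or before calling `super().__init__()` would avoid the problem.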
Contents of think_half_plus_empty_ignore.json:
{
    "<think>\\s*</think>": [0.0],
    "<think>[\\s\\S]*?</think>": [0.3],
    "<answer>[\\s\\S]*?</answer>": [1.2]
}
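To sanity-check the patterns independently of the training framework, the rules can be run against sample assistant responses with Python's `re` module. This is a minimal sketch: the helper `matching_rules` is hypothetical, the two strings are modeled on the data format used in this issue, and it says nothing about how ms-swift itself resolves overlapping matches.

```python
import json
import re

# Same rules as think_half_plus_empty_ignore.json, inlined as a JSON string.
RULES_JSON = r'''
{
    "<think>\\s*</think>": [0.0],
    "<think>[\\s\\S]*?</think>": [0.3],
    "<answer>[\\s\\S]*?</answer>": [1.2]
}
'''
rules = json.loads(RULES_JSON)


def matching_rules(text):
    """Return (pattern, weights, matched_text) for every rule that matches."""
    hits = []
    for pattern, weights in rules.items():
        for m in re.finditer(pattern, text):
            hits.append((pattern, weights, m.group(0)))
    return hits


filled = "<think>\nExtract key XX balance\n</think>\n\n<answer>\n XXX.\n</answer>"
empty = "<think>\n</think>\n\n<answer>\nXX XXX\n</answer>"

for name, text in [("filled", filled), ("empty", empty)]:
    print(name)
    for pattern, weights, matched in matching_rules(text):
        print("  ", pattern, "=>", weights, repr(matched))
```

Note that an empty `<think>` block is matched by both the first and the second pattern, so the weight it ends up with depends on how the library handles overlapping rules.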
Fine-tuning data examples:
{"messages": [{"role": "user", "content": "<audio> Please suggest XXX."}, {"role": "assistant", "content": "<think>\nExtract key XX balance\n</think>\n\n<answer>\n XXX.\n</answer>"}], "audios": ["4309.wav"]}
{"messages": [{"role": "user", "content": "<audio> Please XX XXX."}, {"role": "assistant", "content": "<think>\nFor each XX XXX\n</think>\n\n<answer>\nThe XX XXX ment.\n</answer>"}], "audios": ["197.wav"]}
{"messages": [{"role": "user", "content": "<audio> Determine if this speech XX XXX."}, {"role": "assistant", "content": "<think>\n</think>\n\n<answer>\nXX XXX\n</answer>"}], "audios": ["SSB03090296.wav"]}
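Using settings similar to the one above, I tried several weighting schemes for the think section: magenta is the default setting, green and gray are think weights of 0.2 and 0.5, and orange uses the weights shown above.
Looking at the training curves, the train loss curves overlap almost exactly and the three eval loss curves are identical. What is the reason for this? Is something wrong with the script settings?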
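For reference, this is why different curves would be expected if the weights were taking effect. The sketch below computes a weighted mean over made-up per-token losses; the function `weighted_mean_loss` is hypothetical, and normalizing by the sum of the weights is an assumption that may differ from what ms-swift does internally.

```python
def weighted_mean_loss(token_losses, weights):
    """Weighted average of per-token losses; weight-0 tokens are ignored."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    return sum(l * w for l, w in zip(token_losses, weights)) / total_weight


# Made-up per-token losses: 3 think tokens followed by 2 answer tokens.
losses = [2.0, 2.0, 2.0, 1.0, 1.0]

uniform = weighted_mean_loss(losses, [1.0] * 5)
scaled = weighted_mean_loss(losses, [0.3, 0.3, 0.3, 1.2, 1.2])

print("uniform weights:", uniform)
print("think=0.3, answer=1.2:", scaled)
```

With these numbers the scaled loss is lower than the uniform one because the higher-loss think tokens are down-weighted. Curves that coincide across different think weights would therefore suggest that the weights never reach the loss computation.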
