
Customizing loss_scale fails #6003

@jiusansan222


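When I fine-tune Qwen2.5-Omni with different loss scale settings, the train loss is the same as with the default (non-custom) loss scale. The eval loss does differ from the default, but it is still identical across the different custom scales.
Script: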

swift sft \
        --model Qwen2.5-Omni-7B \
        --dataset $train_data \
        --val_dataset $val_dataset \
        --system "You are a helpful assistant." \
        --train_type lora \
        --torch_dtype bfloat16 \
        --num_train_epochs 8 \
        --early_stop_interval 6 \
        --per_device_train_batch_size 4 \
        --per_device_eval_batch_size 8 \
        --learning_rate 1e-4 \
        --lora_rank 8 \
        --lora_alpha 32 \
        --target_modules all-linear \
        --freeze_vit true \
        --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
        --eval_steps 50 \
        --save_steps 50 \
        --save_total_limit 50 \
        --logging_steps 5 \
        --max_length 2048 \
        --output_dir $model_root \
        --warmup_ratio 0.05 \
        --dataloader_num_workers 8 \
        --external_plugins ms_swift_think_half/plugin_loss_scale_v3.py \
        --loss_scale think_half_v3

plugin_loss_scale_v3.py code:

import os, json
from swift.plugin.loss_scale.loss_scale import LossScale, loss_scale_map

# Print on import so you can confirm the file is actually loaded
print("[swift] loading loss_scale plugin: think_half_v3")

class ThinkHalfV3(LossScale):
    # path to regex->weight json; set in __init__ for portability
    loss_scale_config = None

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        cfg = os.path.join(os.path.dirname(os.path.abspath(__file__)), "think_half_plus_empty_ignore.json")
        self.loss_scale_config = cfg
        try:
            type(self).loss_scale_config = cfg
        except Exception:
            pass

        # Print the patterns so you can verify the rules in logs
        try:
            with open(cfg, "r", encoding="utf-8") as f:
                data = json.load(f)
            print("[swift] think_half_v3 patterns loaded:")
            for k, v in data.items():
                print("   -", k, "=>", v)
        except Exception as e:
            print("[swift] think_half_v3 failed to read config:", e)

# Register the custom loss_scale into the global mapping
loss_scale_map["think_half_v3"] = ThinkHalfV3
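The plugin assigns `loss_scale_config` (on both the instance and the class) only after `super().__init__()` has returned. If the `LossScale` base class reads that attribute inside its own `__init__` (an assumption here, not confirmed against the ms-swift source), the rules would never take effect and every run would behave like the default. A minimal sketch of that ordering effect, using a hypothetical `BaseLossScale` rather than the real class:

```python
class BaseLossScale:
    """Hypothetical stand-in for a base class that reads its config in __init__."""
    loss_scale_config = None  # class attribute consulted by the base initializer

    def __init__(self):
        # The base class decides here, once, whether any rules are active.
        self.rules_loaded = self.loss_scale_config is not None


class SetAfterSuper(BaseLossScale):
    def __init__(self):
        super().__init__()                     # base sees None at this point
        self.loss_scale_config = "rules.json"  # too late: rules_loaded is already False


class SetAsClassAttribute(BaseLossScale):
    # Defined on the class, so it is visible before the base __init__ runs.
    loss_scale_config = "rules.json"


print(SetAfterSuper().rules_loaded)
print(SetAsClassAttribute().rules_loaded)
```

In this sketch, declaring the attribute at class level (or assigning it before calling `super().__init__()`) is what makes it visible to the base initializer.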

Contents of think_half_plus_empty_ignore.json:

{
  "<think>\\s*</think>": [0.0],                 
  "<think>[\\s\\S]*?</think>": [0.3],           
  "<answer>[\\s\\S]*?</answer>": [1.2]         
}

Fine-tuning data examples:

{"messages": [{"role": "user", "content": "<audio> Please suggest XXX."}, {"role": "assistant", "content": "<think>\nExtract key XX balance\n</think>\n\n<answer>\n XXX.\n</answer>"}], "audios": ["4309.wav"]}
{"messages": [{"role": "user", "content": "<audio> Please XX  XXX."}, {"role": "assistant", "content": "<think>\nFor each XX  XXX\n</think>\n\n<answer>\nThe XX  XXX ment.\n</answer>"}], "audios": ["197.wav"]}
{"messages": [{"role": "user", "content": "<audio> Determine if this speech XX  XXX."}, {"role": "assistant", "content": "<think>\n</think>\n\n<answer>\nXX  XXX\n</answer>"}], "audios": ["SSB03090296.wav"]}
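The two `<think>` rules overlap: an empty `<think>\n</think>` block (as in the third sample) matches both `<think>\s*</think>` and `<think>[\s\S]*?</think>`. A minimal sketch of how the rules could split a response into weighted segments, assuming the first rule in file order wins at each position and unmatched text keeps weight 1.0 (this matching scheme and the helper `weighted_spans` are hypothetical, not ms-swift's documented behavior):

```python
import re

# Rules from think_half_plus_empty_ignore.json, in file order.
# The JSON stores each weight as a one-element list; plain floats are used here.
RULES = {
    r"<think>\s*</think>": 0.0,
    r"<think>[\s\S]*?</think>": 0.3,
    r"<answer>[\s\S]*?</answer>": 1.2,
}


def weighted_spans(text, rules=RULES, default=1.0):
    """Split text into (segment, weight) pairs.

    Assumed scheme: at each position, the first rule (in dict order) that
    matches there wins; text no rule covers gets `default`.
    """
    compiled = [(re.compile(p), w) for p, w in rules.items()]
    spans, pos, plain_start = [], 0, 0
    while pos < len(text):
        for pattern, weight in compiled:
            m = pattern.match(text, pos)  # anchored match starting at pos
            if m and m.end() > pos:
                if plain_start < pos:  # flush unmatched text before this match
                    spans.append((text[plain_start:pos], default))
                spans.append((m.group(), weight))
                pos = plain_start = m.end()
                break
        else:
            pos += 1  # no rule matched here; advance one character
    if plain_start < len(text):
        spans.append((text[plain_start:], default))
    return spans


sample = "<think>\n</think>\n\n<answer>\nXX  XXX\n</answer>"
for segment, weight in weighted_spans(sample):
    print(repr(segment), weight)
```

Under this assumed scheme the empty block receives 0.0 only because its rule is listed first; with the two `<think>` rules swapped, the lazy `[\s\S]*?` rule would also match it and assign 0.3.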

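Using settings similar to the above, I tried several think-weight schemes: magenta is the default setting, green and gray use think weights of 0.2 and 0.5, and orange uses the weights shown above.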

[Image: loss curves for the four weight settings]
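If the weights were applied, the same model outputs would give a different loss value under each scheme, so one would expect the curves to separate at least slightly. A toy calculation with made-up per-token losses (illustrative numbers, not measurements), assuming the loss is the weighted sum of per-token losses divided by the token count; the answer weight for the 0.2 and 0.5 runs is not stated above, so 1.0 is assumed for them:

```python
def weighted_loss(token_losses, token_weights):
    """Weighted sum of per-token losses divided by the token count.

    Dividing by the token count (not by the sum of weights) is an assumed
    normalization; a framework may normalize differently.
    """
    if len(token_losses) != len(token_weights):
        raise ValueError("losses and weights must have the same length")
    total = sum(l * w for l, w in zip(token_losses, token_weights))
    return total / len(token_losses)


# Illustrative per-token losses: two <think> tokens, then two <answer> tokens.
losses = [2.0, 2.0, 1.0, 1.0]

schemes = {
    "default": [1.0, 1.0, 1.0, 1.0],
    "think 0.2": [0.2, 0.2, 1.0, 1.0],  # answer weight 1.0 assumed
    "think 0.5": [0.5, 0.5, 1.0, 1.0],  # answer weight 1.0 assumed
    "think 0.3 / answer 1.2": [0.3, 0.3, 1.2, 1.2],
}

for name, weights in schemes.items():
    print(f"{name}: {weighted_loss(losses, weights):.2f}")
```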

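Looking at the training loss, the train loss curves almost completely overlap, and the eval loss curves of the three custom settings are identical. What could cause this? Is something wrong with my script settings?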

[Image]
