
Excessive device memory usage during LoRA training with Qwen2.5-7B #2123

@Youzhuqinghuan

Description


Describe the bug (Mandatory)
With Qwen2.5-7B as the backbone model, LoRA training occupies 57554 MB of device memory on a single card. The detailed configuration is given below.

  • Hardware Environment (Ascend/GPU/CPU):
    Ascend 910B3

  • Software Environment (Mandatory):
    -- MindSpore 2.5.0
    -- Python 3.11
    -- Atlas 800T A2 training server
    -- MindNLP 0.4.1

  • Execute Mode (Mandatory) (PyNative/Graph):
    PyNative

To Reproduce (Mandatory)
Steps to reproduce the behavior:

  1. Hyperparameter configuration (see the argparse sketch after the table):
    | Parameter | Type | Default | Description |
    | ----------------- | ----- | ------- | ------------------------- |
    | --lora_rank | int | 16 | LoRA rank |
    | --lora_alpha | int | 16 | LoRA scaling parameter |
    | --lora_dropout | float | 0.1 | LoRA dropout rate |
    | --epochs | int | 5 | Number of training epochs |
    | --batch_size | int | 2 | Batch size |
    | --learning_rate | float | 5e-4 | Learning rate |
    | --max_length | int | 512 | Maximum sequence length |
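
A minimal sketch of how these flags could be defined with argparse; the function name and help strings are illustrative and not taken from the original training script:

import argparse

def parse_args():
    """Hypothetical argument parser matching the table above."""
    parser = argparse.ArgumentParser(description="Qwen2.5-7B LoRA fine-tuning")
    parser.add_argument("--lora_rank", type=int, default=16, help="LoRA rank")
    parser.add_argument("--lora_alpha", type=int, default=16, help="LoRA scaling parameter")
    parser.add_argument("--lora_dropout", type=float, default=0.1, help="LoRA dropout rate")
    parser.add_argument("--epochs", type=int, default=5, help="Number of training epochs")
    parser.add_argument("--batch_size", type=int, default=2, help="Batch size")
    parser.add_argument("--learning_rate", type=float, default=5e-4, help="Learning rate")
    parser.add_argument("--max_length", type=int, default=512, help="Maximum sequence length")
    return parser.parse_args()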

  2. Load the LoRA configuration

from mindnlp.peft import LoraConfig, TaskType, get_peft_model  # PEFT utilities shipped with MindNLP

def create_lora_config(args):
    """Create the LoRA configuration."""
    # Enumerate explicit module paths for all 28 decoder layers
    target_modules = []
    for i in range(28):
        target_modules.extend([
            f"qwen.model.layers.{i}.self_attn.q_proj",
            f"qwen.model.layers.{i}.self_attn.v_proj",
            f"qwen.model.layers.{i}.self_attn.k_proj",
            f"qwen.model.layers.{i}.self_attn.o_proj",
        ])

    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,  # causal language modeling task
        r=args.lora_rank,              # rank of the low-rank matrices
        lora_alpha=args.lora_alpha,    # LoRA scaling parameter
        target_modules=target_modules,
        lora_dropout=args.lora_dropout,
        bias="none",                   # do not train biases
        inference_mode=False,          # training mode
        modules_to_save=None           # no modules_to_save, to avoid unintended full-parameter training
    )
    return lora_config
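
As a side note, if MindNLP's peft follows the Hugging Face PEFT convention of matching target_modules by module-name suffix (an assumption, not verified against MindNLP 0.4.1), the 28-layer enumeration could be collapsed to plain suffixes:

def create_lora_config_by_suffix(args):
    """Alternative sketch: target the attention projections by suffix instead of full paths.
    Assumes suffix matching works as in Hugging Face PEFT; verify against MindNLP 0.4.1."""
    return LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=args.lora_rank,
        lora_alpha=args.lora_alpha,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # suffixes of the attention projections
        lora_dropout=args.lora_dropout,
        bias="none",
        inference_mode=False,
    )

If suffix matching is supported, either form should adapt the same set of layers; the suffix form is simply less error-prone to maintain.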
  3. Manually freeze part of the base model's parameters
def setup_model_for_lora(config, lora_config):
    """Set up the model for LoRA training."""
    # 1. Build the base model
    base_model = MultiModalQwen(config)

    # 2. Enforce the freezing strategy
    print("=== Applying freezing strategy ===")

    # Freeze all parameters
    for param in base_model.get_parameters():
        param.requires_grad = False

    # 3. Apply LoRA
    print("=== Applying LoRA configuration ===")
    lora_model = get_peft_model(base_model, lora_config)

    # 4. Manually make the projection layers trainable (after LoRA is applied)
    print("=== Making projection layers trainable ===")
    for name, param in lora_model.named_parameters():
        if 'proj.weight' in name and 'qwen' not in name:  # only train MultiModalQwen's own projection layers
            param.requires_grad = True
            print(f"Set {name} to trainable")

    # 5. Print parameter statistics
    if hasattr(lora_model, 'print_trainable_parameters'):
        lora_model.print_trainable_parameters()

    return lora_model
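
To help narrow down where the memory goes, here is a small diagnostic sketch that tallies parameter memory by requires_grad status. It is an addition for illustration; it relies only on the same named_parameters() interface used above and assumes 2-byte (fp16/bf16) weight storage:

import math

def summarize_parameter_memory(model, bytes_per_param=2):
    """Rough parameter-memory tally; bytes_per_param=2 assumes fp16/bf16 weights."""
    trainable_bytes, frozen_bytes = 0, 0
    for name, param in model.named_parameters():
        nbytes = math.prod(param.shape) * bytes_per_param
        if param.requires_grad:
            trainable_bytes += nbytes
        else:
            frozen_bytes += nbytes
    print(f"trainable parameter memory: {trainable_bytes / 1024**2:.1f} MB")
    print(f"frozen parameter memory:    {frozen_bytes / 1024**2:.1f} MB")
    return trainable_bytes, frozen_bytes

Called right after setup_model_for_lora, this should report on the order of 13.7 GB of frozen weights and only about 25 MB of trainable weights if the freezing strategy really took effect; a much larger trainable figure would point to accidentally unfrozen modules.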
  4. Train with the Trainer

Expected behavior (Mandatory)
print_trainable_parameters() reports trainable params: 12,845,056 || all params: 7,170,556,928 || trainable%: 0.1791360995941876, yet device memory usage still reaches 57554 MB.
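
For a rough sanity check of the 57554 MB figure, the reported parameter counts can be converted to bytes directly. This is back-of-the-envelope arithmetic under assumed fp16/bf16 weight storage and fp32 Adam moments kept only for the trainable parameters, not profiler output:

# Back-of-the-envelope accounting of the reported memory, assuming 2-byte weights.
total_params     = 7_170_556_928   # "all params" as reported by print_trainable_parameters
trainable_params = 12_845_056      # "trainable params" as reported

all_weights_mb = total_params * 2 / 1024**2          # ~13677 MB for every weight held in fp16/bf16
lora_only_mb   = trainable_params * 2 / 1024**2      # of which only ~25 MB are LoRA weights
adam_states_mb = trainable_params * 4 * 2 / 1024**2  # ~98 MB if two fp32 moments are kept per trainable param

accounted = all_weights_mb + adam_states_mb
print(f"weights + optimizer states ≈ {accounted:.0f} MB (LoRA weights alone ≈ {lora_only_mb:.0f} MB)")
print(f"unaccounted (activations, gradient buffers, allocator cache) ≈ {57554 - accounted:.0f} MB")

If that arithmetic holds, parameter and optimizer storage explains only around 14 GB, so most of the remaining ~43 GB would have to come from activations kept for backward in PyNative mode (batch_size 2 × max_length 512 across 28 layers), gradient buffers, and allocator caching rather than from the LoRA weights themselves.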

Screenshots / Logs (Mandatory)

(Two screenshots attached in the original issue.)

Additional context (Optional)
Add any other context about the problem here.
