[BUG] Why is LoRA much slower than Freeze? #6507

@gugugu-469

Model: Qwen2-72B
Machine: 8 × A100 80GB
Environment: DeepSpeed ZeRO-3

Freeze Method Log:

  • Set trainable layers: 77,78,79
  • trainable params: 2633054208 || all params: 72706203648 || trainable%: 3.6215
  • epochs: 10
  • train_runtime = 20:36:14.47
  • train_samples_per_second = 1.051
  • train_steps_per_second = 0.008

LoRA Method Log:

  • trainable params: 16384000 || all params: 72722587648 || trainable%: 0.0225
  • epochs: 6
  • train_runtime = 1 day, 8:57:44.99
  • train_samples_per_second = 0.394
  • train_steps_per_second = 0.002

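In my experiments, the LoRA method is much slower than the Freeze method: 0.394 vs. 1.051 training samples per second, roughly 2.7x lower throughput, even though LoRA trains far fewer parameters. Is this normal, and if so, why?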
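
Because the two runs used different epoch counts (10 vs. 6), the raw runtimes are not directly comparable. The sketch below normalizes the logged figures to seconds per epoch and checks the parameter counts. It assumes both runs used the same dataset and that `train_runtime` covers training only; the helper name `seconds_per_epoch` and the dictionary layout are illustrative, not taken from any library.

```python
from datetime import timedelta

# Figures copied from the two logs above.
freeze = {
    "epochs": 10,
    "runtime": timedelta(hours=20, minutes=36, seconds=14.47),
    "samples_per_s": 1.051,
    "trainable": 2_633_054_208,
    "all_params": 72_706_203_648,
}
lora = {
    "epochs": 6,
    "runtime": timedelta(days=1, hours=8, minutes=57, seconds=44.99),
    "samples_per_s": 0.394,
    "trainable": 16_384_000,
    "all_params": 72_722_587_648,
}


def seconds_per_epoch(run):
    """Wall-clock seconds per epoch (hypothetical helper for this comparison)."""
    return run["runtime"].total_seconds() / run["epochs"]


freeze_epoch_s = seconds_per_epoch(freeze)
lora_epoch_s = seconds_per_epoch(lora)

# Two independent estimates of the slowdown: time per epoch and samples per second.
epoch_ratio = lora_epoch_s / freeze_epoch_s
throughput_ratio = freeze["samples_per_s"] / lora["samples_per_s"]

# The LoRA run reports more total parameters; the difference should equal
# the adapter weights added on top of the same base model.
extra_params = lora["all_params"] - freeze["all_params"]

print(f"Freeze: {freeze_epoch_s / 3600:.2f} h/epoch")
print(f"LoRA:   {lora_epoch_s / 3600:.2f} h/epoch")
print(f"Slowdown by epoch time:  {epoch_ratio:.2f}x")
print(f"Slowdown by throughput:  {throughput_ratio:.2f}x")
print(f"Extra params in LoRA run equal LoRA trainable params: {extra_params == lora['trainable']}")
```

Both estimates agree on a slowdown of about 2.7x per epoch, so the gap is not an artifact of the different epoch counts.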


Labels: bug, training
