Model: Qwen2-72B
Machine: 8 * A100 80G
Environment: DeepSpeed ZeRO-3
Freeze Method Log:
- Set trainable layers: 77,78,79
- trainable params: 2633054208 || all params: 72706203648 || trainable%: 3.6215
- epochs:10
- train_runtime = 20:36:14.47
- train_samples_per_second = 1.051
- train_steps_per_second = 0.008
LoRA Method Log:
- trainable params: 16384000 || all params: 72722587648 || trainable%: 0.0225
- epochs:6
- train_runtime = 1 day, 8:57:44.99
- train_samples_per_second = 0.394
- train_steps_per_second = 0.002
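In my experiment, the LoRA method is much slower than the Freeze method (about 0.39 vs. 1.05 samples per second), even though it has far fewer trainable parameters. Is this normal, and if so, why?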
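Since the two runs used different epoch counts, the raw runtimes are not directly comparable. Below is a rough sketch that normalizes them per epoch, using only the numbers from the logs above. The variable names are mine, and it assumes both runs used the same dataset, which the logs suggest but do not state outright.

```python
from datetime import timedelta

# Values copied from the logs above.
freeze_runtime = timedelta(hours=20, minutes=36, seconds=14.47)
lora_runtime = timedelta(days=1, hours=8, minutes=57, seconds=44.99)
freeze_epochs, lora_epochs = 10, 6
freeze_sps, lora_sps = 1.051, 0.394  # train_samples_per_second

# Wall-clock hours per epoch for each method.
freeze_h_per_epoch = freeze_runtime.total_seconds() / freeze_epochs / 3600
lora_h_per_epoch = lora_runtime.total_seconds() / lora_epochs / 3600

# Two independent estimates of how much slower LoRA is.
slowdown_by_epoch = lora_h_per_epoch / freeze_h_per_epoch
slowdown_by_throughput = freeze_sps / lora_sps

print(f"Freeze: {freeze_h_per_epoch:.2f} h/epoch")
print(f"LoRA:   {lora_h_per_epoch:.2f} h/epoch")
print(f"Slowdown (per-epoch time): {slowdown_by_epoch:.2f}x")
print(f"Slowdown (samples/sec):    {slowdown_by_throughput:.2f}x")
```

Both estimates should agree closely if the per-epoch sample count was the same in the two runs, which is a quick sanity check on that assumption.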