Any recommended way to improve training speed on hardware with low VRAM? #344

Closed
QIU-Shuo opened this issue Sep 20, 2022 · 1 comment
Labels: question (Further information is requested)

@QIU-Shuo
Contributor

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

Hi, I am working on training a 10B model with limited resources (16 × A100 40GB). The problem I am facing is that I cannot reach the FLOP/s target (130–150 TFLOP/s).
I have tuned various parameters, but the most prominent speed-up comes from reducing the model size and increasing the batch size. So I suspect the cause is the small batch size I have to use to fit within the lower VRAM.

| n params (B) | hidden | ffw | # heads | # layers | tensor parallel size | batch size | wps | TFLOP/s per A100 |
|---|---|---|---|---|---|---|---|---|
| 8.172 | 4096 | 16384 | 32 | 40 | 2 | 8 | 17k | 69 |
| 8.172 | 4096 | 16384 | 32 | 40 | 4 | 16 | OOM | OOM |
| 4.144 | 4096 | 16384 | 32 | 20 | 2 | 16 | 43k | 89 |
| 4.144 | 4096 | 16384 | 32 | 20 | 4 | 32 | 27k | 56 |

The most straightforward way to validate this is to increase the parallel size. However, from my observations, increasing the tensor parallel size from 2 to 4 only slows training down. Is this expected? If it is, is there any other way to improve training speed here?

Seq_len is 2048. FLOPs calculation: wps * n_params * 8 / n_gpus.
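
For reference, a quick back-of-the-envelope check of the TFLOP/s per A100 column against that formula (just the stated approximation, not an exact FLOP count):

```python
# Back-of-the-envelope check of the TFLOP/s per A100 column, using the
# stated approximation: wps * n_params * 8 / n_gpus.
# n_params and wps (words per second) are taken from the table above; 16 GPUs total.
configs = [
    ("8.172B, TP=2, bsz=8",  8.172e9, 17_000),
    ("4.144B, TP=2, bsz=16", 4.144e9, 43_000),
    ("4.144B, TP=4, bsz=32", 4.144e9, 27_000),
]

n_gpus = 16
for name, n_params, wps in configs:
    tflops_per_gpu = wps * n_params * 8 / n_gpus / 1e12
    print(f"{name}: ~{tflops_per_gpu:.0f} TFLOP/s per A100")
# Prints roughly 69, 89, and 56, matching the table.
```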

Code

What have you tried?

What's your environment?

  • metaseq Version (e.g., 1.0 or master): master
  • PyTorch Version (e.g., 1.0): nightly 1.13.0a0+d321be6
  • OS (e.g., Linux): Ubuntu 20.04
  • How you installed metaseq (pip, source): source
  • Build command you used (if compiling from source): pip install . -e
  • Python version: 3.9
  • CUDA/cuDNN version: 450.80.02/11.7/8600
  • GPU models and configuration: A100 40GB * 16
  • Any other relevant information:
@stephenroller
Contributor

Checkpoint activations and FSDP will significantly lower memory pressure.
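
For anyone landing here later, a minimal sketch of what those two suggestions look like with plain PyTorch APIs (generic torch.utils.checkpoint plus FSDP; metaseq exposes equivalent options through its own training flags, which may differ by version):

```python
# Rough sketch of both suggestions with plain PyTorch APIs; metaseq wires
# these up through its own training config, so the flag names there may differ.
import torch.nn as nn
from torch.utils.checkpoint import checkpoint
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


class CheckpointedBlock(nn.Module):
    """Recomputes a block's activations in the backward pass instead of storing them."""

    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block

    def forward(self, x):
        return checkpoint(self.block, x)


def build_sharded_model(layers: nn.ModuleList) -> nn.Module:
    # Wrap each transformer layer with activation checkpointing, then shard
    # parameters, gradients, and optimizer state across ranks with FSDP.
    # Requires torch.distributed to already be initialized (e.g. via torchrun).
    model = nn.Sequential(*[CheckpointedBlock(layer) for layer in layers])
    return FSDP(model)
```

Checkpointing trades extra recomputation in the backward pass for a large drop in activation memory, which should let you raise the per-GPU batch size instead of increasing the tensor parallel size.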
