
How did you train the large-sized models without out-of-memory? #27

Closed
jiang719 opened this issue Aug 6, 2022 · 3 comments

Comments

jiang719 commented Aug 6, 2022

I would like to fine-tune the 2B model, but I get an out-of-memory error even with the batch size set to 1 (on a single GPU with 24 GB of memory).

I wonder what devices you used to pre-train the 2B and 16B models. How did you address the memory issue? Did you parallelize the model by layers across different GPUs? Thank you.

Nan

enijkamp (Contributor) commented Aug 7, 2022

The models were pre-trained in JAX on TPU-v4 hardware and later converted to PyTorch for sampling.

The training code in JAX will be released soon.

You may try to fine-tune the models in PyTorch using DeepSpeed:

https://news.ycombinator.com/item?id=32331764
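
For context on why this OOMs: plain Adam fine-tuning in fp32 needs roughly 16 bytes per parameter (4 for the weight, 4 for the gradient, 8 for the optimizer moments), so a 2B-parameter model already wants around 32 GB before counting activations, which exceeds a 24 GB GPU. A minimal sketch of what a DeepSpeed ZeRO config for this might look like (the values below are illustrative assumptions, not the authors' settings):

```python
# Illustrative DeepSpeed config (assumed values, not the authors' settings).
# ZeRO stage 2 shards gradients and optimizer states across GPUs, and fp16
# halves the memory needed for weights and activations.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                # shard gradients + optimizer states
        "overlap_comm": True,      # overlap reduction with the backward pass
        "contiguous_gradients": True,
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
}
```

Note that on a single 24 GB card there is nothing to shard across, so stage 2 alone may still not fit 2B parameters; the CPU-offloading route in the later comment is the more reliable option there.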

xanderdunn commented:

Training code in JAX has been released: #16 (comment)

enijkamp (Contributor) commented Oct 4, 2022

@jiang719 Here is DeepSpeed fine-tuning code with CPU parameter offloading, which should let you avoid the OOM:

https://github.com/salesforce/jaxformer/blob/main/jaxformer/hf/train.py
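
For readers who don't want to dig through the script, a rough sketch of the idea it relies on: offload parameters (and, here, optimizer states) to CPU RAM via DeepSpeed ZeRO stage 3, wrapped around a Hugging Face CodeGen checkpoint. The model name, batch sizes, and learning rate below are assumptions for illustration, not necessarily what train.py uses.

```python
# Hypothetical sketch: fine-tune CodeGen-2B with DeepSpeed ZeRO-3 and CPU offloading.
# Model name and hyperparameters are assumptions; see the linked train.py for the
# authors' actual configuration.
import deepspeed
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Salesforce/codegen-2B-mono"  # assumption: any CodeGen checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # partition params, grads, optimizer states
        "offload_param": {"device": "cpu"},      # keep parameters in CPU RAM when unused
        "offload_optimizer": {"device": "cpu"},  # keep Adam states in CPU RAM
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
}

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# One illustrative training step on a dummy batch.
batch = tokenizer("def hello_world():", return_tensors="pt").to(engine.device)
loss = engine(input_ids=batch["input_ids"], labels=batch["input_ids"]).loss
engine.backward(loss)
engine.step()
```

The trade-off is speed: parameters are streamed between CPU and GPU each step, so throughput drops, but peak GPU memory stays well within a 24 GB card.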

enijkamp closed this as completed Oct 4, 2022