Add larger GPT models #213

jbloxham · 2022-01-10T23:09:44Z

Time to go bigger :)

Specs copied from https://arxiv.org/pdf/2005.14165.pdf. The only fields changed from the existing GPT models are:
model.gpt2.n_embd
model.gpt2.n_head
model.gpt2.n_inner
model.gpt2.n_layer
optimizer.adamw.lr
train_batch_size
grad_accum
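
As a rough illustration of how the model and optimizer fields above map onto a row of the paper's size table, here is a minimal sketch. The helper `gpt_overrides` is hypothetical and not part of Composer. It assumes `n_inner` is four times `n_embd`, as the paper describes for its feed-forward layers, and uses the paper's "GPT-3 2.7B" row purely as an example. Which sizes this PR actually adds is not stated here.

```python
def gpt_overrides(n_layer: int, n_embd: int, n_head: int, lr: float) -> dict:
    """Build dotted-key overrides for the fields listed above.

    Hypothetical helper for illustration only. Assumes n_inner = 4 * n_embd.
    """
    if n_embd % n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    return {
        "model.gpt2.n_embd": n_embd,
        "model.gpt2.n_head": n_head,
        "model.gpt2.n_inner": 4 * n_embd,  # assumption: feed-forward width is 4x d_model
        "model.gpt2.n_layer": n_layer,
        "optimizer.adamw.lr": lr,
    }


# Example values taken from the "GPT-3 2.7B" row of the paper's model table.
overrides = gpt_overrides(n_layer=32, n_embd=2560, n_head=32, lr=1.6e-4)
for key, value in overrides.items():
    print(f"{key}: {value}")
```

`train_batch_size` and `grad_accum` are left out of this sketch because they depend on the hardware assumptions discussed in the comments that follow.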

moinnadeem · 2022-01-10T23:11:27Z

What machine is grad_accum tuned for?

jbloxham · 2022-01-10T23:12:21Z

> What machine is grad_accum tuned for?

It's just assuming a microbatch size of 1 for any 8-GPU machine. I'm of the opinion that providing pre-tuned grad_accum is counterproductive.
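
The arithmetic behind that assumption can be sketched as follows. This is only an illustration: it assumes `train_batch_size` is the global batch size across all GPUs and that `grad_accum` is the number of microbatches each GPU processes per optimizer step. The function name `grad_accum_for` and the batch size of 512 are hypothetical and not taken from the YAMLs in this PR.

```python
def grad_accum_for(train_batch_size: int, num_gpus: int = 8, microbatch_size: int = 1) -> int:
    """Return the grad_accum that yields the given per-GPU microbatch size.

    Assumes train_batch_size is the global batch size across all GPUs.
    """
    samples_per_pass = num_gpus * microbatch_size  # samples handled in one forward/backward pass
    if train_batch_size % samples_per_pass != 0:
        raise ValueError("train_batch_size must be divisible by num_gpus * microbatch_size")
    return train_batch_size // samples_per_pass


# Illustrative global batch size of 512 on an 8-GPU machine with microbatch size 1.
print(grad_accum_for(512))  # 64
```

On a machine with more memory per GPU, a larger microbatch size would fit and a smaller `grad_accum` would give the same global batch size.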

abhi-mosaic

I think these are fine for now for the purpose of performance benchmarking. In the near future we can update or replace them with GPT-3 convergence run YAMLs.

add larger GPT models

4c6ea99

jbloxham requested review from moinnadeem and abhi-mosaic January 10, 2022 23:09

abhi-mosaic approved these changes Jan 10, 2022

jbloxham merged commit aad5330 into mosaicml:dev Jan 10, 2022

coryMosaicML pushed a commit to coryMosaicML/composer that referenced this pull request Feb 23, 2022

add larger GPT models (mosaicml#213)

331378e
