Multi-gpu with pmap docs #147

Closed
sooheon opened this issue Jan 29, 2021 · 6 comments

sooheon (Contributor) commented Jan 29, 2021

One of the selling points of JAX is the pmap transformation, but best practices for actually parallelizing your training loop are still confusing. What is elegy's story around multi-GPU training? Is it possible to get a PyTorch Lightning-like API, where distribution is enabled with a single argument to model.fit?

cgarciae (Collaborator)

Hey @sooheon

Right now we don't handle that case since we are still defining some of the basic APIs (check #139 if you are interested in PyTorch Lightning-like APIs), but:

  • I believe you can easily use pmap inside your Module; see the sketch below.
  • A flag could be added as you say, but we would have to test how it integrates with hooks like add_loss, add_summary, etc.
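
As a rough illustration of the first point, here is a minimal plain-JAX sketch of wrapping a forward pass with pmap inside a module-style call. The names forward and call and the toy parameters are illustrative only (not elegy's actual Module API), and the batch size is assumed to divide evenly across local devices:

```python
import jax
import jax.numpy as jnp

n_devices = jax.local_device_count()

def forward(params, x):
    # toy single-device forward pass
    return jnp.dot(x, params["w"]) + params["b"]

# pmap the forward pass once; it expects a leading device axis on its inputs
p_forward = jax.pmap(forward)

def call(params, batch):
    # replicate the parameters on every local device
    replicated = jax.device_put_replicated(params, jax.local_devices())
    # shard the batch: [batch, ...] -> [n_devices, batch // n_devices, ...]
    sharded = batch.reshape(n_devices, -1, *batch.shape[1:])
    out = p_forward(replicated, sharded)
    # drop the device axis again before returning
    return out.reshape(-1, *out.shape[2:])
```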

sooheon (Contributor, Author) commented Jan 29, 2021

Adding pmap inside the Module, and remembering to also shape your data in a pmap-friendly way before passing it to .fit, seems to be the default approach then?
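
For the data side, a hedged sketch of what "pmap-friendly" shaping could look like; shard is a hypothetical helper, not an elegy API, and it assumes the batch size is divisible by the number of local devices:

```python
import jax
import numpy as np

n_devices = jax.local_device_count()

def shard(batch: np.ndarray) -> np.ndarray:
    # Add a leading device axis: [batch, ...] -> [n_devices, batch // n_devices, ...]
    return batch.reshape(n_devices, -1, *batch.shape[1:])

x = np.zeros((64, 28, 28, 1), dtype=np.float32)
x_sharded = shard(x)  # shape: (n_devices, 64 // n_devices, 28, 28, 1)
```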

cgarciae (Collaborator) commented Feb 3, 2021

Giving it some thought, I think it would be good to get a working example using pmap with the new low-level API to get a better sense of how to generalize / automate it via simple arguments to Model.

I believe there are multiple ways of doing this; we can check what Keras and PyTorch Lightning propose, but I can think of 2 strategies:

  • Only parallelize the call to the main module; gradients are calculated outside pmap, so they should not have a device dimension.
  • Parallelize everything up to the optimizer call; gradients are calculated inside pmap, so they should have a device dimension (sketched below).

If there are examples in Flax or Haiku, we can get a better sense of how to do this properly.
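
A hedged sketch of the second strategy, assuming a toy linear model and optax as the optimizer library (neither is prescribed by elegy here). The whole step runs inside pmap, so the gradients carry a device dimension and are averaged with lax.pmean before the optimizer update; params and opt_state are assumed to be replicated across devices, x and y sharded:

```python
from functools import partial

import jax
import jax.numpy as jnp
import optax

optimizer = optax.sgd(1e-3)

def loss_fn(params, x, y):
    preds = x @ params["w"] + params["b"]  # toy linear model
    return jnp.mean((preds - y) ** 2)

# Strategy 2: forward, backward and optimizer update all run inside pmap.
@partial(jax.pmap, axis_name="devices")
def train_step(params, opt_state, x, y):
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    # average the per-device gradients so every replica applies the same update
    grads = jax.lax.pmean(grads, axis_name="devices")
    updates, opt_state = optimizer.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss
```

The first strategy would instead pmap only the module call and compute the gradients outside of it, avoiding the device dimension on the gradients entirely.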

sooheon (Contributor, Author) commented Feb 4, 2021

Yeah, I was trying to do just that: get an example working with the low-level API. Some examples to look at:
flax transformer
haiku imagenet

Looks like they both use strategy 2, IIUC.

cgarciae (Collaborator) commented Feb 20, 2021

@sooheon I think this is a good first step in this direction:

https://github.com/poets-ai/elegy/blob/master/examples/elegy_mnist_conv_pmap.py

We can build on this to add it to ModelCore in a future PR; ideally you would just pass a flag to enable distributed training.

My main concern is how to synchronize the states / batch statistics: are there states that you synchronize and states that you don't? If there are two types of state, we might need to expand the GeneralizedModule API to differentiate them across all frameworks, e.g. instead of params, states, have params, sync_states, states.
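
To make the sync concern concrete, a speculative sketch (the params / sync_states / states split and the helper below are hypothetical, not an existing elegy API): batch statistics that should agree across replicas are averaged with lax.pmean inside the pmap'd step, while per-device state such as an RNG key is left device-local:

```python
from functools import partial

import jax
import jax.numpy as jnp

@partial(jax.pmap, axis_name="devices")
def update_states(sync_states, states, x):
    # local batch statistic on this device's shard
    batch_mean = jnp.mean(x, axis=0)
    # synchronized state: average across devices so every replica agrees
    batch_mean = jax.lax.pmean(batch_mean, axis_name="devices")
    new_sync_states = {
        "running_mean": 0.9 * sync_states["running_mean"] + 0.1 * batch_mean
    }
    # unsynchronized state: stays device-local, e.g. a per-device RNG key
    new_states = {"rng": jax.random.split(states["rng"])[0]}
    return new_sync_states, new_states
```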

cgarciae (Collaborator) commented Nov 9, 2021

Took a while, but it's finally supported in 0.8.1 :)

cgarciae closed this as completed Nov 9, 2021