
How to use l2l.algorithms.MAML correctly with nn.DistributedDataParallel? #170

Closed

AyanamiReiFan opened this issue Aug 13, 2020 · 13 comments

@AyanamiReiFan

This work is awesome!

Using nn.DistributedDataParallel in the following way raises an error when executing learner = maml.clone().
How do I use it correctly? Should I wrap MyModel with nn.DistributedDataParallel first and then apply MAML?
Thanks!

model = MyModel()
maml = l2l.algorithms.MAML(model, lr=0.5)
model = nn.DistributedDataParallel(model, device_ids=[rank])
...
learner = maml.clone()  # raises an error here

@seba-1511
Member

Hello @AyanamiReiFan, and thanks for the kind words.

Parallelizing MAML with DistributedDataParallel is a bit tricky as the implementation relies on gradient hooks which don't play well with clone/grad. If you want to use torch.distributed to parallelize the training loop, cherry's Distributed optimizer is another option:

import torch.optim as optim
from cherry.optim import Distributed

opt = optim.Adam(model.parameters())
opt = Distributed(model.parameters(), opt, sync=1)  # synchronizes the update across processes
# Training code
opt.step()

If you want to parallelize the model over GPUs, I would use torch.nn.DataParallel:

learner = maml.clone()
learner = torch.nn.DataParallel(learner, device_ids=[0, 1])
# Training code

Let me know if you ever find a solution to using DistributedDataParallel, I'd be curious to know the solution.
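
For concreteness, a minimal end-to-end sketch of how the cherry approach might look (not from the docs; the toy nn.Linear model and random batches are placeholders, and it assumes the script is launched with torch.distributed.launch or torchrun so that init_process_group can pick up the process group):

import torch
import torch.distributed as dist
import torch.nn as nn
import torch.optim as optim
import learn2learn as l2l
from cherry.optim import Distributed

def main():
    dist.init_process_group(backend='gloo')            # 'nccl' when each process owns a GPU
    model = nn.Linear(10, 2)                           # toy model standing in for MyModel
    maml = l2l.algorithms.MAML(model, lr=0.5)
    opt = optim.Adam(maml.parameters())
    opt = Distributed(maml.parameters(), opt, sync=1)  # synchronized meta-optimizer
    loss_fn = nn.CrossEntropyLoss()

    for iteration in range(10):
        opt.zero_grad()
        learner = maml.clone()
        # Toy support/query batches standing in for a sampled task.
        x_s, y_s = torch.randn(5, 10), torch.randint(0, 2, (5,))
        x_q, y_q = torch.randn(5, 10), torch.randint(0, 2, (5,))
        learner.adapt(loss_fn(learner(x_s), y_s))      # inner-loop adaptation
        loss_fn(learner(x_q), y_q).backward()          # meta (outer-loop) loss
        opt.step()                                     # update synchronized across processes

if __name__ == '__main__':
    main()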

@AyanamiReiFan
Author


Thanks very much!
My main goal is to accelerate training by using multiple GPUs.
I have tried torch.nn.DataParallel to parallelize the model over GPUs, but after learner = torch.nn.DataParallel(learner, device_ids=[0, 1]) I have to call learner.module.adapt to run the adaptation step, and that call is not parallelized. Do you have any suggestions?

@AyanamiReiFan
Author

I'm training a 1-way 5-shot segmentation model with MAML, so the training batch size can only be 5.

So I don't think parallelizing the adapt and evaluation steps within each iteration will speed things up much, i.e. with:

learner = maml.clone()
learner = torch.nn.DataParallel(learner, device_ids=[0, 1])

That is why I tried to parallelize the MAML module itself and have it compute different batches on multiple GPUs, but my attempt with DistributedDataParallel seems to be wrong. Do you have any suggestions?

Thanks very much!

@janbolle
Contributor

Hello @AyanamiReiFan,
I actually wrote a paper about parallelizing MAML.
I implemented it using Ray, with n separate learners (n being the number of GPUs you want to use). After training them separately, I averaged the weights of the learners into a central learner.

Maybe this helps.
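
Roughly, the pattern looks like the sketch below (not the paper's code; the per-learner training loop here is a toy supervised loop standing in for the actual MAML training, and the model is a placeholder):

import copy
import ray
import torch
import torch.nn as nn

@ray.remote
def train_learner(state_dict, steps=100):
    # One independent learner, started from the shared weights (toy training loop).
    model = nn.Linear(10, 2)                 # stand-in for the real learner
    model.load_state_dict(state_dict)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x, y = torch.randn(5, 10), torch.randint(0, 2, (5,))
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model.state_dict()

def average_state_dicts(state_dicts):
    # Average the weights of the n learners into a central learner.
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg

ray.init()
central = nn.Linear(10, 2)
n = 2  # number of parallel learners, e.g. one per GPU (use @ray.remote(num_gpus=1) for GPU workers)
futures = [train_learner.remote(central.state_dict()) for _ in range(n)]
central.load_state_dict(average_state_dicts(ray.get(futures)))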

@AyanamiReiFan
Author

Thanks very much! @janbolle
It's really helpful!

@Kulbear

Kulbear commented Sep 3, 2020

@janbolle That's exciting work!
A few questions about your paper:

  1. It is implemented using Ray, so I suppose it works perfectly on CPU. Have you tried a GPU version of it?
  2. Do you have plans to release the implementation publicly?

Thank you!

@janbolle
Contributor

janbolle commented Sep 5, 2020

@Kulbear

  1. I did not use a GPU version, as the batch sizes and networks are relatively small and I suppose it would not speed things up much in this setting - but you could easily implement one, since Ray also supports GPUs.
  2. I only ran experiments for regression and classification. Also, the implementation is done in TF 2.0. Would it be helpful for you?

@Kulbear

Kulbear commented Sep 5, 2020

@janbolle Thanks for the reply!

  1. Got it, I can give it a try.
  2. This one is more out of personal curiosity :) If you're willing to share, that would be nice, but please feel free to do so (or not) at your convenience! I'm pretty new to TF 2.0; if I remember correctly, I switched to PyTorch around... well... maybe TF 1.9...

:D

@seba-1511
Member

Closing since dormant. Feel free to reopen.

@zhaozj89

zhaozj89 commented Nov 14, 2020

I have a large batch that cannot fit on a single 2080 Ti GPU (11 GB). I have tried:

learner = maml.clone()
learner = torch.nn.DataParallel(learner, device_ids=[0, 1])
# Training code

But all memory still goes to one GPU. Is there an easy way to get around this? Thanks.

@seba-1511
Member

@zhaozj89 This worked for me:

learner = maml.clone()
learner.module = torch.nn.DataParallel(learner.module, device_ids=[0, 1])  # wrap the inner module, not the MAML wrapper
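
For reference, a toy usage sketch of that trick (placeholder model and random data; whether the inner-loop gradients behave exactly as expected through DataParallel is worth verifying on your own setup):

import torch
import torch.nn as nn
import learn2learn as l2l

model = nn.Linear(10, 2).cuda()              # toy model
maml = l2l.algorithms.MAML(model, lr=0.5)
loss_fn = nn.CrossEntropyLoss()

learner = maml.clone()
# Wrap the inner module so forward passes are split across GPUs,
# while learner.adapt() stays available on the MAML wrapper.
learner.module = torch.nn.DataParallel(learner.module, device_ids=[0, 1])

x_s, y_s = torch.randn(8, 10).cuda(), torch.randint(0, 2, (8,)).cuda()
x_q, y_q = torch.randn(8, 10).cuda(), torch.randint(0, 2, (8,)).cuda()
learner.adapt(loss_fn(learner(x_s), y_s))    # inner-loop step, forward is data-parallel
meta_loss = loss_fn(learner(x_q), y_q)
meta_loss.backward()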

@brando90


See this thread: #197. It seems you can use a Lightning wrapper to parallelize MAML. I haven't tried it myself yet, but I assume it works. DDP seems tricky to get working for technical reasons I don't understand.

@SungFeng-Huang


Hi, here's my implementation of parallel MAML using learn2learn's LightningMAML + PyTorch Lightning DDP: https://gist.github.com/SungFeng-Huang/dec22eef5650f5a74d24a732ffd0080f
It should work when you add the argument "--meta_task_ddp".
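
To show the DDP mechanics on their own, here is a bare-bones sketch with a recent PyTorch Lightning version; the MetaLearner below is just a toy stand-in (plain supervised training) for the LightningMAML-based module defined in the gist:

import torch
import torch.nn as nn
import pytorch_lightning as pl

class MetaLearner(pl.LightningModule):
    # Hypothetical stand-in for the LightningMAML-based module from the gist above.
    def __init__(self):
        super().__init__()
        self.model = nn.Linear(10, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.model(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

train_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=8,
)
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp", max_epochs=1)
trainer.fit(MetaLearner(), train_loader)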
