
Multi-GPU support for the PyTorch bindings? #63

Closed
yongsiang-fb opened this issue Mar 14, 2022 · 5 comments · Fixed by #120
@yongsiang-fb

yongsiang-fb commented Mar 14, 2022

Hi,

I found that the following code would fail:

import torch as th
import tinycudann as tcnn
config = {
    "otype": "FullyFusedMLP",
    "activation": "ReLU",
    "output_activation": "None",
    "n_neurons": 64,
    "n_hidden_layers": 1,
}
net = tcnn.Network(16, 16, config)
net = net.to("cuda:1")
out = net(th.rand((256, 16), device="cuda:1"))

It seems the module does not properly support running on a GPU other than the default one, even after calling .to(device). Is it possible to fix this?

In addition, I also tried using torch.nn.DataParallel together with the hash encoding and the tiny MLP, and that fails as well. Is it possible to fix this? Thanks a lot!

@Tom94
Collaborator

Tom94 commented Mar 14, 2022

Hi there, yes, unfortunately tiny-cuda-nn does not support multi-GPU operation as of now. This is something that'd be cool to have in the future, but currently is not a high priority.

I'm going to leave this issue open to serve as a TODO marker.

Cheers!

@yongsiang-fb
Author

@Tom94 Thanks for the quick response! Hopefully it can be implemented one day. But even without multi-GPU support for now, shouldn't it still be possible to support the single-GPU case where both the input and the network are on cuda:1? In particular, I think we probably just need a CUDAGuard to change the default GPU to the one net.params is using?

I am guessing the error was caused by a mismatch between the default GPU and the GPU of the inputs/net.params. I even have some hope that th.nn.DataParallel would also work as long as there is a CUDAGuard, but maybe that's just wishful thinking. Would be great to hear your thoughts. Thanks!
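A sketch of the proposed workaround (forward_on is a hypothetical helper, not part of tcnn; it assumes a CUDA-enabled PyTorch build and is untested against tiny-cuda-nn itself):

```python
def forward_on(module, x, device):
    """Hypothetical workaround: make `device` the current CUDA device
    for the duration of the call, then restore the previous one.

    torch.cuda.device is a context manager that switches the current
    device on entry and restores it on exit, i.e. the Python-side
    counterpart of a C++ CUDAGuard.
    """
    import torch  # imported here so the sketch parses without torch installed
    with torch.cuda.device(device):
        return module(x.to(device))
```

Usage would be something like `out = forward_on(net, th.rand(256, 16), "cuda:1")`.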

@Tom94
Collaborator

Tom94 commented Mar 16, 2022

tcnn uses whichever CUDA device is "current" on the CPU thread, i.e. the device returned by cudaGetDevice. This device needs to remain the same across all calls into tcnn.

If CUDAGuard controls this (I haven't tried it myself), then yes, it should work the way you describe. If not, please let me know and I can add a tcnn-specific version of CUDAGuard that'll do the trick.

@yongsiang-fb
Author

Yes, you are absolutely right: it's the "current" device rather than the "default" device. CUDAGuard stores the original current device and sets the current device to the specified one. When the CUDAGuard goes out of scope, it restores the original current device.
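That save/set/restore behavior can be sketched with a GPU-free mock (DeviceGuard and _current are illustrative names, not part of any real API; _current stands in for the driver's per-thread current device, i.e. cudaGetDevice/cudaSetDevice):

```python
_current = 0  # device 0 is current initially

class DeviceGuard:
    """Minimal mock of CUDAGuard's save/set/restore semantics."""
    def __init__(self, device):
        self.device = device

    def __enter__(self):
        global _current
        self.saved = _current   # store the original current device
        _current = self.device  # make the requested device current
        return self

    def __exit__(self, *exc):
        global _current
        _current = self.saved   # restore on scope exit

with DeviceGuard(1):
    inside = _current   # 1 while the guard is alive
outside = _current      # back to 0 after the guard exits
```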

@MultiPath

MultiPath commented Jul 11, 2022

Hi, do you have any recent plans to support multi-GPU training? Or do you have any hints about why the current code prevents using multiple GPUs with PyTorch distributed training? I think it would be super useful to be able to train large-scale models with tiny MLPs using tinycudann. Thanks!
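For reference, the standard per-process DistributedDataParallel pattern looks like the sketch below (ddp_worker is a hypothetical helper; it assumes a CUDA build of PyTorch with NCCL, and whether tcnn works under it depends on the multi-GPU fix tracked in this issue, since tcnn needs cudaGetDevice to stay constant across all its calls):

```python
def ddp_worker(local_rank, world_size):
    """Sketch of one torch.distributed process (untested with tcnn)."""
    import torch
    import torch.distributed as dist
    import tinycudann as tcnn

    # Make this process's GPU current *before* constructing any tcnn
    # module, and never change it afterwards, so every call into tcnn
    # sees the same cudaGetDevice() result.
    torch.cuda.set_device(local_rank)
    dist.init_process_group("nccl", rank=local_rank, world_size=world_size)

    net = tcnn.Network(16, 16, {
        "otype": "FullyFusedMLP",
        "activation": "ReLU",
        "output_activation": "None",
        "n_neurons": 64,
        "n_hidden_layers": 1,
    }).to(f"cuda:{local_rank}")
    net = torch.nn.parallel.DistributedDataParallel(
        net, device_ids=[local_rank])
    return net
```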
