Is it possible to parallelize computations across GPUs? #27
Hi @KeAWang, I know that @benoitmartin88 and @bcharlier worked on this for the Deformetrica software ~6 months ago and got it to work (I think). They may be busy at the moment (we're working on the R backends, with deadlines on Tuesday and Wednesday), but they will probably be able to answer your question. Best regards,
Hello, @bcharlier told me about this issue some time ago and I forgot to write down the answer I got. I think there is some confusion about the parallelization/asynchronous features at the language level. PyTorch is fully capable of distributing workloads asynchronously across multiple GPUs. The solution is to instantiate tensors directly on the different GPUs and to use Python's asynchronous/multiprocessing facilities, such as multiprocessing or asyncio. I'll provide a minimal example.
Hi @KeAWang, @bcharlier and I took a look at your issue. Here is a working example:

```python
import torch
import torch.multiprocessing as mp

from pykeops.torch import Genred

def work(d):
    # compile a KeOps squared-norm reduction and run it on GPU `d`
    _my_conv = Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
    _x = torch.randn(1000000, 3).to(d)
    _y = torch.randn(2000000, 3).to(d)
    return _my_conv(_x, _y, device_id=d).cpu()

if __name__ == '__main__':
    # an attempt at an asynchronous call to KeOps
    mp.set_start_method("spawn", force=True)
    pool = mp.Pool(processes=3)
    res = pool.map(work, range(3))
```

Please note that using Python's multiprocessing will cost you the instantiation of the spawned Python processes.
Thank you so much @benoitmartin88 and @fradav! I will try this out.
The code you provided works great! However, it seems that PyTorch is unable to maintain the autograd graph across processes. Sadly, this means that using multiple GPUs with PyKeOps through multiprocessing means you won't be able to use PyTorch's automatic differentiation...
Hi @KeAWang, The multiprocessing package is used in Deformetrica to dispatch the computation load onto several GPUs, and Deformetrica needs the gradients to perform the estimation process... Well, I am not sure, but I would bet that it is still possible (with some work) to keep the autograd graph alive with the map function. @benoitmartin88, can you confirm? Best, b.
Hi @KeAWang, As @bcharlier mentioned, we do use PyTorch tensors in a multiprocessing context within Deformetrica. I hope this helps.
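One common workaround, sketched below under assumptions (this is not Deformetrica's actual code), is to run both the forward and the backward pass inside each worker and return only the resulting gradient tensors: gradients are plain leaf tensors, so they can be pickled back to the parent process even though the graph itself cannot.

```python
import torch
import torch.multiprocessing as mp

def work(seed):
    # hypothetical per-worker task: build the graph, backpropagate locally,
    # and return plain gradient tensors instead of live graph nodes
    torch.manual_seed(seed)
    x = torch.randn(8, requires_grad=True)
    loss = (x ** 2).sum()
    loss.backward()
    return x.grad          # a leaf tensor: safe to send across processes

if __name__ == '__main__':
    mp.set_start_method("spawn", force=True)
    with mp.Pool(processes=2) as pool:
        grads = pool.map(work, [0, 1])
```

The parent then aggregates the returned gradients itself (e.g. by summing them before an optimizer step), instead of calling `.backward()` on a combined loss.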
Hi,
I'm trying to parallelize computations on multiple GPUs with KeOps, but it seems like the computation happens sequentially across the GPUs. What I'm doing is:
However, according to the GPU usage in `nvidia-smi`, the matrix multiplications are happening sequentially, since only one GPU has 100% utilization at a time. On the other hand, in PyTorch for example, the following will dispatch the computations in parallel and all GPUs will simultaneously have high usage:
Is there any way to do KeOps computations on each GPU in parallel in the same way?
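For comparison, here is a minimal sketch of the kind of asynchronous dispatch plain PyTorch provides (my own illustration; it falls back to a single CPU device when no GPU is available): CUDA kernels are queued on each device's stream and return control to the host immediately, so a simple Python loop over devices overlaps the work.

```python
import torch

# enumerate the available GPUs, falling back to CPU so the sketch still runs
devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())] \
          or [torch.device("cpu")]

xs = [torch.randn(512, 512, device=d) for d in devices]

# each matmul is queued on its device without blocking the host,
# so kernels on different GPUs run concurrently
ys = [x @ x for x in xs]

# synchronize only when the results are actually needed
if torch.cuda.is_available():
    torch.cuda.synchronize()

results = [y.cpu() for y in ys]
```
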