Issue when there is a discrepancy between available CUDA devices at build time / runtime #39

Closed
timlacroix opened this issue Dec 9, 2019 · 8 comments


@timlacroix

Hey, first off, thanks for the library!

I ran into some weird issues today when trying to use a kernel on 'cuda:1' when the kernel had been built on a machine with only 2 GPUs. I hit this because I use a shared home filesystem (and hence a shared .cache folder) on a cluster where I have access to machines with various numbers of GPUs.

Here is how to reproduce, on a machine with 2 GPUs:

test.py:

import torch
from pykeops.torch import LazyTensor

def test(data):
	neigh_state = LazyTensor(data[None, :, :])
	state = LazyTensor(data[:, None, :])
	all_distances = ((neigh_state - state) ** 2).sum(dim=2)
	return (- all_distances).logsumexp(dim=1)

tensor = torch.randn(10,128).to('cuda:0')
print(torch.cuda.device_count())
test(tensor)

Run CUDA_VISIBLE_DEVICES=0 python test.py. This should build a kernel.
Then change 'cuda:0' to 'cuda:1' in test.py.
Run python test.py.

This fails with the error:
invalid Gpu device number. If the number of available Gpus is > 12, add required lines at the end of function SetGpuProps and recompile.

Recompiling is not a great option for me, as I might run different experiments using the same kernel but on machines with different numbers of available GPUs.

@bcharlier
Member

Hi @timlacroix,

when you set CUDA_VISIBLE_DEVICES=0, the nvidia driver exposes only the GPU with id=0. So at compilation time, keops only detects the GPU with id=0. This is thus the expected behavior.

Can you try calling python test.py without setting the env variable CUDA_VISIBLE_DEVICES? It should work as you expect, since you already ask keops to run on GPU 0 through the .to('cuda:0')...

For instance:

import torch
from pykeops.torch import LazyTensor

def test(data):
	neigh_state = LazyTensor(data[None, :, :])
	state = LazyTensor(data[:, None, :])
	all_distances = ((neigh_state - state) ** 2).sum(dim=2)
	return (- all_distances).logsumexp(dim=1)

tensor = torch.randn(10,128).to('cuda:0')
test(tensor) # should run on gpu 0

tensor1 = torch.randn(10,128).to('cuda:1')
print(torch.cuda.device_count())
test(tensor1) # should run on gpu 1... without recompiling

@timlacroix
Author

timlacroix commented Dec 12, 2019

Hi, I used CUDA_VISIBLE_DEVICES here just to make the problem reproducible.

My question is about a set-up where development (and thus compilation) happens on a machine with N GPUs and testing happens on a machine with M GPUs, with both sharing the same compilation cache.

Couldn't the number of GPUs available at compile time be included in the compiled code hash? This way, changing the number of GPUs would just force a rebuild rather than raise an error.
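
To make the idea concrete, here is a rough, purely illustrative sketch (this is not how pykeops actually computes its hashes, and the formula string and function name are made up): the visible GPU count simply becomes part of the key that names the compiled shared lib, so changing the GPU count forces a rebuild instead of raising an error.

# Illustrative sketch only -- not pykeops' actual hashing code.
import hashlib

import torch

def kernel_cache_key(formula: str) -> str:
	# Fold the number of GPUs visible right now into the cache key,
	# so a different GPU count maps to a different compiled kernel.
	n_gpus = torch.cuda.device_count()
	payload = "{}|ngpus={}".format(formula, n_gpus).encode()
	return hashlib.sha256(payload).hexdigest()

print(kernel_cache_key("Sum_Reduction(Exp(-SqDist(x, y)), 0)"))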

@bcharlier
Member

Maybe @joanglaunes knows this better than me, but I think it will not be possible to make the same shared lib work on 2 different systems. Why don't you define 2 separate cache folders?
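
Something like this is what I have in mind (a rough sketch: the helper name set_bin_folder is what I recall from the pykeops 1.x docs, so double-check it against the installed version's API): derive a per-node cache folder from the hostname before any formula is compiled.

import os
import socket

import pykeops

# One cache folder per node, e.g. ~/.cache/pykeops-node042, so machines with
# different GPU setups never share compiled kernels.
cache_dir = os.path.expanduser("~/.cache/pykeops-" + socket.gethostname())
os.makedirs(cache_dir, exist_ok=True)
pykeops.set_bin_folder(cache_dir)  # helper name to verify against the installed pykeops version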

@bcharlier
Member

> Hi, I used CUDA_VISIBLE_DEVICES here just to make the problem reproducible.
>
> My question is about a set-up where development (and thus compilation) happens on a machine with N GPUs and testing happens on a machine with M GPUs, with both sharing the same compilation cache.
>
> Couldn't the number of GPUs available at compile time be included in the compiled code hash? This way, changing the number of GPUs would just force a rebuild rather than raise an error.

OK, a quick solution could be: include the number of GPUs and their respective arch in the name of the cache folder. So when you call your code from a different node, it will get the shared lib from the right cache dir.
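
A rough sketch of what that naming could be based on (illustrative only: pykeops currently gathers this info in its CMake scripts, so the real implementation would differ, and gpu_config_suffix is just a made-up name): query the visible devices through torch.cuda and build a suffix from their count and compute capabilities.

import torch

def gpu_config_suffix():
	# e.g. "2gpus_sm70-sm70" on a node with two V100s, "cpu" if no GPU is visible
	caps = ["sm{}{}".format(*torch.cuda.get_device_capability(i))
	        for i in range(torch.cuda.device_count())]
	return "{}gpus_{}".format(len(caps), "-".join(caps)) if caps else "cpu"

print(gpu_config_suffix())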

@timlacroix
Author

@bcharlier yes, if that's possible, that would be great :)

@bcharlier
Member

bcharlier commented Dec 12, 2019

Is the hostname unique in your case? I mean, is one of these outputs different on the various nodes:

import platform
print(platform.node())

import socket
print(socket.gethostname())

@timlacroix
Author

Both are different on the various nodes.
(However, I might want to vary the number of GPUs available at runtime on the same machine: for example, while developing, I might have two things running on 1 GPU each, then at some point want to try 1 thing on 2 GPUs...)

I don't know if including the hostname is a good idea. It would mean using a separate cache folder per machine, which the user can already do if necessary by just using a random cache folder at runtime. In my case, I would be happy to re-use the same cache across the various nodes of the cluster.

@joanglaunes
Contributor

Hello @timlacroix ,
In fact, the technical problem for us is that the detection of GPUs and their properties is currently done at compilation time, in the CMake scripts that are launched after the Python code has detected that a compilation is needed.
So, as @bcharlier suggests, the easiest solution for us is to include the hostname and node (+ maybe the content of CUDA_VISIBLE_DEVICES) in the hash code, because this is easy to do in Python.
That said, including the GPU properties in the hash code is maybe not so difficult either; I guess it can be done with GPUtil...

@gdurif closed this as completed in 726eb51 on Jan 7, 2020