
Get wrong results when running random walk on gpu #43

Closed
EricGz opened this issue Jan 4, 2020 · 11 comments

EricGz commented Jan 4, 2020

Hi, I get wrong results when running random walk on GPU. Please help!
The demo below reproduces the issue:

import torch
from torch_cluster import random_walk

device = 'cpu'
# device = 'cuda:0'

num_nodes = 3
walk_length = 3
p = 1
q = 1
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]]).to(device)
subset = torch.arange(num_nodes, device=edge_index.device)

rw = random_walk(edge_index[0], edge_index[1], subset,
                 walk_length, p, q, num_nodes)
print(rw)

There are three nodes and two edges in the graph. When I ran this code on cpu, I got the following results:

tensor([[0, 1, 0, 1],
        [1, 0, 1, 0],
        [2, 1, 2, 1]])

However, when I ran this code on gpu, the results became:

tensor([[-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1]], device='cuda:0')

Do you have any idea about it?
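For reference, the expected behavior can be reproduced without the compiled extension at all. Below is a minimal pure-PyTorch sketch of a uniform random walk (the p = q = 1 case), using the same (row, col) edge layout; `uniform_random_walk` is a hypothetical helper for cross-checking, not part of torch_cluster, and individual steps are random, so only the structure of the output matches, not the exact values:

```python
import torch

def uniform_random_walk(row, col, start, walk_length):
    # Uniform random walk (p = q = 1): at each step, move to a neighbor
    # chosen uniformly at random; stay put if a node has no neighbors.
    num_nodes = int(torch.cat([row, col]).max()) + 1
    neighbors = [col[row == n] for n in range(num_nodes)]
    cur = start.clone()
    walk = [cur.clone()]
    for _ in range(walk_length):
        for i in range(cur.numel()):
            nbrs = neighbors[int(cur[i])]
            if nbrs.numel() > 0:
                idx = torch.randint(nbrs.numel(), (1,)).item()
                cur[i] = nbrs[idx]
        walk.append(cur.clone())
    # Shape: (num_start_nodes, walk_length + 1), like the extension's output.
    return torch.stack(walk, dim=1)

row = torch.tensor([0, 1, 1, 2])
col = torch.tensor([1, 0, 2, 1])
rw = uniform_random_walk(row, col, torch.arange(3), walk_length=3)
print(rw)
```

Every entry should be a valid node id and every consecutive pair a valid edge, never -1.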


rusty1s commented Jan 4, 2020

Thanks for reporting. I will look into it. I guess python setup.py test also fails for you?


EricGz commented Jan 4, 2020

Thank you for looking into it! You are right. The test failed.


rusty1s commented Jan 4, 2020

Do all GPU tests fail?


EricGz commented Jan 4, 2020

Yes, I think so. 55 failed and 56 passed. All the failed ones are GPU tests.


rusty1s commented Jan 4, 2020

Ok, so this is not a problem with the random walk function but with the installation of torch-cluster. Can you post the log of

rm -rf build && python setup.py install


EricGz commented Jan 4, 2020

Here's the log. log.txt


EricGz commented Jan 6, 2020

It seems the whole installation went fine. However, I still get wrong results when running random walk on the GPU. Do you have any idea what went wrong?


rusty1s commented Jan 6, 2020

Unfortunately no :( The logs look okay to me. Maybe you have multiple versions installed, where one installation failed? You can try removing torch-cluster repeatedly and installing it again.


EricGz commented Jan 14, 2020

Hi @rusty1s, thanks for your timely reply. I tried your suggestion, but the problem is still unsolved.

I ran some other tests and got more information about this error. When I used the GPU versions of scatter_max and scatter_min from the torch_scatter package, I hit the same error, and the interesting thing is that the GPU versions of scatter_add and scatter_mean worked fine.

Maybe scatter_max and random_walk have something in common that causes the error?

P.S. Here are the test results for scatter_max and scatter_add:

import torch
from torch_scatter import *

# device = 'cpu'
device = 'cuda:1'

src = torch.tensor([[1., 1.], [1., 1.], [4., 2.], [2., 4.]]).to(device)
index = torch.tensor([0, 0, 1, 1]).to(device)
index = index.view(-1, 1).repeat(1, src.size(1))

res1, _ = scatter_max(src, index, dim=0, fill_value=1.)
res2 = scatter_add(src, index, dim=0, dim_size=2, fill_value=0.)

print(res1)
print(res2)

The results are

tensor([[1., 1.],
        [1., 1.]], device='cuda:1')
tensor([[2., 2.],
        [6., 6.]], device='cuda:1')

I tried to debug it, and I found that line 13, func(src, index, out, arg, dim), of max.py did not change the variable out at all. Do you have any clue about what caused the problem?
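For what it's worth, the scatter results above can be cross-checked without any torch_scatter kernels at all, using only built-in PyTorch ops. A minimal sketch of that idea (index_add_ covers the add case, a plain loop covers the max case):

```python
import torch

# Reference results for the scatter test above, computed with built-in
# PyTorch ops only, so no torch_scatter CUDA kernels are involved.
src = torch.tensor([[1., 1.], [1., 1.], [4., 2.], [2., 4.]])
index = torch.tensor([0, 0, 1, 1])

# scatter_add reference: index_add_ accumulates rows of src into out[index].
ref_add = torch.zeros(2, 2).index_add_(0, index, src)

# scatter_max reference: a plain loop taking the elementwise max per group.
ref_max = torch.full((2, 2), float('-inf'))
for i in range(src.size(0)):
    ref_max[index[i]] = torch.max(ref_max[index[i]], src[i])

print(ref_add)  # tensor([[2., 2.], [6., 6.]])
print(ref_max)  # tensor([[1., 1.], [4., 4.]])
```

The scatter_add output above matches this reference, while the scatter_max output does not, which fits the observation that only the max/min kernels misbehave.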


rusty1s commented Jan 14, 2020

Yeah, those are the functions that call our own kernel implementations. It seems that there is something wrong with your GPU setup in conjunction with the provided CUDA code.
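One plausible culprit when extension kernels launch but leave their outputs untouched is a mismatch between the CUDA toolkit used to build the extension and the one PyTorch was built with, or a GPU compute capability the kernels were not compiled for. A small diagnostic sketch that just prints the relevant versions (it makes no fix, only surfaces the environment):

```python
import torch

# Print the CUDA version PyTorch was built with and each GPU's compute
# capability; a kernel compiled without a matching architecture can appear
# to run yet write nothing, matching the unchanged `out` tensor seen above.
print('PyTorch version:', torch.__version__)
print('Built with CUDA:', torch.version.cuda)
print('CUDA available: ', torch.cuda.is_available())
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        print(f'cuda:{i} {name} (compute capability {major}.{minor})')
```

Comparing `torch.version.cuda` against the `nvcc --version` used to build torch-cluster would be the next step.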

@github-actions

This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?
