Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange behaviour in radius_kernel.cu #11

Closed
matherm opened this issue Jan 23, 2019 · 5 comments
Closed

Strange behaviour in radius_kernel.cu #11

matherm opened this issue Jan 23, 2019 · 5 comments

Comments

@matherm
Copy link

matherm commented Jan 23, 2019

Hi,

I face a very strange behaviour with the radius_kernel.cu on MASTER causing different results on CPU and GPU.
My Test-Scenario is as follows:

import numpy as np 
from torch_geometric.nn import radius
import torch

def grid3d(m, dtype=np.float32):
    x_ = np.linspace(-1, 1, m)
    y_ = np.linspace(-1, 1, m)
    z_ = np.linspace(-1, 1, m)
    x, y, z = np.meshgrid(x_, y_, z_, indexing='ij')
    P = np.array([x,y,z]).T.reshape(-1, 3)
    return P


# transform to pytorch
# transform to pytorch
grid = torch.from_numpy(grid3d(100)).float()
centroids = torch.from_numpy(np.array([[0.5,0.5,0.5],
                                       [0.5,1.0,5.0],
                                       [1.0,1.0,1.0],
                                       [0.5,1.0,0.0]])).float()

batch_grid = torch.from_numpy(np.zeros(grid.shape[0])).long()
batch_centroids = torch.from_numpy(np.zeros(centroids.shape[0])).long()

# compute radius search
edge_index = radius(grid, centroids, 0.5,  batch_grid, batch_centroids, max_num_neighbors=32)  
retrieved_points = grid[edge_index[1].unique()]
returned_centroids = centroids[edge_index[0].unique()]

# tp numpy
retrieved_points_cpu = retrieved_points.cpu().numpy()
returned_centroids_cpu = returned_centroids.cpu().numpy()


# compute radius search
device = "cuda"
edge_index = radius(grid.to(device), centroids.to(device), 0.5,  batch_grid.to(device), batch_centroids.to(device), max_num_neighbors=32)  
retrieved_points = grid[edge_index[1].unique()]
returned_centroids = centroids[edge_index[0].unique()]

# tp numpy
retrieved_points_gpu = retrieved_points.cpu().numpy()
returned_centroids_gpu = returned_centroids.cpu().numpy()


assert np.allclose(retrieved_points_cpu, retrieved_points_gpu)
assert np.allclose(returned_centroids_cpu, returned_centroids_gpu)

I searched a lot and it seems that the inner loop of radius_kernel.cu runs too often.
This results in too many comparisons and some points are retrieved multiple times.

Do you have an idea what causes the following debug-output?

  for (ptrdiff_t n_y = start_idx_y + idx; n_y < end_idx_y; n_y += THREADS) {
    size_t count = 0;
    printf("Before ny %d: batch_idx: %d \n idx: %d \n start_idx_x: %d \n end_idx_x: %d  \n start_idx_y: %d  \n end_idx_y: %d \n", n_y, batch_idx, idx, start_idx_x, end_idx_x, start_idx_y, end_idx_y);
    for (ptrdiff_t n_x = start_idx_x ; n_x < end_idx_x; n_x++) {
      printf("WTF?? nx %d: batch_idx: %d \n idx: %d \n start_idx_x: %d \n end_idx_x: %d  \n start_idx_y: %d  \n end_idx_y: %d \n", n_x, batch_idx, idx, start_idx_x, end_idx_x, start_idx_y, end_idx_y);
      /**
       *  > out.txt
       *  WTF??  nx 1: batch_idx: 0 
          idx: 0 
          start_idx_x: 0 
          end_idx_x: 0  
          start_idx_y: 0  
          end_idx_y: 0 
       */ 
     ...

This printf should never occur or what am I overseeing?

EDIT

My failure was that I tried to printf with %d specifier which is wrong. %ld is the right for ptrdiff_t datatype (assuming a long cast e.g. (long) start_idx_x).
The correct printf is:
printf("WTF? nx %ld: batch_idx: %ld \n idx: %ld \n start_idx_x: %ld \n end_idx_x: %ld \n start_idx_y: %ld \n end_idx_y: %ld \n", (long) n_x, (long) batch_idx, (long) idx, (long) start_idx_x, (long) end_idx_x, (long) start_idx_y, (long) end_idx_y);

Anyway, the function still returns different results for CPU and GPU. I updated the test-case.

Best regards,
Matthias

@rusty1s
Copy link
Owner

rusty1s commented Jan 24, 2019

Thanks, I will check this.

@rusty1s
Copy link
Owner

rusty1s commented Jan 24, 2019

I have identified the error and fixed it. The error occurred in the GPU version in case there were more potential neighbors than max_neighbors allowed. Sorry for the inconvenience.

@rusty1s rusty1s closed this as completed Jan 24, 2019
@matherm
Copy link
Author

matherm commented Jan 24, 2019

Yeah, break vs. continue :-)

Thanks.

@matherm
Copy link
Author

matherm commented Apr 4, 2019

Since the last update this test fails again in my environment. I checked the diff and could not find something suspicous (I attached the file).
test_radius_search.txt

I get the following error:
Line 60: assert np.allclose(retrieved_points_cpu, retrieved_points_gpu)
>> ValueError: operands could not be broadcast together with shapes (128,3) (125,3)

Can you check please? Thanks in advance!

@rusty1s
Copy link
Owner

rusty1s commented Apr 4, 2019

Try increasing the max_num_neighbors argument. This works fine when setting it to 100.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants