Strange behaviour in radius_kernel.cu #11

matherm · 2019-01-23T17:11:21Z

Hi,

I face a very strange behaviour with the radius_kernel.cu on MASTER causing different results on CPU and GPU.
My Test-Scenario is as follows:

import numpy as np 
from torch_geometric.nn import radius
import torch

def grid3d(m, dtype=np.float32):
    x_ = np.linspace(-1, 1, m)
    y_ = np.linspace(-1, 1, m)
    z_ = np.linspace(-1, 1, m)
    x, y, z = np.meshgrid(x_, y_, z_, indexing='ij')
    P = np.array([x,y,z]).T.reshape(-1, 3)
    return P


# transform to pytorch
# transform to pytorch
grid = torch.from_numpy(grid3d(100)).float()
centroids = torch.from_numpy(np.array([[0.5,0.5,0.5],
                                       [0.5,1.0,5.0],
                                       [1.0,1.0,1.0],
                                       [0.5,1.0,0.0]])).float()

batch_grid = torch.from_numpy(np.zeros(grid.shape[0])).long()
batch_centroids = torch.from_numpy(np.zeros(centroids.shape[0])).long()

# compute radius search
edge_index = radius(grid, centroids, 0.5,  batch_grid, batch_centroids, max_num_neighbors=32)  
retrieved_points = grid[edge_index[1].unique()]
returned_centroids = centroids[edge_index[0].unique()]

# tp numpy
retrieved_points_cpu = retrieved_points.cpu().numpy()
returned_centroids_cpu = returned_centroids.cpu().numpy()


# compute radius search
device = "cuda"
edge_index = radius(grid.to(device), centroids.to(device), 0.5,  batch_grid.to(device), batch_centroids.to(device), max_num_neighbors=32)  
retrieved_points = grid[edge_index[1].unique()]
returned_centroids = centroids[edge_index[0].unique()]

# tp numpy
retrieved_points_gpu = retrieved_points.cpu().numpy()
returned_centroids_gpu = returned_centroids.cpu().numpy()


assert np.allclose(retrieved_points_cpu, retrieved_points_gpu)
assert np.allclose(returned_centroids_cpu, returned_centroids_gpu)

I searched a lot and it seems that the inner loop of radius_kernel.cu runs too often.
This results in too many comparisons and some points are retrieved multiple times.

Do you have an idea what causes the following debug-output?

  for (ptrdiff_t n_y = start_idx_y + idx; n_y < end_idx_y; n_y += THREADS) {
    size_t count = 0;
    printf("Before ny %d: batch_idx: %d \n idx: %d \n start_idx_x: %d \n end_idx_x: %d  \n start_idx_y: %d  \n end_idx_y: %d \n", n_y, batch_idx, idx, start_idx_x, end_idx_x, start_idx_y, end_idx_y);
    for (ptrdiff_t n_x = start_idx_x ; n_x < end_idx_x; n_x++) {
      printf("WTF?? nx %d: batch_idx: %d \n idx: %d \n start_idx_x: %d \n end_idx_x: %d  \n start_idx_y: %d  \n end_idx_y: %d \n", n_x, batch_idx, idx, start_idx_x, end_idx_x, start_idx_y, end_idx_y);
      /**
       *  > out.txt
       *  WTF??  nx 1: batch_idx: 0 
          idx: 0 
          start_idx_x: 0 
          end_idx_x: 0  
          start_idx_y: 0  
          end_idx_y: 0 
       */ 
     ...

This printf should never occur or what am I overseeing?

EDIT

My failure was that I tried to printf with %d specifier which is wrong. %ld is the right for ptrdiff_t datatype (assuming a long cast e.g. (long) start_idx_x).
The correct printf is:
printf("WTF? nx %ld: batch_idx: %ld \n idx: %ld \n start_idx_x: %ld \n end_idx_x: %ld \n start_idx_y: %ld \n end_idx_y: %ld \n", (long) n_x, (long) batch_idx, (long) idx, (long) start_idx_x, (long) end_idx_x, (long) start_idx_y, (long) end_idx_y);

Anyway, the function still returns different results for CPU and GPU. I updated the test-case.

Best regards,
Matthias

The text was updated successfully, but these errors were encountered:

rusty1s · 2019-01-24T09:41:59Z

Thanks, I will check this.

rusty1s · 2019-01-24T13:22:40Z

I have identified the error and fixed it. The error occurred in the GPU version in case there were more potential neighbors than max_neighbors allowed. Sorry for the inconvenience.

matherm · 2019-01-24T15:13:48Z

Yeah, break vs. continue :-)

Thanks.

matherm · 2019-04-04T07:52:55Z

Since the last update this test fails again in my environment. I checked the diff and could not find something suspicous (I attached the file).
test_radius_search.txt

I get the following error:
Line 60: assert np.allclose(retrieved_points_cpu, retrieved_points_gpu)
>> ValueError: operands could not be broadcast together with shapes (128,3) (125,3)

Can you check please? Thanks in advance!

rusty1s · 2019-04-04T08:10:25Z

Try increasing the max_num_neighbors argument. This works fine when setting it to 100.

rusty1s closed this as completed Jan 24, 2019

sirbardo mentioned this issue Mar 6, 2019

Issue with KNN on cuda:1 #13

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strange behaviour in radius_kernel.cu #11

Strange behaviour in radius_kernel.cu #11

matherm commented Jan 23, 2019 •

edited

rusty1s commented Jan 24, 2019

rusty1s commented Jan 24, 2019 •

edited

matherm commented Jan 24, 2019

matherm commented Apr 4, 2019 •

edited

rusty1s commented Apr 4, 2019

Strange behaviour in radius_kernel.cu #11

Strange behaviour in radius_kernel.cu #11

Comments

matherm commented Jan 23, 2019 • edited

EDIT

rusty1s commented Jan 24, 2019

rusty1s commented Jan 24, 2019 • edited

matherm commented Jan 24, 2019

matherm commented Apr 4, 2019 • edited

rusty1s commented Apr 4, 2019

matherm commented Jan 23, 2019 •

edited

rusty1s commented Jan 24, 2019 •

edited

matherm commented Apr 4, 2019 •

edited