CUDA kernel failed : an illegal memory access was encountered #21

Open

HelinXu opened this issue Feb 27, 2022 · 7 comments

@HelinXu
HelinXu commented Feb 27, 2022

Hi! Thanks for the great work.
I'm running HAIS on my own dataset and have modified class_numpoint_mean_dict and class_radius_mean accordingly. The problem is that I get CUDA kernel failed : an illegal memory access was encountered when testing.
I traced the problem by inserting cudaGetLastError() calls in hierarchical_aggregation.cu and found that the code breaks around the following lines:

    assert(primary_num <= MAX_PRIMARY_NUM);
    concat_fragments_<<<primary_num, (int)1>>>(
        cuda_fragment_idxs, cuda_fragment_offsets,
        cuda_primary_idxs, cuda_primary_offsets,
        cuda_primary_absorb_fragment_idx, cuda_primary_absorb_fragment_cnt,
        cuda_concat_idxs, cuda_concat_point_num,
        primary_num);
    cudaDeviceSynchronize();

// got CUDA kernel failed : an illegal memory access was encountered here.
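
For completeness, this is the kind of checking I wrapped around each launch to localize the failure. It is only a minimal sketch; the CHECK_KERNEL name is my own helper and not something from the HAIS sources:

    // Minimal launch-checking helper (my own sketch, not part of hierarchical_aggregation.cu).
    // It reports both the launch-configuration error and the asynchronous execution error
    // that only surfaces after a device synchronization.
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    #define CHECK_KERNEL()                                                           \
        do {                                                                         \
            cudaError_t e = cudaGetLastError();              /* launch-time error */ \
            if (cudaSuccess != e) {                                                  \
                fprintf(stderr, "launch failed : %s\n", cudaGetErrorString(e));      \
                exit(-1);                                                            \
            }                                                                        \
            e = cudaDeviceSynchronize();                     /* execution error   */ \
            if (cudaSuccess != e) {                                                  \
                fprintf(stderr, "CUDA kernel failed : %s\n", cudaGetErrorString(e)); \
                exit(-1);                                                            \
            }                                                                        \
        } while (0)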

Do you have any suggestions?

@outsidercsy
Member

It seems to be a memory overflow problem.
If the point clouds in your dataset are much denser or larger than ScanNet's, I suggest increasing the values of these macros.

#define MAX_PRIMARY_NUM 1024
#define MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM 1024
#define INFINITY_DIS_SQUARE 10000
#define MAX_PER_PRIMARY_ABSORB_POINT_NUM 8192
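
These macros bound the per-primary scratch buffers allocated in hierarchical_aggregation.cu. Roughly (a simplified sketch using the names from the snippets in this thread, not the exact code), the sizing and the overflow condition look like this:

    // Simplified sketch of the buffer sizing (see hierarchical_aggregation.cu for the real code).
    // Each primary instance owns a slice of MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM ints; a primary
    // that absorbs more fragments than that writes past its slice, which later surfaces as an
    // illegal memory access.
    int *cuda_primary_absorb_fragment_idx;
    cudaMalloc((void**)&cuda_primary_absorb_fragment_idx,
               primary_num * MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM * sizeof(int) + sizeof(int));

    // Conceptually, a per-primary write of the form
    //   cuda_primary_absorb_fragment_idx[p * MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM + cnt] = f;
    // goes out of bounds as soon as cnt >= MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM, so denser or
    // larger scenes need larger macro values.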

@HelinXu
Author

HelinXu commented Mar 20, 2022

Hi, sorry to bother you again.

Each point cloud in my dataset always consists of 20,000 points. I've tried your suggestion and both increased and decreased those macros. I found that with different macro values, the number of samples that pass before the failure actually changes! For example, with the original settings I can get 8 samples through the test script, while after increasing the values by 16 times I can only get 3 samples through. It breaks in either one of the two places below (both after the sync):

    // for each fragment, find its primary
    int *cuda_primary_absorb_fragment_idx; // array for saving the fragment idxs
    int *cuda_primary_absorb_fragment_cnt; // array for saving the fragment nums
    cudaMalloc((void**)&cuda_primary_absorb_fragment_idx, primary_num * MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM * sizeof(int) + sizeof(int));
    cudaMalloc((void**)&cuda_primary_absorb_fragment_cnt, primary_num * sizeof(int) + sizeof(int));
    printf("fragment_num = %d\n", fragment_num); // I've printed out fragment number, which is mostly 10~40, never 0 on my dataset.
    if (fragment_num != 0) {
        fragment_find_primary_<<<int(DIVUP(fragment_num, MAX_THREADS_PER_BLOCK)), (int)MAX_THREADS_PER_BLOCK>>>(
            primary_num, cuda_primary_offsets, cuda_primary_centers,
            fragment_num, cuda_fragment_offsets, cuda_fragment_centers,
            cuda_primary_absorb_fragment_idx, cuda_primary_absorb_fragment_cnt);
    }
    err  = cudaGetLastError();
    if (cudaSuccess != err) {
        fprintf(stderr, "before sync CUDA kernel failed : %s\n", cudaGetErrorString(err)); // This never fails in my case.
        exit(-1);
    }
    cudaDeviceSynchronize();
    err  = cudaGetLastError();
    if (cudaSuccess != err) {
        fprintf(stderr, " after sync CUDA kernel failed : %s\n", cudaGetErrorString(err)); // However, this fails!
        exit(-1);
    }
    // concatenate fragments belonging to the same primary
    int *cuda_concat_idxs;
    int *cuda_concat_point_num;
    cudaMalloc((void**)&cuda_concat_idxs, primary_num * MAX_PER_PRIMARY_ABSORB_POINT_NUM * 2 * sizeof(int) + sizeof(int));
    cudaMalloc((void**)&cuda_concat_point_num, primary_num *  sizeof(int) + sizeof(int));
    assert(primary_num <= MAX_PRIMARY_NUM);
    concat_fragments_<<<primary_num, (int)1>>>(
        cuda_fragment_idxs, cuda_fragment_offsets,
        cuda_primary_idxs, cuda_primary_offsets,
        cuda_primary_absorb_fragment_idx, cuda_primary_absorb_fragment_cnt,
        cuda_concat_idxs, cuda_concat_point_num,
        primary_num);
    err  = cudaGetLastError();
    if (cudaSuccess != err) {
        fprintf(stderr, "before sync 2 CUDA kernel failed : %s\n", cudaGetErrorString(err));
        exit(-1);
    }
    cudaDeviceSynchronize();
    err  = cudaGetLastError();
    if (cudaSuccess != err) {
        fprintf(stderr, "after sync 2 CUDA kernel failed : %s\n", cudaGetErrorString(err));
        exit(-1);
    }
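
One more observation: the check before the sync never fails, so the illegal access happens while the kernel is executing rather than at launch time. As an experiment, a device-side bound on the writes could at least turn the overflow into a visible, non-corrupting condition. This is a sketch only; the kernel body and variable names are my guesses, not the actual HAIS code:

    // Sketch of a bounded write for a fragment_find_primary_-style kernel
    // (variable names are guesses, not copied from HAIS). atomicAdd hands out one
    // slot per absorbed fragment; slots beyond the macro limit are dropped instead
    // of written out of bounds, so an overflow no longer corrupts adjacent buffers.
    int slot = atomicAdd(&cuda_primary_absorb_fragment_cnt[p], 1);
    if (slot < MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM) {
        cuda_primary_absorb_fragment_idx[p * MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM + slot] = fragment_id;
    }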

@Dragon-Vae

@HelinXu Hi, I have the same problem as you. How did you solve it?

@HelinXu
Author

HelinXu commented May 19, 2022

@Dragon-Vae Hi, I have moved on to a different project and have not fully understood this behavior yet. I'll let you know if I figure it out :)

@Dragon-Vae

@HelinXu OK, thank you for your reply.

@eamonn-zh
Contributor

@outsidercsy Hi, I also encountered an illegal memory access issue. Would #32 do the trick?

@hmax233

hmax233 commented Oct 10, 2022

@eamonn-zh I encountered the same problem, and I found that when I change the batch size to 4 instead of 1, there is no problem. So maybe it's not caused by memory overflow after all?
Have you figured this out yet?
