-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CUDA kernel failed : an illegal memory access was encountered #21
Comments
It seems a problem of memory overflow.
|
Hi, sorry to bother you again. Each point cloud in my dataset always consists of 20,000 points. I've tried your suggestion out, and have increased/decreased those macros. I found that with different macros, the number of samples passes before failing is actually different! For example, with original settings, I can get 8 samples to pass the test script while increasing the numbers by 16 times, I can only get 3 samples to pass. It breaks after either one of the case (both after sync): // // for each fragment, find its primary
int *cuda_primary_absorb_fragment_idx; // array for saving the fragment idxs
int *cuda_primary_absorb_fragment_cnt; // array for saving the fragment nums
cudaMalloc((void**)&cuda_primary_absorb_fragment_idx, primary_num * MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM * sizeof(int) + sizeof(int));
cudaMalloc((void**)&cuda_primary_absorb_fragment_cnt, primary_num * sizeof(int) + sizeof(int));
printf("fragment_num = %d\n", fragment_num); // I've printed out fragment number, which is mostly 10~40, never 0 on my dataset.
if (fragment_num != 0) {
fragment_find_primary_<<<int(DIVUP(fragment_num, MAX_THREADS_PER_BLOCK)), (int)MAX_THREADS_PER_BLOCK>>>(
primary_num, cuda_primary_offsets, cuda_primary_centers,
fragment_num, cuda_fragment_offsets, cuda_fragment_centers,
cuda_primary_absorb_fragment_idx, cuda_primary_absorb_fragment_cnt);
}
err = cudaGetLastError();
if (cudaSuccess != err) {
fprintf(stderr, "before sync CUDA kernel failed : %s\n", cudaGetErrorString(err)); // This never fails in my case.
exit(-1);
}
cudaDeviceSynchronize();
err = cudaGetLastError();
if (cudaSuccess != err) {
fprintf(stderr, " after sync CUDA kernel failed : %s\n", cudaGetErrorString(err)); // However, this fails!
exit(-1);
} // concatenate fragments belonging to the same primary
int *cuda_concat_idxs;
int *cuda_concat_point_num;
cudaMalloc((void**)&cuda_concat_idxs, primary_num * MAX_PER_PRIMARY_ABSORB_POINT_NUM * 2 * sizeof(int) + sizeof(int));
cudaMalloc((void**)&cuda_concat_point_num, primary_num * sizeof(int) + sizeof(int));
assert(primary_num <= MAX_PRIMARY_NUM);
concat_fragments_<<<primary_num, (int)1>>>(
cuda_fragment_idxs, cuda_fragment_offsets,
cuda_primary_idxs, cuda_primary_offsets,
cuda_primary_absorb_fragment_idx, cuda_primary_absorb_fragment_cnt,
cuda_concat_idxs, cuda_concat_point_num,
primary_num);
err = cudaGetLastError();
if (cudaSuccess != err) {
fprintf(stderr, "before sync 2 CUDA kernel failed : %s\n", cudaGetErrorString(err));
exit(-1);
}
cudaDeviceSynchronize();
err = cudaGetLastError();
if (cudaSuccess != err) {
fprintf(stderr, "after sync 2 CUDA kernel failed : %s\n", cudaGetErrorString(err));
exit(-1);
} |
@HelinXu Hi, I have the same problem as you, How did you solve it? |
@Dragon-Vae Hi, I am on a different project and have not fully understood this behavior yet. I'll let you know if I figure it out :) |
@HelinXu OK, thank you for your reply |
@outsidercsy Hi, I also encountered an illegal memory access issue, will #32 be the trick? |
@eamonn-zh I encountered the same question,and I found that when I change the batch size to 4 instead of 1,there is no problem. So I think maybe it's not because of memory overflow? |
Hi! Thanks for the great work.
I'm running HAIS on my own dataset, and have modified
class_numpoint_mean_dict
andclass_radius_mean
accordingly. The problem is, I get CUDA kernel failed: an illegal memory access was encountered when testing.I traced the problem by inserting
cudaGetLastError()
inhiererchical_aggregation.cu
, and found the code breaks around the following lines:Do you have any suggestions?
The text was updated successfully, but these errors were encountered: