Error: zero size memory allocation when calling 'cuda_partition' #126

Closed
hyeonjang opened this issue Jan 28, 2022 · 1 comment

hyeonjang commented Jan 28, 2022

Hello, I am a big fan of this lib.

While using CUDAArray, I ran into an error about allocating memory of size zero.
While debugging, I found that it occurs when calling the function "cuda_partition":

enoki/src/cuda/horiz.cu

Lines 81 to 103 in 2a18afa

size_t clamped_size = std::min(size, (size_t) 511);
uint32_t *counts_h = (uint32_t *) cuda_host_malloc(sizeof(uint32_t) * (clamped_size + 1));
void **ptrs_unique_h = (void **) cuda_host_malloc(sizeof(void *) * clamped_size);
cuda_check(cudaMemcpyAsync(counts_h, counts, (clamped_size + 1) * sizeof(uint32_t), cudaMemcpyDeviceToHost));
cuda_check(cudaMemcpyAsync(ptrs_unique_h, ptrs_unique, clamped_size * sizeof(void *), cudaMemcpyDeviceToHost));
cuda_check(cudaDeviceSynchronize());
size_t num_runs_h = (size_t) counts_h[0];
if (num_runs_h > clamped_size) {
    cuda_host_free(counts_h);
    cuda_host_free(ptrs_unique_h);
    counts_h = (uint32_t *) cuda_host_malloc(sizeof(uint32_t) * (num_runs_h + 1));
    ptrs_unique_h = (void **) cuda_host_malloc(sizeof(void *) * num_runs_h);
    cuda_check(cudaMemcpyAsync(counts_h, counts, num_runs_h * sizeof(uint32_t), cudaMemcpyDeviceToHost));
    cuda_check(cudaMemcpyAsync(ptrs_unique_h, ptrs_unique, num_runs_h * sizeof(void *), cudaMemcpyDeviceToHost));
    cuda_check(cudaDeviceSynchronize());
}

I think line 99 should be changed to copy num_runs_h + 1 elements, matching line 86:
cuda_check(cudaMemcpyAsync(counts_h, counts, (num_runs_h+1) * sizeof(uint32_t), cudaMemcpyDeviceToHost));

Or will this be fixed in the next release?
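
To illustrate the suspected off-by-one, here is a minimal host-only sketch. It does not use CUDA or any enoki code: `copy_counts` is a hypothetical helper, `std::memcpy` stands in for `cudaMemcpyAsync`, and the layout (element 0 holds the number of runs, followed by one count per run) is an assumption inferred from the `+ 1` in the allocation sizes above. A sentinel value marks entries that were never copied. In the real code those entries would simply be uninitialized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Sentinel marking destination entries that the copy never wrote.
constexpr uint32_t kNotCopied = 0xFFFFFFFFu;

// Hypothetical stand-in for the second device-to-host copy of `counts`.
// Assumed layout: src[0] = number of runs, src[1..num_runs] = per-run counts.
// extra = 0 mimics the copy size on line 99 (num_runs_h elements);
// extra = 1 mimics the proposed fix (num_runs_h + 1 elements).
std::vector<uint32_t> copy_counts(const std::vector<uint32_t> &src, size_t extra) {
    assert(!src.empty());
    size_t num_runs = src[0];
    assert(num_runs + 1 <= src.size());

    // The destination holds num_runs + 1 entries, as in the reallocation.
    std::vector<uint32_t> dst(num_runs + 1, kNotCopied);
    std::memcpy(dst.data(), src.data(), (num_runs + extra) * sizeof(uint32_t));
    return dst;
}
```

Under these assumptions, calling `copy_counts({3, 10, 20, 30}, 0)` leaves the last entry at the sentinel, because only the header and the first two counts are copied, while `copy_counts({3, 10, 20, 30}, 1)` reproduces the full buffer.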

@Speierers
Member

This will be fixed in the upcoming release and therefore we won't take the time to fix it in the current codebase.
