DeviceRadixSort::SortPairs fails to sort array #64
Comments
Hrm. Seems to work just fine for me. What OS, host, and CUDA compilers are
Compiling:
[dumerrill@dt06 removeme]$ nvcc -arch=sm_52 -std=c++11 -O3 main.cpp
For 100M items:
[dumerrill@dt06 removeme]$ ./a.out 100000000
Array length 100001408 (381MiB)
With 40M:
[dumerrill@dt06 removeme]$ ./a.out 40000000
Array length 40000896 (152MiB)
On Tue, Nov 15, 2016 at 4:05 AM, daktfi notifications@github.com wrote:
|
I found the problem: it is necessary to check cudaPeekAtLastError()/cudaGetLastError() after the sort. It seems that sorting requires an additional amount of video memory besides the allocated buffers and temp_storage (roughly as much again as double the size of the keys). Note the line with the device specs in the original post: there was only 904 MB of free memory. The setup is: |
The implementation does peek errors after each kernel launch, e.g., However, as you mention, this doesn't capture all runtime errors: others On Tue, Nov 15, 2016 at 12:32 PM, daktfi notifications@github.com wrote:
|
Thanks for the advice on debugging; this will be quite useful. |
The sorting won't fail with a memory allocation error. If that's the error you're getting from CUB, then the program had already failed, and CUB is simply returning a latent error from an earlier failed attempt to allocate memory that wasn't cleared.
CUB does no allocation whatsoever. Everything its sorting needs is bundled up in the temp storage, which you can allocate (conservatively, even, using an upper bound of problem size, if that's available) way in advance. In general, CUDA device memory allocation is a stream-blocking, host-synchronizing event, and CUB doesn't want to impose that upon an application right in the middle of what the application is presuming to be an asynchronous stream computation. |
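For readers following along, the pattern described in the comments above might look roughly like the sketch below. It is only an illustration, not the code from the attached project: `sort_pairs_checked` is a hypothetical helper name, and the `unsigned int` key and value types are an assumption. It uses the usual two-call CUB convention (a first call with a null temp-storage pointer only reports the required size) and checks the returned `cudaError_t` at every step.

```cpp
// Sketch only: sort_pairs_checked is a hypothetical helper, not part of CUB.
// Key/value types (unsigned int) are assumed for illustration.
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cstddef>

// Sorts num_items key/value pairs that already live in device memory.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t sort_pairs_checked(unsigned int* d_keys_in, unsigned int* d_keys_out,
                               unsigned int* d_values_in, unsigned int* d_values_out,
                               int num_items)
{
    // Read (and thereby reset) any latent error from earlier calls, such as a
    // failed cudaMalloc, so that it is not mistaken for a sort failure.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    void*  d_temp_storage     = nullptr;
    size_t temp_storage_bytes = 0;

    // First call: with a null temp-storage pointer, CUB does no sorting and
    // only writes the required size into temp_storage_bytes.
    err = cub::DeviceRadixSort::SortPairs(d_temp_storage, temp_storage_bytes,
                                          d_keys_in, d_keys_out,
                                          d_values_in, d_values_out, num_items);
    if (err != cudaSuccess) return err;

    // The caller, not CUB, allocates the temp storage. A failure here is the
    // out-of-memory case and must be checked.
    err = cudaMalloc(&d_temp_storage, temp_storage_bytes);
    if (err != cudaSuccess) return err;

    // Second call: the actual sort.
    err = cub::DeviceRadixSort::SortPairs(d_temp_storage, temp_storage_bytes,
                                          d_keys_in, d_keys_out,
                                          d_values_in, d_values_out, num_items);

    // Wait for completion so that asynchronous execution errors surface here.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    cudaFree(d_temp_storage);  // result ignored; err already holds the first failure
    return err;
}
```

In an application that cares about stream asynchrony, the size query and `cudaMalloc` would more likely be done once, well ahead of time, with only the second `SortPairs` call on the hot path; they are kept together here for brevity.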
When I try to sort an array of roughly 40M pairs or more, it simply does not sort them, and no error is reported.
Device is:
Device 0: GeForce GTX 950 (PTX version 520, SM520, 6 SMs, 904 free / 1995 total MB physmem, 105.760 GB/s @ 3305000 kHz mem clock, ECC off)
CUB version 1.5.5 (the latest at the moment).
A sample project to reproduce the problem is attached:
check_dev_radix.zip
When run with increasingly larger array sizes, it eventually fails to sort the array.
As I understand it, the critical size depends on the amount of free memory. The problem is that no error is reported.
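As a rough, back-of-the-envelope check of how array size relates to free memory, the host-only sketch below adds up the device buffers a pairs sort would need. The helper names are hypothetical, and several inputs are assumptions rather than facts from the attached project: 4-byte keys (consistent with "Array length 40000896 (152MiB)" in the thread), 4-byte values, separate input and output buffers for both, and the "904 free" figure read as MiB. The temp storage size is left as a parameter because it should come from CUB's size query, not from a guess.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed for separate input and output buffers of
// keys and values, plus whatever temp storage size the CUB size query reported.
constexpr std::uint64_t sort_footprint_bytes(std::uint64_t num_items,
                                             std::uint64_t key_bytes,
                                             std::uint64_t value_bytes,
                                             std::uint64_t temp_storage_bytes)
{
    return 2 * num_items * (key_bytes + value_bytes) + temp_storage_bytes;
}

// Hypothetical helper: true if the footprint fits in the reported free memory.
constexpr bool fits(std::uint64_t needed_bytes, std::uint64_t free_bytes)
{
    return needed_bytes <= free_bytes;
}

// Assumed figures: 904 MiB free, 4-byte keys and values, temp storage excluded.
constexpr std::uint64_t kFreeBytes = 904ULL * 1024 * 1024;               // 947,912,704
constexpr std::uint64_t k40M  = sort_footprint_bytes(40000896, 4, 4, 0);  // 640,014,336
constexpr std::uint64_t k100M = sort_footprint_bytes(100001408, 4, 4, 0); // 1,600,022,528

static_assert(fits(k40M, kFreeBytes),   "40M pairs fit before temp storage is counted");
static_assert(!fits(k100M, kFreeBytes), "100M pairs do not fit even without temp storage");
```

Under these assumptions the four buffers for about 40M pairs already take roughly 610 MiB of the 904 MiB free, leaving under 300 MiB for temp storage, so it is plausible that an allocation fails somewhere near this size; the actual threshold depends on the real types and buffer layout.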