Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add view label to View bounds error message for CUDA backend #870

Closed
stanmoore1 opened this issue Jun 6, 2017 · 10 comments
Closed

Add view label to View bounds error message for CUDA backend #870

stanmoore1 opened this issue Jun 6, 2017 · 10 comments
Assignees
Labels
Enhancement Improve existing capability; will potentially require voting
Milestone

Comments

@stanmoore1
Copy link
Contributor

It would be really helpful for debugging if the the view label was added to a View bounds error on the device with the CUDA backend. Right now, all I get is:

:0: : block: [0,0,0], thread: [0,26,0] Assertion `View bounds error` failed.
terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaDeviceSynchronize() error( cudaErrorAssert): device-side assert triggered ../../lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp:122
Traceback functionality not available
@ibaned ibaned added the Enhancement Improve existing capability; will potentially require voting label Jun 6, 2017
@ibaned
Copy link
Contributor

ibaned commented Jun 6, 2017

Hmmm... one problem would be that View::label() returns a std::string right now, which is a no-go inside GPU code. The second problem being that even simple string appending without std::string or even strcpy becomes a real chore on the GPU. There may be ways around those obstacles, just noting that this is non-trivial to do.

@stanmoore1
Copy link
Contributor Author

Well, trying to find out which view is out of bounds without any hints from Kokkos is also non-trivial :)

@ibaned
Copy link
Contributor

ibaned commented Jun 6, 2017

yep, yep. I'll try to figure it out, its not impossible yet.

@stanmoore1
Copy link
Contributor Author

stanmoore1 commented Jun 6, 2017

This would be a non-issue if gdb could give me the line number in my functor where it is crashing, but the last line in my code that the stack trace shows is where I call the parallel_for, and then the stack trace goes into Kokkos_Parallel.hpp, which isn't helpful at all.

@stanmoore1
Copy link
Contributor Author

Using cuda-gdb only gives a single frame strack trace of Kokkos_Cuda_abort.hpp:75.

@stanmoore1
Copy link
Contributor Author

The view label is stored as a char array, not as std::string, so why not just copy that up to device memory and printf it out as a C-string? Seems pretty trivial to me...

@ibaned
Copy link
Contributor

ibaned commented Jun 7, 2017

It is actually already in device memory, for some views. Much of what I've been doing for the past day is looking for the cases of views that don't have a label, this includes Memory::Unmanaged and views that don't have Memory::Unmanaged but were constructed by being given a pointer. I think that I can detect all these edge cases safely inside GPU code...

Although printf technically works on device, we are right now calling the CUDA assertion fail handler. Its not clear to me whether the printf call is guaranteed to get its output to the user if it is immediately followed by an assertion fail. This is definitely something that is not guaranteed with regular printf on CPUs. What we know works is to build up a C-string in a buffer and pass that to the assertion failure handler; that is guaranteed to get you the output.

Getting to that char array without the aid of any of the pointers currently stored in structures in host memory requires walking backwards from the start of the user-writable portion of a shared allocation to access the hidden header. This is right now handled by code that can only be run on the host, so I need to add the equivalent code as a GPU-callable function. In addition, I had to confirm that all views have a pointer to the true start of the shared allocation. I was worried this wouldn't be the case for sub-views, but I think it is okay now.

So yes I've been thinking about this, yes I think it can be done safely in 90% of the cases, but no it really isn't trivial, in part because there is a whole zoo of different species of Kokkos::View, including user-defined ones that Kokkos doesn't even know about ahead of time.

@stanmoore1
Copy link
Contributor Author

I'm totally fine if this isn't 100% general and gives the old behavior (no label) for difficult edge cases.

@ibaned
Copy link
Contributor

ibaned commented Jun 7, 2017

yea, it has to do that (some views just don't have a label), but that still requires correctly detecting such edge cases. I basically can't let the behavior get worse for any of the special cases.

@ibaned
Copy link
Contributor

ibaned commented Jun 9, 2017

okay, I think I have a decent implementation as #882

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Improve existing capability; will potentially require voting
Projects
None yet
Development

No branches or pull requests

3 participants