ModernGPUv1 #34

Closed
r-barnes opened this issue May 6, 2019 · 1 comment

r-barnes commented May 6, 2019

We'd like to compile some code that uses ModernGPUv1 with CUDA 9, but CUDA 9 emits deprecation warnings for __shfl_up():

moderngpu/include/device/../device/intrinsics.cuh(113): warning: function "__shfl_up(float, unsigned int, int)"
moderngpu/include/device/../device/intrinsics.cuh(123): warning: function "__shfl_up(int, unsigned int, int)"
moderngpu/include/device/../device/intrinsics.cuh(124): warning: function "__shfl_up(int, unsigned int, int)"
/usr/include/sm_30_intrinsics.hpp(175): here was declared deprecated ("__shfl_up() is deprecated in favor of __shfl_up_sync() and may be removed in a future release (Use -Wno-deprecated-declarations to suppress this warning).")

It suggests using __shfl_up_sync() instead.

NVIDIA warns against upgrading blindly:

Don’t just use FULL_MASK (i.e. 0xffffffff for 32 threads) as the mask value. If not all threads in the warp can reach the primitive according to the program logic, then using FULL_MASK may cause the program to hang.
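To make that caveat concrete, here is a minimal sketch of the alternative it points to (the kernel and helper names are hypothetical, not ModernGPU code): when only some lanes reach the shuffle, compute the mask with __ballot_sync() while the warp is still converged and pass that mask instead of FULL_MASK. It assumes blockDim.x is a multiple of 32, so every lane executes the ballot.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: within each warp, out[i] = in[i - 1] (lane 0 keeps in[i]).
// Assumes blockDim.x is a multiple of 32, so every lane of every warp
// reaches the __ballot_sync below (no early returns before it).
__global__ void shift_up_guarded(const int* in, int* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;

  // The whole warp is converged here, so FULL_MASK is valid for the ballot.
  // Bit k of the result is set iff lane k will enter the branch below.
  unsigned int mask = __ballot_sync(0xffffffffu, i < n);

  if (i < n) {
    int v = in[i];
    // Only the lanes named in mask execute this shuffle. The active lanes
    // form a contiguous prefix of the warp, so each source lane (lane - 1)
    // is active too; lane 0 has no source lane and keeps its own value.
    out[i] = __shfl_up_sync(mask, v, 1);
  }
}

// Hypothetical host helper for trying the kernel: in[k] = k, 64 threads per block.
bool run_shift_up_guarded(int n, std::vector<int>& h_out) {
  std::vector<int> h_in(n);
  for (int k = 0; k < n; ++k) h_in[k] = k;
  int *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(int));
  cudaMalloc(&d_out, n * sizeof(int));
  cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  const int threads = 64;  // multiple of 32, as the kernel assumes
  shift_up_guarded<<<(n + threads - 1) / threads, threads>>>(d_in, d_out, n);
  bool ok = cudaDeviceSynchronize() == cudaSuccess;
  h_out.resize(n);
  cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return ok;
}
```

Passing 0xffffffff inside the if branch instead would name lanes that never execute the shuffle (lanes 8 to 31 of the second warp when n = 40), which is the situation the quote says may hang.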

Do you know whether the 0xffffffff (FULL_MASK) mask is appropriate for ModernGPUv1? The ModernGPUv1 code that raises the warnings is copied below for easy reference:

#pragma push_macro("__shfl_up")
#undef __shfl_up
__device__ __forceinline__ float shfl_up(float var, 
  unsigned int delta, int width = 32) {

#if __CUDA_ARCH__ >= 300
  var = __shfl_up(var, delta, width);
#endif  
  return var;
}

__device__ __forceinline__ double shfl_up(double var, 
  unsigned int delta, int width = 32) {

#if __CUDA_ARCH__ >= 300
  int2 p = mgpu::double_as_int2(var);
  p.x = __shfl_up(p.x, delta, width);
  p.y = __shfl_up(p.y, delta, width);
  var = mgpu::int2_as_double(p);
#endif
  
  return var;
}
#pragma pop_macro("__shfl_up")
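For comparison, a minimal sketch of the same two helpers ported to __shfl_up_sync() with FULL_MASK, keeping the legacy call only for toolkits older than CUDA 9 (checked via CUDART_VERSION). The sketch namespace, the SKETCH_FULL_MASK macro and the demo kernel are hypothetical; CUDA's __double2hiint(), __double2loint() and __hiloint2double() stand in for mgpu::double_as_int2() and mgpu::int2_as_double(). It assumes all 32 lanes of the warp call the helpers together.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical name; assumes every lane of the warp calls these helpers together.
#define SKETCH_FULL_MASK 0xffffffffu

namespace sketch {  // hypothetical namespace, not ModernGPU's

__device__ __forceinline__ float shfl_up(float var, unsigned int delta,
                                         int width = 32) {
#if __CUDA_ARCH__ >= 300
#if CUDART_VERSION >= 9000
  var = __shfl_up_sync(SKETCH_FULL_MASK, var, delta, width);
#else
  var = __shfl_up(var, delta, width);  // pre-CUDA 9 toolkits only
#endif
#endif
  return var;
}

__device__ __forceinline__ double shfl_up(double var, unsigned int delta,
                                          int width = 32) {
#if __CUDA_ARCH__ >= 300
  // Shuffle the two 32-bit halves separately, mirroring the original code.
  int hi = __double2hiint(var);
  int lo = __double2loint(var);
#if CUDART_VERSION >= 9000
  hi = __shfl_up_sync(SKETCH_FULL_MASK, hi, delta, width);
  lo = __shfl_up_sync(SKETCH_FULL_MASK, lo, delta, width);
#else
  hi = __shfl_up(hi, delta, width);
  lo = __shfl_up(lo, delta, width);
#endif
  var = __hiloint2double(hi, lo);
#endif
  return var;
}

}  // namespace sketch

// Hypothetical demo: one warp shifts its lane ids up by one lane.
__global__ void shfl_up_demo(float* f_out, double* d_out) {
  int lane = threadIdx.x & 31;
  f_out[threadIdx.x] = sketch::shfl_up(static_cast<float>(lane), 1);
  d_out[threadIdx.x] = sketch::shfl_up(lane + 0.5, 1);
}

bool run_shfl_up_demo(std::vector<float>& f, std::vector<double>& d) {
  float* df = nullptr;
  double* dd = nullptr;
  cudaMalloc(&df, 32 * sizeof(float));
  cudaMalloc(&dd, 32 * sizeof(double));
  shfl_up_demo<<<1, 32>>>(df, dd);  // exactly one full warp
  bool ok = cudaDeviceSynchronize() == cudaSuccess;
  f.resize(32);
  d.resize(32);
  cudaMemcpy(f.data(), df, 32 * sizeof(float), cudaMemcpyDeviceToHost);
  cudaMemcpy(d.data(), dd, 32 * sizeof(double), cudaMemcpyDeviceToHost);
  cudaFree(df);
  cudaFree(dd);
  return ok;
}
```

On CUDA 9 and later, __shfl_up_sync() also has a double overload, so the double helper could shuffle var directly; splitting it into halves here just mirrors the original code.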
Collaborator

neoblizz commented May 8, 2019

The recent PRs #33 and #32 should fix these warnings and add support for CUDA 9+. And yes, you can use FULL_MASK here. We're waiting for Sean to merge PR #33.
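A sketch of why FULL_MASK is safe for warp-synchronous code, assuming (as the answer implies) that the helpers are called while the whole warp is converged. In this hypothetical warp scan, every lane executes every __shfl_up_sync(), and the per-lane condition only decides whether the shuffled value is used.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical inclusive warp scan (Hillis-Steele) built on __shfl_up_sync.
__device__ int warp_inclusive_scan(int x) {
  int lane = threadIdx.x & 31;
  for (int offset = 1; offset < 32; offset <<= 1) {
    // Called unconditionally by all 32 lanes, so naming all 32 is correct.
    int y = __shfl_up_sync(0xffffffffu, x, offset);
    // Lanes below offset have no source lane; they simply ignore y.
    if (lane >= offset) x += y;
  }
  return x;
}

// Assumes the launch covers the input with full 32-thread warps only.
__global__ void scan_kernel(const int* in, int* out) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  out[i] = warp_inclusive_scan(in[i]);
}

// Hypothetical host helper: one warp per block, n must be a multiple of 32.
bool run_scan(const std::vector<int>& h_in, std::vector<int>& h_out) {
  int n = static_cast<int>(h_in.size());
  if (n == 0 || n % 32 != 0) return false;  // full warps only
  int *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(int));
  cudaMalloc(&d_out, n * sizeof(int));
  cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  scan_kernel<<<n / 32, 32>>>(d_in, d_out);
  bool ok = cudaDeviceSynchronize() == cudaSuccess;
  h_out.resize(n);
  cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return ok;
}
```

Moving the shuffle inside the if (lane >= offset) test would break that guarantee, because lanes below offset would skip a call whose mask still names them.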

neoblizz linked a pull request on Dec 22, 2021 that will close this issue