Branch: testing
Commits on Aug 7, 2019
  1. fix bug on GPUs of compute capability ≥ 7.0 (Volta)

    fhoefling committed Aug 7, 2019
    Starting with Volta, NVIDIA no longer guarantees that the threads of a
    warp execute in lock-step. Kernels that rely on this behaviour (or
    limitation), e.g., the reduction algorithm, therefore need additional
    synchronisation points.
    
    The issue was detected through failures of the tests
    unit/observables/ssf/gpu/*.
    
    The proper primitive for this is __syncwarp(), which is available only
    in CUDA SDK ≥ 9.0. On older compilers, we call __syncthreads() instead,
    which may incur a performance penalty.
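
A minimal sketch of such a fallback, not the code of this commit: the macro WARP_SYNC, the kernel warp_sum and the host helper sum32_on_gpu are hypothetical names, and the kernel assumes exactly one warp (32 threads) per block so that the __syncthreads() fallback is reached by every thread of the block.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper macro (an assumption, not taken from the commit):
// use __syncwarp() where the toolkit provides it (CUDA >= 9.0), otherwise
// fall back to the heavier block-wide barrier.
#if defined(CUDART_VERSION) && CUDART_VERSION >= 9000
#  define WARP_SYNC() __syncwarp()
#else
#  define WARP_SYNC() __syncthreads()
#endif

// Sums 32 integers per block in shared memory. Must be launched with
// exactly 32 threads per block; every thread reaches every WARP_SYNC().
__global__ void warp_sum(const int* in, int* out)
{
    __shared__ int s[32];
    unsigned int tid = threadIdx.x;

    s[tid] = in[blockIdx.x * 32 + tid];
    WARP_SYNC();

    for (unsigned int offset = 16; offset > 0; offset >>= 1) {
        int v = s[tid];
        if (tid < offset) {
            v += s[tid + offset];
        }
        WARP_SYNC();   // all reads finish before any thread writes
        s[tid] = v;
        WARP_SYNC();   // all writes are visible before the next round
    }

    if (tid == 0) {
        out[blockIdx.x] = s[0];
    }
}

// Host helper for illustration; CUDA error checking omitted for brevity.
int sum32_on_gpu(const int* h_in)
{
    int* d_in = 0;
    int* d_out = 0;
    int h_out = 0;

    cudaMalloc(&d_in, 32 * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, 32 * sizeof(int), cudaMemcpyHostToDevice);

    warp_sum<<<1, 32>>>(d_in, d_out);

    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Without the two barriers inside the loop, the result is correct only if the warp happens to run in lock-step, which is the assumption that no longer holds on Volta.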
    
    The explicit synchronisation is necessary only on Volta and newer
    hardware, but it appears to be impossible to determine the actual
    hardware from within the compute kernel. (__CUDA_ARCH__ yields only the
    architecture version the PTX code was generated for, which is
    effectively the minimum supported hardware. One can thus generate PTX
    code for < 7.0 that is then run on a Volta GPU.)
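
The mismatch can be made visible with a small sketch that compares the compile-time value seen by the kernel with the compute capability reported by the runtime on the host. The names ArchInfo, report_compiled_arch and query_arch are hypothetical and error checking is omitted; this only illustrates the point above and is not part of the commit.

```cuda
#include <cuda_runtime.h>

// Stores the architecture the device code was generated for,
// e.g. 600 for compute_60, regardless of the GPU it runs on.
__global__ void report_compiled_arch(int* out)
{
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;
#else
    *out = 0;   // host compilation pass; never executed on the device
#endif
}

// Hypothetical container for the two version numbers.
struct ArchInfo
{
    int compiled;   // value of __CUDA_ARCH__ inside the kernel
    int hardware;   // compute capability of the GPU, in the same encoding
};

ArchInfo query_arch(int device)
{
    ArchInfo info;
    info.compiled = 0;
    info.hardware = 0;

    // The host can query the real hardware through the runtime API.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    info.hardware = prop.major * 100 + prop.minor * 10;

    // The kernel only knows what it was compiled for.
    int* d_arch = 0;
    cudaSetDevice(device);
    cudaMalloc(&d_arch, sizeof(int));
    cudaMemset(d_arch, 0, sizeof(int));
    report_compiled_arch<<<1, 1>>>(d_arch);
    cudaMemcpy(&info.compiled, d_arch, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_arch);
    return info;
}
```

For a binary that carries only PTX for an architecture below 7.0 and is run on a Volta GPU, one would expect info.compiled to be below 700 while info.hardware is 700.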