"access violation" problem #5774

Closed
2 tasks done
yhnmj6666 opened this issue May 29, 2020 · 8 comments

yhnmj6666 commented May 29, 2020

Reporting a bug

  • I am using the latest released version of Numba
  • I have included below a minimal working reproducer

Thank you all for developing Numba! I'm using Numba 0.49.1 with CUDA 10.2.89 on Python 3.7.5, Windows 10. My code sometimes fails to execute with OSError: exception: access violation reading 0x<non-zero-non-0xFF...address>.

cuda-memcheck.exe shows nothing.

The access violation always occurs at the same place: numba\cuda\cudadrv\nvvm.py, in compile(self, **options), line 231, err = self.driver.nvvmCompileProgram(self._handle, len(opts), c_opts)

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
 in 
     39 arr2 = np.arange(16,dtype=np.int8)
     40 arr5 = np.zeros(16*16,dtype=np.int8)
---> 41 foo[2, 16](arr1,arr2,arr5)   # more threads than array elements

V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in __call__(self, *args)
    756         Specialize and invoke this kernel with *args*.
    757         '''
--> 758         kernel = self.specialize(*args)
    759         cfg = kernel[self.griddim, self.blockdim, self.stream, self.sharedmem]
    760         cfg(*args)

V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in specialize(self, *args)
    767         argtypes = tuple(
    768             [self.typingctx.resolve_argument_type(a) for a in args])
--> 769         kernel = self.compile(argtypes)
    770         return kernel
    771 

V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in compile(self, sig)
    786             self.definitions[(cc, argtypes)] = kernel
    787             if self.bind:
--> 788                 kernel.bind()
    789         return kernel
    790 

V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in bind(self)
    528         Force binding to current CUDA context
    529         """
--> 530         self._func.get()
    531 
    532     @property

V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in get(self)
    405         cufunc = self.cache.get(device.id)
    406         if cufunc is None:
--> 407             ptx = self.ptx.get()
    408 
    409             # Link

V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in get(self)
    376             arch = nvvm.get_arch_option(*cc)
    377             ptx = nvvm.llvm_to_ptx(self.llvmir, opt=3, arch=arch,
--> 378                                    **self._extra_options)
    379             self.cache[cc] = ptx
    380             if config.DUMP_ASSEMBLY:

V:\Program Files\Python\lib\site-packages\numba\cuda\cudadrv\nvvm.py in llvm_to_ptx(llvmir, **opts)
    493     cu.add_module(libdevice.get())
    494 
--> 495     ptx = cu.compile(**opts)
    496     # XXX remove debug_pubnames seems to be necessary sometimes
    497     return patch_ptx_debug_pubnames(ptx)

V:\Program Files\Python\lib\site-packages\numba\cuda\cudadrv\nvvm.py in compile(self, **options)
    229         c_opts = (c_char_p * len(opts))(*[c_char_p(x.encode('utf8'))
    230                                           for x in opts])
--> 231         err = self.driver.nvvmCompileProgram(self._handle, len(opts), c_opts)
    232         self._try_error(err, 'Failed to compile\n')
    233 

OSError: exception: access violation reading 0x000002087936AB58

If I run the program again, it raises CudaAPIError: [500] Call to cuModuleGetFunction results in CUDA_ERROR_NOT_FOUND.

Here's a reproducer that triggers the error on my machine:

import numpy as np
import numba
from numba import cuda

sbox = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],dtype=np.int8)
pbox = np.array([15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0],dtype=np.int8)

@cuda.jit(device=True)
def bar(txt,el,r):
  cudas_box=cuda.const.array_like(sbox)
  cudap_box=cuda.const.array_like(pbox)
  if r!=0:
    temp=cuda.local.array(16,numba.int8)

    # The error does not occur if either for loop is removed.
    for i in range(16):
      temp[i]=cudas_box[txt[cudap_box[i]]]
    for i in range(16):
      txt[i]=temp[i]
    
  return txt[10]+el

@cuda.jit(debug=True)
def foo(arr1,arr2,arr5):
  tx = cuda.threadIdx.x
  ty = cuda.blockIdx.x
  bw = cuda.blockDim.x
  pos = tx + ty * bw

  pos1=pos%16
  pos2=(pos//16)%16
  if pos<len(arr5):
    roundtext=cuda.local.array(16,numba.int8)
    for i in range(16):
      roundtext[i]=arr1[pos1][i]
    arr5[pos] = bar(roundtext,arr2[pos2],pos%2)

arr1 = np.arange(16,dtype=np.int8).repeat(16).reshape(16,16)
arr2 = np.arange(16,dtype=np.int8)
arr5 = np.zeros(16*16,dtype=np.int8)
foo[2, 16](arr1,arr2,arr5)
gmarkall added the bug and CUDA (CUDA related issue/PR) labels May 29, 2020
gmarkall self-assigned this May 29, 2020
gmarkall added this to the Numba 0.51 RC milestone May 29, 2020

@gmarkall
Member

Thanks for the report and nice reproducer! I can reproduce this.

@pittnerf

Hi, I have a similar and very annoying problem (OSError: exception: access violation reading ... in nvvm.py); see #4783. I have uploaded a minimal reproducer (see https://gitlab.com/pittnerf1/test-project-for-numba-and-cuda-functions/-/tree/master ) which crashes on both Windows 10 and Google Colab.

@stuartarchibald
Contributor

For me, the reproducer is intermittently segfaulting, and repeated runs reduce the likelihood of segfault.

@gmarkall
Member

I believe #6030 will fix this.

@gmarkall
Member

Closing now that #6030 is merged.

@sgbaird
Contributor

sgbaird commented Aug 20, 2021

Since these (in my opinion, nondescript) access violation errors still pop up on occasion (as one did for me), debugging with the CUDA simulator seems to help pinpoint the source of the error in the code. See my answer to "Numba code compilation for CUDA GPU fails with OSError: exception: access violation reading".
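
For anyone who wants to try the simulator route, here is a minimal sketch of one way to do it. It assumes the NUMBA_ENABLE_CUDASIM environment variable, which has to be set before numba is imported, so the script is started in a fresh interpreter. The helper names and the script path are hypothetical, made up for illustration.

```python
import os
import subprocess
import sys


def simulator_env(base_env=None):
    """Return a copy of the environment with the CUDA simulator enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # The variable is read when numba is imported, so it is passed to a new
    # process instead of being set inside an interpreter that already has
    # numba loaded.
    env["NUMBA_ENABLE_CUDASIM"] = "1"
    return env


def run_under_simulator(script_path):
    """Run a script (hypothetical path) in a new Python process on the simulator."""
    return subprocess.run(
        [sys.executable, script_path],
        env=simulator_env(),
        check=False,  # let the caller inspect the return code
    )
```

Calling run_under_simulator("bug.py") would then execute the kernels as ordinary Python, so an error in user code should show up as a normal Python traceback rather than an access violation.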

@gmarkall
Member

Thanks for linking to the answer. This issue was caused by some transformations applied to the IR that made it invalid by the time it was passed to NVVM. However, for errors in user code, using the simulator can also be a good option. If the simulator is too slow, one can also add the debug=True, opt=False kwargs to the @cuda.jit decorator and run under compute-sanitizer (e.g. compute-sanitizer python bug.py). If there are any source lines with invalid accesses, these will be reported. It's not as interactive / deep a tool for CUDA Python code as the CUDA simulator, but it is a lot faster.
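
A rough sketch of wrapping the compute-sanitizer invocation above in a script, in case that is handy. The helper names are hypothetical, bug.py is just the example file name from the command above, and the kernels in that script are assumed to be decorated with @cuda.jit(debug=True, opt=False).

```python
import shutil
import subprocess
import sys


def sanitizer_command(script_path, python=sys.executable):
    """Build the argument list for: compute-sanitizer <python> <script>."""
    return ["compute-sanitizer", python, script_path]


def run_under_sanitizer(script_path):
    """Run a script under compute-sanitizer, failing early if the tool is missing."""
    if shutil.which("compute-sanitizer") is None:
        raise FileNotFoundError("compute-sanitizer was not found on PATH")
    # Kernels in the script are assumed to use @cuda.jit(debug=True, opt=False)
    # so that invalid accesses can be reported against source lines.
    return subprocess.run(sanitizer_command(script_path), check=False)
```

For example, run_under_sanitizer("bug.py") is equivalent to typing the command by hand; the wrapper only adds the check that the tool is installed.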

@sgbaird
Contributor

sgbaird commented Aug 20, 2021

@gmarkall good distinction. Thank you!
