"access violation" problem #5774

yhnmj6666 · 2020-05-29T14:05:13Z

Reporting a bug

I am using the latest released version of Numba
I have included below a minimal working reproducer

Thank you all for developing Numba! I'm using Numba 0.49.1 with CUDA 10.2.89 on Python 3.7.5, Windows 10. My code sometimes fails to execute with OSError: exception: access violation reading 0x<non-zero-non-0xFF...address>.

cuda-memcheck.exe shows nothing.

The access violation always occurs in numba\cuda\cudadrv\nvvm.py, in compile(self, **options), at line 231 (as shown in the traceback below): err = self.driver.nvvmCompileProgram(self._handle, len(opts), c_opts)

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
 in 
     39 arr2 = np.arange(16,dtype=np.int8)
     40 arr5 = np.zeros(16*16,dtype=np.int8)
---> 41 foo[2, 16](arr1,arr2,arr5)   # more threads than array elements

V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in __call__(self, *args)
    756         Specialize and invoke this kernel with *args*.
    757         '''
--> 758         kernel = self.specialize(*args)
    759         cfg = kernel[self.griddim, self.blockdim, self.stream, self.sharedmem]
    760         cfg(*args)

V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in specialize(self, *args)
    767         argtypes = tuple(
    768             [self.typingctx.resolve_argument_type(a) for a in args])
--> 769         kernel = self.compile(argtypes)
    770         return kernel
    771 

V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in compile(self, sig)
    786             self.definitions[(cc, argtypes)] = kernel
    787             if self.bind:
--> 788                 kernel.bind()
    789         return kernel
    790 

V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in bind(self)
    528         Force binding to current CUDA context
    529         """
--> 530         self._func.get()
    531 
    532     @property

V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in get(self)
    405         cufunc = self.cache.get(device.id)
    406         if cufunc is None:
--> 407             ptx = self.ptx.get()
    408 
    409             # Link

V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in get(self)
    376             arch = nvvm.get_arch_option(*cc)
    377             ptx = nvvm.llvm_to_ptx(self.llvmir, opt=3, arch=arch,
--> 378                                    **self._extra_options)
    379             self.cache[cc] = ptx
    380             if config.DUMP_ASSEMBLY:

V:\Program Files\Python\lib\site-packages\numba\cuda\cudadrv\nvvm.py in llvm_to_ptx(llvmir, **opts)
    493     cu.add_module(libdevice.get())
    494 
--> 495     ptx = cu.compile(**opts)
    496     # XXX remove debug_pubnames seems to be necessary sometimes
    497     return patch_ptx_debug_pubnames(ptx)

V:\Program Files\Python\lib\site-packages\numba\cuda\cudadrv\nvvm.py in compile(self, **options)
    229         c_opts = (c_char_p * len(opts))(*[c_char_p(x.encode('utf8'))
    230                                           for x in opts])
--> 231         err = self.driver.nvvmCompileProgram(self._handle, len(opts), c_opts)
    232         self._try_error(err, 'Failed to compile\n')
    233 

OSError: exception: access violation reading 0x000002087936AB58

If I run the program again, it shows CudaAPIError: [500] Call to cuModuleGetFunction results in CUDA_ERROR_NOT_FOUND.

Here's a reproducer that works on my machine:

import numpy as np
import numba
from numba import cuda

sbox = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],dtype=np.int8)
pbox = np.array([15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0],dtype=np.int8)

@cuda.jit(device=True)
def bar(txt,el,r):
  cudas_box=cuda.const.array_like(sbox)
  cudap_box=cuda.const.array_like(pbox)
  if r!=0:
    temp=cuda.local.array(16,numba.int8)

    # the error does not occur if either for loop is removed
    for i in range(16):
      temp[i]=cudas_box[txt[cudap_box[i]]]
    for i in range(16):
      txt[i]=temp[i]
    
  return txt[10]+el

@cuda.jit(debug=True)
def foo(arr1,arr2,arr5):
  tx = cuda.threadIdx.x
  ty = cuda.blockIdx.x
  bw = cuda.blockDim.x
  pos = tx + ty * bw

  pos1=pos%16
  pos2=(pos//16)%16
  if pos<len(arr5):
    roundtext=cuda.local.array(16,numba.int8)
    for i in range(16):
      roundtext[i]=arr1[pos1][i]
    arr5[pos] = bar(roundtext,arr2[pos2],pos%2)

arr1 = np.arange(16,dtype=np.int8).repeat(16).reshape(16,16)
arr2 = np.arange(16,dtype=np.int8)
arr5 = np.zeros(16*16,dtype=np.int8)
foo[2, 16](arr1,arr2,arr5)


gmarkall · 2020-05-29T14:12:43Z

Thanks for the report and nice reproducer! I can reproduce this.

pittnerf · 2020-05-31T18:21:45Z

Hi, I have a similar problem, and it is very annoying (OSError: exception: access violation reading ... in nvvm.py); see #4783. I have uploaded a minimal reproducer (see https://gitlab.com/pittnerf1/test-project-for-numba-and-cuda-functions/-/tree/master ) which crashes on both Windows 10 and Google Colab.

stuartarchibald · 2020-06-01T10:33:14Z

For me, the reproducer is intermittently segfaulting, and repeated runs reduce the likelihood of segfault.
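One way to put a number on "intermittently" is to run the reproducer several times, each in a fresh process, and tally the exit codes. The following is only a sketch: `tally_exit_codes` is a hypothetical helper, not part of Numba, and `repro.py` is an assumed file name for the reproducer above.

```python
import subprocess
import sys
from collections import Counter


def tally_exit_codes(cmd, runs=20):
    """Run cmd `runs` times, each in a fresh process, and count exit codes."""
    codes = Counter()
    for _ in range(runs):
        # Output is discarded; only the exit status of each run is recorded.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        codes[result.returncode] += 1
    return codes


# Example (repro.py is a hypothetical file holding the reproducer):
#   print(tally_exit_codes([sys.executable, "repro.py"]))
```

An exit code of 0 means the run completed cleanly. On POSIX systems a negative code means the process was killed by a signal (for example -11 for SIGSEGV on Linux), so the counts give a rough failure rate per batch of runs.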

gmarkall · 2020-07-29T14:39:19Z

I believe #6030 will fix this.

gmarkall · 2020-07-30T20:33:23Z

Closing now that #6030 is merged.

sgbaird · 2021-08-20T09:20:46Z

Since these (in my opinion, nondescript) access violation errors still pop up on occasion (as one did for me), debugging with the CUDA simulator seems to help pinpoint the source of the error in the code. See my answer to "Numba code compilation for CUDA GPU fails with OSError: exception: access violation reading".

gmarkall · 2021-08-20T15:03:07Z

Thanks for linking to the answer. This issue was caused by some transformations applied to the IR that made it invalid by the time it was passed to NVVM. For errors in user code, however, using the simulator can also be a good option. If the simulator is too slow, one can instead add the debug=True, opt=False kwargs to the @cuda.jit decorator and run under compute-sanitizer (e.g. compute-sanitizer python bug.py); any source lines with invalid accesses will then be reported. It's not as interactive or deep a tool for CUDA Python code as the CUDA simulator, but it is a lot faster.
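The two debugging routes described here can be sketched as a small launcher. This is a minimal sketch under stated assumptions, not an official Numba tool: `simulator_env` and `sanitizer_command` are hypothetical helper names, `bug.py` is the script name from the comment, and the sanitizer route assumes the kernel in that script is already declared with `@cuda.jit(debug=True, opt=False)`. The simulator is switched on through the `NUMBA_ENABLE_CUDASIM` environment variable.

```python
import os
import sys


def simulator_env(base=None):
    """Return a copy of the environment with the CUDA simulator enabled."""
    env = dict(os.environ if base is None else base)
    # Numba reads this variable when it is imported, so it has to be set
    # before the script under test imports numba.
    env["NUMBA_ENABLE_CUDASIM"] = "1"
    return env


def sanitizer_command(script, python=sys.executable):
    """Build the compute-sanitizer command line for a Python script."""
    return ["compute-sanitizer", python, script]


# Example use (bug.py is the script name from the comment above):
#   import subprocess
#   subprocess.run([sys.executable, "bug.py"], env=simulator_env())
#   subprocess.run(sanitizer_command("bug.py"))
```

The first call runs the kernels as ordinary Python, so a debugger or print statements can be used inside them; the second runs the real GPU code and relies on compute-sanitizer being installed and on the PATH.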

sgbaird · 2021-08-20T19:43:46Z

@gmarkall good distinction. Thank you!

gmarkall added the bug and CUDA (CUDA related issue/PR) labels May 29, 2020

gmarkall self-assigned this May 29, 2020

gmarkall added this to the Numba 0.51 RC milestone May 29, 2020

gmarkall closed this as completed Jul 30, 2020
