@thuydotm thuydotm commented Apr 26, 2022

When running focal mean on a relatively large data array with more than one pass, cudaErrorIllegalAddress occurred intermittently; see the example below. This was caused by the type conversion being performed at each iteration. Doing the conversion only once, on the first pass, fixed the issue.

[ 50.00%] ··· Running (focal.FocalMean.time_mean--).
[100.00%] ··· focal.FocalMean.time_mean                                                                               1/2 failed
[100.00%] ··· ======= ============ ===========
              --           passes / type      
              ------- ------------------------
                 nx     1 / cupy    10 / cupy 
              ======= ============ ===========
               10000   37.0±0.4ms     failed  
              ======= ============ ===========

[100.00%] ···· For parameters: 10000, 10, 'cupy'
               Traceback (most recent call last):
                 File "cupy_backends/cuda/api/runtime.pyx", line 520, in cupy_backends.cuda.api.runtime.free
                 File "cupy_backends/cuda/api/runtime.pyx", line 132, in cupy_backends.cuda.api.runtime.check_status
               cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
               Exception ignored in: 'cupy.cuda.memory.Memory.__dealloc__'
               Traceback (most recent call last):
                 File "cupy_backends/cuda/api/runtime.pyx", line 520, in cupy_backends.cuda.api.runtime.free
                 File "cupy_backends/cuda/api/runtime.pyx", line 132, in cupy_backends.cuda.api.runtime.check_status
               cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
               Traceback (most recent call last):
                 File "/home/thuydo/miniconda3/envs/benchmarking/lib/python3.9/site-packages/asv/benchmark.py", line 1293, in main_run_server
                   main_run(run_args)
                 File "/home/thuydo/miniconda3/envs/benchmarking/lib/python3.9/site-packages/asv/benchmark.py", line 1167, in main_run
                   result = benchmark.do_run()
                 File "/home/thuydo/miniconda3/envs/benchmarking/lib/python3.9/site-packages/asv/benchmark.py", line 573, in do_run
                   return self.run(*self._current_params)
                 File "/home/thuydo/miniconda3/envs/benchmarking/lib/python3.9/site-packages/asv/benchmark.py", line 669, in run
                   samples, number = self.benchmark_timing(timer, min_repeat, max_repeat,
                 File "/home/thuydo/miniconda3/envs/benchmarking/lib/python3.9/site-packages/asv/benchmark.py", line 705, in benchmark_timing
                   timing = timer.timeit(number)
                 File "/home/thuydo/xarray-spatial/benchmarks/.asv/env/56f724e5012b5dd3507b0bc0039231e3/lib/python3.9/timeit.py", line 177, in timeit
                   timing = self.inner(it, self.timer)
                 File "<timeit-src>", line 6, in inner
                 File "/home/thuydo/miniconda3/envs/benchmarking/lib/python3.9/site-packages/asv/benchmark.py", line 627, in <lambda>
                   func = lambda: self.func(*param)
                 File "/home/thuydo/xarray-spatial/benchmarks/benchmarks/focal.py", line 46, in time_mean
                   mean(self.agg, passes)
                 File "/home/thuydo/xarray-spatial/benchmarks/.asv/env/56f724e5012b5dd3507b0bc0039231e3/lib/python3.9/site-packages/xrspatial/focal.py", line 211, in mean
                   out = _mean(out, tuple(excludes))
                 File "/home/thuydo/xarray-spatial/benchmarks/.asv/env/56f724e5012b5dd3507b0bc0039231e3/lib/python3.9/site-packages/xrspatial/focal.py", line 103, in _mean
                   agg = xr.DataArray(data.astype(float))
                 File "cupy/_core/core.pyx", line 463, in cupy._core.core.ndarray.astype
                 File "cupy/_core/core.pyx", line 522, in cupy._core.core.ndarray.astype
                 File "cupy/_core/core.pyx", line 171, in cupy._core.core.ndarray.__init__
                 File "cupy/cuda/memory.pyx", line 698, in cupy.cuda.memory.alloc
                 File "cupy/cuda/memory.pyx", line 1375, in cupy.cuda.memory.MemoryPool.malloc
                 File "cupy/cuda/memory.pyx", line 1396, in cupy.cuda.memory.MemoryPool.malloc
                 File "cupy/cuda/memory.pyx", line 1076, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
                 File "cupy/cuda/memory.pyx", line 1097, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
                 File "cupy/cuda/memory.pyx", line 1315, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
                 File "cupy/cuda/memory.pyx", line 1312, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
                 File "cupy/cuda/memory.pyx", line 1047, in cupy.cuda.memory.SingleDeviceMemoryPool._alloc
                 File "cupy/cuda/memory.pyx", line 592, in cupy.cuda.memory._malloc
                 File "cupy/cuda/memory.pyx", line 593, in cupy.cuda.memory._malloc
                 File "cupy/cuda/memory.pyx", line 102, in cupy.cuda.memory.Memory.__init__
                 File "cupy_backends/cuda/api/runtime.pyx", line 455, in cupy_backends.cuda.api.runtime.malloc
                 File "cupy_backends/cuda/api/runtime.pyx", line 132, in cupy_backends.cuda.api.runtime.check_status
               cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
               asv: benchmark failed (exit status 1)
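
The change described above (convert the data type once, then run the passes) can be sketched as follows. This is a minimal illustration, not the merged patch: `mean_pass` and `focal_mean` are hypothetical names, NumPy stands in for CuPy so the snippet runs without a GPU, and the 3x3 window with unchanged edge cells is an assumption rather than the library's actual kernel.

```python
import numpy as np


def mean_pass(data):
    """One 3x3 focal-mean pass over a 2D float array.

    Edge cells are copied through unchanged (an assumption for this sketch).
    """
    rows, cols = data.shape
    out = data.copy()
    # Accumulate the nine shifted views that cover each interior cell's window.
    acc = np.zeros((rows - 2, cols - 2), dtype=data.dtype)
    for dy in range(3):
        for dx in range(3):
            acc += data[dy:dy + rows - 2, dx:dx + cols - 2]
    out[1:-1, 1:-1] = acc / 9.0
    return out


def focal_mean(data, passes=1):
    """Apply `passes` smoothing passes, converting the dtype only once."""
    # Hoisted conversion: a single astype() before the loop, so no new
    # converted copy of the array is allocated on every pass.
    out = np.asarray(data).astype(float)
    for _ in range(passes):
        out = mean_pass(out)  # each pass receives float data already
    return out


if __name__ == "__main__":
    grid = np.arange(9).reshape(3, 3)  # integer input
    print(focal_mean(grid, passes=10))
```

Keeping the conversion outside the loop means each pass allocates only its own output, which is the behavior the description credits with avoiding the failure on the 10-pass benchmark.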

@thuydotm thuydotm changed the title from "focal.mean(): free memory after each iteration" to "focal.mean(): only do data type conversion once" Apr 26, 2022
@thuydotm thuydotm merged commit 2c1220b into master Apr 26, 2022
@thuydotm thuydotm deleted the focal_mean_gpu branch June 1, 2022 04:29