
'out of resources' error returned from OpenCL code on NVIDIA cards with memory >8 GB #30

Closed
mwibral opened this issue Feb 8, 2019 · 9 comments


@mwibral
Collaborator

mwibral commented Feb 8, 2019

There is an issue with NVIDIA cards with more (!) than 8 GB of memory ironically reporting an 'out of resources' error some time into the computation (e.g. when running systemtest_lorenz2_opencl.py). Cards of the same chip architecture with up to 8 GB do not seem to have that problem, e.g.:
Cards running fine: Titan 1st gen. (Kepler, 6 GB), GTX 1080 (Pascal, 8 GB)
Cards returning errors: Quadro P6000 (24 GB), Tesla V100 (32 GB)

@pwollstadt
Owner

Could not reproduce this on a GeForce GTX TITAN X with 12 GB main memory (running Ubuntu 14.04.5 LTS). Maybe collecting devices/setups that do and don't work helps to narrow down the list of possible causes?

@orlandi

orlandi commented Mar 5, 2019

Maybe collecting devices/setups that do and don't work helps to narrow down the list of possible causes?

In case it helps, I was able to run systemtest_lorenz2_opencl.py successfully on GeForce RTX 2080 Ti (Turing, 11 GB) and Tesla K20c (Kepler, 5 GB) cards.

@orlandi

orlandi commented Mar 7, 2019

I spoke too soon. Although the Lorenz code did run, I'm experiencing the same issue when using my own data: an OUT_OF_RESOURCES error on the 2080 Ti (11 GB), but no problems on the Tesla K20c (5 GB) or when using CPUs.
I'm running the standard multivariate TE with the OpenCLKraskovCMI estimator:

from idtxl.multivariate_te import MultivariateTE
from idtxl import idtxl_io as io
import pickle

# Load the MATLAB v7.3 array; dim_order 'rps' = replications x processes x samples.
data = io.import_matarray(
        file_name='test.mat',
        array_name='XR',
        dim_order='rps',
        file_version='v7.3',
        normalise=False)

network_analysis = MultivariateTE()
settings = {'cmi_estimator': 'OpenCLKraskovCMI',
            'max_lag_sources': 3,
            'min_lag_sources': 1}

results = network_analysis.analyse_network(settings=settings, data=data)

pickle.dump(results, open('results.p', 'wb'))

The data structure contains 16 processes, 46 samples, and 1106 replications.
With 200 replications it runs fine, but with the above number it results in the following error when computing sources for the first target:


---------------------------- (2) include source candidates
candidate set: [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3), (5, 1), (5, 2), (5, 3), (6, 1), (6, 2), (6, 3), (7, 1), (7, 2), (7, 3), (8, 1), (8, 2), (8, 3), (9, 1), (9, 2), (9, 3), (10, 1), (10, 2), (10, 3), (11, 1), (11, 2), (11, 3), (12, 1), (12, 2), (12, 3), (13, 1), (13, 2), (13, 3), (14, 1), (14, 2), (14, 3), (15, 1), (15, 2), (15, 3)]
testing candidate: (14, 1) maximum statistic, n_perm: 200
Traceback (most recent call last):
  File "multivariateTEtestR.py", line 29, in <module>
    results = network_analysis.analyse_network(settings=settings, data=data)
  File "/home/benuccilab/IDTxl/idtxl/multivariate_te.py", line 159, in analyse_network
    settings, data, targets[t], sources[t])
  File "/home/benuccilab/IDTxl/idtxl/multivariate_te.py", line 276, in analyse_single_target
    self._include_source_candidates(data)
  File "/home/benuccilab/IDTxl/idtxl/network_inference.py", line 826, in _include_source_candidates
    self._include_candidates(candidates, data)
  File "/home/benuccilab/IDTxl/idtxl/network_inference.py", line 120, in _include_candidates
    conditional=self._selected_vars_realisations)
  File "/home/benuccilab/IDTxl/idtxl/estimator.py", line 278, in estimate_parallel
    return self.estimate(n_chunks=n_chunks, **data)
  File "/home/benuccilab/IDTxl/idtxl/estimators_opencl.py", line 539, in estimate
    n_chunks_current_run)
  File "/home/benuccilab/IDTxl/idtxl/estimators_opencl.py", line 680, in _estimate_single_run
    cl.enqueue_copy(self.queue, distances, d_distances)
  File "/home/benuccilab/conda/envs/idtxl/lib/python3.7/site-packages/pyopencl/__init__.py", line 1709, in enqueue_copy
    return _cl._enqueue_read_buffer(queue, src, dest, **kwargs)
pyopencl._cl.RuntimeError: clEnqueueReadBuffer failed: OUT_OF_RESOURCES

I did check memory usage on the card, and it was always very low, less than 1 GB.
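
For reference, the failing shape can be reproduced without test.mat by building an IDTxl Data object from synthetic values; a minimal sketch (the random array is a stand-in for the original data):

import numpy as np
from idtxl.data import Data

# Synthetic stand-in with the failing shape:
# 16 processes x 46 samples x 1106 replications ('psr' ordering).
data = Data(np.random.randn(16, 46, 1106), dim_order='psr', normalise=False)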

@mwibral
Collaborator Author

mwibral commented Mar 7, 2019 via email

@mwibral
Collaborator Author

mwibral commented Apr 3, 2019

Some more info, now that I am testing on multiple machines, including AMD ones.

(1) On two machines with a Vega 64 and AMD ROCm's OpenCL, I get from Python/pyopencl:

Memory access fault by GPU node-1 (Agent handle: 0x564110c33270) on address 0xa02a00000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

Note that the address 0xa02a00000 is identical on both systems, although the cards are slightly different (a regular Vega 64, 8 GB, and a WX9100, 16 GB, Radeon Pro model).

dmesg returns:

gmc_v9_0_process_interrupt: 6 callbacks suppressed
[Wed Apr 3 18:05:00 2019] amdgpu 0000:19:00.0: [gfxhub] no-retry page fault (src_id:0 ring:24 vmid:8 pasid:32768, for process pid 0 thread pid 0)
[Wed Apr 3 18:05:00 2019] amdgpu 0000:19:00.0: in page starting at address 0x0000000a02a00000 from 27
[Wed Apr 3 18:05:00 2019] amdgpu 0000:19:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00801030

(2) On an AMD APU with the amdgpu-pro driver I simply get a system crash. This happens if I run more than 37 or 38 replications in systemtest_lorenz2_opencl.py.

(3) Using the -older- develop version that Aaron is running in Frankfurt (obtained from Patricia via email, I think?), I get a different error, and much earlier in the process:
pyopencl._cl.RuntimeError: clCreateSubBuffer failed: MISALIGNED_SUB_BUFFER_OFFSET

Googling for this last error message turns up posts (https://stackoverflow.com/questions/17575032/using-clcreatesubbuffer) indicating that memory management should be done in relation to the device property CL_DEVICE_MEM_BASE_ADDR_ALIGN (maybe we can use the padding we had constructed for the AMD S10000 in a more flexible way to satisfy these requirements?).
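
As a quick check, the required alignment can be queried per device with pyopencl; a minimal sketch:

import pyopencl as cl

for platform in cl.get_platforms():
    for device in platform.get_devices():
        # CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in bits; sub-buffer
        # offsets must be a multiple of this value converted to bytes.
        align_bytes = device.get_info(cl.device_info.MEM_BASE_ADDR_ALIGN) // 8
        print(device.name, 'needs sub-buffer offsets aligned to', align_bytes, 'bytes')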

@georgedimitriadis

Hi,

Just to chip in with the same error.

I am running Windows 10 on an Intel i7-5960X with 2 GeForce Titan X (12 GB) cards.
I am on a miniconda / Python 3.6 system and everything seems to be working fine (OpenCL drivers for everything, all dependencies up and running, etc.).

I have a data set with 839 processes, 14422 samples, 1 replication.

I run the following code:

network_analysis = MultivariateTE()
settings_gpu = {'cmi_estimator': 'OpenCLKraskovCMI',
                'gpuid': 1,
                'max_lag_sources': 8,
                'min_lag_sources': 1,
                'max_lag_target': 4}
results = network_analysis.analyse_network(settings=settings_gpu, data=data)

and I get the error:

Traceback (most recent call last):
  File "<stdin>", line 19, in <module>
  File "e:\software\develop\source\repos\idtxl\idtxl\multivariate_te.py", line 159, in analyse_network
    settings, data, targets[t], sources[t])
  File "e:\software\develop\source\repos\idtxl\idtxl\multivariate_te.py", line 276, in analyse_single_target
    self._include_source_candidates(data)
  File "e:\software\develop\source\repos\idtxl\idtxl\network_inference.py", line 826, in _include_source_candidates
    self._include_candidates(candidates, data)
  File "e:\software\develop\source\repos\idtxl\idtxl\network_inference.py", line 120, in _include_candidates
    conditional=self._selected_vars_realisations)
  File "e:\software\develop\source\repos\idtxl\idtxl\estimator.py", line 278, in estimate_parallel
    return self.estimate(n_chunks=n_chunks, **data)
  File "e:\software\develop\source\repos\idtxl\idtxl\estimators_opencl.py", line 539, in estimate
    n_chunks_current_run)
  File "e:\software\develop\source\repos\idtxl\idtxl\estimators_opencl.py", line 680, in _estimate_single_run
    cl.enqueue_copy(self.queue, distances, d_distances)
  File "E:\Software\Develop\Languages\Pythons\Miniconda35\lib\site-packages\pyopencl\__init__.py", line 1712, in enqueue_copy
    return _cl._enqueue_read_buffer(queue, src, dest, **kwargs)
pyopencl._cl.RuntimeError: clEnqueueReadBuffer failed: OUT_OF_RESOURCES

Any help with this?

Thanks

pwollstadt pushed a commit that referenced this issue Aug 23, 2019
Replace unsigned int types in OpenCL/CUDA code. For very large point
sets this leads to an overflow and incorrect indexing of arrays.
Add test scripts.

Update CUDA makefile.

Fixes #30.
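
For illustration, the overflow the commit message describes can be reproduced with 32-bit index arithmetic in numpy (the sizes and the row * signallength indexing scheme are assumptions for the example, chosen to match the magnitudes discussed in this thread):

import numpy as np

signallength = 7341056   # e.g. a padded pointset of ~7.3 million points
row = 600                # hypothetical row into a (rows x signallength) buffer
idx32 = np.uint32(row) * np.uint32(signallength)  # wraps modulo 2**32 (numpy may warn)
idx64 = np.uint64(row) * np.uint64(signallength)  # correct 64-bit index
print(int(idx32), int(idx64))  # 109666304 vs. 4404633600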
@mwibral
Collaborator Author

mwibral commented Dec 12, 2019

An update on this issue (after the fix of the int index that's already included in the branch fix_gpu_bug):

Unfortunately, there are still errors if the product n_points * dim * n_chunks exceeds a certain threshold AND the padding is used (where necessary), i.e. if the number of data points that go to the GPU card is not a multiple of 1024. In that case the computation on the GPU runs (as seen by the time elapsed until the error), but there is a memory access violation when returning, leading to the following error messages:
(AMD) Memory access fault by GPU node-1 (Agent handle: 0x562731f06a00) on address 0xa06200000. Reason: Page not present or supervisor privilege.
(NVIDIA) clEnqueueReadBuffer failed: OUT_OF_RESOURCES

This does not happen when the data that goes to the GPU is a multiple of 1024, i.e. when we pad with zero points, or when we switch off padding altogether (the latter only works on NVIDIA cards, see below).
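
A minimal sketch of such zero-padding (a hypothetical helper, not the IDTxl implementation):

import numpy as np

def pad_pointset(pointset, multiple=1024):
    # Zero-pad along the points axis so the number of points becomes a
    # multiple of `multiple`; returns the padded set and the pad count.
    dim, n_points = pointset.shape
    n_padded = -(-n_points // multiple) * multiple  # round up to next multiple
    padded = np.zeros((dim, n_padded), dtype=pointset.dtype)
    padded[:, :n_points] = pointset
    return padded, n_padded - n_points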

Note that the padding is only necessary on cards that need manual sub-buffer alignment (AMD cards). So on NVIDIA cards a simple solution would be to detect the manufacturer and switch off the padding altogether.

On some AMD cards that only provide OpenCL 1.2 capabilities (e.g. the Lexa XT chip and the old Hawaii chips) there seems to be no problem with the padding, for reasons unknown. So for AMD cards that provide only OpenCL 1.2 capabilities the solution could be to detect the capabilities and use the padding as is; a sketch of both checks follows below.
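
Both checks are straightforward with pyopencl; a sketch of the suggested detection logic (untested, with device selection simplified):

import pyopencl as cl

device = cl.get_platforms()[0].get_devices()[0]    # simplified device selection
is_nvidia = 'NVIDIA' in device.vendor              # e.g. 'NVIDIA Corporation'
is_cl12 = device.version.startswith('OpenCL 1.2')  # e.g. 'OpenCL 1.2 CUDA'
use_padding = not is_nvidia  # suggested policy: no padding on NVIDIA cards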

Btw. running the OpenCL code on a multicore CPU using Intel's OpenCL implementation also works (it's just 100x slower), so I guess there are no really gross errors in the implementation of the actual OpenCL kernel.

The remaining problems on AMD cards with the ROCm driver and OpenCL 2.0 capabilities (definitely Vega, possibly Polaris and Fiji) need to be solved in the OpenCL code. It is also possible that there is an OpenCL 2.0 issue, possibly in pyopencl.

I would be very glad if someone else could confirm the above observations by:

  1. cloning the latest repo
  2. switching to the branch fix_gpu_bug
  3. cd to ..../IDTxl/dev/search_GPU/deliverable2_1
  4. run $> time python test_opencl_search.py --gpuid 0 -p 3670018 -d 2 -c 2 --padding
    (step 4 will take somewhere between 5 and 50 minutes, depending on your GPU; if it crashes you'll get an error message; the code will almost certainly not hang, just give it some time :-))

and then report:
(A) whether there was a crash or not, i.e. the output of the above command
(B) OS, GPU type, GPU driver, GPU RAM size, and the output of clinfo (which includes the OpenCL 1.2 vs. OpenCL 2.0 capabilities)
(C) results for other point sizes, dimensions, and chunk numbers

@Markwelt

Not sure if this is still open or under consideration, but anyway...
On Windows with an NVIDIA GPU with 8 GB of memory and OpenCL 1.2, the test in deliverable2_1 fails, though demo_multivariate_te.py (with 'cmi_estimator': 'OpenCLKraskovCMI') works on the branch fix_gpu_bug (and not on the updated main master of IDTxl). More info below.

OS: Win 10 (Enterprise ver 1909 build 18363.778); GPU: NVIDIA GeForce RTX 2070; pyOpenCL: pyopencl-2020.2+cl12-cp38-cp38-win_amd64.whl (max 1.2 on NVIDIA as you know)
python test_opencl_search.py --gpuid 0 -p 3670018 -d 2 -c 2 --padding gives:

gpuid: 0
Applying padding to multiple of 1024

pointset: 56.01, TOTAL: 112.02 MB, PADDING: 1020
pointset shape: (2, 7341056)
pointset shape % n_chunks: 0 (chunkkength: 3670018)

Selected Device: GeForce RTX 2070
DEBUG:pyopencl.cache:build program: binary cache hit (key: 8b09c644ab05a53347cd9f1e19d108a3)
DEBUG:pytools.persistent_dict:pyopencl-invoker-cache-v7: disk cache hit [key=9830e4e464ac850c53a15d29fb1ea0822392841646effb7ab7351bc85139fdf0]
INFO:pyopencl:build program: kernel 'kernelKNNshared' was part of a lengthy cache retrieval (0.50 s)
DEBUG:pytools.persistent_dict:pyopencl-invoker-cache-v7: disk cache hit [key=b0be9d43833a6a67bdabcaf743eafe38855ffce6660ec5502b12c36615acb806]
DEBUG:pytools.persistent_dict:pyopencl-invoker-cache-v7: disk cache hit [key=b4efd189bde0f677b380b90ad80abcdf9858b798940c0e9f9b0bfc8b6beb4077]
INFO:pyopencl:build program: kernel 'kernelBFRSAllshared' was part of a lengthy cache retrieval (0.50 s)
DEBUG:pytools.persistent_dict:pyopencl-invoker-cache-v7: disk cache hit [key=fb92d9fdefe95aeac01927058216ade6f72b3d21cc609bafdbf44c1b956191a5]
Pointset: 14682112 elements, dim 2x7341056, 2 chunks (chunklength: 3670018).
workitems_x: 256, NDrange_x: 7341056
device distances size: 58728448
host distances size: 14682112
clEnqueueReadBuffer failed: OUT_OF_RESOURCES
Execution time: 6.07 min

!!! GPU execution failed

There is no clinfo on Windows, so below is a shortened version of a GPU Caps Viewer report with more GPU and OpenCL info.

===================================================
GPU Caps Viewer v1.45.1.0 report
===================================[ System / CPU ]

  • CPU Name: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz

  • CPU Core Speed: 3695 MHz

  • CPU logical cores: 12

  • Family: 6 - Model: 14 - Stepping: 10

  • Physical Memory Size: 65536 MB

  • Operating System: Windows 10 64-bit build 18363

  • PhysX Version: 9190218
    ===================================[ Graphics Adapters / GPUs ]

  • Current Display Mode: 1920x1080 @ 60 Hz - 32 bpp

  • Num GPUs: 1

  • GPU 1

    • Name: NVIDIA GeForce RTX 2070
    • GPU codename: TU106-400
    • Device ID: 10DE-1F02
    • Subdevice ID: 3842-2070
    • Revision ID: A1
    • GPU brand type: GeForce
    • Driver: 27.21.14.5122 (R451.22)
    • Shader cores: 2304
    • Texture units: 144
    • ROP units: 64
    • TDP: 175W
    • BIOS version: 90.06.18.40.0c
    • Memory size: 8191MB
    • Memory type: GDDR6
    • Memory bus width: 256-bit
    • GPU base clock: 1410 MHz
      ===================================[ OpenGL GPU Capabilities ]
  • GL_VENDOR: NVIDIA Corporation

  • GL_RENDERER: GeForce RTX 2070/PCIe/SSE2

  • GL_VERSION: 4.6.0 NVIDIA 451.22

  • GL_SHADING_LANGUAGE_VERSION: 4.60 NVIDIA
    ===================================[ NVIDIA CUDA Capabilities ]

  • CUDA Device 0

    • Device name: GeForce RTX 2070
    • PCI bus ID: 1
    • Compute Capability: 7.5
    • Total memory: 4095 MB
    • Peak memory bandwidth: 448 GB/s
    • L2 cache: 4 MB
    • Core clock rate: 1620 MHz
    • Memory clock rate: 7001 MHz
    • Multiprocessors (SMs): 36
    • CUDA cores per SM: 64
    • CUDA cores: 2304
    • Async engines: 3
    • Warp Size: 32
    • Max Threads Per Block: 1024
    • Threads Per Block: 1024 x 1024 x 64
    • Grid Size: 2147483647 x 65535 x 65535
    • Registers Per Block: 65536
    • Texture Alignment: 512 byte
    • Total Constant Memory: 64 Kb
      ===================================[ OpenCL Capabilities ]
  • Num OpenCL platforms: 1

  • CL_PLATFORM_NAME: NVIDIA CUDA

  • CL_PLATFORM_VENDOR: NVIDIA Corporation

  • CL_PLATFORM_VERSION: OpenCL 1.2 CUDA 11.0.186

  • CL_PLATFORM_PROFILE: FULL_PROFILE

  • Num devices: 1

    • CL_DEVICE_NAME: GeForce RTX 2070
    • CL_DEVICE_VENDOR: NVIDIA Corporation
    • CL_DRIVER_VERSION: 451.22
    • CL_DEVICE_PROFILE: FULL_PROFILE
    • CL_DEVICE_VERSION: OpenCL 1.2 CUDA
    • CL_DEVICE_TYPE: GPU
    • CL_DEVICE_VENDOR_ID: 0x10DE
    • CL_DEVICE_MAX_COMPUTE_UNITS: 36
    • CL_DEVICE_MAX_CLOCK_FREQUENCY: 1620MHz
    • CL_NV_DEVICE_COMPUTE_CAPABILITY_MAJOR: 7
    • CL_NV_DEVICE_COMPUTE_CAPABILITY_MINOR: 5
    • CL_NV_DEVICE_REGISTERS_PER_BLOCK: 65536
    • CL_NV_DEVICE_WARP_SIZE: 32
    • CL_NV_DEVICE_GPU_OVERLAP: 1
    • CL_NV_DEVICE_KERNEL_EXEC_TIMEOUT: 1
    • CL_NV_DEVICE_INTEGRATED_MEMORY: 0
    • CL_DEVICE_ADDRESS_BITS: 32
    • CL_DEVICE_MAX_MEM_ALLOC_SIZE: 2097152KB
    • CL_DEVICE_GLOBAL_MEM_SIZE: 8192MB
    • CL_DEVICE_MAX_PARAMETER_SIZE: 4352
    • CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 128 Bytes
    • CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 1152KB
    • CL_DEVICE_ERROR_CORRECTION_SUPPORT: NO
    • CL_DEVICE_LOCAL_MEM_TYPE: Local (scratchpad)
    • CL_DEVICE_LOCAL_MEM_SIZE: 48KB
    • CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64KB
    • CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
    • CL_DEVICE_MAX_WORK_ITEM_SIZES: [1024 ; 1024 ; 64]
    • CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
    • CL_EXEC_NATIVE_KERNEL: 6333188
    • CL_DEVICE_IMAGE_SUPPORT: YES
    • CL_DEVICE_MAX_READ_IMAGE_ARGS: 256
    • CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 32
    • CL_DEVICE_IMAGE2D_MAX_WIDTH: 32768
    • CL_DEVICE_IMAGE2D_MAX_HEIGHT: 32768
    • CL_DEVICE_IMAGE3D_MAX_WIDTH: 16384
    • CL_DEVICE_IMAGE3D_MAX_HEIGHT: 16384
    • CL_DEVICE_IMAGE3D_MAX_DEPTH: 16384
    • CL_DEVICE_MAX_SAMPLERS: 32
    • CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 1
    • CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
    • CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
    • CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
    • CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
    • CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1
    • CL_DEVICE_EXTENSIONS: 17
    • Extensions:
      • cl_khr_global_int32_base_atomics
      • cl_khr_global_int32_extended_atomics
      • cl_khr_local_int32_base_atomics
      • cl_khr_local_int32_extended_atomics
      • cl_khr_fp64
      • cl_khr_byte_addressable_store
      • cl_khr_icd
      • cl_khr_gl_sharing
      • cl_nv_compiler_options
      • cl_nv_device_attribute_query
      • cl_nv_pragma_unroll
      • cl_nv_d3d10_sharing
      • cl_khr_d3d10_sharing
      • cl_nv_d3d11_sharing
      • cl_nv_copy_opts
      • cl_nv_create_buffer
      • cl_khr_int64_base_atomics

pwollstadt pushed a commit that referenced this issue Nov 16, 2020
Replace unsigned int types in OpenCL/CUDA code. For very large point
sets this leads to an overflow and incorrect indexing of arrays.
Add test scripts.

Update CUDA makefile.

Fixes #30.
pwollstadt pushed a commit that referenced this issue Nov 17, 2020
Replace unsigned int types in OpenCL/CUDA code. For very large point
sets this leads to an overflow and incorrect indexing of arrays.
Add test scripts.

Update CUDA makefile.

Fixes #30.
@mwibral
Collaborator Author

mwibral commented Dec 9, 2020

I have uploaded a preliminary bugfix for this problem. See branch OpenCL_bugfix. Testing is appreciated.

@mwibral mwibral closed this as completed Dec 9, 2020
pwollstadt pushed a commit that referenced this issue Feb 25, 2021
…riables: signallength_padded and signallength_orig; I set the padding default to true and made callers aware of the additional argument in the OpenCL kernels

Fixes #30.