Cudasim acting differently than Cuda (when allocating) #6055
Comments
With the code as written, the compiler could optimize the local array away. Unfortunately there isn't a way to allocate global memory in a CUDA kernel - this is supported by CUDA C/C++, but has not yet been implemented in Numba.
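For reference, a minimal sketch of the usual workaround — allocating the global memory on the host and passing it into the kernel — is shown below. The kernel name (fill) and sizes are illustrative and not taken from this issue.

from numba import cuda
import numpy as np

@cuda.jit
def fill(out):
    # each thread writes into global memory that was allocated on the host;
    # the kernel itself never allocates a new global array
    i = cuda.grid(1)
    if i < out.shape[0]:
        out[i] = i

out = cuda.device_array(32, dtype=np.float64)  # global memory, allocated host-side
fill[1, 32](out)
print(out.copy_to_host())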
I got a minimal example:

import numpy as np

CUDASIM = False
#CUDASIM = True
if CUDASIM:
    import os
    os.environ['NUMBA_ENABLE_CUDASIM'] = '1'
from numba import cuda, float32, float64, int32

@cuda.jit(device=True)
def f(p):
    a = cuda.local.array(2, float64)
    a[0] = p[0]
    a[1] = p[1]
    b = 1.
    return a, b   ## Variation 1
    #return a     ## Variation 2

@cuda.jit
def kernel(ps, ret_a):
    bidx = cuda.threadIdx.x  # worker index
    if bidx >= len(ps): return
    # Algorithm
    a, b = f(ps[bidx])   ## Variation 1
    #a = f(ps[bidx])     ## Variation 2
    ret_a[bidx, 0] = a[0]
    ret_a[bidx, 1] = a[1]
    return

n = 10
ps = np.random.rand(n, 2)
a = np.zeros((n, 2))
kernel[10, 10](ps, a)
print(a)

With cuda it prints a list of [0. 0.] (the array is discarded), while with cudasim it gives out random numbers (the input values).
Are you using the latest version of Numba (0.50.1)? Variation 1 behaves similarly for me. Variation 2 seems to be a limitation of the current implementation, but that could probably be fixed without too much effort, given that returning an array as part of a tuple appears to be working.
Hello, I have Numba version 0.50.1 installed.
@mha-py Could you post the output of
Here is the output:
This issue is marked as stale as it has had no activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with any updates and confirm that this issue still needs to be addressed.
@mha-py quick question: is this still present in the current version (0.53.1)?
This issue is marked as stale as it has had no activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with any updates and confirm that this issue still needs to be addressed.
Just checked this with Numba 0.54 RC:
I also noticed that this is returning a local array from a function, which isn't expected to work (see also discussion in #7090), so I'm going to close this.
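For reference, a minimal sketch of a pattern that sidesteps returning a local array — the kernel allocates the array and the device function fills it in place — is given below. The names (fill_pair, etc.) are illustrative and not part of this issue.

from numba import cuda, float64

@cuda.jit(device=True)
def fill_pair(p, out):
    # hypothetical device function: writes into a caller-provided array
    # instead of returning a freshly created local array
    out[0] = p[0]
    out[1] = p[1]
    return 1.0  # scalars can still be returned normally

@cuda.jit
def kernel(ps, ret_a):
    i = cuda.grid(1)
    if i < ps.shape[0]:
        a = cuda.local.array(2, float64)  # allocated in the kernel, not in the device function
        b = fill_pair(ps[i], a)
        ret_a[i, 0] = a[0]
        ret_a[i, 1] = a[1]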
Hello!
I had a bug which I finally found, but while debugging I noticed that cudasim was acting differently (no bug) than cuda (bug).
It was something like:
With cuda, "a" seems to be discarded, but with cudasim it has its values which made debugging quite hard. Maybe in cudasim mode a could also be discarded?
I have another question: how do you allocate global memory in a cuda kernel? I only found passing the kernel an array which is allocated on the CPU side, but I can't find a counterpart to cuda.local.array.
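(For reference: the in-kernel allocators Numba provides are cuda.local.array for per-thread memory and cuda.shared.array for per-block memory; global memory has to be allocated on the host, e.g. with cuda.device_array or cuda.to_device, and passed in as an argument. A minimal sketch with illustrative names follows.)

import numpy as np
from numba import cuda, float64

@cuda.jit
def example(out):
    # per-thread scratch, private to each thread
    tmp = cuda.local.array(2, float64)
    # per-block scratch, shared by all threads in the block
    # (assumes a block size of at most 32 threads)
    buf = cuda.shared.array(32, float64)
    i = cuda.grid(1)
    if i < out.shape[0]:
        tmp[0] = out[i]
        buf[cuda.threadIdx.x] = tmp[0] + 1.0
        cuda.syncthreads()
        out[i] = buf[cuda.threadIdx.x]

d_out = cuda.to_device(np.zeros(32))  # global memory: allocated host-side, passed in
example[1, 32](d_out)
print(d_out.copy_to_host())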
And another suggestion: in cudasim you can use print(), while in cuda it throws an error. I think it would be convenient to simply ignore print() in pure cuda mode, because then there would be no need to comment out the print statements when switching between cuda and cudasim mode.
Thanks!