New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Records in CUDA Const Memory #3186
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3186 +/- ##
=========================================
- Coverage 81.13% 81.1% -0.03%
=========================================
Files 384 386 +2
Lines 75110 76398 +1288
Branches 8434 8590 +156
=========================================
+ Hits 60937 61964 +1027
- Misses 12884 13114 +230
- Partials 1289 1320 +31 |
This approach may have performance issues. Since GPU is very sensitive to memory load latency/stall, we need to verify what the emitted instruction sequence is. i.e. don't want to load a float32 as 4 byte load instructions. |
@sklam will add that to the tests. Is a vector loads of 4 bytes operationally equivalent to a int 32 load? Or do vector instructions have different performance characteristics? |
I don't know for certain. I don't think that is documented for the public. But, I will guess that they are equivalent. |
I don't remember there ever being a documented difference, except in really early CUDA C, where casting a pointer to the vector types before loading was used as a hack to work around memory controller behavior. (Loading float2 to get better bandwidth, etc...) I strongly suspect it doesn't matter any more. |
@sklam see tests on the commit I pushed - |
numba/cuda/cudaimpl.py
Outdated
constary = lc.Constant.array(constvals[0].type, constvals) | ||
constvals = [ | ||
context.get_constant(types.byte, i) | ||
for i in arr.flatten(order='A').data.tobytes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is causing error on windows and python2.7:
test_const_array (numba.cuda.tests.cudapy.test_constmem.TestCudaConstantMemory)
test_const_array_2d (numba.cuda.tests.cudapy.test_constmem.TestCudaConstantMemory)
test_const_array_3d (numba.cuda.tests.cudapy.test_constmem.TestCudaConstantMemory)
test_const_record (numba.cuda.tests.cudapy.test_constmem.TestCudaConstantMemory)
test_const_record_align (numba.cuda.tests.cudapy.test_constmem.TestCudaConstantMemory)
File "..\_test_env\lib\site-packages\numba\cuda\tests\cudapy\test_constmem.py", line 60:
def cuconstRecAlign(A, B, C, D, E):
Z = cuda.const.array_like(CONST_RECORD_ALIGN)
^
[1] During: lowering "$0.5 = call ptx.cmem.arylike($0.4, kws=[], args=[Var($0.4, c:\conda64\conda-bld\numba_1533830313780\_test_env\lib\site-packages\numba\cuda\tests\cudapy\test_constmem.py (60))], func=ptx.cmem.arylike, vararg=None)" at c:\conda64\conda-bld\numba_1533830313780\_test_env\lib\site-packages\numba\cuda\tests\cudapy\test_constmem.py (60)
'buffer' object has no attribute 'tobytes'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likely a python2.7 error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a fix for this - it uses this instead
There's also an error on python3.7:
|
@sklam re: the python3.7 error - why is it using the v8 toolkit, not a v9 / v9.1 one? Can you include the full PTX so I can see what |
We test with CUDA 8, 9, and 9.1, so it is possible that NVVM from one of those versions behaves differently than the others? I'll see if I can dig out the full PTX result |
Build farm is failing on:
Python 2 tests are failing because of this:
Without looking I'd guess that the test suite is larger than it thinks it ought to be due to new tests OR that there's a really long error message or similar that's appearing in test discovery and that's being decoded. There's also an error message like this on windows 64 + python 2:
Output excerpt for failure on Python 3 + Windows 64:
|
@stuartarchibald thanks for that - I've pushed fixes to my branch ( |
This is passing on the build farm and ready to merge. |
(really any type). Don't try to interpret data in the lowering step - just serialize as a stream of bytes.