Adds CUDA FP16 arithmetic operators #7885
Conversation
gpuci run tests
Got a small Q below.
gpuci run tests
With this branch, the following:
from numba import cuda
from numba.types import float16, float32

def f(x, y):
    return x + y

ptx_16, resty_16 = cuda.compile_ptx(f, (float16, float16), device=True)
ptx_32, resty_32 = cuda.compile_ptx(f, (float32, float32), device=True)

print(ptx_16)
print(resty_16)
print(ptx_32)
print(resty_32)
When run with
numba --annotate-html scalar.html scalar.py
it seems to give the following typing for the addition of float16s (in the HTML output):
x = arg(0, name=x) :: float16
y = arg(1, name=y) :: float16
$6binary_add.2 = x + y :: float32
and corresponding PTX:
// begin inline asm
cvt.f32.f16 %f1, %rs1;
// end inline asm
// begin inline asm
cvt.f32.f16 %f2, %rs2;
// end inline asm
add.f32 %f3, %f1, %f2;
st.f32 [%rd1], %f3;
Is it expected with this PR that float16 + float16 should be typed as returning a float16?
(Does this reproduce for you? Perhaps I am missing something / running something incorrectly?)
@gmarkall this reproduces for me and you definitely found an issue. Thanks.
It turns out the template should be using @register_global for the operator.* functions.
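For illustration only (this is not the exact code in this PR, and the class name is made up), a float16 addition template registered via register_global might look roughly like this:

import operator

from numba.core import types
from numba.core.typing.templates import ConcreteTemplate, Registry, signature

registry = Registry()
register_global = registry.register_global

# Register a typing for operator.add so float16 + float16 resolves to float16
@register_global(operator.add)
class Fp16Add(ConcreteTemplate):
    cases = [signature(types.float16, types.float16, types.float16)]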
You can check it without CUDA hardware with:
import operator

from numba.core import types, typing
from numba.cuda.descriptor import cuda_target

fp16 = types.float16

# Resolve operator.add against the CUDA typing context, as the compiler would
typingctx = cuda_target.typing_context
typingctx.refresh()
fnty = typingctx.resolve_value_type(operator.add)
out = typingctx.resolve_function_type(fnty, (fp16, fp16), {})
print(out)

# The resolved signature should be float16(float16, float16)
if out != typing.signature(fp16, fp16, fp16):
    raise AssertionError(out)
The above code is what the compiler does for function resolution. I'd recommend adding tests like this to numba/cuda/tests/nocuda.
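As a sketch of what such a test could look like (the class and test names below are illustrative, and this assumes plain unittest is sufficient since no GPU is exercised):

import operator
import unittest

from numba.core import types, typing
from numba.cuda.descriptor import cuda_target


class TestFP16OperatorTyping(unittest.TestCase):
    def test_add_signature(self):
        fp16 = types.float16
        typingctx = cuda_target.typing_context
        typingctx.refresh()
        fnty = typingctx.resolve_value_type(operator.add)
        out = typingctx.resolve_function_type(fnty, (fp16, fp16), {})
        # float16 + float16 should be typed as returning float16
        self.assertEqual(out, typing.signature(fp16, fp16, fp16), msg=str(out))


if __name__ == '__main__':
    unittest.main()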
…s test cases for inplace operators
gpuci run tests
…e being generated
gpuci run tests
gpuci run tests
@gmarkall I added the skip_unless_cc_53 decorator back to the test case.
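A minimal sketch of how a compute-capability-gated fp16 test might apply that decorator (the kernel and test names are hypothetical; this assumes skip_unless_cc_53 and CUDATestCase come from numba.cuda.testing, as in the existing fp16 tests):

import numpy as np

from numba import cuda
from numba.cuda.testing import CUDATestCase, skip_unless_cc_53


class TestFP16Add(CUDATestCase):
    @skip_unless_cc_53
    def test_add(self):
        # Simple kernel that adds two float16 scalars and stores the result
        @cuda.jit
        def add_kernel(out, a, b):
            out[0] = a + b

        out = np.zeros(1, dtype=np.float16)
        add_kernel[1, 1](out, np.float16(1.5), np.float16(2.5))
        self.assertEqual(out[0], np.float16(4.0))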
@esc Could this have another buildfarm run please?
So, I tested `` on the Anaconda build farm and still received the following errors on Windows. @testhound could you take a look, please?
@esc thank you for the feedback, looking into the failures.
gpuci run tests
@esc can I request another buildfarm run? I fixed the failing tests by skipping them if the compute capability requirements are not met. I am confused as to why these tests just started failing, since this code has been in main for quite some time.
@gmarkall FYI: I updated the PR to fix tests that should be skipped.
It suggests that there is something wrong with these tests that we never picked up on before.
if out != typing.signature(fp16, fp16, fp16):
    raise AssertionError(out)
Should this be self.assertEqual(out, typing.signature(fp16, fp16, fp16), msg=str(out)) or similar? (Same for the other test in this class.)
Yes.
gpuci run tests
@stuartarchibald and @gmarkall code updated per your request to use
gpuci run tests
Looks good to me. Could this have another buildfarm run please, @stuartarchibald?
Thanks for the patch and fixes.
I ran this on the farm as
This pull request adds operator support for the following fp16 operations: +, -, *, neg, abs. This PR partly addresses the meta-issue "add fp16 support": #4402.
This PR builds on #7460, which added intrinsic support for the above operators.
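As a rough usage sketch (not taken from this PR's tests; the kernel name and data are illustrative), the newly supported operators could be exercised in a kernel like the following on a device with compute capability 5.3 or higher:

import numpy as np

from numba import cuda


@cuda.jit
def fp16_ops(out, a, b):
    # Exercises +, binary -, *, unary - and abs on float16 elements
    i = cuda.grid(1)
    if i < out.size:
        out[i] = abs(-(a[i] + b[i]) * a[i] - b[i])


n = 16
a = np.ones(n, dtype=np.float16)
b = np.full(n, 2.0, dtype=np.float16)
out = np.zeros(n, dtype=np.float16)
fp16_ops[1, n](out, a, b)
print(out)  # expected: all 5.0, i.e. abs(-(1 + 2) * 1 - 2)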