meta issue: fp16 support #4402

This meta issue lists the concrete tasks needed to implement a half-precision floating point type (#4395):

- [x] llvmlite: `half` type
- [ ] numba: `float16` type and conversion to/from `numpy.float16`

Comments
A related issue is how to handle bfloat16, which seems to be gaining traction: it will have limited AVX-512 support on very recent Intel CPU architectures and is also used on the Google TPU. I haven't seen any GPU hardware support for bfloat16 yet, although AMD's ROCm seems to have added software support.
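(For reference, bfloat16 is just a float32 with the mantissa cut down to 7 bits, so a software conversion is a rounded 16-bit truncation. A minimal NumPy sketch, using round-to-nearest-even and ignoring NaN handling:)

```python
import numpy as np

def f32_to_bf16_bits(x):
    # bfloat16 keeps float32's sign bit and 8 exponent bits but only
    # the top 7 mantissa bits; round to nearest even, then truncate.
    # (NaNs would need a separate check so rounding can't corrupt them.)
    bits = int(np.asarray(x, dtype=np.float32).view(np.uint32))
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF

print(hex(f32_to_bf16_bits(1.0)))  # 0x3f80
```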
Just a note that CuPy now supports float16 in user-defined kernels.
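(A minimal sketch of what that enables, assuming CuPy's type-generic `ElementwiseKernel`, which binds the placeholder `T` to float16 when called with float16 arrays; names here are illustrative:)

```python
import cupy as cp

# a type-generic user-defined kernel; 'T' becomes float16 at call time
double_it = cp.ElementwiseKernel('T x', 'T y', 'y = x + x', 'double_it')

x = cp.arange(8, dtype=cp.float16)
y = double_it(x)  # stays float16 end to end, no upcast needed
```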
It would be good to have float16 in Numba kernels to prepare data for deep learning models and RAPIDS Numba-based UDFs. At present, RAPIDS converts float32 to float16 as a workaround.
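(Something like this round-trip, sketched here with plain NumPy and an illustrative UDF rather than RAPIDS' actual code: upcast for the Numba function, downcast for the fp16 consumer:)

```python
import numpy as np
from numba import njit

@njit
def scale(x):
    # works on float32; float16 arguments are the unsupported part
    return x * np.float32(2.0)

data = np.arange(8, dtype=np.float16)
result = scale(data.astype(np.float32)).astype(np.float16)
```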
I can't edit the issue to tick the box, but it looks like:
- llvmlite: the `half` type is done in numba/llvmlite#509, and
- numba: the `float16` type is not completed, but there is a merged PR that adds support for fp16 intrinsics: numba/llvmlite#510
I ticked the llvmlite `half` box.
It also looks like LLVM 11 adds bfloat16 support: llvm/llvm-project@8c24f33158d8. The type name seems to be `bfloat`.
Hello, any updates on this?
Seems like with Numba right now it's possible to use FP16 inputs, but not FP16 shared memory. Does full fp16 support in CUDA mean we can use fp16 shared memory? Example (currently does not work):
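(A minimal reconstruction of the kind of kernel meant, since the original snippet is not shown; the block size and names are illustrative. The float16 `cuda.shared.array` allocation is the part that fails to compile:)

```python
import numpy as np
from numba import cuda, float16

@cuda.jit
def copy_via_smem(x, out):
    # float16 kernel arguments are accepted, but allocating
    # float16 shared memory is what currently fails
    sm = cuda.shared.array(shape=128, dtype=float16)
    i = cuda.grid(1)
    if i < x.size:
        sm[cuda.threadIdx.x] = x[i]
        cuda.syncthreads()
        out[i] = sm[cuda.threadIdx.x]

x = np.arange(128, dtype=np.float16)
out = np.zeros_like(x)
copy_via_smem[1, 128](x, out)
```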
Any progress?
Tag. Interested to hear if there is an update on this.
Float16 can be much faster than float32 and float64 on hardware with native support, so as a JIT tool for Python, Numba really needs to support the float16 type.
Currently `float16` is supported on the CUDA target but not on the CPU.
At this point, the only CPUs supporting native FP16 arithmetic are Intel Sapphire Rapids (which just came out) and ARMv8.2 and later, right? I don't think AMD has any FP16 support on CPU at all right now.
Does LLVM support it?
LLVM supports it, as does llvmlite; support was added in numba/llvmlite#509. For systems that don't have native FP16, compiler-rt implementations may need to be used (@testhound may know or recall what the thinking here was better than I).
@dongrixinyu I am currently working on float16 support for the CPU, and @gmarkall is correct: support for LLVM's compiler-rt will be required for targets that do not support float16 instructions.
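(To make that concrete: llvmlite can already build IR over the `half` type per numba/llvmlite#509, and on targets without native FP16 the backend lowers such operations through conversions, typically via compiler-rt helpers like `__extendhfsf2`/`__truncsfhf2`. A minimal sketch with illustrative names:)

```python
from llvmlite import ir

# build a tiny module with one function adding two half-precision values
half = ir.HalfType()
mod = ir.Module(name="fp16_demo")
fn = ir.Function(mod, ir.FunctionType(half, (half, half)), name="add_half")
builder = ir.IRBuilder(fn.append_basic_block("entry"))
a, b = fn.args
builder.ret(builder.fadd(a, b, name="sum"))

print(mod)  # the printed IR uses LLVM's 'half' type
```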
@testhound Really looking forward to your support for `float16`. If you cannot spare much time to finish it, …
This would also affect our work on USearch. It supports f16, and Numba JIT to define the metrics, but not together :) Here is an example…
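(For context, the Numba side of such a metric looks like the sketch below; names are illustrative, and the exact signature a library like USearch expects is its own. The point is that swapping the `float32` pointers for `float16` is what doesn't work today:)

```python
from numba import cfunc, types, carray

# a squared-L2 metric over raw float32 pointers; changing
# types.float32 to types.float16 here is the unsupported part
sig = types.float32(types.CPointer(types.float32),
                    types.CPointer(types.float32),
                    types.uint64)

@cfunc(sig)
def sqeuclidean(a, b, n):
    av = carray(a, (n,))
    bv = carray(b, (n,))
    d = 0.0
    for i in range(n):
        t = av[i] - bv[i]
        d += t * t
    return d

# sqeuclidean.address is the function pointer handed to the library
```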
Hi! Great initiative. Are the initial post's checkboxes up to date in terms of items finalized?
It's supported on CUDA but not the CPU target. So either all or none of them could be checked, depending on what we decide this issue is about.
Great to hear that, thanks. I was indeed asking more about the CPU scope: is this still a planned feature (CPU support), or is it rather unlikely?
It's possible. There are some llvmlite PRs in flight that will be in support of it: numba/llvmlite#979 and numba/llvmlite#986.