meta issue: fp16 support #4402

Open · 1 of 7 tasks
sklam opened this issue Aug 2, 2019 · 24 comments

Comments

@sklam (Member) commented Aug 2, 2019

This meta issue lists the concrete tasks needed to implement a half-precision floating-point type (#4395).

llvmlite:

  • add half type

numba:

  • add a float16 type and conversion to/from numpy.float16.
  • set up implicit casting rules (note: survey what NumPy does with float16 casting; see the probe sketched after this list).
  • implement basic arithmetic ops.
    • add, sub, mul
    • div, rem (separate item because CUDA PTX does not seem to provide these)
  • decide what to do with other math functions: implicitly cast to the float32 variants?
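
As a quick aid for the casting-rules survey mentioned above, NumPy can be asked directly what it does with float16 under its default promotion rules. This is just an illustrative probe of NumPy behaviour, not a statement of what Numba's rules should be:

>>> import numpy as np
>>> np.result_type(np.float16, np.float32)   # widens to the larger float
dtype('float32')
>>> np.result_type(np.float16, np.int8)      # int8 values all fit in float16
dtype('float16')
>>> np.result_type(np.float16, np.int16)     # int16 values do not, so NumPy widens
dtype('float32')
>>> np.can_cast(np.float16, np.float32)      # float16 -> float32 is a safe implicit cast
True
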
@seibert (Contributor) commented Aug 6, 2019

A related issue is how to handle bfloat16, which seems to be gaining traction: it will have limited AVX-512 support on very recent Intel CPU architectures and is also used on the Google TPU. I haven't seen any GPU hardware support for bfloat16 yet, although AMD's ROCm seems to have added software support.

@njwhite (Contributor) commented Sep 11, 2019

  • Nvidia to add support for f16 ("half") types in NVVM IR; otherwise we need to sprinkle llvm.convert.to.fp16.f32 / llvm.convert.from.fp16.f32 everywhere (and lower types.f16 in Numba code to i16 in the IR). A sketch of that boilerplate follows below.
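
For illustration, the kind of boilerplate that fallback implies can be sketched with llvmlite. This is only a sketch: the f16_add helper and the choice to store halves as i16 are assumptions for the example, while llvm.convert.to.fp16.f32 / llvm.convert.from.fp16.f32 are the intrinsics named in the bullet above.

from llvmlite import ir

mod = ir.Module(name="fp16_fallback_sketch")
i16, f32 = ir.IntType(16), ir.FloatType()

# Declare the conversion intrinsics mentioned above.
to_fp16 = ir.Function(mod, ir.FunctionType(i16, [f32]), name="llvm.convert.to.fp16.f32")
from_fp16 = ir.Function(mod, ir.FunctionType(f32, [i16]), name="llvm.convert.from.fp16.f32")

# A half-precision add lowered as i16 storage plus float32 arithmetic.
fn = ir.Function(mod, ir.FunctionType(i16, [i16, i16]), name="f16_add")
builder = ir.IRBuilder(fn.append_basic_block())
a = builder.call(from_fp16, [fn.args[0]])
b = builder.call(from_fp16, [fn.args[1]])
builder.ret(builder.call(to_fp16, [builder.fadd(a, b)]))
print(mod)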

@MurrayData commented:

Just a note that CuPy now supports float16 in user-defined kernels.

>>> import cupy as cp
>>> c = cp.random.random(1000000).astype(cp.float16)

>>> @cp.fuse()
... def squared_diff(x, y):
...     return (x - y) * (x - y)

>>> squared_diff(c, 0)  # squared difference from zero, i.e. c squared
array([0.848  , 0.637  , 0.05655, ..., 0.02043, 0.5337 , 0.8125 ],
      dtype=float16)

It would be good to have float16 in Numba kernels to prepare data for deep learning models and for RAPIDS' Numba-based UDFs. At present, RAPIDS converts float32 to float16 as a workaround.

@gmarkall (Member) commented:

I can't edit the issue to tick the box, but it looks like:

llvmlite:

  • add half type

is done in numba/llvmlite#509, and

  • Nvidia to add support for f16 ("half") types in NVVM IR; otherwise we need to sprinkle llvm.convert.to.fp16.f32 / llvm.convert.from.fp16.f32 everywhere (and lower types.f16 in Numba code to i16 in the IR).

is not complete, but there is a merged PR that adds support for fp16 intrinsics: numba/llvmlite#510
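
With numba/llvmlite#509 merged, the half type is available directly from llvmlite's IR layer. A minimal sketch (the half_add function name is made up for illustration):

from llvmlite import ir

half = ir.HalfType()  # the LLVM 'half' type added in numba/llvmlite#509
mod = ir.Module(name="half_sketch")
fn = ir.Function(mod, ir.FunctionType(half, [half, half]), name="half_add")
builder = ir.IRBuilder(fn.append_basic_block())
builder.ret(builder.fadd(fn.args[0], fn.args[1]))
print(mod)  # textual IR with a 'define half @"half_add"(half ..., half ...)' signature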

@seibert (Contributor) commented Aug 17, 2020

I ticked the half box for you.

@seibert (Contributor) commented Aug 17, 2020

It also looks like LLVM 11 adds bfloat16 support: llvm/llvm-project@8c24f33158d8

The type name seems to be bfloat.

@GuillaumeLeclerc commented:

Hello,

Any updates on this?

@gmarkall (Member) commented:

There is some support in CUDA from #7556 and #7460. This work is ongoing, though, and more PRs will be needed for full float16 support in CUDA.
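
For anyone who wants to try the CUDA support that has landed so far, something along these lines should work with a sufficiently recent Numba. This is a minimal sketch; the kernel name and launch configuration are arbitrary, and exactly what is covered depends on the version installed.

import numpy as np
from numba import cuda

@cuda.jit
def add_halves(x, y, out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = x[i] + y[i]  # float16 arithmetic lowered by the CUDA target

n = 1024
x = np.random.rand(n).astype(np.float16)
y = np.random.rand(n).astype(np.float16)
out = np.zeros(n, dtype=np.float16)
add_halves[(n + 127) // 128, 128](x, y, out)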

@calclavia commented:

It seems that with Numba right now it's possible to use FP16 inputs, but not FP16 shared memory. Does full fp16 support in CUDA mean we will be able to use fp16 shared memory?

Example (currently does not work):

cuda.shared.array(shape=(1,), dtype=nb.float16)

@lucidrains commented:

Any progress?

@Hjorthmedh commented:

Tag. Interested to hear if there is an update on this.

@dongrixinyu commented:

Float16 can be much faster than float32 and float64, so as a JIT tool for Python it is very important that Numba supports the float16 type.

@gmarkall (Member) commented:

Currently float16 is supported in CUDA (maybe not everything in the latest release, but on main at least). We are still working on adding float16 for CPU targets.

@seibert (Contributor) commented Jan 12, 2023

At this point, the only CPUs supporting native FP16 arithmetic are Intel Sapphire Rapids (which just came out) and ARMv8.2 and later, right? I don't think AMD has any FP16 support on CPU at all right now.

@dongrixinyu commented:

Currently float16 is supported in CUDA (maybe not everything in the latest release, but on main at least). We are still working on adding float16 for CPU targets.

Is the float16-on-CPU problem affected by the LLVM compiler not supporting float16?

@gmarkall (Member) commented:

Is the float16-on-CPU problem affected by the LLVM compiler not supporting float16?

LLVM supports it, as does llvmlite; support was added in numba/llvmlite#509.

For systems that don't have native float16, perhaps compiler-rt implementations need to be used (@testhound may know or recall what the thinking here was better than I do).
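
One rough way to check whether the host CPU advertises native fp16 support (and hence whether a compiler-rt fallback would be needed) is to ask llvmlite for the host features. The feature names checked here ('f16c', 'avx512fp16', 'fullfp16') are assumptions that vary by architecture and LLVM version:

from llvmlite import binding as llvm

llvm.initialize()
llvm.initialize_native_target()

features = llvm.get_host_cpu_features()  # mapping of feature name -> bool
fp16_flags = [f for f in ("f16c", "avx512fp16", "fullfp16") if features.get(f)]
print(llvm.get_host_cpu_name(), fp16_flags or "no native fp16 features detected")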

@testhound (Contributor) commented:

@dongrixinyu I am currently working on float16 support for the CPU, and @gmarkall is correct: support for LLVM's compiler-rt will be required for targets that do not support float16 instructions.

@dongrixinyu commented:

@dongrixinyu I am currently working on float16 support for the CPU, and @gmarkall is correct: support for LLVM's compiler-rt will be required for targets that do not support float16 instructions.

@testhound, really looking forward to your support for float16 on the CPU! Thanks!

@dongrixinyu commented:

@dongrixinyu I am currently working on float16 support for the CPU, and @gmarkall is correct: support for LLVM's compiler-rt will be required for targets that do not support float16 instructions.

If you cannot spare much time to finish float16, I could help.

@ashvardanian commented:

This would also affect our work on USearch. It supports f16, and it supports using Numba JIT to define the metrics, but not both together :) Here is an example.

@SkBlaz commented Sep 13, 2023

Hi! Great initiative. Is the checklist in the initial post up to date in terms of which items are finished?

@gmarkall (Member) commented:

It's supported on the CUDA target but not the CPU target, so either all or none of the boxes could be checked, depending on what we decide this issue is about.

@SkBlaz commented Sep 14, 2023

Great to hear that, thanks. I was indeed asking more about the CPU scope: is CPU support still a planned feature, or is it rather unlikely?

@gmarkall (Member) commented:

It's possible. There are some llvmlite PRs in flight in support of it: numba/llvmlite#979 and numba/llvmlite#986.
