Feature request: Half Float (FP16) support #29

Closed

jamilbk opened this issue Oct 23, 2017 · 9 comments

@jamilbk

jamilbk commented Oct 23, 2017

Are there plans to optimize around 16-bit floats for training models? AMD's Vega supports two 16-bit mul-adds per clock, which means an RX Vega 56 for under $450 could provide nearly 25 TFLOPS of training performance. And Nvidia's upcoming Volta architecture shows big performance gains in Caffe2 when training with 16-bit floats.

There don't seem to be many other ML frameworks that support this.
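
For reference, a back-of-the-envelope sketch of where a figure in the low-20s of TFLOPS comes from (the shader count and boost clock below are approximate Vega 56 specs, assumed here for illustration; a higher-clocked Vega 64 lands closer to 25):

```python
# Rough peak-throughput estimate for packed FP16 on Vega (illustrative numbers only).
stream_processors = 3584     # RX Vega 56: 56 CUs x 64 lanes
boost_clock_hz = 1.47e9      # approximate boost clock
flops_per_fma = 2            # one fused multiply-add counts as 2 FLOPs
fp16_pack_factor = 2         # Rapid Packed Math: two FP16 ops per 32-bit lane

peak_fp32 = stream_processors * flops_per_fma * boost_clock_hz
peak_fp16 = peak_fp32 * fp16_pack_factor
print(f"FP32 peak: {peak_fp32 / 1e12:.1f} TFLOPS")  # ~10.5 TFLOPS
print(f"FP16 peak: {peak_fp16 / 1e12:.1f} TFLOPS")  # ~21 TFLOPS
```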

@jamilbk jamilbk changed the title Feature request: Half Float support Feature request: Half Float (FP16) support Oct 23, 2017
@znmeb

znmeb commented Oct 23, 2017

My AMD Bonaire on Arch Linux doesn't show 16-bit floats in OpenCL. It gets about 1.8 TFLOPS in 32-bit mode, which ain't shabby compared to something like 50 GFLOPS in 64 bits.

16-bit floats are like slide-rule precision. They're probably fine for computer vision on 8-bit inputs, but I'm not going to mess with them for anything else.
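
For anyone who wants to reproduce that check, here's a minimal sketch (assuming pyopencl is installed) that lists which OpenCL devices advertise the standard cl_khr_fp16 half-precision extension:

```python
# Minimal sketch: list OpenCL devices and whether they advertise half-precision support.
import pyopencl as cl

for platform in cl.get_platforms():
    for device in platform.get_devices():
        has_fp16 = "cl_khr_fp16" in device.extensions
        print(f"{platform.name} / {device.name}: FP16 {'yes' if has_fp16 else 'no'}")
```

Running `clinfo | grep -i fp16` should give similar information from the command line.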

@jamilbk
Author

jamilbk commented Oct 23, 2017

Apparently Rapid Packed Math (AMD's version of optimized FP16) was just exposed in the 1.6.4 version of the ROCm OpenCL stack. @znmeb Are you using the ROCm or AMDGPU-Pro driver?

@brianretford

There are plans! It mostly already works; we just have a few bits to polish up. We have a Vega 10 in house, and we are really excited about it. Last we checked, the ROCm OpenCL drivers didn't support half precision yet, and it's unclear whether HIP or SPIR-V supports it yet. Volta is an interesting platform: it has a micro-tile matrix multiplication unit. We intend to support it too, eventually.

@jamilbk -- great! We'll update our internal driver.

@jamilbk
Author

jamilbk commented Oct 23, 2017

@brianretford 😍 That's really great news! I just built a Vega ML rig banking on the assumption someone will optimize around rapid packed math.

Thanks for the prompt reply! Really excited about the work you guys are doing. Looks like I'll be sticking with PlaidML going forward.

@znmeb

znmeb commented Oct 23, 2017

@jamilbk I'm using the Arch Linux "port" of the AMDGPU Pro Ubuntu library. Arch does have most of the AMD stack in AUR in various stages of workingness, but it's been a couple of months since I did any testing on it.

The Arch User Repository approach is mixed: when something is open source, it gets built from source on your machine, but if only a binary is available, there are ways to extract it from a Debian / Ubuntu package as well.

The bottom line is that Arch can run almost anything that runs on Ubuntu LTS, but it's a community-driven process.

@brianretford brianretford reopened this Oct 23, 2017
@brianretford

Some hardware doesn't support double-rate half precision. I think only the Vega architecture does. You can still get improvements from it on Fiji because of the reduced memory bandwidth requirements, but less so.
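
To make the bandwidth point concrete, a tiny illustration (numpy, with a hypothetical activation-tensor shape) of how half precision cuts the bytes that have to move:

```python
import numpy as np

# Hypothetical activation tensor: batch of 64, 256 channels, 14x14 feature maps.
shape = (64, 256, 14, 14)
fp32 = np.zeros(shape, dtype=np.float32)
fp16 = np.zeros(shape, dtype=np.float16)
print(fp32.nbytes / 2**20, "MiB in FP32")  # 12.25 MiB
print(fp16.nbytes / 2**20, "MiB in FP16")  # 6.125 MiB, half the memory traffic
```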

@znmeb

znmeb commented Nov 30, 2017

I'm getting back to this - the workstation is now dual-booted Ubuntu 16.04 and Arch, mostly because I bought one of those Movidius neural compute sticks and it only works on Ubuntu. However, I have been unable to get any of the AMD software - proprietary or open source - to work on it in Ubuntu!

I'm planning on taking another run at building ROCm from source this weekend; it's on GitHub, and if I can find a hardware compatibility test procedure I'll go ahead and test it and file issues. But in the absence of that, I think I'm better off buying an Nvidia GPU than pissing away more of my time on a piece of hardware that's two or three generations behind what AMD is shipping now. I have zero interest in any AMD proprietary software.

@mirh

mirh commented Dec 15, 2017

This guy got that stick to work on Debian, for the record.
Also, yes, 16-bit precision has been supported since ROCm 1.3 (GCN 3rd gen and later), though only Vega (5th gen) will see relevant performance improvements.
Also also, a very important update to the proprietary driver came out this week - maybe that could change something.
Also also also, any AMD card you were to buy now would really *actually* have a fully open source stack (but I guess you had already understood this). Until then, though, you should stay away from ROCm, which isn't compatible with your current card.

Also also also also - god, we are meeting everywhere.

@brianretford

FP16 seems to be working both in the old optimizer and in the new iGPU Stripe backend. It provides a modest boost with Stripe. We're tracking work to ensure that optimization passes can take advantage of various data types internally.
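
For anyone who wants to try it from the Keras frontend, a rough sketch (assuming the plaidml-keras package is installed; `set_floatx` is a standard Keras call, and how the resulting float16 tensors map onto the device is up to the backend):

```python
# Rough sketch: request float16 end to end through the Keras frontend with PlaidML.
import plaidml.keras
plaidml.keras.install_backend()  # documented way to select PlaidML as the Keras backend

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

K.set_floatx("float16")  # standard Keras switch, not a PlaidML-specific flag

model = Sequential([
    Dense(128, activation="relu", input_shape=(784,)),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy")
model.summary()
```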
