Feature request: Half Float (FP16) support #29

Closed

jamilbk opened this issue Oct 23, 2017 · 9 comments

@jamilbk

jamilbk commented Oct 23, 2017

Are there plans to optimize around 16-bit floats for training models? AMD's Vega supports two 16-bit mul-adds per clock, which means an RX Vega 56 for under $450 could provide nearly 25 TFLOPS of training performance. And Nvidia's upcoming Volta architecture shows big performance gains in Caffe2 when training with 16-bit floats.

There don't seem to be many other ML frameworks that support this.
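
For reference, a back-of-the-envelope sketch of where a figure in the low-20s of TFLOPS comes from (the shader count and boost clock below are approximate Vega 56 specs, assumed here for illustration; a higher-clocked Vega 64 lands closer to 25):

```python
# Rough peak-throughput estimate for packed FP16 on Vega (illustrative numbers only).
stream_processors = 3584     # RX Vega 56: 56 CUs x 64 lanes
boost_clock_hz = 1.47e9      # approximate boost clock
flops_per_fma = 2            # one fused multiply-add counts as 2 FLOPs
fp16_pack_factor = 2         # Rapid Packed Math: two FP16 ops per 32-bit lane

peak_fp32 = stream_processors * flops_per_fma * boost_clock_hz
peak_fp16 = peak_fp32 * fp16_pack_factor
print(f"FP32 peak: {peak_fp32 / 1e12:.1f} TFLOPS")  # ~10.5 TFLOPS
print(f"FP16 peak: {peak_fp16 / 1e12:.1f} TFLOPS")  # ~21 TFLOPS
```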

@jamilbk jamilbk changed the title Feature request: Half Float support Feature request: Half Float (FP16) support Oct 23, 2017
@znmeb

znmeb commented Oct 23, 2017

My AMD Bonaire on Arch Linux doesn't show 16-bit floats in OpenCL. It gets about 1.8 TFLOPS in 32-bit mode, which ain't shabby compared to something like 50 GFLOPS in 64 bits.

16-bit floats are like slide-rule precision. They're probably fine for computer vision on 8-bit inputs, but I'm not going to mess with them for anything else.
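
For anyone who wants to reproduce that check, here's a minimal sketch (assuming pyopencl is installed) that lists which OpenCL devices advertise the standard cl_khr_fp16 half-precision extension:

```python
# Minimal sketch: list OpenCL devices and whether they advertise half-precision support.
import pyopencl as cl

for platform in cl.get_platforms():
    for device in platform.get_devices():
        has_fp16 = "cl_khr_fp16" in device.extensions
        print(f"{platform.name} / {device.name}: FP16 {'yes' if has_fp16 else 'no'}")
```

Running `clinfo | grep -i fp16` should give similar information from the command line.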

@jamilbk
Author

jamilbk commented Oct 23, 2017

Apparently Rapid Packed Math (AMD's version of optimized FP16) was just exposed in the 1.6.4 version of the ROCm OpenCL stack. @znmeb Are you using the ROCm or AMDGPU-Pro driver?

@brianretford

There are plans! It mostly already works; we just have a few bits to polish up. We have a Vega 10 in house, and we are really excited about it. Last we checked, the ROCm OpenCL drivers didn't support half precision yet, and it's unclear whether HIP or SPIR-V supports it yet. Volta is an interesting platform: it has a micro-tile matrix multiplication unit. We intend to support it too, eventually.

@jamilbk -- great! We'll update our internal driver.

@jamilbk
Author

jamilbk commented Oct 23, 2017

@brianretford 😍 That's really great news! I just built a Vega ML rig banking on the assumption someone will optimize around rapid packed math.

Thanks for the prompt reply! Really excited about the work you guys are doing. Looks like I'll be sticking with PlaidML going forward.

@znmeb

znmeb commented Oct 23, 2017

@jamilbk I'm using the Arch Linux "port" of the AMDGPU Pro Ubuntu library. Arch does have most of the AMD stack in AUR in various stages of workingness, but it's been a couple of months since I did any testing on it.

The Arch User Repository approach is mixed: when something is open source, it gets built from source on your machine, but if only a binary is available, there are ways to extract it from a Debian / Ubuntu package as well.

The bottom line is that Arch can run almost anything that runs on Ubuntu LTS, but it's a community-driven process.

@brianretford brianretford reopened this Oct 23, 2017
@brianretford

Some hardware doesn't support double-rate half precision. I think only the Vega architecture does. You can still get improvements from it on Fiji because of the reduced memory bandwidth requirements, but less so.
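
To make the bandwidth point concrete, a tiny illustration (numpy, with a hypothetical activation-tensor shape) of how half precision cuts the bytes that have to move:

```python
import numpy as np

# Hypothetical activation tensor: batch of 64, 256 channels, 14x14 feature maps.
shape = (64, 256, 14, 14)
fp32 = np.zeros(shape, dtype=np.float32)
fp16 = np.zeros(shape, dtype=np.float16)
print(fp32.nbytes / 2**20, "MiB in FP32")  # 12.25 MiB
print(fp16.nbytes / 2**20, "MiB in FP16")  # 6.125 MiB, half the memory traffic
```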

@znmeb

znmeb commented Nov 30, 2017

I'm getting back to this - the workstation is now dual-booted Ubuntu 16.04 and Arch, mostly because I bought one of those Movidius neural compute sticks and it only works on Ubuntu. However, I have been unable to get any of the AMD software - proprietary or open source - to work on it in Ubuntu!

I'm planning on taking another run at building ROCm from source this weekend; it's on GitHub, and if I can find a hardware compatibility test procedure I'll go ahead and test it and file issues. But in the absence of that, I think I'm better off buying an Nvidia GPU than pissing away more of my time on a piece of hardware that's two or three generations behind what AMD is shipping now. I have zero interest in any AMD proprietary software.

@mirh

mirh commented Dec 15, 2017

This guy got that stick to work on Debian, for the record.
Also, yes, 16-bit precision has been supported since ROCm 1.3 (GCN 3rd gen and later), though only Vega (5th gen) will see relevant performance improvements.
Also also, a very important update to the proprietary driver came out this week - maybe that could change something.
Also also also, any AMD card you were to buy now would really *actually* have a fully open source stack (but I guess you had already understood this). Until then, though, you should stay away from ROCm, which isn't compatible with your current card.

Also also also also - god, we are meeting everywhere.

@brianretford

FP16 seems to be working both in the old optimizer and in the new iGPU Stripe backend. It provides a modest boost with Stripe. We're tracking work to ensure that optimization passes can take advantage of various data types internally.
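
For anyone who wants to try it from the Keras frontend, a rough sketch (assuming the plaidml-keras package is installed; `set_floatx` is a standard Keras call, and how the resulting float16 tensors map onto the device is up to the backend):

```python
# Rough sketch: request float16 end to end through the Keras frontend with PlaidML.
import plaidml.keras
plaidml.keras.install_backend()  # documented way to select PlaidML as the Keras backend

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

K.set_floatx("float16")  # standard Keras switch, not a PlaidML-specific flag

model = Sequential([
    Dense(128, activation="relu", input_shape=(784,)),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy")
model.summary()
```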
