
Support FPGA Xilinx #26691

Open
Belkharym opened this issue Sep 23, 2019 · 11 comments
Labels
module: backend non-standard backend support triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@Belkharym

Hello, World!

We are a group intending to accelerate some PyTorch operations on Xilinx UltraScale FPGAs. However, we are a little lost as to where to begin porting the functions.

From what we could see, we think we can start from the CUDA implementation, modify it to use the OpenCL API, and add an FPGA device type and its components (Streams, Storage, Tensors, ...).

We would like some guidance on the right way to proceed. Would you be so kind as to help us?
Thank you.

@VitalyFedyunin VitalyFedyunin added triage review triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Sep 24, 2019
@jspisak
Contributor

jspisak commented Sep 26, 2019

@Belkharym - glad to connect with you to chat through this before jumping into the technical details. Can you send me an email at jspisak@fb.com so we can find a time?

@vincentqb vincentqb added needs research We need to decide whether or not this merits inclusion, based on research world module: cuda Related to torch.cuda, and CUDA support in general module: backend non-standard backend support and removed triage review module: cuda Related to torch.cuda, and CUDA support in general labels Sep 30, 2019
@dylanbespalko
Contributor

Hello,

Xilinx recently released Vitis High Level Synthesis which supports FPGA programming using C++ and Verilog. I have added a proof-of-concept with relevant links here.

I am hoping to work with both:

  1. PyTorch, to enable OpenCL and FPGA support.
  2. Xilinx Vitis Libraries, to enable complex number support.

Please contact me if you need more information.

@jspisak
Contributor

jspisak commented Feb 2, 2020

thanks!! (and apologies for the delayed response). Is this something you would be willing to develop out of tree but promote as part of the PyTorch ecosystem? These are really cool projects, but we try hard to keep the core lean and as modular as possible.

btw, can you say more about the plans for complex number support? Are you, for example, planning to support quaternions?

@dylanbespalko
Contributor

Hi @jspisak,

Out-of-Tree Project

  • Yes this should absolutely remain out-of-tree.
  • I am using the DeviceType::FPGA device type, however there are two other FPGA solutions from Xilinx alone. Vitis is supposed to replace the other two, but it takes time.
  • Unlike the CPU/GPU, I had to use a config file to reduce the build time by:
    • Specifying which kernels to build (BinaryOps, SpectralOps, ReduceOps, etc).
    • Specifying which data types to build (int, float, double, std::complex).
  • However, FPGA development has a software emulation mode where you might be able to build every kernel simultaneously and run all of the unit tests.

In-Tree Changes (minimal)

  • Sometimes there are assertions that block execution on the FPGA.
  • I will submit a few PRs to fix these as they block me.

Support for Quaternions (Higher-Order Spaces).

  • Yes, Vitis can support array data types as long as the combined data does not exceed the memory bit width (typically 512 bits on servers).
  • E.g. 512-bit width = 32-bit float × 16 dimensions (the maximum number of dimensions at 32-bit precision).
  • E.g. 512-bit width = 128-bit float × 4 dimensions (the maximum precision for quaternions).
  • I have implemented a generalization of Vec256 (from PyTorch) called Vec which allows for this flexibility using C++ class templates.
  • I will write a blog about this next week.

Promotion in the PyTorch Ecosystem.

  • The software license is the same as PyTorch's.
  • I'm not looking to do this for profit.
  • I am always looking for jobs in the San Francisco Bay Area related to Radio Frequency or Optical communications.

@gchanan
Contributor

gchanan commented Feb 18, 2020

out-of-tree sounds right and we are happy to accept fixes for the assertions that break you.

@gchanan gchanan removed the needs research We need to decide whether or not this merits inclusion, based on research world label Feb 18, 2020
@tataetae

Hi @jspisak @dylanbespalko,
I am a graduate researcher in a group focusing on accelerating machine learning on hardware. We have built an int16 inference convolution kernel on a Xilinx FPGA (Alveo U250) and have some ideas about integrating it into PyTorch. We think it would be really cool to have a PyTorch convolution inference layer that runs on an FPGA, where you can just call .fpga() on the layer to have it run on the FPGA. Since PyTorch already has support for quantized data structures, we could have retraining and quantization done entirely in PyTorch.

Would you mind if we discussed implementing convolution inference on FPGAs in PyTorch in more depth? Thanks!

@dylanbespalko
Contributor

Would you mind if we discussed implementing convolution inference on FPGAs in PyTorch in more depth? Thanks!

@tataetae,

I have implemented the very basic math kernels here. Development is ongoing, but I think I have all binary functions (e.g. a + b) and unary functions (e.g. sin(a)) covered. I have been testing on the Alveo U200 card, but I'm just using the sw_emu and hw_emu modes for now.

As for my future development:

  • I mostly work on embedded FPGAs (Xilinx Zynq, Versal)
  • I implement non-neural network functions.
  • I currently just use autograd.
  • I am using floating-point precision until I can decide what to do about fixed precision.

If you would like to develop in my repo, send me your GitLab ID and tell me when I need to clean up my act.

@tataetae

@dylanbespalko sure, I am more than happy to talk about this. Is there a way to DM or email you?

@dylanbespalko
Contributor

@dylanbespalko sure, I am more than happy to talk about this. Is there a way to DM or email you?

You can register for pytorch.slack.com and find me as Dylan Bespalko. Or you can email me here.

I am writing a blog right now that outlines the project status and how to contribute.

@dylanbespalko
Contributor

@tataetae,

I have posted a tutorial on my work integrating PyTorch with Xilinx Vitis/Vivado:
pytorch-for-fpga-part-1-heterogeneous-processing
pytorch-for-fpga-part-2-basic-fpga-optimizations
pytorch-for-fpga-part-3-advanced-fpga-optimizations
pytorch-for-fpga-part-4-deploying-pytorch-kernels

  • I still need to make some in-tree changes to PyTorch (See PyTorch WIP: Add FPGATensorId for Xilinx Vitis Devices #32920)
  • I anticipate that I will be developing math kernels for the FPGA for another two months.
  • There is additional work for calling multiple math operations. This can be done in two ways:
    1. Registering a new top-level math kernel that calls sub-kernels.
    2. Exporting the PyTorch graph to a Xilinx .cfg file (may require changes to the PyTorch JIT).

I have no stress in my life, so I'm going to develop a bunch of math kernels and then export the PyTorch graph.

@dylanbespalko
Contributor

dylanbespalko commented Feb 25, 2020

@jspisak, @anjali411, @ezyang

btw, can you say more about the plans for complex number support? Are you, for example, planning to support quaternions?

Here is an update:

  • I have blogged about how to deploy math kernels for real numbers.
  • I have generalized the code to work with complex and quaternions.

Here are some issues:

The last issue is very scary for me. Please vote up the issue.


7 participants