Quartus streaming support for Activations, Dense & Batch Normalization #557

bo3z · 2022-05-31T12:47:42Z

A PR enabling support for io_stream in the Qaurtus backend. Currently, the supported layers for streaming I/O are Dense, Activation and Batch Normalisation. Due to the inherent differences between parallel and streaming interfaces in Intel HLS, this is an extensive PR. Furthermore, Intel's HLS stream has certain requirements when implemented (no copy constructor, global declaration etc.). Therefore, It is recommended to review this PR commit by commit (rather than side-by-side diff), with each commit being briefly explained below. Each commit is self-contained and can be checked out and the project compiled.

2233a68 Allow multiple includes of the same LUT - This simply allows the same look-up tables to be used for both io_parallel and io_stream, by removing the IFNDEF guard
e003234 Quartus custom Stream & distinction between g++ (nnet::stream) and i++ (ihc::stream):

When executing hls4ml.compile() or hls4ml.predict(...), GCC is used. However, GCC doesn't have access to HLS source files (which include Intel's streaming interface, ihc::stream, ihc::stream_in, ihc::stream_out). Unlike ac_int, ac_fixed which are open-source and included in hls4ml, Intel's HLS stream source files are protected by licence and cannot be included in the repository. Therefore, a custom nnet::stream struct is written, having the same high-level function as Intel's HLS ihc::stream, but implemented using queues. Please note, pytests written for streaming layers use nnet::stream. To verify correct functionality of the IP, use cosim with the following command: i++ -march=x86-64 -o myproject_test -v myproject_test.cpp firmware/myproject.cpp
Finally, as per this tutorial by Intel: https://youtu.be/aZYBlkcoj8Q?t=819, streams are always passed by reference to the top-level component. A top-level component with a streaming output is void - instead of returning a stream, it takes the stream object by reference. It is not even possible to return a stream from a component, since the internal implementation contains a explicitly deleted copy constructor.
Finally, there are 3 distinct types of stream - inputs to an HLS component must be of type stream_in, outputs of type stream_out and inter-component connection of type stream. With this distinction, the HLS compiler is able to distinguish between component inputs and outputs. If only stream was used, the component would not synthesize with the correct inputs/outputs

1bbcca5 Top-level & Quartus Writer distinction between io_parallel and io_stream:

This commit handles the inherent differences between io_parallel and io_stream in myproject.cpp, myproject.h and defines.h - in parallel, the top-level component takes a input struct, containing the array of data and returns a struct containing the output array. On the other hand, as explained above, streaming interfaces are void, as both the input and output and output are passed by reference.
Secondly, a new cosim benchmark was written for streaming inputs. Previously (and still for io_parallel), all of the input data is processed, stored to a vector and then passed to the component sequentially. However, due to an explicitly deleted copy constructor, ihc::stream cannot be stored inside a vector. Therefore, this new benchmark processes data from input files and executes the component straight away.
Finally, for io_stream, inter-layer connections (type stream) must be declared outside of main() - Intel HLS has a requirement through which all streams are either passed by reference to the top-level component or declared as global variables. No stream types can be instantiated inside the component.

d600882 Quartus Stream Variable Converter & Backend io_stream enabled:

Adds a stream variable converter, in a similar manner to Vivado
Add nnet::array, as an array-like struct for storing data inside streams. Implementation similar to Vivado.

1b17b3b Quartus Clone Optimizer - Sets up streaming optimizers in Quartus backend and adds the clone optimizer, as the most commonly used. From here, it should be straight forward to add further streaming passes.
d4da71f Tanh bug fix in Quartus - Addresses a small bug when invoking TanH activation. This is simply a pre-requisite for writing streaming activations.
The other commits add support for streaming Dense, BatchNormalization and Activation, in a similar manner to Vivado, with the appropriate pipeline initiation interval:

Existing PyTests were expanded to include io_stream. Some new tests (test_activations.py) were written as well, to verify correct results of less-frequently used layers. Important to note, a passing PyTest does not imply correct HLS outputs - as explained above, PyTest uses nnet::stream and HLS, when synthesizing uses ihc::stream - these two are inherently different, even though they offer the same high-level funcionality. To verify correct HLS behaviour, cosim should be used (explained above), as well as some RTL-level simulation, such as ModelSim or Questa. All of the above layers were tested using both PyTest and cosim. Finally, all of the above layers synthesized correctly and produced a valid IP block.
Finally, some of the Quartus-equivalent pragmas couldn't be used. For example, Vivado offers a #pragma HLS data_pack directive, for which the Intel equivalent would be hls_bankwidth. However, hls_bankwidth (similar to other memory-optimisation pragmas, except for hls_register) is not supported for variables passed by reference/pointers, which is a necessity for correct functionality.

bo3z · 2022-07-08T10:36:16Z

cppname is being removed in #562, this PR will also have to reflect these changes

jmitrevs · 2022-07-12T20:00:06Z

Can you expand a bit on why multiple includes are needed (e8c9170). It seems a bit strange to me to include the same header file twice.

jmitrevs · 2022-07-12T20:41:11Z

@bo3z, can you fix the conflicts? I understand that it's high priority to get this in fairly soon, so I'd like to make a pr/557 to run all the tests.

bo3z · 2022-07-13T08:19:19Z

Can you expand a bit on why multiple includes are needed (e8c9170). It seems a bit strange to me to include the same header file twice.

Unlike Vivado, Quartus generates look-up tables thorugh Python and stores them in .h files. These files simply contain a single C-array, the entries of the LUT. These header files are only included in the function using that LUT, not globally. io_parallel and io_stream are simply overloads of the same function, but completely independent from each other. So, both of them have to include the LUT header file. If one of the overloads isn't able to include the header files, compilation will fail, with a message along the lines of 'No variable named exp_table'. Keeping the #IFNDEF guard prevents both files to be included and compilation fails.

Now I am aware that this is very bad practice, especially in a big project. One alternative is generating the same LUT twice, once with the suffix parallel and once with the suffix stream. This will slow down GCC compilation time, but should have no impact on HLS synthesis, as HLS compiler removes all unused code. Using a linker (-flto flag) during GCC compilation might also do the trick (I haven't tried this) - it will remove all unused code and we can always be certain that only one of the two activation functions is called at a time.

bo3z · 2022-07-13T10:13:48Z

@bo3z, can you fix the conflicts? I understand that it's high priority to get this in fairly soon, so I'd like to make a pr/557 to run all the tests.

Rebased and pushed. All the tests should pass now.

jmitrevs · 2022-07-23T22:44:04Z

About the multiple includes, I think it's fine. The includes are used inside of functions, so effectively define the variables there, if I understand correctly.

jmitrevs · 2022-07-24T15:54:52Z

I see the following errors whey I run pytests:

FAILED test_qkeras.py::test_qactivation_kwarg[quantized_bits(4, 0, alpha=1)-quantized_relu(8, 0)] - AssertionError: 
FAILED test_transpose_concat.py::test_accuracy[io_stream] - AssertionError:

jmitrevs · 2022-07-24T16:08:02Z

(In general, though, the PR looks good.)

bo3z · 2022-07-24T16:49:47Z

I see the following errors whey I run pytests:

FAILED test_qkeras.py::test_qactivation_kwarg[quantized_bits(4, 0, alpha=1)-quantized_relu(8, 0)] - AssertionError: 
FAILED test_transpose_concat.py::test_accuracy[io_stream] - AssertionError:

Could you post the full CI/CD log please? Or maybe re-run the tests ? I just re-ran both of these tests locally and they passed. test_transpose_concat.py really should not fail, because it defaults to Vivado and I have not modified it, so my PR should have no impact on it at all?

bo3z · 2022-07-24T16:54:52Z

About the multiple includes, I think it's fine. The includes are used inside of functions, so effectively define the variables there, if I understand correctly.

That's correct.

jmitrevs · 2022-07-24T17:08:50Z

They both passed. I don't understand what happened before. They had previously failed by not matching perfectly.

I have to run now, but I want to push your branch to pr/557 on the upstream repository to trigger the pytests as part of the CI (you can do it if you want) and I'll look over the code one more time before approving.

bo3z · 2022-07-24T17:11:44Z

They both passed. I don't understand what happened before. They had previously failed by not matching perfectly.

I have to run now, but I want to push your branch to pr/557 on the upstream repository to trigger the pytests as part of the CI (you can do it if you want) and I'll look over the code one more time before approving.

No clue what happened there, but I've seen this happen once before as well. Sometimes the CI just fails. I will push to pr/557 now, all tests should pass. I've just merged main into this branch.

bo3z · 2022-07-25T08:39:34Z

E           AssertionError: 
E           Not equal to tolerance rtol=1, atol=0.00390625
E           
E           Mismatched elements: 1 / 1000 (0.1%)
E           Max absolute difference: 0.0078125
E           Max relative difference: 1.
E            x: array([0.      , 0.      , 0.042969, 0.203125, 0.214844, 0.      ,
E                  0.371094, 0.046875, 0.      , 0.136719, 0.      , 0.      ,
E                  0.570312, 0.15625 , 0.121094, 0.089844, 0.792969, 0.      ,...
E            y: array([0.      , 0.      , 0.039062, 0.199219, 0.210938, 0.      ,
E                  0.367188, 0.042969, 0.      , 0.132812, 0.      , 0.      ,
E                  0.566406, 0.15[234](https://gitlab.cern.ch/fastmachinelearning/hls4ml/-/jobs/23506353#L234)4, 0.117188, 0.085938, 0.789062, 0.      ,...
test_qkeras.py:309: AssertionError

This PyTest still seems to be failing, by a very small margin. However, this test defaults to Vivado, so I'm not sure why that's the case?

jmitrevs · 2022-07-25T15:36:52Z

This PyTest still seems to be failing, by a very small margin. However, this test defaults to Vivado, so I'm not sure why that's the case?

I wonder if it's a different random seed.

vloncar

Looks amazing. Can you fix the two minor cosmetic issues before we merge?

hls4ml/writer/quartus_writer.py

hls4ml/templates/quartus/myproject_test_stream.cpp

hls4ml/templates/quartus/firmware/nnet_utils/nnet_batchnorm_stream.h

…to avoid generating same LUT twice.

…+ (ihc::stream)

bo3z · 2022-07-26T10:35:08Z

All comments addressed. Needed to rebase one final time to avoid merge conflicts as well as include a small amount of code from a different branch in quartus_writer.py (lines 95-107), as previously synthesis failed (I had this same PR on two branches, to avoid any loss of code). All tests pass now and verified synthesis and HLS simulation with a few single-layer and multi-layer models.

It might be worth including in the CI/CD process a step for synthesising Quartus designs, as we don't currently support that.

jmitrevs

Vladimir should check his change requests first, but from my side things look good.

vloncar · 2022-07-26T14:39:18Z

I'm fine with it. No idea about the test failure. It seems to choose the same seed every time for this PR, which is strange. I'll approve it, not sure if we should merge, or investigate the test more? For sure no changes in this PR cause the test to fail.

bo3z · 2022-07-26T20:33:41Z

I'm fine with it. No idea about the test failure. It seems to choose the same seed every time for this PR, which is strange. I'll approve it, not sure if we should merge, or investigate the test more? For sure no changes in this PR cause the test to fail.

I would maybe merge it (just to avoid further merge conflicts and kick of the work on streaming CNNs) and see if this test fails on other PRs, rebased on top of this one?

vloncar · 2022-07-26T20:37:26Z

Let's go!!!

bo3z requested a review from vloncar May 31, 2022 12:47

vloncar mentioned this pull request Jun 1, 2022

Quartus Embedding Layer #548

Merged

bo3z force-pushed the quartus-stream branch from 0fbf3c7 to 519af72 Compare July 5, 2022 09:00

bo3z mentioned this pull request Jul 11, 2022

Quartus GRU #596

Merged

6 tasks

bo3z force-pushed the quartus-stream branch from 519af72 to 0265efb Compare July 13, 2022 10:11

vloncar requested changes Jul 25, 2022

View reviewed changes

hls4ml/writer/quartus_writer.py Outdated Show resolved Hide resolved

hls4ml/templates/quartus/myproject_test_stream.cpp Show resolved Hide resolved

jmitrevs reviewed Jul 25, 2022

View reviewed changes

hls4ml/templates/quartus/firmware/nnet_utils/nnet_batchnorm_stream.h Show resolved Hide resolved

bo3z added 4 commits July 26, 2022 08:52

Allow multiple includes of same LUT - needed for stream activations, …

2233a68

…to avoid generating same LUT twice.

Quartus custom Stream & distinction between g++ (nnet::stream) and i+…

e003234

…+ (ihc::stream)

Top-level & Quartus Writer distinction between io_parallel and io_stream

1bbcca5

Quartus Stream Variable Converter & Backend io_stream enabled

d600882

bo3z force-pushed the quartus-stream branch from 2b0e724 to 7f0a143 Compare July 26, 2022 10:16

bo3z added 4 commits July 26, 2022 11:27

Quartus Clone Optimizer

1b17b3b

Quartus streaming Softmax

74dfb6c

Tanh bug fix in Quartus

d4da71f

Quartus streaming activations

dc68c2c

bo3z added 3 commits July 26, 2022 11:27

Quartus streaming batch normalisation

28da4b8

Quartus streaming Dense layer

f76332e

Extended support for streaming qkeras

54eb16b

bo3z force-pushed the quartus-stream branch from 7f0a143 to 54eb16b Compare July 26, 2022 10:28

jmitrevs approved these changes Jul 26, 2022

View reviewed changes

vloncar approved these changes Jul 26, 2022

View reviewed changes

vloncar merged commit c5507f3 into fastmachinelearning:main Jul 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quartus streaming support for Activations, Dense & Batch Normalization #557

Quartus streaming support for Activations, Dense & Batch Normalization #557

bo3z commented May 31, 2022 •

edited

bo3z commented Jul 8, 2022

jmitrevs commented Jul 12, 2022

jmitrevs commented Jul 12, 2022

bo3z commented Jul 13, 2022

bo3z commented Jul 13, 2022

jmitrevs commented Jul 23, 2022

jmitrevs commented Jul 24, 2022

jmitrevs commented Jul 24, 2022

bo3z commented Jul 24, 2022

bo3z commented Jul 24, 2022

jmitrevs commented Jul 24, 2022

bo3z commented Jul 24, 2022

bo3z commented Jul 25, 2022

jmitrevs commented Jul 25, 2022

vloncar left a comment

bo3z commented Jul 26, 2022

jmitrevs left a comment

vloncar commented Jul 26, 2022

bo3z commented Jul 26, 2022

vloncar commented Jul 26, 2022

Quartus streaming support for Activations, Dense & Batch Normalization #557

Quartus streaming support for Activations, Dense & Batch Normalization #557

Conversation

bo3z commented May 31, 2022 • edited

bo3z commented Jul 8, 2022

jmitrevs commented Jul 12, 2022

jmitrevs commented Jul 12, 2022

bo3z commented Jul 13, 2022

bo3z commented Jul 13, 2022

jmitrevs commented Jul 23, 2022

jmitrevs commented Jul 24, 2022

jmitrevs commented Jul 24, 2022

bo3z commented Jul 24, 2022

bo3z commented Jul 24, 2022

jmitrevs commented Jul 24, 2022

bo3z commented Jul 24, 2022

bo3z commented Jul 25, 2022

jmitrevs commented Jul 25, 2022

vloncar left a comment

Choose a reason for hiding this comment

bo3z commented Jul 26, 2022

jmitrevs left a comment

Choose a reason for hiding this comment

vloncar commented Jul 26, 2022

bo3z commented Jul 26, 2022

vloncar commented Jul 26, 2022

bo3z commented May 31, 2022 •

edited