
Quartus GRU #596

Merged
merged 3 commits into from Aug 12, 2022

Conversation

bo3z
Contributor

@bo3z commented Jul 11, 2022

Description

📝 Gated Recurrent Units (GRUs) for Quartus backend

Type of change

  • New feature (non-breaking change which adds functionality)

Tests

  • Accuracy tests through PyTest; for more details see test/pytest/test_rnn.py
  • IP simulation using cosim.
  • Successful synthesis and analysis of device resources and latency (see below).

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.

Implementation

  • The HLS code for GRU layers consists of two main functions (a simplified structural sketch follows this list):
  1. gru_cell(t, h, weights, recurrent_weights, bias, recurrent_bias) - takes the input vector, x, and hidden state, h, at time step t, and produces the new hidden state according to the GRU logic (reset gate, update gate, candidate state). This function contains several loops over the number of GRU units/states; those loops are unrolled with the appropriate reuse factor. For results on resource usage and latency, see below.
  2. gru(data, res, weights, recurrent_weights, bias, recurrent_bias) - makes use of the previous function by traversing the data at each time step and obtaining the new state, until the final output is produced. Note that it is not possible to pipeline this function, because there is a loop-carried dependency: at every iteration, the current state needs to be available before the new state can be calculated.
  • The backend contains a layer initialiser and the appropriate templates. Matrix multiplication and bias addition are done through the Dense layer. Finally, a resource strategy optimizer handles the matrix transposes needed for the Dense multiplication, rather than performing them in the layer initialisation procedures.
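To make the structure concrete, below is a minimal, simplified sketch of the two functions. This is not the actual hls4ml template code: the float types, fixed sizes, Keras-style z/r/h gate ordering with the reset gate applied after the recurrent product, and the Intel HLS-style unroll/pipelining pragmas are illustrative assumptions. The real implementation is templated on a config struct, uses fixed-point types, and delegates the dense products and bias additions to the Dense-layer functions as described above.

```cpp
#include <cmath>

// Illustrative sizes matching the benchmark below: 5-dimensional input,
// 8 units, 8 time steps. The real code is templated on a config struct
// and uses fixed-point (ac_fixed) types rather than float.
constexpr int N_IN = 5;
constexpr int N_UNITS = 8;
constexpr int N_TIME = 8;

static float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// One GRU step: given the input vector x at the current time step and the
// previous hidden state h, compute the update gate z, reset gate r and the
// candidate state, then overwrite h with the new hidden state.
void gru_cell(const float x[N_IN], float h[N_UNITS],
              const float weights[3 * N_UNITS * N_IN],
              const float recurrent_weights[3 * N_UNITS * N_UNITS],
              const float bias[3 * N_UNITS],
              const float recurrent_bias[3 * N_UNITS]) {
    float z[N_UNITS], r[N_UNITS], h_cand[N_UNITS];

    // Loops over the number of units; in the actual implementation these
    // are unrolled according to the reuse factor.
    #pragma unroll
    for (int u = 0; u < N_UNITS; u++) {
        // Update (z) and reset (r) gates: dense products of the input and
        // the previous state, plus biases, followed by a sigmoid.
        float z_acc = bias[u] + recurrent_bias[u];
        float r_acc = bias[N_UNITS + u] + recurrent_bias[N_UNITS + u];
        for (int i = 0; i < N_IN; i++) {
            z_acc += weights[u * N_IN + i] * x[i];
            r_acc += weights[(N_UNITS + u) * N_IN + i] * x[i];
        }
        for (int j = 0; j < N_UNITS; j++) {
            z_acc += recurrent_weights[u * N_UNITS + j] * h[j];
            r_acc += recurrent_weights[(N_UNITS + u) * N_UNITS + j] * h[j];
        }
        z[u] = sigmoid(z_acc);
        r[u] = sigmoid(r_acc);
    }

    #pragma unroll
    for (int u = 0; u < N_UNITS; u++) {
        // Candidate state: the recurrent contribution is gated by r
        // (Keras-style reset_after convention assumed here).
        float in_acc = bias[2 * N_UNITS + u];
        for (int i = 0; i < N_IN; i++)
            in_acc += weights[(2 * N_UNITS + u) * N_IN + i] * x[i];
        float rec_acc = recurrent_bias[2 * N_UNITS + u];
        for (int j = 0; j < N_UNITS; j++)
            rec_acc += recurrent_weights[(2 * N_UNITS + u) * N_UNITS + j] * h[j];
        h_cand[u] = std::tanh(in_acc + r[u] * rec_acc);
    }

    // Blend previous state and candidate through the update gate.
    #pragma unroll
    for (int u = 0; u < N_UNITS; u++)
        h[u] = z[u] * h[u] + (1.0f - z[u]) * h_cand[u];
}

// Full layer: walk through the time steps, feeding each input vector and
// the running state to gru_cell. Because the next iteration needs the state
// just produced (a loop-carried dependency), this loop is not pipelined.
void gru(const float data[N_TIME * N_IN], float res[N_UNITS],
         const float weights[3 * N_UNITS * N_IN],
         const float recurrent_weights[3 * N_UNITS * N_UNITS],
         const float bias[3 * N_UNITS],
         const float recurrent_bias[3 * N_UNITS]) {
    float h[N_UNITS] = {0};

    #pragma disable_loop_pipelining
    for (int t = 0; t < N_TIME; t++)
        gru_cell(&data[t * N_IN], h, weights, recurrent_weights, bias,
                 recurrent_bias);

    for (int u = 0; u < N_UNITS; u++)
        res[u] = h[u];
}
```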

Results

Below are latency, DSP, REG and ALM usage results of a GRU layer with a 5-dimensional input, 8 time steps and a variable number of units.

As expected, the latency remains approximately constant when increasing the number of units, while DSPs, REGs and ALMs increase at a linear rate. This occurs because the implementation contains several loops unrolled over the number of units/states. Therefore, such an implementation is time-invariant, but resource-inefficient.
[Plots: latency, DSP, REG and ALM usage vs. number of units]

Finally, with the units fixed to 8 and the input size to 5, similar plots are obtained. As the time loop has pipelining disabled (due to loop dependencies), the use of DSPs remains approximately constant. ALMs and REGs increase slightly, because a larger input needs to be stored. The latency increases at a linear rate, as expected.
[Plots: latency, DSP, REG and ALM usage vs. number of time steps]

@jmitrevs
Contributor

The pytest failure was related to running out of disk space. It probably is unrelated to this PR.

@vloncar vloncar merged commit ae31793 into fastmachinelearning:main Aug 12, 2022
calad0i pushed a commit to calad0i/hls4ml that referenced this pull request Jul 1, 2023