Softmax layer latency #20

Closed · jmduarte opened this issue Nov 17, 2017 · 2 comments

@jmduarte (Member) commented Nov 17, 2017:

Using the branch nt/resource-reuse-api, I checked what the latency and resource usage are for the 3-layer model with two ReuseFactor test cases (below). In either case the softmax layer takes 34 clock cycles, and I was going to check the code to see whether this is expected.
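
For context, a minimal sketch of how a reuse factor is typically realized in HLS (my illustration, assuming the usual pipelining scheme, not the actual code on nt/resource-reuse-api): the layer is pipelined with an initiation interval equal to the reuse factor while its loops are unrolled, so each multiplier can be time-shared across that many cycles. That is consistent with the reports below, where the compute_layer intervals grow from 1 to 3-4 as ReuseFactor goes from 1 to 4, while the softmax latency stays at 34 in both cases.

```cpp
// Sketch only: a dense layer where II = ReuseFactor trades interval for
// multiplier (DSP) count. With II=1 every multiply gets its own DSP; with
// II=4 the scheduler can time-share each DSP across 4 cycles.
void dense_reuse(const float in[16], const float w[16][16], float out[16]) {
    #pragma HLS PIPELINE II=4  // II = ReuseFactor (assumed mapping)
    for (int j = 0; j < 16; ++j) {
        #pragma HLS UNROLL
        float acc = 0.0f;
        for (int i = 0; i < 16; ++i) {
            #pragma HLS UNROLL
            acc += in[i] * w[i][j];  // one MAC per weight
        }
        out[j] = acc;
    }
}
```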

ReuseFactor: 1

+ Latency (clock cycles): 
    * Summary: 
    +-----+-----+-----+-----+----------+
    |  Latency  |  Interval | Pipeline |
    | min | max | min | max |   Type   |
    +-----+-----+-----+-----+----------+
    |   59|   59|    1|    1| dataflow |
    +-----+-----+-----+-----+----------+

    + Detail: 
        * Instance: 
        +----------------------------------------+-----------------------+-----+-----+-----+-----+----------+
        |                                        |                       |  Latency  |  Interval | Pipeline |
        |                Instance                |         Module        | min | max | min | max |   Type   |
        +----------------------------------------+-----------------------+-----+-----+-----+-----+----------+
        |grp_compute_layer_0_0_0_2_fu_440        |compute_layer_0_0_0_2  |    5|    5|    1|    1| function |
        |grp_compute_layer_0_0_0_1_fu_508        |compute_layer_0_0_0_1  |    4|    4|    1|    1| function |
        |grp_compute_layer_0_0_0_3_fu_539        |compute_layer_0_0_0_3  |    4|    4|    1|    1| function |
        |grp_softmax_fu_575                      |softmax                |   34|   34|    1|    1| function |
        |grp_compute_layer_0_0_0_s_fu_587        |compute_layer_0_0_0_s  |    3|    3|    1|    1| function |
        |call_ret2_relu_2_fu_623                 |relu_2                 |    0|    0|    1|    1| function |
        |call_ret4_relu_1_fu_691                 |relu_1                 |    0|    0|    1|    1| function |
        |call_ret_relu_fu_727                    |relu                   |    0|    0|    1|    1| function |
        |StgValue_114_myproject_entry3_fu_763    |myproject_entry3       |    0|    0|    0|    0|   none   |
        |StgValue_115_myproject_entry490_fu_848  |myproject_entry490     |    0|    0|    0|    0|   none   |
        |StgValue_572_Block_proc_fu_906          |Block_proc             |    0|    0|    0|    0|   none   |
        +----------------------------------------+-----------------------+-----+-----+-----+-----+----------+

ReuseFactor: 4

+ Latency (clock cycles): 
    * Summary: 
    +-----+-----+-----+-----+----------+
    |  Latency  |  Interval | Pipeline |
    | min | max | min | max |   Type   |
    +-----+-----+-----+-----+----------+
    |   69|   69|    4|    4| dataflow |
    +-----+-----+-----+-----+----------+

    + Detail: 
        * Instance: 
        +----------------------------------------+-----------------------+-----+-----+-----+-----+----------+
        |                                        |                       |  Latency  |  Interval | Pipeline |
        |                Instance                |         Module        | min | max | min | max |   Type   |
        +----------------------------------------+-----------------------+-----+-----+-----+-----+----------+
        |grp_compute_layer_0_0_0_2_fu_440        |compute_layer_0_0_0_2  |    6|    6|    3|    3| function |
        |grp_compute_layer_0_0_0_3_fu_508        |compute_layer_0_0_0_3  |    6|    6|    3|    3| function |
        |grp_compute_layer_0_0_0_s_fu_539        |compute_layer_0_0_0_s  |    7|    7|    4|    4| function |
        |grp_softmax_fu_575                      |softmax                |   34|   34|    1|    1| function |
        |grp_compute_layer_0_0_0_1_fu_587        |compute_layer_0_0_0_1  |    7|    7|    4|    4| function |
        |call_ret2_relu_fu_623                   |relu                   |    0|    0|    1|    1| function |
        |call_ret4_relu_2_fu_691                 |relu_2                 |    0|    0|    1|    1| function |
        |call_ret_relu_1_fu_727                  |relu_1                 |    0|    0|    1|    1| function |
        |StgValue_125_myproject_entry3_fu_763    |myproject_entry3       |    0|    0|    0|    0|   none   |
        |StgValue_126_myproject_entry505_fu_848  |myproject_entry505     |    0|    0|    0|    0|   none   |
        |StgValue_593_Block_proc_fu_906          |Block_proc             |    0|    0|    0|    0|   none   |
        +----------------------------------------+-----------------------+-----+-----+-----+-----+----------+
@benjaminkreis (Member) commented Nov 17, 2017:

This is expected. With <16,6> fixed-point precision, I found that it adds 30 clock cycles of latency. With <32,8>, it adds 60 clock cycles. Division is slow :(
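
For anyone landing here later, a plain-C++ sketch (my illustration, not the hls4ml implementation) of where that latency comes from: softmax needs one exp per input, a shared sum, and then one divide per output. In hardware the exp is usually a lookup table, so the fixed-point division of each exponential by the sum dominates, and a wider ap_fixed type means a longer divider, hence 30 vs 60 extra clocks.

```cpp
#include <cmath>

// Sketch only: N-way softmax. In an HLS fixed-point implementation the
// std::exp calls become table lookups, and out[i] = exp_vals[i] / sum is
// the division whose latency grows with the word width (<16,6> vs <32,8>).
template <int N>
void softmax_sketch(const float in[N], float out[N]) {
    float exp_vals[N];
    float sum = 0.0f;
    for (int i = 0; i < N; ++i) {   // exp + accumulate: cheap in hardware
        exp_vals[i] = std::exp(in[i]);
        sum += exp_vals[i];
    }
    for (int i = 0; i < N; ++i) {   // per-output divide: the slow part
        out[i] = exp_vals[i] / sum;
    }
}
```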

@benjaminkreis (Member) commented:

I think we can close this for now. If we want to do something similar without actually computing the full softmax, @ejk43 had some ideas (e.g., finding the maximum).
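
A sketch of that division-free idea (a hypothetical helper, not code from the repo): exp is monotonic and the softmax denominator is shared by all outputs, so softmax preserves the ordering of its inputs. When only the predicted class is needed, an argmax over the pre-softmax values gives the same answer with no exp and no division.

```cpp
// Hypothetical replacement when only the winning class index is needed:
// the largest pre-activation already identifies the softmax maximum.
template <int N>
int argmax_sketch(const float in[N]) {
    int best = 0;
    for (int i = 1; i < N; ++i) {  // comparisons only: no exp, no divide
        if (in[i] > in[best]) best = i;
    }
    return best;
}
```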

GiuseppeDiGuglielmo pushed a commit that referenced this issue on Oct 13, 2023: "Branch to test non-streaming relu activation function"