SynOps Calculation #12

Open
Dieguli opened this issue Dec 27, 2023 · 9 comments


Dieguli commented Dec 27, 2023

Hi @ridgerchu, first of all, congratulations on your work, it is amazing. I would like to know exactly how you calculate the SynOps numbers reported in your paper, as I do not get the same results. I look forward to hearing from you.


ridgerchu commented Dec 31, 2023

Hi, thank you for reaching out and for your interest in our work. I'd like to clarify that in the latest version of our paper, which you can find at this link, we no longer use the SynOps metric. We've decided that it wasn't the most appropriate measure for our purposes. Instead, we've switched to using the theoretical power consumption and have provided detailed steps for its calculation in the paper. Please refer to the linked document for more in-depth information.


Dieguli commented Jan 14, 2024

Hi @ridgerchu, thank you for your previous answer. I have taken a look at the paper and I am able to replicate almost everything except the energy-consumption estimate, which is what interests me the most. I would appreciate it if you could explain how to obtain the spiking firing rate from the provided code or from an already trained model. Furthermore, I have not managed to reproduce the attention values reported in Table 1. Could you explain why the MACs in the second row are "2T^2d vs 6Td" and why the MACs in the first row are 3d^2T, based on the general equations for the SRWKV and SRFNN blocks as well as the one for the self-attention mechanism? I would really appreciate your help.

ridgerchu commented:

Hi,

To measure the spiking rate, you can use forward hooks in PyTorch (register_forward_hook). They allow you to record the outputs of network layers, which lets you calculate each layer's output firing rate.
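
For illustration, here is a minimal sketch of what that could look like. The model, the `SpikingNeuron` layer class, and `sample_batch` below are placeholders, not names from this repository:

```python
import torch
import torch.nn as nn

firing_rates = {}

def make_hook(name):
    def hook(module, inputs, output):
        # For a binary spike tensor, the fraction of nonzero entries is the firing rate.
        spikes = output.detach()
        firing_rates[name] = spikes.ne(0).float().mean().item()
    return hook

def register_spike_hooks(model: nn.Module, spiking_layer_types):
    # Attach a forward hook to every module whose type matches spiking_layer_types.
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, spiking_layer_types):
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles

# Hypothetical usage:
# handles = register_spike_hooks(model, (SpikingNeuron,))
# with torch.no_grad():
#     model(sample_batch)
# print(firing_rates)
# for h in handles:
#     h.remove()
```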

Regarding the MACs: the term '3Td^2' refers to the computational cost of producing the Q, K, and V matrices; each of these projections requires 'Td^2' operations. In the attention mechanism itself, the product of Q and K is a matrix multiplication whose cost grows with the square of T (a T×d matrix times a d×T matrix costs T^2·d operations), which is where the quadratic dependence on the sequence length comes from.
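
As a small worked example of where these terms come from, here is an illustrative tally of the textbook MAC counts for one vanilla self-attention block (not code from the repository; T and d are simply the values discussed later in this thread):

```python
def vanilla_attention_macs(T, d):
    """Textbook MAC counts for one self-attention block (illustrative only)."""
    qkv_projections = 3 * T * d * d  # Q, K, V: each (T x d) @ (d x d) costs T*d^2 MACs
    qk_scores = T * T * d            # Q @ K^T: (T x d) @ (d x T) costs T^2*d MACs
    scores_times_v = T * T * d       # attention @ V: (T x T) @ (T x d) costs T^2*d MACs
    return qkv_projections, qk_scores, scores_times_v

print(vanilla_attention_macs(T=3072, d=512))
```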

I hope this explanation clarifies your queries. Feel free to reach out if you have more questions!


Dieguli commented Jan 15, 2024

Hi @ridgerchu, thanks a lot for the help with the spiking rate calculation!

However, I am still struggling to work out the computational complexity of the model so that I can derive the energy consumption from it. I will try to lay out my doubts properly:

1. I understand that the self-attention mechanism involves 3 operations: the dot product of Q and K, the scaling of this dot product, and the multiplication of the attention scores with V, which gives a total of 2T^2d + T^2 FLOPs. We then have to multiply the resulting number by Emac to get the energy consumption. As you can see, I do not understand where the two additional terms in rows 1 and 4 come from. I understand that for SpikeGPT the number of FLOPs of f(Q/R,K,V) is 6Td, since here you use the RWKV formulation inspired by the Attention Free Transformer. Could you explain what the 'Q/R,K,V' contribution means for both Vanilla-GPT and SpikeGPT, and how you compute its different values? Also, why does it involve only AC operations rather than MAC operations in the case of SpikeGPT?

2. Finally, I would like to know how you compute the FLOPs of the 3 MLPs (the values in rows 5, 6 and 7). Firstly, I would like to confirm that they are a contribution from the SRFNN block. Secondly, I would like to confirm that they correspond to the computations involving the Mp, Mg and Ms matrices. I would appreciate it if you could give the FLMLP_i values in terms of T and d.

Sorry for such a long question; I understand that answering it means explaining step by step all the calculations involved in that section of the paper, but perhaps it would also be helpful to include this in the supplementary materials as a clarification for reviewers.

ridgerchu commented:

Hi, thank you for reaching out with your questions!

Self-Attention Mechanism Complexity: Regarding the self-attention mechanism's computational complexity and its relation to energy consumption, we align our methodology with the approach used in Spike-Driven Transformer; specifically, we employ Eac and Emac calculations similar to theirs. In their Spiking Neural Network (SNN) model they use Eac, which we have also adopted. For the Td calculation, we followed the precedent set by models like AFT, RWKV, and SpikeGPT, where the combination of the R/Q, K, and V variables involves element-wise products, leading to a complexity of Td.
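
To make the scaling difference concrete, a rough comparison (the T and d values are the ones quoted in the Mg calculation below, used here only for illustration):

```python
T, d = 3072, 512

elementwise_mix = T * d       # one element-wise product over a (T x d) tensor (AFT/RWKV-style)
attention_matmul = T * T * d  # forming the (T x T) attention matrix in vanilla self-attention

print(attention_matmul // elementwise_mix)  # the two differ by a factor of T = 3072
```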

Calculations of the Mp, Mg, and Ms Matrices: The additional terms you are asking about originate from the Mp, Mg, and Ms matrices. The computation for Mg is based on 4Td^2 (with d = 512, T = 3072), which gives 3,221,225,472. Multiplying this number by R = 0.15 and Eac = 0.9 yields 434,865,438.72. For Mp and Ms, we initially took their cost to be Td^2, since those matrices are four times smaller. This was our calculation at the time, and while we strove for accuracy, we acknowledge there might be areas that lack rigor. If you find any discrepancies or have concerns, please feel free to point them out!
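
For reference, the Mg figure above can be reproduced directly from the quoted values (a sanity check using only the numbers stated in this comment):

```python
T, d = 3072, 512
R, E_ac = 0.15, 0.9

mg_acs = 4 * T * d ** 2        # 3,221,225,472 accumulate (AC) operations for Mg
mg_energy = mg_acs * R * E_ac  # 434,865,438.72, in the same units as E_ac

# Mp and Ms were treated as roughly T*d^2 each, since those matrices are about four times smaller.
mp_acs = ms_acs = T * d ** 2

print(mg_acs, mg_energy)
```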


Dieguli commented Jan 16, 2024

@ridgerchu I really appreciate the effort you have made to answer my questions. I think everything is clear now. I will get back to you in case anything else arises. Thanks!


Dieguli commented Feb 21, 2024

Hi @ridgerchu, I was wondering if you knew where I could find the WKV implementation class in PyTorch, rather than the current CUDA one. Thanks!

ridgerchu commented:

Hi, you can find the PyTorch-style RWKV code here: link


Dieguli commented Feb 22, 2024

@ridgerchu, I really appreciate your support. Thanks!
