SynOps Calculation #12

Open
Dieguli opened this issue Dec 27, 2023 · 9 comments


Dieguli commented Dec 27, 2023

Hi @ridgerchu, first of all, congratulations on your work, it is amazing. I would like to know exactly how you calculate the SynOps numbers reported in your paper, as I do not get the same results. I look forward to hearing from you.


ridgerchu commented Dec 31, 2023

Hi, thank you for reaching out and for your interest in our work. I'd like to clarify that in the latest version of our paper, which you can find at this link, we no longer use the SynOps metric. We've decided that it wasn't the most appropriate measure for our purposes. Instead, we've switched to using the theoretical power consumption and have provided detailed steps for its calculation in the paper. Please refer to the linked document for more in-depth information.


Dieguli commented Jan 14, 2024

Hi @ridgerchu, thank you for your previous answer. I have taken a look at the paper and I am able to replicate almost everything except the energy-consumption estimate, which is what interests me the most. I would appreciate it if you could explain how to obtain the spiking firing rate from the provided code or from an already trained model. Furthermore, I have not managed to reproduce the attention values reported in Table 1. Could you explain why the MACs in the second row are "2T^2d vs 6Td" and why the MACs in the first row are 3d^2T, based on the general equations for the SRWKV and SRFNN blocks as well as the one for the self-attention mechanism? I would really appreciate your help.

ridgerchu commented:

Hi,

To measure the spiking rate, you can use forward hooks in PyTorch (register_forward_hook). They allow you to record the outputs of network layers, which lets you calculate each layer's output firing rate.
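
For illustration, here is a minimal sketch of what that could look like. The model, the `SpikingNeuron` layer class, and `sample_batch` below are placeholders, not names from this repository:

```python
import torch
import torch.nn as nn

firing_rates = {}

def make_hook(name):
    def hook(module, inputs, output):
        # For a binary spike tensor, the fraction of nonzero entries is the firing rate.
        spikes = output.detach()
        firing_rates[name] = spikes.ne(0).float().mean().item()
    return hook

def register_spike_hooks(model: nn.Module, spiking_layer_types):
    # Attach a forward hook to every module whose type matches spiking_layer_types.
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, spiking_layer_types):
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles

# Hypothetical usage:
# handles = register_spike_hooks(model, (SpikingNeuron,))
# with torch.no_grad():
#     model(sample_batch)
# print(firing_rates)
# for h in handles:
#     h.remove()
```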

Regarding the MACs: the term '3Td^2' refers to the computational cost of producing the Q, K, and V matrices; each of these projections requires 'Td^2' operations. In the attention mechanism itself, the product of Q and K is a matrix multiplication whose cost grows with the square of T (a T×d matrix times a d×T matrix costs T^2·d operations), which is where the quadratic dependence on the sequence length comes from.
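
As a small worked example of where these terms come from, here is an illustrative tally of the textbook MAC counts for one vanilla self-attention block (not code from the repository; T and d are simply the values discussed later in this thread):

```python
def vanilla_attention_macs(T, d):
    """Textbook MAC counts for one self-attention block (illustrative only)."""
    qkv_projections = 3 * T * d * d  # Q, K, V: each (T x d) @ (d x d) costs T*d^2 MACs
    qk_scores = T * T * d            # Q @ K^T: (T x d) @ (d x T) costs T^2*d MACs
    scores_times_v = T * T * d       # attention @ V: (T x T) @ (T x d) costs T^2*d MACs
    return qkv_projections, qk_scores, scores_times_v

print(vanilla_attention_macs(T=3072, d=512))
```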

I hope this explanation clarifies your queries. Feel free to reach out if you have more questions!


Dieguli commented Jan 15, 2024

Hi @ridgerchu, thanks a lot for the help with the spiking rate calculation!

However, I am still struggling to work out the computational complexity of the model so that I can derive the energy consumption from it. I will try to lay out my doubts properly:

1. I understand that the self-attention mechanism involves 3 operations: the dot product of Q and K, the scaling of this dot product, and the multiplication of the attention scores with V, which gives a total of 2T^2d + T^2 FLOPs. We then have to multiply the resulting number by Emac to get the energy consumption. As you can see, I do not understand where the two additional terms in rows 1 and 4 come from. I understand that for SpikeGPT the number of FLOPs of f(Q/R,K,V) is 6Td, since here you use the RWKV formulation inspired by the Attention Free Transformer. Could you explain what the 'Q/R,K,V' contribution means for both Vanilla-GPT and SpikeGPT, and how you compute its different values? Also, why does it involve only AC operations rather than MAC operations in the case of SpikeGPT?

2. Finally, I would like to know how you compute the FLOPs of the 3 MLPs (the values in rows 5, 6 and 7). Firstly, I would like to confirm that they are a contribution from the SRFNN block. Secondly, I would like to confirm that they correspond to the computations involving the Mp, Mg and Ms matrices. I would appreciate it if you could give the FLMLP_i values in terms of T and d.

Sorry for such a long question; I understand that answering it means explaining step by step all the calculations involved in that section of the paper, but perhaps it would also be helpful to include this in the supplementary materials as a clarification for reviewers.

ridgerchu commented:

Hi, thank you for reaching out with your questions!

Self-Attention Mechanism Complexity: Regarding the self-attention mechanism's computational complexity and its relation to energy consumption, we align our methodology with the approach used in Spike-Driven Transformer; specifically, we employ Eac and Emac calculations similar to theirs. In their Spiking Neural Network (SNN) model they use Eac, which we have also adopted. For the Td calculation, we followed the precedent set by models like AFT, RWKV, and SpikeGPT, where the combination of the R/Q, K, and V variables involves element-wise products, leading to a complexity of Td.
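
To make the scaling difference concrete, a rough comparison (the T and d values are the ones quoted in the Mg calculation below, used here only for illustration):

```python
T, d = 3072, 512

elementwise_mix = T * d       # one element-wise product over a (T x d) tensor (AFT/RWKV-style)
attention_matmul = T * T * d  # forming the (T x T) attention matrix in vanilla self-attention

print(attention_matmul // elementwise_mix)  # the two differ by a factor of T = 3072
```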

Calculations of the Mp, Mg, and Ms Matrices: The additional terms you are asking about originate from the Mp, Mg, and Ms matrices. The computation for Mg is based on 4Td^2 (with d = 512, T = 3072), which gives 3,221,225,472. Multiplying this number by R = 0.15 and Eac = 0.9 yields 434,865,438.72. For Mp and Ms, we initially took their cost to be Td^2, since those matrices are four times smaller. This was our calculation at the time, and while we strove for accuracy, we acknowledge there might be areas that lack rigor. If you find any discrepancies or have concerns, please feel free to point them out!
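
For reference, the Mg figure above can be reproduced directly from the quoted values (a sanity check using only the numbers stated in this comment):

```python
T, d = 3072, 512
R, E_ac = 0.15, 0.9

mg_acs = 4 * T * d ** 2        # 3,221,225,472 accumulate (AC) operations for Mg
mg_energy = mg_acs * R * E_ac  # 434,865,438.72, in the same units as E_ac

# Mp and Ms were treated as roughly T*d^2 each, since those matrices are about four times smaller.
mp_acs = ms_acs = T * d ** 2

print(mg_acs, mg_energy)
```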


Dieguli commented Jan 16, 2024

@ridgerchu I really appreciate the effort you have made to answer my questions. I think everything is clear now. I will get back to you in case anything else arises. Thanks!


Dieguli commented Feb 21, 2024

Hi @ridgerchu, I was wondering if you knew where I could find the WKV implementation class in PyTorch, rather than the current CUDA one. Thanks!

ridgerchu commented:

Hi, you can find the PyTorch-style RWKV code here: link


Dieguli commented Feb 22, 2024

@ridgerchu, I really appreciate your support. Thanks!
