computational complexity in paper #19

Open
YinzhenWang opened this issue Apr 6, 2024 · 1 comment

YinzhenWang commented Apr 6, 2024

I would like to ask whether the computational complexity stated in the paper is correct.

Should it be O((SHW) * (F * T/F)) instead of O((SHW) * (S * T/F))?
As I understand RS-MMA, there are F audio patches (each of length T/F), and each audio patch is computed against the video patch (length SHW). The computational complexity should therefore be O((SHW) * (F * T/F)).

Is my understanding correct? Your comments would be much appreciated.

@TreeberryTomato

I think the paper computes the complexity from the sizes of the two sequences: in O((SHW) * (S * T/F)), SHW is the size of the video sequence and S * T/F is the size of the audio sequence, so it should be correct.
However, I am confused that the cross-attention is computed iteratively over all the segments, rather than over only one segment as described in the paper. So I think the complexity should be O((SHW) * (S * T/F) * F/S) = O((SHW) * T), where the extra factor F/S accounts for the F/S iterations.
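
To make the expressions in this thread concrete, here is a minimal sketch that only counts query-key pairs. The function name `attention_pairs` and the example sizes are made up for illustration, and it assumes F is divisible by S and T is divisible by F; it is not taken from the paper or from the repository code.

```python
def attention_pairs(F, H, W, T, S):
    """Count query-key pairs for the three expressions discussed in this thread.

    Hypothetical helper for illustration only. Assumes S divides F and F divides T.
    """
    video_window = S * H * W      # video tokens in one window of S frames
    audio_window = S * (T // F)   # audio samples aligned with those S frames

    # O((SHW) * (S * T/F)): one video window against one audio window.
    per_window = video_window * audio_window

    # O((SHW) * (F * T/F)): the expression proposed in the original question.
    question = video_window * F * (T // F)

    # O((SHW) * (S * T/F) * F/S): per-window cost repeated for F/S iterations.
    all_windows = per_window * (F // S)

    return per_window, question, all_windows


# Example sizes chosen arbitrarily, not taken from the paper.
per_window, question, all_windows = attention_pairs(F=16, H=64, W=64, T=25600, S=4)
print(per_window, question, all_windows)
```

Under these assumptions the last two counts coincide, since F * T/F and (S * T/F) * (F/S) both simplify to T. The two proposed expressions therefore differ only in how the O((SHW) * T) total is derived.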
