Me too. When I tried to run the training code on the AVA dataset with the MVIT_16X4.yaml config file, I got an "unexpected keyword argument 'drop_rate'" error. I am also having trouble downloading the Something-Something V2 and SomethingElse datasets, because the download page returns a 503 error. Is there any way to solve these issues?
Hello,
The paper states that in the ORViT block, object-region attention uses different inputs for q, k and v: q is computed from the patch tokens, while k and v are computed from the concatenation of the patch tokens and the object-region tokens.
Shapes: X is THW × d, C is T(HW + O) × d
So, according to the paper, object-region attention should be: Q = X Wq, K = C Wk, V = C Wv
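To make the question concrete, here is a minimal NumPy sketch of what I understand the paper's formulation to be. It is not the repository's code: it uses plain single-head softmax attention instead of trajectory attention, and the function name `object_region_attention`, the toy sizes, and the token ordering inside `C` are all my own assumptions for illustration.

```python
import numpy as np

def object_region_attention(X, C, Wq, Wk, Wv):
    """Single-head attention: queries from X, keys and values from C.

    Hypothetical helper for illustration only (plain softmax attention,
    not the trajectory attention used in the repository).
    """
    Q = X @ Wq                                   # (THW, d)
    K = C @ Wk                                   # (T(HW+O), d)
    V = C @ Wv                                   # (T(HW+O), d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])      # (THW, T(HW+O))
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                           # (THW, d)

# Toy sizes (assumed): T frames, HW patches per frame, O objects per frame.
T, HW, O, d = 2, 4, 3, 8
rng = np.random.default_rng(0)

X = rng.standard_normal((T * HW, d))             # patch tokens
objects = rng.standard_normal((T * O, d))        # object-region tokens
# Simplified ordering: all patches first, then all objects.
C = np.concatenate([X, objects], axis=0)         # (T(HW+O), d)

Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = object_region_attention(X, C, Wq, Wk, Wv)
print(out.shape)  # (8, 8), i.e. (THW, d)
```

The point of the sketch is that the output keeps only THW rows (one per patch token) even though the keys and values cover T(HW + O) tokens. If q were also computed from C, the output would have T(HW + O) rows instead.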
However, in the code, the concatenated tokens appear to be passed to the trajectory attention module:
ORViT/slowfast/models/ORViT/orvit.py
Line 149 in 3bfd2c7
Also, in the trajectory attention module,
ORViT/slowfast/models/attention.py
Line 479 in 3bfd2c7
Could you please help me understand this? I can't find where the original patch tokens are used as q in the trajectory attention mechanism.
Thanks :)