Hi,
We think that MomentDETR has great potential, but looking at Table 6 in the paper, we find that the moment retrieval metrics on the Charades-STA dataset are not much higher than those of IVG-DCL. Note that IVG-DCL uses C3D features for video and GloVe for text embeddings, whereas your work uses CLIP + SlowFast features. Have you tested on other video grounding datasets, such as ActivityNet?
Hi @BMEI1314, in our work we primarily focus on collecting the QVHighlights dataset and developing the MomentDETR model on top of it. On Charades-STA, we did not tune the model much, but we still see a significant improvement on R1@0.5 (e.g., +3, or +5 with pretraining). We did not test on other datasets.
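For readers less familiar with the metric, R1@0.5 is usually read as the percentage of queries whose top-1 predicted moment has a temporal IoU of at least 0.5 with the ground-truth moment. Below is a minimal sketch under that assumption. It is not the repository's evaluation code, and the names `temporal_iou` and `recall_at_1` are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec), assumed start <= end


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two 1D time spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: Sequence[Span], gts: Sequence[Span],
                iou_thd: float = 0.5) -> float:
    """Percentage of queries whose top-1 span reaches the IoU threshold.

    Assumes one ground-truth span per query (hypothetical simplification).
    """
    hits = sum(temporal_iou(p, g) >= iou_thd for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


# Toy example: the first prediction overlaps well, the second misses entirely.
preds = [(0.0, 10.0), (5.0, 8.0)]
gts = [(0.0, 8.0), (20.0, 30.0)]
print(recall_at_1(preds, gts))
```

Under this reading, a "+3" gain would mean three more queries per hundred pass the 0.5 IoU threshold with their top-1 prediction.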