About paper #6

Closed
BMEI1314 opened this issue Jan 4, 2022 · 2 comments

BMEI1314 commented Jan 4, 2022

Hi,
We think MomentDETR has great potential. However, looking at Table 6 in the paper, we find that the moment retrieval metrics on the Charades-STA dataset are not much higher than those of IVG-DCL. Notably, IVG-DCL uses C3D features for the video extractor and GloVe for text embeddings, whereas your work uses CLIP + SlowFast features. Have you tested on other video grounding datasets, such as ActivityNet?

Owner

jayleicn commented Jan 4, 2022

Hi @BMEI1314, in our work we primarily focus on collecting the QVHighlights dataset and developing the MomentDETR model on top of it. On CharadesSTA, we did not tune the model much, but we still observe a significant improvement in R1@0.5 (e.g., +3, or +5 with pretraining). We did not test on other datasets.
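
For readers less familiar with the metric mentioned above: R1@0.5 is commonly defined as the percentage of queries whose top-1 predicted moment overlaps the ground-truth moment with a temporal IoU of at least 0.5. The snippet below is a minimal sketch under that common definition, not the evaluation code from this repository; the names `temporal_iou` and `recall_at_1` and the `(start, end)` span format are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical span format for this sketch: (start_sec, end_sec)
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1D time spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span], iou_thd: float = 0.5) -> float:
    """Percentage of queries whose top-1 prediction reaches IoU >= iou_thd."""
    if not gts:
        raise ValueError("need at least one ground-truth span")
    hits = sum(temporal_iou(p, g) >= iou_thd for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    preds = [(0.0, 10.0), (2.0, 8.0)]
    gts = [(0.0, 8.0), (5.0, 15.0)]
    print(recall_at_1(preds, gts, iou_thd=0.5))
```

Under this definition, a gain of +3 on R1@0.5 means three more queries out of every hundred have a top-1 prediction that clears the 0.5 IoU threshold.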

Author

BMEI1314 commented Jan 5, 2022

Thanks for your quick reply. We look forward to your follow-up work.

BMEI1314 closed this as completed Jan 5, 2022