Test set of MSR-VTT for downstream evaluation #28
Comments
An external user here who has run into the same issue. MSR-VTT 1K-A (from the JSFusion work) doesn't have a "val" split, so people tend to use the names "test" and "val" interchangeably for it.
@bryant1410 Did you evaluate the pretrained model on the MSR-VTT 1K-A test set? Both the zero-shot and the fine-tuned results are higher than those reported in Table 5 of the paper.
I think I haven't run the fine-tuned one with the provided model. For the zero-shot one, I get pretty similar results with a different codebase (mine are slightly lower). Differences on MSR-VTT can be related to the fact that there are repeated labels (so there are ties).
(But I'm not sure how much the repeated-labels issue affects the numbers.)
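To illustrate the repeated-labels point: when several videos in the 1K-A split share an identical caption, text-to-video retrieval produces rank ties, and the reported Recall@K depends on how those ties are broken. A minimal sketch with hypothetical similarity scores (the function name and tie policies are illustrative, not from the released code):

```python
import numpy as np

def recall_at_k(sims, k=1, ties="pessimistic"):
    """Text-to-video Recall@K over a query-by-candidate similarity matrix.

    sims[i, j] = similarity of query caption i to candidate video j; the
    ground-truth video for query i is assumed to be index i. With duplicate
    captions, several candidates can share the top score, so the tie-breaking
    policy changes the measured recall.
    """
    n = sims.shape[0]
    hits = 0
    for i in range(n):
        target = sims[i, i]
        better = np.sum(sims[i] > target)       # candidates strictly above the target
        equal = np.sum(sims[i] == target) - 1   # tied candidates, excluding the target itself
        # Optimistic: target wins all ties; pessimistic: target loses all ties.
        rank = better + (equal if ties == "pessimistic" else 0)  # 0-based rank
        hits += rank < k
    return hits / n

# Queries 0 and 1 have identical (repeated) captions, so their videos tie.
sims = np.array([
    [0.9, 0.9, 0.1],
    [0.9, 0.9, 0.1],
    [0.2, 0.3, 0.8],
])
print(recall_at_k(sims, k=1, ties="optimistic"))   # 1.0
print(recall_at_k(sims, k=1, ties="pessimistic"))  # 0.333...
```

The gap between the two policies bounds how much repeated labels alone can move a reported number between implementations.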
Hi, yes, unfortunately MSR-VTT 1k-A does not have a test split (as with many of the downstream retrieval datasets), so val and test are one and the same, as @bryant1410 says. The line in the paper ought to be: "we train on 9k train videos, and val/test on 1k".

Regarding the resulting numbers being slightly higher: I retrained the pre-trained models after submission when rewriting the code, and performance increased a bit -- hence the higher ZS results. For fine-tuning, the current code picks the best-performing checkpoint on val == test, which performs better than if you train and evaluate at a pre-decided fixed number of epochs (as described in the paper). Doing the latter will give results closer to those written in the paper.
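The two evaluation protocols described above can be sketched in a few lines (the per-epoch scores below are made up for illustration; the released code's actual checkpointing logic may differ):

```python
# Hypothetical per-epoch R@1 scores measured on the 1k val (== test) split.
val_r1 = [38.2, 40.1, 41.5, 42.3, 41.9, 41.0]

# Paper protocol: evaluate at a pre-decided fixed epoch.
fixed_epoch = 5
paper_result = val_r1[fixed_epoch]

# Released-code protocol: pick the checkpoint that scores best on val == test.
best_epoch = max(range(len(val_r1)), key=val_r1.__getitem__)
code_result = val_r1[best_epoch]

# By construction, selecting on the test split can never do worse,
# which is why the released code reports slightly higher numbers.
assert code_result >= paper_result
```

Because the selection split and the reporting split are the same 1k videos, the second protocol optimistically biases the result upward relative to a fixed-epoch evaluation.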
Hi,
The paper states: "Following other works [35], we train on 9K train+val videos and report results on the 1K-A test set."
However, in your provided code for text-to-video retrieval on MSR-VTT, the validation set and the test set appear to be the same, named 'val_list_jsfusion.txt', with 1K videos.
The results of your released model on the MSR-VTT test set (val_list_jsfusion.txt) are higher than those reported in the paper.
Is 'val_list_jsfusion.txt' the test set for MSR-VTT evaluation?
Looking forward to your reply.