
Test set of MSR-VTT for downstream evaluation #28

Closed
geyuying opened this issue Sep 7, 2021 · 5 comments

Comments

@geyuying

geyuying commented Sep 7, 2021

Hi,

In the paper, it is described that 'Following other works [35], we train on 9K train+val videos and report results on the 1K-A test set'.

However, in your provided code for text-to-video retrieval on MSR-VTT, it seems that the validation set and the test set are the same: both are read from 'val_list_jsfusion.txt', which contains 1K videos.

The results of your released model on this MSR-VTT test set (val_list_jsfusion.txt) are higher than those reported in the paper.

Is 'val_list_jsfusion.txt' the test set for MSR-VTT evaluation?

Looking forward to your reply.

@bryant1410
Contributor

An external user here who has run into the same question.

MSR-VTT 1K-A (from JSFusion work) doesn't have a "val" split, so people kind of use the names "test" and "val" interchangeably for it.

@geyuying
Author

geyuying commented Sep 7, 2021

@bryant1410 Did you evaluate the pretrained model on the MSR-VTT 1K-A test set? Both the zero-shot and the finetuned results are higher than those reported in Table 5 of the paper.

@bryant1410
Contributor

I don't think I have run the fine-tuned one with the provided model. For the zero-shot one, I get pretty similar results with a different codebase (mine are slightly lower). Differences on MSR-VTT can also be related to the fact that there are repeated labels (so there are ties in the ranking).

@bryant1410
Contributor

(But I'm not sure how much the repeated-labels issue affects the numbers.)
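
For illustration, a minimal sketch (not the repo's evaluation code, with made-up similarity values) of how repeated captions can create ties in the text-to-video ranking, so that recall@K depends on the tie-breaking rule:

```python
# Minimal sketch (not the repo's evaluation code): how repeated captions in
# MSR-VTT can create ties in the text-to-video similarity ranking, so that
# recall@K depends on how ties are broken. Numbers below are made up.
import numpy as np

def recall_at_k(sims, k=1):
    # sims[i, j] = similarity of text i to video j; text i matches video i.
    hits = 0
    for i, row in enumerate(sims):
        order = np.argsort(-row, kind="stable")  # ties broken by video index
        rank = int(np.where(order == i)[0][0])
        hits += rank < k
    return hits / len(sims)

# Texts 0 and 1 share the same caption, so both are equally similar to
# videos 0 and 1. With index-order tie-breaking, text 0 wins its tie and
# text 1 loses it (R@1 = 2/3); a rule that favoured the ground-truth item
# on ties would give R@1 = 1.0.
sims = np.array([
    [0.9, 0.9, 0.1],
    [0.9, 0.9, 0.2],
    [0.1, 0.2, 0.8],
])
print(recall_at_k(sims, k=1))
```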

@m-bain
Owner

m-bain commented Sep 8, 2021

Hi, yes, unfortunately MSR-VTT 1K-A does not have a separate test split (like many of the downstream retrieval datasets), so val and test are one and the same, as @bryant1410 says. The line in the paper ought to be: "we train on 9K train videos, and val/test on 1K".

Regarding the resulting numbers being slightly higher: I retrained the pre-trained models after submission when rewriting the code, and performance increased a bit -- hence the higher ZS results.

For finetuning, the current code picks the best-performing checkpoint on val (== test), which performs better than training for a pre-decided, fixed number of epochs and then evaluating (as described in the paper). Doing the latter will give results closer to those written in the paper.
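
For illustration, a minimal sketch (not the repo's training loop, with made-up per-epoch scores) of the difference between the two protocols:

```python
# Minimal sketch (not the repo's training loop): best-checkpoint selection on
# the 1K val/test split vs. evaluating at a pre-decided, fixed epoch.
# The per-epoch retrieval scores below are made up for illustration.
val_scores = {1: 38.2, 2: 40.1, 3: 41.5, 4: 41.0, 5: 40.7}

# Paper-style protocol: evaluate the checkpoint from a fixed, pre-decided epoch.
fixed_epoch = 5
paper_style = val_scores[fixed_epoch]

# Released-code protocol: keep whichever checkpoint scores best on the same
# 1K split that is later reported as "test"; this is never worse than the
# fixed-epoch number, which explains the small gap.
best_epoch, best_score = max(val_scores.items(), key=lambda kv: kv[1])

print(f"fixed epoch {fixed_epoch}: {paper_style}")
print(f"best epoch {best_epoch}: {best_score}")
```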

m-bain closed this as completed Sep 12, 2021