I noticed that the MSRVTT text-to-video retrieval performance under the FT-Joint setting reported in the README is R@1: 0.2720 - R@5: 0.5570 - R@10: 0.6870 - Median R: 4.0, but the result in the paper is R@1: 0.206 - R@5: 0.491 - R@10: 0.629 - Median R: 6.0. What accounts for the difference?
Additionally, what should the performance of the FT-Align setting be? It seems to be missing from the README. I tried fine-tuning with the scripts released in the repo but got a worse score than FT-Joint on MSRVTT.
Our paper reports results on ‘Training-7K’, which follows the data split from (Miech et al., 2019). The README, however, reports results on ‘Training-9K’, which follows the data split from (Gabeur et al., 2020). You can find both files, MSRVTT_train.7k.csv and MSRVTT_train.9k.csv, in our released msrvtt.zip.
Our FT-Align run on ‘Training-9K’ used a smaller batch size because of our limited GPUs, so its results on ‘Training-9K’ show no obvious advantage over FT-Joint. In our experience, the fine-tuning hyper-parameters are important, and the settings that suit FT-Align may not be the same as those for FT-Joint. You can test on ‘Training-7K’, as reported in our paper.
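For anyone comparing numbers across the two splits, here is a minimal sketch of how R@1, R@5, R@10 and Median R are commonly computed from a text-to-video similarity matrix. It is not the repo's evaluation code: the function name `retrieval_metrics`, the toy matrix, and the tie-handling rule (ties count in favour of the correct video) are assumptions made for illustration.

```python
from statistics import median


def retrieval_metrics(sim):
    """Compute text-to-video retrieval metrics.

    sim[i][j] is the similarity between text query i and video j.
    Assumption: the ground-truth video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank of the correct video = 1 + number of videos scored strictly higher.
        rank = 1 + sum(1 for score in row if score > row[i])
        ranks.append(rank)

    n = len(ranks)
    return {
        "R@1": sum(r <= 1 for r in ranks) / n,
        "R@5": sum(r <= 5 for r in ranks) / n,
        "R@10": sum(r <= 10 for r in ranks) / n,
        "MedianR": median(ranks),
    }


# Toy 3x3 example (hypothetical scores, not MSRVTT data).
sim = [
    [0.9, 0.1, 0.3],  # correct video ranked 1st
    [0.8, 0.7, 0.2],  # correct video ranked 2nd
    [0.1, 0.2, 0.6],  # correct video ranked 1st
]
metrics = retrieval_metrics(sim)
print(metrics)
```

These metrics are only comparable when the training split, test set, and batch size match, which is why the README (‘Training-9K’) and paper (‘Training-7K’) numbers differ.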