This repository has been archived by the owner on Feb 16, 2022. It is now read-only.

Retrieval results much lower than expected on COCO 5K test set and Flickr30K 1K test set #78

Open
juliuswang0728 opened this issue Dec 11, 2020 · 1 comment

Comments


juliuswang0728 commented Dec 11, 2020

Hi!

Has anyone tried running the evaluation with the provided multi_task_model.bin model?

I obtained

COCO (5K test set), R@{1 | 5 | 10}, IR: image retrieval, TR: text retrieval

  • IR: 32.979 | 61.911 | 74.082
  • TR: 14.62 | 32.18 | 39.76

Flickr30K (1K test set)

  • IR: 52.84 | 79.54 | 87.18
  • TR: 69.3 | 89 | 94

These numbers fall well short of those reported in the 12-in-1 paper.
I understand that there's room for improvement on TR, since there's no hard negative mining for texts, but the IR results also seem unsatisfactory. I'm wondering if I'm missing something here.
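
For reference, here is a minimal sketch of how R@K is commonly computed from a query-by-candidate score matrix. It is not the repository's evaluation code: `recall_at_k`, `scores`, and `gt` are hypothetical names, and it assumes a query counts as a hit when at least one of its correct candidates appears in the top K.

```python
def recall_at_k(scores, gt, ks=(1, 5, 10)):
    """Return {k: percentage of queries with a correct candidate in the top k}.

    scores[q][c] is the similarity of query q to candidate c (higher is better).
    gt[q] is the set of candidate indices that are correct for query q.
    """
    hits = {k: 0 for k in ks}
    for q, row in enumerate(scores):
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        # 0-based rank of the best-placed correct candidate.
        best = min(ranked.index(c) for c in gt[q])
        for k in ks:
            if best < k:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(scores) for k in ks}


# Toy IR example: 3 caption queries scored against 3 images.
toy_scores = [
    [0.9, 0.1, 0.2],  # caption 0, correct image is 0
    [0.3, 0.2, 0.8],  # caption 1, correct image is 1 (ranked last)
    [0.1, 0.4, 0.7],  # caption 2, correct image is 2
]
toy_gt = [{0}, {1}, {2}]
toy_result = recall_at_k(toy_scores, toy_gt, ks=(1, 2, 3))
```

For TR, the same function would be applied to the transposed matrix (images as queries), with each `gt` entry holding all caption indices that belong to that image. A mismatch in that mapping, or in the candidate pool size, is one thing that could make the two directions diverge as they do above.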

Thanks!

@shivangibithel

Hi
I recently tried IR on the Flickr30K 1K test set and got the following results.
[image]

Did you ever get similar results?
