
Reproducing results #41

Closed · MauritsBleeker opened this issue Jun 3, 2021 · 5 comments

MauritsBleeker commented Jun 3, 2021

Hi,

First of all, thanks for sharing this great work!

I'm having difficulties reproducing the results from the paper as a baseline. In this issue, I'll focus on experiment #3.15: VSE++ (ResNet), Flickr30k.

From what I gather from the paper, the config is the following:

  • 30 epochs
  • load images from disk, no precomputed features?
  • lower the lr after 15 epochs
  • lr goes from 0.0002 -> 0.00002 (see the sketch after this list)
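
To check my reading of that schedule, here is a minimal sketch (the helper below is hypothetical, not the repo's code), assuming a plain step decay: start at lr = 2e-4 and divide by 10 after epoch 15, for 30 epochs total.

import torch

model = torch.nn.Linear(10, 10)  # stand-in for the VSE++ model
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

def adjust_learning_rate(optimizer, epoch, base_lr=2e-4, lr_update=15):
    # Decay the learning rate by 10x every lr_update epochs.
    lr = base_lr * (0.1 ** (epoch // lr_update))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

for epoch in range(30):
    adjust_learning_rate(optimizer, epoch)
    # ... one training epoch: 2e-4 for epochs 0-14, 2e-5 afterwards ...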

My question is: is the image encoder trained end-to-end or not? In other words, is ResNet152 used only as a fixed feature extractor, or is it optimized as well?

According to your documentation, VSE++ (and therefore, I assume, 3.14) can be reproduced by only adding the --max_violation flag, but I get (way) lower results. Do I need the --finetune flag as well?
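
For context, this is what I understand --max_violation to switch on, per the paper: the max-of-hinges (MH) triplet loss over the hardest in-batch negatives. A sketch of my understanding, not the repo's exact implementation:

import torch

def contrastive_loss(im, s, margin=0.2, max_violation=True):
    # im, s: L2-normalized image and caption embeddings, both [B, D].
    # scores[i, j] = sim(image i, caption j); the diagonal holds positives.
    scores = im @ s.t()
    diagonal = scores.diag().view(-1, 1)
    d1 = diagonal.expand_as(scores)      # positive score per image row
    d2 = diagonal.t().expand_as(scores)  # positive score per caption column

    cost_s = (margin + scores - d1).clamp(min=0)   # caption negatives
    cost_im = (margin + scores - d2).clamp(min=0)  # image negatives

    # Zero out the positive pairs on the diagonal.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_s = cost_s.masked_fill(mask, 0)
    cost_im = cost_im.masked_fill(mask, 0)

    if max_violation:
        # MH loss: keep only the hardest negative per positive pair;
        # otherwise all negatives are summed (the SH loss).
        cost_s = cost_s.max(1)[0]
        cost_im = cost_im.max(0)[0]
    return cost_s.sum() + cost_im.sum()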

Thanks,
Maurits


fartashf commented Jun 4, 2021

Hi Maurits,

I just reproduced the results for row 3.15 (Flickr30k ResNet without finetune) using the following command:

python train.py --logger_name runs/X --data_name f30k --cnn_type resnet152 --max_violation --num_epochs 30 --lr_update 15

The setup is PyTorch 1.4.0 and Python 3.7.1, using the changes in the pytorch4.1 and python3 branches. The final result as printed is:

Image to text: 43.8, 72.4, 81.8, 2.0, 13.4
Text to image: 31.6, 59.6, 69.7, 3.0, 26.6
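
For reference, the five printed numbers are R@1, R@5, R@10, median rank, and mean rank. A simplified sketch of how such recall metrics fall out of a similarity matrix, assuming one ground-truth caption per image (the actual evaluation handles five captions per Flickr30k image):

import numpy as np

def i2t_recall(sims):
    # sims[i, j] = similarity of image i and caption j; caption i is
    # taken to be the single ground-truth match for image i.
    ranks = np.empty(len(sims))
    for i, row in enumerate(sims):
        order = np.argsort(row)[::-1]          # captions by decreasing score
        ranks[i] = np.where(order == i)[0][0]  # 0-based rank of the match
    r = lambda k: 100.0 * np.mean(ranks < k)
    # R@1, R@5, R@10, median rank, mean rank (ranks reported 1-based)
    return r(1), r(5), r(10), np.median(ranks) + 1, ranks.mean() + 1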

Were you running the same command as above? How big is the gap?

MauritsBleeker commented

Yeah, I use the same command.

Do you use a random seed? I use different Python and Torch versions, but that shouldn't make that much of a difference, right? I will share my results later today.

Thanks again,

Maurits


fartashf commented Jun 4, 2021

The seed is not fixed, sorry; unfortunately, I had not done that in the original code and did not report standard deviations. Nevertheless, I don't expect the std to be higher than 1%.
Make sure the experiment runs for the entire length of training: the recall at the end of the first few epochs of VSE++ is near zero, but it picks up quickly.
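
If you do want to pin it down for a side-by-side check, standard PyTorch seeding along these lines should work (this is not in the original code, and full cuDNN determinism can slow training):

import random
import numpy as np
import torch

def set_seed(seed=42):
    # Seed every RNG the training loop touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for determinism in cuDNN.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False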

MauritsBleeker commented

Okay, I've managed to reproduce the results (finally). I still don't know what the problem was in the end.

Thanks for your feedback,

Maurits

Average i2t Recall: 66.9
Image to text: 45.1 73.5 82.0 2.0 11.7

Average t2i Recall: 55.7
Text to image: 32.8 62.3 72.2 3.0 21.9


fartashf commented Jun 5, 2021

Sounds great. Thanks for reporting the result.

fartashf closed this as completed Jun 5, 2021