
Doubts about the results of TIRG in the paper #5

Closed
BrandonHanx opened this issue Jul 3, 2021 · 6 comments


BrandonHanx commented Jul 3, 2021

Hi @numpee ,

Regarding the TIRG results on FashionIQ reported in Table 1 of the main paper and in the supplementary material: did you run the experiments several times and obtain similar results?

I understand that you simply copied the results reported in VAL. However, I believe those results are wrong.

In my experiments, I obtain the following performance on the original split with TIRG (ResNet-50 and Bi-GRU; no GloVe or BERT embeddings):

| Shirt R@10 | Shirt R@50 | Dress R@10 | Dress R@50 | Toptee R@10 | Toptee R@50 |
|---|---|---|---|---|---|
| 18.50 | 43.03 | 21.81 | 46.26 | 24.02 | 51.10 |
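For reference, the R@K numbers above follow the usual retrieval convention: a query counts as a hit if its ground-truth target appears among the top-K ranked candidates. A minimal, dependency-free sketch of that metric (function and variable names here are illustrative, not taken from any of the repos discussed):

```python
# Sketch of the Recall@K metric behind the R@10 / R@50 columns above.
# A query is a "hit" if its target ID appears in the top-K of its ranking.
# All names below are hypothetical, not from the CoSMo or TIRG codebases.

def recall_at_k(ranked_candidates, targets, k):
    """ranked_candidates: one ranked list of candidate IDs per query.
    targets: the ground-truth ID for each query. Returns a percentage."""
    hits = sum(1 for ranking, target in zip(ranked_candidates, targets)
               if target in ranking[:k])
    return 100.0 * hits / len(targets)

# Toy example: 3 queries, only the first has its target in the top-2.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["b", "f", "z"]
print(round(recall_at_k(rankings, targets, 2), 2))  # -> 33.33
```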

This is much better than both VAL and my reproduced CoSMo, and very close to the reported CoSMo numbers.

Also, this paper reaches the same conclusion as I do (our settings differ, but the comparison between VAL and TIRG within each setting is fair; see their Table 1 for details): TIRG is much better than VAL, and the results reported in VAL are wrong.

If these observations are correct, how can the performance benefit of CoSMo be demonstrated (although I fully agree with CoSMo's insight)?

Please point it out if I am wrong. Thanks in advance.


numpee commented Jul 4, 2021

Hi @BrandonHanx,

Sorry for the late reply, we've been busy with other stuff recently.

We actually did reproduce TIRG, and found similar results to what VAL reported. I dug up some old experiments and found the following graphs for the "dress":
[image: validation R@10 / R@50 curves for the "dress" subset]

Note that the best scores across all the different parameter settings are 14.7 and 35.0 for R@10 and R@50, respectively. Since these scores are quite similar to those reported in VAL, we reported the values from VAL.

I am a bit curious how you obtained the numbers shown above. One difference may be that we used the "dress" subset only (as mentioned in Issue #3). We will update the fix to Issue #3 soon so you can run the combined dataset setting more easily.
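The subset-vs-combined distinction matters for comparing numbers across papers: FashionIQ scores are usually computed per subset (dress / shirt / toptee) and then averaged, whereas retrieving against a combined candidate pool adds distractors from the other subsets and is generally harder. A minimal sketch of the per-subset averaging, using the R@10 numbers from the table earlier in the thread (the protocol gloss is my own, not a claim from either repo):

```python
# Per-subset FashionIQ scores are typically averaged into one headline
# number. R@10 values below are the ones quoted earlier in this thread;
# evaluating on a combined candidate pool instead (more distractors per
# query) would generally yield lower, non-comparable scores.

per_subset_r10 = {"dress": 21.81, "shirt": 18.50, "toptee": 24.02}

avg_r10 = sum(per_subset_r10.values()) / len(per_subset_r10)
print(round(avg_r10, 2))  # -> 21.44
```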

Regarding VAL: it seems the paper you linked may not have reproduced VAL properly. For example, in Section 4.1.3 they say they use SGD with a fixed lr=0.01 because "adjusting LR can increase the overall score significantly". The issue is that with transformers, tuning the LR is crucial, and even the choice of optimizer matters (Adam is the common choice over SGD in most transformer-based methods). I haven't worked extensively with GCNs, so I don't know how important this is for GCN-based models, but fixing the LR and optimizer across all experiments seems a bit unfair to transformer-based methods. With that said, it may be true that VAL is hard to reproduce; you may want to ask the authors of VAL about this.
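The optimizer point above can be made concrete: SGD's step is always `lr * grad`, while Adam rescales the gradient by running moment estimates, so its effective step size adapts per parameter and interacts with the LR very differently. A single-step, pure-Python illustration (all names and values here are hypothetical, not from any of the codebases discussed):

```python
# Illustrative only: one SGD step vs. one Adam step on a scalar weight,
# showing why a fixed lr means different things for the two optimizers.

def sgd_step(w, g, lr=0.01):
    # SGD: the step is directly proportional to the raw gradient.
    return w - lr * g

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g    # second-moment estimate
    m_hat = m / (1 - b1 ** t)        # bias correction at step t
    v_hat = v / (1 - b2 ** t)
    # Step magnitude is ~lr regardless of the gradient's scale,
    # because m_hat / sqrt(v_hat) normalizes it.
    return w - lr * m_hat / (v_hat ** 0.5 + eps), m, v

w, m, v = adam_step(1.0, g=4.0, m=0.0, v=0.0, t=1)
print(round(w, 6))  # -> 0.999  (step ~= lr, even though grad = 4)
```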

@BrandonHanx (Author)

Hi @numpee, thanks for your detailed reply.

I have made my reproduction repo public; the issue from #3 is fixed there.

I will add a README with running instructions ASAP. For now, please take a look at my TIRG training log and TIRG training configs.

All my experiments follow the training settings of CoSMo; therefore, my reproduced VAL also uses Adam.

It has been a really nice discussion. If you need more information from my side, please let me know.

Again, please point it out if I am wrong.

@postBG postBG closed this as completed Jul 13, 2021
@BrandonHanx (Author)

Why was this issue closed? It is not resolved for me...


postBG commented Jul 13, 2021

It seems that this issue is deadlocked, and I'm afraid there is nothing more we can do for you.
For TIRG, our reproduced results are similar to, but slightly lower than, those of VAL and DCNet, which were published at peer-reviewed conferences.
So we reported the VAL results, which are higher and already published.
Moreover, we also tested this code following the protocol suggested in issue #3, and for us it is easily reproduced.
For TIRG or other methods, you may want to direct further questions to the authors of DCNet or VAL, who can give you fresh advice.


BrandonHanx commented Jul 14, 2021

Thanks for your reply.

As I mentioned in issue #3, would you please release your config file and training logs, as I did (I think this should be quite easy since you are using wandb)? I was also following the protocol mentioned in #3, and I don't know which details differ between our two setups.

Sorry to bother you guys.


postBG commented Jul 15, 2021

https://wandb.ai/bobrolab/CoSMo_public/workspace?workspace=user-numpee

The above link is our log. Unfortunately, we have no idea why the results are hard to reproduce on your side.
We are currently working on other projects, so we're sorry that we cannot debug your code closely.

The separate-training issue you pointed out was helpful, and we'll update our code accordingly at some point.
Good luck :)
