Doubts about the results of TIRG in the paper #5
Hi @BrandonHanx, sorry for the late reply; we've been busy with other things recently.

We did in fact reproduce TIRG, and found results similar to what VAL reported. I dug up some old experiments and found the following graphs for the "dress" subset. Note that the best scores across all the different parameter settings are 14.7 and 35.0 for R@10 and R@50, respectively. Since these scores are quite similar to those reported in VAL, we reported the values from VAL. I am curious how you obtained the numbers shown above. One difference may be that we used the "dress" subset only (as mentioned in issue #3). We will push the fix for issue #3 soon so you can run the combined-dataset setting more easily.

Regarding VAL: it seems the paper you linked may not have reproduced VAL properly. For example, in Section 4.1.3 they say they use SGD with a fixed lr=0.01 because "adjusting LR can increase the overall score significantly". The problem is that with transformers, tuning the LR is crucial, and the choice of optimizer matters as well (Adam is a more common choice than SGD in most transformer-based methods). I haven't worked enough with GCNs to know how much this matters for GCN-based models, but fixing the LR and optimizer across all experiments seems excessive and unfair to transformer-based methods. With that said, VAL may indeed be a bit hard to reproduce; you may want to ask the authors of VAL about this.
Hi @numpee, thanks for your detailed reply. I have made my reproduction repo public, with the issue from #3 fixed. I will add a README with running instructions as soon as possible. In the meantime, please take a look at my TIRG training log and TIRG training configs. All of my experiments follow the training settings of CoSMo, so my reproduced VAL is also trained with Adam. This has been a really nice discussion; if you need more information from my side, please let me know. Again, please point out anything I got wrong.
Why was this issue closed? It is not resolved for me...
It seems that this issue is deadlocked, and I'm afraid there's nothing more we can do for you.
Thanks for your reply. As I mentioned in issue #3, would you please release your config file and training logs, as I did earlier? (I think this should be easy since you are using wandb.) I followed the same protocol mentioned in #3 and don't know which details differ between our setups. Sorry to bother you.
https://wandb.ai/bobrolab/CoSMo_public/workspace?workspace=user-numpee
The link above is our log. Unfortunately, we have no idea why the results are hard to reproduce on your side. The separate-training issue you pointed out was helpful, and we will update our code accordingly at some point.
Hi @numpee ,
For the TIRG results on FashionIQ reported in Table 1 of the main paper and in the supplementary material, did you run several experiments and obtain similar results?
I understand that you simply copied the results reported in VAL. However, I believe those results are wrong.
According to my experiments, TIRG achieves the following performance on the original split (with ResNet-50 and Bi-GRU; no GloVe or BERT embeddings were used):
This performance is much better than both VAL and my reproduced CoSMo, and is very close to the reported CoSMo.
Also, this paper reaches the same conclusion as I do (although our settings differ, both of our comparisons between VAL and TIRG are fair; please see their Table 1 for details): TIRG is much better than VAL, and the results reported in VAL are wrong.
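For readers comparing the methods, the TIRG composition under discussion can be sketched roughly as follows. This is a NumPy sketch under my own assumptions: single linear layers stand in for TIRG's learned gating and residual MLPs, and all names, sizes, and weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def tirg_compose(img_feat, text_feat, params):
    """Rough sketch of the TIRG composition (Vo et al., 2019):
    phi = w_g * sigmoid(gate(x)) * img_feat + w_r * residual(x),
    where x = concat(img_feat, text_feat)."""
    x = np.concatenate([img_feat, text_feat], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(x @ params["Wg"])))  # sigmoid gate on the image feature
    res = x @ params["Wr"]                            # residual (modification) branch
    return params["wg"] * gate * img_feat + params["wr"] * res

dim = 8  # illustrative; real models use e.g. 512-d features
params = {
    "Wg": rng.standard_normal((2 * dim, dim)) * 0.1,
    "Wr": rng.standard_normal((2 * dim, dim)) * 0.1,
    "wg": 1.0,  # learned scalars in the original; fixed here
    "wr": 0.1,
}
out = tirg_compose(rng.standard_normal((2, dim)), rng.standard_normal((2, dim)), params)
```

The composed feature `out` would then be matched against candidate image features with a metric-learning loss; that part is omitted here.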
If our observations are correct, then how can the performance benefit of CoSMo be demonstrated (although I fully agree with CoSMo's insight)?
Please point out anything I got wrong. Thanks in advance.
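Since both sides report R@10 and R@50, here is a minimal sketch of how these retrieval metrics are typically computed (helper names are my own, not code from either repo):

```python
def recall_at_k(ranked_ids, target_id, k):
    """1 if the ground-truth target appears among the top-k retrieved items."""
    return int(target_id in ranked_ids[:k])

def mean_recall_at_k(all_rankings, all_targets, k):
    """Average recall@k over a set of queries."""
    hits = [recall_at_k(r, t, k) for r, t in zip(all_rankings, all_targets)]
    return sum(hits) / len(hits)
```

For FashionIQ, R@10 and R@50 would be `mean_recall_at_k(..., 10)` and `mean_recall_at_k(..., 50)` over the query set of each subset (dress, shirt, toptee), which is why evaluating on the "dress" subset alone versus the combined setting can yield different numbers.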