
Doubts about the results of TIRG in the paper #5

Closed
BrandonHanx opened this issue Jul 3, 2021 · 6 comments


BrandonHanx commented Jul 3, 2021

Hi @numpee ,

Regarding the TIRG results on FashionIQ reported in Table 1 of the main paper and in the supplementary material: did you run the experiments several times and obtain similar results?

I understand that you simply copied the results reported in VAL. However, I believe those results are wrong.

In my experiments, I obtain the following performance on the original split with TIRG (ResNet-50 and Bi-GRU; no GloVe or BERT embeddings):

| Shirt R@10 | Shirt R@50 | Dress R@10 | Dress R@50 | Toptee R@10 | Toptee R@50 |
|---|---|---|---|---|---|
| 18.50 | 43.03 | 21.81 | 46.26 | 24.02 | 51.10 |
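For reference, the R@K numbers above follow the usual retrieval convention: a query counts as a hit if its ground-truth target appears among the top-K ranked candidates. A minimal, dependency-free sketch of that metric (function and variable names here are illustrative, not taken from any of the repos discussed):

```python
# Sketch of the Recall@K metric behind the R@10 / R@50 columns above.
# A query is a "hit" if its target ID appears in the top-K of its ranking.
# All names below are hypothetical, not from the CoSMo or TIRG codebases.

def recall_at_k(ranked_candidates, targets, k):
    """ranked_candidates: one ranked list of candidate IDs per query.
    targets: the ground-truth ID for each query. Returns a percentage."""
    hits = sum(1 for ranking, target in zip(ranked_candidates, targets)
               if target in ranking[:k])
    return 100.0 * hits / len(targets)

# Toy example: 3 queries, only the first has its target in the top-2.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["b", "f", "z"]
print(round(recall_at_k(rankings, targets, 2), 2))  # -> 33.33
```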

This is much better than both VAL and my reproduced CoSMo, and very close to the reported CoSMo numbers.

Also, this paper reaches the same conclusion as I do (our settings differ, but the comparison between VAL and TIRG within each setting is fair; see their Table 1 for details): TIRG is much better than VAL, and the results reported in VAL are wrong.

If these observations are correct, how can the performance benefit of CoSMo be demonstrated (although I fully agree with CoSMo's insight)?

Please point it out if I am wrong. Thanks in advance.


numpee commented Jul 4, 2021

Hi @BrandonHanx,

Sorry for the late reply, we've been busy with other stuff recently.

We actually did reproduce TIRG, and found similar results to what VAL reported. I dug up some old experiments and found the following graphs for the "dress":
[image: validation R@10 / R@50 curves for the "dress" subset]

Note that the best scores across all the different parameter settings are 14.7 and 35.0 for R@10 and R@50, respectively. Since these scores are quite similar to those reported in VAL, we reported the values from VAL.

I am a bit curious how you obtained the numbers shown above. One difference may be that we used the "dress" subset only (as mentioned in Issue #3). We will update the fix to Issue #3 soon so you can run the combined dataset setting more easily.
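The subset-vs-combined distinction matters for comparing numbers across papers: FashionIQ scores are usually computed per subset (dress / shirt / toptee) and then averaged, whereas retrieving against a combined candidate pool adds distractors from the other subsets and is generally harder. A minimal sketch of the per-subset averaging, using the R@10 numbers from the table earlier in the thread (the protocol gloss is my own, not a claim from either repo):

```python
# Per-subset FashionIQ scores are typically averaged into one headline
# number. R@10 values below are the ones quoted earlier in this thread;
# evaluating on a combined candidate pool instead (more distractors per
# query) would generally yield lower, non-comparable scores.

per_subset_r10 = {"dress": 21.81, "shirt": 18.50, "toptee": 24.02}

avg_r10 = sum(per_subset_r10.values()) / len(per_subset_r10)
print(round(avg_r10, 2))  # -> 21.44
```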

Regarding VAL: it seems the paper you linked may not have reproduced VAL properly. For example, in Section 4.1.3 they say they use SGD with a fixed lr=0.01 because "adjusting LR can increase the overall score significantly". The issue is that with transformers, tuning the LR is crucial, and even the choice of optimizer matters (Adam is the common choice over SGD in most transformer-based methods). I haven't worked extensively with GCNs, so I don't know how important this is for GCN-based models, but fixing the LR and optimizer across all experiments seems a bit unfair to transformer-based methods. With that said, it may be true that VAL is hard to reproduce; you may want to ask the authors of VAL about this.
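The optimizer point above can be made concrete: SGD's step is always `lr * grad`, while Adam rescales the gradient by running moment estimates, so its effective step size adapts per parameter and interacts with the LR very differently. A single-step, pure-Python illustration (all names and values here are hypothetical, not from any of the codebases discussed):

```python
# Illustrative only: one SGD step vs. one Adam step on a scalar weight,
# showing why a fixed lr means different things for the two optimizers.

def sgd_step(w, g, lr=0.01):
    # SGD: the step is directly proportional to the raw gradient.
    return w - lr * g

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g    # second-moment estimate
    m_hat = m / (1 - b1 ** t)        # bias correction at step t
    v_hat = v / (1 - b2 ** t)
    # Step magnitude is ~lr regardless of the gradient's scale,
    # because m_hat / sqrt(v_hat) normalizes it.
    return w - lr * m_hat / (v_hat ** 0.5 + eps), m, v

w, m, v = adam_step(1.0, g=4.0, m=0.0, v=0.0, t=1)
print(round(w, 6))  # -> 0.999  (step ~= lr, even though grad = 4)
```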

@BrandonHanx (Author)

Hi @numpee, thanks for your detailed reply.

I have made my reproduction repo public; the issue from #3 is fixed there.

I will add a README with running instructions ASAP. For now, please take a look at my TIRG training log and TIRG training configs.

All my experiments follow the training settings of CoSMo; therefore, my reproduced VAL also uses Adam.

It has been a really nice discussion. If you need more information from my side, please let me know.

Again, please point it out if I am wrong.

@postBG postBG closed this as completed Jul 13, 2021
@BrandonHanx (Author)

Why was this issue closed? It is not resolved for me...


postBG commented Jul 13, 2021

It seems that this issue is deadlocked, and I'm afraid there is nothing more we can do for you.
For TIRG, our reproduced results are similar to, but slightly lower than, those of VAL and DCNet, which were published at peer-reviewed conferences.
So we reported the VAL results, which are higher and already published.
Moreover, we also tested this code following the protocol suggested in issue #3, and for us it is easily reproduced.
For TIRG or other methods, you may want to direct further questions to the authors of DCNet or VAL, who can give you fresh advice.


BrandonHanx commented Jul 14, 2021

Thanks for your reply.

As I mentioned in issue #3, would you please release your config file and training logs, as I did (I think this should be quite easy since you are using wandb)? I was also following the protocol mentioned in #3, and I don't know which details differ between our two setups.

Sorry to bother you guys.


postBG commented Jul 15, 2021

https://wandb.ai/bobrolab/CoSMo_public/workspace?workspace=user-numpee

The above link is our log. Unfortunately, we have no idea why the results are hard to reproduce on your side.
We are currently working on other projects, so we're sorry that we cannot debug your code closely.

The separate-training issue you pointed out was helpful, and we'll update our code accordingly at some point.
Good luck :)
