
Discrepancy between multiple paper results and results reported here + Bug in implementation of DebiasPL #80

Closed
Parskatt opened this issue Feb 24, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@Parskatt
Contributor

Hi! Great work on providing this benchmark repo.

We found it somewhat surprising that there seem to be large discrepancies between the performance numbers reported here:
https://github.com/microsoft/Semi-supervised-learning/blob/main/results/classic_cv_imb.csv
and those in the original implementations of the respective authors.

We found some of the largest discrepancies for DebiasPL (https://github.com/frank-xwang/debiased-pseudo-labeling).
Looking at the code, there is a major bug: the mean should, in this case, only be taken over dim=0, but in this repo it is reduced to a singleton, which worsens performance.
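For concreteness, here is a minimal sketch of the step I mean (class and parameter names are my own for illustration, not the repo's exact code). It keeps an EMA of the model's average class distribution on unlabeled data and uses it to debias the logits, with the correct per-class mean next to a comment on the buggy full reduction:

```python
import torch
import torch.nn.functional as F

class DebiasState:
    """Running estimate of the average class distribution on unlabeled data (illustrative sketch)."""

    def __init__(self, num_classes: int, ema_m: float = 0.999, tau: float = 0.4):
        self.p_hat = torch.full((num_classes,), 1.0 / num_classes)  # start from a uniform prior
        self.ema_m = ema_m
        self.tau = tau

    @torch.no_grad()
    def update(self, logits_ulb_w: torch.Tensor):
        probs = F.softmax(logits_ulb_w, dim=-1)   # [batch, num_classes]
        batch_mean = probs.mean(dim=0)            # correct: average over the batch dim only -> [num_classes]
        # Buggy variant: probs.mean() reduces everything to a single scalar, so the same
        # constant is later subtracted from every class and the per-class debiasing is lost.
        self.p_hat = self.p_hat.to(batch_mean.device)
        self.p_hat = self.ema_m * self.p_hat + (1.0 - self.ema_m) * batch_mean

    def debias(self, logits_ulb_w: torch.Tensor) -> torch.Tensor:
        # Debiased logits used for pseudo-labeling: logits - tau * log(p_hat)
        return logits_ulb_w - self.tau * torch.log(self.p_hat + 1e-12)
```

With the scalar reduction, `p_hat` is the same for every class, so the subtraction shifts all logits equally and the debiasing no longer changes which class is pseudo-labeled.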
We also found that the CLD auxiliary loss used in DebiasPL is not present in this repo.

Another recent method that performs significantly worse in this repo is DASO (https://github.com/ytaek-oh/daso).
While I'm not sure of the cause of the discrepancies for this paper, it is concerning that the results seem to deviate significantly from those of the original authors.

I am concerned that this benchmark in its current state may end up hurting the fairness of comparison, as certain algorithms seem to have received a much more careful reimplementation than others.

@Hhhhhhao added the bug label on Feb 24, 2023
@Hhhhhhao
Collaborator

Hi there,

Thanks for the helpful suggestions.

Regarding DebiasPL, it is indeed a bug. We will try to fix it and update the results. If you have already got results from fixing this bug, you are welcome to open a pull request.

Regarding DASO, have you tried running their code? I didn't get results similar to those reported in their paper on CIFAR10 with the 500/4000 setting. But our results on CIFAR10 with 1500/3000 and on CIFAR100 are close to what they reported (within the std). If you notice that something about DASO is missing in our implementation, please also let me know.

Fairness is really hard to control. You may also notice that some baseline results are much higher than those reported in the papers (DASO and DebiasPL). The purpose of having all these algorithms here is to provide a fair comparison using the same backbone, the same learning rate, the same scheduler, and the same number of training iterations.

@Parskatt
Contributor Author

Hi, thanks for the quick response!

Regarding DASO, we have run the code provided by the authors in their repository and reproduced their results. However, it might be the case, as you say, that they run something differently than this repo does. I'll get back to you if we find a more exact cause of the discrepancy.

I agree that it's a very good thing to have a shared space for comparison, but of course, then it's all the more important that methods are evaluated fairly against one another.

@Hhhhhhao
Collaborator

Agreed. We will try our best to make this benchmark as fair as possible. We are open to any suggestions and bug reports that make USB better.

@Hhhhhhao
Collaborator

Fixed in PR #135

@Parskatt
Contributor Author

Then I'll close ;)
