Discrepancy between multiple paper results and results reported here + bug in implementation of DebiasPL #80
Hi there, thanks for the helpful suggestions. Regarding DebiasPL, that is indeed a bug. We will try to fix it and update the results; if you have already obtained results from fixing this bug, you are welcome to open a pull request. Regarding DASO, have you tried running their code? I didn't get results similar to those reported in their paper on CIFAR-10 with the 500/4000 setting, but our results on CIFAR-10 with 1500/3000 and on CIFAR-100 are close to what they reported (within the std). If you notice that something about DASO is missing from our implementation, please also let me know. Fairness is really hard to control; you may also notice that some baseline results are much higher than reported in papers (DASO and DebiasPL). The purpose of having all these algorithms here is to provide a fair comparison using the same backbone, same learning rate, same scheduler, and same training iterations.
Hi, thanks for the quick response! Regarding DASO, we have run the code provided by the authors in their repository and reproduced their results. However, it may be, as you say, that they ran something differently than in this repo. I'll get back to you if we find a more exact cause of the discrepancy. I agree that it's a very good thing to have a shared space for comparison, but of course, that makes it all the more important that methods are evaluated fairly against one another.
Agreed. We will try our best to make this benchmark as fair as possible, and we are open to any suggestions and bug reports that make USB better.
Fixed in PR #135
Then I'll close ;)
Hi! Great work on providing this benchmark repo.
We found it somewhat surprising that there seem to be large discrepancies between the performance numbers reported here:
https://github.com/microsoft/Semi-supervised-learning/blob/main/results/classic_cv_imb.csv
and those in the original implementations by the respective authors.
We found some of the largest discrepancies for DebiasPL (https://github.com/frank-xwang/debiased-pseudo-labeling).
Looking at the code, there is a major bug here:
Semi-supervised-learning/semilearn/imb_algorithms/debiaspl/debiaspl.py
Line 52 in b979a23
The mean in this case should only be taken over dim=0 (the batch dimension). In this repo it is reduced to a scalar singleton, which worsens performance.
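To make the shape issue concrete, here is a minimal PyTorch sketch of the debiasing update (hypothetical variable and function names, not the exact USB code): DebiasPL maintains a running average of predicted class probabilities, which should remain a vector of length `num_classes`, so the batch of softmax outputs must be reduced over dim=0 only.

```python
import torch

def update_p_hat(p_hat, probs, momentum=0.999):
    """EMA update of the estimated class-marginal distribution.

    probs: (batch_size, num_classes) softmax outputs on unlabeled data.
    p_hat: (num_classes,) running per-class average.
    """
    # Correct: reduce only the batch dimension -> shape (num_classes,)
    batch_mean = probs.mean(dim=0)
    return momentum * p_hat + (1.0 - momentum) * batch_mean

num_classes = 10
probs = torch.softmax(torch.randn(64, num_classes), dim=-1)
p_hat = torch.full((num_classes,), 1.0 / num_classes)

p_hat = update_p_hat(p_hat, probs)
assert p_hat.shape == (num_classes,)  # still a per-class vector

# Buggy variant: mean() with no dim argument collapses everything
# to a 0-dim scalar, so every class receives the same offset and
# the debiasing no longer changes the argmax of the logits.
scalar_mean = probs.mean()
assert scalar_mean.ndim == 0
```

With the scalar version, subtracting `lambda * torch.log(p_hat)` from the logits shifts all classes equally, which is why the fix matters for the reported numbers.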
We also found that the CLD auxiliary loss used in DebiasPL is not present in this repo.
Another recent method that performs significantly worse in this repo is DASO (https://github.com/ytaek-oh/daso).
While I'm not sure of the cause of the discrepancies for this paper, it is concerning that the results deviate significantly from those reported by the original authors.
I am concerned that this benchmark in its current state may end up worsening the fairness of comparison, as certain algorithms seem to have received a much more careful reimplementation than others.