Replicating results from Table 3 for ARPL #13
Comments
Thanks for your attention!
I changed the learning rate and the results are better, but still far from the ones reported in the paper.
In the paper you have TNR ~ 53, AUROC ~ 93, DTACC ~ 87, AUIN ~ 90, AUOUT ~ 96 (CIFAR-10 vs. SVHN, Table 3, ARPL method).
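For reference, these OOD metrics can all be computed from the in- and out-of-distribution confidence scores. Below is a minimal sketch with NumPy and scikit-learn; the helper name and the `in_scores`/`out_scores` arrays are placeholders of mine, assuming higher scores mean more in-distribution.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def ood_metrics(in_scores, out_scores):
    # In-distribution samples are the positive class (label 1).
    labels = np.concatenate([np.ones_like(in_scores), np.zeros_like(out_scores)])
    scores = np.concatenate([in_scores, out_scores])

    auroc = roc_auc_score(labels, scores)
    # AUIN / AUOUT: area under the precision-recall curve, treating the
    # in- / out-distribution class as positive, respectively.
    auin = average_precision_score(labels, scores)
    auout = average_precision_score(1 - labels, -scores)

    # TNR at 95% TPR: share of OOD samples rejected at the threshold that
    # still accepts 95% of the in-distribution samples.
    threshold = np.percentile(in_scores, 5)
    tnr95 = float((out_scores < threshold).mean())

    # DTACC: best detection accuracy over all candidate thresholds.
    dtacc = max(0.5 * ((in_scores >= t).mean() + (out_scores < t).mean())
                for t in np.unique(scores))

    return {"TNR@95": tnr95, "AUROC": auroc, "DTACC": dtacc,
            "AUIN": auin, "AUOUT": auout}
```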
This may be related to seeds differing across machines. I ran the experiments yesterday without any problems. In addition, to reduce this effect, I updated the model selection strategy, so you can run it again.
The seed is fixed in the code by:
First of all, because we did this work early on, we did not add all the seed settings at that time. A complete seed setup should cover random, numpy, torch, cuda, etc., so the seed here is not complete, and performance will indeed be affected by it. The results on our machine show no problem. In addition, the update here is only meant to reduce the impact of the seed; if the best model is selected, the performance should be much better than in the paper. Finally, our method has been further optimized in the following repo:
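For what it's worth, a seed setup covering all of those sources might look like the sketch below (the `set_seed` helper and the default value are mine, not from the repo):

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    # Seed every source of randomness: Python, NumPy, PyTorch CPU, and CUDA.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make cuDNN deterministic; this can slow training down a bit.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```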
I see, I will try to run it multiple times to limit the effect of random seeds.
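For example, the repeated runs could be driven by something like this rough sketch (the output dir names and the placeholder AUROC values are mine; the metric would be read from each run's eval output):

```python
import statistics
import subprocess

# Train three models with distinct output dirs, mirroring the command below.
for run in range(3):
    subprocess.run(
        ["python", "ood.py", "--dataset", "cifar10", "--out-dataset", "svhn",
         "--model", "arpl", "--loss", "ARPLoss", "--outf", f"log_arpl_run{run}"],
        check=True,
    )

# After evaluating each model, collect one AUROC per run and report mean/std.
aurocs = [0.0, 0.0, 0.0]  # placeholders: fill in from the eval output
print(f"AUROC: {statistics.mean(aurocs):.2f} +/- {statistics.stdev(aurocs):.2f}")
```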
Hi, one more clarification. To train ARPL+CS, the only thing I need to change is to add
I tested it on CIFAR-10 vs. CIFAR-100 and got similar conclusions (ARPL ≈ ARPL+CS in performance).
Hi,
So far I have had trouble replicating the results you reported in Table 3 of the paper for the ARPL method.
I downloaded your Git repository and ran this command to train the ARPL model (I ran it three times with different output dirs to account for random initialization):
python ood.py --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl
Here is a training log from one of the training runs: logs.txt.
To evaluate, I ran this command for each trained model:
python ood.py --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl --eval
I get the following results:
There were no errors or warnings during the running of the scripts.
All metrics are significantly below the reported numbers.
Do you have any idea what may be the issue?
Thank you.