Replicating results from Table 3 for ARPL #13

Closed
vojirt opened this issue Oct 11, 2022 · 7 comments

vojirt commented Oct 11, 2022

Hi,
I am having trouble replicating the results you report in Table 3 of the paper for the ARPL method (the only method I have tried so far).

I downloaded your git repository and ran this command to train the ARPL model (I ran it three times, with different output dirs, to account for random initialization):
python ood.py --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl
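(The three training runs differed only in the output directory; the directory names below are just illustrative.)

python ood.py --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl_run1
python ood.py --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl_run2
python ood.py --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl_run3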
Here is a training log from one of the runs: logs.txt.

To evaluate, I ran this command (for each trained model):
python ood.py --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl --eval

I get the following results:

Acc: 92.58000
       TNR    AUROC  DTACC  AUIN   AUOUT
Bas    25.496 82.803 78.414 65.403 90.405
Acc (%): 92.580  AUROC (%): 82.803       OSCR (%): 79.600
Acc: 92.68000
       TNR    AUROC  DTACC  AUIN   AUOUT
Bas    21.589 78.534 74.769 54.259 88.407
Acc (%): 92.680  AUROC (%): 78.534       OSCR (%): 75.498
Acc: 92.68000
       TNR    AUROC  DTACC  AUIN   AUOUT
Bas    37.930 82.561 77.525 78.463 82.497
Acc (%): 92.680  AUROC (%): 82.561       OSCR (%): 78.857

There were no errors or warnings while running the scripts.
All metrics are significantly below the reported numbers.
Do you have any idea what may be the issue?
Thank you.

iCGY96 (Owner) commented Oct 12, 2022

Thanks for your attention!
The learning rate of the model should be set to 0.1 with:

python ood.py --lr 0.1 --......
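That is, the same training command as above with the learning-rate flag added:

python ood.py --lr 0.1 --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl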

iCGY96 closed this as completed Oct 12, 2022
vojirt (Author) commented Oct 12, 2022

I changed the learning rate; the results are better, but still far from the ones reported in the paper:

Acc: 92.77000
       TNR    AUROC  DTACC  AUIN   AUOUT
Bas    29.913 88.551 83.837 84.923 92.832
Acc (%): 92.770  AUROC (%): 88.551       OSCR (%): 85.093

In the paper you report TNR ~ 53, AUROC ~ 93, DTACC ~ 87, AUIN ~ 90, AUOUT ~ 96 (cifar10 vs svhn, Table 3, ARPL method).
Any other ideas what could have gone wrong?
Thanks!

iCGY96 (Owner) commented Oct 12, 2022

This may be related to the seeds on different machines; I reran the experiments yesterday without any problems. In addition, to reduce this effect, I have updated the model selection strategy, so you can run it again.

vojirt (Author) commented Oct 12, 2022

The seed is fixed in the code by torch.manual_seed(options['seed']) and torch.cuda.manual_seed_all(options['seed']).
Frankly, selecting the best model on the test data is not a sound methodology and should not be used.

iCGY96 (Owner) commented Oct 12, 2022

First of all, because this work was done early on, we did not add all of the seed settings at the time. Complete seeding should cover random, numpy, torch, cuda, etc., so the seeding here is not complete and performance will indeed be affected by the seed. The results on our machine are fine.
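For reference, complete seeding would look something like the sketch below (the helper name is illustrative; the current code only sets the torch and torch.cuda seeds):

import random
import numpy as np
import torch

def seed_everything(seed):
    # Seed every common source of randomness, not just torch/cuda.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make cuDNN deterministic; this can slow training down.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False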

In addition, the update here is only meant to reduce the impact of the seed. If the best model were selected, the performance would be much better than that reported in the paper.

Finally, our method has been further optimized in the following repos:
https://github.com/sgvaze/osr_closed_set_all_you_need
https://github.com/Jingkang50/OpenOOD

vojirt (Author) commented Oct 12, 2022

I see; I will try to run it multiple times to limit the effect of random seeds.
I will also look at the other repositories. Thank you for the support.

vojirt (Author) commented Oct 13, 2022

Hi, one more clarification: to train ARPL+CS, is the only thing I need to change adding the --cs option?
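That is, presumably something like this (the output directory name is just illustrative):

python ood.py --lr 0.1 --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --cs --outf log_arpl_cs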
I can now replicate the results for the ARPL method (within one std), but ARPL+CS performs similarly to ARPL, not significantly better. I ran all methods three times (without the "model selection on test"), and on cifar10 vs. svhn I get the following mean values (the std row is similar for both methods):

                ACC    TNR     AUROC      DTACC       AUIN    AUOUT       OSCR
ARPL           92.51  38.82    89.58      83.33       84.41   93.76       85.38
ARPL+CS        92.33  37.46    90.23      84.40       86.16   93.88       86.00
std             0.17   7.18     2.00       2.34        3.08    1.25        1.8 

I also tested cifar10 vs. cifar100 and reached similar conclusions (ARPL ~ ARPL+CS in performance).
Thank you.
