Replicating results from Table 3 for ARPL #13

Closed
vojirt opened this issue Oct 11, 2022 · 7 comments

vojirt commented Oct 11, 2022

Hi,
I am having trouble replicating the results you report in Table 3 of the paper for the ARPL method (the only method I have tried so far).

I downloaded your git repository and ran this command to train the ARPL model (I ran it three times, with different output dirs, to account for random initialization):
python ood.py --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl
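(The three training runs differed only in the output directory; the directory names below are just illustrative.)

python ood.py --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl_run1
python ood.py --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl_run2
python ood.py --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl_run3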
Here is a training log from one of the runs: logs.txt.

To evaluate, I ran this command (for each trained model):
python ood.py --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl --eval

I get the following results:

Acc: 92.58000
       TNR    AUROC  DTACC  AUIN   AUOUT
Bas    25.496 82.803 78.414 65.403 90.405
Acc (%): 92.580  AUROC (%): 82.803       OSCR (%): 79.600
Acc: 92.68000
       TNR    AUROC  DTACC  AUIN   AUOUT
Bas    21.589 78.534 74.769 54.259 88.407
Acc (%): 92.680  AUROC (%): 78.534       OSCR (%): 75.498
Acc: 92.68000
       TNR    AUROC  DTACC  AUIN   AUOUT
Bas    37.930 82.561 77.525 78.463 82.497
Acc (%): 92.680  AUROC (%): 82.561       OSCR (%): 78.857

There were no errors or warnings while running the scripts.
All metrics are significantly below the reported numbers.
Do you have any idea what may be the issue?
Thank you.

iCGY96 (Owner) commented Oct 12, 2022

Thanks for your attention!
The learning rate of the model should be set to 0.1 with:

python ood.py --lr 0.1 --......
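That is, the same training command as above with the learning-rate flag added:

python ood.py --lr 0.1 --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --outf log_arpl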

iCGY96 closed this as completed Oct 12, 2022
vojirt (Author) commented Oct 12, 2022

I changed the learning rate; the results are better, but still far from the ones reported in the paper:

Acc: 92.77000
       TNR    AUROC  DTACC  AUIN   AUOUT
Bas    29.913 88.551 83.837 84.923 92.832
Acc (%): 92.770  AUROC (%): 88.551       OSCR (%): 85.093

In the paper you report TNR ~ 53, AUROC ~ 93, DTACC ~ 87, AUIN ~ 90, AUOUT ~ 96 (cifar10 vs svhn, Table 3, ARPL method).
Any other ideas what could have gone wrong?
Thanks!

iCGY96 (Owner) commented Oct 12, 2022

This may be related to the seeds on different machines; I reran the experiments yesterday without any problems. In addition, to reduce this effect, I have updated the model selection strategy, so you can run it again.

vojirt (Author) commented Oct 12, 2022

The seed is fixed in the code by torch.manual_seed(options['seed']) and torch.cuda.manual_seed_all(options['seed']).
Frankly, selecting the best model on the test data is not a sound methodology and should not be used.

iCGY96 (Owner) commented Oct 12, 2022

First of all, because this work was done early on, we did not add all of the seed settings at the time. Complete seeding should cover random, numpy, torch, cuda, etc., so the seeding here is not complete and performance will indeed be affected by the seed. The results on our machine are fine.
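For reference, complete seeding would look something like the sketch below (the helper name is illustrative; the current code only sets the torch and torch.cuda seeds):

import random
import numpy as np
import torch

def seed_everything(seed):
    # Seed every common source of randomness, not just torch/cuda.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make cuDNN deterministic; this can slow training down.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False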

In addition, the update here is only meant to reduce the impact of the seed. If the best model were selected, the performance would be much better than that reported in the paper.

Finally, our method has been further optimized in the following repos:
https://github.com/sgvaze/osr_closed_set_all_you_need
https://github.com/Jingkang50/OpenOOD

vojirt (Author) commented Oct 12, 2022

I see; I will try to run it multiple times to limit the effect of random seeds.
I will also look at the other repositories. Thank you for the support.

vojirt (Author) commented Oct 13, 2022

Hi, one more clarification: to train ARPL+CS, is the only thing I need to change adding the --cs option?
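That is, presumably something like this (the output directory name is just illustrative):

python ood.py --lr 0.1 --dataset cifar10 --out-dataset svhn --model arpl --loss ARPLoss --cs --outf log_arpl_cs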
I can now replicate the results for the ARPL method (within one std), but ARPL+CS performs similarly to ARPL, not significantly better. I ran all methods three times (without the "model selection on test"), and on cifar10 vs. svhn I get the following mean values (the std row is similar for both methods):

                ACC    TNR     AUROC      DTACC       AUIN    AUOUT       OSCR
ARPL           92.51  38.82    89.58      83.33       84.41   93.76       85.38
ARPL+CS        92.33  37.46    90.23      84.40       86.16   93.88       86.00
std             0.17   7.18     2.00       2.34        3.08    1.25        1.8 

I also tested cifar10 vs. cifar100 and reached similar conclusions (ARPL ~ ARPL+CS in performance).
Thank you.
