test() function is correct? #47
I got the same result. Also, when you shuffle the test dataset, you get the same lower score. I think what's happening is that each sample's score depends on the other samples in its batch, as sketched below:
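Here is a minimal sketch (not from this repo) of that batch dependence: a BatchNorm layer left in train mode normalizes with the statistics of the current batch, so the same sample produces different outputs depending on which other samples it is batched with.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)            # left in train mode, as in the reported test()

x = torch.randn(9, 4)             # pretend these are 9 test samples
sample = x[0:1]
batch_a = torch.cat([sample, x[1:5]])   # sample 0 grouped with samples 1-4
batch_b = torch.cat([sample, x[5:9]])   # sample 0 grouped with samples 5-8

# In train mode, the output for the SAME sample differs between batches
# because the batch statistics differ: test-set order changes the scores.
print(torch.allclose(bn(batch_a)[0], bn(batch_b)[0]))   # False

# In eval mode, the fixed running statistics are used, so order is irrelevant.
bn.eval()
print(torch.allclose(bn(batch_a)[0], bn(batch_b)[0]))   # True
```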
When testing (with the test set shuffle = True), how do you solve it?
Could you please be more specific about what you mean by "solve"?
I mean, when I shuffle the test set, the AUC value is very low. How do you solve that, and why does it happen? Also, does the test set need to be shuffled during testing? The default is False.
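For reference, the test loader is an ordinary PyTorch DataLoader; the dataset below is a placeholder, not the repo's own. Once the model is correctly in eval() mode, the shuffle setting should not change the AUC, since per-sample scores no longer depend on batch composition.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for the real test split.
test_dataset = TensorDataset(torch.randn(100, 3, 32, 32),
                             torch.randint(0, 2, (100,)))

# shuffle=False is the default being discussed. With the model in eval()
# mode, shuffle=True and shuffle=False should yield the same AUC.
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
```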
I think this rather shows that there's a severe problem with this implementation: the evaluation score shouldn't depend on the order of the test set. As far as I can tell, the only way to reproduce the high scores from the paper is by not shuffling, which biases the results as outlined above. Maybe @samet-akcay could clarify things.
@samet-akcay I really need your help.
The test function doesn't call eval() mode (this influences the behavior of the BatchNorm layers). While in training, both networks should be in train mode; while in testing, they should be in eval mode. In testing only the generator part is used, so actually you can do just self.netg.eval(); see the sketch below.
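A rough sketch of what that comment describes, assuming the netg/netd attribute names quoted elsewhere in this thread (the real lib/model.py differs in detail):

```python
import torch
import torch.nn as nn

class Model:
    """Skeleton stand-in for the model class in lib/model.py."""
    def __init__(self):
        self.netg = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                                  nn.BatchNorm2d(8), nn.ReLU())
        self.netd = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                                  nn.BatchNorm2d(8), nn.ReLU())

    def train_step(self, x):
        # While in training: both networks use batch statistics.
        self.netg.train()
        self.netd.train()
        # ... forward/backward passes would go here ...

    def test(self, x):
        # While in testing: only the generator is used for the anomaly
        # score, so this single call is enough.
        self.netg.eval()
        with torch.no_grad():
            return self.netg(x)
```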
I would agree that one should use .eval() during testing.
Hi, may I ask how your experimental results turned out on your own dataset?
I tested cifar10 (abnormal_class: bird) with the latest commit, which fixed the loss problem.
As a result, I obtained a maximum AUC of 0.579, which differs from the GANomaly bird score of 0.510 reported in the Skip-GANomaly paper.
While checking the source code, I found that self.netg.eval() is missing from the test() function in lib/model.py.
So I added self.netg.eval() at line 197 and retested, obtaining an AUC of 0.410.
That is quite different from the results in the paper.
I think it's right to test with eval() added. What do you think?
And if adding eval() is right, what do you think about the evaluation results reported in the paper?