test() function is correct? #47
I got the same result. Also, when you shuffle the test dataset, you get the same lower score. I think what's happening is that each sample's score depends on the other samples in its batch, as sketched below:
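Here is a minimal sketch (not from this repo) of that batch dependence: a BatchNorm layer left in train mode normalizes with the statistics of the current batch, so the same sample produces different outputs depending on which other samples it is batched with.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)            # left in train mode, as in the reported test()

x = torch.randn(9, 4)             # pretend these are 9 test samples
sample = x[0:1]
batch_a = torch.cat([sample, x[1:5]])   # sample 0 grouped with samples 1-4
batch_b = torch.cat([sample, x[5:9]])   # sample 0 grouped with samples 5-8

# In train mode, the output for the SAME sample differs between batches
# because the batch statistics differ: test-set order changes the scores.
print(torch.allclose(bn(batch_a)[0], bn(batch_b)[0]))   # False

# In eval mode, the fixed running statistics are used, so order is irrelevant.
bn.eval()
print(torch.allclose(bn(batch_a)[0], bn(batch_b)[0]))   # True
```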
When testing (with the test set shuffle = True), how do you solve it?
Could you please be more specific about what you mean by "solve"?
I mean, when I shuffle the test set, the AUC value is very low. How do you solve that, and why does it happen? Also, does the test set need to be shuffled during testing? The default is False.
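For reference, the test loader is an ordinary PyTorch DataLoader; the dataset below is a placeholder, not the repo's own. Once the model is correctly in eval() mode, the shuffle setting should not change the AUC, since per-sample scores no longer depend on batch composition.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for the real test split.
test_dataset = TensorDataset(torch.randn(100, 3, 32, 32),
                             torch.randint(0, 2, (100,)))

# shuffle=False is the default being discussed. With the model in eval()
# mode, shuffle=True and shuffle=False should yield the same AUC.
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
```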
I think this rather shows that there's a severe problem with this implementation: the evaluation score shouldn't depend on the order of the test set. As far as I can tell, the only way to reproduce the high scores from the paper is by not shuffling, which biases the results as outlined above. Maybe @samet-akcay could clarify things.
@samet-akcay I really need your help.
The test function doesn't call eval() mode (this influences the behavior of the BatchNorm layers). While in training, both networks should be in train mode; while in testing, they should be in eval mode. In testing only the generator part is used, so actually you can do just self.netg.eval(); see the sketch below.
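A rough sketch of what that comment describes, assuming the netg/netd attribute names quoted elsewhere in this thread (the real lib/model.py differs in detail):

```python
import torch
import torch.nn as nn

class Model:
    """Skeleton stand-in for the model class in lib/model.py."""
    def __init__(self):
        self.netg = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                                  nn.BatchNorm2d(8), nn.ReLU())
        self.netd = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                                  nn.BatchNorm2d(8), nn.ReLU())

    def train_step(self, x):
        # While in training: both networks use batch statistics.
        self.netg.train()
        self.netd.train()
        # ... forward/backward passes would go here ...

    def test(self, x):
        # While in testing: only the generator is used for the anomaly
        # score, so this single call is enough.
        self.netg.eval()
        with torch.no_grad():
            return self.netg(x)
```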
I would agree that one should use .eval() during testing.
Hi, may I ask how your experimental results turned out on your own dataset?
I tested cifar10 (abnormal_class: bird) with the latest commit, which fixed the loss problem.
As a result, I obtained a maximum AUC of 0.579, which differs from the GANomaly bird score of 0.510 reported in the Skip-GANomaly paper.
While checking the source code, I found that self.netg.eval() is missing from the test() function in lib/model.py.
So I added self.netg.eval() at line 197 and retested, obtaining an AUC of 0.410.
That is quite different from the results in the paper.
I think it's right to test with eval() added. What do you think?
And if adding eval() is right, what do you think about the evaluation results reported in the paper?