
Question about accuracy #19

Closed
ChisamXz opened this issue Aug 29, 2019 · 6 comments

Comments

@ChisamXz

Hi, I have a question about how the accuracy is reported. In your experiments, you report the accuracy at convergence after a fixed number of epochs. For example, your code runs MUTAG for 300 epochs, reports the accuracy of the 300th epoch for each fold, and averages over the 10 folds, which ends up around 85%. If we instead adopt early stopping and report the best accuracy for each fold, we can reach 91.6%. I'm curious why you didn't do that to get a better result.
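The reporting scheme described above can be sketched as follows; the function name and the per-fold accuracy values are made up for illustration and are not taken from the repository's code:

```python
# Minimal sketch: train each fold for a fixed number of epochs, keep only
# the final-epoch test accuracy, then average over the 10 folds.
def average_final_epoch_accuracy(final_epoch_accuracies):
    """Average the last-epoch test accuracy across folds."""
    return sum(final_epoch_accuracies) / len(final_epoch_accuracies)

# One hypothetical accuracy per fold, each taken at epoch 300.
fold_accs = [0.85, 0.90, 0.80, 0.85, 0.85, 0.90, 0.80, 0.85, 0.85, 0.85]
print(average_final_epoch_accuracy(fold_accs))
```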

@muhanzhang
Owner

Hi, the reason is that I found using a validation fold to determine the stopping condition is not suitable here. Since these graph datasets are typically very small, a validation fold isn't representative enough to determine the optimal stopping epoch for the test fold. If you have a better early stopping strategy, please share it here. Thanks!

@ChisamXz
Author

Hi, for each fold I simply recorded the best accuracy so far, and if no higher accuracy appeared within 150 epochs, I stopped training. It's very simple and I'm not sure whether it's convincing enough.
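A minimal sketch of this patience scheme, assuming a list of per-epoch accuracies; the name `run_with_patience` and the toy numbers are invented for illustration:

```python
def run_with_patience(acc_per_epoch, patience=150):
    """Track the best accuracy seen so far; stop once `patience` epochs
    pass without improvement. Returns the best accuracy recorded."""
    best_acc, best_epoch = float("-inf"), 0
    for epoch, acc in enumerate(acc_per_epoch):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_acc

# With a small patience, training stops before the late jump to 0.70 is seen.
print(run_with_patience([0.50, 0.60, 0.55, 0.54, 0.70], patience=2))  # 0.6
```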

@ChisamXz
Author

BTW, when I used bsize=1 on MUTAG, I could still only get an accuracy of 83.3%, while you got 86.1%. Did you use this PyTorch version of the code, or did you make any changes that are not shown here? Thank you in advance~

@muhanzhang
Owner

muhanzhang commented Aug 30, 2019

No, you can't do this. You cannot report the best test accuracy across all training epochs, because that essentially uses the test data as validation data to determine the stopping epoch. Although the code prints the test accuracy at every epoch, in practice you should only use the test data once, to evaluate your final model's performance. See Section 5.3, "Data Snooping", in "Learning from Data: A Short Course" for more details.
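The distinction can be sketched as follows, with made-up per-epoch accuracies: the stopping epoch is chosen from validation accuracy alone, and the test accuracy is read exactly once, at that epoch.

```python
# Hypothetical per-epoch accuracies on held-out validation and test data.
val_acc  = [0.60, 0.72, 0.70, 0.68]
test_acc = [0.58, 0.65, 0.71, 0.66]

# Pick the stopping epoch from validation accuracy alone ...
stop = max(range(len(val_acc)), key=val_acc.__getitem__)

# ... then read the test accuracy once, at that epoch. Reporting
# max(test_acc) instead (0.71 here) would be data snooping.
print(stop, test_acc[stop])  # 1 0.65
```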

@muhanzhang
Owner

For your second question, any difference in CUDA/PyTorch/NumPy versions between your machine and mine could lead to different results. That is why I suggest giving up on MUTAG -- it is too small and the result variance can be too large. That is also why I suggest running 10 series of 10-fold cross validation and reporting the average accuracy over the 100 runs -- to reduce the result variance on small datasets.
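The suggested protocol could be summarized as below; the per-run accuracies are invented here, standing in for the 100 results (10 series x 10 folds):

```python
import statistics

def summarize_runs(accuracies):
    """Mean and sample standard deviation over repeated-CV runs,
    e.g. 10 series of 10-fold cross validation = 100 accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

mean, std = summarize_runs([0.83, 0.87, 0.85, 0.85])
print(f"{mean:.3f} +/- {std:.3f}")  # 0.850 +/- 0.016
```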

@ChisamXz
Author

ChisamXz commented Sep 4, 2019

Oh, thank you so much for the answer. I was a little confused about early stopping because the GAT code uses it. MUTAG indeed has a large accuracy variance. I was trying to run experiments with the same hyperparameters as yours; however, with bsize = 2 I got a result around 85.5%. Your reply has already solved my questions, thank you again for your patient answers.
