
Question about accuracy #19

Closed
ChisamXz opened this issue Aug 29, 2019 · 6 comments

Comments

@ChisamXz

Hi, I have a question about how the accuracy is reported. In your experiments, you report the accuracy at convergence after a fixed number of epochs. For example, your code runs MUTAG for 300 epochs, reports the accuracy of the 300th epoch for each fold, and averages over the 10 folds, which ends up around 85%. If we instead adopt early stopping and report the best accuracy for each fold, we can reach 91.6%. I'm curious why you didn't do that to get a better result.
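The reporting scheme described above can be sketched as follows; the function name and the per-fold accuracy values are made up for illustration and are not taken from the repository's code:

```python
# Minimal sketch: train each fold for a fixed number of epochs, keep only
# the final-epoch test accuracy, then average over the 10 folds.
def average_final_epoch_accuracy(final_epoch_accuracies):
    """Average the last-epoch test accuracy across folds."""
    return sum(final_epoch_accuracies) / len(final_epoch_accuracies)

# One hypothetical accuracy per fold, each taken at epoch 300.
fold_accs = [0.85, 0.90, 0.80, 0.85, 0.85, 0.90, 0.80, 0.85, 0.85, 0.85]
print(average_final_epoch_accuracy(fold_accs))
```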

@muhanzhang
Owner

Hi, the reason is that I found using a validation fold to determine the stopping condition is not suitable here. Since these graph datasets are typically very small, a validation fold isn't representative enough to determine the optimal stopping epoch for the test fold. If you have a better early stopping strategy, please share it here. Thanks!

@ChisamXz
Author

Hi, for each fold I simply recorded the best accuracy so far, and if no higher accuracy appeared within 150 epochs, I stopped training. It's very simple and I'm not sure whether it's convincing enough.
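A minimal sketch of this patience scheme, assuming a list of per-epoch accuracies; the name `run_with_patience` and the toy numbers are invented for illustration:

```python
def run_with_patience(acc_per_epoch, patience=150):
    """Track the best accuracy seen so far; stop once `patience` epochs
    pass without improvement. Returns the best accuracy recorded."""
    best_acc, best_epoch = float("-inf"), 0
    for epoch, acc in enumerate(acc_per_epoch):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_acc

# With a small patience, training stops before the late jump to 0.70 is seen.
print(run_with_patience([0.50, 0.60, 0.55, 0.54, 0.70], patience=2))  # 0.6
```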

@ChisamXz
Author

BTW, when I used bsize=1 on MUTAG, I could still only get an accuracy of 83.3%, while you got 86.1%. Did you use this PyTorch version of the code, or did you make any changes that are not shown here? Thank you in advance~

@muhanzhang
Owner

muhanzhang commented Aug 30, 2019

No, you can't do this. You cannot report the best test accuracy across all training epochs, because that essentially uses the test data as validation data to determine the stopping epoch. Although the code prints the test accuracy at every epoch, in practice you should only use the test data once, to evaluate your final model's performance. See Section 5.3, "Data Snooping", in "Learning from Data: A Short Course" for more details.
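The distinction can be sketched as follows, with made-up per-epoch accuracies: the stopping epoch is chosen from validation accuracy alone, and the test accuracy is read exactly once, at that epoch.

```python
# Hypothetical per-epoch accuracies on held-out validation and test data.
val_acc  = [0.60, 0.72, 0.70, 0.68]
test_acc = [0.58, 0.65, 0.71, 0.66]

# Pick the stopping epoch from validation accuracy alone ...
stop = max(range(len(val_acc)), key=val_acc.__getitem__)

# ... then read the test accuracy once, at that epoch. Reporting
# max(test_acc) instead (0.71 here) would be data snooping.
print(stop, test_acc[stop])  # 1 0.65
```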

@muhanzhang
Owner

For your second question, any difference in CUDA/PyTorch/NumPy versions between your machine and mine could lead to different results. That is why I suggest giving up on MUTAG -- it is too small and the result variance can be too large. That is also why I suggest running 10 series of 10-fold cross validation and reporting the average accuracy over the 100 runs -- to reduce the result variance on small datasets.
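The suggested protocol could be summarized as below; the per-run accuracies are invented here, standing in for the 100 results (10 series x 10 folds):

```python
import statistics

def summarize_runs(accuracies):
    """Mean and sample standard deviation over repeated-CV runs,
    e.g. 10 series of 10-fold cross validation = 100 accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

mean, std = summarize_runs([0.83, 0.87, 0.85, 0.85])
print(f"{mean:.3f} +/- {std:.3f}")  # 0.850 +/- 0.016
```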

@ChisamXz
Author

ChisamXz commented Sep 4, 2019

Oh, thank you so much for the answer. I was a little confused about early stopping because the GAT code uses it. MUTAG indeed has a large accuracy variance. I was trying to run experiments with the same hyperparameters as yours; however, with bsize = 2 I got a result around 85.5%. Your reply has already solved my questions, thank you again for your patient answers.
