
the evaluation on the PCBA dataset seems wrong #92

Closed
Noisyntrain opened this issue Mar 1, 2022 · 4 comments

Comments

@Noisyntrain

Hi authors,
Thank you for your great work! I noticed that the result of "take the mean of the 4 cards' APs" differs from the result of "gather all predictions and labels from the different cards and evaluate once", and the latter tends to be lower than the former. It seems that Graphormer uses the former method during evaluation. May I know whether you have validation and test results for the Graphormer model evaluated on the whole dataset at once? Thank you!
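For context, here is a minimal, self-contained sketch of what I mean (not Graphormer code; synthetic data and scikit-learn's `average_precision_score`). Because AP depends on the global ranking of scores, the mean of per-card APs generally differs from the AP computed once over the gathered predictions:

```python
# Minimal sketch (synthetic data, assumed 4-way sharding): why averaging
# per-card AP differs from computing one AP over all gathered results.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400)                             # binary targets
scores = 0.5 * rng.random(400) + 0.5 * labels * rng.random(400)   # noisy scores

# Split the validation set across 4 "cards", as a distributed sampler would.
shards = [(labels[i::4], scores[i::4]) for i in range(4)]

mean_of_card_aps = np.mean([average_precision_score(y, s) for y, s in shards])
global_ap = average_precision_score(labels, scores)

print(f"mean of 4 per-card APs:   {mean_of_card_aps:.4f}")
print(f"AP over gathered results: {global_ap:.4f}")  # generally not equal
```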

Noisyntrain changed the title from "About the PCBA Dataset ap evaluation" to "the evaluation on the PCBA dataset seems wrong" on Mar 1, 2022
@zhengsx
Collaborator

zhengsx commented Mar 1, 2022

Thanks for using Graphormer.

In v1: the average precision is correctly calculated by gathering all results from the different cards. Please kindly use this for Graphormer-v1 on PCBA.

In v2: we haven't prepared a PCBA script in the examples yet, since the architecture has been modified; we plan to release the PCBA example after we search for the optimal configuration and hyper-parameters.

If this feature is urgent for you, please kindly click the thumbs-up reaction on this issue, and we will raise its priority.

@Noisyntrain
Author

Hi zhengsx, thank you for your reply. I'm a little confused now, since in the `validation_epoch_end` function of model.py the code reads
`self.log('valid_ap', loss, sync_dist=True)`. Does this mean that each card first calculates its own AP, and the mean of the 4 cards' APs is then taken as the final result?
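To illustrate what I'm asking, a hedged sketch of both patterns side by side (this is not the actual Graphormer source; it assumes PyTorch Lightning's pre-2.0 `validation_epoch_end` hook and that `validation_step` returns dicts with hypothetical `preds`/`target` keys):

```python
import torch
import pytorch_lightning as pl
from sklearn.metrics import average_precision_score

class Model(pl.LightningModule):
    # ... model definition and validation_step omitted ...

    def validation_epoch_end(self, outputs):
        preds = torch.cat([o["preds"] for o in outputs])    # this card's predictions
        target = torch.cat([o["target"] for o in outputs])  # this card's labels

        # Pattern in question: each card computes its own AP, and
        # sync_dist=True then averages that scalar across the cards.
        ap_local = average_precision_score(target.cpu().numpy(), preds.cpu().numpy())
        self.log("valid_ap", ap_local, sync_dist=True)

        # Alternative: gather predictions and labels from every card first,
        # then compute a single AP over the full validation set.
        all_preds = self.all_gather(preds).reshape(-1)
        all_target = self.all_gather(target).reshape(-1)
        ap_global = average_precision_score(
            all_target.cpu().numpy(), all_preds.cpu().numpy()
        )
        self.log("valid_ap_global", ap_global, rank_zero_only=True)
```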

@zhengsx
Collaborator

zhengsx commented Mar 1, 2022

Yes. So when we validate and test results on this dataset, we use only 1 GPU to avoid this potential issue. The logging output during training only serves as a monitor for the program.

@Noisyntrain
Author

Ok, I got it, thank you!
