
train from scratch on molecule datasets #39

Open
hehuanma opened this issue Nov 30, 2021 · 4 comments
Labels: good first issue

Comments

@hehuanma

Hello, I am trying to use Graphormer on other commonly used datasets from MoleculeNet (https://moleculenet.org/datasets-1), such as BACE and BBBP, to check its performance. I used the default hyperparameters from the molhiv script, but the results are very poor.

  1. Have you tried your model on these datasets without a pretrained model? And do you have any suggestions on hyperparameters for these datasets when training from scratch? I am trying to find out why the results are so bad.
  2. For molhiv without a pretrained model, I used the provided script in the examples folder, omitted the "checkpoint_path" argument, and trained for 100 epochs. The best validation score is only around 0.763, and the corresponding test score is only 0.636. I don't know what went wrong. Have you tried Graphormer directly on molhiv without a pretrained model? How was the performance?

    Thank you.
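
(For reference, a minimal sketch of how these MoleculeNet-derived tasks can be loaded through OGB, which hosts BACE and BBBP as ogbg-molbace and ogbg-molbbbp with the same scaffold splits and ROC-AUC evaluation as ogbg-molhiv. This assumes the `ogb` package rather than any Graphormer-specific loader.)

```python
# Sketch: loading the MoleculeNet-derived BACE/BBBP tasks via OGB,
# which evaluates them with ROC-AUC like ogbg-molhiv.
# Assumes the `ogb` package (pip install ogb).
from ogb.graphproppred import GraphPropPredDataset, Evaluator

for name in ["ogbg-molbace", "ogbg-molbbbp"]:
    dataset = GraphPropPredDataset(name=name)
    split_idx = dataset.get_idx_split()  # scaffold train/valid/test splits
    print(name, len(split_idx["train"]), len(split_idx["valid"]), len(split_idx["test"]))

evaluator = Evaluator(name="ogbg-molbace")
print(evaluator.expected_input_format)  # evaluator expects binary labels and scores
```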
@zhengsx
Collaborator

zhengsx commented Nov 30, 2021

Good question. While we have not tested Graphormer on MoleculeNet by training from scratch, the unsatisfactory performance is expected. Graphormer is built on a standard Transformer, which is extremely expressive. That expressiveness pays off on challenging large-scale datasets, but it hurts performance on small benchmarks because of severe overfitting. Imagine training ViT or Swin on MNIST or CIFAR-10 (even though Transformer-based models have already become the de facto standard in image processing).

If you insist on getting good performance on extremely small datasets such as those in MoleculeNet (e.g., fewer than 100K molecules), here are some tips that may help; a configuration sketch follows the list:

  1. Reduce Graphormer's parameter count, as we do on ZINC.
  2. Add strong regularization, as we do on molhiv and molpcba.
  3. Use a pretrained model, which is a very effective way to overcome overfitting.
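
(A minimal sketch of tips 1 and 2: a deliberately small, heavily regularized Transformer encoder. The concrete sizes and rates below are illustrative assumptions, not the exact ZINC or molhiv hyperparameters.)

```python
# Minimal sketch of tips 1 and 2: shrink the model and regularize hard.
# All sizes and rates are illustrative assumptions, not the official
# ZINC / molhiv settings.
import torch
import torch.nn as nn

small_config = dict(
    d_model=80,          # much smaller than 768-class hidden sizes
    nhead=8,
    num_layers=6,        # fewer layers than the large model
    dim_feedforward=80,
    dropout=0.3,         # strong dropout as regularization
)

encoder_layer = nn.TransformerEncoderLayer(
    d_model=small_config["d_model"],
    nhead=small_config["nhead"],
    dim_feedforward=small_config["dim_feedforward"],
    dropout=small_config["dropout"],
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=small_config["num_layers"])

# Weight decay is another cheap regularizer for tiny datasets.
optimizer = torch.optim.AdamW(encoder.parameters(), lr=2e-4, weight_decay=0.01)
```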

@hehuanma
Author

Thank you for the information! That makes sense; we did observe severe overfitting on some datasets, and on others training was quite unstable. By the way, do you plan to upload the pretrained model used in the paper? That way we could apply it directly and save some computational cost. Thanks!

@zhengsx
Collaborator

zhengsx commented Nov 30, 2021

In our latest plan, all pre-trained checkpoints will be released together with the new, more efficient Graphormer framework in the next release. Please stay tuned.
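
(Once the checkpoints are released, fine-tuning would look roughly like the sketch below. The file name, the "model" state-dict key, and the stand-in architecture are all assumptions, since the release format is not yet published.)

```python
# Hypothetical sketch of fine-tuning from a released checkpoint.
# The file name and the "model" key are assumptions about the release
# format; the encoder below is a stand-in for the real Graphormer model.
import torch
import torch.nn as nn

model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=32, batch_first=True),
    num_layers=12,
)

state = torch.load("graphormer_pretrained.pt", map_location="cpu")  # assumed file name
model_state = state.get("model", state)  # some checkpoints nest weights under "model"
model.load_state_dict(model_state, strict=False)  # strict=False tolerates head mismatches

# Fine-tune with a small learning rate so the pretrained weights are preserved.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```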

@hehuanma
Copy link
Author

Sounds great! Thank you!
