
Difference in joint train performance for aromaticity CL of CGLB and BeGin #14

Closed
WeiWeic6222848 opened this issue Apr 10, 2023 · 2 comments

Comments

@WeiWeic6222848

Hello,

I've come across another CGL benchmark, BeGin (https://arxiv.org/pdf/2211.14568.pdf). In Table 5(c) it reports the joint-training performance on Aromaticity-CL as 0.286, while in CGLB this is ~78%. I would expect the results to be more similar.

Kr

@QueuQ
Owner

QueuQ commented May 22, 2023


Hi,

This seems to be a new benchmark and I haven't checked its code, but I believe such a large difference is caused by the number of classes included.

As explained in Appendix 1.1 of our paper: 'For the G-CGL datasets, classes removed from Aromaticity-CL are {2, 3, 4, 8, 35, 36, 37, 38, 39, 40, 41} since they contain less than 20 examples and are causing difficulties for model training. The other 30 classes of Aromaticity-CL are kept and constructed as 15 tasks.'

We found that these very small classes make it hard for the model to learn; even joint training does not perform well on them. After checking, the difficulty is that the model overfits to the very few training examples and therefore performs poorly on the test sets of these small classes. This difficulty concerns how to design GNNs with better generalization power, and it is not the focus of continual learning. If the model cannot learn each individual task well, it does not make sense to further discuss the forgetting problem, which is the focus of continual graph learning. Therefore, we chose to keep only part of the original dataset to ensure that the constructed tasks are suitable for evaluating continual learning models.
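For concreteness, here is a minimal sketch of the filtering and task construction described above. It is not the CGLB code; the pairing of consecutive class IDs into two-class tasks and the assumption that the original labels run 1..41 are purely illustrative.

```python
# Illustrative sketch (not the CGLB implementation): drop the 11 listed classes
# with fewer than 20 examples, keep the remaining 30, and group them into
# 15 two-class tasks for class-incremental learning.

REMOVED_CLASSES = {2, 3, 4, 8, 35, 36, 37, 38, 39, 40, 41}

def build_aromaticity_cl_tasks(labels, n_tasks=15):
    """labels: iterable of integer class labels, one per graph."""
    kept = sorted({c for c in labels if c not in REMOVED_CLASSES})
    assert len(kept) == 2 * n_tasks, f"expected {2 * n_tasks} classes, got {len(kept)}"
    # Pair consecutive kept classes into tasks: (c0, c1), (c2, c3), ...
    return [tuple(kept[2 * i: 2 * i + 2]) for i in range(n_tasks)]

# Example, assuming class labels 1..41 in the raw data (hypothetical):
tasks = build_aromaticity_cl_tasks(range(1, 42))
print(tasks[0], tasks[-1])  # e.g. (1, 5) ... (33, 34)
```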

@WeiWeic6222848
Author

Thank you for the clarification; that can indeed be the case. It is an interesting question to think about when continual learning performance is harmed by external factors such as overfitting.
