I've come across another CGL benchmark, BeGin (https://arxiv.org/pdf/2211.14568.pdf). In its Table 5.c, the performance on Aromaticity-CL is reported as 0.286, while in CGLB it is ~78%. I would expect the results to be more similar.
Kr
Hi,
This seems to be a new benchmark and I haven't checked its code, but I think such a huge difference is likely caused by the number of classes included.
As explained in Appendix 1.1 of our paper: 'For the G-CGL datasets, classes removed from Aromaticity-CL are {2, 3, 4, 8, 35, 36, 37, 38, 39, 40, 41} since they contain less than 20 examples and are causing difficulties for model training. The other 30 classes of Aromaticity-CL are kept and constructed as 15 tasks.'
We found that these very small classes make it difficult for the model to learn; even joint training cannot perform well on them. After checking, the difficulty is that the model overfits the very few training examples and therefore performs poorly on the test sets of these small classes. This difficulty concerns how to design GNNs with better generalization power, which is not the focus of continual learning. If the model cannot learn each task well in the first place, it does not make sense to further discuss the forgetting problem, which is the focus of continual graph learning. Therefore, we chose to keep only part of the original dataset to ensure that the constructed tasks are suitable for evaluating continual learning models.
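For anyone trying to reproduce this split, the construction described above can be sketched roughly as follows. This is an illustrative sketch only: the function name, the `(graph, label)` data layout, and the two-classes-per-task grouping are my assumptions, not CGLB's actual API; only the removed-class set and the 30-classes-into-15-tasks figures come from the paper.

```python
# Classes removed from Aromaticity-CL per Appendix 1.1 (fewer than 20 examples each).
REMOVED_CLASSES = {2, 3, 4, 8, 35, 36, 37, 38, 39, 40, 41}

def filter_and_split(graphs, labels, n_tasks=15):
    """Drop graphs from the removed classes, then group the remaining
    classes into class-incremental tasks (hypothetical helper)."""
    kept = [(g, y) for g, y in zip(graphs, labels) if y not in REMOVED_CLASSES]
    kept_classes = sorted({y for _, y in kept})          # 30 classes remain
    per_task = len(kept_classes) // n_tasks              # e.g. 2 classes per task
    tasks = [kept_classes[i * per_task:(i + 1) * per_task] for i in range(n_tasks)]
    return kept, tasks
```

With 41 original classes and 11 removed, this yields 15 tasks of 2 classes each, matching the task construction quoted above.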
Thank you for the clarification, that can indeed be the case. It is an interesting question to think about when continual learning performance is harmed by external factors such as overfitting.