Inconsistent evaluation on ogbl-collab datasets #457
Hi! The evaluation rule is stated as is. One can use validation edges for both training and inference as long as all hyper-parameters are selected based on validation edges (not test edges). As you rightly pointed out, our example code indeed only uses the validation set for inference, but it is just for simplicity. Your example code is totally valid, but it's a bit interesting to see you are validating on validation edges while also using validation edges as training supervision. So you are essentially using training loss to do model selection? Wouldn't that cause serious over-fitting?
I think overfitting may not be an issue here, or 2000 epochs of training has not reached the overfitting regime yet; more in-depth analysis may be needed. I also find it quite interesting that this naive method can achieve such good performance.
If the results can be reproduced, should the leaderboard be updated accordingly?
Got it. Thanks for clarifying. Please feel free to submit to our leaderboard yourself.
Hi,
According to the evaluation rules (https://ogb.stanford.edu/docs/leader_rules/#:~:text=The%20only%20exception,the%20validation%20labels.), Collab for link prediction allows using the validation set during model training. However, the example code in (https://github.com/snap-stanford/ogb/blob/master/examples/linkproppred/collab/gnn.py) seems to use the validation set only for inference rather than for training. After using these validation edges as training edges, vanilla SAGE can achieve 68+ in Hits@50.
The implementation can be found here (https://github.com/Barcavin/ogb/tree/val_as_input_collab/examples/linkproppred/collab).
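For concreteness, the merge described above can be sketched as follows. This is a toy illustration only: the split dictionary and the helper name `merge_valid_into_train` are hypothetical, and in practice the split comes from `PygLinkPropPredDataset("ogbl-collab").get_edge_split()`:

```python
# Toy stand-in for the ogbl-collab edge split; in practice it comes from
# PygLinkPropPredDataset("ogbl-collab").get_edge_split(). Edges are
# (source, target) pairs; all names here are illustrative.
split_edge = {
    "train": {"edge": [(0, 1), (1, 2), (2, 3)]},
    "valid": {"edge": [(3, 4), (4, 5)]},
}

def merge_valid_into_train(split_edge):
    """Append validation edges to the training edges, so they act as both
    supervision signal and message-passing edges -- permitted by the OGB
    rules as long as hyper-parameters are still tuned on validation."""
    return split_edge["train"]["edge"] + split_edge["valid"]["edge"]

full_train = merge_valid_into_train(split_edge)
print(len(full_train))  # 5 edges: 3 train + 2 valid
```

The rest of the training loop can then treat `full_train` exactly as it would treat the original training edges.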
In fact, GCN can reach 69.45 ± 0.52 and SAGE can reach 68.20 ± 0.35. The differences between this implementation and the original example code are:
I believe the most critical trick to make the model perform well is the learnable node embedding rather than the node attributes. To reproduce, please run
python gnn.py --use_valedges_as_input [--use_sage]
Therefore, I am confused about the correct way to evaluate model performance on Collab.
Besides, I found that some of the submissions on the Collab leaderboard use the validation set as training edges (both supervision signal and message-passing edges), while others use it only for inference (message-passing edges). This may cause an evaluation discrepancy between these models. For example, the current top-1 (GIDN@YITU) uses the validation set in training, while ELPH uses it only for inference.
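To make the discrepancy concrete, here is a minimal sketch of the two protocols; the dictionaries and names are hypothetical and not taken from either submission's code:

```python
# Two roles validation edges can play in link prediction on ogbl-collab.
# Toy edge lists; all names are illustrative.
train_edges = [(0, 1), (1, 2)]
valid_edges = [(2, 3)]

def build_protocols(train_edges, valid_edges):
    # Protocol A ("inference only", as in ELPH): validation edges join the
    # message-passing graph at test time, but the loss is computed on
    # training edges alone.
    protocol_a = {
        "supervision": list(train_edges),
        "message_passing": list(train_edges) + list(valid_edges),
    }
    # Protocol B (as in GIDN@YITU): validation edges additionally serve as
    # positive supervision during training.
    protocol_b = {
        "supervision": list(train_edges) + list(valid_edges),
        "message_passing": list(train_edges) + list(valid_edges),
    }
    return protocol_a, protocol_b

a, b = build_protocols(train_edges, valid_edges)
print(len(a["supervision"]), len(b["supervision"]))  # 2 3
```

Both protocols see the same message-passing graph at test time; they differ only in which edges contribute to the training loss, which is exactly the source of the evaluation discrepancy.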
Thus, I believe a common protocol for evaluating models on Collab needs to be established for a fair comparison.
Thanks,