Inability to reproduce paper results #92
Comments
Hi Cameron,

Thank you for your interest in our work! In the meantime, I think the key hyperparameter to change from your setting is […]. Further, you can think of both […].

The mean reduction patch only affects processors that use mean aggregation, which we never use in our official experiments, as the max aggregator was always superior.

I hope this is helpful. If you have any other issues, please don't hesitate to contact us.

Thanks,
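To make the aggregator distinction concrete, here is a minimal sketch of max vs. mean message aggregation in an MPNN-style processor. The tensor shape and the function name are illustrative assumptions, not the repository's actual processor code:

```python
import jax.numpy as jnp

def aggregate(msgs, reduction="max"):
    """Reduce per-edge messages into per-node features.

    Illustrative assumption: msgs has shape
    [batch, receivers, senders, hidden], so each receiver's incoming
    messages are reduced over the sender axis (axis=2).
    """
    if reduction == "max":
        # Max aggregation: keep the elementwise-strongest incoming
        # message (the setting used in the official experiments).
        return jnp.max(msgs, axis=2)
    # Mean aggregation: average the incoming messages; only this path
    # is affected by the mean-reduction axis patch.
    return jnp.mean(msgs, axis=2)
```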
To follow up on this, PR #94 integrates these hyperparameters into the main codebase.
Thank you for the quick response! I was able to replicate the paper results much more closely with the new specifications.
Hello, I just wanted to confirm: were the paper settings for GAT number of heads = 1 and head size = 128?
Hi Cameron,

I am not completely sure at this time, but what we report as "GAT" is actually the maximum performance out of […]. Basically, the best performance we were able to get out of all of these GAT variants is what we reported as "GAT", due to the limited horizontal space in the table.
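A sketch of how such a selection might look, purely for illustration: `train_and_eval`, the variant list, and the dummy scores below are hypothetical placeholders, not the authors' actual sweep or results.

```python
# Illustrative sweep mirroring how a single "GAT" number could be
# produced: train each head configuration, then report the best score.
# `train_and_eval`, the variants, and the scores are placeholders.
def train_and_eval(nb_heads: int, head_size: int) -> float:
    # Stand-in for a full training run with the given head configuration;
    # the scores below are dummy values, not real results.
    dummy = {(1, 128): 0.61, (2, 64): 0.58, (4, 32): 0.55, (8, 16): 0.52}
    return dummy[(nb_heads, head_size)]

variants = [(1, 128), (2, 64), (4, 32), (8, 16)]  # (nb_heads, head_size)
reported_gat = max(train_and_eval(h, s) for h, s in variants)
```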
Thanks to the authors for constructing this benchmark.
I'm having trouble reproducing some of the test scores reported in Table 2 of the paper. Comparing my runs against the published results (averaging across 3 seeds: 42, 43, and 44):

| Task | Model | My result | Published |
| --- | --- | --- | --- |
| Graham Scan | MPNN | 0.6355 | 0.9104 |
| Graham Scan | PGN | 0.3622 | 0.5687 |
| Binary Search | MPNN | 0.2026 | 0.3683 |
| Binary Search | PGN | 0.4390 | 0.7695 |
Here are the values I used for my reproduction experiments:
![image](https://user-images.githubusercontent.com/44640939/186577077-12dc8a2e-9a46-4b95-ab19-e4065af5695f.png)
Values for batch size, train items, learning rate, and hint teacher forcing noise were obtained from sections 4.1 and 4.2 of the paper. Values for `eval_every`, `dropout`, `use_ln`, and `use_lstm` (which are not given in the paper) were the defaults in the provided run file. Additionally, I used processor type `pgn_mask` for the PGN experiments.
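Written out as a configuration sketch (flag spellings are assumed from the names above, and `None` marks values that came from the paper or the run-file defaults shown in the screenshot, rather than values invented here):

```python
# Sketch of the run configuration; flag spellings are assumptions based
# on the names mentioned above, not verified against the run file.
config = {
    "processor_type": "pgn_mask",        # "mpnn" for the MPNN runs
    "batch_size": None,                  # paper, sections 4.1-4.2
    "train_items": None,                 # paper, sections 4.1-4.2
    "learning_rate": None,               # paper, sections 4.1-4.2
    "hint_teacher_forcing_noise": None,  # paper, sections 4.1-4.2
    "eval_every": None,                  # default in the provided run file
    "dropout": None,                     # default in the provided run file
    "use_ln": None,                      # default in the provided run file
    "use_lstm": None,                    # default in the provided run file
    "seed": 42,                          # repeated with seeds 42, 43, 44
}
```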
Which settings should I use to reproduce the paper results more accurately? Were there hyperparameter settings, unspecified in the paper (or specified ones), that I got wrong?
Finally, I noticed the most recent commit, which fixes the axis for mean reduction in PGN. Would that cause PGN to perform differently than reported in the paper, and perhaps explain the discrepancy in my results?
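As a toy illustration of why the axis matters, reducing a message tensor over the wrong axis yields different aggregated features (shapes and names are assumed for illustration, not the actual PGN code):

```python
import jax.numpy as jnp

# Toy demonstration that mean reduction over the wrong axis changes the
# aggregated features. Shapes and names are illustrative only.
msgs = jnp.arange(2 * 3 * 3 * 4, dtype=jnp.float32).reshape(2, 3, 3, 4)
# [batch, receivers, senders, hidden]

mean_over_senders = jnp.mean(msgs, axis=2)    # intended reduction
mean_over_receivers = jnp.mean(msgs, axis=1)  # wrong-axis variant

print(bool(jnp.allclose(mean_over_senders, mean_over_receivers)))  # False
```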