
Inability to reproduce paper results #92

Closed

CameronDiao opened this issue Aug 25, 2022 · 5 comments
CameronDiao commented Aug 25, 2022

Thanks to the authors for constructing this benchmark.

I'm having trouble reproducing some of the test scores reported in Table 2 of the paper. Comparing my runs (averaged across 3 seeds: 42, 43, and 44) against the published results:

| Task | Model | My result | Published |
| --- | --- | --- | --- |
| Graham Scan | MPNN | 0.6355 | 0.9104 |
| Graham Scan | PGN | 0.3622 | 0.5687 |
| Binary Search | MPNN | 0.2026 | 0.3683 |
| Binary Search | PGN | 0.4390 | 0.7695 |

Here are the values I used for my reproduction experiments:
[Screenshot of the hyperparameter values used for the reproduction runs]

Values for batch size, train items, learning rate, and hint teacher forcing noise were taken from Sections 4.1 and 4.2 of the paper. Values for eval_every, dropout, use_ln, and use_lstm (which are not specified in the paper) were the defaults in the provided run file. Additionally, I used processor type "pgn_mask" for the PGN experiments.
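Spelled out as a sketch (flag names follow the wording above and may differ slightly from the run file's exact spelling; the actual numbers are elided here, since they came from the paper and the run-file defaults):

```python
# Sketch only: names are approximate and numeric values are elided.
config = {
    "batch_size": ...,                  # paper, Section 4.1
    "train_items": ...,                 # paper, Section 4.1
    "learning_rate": ...,               # paper, Section 4.1
    "hint_teacher_forcing_noise": ...,  # paper, Section 4.2
    "eval_every": ...,                  # run-file default
    "dropout": ...,                     # run-file default
    "use_ln": ...,                      # run-file default
    "use_lstm": ...,                    # run-file default
    "processor_type": "pgn_mask",       # for the PGN experiments
}
```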

What settings should I use to reproduce the paper results more closely? Are there hyperparameter settings, whether specified in the paper or not, that I am getting wrong?

Finally, I noticed that the most recent commit fixes the axis used for the mean reduction in PGN. Could that cause PGN to perform differently than reported in the paper, and perhaps explain the discrepancy in my results?

PetarV- (Collaborator) commented Aug 25, 2022

Hi Cameron,

Thank you for your interest in our work!
As you rightly noted, some of our final chosen hyperparameters did not propagate to the run file in the public GitHub repository, which caused a bit of a discrepancy. Sorry for the inconvenience! We are already preparing a commit to fix that.

In the meantime, I think the key hyperparameter to change from your setting is hint_mode, which should be encoded_decoded_nodiff. You already figured out the other important hyperparameter to change (hint_teacher_forcing_noise).

Further, you can think of both pgn and pgn_mask as PGNs: "mask" is a hyperparameter for the PGN that masks out the possible predictions for the edge targets so that they must follow the graph's edges. Sometimes this is a perfect inductive bias; sometimes it is very wrong. What we did in the paper is report, per task, the better result of those two in the "PGN" column.
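To make that concrete, here is a minimal sketch of the masking idea (illustrative only, not the code in this repository):

```python
import jax.numpy as jnp

def masked_pointer_logits(logits, adj):
    """Sketch of the `pgn_mask` idea: restrict pointer predictions to edges.

    logits: [num_nodes, num_nodes] scores for "node i points to node j".
    adj:    [num_nodes, num_nodes] binary adjacency of the input graph.
    """
    # Non-edges get a large negative score, so a subsequent softmax
    # effectively only considers the graph's edges.
    return jnp.where(adj > 0, logits, -1e9)
```

If the true pointer always follows an existing edge, this mask is exactly the right inductive bias; if it can point to a non-neighbour, the model can never predict the correct target.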

The mean reduction patch only affects processors that use mean aggregation, which we never used in our official experiments, as the max aggregator was always superior.
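For concreteness, a minimal sketch (again, not this repository's code) of the two aggregations, where msgs[b, i, j] holds the message sent from node j to node i:

```python
import jax.numpy as jnp

def aggregate(msgs, adj, reduction="max"):
    """msgs: [batch, n, n, h]; adj: [batch, n, n] binary adjacency."""
    if reduction == "max":
        # Non-edges contribute -inf and can never win the max.
        masked = jnp.where(adj[..., None] > 0, msgs, -jnp.inf)
        return jnp.max(masked, axis=2)  # reduce over senders j
    # Mean over actual neighbours. Reducing over axis=1 instead of axis=2
    # would silently average over receivers rather than senders, which is
    # the kind of bug the mean-reduction patch fixed.
    masked = msgs * adj[..., None]
    degree = jnp.maximum(jnp.sum(adj, axis=2), 1.0)[..., None]  # [batch, n, 1]
    return jnp.sum(masked, axis=2) / degree
```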

I hope this is helpful. If you have any other issues, please don't hesitate to contact us.

Thanks,
Petar

PetarV- (Collaborator) commented Aug 26, 2022

To follow up on this, PR #94 integrates these hyperparameters into the main codebase.

CameronDiao (Author) commented

Thank you for the quick response! I was able to replicate the paper results much more closely with the new specifications.

CameronDiao (Author) commented

Hello, I just wanted to confirm: were the paper settings for GAT number of heads = 1 and head size = 128?

CameronDiao reopened this Sep 20, 2022
PetarV- (Collaborator) commented Sep 20, 2022

Hi Cameron, I am not completely sure at this point, but what we report as "GAT" is actually the maximum performance out of gat, gat_full, gatv2, and gatv2_full, and I think we also swept the number of heads over [1, 4, 8].

Basically, we reported the best performance we were able to get out of all of these GAT variants as "GAT", due to limited horizontal space in the table.
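In sketch form (run_experiment below is a hypothetical stand-in for launching one training run and returning its test score; it is not a function in this repository, and the num_heads parameter name is likewise illustrative):

```python
def run_experiment(processor_type: str, num_heads: int) -> float:
    """Hypothetical stand-in: launch one training run, return its test score."""
    raise NotImplementedError("hook this up to your actual training runner")

variants = ["gat", "gat_full", "gatv2", "gatv2_full"]
head_counts = [1, 4, 8]

# Report the best score across all variants and head counts as "GAT".
gat_score = max(
    run_experiment(processor_type=variant, num_heads=heads)
    for variant in variants
    for heads in head_counts
)
```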
