This repository has been archived by the owner on Nov 10, 2023. It is now read-only.

Reproducibility of Table 3 #5

Open
SilvioGiancola opened this issue Jul 17, 2019 · 9 comments

Comments

@SilvioGiancola

Hi,

I have run your code several times on the DD dataset and obtain results that differ from the ones you present in Table 3 of your paper.
In particular, I ran this experiment 20 times and estimated the average and std, but found that training is very noisy, with the std rising up to ±10%. Note that I used the same hyperparameters you provide in your paper and your Google sheet (see issue #2).
I also tried the ReduceLROnPlateau scheduler for the LR, but still get an std of up to 5%.
How did you select your seed, and why is there such variation?
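
For reference, the ReduceLROnPlateau setup I tried was along these lines (a minimal sketch with a dummy model and loss; the scheduler hyperparameters are illustrative and none of this is the repo's SAGPool code):

import torch

# Dummy model and loss, only to illustrate the scheduler wiring.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=10)

for epoch in range(100):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()  # stand-in for the training loss
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # in practice, pass the monitored validation loss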

Thank you for your support,

Best,

@ThyrixYang

ThyrixYang commented Oct 14, 2019

I can confirm @SilvioGiancola's observation: using the suggested hyperparameters on the D&D dataset, the variance is much larger than the results on the Google sheet, and the mean accuracy does not seem that high.

@ThyrixYang

Hi @SilvioGiancola , they said in their paper that

In our experiments, we evaluated the pooling methods over 20 random seeds using 10-fold cross validation. A total of 200 testing results were used to obtain the final accuracy of each method on each dataset

So we evaluate by running CV (the main.py file) 10 times, calculating the mean of those runs, taking this as one result, and repeating this procedure 20 times.
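
In pseudocode, the protocol looks roughly like this (run_main_once() is a hypothetical wrapper around one execution of main.py with a random split; nothing here is from the repo itself):

import numpy as np

def evaluate(run_main_once, n_repeats=20, n_runs_per_repeat=10):
    repeat_means = []
    for r in range(n_repeats):
        # One "result" = mean test accuracy over 10 independent runs of main.py.
        accs = [run_main_once(seed=r * n_runs_per_repeat + i)
                for i in range(n_runs_per_repeat)]
        repeat_means.append(np.mean(accs))
    # Reported number: mean and std over the 20 repeat-level means.
    return np.mean(repeat_means), np.std(repeat_means)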
I hope this helps.

@SilvioGiancola
Author

Hi @ThyrixYang , thank you for sharing this detail!

In that case, it's not exactly the same as running the code 10 times, since main.py does not perform 10-fold cross-validation but random splitting.

With this averaging over 10 runs, did you get a variance similar to the one in Table 3?

@ThyrixYang

@SilvioGiancola yes, the variance is similar, although the mean is a bit lower.

@SilvioGiancola
Author

SilvioGiancola commented Oct 22, 2019

@ThyrixYang I solved the variance issue with this 10-fold cross validation.

However, when I reproduce their results, I get about 10% lower than what they claim on the DD dataset using the global pooling model. Are you also seeing such a big difference in your results?

I wish the authors would provide code to reproduce their results. It is impossible to build upon them otherwise...

@ThyrixYang

ThyrixYang commented Oct 23, 2019

@SilvioGiancola Are you doing exactly 10-fold CV?
I remember that with the random CV it was about 2~3% lower, not 10%.
I'm working on a paper about graph pooling now, though not based mainly on this paper. I will do more experiments later; maybe we can share some results then.

@SilvioGiancola
Author

@ThyrixYang I'll be more than happy to share some results with you on this baseline.
I guess I am performing the 10-fold CV properly, in particular:

  1. I randomly split the dataset (DD in my case) into 10 folds of the same length (the last fold has a slightly different length)
  2. I use 9 folds for training and the 10th for testing
  3. I repeat the training 10 times, holding out a different fold for testing each time
  4. I average the testing performance over the 10 folds -> this gives me the result for 1 run
  5. I repeat steps 1-4 for 20 times, with 20 different random splits for the folds
  6. I estimate the average and the std over the 20 runs

I get 65.1 ± 1.23 on DD using the global pooling setting, while the paper claims 76.19 ± 0.94. BTW, I tried both SAGPool implementations, from this repo and from pytorch-geometric, with similar results. I also used the hyperparameters from the gsheet (lr=0.005, nhid=128, weight decay=0.00001).

Are you doing anything different for the 10-fold CV? Have you tried the same dataset or a different one?
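
For concreteness, here is a minimal sketch of steps 1-6 above (illustrative only; train_and_test() is a hypothetical function that trains SAGPool on the training indices and returns the test accuracy on the held-out fold, not something from this repo):

import numpy as np
from sklearn.model_selection import KFold

def repeated_10fold_cv(num_graphs, train_and_test, n_repeats=20, n_folds=10):
    run_accs = []
    for seed in range(n_repeats):                          # steps 1 and 5: 20 random splits
        kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
        fold_accs = [train_and_test(train_idx, test_idx)   # steps 2-3
                     for train_idx, test_idx in kf.split(np.arange(num_graphs))]
        run_accs.append(np.mean(fold_accs))                # step 4: one run = mean over 10 folds
    return np.mean(run_accs), np.std(run_accs)             # step 6: mean and std over 20 runs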

@jiaruHithub

@ThyrixYang I'll be more than happy to share some results with you on this baseline. [...]

Hi, I tried to reproduce the experiment too.
How is your replication of the experiment going now?
I am looking forward to your reply; I am very interested in it.
Thank you!

@Abelpzx

Abelpzx commented Jul 31, 2021

Hi, each time I run this code I get different results. I have set the seeds below but still get different results.

import random
import numpy as np
import torch

torch.cuda.manual_seed(12345)
torch.cuda.manual_seed_all(12345)
random.seed(12345)
np.random.seed(12345)
#torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

What can I do to get the same results each time I run this code?
I am looking forward to your reply.
Thank you!
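
For what it's worth, a fuller seeding setup would look roughly like the sketch below (my assumption, not code from this repo). The snippet above does not seed the CPU generator (torch.manual_seed) and leaves cudnn.deterministic commented out; also note that some GPU scatter operations used by graph pooling code may remain non-deterministic even with all of this set.

import random
import numpy as np
import torch

def set_seed(seed=12345):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                    # CPU RNG (missing in the snippet above)
    torch.cuda.manual_seed_all(seed)           # all GPUs
    torch.backends.cudnn.deterministic = True  # was commented out above
    torch.backends.cudnn.benchmark = False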
