
Variance of the training/testing results #5

Closed
qiyan98 opened this issue Aug 17, 2022 · 4 comments


qiyan98 commented Aug 17, 2022

Hi there,

Thanks for sharing the code for your wonderful project. I have a question about the variance of the sampling results. I ran the training on the grid dataset using the default config file (with an arbitrary seed).

The test-time performance metrics I got are:
MMD_full {'degree': 0.460601, 'cluster': 0.008495, 'orbit': 0.126024, 'spectral': 0.681714}

On the other hand, the MMD results reported in the paper are: deg: 0.111, clus: 0.005, orbit: 0.070 for GDSS, and deg: 0.171, clus: 0.011, orbit: 0.223 for GDSS-seq.

The MMD results of the samples generated by the provided checkpoint model are:
MMD_full {'degree': 0.093013, 'cluster': 0.00718, 'orbit': 0.101709, 'spectral': 0.793645}

I understand that the random seed can affect the sampling results, but this variance seems quite large from my perspective (especially for the network I trained myself). Do you have any insight into this? The earlier EDP-GNN baseline seems to show a large variance when the number of generated samples is small. Do you think this could be attributed to intrinsic properties of score-based models?
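
For reference, here is a rough sketch of the kind of MMD estimate I have in mind, just to illustrate how the value can move with the number of samples and the seed. This is not your evaluation code: it uses a plain Gaussian kernel over clipped degree histograms, and Erdős–Rényi graphs stand in for generated samples, so the absolute numbers will not match the reported metrics.

```python
# Illustrative only, not the GDSS evaluation code: Gaussian-kernel MMD over
# degree histograms, comparing a small vs. a larger set of "generated" graphs.
import numpy as np
import networkx as nx

def degree_histogram(graph, max_deg=20):
    # Normalized degree histogram; degrees clipped to max_deg so shapes match.
    degs = np.clip([d for _, d in graph.degree()], 0, max_deg)
    hist = np.bincount(degs, minlength=max_deg + 1)
    return hist / max(hist.sum(), 1)

def gaussian_mmd(samples_a, samples_b, sigma=1.0):
    # Squared MMD with an RBF kernel (the actual evaluation may use an
    # EMD-based kernel, so only the qualitative behaviour carries over).
    a, b = np.asarray(samples_a, float), np.asarray(samples_b, float)
    def k(x, y):
        d2 = (x ** 2).sum(1)[:, None] + (y ** 2).sum(1)[None, :] - 2 * x @ y.T
        return np.exp(-np.clip(d2, 0, None) / (2 * sigma ** 2)).mean()
    return k(a, a) + k(b, b) - 2 * k(a, b)

rng = np.random.default_rng(0)
ref = [degree_histogram(nx.grid_2d_graph(10, 10)) for _ in range(20)]
gen_small = [degree_histogram(nx.gnp_random_graph(100, 0.04, seed=int(s)))
             for s in rng.integers(0, 10_000, size=20)]
gen_large = [degree_histogram(nx.gnp_random_graph(100, 0.04, seed=int(s)))
             for s in rng.integers(0, 10_000, size=1024)]
print("degree MMD,   20 samples:", gaussian_mmd(ref, gen_small))
print("degree MMD, 1024 samples:", gaussian_mmd(ref, gen_large))
```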

Best,
Qi

qiyan98 changed the title from "Variance of the sampling performance" to "Variance of the training/testing results" on Aug 17, 2022
harryjo97 (Owner) commented

Hi Qi,

Thanks for reaching out. Although we did not encounter such a large variance for our trained models, the random seeds used for training and sampling can affect the results, since both the training and sampling of score-based models depend heavily on the sampled noise. This effect can be amplified for larger graphs such as those in the grid dataset.

Furthermore, we report the generation performance using 1024 generated samples in Section D.1 of our paper, and it is similar to the performance obtained with a smaller number of samples. Thus, evaluating with a small number of samples (which is in fact the same number of graphs as in the test set) should not account for the large variance.
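
If you want to rule out run-to-run randomness, pinning all the relevant seeds before training and sampling is the usual first step. Something along these lines (the exact seed handling in the code may differ, this is just the standard PyTorch recipe):

```python
# Rough sketch of standard seed pinning for PyTorch-based training/sampling;
# the repository's own seed handling may differ.
import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    random.seed(seed)                    # Python stdlib RNG
    np.random.seed(seed)                 # NumPy RNG (splits, shuffling)
    torch.manual_seed(seed)              # CPU / default CUDA RNG (noise sampling)
    torch.cuda.manual_seed_all(seed)     # all GPUs, if any
    torch.backends.cudnn.deterministic = True   # reproducibility over speed
    torch.backends.cudnn.benchmark = False

set_seed(42)
```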


qiyan98 commented Aug 20, 2022

Hi Jaehyeong,

Thanks for your explanation of the randomness. By the way, your paper presents an interesting variant, GDSS-seq, in which the interaction between the adjacency matrix and node features is not modeled as well as in GDSS. I wonder what would happen without joint generation of node features, for example, generating the grid dataset without the one-hot encoded degree embedding. What's your take on this?

Many thanks,
Qi

harryjo97 (Owner) commented

Hi Qi,

Thanks for your interest. You could modify grid.yaml by changing data.init from "deg" to "zeros" or "ones" to test the effect of different node features.

For the community_small dataset, using the one-hot encoded degree embedding resulted in better performance than using all-ones or all-zeros node features, since exploiting the degree information makes the node-edge dependency easier to learn.
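
Conceptually, the three data.init choices correspond to something like the following. This is an illustrative sketch, not the actual data loader; the helper name and the max_feat_num parameter are made up here for clarity.

```python
# Illustrative sketch of the data.init options, not the repository's loader.
import torch
import torch.nn.functional as F

def init_node_features(adj: torch.Tensor, mode: str, max_feat_num: int = 10) -> torch.Tensor:
    n = adj.size(0)
    if mode == "deg":                   # one-hot encoded node degrees
        deg = adj.sum(dim=-1).long().clamp(max=max_feat_num - 1)
        return F.one_hot(deg, num_classes=max_feat_num).float()
    if mode == "ones":                  # constant, structure-free features
        return torch.ones(n, max_feat_num)
    if mode == "zeros":
        return torch.zeros(n, max_feat_num)
    raise ValueError(f"unknown init mode: {mode}")

# Example: a 4-cycle, where every node has degree 2.
adj = torch.tensor([[0., 1., 0., 1.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [1., 0., 1., 0.]])
x = init_node_features(adj, "deg")
```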


qiyan98 commented Aug 23, 2022

Hi Jaehyeong,

Thanks for your helpful comments! Have a wonderful day.

Best,
Qi
