
Variance of the training/testing results #5

Closed
qiyan98 opened this issue Aug 17, 2022 · 4 comments


qiyan98 commented Aug 17, 2022

Hi there,

Thanks for sharing the code for your wonderful project. I have a question about the variance of the sampling results. I ran the training on the grid dataset using the default config file (with an arbitrary seed).

The test-time performance metrics I got are:
MMD_full {'degree': 0.460601, 'cluster': 0.008495, 'orbit': 0.126024, 'spectral': 0.681714}

On the other hand, the MMD results reported in the paper are: deg: 0.111, clus: 0.005, orbit: 0.070 for GDSS, and deg: 0.171, clus: 0.011, orbit: 0.223 for GDSS-seq.

The MMD results of the samples generated by the provided checkpoint model are:
MMD_full {'degree': 0.093013, 'cluster': 0.00718, 'orbit': 0.101709, 'spectral': 0.793645}

I understand that the random seed can affect the sampling results, but this variance seems quite large from my perspective (especially for the network I trained myself). Do you have any insight into this? The earlier EDP-GNN baseline seems to show a large variance when the number of generated samples is small. Do you think this could be attributed to intrinsic properties of score-based models?
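
For reference, here is a rough sketch of the kind of MMD estimate I have in mind, just to illustrate how the value can move with the number of samples and the seed. This is not your evaluation code: it uses a plain Gaussian kernel over clipped degree histograms, and Erdős–Rényi graphs stand in for generated samples, so the absolute numbers will not match the reported metrics.

```python
# Illustrative only, not the GDSS evaluation code: Gaussian-kernel MMD over
# degree histograms, comparing a small vs. a larger set of "generated" graphs.
import numpy as np
import networkx as nx

def degree_histogram(graph, max_deg=20):
    # Normalized degree histogram; degrees clipped to max_deg so shapes match.
    degs = np.clip([d for _, d in graph.degree()], 0, max_deg)
    hist = np.bincount(degs, minlength=max_deg + 1)
    return hist / max(hist.sum(), 1)

def gaussian_mmd(samples_a, samples_b, sigma=1.0):
    # Squared MMD with an RBF kernel (the actual evaluation may use an
    # EMD-based kernel, so only the qualitative behaviour carries over).
    a, b = np.asarray(samples_a, float), np.asarray(samples_b, float)
    def k(x, y):
        d2 = (x ** 2).sum(1)[:, None] + (y ** 2).sum(1)[None, :] - 2 * x @ y.T
        return np.exp(-np.clip(d2, 0, None) / (2 * sigma ** 2)).mean()
    return k(a, a) + k(b, b) - 2 * k(a, b)

rng = np.random.default_rng(0)
ref = [degree_histogram(nx.grid_2d_graph(10, 10)) for _ in range(20)]
gen_small = [degree_histogram(nx.gnp_random_graph(100, 0.04, seed=int(s)))
             for s in rng.integers(0, 10_000, size=20)]
gen_large = [degree_histogram(nx.gnp_random_graph(100, 0.04, seed=int(s)))
             for s in rng.integers(0, 10_000, size=1024)]
print("degree MMD,   20 samples:", gaussian_mmd(ref, gen_small))
print("degree MMD, 1024 samples:", gaussian_mmd(ref, gen_large))
```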

Best,
Qi

qiyan98 changed the title from "Variance of the sampling performance" to "Variance of the training/testing results" on Aug 17, 2022
harryjo97 (Owner) commented

Hi Qi,

Thanks for reaching out. Although we did not encounter such a large variance for our trained models, the random seeds used for training and sampling can affect the results, since both the training and sampling of score-based models depend heavily on the sampled noise. This effect can be amplified for larger graphs such as those in the grid dataset.

Furthermore, we report the generation performance using 1024 generated samples in Section D.1 of our paper, and it is similar to the performance obtained with a smaller number of samples. Thus, evaluating with a small number of samples (which is in fact the same number of graphs as in the test set) should not account for the large variance.
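
If you want to rule out run-to-run randomness, pinning all the relevant seeds before training and sampling is the usual first step. Something along these lines (the exact seed handling in the code may differ, this is just the standard PyTorch recipe):

```python
# Rough sketch of standard seed pinning for PyTorch-based training/sampling;
# the repository's own seed handling may differ.
import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    random.seed(seed)                    # Python stdlib RNG
    np.random.seed(seed)                 # NumPy RNG (splits, shuffling)
    torch.manual_seed(seed)              # CPU / default CUDA RNG (noise sampling)
    torch.cuda.manual_seed_all(seed)     # all GPUs, if any
    torch.backends.cudnn.deterministic = True   # reproducibility over speed
    torch.backends.cudnn.benchmark = False

set_seed(42)
```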


qiyan98 commented Aug 20, 2022

Hi Jaehyeong,

Thanks for your explanation of the randomness. By the way, your paper presents an interesting variant, GDSS-seq, in which the interaction between the adjacency matrix and node features is not modeled as well as in GDSS. I wonder what would happen without joint generation of node features, for example, generating the grid dataset without the one-hot encoded degree embedding. What's your take on this?

Many thanks,
Qi

harryjo97 (Owner) commented

Hi Qi,

Thanks for your interest. You could modify grid.yaml by changing data.init from "deg" to "zeros" or "ones" to test the effect of different node features.

For the community_small dataset, using the one-hot encoded degree embedding resulted in better performance than using all-ones or all-zeros node features, since exploiting the degree information makes the node-edge dependency easier to learn.
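
Conceptually, the three data.init choices correspond to something like the following. This is an illustrative sketch, not the actual data loader; the helper name and the max_feat_num parameter are made up here for clarity.

```python
# Illustrative sketch of the data.init options, not the repository's loader.
import torch
import torch.nn.functional as F

def init_node_features(adj: torch.Tensor, mode: str, max_feat_num: int = 10) -> torch.Tensor:
    n = adj.size(0)
    if mode == "deg":                   # one-hot encoded node degrees
        deg = adj.sum(dim=-1).long().clamp(max=max_feat_num - 1)
        return F.one_hot(deg, num_classes=max_feat_num).float()
    if mode == "ones":                  # constant, structure-free features
        return torch.ones(n, max_feat_num)
    if mode == "zeros":
        return torch.zeros(n, max_feat_num)
    raise ValueError(f"unknown init mode: {mode}")

# Example: a 4-cycle, where every node has degree 2.
adj = torch.tensor([[0., 1., 0., 1.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [1., 0., 1., 0.]])
x = init_node_features(adj, "deg")
```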


qiyan98 commented Aug 23, 2022

Hi Jaehyeong,

Thanks for your helpful comments! Have a wonderful day.

Best,
Qi
