
About what cSBM parameters to use #4

Closed
Xiuyu-Li opened this issue Jun 2, 2021 · 6 comments

Comments

@Xiuyu-Li

Xiuyu-Li commented Jun 2, 2021

In Appendix A.5 of the paper, it is stated that n=5000 and f=2000. However, create_cSBM_dataset.sh sets n=800 and f=1000. Which set of parameters should I use?

Also, I was not able to find what average degree was used in the paper. Should I just set it to 5 as in create_cSBM_dataset.sh? Thanks.

@Xiuyu-Li
Author

Xiuyu-Li commented Jun 2, 2021

Using the parameters listed in the paper and setting the average degree to 5, I got the following edge homophily table:

| φ | -1 | -0.75 | -0.5 | -0.25 | 0 | 0.25 | 0.5 |
|---|----|-------|------|-------|---|------|-----|
| H(G) | 0.042 | 0.075 | 0.171 | 0.323 | 0.5 | 0.678 | 0.824 |

which is much more homophilic than the values reported in the paper. Can you tell me the exact parameters used for generating the cSBM synthetic datasets?

@jianhao2016
Owner

Hi Xiuyu,

Thank you for your interest in our work. For the cSBM datasets you should use what we stated in the supplement, i.e. n = 5000 and f = 2000. As for the average degree, the default of 5 should be fine. For the homophily table, can you elaborate a bit more on how you calculated the values, or paste the function you used here? Also, what value of epsilon did you use? It should be 3.25 rather than the default 0.1, which would be too small.
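
For concreteness, here is a minimal sketch of how a graph with these parameters could be sampled. This is only an illustration of the parameterization and omits the node features; it is not the actual cSBM_dataset.py script, and the helper name sample_csbm_graph is just for this example. It assumes λ = sqrt(1 + ε)·sin(φπ/2) with within/between-class edge probabilities c_in/n and c_out/n, where c_in/out = d ± sqrt(d)·λ:

import numpy as np

# Sketch only: samples the graph part of a two-class cSBM with n nodes and
# average degree d; signal strength is controlled by eps and phi.
# Assumes lambda = sqrt(1 + eps) * sin(phi * pi / 2) and edge probabilities
# c_in / n (same class) and c_out / n (different class), c_in/out = d +/- sqrt(d) * lambda.
def sample_csbm_graph(n=5000, d=5, eps=3.25, phi=0.5, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)                  # two balanced classes
    lam = np.sqrt(1 + eps) * np.sin(phi * np.pi / 2)
    c_in, c_out = d + np.sqrt(d) * lam, d - np.sqrt(d) * lam
    rows, cols = np.triu_indices(n, k=1)                 # undirected, no self-loops
    p = np.where(labels[rows] == labels[cols], c_in / n, c_out / n)
    keep = rng.random(p.shape[0]) < p
    return labels, np.stack([rows[keep], cols[keep]])    # edge_index-style array

labels, edge_index = sample_csbm_graph()
print(edge_index.shape[1])  # roughly n * d / 2 = 12500 edges for n=5000, d=5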

@Xiuyu-Li
Author

Xiuyu-Li commented Jun 2, 2021

Hi Jianhao,

Thank you for the quick reply. I used epsilon=3.25 in the experiments, and used the following code to generate the table:

from torch_geometric.utils import remove_self_loops
import torch

def node_homophily(edge_idx, labels, num_nodes):
    # Drop self-loops so they do not count as same-label matches.
    edge_index = remove_self_loops(edge_idx)[0]
    hs = torch.zeros(num_nodes)
    # Out-degree of each node; minlength keeps the shape at num_nodes.
    degs = torch.bincount(edge_index[0, :], minlength=num_nodes).float()
    # 1.0 where an edge joins two nodes with the same label.
    matches = (labels[edge_index[0, :]] == labels[edge_index[1, :]]).float()
    # Per-node fraction of same-label neighbors, averaged over non-isolated nodes.
    hs = hs.scatter_add(0, edge_index[0, :], matches) / degs
    return hs[degs != 0].mean()

which should be consistent with how H(G) was defined.
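
For reference, here is a small sketch of the expected edge homophily under the parameterization I assumed (λ = sqrt(1 + ε)·sin(φπ/2) and c_in/out = d ± sqrt(d)·λ, so E[H(G)] = c_in / (c_in + c_out) = 1/2 + λ / (2·sqrt(d))); this is my reading, not necessarily the paper's exact parameterization, and expected_edge_homophily is just an illustrative helper:

import numpy as np

# Sketch: expected edge homophily under the assumed cSBM parameterization.
def expected_edge_homophily(phi, eps=3.25, d=5):
    lam = np.sqrt(1 + eps) * np.sin(phi * np.pi / 2)
    return 0.5 + lam / (2 * np.sqrt(d))

for phi in (-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5):
    print(f"phi={phi:+.2f}  E[H(G)]={expected_edge_homophily(phi):.3f}")

With ε = 3.25 and d = 5 this comes out to roughly 0.04, 0.07, 0.17, 0.32, 0.50, 0.68, 0.83, i.e. close to the measured values in the table above, so the table itself looks self-consistent.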

@jianhao2016
Owner

Can you show the code for remove_self_loops as well? I am trying to reproduce the values with your function on our tested dataset and on a newly generated one to see if there is any difference.

@Xiuyu-Li
Author

Xiuyu-Li commented Jun 2, 2021

Sure. It is just the torch_geometric function: `from torch_geometric.utils import remove_self_loops`.
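
For completeness, a toy illustration of what that call does (the example tensors are made up, not from the dataset code):

import torch
from torch_geometric.utils import remove_self_loops

# Toy example: the self-loop (2, 2) is dropped before homophily is computed.
edge_index = torch.tensor([[0, 1, 2],
                           [1, 0, 2]])
edge_index, _ = remove_self_loops(edge_index)
print(edge_index)  # tensor([[0, 1], [1, 0]])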

@jianhao2016
Owner

jianhao2016 commented Jun 2, 2021

Hi Xiuyu,

Thank you so much for pointing out this issue! I have tested both your function and our previous function on the cSBM dataset and on other simple graphs, and it turns out you are correct about the homophily scores. There was a small bug in our code when computing the homophily scores (doing division with torch integer tensors), which caused the numbers to be smaller. I have fixed it and got results similar to yours. We will update the values in our paper accordingly. Thanks again for letting us know!
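
To illustrate the kind of issue (a simplified sketch, not our actual code): when the same-label counts and the degrees are integer tensors and the division truncates, every per-node ratio below 1 collapses to 0, which drags the averaged homophily down:

import torch

# Simplified sketch of the pitfall, not the original implementation.
matches_per_node = torch.tensor([3, 4, 2])      # same-label neighbors per node
degs = torch.tensor([4, 5, 6])                  # degrees per node

print(matches_per_node // degs)                 # tensor([0, 0, 0])  -> truncated to zero
print(matches_per_node.float() / degs.float())  # tensor([0.7500, 0.8000, 0.3333])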
