
About what cSBM parameters to use #4

Closed
Xiuyu-Li opened this issue Jun 2, 2021 · 6 comments

Comments

@Xiuyu-Li

Xiuyu-Li commented Jun 2, 2021

In Appendix A.5 of the paper, it is stated that n=5000 and f=2000. However, create_cSBM_dataset.sh sets n=800 and f=1000. Which set of parameters should I use?

Also, I was not able to find what average degree was used in the paper. Should I just set it to 5 as in create_cSBM_dataset.sh? Thanks.

@Xiuyu-Li
Author

Xiuyu-Li commented Jun 2, 2021

Using the parameters listed in the paper and setting the average degree to 5, I got the following edge homophily table:

| φ | -1 | -0.75 | -0.5 | -0.25 | 0 | 0.25 | 0.5 |
|---|----|-------|------|-------|---|------|-----|
| H(G) | 0.042 | 0.075 | 0.171 | 0.323 | 0.5 | 0.678 | 0.824 |

which is much more homophilic than the values reported in the paper. Can you tell me the exact parameters used for generating the cSBM synthetic datasets?

@jianhao2016
Owner

Hi Xiuyu,

Thank you for your interest in our work. For the cSBM datasets you should use what we stated in the supplement, i.e. n = 5000 and f = 2000. As for the average degree, the default of 5 should be fine. For the homophily table, can you elaborate a bit more on how you calculated the values, or paste the function you used here? Also, what value of epsilon did you use? It should be 3.25 rather than the default 0.1, which would be too small.
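
For concreteness, here is a minimal sketch of how a graph with these parameters could be sampled. This is only an illustration of the parameterization and omits the node features; it is not the actual cSBM_dataset.py script, and the helper name sample_csbm_graph is just for this example. It assumes λ = sqrt(1 + ε)·sin(φπ/2) with within/between-class edge probabilities c_in/n and c_out/n, where c_in/out = d ± sqrt(d)·λ:

import numpy as np

# Sketch only: samples the graph part of a two-class cSBM with n nodes and
# average degree d; signal strength is controlled by eps and phi.
# Assumes lambda = sqrt(1 + eps) * sin(phi * pi / 2) and edge probabilities
# c_in / n (same class) and c_out / n (different class), c_in/out = d +/- sqrt(d) * lambda.
def sample_csbm_graph(n=5000, d=5, eps=3.25, phi=0.5, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)                  # two balanced classes
    lam = np.sqrt(1 + eps) * np.sin(phi * np.pi / 2)
    c_in, c_out = d + np.sqrt(d) * lam, d - np.sqrt(d) * lam
    rows, cols = np.triu_indices(n, k=1)                 # undirected, no self-loops
    p = np.where(labels[rows] == labels[cols], c_in / n, c_out / n)
    keep = rng.random(p.shape[0]) < p
    return labels, np.stack([rows[keep], cols[keep]])    # edge_index-style array

labels, edge_index = sample_csbm_graph()
print(edge_index.shape[1])  # roughly n * d / 2 = 12500 edges for n=5000, d=5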

@Xiuyu-Li
Author

Xiuyu-Li commented Jun 2, 2021

Hi Jianhao,

Thank you for the quick reply. I used epsilon=3.25 in the experiments, and used the following code to generate the table:

from torch_geometric.utils import remove_self_loops
import torch

def node_homophily(edge_idx, labels, num_nodes):
    # Drop self-loops so they do not count as same-label matches.
    edge_index = remove_self_loops(edge_idx)[0]
    hs = torch.zeros(num_nodes)
    # Out-degree of each node; minlength keeps the shape at num_nodes.
    degs = torch.bincount(edge_index[0, :], minlength=num_nodes).float()
    # 1.0 where an edge joins two nodes with the same label.
    matches = (labels[edge_index[0, :]] == labels[edge_index[1, :]]).float()
    # Per-node fraction of same-label neighbors, averaged over non-isolated nodes.
    hs = hs.scatter_add(0, edge_index[0, :], matches) / degs
    return hs[degs != 0].mean()

which should be consistent with how H(G) was defined.
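
For reference, here is a small sketch of the expected edge homophily under the parameterization I assumed (λ = sqrt(1 + ε)·sin(φπ/2) and c_in/out = d ± sqrt(d)·λ, so E[H(G)] = c_in / (c_in + c_out) = 1/2 + λ / (2·sqrt(d))); this is my reading, not necessarily the paper's exact parameterization, and expected_edge_homophily is just an illustrative helper:

import numpy as np

# Sketch: expected edge homophily under the assumed cSBM parameterization.
def expected_edge_homophily(phi, eps=3.25, d=5):
    lam = np.sqrt(1 + eps) * np.sin(phi * np.pi / 2)
    return 0.5 + lam / (2 * np.sqrt(d))

for phi in (-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5):
    print(f"phi={phi:+.2f}  E[H(G)]={expected_edge_homophily(phi):.3f}")

With ε = 3.25 and d = 5 this comes out to roughly 0.04, 0.07, 0.17, 0.32, 0.50, 0.68, 0.83, i.e. close to the measured values in the table above, so the table itself looks self-consistent.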

@jianhao2016
Owner

Can you show the code for remove_self_loops as well? I am trying to reproduce the values with your function on our tested dataset and on a newly generated one to see if there is any difference.

@Xiuyu-Li
Author

Xiuyu-Li commented Jun 2, 2021

Sure. It is just the torch_geometric function: `from torch_geometric.utils import remove_self_loops`.
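
For completeness, a toy illustration of what that call does (the example tensors are made up, not from the dataset code):

import torch
from torch_geometric.utils import remove_self_loops

# Toy example: the self-loop (2, 2) is dropped before homophily is computed.
edge_index = torch.tensor([[0, 1, 2],
                           [1, 0, 2]])
edge_index, _ = remove_self_loops(edge_index)
print(edge_index)  # tensor([[0, 1], [1, 0]])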

@jianhao2016
Owner

jianhao2016 commented Jun 2, 2021

Hi Xiuyu,

Thank you so much for pointing out this issue! I have tested both your function and our previous function on the cSBM dataset and on other simple graphs, and it turns out you are correct about the homophily scores. There was a small bug in our code when computing the homophily scores (doing division with torch integer tensors), which caused the numbers to be smaller. I have fixed it and got results similar to yours. We will update the values in our paper accordingly. Thanks again for letting us know!
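
To illustrate the kind of issue (a simplified sketch, not our actual code): when the same-label counts and the degrees are integer tensors and the division truncates, every per-node ratio below 1 collapses to 0, which drags the averaged homophily down:

import torch

# Simplified sketch of the pitfall, not the original implementation.
matches_per_node = torch.tensor([3, 4, 2])      # same-label neighbors per node
degs = torch.tensor([4, 5, 6])                  # degrees per node

print(matches_per_node // degs)                 # tensor([0, 0, 0])  -> truncated to zero
print(matches_per_node.float() / degs.float())  # tensor([0.7500, 0.8000, 0.3333])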
