Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to include initial labels in weisfeiler_lehman_subgraph_hashes #6601

Merged
merged 5 commits into from Mar 10, 2024

Conversation

aaronzo
Copy link
Contributor

@aaronzo aaronzo commented Apr 1, 2023

Apologies for the spam folks, I thought I had caught the mistakes in #6598 but had to correct some things further down too - good catch from @dschult in the PR comments.

I've read the whole docstring through a couple of times now and it looks right.

@dschult
Copy link
Member

dschult commented Apr 1, 2023

I think the original description is correct. Each node's label is created by combining the node labels of nodes within i-hops of the node. Since those nodes at distance i have their labels created from nodes at distance i, the node labels are influenced by all nodes within $2i$ hops of the given node. The fact that the doc_string used 2i instead of i in so many places made me suspect that the 2i is correct and not just a typo. My investigation leads me to think the 2i is correct.

What do you think? If it really should be 2i then we should change back the switch in that text from #6598. Other changes in the doc_string may or may not need adjusting too. And maybe we should add enough text to make this aspect of the description more clear.

@aaronzo
Copy link
Contributor Author

aaronzo commented Apr 1, 2023

I think the original description is correct. Each node's label is created by combining the node labels of nodes within i-hops of the node. Since those nodes at distance i have their labels created from nodes at distance i, the node labels are influenced by all nodes within 2i hops of the given node. The fact that the doc_string used 2i instead of i in so many places made me suspect that the 2i is correct and not just a typo. My investigation leads me to think the 2i is correct.

Just for context, I originally wrote the function & docstring, and I think this error is an easy mistake to make - which is probably why I got it wrong the first time around. But I'm quite sure it should be just $i$. Let me try and take multiple stabs at explaining it, and hopefully one of them is useful.

The flaw in your description is that a node label at iteration $i$ is influenced by its immediate neighbors labels on iteration $i-1$, and not in the same iteration.

At each iteration, every node receives information from its immediate neighbors only. Therefore, information cannot possibly traverse the network at a faster rate than 1 edge per iteration.

The algorithm is essentially a message passing algorithm, like pregel or like graph convolutional networks. What we're describing here is the 'receptive field' of the algorithm, and literature like slide 40 in this stanford course http://snap.stanford.edu/class/cs246-2021/slides/20-GNN.pdf establish that K layers = K hop receptive field.

Finally, a practical way of convincing yourself is to override the _hash_label function in the algorithm to lambda u, *_: str(u) - then, instead of the labels being hashed, you will see exactly which initial labels appear in which labels per iteration.

Notice that in the 2 successive iterations, never is a node label more than $i$ distance from the original node present in the path graph

Here's a snippet to show it:

import networkx as nx
import matplotlib.pyplot as plt

nx.algorithms.graph_hashing._hash_label = lambda u, *_: str(u)

G = nx.path_graph(5)
nx.set_node_attributes(G, {u: u for u in G}, "label0")

wl = nx.weisfeiler_lehman_subgraph_hashes(G, node_attr="label0")
nx.set_node_attributes(G, {u: wl[u][0] for u in G}, "label1")
nx.set_node_attributes(G, {u: wl[u][1] for u in G}, "label2")

_, axs = plt.subplots(3, figsize=(8, 20))
pos = nx.spring_layout(G)
nx.draw(G, pos=pos, ax=axs[0], labels=nx.get_node_attributes(G, "label0"))
nx.draw(G, pos=pos, ax=axs[1], labels=nx.get_node_attributes(G, "label1"))
nx.draw(G, pos=pos, ax=axs[2], labels=nx.get_node_attributes(G, "label2"))
plt.show()

image

@dschult
Copy link
Member

dschult commented Apr 3, 2023

So, you are saying that each iteration only collects the labels from the neighbors... and not the neighbors-neighbors when i=2 and the nbrs-nbrs-nbrs when i=3... etc. OK. clearly I misunderstood.

Another change that would be helpful is to go through the doc_string and use i for the iteration number everywhere. I think that k and n are used where i would be more consistent. And n is already being used for a node so shouldn't be used for the iteration count. (and maybe I've got this wrong too and they should be k, n, i but it looks strange to me.)

@aaronzo
Copy link
Contributor Author

aaronzo commented Apr 8, 2023

I think I caught all the cases @dschult - I changed $n$ to $u$ as well since it seems more common.

@dschult
Copy link
Member

dschult commented Mar 4, 2024

Sorry this has taken so long to get back to -- and Thanks for responding so quickly to the new Issue! :)

It looks like this PR is now conflicting with main in one of the doc strings. Can you merge/rebase the main branch into this one? Go ahead and make the changes needed to fix #7330 and flag me -- I will try to make it a fast review this time. :)

@aaronzo aaronzo force-pushed the further-wl-docstring-improvements branch from 9795796 to 055f243 Compare March 6, 2024 22:47
Comment on lines 177 to 183
Lists of subgraph hashes are sorted in increasing order of depth from
their root node, with the hash at index i corresponding to a subgraph
of nodes at most i edges distance from u. Thus, each list will contain
``iterations + 1`` elements - a hash for a subgraph at each depth, and
additionally a hash of the initial node label (or equivalently a
subgraph of depth 0)
`iterations` elements - a hash for a subgraph at each depth. If
`include_initial_labels` is set to `True`, each list will additionally
have contain a hash of the initial node label (or equivalently a
subgraph of depth 0) prepended, totalling `iterations + 1` elements.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I be using latex notation in this paragraph for i and u?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general guide I use is "test is better than latex except when it isn't". So symbols like i and u should be kept in text rather than LaTeX. Mathematical expressions can become unreadable in text format so that's when we should use LaTeX.

@aaronzo
Copy link
Contributor Author

aaronzo commented Mar 6, 2024

@dschult went ahead and implemented the changes discussed in #7330 as well as fixing the docstring. I'm also bit confused whether I should be using single or double backticks in the docstring.

@aaronzo aaronzo changed the title further docstring improvements to weisfeiler_lehman_subgraph_hashes Option to include initial labels in weisfeiler_lehman_subgraph_hashes Mar 6, 2024
Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @aaronzo , just a minor nit on backticks. FWIW the rule of thumb re: backticks is single backticks for parameters and link targets, double backticks for inline literals.

networkx/algorithms/graph_hashing.py Outdated Show resolved Hide resolved
@kalekundert
Copy link

The changes relating to #7330 look good to me. Thanks for getting to this so quickly!

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aaronzo and @kalekundert for this! And @rossbar for review

@dschult dschult merged commit ef5f9ac into networkx:main Mar 10, 2024
41 checks passed
@jarrodmillman jarrodmillman added this to the 3.3 milestone Mar 10, 2024
@aaronzo aaronzo deleted the further-wl-docstring-improvements branch March 11, 2024 12:13
cvanelteren pushed a commit to cvanelteren/networkx that referenced this pull request Apr 22, 2024
…s` (networkx#6601)

* docstring

* change n -> u for node in docstring

* fix issue 7330

* Update networkx/algorithms/graph_hashing.py

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>

---------

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants