Option to include initial labels in `weisfeiler_lehman_subgraph_hashes` #6601

aaronzo · 2023-04-01T19:06:36Z

Apologies for the spam folks, I thought I had caught the mistakes in #6598 but had to correct some things further down too - good catch from @dschult in the PR comments.

I've read the whole docstring through a couple of times now and it looks right.

dschult · 2023-04-01T21:26:11Z

I think the original description is correct. Each node's label is created by combining the node labels of nodes within i-hops of the node. Since those nodes at distance i have their labels created from nodes at distance i, the node labels are influenced by all nodes within $2i$ hops of the given node. The fact that the doc_string used 2i instead of i in so many places made me suspect that the 2i is correct and not just a typo. My investigation leads me to think the 2i is correct.

What do you think? If it really should be 2i then we should change back the switch in that text from #6598. Other changes in the doc_string may or may not need adjusting too. And maybe we should add enough text to make this aspect of the description more clear.

aaronzo · 2023-04-01T21:42:34Z

> I think the original description is correct. Each node's label is created by combining the node labels of nodes within i-hops of the node. Since those nodes at distance i have their labels created from nodes at distance i, the node labels are influenced by all nodes within 2i hops of the given node. The fact that the doc_string used 2i instead of i in so many places made me suspect that the 2i is correct and not just a typo. My investigation leads me to think the 2i is correct.

Just for context, I originally wrote the function & docstring, and I think this error is an easy mistake to make - which is probably why I got it wrong the first time around. But I'm quite sure it should be just $i$. Let me try and take multiple stabs at explaining it, and hopefully one of them is useful.

The flaw in your description is that a node's label at iteration $i$ is influenced by its immediate neighbors' labels from iteration $i-1$, not from the same iteration.

At each iteration, every node receives information from its immediate neighbors only. Therefore, information cannot possibly traverse the network at a faster rate than 1 edge per iteration.
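
One way to write this argument down, as a sketch: let $R_i(u)$ be the set of nodes whose initial labels can influence the label of $u$ after iteration $i$, $N(u)$ the neighbors of $u$, and $d(u, v)$ the hop distance. These symbols are introduced here for illustration and do not appear in the docstring. Induction on $i$ then gives an $i$-hop neighborhood rather than a $2i$-hop one:

```latex
R_0(u) = \{u\}, \qquad
R_i(u) = R_{i-1}(u) \cup \bigcup_{v \in N(u)} R_{i-1}(v)
\quad\Longrightarrow\quad
R_i(u) = \{\, v : d(u, v) \le i \,\}
```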

The algorithm is essentially a message-passing algorithm, like Pregel or graph convolutional networks. What we're describing here is the 'receptive field' of the algorithm, and literature like slide 40 of this Stanford course http://snap.stanford.edu/class/cs246-2021/slides/20-GNN.pdf establishes that K layers = K-hop receptive field.

Finally, a practical way of convincing yourself is to override the `_hash_label` function in the algorithm with `lambda u, *_: str(u)`. Then, instead of the labels being hashed, you will see exactly which initial labels appear in which labels per iteration.

Notice that across the 2 successive iterations, the label of a node in the path graph never contains the initial label of a node more than $i$ edges away.

Here's a snippet to show it:

import networkx as nx
import matplotlib.pyplot as plt

# Replace the hash with the identity so labels stay human-readable.
nx.algorithms.graph_hashing._hash_label = lambda u, *_: str(u)

# Path graph 0 - 1 - 2 - 3 - 4, with each node's own id as its initial label.
G = nx.path_graph(5)
nx.set_node_attributes(G, {u: u for u in G}, "label0")

# wl[u][0] is the label after iteration 1, wl[u][1] the label after iteration 2.
wl = nx.weisfeiler_lehman_subgraph_hashes(G, node_attr="label0")
nx.set_node_attributes(G, {u: wl[u][0] for u in G}, "label1")
nx.set_node_attributes(G, {u: wl[u][1] for u in G}, "label2")

# One plot per iteration, sharing a single layout so the nodes line up.
_, axs = plt.subplots(3, figsize=(8, 20))
pos = nx.spring_layout(G)
nx.draw(G, pos=pos, ax=axs[0], labels=nx.get_node_attributes(G, "label0"))
nx.draw(G, pos=pos, ax=axs[1], labels=nx.get_node_attributes(G, "label1"))
nx.draw(G, pos=pos, ax=axs[2], labels=nx.get_node_attributes(G, "label2"))
plt.show()
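
For anyone who wants the same check without plotting or monkeypatching, here is a minimal dependency-free sketch. It does not call networkx at all: the adjacency dict and the helper `receptive_fields` are hypothetical names made up for this illustration, and the update rule is an assumption that mirrors one aggregation step (own label plus the neighbors' labels from the previous iteration).

```python
# Path graph 0 - 1 - 2 - 3 - 4 as an adjacency dict (same shape as nx.path_graph(5)).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}


def receptive_fields(adj, iterations):
    """Return, per iteration, the set of original nodes feeding each node's label.

    Hypothetical helper for illustration only; not part of networkx.
    """
    fields = {u: {u} for u in adj}  # iteration 0: a label only knows its own node
    history = [fields]
    for _ in range(iterations):
        # Each node merges its own field with its neighbors' fields from the
        # PREVIOUS iteration, mirroring one aggregation step.
        fields = {u: fields[u].union(*(fields[v] for v in adj[u])) for u in adj}
        history.append(fields)
    return history


history = receptive_fields(adj, iterations=2)
for i, fields in enumerate(history):
    print(i, {u: sorted(s) for u, s in fields.items()})
```

After two iterations node 0 has only seen nodes 0, 1 and 2, so nothing further than 2 edges away has reached it.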

dschult · 2023-04-03T02:51:09Z

So, you are saying that each iteration only collects the labels from the neighbors... and not the neighbors-neighbors when i=2 and the nbrs-nbrs-nbrs when i=3... etc. OK. clearly I misunderstood.

Another change that would be helpful is to go through the doc_string and use i for the iteration number everywhere. I think that k and n are used where i would be more consistent. And n is already being used for a node so shouldn't be used for the iteration count. (and maybe I've got this wrong too and they should be k, n, i but it looks strange to me.)

aaronzo · 2023-04-08T23:42:52Z

I think I caught all the cases @dschult - I changed $n$ to $u$ as well since it seems more common.

dschult · 2024-03-04T03:51:37Z

Sorry this has taken so long to get back to -- and Thanks for responding so quickly to the new Issue! :)

It looks like this PR is now conflicting with main in one of the doc strings. Can you merge/rebase the main branch into this one? Go ahead and make the changes needed to fix #7330 and flag me -- I will try to make it a fast review this time. :)

aaronzo · 2024-03-06T22:48:41Z

networkx/algorithms/graph_hashing.py

    Lists of subgraph hashes are sorted in increasing order of depth from
    their root node, with the hash at index i corresponding to a subgraph
    of nodes at most i edges distance from u. Thus, each list will contain
-    ``iterations + 1`` elements - a hash for a subgraph at each depth, and
-    additionally a hash of the initial node label (or equivalently a
-    subgraph of depth 0)
+    `iterations` elements - a hash for a subgraph at each depth. If
+    `include_initial_labels` is set to `True`, each list will additionally
+    have contain a hash of the initial node label (or equivalently a
+    subgraph of depth 0) prepended, totalling `iterations + 1` elements.


Should I be using LaTeX notation in this paragraph for i and u?

The general guide I use is "text is better than LaTeX except when it isn't". So symbols like i and u should be kept in text rather than LaTeX. Mathematical expressions can become unreadable in text format, so that's when we should use LaTeX.
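
The list lengths described in the docstring change above can be sketched with a toy reimplementation. This is not networkx's code: `wl_subgraph_hashes` and `hash_label` are hypothetical names, the graph is a plain adjacency dict, and the use of BLAKE2b with a 16-byte digest is an assumption made for the example. Only the parameter names `iterations` and `include_initial_labels` are taken from the docstring.

```python
from hashlib import blake2b


def hash_label(label, digest_size=16):
    # Hypothetical helper: hash a string label with BLAKE2b.
    return blake2b(label.encode("ascii"), digest_size=digest_size).hexdigest()


def wl_subgraph_hashes(adj, labels, iterations=3, include_initial_labels=False):
    """Toy WL subgraph hashing over an adjacency dict (illustration only)."""
    current = {u: str(labels[u]) for u in adj}
    if include_initial_labels:
        # Depth-0 entry: the hash of the node's own initial label, prepended.
        hashes = {u: [hash_label(current[u])] for u in adj}
    else:
        hashes = {u: [] for u in adj}
    for _ in range(iterations):
        new = {}
        for u in adj:
            # Aggregate the node's label with its neighbors' sorted labels.
            nbr_labels = sorted(current[v] for v in adj[u])
            new[u] = hash_label(current[u] + "".join(nbr_labels))
            hashes[u].append(new[u])
        current = new
    return hashes


# Path graph 0 - 1 - 2 - 3 - 4 with each node's degree as its initial label.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
degrees = {u: len(nbrs) for u, nbrs in adj.items()}

without_initial = wl_subgraph_hashes(adj, degrees, iterations=3)
with_initial = wl_subgraph_hashes(
    adj, degrees, iterations=3, include_initial_labels=True
)

print(len(without_initial[2]), len(with_initial[2]))
```

In this sketch the flag only prepends one entry: dropping the first element of each list in `with_initial` gives back `without_initial`.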

aaronzo · 2024-03-06T22:51:26Z

@dschult I went ahead and implemented the changes discussed in #7330 as well as fixing the docstring. I'm also a bit confused about whether I should be using single or double backticks in the docstring.

rossbar

LGTM, thanks @aaronzo, just a minor nit on backticks. FWIW the rule of thumb re: backticks is single backticks for parameters and link targets, double backticks for inline literals.


kalekundert · 2024-03-07T13:13:59Z

The changes relating to #7330 look good to me. Thanks for getting to this so quickly!


dschult

Thanks @aaronzo and @kalekundert for this! And @rossbar for review

…s` (networkx#6601)
* docstring
* change n -> u for node in docstring
* fix issue 7330
* Update networkx/algorithms/graph_hashing.py

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>

aaronzo mentioned this pull request Apr 1, 2023

corrections to docstring of weisfeiler_lehman_subgraph_hashes #6598

Merged

aaronzo mentioned this pull request Mar 3, 2024

nx.weisfeiler_lehman_subgraph_hashes() doesn't return initial node label hashes #7330

Closed

aaronzo and others added 3 commits March 6, 2024 22:45

docstring

1e0d0bc

change n -> u for node in docstring

accdf35

fix issue 7330

055f243

aaronzo force-pushed the further-wl-docstring-improvements branch from 9795796 to 055f243 Compare March 6, 2024 22:47

aaronzo commented Mar 6, 2024


aaronzo changed the title ~~further docstring improvements to weisfeiler_lehman_subgraph_hashes~~ Option to include initial labels in weisfeiler_lehman_subgraph_hashes Mar 6, 2024

rossbar approved these changes Mar 7, 2024


dschult added type: Enhancements type: Documentation labels Mar 7, 2024

aaronzo and others added 2 commits March 8, 2024 13:32

Merge branch 'main' into further-wl-docstring-improvements

90d588b

Update networkx/algorithms/graph_hashing.py

3fa3e4d

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>

dschult approved these changes Mar 10, 2024


dschult merged commit ef5f9ac into networkx:main Mar 10, 2024
41 checks passed

jarrodmillman added this to the 3.3 milestone Mar 10, 2024

aaronzo deleted the further-wl-docstring-improvements branch March 11, 2024 12:13
