# The Weisfeiler-Lehman Isomorphism Test

The Weisfeiler-Lehman Isomorphism Test, also called WL-Test, is a classical result from the graph theory. It is an heuristic to find out if two graphs are isomorphic. The problem of the graph isomorphism does not have a definitive solution in polinomial time yet, making some people say it may be NP-Complete. The WL-Test offers one alternative that allows us to estimate it.

However, since this is an heuristic, the test is not perfect. Therefore, it fail in some simple cases and because of that new versions of this test were proposed.

In this notebook we will explore a little about the theory of the test and implement it in its classical form and also some newer versions.

In [3]:
!pip3 install networkx

Defaulting to user installation because normal site-packages is not writeable
Collecting networkx
  Downloading networkx-3.4.2-py3-none-any.whl (1.7 MB)
[2K     [38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.7/1.7 MB[0m [31m5.1 MB/s[0m eta [36m0:00:00[0m0m eta [36m0:00:01[0m[36m0:00:01[0m
[0mInstalling collected packages: networkx
Successfully installed networkx-3.4.2


In [4]:
import networkx as nx
def load_graph(file):
    G = nx.read_graphml(file).to_undirected()
    G.graph['phrase'] = G.graph.get('phrase', 'No phrase found')
    return G

# New implementation (gpt)

In [None]:
from __future__ import annotations
from collections import defaultdict
from collections import Counter

# ---------- WL over trees ----------
def wl_tree_signature(G: nx.Graph) -> str:
    """
    1-WL color refinement on an (unlabeled) tree G.
    Returns a canonical signature string usable as a dict key.
    """
    # Relabel to 0..n-1 for array-friendly processing
    G = nx.relabel.convert_node_labels_to_integers(G, ordering="sorted")  # keeps attrs by default
    n = G.number_of_nodes()

    # adjacency as list of lists for speed
    adj = [list(G.neighbors(v)) for v in range(n)]  # neighbors() yields iterator.  [oai_citation:6‡networkx.org](https://networkx.org/documentation/stable/reference/classes/generated/networkx.Graph.neighbors.html?utm_source=chatgpt.com)
    colors = [0] * n
    tmp = [None] * n
    i = 0
    signature = ""
    while True:      
        
        for v in range(n):
            neigh_cols = sorted(colors[u] for u in adj[v])
            tmp[v] = str(colors[v]) +"_"+ str(tuple(neigh_cols))
        # compress tuples → small ints
        mapping = {}
        next_c = 0
        new_colors = [0] * n
        for v in range(n):
            key = tmp[v]
            #print(f"key is {key}")
            if key not in mapping:
                mapping[key] = next_c
                next_c += 1
            new_colors[v] = mapping[key]
        if new_colors == colors:
            break
        colors = new_colors
    hist = defaultdict(int)
    
    for c in tmp:
        hist[c] += 1
    signature += "|".join(f"{c}:{hist[c]}" for c in sorted(hist))
    return signature


In [30]:
from pathlib import Path
GRAPH_DIR = "./UD_Spanish-GSD"
# --------- CONFIG ---------
FOLDER = Path(GRAPH_DIR)
OUT_CSV = Path("isomorphic_groups_distances_>10.csv")

In [31]:
import copy
from tqdm import tqdm

def get_graph_dataset(folder):
    
    iso_groups = defaultdict(set)

    for path in tqdm(folder.rglob("*.graphml")):
        G = load_graph(path)
        if G.number_of_nodes() < 10:
            continue
        sig = wl_tree_signature(G)
        iso_groups[sig].add(str(path))

        for node in G.nodes:

            copy_G = copy.deepcopy(G)
            copy_G.remove_node(node)
            sig = wl_tree_signature(copy_G)
            iso_groups[sig].add(f"({node})-" + str(path))
            

    return iso_groups

In [32]:
groups = get_graph_dataset(FOLDER)

# --------- Write result ---------
with OUT_CSV.open("w", encoding="utf-8") as f:
    f.write("signature;count;files\n")
    for sig, files in groups.items():
        f.write(f"{sig};{len(files)};\"{'|'.join(files)}\"\n")

print(f"Done. Wrote {len(groups)} isomorphism classes to {OUT_CSV}")

14187it [01:38, 143.97it/s]


Done. Wrote 296559 isomorphism classes to isomorphic_groups_distances_>10.csv


In [33]:
import pandas as pd

df = pd.read_csv("isomorphic_groups_distances_>10.csv", sep=";")
df


Unnamed: 0,signature,count,files
0,"0_(7,):2|10_(11,):1|11_(1, 9, 10):1|1_(2, 2, 2...",1,UD_Spanish-GSD/es_gsd-ud-train_12948.graphml
1,"0_():2|10_(1, 8, 9):1|1_(2, 2, 2, 4, 6, 10):1|...",1,(2)-UD_Spanish-GSD/es_gsd-ud-train_12948.graphml
2,"0_(1, 1, 1, 3, 5, 11):1|10_(11,):1|11_(0, 9, 1...",1,(1)-UD_Spanish-GSD/es_gsd-ud-train_12948.graphml
3,"0_(7,):2|10_(1, 9):1|1_(2, 2, 2, 4, 6, 10):1|2...",1,(4)-UD_Spanish-GSD/es_gsd-ud-train_12948.graphml
4,"0_(7,):1|10_(11,):1|11_(1, 9, 10):1|1_(2, 2, 2...",1,(19)-UD_Spanish-GSD/es_gsd-ud-train_12948.graphml
...,...,...,...
296554,"0_(14,):4|10_(8, 9, 12):1|11_(12,):2|12_(10, 1...",1,(17)-UD_Spanish-GSD/es_gsd-ud-train_6903.graphml
296555,"0_(14,):4|10_(9, 12):1|11_(12,):2|12_(10, 11, ...",1,(21)-UD_Spanish-GSD/es_gsd-ud-train_6903.graphml
296556,"0_(14,):4|10_(11,):1|11_(9, 10):1|12_():2|13_(...",1,(25)-UD_Spanish-GSD/es_gsd-ud-train_6903.graphml
296557,"0_(13,):4|10_(11,):2|11_(9, 10, 10):1|12_(14,)...",1,(16)-UD_Spanish-GSD/es_gsd-ud-train_6903.graphml


In [50]:
signature = df[df["count"]==10]["signature"][134907]
df[df["count"]==10]#["signature"][189848]

Unnamed: 0,signature,count,files
2287,"0_(0,):2|1_():2|2_(3,):1|3_(2, 5):1|4_(5,):2|5...",10,(3)-UD_Spanish-GSD/es_gsd-ud-train_2718.graphm...
2292,"0_(2,):1|1_(3,):2|2_(0, 3):1|3_(1, 1, 2, 4):1|...",10,(7)-UD_Spanish-GSD/es_gsd-ud-train_9720.graphm...
3389,"0_(4, 5):1|1_(2, 2, 2, 6):1|2_(1,):3|3_(5,):3|...",10,(6)-UD_Spanish-GSD/es_gsd-ud-train_11997.graph...
12135,"0_(12,):3|10_(9, 9, 11):3|11_(6, 10, 10, 10, 1...",10,(22)-UD_Spanish-GSD/es_gsd-ud-train_10635.grap...
12146,"0_(12,):3|10_(9, 9, 11):3|11_(6, 10, 10, 10, 1...",10,(8)-UD_Spanish-GSD/es_gsd-ud-train_13243.graph...
...,...,...,...
134907,"0_(1,):6|1_(0, 0, 0, 3):2|2_(3,):1|3_(1, 1, 2,...",10,(5)-UD_Spanish-GSD/es_gsd-ud-train_6273.graphm...
146151,"0_(13, 16):1|10_(11,):8|11_(1, 10, 10):4|12_(1...",10,(27)-UD_Spanish-GSD/es_gsd-ud-train_13937.grap...
156842,"0_(11,):4|10_(6, 9, 13):1|11_(0, 0, 0, 0, 19):...",10,(43)-UD_Spanish-GSD/es_gsd-ud-train_11917.grap...
160833,"0_(11,):2|10_(12,):4|11_(0, 28):2|12_(9, 10):4...",10,(58)-UD_Spanish-GSD/es_gsd-ud-train_7953.graph...


In [51]:
df[df["signature"] == signature]["files"].values[0].split("|")

['(5)-UD_Spanish-GSD/es_gsd-ud-train_6273.graphml',
 '(4)-UD_Spanish-GSD/es_gsd-ud-train_6863.graphml',
 '(4)-UD_Spanish-GSD/es_gsd-ud-train_6273.graphml',
 '(1)-UD_Spanish-GSD/es_gsd-ud-train_7887.graphml',
 '(2)-UD_Spanish-GSD/es_gsd-ud-train_7897.graphml',
 '(1)-UD_Spanish-GSD/es_gsd-ud-train_3901.graphml',
 '(6)-UD_Spanish-GSD/es_gsd-ud-train_1557.graphml',
 '(5)-UD_Spanish-GSD/es_gsd-ud-train_6863.graphml',
 '(3)-UD_Spanish-GSD/es_gsd-ud-train_2787.graphml',
 '(7)-UD_Spanish-GSD/es_gsd-ud-train_1557.graphml']