# The Weisfeiler-Lehman Isomorphism Test

The Weisfeiler-Lehman isomorphism test, also called the WL test, is a classical result from graph theory. It is a heuristic for deciding whether two graphs are isomorphic. No polynomial-time algorithm is known for graph isomorphism, and the problem is not known to be NP-complete either. The WL test offers an efficient alternative: if it assigns the two graphs different colorings, they are certainly not isomorphic; if the colorings agree, the graphs may or may not be isomorphic.

However, since it is a heuristic, the test is not perfect. It fails on some simple cases, and for that reason newer versions of the test have been proposed.
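
One classic failing case can be checked directly. The sketch below is a minimal, plain-Python color refinement (the helper names `color_refinement` and `cycle` are illustrative, not part of this notebook). It compares a 6-cycle with two disjoint triangles: every node in both graphs has degree 2, so the refinement cannot tell them apart even though the graphs are not isomorphic.

```python
from collections import Counter

def color_refinement(adj):
    """Run 1-WL color refinement and return the histogram of final colors.

    `adj` maps each node to a list of its neighbors (undirected graph).
    Colors are nested tuples built only from neighborhoods, so they do not
    depend on how the nodes are named.
    """
    colors = {v: () for v in adj}   # every node starts with the same color
    for _ in range(len(adj)):       # the partition stabilizes within n rounds
        new = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
               for v in adj}
        # Refinement only splits classes, so an unchanged class count
        # means the partition is stable.
        stable = len(set(new.values())) == len(set(colors.values()))
        colors = new
        if stable:
            break
    return Counter(colors.values())

def cycle(nodes):
    """Adjacency lists of a cycle through `nodes`, in the given order."""
    n = len(nodes)
    return {nodes[i]: [nodes[i - 1], nodes[(i + 1) % n]] for i in range(n)}

hexagon = cycle([0, 1, 2, 3, 4, 5])
two_triangles = {**cycle([0, 1, 2]), **cycle([3, 4, 5])}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

same = color_refinement(hexagon) == color_refinement(two_triangles)
print(same)                                                   # True
print(color_refinement(hexagon) == color_refinement(path))    # False
```

The nested-tuple colors grow quickly with each round, so this form is only meant for small illustrative graphs.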

In this notebook we explore some of the theory behind the test and implement it in its classical form, along with some newer versions.

In [3]:
!pip3 install networkx

Defaulting to user installation because normal site-packages is not writeable
Collecting networkx
  Downloading networkx-3.4.2-py3-none-any.whl (1.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 5.1 MB/s eta 0:00:00
Installing collected packages: networkx
Successfully installed networkx-3.4.2


In [52]:
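import networkx as nx

def load_graph(file):
    """Read a GraphML file as an undirected simple graph."""
    # nx.Graph(...) drops edge direction and parallel edges, if the file has any.
    G = nx.Graph(nx.read_graphml(file))
    # Make sure every graph carries a 'phrase' attribute.
    G.graph['phrase'] = G.graph.get('phrase', 'No phrase found')
    return G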

# New implementation (gpt)

In [56]:
from __future__ import annotations
from pathlib import Path
from collections import defaultdict
from collections import Counter

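def wl_signature_strong(G: nx.Graph) -> str:
    """Return a signature string based on 1-WL color refinement.

    Caveat: color ids are assigned in first-seen node order, so the result
    can depend on how ties between equal-degree nodes are ordered.
    """
    # Relabel nodes as 0..n-1, highest degree first.
    G = nx.convert_node_labels_to_integers(G, ordering="decreasing degree")
    n = G.number_of_nodes()
    adj = [list(G.neighbors(v)) for v in range(n)]
    colors = [0]*n   # every node starts with the same color
    tmp = [None]*n

    while True:
        # Key of a node: its own color plus the sorted colors of its neighbors.
        for v in range(n):
            neigh_cols = sorted(colors[u] for u in adj[v])
            tmp[v] = (colors[v], tuple(neigh_cols))
        # Compress the keys to small integers, numbered in first-seen order.
        mapping, next_c = {}, 0
        new_colors = [0]*n
        for v in range(n):
            key = tmp[v]
            if key not in mapping:
                mapping[key] = next_c
                next_c += 1
            new_colors[v] = mapping[key]
        # Stop once a round no longer changes the coloring.
        if new_colors == colors:
            break
        colors = new_colors

    # Ordered list of final colors, indexed by relabeled node
    sig_vec = ','.join(map(str, colors))

    # Histogram of final colors, written as color:count pairs
    hist = Counter(colors)
    sig_hist = '|'.join(f'{c}:{hist[c]}' for c in sorted(hist))
    return f"{len(colors)}#{sig_vec}#{sig_hist}"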
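
The signature above numbers colors in the order nodes are visited, so two isomorphic graphs whose equal-degree nodes come in a different order may end up with different strings. Below is a minimal sketch of an order-invariant variant, assuming the graph is passed as a plain adjacency mapping. The name `wl_signature_canonical` and the SHA-256 digest are illustrative choices, not part of the notebook.

```python
from collections import Counter
import hashlib

def wl_signature_canonical(adj):
    """Order-invariant 1-WL signature for an undirected graph.

    `adj` maps each node to an iterable of its neighbors. Hypothetical
    variant for illustration; it is not the notebook's wl_signature_strong.
    """
    colors = {v: 0 for v in adj}   # uniform initial color
    rounds = []                    # one sorted key histogram per round
    while True:
        # Key of a node: its own color plus the sorted colors of its neighbors.
        keys = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        hist = Counter(keys.values())
        rounds.append(sorted(hist.items()))
        # Refinement only splits classes, so an unchanged class count
        # means the partition is stable.
        if len(hist) == len(set(colors.values())):
            break
        # Rank the distinct keys in sorted order: the new color ids depend
        # only on the keys, never on node names or node order.
        rank = {key: i for i, key in enumerate(sorted(hist))}
        colors = {v: rank[keys[v]] for v in adj}
    digest = hashlib.sha256(repr(rounds).encode("utf-8")).hexdigest()
    return f"{len(adj)}#{digest}"

# The same 4-node path under two different node namings and orderings.
path_a = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
path_b = {"z": ["x", "y"], "w": ["y"], "y": ["w", "z"], "x": ["z"]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wl_signature_canonical(path_a) == wl_signature_canonical(path_b))  # True
print(wl_signature_canonical(path_a) == wl_signature_canonical(star))    # False
```

For a networkx graph `G`, the adjacency mapping could be built as `{v: list(G[v]) for v in G}`. Equal signatures still only mean that the two graphs are indistinguishable by 1-WL, not that they are isomorphic.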


In [57]:
from pathlib import Path

# --------- CONFIG ---------
GRAPH_DIR = "./UD_Spanish-GSD"
FOLDER = Path(GRAPH_DIR)
OUT_CSV = Path("isomorphic_groups3.csv")

In [58]:
from tqdm import tqdm  # progress bar for long operations

groups = defaultdict(list)

for path in tqdm(FOLDER.rglob("*.graphml")):
    try:
        G = load_graph(path)
    except Exception as e:
        print(f"[WARN] Could not read {path}: {e}")
        continue

    sig = wl_signature_strong(G)
    groups[sig].append(str(path))

# --------- Write result ---------
with OUT_CSV.open("w", encoding="utf-8") as f:
    f.write("signature;count;files\n")
    for sig, files in groups.items():
        f.write(f"{sig};{len(files)};\"{'|'.join(files)}\"\n")

print(f"Done. Wrote {len(groups)} isomorphism classes to {OUT_CSV}")

14187it [00:07, 1960.68it/s]

Done. Wrote 12336 isomorphism classes to isomorphic_groups3.csv





In [59]:
import pandas as pd

df = pd.read_csv("isomorphic_groups3.csv", sep=";")
df


Unnamed: 0,signature,count,files
0,"9#0,1,2,3,3,3,2,2,2#0:1|1:1|2:4|3:3",5,UD_Spanish-GSD/es_gsd-ud-train_7825.graphml|UD...
1,"19#0,1,2,3,4,5,6,6,7,8,8,9,10,10,10,11,11,6,9#...",1,UD_Spanish-GSD/es_gsd-ud-train_12948.graphml
2,"24#0,1,2,3,4,5,6,7,8,9,9,10,10,11,12,13,11,13,...",1,UD_Spanish-GSD/es_gsd-ud-train_9745.graphml
3,"37#0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,14,15,16...",1,UD_Spanish-GSD/es_gsd-ud-train_3290.graphml
4,"50#0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,15,16...",1,UD_Spanish-GSD/es_gsd-ud-train_1588.graphml
...,...,...,...
12331,"34#0,1,2,3,4,5,6,7,8,9,10,11,12,12,12,12,12,13...",1,UD_Spanish-GSD/es_gsd-ud-train_7590.graphml
12332,"29#0,1,2,3,4,5,6,7,8,9,10,11,10,12,13,14,14,14...",1,UD_Spanish-GSD/es_gsd-ud-train_11797.graphml
12333,"12#0,1,2,3,4,5,6,7,8,7,5,8#0:1|1:1|2:1|3:1|4:1...",1,UD_Spanish-GSD/es_gsd-ud-train_8663.graphml
12334,"15#0,1,2,3,4,5,6,7,8,8,8,9,5,5,8#0:1|1:1|2:1|3...",1,UD_Spanish-GSD/es_gsd-ud-train_1945.graphml


In [60]:
df[df["signature"] == "0:2|1:1|2:1|3:1|4:1|5:1|6:1|7:1"]

Unnamed: 0,signature,count,files
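
The empty result is expected: the strings written by `wl_signature_strong` have the form `n#colors#histogram`, so a histogram on its own never equals a full signature. A minimal sketch of comparing only the histogram field follows; the helper name `histogram_part` is hypothetical, and the sample string is the first row of the table shown earlier.

```python
def histogram_part(signature: str) -> str:
    """Return the text after the last '#', i.e. the color histogram field."""
    return signature.rsplit("#", 1)[-1]

full = "9#0,1,2,3,3,3,2,2,2#0:1|1:1|2:4|3:3"   # first row of the table above

print(histogram_part(full))          # 0:1|1:1|2:4|3:3
print(full == "0:1|1:1|2:4|3:3")     # False: the full string has two more fields

# With the DataFrame `df` loaded above, the same idea would read:
# df[df["signature"].map(histogram_part) == "0:2|1:1|2:1|3:1|4:1|5:1|6:1|7:1"]
```

Because the color ids in the histogram come from the same first-seen numbering as the rest of the signature, matching on this field alone inherits the same sensitivity to node order.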
