# The Weisfeiler-Lehman Isomorphism Test

The Weisfeiler-Lehman Isomorphism Test, also called the WL-Test, is a classical result from graph theory. It is a heuristic for deciding whether two graphs are isomorphic. No polynomial-time algorithm for graph isomorphism is known; the problem is in NP, but it is not known to be NP-complete either. The WL-Test offers an efficient alternative: if it tells two graphs apart, they are certainly not isomorphic, and if it does not, they may or may not be.

However, since it is a heuristic, the test is not perfect: it fails in some simple cases, and for that reason newer versions of the test have been proposed.
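
A standard example of such a failure is a 6-cycle versus two disjoint triangles: every node has degree 2 in both graphs, so color refinement never splits any color class. The sketch below is a minimal pure-Python illustration, not the implementation used later in this notebook. The helper name `wl_indistinguishable` and the adjacency-dict inputs are assumptions made for the example.

```python
from collections import Counter

def wl_indistinguishable(adj_a, adj_b):
    """Return True if 1-WL color refinement cannot tell the two graphs apart.

    Both graphs are refined together so that they share one color palette.
    Each argument maps a node to the list of its neighbors.
    """
    # Tag nodes with their graph so the two node sets never collide.
    adj = {("a", v): [("a", u) for u in nbrs] for v, nbrs in adj_a.items()}
    adj.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in adj_b.items()})

    colors = {v: 0 for v in adj}
    for _ in range(len(adj)):
        # Key of a node: its own color plus the sorted multiset of neighbor colors.
        keys = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        palette = {k: i for i, k in enumerate(sorted(set(keys.values())))}
        new_colors = {v: palette[keys[v]] for v in adj}
        if len(set(new_colors.values())) == len(set(colors.values())):
            break  # no class was split, so the partition is stable
        colors = new_colors

    # Compare the color histograms of the two graphs.
    hist_a = Counter(c for (g, _), c in colors.items() if g == "a")
    hist_b = Counter(c for (g, _), c in colors.items() if g == "b")
    return hist_a == hist_b

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path6 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

print(wl_indistinguishable(cycle6, two_triangles))  # True, although the graphs are not isomorphic
print(wl_indistinguishable(cycle6, path6))          # False, the path's endpoints have degree 1
```

A `True` result only means the test found no difference; a `False` result is a proof that the graphs are not isomorphic.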

In this notebook we explore a little of the theory behind the test and implement it in its classical form, along with some newer versions.

In [3]:
!pip3 install networkx

Defaulting to user installation because normal site-packages is not writeable
Collecting networkx
  Downloading networkx-3.4.2-py3-none-any.whl (1.7 MB)
Installing collected packages: networkx
Successfully installed networkx-3.4.2


In [2]:
import networkx as nx
def load_graph(file):
    G = nx.read_graphml(file).to_undirected()
    G.graph['phrase'] = G.graph.get('phrase', 'No phrase found')
    return G

# New implementation (gpt)

In [9]:
from __future__ import annotations
from pathlib import Path
from collections import defaultdict
from collections import Counter
import copy

# ---------- WL over trees ----------
def wl_tree_signature(G: nx.Graph) -> str:
    """
    1-WL color refinement on an (unlabeled) tree G.
    Returns a signature string usable as a dict key.
    """
    # Relabel to 0..n-1 for array-friendly processing
    G = nx.relabel.convert_node_labels_to_integers(G, ordering="sorted")  # keeps attrs by default
    n = G.number_of_nodes()

    # adjacency as list of lists for speed
    adj = [list(G.neighbors(v)) for v in range(n)]  # neighbors() yields an iterator
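    colors = [0] * n   # every node starts with the same color
    tmp = [None] * n   # refinement key of each node in the current round
    while True:
        # Key of a node: its own color plus the sorted multiset of neighbor colors.
        for v in range(n):
            neigh_cols = sorted(colors[u] for u in adj[v])
            tmp[v] = f"{colors[v]}_{tuple(neigh_cols)}"
        # Compress the string keys to small ints, numbered in first-seen node order.
        mapping = {}
        next_c = 0
        new_colors = [0] * n
        for v in range(n):
            key = tmp[v]
            if key not in mapping:
                mapping[key] = next_c
                next_c += 1
            new_colors[v] = mapping[key]
        # Stop once a round no longer changes the coloring.
        if new_colors == colors:
            break
        colors = new_colors

    # Histogram of the final-round keys: how many nodes carry each key.
    # Caveat: color ids follow node order, so isomorphic trees whose nodes are
    # numbered differently can end up with different signature strings.
    hist = Counter(tmp)
    return "|".join(f"{c}:{hist[c]}" for c in sorted(hist))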
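
Because `wl_tree_signature` numbers colors in first-seen node order, the same 3-node path can yield `0_(1,):2|1_(0, 0):1` or `0_(1, 1):1|1_(0,):2` depending on which node is numbered first. One possible way to remove this dependence is to rank the distinct keys in sorted order at every round. The sketch below illustrates that idea under stated assumptions: the name `canonical_wl_signature` is hypothetical, it takes adjacency lists like the `adj` built above, and it is not the code that produced the results in this notebook.

```python
from collections import Counter

def canonical_wl_signature(adj):
    """Hypothetical variant: a 1-WL signature that ignores node numbering.

    `adj` is a list of neighbor lists for nodes 0..n-1.
    """
    n = len(adj)
    colors = [0] * n
    while True:
        # Tuple keys (own color, sorted neighbor colors) compare numerically.
        keys = [(colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in range(n)]
        # Rank the distinct keys in sorted order instead of first-seen order.
        rank = {k: i for i, k in enumerate(sorted(set(keys)))}
        new_colors = [rank[k] for k in keys]
        if new_colors == colors:
            break
        colors = new_colors
    # Histogram of the final-round keys, in the same text format as above.
    hist = Counter(keys)
    return "|".join(f"{c}_{nb}:{hist[(c, nb)]}" for c, nb in sorted(hist))

# The same 3-node path, numbered with the center in the middle and then first.
path_a = [[1], [0, 2], [1]]
path_b = [[1, 2], [0], [0]]
print(canonical_wl_signature(path_a) == canonical_wl_signature(path_b))  # True
```

Tuple keys are used instead of strings so that color 10 sorts after color 2, which keeps the ranking stable once the partition stops changing.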


In [7]:
from pathlib import Path

# --------- CONFIG ---------
GRAPH_DIR = "./UD_Spanish-GSD"
FOLDER = Path(GRAPH_DIR)
OUT_CSV = Path("isomorphic_groups3.csv")

In [11]:
from tqdm import tqdm  # progress bar for long operations

groups = defaultdict(list)

for path in tqdm(FOLDER.rglob("*.graphml")):
    try:
        G = load_graph(path)
    except Exception as e:
        print(f"[WARN] Could not read {path}: {e}")
        continue

    sig = wl_tree_signature(G)

    groups[sig].append(str(path))

# --------- Write result ---------
with OUT_CSV.open("w", encoding="utf-8") as f:
    f.write("signature;count;files\n")
    for sig, files in groups.items():
        f.write(f"{sig};{len(files)};\"{'|'.join(files)}\"\n")

print(f"Done. Wrote {len(groups)} isomorphism classes to {OUT_CSV}")

14187it [00:08, 1769.69it/s]

Done. Wrote 12842 isomorphism classes to isomorphic_groups3.csv





In [12]:
import pandas as pd

df = pd.read_csv("isomorphic_groups3.csv", sep=";")
df


Unnamed: 0,signature,count,files
0,"0_(1,):4|1_(0, 0, 0, 0, 3):1|2_(3,):3|3_(1, 2,...",5,UD_Spanish-GSD/es_gsd-ud-train_7825.graphml|UD...
1,"0_(7,):2|10_(11,):1|11_(1, 9, 10):1|1_(2, 2, 2...",1,UD_Spanish-GSD/es_gsd-ud-train_12948.graphml
2,"0_(4,):3|10_(9, 11, 12):1|11_(4, 10):1|12_(10,...",1,UD_Spanish-GSD/es_gsd-ud-train_9745.graphml
3,"0_(8,):2|10_(11,):1|11_(9, 10):1|12_(20,):2|13...",1,UD_Spanish-GSD/es_gsd-ud-train_3290.graphml
4,"0_(13,):2|10_(8, 9, 9, 9, 9, 11):1|11_(10, 14,...",1,UD_Spanish-GSD/es_gsd-ud-train_1588.graphml
...,...,...,...
12837,"0_(17,):3|10_(11,):1|11_(9, 10, 13):1|12_(13,)...",1,UD_Spanish-GSD/es_gsd-ud-train_7590.graphml
12838,"0_(17,):3|10_(11,):1|11_(6, 10, 13, 15):1|12_(...",1,UD_Spanish-GSD/es_gsd-ud-train_11797.graphml
12839,"0_(4,):2|1_(2,):2|2_(1, 1, 8):1|3_(5,):2|4_(0,...",1,UD_Spanish-GSD/es_gsd-ud-train_8663.graphml
12840,"0_(5,):4|1_(2,):3|2_(1, 1, 1, 9):1|3_(4,):1|4_...",1,UD_Spanish-GSD/es_gsd-ud-train_1945.graphml


In [18]:
df[df["count"]>10]

Unnamed: 0,signature,count,files
10,"0_(5,):1|1_(9,):4|2_(3, 9):1|3_(2,):1|4_(8,):1...",33,UD_Spanish-GSD/es_gsd-ud-train_7835.graphml|UD...
36,"0_(1,):4|1_(0, 0, 0, 0, 3):1|2_(3,):1|3_(1, 2):1",14,UD_Spanish-GSD/es_gsd-ud-train_8572.graphml|UD...
61,"0_(1,):3|1_(0, 0, 0, 2):1|2_(1, 4):1|3_(4,):2|...",15,UD_Spanish-GSD/es_gsd-ud-train_5387.graphml|UD...
73,"0_(3,):1|1_(7, 8, 8):1|2_(6,):2|3_(0, 5, 6):1|...",18,UD_Spanish-GSD/es_gsd-ud-train_8051.graphml|UD...
77,"0_(2,):2|1_(2, 2, 5, 5, 7):1|2_(0, 1, 4):2|3_(...",23,UD_Spanish-GSD/es_gsd-ud-train_9433.graphml|UD...
104,"0_(1,):2|1_(0, 3):2|2_(3,):2|3_(1, 1, 2, 2):1",15,UD_Spanish-GSD/es_gsd-ud-train_5246.graphml|UD...
171,"0_(1,):2|1_(0, 0, 3):1|2_(3,):2|3_(1, 2, 2, 5)...",16,UD_Spanish-GSD/es_gsd-ud-train_1531.graphml|UD...
199,"0_(1, 1):1|1_(0,):2",13,UD_Spanish-GSD/es_gsd-ud-train_10125.graphml|U...
233,"0_(1,):2|1_(0, 0, 3):1|2_(3,):3|3_(1, 2, 2, 2):1",21,UD_Spanish-GSD/es_gsd-ud-train_1306.graphml|UD...
259,"0_(1,):3|1_(0, 0, 0, 3):1|2_(3,):2|3_(1, 2, 2):1",16,UD_Spanish-GSD/es_gsd-ud-train_13478.graphml|U...


In [14]:
df[df["signature"] == "0_(5,):1|1_(9,):4|2_(3, 9):1|3_(2,):1|4_(8,):1|5_(0, 7, 8):1|6_(7,):1|7_(5, 6):1|8_(4, 5, 9):1|9_(1, 1, 1, 1, 2, 8):1"]["files"].values[0].split("|")

['UD_Spanish-GSD/es_gsd-ud-train_7835.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_909.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_8682.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_4727.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_8188.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_2360.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_13984.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_13446.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_562.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_13089.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_10222.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_7743.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_7510.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_8105.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_7604.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_7619.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_8647.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_6767.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_12780.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_7194.graphml',
 'UD_Spanish-GSD/es_gsd-ud-train_533.