Skip to content

RPaolino/loopy

Repository files navigation

Weisfeiler and Leman Go Loopy
arXiv

Code Structure

You can use r-neighborhoods in your project by importing the following modules

├── src/
    └── data/
        └── custom_collate.py   # collate function that accounts for r-neighborhoods
    └── transforms/
        └── r_neighborhood.py   # build the r-neighborhood of each node in the graph
    └── nn/
        └── loopy.py   # class definition of loopy layers that process r-neighborhoods

and modifying them according to your needs.

r-neighborhoods

Given an input graph $G$ and a node $v$, the r-neighborhood $\mathcal{N}_r(v)$ of $v$ is defined as the collection of simple paths of length $r$ between its distinct neighborhoods.

The computation of r-neighborhoods uses the networkx.simple_cycles function, which returns simple cycles of the input graph $G$. The paths are then obtained as cyclic permutations of the simple cycles: the first node is the neighborhood center, while the others form the path.

The paths are usually computed in the preprocessing step. However, this could lead to memory overload, especially when $G$ is dense. To prevent OOM, we also provide a --lazy flag, which postpones the computation of cyclic permutation to the forward step. In this way, we don't store all paths for each graph in the dataset but compute them on the flight.

Loopy Layers

The $r$-neighborhoods are fed into a path-wise layer, which computes an embedding for each path. The embeddings are then processed together to get an embedding of the central node $v$.

Our code uses GIN layers to process paths, as it is simple but maximally expressive on paths. You can choose a different neural architecture. Note that messages on paths are transmitted via torch.nn.functional.conv3d with kernel $[1, 0, 1]$ since only consecutive nodes are linked.

To limit the number of learnable parameters, we provide a --shared flag: it guarantees that we have shared weights among paths of different lengths in each loopy layer.

To implement the pooling operations, we use segment_csr instead of scatter: the former is fully deterministic, as noted in the documentation. When indices are not unique, the behavior of scatter is non-deterministic: one of the values from src will be picked arbitrarily, and the gradient propagated to all elements with the same index, resulting in an incorrect gradient computation.


Experiments

The hyperparams used for the experiments can be retrieved from the folder

├── scripts/

You can reproduce the results by typing

bash scripts/<dataset_name>.sh

or you can specify your own configuration as

python run_model.py --dataset zinc_subset --r 5

For subgraphcount [2], you need to specify the target motiv $n$

n 0 1 2 3 4 5 6 7 8
F
hom(F, G) sub(F, G)

by typing

python run_model.py --dataset subgraphcount_2 --r 1

The first three motifs are used to test against homomorphism-counts, the latter six against subgraph-counts. The preprocessing of the dataset is done following the official repo of [3].

Similarly, you can specify the regression target of qm9 by qm9_<n> where $n$ is the column's index of the target.

For brec [1], you need to specify the name of the raw file, i.e., brec_<name> where name is one among basic, extension, regular, 4vtx (4-vertex condition), dr (distance regular), str (strongly regular), and cfi.

Moreover, exp_iso is the name given to exp when the task is to count the number of indistinguishable pairs.


References

[1] Yanbo Wang et al. "Towards better evaluation of gnn expressiveness with brec dataset." arXiv preprint arXiv:2304.07702 (2023).

[2] Lingxiao Zhao et al. "From stars to subgraphs: Uplifting any gnn with local structure awareness." In International Conference on Learning Representations, 2022.

[3] Bohang Zhang et al. "Beyond Weisfeiler-Lehman: A Quantitative Framework for GNN Expressiveness." In International Conference on Learning Representations, 2024.


About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published