# Identifying neighbourhood orientation: rotationally symmetric structures

*Background:*
Rotationally equivariant geometric GNNs aggregate local geometric information by summing the geometric features of neighbouring nodes, which are either **Cartesian vectors** or **higher-order spherical tensors**.
An ideal geometric GNN would aggregate local geometric information injectively, so that it could perfectly identify neighbourhood identities, orientations, etc.
In practice, the choice of basis (Cartesian vs. spherical) comes with tradeoffs between tractability and empirical performance.

*Experiment:*
In this notebook, we study how rotational symmetries interact with tensor order in equivariant GNNs. 
We evaluate equivariant layers on their ability to distinguish the orientation of **structures with rotational symmetry**. 
An [$L$-fold symmetric structure](https://en.wikipedia.org/wiki/Rotational_symmetry) does not change when rotated by an angle of $\frac{2\pi}{L}$ around a point (in 2D) or an axis (in 3D).
For each $L$-fold symmetric structure, we consider two *distinct* rotated versions and train single-layer equivariant GNNs to classify the two orientations using the updated geometric features.
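
As a quick illustration of the definition, the sketch below builds $L$ evenly spaced unit "spokes" in the xy-plane and checks that a rotation by $\frac{2\pi}{L}$ about the z-axis maps the point set onto itself, while a smaller rotation does not. It uses NumPy only; the helper names `spokes`, `rot_z` and `same_point_set` are hypothetical and are not part of the notebook's code.

```python
import numpy as np

def spokes(fold: int, offset: float = 0.0) -> np.ndarray:
    """Unit vectors in the xy-plane at angles offset + 2*pi*k/fold."""
    angles = offset + 2 * np.pi * np.arange(fold) / fold
    return np.stack([np.cos(angles), np.sin(angles), np.zeros(fold)], axis=1)

def rot_z(angle: float) -> np.ndarray:
    """Rotation matrix for a counter-clockwise rotation about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def same_point_set(a: np.ndarray, b: np.ndarray, tol: float = 1e-8) -> bool:
    """True if every point of a coincides with some point of b, and vice versa."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return bool((d.min(axis=1) < tol).all() and (d.min(axis=0) < tol).all())

fold = 5
pts = spokes(fold)

# Rotating by 2*pi/fold only permutes the spokes.
symmetric = same_point_set(pts, pts @ rot_z(2 * np.pi / fold).T)
# Rotating by a smaller angle gives a genuinely different orientation.
distinct = not same_point_set(pts, pts @ rot_z(2 * np.pi / (fold + 1)).T)

print(symmetric, distinct)
```

Both checks hold for any `fold >= 2`, which is why the second orientation in the experiment has to be rotated by an angle strictly between $0$ and $\frac{2\pi}{L}$.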

![Rotationally symmetric structures](fig/rotsym.png)

*Result:*
- **We find that layers using tensors of order $L$ are unable to identify the orientation of structures with rotational symmetry higher than $L$-fold.** This observation may be attributed to the **spherical harmonics** used as the underlying orthonormal basis, which are themselves rotationally symmetric.
- Layers such as E-GNN and GVP-GNN, which use **Cartesian vectors** (corresponding to tensor order 1), are popular because working with higher-order tensors can be computationally intractable for many applications. However, E-GNN and GVP-GNN are particularly poor at discriminating the orientation of rotationally symmetric structures.
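
One way to make the first observation plausible is to look only at the azimuthal part of the spherical harmonics. $Y_\ell^m$ depends on the azimuthal angle $\varphi$ through $e^{im\varphi}$ with $|m| \le \ell$, so summing over $L$ evenly spaced spokes involves $\sum_k e^{im(\varphi_0 + 2\pi k/L)}$, which vanishes unless $m$ is a multiple of $L$. The sketch below checks this numerically with NumPy. It is an illustrative argument for the summed neighbourhood features only, not the notebook's own analysis, and `azimuthal_sum` is a hypothetical helper.

```python
import numpy as np

def azimuthal_sum(fold: int, m: int, offset: float = 0.0) -> complex:
    """Sum of exp(i*m*phi) over `fold` spokes at angles offset + 2*pi*k/fold."""
    angles = offset + 2 * np.pi * np.arange(fold) / fold
    return complex(np.exp(1j * m * angles).sum())

fold = 5
offset = 0.3  # arbitrary orientation of the structure about the z-axis

# Frequencies m whose summed contribution does not cancel.
surviving = [m for m in range(0, 2 * fold + 1)
             if abs(azimuthal_sum(fold, m, offset)) > 1e-9]
print(surviving)  # [0, 5, 10]

# The m = 0 term ignores the orientation; the m = fold term does not.
m0_changes = not np.isclose(azimuthal_sum(fold, 0, 0.0), azimuthal_sum(fold, 0, offset))
mL_changes = not np.isclose(azimuthal_sum(fold, fold, 0.0), azimuthal_sum(fold, fold, offset))
print(m0_changes, mL_changes)  # False True
```

Under this view, features of order $\ell < L$ retain only the $m = 0$ component, which is unchanged by rotations about the symmetry axis, so the two orientations produce the same summed features. The first orientation-dependent component appears at $m = \pm L$, which requires tensors of order at least $L$.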

In [76]:
%load_ext autoreload
%autoreload 2

import sys
sys.path.append('../')

import random
import math
import numpy as np
import torch
from torch.nn import functional as F
import torch_geometric
from torch_geometric.data import Data, Batch
from torch_geometric.loader import DataLoader
from torch_geometric.utils import is_undirected, to_undirected, remove_self_loops, to_dense_adj, dense_to_sparse
import e3nn
from e3nn import o3
from functools import partial
from typing import Optional

print("PyTorch version {}".format(torch.__version__))
print("PyG version {}".format(torch_geometric.__version__))
print("e3nn version {}".format(e3nn.__version__))

from src.utils.plot_utils import plot_2d, plot_3d
from src.utils.train_utils import run_experiment
from src.models import MPNNModel, EGNNModel, GVPGNNModel, TFNModel, SchNetModel, DimeNetPPModel, MACEModel
from cartesian_mace.models.model import CartesianMACE

# Check PyTorch has access to MPS (Metal Performance Shader, Apple's GPU architecture)
# print(f"Is MPS (Metal Performance Shader) built? {torch.backends.mps.is_built()}")
# print(f"Is MPS available? {torch.backends.mps.is_available()}")

# Set the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
# device = torch.device("cpu")
print(f"Using device: {device}")

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
PyTorch version 1.13.1
PyG version 2.0.3
e3nn version 0.5.1
Using device: cpu


In [68]:
def create_rotsym_envs(fold=3):
    dataset = []

    # Environment 0
    atoms = torch.LongTensor([ 0 ] + [ 0 ] * fold)
    edge_index = torch.LongTensor( [ [0] * fold, [i for i in range(1, fold+1)] ] )
    x = torch.Tensor([1,0,0])
    pos = [
        torch.Tensor([0,0,0]),  # origin
        x,   # first spoke 
    ]
    for count in range(1, fold):
        R = o3.matrix_z(torch.Tensor([2*math.pi/fold * count])).squeeze(0)
        pos.append(x @ R.T)
    pos = torch.stack(pos)
    y = torch.LongTensor([0])  # Label 0
    data1 = Data(atoms=atoms, edge_index=edge_index, pos=pos, y=y)
    data1.edge_index = to_undirected(data1.edge_index)
    dataset.append(data1)
    
    # Environment 1
    q = 2*math.pi/(fold + random.randint(1, fold))
    assert q < 2*math.pi/fold
    Q = o3.matrix_z(torch.Tensor([q])).squeeze(0)
    pos = pos @ Q.T
    y = torch.LongTensor([1])  # Label 1
    data2 = Data(atoms=atoms, edge_index=edge_index, pos=pos, y=y)
    data2.edge_index = to_undirected(data2.edge_index)
    dataset.append(data2)

    for data in dataset:
        data.to(device)
    
    return dataset
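
The construction above can be mirrored in plain NumPy to check two properties that the experiment relies on. First, the rotation angle `q` is always strictly between $0$ and $\frac{2\pi}{L}$, so the rotated copy is a different orientation. Second, both environments have identical pairwise distances, so they differ only in orientation. This is a sketch under stated assumptions: `rot_z` is assumed to match the convention of `o3.matrix_z`, the argument `k` stands in for `random.randint(1, fold)`, and all helper names are hypothetical.

```python
import math
import numpy as np

def rot_z(angle: float) -> np.ndarray:
    """Rotation about the z-axis (assumed to match o3.matrix_z)."""
    c, s = math.cos(angle), math.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def make_envs(fold: int, k: int):
    """NumPy mirror of the two environments; k replaces random.randint(1, fold)."""
    angles = 2 * math.pi * np.arange(fold) / fold
    spokes = np.stack([np.cos(angles), np.sin(angles), np.zeros(fold)], axis=1)
    pos0 = np.vstack([np.zeros((1, 3)), spokes])  # origin followed by the spokes
    q = 2 * math.pi / (fold + k)
    pos1 = pos0 @ rot_z(q).T                      # rotated copy (label 1)
    return pos0, pos1, q

def sorted_pairwise_distances(pos: np.ndarray) -> np.ndarray:
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    return np.sort(d[np.triu_indices(len(pos), k=1)])

fold = 3
checks = []
for k in range(1, fold + 1):
    pos0, pos1, q = make_envs(fold, k)
    in_range = 0 < q < 2 * math.pi / fold
    same_distances = np.allclose(sorted_pairwise_distances(pos0),
                                 sorted_pairwise_distances(pos1))
    # The rotated first spoke must not land on any point of environment 0.
    moved = np.linalg.norm(pos0 - pos1[1], axis=1).min() > 1e-6
    checks.append((in_range, bool(same_distances), bool(moved)))

print(checks)
```

Every tuple in `checks` should contain only `True`, for each admissible `k`.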

In [114]:
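def run_rotsym(model_name: str, max_ell: int, fold: int, n_epochs: int = 100, n_times: int = 3, correlation: int = 2) -> None:
    # `correlation` is only passed to MACEModel; its default of 2 is an assumed value.

    dataset = create_rotsym_envs(fold)
    # for data in dataset:
    #     plot_2d(data, lim=1)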

    # Create dataloaders
    dataloader = DataLoader(dataset, batch_size=1, shuffle=True)
    val_loader = DataLoader(dataset, batch_size=1, shuffle=False)
    test_loader = DataLoader(dataset, batch_size=1, shuffle=False)

    model = {
    "mpnn": MPNNModel,
    "schnet": SchNetModel,
    "dimenet": DimeNetPPModel,
    "egnn": EGNNModel,
    "gvp": GVPGNNModel,
    "tfn": partial(TFNModel, max_ell=max_ell, scalar_pred=False),
    "mace": partial(MACEModel, max_ell=max_ell, correlation=correlation, scalar_pred=False),
    "cmace": partial(CartesianMACE, self_tp_rank_max=max_ell, basis_rank_max=max_ell, feature_rank_max=max_ell, nu_max=2)
    }[model_name](num_layers=1, in_dim=2, out_dim=2)

    best_val_acc, test_acc, train_time = run_experiment(
        model,
        dataloader,
        val_loader,
        test_loader,
        n_epochs=n_epochs,
        n_times=n_times,
        device=device,
        verbose=False
    )

In [115]:
max_ell = 1
fold = 2
run_rotsym("cmace", max_ell, fold=fold, n_epochs=1000,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 1000/1000 [00:06<00:00, 149.33it/s]


Done! Averaged over 1 runs: 
 - Training time: 6.70s ± 0.00. 
 - Best validation accuracy: 50.000 ± 0.000. 
- Test accuracy: 50.0 ± 0.0. 






In [59]:
max_ell = 1
fold = 3
run_rotsym("cmace", max_ell, fold=fold, n_epochs=1000,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 1000/1000 [00:07<00:00, 137.51it/s]


Done! Averaged over 1 runs: 
 - Training time: 7.27s ± 0.00. 
 - Best validation accuracy: 50.000 ± 0.000. 
- Test accuracy: 50.0 ± 0.0. 






In [60]:
max_ell = 1
fold = 5
run_rotsym("cmace", max_ell, fold=fold, n_epochs=1000,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 1000/1000 [00:07<00:00, 140.61it/s]


Done! Averaged over 1 runs: 
 - Training time: 7.11s ± 0.00. 
 - Best validation accuracy: 50.000 ± 0.000. 
- Test accuracy: 50.0 ± 0.0. 






In [61]:
max_ell = 1
fold = 10
run_rotsym("cmace", max_ell, fold=fold, n_epochs=1000,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 1000/1000 [00:07<00:00, 139.21it/s]


Done! Averaged over 1 runs: 
 - Training time: 7.18s ± 0.00. 
 - Best validation accuracy: 50.000 ± 0.000. 
- Test accuracy: 50.0 ± 0.0. 






In [41]:
max_ell = 2
fold = 2
run_rotsym("cmace", max_ell, fold=fold, n_epochs=200,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 200/200 [00:03<00:00, 64.45it/s]


Done! Averaged over 1 runs: 
 - Training time: 3.10s ± 0.00. 
 - Best validation accuracy: 100.000 ± 0.000. 
- Test accuracy: 100.0 ± 0.0. 






In [43]:
max_ell = 2
fold = 3
run_rotsym("cmace", max_ell, fold=fold, n_epochs=200,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 200/200 [00:03<00:00, 58.22it/s]


Done! Averaged over 1 runs: 
 - Training time: 3.44s ± 0.00. 
 - Best validation accuracy: 50.000 ± 0.000. 
- Test accuracy: 50.0 ± 0.0. 






In [44]:
max_ell = 2
fold = 5
run_rotsym("cmace", max_ell, fold=fold, n_epochs=200,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 200/200 [00:03<00:00, 63.76it/s]


Done! Averaged over 1 runs: 
 - Training time: 3.14s ± 0.00. 
 - Best validation accuracy: 50.000 ± 0.000. 
- Test accuracy: 50.0 ± 0.0. 






In [45]:
max_ell = 2
fold = 10
run_rotsym("cmace", max_ell, fold=fold, n_epochs=200,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 200/200 [00:03<00:00, 57.56it/s]


Done! Averaged over 1 runs: 
 - Training time: 3.48s ± 0.00. 
 - Best validation accuracy: 50.000 ± 0.000. 
- Test accuracy: 50.0 ± 0.0. 






In [46]:
max_ell = 3
fold = 2
run_rotsym("cmace", max_ell, fold=fold, n_epochs=200,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 200/200 [00:14<00:00, 14.20it/s]


Done! Averaged over 1 runs: 
 - Training time: 14.08s ± 0.00. 
 - Best validation accuracy: 100.000 ± 0.000. 
- Test accuracy: 100.0 ± 0.0. 






In [50]:
max_ell = 3
fold = 3
run_rotsym("cmace", max_ell, fold=fold, n_epochs=300,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 300/300 [00:25<00:00, 11.64it/s]


Done! Averaged over 1 runs: 
 - Training time: 25.76s ± 0.00. 
 - Best validation accuracy: 100.000 ± 0.000. 
- Test accuracy: 100.0 ± 0.0. 






In [51]:
max_ell = 3
fold = 5
run_rotsym("cmace", max_ell, fold=fold, n_epochs=300,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 300/300 [00:21<00:00, 14.05it/s]


Done! Averaged over 1 runs: 
 - Training time: 21.36s ± 0.00. 
 - Best validation accuracy: 50.000 ± 0.000. 
- Test accuracy: 50.0 ± 0.0. 






In [52]:
max_ell = 3
fold = 10
run_rotsym("cmace", max_ell, fold=fold, n_epochs=300,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 300/300 [00:26<00:00, 11.27it/s]


Done! Averaged over 1 runs: 
 - Training time: 26.61s ± 0.00. 
 - Best validation accuracy: 50.000 ± 0.000. 
- Test accuracy: 50.0 ± 0.0. 






In [65]:
max_ell = 5
fold = 2
run_rotsym("cmace", max_ell, fold=fold, n_epochs=150,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 150/150 [20:46<00:00,  8.31s/it]


Done! Averaged over 1 runs: 
 - Training time: 1246.29s ± 0.00. 
 - Best validation accuracy: 100.000 ± 0.000. 
- Test accuracy: 100.0 ± 0.0. 






In [66]:
max_ell = 5
fold = 3
run_rotsym("cmace", max_ell, fold=fold, n_epochs=150,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 150/150 [25:47<00:00, 10.32s/it]


Done! Averaged over 1 runs: 
 - Training time: 1547.40s ± 0.00. 
 - Best validation accuracy: 100.000 ± 0.000. 
- Test accuracy: 100.0 ± 0.0. 






In [71]:
max_ell = 5
fold = 5
run_rotsym("cmace", max_ell, fold=fold, n_epochs=150,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 150/150 [24:19<00:00,  9.73s/it]


Done! Averaged over 1 runs: 
 - Training time: 1459.06s ± 0.00. 
 - Best validation accuracy: 100.000 ± 0.000. 
- Test accuracy: 100.0 ± 0.0. 






In [72]:
max_ell = 5
fold = 10
run_rotsym("cmace", max_ell, fold=fold, n_epochs=150,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 150/150 [28:40<00:00, 11.47s/it]


Done! Averaged over 1 runs: 
 - Training time: 1720.04s ± 0.00. 
 - Best validation accuracy: 50.000 ± 0.000. 
- Test accuracy: 50.0 ± 0.0. 






In [86]:
max_ell = 10
fold = 2
run_rotsym("cmace", max_ell, fold=fold, n_epochs=150,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 150/150 [01:32<00:00,  1.61it/s]


Done! Averaged over 1 runs: 
 - Training time: 92.95s ± 0.00. 
 - Best validation accuracy: 100.000 ± 0.000. 
- Test accuracy: 100.0 ± 0.0. 






In [103]:
max_ell = 10
fold = 3
run_rotsym("cmace", max_ell, fold=fold, n_epochs=150,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 150/150 [10:02<00:00,  4.02s/it]


Done! Averaged over 1 runs: 
 - Training time: 602.54s ± 0.00. 
 - Best validation accuracy: 100.000 ± 0.000. 
- Test accuracy: 100.0 ± 0.0. 






In [107]:
max_ell = 10
fold = 5
run_rotsym("cmace", max_ell, fold=fold, n_epochs=150,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 150/150 [44:23<00:00, 17.75s/it]


Done! Averaged over 1 runs: 
 - Training time: 2663.17s ± 0.00. 
 - Best validation accuracy: 100.000 ± 0.000. 
- Test accuracy: 100.0 ± 0.0. 






In [112]:
max_ell = 10
fold = 10
run_rotsym("cmace", max_ell, fold=fold, n_epochs=150,n_times=1)

Running experiment for CartesianMACE (cpu).


100%|██████████| 150/150 [1:39:04<00:00, 39.63s/it]


Done! Averaged over 1 runs: 
 - Training time: 5944.13s ± 0.00. 
 - Best validation accuracy: 100.000 ± 0.000. 
- Test accuracy: 100.0 ± 0.0. 
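
To read the runs above at a glance, the snippet below collects the reported CartesianMACE test accuracies into a small table and checks them against the rule suggested by the results section. The numbers are copied from the outputs above; each configuration was run once (`n_times=1`), so this is a summary of these runs rather than a statistical claim.

```python
# Test accuracy (%) per (max_ell, fold), copied from the run outputs above.
results = {
    (1, 2): 50.0, (1, 3): 50.0, (1, 5): 50.0, (1, 10): 50.0,
    (2, 2): 100.0, (2, 3): 50.0, (2, 5): 50.0, (2, 10): 50.0,
    (3, 2): 100.0, (3, 3): 100.0, (3, 5): 50.0, (3, 10): 50.0,
    (5, 2): 100.0, (5, 3): 100.0, (5, 5): 100.0, (5, 10): 50.0,
    (10, 2): 100.0, (10, 3): 100.0, (10, 5): 100.0, (10, 10): 100.0,
}
orders = sorted({ell for ell, _ in results})
folds = sorted({fold for _, fold in results})

# Print one row per tensor order and one column per symmetry fold.
print("max_ell | " + " ".join(f"fold={f:<3d}" for f in folds))
for ell in orders:
    print(f"{ell:7d} | " + " ".join(f"{results[(ell, f)]:8.1f}" for f in folds))

# The two orientations are separated exactly when max_ell >= fold.
matches_rule = all((acc == 100.0) == (ell >= fold)
                   for (ell, fold), acc in results.items())
print("100% exactly when max_ell >= fold:", matches_rule)
```

In these runs the rule holds for all 20 configurations, including the failure of `max_ell = 1` on the 2-fold structure.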




