# Using a group convolutional neural network to learn the ground-state of a symmetric spin model

The goal of this tutorial is to learn about group convolutional neural networks (G-CNNs), a useful tool for simulating lattices with high symmetry. The G-CNN is a generalization of the convolutional neural network (CNN) to non-abelian symmetry groups. G-CNNs are a natural fit for lattices that have both point-group and translational symmetries, since rotations and reflections don't commute with translations.

In this tutorial we will consider the antiferromagnetic Heisenberg model on the honeycomb lattice,

$$ H = \sum_{\langle i,j \rangle} \vec{\sigma}_{i} \cdot \vec{\sigma}_{j},$$

where $\vec{\sigma}_{i}$ are Pauli matrices and $\langle i,j \rangle$ denotes nearest-neighbor pairs. A fully symmetric honeycomb lattice with periodic boundary conditions can be defined for $2n^2$ sites, where $n$ is an integer. For this tutorial, we will use a GPU and choose $n=4$, but you can follow along on a CPU with $n=2$ if you'd like. If you don't have access to a GPU, you can open a [Google Colab](https://colab.research.google.com/) notebook and set the runtime type to GPU.

This tutorial will be split into two parts. First, I'll provide a brief introduction to G-CNNs and describe what advantages they bring. Second, we'll use NetKet to find the ground state of the antiferromagnetic Heisenberg model on the honeycomb lattice.

# G-CNNs are generalizations of CNNs to non-abelian groups

The convolutional neural network (CNN) has revolutionized the field of computer vision. The CNN enforces translational invariance, which means that feeding a CNN translated copies of an image will produce the exact same output. This is important for recognizing objects, which may be located at different positions in different images.

The hidden layers of a CNN contain a group of ${\bf features}$, corresponding to translations of the image, where each feature is represented by a vector. At each layer, the CNN integrates over these features to produce a different set of features over the translation group:

$$ C^i_{x,y} = \sum_{x',y'} {\bf W}_{x'-x, y'-y} \cdot {\bf f}_{x',y'} $$

As you can see, the index of the filter ${\bf W}$ is based on the displacement between the input feature $(x',y')$ and the output feature $(x, y)$. This is known as an equivariant operation, as displacements in the input are propagated as displacements in the output (equivariance is actually a bit more general; we'll get to that in a moment). In the last layer, the CNN averages over these different features, forcing the output to be invariant under translations of the input.
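The equivariance property described above can be checked numerically. The following is a minimal NumPy sketch, not NetKet code: it assumes a single scalar feature per site, periodic boundaries, and a hypothetical helper `periodic_conv` written only for this illustration.

```python
import numpy as np

def periodic_conv(f, W):
    """C[x, y] = sum_{x', y'} W[x'-x, y'-y] * f[x', y'] with periodic indices."""
    Lx, Ly = f.shape
    C = np.zeros_like(f)
    for x in range(Lx):
        for y in range(Ly):
            for xp in range(Lx):
                for yp in range(Ly):
                    C[x, y] += W[(xp - x) % Lx, (yp - y) % Ly] * f[xp, yp]
    return C

rng = np.random.default_rng(0)
f = rng.normal(size=(4, 4))  # toy input features (one scalar per site)
W = rng.normal(size=(4, 4))  # toy filter, indexed by displacement

shift = (1, 2)
# Convolving a translated input ...
lhs = periodic_conv(np.roll(f, shift, axis=(0, 1)), W)
# ... gives the translated output of the original convolution (equivariance)
rhs = np.roll(periodic_conv(f, W), shift, axis=(0, 1))
print(np.allclose(lhs, rhs))

# Averaging over all output positions removes the dependence on the shift (invariance)
print(np.isclose(lhs.mean(), periodic_conv(f, W).mean()))
```

Both checks should print `True`: the convolution commutes with translations, and the final average is unchanged by them.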

To generalize the CNN to the G-CNN, let's abstract away from the specifics of the convolution. Instead of indexing the features with translations, we will use elements from a general symmetry group, which may contain non-commuting operations. In this case we must define a particular order of operations. For example, we could define an operation in the $p6m$ space group as a translation, followed by a rotation and a reflection about the origin. Non-abelian groups still maintain associativity and a closed algebra. This is easy to see with lattice symmetry groups: if two symmetry operations each leave the lattice unchanged, applying them in succession must also leave the lattice unchanged, so the combined operation is itself a symmetry operation in the group.
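As a small illustration of non-commutativity and closure, the sketch below builds the point group generated by a 60° rotation and a mirror (the 12-element dihedral group that forms the point-group part of $p6m$) from $2 \times 2$ matrices. The matrices and the brute-force closure loop are assumptions made for this example and are unrelated to how NetKet represents symmetries.

```python
import numpy as np

theta = np.pi / 3
# Rotation by 60 degrees and reflection about the x-axis
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
M = np.array([[1.0, 0.0],
              [0.0, -1.0]])

# The two generators do not commute, so the order of operations matters
print(np.allclose(R @ M, M @ R))  # False

# Generate the whole group by repeatedly multiplying with the generators
elements = [np.eye(2)]
frontier = [np.eye(2)]
while frontier:
    g = frontier.pop()
    for s in (R, M):
        h = s @ g
        if not any(np.allclose(h, e) for e in elements):
            elements.append(h)
            frontier.append(h)
print(len(elements))  # 12

# Closure: the product of any two elements is again in the set
closed = all(any(np.allclose(a @ b, e) for e in elements)
             for a in elements for b in elements)
print(closed)  # True
```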

For G-convolutions, the building blocks of the G-CNN, this algebra is all we need. The G-convolution also indexes the filters by looking at the "difference" between group elements; however, this time there is an orientation to it. The G-convolution is defined as follows:

$$ C^i_g = \sum_h {\bf W}_{g^{-1} h} \cdot {\bf f}_h $$

The filters are indexed by $g^{-1} h$, which describes the mapping from $g \rightarrow h$ but not vice-versa. This causes the output to be an ${\bf involution}$ of the input, meaning that the group elements are mapped to their respective inverses.
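To make the formula concrete, here is a minimal sketch of a G-convolution over the smallest non-abelian group, $S_3$ (the six permutations of three objects), with one scalar feature per group element. The helper names (`compose`, `inverse`, `g_conv`, `left_translate`) are hypothetical and exist only for this illustration; NetKet's `GCNN` works with vector-valued features and a different internal implementation.

```python
import itertools
import numpy as np

# Elements of S_3 as permutation tuples
elements = list(itertools.permutations(range(3)))
index = {g: i for i, g in enumerate(elements)}
n = len(elements)

def compose(a, b):
    """Group product a*b: apply b first, then a."""
    return tuple(a[b[i]] for i in range(3))

def inverse(a):
    inv = [0] * 3
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

# table[g, h] = index of g^{-1} h, the "oriented difference" between g and h
table = np.array([[index[compose(inverse(g), h)] for h in elements]
                  for g in elements])

def g_conv(f, W):
    """C_g = sum_h W_{g^{-1} h} f_h."""
    return np.array([np.sum(W[table[g]] * f) for g in range(n)])

def left_translate(f, u):
    """(L_u f)_h = f_{u^{-1} h}: act on the features with the group element u."""
    return np.array([f[index[compose(inverse(u), h)]] for h in elements])

rng = np.random.default_rng(0)
f = rng.normal(size=n)  # toy input features
W = rng.normal(size=n)  # toy filter, one weight per group element

# Equivariance: transforming the input transforms the output in the same way
equivariant = all(
    np.allclose(g_conv(left_translate(f, u), W), left_translate(g_conv(f, W), u))
    for u in elements
)
print(equivariant)  # True
```

Summing the output over all group elements would then give a quantity that is invariant under every $u$, which mirrors the final averaging step of a CNN.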

G-convolutions are the most expressive equivariant linear transformation over a particular symmetry group. Therefore, if you want to define a linear-based model with a particular symmetry, G-CNNs maximize the number of parameters you can fit into a given memory profile. G-CNNs can be mapped down to other symmetry-averaged multi-layer linear models by masking filters (setting them to zero). On the honeycomb lattice, the G-CNN has (approximately) a factor of 12 more parameters than a CNN averaged over $D_6$ and a factor of $12 N$ more parameters than a FFNN averaged over $p6m$ (where $N$ is the number of sites) under an identical memory constraint.

If you'd like to learn more about G-CNNs, check out the [original paper](http://proceedings.mlr.press/v48/cohenc16.pdf) by Cohen *et al.* or [this paper](https://arxiv.org/pdf/2104.05085.pdf) by Roth *et al.* that applies G-CNNs to quantum many-body systems.

Now that the boring stuff is over with, let's start our simulations. First, let's import NetKet and other necessities:

In [1]:
# Run if you're on Colab or you don't have NetKet
# pip install 'git+https://github.com/netket/netket.git#egg=netket[all]'

import netket as nk

# Import Json, this will be needed to examine log files
import json

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt



# Defining the Hamiltonian

We begin by defining the lattice on which the Hamiltonian acts as a list of lattice points. NetKet will automatically convert these points into a graph with nearest-neighbor connections. The honeycomb lattice is a triangular lattice with two sites per unit cell.

In [2]:
#Basis Vectors that define the positioning of the unit cell
basis_vectors = [[0,1],[np.sqrt(3)/2,-1/2]]

#Locations of atoms within the unit cell
atom_positions = [[0,0],[np.sqrt(3)/6,1/2]]

#Number of unit cells in each direction
dimensions = [4,4]

#Define the graph 
graph = nk.graph.Lattice(basis_vectors=basis_vectors, 
                         atoms_coord = atom_positions, 
                         extent = dimensions
                        )

Let's check to see if our graph looks as expected. Since we have two sites per unit cell, we should have $4 \times 4 \times 2 = 32$ sites. The coordination number of a honeycomb lattice is 3, so we should have $\frac{32 \times 3}{2} = 48$ edges. Finally, we have $p6m$ symmetry, which should give us $4 \times 4 \times 12 = 192$ symmetry operations.

In [3]:
#Use Netket to find symmetries of the graph
symmetries = graph.automorphisms()

#Check that graph info is correct
print(graph.n_nodes)
print(graph.n_edges)
print(len(symmetries))

32
48
192
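The same numbers can be cross-checked without NetKet. The sketch below is a plain NumPy calculation that reuses the basis vectors and atom positions from the cell above and assumes unit lattice constant, so the nearest-neighbor distance is $1/\sqrt{3}$; the variable names are chosen only for this example.

```python
import numpy as np

n = 4                        # unit cells per direction
n_sites = 2 * n**2           # two sites per unit cell
n_edges = n_sites * 3 // 2   # coordination number 3, each edge shared by two sites
n_symmetries = n**2 * 12     # translations times the 12 point-group operations
print(n_sites, n_edges, n_symmetries)  # 32 48 192

# Geometry check: the second atom of the cell has exactly three nearest neighbors
a1 = np.array([0.0, 1.0])
a2 = np.array([np.sqrt(3) / 2, -0.5])
b = np.array([np.sqrt(3) / 6, 0.5])   # position of the second atom in the cell
d_nn = 1 / np.sqrt(3)                 # nearest-neighbor distance for unit lattice constant

neighbours = [(n1, n2)
              for n1 in range(-1, 2) for n2 in range(-1, 2)
              if np.isclose(np.linalg.norm(b - n1 * a1 - n2 * a2), d_nn)]
print(len(neighbours))  # 3
```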


Everything looks good! Now we use this graph to define our Hilbert space and Hamiltonian:

In [4]:
# Define the Hilbert space
hi = nk.hilbert.Spin(s=1 / 2, N=graph.n_nodes, total_sz = 0)

#Define the Hamiltonian
ha = nk.operator.Heisenberg(hilbert=hi, graph=graph, sign_rule=True)

Since the honeycomb lattice is bipartite, we know the phases obey a Marshall-Peierls sign rule. Therefore, we can use a real-valued NN and just learn the amplitudes of the wavefunction. Furthermore, we know the ground state has total spin 0.
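The sign rule is easiest to see on the smallest bipartite "lattice": two sites, one on each sublattice. The following NumPy sketch diagonalizes the two-site Heisenberg Hamiltonian exactly and applies the sign $(-1)^{N_\uparrow^A}$, where $N_\uparrow^A$ counts up spins on sublattice $A$ (taken here, as an assumption of the example, to be the first site). It is an illustration only and does not use NetKet.

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Two-site Heisenberg Hamiltonian; basis order: up-up, up-down, down-up, down-down
H = sum(np.kron(s, s) for s in (sx, sy, sz)).real

energies, vectors = np.linalg.eigh(H)
ground = vectors[:, 0]
print(energies[0])   # singlet energy, -3 up to rounding

# Marshall-Peierls signs: -1 whenever the sublattice-A spin (first site) is up
signs = np.array([-1, -1, 1, 1])
signed = signs * ground

# Raw amplitudes of up-down and down-up have opposite signs ...
print(ground[1] * ground[2] < 0)   # True
# ... but after applying the sign rule they share the same sign
print(signed[1] * signed[2] > 0)   # True
```

Once the signs are factored out, the remaining amplitudes can all be chosen positive, which is why a real-valued network suffices.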

# Defining the GCNN

We can define a GCNN with an arbitrary number of layers and specify the feature dimension of each layer accordingly:

In [5]:
#Feature dimensions of hidden layers, from first to last
feature_dims = [8,8,8,8]

#Number of layers
num_layers = 4

#Define the GCNN 
ma = nk.models.GCNN(symmetries = symmetries, layers = num_layers, features = feature_dims)

This is a G-CNN with four layers, where each hidden layer contains a feature vector of length 8 for each element in $p6m$. This means that each hidden state has $8 \times 192 = 1536$ nodes. That's a huge model. But since we're not symmetry-averaging, we only need to compute one wavefunction for each ${\bf \sigma}$. By default, GCNN uses a SELU non-linearity, which moves the activations in the direction of zero mean and unit variance.

# Variational Monte Carlo

In order to perform VMC we need to define a sampler and an optimizer. We sample using Metropolis-Hastings, where our proposals exchange two spins. This keeps the total magnetization at zero. We optimize using stochastic reconfiguration, which uses curvature information to find the best direction of descent.
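To see why exchange moves conserve the magnetization, here is a minimal NumPy sketch of the proposal step alone. It is a simplification made for illustration: it picks the two sites uniformly at random (ignoring the distance cutoff used by the sampler below) and accepts every proposal, whereas a real Metropolis-Hastings step accepts with a probability that depends on the wavefunction ratio. The function `propose_exchange` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# 32 spins with equal numbers of up (+1) and down (-1): total S_z = 0
sigma = np.array([1, -1] * 16)
rng.shuffle(sigma)

def propose_exchange(sigma, i, j):
    """Return a copy of sigma with the spins on sites i and j swapped."""
    new = sigma.copy()
    new[i], new[j] = sigma[j], sigma[i]
    return new

for _ in range(1000):
    i, j = rng.choice(sigma.size, size=2, replace=False)
    sigma = propose_exchange(sigma, i, j)

# Swapping two spins never changes their sum
print(sigma.sum())  # 0
```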

In [6]:
#Metropolis-Hastings with two spins exchanged that are at most second nearest neighbors
sa = nk.sampler.MetropolisExchange(hilbert = hi, graph=graph, d_max=2)

#Stochastic reconfiguration
op = nk.optimizer.Sgd(learning_rate=1e-2)
sr = nk.optimizer.SR(diag_shift=0.01)

#Define a variational state so we can keep the parameters if we like
vstate = nk.variational.MCState(sampler=sa, model=ma, n_samples=100)

#Define a driver that performs VMC
gs = nk.driver.VMC(ha, op, sr=sr, variational_state=vstate)
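For intuition about what stochastic reconfiguration does to the gradient, the sketch below writes out the standard update $\delta\theta = -\eta\,(S + \epsilon I)^{-1} F$ for real parameters in plain NumPy. The arrays `O` and `E_loc` are random stand-ins for the per-sample log-derivatives and local energies, so this shows only the linear algebra; it is not NetKet's implementation, and the values of `diag_shift` and `learning_rate` are simply copied from the cell above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_params = 100, 5

# Random stand-ins (assumption): O[k, i] = d log(psi) / d theta_i for sample k
O = rng.normal(size=(n_samples, n_params))
E_loc = rng.normal(size=n_samples)   # stand-in for local energies

dO = O - O.mean(axis=0)
dE = E_loc - E_loc.mean()

S = dO.T @ dO / n_samples   # covariance of the log-derivatives (curvature information)
F = dO.T @ dE / n_samples   # covariance with the local energy (gradient direction)

diag_shift, learning_rate = 0.01, 1e-2
# Solve (S + diag_shift * I) delta = F instead of inverting S explicitly
delta = np.linalg.solve(S + diag_shift * np.eye(n_params), F)
theta_update = -learning_rate * delta
print(theta_update.shape)  # (5,)
```

The diagonal shift regularizes $S$, which can be nearly singular when the number of samples is small compared to the number of parameters.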

Let's start by running for 100 iterations. If you're on a CPU this will take ~40 seconds per iteration; on a GPU it will take about 1 second per iteration. GPUs are fast!

In [None]:
#Run the optimization
gs.run(n_iter=100, out=None)

This should get us within a percent of the ground state energy. Two minutes ago I knew nothing about the honeycomb lattice. Now I have access to an accurate ground state wavefunction! For fine-tuning we can run with a larger batch size. We can access the batch size via the variational state:

In [None]:
#Change batch size
vstate.n_samples = 1000

#Driver uses new batch size
gs = nk.driver.VMC(ha, op, sr=sr, variational_state=vstate)

#Run for 100 more iterations
gs.run(n_iter = 100,out = None)

This takes about 16 seconds per iteration on the GPU. You will notice that the variance continues to get smaller, giving evidence that we are approaching an eigenstate. After these 100 iterations you should have an energy $E \approx -70.6$ and a variance $\sigma^2 \approx 1$.
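The link between a small variance and an eigenstate can be illustrated on the two-site Heisenberg model, where everything is exactly solvable. The sketch below is plain NumPy with a hand-written $4 \times 4$ Hamiltonian (basis order up-up, up-down, down-up, down-down); the trial vector is an arbitrary choice made for this example.

```python
import numpy as np

# Two-site Heisenberg Hamiltonian sigma_1 . sigma_2 in the S_z basis
H = np.array([[1,  0,  0, 0],
              [0, -1,  2, 0],
              [0,  2, -1, 0],
              [0,  0,  0, 1]], dtype=float)

def energy_and_variance(psi):
    """Return <H> and <H^2> - <H>^2 for a (not necessarily normalized) real state."""
    psi = psi / np.linalg.norm(psi)
    e = psi @ H @ psi
    var = psi @ H @ H @ psi - e**2
    return e, var

singlet = np.array([0.0, 1.0, -1.0, 0.0])   # exact ground state
trial = np.array([0.0, 1.0, -0.5, 0.0])     # imperfect trial state (arbitrary choice)

e_exact, var_exact = energy_and_variance(singlet)
e_trial, var_trial = energy_and_variance(trial)

print(e_exact, var_exact)   # energy -3 with vanishing variance
print(e_trial, var_trial)   # higher energy with a clearly positive variance
```

The variance vanishes only for an eigenstate, so watching it shrink during optimization is a useful convergence signal even when the exact energy is unknown.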

# Checking with ED

These energies seem correct, but it's always good to do a sanity check. The Hilbert space for a $4 \times 4$ lattice is a bit too large to do ED in a reasonable time, so let's do a slightly smaller lattice. The energy per site should be slightly more negative for the small lattice.

In [13]:
#Dimensions for smaller graph
small_dimensions = [3,3]

#Define smaller graph 
small_graph = nk.graph.Lattice(basis_vectors=basis_vectors, 
                         atoms_coord = atom_positions, 
                         extent = small_dimensions
                        )

# redefine Hilbert space and Hamiltonian
small_hi = nk.hilbert.Spin(s=1 / 2, N=small_graph.n_nodes, total_sz = 0)

#Define the Hamiltonian
small_ha = nk.operator.Heisenberg(hilbert=small_hi, graph=small_graph)

#Exact Diagonalization
E_gs = nk.exact.lanczos_ed(small_ha, compute_eigenvectors=False)

Let's compare the energies per site:

In [14]:
print(E_gs[0]/18)
print(-70.6/32)

-2.2438146303344157
-2.20625
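To put these two numbers in context, the short standard-library sketch below computes their relative difference and the sizes of the two zero-magnetization Hilbert spaces. The energies are copied from the output above; keep in mind that the two lattices have different sizes, so part of the difference is a finite-size effect rather than a variational error.

```python
from math import comb

e_ed = -2.2438146303344157   # ED energy per site, 18-site lattice (from the output above)
e_vmc = -70.6 / 32           # approximate VMC energy per site, 32-site lattice

rel_diff = abs(e_vmc - e_ed) / abs(e_ed)
print(f"relative difference: {100 * rel_diff:.2f}%")

# Number of basis states with total S_z = 0: choose which half of the spins point up
dim_small = comb(18, 9)
dim_large = comb(32, 16)
print(dim_small, dim_large)  # 48620 601080390
```

The roughly four orders of magnitude between the two dimensions is why exact diagonalization is practical for the $3 \times 3$ lattice but slow for the $4 \times 4$ one.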
