In [None]:
!pip install pennylane

# Symmetry-invariant quantum machine learning force fields

Symmetries are ubiquitous in physics. From condensed matter to particle
physics, they have helped us make connections and formulate new
theories. In machine learning, building known symmetries into a model as
an inductive bias has proven successful. This framework, known as
geometric deep learning, often yields models with better generalization
and trainability. In this demo, we will learn how to use geometric quantum
machine learning to drive molecular dynamics, as introduced in recent
research. We will take the triatomic molecule $H_2O$ as an example.

## Introduction


First, let's introduce the overall playground of this work: **molecular
dynamics (MD)**. MD is an essential computational simulation method to
analyze the dynamics of atoms or molecules in a chemical system. The
simulations can be used to obtain macroscopic thermodynamic properties
of ergodic systems. Within the simulation, Newton\'s equations of motion
are numerically integrated. Therefore, it is crucial to have access to
the forces acting on the constituents of the system or, equivalently,
the potential energy surface (PES), from which we can obtain the atomic
forces. Previous research presented variational quantum learning
models (VQLMs) that were able to learn the potential energy and atomic
forces of a selection of molecules from *ab initio* reference data.
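To make "numerically integrated" concrete, here is a minimal NumPy sketch of one common MD integrator, velocity Verlet, applied to a 1D harmonic oscillator. The choice of integrator, the `harmonic_force` function, and all parameter values are illustrative assumptions rather than part of this demo; in a real simulation the forces would come from the (learned) PES.

```python
import numpy as np


def harmonic_force(x, k=1.0):
    # Force of the 1D harmonic potential E = k x^2 / 2, i.e. F = -dE/dx
    return -k * x


def velocity_verlet(x0, v0, force, mass=1.0, dt=0.01, n_steps=1000):
    """Integrate Newton's equation m x'' = F(x) with the velocity Verlet scheme."""
    x, v = x0, v0
    f = force(x)
    traj = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * (f / mass) * dt**2  # position update
        f_new = force(x)                           # forces at the new positions
        v = v + 0.5 * (f + f_new) / mass * dt      # velocity update with averaged force
        f = f_new
        traj.append(x)
    return np.array(traj), v


traj, v_final = velocity_verlet(x0=1.0, v0=0.0, force=harmonic_force)

# Total energy (kinetic + potential) should be nearly conserved
e0 = 0.5 * 1.0**2
e_final = 0.5 * v_final**2 + 0.5 * traj[-1] ** 2
print(abs(e_final - e0))
```

The exact solution here is $x(t) = \cos t$, so the trajectory can also be compared against it directly.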

The description of molecules can be greatly simplified by considering
inherent **symmetries**. For example, actions such as translation,
rotation, or the interchange of identical atoms or molecules leave the
system unchanged. To achieve better performance, it is thus desirable to
include this information in our model. One way to do so is to make the
data input itself invariant, e.g., by making use of so-called
symmetry functions, which yields invariant energy predictions.

In this demo, we instead take the high road and design an intrinsically
symmetry-aware model based on equivariant quantum neural networks.
Equivariant machine learning models have demonstrated many advantages
such as being more robust to noisy data and enjoying better
generalization capabilities. Moreover, this has the additional advantage
of relaxing the need for data preprocessing, as the raw Cartesian
coordinates can be given directly as inputs to the learning model.

An overview of the workflow is shown in the figure below. First, the
relevant symmetries are identified and used to build the quantum machine
learning model. We then train it on the PES of some molecule, e.g. $H_2O,$
and finally obtain the forces as the negative gradient of the learned PES.

![](Hands_on_8_images/overview.png)
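The last step of this workflow, forces as the negative gradient of the energy, can be sketched with PyTorch autograd. The sketch below uses a hypothetical stand-in for the learned PES (harmonic springs on the three interatomic distances of a water-like molecule, with assumed equilibrium distances) rather than a quantum model. It also checks the symmetry properties we are after: the energy is unchanged by a rotation plus translation of the molecule, while the forces rotate along with it.

```python
import torch

torch.manual_seed(0)


def toy_pes(coords):
    """Hypothetical stand-in for a learned PES: harmonic springs on all
    interatomic distances of a triatomic molecule (rows of `coords` are atoms)."""
    # Assumed equilibrium O-H, O-H and H-H distances (Angstrom)
    r0 = torch.tensor([0.96, 0.96, 1.52], dtype=coords.dtype)
    pairs = [(0, 1), (0, 2), (1, 2)]
    dists = torch.stack([torch.linalg.norm(coords[i] - coords[j]) for i, j in pairs])
    return 0.5 * torch.sum((dists - r0) ** 2)


def energy_and_forces(coords):
    # Forces are the negative gradient of the energy w.r.t. atomic positions
    coords = coords.clone().requires_grad_(True)
    energy = toy_pes(coords)
    (grad,) = torch.autograd.grad(energy, coords)
    return energy.detach(), -grad


# Slightly distorted water-like geometry: O at the origin, then two H atoms
coords = torch.tensor(
    [[0.0, 0.0, 0.0], [1.0, 0.1, 0.0], [-0.3, 0.9, 0.05]], dtype=torch.float64
)

# Random proper rotation (QR of a random matrix, sign-fixed so det = +1) and a shift
q, _ = torch.linalg.qr(torch.randn(3, 3, dtype=torch.float64))
if torch.det(q) < 0:
    q[:, 0] = -q[:, 0]
shift = torch.tensor([0.5, -1.0, 2.0], dtype=torch.float64)

e1, f1 = energy_and_forces(coords)
e2, f2 = energy_and_forces(coords @ q.T + shift)  # rotate each atom, then translate

print(torch.allclose(e1, e2))         # energy is invariant
print(torch.allclose(f2, f1 @ q.T))   # forces are equivariant (they rotate too)
```

Because the toy energy depends only on distances, both properties hold by construction; a symmetry-aware learned model aims to guarantee the same behaviour.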




In order to incorporate symmetries into machine learning models, we need
a few concepts from group theory. A formal course on the subject is out
of the scope of the present document, which is why the next sections
give an introduction to geometric quantum machine learning and to an
equivariant graph embedding.


# Introduction to Geometric Quantum Machine Learning



# Introduction


Symmetries are at the heart of physics. Indeed in condensed matter and
particle physics we often define a thing simply by the symmetries it
adheres to. What does symmetry mean for those in machine learning? In
this context the ambition is straightforward: it is a means to reduce
the parameter space and improve the trained model's ability to
successfully label unseen data, i.e., its ability to generalise.

Suppose we have a learning task and the data we are learning from has an
underlying symmetry. For example, consider a game of Noughts and Crosses
(aka Tic-tac-toe): if we win a game, we would also have won it if the
board were rotated or flipped along any of its lines of symmetry. Now if we
want to train an algorithm to spot the outcome of these games, we can
either ignore the existence of this symmetry or we can somehow include
it. The advantage of paying attention to the symmetry is it identifies
multiple configurations of the board as \'the same thing\' as far as the
symmetry is concerned. This means we can reduce our parameter space, and
so the amount of data our algorithm must sift through is immediately
reduced. Along the way, the fact that our learning model must encode a
symmetry that actually exists in the system we are trying to represent
naturally encourages our results to be more generalisable. The encoding
of symmetries into our learning models is where the term *equivariance*
will appear. We will see that demanding that certain symmetries are
included in our models means that the mappings that make up our
algorithms must be such that we could transform our input data with
respect to a certain symmetry, then apply our mappings, and this would
be the same as applying the mappings and then transforming the output
data with the same symmetry. This is the technical property that gives
us the name "equivariant learning".

In classical machine learning, this area is often referred to as
geometric deep learning (GDL) due to the traditional association of
symmetry to the world of geometry, and the fact that these
considerations usually focus on deep neural networks (see the references
for a broad introduction). We will refer to the quantum computing version of this as
*quantum geometric machine learning* (QGML).

# Representation theory in circuits


The first thing to discuss is how we work with symmetries in the
first place. The answer lies in the world of group representation
theory.

First, let\'s define what we mean by a group:

**Definition**: A group is a set $G$ together with a binary operation on
$G$, here denoted $\circ,$ that combines any two elements $a$ and $b$ to
form an element of $G,$ denoted $a \circ b,$ such that the following
three requirements, known as group axioms, are satisfied:

1.  **Associativity**: For all $a, b, c$ in $G,$ one has
    $(a \circ b) \circ c=a \circ (b \circ c).$

2.  **Identity element**: There exists an element $e$ in $G$ such that,
    for every $a$ in $G,$ one has $e \circ a=a$ and $a \circ e=a.$ Such
    an element is unique. It is called the identity element of the group.

3.  **Inverse element**: For each $a$ in $G,$ there exists an element $b$
    in $G$ such that $a \circ b=e$ and $b \circ a=e,$ where $e$ is the
    identity element. For each $a,$ the element $b$ is unique: it is
    called the inverse of $a$ and is commonly denoted $a^{-1}.$
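Because the axioms are concrete, they can be checked by brute force for a small finite set. The helper `is_group` below is a hypothetical illustration (it also checks closure, which the definition builds in by requiring $a \circ b$ to be an element of $G$). It confirms that the integers modulo 4 form a group under addition, but not under multiplication, since $0$ has no multiplicative inverse.

```python
from itertools import product


def is_group(elements, op):
    """Brute-force check of the group axioms for a finite set with binary operation `op`."""
    elements = list(elements)
    # Closure: a ∘ b must be an element of G
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associativity: (a ∘ b) ∘ c == a ∘ (b ∘ c)
    if any(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elements, repeat=3)):
        return False
    # Identity: an element e with e ∘ a == a == a ∘ e for every a
    identities = [e for e in elements if all(op(e, a) == a == op(a, e) for a in elements)]
    if len(identities) != 1:
        return False
    e = identities[0]
    # Inverses: every a has some b with a ∘ b == e == b ∘ a
    return all(any(op(a, b) == e == op(b, a) for b in elements) for a in elements)


n = 4
print(is_group(range(n), lambda a, b: (a + b) % n))  # True
print(is_group(range(n), lambda a, b: (a * b) % n))  # False
```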

With groups defined, we are in a position to articulate what a
representation is: Let $\varphi$ be a map sending $g$ in group $G$ to a
linear map $\varphi(g): V \rightarrow V,$ for some vector space $V,$
which satisfies

$$\varphi\left(g_{1} g_{2}\right)=\varphi\left(g_{1}\right) \circ \varphi\left(g_{2}\right) \quad \text { for all } g_{1}, g_{2} \in G.$$

The idea here is that just as elements in a group act on each other to
reach further elements, i.e., $g\circ h = k,$ a representation sends us
to a mapping acting on a vector space such that
$\varphi(g)\circ \varphi(h) = \varphi(k).$ In this way we are
representing the structure of the group as a linear map. For a
representation, our mapping must send us to the general linear group
$GL(n)$ (the space of invertible $n \times n$ matrices with matrix
multiplication as the group multiplication). Note how this is both a
group and, by virtue of being a collection of invertible matrices, also
a set of linear maps (they are all invertible matrices that can act on
vectors). Fundamentally, representation theory is based on the
prosaic observation that linear algebra is easy and group theory is
abstract. So what if we can study groups via linear maps?

Now, due to the importance of unitarity in quantum mechanics, we are
particularly interested in unitary representations: representations
where the linear maps are unitary matrices. If we can identify these,
then we will have a way to naturally encode groups in quantum circuits
(which are mostly made up of unitary gates).
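As a small worked example (our own choice, not taken from the source), the cyclic group $\mathbb{Z}_4 = \{0,1,2,3\}$ under addition modulo 4 has a unitary representation on a single qubit given by powers of the phase gate $S = \mathrm{diag}(1, i)$: $\varphi(k) = S^k.$ The NumPy check below verifies both the representation property $\varphi(g_1 g_2) = \varphi(g_1)\varphi(g_2)$ and unitarity.

```python
import numpy as np

n = 4
S = np.diag([1, 1j])  # single-qubit phase gate S, which satisfies S^4 = I


def phi(k):
    """Candidate representation of Z_4 = {0, 1, 2, 3} (addition mod 4): k -> S^k."""
    return np.linalg.matrix_power(S, k)


# Representation property: phi(g1 ∘ g2) == phi(g1) @ phi(g2) for all group elements
is_hom = all(
    np.allclose(phi((a + b) % n), phi(a) @ phi(b)) for a in range(n) for b in range(n)
)
# Unitarity: phi(k)^dagger phi(k) == I, so each group element maps to a valid quantum gate
is_unitary = all(np.allclose(phi(k).conj().T @ phi(k), np.eye(2)) for k in range(n))
print(is_hom, is_unitary)  # True True
```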

![](Hands_on_8_images/sphere_equivariant.png)

How does all this relate to symmetries? Well, a large class of
symmetries can be characterised as a group, where all the elements of
the group leave some space we are considering unchanged. Let\'s consider
an example: the symmetries of a sphere. Now when we think of this
symmetry we probably think something along the lines of \"it\'s the same
no matter how we rotate it, or flip it left to right, etc\". There is
this idea of being invariant under some operation. We also have the idea
of being able to undo these actions: if we rotate one way, we can rotate
it back. If we flip the sphere right-to-left we can flip it
left-to-right to get back to where we started (notice too all these
inverses are unique). Trivially we can also do nothing. What exactly are
we describing here? We have elements that correspond to an action on a
sphere that can be inverted and for which there exists an identity. It
is also the case that combining any two of these rotations and
reflections gives another rotation or reflection of the sphere, and that
for any three such operations $a,$ $b,$ $c$ we have
$a\circ (b \circ c) = (a\circ b) \circ c,$ i.e., the operations are
associative. These features turn out to literally define a group!

As we\'ve seen the group in itself is a very abstract creature; this is
why we look to its representations. The group labels the symmetries we
care about, i.e., the mappings that our system is invariant under, and
the unitary representations show us how those symmetries look on a
particular space of unitary matrices. If we want to encode the structure
of the symmetries in a quantum circuit, we must choose gates that are
compatible with these unitary representations of the group.

There remains one question: *what is equivariance?* With our newfound
knowledge of group representation theory we are ready to tackle this.
Let $G$ be our group, and $V$ and $W,$ with elements $v$ and $w$
respectively, be vector spaces over some field $F$ with a map $f$
between them. Suppose we have representations
$\varphi: G \rightarrow GL(V)$ and $\psi: G \rightarrow GL(W).$
Furthermore, let\'s write $\varphi_g$ for the representation of $g$ as a
linear map on $V$ and $\psi_g$ as the same group element represented as
a linear map on $W$ respectively. We call $f$ *equivariant* if

$$f(\varphi_g(v))=\psi_g(f(v)) \quad \text { for all } g\in G.$$

The importance of such a map in machine learning is that if, for
example, our neural network layers are equivariant maps, then two inputs
that are related by some intrinsic symmetry (say, reflections of each
other) produce outputs related by the same symmetry, so this information
is preserved.

Consider the following figure for example. What we see is a board with a
cross in a certain square on the left and some numerical encoding of
this on the right, where the 1 is where the X is in the number grid. We
present an equivariant mapping between these two spaces with respect to
a group action that is a rotation or a swap (here a $\pi$ rotation). We
can either apply a group action to the original grid and then map to the
number grid, or we could map to the number grid and then apply the group
action. Equivariance demands that the result of either of these
procedures should be the same.

![](Hands_on_8_images/equivariant-example.jpg)
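The argument in the figure can be replayed in a few lines of NumPy. The symbol-to-number map `f`, the $\pi$ rotation implemented with `np.rot90`, and the counterexample `g` are illustrative assumptions. `f` is equivariant because it acts square by square, whereas `g` singles out one fixed square and therefore breaks the symmetry.

```python
import numpy as np

# Hypothetical encoding from the figure: a cross becomes 1, an empty square 0
SYMBOL_TO_NUM = {"X": 1, "": 0}


def f(board):
    """Map a 3x3 grid of symbols to a 3x3 number grid, square by square."""
    return np.vectorize(SYMBOL_TO_NUM.get)(board)


def rotate_pi(grid):
    # Group action: rotation of the grid by pi (180 degrees)
    return np.rot90(grid, k=2)


board = np.array([["X", "", ""], ["", "", ""], ["", "", ""]], dtype=object)

lhs = f(rotate_pi(board))  # act with the group first, then map
rhs = rotate_pi(f(board))  # map first, then act with the group
print(np.array_equal(lhs, rhs))  # True: f is equivariant


def g(board):
    # Non-equivariant map: additionally marks the fixed top-left square
    out = f(board).copy()
    out[0, 0] += 10
    return out


print(np.array_equal(g(rotate_pi(board)), rotate_pi(g(board))))  # False
```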

Given the vast amount of input data required to train a neural network,
the principle that one can pre-encode known symmetry structures into the
network allows us to learn better and faster. Indeed, it is a key reason
for the success of convolutional neural networks (CNNs) for image
analysis, where it is known they are equivariant with respect to
translations. They naturally encode the idea that a picture of a dog is
symmetrically related to the same picture slid to the left by n pixels,
and they do this by having neural network layers that are equivariant
maps. With our focus on unitary representations (and so quantum
circuits) we are looking to extend this idea to quantum machine
learning.


## Noughts and Crosses


Let's look at the game of noughts and crosses, an example taken from the
references. Two players take turns to place an O or an X, depending on which player they
are, in a 3x3 grid. The aim is to get three of your symbols in a row,
column, or diagonal. As this is not always possible depending on the
choices of the players, there could be a draw. Our learning task is to
take a set of completed games labelled with their outcomes and teach the
algorithm to identify these correctly.


The board has the symmetries of a square, which form the *dihedral
group* of order 8. This means it is symmetric under rotations by
multiples of $\frac{\pi}{2}$ and flips about the lines of symmetry of a
square (vertical, horizontal, and both diagonals).


![](Hands_on_8_images/NandC_sym.png)
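One way to make the dihedral group concrete is to write each symmetry as a permutation of the nine square indices (numbered 0 to 8 row by row, matching the board indexing used below). This sketch generates the four rotations, with and without a left-right flip, using `np.rot90` and `np.fliplr`, and checks that there are exactly eight distinct symmetries and that composing two of them gives another.

```python
import numpy as np


def d4_permutations():
    """The 8 symmetries of the square as permutations of the 9 board squares.

    Each symmetry is a tuple `perm` of length 9: after the symmetry acts,
    square i of the new board holds the symbol from square perm[i].
    """
    grid = np.arange(9).reshape(3, 3)  # squares indexed 0..8, row by row
    perms = set()
    for board in (grid, np.fliplr(grid)):  # without and with a left-right flip
        for k in range(4):                  # rotations by 0, pi/2, pi, 3pi/2
            perms.add(tuple(int(i) for i in np.rot90(board, k).flatten()))
    return sorted(perms)


D4 = d4_permutations()
print(len(D4))  # 8


def compose(p, q):
    # Apply symmetry p, then symmetry q
    return tuple(p[i] for i in q)


print(all(compose(p, q) in D4 for p in D4 for q in D4))  # True
```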


**The question is, how do we encode this in our QML problem?**

First, let us encode this problem classically. We will consider a
nine-element vector $v,$ each element of which identifies a square of
the board. The entries themselves can be $+1,$ $0,$ or $-1,$ representing a
nought, no symbol, or a cross. The label is one-hot encoded in a vector
$y=(y_O,y_- , y_X)$ with $+1$ in the correct label and $-1$ in the
others. For instance, $y=(-1,-1,1)$ labels a game won by crosses.


To create the quantum model let us take nine qubits and let them
represent squares of our board. We\'ll initialise them all as
$|0\rangle,$ which we note leaves the board invariant under the
symmetries of the problem (flip and rotate all you want, it\'s still
going to be zeroes whatever your mapping). We will then look to apply
single qubit $R_x(\theta)$ rotations on individual qubits, encoding each
of the possibilities in the board squares at an angle of
$\frac{2\pi}{3}$ from each other. For our parameterised gates we will
have a single-qubit $R_x(\theta_1)$ and $R_y(\theta_2)$ rotation at each
point. We will then use $CR_y(\theta_3)$ for two-qubit entangling gates.
Without any symmetry constraints, this naively requires 18
single-qubit rotation parameters and up to $\binom{9}{2}=36$ two-qubit gate
rotations. Let's see how, by using symmetries, we can reduce this.


![..](Hands_on_8_images/grid.jpg)

The indexing of our game board.


The secret will be to encode the symmetries into the gate set so the
observables we are interested in inherently respect the symmetries. How
do we do this? We need to select the collections of gates that commute
with the symmetries. In general, we can use the twirling formula for
this:

Tip:

Let $\mathcal{S}$ be the group that encodes our symmetries and $U$ be a
unitary representation of $\mathcal{S}.$ Then,

$$\mathcal{T}_{U}[X]=\frac{1}{|\mathcal{S}|} \sum_{s \in \mathcal{S}} U(s) X U(s)^{\dagger}$$

defines a projector onto the set of operators commuting with all
elements of the representation, i.e.,
$\left[\mathcal{T}_{U}[X], U(s)\right]=0$ for all $X$ and
$s \in \mathcal{S}.$
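A tiny numerical check of the twirling formula, in an assumed toy setup that is not part of this demo: two qubits with the symmetry group $\mathcal{S} = \{e, \text{swap}\}$, represented by the identity and the SWAP matrix. Twirling $Z \otimes I$ yields the symmetrised operator $\tfrac{1}{2}(Z\otimes I + I\otimes Z)$, which commutes with SWAP, and twirling a second time changes nothing, as expected of a projector.

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)

# Assumed toy symmetry group: {identity, swap of the two qubits}, represented by U(s)
group_reps = [np.eye(4), SWAP]


def twirl(X, reps):
    """Twirling formula: average of U(s) X U(s)^dagger over the group."""
    return sum(U @ X @ U.conj().T for U in reps) / len(reps)


X = np.kron(Z, I2)  # Z on the first qubit only: not swap-symmetric
TX = twirl(X, group_reps)

print(np.allclose(TX, (np.kron(Z, I2) + np.kron(I2, Z)) / 2))  # True: symmetrised operator
print(np.allclose(TX @ SWAP, SWAP @ TX))                         # True: commutes with the group
print(np.allclose(twirl(TX, group_reps), TX))                    # True: twirling is a projector
```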

Twirling an arbitrary operator gives an operator that commutes with the
group, as we require. However, twirling a unitary directly does not in
general produce a unitary. We therefore recall that unitary gates
typically have the form $W = \exp(-i\theta H),$ where $H$ is a Hermitian
matrix called a *generator*, and $\theta$ may be fixed or left as a free
parameter. A recipe for creating a unitary that commutes with our
symmetries is to *twirl the generator of the gate*, i.e., we move from
the gate $W = \exp(-i\theta H)$ to the gate
$W' = \exp(-i\theta\mathcal{T}_U[H]).$ Since the twirled generator is
still Hermitian, $W'$ is unitary. When the terms $U(s)HU(s)^\dagger$ in
the twirling formula commute with each other (for example, because they
act on different qubits), this unitary further simplifies to

$$W' = \prod_{s\in\mathcal{S}}U(s)\exp\left(-i\tfrac{\theta}{\vert\mathcal{S}\vert}H\right)U(s)^\dagger.$$

For simplicity, we can absorb the normalization factor
$\vert\mathcal{S}\vert$ into the free parameter $\theta.$
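The generator-twirling recipe can be checked in an assumed two-qubit toy setting with symmetry group {identity, SWAP}. Starting from the generator $X \otimes I$ of an $R_x$-type rotation on the first qubit, the twirled generator is $\tfrac{1}{2}(X\otimes I + I\otimes X)$. The sketch builds both gates with `expm_hermitian`, a small eigendecomposition-based helper written here (not a library function), confirms that only the twirled gate commutes with SWAP, and checks the factorised product form, which holds here because the two twirled terms act on different qubits and therefore commute.

```python
import numpy as np


def expm_hermitian(H, theta):
    """exp(-i * theta * H) for a Hermitian matrix H, via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ np.diag(np.exp(-1j * theta * evals)) @ evecs.conj().T


I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)
group_reps = [np.eye(4), SWAP]  # assumed toy symmetry: swapping the two qubits

H = np.kron(X, I2)  # generator of an RX-type rotation on qubit 0 only
TH = sum(U @ H @ U.conj().T for U in group_reps) / len(group_reps)  # twirled generator

theta = 0.7
W = expm_hermitian(H, theta)       # original gate: breaks the symmetry
W_sym = expm_hermitian(TH, theta)  # gate generated by the twirled generator

print(np.allclose(W @ SWAP, SWAP @ W))          # False
print(np.allclose(W_sym @ SWAP, SWAP @ W_sym))  # True

# Product of conjugated copies of exp(-i theta/|S| H), one per group element
W_half = expm_hermitian(H, theta / len(group_reps))
W_prod = np.eye(4, dtype=complex)
for U in group_reps:
    W_prod = W_prod @ (U @ W_half @ U.conj().T)
print(np.allclose(W_sym, W_prod))  # True
```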

So let\'s look again at our choice of gates: single-qubit $R_x(\theta)$
and $R_y(\theta)$ rotations, and entangling two-qubit $CR_y(\phi)$
gates. What will we get by twirling these?


In this particular instance we can see the action of the twirling
operation geometrically as the symmetries involved are all permutations.
Let\'s consider the $R_x$ rotation acting on one qubit. Now if this
qubit is in the centre location on the grid, then we can flip around any
symmetry axis we like, and this operation leaves the qubit invariant, so
we've identified one equivariant gate immediately. If the qubit is on a
corner, then the rotations and flips will send this qubit rotation to
each of the other corners. Similarly, if a qubit is on an edge (the
middle square of a side), then the rotation gate will be sent round the
other edges. So we can see that the
twirling operation is a sum over all the possible outcomes of performing
the symmetry action (the sum over the symmetry group actions). Having
done this, we can see that for a single-qubit rotation the invariant maps
are rotations on the central qubit, at all the corners, and at all the
edges (when their rotation angles are fixed to be the same).

As an example consider the following figure, where we take a $R_x$ gate
in the corner and then apply all the symmetries of a square. The result
of this twirling leads us to have the same gate at all the corners.


![](Hands_on_8_images/twirl.jpeg)
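The three single-qubit classes can also be found programmatically: the set of squares a given square is sent to by all symmetries (its orbit) is exactly where twirling spreads a gate acting on that square. This sketch assumes the row-by-row indexing 0 to 8 of the board and rebuilds the symmetries with `np.rot90` and `np.fliplr`.

```python
import numpy as np


def d4_permutations():
    # The 8 symmetries of the square as permutations of board squares 0..8
    grid = np.arange(9).reshape(3, 3)
    perms = []
    for board in (grid, np.fliplr(grid)):
        for k in range(4):
            perms.append(np.rot90(board, k).flatten())
    return perms


def orbit(square, perms):
    """All squares that `square` is sent to by the symmetries (its twirling orbit)."""
    return sorted({int(p[square]) for p in perms})


perms = d4_permutations()
classes = sorted({tuple(orbit(s, perms)) for s in range(9)})
print(classes)  # [(0, 2, 6, 8), (1, 3, 5, 7), (4,)]
```

The three orbits are the corners, the edges, and the centre, which is why one shared rotation angle per class is enough to keep single-qubit gates equivariant.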


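For entangling gates the situation is similar: pairs of qubits also fall
into invariant classes. We will use three of them: the centre entangled
with all corners, the centre entangled with all edges, and each corner
entangled with its two neighbouring edges, forming a ring around the
border of the board.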
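Applying the same orbit idea to pairs of squares shows how the 36 possible qubit pairs split into symmetry classes. The sketch below treats pairs as unordered (the control/target assignment of a $CR_y$ gate is a further choice made in the circuit) and confirms that the centre-corner, centre-edge, and corner-edge ring pairs used here are three of eight classes in total.

```python
import numpy as np
from itertools import combinations


def d4_permutations():
    # The 8 symmetries of the square as permutations of board squares 0..8
    grid = np.arange(9).reshape(3, 3)
    return [np.rot90(b, k).flatten() for b in (grid, np.fliplr(grid)) for k in range(4)]


def pair_orbit(pair, perms):
    """Orbit of an unordered pair of squares under the symmetries of the board."""
    i, j = pair
    return frozenset(frozenset((int(p[i]), int(p[j]))) for p in perms)


perms = d4_permutations()
classes = {pair_orbit(pair, perms) for pair in combinations(range(9), 2)}
print(len(classes), sorted(len(c) for c in classes))  # 8 [2, 2, 4, 4, 4, 4, 8, 8]

# The three classes used for the entangling gates in this demo
centre_corners = pair_orbit((4, 0), perms)
centre_edges = pair_orbit((4, 1), perms)
corner_edge_ring = pair_orbit((0, 1), perms)
print(len(centre_corners), len(centre_edges), len(corner_edge_ring))  # 4 4 8
```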


The prediction of a label is obtained via a one-hot-encoding by
measuring the expectation values of three invariant observables:


$$O_{-}=Z_{\text {middle }}=Z_{4}$$

$$O_{\circ}=\frac{1}{4} \sum_{i \in \text { corners }} Z_{i}=\frac{1}{4}\left[Z_{0}+Z_{2}+Z_{6}+Z_{8}\right]$$

$$O_{\times}=\frac{1}{4} \sum_{i \in \text { edges }} Z_{i}=\frac{1}{4}\left[Z_{1}+Z_{3}+Z_{5}+Z_{7}\right]$$

$$\hat{\boldsymbol{y}}=\left(\left\langle O_{\circ}\right\rangle,\left\langle O_{-}\right\rangle,\left\langle O_{\times}\right\rangle\right)$$


This is the quantum encoding of the symmetries into a learning problem.
A prediction for a given data point will be obtained by selecting the
class for which the observed expectation value is the largest.


Now that we have a specific encoding and have decided on our observables,
we need to choose a suitable cost function to optimise. We will use an
$l_2$ loss function acting on pairs of games and labels
$\mathcal{D}=\{(\boldsymbol{g},\boldsymbol{y})\},$ where $\mathcal{D}$ is our dataset.
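The prediction rule and the $l_2$ loss can be sketched in NumPy. The expectation values below are made-up numbers for illustration, not outputs of the circuit; the label order follows $y=(y_O, y_-, y_X)$ as defined above, and `predict` and `l2_loss` are hypothetical helper names.

```python
import numpy as np

LABELS = ["O wins", "draw", "X wins"]  # order of y = (y_O, y_-, y_X)


def predict(y_hat):
    # Predicted class = observable with the largest expectation value
    return LABELS[int(np.argmax(y_hat))]


def l2_loss(y_hats, ys):
    """Mean over the dataset of the squared l2 distance between prediction and label."""
    y_hats, ys = np.asarray(y_hats, dtype=float), np.asarray(ys, dtype=float)
    return np.mean(np.sum((y_hats - ys) ** 2, axis=1))


# Made-up expectation values for two games (illustrative, not model output)
y_hats = [[0.6, -0.2, -0.5], [-0.1, -0.3, 0.2]]
ys = [[1, -1, -1], [-1, 1, -1]]  # true labels: O wins, draw

print([predict(y) for y in y_hats])  # ['O wins', 'X wins']
print(l2_loss(y_hats, ys))
```

The second game is misclassified, and it contributes most of the loss.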


Let\'s now implement this!

First let's generate some games. Here we are creating a small program
that will play Noughts and Crosses against itself in a random fashion.
On completion, it returns the final board and the outcome, with
noughts as +1, draw as 0, and crosses as -1. There are 255,168 different
possible games (26,830 when board symmetries are taken into account),
but we will only sample a few hundred.


In [None]:
import torch
import random

# Fix seeds for reproducibility
torch.backends.cudnn.deterministic = True
torch.manual_seed(16)
random.seed(16)


#  create an empty board
def create_board():
    return torch.tensor([[0, 0, 0], [0, 0, 0], [0, 0, 0]])


# Check for empty places on board
def possibilities(board):
    l = []
    for i in range(len(board)):
        for j in range(3):
            if board[i, j] == 0:
                l.append((i, j))
    return l


# Select a random place for the player
def random_place(board, player):
    selection = possibilities(board)
    current_loc = random.choice(selection)
    board[current_loc] = player
    return board


# Check if there is a winner by having 3 in a row
def row_win(board, player):
    for x in range(3):
        win = True

        for y in range(3):
            if board[x, y] != player:
                win = False

        if win:
            break

    return win


# Check if there is a winner by having 3 in a column
def col_win(board, player):
    for x in range(3):
        win = True

        for y in range(3):
            if board[y, x] != player:
                win = False

        if win:
            break

    return win


# Check if there is a winner by having 3 along a diagonal
def diag_win(board, player):
    win1 = True
    win2 = True
    for x, y in [(0, 0), (1, 1), (2, 2)]:
        if board[x, y] != player:
            win1 = False

    for x, y in [(0, 2), (1, 1), (2, 0)]:
        if board[x, y] != player:
            win2 = False

    return win1 or win2


# Check if the win conditions have been met or if a draw has occurred
def evaluate_game(board):
    winner = None
    for player in [1, -1]:
        if row_win(board, player) or col_win(board, player) or diag_win(board, player):
            winner = player

    if torch.all(board != 0) and winner is None:
        winner = 0

    return winner


# Main function to start the game
def play_game():
    board, winner = create_board(), None
    while winner is None:
        for player in [1, -1]:
            board = random_place(board, player)
            winner = evaluate_game(board)
            if winner is not None:
                break

    return [board.flatten(), winner]


def create_dataset(size_for_each_winner):
    game_d = {-1: [], 0: [], 1: []}

    while min([len(v) for k, v in game_d.items()]) < size_for_each_winner:
        board, winner = play_game()
        if len(game_d[winner]) < size_for_each_winner:
            game_d[winner].append(board)

    res = []
    for winner, boards in game_d.items():
        res += [(board, winner) for board in boards]

    return res


NUM_TRAINING = 450
NUM_VALIDATION = 600

# Create datasets but with even numbers of each outcome
with torch.no_grad():
    dataset = create_dataset(NUM_TRAINING // 3)
    dataset_val = create_dataset(NUM_VALIDATION // 3)

Now let's create the symmetry-preserving circuit: its single-qubit and
two-qubit gates are grouped into the symmetry classes defined above, and
it returns the expectation values of the three invariant observables.


In [None]:
import pennylane as qml
import matplotlib.pyplot as plt

# Set up a nine-qubit system
dev = qml.device("default.qubit", wires=9)

ob_center = qml.PauliZ(4)
ob_corner = (qml.PauliZ(0) + qml.PauliZ(2) + qml.PauliZ(6) + qml.PauliZ(8)) * (1 / 4)
ob_edge = (qml.PauliZ(1) + qml.PauliZ(3) + qml.PauliZ(5) + qml.PauliZ(7)) * (1 / 4)


# Now let's encode the data in the following qubit models, first with symmetry
@qml.qnode(dev)
def circuit(x, p):

    qml.RX(x[0], wires=0)
    qml.RX(x[1], wires=1)
    qml.RX(x[2], wires=2)
    qml.RX(x[3], wires=3)
    qml.RX(x[4], wires=4)
    qml.RX(x[5], wires=5)
    qml.RX(x[6], wires=6)
    qml.RX(x[7], wires=7)
    qml.RX(x[8], wires=8)

    # Centre single-qubit rotation
    qml.RX(p[0], wires=4)
    qml.RY(p[1], wires=4)

    # Corner single-qubit rotation
    qml.RX(p[2], wires=0)
    qml.RX(p[2], wires=2)
    qml.RX(p[2], wires=6)
    qml.RX(p[2], wires=8)

    qml.RY(p[3], wires=0)
    qml.RY(p[3], wires=2)
    qml.RY(p[3], wires=6)
    qml.RY(p[3], wires=8)

    # Edge single-qubit rotation
    qml.RX(p[4], wires=1)
    qml.RX(p[4], wires=3)
    qml.RX(p[4], wires=5)
    qml.RX(p[4], wires=7)

    qml.RY(p[5], wires=1)
    qml.RY(p[5], wires=3)
    qml.RY(p[5], wires=5)
    qml.RY(p[5], wires=7)

    # Entangling two-qubit gates
    # circling the edge of the board
    qml.CRY(p[6], wires=[0, 1])
    qml.CRY(p[6], wires=[2, 1])
    qml.CRY(p[6], wires=[2, 5])
    qml.CRY(p[6], wires=[8, 5])
    qml.CRY(p[6], wires=[8, 7])
    qml.CRY(p[6], wires=[6, 7])
    qml.CRY(p[6], wires=[6, 3])
    qml.CRY(p[6], wires=[0, 3])

    # To the corners from the centre
    qml.CRY(p[7], wires=[4, 0])
    qml.CRY(p[7], wires=[4, 2])
    qml.CRY(p[7], wires=[4, 6])
    qml.CRY(p[7], wires=[4, 8])

    # To the centre from the edges
    qml.CRY(p[8], wires=[1, 4])
    qml.CRY(p[8], wires=[3, 4])
    qml.CRY(p[8], wires=[5, 4])
    qml.CRY(p[8], wires=[7, 4])

    # Order matches the prediction vector (<O_o>, <O_->, <O_x>) defined above
    return [qml.expval(ob_corner), qml.expval(ob_center), qml.expval(ob_edge)]


fig, ax = qml.draw_mpl(circuit)([0] * 9, [0] * 9)  # 9 data angles, 9 shared parameters

Let\'s also look at the same series of gates but this time they are
applied independently from one another, so we won\'t be preserving the
symmetries with our gate operations. Practically this also means more
parameters, as previously groups of gates were updated together.


In [None]:
@qml.qnode(dev)
def circuit_no_sym(x, p):

    qml.RX(x[0], wires=0)
    qml.RX(x[1], wires=1)
    qml.RX(x[2], wires=2)
    qml.RX(x[3], wires=3)
    qml.RX(x[4], wires=4)
    qml.RX(x[5], wires=5)
    qml.RX(x[6], wires=6)
    qml.RX(x[7], wires=7)
    qml.RX(x[8], wires=8)

    # Centre single-qubit rotation
    qml.RX(p[0], wires=4)
    qml.RY(p[1], wires=4)

    # Note in this circuit the parameters aren't all the same.
    # Previously they were identical to ensure they were applied
    # as one combined gate. The fact they can all vary independently
    # here means we aren't respecting the symmetry.

    # Corner single-qubit rotation
    qml.RX(p[2], wires=0)
    qml.RX(p[3], wires=2)
    qml.RX(p[4], wires=6)
    qml.RX(p[5], wires=8)

    qml.RY(p[6], wires=0)
    qml.RY(p[7], wires=2)
    qml.RY(p[8], wires=6)
    qml.RY(p[9], wires=8)

    # Edge single-qubit rotation
    qml.RX(p[10], wires=1)
    qml.RX(p[11], wires=3)
    qml.RX(p[12], wires=5)
    qml.RX(p[13], wires=7)

    qml.RY(p[14], wires=1)
    qml.RY(p[15], wires=3)
    qml.RY(p[16], wires=5)
    qml.RY(p[17], wires=7)

    # Entangling two-qubit gates
    # circling the edge of the board
    qml.CRY(p[18], wires=[0, 1])
    qml.CRY(p[19], wires=[2, 1])
    qml.CRY(p[20], wires=[2, 5])
    qml.CRY(p[21], wires=[8, 5])
    qml.CRY(p[22], wires=[8, 7])
    qml.CRY(p[23], wires=[6, 7])
    qml.CRY(p[24], wires=[6, 3])
    qml.CRY(p[25], wires=[0, 3])

    # To the corners from the centre
    qml.CRY(p[26], wires=[4, 0])
    qml.CRY(p[27], wires=[4, 2])
    qml.CRY(p[28], wires=[4, 6])
    qml.CRY(p[29], wires=[4, 8])

    # To the centre from the edges
    qml.CRY(p[30], wires=[1, 4])
    qml.CRY(p[31], wires=[3, 4])
    qml.CRY(p[32], wires=[5, 4])
    qml.CRY(p[33], wires=[7, 4])

    # Order matches the prediction vector (<O_o>, <O_->, <O_x>) defined above
    return [qml.expval(ob_corner), qml.expval(ob_center), qml.expval(ob_edge)]


fig, ax = qml.draw_mpl(circuit_no_sym)([0] * 9, [0] * 34)

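Note again how, though these circuits have a similar form to before,
they are parameterised differently. We need to feed the vector
$\hat{\boldsymbol{y}}$ made up of the expectation values of these three
operators into the loss function and use this to update our parameters.
First, we encode each game: the board entries are scaled to rotation
angles, and the outcome becomes a one-hot label vector with entries $\pm 1.$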


In [None]:
import math


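def encode_game(game):
    board, res = game
    # Map board entries +1 (nought), 0 (empty), -1 (cross) to RX angles 2pi/3, 0, -2pi/3
    x = board * (2 * math.pi) / 3
    # One-hot label y = (y_O, y_-, y_X): +1 for the true outcome, -1 otherwise
    if res == 1:  # noughts win
        y = [1, -1, -1]
    elif res == -1:  # crosses win
        y = [-1, -1, 1]
    else:  # draw
        y = [-1, 1, -1]
    return x, y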

Recall that the loss function we\'re interested in is
$\mathcal{L}(\mathcal{D})=\frac{1}{|\mathcal{D}|} \sum_{(\boldsymbol{g}, \boldsymbol{y}) \in \mathcal{D}}\|\hat{\boldsymbol{y}}(\boldsymbol{g})-\boldsymbol{y}\|_{2}^{2}.$
We need to define this and then we can begin our optimisation.


In [None]:
# calculate the mean square error for this classification problem
def cost_function(params, input, target):
    output = torch.stack([torch.hstack(circuit(x, params)) for x in input])
    vec = output - target
    sum_sqr = torch.sum(vec * vec, dim=1)
    return torch.mean(sum_sqr)

Let\'s now train our symmetry-preserving circuit on the data.


In [None]:
from torch import optim
import numpy as np

params = 0.01 * torch.randn(9)
params.requires_grad = True
opt = optim.Adam([params], lr=1e-2)


max_epoch = 15
max_step = 30
batch_size = 10

encoded_dataset = list(zip(*[encode_game(game) for game in dataset]))
encoded_dataset_val = list(zip(*[encode_game(game) for game in dataset_val]))


def accuracy(p, x_val, y_val):
    with torch.no_grad():
        y_val = torch.tensor(y_val)
        y_out = torch.stack([torch.hstack(circuit(x, p)) for x in x_val])
        acc = torch.sum(torch.argmax(y_out, axis=1) == torch.argmax(y_val, axis=1))
        return acc / len(x_val)


print(f"accuracy without training = {accuracy(params, *encoded_dataset_val)}")

x_dataset = torch.stack(encoded_dataset[0])
y_dataset = torch.tensor(encoded_dataset[1], requires_grad=False)

saved_costs_sym = []
saved_accs_sym = []
for epoch in range(max_epoch):
    rand_idx = torch.randperm(len(x_dataset))
    # Shuffled dataset
    x_dataset = x_dataset[rand_idx]
    y_dataset = y_dataset[rand_idx]

    costs = []

    for step in range(max_step):
        x_batch = x_dataset[step * batch_size : (step + 1) * batch_size]
        y_batch = y_dataset[step * batch_size : (step + 1) * batch_size]

        def opt_func():
            opt.zero_grad()
            loss = cost_function(params, x_batch, y_batch)
            costs.append(loss.item())
            loss.backward()
            return loss

        opt.step(opt_func)

    cost = np.mean(costs)
    saved_costs_sym.append(cost)

    if (epoch + 1) % 1 == 0:
        # Compute validation accuracy
        acc_val = accuracy(params, *encoded_dataset_val)
        saved_accs_sym.append(acc_val)

        res = [epoch + 1, cost, acc_val]
        print("Epoch: {:2d} | Loss: {:.3f} | Validation accuracy: {:.3f}".format(*res))

Now we train the non-symmetry preserving circuit.


In [None]:
params = 0.01 * torch.randn(34)
params.requires_grad = True
opt = optim.Adam([params], lr=1e-2)

# calculate mean square error for this classification problem


def cost_function_no_sym(params, input, target):
    output = torch.stack([torch.hstack(circuit_no_sym(x, params)) for x in input])
    vec = output - target
    sum_sqr = torch.sum(vec * vec, dim=1)
    return torch.mean(sum_sqr)


max_epoch = 15
max_step = 30
batch_size = 15

encoded_dataset = list(zip(*[encode_game(game) for game in dataset]))
encoded_dataset_val = list(zip(*[encode_game(game) for game in dataset_val]))


def accuracy_no_sym(p, x_val, y_val):
    with torch.no_grad():
        y_val = torch.tensor(y_val)
        y_out = torch.stack([torch.hstack(circuit_no_sym(x, p)) for x in x_val])
        acc = torch.sum(torch.argmax(y_out, axis=1) == torch.argmax(y_val, axis=1))
        return acc / len(x_val)


print(f"accuracy without training = {accuracy_no_sym(params, *encoded_dataset_val)}")


x_dataset = torch.stack(encoded_dataset[0])
y_dataset = torch.tensor(encoded_dataset[1], requires_grad=False)

saved_costs = []
saved_accs = []
for epoch in range(max_epoch):
    rand_idx = torch.randperm(len(x_dataset))
    # Shuffled dataset
    x_dataset = x_dataset[rand_idx]
    y_dataset = y_dataset[rand_idx]

    costs = []

    for step in range(max_step):
        x_batch = x_dataset[step * batch_size : (step + 1) * batch_size]
        y_batch = y_dataset[step * batch_size : (step + 1) * batch_size]

        def opt_func():
            opt.zero_grad()
            loss = cost_function_no_sym(params, x_batch, y_batch)
            costs.append(loss.item())
            loss.backward()
            return loss

        opt.step(opt_func)

    cost = np.mean(costs)
    saved_costs.append(cost)

    if (epoch + 1) % 1 == 0:
        # Compute validation accuracy
        acc_val = accuracy_no_sym(params, *encoded_dataset_val)
        saved_accs.append(acc_val)

        res = [epoch + 1, cost, acc_val]
        print("Epoch: {:2d} | Loss: {:.3f} | Validation accuracy: {:.3f}".format(*res))

Finally let\'s plot the results and see how the two training regimes
differ.


In [None]:
from matplotlib import pyplot as plt

plt.title("Validation accuracies")
plt.plot([float(a) for a in saved_accs_sym], "b", label="Symmetric")
plt.plot([float(a) for a in saved_accs], "g", label="Standard")

plt.ylabel("Validation accuracy (fraction correct)")
plt.xlabel("Epoch")
plt.legend()
plt.show()

What we can see then is that by paying attention to the symmetries
intrinsic to the learning problem and reflecting this in an equivariant
gate set we have managed to improve our learning accuracies, while also
using fewer parameters. While the symmetry-aware circuit clearly
outperforms the naive one, note that the learning accuracies in both
cases are far from ideal, given that this is a solved game.
So paying attention to symmetries definitely helps, but it also isn\'t a
magic bullet!


The use of symmetries in both quantum and classical machine learning is
a developing field, so we can expect new results to emerge over the
coming years. If you want to get involved, the references given below
are a great place to start.


# An equivariant graph embedding



A notorious problem when data comes in the form of graphs (think of
molecules or social media networks) is that the numerical
representation of a graph in a computer is not unique. For example, if
we describe a graph via an [adjacency
matrix](https://en.wikipedia.org/wiki/Adjacency_matrix) whose entries
contain the edge weights as off-diagonals and node weights on the
diagonal, any simultaneous permutation of rows and columns of this
matrix refers to the same graph.

![](Hands_on_8_images/adjacency-matrices.png)

For example, the graph in the image above is represented by each of the
two equivalent adjacency matrices. The top matrix can be transformed
into the bottom matrix by swapping the first row with the third row,
then the first column with the third column, then the new first row
with the second row, and finally the first column with the second
column.

But the number of such permutations grows factorially with the number of
nodes in the graph, which is even worse than an exponential growth!
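A small standalone sketch makes the growth concrete: it enumerates every simultaneous row-and-column permutation of an arbitrary 3-node adjacency matrix (all $3! = 6$ of them, which here give 6 different matrices for one and the same graph) and then compares $n!$ with $2^n$ for a few graph sizes. The example matrix is made up for illustration.

```python
import itertools
import math

import numpy as np

# An arbitrary symmetric 3-node adjacency matrix (node weights on the diagonal)
A = np.array([[0.5, 0.1, 0.7],
              [0.1, 0.2, 0.3],
              [0.7, 0.3, 0.9]])

# Apply every permutation simultaneously to rows and columns
perms = list(itertools.permutations(range(len(A))))
permuted = [A[np.ix_(p, p)] for p in perms]

# Count the distinct matrices produced; all of them describe the same graph
distinct = {m.tobytes() for m in permuted}
print(f"{len(perms)} permutations, {len(distinct)} distinct matrices")

# Factorial versus exponential growth in the number of nodes
for n in [3, 5, 10, 15]:
    print(f"n = {n:2d}: n! = {math.factorial(n):>17,d}   2^n = {2**n:>6,d}")
```

Because the three node weights are all different, every permutation yields a different matrix here; graphs with repeated weights can produce fewer distinct matrices.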

If we want computers to learn from graph data, we usually want our
models to \"know\" that all these permuted adjacency matrices refer to
the same object, so we do not waste resources on learning this property.
In mathematical terms, this means that the model should be in- or
equivariant (more about this distinction below) with respect to
permutations. This is the basic motivation of [Geometric Deep
Learning](https://geometricdeeplearning.com/), ideas of which have found
their way into quantum machine learning.

This tutorial shows how to implement an example of a trainable
permutation equivariant graph embedding as proposed in [Skolik et al.
(2022)](https://arxiv.org/pdf/2205.06109.pdf). The embedding maps the
adjacency matrix of an undirected graph with edge and node weights to a
quantum state, such that permutations of an adjacency matrix get mapped
to the same states *as long as we also permute the qubit registers in
the same fashion*.

## Permuted adjacency matrices describe the same graph


Let us first verify that permuted adjacency matrices really describe one
and the same graph. We also gain some useful data generation functions
for later.

First we create random adjacency matrices. The entry $a_{ij}$ of this
matrix corresponds to the weight of the edge between nodes $i$ and $j$
in the graph. We assume that graphs have no self-loops; instead, the
diagonal elements of the adjacency matrix are interpreted as node
weights (or \"node attributes\").

Taking the example of a Twitter user retweet network, the nodes would be
users, edge weights indicate how often two users retweet each other and
node attributes could indicate the follower count of a user.


In [None]:
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

rng = np.random.default_rng(4324234)

def create_data_point(n):
    """
    Returns a random undirected adjacency matrix of dimension (n,n). 
    The diagonal elements are interpreted as node attributes.
    """
    mat = rng.random((n, n))
    A = (mat + np.transpose(mat))/2    
    return np.round(A, decimals=2)

A = create_data_point(3)
print(A)

Let\'s also write a function to generate permuted versions of this
adjacency matrix.


In [None]:
def permute(A, permutation):
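    """
    Returns a copy of A with rows and columns permuted simultaneously.
    Row/column i of the result is row/column permutation[i] of A. For example,
    the permutation [1, 2, 0] moves old node 1 to position 0, old node 2 to
    position 1 and old node 0 to position 2.
    """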
    
    P = np.zeros((len(A), len(A)))
    for i,j in enumerate(permutation):
        P[i,j] = 1

    return P @ A @ np.transpose(P)

A_perm = permute(A, [1, 2, 0])
print(A_perm)
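The product $P A P^T$ in `permute` is equivalent to NumPy fancy indexing with `np.ix_`: entry $(i, j)$ of the result is entry $(\text{perm}[i], \text{perm}[j])$ of `A`, so row $i$ of the result holds old node `perm[i]`. A standalone check of this, with its own copy of `permute` and an arbitrary symmetric matrix:

```python
import numpy as np


def permute(A, permutation):
    """Same construction as above: P @ A @ P.T with P[i, permutation[i]] = 1."""
    P = np.zeros((len(A), len(A)))
    for i, j in enumerate(permutation):
        P[i, j] = 1
    return P @ A @ P.T


A = np.arange(9.0).reshape(3, 3)
A = (A + A.T) / 2  # make it symmetric, like an undirected adjacency matrix
perm = [1, 2, 0]

# Fancy indexing selects rows and columns in the order given by perm
A_idx = A[np.ix_(perm, perm)]

print(np.allclose(permute(A, perm), A_idx))  # True
# Entry (i, j) of the result is entry (perm[i], perm[j]) of A
print(A_idx[0, 0] == A[1, 1], A_idx[0, 2] == A[1, 0])  # True True
```

For larger graphs, `A[np.ix_(perm, perm)]` avoids building the $n \times n$ permutation matrix; this is an alternative design, not what the notebook itself uses.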

If we create `networkx` graphs from both adjacency matrices and plot
them, we see that they describe the same graph; only the randomly chosen
layout of the drawing may differ.


In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2)

# interpret diagonal of matrix as node attributes
node_labels = {n: A[n,n] for n in range(len(A))} 
np.fill_diagonal(A, np.zeros(len(A))) 

G1 = nx.Graph(A)
pos1=nx.spring_layout(G1)
nx.draw(G1, pos1, labels=node_labels, ax=ax1, node_size = 800, node_color = "#ACE3FF")
edge_labels = nx.get_edge_attributes(G1,'weight')
nx.draw_networkx_edge_labels(G1,pos1,edge_labels=edge_labels, ax=ax1)

# interpret diagonal of permuted matrix as node attributes
node_labels = {n: A_perm[n,n] for n in range(len(A_perm))}
np.fill_diagonal(A_perm, np.zeros(len(A)))

G2 = nx.Graph(A_perm)
pos2=nx.spring_layout(G2)
nx.draw(G2, pos2, labels=node_labels, ax=ax2, node_size = 800, node_color = "#ACE3FF")
edge_labels = nx.get_edge_attributes(G2,'weight')
nx.draw_networkx_edge_labels(G2,pos2,edge_labels=edge_labels, ax=ax2)

ax1.set_xlim([1.2*x for x in ax1.get_xlim()])
ax2.set_xlim([1.2*x for x in ax2.get_xlim()])
plt.tight_layout()
plt.show()
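The plots only show the equivalence visually. As a sketch, `networkx` can also confirm it: `nx.is_isomorphic` with `node_match` and `edge_match` searches for a node relabelling that preserves both node and edge weights. The helper `to_weighted_graph` is a hypothetical name introduced here, storing node weights in a `weight` node attribute is an assumption of this example, and the random matrix is independent of `A` above.

```python
import networkx as nx
import numpy as np
from networkx.algorithms.isomorphism import numerical_edge_match, numerical_node_match


def to_weighted_graph(A):
    """Hypothetical helper: edges carry off-diagonal weights, nodes carry diagonal weights."""
    G = nx.Graph()
    n = len(A)
    for i in range(n):
        G.add_node(i, weight=A[i, i])
    for i in range(n):
        for j in range(i):
            G.add_edge(i, j, weight=A[i, j])
    return G


rng = np.random.default_rng(0)
mat = rng.random((4, 4))
A = (mat + mat.T) / 2
perm = [2, 0, 3, 1]
A_perm = A[np.ix_(perm, perm)]  # simultaneous row/column permutation

same_graph = nx.is_isomorphic(
    to_weighted_graph(A),
    to_weighted_graph(A_perm),
    node_match=numerical_node_match("weight", 0.0),
    edge_match=numerical_edge_match("weight", 0.0),
)
print(same_graph)  # True
```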

Note: 

The issue of non-unique numerical representations of graphs ultimately
stems from the fact that the nodes in a graph do not have an intrinsic
order, and by labelling them in a numerical data structure like a matrix
we therefore impose an arbitrary order.


## Permutation equivariant embeddings


When we design a machine learning model that takes graph data, the first
step is to encode the adjacency matrix into a quantum state using an
embedding or quantum feature
map $\phi:$

$$A \rightarrow |\phi(A)\rangle .$$

We may want the resulting quantum state to be the same for all adjacency
matrices describing the same graph. In mathematical terms, this means
that $\phi$ is an *invariant* embedding with respect to simultaneous row
and column permutations $\pi(A)$ of the adjacency matrix:

$$|\phi(A) \rangle = |\phi(\pi(A))\rangle \;\; \text{ for all } \pi .$$

However, invariance is often too strong a constraint. Think for example
of an encoding that associates each node in the graph with a qubit. We
might want permutations of the adjacency matrix to lead to the same
state *up to an equivalent permutation of the qubits* $P_{\pi},$ where

$$P_{\pi} |q_1,\dots,q_n \rangle = |q_{\text{perm}_{\pi}(1)}, \dots, q_{\text{perm}_{\pi}(n)} \rangle .$$

The function $\text{perm}_{\pi}$ maps each index to the permuted index
according to $\pi.$

Note:

The operator $P_{\pi}$ is implemented by PennyLane's `qml.Permute`
class.


This results in an *equivariant* embedding with respect to permutations
of the adjacency matrix:

$$|\phi(A) \rangle = P_{\pi}|\phi(\pi(A))\rangle \;\; \text{ for all } \pi .$$

This is exactly what the following quantum embedding is aiming to do!
The mathematical details behind these concepts use group theory and are
beautiful, but can be a bit daunting. Have a look at [this
paper](https://arxiv.org/abs/2210.08566) if you want to learn more.

## Implementation in PennyLane


Let\'s get our hands dirty with an example. As mentioned, we will
implement the permutation-equivariant embedding suggested in [Skolik et
al. (2022)](https://arxiv.org/pdf/2205.06109.pdf) which has this
structure:

![](Hands_on_8_images/circuit1.png)

The image can be found in [Skolik et al.
(2022)](https://arxiv.org/pdf/2205.06109.pdf) and shows one layer of the
circuit. The $\epsilon$ are our edge weights while $\alpha$ describe the
node weights, and the $\beta,$ $\gamma$ are variational parameters.

In PennyLane this looks as follows:


In [None]:
import pennylane as qml

def perm_equivariant_embedding(A, betas, gammas):
    """
    Ansatz to embed a graph with node and edge weights into a quantum state.
    
    The adjacency matrix A contains the edge weights on the off-diagonal, 
    as well as the node attributes on the diagonal.
    
    The embedding contains trainable weights 'betas' and 'gammas'.
    """
    n_nodes = len(A)
    n_layers = len(betas) # infer the number of layers from the parameters
    
    # initialise in the plus state
    for i in range(n_nodes):
        qml.Hadamard(i)
    
    for l in range(n_layers):

        for i in range(n_nodes):
            for j in range(i):
                # factor of 2 because qml.IsingZZ(phi) = exp(-i * phi/2 * ZZ)
                qml.IsingZZ(2*gammas[l]*A[i,j], wires=[i,j]) 

        for i in range(n_nodes):
            qml.RX(A[i,i]*betas[l], wires=i)

We can use this ansatz in a circuit.


In [None]:
n_qubits = 5
n_layers = 2

dev = qml.device("lightning.qubit", wires=n_qubits)

@qml.qnode(dev)
def eqc(adjacency_matrix, observable, trainable_betas, trainable_gammas):
    """Circuit that uses the permutation equivariant embedding"""
    
    perm_equivariant_embedding(adjacency_matrix, trainable_betas, trainable_gammas)
    return qml.expval(observable)


A = create_data_point(n_qubits)
betas = rng.random(n_layers)
gammas = rng.random(n_layers)
observable = qml.PauliX(0) @ qml.PauliX(1) @ qml.PauliX(3)

qml.draw_mpl(eqc, decimals=2)(A, observable, betas, gammas)
plt.show()

Validating the equivariance
===========================

Let\'s now check if the circuit is really equivariant!

This is the expectation value we get using the original adjacency matrix
as an input:


In [None]:
result_A = eqc(A, observable, betas, gammas)
print("Model output for A:", result_A)

If we permute the adjacency matrix, this is what we get:


In [None]:
perm = [2, 3, 0, 1, 4]
A_perm = permute(A, perm)
result_Aperm = eqc(A_perm, observable, betas, gammas)
print("Model output for permutation of A: ", result_Aperm)

Why are the two values different? Well, we constructed an *equivariant*
ansatz, not an *invariant* one! Remember, an *invariant* ansatz means
that embedding a permutation of the adjacency matrix leads to the same
state as an embedding of the original matrix. An *equivariant* ansatz
embeds the permuted adjacency matrix into a state where the qubits are
permuted as well.

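As a result, the final state before measurement is only the same if we
permute the qubits in the same manner that we permute the input
adjacency matrix. We could insert a permutation operator
`qml.Permute(perm)` to achieve this, or we can simply permute the wires
of the observable! (Relabelling the wires with `perm` itself works here
because this particular `perm` only swaps 0 with 2 and 1 with 3, so it
is its own inverse.)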


In [None]:
observable_perm = qml.PauliX(perm[0]) @ qml.PauliX(perm[1]) @ qml.PauliX(perm[3])

Now everything should work out!


In [None]:
result_Aperm = eqc(A_perm, observable_perm, betas, gammas)
print("Model output for permutation of A, and with permuted observable: ", result_Aperm)

Et voilà!

## Conclusion


Equivariant graph embeddings can be combined with other equivariant
parts of a quantum machine learning pipeline (like measurements and the
cost function). [Skolik et al.
(2022)](https://arxiv.org/pdf/2205.06109.pdf), for example, use such a
pipeline as part of a reinforcement learning scheme that finds heuristic
solutions for the traveling salesman problem. Their simulations compare
a fully equivariant model to circuits that break permutation
equivariance and find that the equivariant model performs better,
confirming that if we know about structure in our data, we should try
to use this knowledge in machine learning.



# Quantum models as Fourier series




This demonstration is based on the paper *The effect of data encoding on
the expressive power of variational quantum machine learning models* by
[Schuld, Sweke, and Meyer (2020)](https://arxiv.org/abs/2008.08605).

![](Hands_on_8_images/scheme_thumb.png)


The paper links common quantum machine learning models designed for
near-term quantum computers to Fourier series (and, more generally, to
Fourier-type sums). With this link, the class of functions a quantum
model can learn (i.e., its "expressivity") can be characterized by the
model's control of the Fourier series' frequencies and coefficients.


Background
==========


The paper considers quantum machine learning models of the form

$$f_{\boldsymbol \theta}(x) = \langle 0| U^{\dagger}(x,\boldsymbol \theta) M U(x, \boldsymbol \theta) | 0 \rangle$$

where $M$ is a measurement observable and $U(x, \boldsymbol \theta)$ is
a variational quantum circuit that encodes a data input $x$ and depends
on a set of parameters $\boldsymbol \theta.$ Here we will restrict
ourselves to one-dimensional data inputs, but the paper motivates that
higher-dimensional features simply generalize to multi-dimensional
Fourier series.

The circuit itself repeats $L$ layers, each consisting of a
data-encoding circuit block $S(x)$ and a trainable circuit block
$W(\boldsymbol \theta)$ that is controlled by the parameters
$\boldsymbol \theta.$ The data encoding block consists of gates of the
form $\mathcal{G}(x) = e^{-ix H},$ where $H$ is a Hamiltonian. A
prominent example of such gates are Pauli rotations.


The paper shows how such a quantum model can be written as a
Fourier-type sum of the form

$$f_{ \boldsymbol \theta}(x) = \sum_{\omega \in \Omega} c_{\omega}( \boldsymbol \theta) \; e^{i  \omega x}.$$

As illustrated in the picture below (which is Figure 1 from the paper),
the \"encoding Hamiltonians\" in $S(x)$ determine the set $\Omega$ of
available \"frequencies\", and the remainder of the circuit, including
the trainable parameters, determines the coefficients $c_{\omega}.$


![](Hands_on_8_images/scheme.png)


The paper demonstrates many of its findings for circuits in which
$\mathcal{G}(x)$ is a single-qubit Pauli rotation gate. For example, it
shows that $r$ repetitions of a Pauli rotation-encoding gate in
\"sequence\" (on the same qubit, but with multiple layers $r=L$) or in
\"parallel\" (on $r$ different qubits, with $L=1$) creates a quantum
model that can be expressed as a *Fourier series* of the form

$$f_{ \boldsymbol \theta}(x) = \sum_{n \in \Omega} c_{n}(\boldsymbol \theta) e^{i  n x},$$

where $\Omega = \{ -r, \dots, -1, 0, 1, \dots, r\}$ is a spectrum of
consecutive integer-valued frequencies up to degree $r.$

As a result, we expect quantum models that encode an input $x$ by $r$
Pauli rotations to only be able to fit Fourier series of at most degree
$r.$
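The link between encoding Hamiltonian and spectrum can be made explicit: in the paper's analysis, the frequencies are all differences $\Lambda_k - \Lambda_j$ of sums of eigenvalues of the encoding generators. For a Pauli rotation $e^{-ix\sigma/2}$ the generator $\sigma/2$ has eigenvalues $\pm 1/2$, so $r$ repetitions yield the integers from $-r$ to $r$. A brute-force NumPy sketch, only meant for small $r$ (the function name is illustrative):

```python
import itertools

import numpy as np


def pauli_encoding_spectrum(r):
    """Frequencies accessible to r Pauli-rotation encodings exp(-i x sigma/2).

    Each generator sigma/2 has eigenvalues -1/2 and +1/2; the frequencies are
    all differences of sums of one eigenvalue per encoding gate.
    """
    eigvals = np.linalg.eigvalsh(0.5 * np.array([[0.0, 1.0], [1.0, 0.0]]))  # X/2
    sums = {round(float(sum(c)), 10) for c in itertools.product(eigvals, repeat=r)}
    return sorted({round(a - b, 10) for a in sums for b in sums})


for r in [1, 2, 3]:
    print(r, pauli_encoding_spectrum(r))
```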


Goal of this demonstration
==========================


The experiments below investigate this \"Fourier-series\"-like nature of
quantum models by showing how to reproduce the simulations underlying
Figures 3, 4 and 5 in Section II of the paper:

-   **Figures 3 and 4** are function-fitting experiments, where quantum
    models with different encoding strategies have the task to fit
    Fourier series up to a certain degree. As in the paper, we will use
    examples of qubit-based quantum circuits where a single data feature
    is encoded via Pauli rotations.
-   **Figure 5** plots the Fourier coefficients of randomly sampled
    instances from a family of quantum models which is defined by some
    parametrized ansatz.

The code is presented so you can easily modify it in order to play
around with other settings and models. The settings used in the paper
are given in the various subsections.


First of all, let\'s make some imports and define a standard loss
function for the training.


In [None]:
import matplotlib.pyplot as plt
import pennylane as qml
from pennylane import numpy as np

np.random.seed(42)


def square_loss(targets, predictions):
    loss = 0
    for t, p in zip(targets, predictions):
        loss += (t - p) ** 2
    loss = loss / len(targets)
    return 0.5 * loss

Part I: Fitting Fourier series with serial Pauli-rotation encoding
==================================================================


First we will reproduce Figures 3 and 4 from the paper. These show how
quantum models that use Pauli rotations as data-encoding gates can only
fit Fourier series up to a certain degree. The degree corresponds to the
number of times that the Pauli gate gets repeated in the quantum model.

Let us consider circuits where the encoding gate gets repeated
sequentially (as in Figure 2a of the paper). For simplicity we will only
look at single-qubit circuits:

![](Hands_on_8_images/single_qubit_model.png)


Define a target function
========================


We first define a (classical) target function which will be used as a
\"ground truth\" that the quantum model has to fit. The target function
is constructed as a Fourier series of a specific degree.

We also allow for a rescaling of the data by a hyperparameter `scaling`,
which we will do in the quantum model as well. As shown in the paper,
for the quantum model to learn the target function in the experiment
below, the scaling of the quantum model and the target function have to
match, which is an important observation for the design of quantum
machine learning models.


In [None]:
degree = 1  # degree of the target function
scaling = 1  # scaling of the data
coeffs = [0.15 + 0.15j] * degree  # coefficients of non-zero frequencies
coeff0 = 0.1  # coefficient of zero frequency


def target_function(x):
    """Generate a truncated Fourier series, where the data gets re-scaled."""
    res = coeff0
    for idx, coeff in enumerate(coeffs):
        exponent = np.complex128(scaling * (idx + 1) * x * 1j)
        conj_coeff = np.conjugate(coeff)
        res += coeff * np.exp(exponent) + conj_coeff * np.exp(-exponent)
    return np.real(res)
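Because each conjugate pair satisfies $c_n e^{inx} + \overline{c_n} e^{-inx} = 2\,\mathrm{Re}(c_n)\cos(nx) - 2\,\mathrm{Im}(c_n)\sin(nx)$, the default degree-1 target is simply $0.1 + 0.3\cos x - 0.3\sin x$. A standalone check in plain NumPy, re-declaring the default coefficients:

```python
import numpy as np

coeff0 = 0.1
coeff = 0.15 + 0.15j  # the single non-zero frequency of the default settings
scaling = 1


def target_degree_one(x):
    """Degree-1 target: c0 + c e^{ix} + conj(c) e^{-ix}, real by construction."""
    z = np.exp(1j * scaling * x)
    return np.real(coeff0 + coeff * z + np.conj(coeff) * np.conj(z))


x = np.linspace(-6, 6, 70)
closed_form = coeff0 + 2 * coeff.real * np.cos(scaling * x) - 2 * coeff.imag * np.sin(scaling * x)
print(np.allclose(target_degree_one(x), closed_form))  # True
```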

Let\'s have a look at it.


In [None]:
x = np.linspace(-6, 6, 70, requires_grad=False)
target_y = np.array([target_function(x_) for x_ in x], requires_grad=False)

plt.plot(x, target_y, c="black")
plt.scatter(x, target_y, facecolor="white", edgecolor="black")
plt.ylim(-1, 1)
plt.show()

Note:

To reproduce the figures in the paper, you can use the following
settings in the cells above:

-   For the settings

        degree = 1
        coeffs = [0.15 + 0.15j] * degree
        coeff0 = 0.1

    this function is the ground truth
    $g(x) = \sum_{n=-1}^1 c_{n} e^{-nix}$ from Figure 3 in the paper.

-   To get the ground truth $g'(x) = \sum_{n=-2}^2 c_{n} e^{-nix}$ with
    $c_0=0.1,$ $c_1 = c_2 = 0.15 - 0.15i$ from Figure 3, you need to
    increase the degree to two:

        degree = 2

-   The ground truth from Figure 4 can be reproduced by changing the
    settings to:

        degree = 5
        coeffs = [0.05 + 0.05j] * degree
        coeff0 = 0.0


Define the serial quantum model
===============================


We now define the quantum model itself.


In [None]:
scaling = 1

dev = qml.device("default.qubit", wires=1)


def S(x):
    """Data-encoding circuit block."""
    qml.RX(scaling * x, wires=0)


def W(theta):
    """Trainable circuit block."""
    qml.Rot(theta[0], theta[1], theta[2], wires=0)


@qml.qnode(dev)
def serial_quantum_model(weights, x):

    for theta in weights[:-1]:
        W(theta)
        S(x)

    # (L+1)'th unitary
    W(weights[-1])

    return qml.expval(qml.PauliZ(wires=0))

You can run the following cell multiple times, each time sampling
different weights, and therefore different quantum models.


In [None]:
r = 1  # number of times the encoding gets repeated (here equal to the number of layers)
weights = (
    2 * np.pi * np.random.random(size=(r + 1, 3), requires_grad=True)
)  # some random initial weights

x = np.linspace(-6, 6, 70, requires_grad=False)
random_quantum_model_y = [serial_quantum_model(weights, x_) for x_ in x]

plt.plot(x, random_quantum_model_y, c="blue")
plt.ylim(-1, 1)
plt.show()

No matter what weights are picked, the single-qubit model with `r = 1`
(a single layer, $L=1$) will always be a sine function of a fixed
frequency. The weights merely influence the amplitude, vertical offset,
and phase of the sine.

This observation is formally derived in Section II.A of the paper.


Note:

You can increase the number of layers by changing `r` in the cell
above. Figure 4 from the paper, for example, uses the settings `L=1`,
`L=3` and `L=5`.


Finally, let\'s look at the circuit we just created:


In [None]:
print(qml.draw(serial_quantum_model)(weights, x[-1]))

Fit the model to the target
===========================


The next step is to optimize the weights in order to fit the ground
truth.


In [None]:
def cost(weights, x, y):
    predictions = [serial_quantum_model(weights, x_) for x_ in x]
    return square_loss(y, predictions)


max_steps = 50
opt = qml.AdamOptimizer(0.3)
batch_size = 25
cst = [cost(weights, x, target_y)]  # initial cost

for step in range(max_steps):

    # Select batch of data
    batch_index = np.random.randint(0, len(x), (batch_size,))
    x_batch = x[batch_index]
    y_batch = target_y[batch_index]

    # Update the weights by one optimizer step
    weights, _, _ = opt.step(cost, weights, x_batch, y_batch)

    # Save, and possibly print, the current cost
    c = cost(weights, x, target_y)
    cst.append(c)
    if (step + 1) % 10 == 0:
        print("Cost at step {0:3}: {1}".format(step + 1, c))

To continue training, you may just run the above cell again. Once you
are happy, you can use the trained model to predict function values, and
compare them with the ground truth.


In [None]:
predictions = [serial_quantum_model(weights, x_) for x_ in x]

plt.plot(x, target_y, c="black")
plt.scatter(x, target_y, facecolor="white", edgecolor="black")
plt.plot(x, predictions, c="blue")
plt.ylim(-1, 1)
plt.show()

Let\'s also have a look at the cost during training.


In [None]:
plt.plot(range(len(cst)), cst)
plt.ylabel("Cost")
plt.xlabel("Step")
plt.ylim(0, 0.23)
plt.show()

With the initial settings and enough training steps, the quantum model
learns to fit the ground truth almost perfectly. This is expected, since
the number of Pauli-rotation-encoding gates and the degree of the ground
truth Fourier series are both one.

If the ground truth's degree is larger than the number of layers in the
quantum model, the fit will look much less accurate. Finally, the data
also needs the correct scaling: if the `scaling` parameter of the
quantum model differs from that of the target function (which
effectively rescales the frequencies), fitting does not work even with
enough encoding repetitions.


Note:

You will find that the training takes much longer, and needs a lot more
steps to converge for larger L. Some initial weights may not even
converge to a good solution at all; the training seems to get stuck in a
local minimum.

It is an open research question whether for asymptotically large L, the
single qubit model can fit *any* function by constructing arbitrary
Fourier coefficients.



Part II: Fitting Fourier series with parallel Pauli-rotation encoding
=====================================================================


Our next task is to repeat the function-fitting experiment for a circuit
where the Pauli rotation gate gets repeated $r$ times on *different*
qubits, using a single layer $L=1.$

As shown in the paper, we expect similar results to the serial model: a
Fourier series of degree $r$ can only be fitted if there are at least
$r$ repetitions of the encoding gate in the quantum model. However, in
practice this experiment is a bit harder, since the dimension of the
trainable unitaries $W$ grows quickly with the number of qubits.

In the paper, the investigations are made with the assumption that the
purple trainable blocks $W$ are arbitrary unitaries. We could use the
`qml.ArbitraryUnitary` template, but since this template requires a
number of parameters that grows exponentially with the number of qubits
($4^n-1$ for $n$ qubits, to be precise), this quickly becomes
cumbersome to train.

We therefore follow Figure 4 in the paper and use an ansatz for $W.$


![](Hands_on_8_images/parallel_model.png)


Define the parallel quantum model
=================================


The ansatz is PennyLane's layer structure called
`qml.StronglyEntanglingLayers`, and as the name suggests, it has itself
a user-defined number of layers (which we will call "ansatz layers" to
avoid confusion).


In [None]:
from pennylane.templates import StronglyEntanglingLayers

Let\'s have a quick look at the ansatz itself for 3 qubits by making a
dummy circuit of 2 ansatz layers:


In [None]:
n_ansatz_layers = 2
n_qubits = 3

dev = qml.device("default.qubit", wires=n_qubits)


@qml.qnode(dev)
def ansatz(weights):
    StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.Identity(wires=0))


weights_ansatz = 2 * np.pi * np.random.random(size=(n_ansatz_layers, n_qubits, 3))
print(qml.draw(ansatz, level="device")(weights_ansatz))

Now we define the actual quantum model.


In [None]:
scaling = 1
r = 3

dev = qml.device("default.qubit", wires=r)


def S(x):
    """Data-encoding circuit block."""
    for w in range(r):
        qml.RX(scaling * x, wires=w)


def W(theta):
    """Trainable circuit block."""
    StronglyEntanglingLayers(theta, wires=range(r))


@qml.qnode(dev)
def parallel_quantum_model(weights, x):

    W(weights[0])
    S(x)
    W(weights[1])

    return qml.expval(qml.PauliZ(wires=0))

Again, you can sample random weights and plot the model function:


In [None]:
trainable_block_layers = 3
weights = 2 * np.pi * np.random.random(size=(2, trainable_block_layers, r, 3), requires_grad=True)

x = np.linspace(-6, 6, 70, requires_grad=False)
random_quantum_model_y = [parallel_quantum_model(weights, x_) for x_ in x]

plt.plot(x, random_quantum_model_y, c="blue")
plt.ylim(-1, 1)
plt.show()

Training the model
==================


Training the model is done exactly as before, but it may take a lot
longer this time. We set a default of 70 steps, which you should
increase if necessary. Small models of fewer than 6 qubits usually
converge after a few hundred steps at most, but this depends on your
settings.


In [None]:
def cost(weights, x, y):
    predictions = [parallel_quantum_model(weights, x_) for x_ in x]
    return square_loss(y, predictions)


max_steps = 70
opt = qml.AdamOptimizer(0.3)
batch_size = 25
cst = [cost(weights, x, target_y)]  # initial cost

for step in range(max_steps):

    # select batch of data
    batch_index = np.random.randint(0, len(x), (batch_size,))
    x_batch = x[batch_index]
    y_batch = target_y[batch_index]

    # update the weights by one optimizer step
    weights, _, _ = opt.step(cost, weights, x_batch, y_batch)

    # save, and possibly print, the current cost
    c = cost(weights, x, target_y)
    cst.append(c)
    if (step + 1) % 10 == 0:
        print("Cost at step {0:3}: {1}".format(step + 1, c))

In [None]:
predictions = [parallel_quantum_model(weights, x_) for x_ in x]

plt.plot(x, target_y, c="black")
plt.scatter(x, target_y, facecolor="white", edgecolor="black")
plt.plot(x, predictions, c="blue")
plt.ylim(-1, 1)
plt.show()

In [None]:
plt.plot(range(len(cst)), cst)
plt.ylabel("Cost")
plt.xlabel("Step")
plt.show()

Note:

To reproduce the right column in Figure 4 from the paper, use the
correct ground truth, $r=3$ and `trainable_block_layers=3`, as well as
sufficiently many training steps. The number of steps depends on the
initial weights and other hyperparameters, and in some settings
training may not converge to zero error at all.


Part III: Sampling Fourier coefficients
=======================================


When we use a trainable ansatz above, it is possible that even with
enough repetitions of the data-encoding Pauli rotation, the quantum
model cannot fit the target function, since the expressivity of quantum
models also depends on the Fourier coefficients the model can create.

Figure 5 in the paper shows Fourier coefficients from quantum models
sampled from a model family defined by an ansatz for the trainable
circuit block. For this we need a function that numerically computes
the Fourier coefficients of a periodic function $f$ with period $2 \pi.$


In [None]:
def fourier_coefficients(f, K):
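    """
    Computes the Fourier coefficients c_0, ..., c_K of a real-valued 2*pi
    periodic function f from 2*K+1 equally spaced samples. Since f is real,
    the negative coefficients follow from c_{-k} = conj(c_k).
    """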
    n_coeffs = 2 * K + 1
    t = np.linspace(0, 2 * np.pi, n_coeffs, endpoint=False)
    y = np.fft.rfft(f(t)) / t.size
    return y
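Since the input is real, `rfft` only returns the non-negative coefficients $c_0, \dots, c_K$; the negative ones follow from $c_{-n} = \overline{c_n}$. A quick sanity check on a known trigonometric polynomial, using plain NumPy instead of PennyLane's wrapped NumPy (the test function `g` is arbitrary):

```python
import numpy as np


def fourier_coefficients(f, K):
    """Coefficients c_0, ..., c_K of a real 2*pi-periodic f, sampled at 2K+1 points."""
    n_coeffs = 2 * K + 1
    t = np.linspace(0, 2 * np.pi, n_coeffs, endpoint=False)
    return np.fft.rfft(f(t)) / t.size


def g(t):
    # 0.1 + c1 e^{it} + conj(c1) e^{-it} with c1 = 0.15 + 0.15j,
    # i.e. 0.1 + 0.3 cos(t) - 0.3 sin(t)
    return 0.1 + 0.3 * np.cos(t) - 0.3 * np.sin(t)


c = fourier_coefficients(g, K=2)
print(np.round(c, 6))  # c_0, c_1, c_2
```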

Define your quantum model
=========================


Now we need to define a quantum model. This could be any model, using a
qubit or continuous-variable circuit, or one of the quantum models from
above. We will use a slight variation of `parallel_quantum_model()`
from above, this time using the `qml.BasicEntanglerLayers` ansatz:


In [None]:
from pennylane.templates import BasicEntanglerLayers

scaling = 1
n_qubits = 4

dev = qml.device("default.qubit", wires=n_qubits)


def S(x):
    """Data encoding circuit block."""
    for w in range(n_qubits):
        qml.RX(scaling * x, wires=w)


def W(theta):
    """Trainable circuit block."""
    BasicEntanglerLayers(theta, wires=range(n_qubits))


@qml.qnode(dev)
def quantum_model(weights, x):

    W(weights[0])
    S(x)
    W(weights[1])

    return qml.expval(qml.PauliZ(wires=0))

It will also be handy to define a function that samples different random
weights of the correct size for the model.


In [None]:
n_ansatz_layers = 1


def random_weights():
    return 2 * np.pi * np.random.random(size=(2, n_ansatz_layers, n_qubits))

Now we can compute the first few Fourier coefficients for samples from
this model. The samples are created by randomly sampling different
parameters using the `random_weights()` function.


In [None]:
n_coeffs = 5
n_samples = 100


coeffs = []
for i in range(n_samples):

    weights = random_weights()

    def f(x):
        return np.array([quantum_model(weights, x_) for x_ in x])

    coeffs_sample = fourier_coefficients(f, n_coeffs)
    coeffs.append(coeffs_sample)

coeffs = np.array(coeffs)
coeffs_real = np.real(coeffs)
coeffs_imag = np.imag(coeffs)

Let\'s plot the real vs. the imaginary part of the coefficients. As a
sanity check, the $c_0$ coefficient should be real, and therefore have
no contribution on the y-axis.


In [None]:
n_coeffs = len(coeffs_real[0])

fig, ax = plt.subplots(1, n_coeffs, figsize=(15, 4))

for idx, ax_ in enumerate(ax):
    ax_.set_title(r"$c_{}$".format(idx))
    ax_.scatter(
        coeffs_real[:, idx],
        coeffs_imag[:, idx],
        s=20,
        facecolor="white",
        edgecolor="red",
    )
    ax_.set_aspect("equal")
    ax_.set_ylim(-1, 1)
    ax_.set_xlim(-1, 1)


plt.tight_layout(pad=0.5)
plt.show()

Playing around with different quantum models, you will find that some
quantum models create different distributions over the coefficients than
others. For example `BasicEntanglerLayers` (with the default Pauli-X
rotation) seems to have a structure that forces the even Fourier
coefficients to zero, while `StronglyEntanglingLayers` will have a
non-zero variance for all supported coefficients.

Note also how the variance of the distribution decreases for growing
orders of the coefficients, an effect linked to the convergence of a
Fourier series.


Note:

To reproduce the results from Figure 5, change the trainable block $W$
to no unitary at all, to `BasicEntanglerLayers`, or to
`StronglyEntanglingLayers`, and set `n_ansatz_layers` to either $1$ or
$5.$ The `StronglyEntanglingLayers` template requires weights of shape
`size=(2, n_ansatz_layers, n_qubits, 3)`.


Continuous-variable model
=========================

The paper mentions that a phase rotation in continuous-variable quantum
computing has a spectrum that supports *all* Fourier frequencies. To
play with this model, we finally show you the code for a
continuous-variable circuit. For example, to see its Fourier
coefficients, run the cell below and then re-run the two cells above.


In [None]:
var = 2
n_ansatz_layers = 1
dev_cv = qml.device("default.gaussian", wires=1)


def S(x):
    qml.Rotation(x, wires=0)


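def W(theta):
    """Trainable circuit block: one displacement and one squeezing gate per layer."""
    for r_ in range(n_ansatz_layers):
        # each ansatz layer uses its own four parameters
        a, phi_a, r_sq, phi_sq = theta[4 * r_ : 4 * r_ + 4]
        qml.Displacement(a, phi_a, wires=0)
        qml.Squeezing(r_sq, phi_sq, wires=0)


@qml.qnode(dev_cv)
def quantum_model(weights, x):
    W(weights[0])
    S(x)
    W(weights[1])
    return qml.expval(qml.QuadX(wires=0))


def random_weights():
    # four parameters (displacement and squeezing) per ansatz layer
    return np.random.normal(size=(2, 4 * n_ansatz_layers), loc=0, scale=var)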

**Note:**


To find out what effect so-called "non-Gaussian" gates like the `Kerr`
gate have, you need to install the [strawberryfields
plugin](https://pennylane-sf.readthedocs.io/en/latest/) and change the
device to

``` {.python}
dev_cv = qml.device('strawberryfields.fock', wires=1, cutoff_dim=50)
```



## Equivariant quantum machine learning

In the following, we will denote elements of a symmetry group $G$ by
$g \in G.$ $G$ could be, for instance, the rotation group $SO(3)$ or the
permutation group $S_n.$ Groups are often more easily understood in terms of
their representations $V_g : \mathcal{V} \rightarrow \mathcal{V},$ which
map group elements to invertible linear operations, i.e., to $GL(n),$ on
some vector space $\mathcal{V}.$ We call a function
$f: \mathcal{V} \rightarrow \mathcal{W}$ *invariant* with respect to the
action of the group if

$$f(V_g(v)) = f(v),  \text{  for all } g \in G.$$

The concept of *equivariance* is a bit weaker, as it only requires the
function to *commute* with the group action, instead of remaining
constant. In mathematical terms, we require that

$$f(V_g(v)) = \mathcal{R}_g(f(v)),  \text{  for all } g \in G,$$

with $\mathcal{R}$ being a representation of $G$ on the vector space
$\mathcal{W}.$ These concepts are important in machine learning, as they
tell us how the internal structure of the data, described by the group,
is conserved when passing through the model. In the remainder, we will
refer to $\mathcal{V}$ and $V_g$ as the data space and the
representation on it, respectively, and $\mathcal{W}$ and
$\mathcal{R}_g$ as the qubit space and the symmetry action on it,
respectively.
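As a quick numerical sanity check of these two definitions, the sketch below takes $G = SO(3)$ acting on $\mathbb{R}^3$ by matrix multiplication. The functions `norm_fn` and `scaled_fn` are hypothetical examples chosen only for illustration: the Euclidean norm is invariant, while $f(\vec v) = \lVert\vec v\rVert\,\vec v$ is equivariant with $\mathcal{R}_g = V_g$. A random rotation is generated as the matrix exponential of a random skew-symmetric matrix.

```python
import numpy as np
from scipy.linalg import expm


def random_rotation(rng):
    """Random element of SO(3): the exponential of a random skew-symmetric matrix."""
    a, b, c = rng.normal(size=3)
    K = np.array([[0.0, -c, b], [c, 0.0, -a], [-b, a, 0.0]])
    return expm(K)


def norm_fn(v):
    """Invariant example: the length of a vector is unchanged by rotations."""
    return np.linalg.norm(v)


def scaled_fn(v):
    """Equivariant example: rotating the input rotates the output in the same way."""
    return np.linalg.norm(v) * v


rng = np.random.default_rng(0)
v = rng.normal(size=3)
V_g = random_rotation(rng)

invariant_ok = np.isclose(norm_fn(V_g @ v), norm_fn(v))               # f(V_g v) = f(v)
equivariant_ok = np.allclose(scaled_fn(V_g @ v), V_g @ scaled_fn(v))  # f(V_g v) = R_g f(v)
print(invariant_ok, equivariant_ok)  # True True
```

Invariance is the special case of equivariance in which $\mathcal{R}_g$ is the identity on $\mathcal{W}$.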

Now that we have the basics, we will focus on the task at hand: building
an equivariant quantum neural network for chemistry!

We use a [quantum reuploading
model](https://pennylane.ai/qml/demos/tutorial_expressivity_fourier_series/),
which consists of a variational ansatz $M_\Theta(\mathcal{X})$ applied
to some initial state $|\psi_0\rangle.$ Here, $\mathcal{X}$ denotes the
description of a molecular configuration, i.e., the set of Cartesian
coordinates of the atoms. The quantum circuit is given by

$$M_\Theta(\mathcal{X}) = \left[ \prod_{d=D}^1 \Phi(\mathcal{X}) \mathcal{U}_d(\vec{\theta}_d) \right] \Phi(\mathcal{X}),$$

and depends on both data $\mathcal{X}$ and trainable parameters
$\Theta = \{\vec{\theta}_d\}_{d=1}^D.$ It is built by interleaving
parametrized trainable layers $\mathcal{U}_d(\vec{\theta}_d)$ with data encoding
layers $\Phi(\mathcal{X}).$ The corresponding quantum function
$f_{\Theta}(\mathcal{X})$ is then given by the expectation value of a
chosen observable $O$

$$f_\Theta(\mathcal{X}) = \langle \psi_0 | M_\Theta(\mathcal{X})^\dagger O M_\Theta(\mathcal{X}) |\psi_0 \rangle.$$
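To make the structure of $M_\Theta(\mathcal{X})$ and $f_\Theta(\mathcal{X})$ concrete, here is a minimal NumPy sketch for a *single* qubit and a scalar input $x$. All choices are illustrative assumptions and not the model used later in this demo: $\Phi(x) = e^{-ixX/2}$, $\mathcal{U}_d(\theta_d) = e^{-i\theta_d Y/2}$, $|\psi_0\rangle = |0\rangle$ and $O = Z$. The function name `reuploading_model` is our own.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def encoding(x):
    """Illustrative data-encoding layer Phi(x) = exp(-i x X / 2)."""
    return expm(-0.5j * x * X)


def trainable(theta):
    """Illustrative trainable layer U_d(theta) = exp(-i theta Y / 2)."""
    return expm(-0.5j * theta * Y)


def reuploading_model(thetas, x, psi0=None, obs=Z):
    """f(x) = <psi0| M^dagger O M |psi0> with M = [prod_{d=D}^{1} Phi(x) U_d] Phi(x)."""
    if psi0 is None:
        psi0 = np.array([1, 0], dtype=complex)  # |0>
    M = encoding(x)  # rightmost factor: the initial encoding layer
    for theta in thetas:  # d = 1, ..., D; later layers multiply from the left
        M = encoding(x) @ trainable(theta) @ M
    psi = M @ psi0
    return float(np.real(np.conj(psi) @ obs @ psi))


print(reuploading_model([], 0.3))          # cos(0.3): one encoding
print(reuploading_model([0.0], 0.3))       # cos(0.6): two encodings, U_1 = identity
print(reuploading_model([0.4, 1.1], 0.3))
```

With this choice of generator, $D+1$ encoding layers give a Fourier series in $x$ with integer frequencies up to $D+1$, which is why re-uploading increases expressivity.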

For the cases of a diatomic molecule (e.g. $LiH$) and a triatomic
molecule of two atom types (e.g. $H_2O$), panel (a) of the following
figure displays the descriptions of the chemical systems by the
Cartesian coordinates of their atoms, while the general circuit
formulation of the corresponding symmetry-invariant VQLM for these cases
is shown in panel (b). Note that we will only consider the triatomic
molecule $H_2O$ in the rest of this demo.

![](Hands_on_8_images/siVQLM_monomer.jpg)

An overall invariant model is composed of four ingredients: an invariant
initial state, an equivariant encoding layer, equivariant trainable
layers, and finally an invariant observable. Here, equivariant encoding
means that applying the symmetry transformation first on the atomic
configuration $\mathcal{X}$ and then encoding it into the qubits
produces the same results as first encoding $\mathcal{X}$ and then
letting the symmetry act on the qubits, i.e.,

$$\Phi(V_g[\mathcal{X}]) = \mathcal{R}_g \Phi(\mathcal{X}) \mathcal{R}_g^\dagger,$$

where $V_g$ and $\mathcal{R}_g$ denote the symmetry representation on
the data and qubit level, respectively.

For the trainable layer, equivariance means that the order of applying
the symmetry and the parametrized operations does not matter:

$$\left[\mathcal{U}_d(\vec{\theta}_d), \mathcal{R}_g\right]=0.$$

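Furthermore, we need an invariant observable
$O = \mathcal{R}_g O \mathcal{R}_g^\dagger$ and an invariant initial state
$|\psi_0\rangle = \mathcal{R}_g |\psi_0\rangle,$ i.e., objects that absorb
the symmetry action. Putting all this together, the two equivariance
conditions give $M_\Theta(V_g[\mathcal{X}]) = \mathcal{R}_g M_\Theta(\mathcal{X}) \mathcal{R}_g^\dagger,$
and therefore

$$f_\Theta(V_g[\mathcal{X}]) = \langle \psi_0 | \mathcal{R}_g M_\Theta(\mathcal{X})^\dagger \mathcal{R}_g^\dagger O \mathcal{R}_g M_\Theta(\mathcal{X}) \mathcal{R}_g^\dagger |\psi_0 \rangle = f_\Theta(\mathcal{X}),$$

so the resulting VQLM is symmetry-invariant, as required.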

In this demo, we will consider the example of a triatomic molecule of
two atom types, such as a water molecule. In this case, the system is
invariant under translations, rotations, and the exchange of the two
hydrogen atoms. Translational symmetry is included by taking the central
atom as the origin. Therefore, we only need to encode the coordinates of
the two identical *active* atoms, which we will call $\vec{x}_1$ and
$\vec{x}_2.$

Let's implement the model depicted above!


# Implementation of the VQLM

We start by importing the libraries that we will need.


In [None]:
import pennylane as qml
import numpy as np

import jax

jax.config.update("jax_platform_name", "cpu")
jax.config.update("jax_enable_x64", True)

from jax import numpy as jnp

import scipy
import matplotlib.pyplot as plt
import sklearn

Let us construct Pauli matrices, which are used to build the
Hamiltonian.


In [None]:
X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1.0j], [1.0j, 0]])
Z = np.array([[1, 0], [0, -1]])

sigmas = jnp.array(np.array([X, Y, Z]))  # Vector of Pauli matrices
sigmas_sigmas = jnp.array(
    np.array(
        [
            np.kron(X, X),
            np.kron(Y, Y),
            np.kron(Z, Z),
        ]  # Vector of tensor products of Pauli matrices
    )
)

We start by considering **rotational invariance** and building an
initial state invariant under rotation, such as the singlet state
$|S\rangle = \frac{|01\rangle - |10\rangle}{\sqrt{2}}.$ A rotationally
invariant state on $2n$ qubits can be obtained by taking the $n$-fold
tensor product of singlets.


In [None]:
def singlet(wires):
    # Encode a 2-qubit rotation-invariant initial state, i.e., the singlet state.

    qml.Hadamard(wires=wires[0])
    qml.PauliZ(wires=wires[0])
    qml.PauliX(wires=wires[1])
    qml.CNOT(wires=wires)
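The invariance of the singlet can be checked numerically without PennyLane. The sketch below, which assumes only NumPy and SciPy and uses qubit 0 as the leftmost tensor factor, applies the same gate sequence as `singlet` (Hadamard and $Z$ on the first qubit, $X$ on the second, then CNOT) to $|00\rangle$. It then verifies that $(U\otimes U)|S\rangle = \det(U)\,|S\rangle = |S\rangle$ for a random $U \in SU(2)$, and that the same holds for two singlets under $U^{\otimes 4}$.

```python
import numpy as np
from scipy.linalg import expm

I2 = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

# Same gate sequence as `singlet`, applied in time order to |00>
psi = np.zeros(4, dtype=complex)
psi[0] = 1.0
for gate in (np.kron(H, I2), np.kron(Z, I2), np.kron(I2, X), CNOT):
    psi = gate @ psi

singlet_state = np.array([0, 1, -1, 0]) / np.sqrt(2)  # (|01> - |10>) / sqrt(2)

# Random SU(2) rotation U = exp(-i theta (n . sigma) / 2)
rng = np.random.default_rng(0)
n = rng.normal(size=3)
n /= np.linalg.norm(n)
theta = rng.uniform(0, 2 * np.pi)
U = expm(-0.5j * theta * (n[0] * X + n[1] * Y + n[2] * Z))

two_singlets = np.kron(psi, psi)        # invariant state on 2n = 4 qubits
U4 = np.kron(np.kron(U, U), np.kron(U, U))

print(np.allclose(psi, singlet_state))                   # True
print(np.allclose(np.kron(U, U) @ psi, psi))             # True
print(np.allclose(U4 @ two_singlets, two_singlets))      # True
```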

Next, we need a rotationally equivariant data embedding. We choose to
encode a three-dimensional data point $\vec{x}\in \mathbb{R}^3$ via

$$\Phi(\vec{x}) = \exp\left( -i\alpha_\text{enc} [xX + yY + zZ] \right),$$

where we introduce a trainable encoding angle
$\alpha_\text{enc}\in\mathbb{R}$ (the code below uses $\alpha_\text{enc}/2$
in the exponent, which is merely a rescaling of this trainable parameter).
This encoding scheme is indeed equivariant, since embedding a rotated data
point is the same as embedding the original data point and then letting
the rotation act on the qubits:
$\Phi(r(\psi,\theta,\phi)\vec{x}) = U(\psi,\theta,\phi) \Phi(\vec{x}) U(\psi,\theta,\phi)^\dagger.$
Here we use the fact that any rotation on the data level can be
parametrized by three angles, $V_g = r(\psi,\theta,\phi),$ which can also
be used to parametrize the corresponding single-qubit rotation
$\mathcal{R}_g = U(\psi,\theta,\phi),$ implemented by the usual
[qml.Rot](https://docs.pennylane.ai/en/stable/code/api/pennylane.Rot.html)
operation. We choose to encode each atom twice in parallel, resulting in
higher expressivity. We can do so by simply using this encoding scheme
twice for each active atom (the two hydrogen atoms in our case):

$$\Phi(\vec{x}_1, \vec{x}_2) = \Phi^{(1)}(\vec{x}_1) \Phi^{(2)}(\vec{x}_2) \Phi^{(3)}(\vec{x}_1) \Phi^{(4)}(\vec{x}_2).$$


In [None]:
def equivariant_encoding(alpha, data, wires):
    # data (jax array): Cartesian coordinates of atom i
    # alpha (jax array): trainable scaling parameter

    hamiltonian = jnp.einsum("i,ijk", data, sigmas)  # x X + y Y + z Z for data = (x, y, z)
    U = jax.scipy.linalg.expm(-1.0j * alpha * hamiltonian / 2)
    qml.QubitUnitary(U, wires=wires, id="E")
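The equivariance of this encoding can be verified numerically. The sketch below mirrors `equivariant_encoding` in plain NumPy/SciPy (including the factor $1/2$ in the exponent), draws a random $U \in SU(2)$, and computes the corresponding $3\times3$ rotation matrix via $r_{ij} = \tfrac12 \operatorname{Tr}(\sigma_i U \sigma_j U^\dagger)$. It then checks that $\Phi(r\vec{x}) = U \Phi(\vec{x}) U^\dagger$. The helper names `encoding_unitary` and `su2_to_so3` are our own.

```python
import numpy as np
from scipy.linalg import expm

sigmas = np.array(
    [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]],
    dtype=complex,
)


def encoding_unitary(alpha, x):
    """Phi(x) = exp(-i alpha (x X + y Y + z Z) / 2), as in `equivariant_encoding`."""
    return expm(-0.5j * alpha * np.einsum("i,ijk->jk", x, sigmas))


def su2_to_so3(U):
    """Rotation matrix r such that U (x . sigma) U^dagger = (r x) . sigma."""
    return np.array(
        [
            [0.5 * np.trace(sigmas[i] @ U @ sigmas[j] @ U.conj().T).real for j in range(3)]
            for i in range(3)
        ]
    )


rng = np.random.default_rng(1)
alpha, x = 0.7, rng.normal(size=3)
U = expm(-0.5j * np.einsum("i,ijk->jk", rng.normal(size=3), sigmas))  # random SU(2) element
r = su2_to_so3(U)

lhs = encoding_unitary(alpha, r @ x)               # rotate the data, then encode
rhs = U @ encoding_unitary(alpha, x) @ U.conj().T  # encode, then rotate the qubit
print(np.allclose(lhs, rhs))  # True
```

The identity holds because $U e^{A} U^\dagger = e^{UAU^\dagger}$ and conjugating $\vec{x}\cdot\vec{\sigma}$ by $U$ rotates the vector $\vec{x}$.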

Finally, we require an equivariant trainable map and an invariant
observable. We take the Heisenberg Hamiltonian, which is rotationally
invariant, as an inspiration. We define a single summand of it,
$H^{(i,j)}(J) = -J\left( X^{(i)}X^{(j)} + Y^{(i)}Y^{(j)} + Z^{(i)}Z^{(j)} \right),$
as a rotationally invariant two-qubit operator and choose

$$O = X^{(0)}X^{(1)} + Y^{(0)}Y^{(1)} + Z^{(0)}Z^{(1)}$$

as our observable.

Furthermore, we can obtain an equivariant parametrized operator by
exponentiating this Heisenberg interaction:

$$RH^{(i,j)}(J) = \exp\left( -iH^{(i,j)}(J) \right),$$

where $J\in\mathbb{R}$ is a trainable parameter. By combining this
exponentiated operator for different pairs of qubits, we can design our
equivariant trainable layer

$$\mathcal{U}(\vec{j}) = RH^{(1,2)}(j_1) RH^{(3,4)}(j_2) RH^{(2,3)}(j_3)$$
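Both the invariance of $O$ and the equivariance of $RH^{(i,j)}(J)$ rest on the identity $XX + YY + ZZ = 2\,\mathrm{SWAP} - \mathbb{1}$: the SWAP operator commutes with every $U \otimes U$. The short NumPy/SciPy check below uses a random $U \in SU(2)$ and an arbitrary coupling $J = 0.3$ chosen for illustration; `RH` follows the convention $RH(J) = \exp(-iH(J))$ with $H(J) = -J(XX+YY+ZZ)$ from the text.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=complex)

heis = np.kron(X, X) + np.kron(Y, Y) + np.kron(Z, Z)  # X^(i)X^(j) + Y^(i)Y^(j) + Z^(i)Z^(j)


def RH(J):
    """Exponentiated Heisenberg interaction exp(-i H(J)) with H(J) = -J (XX + YY + ZZ)."""
    return expm(1j * J * heis)


rng = np.random.default_rng(2)
a = rng.normal(size=3)
U = expm(-0.5j * (a[0] * X + a[1] * Y + a[2] * Z))  # random SU(2) element
UU = np.kron(U, U)

print(np.allclose(heis, 2 * SWAP - np.eye(4)))     # True
print(np.allclose(UU @ heis @ UU.conj().T, heis))  # invariant observable: True
print(np.allclose(UU @ RH(0.3), RH(0.3) @ UU))     # equivariant layer: True
```

Because the interaction is a function of SWAP, it is also unchanged when its two qubits are exchanged, which is what makes it reusable once permutations are added.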

In the case of a triatomic molecule of two atom types, we need to modify
the previous VQLM to additionally take into account the **invariance
under permutations of the same atom types**.

Interchanging two atoms is represented on the data level by simply
interchanging the corresponding coordinates,
$V_g = \sigma(\vec{x}_1, \vec{x}_2) = (\vec{x}_2, \vec{x}_1).$ On the
Hilbert space this is represented by swapping the corresponding qubits,
$\mathcal{R}_g = U(i,j) = SWAP(i,j).$

The product of singlets is not only rotationally invariant but also
permutationally invariant: swapping the two qubits of a singlet only flips
its sign, and the exchange of the two hydrogen atoms swaps the qubits of
both singlets at once, so the two signs cancel. We can therefore keep it.
The previous embedding scheme for one data point can be extended to embed
two atoms, and it is indeed not only rotationally equivariant but also
equivariant with respect to permutations, since encoding two swapped atoms
is the same as encoding the atoms in the original order and then swapping
the qubits:
$\Phi\left( \sigma(\vec{x}_1, \vec{x}_2) \right) = SWAP(i,j) \Phi(\vec{x}_1, \vec{x}_2) SWAP(i,j).$
Again, we choose to encode each atom twice as depicted above.

For the invariant observable $O,$ we note that our Heisenberg
interaction is invariant under the swapping of the two involved qubits,
therefore we can make use of the same observable as before.

For the equivariant parametrized layer, we need to be careful when
selecting the qubit pairs in order to obtain equivariance, i.e.,
operations that commute with the swaps. This is fulfilled by coupling
only qubits that are neighbors with respect to the 1-2-3-4-1 ring
topology, where the $(2,3)$ and $(1,4)$ couplings share the parameter
$j_3$ because the swap maps one onto the other. This leads to the
following operation:

$$\mathcal{U}(\vec{j}) = RH^{(1,2)}(j_1) RH^{(3,4)}(j_2) RH^{(2,3)}(j_3) RH^{(1,4)}(j_3)$$

In code, we have:


In [None]:
def trainable_layer(weight, wires):
    hamiltonian = jnp.einsum("ijk->jk", sigmas_sigmas)
    U = jax.scipy.linalg.expm(-1.0j * weight * hamiltonian)
    qml.QubitUnitary(U, wires=wires, id="U")


# Invariant observable
Heisenberg = [
    qml.PauliX(0) @ qml.PauliX(1),
    qml.PauliY(0) @ qml.PauliY(1),
    qml.PauliZ(0) @ qml.PauliZ(1),
]
Observable = qml.Hamiltonian(np.ones((3)), Heisenberg)
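The permutation equivariance of the ring layer, and the role of the shared parameter $j_3$, can be checked with a small NumPy/SciPy sketch. Qubits are 0-indexed here (qubit $k$ in the sketch is qubit $k+1$ in the text), so the exchange of the hydrogens is the permutation that swaps qubits $0\leftrightarrow1$ and $2\leftrightarrow3$. The helpers `heisenberg_pair`, `ring_layer` and `is_equivariant` are hypothetical names introduced only for this illustration, and the parameter values are arbitrary.

```python
from functools import reduce

import numpy as np
from scipy.linalg import expm

I2 = np.eye(2, dtype=complex)
PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
N_QUBITS = 4


def heisenberg_pair(i, j):
    """X^(i)X^(j) + Y^(i)Y^(j) + Z^(i)Z^(j) on 4 qubits (qubit 0 = leftmost factor)."""
    total = np.zeros((2**N_QUBITS, 2**N_QUBITS), dtype=complex)
    for P in PAULIS:
        factors = [P if q in (i, j) else I2 for q in range(N_QUBITS)]
        total += reduce(np.kron, factors)
    return total


def RH(i, j, J):
    """exp(-i H^(i,j)(J)) with H^(i,j)(J) = -J (XX + YY + ZZ)."""
    return expm(1j * J * heisenberg_pair(i, j))


def ring_layer(j1, j2, j3, j4):
    """Couplings on the ring 0-1-2-3-0 (qubits 1-2-3-4-1 in the text)."""
    return RH(0, 1, j1) @ RH(2, 3, j2) @ RH(1, 2, j3) @ RH(0, 3, j4)


SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=complex)
P_swap = np.kron(SWAP, SWAP)  # hydrogen exchange: qubits 0<->1 and 2<->3


def is_equivariant(U):
    """True if U commutes with the hydrogen-exchange permutation."""
    return np.allclose(P_swap @ U @ P_swap.conj().T, U)


print(is_equivariant(ring_layer(0.1, 0.2, 0.3, 0.3)))  # shared j3: True
print(is_equivariant(ring_layer(0.1, 0.2, 0.3, 0.8)))  # independent parameters: False
```

Conjugating by the permutation maps the $(1,2)$ coupling onto the $(0,3)$ coupling and vice versa, so the layer is left unchanged only when both carry the same parameter.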

It has been observed that a small amount of **symmetry-breaking** (SB)
can improve the convergence of the VQLM. We implement it by adding a
small rotation around the $z$-axis.


In [None]:
def noise_layer(epsilon, wires):
    # Small RZ rotations that slightly break the rotational symmetry
    for i, w in enumerate(wires):
        qml.RZ(epsilon[i], wires=[w])

When setting up the model, the hyperparameters such as the number of
repetitions of encoding and trainable layers have to be chosen suitably.
In this demo, we choose six layers ($D=6$) and one repetition of
trainable gates inside each layer ($B=1$) to keep the runtime short. Note
that this choice differs from the original paper, so the results therein
will not be fully reproduced within this demo. We start by defining the
relevant hyperparameters and the VQLM.


In [None]:
############ Setup ##############
D = 6  # Depth of the model
B = 1  # Number of repetitions inside a trainable layer
rep = 2  # Number of parallel encodings of each atom

active_atoms = 2  # Number of active atoms
# Here we only have two active atoms since we fixed the oxygen (which becomes non-active) at the origin
num_qubits = active_atoms * rep

In [None]:
dev = qml.device("default.qubit", wires=num_qubits)


@qml.qnode(dev, interface="jax")
def vqlm(data, params):

    weights = params["params"]["weights"]
    alphas = params["params"]["alphas"]
    epsilon = params["params"]["epsilon"]

    # Initial state
    for i in range(rep):
        singlet(wires=np.arange(active_atoms * i, active_atoms * (1 + i)))

    # Initial encoding
    for i in range(num_qubits):
        equivariant_encoding(
            alphas[i, 0], jnp.asarray(data, dtype=complex)[i % active_atoms, ...], wires=[i]
        )

    # Reuploading model
    for d in range(D):
        qml.Barrier()

        # weights has shape (num_qubits, D, B), so layer d uses weights[:, d, b]
        for b in range(B):
            # Even layer
            for i in range(0, num_qubits - 1, 2):
                trainable_layer(weights[i, d, b], wires=[i, (i + 1) % num_qubits])

            # Odd layer
            for i in range(1, num_qubits, 2):
                trainable_layer(weights[i, d, b], wires=[i, (i + 1) % num_qubits])

        # Symmetry-breaking
        if epsilon is not None:
            noise_layer(epsilon[d, :], range(num_qubits))

        # Encoding
        for i in range(num_qubits):
            equivariant_encoding(
                alphas[i, d + 1],
                jnp.asarray(data, dtype=complex)[i % active_atoms, ...],
                wires=[i],
            )

    return qml.expval(Observable)

Simulation for the water molecule
=================================

We start by downloading the
[dataset](https://zenodo.org/records/2634098), which we have prepared
for convenience as NumPy arrays. In the following, we will load,
preprocess and split the data into a training and testing set, following
standard practices.


In [None]:
# Load the data
energy = np.load("eqnn_force_field_data/Energy.npy")
forces = np.load("eqnn_force_field_data/Forces.npy")
positions = np.load(
    "eqnn_force_field_data/Positions.npy"
)  # Cartesian coordinates shape = (nbr_sample, nbr_atoms,3)
shape = np.shape(positions)

### Scaling the energy to fit in [-1,1]
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler((-1, 1))

energy = scaler.fit_transform(energy)
forces = forces * scaler.scale_


# Placing the oxygen at the origin
data = np.zeros((shape[0], 2, 3))
data[:, 0, :] = positions[:, 1, :] - positions[:, 0, :]
data[:, 1, :] = positions[:, 2, :] - positions[:, 0, :]
positions = data.copy()

forces = forces[:, 1:, :]  # Select only the forces on the hydrogen atoms since the oxygen is fixed


# Splitting in train-test set
indices_train = np.random.choice(np.arange(shape[0]), size=int(0.8 * shape[0]), replace=False)
indices_test = np.setdiff1d(np.arange(shape[0]), indices_train)

E_train, E_test = (energy[indices_train, 0], energy[indices_test, 0])
F_train, F_test = forces[indices_train, ...], forces[indices_test, ...]
data_train, data_test = (
    jnp.array(positions[indices_train, ...]),
    jnp.array(positions[indices_test, ...]),
)
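The forces are multiplied by `scaler.scale_` because `MinMaxScaler` applies an affine map $E' = aE + b$, with $a$ stored in `scale_`. Hence $F' = -\partial E'/\partial \mathcal{X} = aF$, and the offset $b$ drops out. The sketch below reproduces the affine map by hand in NumPy, so it does not need scikit-learn. The 1D toy potential and its analytic force are made up for illustration.

```python
import numpy as np


def toy_energy(x):
    """Made-up 1D potential E(x) = 3 x^2 + 5."""
    return 3.0 * x**2 + 5.0


rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=50)
E = toy_energy(x)
F = -6.0 * x  # analytic force F = -dE/dx

# Affine map onto [-1, 1], following the MinMaxScaler formula
lo, hi = -1.0, 1.0
scale = (hi - lo) / (E.max() - E.min())  # plays the role of scaler.scale_
offset = lo - E.min() * scale            # plays the role of scaler.min_
E_scaled = scale * E + offset


def scaled_energy(x_):
    """Rescaled potential a * E(x) + b."""
    return scale * toy_energy(x_) + offset


# Force of the rescaled potential from central finite differences
h = 1e-6
F_scaled = -(scaled_energy(x + h) - scaled_energy(x - h)) / (2 * h)

print(E_scaled.min(), E_scaled.max())    # -1 and 1, up to rounding
print(np.allclose(F_scaled, scale * F))  # True: only the factor `scale` survives
```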

We will now define the cost function and how to train the model using
JAX. We will use the mean-squared-error loss function. To speed up the
computation, we use the decorator `@jax.jit` to just-in-time compile
these functions. This means the first execution will typically take a
little longer, with the benefit that all following executions will be
significantly faster; see the [JAX docs on
jitting](https://jax.readthedocs.io/en/latest/jax-101/02-jitting.html).


In [None]:
from jax.example_libraries import optimizers

# We vectorize the model over the data points
vec_vqlm = jax.vmap(vqlm, (0, None), 0)


# Mean-squared-error loss function
@jax.jit
def mse_loss(predictions, targets):
    return jnp.mean(0.5 * (predictions - targets) ** 2)


# Make prediction and compute the loss
@jax.jit
def cost(weights, loss_data):
    data, E_target, F_target = loss_data
    E_pred = vec_vqlm(data, weights)
    l = mse_loss(E_pred, E_target)

    return l


# Perform one training step
@jax.jit
def train_step(step_i, opt_state, loss_data):

    net_params = get_params(opt_state)
    loss, grads = jax.value_and_grad(cost, argnums=0)(net_params, loss_data)

    return loss, opt_update(step_i, grads, opt_state)


# Return prediction and loss at inference times, e.g. for testing
@jax.jit
def inference(loss_data, opt_state):

    data, E_target, F_target = loss_data
    net_params = get_params(opt_state)

    E_pred = vec_vqlm(data, net_params)
    l = mse_loss(E_pred, E_target)

    return E_pred, l

**Parameter initialization:**

We initialise the trainable layers close to the identity by setting
their parameters to zero, except for the coupling between the first two
qubits, whose weights are set to a single value drawn uniformly from
$[0, \pi]$. This keeps the circuit effectively shallow at the beginning
and reduces the risk of running into barren plateaus. Moreover, we
disable the symmetry-breaking strategy, as it is mainly useful for
larger systems.


In [None]:
np.random.seed(42)
weights = np.zeros((num_qubits, D, B))
weights[0] = np.random.uniform(0, np.pi, 1)
weights = jnp.array(weights)

# Encoding weights
alphas = jnp.array(np.ones((num_qubits, D + 1)))

# Symmetry-breaking (SB)
np.random.seed(42)
epsilon = jnp.array(np.random.normal(0, 0.001, size=(D, num_qubits)))
epsilon = None  # We disable SB for this specific example
epsilon = jax.lax.stop_gradient(epsilon)  # comment if we wish to train the SB weights as well.


opt_init, opt_update, get_params = optimizers.adam(1e-2)
net_params = {"params": {"weights": weights, "alphas": alphas, "epsilon": epsilon}}
opt_state = opt_init(net_params)
running_loss = []
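`optimizers.adam` from `jax.example_libraries` takes the step index as the first argument of `opt_update`. Adam uses this index in the bias correction $1-\beta^{t+1}$ of its moment estimates, so the counter passed at each step affects the size of the early updates. The NumPy sketch below writes out a single standard Adam update and compares the first update computed with the counter `step=0` against the same update computed with a large counter. The function `adam_step` and the numbers are illustrative and not part of the JAX API.

```python
import numpy as np


def adam_step(x, g, m, v, step, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam update; `step` is the 0-based iteration counter."""
    m = b1 * m + (1 - b1) * g           # running mean of the gradient
    v = b2 * v + (1 - b2) * g**2        # running mean of the squared gradient
    m_hat = m / (1 - b1 ** (step + 1))  # bias corrections depend on `step`
    v_hat = v / (1 - b2 ** (step + 1))
    return x - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


x0 = np.array([0.0])
g = np.array([0.5])
zeros = np.zeros(1)

x_step0, _, _ = adam_step(x0, g, zeros, zeros, step=0)        # first update, correct counter
x_step5000, _, _ = adam_step(x0, g, zeros, zeros, step=5000)  # first update, large counter
print(x_step0, x_step5000)  # approximately -0.01 and -0.0315
```

With the correct counter, the first update has magnitude close to the learning rate. Without bias correction, it is larger by roughly $(1-\beta_1)/\sqrt{1-\beta_2} \approx 3.2$.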

We train our VQLM with the Adam optimizer on randomly drawn mini-batches of the training data.


In [None]:
num_batches = 5000  # number of optimization steps
batch_size = 256  # number of training data per batch


loss_data_test = data_test, E_test, F_test  # the test set is the same at every step

for ibatch in range(num_batches):
    # select a batch of training points
    batch = np.random.choice(np.arange(np.shape(data_train)[0]), batch_size, replace=False)

    # preparing the data
    loss_data = data_train[batch, ...], E_train[batch, ...], F_train[batch, ...]

    # perform one training step (Adam uses the step index for its bias correction)
    loss, opt_state = train_step(ibatch, opt_state, loss_data)

    # computing the test loss and energy predictions
    E_pred, test_loss = inference(loss_data_test, opt_state)
    running_loss.append([float(loss), float(test_loss)])

Let us inspect the results. The following figure displays the training
(in red) and testing (in blue) loss during the optimization. We observe
that they are on top of each other, meaning that the model is training
and generalising properly to the unseen test set.


In [None]:
history_loss = np.array(running_loss)

fontsize = 12
plt.figure(figsize=(4, 4))
plt.plot(history_loss[:, 0], "r-", label="training error")
plt.plot(history_loss[:, 1], "b-", label="testing error")

plt.yscale("log")
plt.xlabel("Optimization Steps", fontsize=fontsize)
plt.ylabel("Mean Squared Error", fontsize=fontsize)
plt.legend(fontsize=fontsize)
plt.tight_layout()
plt.show()

## Energy predictions


We first inspect the quality of the energy predictions. The plot below
shows the predicted test energies (red) against the exact ones; the black
line marks perfect agreement, so the red points should lie on the
diagonal. The model is able to make fair predictions, especially near
the equilibrium position. However, a few points in the higher energy
range could be improved, e.g., by using a deeper model as in the original
paper.


In [None]:
plt.figure(figsize=(4, 4))
plt.title("Energy predictions", fontsize=fontsize)
plt.plot(energy[indices_test], E_pred, "ro", label="Test predictions")
plt.plot(energy[indices_test], energy[indices_test], "k.-", lw=1, label="Exact")
plt.xlabel("Exact energy", fontsize=fontsize)
plt.ylabel("Predicted energy", fontsize=fontsize)
plt.legend(fontsize=fontsize)
plt.tight_layout()
plt.show()

## Force predictions

As stated at the beginning, we are interested in obtaining the forces to
drive MD simulations. Since we have access to the potential energy
surface, the forces are directly available by differentiating the energy

$$F_{i,j} = -\frac{\partial E(\mathcal{X}, \Theta)}{\partial \mathcal{X}_{ij}},$$

where $\mathcal{X}_{ij}$ contains the $j$-th coordinate of the $i$-th atom,
and $\Theta$ are the trainable parameters. In our framework, we can
simply differentiate the vectorized model with respect to its input.
Since each predicted energy depends only on its own configuration, the
mixed terms of the Jacobian between different samples vanish, which is
why we select the diagonal part using
`numpy.einsum`.
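The sketch below illustrates this diagonal extraction on a made-up toy energy $E(\mathcal{X}) = \sum_{ij}\mathcal{X}_{ij}^2$, which stands in for the trained model. A vectorized function of $N$ configurations with shape $(2, 3)$ has a Jacobian of shape $(N, N, 2, 3)$. Only the blocks with equal output and input sample index are nonzero, and `einsum("iijk->ijk")` keeps exactly those. The Jacobian is built with central finite differences, and the helper names are hypothetical.

```python
import numpy as np


def toy_energy(batch):
    """Toy stand-in for vec_vqlm: one energy per configuration of shape (2, 3)."""
    return np.sum(batch**2, axis=(1, 2))


def batch_jacobian(f, batch, h=1e-6):
    """Central finite-difference Jacobian of f with shape (N, N, 2, 3)."""
    N = batch.shape[0]
    jac = np.zeros((N,) + batch.shape)
    for idx in np.ndindex(batch.shape):  # idx = (sample m, atom i, coordinate j)
        step = np.zeros_like(batch)
        step[idx] = h
        jac[(slice(None),) + idx] = (f(batch + step) - f(batch - step)) / (2 * h)
    return jac


rng = np.random.default_rng(4)
batch = rng.normal(size=(5, 2, 3))
jac = batch_jacobian(toy_energy, batch)

forces = -np.einsum("iijk->ijk", jac)       # keep only the blocks with n == m
off_diag = jac.copy()
off_diag[np.arange(5), np.arange(5)] = 0.0  # zero out the diagonal blocks

print(jac.shape)                        # (5, 5, 2, 3)
print(np.allclose(forces, -2 * batch))  # True: F = -dE/dX
print(np.allclose(off_diag, 0.0))       # True: mixed terms vanish
```

For large datasets, the full $(N, N, 2, 3)$ Jacobian grows quadratically in $N$. Mapping a per-sample gradient over the batch avoids storing the zero blocks.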


In [None]:
opt_params = get_params(opt_state)  # Obtain the optimal parameters
gradient_coordinates = jax.jacobian(
    vec_vqlm, argnums=0
)  # Compute the gradient with respect to the Cartesian coordinates

pred_forces = gradient_coordinates(jnp.array(positions.real), opt_params)
pred_forces = -np.einsum(
    "iijk->ijk", np.array(pred_forces)
)  # We are only interested in the diagonal part of the Jacobian

fig, axs = plt.subplots(2, 3)

fig.suptitle("Force predictions", fontsize=fontsize)
for k in range(2):
    for l in range(3):

        axs[k, l].plot(forces[indices_test, k, l], forces[indices_test, k, l], "k.-", lw=1)
        axs[k, l].plot(forces[indices_test, k, l], pred_forces[indices_test, k, l], "r.")

axs[0, 0].set_ylabel("Hydrogen 1")
axs[1, 0].set_ylabel("Hydrogen 2")
for j, a in enumerate(["x", "y", "z"]):
    axs[1, j].set_xlabel(f"{a}-axis")

plt.tight_layout()
plt.show()

In this series of plots, we can see the predicted forces on the two
hydrogen atoms in the three $x,$ $y$ and $z$ directions. Again, the
model does a fairly good job. The few points that are not on the
diagonal could be improved using some tricks, such as incorporating the
forces in the loss function.
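Here is a minimal sketch of one such loss, assuming a weighted sum of the energy and force mean-squared errors. The function name `energy_force_loss` and the weight `lambda_f` are hypothetical choices. In the training code, one would write the same expression with `jnp`, and the force predictions would come from differentiating the model with respect to the coordinates, as done for the force plots. It is written in NumPy here for clarity.

```python
import numpy as np


def energy_force_loss(E_pred, E_target, F_pred, F_target, lambda_f=1.0):
    """Weighted sum of energy MSE and force MSE (with the 0.5 factor used in `mse_loss`)."""
    energy_term = np.mean(0.5 * (E_pred - E_target) ** 2)
    force_term = np.mean(0.5 * (F_pred - F_target) ** 2)
    return energy_term + lambda_f * force_term


E_t = np.array([0.1, -0.2])
F_t = np.zeros((2, 2, 3))  # (samples, hydrogen atoms, Cartesian components)

print(energy_force_loss(E_t, E_t, F_t, F_t))                      # 0.0 for a perfect fit
print(energy_force_loss(E_t + 0.2, E_t, F_t, F_t))                # energy term only
print(energy_force_loss(E_t, E_t, F_t + 1.0, F_t, lambda_f=2.0))  # force term only
```

The weight `lambda_f` trades off energy against force accuracy and would have to be tuned.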


## Conclusions

In this demo, we saw how to implement a symmetry-invariant VQLM to learn
the energy and forces of small chemical systems and trained it for the
specific example of water. The strong points with respect to
symmetry-agnostic techniques are better generalization, more accurate
force predictions, resilience to small data corruption, and reduction in
classical pre- and postprocessing, as supported by the original paper.

Further work could be devoted to studying larger systems by adopting a
more systematic fragmentation as discussed in the original paper. As an
alternative to building symmetry-invariant quantum architectures, the
symmetries could instead be incorporated into the training routine, such
as recently proposed by. Finally, symmetry-aware models could be used to
design quantum symmetry functions, which in turn could serve as
symmetry-invariant descriptors of chemical systems within classical
deep learning architectures that can easily be operated and trained at
scale.


## References

1. Michael M. Bronstein, Joan Bruna, Taco Cohen, Petar Veličković (2021). Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv:2104.13478.
2. Quynh T. Nguyen, Louis Schatzki, Paolo Braccia, Michael Ragone, Patrick J. Coles, Frédéric Sauvage, Martín Larocca, M. Cerezo (2022). Theory for Equivariant Quantum Neural Networks. arXiv:2210.08566.
3. Andrea Skolik, Michele Cattelan, Sheir Yarkoni, Thomas Baeck, Vedran Dunjko (2022). Equivariant quantum circuits for learning on weighted graphs. arXiv:2205.06109.
4. Johannes Jakob Meyer, Marian Mularski, Elies Gil-Fuster, Antonio Anna Mele, Francesco Arzani, Alissa Wilms, Jens Eisert (2023). Exploiting symmetry in variational quantum machine learning. PRX Quantum 4, 010328. arXiv:2205.06217.
5. Isabel Nha Minh Le, Oriel Kiss, Julian Schuhmacher, Ivano Tavernelli, Francesco Tacchino (2023). Symmetry-invariant quantum machine learning force fields. arXiv:2311.11362.
6. Oriel Kiss, Francesco Tacchino, Sofia Vallecorsa, Ivano Tavernelli (2022). Quantum neural networks force fields generation. Mach. Learn.: Sci. Technol. 3, 035004.
7. David Wierichs, Richard D. P. East, Martín Larocca, M. Cerezo, Nathan Killoran (2023). Symmetric derivatives of parametrized quantum circuits. arXiv:2312.06752.


