# `QuantumOps` performance intro

This notebook introduces parts of `QuantumOps`, with an emphasis on performance.

Tell `PyCall` to use the Python from our virtual environment.

In [1]:
ENV["PYCALL_JL_RUNTIME_PYTHON"] = Sys.which("python")

# import all default symbols for interactive demo
using QuantumOps
using QuantumOps: AbstractOp
using BenchmarkTools
import LinearAlgebra
import SparseArrays

# We also import I, X, Y, Z for convenience
using QuantumOps.Paulis: I, X, Y, Z

### Pauli

The names `I, X, Y, Z` are bound to instances of the `Pauli` type, which represents single-qubit operators.

In [2]:
(I, X, Y, Z) == Pauli.((0, 1, 2, 3)) ## The '.' broadcasts over the following elements.

true

Julia has a large number of standard interfaces and functions for numeric and algebraic types. I follow these when possible. For example, `Matrix` is used to construct a dense, heap-allocated matrix from an object, so I defined a `Matrix` method for `Pauli`.

In [3]:
print(Matrix.(Pauli.(0:3)))

Matrix[[1.0 0.0; 0.0 1.0], [0.0 1.0; 1.0 0.0], ComplexF64[0.0 + 0.0im 0.0 - 1.0im; 0.0 + 1.0im 0.0 + 0.0im], [1.0 0.0; 0.0 -1.0]]
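
As a minimal sketch of what adding such a method can look like, the snippet below extends `Matrix` for a made-up type. `BitFlip` is a hypothetical stand-in, not the `Pauli` type of `QuantumOps`, and the actual method in the package may well be written differently.

```julia
# Hypothetical operator type standing in for a single Pauli (not from QuantumOps).
struct BitFlip end

# Extend the standard `Matrix` constructor so generic code can ask for a dense matrix.
Base.Matrix(::BitFlip) = [0.0 1.0; 1.0 0.0]

dense = Matrix(BitFlip())               # a 2×2 Matrix{Float64}
many = Matrix.([BitFlip(), BitFlip()])  # broadcasting works as for any other function
```

Because the method hangs off the standard name, callers do not need to learn a package-specific conversion function.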

`Pauli` is in this type hierarchy.

In [4]:
Pauli <: AbstractPauli <: AbstractOp

true

Only a very small amount of code depends on the internals of `Pauli`.
Almost everything is coded against `AbstractOp` and `AbstractPauli`. The developer almost never encounters the implementation of `Pauli`, and the user never does.
But, if you want, you can see it this way.

In [5]:
dump(X)

QuantumOps.Paulis.Pauli
  hi: Bool false
  lo: Bool true
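
To illustrate the idea of coding against abstract types while hiding a two-`Bool` representation, here is a rough sketch. All names are hypothetical, and the encoding `index = 2*hi + lo` is only an assumption that happens to agree with the `dump(X)` output above; the real `Pauli` may differ.

```julia
# Hypothetical abstract layers, mirroring `AbstractOp` and `AbstractPauli` in spirit only.
abstract type AbstractOperator end
abstract type AbstractSingleQubitPauli <: AbstractOperator end

# Two Bools, as in the `dump` output. Assumed encoding: index = 2*hi + lo,
# with 0 => I, 1 => X, 2 => Y, 3 => Z.
struct TwoBitPauli <: AbstractSingleQubitPauli
    hi::Bool
    lo::Bool
end
TwoBitPauli(index::Integer) = TwoBitPauli(index & 2 != 0, index & 1 != 0)

# The only function that touches the fields.
pauli_index(p::TwoBitPauli) = 2 * p.hi + p.lo

# Generic code is written against the abstract type and the accessor.
pauli_symbol(p::AbstractSingleQubitPauli) = "IXYZ"[pauli_index(p) + 1]
```

With this layout, a different concrete representation only has to supply its own `pauli_index` method; `pauli_symbol` and anything built on it stay untouched.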


The notation `X[i, j]` calls `getindex(X, i, j)`, which, for `AbstractPauli`, looks up the elements in stack-allocated arrays.
This is faster than indexing into a heap-allocated (that is, everyday, dynamic) array:

In [6]:
m = rand(2, 2) # Ordinary `Matrix`
@btime $m[1, 1];  # @btime is like %timeit. The "$" tests how it would perform in a compiled function

  1.816 ns (0 allocations: 0 bytes)


In [7]:
@btime X[1, 1];  ## This includes time to look up what matrix corresponds to `X`

  1.397 ns (0 allocations: 0 bytes)
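
One way such a lookup could be arranged is sketched below, using constant tuples (which need no heap allocation) in place of arrays. `TablePauli` and `PAULI_ELEMENTS` are hypothetical names, and storing every element as `ComplexF64` is a simplification; this is not the `QuantumOps` implementation.

```julia
# Hypothetical single-qubit Pauli identified by an index: 0 => I, 1 => X, 2 => Y, 3 => Z.
struct TablePauli
    index::UInt8
end

# The four 2×2 matrices as immutable tuples in column-major order.
const PAULI_ELEMENTS = (
    (1.0 + 0.0im, 0.0 + 0.0im, 0.0 + 0.0im, 1.0 + 0.0im),   # I
    (0.0 + 0.0im, 1.0 + 0.0im, 1.0 + 0.0im, 0.0 + 0.0im),   # X
    (0.0 + 0.0im, 0.0 + 1.0im, 0.0 - 1.0im, 0.0 + 0.0im),   # Y
    (1.0 + 0.0im, 0.0 + 0.0im, 0.0 + 0.0im, -1.0 + 0.0im),  # Z
)

# `p[i, j]` lowers to this method; the column-major offset is i + 2 * (j - 1).
Base.getindex(p::TablePauli, i::Integer, j::Integer) =
    PAULI_ELEMENTS[p.index + 1][i + 2 * (j - 1)]
```

For example, `TablePauli(2)[1, 2]` returns the upper-right element of the Y matrix.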


Some linear algebra has been implemented.

In Julia, multiplying two small matrices is faster than multiplying two NumPy matrices in Python. The Python call includes an overhead.
Multiplying `QuantumOps.Paulis.Pauli` by a `Matrix` is even a bit faster because looking up elements of a `Pauli` is faster.

In [8]:
mY = Matrix(Y) # convert Y to a `Matrix`

@btime Y * $m

  31.548 ns (1 allocation: 128 bytes)


2×2 Matrix{ComplexF64}:
 0.0-0.393175im  0.0-0.278712im
 0.0+0.944613im  0.0+0.0856094im

In [9]:
@btime $mY * $m

  167.643 ns (1 allocation: 128 bytes)


2×2 Matrix{ComplexF64}:
 0.0-0.393175im  0.0-0.278712im
 0.0+0.944613im  0.0+0.0856094im

Another example:

In [10]:
@btime LinearAlgebra.eigvals(Z)

  19.825 ns (1 allocation: 80 bytes)


2-element Vector{Float64}:
 -1.0
  1.0

About 20 ns is roughly the time required to copy the array of eigenvalues.
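
A possible way to get this behavior is to keep the known eigenvalues in a constant and hand out a copy. The sketch below assumes that approach; `pauli_eigvals` and the constants are hypothetical names, and the index convention (0 for the identity) is an assumption.

```julia
using LinearAlgebra

# X, Y and Z all have eigenvalues -1 and 1; the identity has 1 twice.
const NONIDENTITY_EIGVALS = [-1.0, 1.0]
const IDENTITY_EIGVALS = [1.0, 1.0]

# Return a fresh copy so that callers cannot modify the stored constants.
pauli_eigvals(index::Integer) =
    copy(index == 0 ? IDENTITY_EIGVALS : NONIDENTITY_EIGVALS)

# Cross-check against the generic dense solver for the Y matrix.
dense_y = eigvals(ComplexF64[0 -im; im 0])
```

The single allocation in such a function is the copy itself, which is consistent with the one allocation reported by `@btime` above.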

### `PauliTerm`

`PauliTerm` represents a tensor product of Pauli operators (or a single one) and keeps track of a coefficient, including a phase.
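
To make the bookkeeping concrete, here is a small sketch of multiplying two Pauli strings while tracking the phase. It assumes the integer encoding 0 => I, 1 => X, 2 => Y, 3 => Z; the function names are hypothetical and this is not how `PauliTerm` is implemented.

```julia
# Product of two single-qubit Paulis. Returns (product index, k), where the
# product carries the phase i^k. With this encoding the product index is xor(a, b).
function mul_single(a::Integer, b::Integer)
    c = xor(a, b)
    if a == 0 || b == 0 || a == b
        return (c, 0)                   # identity involved, or P * P = I
    end
    # Cyclic order X -> Y -> Z -> X gives +i; the reverse order gives -i = i^3.
    k = mod(b - a, 3) == 1 ? 1 : 3
    return (c, k)
end

# Multiply two equal-length strings factor by factor and accumulate the phase.
function mul_strings(xs::Vector{Int}, ys::Vector{Int})
    length(xs) == length(ys) || throw(DimensionMismatch("strings differ in length"))
    out = similar(xs)
    k = 0
    for n in eachindex(xs, ys)
        (out[n], dk) = mul_single(xs[n], ys[n])
        k += dk
    end
    phase = (1 + 0im, 0 + 1im, -1 + 0im, 0 - 1im)[mod(k, 4) + 1]
    return (out, phase)
end
```

The work is linear in the number of factors, and the phase needs only a counter modulo 4 rather than complex arithmetic inside the loop.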

### Compare `PauliTerm` with qiskit

In [11]:
using PyCall
qi = pyimport("qiskit.quantum_info");

Here, we compare multiplication of two Pauli strings with both libraries. We see how the time scales with string length.

In [12]:
function get_julia_python_terms(n_qubits)
    xj = PauliTerm(rand(Pauli, n_qubits))
    yj = PauliTerm(rand(Pauli, n_qubits))
    xp = qi.random_pauli(n_qubits)
    yp = qi.random_pauli(n_qubits)
    return (xj, yj, xp, yp)
end

n_qubits = 10
(xj, yj, xp, yp) = get_julia_python_terms(n_qubits)

# `QuantumOps`
@btime $xj * $yj

  45.550 ns (1 allocation: 80 bytes)


10-factor QuantumOps.PauliTerm{Vector{QuantumOps.Paulis.Pauli}, Complex{Int64}}:
YYXIIIYZIX * (0 - 1im)

In [13]:
# qiskit
@btime $xp.compose($yp)   ## @btime gives same times as %timeit in python cli

  17.112 μs (4 allocations: 256 bytes)


PyObject Pauli('IIYYIYZZXI')

#### For multiplying 10-qubit Pauli strings, `QuantumOps` is about 300 times faster than qiskit.

In [14]:
julia_time = @belapsed $xj * $yj
qiskit_time = @belapsed $xp.compose($yp)

qiskit_time / julia_time

329.11914408213033

Asymptotically, qiskit is about three times faster than `QuantumOps`, but it takes a while to get there. For 1000-qubit strings, `QuantumOps` is still about ten times faster. I have some ideas about why Python is faster than Julia here, but I am not at all sure. Also, there is a large constant term of about 12 μs in the Python times. It might be worth trying to reduce this.

In [15]:
n_qubits = 1000
(xj, yj, xp, yp) = get_julia_python_terms(n_qubits)

julia_time = @belapsed $xj * $yj
qiskit_time = @belapsed $xp.compose($yp)

qiskit_time / julia_time

9.941453940066593

With $10^4$ qubits, Julia is still faster, but the two are comparable.

In [16]:
n_qubits = 10^4
(xj, yj, xp, yp) = get_julia_python_terms(n_qubits)

julia_time = @belapsed $xj * $yj
qiskit_time = @belapsed $xp.compose($yp)

qiskit_time / julia_time

1.8762629890341425

### `PauliSum`

A `PauliSum` represents a sum of `PauliTerm`s, sorted in a canonical order.

In [17]:
n_qubits = 10
n_terms = 10
ps = PauliSum(rand(Pauli, (n_terms, n_qubits)), randn(n_terms))

10x10 QuantumOps.PauliSum{Vector{Vector{QuantumOps.Paulis.Pauli}}, Vector{Float64}}:
IZZZZZXZZX * 0.8611519091242552
XIZZYYYZZZ * -0.21067779500713502
XXZIIXIZXI * 0.3131118964796491
XYZYIIYIYZ * -0.1137000255499288
XYZZZIIZXZ * 0.6876866226528037
YIIXIXXXZZ * -0.5254748605290915
YYIYIIIIZX * 0.30068065416003104
YYZXIYXIII * -2.2059166978944567
YYZXZIIXXY * -0.5575695590816557
ZIIYYXIYZZ * -2.243952183412215

In [18]:
x = ps[5]
(x, typeof(x))

(10-factor QuantumOps.PauliTerm{Vector{QuantumOps.Paulis.Pauli}, Float64}:
XYZZZIIZXZ * 0.6876866226528037, QuantumOps.PauliTerm{Vector{QuantumOps.Paulis.Pauli}, Float64})

`add!` adds a `PauliTerm` in place. It does a sorted search to find the correct location.

In [19]:
add!(ps, -x)  ## add the additive inverse of a term

9x10 QuantumOps.PauliSum{Vector{Vector{QuantumOps.Paulis.Pauli}}, Vector{Float64}}:
IZZZZZXZZX * 0.8611519091242552
XIZZYYYZZZ * -0.21067779500713502
XXZIIXIZXI * 0.3131118964796491
XYZYIIYIYZ * -0.1137000255499288
YIIXIXXXZZ * -0.5254748605290915
YYIYIIIIZX * 0.30068065416003104
YYZXIYXIII * -2.2059166978944567
YYZXZIIXXY * -0.5575695590816557
ZIIYYXIYZZ * -2.243952183412215

The length of the sum is now 9 rather than 10.

In [20]:
length(ps)

9
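
The behavior shown above (binary search for the slot, merge with an existing term, drop the term when the coefficient cancels) can be sketched with Base functions alone. `SortedSum` and `add_term!` are hypothetical names, and string labels stand in for Pauli strings; the real `add!` may handle cancellation differently.

```julia
# Hypothetical container: labels kept sorted, with a parallel coefficient vector.
struct SortedSum
    labels::Vector{String}
    coeffs::Vector{Float64}
end

function add_term!(s::SortedSum, label::String, coeff::Float64)
    i = searchsortedfirst(s.labels, label)      # binary search in the sorted labels
    if i <= length(s.labels) && s.labels[i] == label
        s.coeffs[i] += coeff                    # term already present: merge
        if iszero(s.coeffs[i])                  # exact cancellation: remove the term
            deleteat!(s.labels, i)
            deleteat!(s.coeffs, i)
        end
    else
        insert!(s.labels, i, label)             # new term: insert at the sorted position
        insert!(s.coeffs, i, coeff)
    end
    return s
end

s = SortedSum(["IZ", "XX", "ZI"], [0.5, 1.0, -2.0])
add_term!(s, "XX", -1.0)    # cancels the existing "XX" term
add_term!(s, "XI", 0.25)    # inserted between "IZ" and "ZI"
```

The search costs a logarithmic number of comparisons, but `insert!` and `deleteat!` still shift the tail of both vectors.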

In [21]:
n_qubits = 10
n_terms = 10
ps = PauliSum(rand(Pauli, (n_terms, n_qubits)), randn(n_terms));
size(ps)

(10, 10)

In [22]:
x = copy(ps[1])

10-factor QuantumOps.PauliTerm{Vector{QuantumOps.Paulis.Pauli}, Float64}:
IXIXXZXYYX * 1.0799836853519675

In [23]:
@btime add!($ps, $x);

  62.788 ns (0 allocations: 0 bytes)


That seems a bit slow.

#### Pauli decomposition

Construct the Pauli decomposition of a matrix.

In [24]:
m = rand(4, 4)
s = PauliSum(m)

16x2 QuantumOps.PauliSum{Vector{Vector{QuantumOps.Paulis.Pauli}}, Vector{ComplexF64}}:
II * (0.47535035018999316 + 0.0im)
IX * (0.3262739924485931 + 0.0im)
IY * (0.0 + 0.18552696151796855im)
IZ * (-0.07362927391444071 + 0.0im)
XI * (0.13546531376952592 + 0.0im)
XX * (0.380063991813433 + 0.0im)
XY * (0.0 + 0.009783092127528453im)
XZ * (-0.043113348704585164 + 0.0im)
YI * (0.0 + 0.11088439771867864im)
YX * (0.0 - 0.0008181280004781188im)
YY * (0.2836762402956309 - 0.0im)
YZ * (0.0 - 0.027218075093257077im)
ZI * (0.10605558526440695 + 0.0im)
ZX * (0.2063647614493851 + 0.0im)
ZY * (0.0 + 0.08736974609672504im)
ZZ * (0.13685446784451108 + 0.0im)

Check that the decomposition is correct.

In [25]:
m ≈ Matrix(s)

true
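
For reference, the textbook way to obtain the coefficients uses the fact that Pauli strings are orthogonal under the trace inner product, so the coefficient of a string $P$ is $\operatorname{tr}(P M) / 2^n$. The sketch below is a naive dense version of that formula with hypothetical helper names; it is not the algorithm used by `PauliSum(m)` and makes no attempt to be fast.

```julia
using LinearAlgebra

const SINGLE_QUBIT = Dict(
    'I' => ComplexF64[1 0; 0 1],
    'X' => ComplexF64[0 1; 1 0],
    'Y' => ComplexF64[0 -im; im 0],
    'Z' => ComplexF64[1 0; 0 -1],
)

# Dense matrix of a Pauli string such as "XZ", built as a Kronecker product.
pauli_matrix(label::AbstractString) = reduce(kron, (SINGLE_QUBIT[c] for c in label))

# Coefficient of each of the 4^n strings: tr(P * m) / 2^n.
function pauli_decompose(m::AbstractMatrix)
    dim = size(m, 1)
    n = round(Int, log2(dim))
    (n >= 1 && size(m) == (2^n, 2^n)) || throw(ArgumentError("expected a 2^n × 2^n matrix"))
    coeffs = Dict{String, ComplexF64}()
    for chars in Iterators.product(ntuple(_ -> "IXYZ", n)...)
        label = join(chars)
        coeffs[label] = tr(pauli_matrix(label) * m) / dim
    end
    return coeffs
end

m = [1.0 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
coeffs = pauli_decompose(m)
rebuilt = sum(c * pauli_matrix(label) for (label, c) in coeffs)
```

Summing the coefficients times their matrices gives back `m` up to rounding, which is the same check as `m ≈ Matrix(s)` above.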

The cost of this decomposition grows exponentially with the number of qubits. Here we compare the performance of `qiskit.quantum_info` and `QuantumOps`.

In [26]:
n = 7
m = rand(2^n, 2^n);
julia_time = @belapsed PauliSum($m)

0.082784806

In [27]:
qi_op = qi.Operator(m)
qi_time = @elapsed qi.SparsePauliOp.from_operator(qi_op)

8.499687047

The ratio of times for the Pauli decomposition of a random 7-qubit matrix:

In [28]:
qi_time / julia_time

102.67206577738432

### Parametric types and composability

#### Z4Group

I implemented a type `Z4Group` that represents `(i, -1, -i, 1)`. This can be used to represent the Pauli group. The type `Z4Group` becomes part of the type of the term, which aids the compiler in devirtualizing and inlining.

In [29]:
t = PauliTerm(:XXY, Z4Group(im))
(t, typeof(t))

(+i XXY, QuantumOps.PauliTerm{Vector{QuantumOps.Paulis.Pauli}, QuantumOps.Z4Groups.Z4Group})

In [30]:
v = PauliTerm(:ZXZ, Z4Group(1))

+1 ZXZ

In [31]:
t * v

+i YIX
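
A two-`Bool` phase type of this kind can be sketched in a few lines. The field names `imag` and `minus` are borrowed from the package, but the type name `QuarterTurn`, the convention value = (-1)^minus * i^imag, and the multiplication rule are my assumptions rather than the `Z4Group` source.

```julia
# Hypothetical stand-in for `Z4Group`: the value is (-1)^minus * i^imag.
struct QuarterTurn
    imag::Bool
    minus::Bool
end

# i^a * i^b = i^(a ⊻ b) * (-1)^(a & b), so the product needs only bit operations.
Base.:*(a::QuarterTurn, b::QuarterTurn) =
    QuarterTurn(a.imag ⊻ b.imag, a.minus ⊻ b.minus ⊻ (a.imag & b.imag))

# Conversion to an ordinary complex integer, used here only for checking.
to_complex(a::QuarterTurn) = (a.minus ? -1 : 1) * (a.imag ? 0 + 1im : 1 + 0im)

elements = [QuarterTurn(i, m) for i in (false, true), m in (false, true)]
consistent = all(to_complex(a * b) == to_complex(a) * to_complex(b)
                 for a in elements, b in elements)
```

Since the type has no type parameters and only `Bool` fields, the compiler can store it inline and specialize `*` without any dynamic dispatch.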

#### Z4Group0

More interesting is `Z4Group0`, which is `Z4Group` augmented by another `Bool` representing zero. This can represent `(0, im, -im, 1, -1)`. It supports multiplication of elements, but is only closed under addition where at least one operand is `0`. It throws an error if you don't respect this. This quasi-algebra is enough to represent and compute Kronecker products of Pauli matrices. The structure is this:

In [32]:
dump(Z4Group0(1))

QuantumOps.Z4Group0s.Z4Group0
  z4: QuantumOps.Z4Groups.Z4Group
    imag: Bool false
    minus: Bool false
  zero: Bool true


Note that this is a nested composite type. Nonetheless, an array of these is packed, with each element taking three bytes. Here is a packed two-dimensional array of `Z4Group0`.

In [33]:
a = rand(Z4Group0, (3,5))
a

3×5 Matrix{QuantumOps.Z4Group0s.Z4Group0}:
 0   -i  +1  -i  -1
 -1  -i  -1  -i  0
 +i  0   -1  +i  +1

In [34]:
sizeof(a)  ## (3 x 5) x 3 bytes

45
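
The same layout and arithmetic rules can be imitated with plain structs, as in the sketch below. `Phase` and `PhaseOrZero` are hypothetical names, and the flag here is called `nonzero` with `true` meaning a nonzero value, which is my own convention and not necessarily that of the `zero` field shown in the `dump` above.

```julia
# Hypothetical stand-in for `Z4Group`: the value is (-1)^minus * i^imag.
struct Phase
    imag::Bool
    minus::Bool
end
Base.:*(a::Phase, b::Phase) =
    Phase(a.imag ⊻ b.imag, a.minus ⊻ b.minus ⊻ (a.imag & b.imag))

# Nested immutable struct: a phase plus one flag, three Bools in total.
struct PhaseOrZero
    phase::Phase
    nonzero::Bool
end

Base.:*(a::PhaseOrZero, b::PhaseOrZero) =
    PhaseOrZero(a.phase * b.phase, a.nonzero & b.nonzero)

# Addition is only defined when at least one operand is zero.
function Base.:+(a::PhaseOrZero, b::PhaseOrZero)
    a.nonzero || return b
    b.nonzero || return a
    throw(DomainError((a, b), "the sum of two nonzero elements is not representable"))
end

function to_complex(a::PhaseOrZero)
    a.nonzero || return 0 + 0im
    return (a.phase.minus ? -1 : 1) * (a.phase.imag ? 0 + 1im : 1 + 0im)
end

packed = fill(PhaseOrZero(Phase(true, false), true), 3, 5)
```

Because both structs are immutable and contain only `Bool` fields, `sizeof(PhaseOrZero)` is 3 and `sizeof(packed)` is 45, the same arithmetic as in the cell above.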

We see that computation with `Z4Group0` can be as fast as or faster than `Complex{Int}`.

In [35]:
a = rand(Z4Group0, 10^5);
@btime reduce(*, a)

  63.837 μs (1 allocation: 16 bytes)


0

In [36]:
anum = [convert(Complex{Int}, x) for x in a];
@btime reduce(*, anum)

  132.493 μs (1 allocation: 32 bytes)


0 + 0im

I use `Z4Group0` in Kronecker products.

In [37]:
kron([Z4Group0.(m) for m in Matrix.([X, Y, Z])]...)

8×8 Matrix{QuantumOps.Z4Group0s.Z4Group0}:
 0   0   0   0   0   0   -i  0
 0   0   0   0   0   0   0   +i
 0   0   0   0   +i  0   0   0
 0   0   0   0   0   -i  0   0
 0   0   -i  0   0   0   0   0
 0   0   0   +i  0   0   0   0
 +i  0   0   0   0   0   0   0
 0   -i  0   0   0   0   0   0

In [38]:
operators = rand(Pauli, 4)
print(operators)

XYYI

In [39]:
mats = Matrix.(operators)
z40mats = [Z4Group0.(m) for m in mats];

size(kron(mats...))

(16, 16)

In [40]:
@btime kron($mats...);

  743.509 ns (3 allocations: 5.52 KiB)


In [41]:
@btime kron($z40mats...);

  666.608 ns (3 allocations: 1.12 KiB)


Here, the time to do the calculations with the usual 16-byte complex numbers is about the same. But, when converting a `PauliSum` to a matrix, I use `ThreadsX.sum` over the terms, which is a drop-in replacement for `sum` that does intelligent threading. When I use `Z4Group0`, I get a significant improvement in performance, perhaps because of fewer cache misses.

### sympy

In [42]:
sympy = pyimport("sympy")  # `@pyimport` is deprecated in PyCall
(x, t) = sympy.symbols("x t")

(PyObject x, PyObject t)

We use a symbolic coefficient.

In [43]:
term = PauliTerm("XXYZ", x + t)

4-factor QuantumOps.PauliTerm{Vector{QuantumOps.Paulis.Pauli}, PyCall.PyObject}:
XXYZ * (PyObject t + x)

In [44]:
term^3

4-factor QuantumOps.PauliTerm{Vector{QuantumOps.Paulis.Pauli}, PyCall.PyObject}:
XXYZ * (PyObject (t + x)**3)

The type of coefficient is encoded in the type of the `PauliTerm`.

In [45]:
typeof(term)

PauliTerm{Vector{Pauli}, PyObject} (alias for QuantumOps.OpTerm{QuantumOps.Paulis.Pauli, Array{QuantumOps.Paulis.Pauli, 1}, PyCall.PyObject})
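
The mechanism behind this is an ordinary type parameter. The following sketch uses a hypothetical `LabeledTerm{C}` (not the `OpTerm` definition from `QuantumOps`) to show how one generic method serves any coefficient type that supports `*`.

```julia
# Hypothetical term type: the coefficient type `C` is part of the term's type.
struct LabeledTerm{C}
    label::String
    coeff::C
end

# One generic method; Julia compiles a specialized version per coefficient type.
scale(t::LabeledTerm, c) = LabeledTerm(t.label, c * t.coeff)

t_float = LabeledTerm("XXYZ", 0.5)     # LabeledTerm{Float64}
t_exact = LabeledTerm("XXYZ", 1//2)    # LabeledTerm{Rational{Int64}}
t_cplx = scale(t_exact, im)            # coefficient becomes a complex rational
```

A symbolic coefficient, such as the `PyObject` above, fits the same pattern as long as the operations used on coefficients are defined for it.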

#### `Symbolics`

This is another symbolic library.

The following is disabled because of errors due to changes in packages.

using Symbolics

@variables a b c;

# Create a `PauliSum` with symbolic coefficients

term1 = PauliTerm("XZ", a + b)
term2 = PauliTerm("ZX", b + c)

psum = term1^3 + term2

# We convert the `PauliSum` with symbolic coefficients to a `Matrix`.
# No additional code is necessary to support this feature.

symmat = Matrix(psum)

---

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*