# T Complexity Comparison Gates

<p style="text-align: center;"><a href="mailto:noureldinyosri@gmail.com">Noureldin Yosri</a></p>
<p style="text-align: center;">May 2023</p>

## Abstract
Quantum comparison gates can be split into two categories: quantum-classical when comparing a quantum number to a classical number and quantum-quantum when comparing two quantum numbers.

Comparison gates are important building blocks for quantum algorithms, yet they are usually relegated to appendices, often with figures showing how they work on only a small number of qubits. As a result, researchers can waste time reinventing existing methods.

The intention of this notebook is to document the current state of the art for comparison gates while explaining how they work both algorithmically and programmatically.

**Note:** Most of the ideas explained here are built into [cirq-ft](https://github.com/quantumlib/Cirq/tree/master/cirq-ft), [Cirq](https://github.com/quantumlib/Cirq/)'s fault-tolerant sublibrary.

## Introduction
The best known implementation of a reversible oracle for comparing two quantum numbers of $n$ qubits each has a T complexity of $8n + \mathcal{O}(1)$ and is given in the supplementary materials of [Berry et al., 2018](https://doi.org/10.1038/s41534-018-0071-5). Their implementation uses a divide-and-conquer technique to create a binary tree of depth $\log_2{(n)}$ whose leaves are the qubits of the numbers, with intermediate values stored in $\mathcal{O}(n)$ ancillas. Each non-leaf node uses two CSWAP operations, and the value of the comparison is computed from the root node using one Toffoli.

When this decomposition is applied to the case where one input is a list of qubits and the other is a classical number (i.e. a string of bits), it gives a $6n + \mathcal{O}(1)$ T complexity: at the level right above the leaves, half of the CSWAP operations collapse to either identity or SWAP, so the number of CSWAPs becomes $\approx \frac{n}{2} + 2 \times \frac{n}{2} = \frac{3n}{2}$, leading to $4 \times \frac{3n}{2} = 6n$ T operations.

However, this is not the optimal way (in terms of T count) to compare a quantum number to a classical number, which takes $4n + \mathcal{O}(1)$ T operations. The construction can be found in appendix H of [Berry et al., 2019](https://doi.org/10.48550/arXiv.1902.02134) and is based on reducing the problem to subtraction. The reduction in T count comes at the cost of linear depth, as opposed to the logarithmic depth of the $6n$ decomposition.
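
To make the counts quoted in this introduction concrete, here is a minimal sketch that tabulates their leading-order terms. It assumes, as the $4 \times \frac{3n}{2}$ arithmetic does, that each CSWAP costs 4 T gates, and it drops every $\mathcal{O}(1)$ term. The function name and dictionary keys are hypothetical and exist only for this illustration.

```python
def leading_order_t_counts(n: int) -> dict:
    """Leading-order T counts for comparing n-qubit numbers (O(1) terms dropped)."""
    t_per_cswap = 4  # Assumption: taken from the 4 x (3n/2) = 6n arithmetic.
    # ~n/2 CSWAPs survive right above the leaves, plus 2 per remaining node (~n/2 nodes).
    tree_qc_cswaps = n // 2 + 2 * (n // 2)
    return {
        'quantum-quantum, log depth': 8 * n,
        'quantum-classical, log depth': t_per_cswap * tree_qc_cswaps,
        'quantum-classical, linear depth': 4 * n,
    }


for n in (8, 64):
    print(n, leading_order_t_counts(n))
```

For even $n$ the middle entry is exactly $6n$, so the linear-depth construction saves roughly a third of the T gates over the tree in the quantum-classical case.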

In what follows I explain the T optimal ways for comparison in different cases.

## Equality as a special case
Before we proceed to the comparison oracle, we look at the equality oracle as a special case. It has a T complexity of only $4n + \mathcal{O}(1)$, since it can be implemented as a qubit-wise And operation, which itself can be done using only $4n + \mathcal{O}(1)$ T gates as per [Babbush et al., 2018](https://doi.org/10.1103%2Fphysrevx.8.041015) and [Craig Gidney](https://algassert.com/post/1903).

### Quantum-Classical Case
In the quantum classical case all we need to do is to compute qubit/bit wise equality
$$(q_0 = b_0) \wedge \cdots \wedge (q_{n-1} = b_{n-1})$$

In `cirq-ft` this can be accomplished using the `And` gate as `And(cv=bits)`, where `bits` are the bits of the classical number.
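
As a purely classical illustration (not the `cirq-ft` implementation), the sketch below evaluates the conjunction as a ladder of two-input ANDs on bit strings. The helper names `to_bits` and `and_ladder_equal` are hypothetical, the bits are taken most significant first, and the count of $n - 1$ two-input ANDs is an assumption about how a multi-controlled And is usually laid out with $n - 1$ extra qubits.

```python
from typing import List, Sequence, Tuple


def to_bits(value: int, n: int) -> List[int]:
    """Big-endian bits of `value`, most significant bit first."""
    return [(value >> i) & 1 for i in range(n - 1, -1, -1)]


def and_ladder_equal(q: Sequence[int], b: Sequence[int]) -> Tuple[int, int]:
    """Returns (q == b, number of two-input ANDs used) for classical bit strings."""
    matches = [int(qi == bi) for qi, bi in zip(q, b)]  # plays the role of cv=b
    acc, two_input_ands = matches[0], 0
    for m in matches[1:]:
        acc &= m  # in a circuit each step would write into a fresh ancilla
        two_input_ands += 1
    return acc, two_input_ands


n, B = 3, 5
for v in range(1 << n):
    result, ands = and_ladder_equal(to_bits(v, n), to_bits(B, n))
    assert result == int(v == B) and ands == n - 1
print(f'{n - 1} two-input ANDs, i.e. {4 * (n - 1)} T gates at 4 T each')
```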

### Quantum-Quantum Case
In the quantum-quantum case the equality is still bitwise $$(q_0 = p_0) \wedge \cdots \wedge (q_{n-1} = p_{n-1})$$

This is the same as the conjunction $$(q_0 \oplus p_0 = 0) \wedge \cdots \wedge (q_{n-1}\oplus p_{n-1} = 0)$$

Where $\oplus$ is the binary xor operation. This allows us to use `cirq_ft`'s `And` on the result of the xor.
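
A quick classical sanity check of this identity is sketched below. It follows the same three stages on plain bits (xor into one register, test that every xor bit is zero, undo the xor) and confirms that the second register is restored. The function name is hypothetical and the code is an illustration of the logic only, not a quantum simulation.

```python
from itertools import product


def xor_equality(p, q):
    """Classical trace of: xor p into q, AND over zero-controls, undo the xor."""
    q = [pi ^ qi for pi, qi in zip(p, q)]    # bitwise CNOT: q now holds p xor q
    equal = int(all(bit == 0 for bit in q))  # And with every control value 0
    q = [pi ^ qi for pi, qi in zip(p, q)]    # second CNOT layer restores q
    return equal, q


n = 2
for p in product((0, 1), repeat=n):
    for q in product((0, 1), repeat=n):
        equal, restored = xor_equality(p, q)
        assert equal == int(p == q)
        assert tuple(restored) == q
```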

In [None]:
from typing import Sequence

try:
    import cirq
    import cirq_ft
except ImportError:
    !pip install --pre cirq
    !pip install --pre cirq-ft
    import cirq
    import cirq_ft

In [None]:
def equality_oracle_quantum_classical(B: int, A: Sequence[cirq.Qid], z: cirq.Qid) -> cirq.OP_TREE:
    # Returns a decomposition of the oracle O_B |A>|z> = |A>|z^(A == B)> in only 4n T operations.
    bits = list(reversed([(B >> i) & 1 for i in range(len(A))]))

    ancilla = cirq.NamedQubit.range(len(bits) - 1, prefix='ancilla')
    yield cirq_ft.And(cv=bits).on(
        *A, *ancilla
    )  # `ancilla[-1]` now has the result of equality. uses 4n T operations.

    yield cirq.CNOT(ancilla[-1], z)  # update result qubit.

    yield cirq_ft.And(cv=bits, adjoint=True).on(
        *A, *ancilla
    )  # Restore the qubits to their original states.

As an example we construct the equality gate for checking whether a 3-qubit register is equal to 5. First we print the decomposition of the gate, followed by the result of running the gate on each of the 8 possibilities individually, and finally the result of running the gate on the uniform superposition of all 8 possibilities.

In [None]:
classical_number = 5  # Classical Number to compare with.
quantum_number = cirq.NamedQubit.range(3, prefix='qn')  # The qubits that hold the quantum number.
z = cirq.NamedQubit('z')  # The qubit that will hold comparison result.
equality_circuit = cirq.Circuit(equality_oracle_quantum_classical(classical_number, quantum_number, z))
equality_circuit

In [None]:
def format_dirac(s: str, n: int, quantum_classical: bool = False) -> str:
    """Reformats a Dirac ket as |input qubits|ancilla qubits|result qubit>."""
    if quantum_classical:
        return s[: n + 1] + '|' + s[n + 1 : -2] + '|' + s[-2:]
    return s[: n + 1] + '|' + s[n + 1 : 2*n+1] + '|' + s[2*n+1:-2] + '|' + s[-2:]

def check_each_possibility(c, quantum_classical: bool = True):
    sim = cirq.Simulator()
    data_qubits = [q for q in c.all_qubits() if 'ancilla' not in q.name and q.name != 'z']
    data_qubits.sort()
    if quantum_classical:
        n_qubits = len(quantum_number)
    else:
        n_qubits = len(data_qubits)//2
    qubit_order = list(data_qubits)
    qubit_order += [q for q in c.all_qubits() if q not in qubit_order + [z]]
    qubit_order += [z]
    for v in range(1 << len(data_qubits)):
        bits = [(v >> i) & 1 for i in range(len(data_qubits) - 1, -1, -1)]
        bits += (len(qubit_order) - len(data_qubits)) * [0]
        result = sim.simulate(c, qubit_order=qubit_order, initial_state=bits)
        print(
            f'final state vector of {v} compared to {classical_number}',
            format_dirac(result.dirac_notation(), n_qubits, quantum_classical),
        )


check_each_possibility(equality_circuit)

As this is a quantum circuit, it's important to check that it works with superpositions as well. This is why we will check the uniform superposition.

In [None]:
def check_uniform_superposition(c, quantum_classical: bool = True):
    data_qubits = [q for q in c.all_qubits() if 'ancilla' not in q.name and q.name != 'z']
    data_qubits.sort()
    if quantum_classical:
        n_qubits = len(quantum_number)
    else:
        n_qubits = len(data_qubits)//2
    c = cirq.Circuit(cirq.H.on_each(data_qubits) + [c])
    sim = cirq.Simulator()
    qubit_order = list(data_qubits)
    qubit_order += [q for q in c.all_qubits() if q not in qubit_order + [z]]
    qubit_order += [z]
    result = sim.simulate(c, qubit_order=qubit_order)
    result = result.dirac_notation()
    final = []
    for s in result.split('|'):
        if '⟩' not in s:
            final.append(s)
            continue
        parts = s.split('⟩')
        parts[0] = format_dirac('|' + parts[0] + '⟩', n_qubits, quantum_classical)
        final.append(''.join(parts))
    print('Acting on the uniform superposition of all states we get:')
    print('\t', ''.join(final))


check_uniform_superposition(equality_circuit)

And for the quantum-quantum case we have.

In [None]:
def equality_oracle_quantum_quantum(A: Sequence[cirq.Qid], B: Sequence[cirq.Qid], z: cirq.Qid) -> cirq.OP_TREE:
    # Returns a decomposition of the oracle O |A>|B>|z> = |A>|B>|z^(A == B)> in only 4n T operations.

    ancilla = cirq.NamedQubit.range(len(A) - 1, prefix='ancilla')
    yield cirq.CNOT.on_each(zip(A, B))  # Store the bitwise xor in B.
    yield cirq_ft.And(cv=(0,)*len(B)).on(
        *B, *ancilla
    )  # `ancilla[-1]` now has the result of equality. uses 4n T operations.

    yield cirq.CNOT(ancilla[-1], z)  # update result qubit.

    yield cirq_ft.And(cv=(0,)*len(B), adjoint=True).on(
        *B, *ancilla
    )  # Reverse the And operation.

    yield cirq.CNOT.on_each(zip(A, B))  # Restore the qubits to their original states.


As we did before, we construct the gate to compare two 2-qubit numbers. First we print the decomposition of the gate, followed by the result of running the gate on each of the 16 possibilities individually, and finally the result of running the gate on the uniform superposition of all 16 possibilities.

In [None]:
first_quantum_number = cirq.NamedQubit.range(2, prefix='P')  # The qubits that hold the first quantum number.
second_quantum_number = cirq.NamedQubit.range(2, prefix='Q')  # The qubits that hold the second quantum number.
quantum_quantum_equality = cirq.Circuit(equality_oracle_quantum_quantum(first_quantum_number, second_quantum_number, z))
quantum_quantum_equality

In [None]:
# Now we check individual possibilities.
check_each_possibility(quantum_quantum_equality, quantum_classical=False)

In [None]:
# Finally we check the uniform superposition.
check_uniform_superposition(quantum_quantum_equality, quantum_classical=False)

**Notice that the ancilla qubits are always clean at the end of execution and that the input qubits are not affected.**

## The quantum-classical comparator with $4n + \mathcal{O}(1)$ T gates

### Inspiration
We will only consider the comparison oracle for less than ($<$), since the $\leq$ oracle needs just one extra Clifford operation (a CNOT) and the greater than oracle needs exactly 2 extra Clifford operations: the same CNOT and an X.

While [Berry et al., 2019](https://doi.org/10.48550/arXiv.1902.02134) reduced the problem to subtraction, we take a different path and note that comparing two numbers of equal size is the same as finding which of them is lexicographically smaller. This problem is usually solved sequentially and is essentially a finite state machine of $n + 3$ states, each having two transitions. The result is an almost identical decomposition (up to Clifford operations) with the same T complexity.

More concretely, consider how C/C++ [std::strcmp](https://en.cppreference.com/w/cpp/string/byte/strcmp) compares two sequences $A$ and $B$ of equal length $n$. The function scans the sequences from left to right until it finds the first index $i^*$ where they differ, and the result is determined by whether $A_{i^*} < B_{i^*}$.

Implicitly this algorithm has $n + 3$ states $\{e_0, \ldots, e_n\} \cup \{L, R\}$, where being in the $e_k$ state means the prefixes of length $k$ are equal. The transitions between states are governed by:

\begin{equation}
\begin{split}
    e_k \rightarrow e_{k+1} \textit{ if } A_k = B_k \\
    e_k \rightarrow L \textit{ if } A_k < B_k \\
    e_k \rightarrow R \textit{ if } A_k > B_k \\
\end{split}
\end{equation}

When the result of the comparison between individual indices becomes probabilistic, these states form a Markov decision process with three terminal states $\{L, e_n, R\}$. This gives us the inspiration for a new implementation.
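
The state machine described above can be sketched classically as follows. The function name and the string labels for the states are hypothetical. The code only illustrates the deterministic case, where both sequences are known.

```python
from typing import Sequence


def compare_fsm(a: Sequence[int], b: Sequence[int]) -> str:
    """Runs the e_0..e_n / L / R state machine on two equal-length sequences."""
    assert len(a) == len(b)
    state = 'e0'  # the empty prefixes are trivially equal
    for k, (ak, bk) in enumerate(zip(a, b)):
        if ak < bk:
            return 'L'  # first difference says a < b
        if ak > bk:
            return 'R'  # first difference says a > b
        state = f'e{k + 1}'  # prefixes of length k + 1 are equal
    return state  # e_n: the sequences are equal


print(compare_fsm([1, 0, 0], [1, 0, 1]))  # 'L'
print(compare_fsm([1, 1], [1, 0]))        # 'R'
print(compare_fsm([0, 1], [0, 1]))        # 'e2'
```

Once the machine leaves the $e_k$ chain it never returns, which is why a single pass over the bits is enough.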

### Algorithm
We start by allocating $n+1$ qubits representing the $e_0, \ldots, e_n$ states and then scan the qubit register and number from left to right.

If the current bit is zero then we only need to compute the $e_k \rightarrow e_{k+1}$ transition, since the qubit can't be less than zero. Otherwise we need to compute that transition as well as the $e_k \rightarrow L$ transition.
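
On computational basis states the scan described above reduces to simple bit arithmetic. The sketch below is a classical analogue under that assumption: it tracks a single "equal so far" bit and a result bit `z`, and it does not model ancilla allocation or uncomputation. The function name is hypothetical.

```python
def less_than_classical_trace(B: int, a_bits):
    """Follows e_k and z through one left-to-right scan of `a_bits` (MSB first)."""
    n = len(a_bits)
    b_bits = [(B >> i) & 1 for i in range(n - 1, -1, -1)]
    ek, z = 1, 0  # e_0 = 1: the empty prefixes are equal.
    for q, b in zip(a_bits, b_bits):
        if b:
            z ^= ek & (1 - q)  # e_k -> L: equal so far and q = 0 < b = 1
            ek = ek & q        # e_k -> e_{k+1}: still equal only if q = 1
        else:
            ek = ek & (1 - q)  # q cannot be below 0, so only track equality
    return z, ek


n, B = 3, 5
for v in range(1 << n):
    a_bits = [(v >> i) & 1 for i in range(n - 1, -1, -1)]
    z, e_n = less_than_classical_trace(B, a_bits)
    assert z == int(v < B) and e_n == int(v == B)
```

At most one of the updates to `z` can fire during a scan, because `ek` drops to zero as soon as the prefixes differ.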

In [None]:
def less_than(B: int, A: Sequence[cirq.Qid], z: cirq.Qid) -> cirq.OP_TREE:
    # Returns a decomposition of the oracle O_B |A>|z> = |A>|z^(A < B)> in only 4n T operations.
    bits = [(B >> i) & 1 for i in range(len(A) - 1, -1, -1)]

    adjoint = []

    es = cirq.LineQid.range(len(A) + 1, dimension=2)
    ek = es.pop(0)

    # Initially our belief is that the numbers are equal.
    yield cirq.X(ek)
    adjoint.append(cirq.X(ek))

    for q, b, ekp1 in zip(A, bits, es):
        if b:
            yield cirq.X(q)
            adjoint.append(cirq.X(q))

            # Temporarily hold e_k and not q
            yield cirq_ft.And().on(q, ek, ekp1)
            adjoint.append(cirq_ft.And(adjoint=True).on(q, ek, ekp1))

            # e_{k+1} currently holds (prefixes equal so far) and (q != b), i.e. the
            # previous prefixes are equal and the current prefix of the qubits is < the prefix of B.
            yield cirq.CNOT(ekp1, z)

            yield cirq.CNOT(ek, ekp1)  # Now e_{k+1} has the prefix equality.
            adjoint.append(cirq.CNOT(ek, ekp1))
        else:
            # e_{k+1} = e_k and not q
            yield cirq_ft.And(cv=[1, 0]).on(ek, q, ekp1)
            adjoint.append(cirq_ft.And(cv=[1, 0], adjoint=True).on(ek, q, ekp1))

        ek = ekp1

    yield from reversed(adjoint)

As we did before, we construct the less than gate for checking whether a 3-qubit register is less than 5. First we print the decomposition of the gate, followed by the result of running the gate on each of the 8 possibilities individually, and finally the result of running the gate on the uniform superposition of all 8 possibilities.

In [None]:
less_than_circuit = cirq.Circuit(less_than(classical_number, quantum_number, z))
less_than_circuit

In [None]:
# Now we check individual possibilities.
check_each_possibility(less_than_circuit)

In [None]:
# Finally we check the uniform superposition.
check_uniform_superposition(less_than_circuit)

**As before, notice that the ancilla qubits are always clean at the end of execution and that the input qubits are not affected.**

## Improving the constant

$$\renewcommand{\ket}[1]{|#1\rangle}$$
The implementation above has a T complexity of exactly $4n$ since there are exactly $n$ And gates, each using $4$ T gates. Note, however, that the first of them is not actually needed: one of its inputs is in the $\ket{1}$ state, so it collapses to either identity or `cirq.X` depending on the most significant bit of $B$. This gives a T complexity of $4(n-1) = 4n - 4$.
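
The collapse of the first And can be checked on classical bits. In this sketch (function names hypothetical), the first scan step is written once with a two-input AND fed by $e_0 = 1$ and once using only a copy or a negation of the qubit value. The two agree for both values of the most significant bit of $B$.

```python
def first_step_with_and(b: int, q: int):
    """First scan step as written: e_0 = 1 feeds a two-input AND."""
    e0 = 1
    if b:
        return e0 & (1 - q), e0 & q  # (contribution to z, e_1)
    return 0, e0 & (1 - q)


def first_step_without_and(b: int, q: int):
    """Same outputs using only a copy or a negation of q."""
    if b:
        return 1 - q, q
    return 0, 1 - q


for b in (0, 1):
    for q in (0, 1):
        assert first_step_with_and(b, q) == first_step_without_and(b, q)

n = 3
print('T count with all n Ands:', 4 * n, '| after dropping the first:', 4 * (n - 1))
```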

## Drawback
This decomposition has the currently optimal T complexity for the quantum-classical case; however, it has linear depth. We will see later a construction that has logarithmic depth but $6n + \mathcal{O}(1)$ T complexity.

## The Quantum-Quantum Comparator with $8n + \mathcal{O}(1)$ T gates

For the quantum-classical case we scanned the numbers sequentially from left to right to compute the comparison result. A similar strategy can be used to create a decomposition that has $12n - 8$ T complexity and linear depth. We won't go into the details of that approach but refer the reader to my incomplete [PR](https://github.com/quantumlib/Qualtran/pull/205). Instead we discuss the superior decomposition proposed by [Berry et al., 2018](https://doi.org/10.1038/s41534-018-0071-5), which has $8n + \mathcal{O}(1)$ T complexity and logarithmic depth.


Before we get into the implementation details of this gate, we will discuss two special cases, $n=1$ and $n=2$, because as we will see later the general case is built out of them.

### $n = 1$ Case

Given 2 qubits, we would like a way to represent the three cases of equality, less than, and greater than in a reversible way. This can be done using two extra qubits `less_than` and `greater_than`, which will hold the less-than and greater-than results, while equality will be held in the second operand (qubit). In other words, we implement the operator
$$O\ket{p}\ket{q}\ket{0}\ket{0} \rightarrow \ket{p}\ket{p=q} \ket{p < q} \ket{p > q}$$

The case $p < q$ happens only when $p=0 \wedge q=1$, so it's simply a controlled-controlled-not.

```py3
cirq.X(p)  # Flip p
cirq.CCNOT(p, q, less_than)
cirq.X(p)  # Restore p
```

The case $p = q$ happens only when the xor sum is zero.

```py3
cirq.CNOT(p, q)  # Store xor sum in q.
cirq.X(q)  # Flip q.
```


The case $p > q$ can be done in the same manner as the first case; however, we don't need to, since it's simply not equality and not less than. This saves us the cost of one CCNOT gate (i.e. Toffoli gate = 4 T gates), so the operations are

```py3
cirq.X(greater_than)  # Initial belief is that p > q
cirq.CNOT(q, greater_than)  # But not if p = q
cirq.CNOT(less_than, greater_than)  # Nor if p < q.
```

Doing these operations in this order correctly updates the qubits. A slightly different circuit is given in the paper, which we implemented in `cirq_ft` as `cirq_ft.SingleQubitCompare` and which has a cost of only one CCNOT (i.e. Toffoli = 4T). `cirq_ft.SingleQubitCompare` is implemented using measurement-based uncomputation so that the adjoint of this circuit has a T count of zero.
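
The three snippets above can be traced on classical bits to confirm that, applied in this order, they produce the intended outputs for every input. This is a truth-table check of the logic described in this section, not of `cirq_ft.SingleQubitCompare`. The function name is hypothetical.

```python
def single_bit_compare(p: int, q: int):
    """Classical trace of the less-than, equality and greater-than steps."""
    less_than = (1 - p) & q    # X(p); CCNOT(p, q, less_than); X(p)
    q ^= p                     # CNOT(p, q): q holds p xor q
    q ^= 1                     # X(q): q now holds (p == q)
    greater_than = 1           # X(greater_than)
    greater_than ^= q          # CNOT(q, greater_than)
    greater_than ^= less_than  # CNOT(less_than, greater_than)
    return p, q, less_than, greater_than


for p in (0, 1):
    for q in (0, 1):
        assert single_bit_compare(p, q) == (p, int(p == q), int(p < q), int(p > q))
```

The order matters because `less_than` has to be computed from the original value of `q`, before `q` is overwritten with the equality bit.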

### $n=2$ Case
In this case the two qubit numbers $P$ and $Q$ can be written as
$$
P = 2*p_1 + p_0\\
Q = 2*q_1 + q_0
$$

We will do this in two steps. The first is to update the qubits such that $\textit{sign}(P - Q) = \textit{sign}(p_0^f - q_0^f)$, where $p_0^f$ and $q_0^f$ are the final values of $p_0$ and $q_0$ respectively. The second is to extract the comparison result between the two numbers using `cirq_ft.SingleQubitCompare` applied to $p_0^f$ and $q_0^f$.

In [Berry et al., 2018](https://doi.org/10.1038/s41534-018-0071-5), the circuit that does the first step is called `COMPARE2` (FIG. 1 in the supplementary materials).
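
The sign-preserving property of the first step can be illustrated on classical bits. The rule below is one assumed way to satisfy it: whenever the high bits differ they decide the comparison, so they are swapped into the low position of each number. It is a sketch of the property only and is not claimed to reproduce the `COMPARE2` circuit from the paper. The function names are hypothetical.

```python
from itertools import product


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def reduce_pair(p1: int, p0: int, q1: int, q0: int):
    """Returns (p0_f, q0_f) with sign(P - Q) == sign(p0_f - q0_f)."""
    differ = p1 ^ q1  # control bit: do the high bits disagree?
    if differ:        # two controlled swaps, one per number
        p1, p0 = p0, p1
        q1, q0 = q0, q1
    return p0, q0


for p1, p0, q1, q0 in product((0, 1), repeat=4):
    P, Q = 2 * p1 + p0, 2 * q1 + q0
    p0_f, q0_f = reduce_pair(p1, p0, q1, q0)
    assert sign(P - Q) == sign(p0_f - q0_f)
```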

Now notice that for $n > 2$ we can group the qubits in $\frac{n}{2}$ pairs and use our COMPARE2 circuit to compare them, in other words:
$$
   \textit{Compare2}((P_0, P_1), (Q_0, Q_1)), \cdots, \textit{Compare2}((P_{n-2}, P_{n-1}), (Q_{n-2}, Q_{n-1}))
$$
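
Applying the pairwise reduction level by level gives a tree of depth $\log_2(n)$. The classical sketch below assumes bit lists given most significant bit first with a length that is a power of two, which is an ordering chosen for this illustration rather than the indexing used in the formula above. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def reduce_level(P: Sequence[int], Q: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Replaces each (high, low) pair by one bit per number, keeping sign(P - Q)."""
    new_p, new_q = [], []
    for i in range(0, len(P), 2):
        use_high = P[i] != Q[i]  # the high bits decide whenever they differ
        j = i if use_high else i + 1
        new_p.append(P[j])
        new_q.append(Q[j])
    return new_p, new_q


def tree_compare(P: Sequence[int], Q: Sequence[int]) -> int:
    """Returns -1, 0 or 1 for P < Q, P == Q, P > Q."""
    P, Q = list(P), list(Q)
    while len(P) > 1:  # one iteration per level of the tree
        P, Q = reduce_level(P, Q)
    return (P[0] > Q[0]) - (P[0] < Q[0])


for a in range(16):
    for b in range(16):
        bits_a = [(a >> i) & 1 for i in range(3, -1, -1)]
        bits_b = [(b >> i) & 1 for i in range(3, -1, -1)]
        assert tree_compare(bits_a, bits_b) == (a > b) - (a < b)
```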