# Screening Task for QOSF Mentorship Program

Dmytro Fedoriaka, October 2024.

*This is my solution to the screening task for cohort 10 of the [Quantum Open Source Foundation](https://qosf.org/) Mentorship Program.*

## Problem statement

Implement a quantum circuit on 5 qubits that represents the state vector

$$| \psi \rangle  = 
\frac{1}{2}(| 22 \rangle + | 17 \rangle  + | 27 \rangle  + | 12 \rangle )=
\frac{1}{2}(| 10110 \rangle + | 10001 \rangle  + | 11011 \rangle  + | 01100 \rangle )$$

using basis_gates [X,H,Rz,CX] and an architecture in which only the following pairs of qubits are connected: `[(0,1),(0,4),(1,4),(4,2),(4,3),(2,3)]`.

Below is the code describing the state to implement:

In [1]:
import numpy as np
size = 5
state_values = [22,17,27,12]
state_vector = [0]*2**size
for s in state_values:
  print(np.binary_repr(s,size))
  state_vector[s] = 0.5
np.asarray(state_vector)

10110
10001
11011
01100


array([0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.5,
       0. , 0. , 0. , 0. , 0.5, 0. , 0. , 0. , 0. , 0.5, 0. , 0. , 0. ,
       0. , 0.5, 0. , 0. , 0. , 0. ])

## Solution method

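The given state has the special form $\frac{1}{2} \sum_{i=0}^{3} |s_i \rangle$, an equal superposition of four distinct basis states. Let's call such a state a "4-balanced" state. This solution finds circuits only for 4-balanced states. We assume that the initial state is $|00000 \rangle$ and use big-endian notation: qubit 0 corresponds to the leftmost (most significant) bit of a basis state label, as in $|10110 \rangle = |22 \rangle$. This is the convention followed by the bit helpers in the code below and by Cirq.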

With two qubits we can get the state $\frac{1}{2} (|00 \rangle+|01 \rangle+|10 \rangle+|11 \rangle) = |++ \rangle$ by applying a Hadamard gate to each qubit. For a larger number of qubits we can apply Hadamard gates to some 2 qubits, which gives a 4-balanced state, and then apply a permutation operator (i.e. an operator whose unitary matrix is a permutation matrix):

$$ U_{CIRCUIT} = U_{PERM} \cdot (H_{i_1} \otimes H_{i_2}) $$
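
As a small illustration of the first step, the sketch below applies Hadamard gates to two of the five qubits of $|00000 \rangle$ and lists the basis states that end up with non-zero amplitude. The helper name `hadamards_on` and the choice of qubits 1 and 3 are assumptions made for this example only; qubit 0 is taken as the most significant bit.

```python
import numpy as np
from functools import reduce

N = 5  # Number of qubits.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

def hadamards_on(qubits):
    # Hypothetical helper: H on the listed qubits, identity elsewhere.
    # np.kron puts the first factor in the most significant position.
    factors = [H if q in qubits else I2 for q in range(N)]
    return reduce(np.kron, factors)

zero_state = np.zeros(2**N)
zero_state[0] = 1.0  # |00000>

state = hadamards_on({1, 3}) @ zero_state
support = [int(i) for i in np.flatnonzero(np.abs(state) > 1e-12)]
print(support)         # Basis states with non-zero amplitude.
print(state[support])  # Their amplitudes.
```

The four surviving basis states all carry amplitude 1/2, so the result is a 4-balanced state in the sense defined above.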

Consider the set of all 4-balanced states. For 5 qubits, there are only $C_{32}^{4}=35960$ such states. Any permutation operator maps a 4-balanced state to a 4-balanced state. The gates $X$ and $CNOT$ are permutation operators. So, we can consider a graph whose vertices are 4-balanced states, with an edge between two states if one can be mapped to the other by an X or CNOT gate (taking into account only those CNOTs that are allowed by the given architecture). Then we can try to find a shortest path from $| \psi \rangle$ to any of the vertices
$H_{i_1} \otimes H_{i_2} |00000 \rangle$. 
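
To make the graph concrete, here is a minimal sketch of how a single permutation gate moves one vertex to another. A 4-balanced state is represented simply as the set of its four basis-state indices. The function names `apply_x` and `apply_cnot` are hypothetical and used only for this illustration; the starting set `{0, 2, 8, 10}` is the support of $H_1 \otimes H_3 |00000 \rangle$, with qubit 0 as the most significant bit.

```python
import math

N = 5  # Number of qubits.

def bit(s, q):
    # Value of qubit q in basis state s (qubit 0 is the most significant bit).
    return (s >> (N - 1 - q)) & 1

def apply_x(s, target):
    # X flips the target bit.
    return s ^ (1 << (N - 1 - target))

def apply_cnot(s, ctrl, target):
    # CNOT flips the target bit when the control bit is 1.
    return s ^ (bit(s, ctrl) << (N - 1 - target))

start = frozenset({0, 2, 8, 10})
after_cnot = frozenset(apply_cnot(s, 3, 4) for s in start)
after_x = frozenset(apply_x(s, 0) for s in after_cnot)

print(sorted(after_cnot))
print(sorted(after_x))
print(math.comb(32, 4))  # Number of vertices in the graph.
```

Each gate maps four distinct indices to four distinct indices, so the result is again a vertex of the graph.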

Unfortunately, for the given $| \psi \rangle$ this does not work, because all states $H_{i_1} \otimes H_{i_2} |00000 \rangle$ are unreachable from $| \psi \rangle$ in this graph. 
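
One possible way to see why, sketched here under the assumption that basis-state labels are treated as bit vectors over GF(2): X and CNOT act on labels as invertible affine maps $s \mapsto Ls \oplus c$, so the XOR of the four labels of a 4-balanced state is mapped to $L(s_0 \oplus s_1 \oplus s_2 \oplus s_3)$. A zero XOR therefore stays zero and a non-zero XOR stays non-zero. The helper `xor_all` below is a hypothetical name used only for this check.

```python
from functools import reduce
from itertools import combinations
from operator import xor

N = 5  # Number of qubits.

def xor_all(states):
    # XOR of all basis-state labels in the set.
    return reduce(xor, states)

target = [22, 17, 27, 12]
target_xor = xor_all(target)
print(target_xor)

# Every state H_i1 (x) H_i2 |00000> has labels {0, m1, m2, m1|m2}, whose XOR is 0.
hadamard_xors = []
for i1, i2 in combinations(range(N), 2):
    m1, m2 = 1 << (N - 1 - i1), 1 << (N - 1 - i2)
    hadamard_xors.append(xor_all([0, m1, m2, m1 | m2]))
print(hadamard_xors)
```

The target state has a non-zero XOR while every two-Hadamard state has XOR 0, which is consistent with the search finding no path that uses only X and CNOT. The Toffoli gate is not affine over GF(2), so it can change this quantity.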

However, we can extend the set of permutation gates with the Toffoli gate (also known as the CCNOT gate). This is a permutation gate, and we can apply it to any triple of qubits that are fully connected. This gate is not in the set of allowed gates, but there is a well-known decomposition of it into $H$, $T$, $T^\dagger$ and $CNOT$ gates (see [Wikipedia](https://en.wikipedia.org/wiki/Toffoli_gate)). We also know that $T = e^{i \pi/8} R_z(\pi / 4)$ and $T^\dagger = e^{-i \pi/8} R_z(-\pi / 4)$, so up to a global phase we can replace $T$ with $R_z(\pi / 4)$ and $T^\dagger$ with $R_z(-\pi / 4)$. This way we might get a state that differs from the desired state by a global phase, but it is the same physical state.
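
The two phase relations can be checked numerically. The sketch below assumes the usual conventions $T = \mathrm{diag}(1, e^{i\pi/4})$ and $R_z(\theta) = \mathrm{diag}(e^{-i\theta/2}, e^{i\theta/2})$; the function name `rz` is chosen only for this example.

```python
import numpy as np

def rz(theta):
    # Rz(theta) = diag(exp(-i*theta/2), exp(+i*theta/2)).
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

T = np.diag([1, np.exp(1j * np.pi / 4)])
T_dagger = T.conj().T

# T and Rz(pi/4) differ only by the global phase exp(i*pi/8).
t_matches = np.allclose(T, np.exp(1j * np.pi / 8) * rz(np.pi / 4))
# T^dagger and Rz(-pi/4) differ only by the global phase exp(-i*pi/8).
t_dagger_matches = np.allclose(T_dagger, np.exp(-1j * np.pi / 8) * rz(-np.pi / 4))
print(t_matches, t_dagger_matches)
```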

Now we need to take into account that the Toffoli gate is much more expensive than X and CNOT: its circuit made of elementary gates has depth 11. So the graph now needs to be weighted: X and CNOT gates have weight 1, and the Toffoli gate has weight 11. We can use Dijkstra's algorithm to find shortest paths from $| \psi \rangle$ to all reachable states, and then pick the state $H_{i_1} \otimes H_{i_2} |00000 \rangle$ that has the shortest distance from $|\psi \rangle$.

So, here is the decomposition algorithm:
* Construct permutations for all allowed X, CNOT and Toffoli gates.
* Construct a graph of 4-balanced states.
* Find a shortest path from $|\psi \rangle$ to any of the $H_{i_1} \otimes H_{i_2} |00000 \rangle$ states.
* Restore the gates from the path in reversed order (every gate used is its own inverse), and prepend the two Hadamard gates corresponding to the final state.
* If the solution contains Toffoli gates, expand them to elementary gates ($H, CNOT, R_z$).

## Solution

This solution is self-contained and doesn't use any quantum computing libraries. For simplicity, the answer is given as a list of gate names.

In [2]:
import numpy as np
from itertools import product
import heapq
import time

N = 5 # Number of qubits.

def get_bit(state_number, bit_id):
  # Big-endian: bit 0 is the most significant bit of the state number.
  return (state_number>>(N-1-bit_id)) % 2

def mask_for_bit(bit_id):
  return 1<<(N-1-bit_id)

def is_permutation(p):
  return list(sorted(p)) == list(range(2**N))

def permutation_for_X(target_bit):
  assert 0 <= target_bit < N
  target_bit_mask = mask_for_bit(target_bit)
  perm = np.array([i^target_bit_mask for i in range(2**N)])
  assert is_permutation(perm)
  return perm

def permutation_for_CNOT(ctrl_bit, target_bit):
  assert 0 <= ctrl_bit < N
  assert 0 <= target_bit < N
  assert ctrl_bit != target_bit
  target_bit_mask = mask_for_bit(target_bit)
  perm = np.array([s ^ (target_bit_mask * get_bit(s, ctrl_bit)) for s in range(2**N)])
  assert is_permutation(perm)
  return perm

def permutation_for_Toffoli(ctrl_bit_1, ctrl_bit_2, target_bit):
  target_bit_mask = mask_for_bit(target_bit)
  perm = np.array([s ^ (target_bit_mask * get_bit(s, ctrl_bit_1) * get_bit(s, ctrl_bit_2))
                   for s in range(2**N)])
  assert is_permutation(perm)
  return perm
    
def basis_ids_to_mask(basis_ids):
  return sum(1<<i for i in basis_ids)

def mask_to_basis_ids(mask):
  return [i for i in range(2**N) if (mask>>i)%2==1]  

# Finds a circuit implementing state 0.5(|s0>+|s1>+|s2>+|s3>), where basis_ids_to_implement=[s0,s1,s2,s3].
# This circuit will contain only H (exactly two), X, CNOT and Toffoli gates.
# `couplings` is a list of pairs of qubit indexes, denoting coupled qubits.
# Only CNOTs on coupled qubits will be used. 
# Toffoli gate will be used only if all 3 qubits the gate acts on are pairwise coupled.
def find_circuit(basis_ids_to_implement, couplings=None):
  assert len(basis_ids_to_implement) == 4
  for i in basis_ids_to_implement:
     assert 0 <= i < 2**N
  
  couplings = couplings or []
  couplings_matrix = np.zeros((N,N), dtype=bool)
  for i, j in couplings:
    couplings_matrix[i,j]=couplings_matrix[j,i]=True

  # Prepare all permutations for X, CNOT and Toffoli gates.
  gate_names = []
  gate_perms = []
  gate_costs = []
  for i1 in range(N):
    gate_names.append(f"X({i1})")
    gate_perms.append(permutation_for_X(i1))
    gate_costs.append(1)
  for i1, i2 in product(list(range(N)),list(range(N))):
    if couplings_matrix[i1,i2]:
      gate_names.append(f"CNOT({i1},{i2})")
      gate_perms.append(permutation_for_CNOT(i1, i2))
      gate_costs.append(1)
  for i1, i2, i3 in product(list(range(N)),list(range(N)),list(range(N))):
    if couplings_matrix[i1,i2] and couplings_matrix[i2,i3] and couplings_matrix[i1,i3] and i1<i2:
      gate_names.append(f"Toffoli({i1},{i2},{i3})")
      gate_perms.append(permutation_for_Toffoli(i1, i2, i3))
      gate_costs.append(11) # Depth of circuit (of H,CNOT and Rz) implementing Toffoli.

  def generate_transitions(cur_state):
    basis_ids = mask_to_basis_ids(cur_state)
    for gate_id, perm in enumerate(gate_perms):
      next_state = sum(1<<perm[j] for j in basis_ids)
      yield next_state, gate_id  

  # Prepare final states (H_i ⊗ H_j).
  final_states = dict()    
  for i1, i2 in product(list(range(N)),list(range(N))):
    if i1 >= i2:
      continue
    mask = mask_for_bit(i1) | mask_for_bit(i2)
    state = [s for s in range(2**N) if (s | mask) == mask]
    assert len(state) == 4
    final_states[basis_ids_to_mask(state)] = [f"H({i1})",f"H({i2})"]

  # Dijkstra algorithm.
  initial_state = basis_ids_to_mask(basis_ids_to_implement)
  dist = dict()
  prev_state_and_gate = dict() 
  dist[initial_state] = 0
  pq = []
  heapq.heappush(pq, (0,initial_state))
  while len(pq)>0:
    cur_dist = pq[0][0]
    cur_state = pq[0][1]
    heapq.heappop(pq)
    if cur_dist != dist[cur_state]:
      continue
    for next_state, gate_id in generate_transitions(cur_state):
      next_dist = dist[cur_state] + gate_costs[gate_id]
      if next_state not in dist or next_dist < dist[next_state]:
        dist[next_state] = next_dist
        prev_state_and_gate[next_state] = (cur_state, gate_id)
        heapq.heappush(pq, (next_dist, next_state))

  # Restore the shortest path.
  min_dist = None
  best_final_state = None
  for final_state in final_states.keys():
    if final_state in dist and (min_dist is None or dist[final_state] < min_dist):
      min_dist = dist[final_state]
      best_final_state = final_state  
  print("Explored %d states." % (len(dist)))    
  if best_final_state is None:
    print("Circuit not found.")
    return None
  print(f"Found circuit of depth {min_dist+1}.")  # Add 1 to account for step with H gates.
  ans = []
  state = best_final_state
  while state != initial_state:
    state, gate_id = prev_state_and_gate[state]
    ans.append(gate_names[gate_id])
  return final_states[best_final_state] + ans
      
t0 = time.time() 
circuit = find_circuit(state_values, couplings=[(0,1),(0,4),(1,4),(4,2),(4,3),(2,3)])
print("ANSWER:", circuit)
print("Time %.02fs" % (time.time() - t0))

Explored 35960 states.
Found circuit of depth 17.
ANSWER: ['H(1)', 'H(3)', 'CNOT(3,4)', 'CNOT(1,4)', 'Toffoli(1,4,0)', 'X(0)', 'CNOT(4,2)', 'X(4)']
Time 1.07s


## Verification

First, let's show the resulting circuit with the Toffoli gate and check that it implements the required state.

Cirq is used only for verification and for displaying the circuit. It applies only a very simple optimization: gates are packed into the earliest possible step, so gates acting on different qubits can share a step, which reduces the circuit depth. Below I manually construct the circuit from the output of my decomposition algorithm.

In [3]:
import cirq

def verify_circuit(ct):
  print(ct)
  vec = cirq.final_state_vector(ct)
  print("Circuit depth:", len(cirq.Circuit(ct.all_operations())))  
  print("Non-zero coefficients for basis states: ", [i for i in range(2**N) if abs(vec[i])>1e-7])
  assert cirq.equal_up_to_global_phase(state_vector, vec, atol=1e-7)
  print("OK")
  
ct = cirq.Circuit()
q = cirq.LineQubit.range(N) 
ct.append(cirq.H.on(q[1]))
ct.append(cirq.H.on(q[3]))
ct.append(cirq.CNOT.on(q[3], q[4]))
ct.append(cirq.CNOT.on(q[1], q[4]))
ct.append(cirq.TOFFOLI.on(q[1], q[4], q[0]))
ct.append(cirq.X.on(q[0]))
ct.append(cirq.CNOT.on(q[4], q[2]))
ct.append(cirq.X.on(q[4]))
verify_circuit(ct)

0: ───────────────X───X───────
                  │
1: ───H───────@───@───────────
              │   │
2: ───────────┼───┼───X───────
              │   │   │
3: ───H───@───┼───┼───┼───────
          │   │   │   │
4: ───────X───X───@───@───X───
Circuit depth: 6
Non-zero coefficients for basis states:  [12, 17, 22, 27]
OK


Now, let's expand the Toffoli gate using the known decomposition.

In [4]:
def implement_toffoli(q0,q1,q2):
  H = cirq.H
  CX = cirq.CNOT
  T = cirq.Rz(rads=np.pi/4)
  return [
    H.on(q2),CX.on(q1,q2),T.on(q2)**-1,CX.on(q0,q2),T.on(q2),CX.on(q1,q2),
    T.on(q2)**-1,CX.on(q0,q2),T.on(q1),T.on(q2),CX.on(q0,q1),H.on(q2),T.on(q0),T.on(q1)**-1,CX.on(q0,q1)
  ] 

ct = cirq.Circuit()
q = cirq.LineQubit.range(N) 
ct.append(cirq.H.on(q[1]))
ct.append(cirq.H.on(q[3]))
ct.append(cirq.CNOT.on(q[3], q[4]))
ct.append(cirq.CNOT.on(q[1], q[4]))
ct += implement_toffoli(q[1], q[4], q[0])
ct.append(cirq.X.on(q[0]))
ct.append(cirq.CNOT.on(q[4], q[2]))
ct.append(cirq.X.on(q[4]))
verify_circuit(ct)

0: ───H───────────X───Rz(-0.25π)───X───Rz(0.25π)───X───Rz(-0.25π)───X───Rz(0.25π)───H────────────X───────────
                  │                │               │                │
1: ───H───────@───┼────────────────@───────────────┼────────────────@───@───────────Rz(0.25π)────@───────────
              │   │                                │                    │                        │
2: ───────────┼───┼────────────────────────────────┼────────────────────┼────────────────────────┼───X───────
              │   │                                │                    │                        │   │
3: ───H───@───┼───┼────────────────────────────────┼────────────────────┼────────────────────────┼───┼───────
          │   │   │                                │                    │                        │   │
4: ───────X───X───@────────────────────────────────@───Rz(0.25π)────────X───────────Rz(-0.25π)───X───@───X───
Circuit depth: 15
Non-zero coefficients for basis states:  [12, 17, 22, 27]
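
As an independent cross-check that does not rely on Cirq, the sketch below multiplies out the same 15-gate sequence used in `implement_toffoli` on 3 qubits with plain NumPy and compares it with the Toffoli matrix up to a global phase. The helpers `single` and `cnot` and the qubit labels `a`, `b`, `c` are names introduced only for this example; qubit 0 is the most significant bit.

```python
import numpy as np
from functools import reduce

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def single(gate, qubit, n=3):
    # One-qubit gate on `qubit`, identity on the other qubits.
    return reduce(np.kron, [gate if q == qubit else I2 for q in range(n)])

def cnot(ctrl, tgt, n=3):
    # Permutation matrix of CNOT: column s maps to row s with the target bit flipped.
    m = np.zeros((2**n, 2**n))
    for s in range(2**n):
        c = (s >> (n - 1 - ctrl)) & 1
        m[s ^ (c << (n - 1 - tgt)), s] = 1
    return m

T, Tdg = rz(np.pi / 4), rz(-np.pi / 4)
a, b, c = 0, 1, 2  # Two controls and the target.
sequence = [
    single(H, c), cnot(b, c), single(Tdg, c), cnot(a, c), single(T, c),
    cnot(b, c), single(Tdg, c), cnot(a, c), single(T, b), single(T, c),
    cnot(a, b), single(H, c), single(T, a), single(Tdg, b), cnot(a, b),
]
# Gates are listed in time order, so each new gate multiplies from the left.
U = reduce(lambda acc, g: g @ acc, sequence, np.eye(8))

toffoli = np.eye(8)
toffoli[6:8, 6:8] = [[0, 1], [1, 0]]  # Swap |110> and |111>.

phase = U[0, 0]
matches = bool(np.isclose(abs(phase), 1) and np.allclose(U, phase * toffoli))
print(matches)
```

The extracted `phase` is the global phase left over from replacing $T$ and $T^\dagger$ with $R_z$ rotations.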

## Conclusion

* We found a circuit of depth 15 implementing the given state, with 22 gates (9 CNOT gates, 4 Hadamard gates, 2 X gates and 7 Rz gates).
* Dijkstra's algorithm reported that it explored all 35960 states, so the graph is connected. This means that the proposed algorithm can be used to decompose any 4-balanced state on 5 qubits (under the given architecture). This is not true if we remove the Toffoli gate.