# Main Idea of Algorithm

$$\newcommand{\ket}[1]{\left|{#1}\right\rangle}
\newcommand{\bra}[1]{\left\langle{#1}\right|}
\newcommand{\braket}[2]{\left\langle{#1}|{#2}\right\rangle}$$
We observe that a board is clearable by 3 beams if and only if it does not contain 4 asteroids that form a permutation matrix (one asteroid in every row and every column). Further, if there are at most 6 asteroids, the board contains at most 2 such permutation matrices.

Keeping this in mind, we build an oracle which multiplies each address state by $e^{\frac{2n}3 i\pi}$, where $n$ denotes the number of permutation matrices present in the board stored at that address. We use this as our Grover oracle.
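As a classical reference for what the oracle is meant to do, the phase attached to each address can be computed by brute force. This is only a sketch for cross-checking: the function names `count_permutation_matrices` and `oracle_phases` are made up for illustration, and the board format is assumed to be the list of `[row, column]` string pairs used in `problem_set`.

```python
import cmath
from itertools import permutations

def count_permutation_matrices(board):
    """Count the 4x4 permutation matrices whose four cells all hold asteroids."""
    cells = {(int(r), int(c)) for r, c in board}
    return sum(
        all((row, perm[row]) in cells for row in range(4))
        for perm in permutations(range(4))
    )

def oracle_phases(boards):
    """Phase e^{2*pi*i*n/3} that the oracle should attach to each address."""
    return [
        cmath.exp(2j * cmath.pi * count_permutation_matrices(board) / 3)
        for board in boards
    ]

# A board with no permutation matrix (the first board of problem_set).
solvable = [['0', '2'], ['1', '0'], ['1', '2'], ['1', '3'], ['2', '0'], ['3', '3']]
# A hand-made board containing two permutation matrices.
unsolvable = [['0', '0'], ['1', '1'], ['2', '2'], ['3', '3'], ['0', '1'], ['1', '0']]
phases = oracle_phases([solvable, unsolvable])
```

Addresses whose phase differs from 1 are exactly the boards that cannot be cleared with 3 beams.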


This works as follows:
*  Since there is at most 1 unsolvable board, we know that the address corresponding to the unsolvable board gets multiplied by either $e^{\frac{2}3 i\pi}$ or $e^{\frac{4}3 i\pi}=e^{-\frac{2}3 i\pi}$ (for one or two permutation matrices, respectively). Notice that these two phases are complex conjugates of each other. The main intuition behind this working is that both of them are closer to $-1$ than to $1$, so the oracle behaves approximately like the usual Grover phase flip. To show concretely that we do obtain amplitude amplification, observe that

 \begin{align}
 &(1-2\ket{\psi}\bra\psi)\left(\ket{\psi}-\frac1 4 \ket a+e^{\pm\frac{2}3 i\pi}\frac1 4 \ket a\right)\\
 =&\left(\ket{\psi}-\frac1 4 \ket a+e^{\pm\frac{2}3 i\pi}\frac1 4 \ket a\right)-2\ket\psi\bra\psi\left(\ket{\psi}-\frac1 4 \ket a+e^{\pm\frac{2}3 i\pi}\frac1 4 \ket a\right)\\
 =&\left(1- 2\braket{\psi}{\psi}+\frac 1 2 \braket{\psi}{a} -e^{\pm\frac{2}3 i\pi}\frac1 2 \braket{\psi}{a}\right)\ket\psi -\frac1 4 \ket a+e^{\pm\frac{2}3 i\pi}\frac1 4 \ket a\\
 =&\left(\frac {1-e^{\pm\frac{2}3 i\pi}} {8}-1\right)\ket\psi- \frac {1-e^{\pm\frac{2}3 i\pi}}{4}\ket a
 \end{align}
 
 where $\ket\psi$ denotes the uniform superposition over the 16 addresses (so that $\braket{\psi}{a}=\frac14$), and $\ket a$ denotes the basis state corresponding to the address of the unsolvable board. The probability of obtaining $\ket a$ upon measurement now becomes:
 $$ \left|\left(\frac {1-e^{\pm\frac{2}3 i\pi}} {8}-1\right)\braket a \psi- \frac {1-e^{\pm\frac{2}3 i\pi}}{4}\braket a a\right|^2=
 \left|\left(\frac {1-e^{\pm\frac{2}3 i\pi}} {32}-\frac 1 4\right)- \frac {1-e^{\pm\frac{2}3 i\pi}}{4}\right|^2$$
$$=\frac{\left|15-7e^{\pm\frac{2}3 i\pi}\right|^2}{32^2}=\frac{379}{1024}\approx 0.37 $$
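The value above can be checked numerically by applying the oracle phase and the reflection $1-2\ket\psi\bra\psi$ to a plain list of 16 amplitudes. This is a minimal sketch using only the standard library; the function name `success_probability` and the choice of index 0 as the marked address are arbitrary assumptions made for the illustration.

```python
import cmath

def success_probability(n_perm_matrices=1, n_addresses=16):
    """Probability of measuring the marked address after one oracle + reflection."""
    omega = cmath.exp(2j * cmath.pi * n_perm_matrices / 3)
    amp = 1 / n_addresses ** 0.5           # every amplitude of |psi>, i.e. <a|psi>
    state = [amp] * n_addresses
    state[0] *= omega                      # oracle: phase on the marked address only
    overlap = sum(amp * x for x in state)  # <psi|state>
    # Apply 1 - 2|psi><psi| to the state.
    state = [x - 2 * overlap * amp for x in state]
    return abs(state[0]) ** 2

p_one = success_probability(1)  # phase e^{+2*pi*i/3}
p_two = success_probability(2)  # phase e^{-2*pi*i/3}
```

Both phases give the same probability, since they are complex conjugates and only the modulus of the amplitude matters.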

## Proofs:


1.  We prove the first statement using Hall's Marriage Theorem.

    It can be clearly seen that if there are 4 asteroids in a permutation matrix configuration, then the board is not solvable by just 3 beams.

    Now we prove the other implication. Suppose the board is not solvable by just 3 beams.

    Then, consider the rows and columns of the board as vertices, and consider every asteroid as an edge between the row and the column it belongs to.

    Clearly, the graph formed is a bipartite graph.

    Now, we prove that in this graph, every set of $k$ columns is directly connected to at least $k$ rows, for every $k\in\{1,2,3,4\}$. 

    Suppose otherwise. Then, for some board unclearable by just 3 beams, there is some set of $k$ columns that are directly connected to at most $k-1$ rows. This means that $k-1$ horizontal beams (one through each of those rows) can destroy all the asteroids contained in these $k$ columns. For the remaining asteroids, one can fire $4-k$ vertical beams (one through each of the remaining columns). 

    Thus, with only $(4-k)+(k-1)=3$ beams, one can clear out all the asteroids, which is a contradiction.

    Thus, for every board not clearable by $3$ beams, in the corresponding graph, every set of $k$ columns is directly connected to at least $k$ rows, for every $k\in\{1,2,3,4\}$. 

    With this, Hall's Marriage Theorem implies the existence of a perfect bipartite matching between the set of rows and the set of columns. This matching has $4$ edges, every edge corresponds to an asteroid, and no two of these asteroids share a row or a column, as it is a perfect matching. But this is exactly a permutation matrix. Thus, every board unsolvable by 3 beams contains 4 asteroids in a permutation matrix configuration.

    Thus, we have proved the first statement. Note that this statement generalises readily to $N\times N$ boards (and $N-1$ beams). In fact, it can be shown that Hall's Marriage theorem can be derived from the general result.

2.  Now, we prove the second statement.

    Suppose there is a board with at most $6$ asteroids, and there are at least $2$ distinct choices of $4$ asteroids which form a permutation matrix. We prove that there are no more. It suffices to treat boards with exactly $6$ asteroids, since removing asteroids cannot create new permutation matrices.

    Let us denote one of the permutation matrices by $P_1$.
    
    Let the board be represented by the $4\times 4$ matrix $A=P_1+B$, where $B$ is a matrix with $6-4=2$ entries equal to $1$ and all others $0$, and $B$ has no $1$ in the same position as a $1$ of $P_1$. (The $1$'s in $A$ denote the presence of an asteroid in that position.)

    Since there is more than one permutation matrix in $A$, consider another permutation matrix $P_2\ne P_1$ contained inside $A$. 
    
    Now, clearly, if we fix $3$ positions of $1$'s of any permutation matrix, the final position is uniquely determined, as one just considers the element in the uncovered row and column.
    
    Thus, for any two distinct permutation matrices, one contains $1$'s in at least two cells where the other does not, and vice versa. Thus, $P_2$ must contain both the $1$'s from $B$. Now, there can be at most 2 $1$'s in $P_1$ which do not share any row or column with either of the $1$'s in $B$, as each $1$ in $P_1$ has a distinct row and column. 
 
    Now, any permutation matrix containing $B$ needs $2$ more $1$'s, and since $A=P_1+B$ these must come from $P_1$. Hence there is at most one way of forming a permutation matrix $P_2$ containing $B$.
     
    Hence, given $P_1$, if $P_2$ exists, then it is unique. Thus, there can be at most $2$ permutation matrix configurations formed by the asteroids on the board.
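Both statements are small enough to confirm by exhaustive search over all $2^{16}$ boards. The following is a sketch of such a check, not part of the submitted circuit; boards are encoded as 16-bit masks with cell index $4\cdot\text{row}+\text{column}$, and all names are chosen here for illustration.

```python
from itertools import combinations, permutations

N = 4
# Bitmask of the four cells of each 4x4 permutation matrix.
PERM_MASKS = [
    sum(1 << (N * r + c) for r, c in enumerate(p))
    for p in permutations(range(N))
]
# Bitmask of every row and every column, i.e. the 8 possible beams.
ROWS = [sum(1 << (N * r + c) for c in range(N)) for r in range(N)]
COLS = [sum(1 << (N * r + c) for r in range(N)) for c in range(N)]
# Cells hit by each choice of exactly 3 beams (fewer beams hit a subset of these).
COVERS = [a | b | c for a, b, c in combinations(ROWS + COLS, 3)]

def count_permutation_matrices(board):
    return sum((board & p) == p for p in PERM_MASKS)

def clearable_with_three_beams(board):
    return any((board & ~cover) == 0 for cover in COVERS)

def check_all_boards():
    """Return (statement 1 holds, max permutation count over boards with <= 6 asteroids)."""
    max_count = 0
    for board in range(1 << (N * N)):
        n = count_permutation_matrices(board)
        # Statement 1: clearable by 3 beams iff no permutation matrix is present.
        if clearable_with_three_beams(board) != (n == 0):
            return False, None
        # Statement 2: track the largest count among boards with at most 6 asteroids.
        if bin(board).count("1") <= 6:
            max_count = max(max_count, n)
    return True, max_count

statement_1_holds, max_with_six = check_all_boards()
```

The search takes a few seconds in plain Python; the second returned value is the largest number of permutation matrices found on any board with at most 6 asteroids.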

# QRAM Implementation:
 We use 4 qubits to store the address, and 3 extra work qubits. Also, we use RCCX gates instead of CCX or MCT's in order to keep the costs low. We can do this as we will be appending QRAM$^{-1}$ after the computation.
 
Since we iterate through the addresses, we follow the order [15,14,12,13,9,8,10,11,3,2,0,1,5,4,6,7], as it forms a Gray code (a specific one chosen so that the more significant bits change less often). This lets us reuse previous computations and so lowers the cost.
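The Gray-code property of this order, and how often each address bit flips along it, can be checked with a few lines of Python. This is an illustrative sketch; the helper names are not part of the circuit code.

```python
ORDER = [15, 14, 12, 13, 9, 8, 10, 11, 3, 2, 0, 1, 5, 4, 6, 7]

def flipped_bit(x, y):
    """Index (0 = least significant) of the single differing bit, or None."""
    diff = x ^ y
    if diff == 0 or diff & (diff - 1):
        return None  # zero bits or more than one bit differ
    return diff.bit_length() - 1

def bit_flip_counts(seq, n_bits=4):
    """Count how many times each bit changes between consecutive addresses."""
    counts = [0] * n_bits
    for x, y in zip(seq, seq[1:]):
        bit = flipped_bit(x, y)
        if bit is None:
            raise ValueError(f"{x} -> {y} is not a single-bit change")
        counts[bit] += 1
    return counts

counts = bit_flip_counts(ORDER)  # counts[0] is the least significant bit
```

Fewer flips of a more significant bit mean fewer recomputations of the work qubits that depend on it.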

We go forward with creating the QRAM, the main idea being that at every step we use as few gates as possible (avoiding even RCCX wherever we can).

Below is the initial QRAM implementation; further optimizations are added to it afterwards.

In [6]:
problem_set = \
    [[['0', '2'], ['1', '0'], ['1', '2'], ['1', '3'], ['2', '0'], ['3', '3']],
    [['0', '0'], ['0', '1'], ['1', '2'], ['2', '2'], ['3', '0'], ['3', '3']],
    [['0', '0'], ['1', '1'], ['1', '3'], ['2', '0'], ['3', '2'], ['3', '3']],
    [['0', '0'], ['0', '1'], ['1', '1'], ['1', '3'], ['3', '2'], ['3', '3']],
    [['0', '2'], ['1', '0'], ['1', '3'], ['2', '0'], ['3', '2'], ['3', '3']],
    [['1', '1'], ['1', '2'], ['2', '0'], ['2', '1'], ['3', '1'], ['3', '3']],
    [['0', '2'], ['0', '3'], ['1', '2'], ['2', '0'], ['2', '1'], ['3', '3']],
    [['0', '0'], ['0', '3'], ['1', '2'], ['2', '2'], ['2', '3'], ['3', '0']],
    [['0', '3'], ['1', '1'], ['1', '2'], ['2', '0'], ['2', '1'], ['3', '3']],
    [['0', '0'], ['0', '1'], ['1', '3'], ['2', '1'], ['2', '3'], ['3', '0']],
    [['0', '1'], ['0', '3'], ['1', '2'], ['1', '3'], ['2', '0'], ['3', '2']],
    [['0', '0'], ['1', '3'], ['2', '0'], ['2', '1'], ['2', '3'], ['3', '1']],
    [['0', '1'], ['0', '2'], ['1', '0'], ['1', '2'], ['2', '2'], ['2', '3']],
    [['0', '3'], ['1', '0'], ['1', '3'], ['2', '1'], ['2', '2'], ['3', '0']],
    [['0', '2'], ['0', '3'], ['1', '2'], ['2', '3'], ['3', '0'], ['3', '1']],
    [['0', '1'], ['1', '0'], ['1', '2'], ['2', '2'], ['3', '0'], ['3', '1']]]

In [2]:
import qiskit
from qiskit import *
test_set=problem_set# for testing out the circuit with different inputs
qram=QuantumCircuit(23)# first 4 for address, 16 for the board cells (one per cell), 3 ancillas
# manually defining the Gray code sequence, and the corresponding values

order=['1111','1110','1100','1101',
       '1001','1000','1010','1011',
       '0011','0010','0000','0001',
       '0101','0100','0110','0111']
num=[15,14,12,13,9,8,10,11,3,2,0,1,5,4,6,7]
# We begin to EXPLICITLY state all the operations
# we denote the bit in first 4 registers by a,b,c,d respectively
# for example, 21: a nb c denotes that 21'st qubit now stores a AND (NOT b) AND C



qram.rccx(0,1,20)
#20 : a b
qram.rccx(2,20,21)
#21 : a b c
qram.rccx(3,21,22)
#22: a b c d
for i in test_set[15]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)

#22: a b c nd
for i in test_set[14]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.x(3)
#3:nd

qram.rccx(3,21,22)
#22:0
qram.cx(20,21)
#21:a b nc
qram.rccx(3,21,22)
#22:a b nc nd
for i in test_set[12]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)
#22:a b nc d
for i in test_set[13]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.x(3)
#3:d
qram.rccx(3,21,22)
#22: 0
qram.x(2)
#2:nc
qram.rccx(2,20,21)
#21:0
qram.cx(0,20)
#20:a nb
qram.rccx(2,20,21)
#21:a nb nc
qram.rccx(3,21,22)
#22:a nb nc d 
for i in test_set[9]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)
#22:a nb nc nd
for i in test_set[8]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.x(3)
#3: nd

qram.rccx(3,21,22)
#22:0
qram.cx(20,21)
#21:a nb c
qram.rccx(3,21,22)
#22:a nb c nd
for i in test_set[10]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)
#22:a nb c d
for i in test_set[11]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))

qram.x(3)
#3:d
qram.rccx(3,21,22)
#22:0
qram.x(2)
#2:c
qram.rccx(2,20,21)
#21:0
qram.x(1)
#1:nb
qram.cx(1,20)
#20:na nb
qram.rccx(2,20,21)
#21:na nb c
qram.rccx(3,21,22)
#22:na nb c d
for i in test_set[3]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)

#22:na nb c nd
for i in test_set[2]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))

qram.x(3)
#3:nd
qram.rccx(3,21,22)
#22:0
qram.cx(20,21)
#21:na nb nc
qram.rccx(3,21,22)
#22:na nb nc nd
for i in test_set[0]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)
#22:na nb nc d
for i in test_set[1]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))

qram.x(3)
#3:d
qram.rccx(3,21,22)
#22:0
qram.x(2)
#2:nc
qram.rccx(2,20,21)
#21: 0
qram.x(0)
#0:na
qram.cx(0,20)
#20: na b
qram.rccx(2,20,21)
#21: na b nc
qram.rccx(3,21,22)
#22:na b nc d
for i in test_set[5]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)
#22:na b nc nd
for i in test_set[4]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.x(3)
#3:nd
qram.rccx(3,21,22)
#22:0
qram.cx(20,21)
#21:na b c
qram.rccx(3,21,22)
#22:na b c nd
for i in test_set[6]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)

#22:na b c d
for i in test_set[7]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.x(3)
#3:d
qram.rccx(3,21,22)
#22:0
qram.x(2)
#2:c
qram.rccx(2,20,21)
#21:0
qram.x(1)
#1:b
qram.rccx(0,1,20)
#20:0

Now, we observe the following:
1.  The RCCX gate is not symmetric with respect to its control qubits. When one applies rccx($a,b,t$) and then rccx($c,b,t$) without any other gate acting on $b$ or $t$ in between, 2 cx gates and 3 u3 gates cancel out. This is true even when $a$ and $c$ denote the same qubit and some gates are applied to that qubit between the two rccx's. So, we define two new gates which replace the rccx gates under these conditions. We call these fhrccx and shrccx (more details are in their definitions in the code). Note: we had to flip some rccx(a,b,c)'s to rccx(b,a,c)'s (i.e., interchange the control qubits) for this cancellation to take place. This is acceptable because we were only using the RCCX gate as a stand-in for CCX anyway.
2.  We remove the final uncomputation of the 3 work qubits, as we do not need those qubits afterwards and they are uncomputed anyway when QRAM$^{-1}$ is appended.
    
3.  As defined in the code, we observe that 
    ```python
    fhrccx(qram,3,21,22)
    qram.cx(20,21)
    shrccx(qram,3,21,22)
    ```
    can be replaced by
    ```python
    qram.rccx(3,20,22)
    qram.cx(20,21)
    ```
    This is because the 3 cx gates in the middle of the former (unroll it to see them) implement an operation that can be built from just 2 cx gates. One can then observe that the resulting construction is an RCCX gate plus one extra cx gate. We apply similar optimizations in other places too (as seen in the code below).
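The cx identity behind the third observation can be checked classically. `fhrccx(qram,3,21,22)` ends with `cx(21,22)` and `shrccx(qram,3,21,22)` starts with `cx(21,22)`, so the middle of the unrolled sequence is `cx(21,22)`, `cx(20,21)`, `cx(21,22)`. Since cx gates only permute computational basis states, it is enough to compare the two sequences on every bit assignment. The sketch below does this with plain dictionaries; the helper names are illustrative only.

```python
from itertools import product

def apply_cx(bits, control, target):
    """Return a copy of the bit assignment after a CNOT."""
    out = dict(bits)
    out[target] ^= out[control]
    return out

def run(sequence, bits):
    """Apply a list of (control, target) CNOTs in order."""
    for control, target in sequence:
        bits = apply_cx(bits, control, target)
    return bits

THREE_CX = [(21, 22), (20, 21), (21, 22)]  # middle of fhrccx, cx(20,21), shrccx
TWO_CX = [(20, 22), (20, 21)]              # equivalent shorter form

def sequences_agree():
    for q20, q21, q22 in product((0, 1), repeat=3):
        start = {20: q20, 21: q21, 22: q22}
        if run(THREE_CX, start) != run(TWO_CX, start):
            return False
    return True

agree = sequences_agree()
```

After the substitution, the gates acting on qubit 22 have the RCCX shape (H, T, cx, T†, cx, T, cx, T†, H) with qubits 3 and 20 as controls, and `cx(20,21)` is the extra gate left over.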
    
The final QRAM implementation is presented below:

In [None]:
import qiskit
from qiskit.circuit import QuantumRegister,ClassicalRegister,QuantumCircuit
import numpy
from numpy import pi
#qram definition, will be using grey codes for better gate synthesis


test_set=problem_set# for testing out the circuit with different inputs
qram=QuantumCircuit(23)# first 4 for address, 16 for the board cells (one per cell), 3 ancillas
# manually defining the Gray code sequence, and the corresponding values
# I end up not using it in the code, but it is here for reference for the order I am iterating in
order=['1111','1110','1100','1101',
       '1001','1000','1010','1011',
       '0011','0010','0000','0001',
       '0101','0100','0110','0111']
num=[15,14,12,13,9,8,10,11,3,2,0,1,5,4,6,7]
# We begin to EXPLICITLY state all the operations
# we denote the bit in first 4 registers by a,b,c,d respectively
# note that the register values mentioned are for when fhrccx and shrccx gates are not used

def fhrccx(qc,unchanged,changed,target):
    #for testing
    #qc.rccx(changed,unchanged,target)
    #return

    #when 1 ctrl qubit and target qubit is unchanged between 2 rccx's, 3 gates cancel out.
    #this function defines the first uncancelled half
    #qc.h(target)
    #qc.t(target)#:replacing these 2 with corresponding U3 gate

    qc.u3(pi/2,pi/4,pi,target)
    qc.cx(unchanged,target)
    qc.tdg(target)
    qc.cx(changed,target)
    return
def shrccx(qc,unchanged,changed,target):
    #qc.rccx(changed,unchanged,target)
    #return
    #when 1 ctrl qubit and target qubit is unchanged between 2 rccx's, 3 gates cancel out.
    #this function defines the second uncancelled half
    qc.cx(changed,target)
    qc.t(target)
    qc.cx(unchanged,target)
    #qc.tdg(target)
    #qc.h(target)#: again, replaced
    qc.u3(pi/2,0,3*pi/4,target)
    return
def conj_rccx(qc,a,b,target):
    #when a single qubit is target qubit for a series of rccx's, H and T gates cancel, except at extremes
    qc.cx(a,target)
    qc.tdg(target)
    qc.cx(b,target)
    qc.t(target)
    qc.cx(a,target)
    return


qram.rccx(0,1,20)
#20 : a b
qram.rccx(2,20,21)
#21 : a b c
qram.rccx(3,21,22)
#22: a b c d
for i in test_set[15]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)

#22: a b c nd
for i in test_set[14]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.x(3)
#3:nd
'''
we notice that the below is just rccx+extra cx
#qram.rccx(3,21,22):replaced by the gate below
fhrccx(qram,3,21,22)
#22:0
qram.cx(20,21)
#21:a b nc
#qram.rccx(3,21,22):replaced by below
shrccx(qram,3,21,22)
'''
qram.rccx(3,20,22)
qram.cx(20,21)
#22:a b nc nd
for i in test_set[12]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)
#22:a b nc d
for i in test_set[13]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.x(3)
#3:d
#qram.rccx(3,21,22):replaced by below
fhrccx(qram,3,21,22)
#22: 0
qram.x(2)
#2:nc

#qram.rccx(2,20,21):replaced by below
'''
fhrccx(qram,2,20,21)
#21:0
qram.cx(0,20)
#20:a nb
#qram.rccx(2,20,21):replaced by below
shrccx(qram,2,20,21)
'''
qram.rccx(2,0,21)
qram.cx(0,20)

#21:a nb nc
#qram.rccx(3,21,22):replaced by below
shrccx(qram,3,21,22)
#22:a nb nc d 
for i in test_set[9]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)
#22:a nb nc nd
for i in test_set[8]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.x(3)
#3: nd

#qram.rccx(3,21,22)
'''
fhrccx(qram,3,21,22)
#22:0
qram.cx(20,21)
#21:a nb c
#qram.rccx(3,21,22)
shrccx(qram,3,21,22)
'''
qram.rccx(3,20,22)
qram.cx(20,21)

#22:a nb c nd
for i in test_set[10]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)
#22:a nb c d
for i in test_set[11]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))

qram.x(3)
#3:d
#qram.rccx(3,21,22)
fhrccx(qram,3,21,22)
#22:0
qram.x(2)
#2:c
#qram.rccx(2,20,21)
qram.x(1)
#1:nb
'''
fhrccx(qram,2,20,21)
#21:0
qram.cx(1,20)
#20:na nb
#qram.rccx(2,20,21)
shrccx(qram,2,20,21)
'''
qram.rccx(2,1,21)
qram.cx(1,20)

#21:na nb c
#qram.rccx(3,21,22)
shrccx(qram,3,21,22)
#22:na nb c d
for i in test_set[3]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)

#22:na nb c nd
for i in test_set[2]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))

qram.x(3)
#3:nd
#qram.rccx(3,21,22)
'''
fhrccx(qram,3,21,22)
#22:0
qram.cx(20,21)
#21:na nb nc
#qram.rccx(3,21,22)
shrccx(qram,3,21,22)
'''
qram.rccx(3,20,22)
qram.cx(20,21)

#22:na nb nc nd
for i in test_set[0]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)
#22:na nb nc d
for i in test_set[1]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))

qram.x(3)
#3:d
#qram.rccx(3,21,22)
fhrccx(qram,3,21,22)
#22:0
qram.x(2)
#2:nc

qram.x(0)
#0:na
'''
#qram.rccx(2,20,21)
fhrccx(qram,2,20,21)
#21: 0
qram.cx(0,20)
#20: na b
#qram.rccx(2,20,21)
shrccx(qram,2,20,21)
'''
qram.rccx(2,0,21)
qram.cx(0,20)

#21: na b nc
#qram.rccx(3,21,22)
shrccx(qram,3,21,22)
#22:na b nc d
for i in test_set[5]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)
#22:na b nc nd
for i in test_set[4]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.x(3)
#3:nd
#qram.rccx(3,21,22)
'''
fhrccx(qram,3,21,22)
#22:0
qram.cx(20,21)
#21:na b c
#qram.rccx(3,21,22)
shrccx(qram,3,21,22)
'''
qram.rccx(3,20,22)
qram.cx(20,21)

#22:na b c nd
for i in test_set[6]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
qram.cx(21,22)

#22:na b c d
for i in test_set[7]:
    qram.cx(22,4+4*int(i[0])+int(i[1]))
#qram.x(3)
#3:d
#qram.rccx(3,21,22)
#22:0
#qram.x(2):pointless
#2:c
#qram.rccx(2,20,21):pointless
#21:0
#qram.x(1):pointless
#1:b
#qram.rccx(0,1,20):removed cause pointless
#20:0

# Oracle Implementation

Since we have to multiply by a phase of $e^{\frac 2 3 i\pi}$ for each permutation matrix present, we go over all 24 permutation matrices. The standard way would have been to use triply controlled Z gates 24 times, but we realise that if we break down the computation, many intermediate results can be reused across different permutations.

To refer to each cell of the board, I use the following numbering:

$$\begin{bmatrix}
0&1&2&3\\
4&5&6&7\\
8&9&10&11\\
12&13&14&15\\
\end{bmatrix}$$

Now, notice that for the 2 permutation matrices containing cells $1,6,8,15$ and $1,6,12,11$ respectively, the presence of both depends on the presence of asteroids in cells $1$ and $6$. Generalising this directly, we get 12 pairs of permutations, each pair sharing a common computation (in this case, the computation of $a_6 \land a_1$, where $a_i$ denotes whether an asteroid is present in cell $i$).

However, we see that we can do much better.

Consider the cells $1,2,5,6,8,11,12,15$. Contained within these cells are 4 permutation matrices, whose presence can be computed given $a_1\land a_6$, $a_2\land a_5$, $a_8\land a_{15}$, and $a_{12}\land a_{11}$.

Hence, for each such group of cells (all of them are listed explicitly in the code), we use 4 rccx gates to compute these 4 intermediate values, storing them in 4 new work qubits. We then use cu1 gates to apply the appropriate phase, and finally uncompute the 4 work qubits.
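To see that this grouping covers every permutation matrix exactly once, one can enumerate the cell sets produced by the six groups and compare them with all $4!$ permutation matrices. In the sketch below, the pairs are transcribed from the rccx calls in the oracle code of this write-up (work qubits 18, 19, 20, 21 in that order), and each group contributes the four combinations targeted by the cu1 gates. The names `GROUPS`, `cells_from_groups` and `all_permutation_cells` are chosen for this illustration.

```python
from itertools import permutations

# (pair on qubit 18, pair on qubit 19, pair on qubit 20, pair on qubit 21)
GROUPS = [
    ((0, 6), (4, 2), (15, 9), (11, 13)),
    ((3, 6), (7, 2), (9, 12), (8, 13)),
    ((1, 6), (5, 2), (11, 12), (8, 15)),
    ((1, 4), (5, 0), (11, 14), (10, 15)),
    ((1, 7), (5, 3), (8, 14), (10, 12)),
    ((0, 7), (4, 3), (9, 14), (10, 13)),
]

def all_permutation_cells():
    """Cell sets (cell = 4*row + col) of all 24 permutation matrices."""
    return {
        frozenset(4 * r + c for r, c in enumerate(p))
        for p in permutations(range(4))
    }

def cells_from_groups(groups):
    """Cell sets checked by cu1(18,20), cu1(19,21), cu1(18,21), cu1(19,20)."""
    found = []
    for q18, q19, q20, q21 in groups:
        for top, bottom in ((q18, q20), (q19, q21), (q18, q21), (q19, q20)):
            found.append(frozenset(top + bottom))
    return found

found = cells_from_groups(GROUPS)
covers_everything = set(found) == all_permutation_cells()
```

Each group corresponds to one choice of two columns for the top two rows, which is why there are $\binom{4}{2}=6$ groups of 4 permutations each.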

Note that, in the order in which we iterate through the groups of cells, 4 of the cells remain the same between any two consecutive groups (such an order is easy to find with a little thought).

This allows us to again use the fhrccx and shrccx trick mentioned in the QRAM section.

I didn't save any previous iterations of this code, so only the final version is available. Below is the code for the current oracle implementation (note that the comments contain some remnants of an older version of the algorithm, since the previous code was simply overwritten; apologies for that).



In [3]:
compute=QuantumCircuit(24)# requires only 22, but we do this so that appending is easier 
# qubit 22 allocated for even permutations, and 23 for odd

# computation( and decomputation) for permuation matrices containing 0,2,4,6,9,11,13,15
#compute.rccx(0,6,18)
#compute.rccx(4,2,19)
#compute.rccx(9,15,20)
#compute.rccx(11,13,21)
#matching these up with the latter part
compute.rccx(0,6,18)
#fhrccx(compute,6,0,18)
compute.rccx(4,2,19)
#fhrccx(compute,2,4,19)
compute.rccx(15,9,20)
#fhrccx(compute,9,15,20)
compute.rccx(11,13,21)
#fhrccx(compute,13,11,21)

#non-cancellation of h and t at the beginning
#compute.h([22,23])
#compute.t([22,23])


''' doing the cz implementation
conj_rccx(compute,18,20,23)
conj_rccx(compute,19,21,23)
conj_rccx(compute,18,21,22)
conj_rccx(compute,19,20,22)
'''
compute.cu1(2*pi/3,18,20)
compute.cu1(2*pi/3,19,21)
compute.cu1(2*pi/3,18,21)
compute.cu1(2*pi/3,19,20)

#compute.rccx(0,6,18)
fhrccx(compute,6,0,18)

#compute.rccx(4,2,19)
fhrccx(compute,2,4,19)
#compute.rccx(9,15,20)
fhrccx(compute,9,15,20)
#compute.rccx(11,13,21)
fhrccx(compute,13,11,21)

# computation( and decomputation) for permuation matrices containing 2,3,6,7,8,9,12,13
#compute.rccx(3,6,18)
shrccx(compute,6,3,18)
#compute.rccx(7,2,19)
shrccx(compute,2,7,19)
#compute.rccx(9,12,20)
shrccx(compute,9,12,20)
#compute.rccx(8,13,21)
shrccx(compute,13,8,21)

#compute.rccx(18,20,22)
#compute.rccx(19,21,22)
#compute.rccx(18,21,23)
#compute.rccx(19,20,23)

#conj_rccx(compute,18,20,22)
#conj_rccx(compute,19,21,22)
#conj_rccx(compute,18,21,23)
#conj_rccx(compute,19,20,23)

compute.cu1(2*pi/3,18,20)
compute.cu1(2*pi/3,19,21)
compute.cu1(2*pi/3,18,21)
compute.cu1(2*pi/3,19,20)
#compute.rccx(3,6,18)
fhrccx(compute,6,3,18)
#compute.rccx(7,2,19)
fhrccx(compute,2,7,19)
#compute.rccx(9,12,20)
fhrccx(compute,12,9,20)
#compute.rccx(8,13,21)
fhrccx(compute,8,13,21)

# computation( and decomputation) for permuation matrices containing 1,2,5,6,8,11,12,15
#compute.rccx(1,6,18)
shrccx(compute,6,1,18)
#compute.rccx(5,2,19)
shrccx(compute,2,5,19)
#compute.rccx(11,12,20)
shrccx(compute,12,11,20)
#compute.rccx(8,15,21)
shrccx(compute,8,15,21)

#compute.rccx(18,20,23)
#compute.rccx(19,21,23)
#compute.rccx(18,21,22)
#compute.rccx(19,20,22)

#conj_rccx(compute,18,20,23)
#conj_rccx(compute,19,21,23)
#conj_rccx(compute,18,21,22)
#conj_rccx(compute,19,20,22)

compute.cu1(2*pi/3,18,20)
compute.cu1(2*pi/3,19,21)
compute.cu1(2*pi/3,18,21)
compute.cu1(2*pi/3,19,20)

#compute.rccx(1,6,18)
fhrccx(compute,1,6,18)
#compute.rccx(5,2,19)
fhrccx(compute,5,2,19)
#compute.rccx(11,12,20)
fhrccx(compute,11,12,20)
#compute.rccx(8,15,21)
fhrccx(compute,15,8,21)

# computation( and decomputation) for permuation matrices containing 0,1,4,5,10,11,14,15
#compute.rccx(1,4,18)
shrccx(compute,1,4,18)
#compute.rccx(5,0,19)
shrccx(compute,5,0,19)
#compute.rccx(11,14,20)
shrccx(compute,11,14,20)
#compute.rccx(10,15,21)
shrccx(compute,15,10,21)

#compute.rccx(18,20,22)
#compute.rccx(19,21,22)
#compute.rccx(18,21,23)
#compute.rccx(19,20,23)

#conj_rccx(compute,18,20,22)
#conj_rccx(compute,19,21,22)
#conj_rccx(compute,18,21,23)
#conj_rccx(compute,19,20,23)

compute.cu1(2*pi/3,18,20)
compute.cu1(2*pi/3,19,21)
compute.cu1(2*pi/3,18,21)
compute.cu1(2*pi/3,19,20)

#compute.rccx(1,4,18)
fhrccx(compute,1,4,18)
#compute.rccx(5,0,19)
fhrccx(compute,5,0,19)
#compute.rccx(11,14,20)
fhrccx(compute,14,11,20)
#compute.rccx(10,15,21)
fhrccx(compute,10,15,21)

# computation( and decomputation) for permuation matrices containing 1,3,5,7,8,10,12,14
#compute.rccx(1,7,18)
shrccx(compute,1,7,18)
#compute.rccx(5,3,19)
shrccx(compute,5,3,19)
#compute.rccx(8,14,20)
shrccx(compute,14,8,20)
#compute.rccx(10,12,21)
shrccx(compute,10,12,21)

#compute.rccx(18,20,23)
#compute.rccx(19,21,23)
#compute.rccx(18,21,22)
#compute.rccx(19,20,22)

#conj_rccx(compute,18,20,23)
#conj_rccx(compute,19,21,23)
#conj_rccx(compute,18,21,22)
#conj_rccx(compute,19,20,22)


compute.cu1(2*pi/3,18,20)
compute.cu1(2*pi/3,19,21)
compute.cu1(2*pi/3,18,21)
compute.cu1(2*pi/3,19,20)

#compute.rccx(1,7,18)
fhrccx(compute,7,1,18)
#compute.rccx(5,3,19)
fhrccx(compute,3,5,19)
#compute.rccx(8,14,20)
fhrccx(compute,14,8,20)
#compute.rccx(10,12,21)
fhrccx(compute,10,12,21)
# computation( and decomputation) for permuation matrices containing 0,3,4,7,9,10,13,14
#compute.rccx(0,7,18)
shrccx(compute,7,0,18)
#compute.rccx(4,3,19)
shrccx(compute,3,4,19)
#compute.rccx(9,14,20)
shrccx(compute,14,9,20)
#compute.rccx(10,13,21)
shrccx(compute,10,13,21)


#compute.rccx(18,20,22)
#compute.rccx(19,21,22)
#compute.rccx(18,21,23)
#compute.rccx(19,20,23)

#conj_rccx(compute,18,20,22)
#conj_rccx(compute,19,21,22)
#conj_rccx(compute,18,21,23)
#conj_rccx(compute,19,20,23)


compute.cu1(2*pi/3,18,20)
compute.cu1(2*pi/3,19,21)
compute.cu1(2*pi/3,18,21)
compute.cu1(2*pi/3,19,20)

#uncomputing cause this the only one now
compute.rccx(0,7,18)
#shrccx(compute,7,0,18)
compute.rccx(4,3,19)
#shrccx(compute,3,4,19)
compute.rccx(9,14,20)
#shrccx(compute,14,9,20)
compute.rccx(13,10,21)
#shrccx(compute,10,13,21)
#non cancellation at end of 22,23
#compute.tdg([22,23])
#compute.h([22,23])

# Main Code

Here is the whole of the code put together (note that some comments may correspond to a previous version of the algorithm; please ignore those).


In [2]:
def week3_ans_func(problem_set):
    ##### build your quantum circuit here
    ##### In addition, please make it a function that can solve the problem even with different inputs (problem_set). We do validation with different inputs. 
    
    import qiskit
    from qiskit.circuit import QuantumRegister,ClassicalRegister,QuantumCircuit
    import numpy
    from numpy import pi
    #qram definition, will be using grey codes for better gate synthesis


    test_set=problem_set# for testing out the circuit with different inputs
    qram=QuantumCircuit(23)# first 4 for address, 16 for the board cells (one per cell), 3 ancillas
    # manually defining the Gray code sequence, and the corresponding values
    # I end up not using it in the code, but it is here for reference for the order I am iterating in
    order=['1111','1110','1100','1101',
           '1001','1000','1010','1011',
           '0011','0010','0000','0001',
           '0101','0100','0110','0111']
    num=[15,14,12,13,9,8,10,11,3,2,0,1,5,4,6,7]
    # We begin to EXPLICITLY state all the operations
    # we denote the bit in first 4 registers by a,b,c,d respectively
    # note that the register values mentioned are for when fhrccx and shrccx gates are not used

    def fhrccx(qc,unchanged,changed,target):
        #for testing
        #qc.rccx(changed,unchanged,target)
        #return
        
        #when 1 ctrl qubit and target qubit is unchanged between 2 rccx's, 3 gates cancel out.
        #this function defines the first uncancelled half
        #qc.h(target)
        #qc.t(target)#:replacing these 2 with corresponding U3 gate
        
        qc.u3(pi/2,pi/4,pi,target)
        qc.cx(unchanged,target)
        qc.tdg(target)
        qc.cx(changed,target)
        return
    def shrccx(qc,unchanged,changed,target):
        #qc.rccx(changed,unchanged,target)
        #return
        #when 1 ctrl qubit and target qubit is unchanged between 2 rccx's, 3 gates cancel out.
        #this function defines the second uncancelled half
        qc.cx(changed,target)
        qc.t(target)
        qc.cx(unchanged,target)
        #qc.tdg(target)
        #qc.h(target)#: again, replaced
        qc.u3(pi/2,0,3*pi/4,target)
        return
    def conj_rccx(qc,a,b,target):
        #when a single qubit is target qubit for a series of rccx's, H and T gates cancel, except at extremes
        qc.cx(a,target)
        qc.tdg(target)
        qc.cx(b,target)
        qc.t(target)
        qc.cx(a,target)
        return


    qram.rccx(0,1,20)
    #20 : a b
    qram.rccx(2,20,21)
    #21 : a b c
    qram.rccx(3,21,22)
    #22: a b c d
    for i in test_set[15]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    qram.cx(21,22)

    #22: a b c nd
    for i in test_set[14]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    qram.x(3)
    #3:nd
    '''
    we notice that the below is just rccx+extra cx
    #qram.rccx(3,21,22):replaced by the gate below
    fhrccx(qram,3,21,22)
    #22:0
    qram.cx(20,21)
    #21:a b nc
    #qram.rccx(3,21,22):replaced by below
    shrccx(qram,3,21,22)
    '''
    qram.rccx(3,20,22)
    qram.cx(20,21)
    #22:a b nc nd
    for i in test_set[12]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    qram.cx(21,22)
    #22:a b nc d
    for i in test_set[13]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    qram.x(3)
    #3:d
    #qram.rccx(3,21,22):replaced by below
    fhrccx(qram,3,21,22)
    #22: 0
    qram.x(2)
    #2:nc
    
    #qram.rccx(2,20,21):replaced by below
    '''
    fhrccx(qram,2,20,21)
    #21:0
    qram.cx(0,20)
    #20:a nb
    #qram.rccx(2,20,21):replaced by below
    shrccx(qram,2,20,21)
    '''
    qram.rccx(2,0,21)
    qram.cx(0,20)
    
    #21:a nb nc
    #qram.rccx(3,21,22):replaced by below
    shrccx(qram,3,21,22)
    #22:a nb nc d 
    for i in test_set[9]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    qram.cx(21,22)
    #22:a nb nc nd
    for i in test_set[8]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    qram.x(3)
    #3: nd

    #qram.rccx(3,21,22)
    '''
    fhrccx(qram,3,21,22)
    #22:0
    qram.cx(20,21)
    #21:a nb c
    #qram.rccx(3,21,22)
    shrccx(qram,3,21,22)
    '''
    qram.rccx(3,20,22)
    qram.cx(20,21)
    
    #22:a nb c nd
    for i in test_set[10]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    qram.cx(21,22)
    #22:a nb c d
    for i in test_set[11]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))

    qram.x(3)
    #3:d
    #qram.rccx(3,21,22)
    fhrccx(qram,3,21,22)
    #22:0
    qram.x(2)
    #2:c
    #qram.rccx(2,20,21)
    qram.x(1)
    #1:nb
    '''
    fhrccx(qram,2,20,21)
    #21:0
    qram.cx(1,20)
    #20:na nb
    #qram.rccx(2,20,21)
    shrccx(qram,2,20,21)
    '''
    qram.rccx(2,1,21)
    qram.cx(1,20)
    
    #21:na nb c
    #qram.rccx(3,21,22)
    shrccx(qram,3,21,22)
    #22:na nb c d
    for i in test_set[3]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    qram.cx(21,22)

    #22:na nb c nd
    for i in test_set[2]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))

    qram.x(3)
    #3:nd
    #qram.rccx(3,21,22)
    '''
    fhrccx(qram,3,21,22)
    #22:0
    qram.cx(20,21)
    #21:na nb nc
    #qram.rccx(3,21,22)
    shrccx(qram,3,21,22)
    '''
    qram.rccx(3,20,22)
    qram.cx(20,21)
    
    #22:na nb nc nd
    for i in test_set[0]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    qram.cx(21,22)
    #22:na nb nc d
    for i in test_set[1]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))

    qram.x(3)
    #3:d
    #qram.rccx(3,21,22)
    fhrccx(qram,3,21,22)
    #22:0
    qram.x(2)
    #2:nc
    
    qram.x(0)
    #0:na
    '''
    #qram.rccx(2,20,21)
    fhrccx(qram,2,20,21)
    #21: 0
    qram.cx(0,20)
    #20: na b
    #qram.rccx(2,20,21)
    shrccx(qram,2,20,21)
    '''
    qram.rccx(2,0,21)
    qram.cx(0,20)
    
    #21: na b nc
    #qram.rccx(3,21,22)
    shrccx(qram,3,21,22)
    #22:na b nc d
    for i in test_set[5]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    qram.cx(21,22)
    #22:na b nc nd
    for i in test_set[4]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    qram.x(3)
    #3:nd
    #qram.rccx(3,21,22)
    '''
    fhrccx(qram,3,21,22)
    #22:0
    qram.cx(20,21)
    #21:na b c
    #qram.rccx(3,21,22)
    shrccx(qram,3,21,22)
    '''
    qram.rccx(3,20,22)
    qram.cx(20,21)
    
    #22:na b c nd
    for i in test_set[6]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    qram.cx(21,22)

    #22:na b c d
    for i in test_set[7]:
        qram.cx(22,4+4*int(i[0])+int(i[1]))
    #qram.x(3)
    #3:d
    #qram.rccx(3,21,22)
    #22:0
    #qram.x(2):pointless
    #2:c
    #qram.rccx(2,20,21):pointless
    #21:0
    #qram.x(1):pointless
    #1:b
    #qram.rccx(0,1,20):removed cause pointless
    #20:0
    '''this section was for testing qram circuit

    num=[15,14,12,13,9,8,10,11,3,2,0,1,5,4,6,7]# for easy reference while testing for bugs
    init=[0,1,0,0]
    test_qram=QuantumCircuit(23,16)
    index=0
    for i,j in enumerate(init):
        if j==1:
            test_qram.x(i)
        index= index+ j*(2**(3-i))

    out_str=[]
    for i in test_set[index]:
        out_str=out_str+[(4*int(i[0])+int(i[1]))]
    print(out_str)

    test_qram.append(qram,ind[0:23])
    for i in range(16):
        test_qram.measure(15-i+4,i)
    backend=Aer.get_backend('qasm_simulator')
    job=execute(test_qram,backend,shots=100)
    print(job.result().get_counts())
    #qram.draw(output='mpl',filename='qram.png')
    pass_ = Unroller(['u3', 'cx'])
    pm = PassManager(pass_)
    newcirc = pm.run(qram) 
    print(newcirc.count_ops())
    #newcirc.draw(output='mpl')
    '''
    #qram.draw('mpl',filename='qram_optim.png')

    # Now we begin the main computation.
    # The main idea is as follows:
    # Consider the given board as a matrix of 6 1's and 10 0's.
    # It can be proved that a board can be cleared by 3 beams if and only if the matrix of 0's
    # and 1's does not have 4 1's which, pairwise, don't share a row or a column (i.e. it does not contain a permutation matrix).
    # The above can be seen either by Hall's marriage theorem or by trial and error
    # (Hall's marriage theorem gives a more general answer for n x n boards).
    # Now, given that there are only 6 1's, it can further be proved that at most 1 even and at most 1 odd permutation matrix is contained
    # (again, the general result for n x n boards and n+2 1's is also valid).


    # So, we just check all 24=4! (the number of permutation matrices of size 4 x 4) possible combinations.
    # Some optimisation is done too, mainly first reducing the number of AND operations required (naively one would do 3 AND operations 24 times).
    # But we will make use of the fact that some computations are common (and that we have some qubits left).

    # Since at most one even permutation matrix can be contained, we only need a single bit to compute the number of even permutation matrices.
    # Same for odd permutation matrices.
    compute=QuantumCircuit(24)# requires only 22, but we do this so that appending is easier 
    # qubit 22 allocated for even permutations, and 23 for odd

    # computation (and uncomputation) for permutation matrices containing 0,2,4,6,9,11,13,15
    #compute.rccx(0,6,18)
    #compute.rccx(4,2,19)
    #compute.rccx(9,15,20)
    #compute.rccx(11,13,21)
    #matching these up with the latter part
    compute.rccx(0,6,18)
    #fhrccx(compute,6,0,18)
    compute.rccx(4,2,19)
    #fhrccx(compute,2,4,19)
    compute.rccx(15,9,20)
    #fhrccx(compute,9,15,20)
    compute.rccx(11,13,21)
    #fhrccx(compute,13,11,21)
    
    #non-cancellation of h and t at the beginning
    #compute.h([22,23])
    #compute.t([22,23])


    ''' doing the cz implementation
    conj_rccx(compute,18,20,23)
    conj_rccx(compute,19,21,23)
    conj_rccx(compute,18,21,22)
    conj_rccx(compute,19,20,22)
    '''
    compute.cu1(2*pi/3,18,20)
    compute.cu1(2*pi/3,19,21)
    compute.cu1(2*pi/3,18,21)
    compute.cu1(2*pi/3,19,20)
    
    #compute.rccx(0,6,18)
    fhrccx(compute,6,0,18)
    
    #compute.rccx(4,2,19)
    fhrccx(compute,2,4,19)
    #compute.rccx(9,15,20)
    fhrccx(compute,9,15,20)
    #compute.rccx(11,13,21)
    fhrccx(compute,13,11,21)

    # computation (and uncomputation) for permutation matrices containing 2,3,6,7,8,9,12,13
    #compute.rccx(3,6,18)
    shrccx(compute,6,3,18)
    #compute.rccx(7,2,19)
    shrccx(compute,2,7,19)
    #compute.rccx(9,12,20)
    shrccx(compute,9,12,20)
    #compute.rccx(8,13,21)
    shrccx(compute,13,8,21)

    #compute.rccx(18,20,22)
    #compute.rccx(19,21,22)
    #compute.rccx(18,21,23)
    #compute.rccx(19,20,23)
    
    #conj_rccx(compute,18,20,22)
    #conj_rccx(compute,19,21,22)
    #conj_rccx(compute,18,21,23)
    #conj_rccx(compute,19,20,23)
    
    compute.cu1(2*pi/3,18,20)
    compute.cu1(2*pi/3,19,21)
    compute.cu1(2*pi/3,18,21)
    compute.cu1(2*pi/3,19,20)
    #compute.rccx(3,6,18)
    fhrccx(compute,6,3,18)
    #compute.rccx(7,2,19)
    fhrccx(compute,2,7,19)
    #compute.rccx(9,12,20)
    fhrccx(compute,12,9,20)
    #compute.rccx(8,13,21)
    fhrccx(compute,8,13,21)

    # computation (and uncomputation) for permutation matrices containing 1,2,5,6,8,11,12,15
    #compute.rccx(1,6,18)
    shrccx(compute,6,1,18)
    #compute.rccx(5,2,19)
    shrccx(compute,2,5,19)
    #compute.rccx(11,12,20)
    shrccx(compute,12,11,20)
    #compute.rccx(8,15,21)
    shrccx(compute,8,15,21)

    #compute.rccx(18,20,23)
    #compute.rccx(19,21,23)
    #compute.rccx(18,21,22)
    #compute.rccx(19,20,22)

    #conj_rccx(compute,18,20,23)
    #conj_rccx(compute,19,21,23)
    #conj_rccx(compute,18,21,22)
    #conj_rccx(compute,19,20,22)
    
    compute.cu1(2*pi/3,18,20)
    compute.cu1(2*pi/3,19,21)
    compute.cu1(2*pi/3,18,21)
    compute.cu1(2*pi/3,19,20)
    
    #compute.rccx(1,6,18)
    fhrccx(compute,1,6,18)
    #compute.rccx(5,2,19)
    fhrccx(compute,5,2,19)
    #compute.rccx(11,12,20)
    fhrccx(compute,11,12,20)
    #compute.rccx(8,15,21)
    fhrccx(compute,15,8,21)

    # computation (and uncomputation) for permutation matrices containing 0,1,4,5,10,11,14,15
    #compute.rccx(1,4,18)
    shrccx(compute,1,4,18)
    #compute.rccx(5,0,19)
    shrccx(compute,5,0,19)
    #compute.rccx(11,14,20)
    shrccx(compute,11,14,20)
    #compute.rccx(10,15,21)
    shrccx(compute,15,10,21)

    #compute.rccx(18,20,22)
    #compute.rccx(19,21,22)
    #compute.rccx(18,21,23)
    #compute.rccx(19,20,23)

    #conj_rccx(compute,18,20,22)
    #conj_rccx(compute,19,21,22)
    #conj_rccx(compute,18,21,23)
    #conj_rccx(compute,19,20,23)
    
    compute.cu1(2*pi/3,18,20)
    compute.cu1(2*pi/3,19,21)
    compute.cu1(2*pi/3,18,21)
    compute.cu1(2*pi/3,19,20)
    
    #compute.rccx(1,4,18)
    fhrccx(compute,1,4,18)
    #compute.rccx(5,0,19)
    fhrccx(compute,5,0,19)
    #compute.rccx(11,14,20)
    fhrccx(compute,14,11,20)
    #compute.rccx(10,15,21)
    fhrccx(compute,10,15,21)

    # computation (and uncomputation) for permutation matrices containing 1,3,5,7,8,10,12,14
    #compute.rccx(1,7,18)
    shrccx(compute,1,7,18)
    #compute.rccx(5,3,19)
    shrccx(compute,5,3,19)
    #compute.rccx(8,14,20)
    shrccx(compute,14,8,20)
    #compute.rccx(10,12,21)
    shrccx(compute,10,12,21)

    #compute.rccx(18,20,23)
    #compute.rccx(19,21,23)
    #compute.rccx(18,21,22)
    #compute.rccx(19,20,22)

    #conj_rccx(compute,18,20,23)
    #conj_rccx(compute,19,21,23)
    #conj_rccx(compute,18,21,22)
    #conj_rccx(compute,19,20,22)

    
    compute.cu1(2*pi/3,18,20)
    compute.cu1(2*pi/3,19,21)
    compute.cu1(2*pi/3,18,21)
    compute.cu1(2*pi/3,19,20)
    
    #compute.rccx(1,7,18)
    fhrccx(compute,7,1,18)
    #compute.rccx(5,3,19)
    fhrccx(compute,3,5,19)
    #compute.rccx(8,14,20)
    fhrccx(compute,14,8,20)
    #compute.rccx(10,12,21)
    fhrccx(compute,10,12,21)
    # computation (and uncomputation) for permutation matrices containing 0,3,4,7,9,10,13,14
    #compute.rccx(0,7,18)
    shrccx(compute,7,0,18)
    #compute.rccx(4,3,19)
    shrccx(compute,3,4,19)
    #compute.rccx(9,14,20)
    shrccx(compute,14,9,20)
    #compute.rccx(10,13,21)
    shrccx(compute,10,13,21)


    #compute.rccx(18,20,22)
    #compute.rccx(19,21,22)
    #compute.rccx(18,21,23)
    #compute.rccx(19,20,23)
    
    #conj_rccx(compute,18,20,22)
    #conj_rccx(compute,19,21,22)
    #conj_rccx(compute,18,21,23)
    #conj_rccx(compute,19,20,23)
    
    
    compute.cu1(2*pi/3,18,20)
    compute.cu1(2*pi/3,19,21)
    compute.cu1(2*pi/3,18,21)
    compute.cu1(2*pi/3,19,20)
    
    # uncomputing here, because this is the only group left now
    compute.rccx(0,7,18)
    #shrccx(compute,7,0,18)
    compute.rccx(4,3,19)
    #shrccx(compute,3,4,19)
    compute.rccx(9,14,20)
    #shrccx(compute,14,9,20)
    compute.rccx(13,10,21)
    #shrccx(compute,10,13,21)
    #non cancellation at end of 22,23
    #compute.tdg([22,23])
    #compute.h([22,23])

    ind=range(30)
    qc=QuantumCircuit(28,4)
    qc.h([0,1,2,3])

    qc.append(qram,range(23))
    qc.append(compute,[4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,23,24,25,26,27,22])
    #qc.x([26,27])
    #qc.cz(26,27)
    #qc.x([26,27])

    #qc.append(compute.inverse(),ind[4:28])
    qc.append(qram.inverse(),range(23))
    #for i in range((3000-458)): Added this line to check with 10k threshold
        #qc.x(0)
    
    qc.h([0,1,2,3])
    qc.x([0,1,2,3])
    
    qc.rccx(0,1,4)
    qc.rccx(2,3,5)
    #qc.crz(pi/6,4,5)
    qc.cz(4,5)
    qc.rccx(0,1,4)
    qc.rccx(2,3,5)
    
    
    qc.x([0,1,2,3])
    qc.h([0,1,2,3])
    
    
    qc.measure(0,0)
    qc.measure(1,1)
    qc.measure(2,2)
    qc.measure(3,3)

    #backend = Aer.get_backend('qasm_simulator')#QasmSimulator(method='statevector_gpu')
    #job=execute(qc,backend,shots=15)
    #print(job.result().get_counts())
    #pass_ = Unroller(['u3', 'cx'])
    #pm = PassManager(pass_)
    #newcirc = pm.run(qc) 
    #print(newcirc.count_ops())

    #### Code for Grover's algorithm with iterations = 1 will be as follows. 
    #### for i in range(1):
    ####   oracle()
    ####   diffusion()
    
    return qc

# Additional thoughts

After the challenge was over, I thought of a new solution, which would get an even lower success probability but would still achieve some amplitude amplification.

With 6 asteroids, one can further prove that at most 1 even and at most 1 odd permutation matrix can be present.
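The claim above can also be checked by brute force. The following is a minimal sketch, not part of the submitted circuit: it enumerates every board with 6 asteroids on the 4 x 4 grid and counts how many even and odd permutation matrices each one contains. The names `parity`, `PERM_CELLS` and `max_contained` are introduced here for illustration only, and the cell numbering `4*row + col` is an assumption.

```python
from itertools import combinations, permutations


def parity(perm):
    """Return 0 for an even permutation and 1 for an odd one (inversion count mod 2)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2


# Every 4 x 4 permutation matrix as a set of cell indices 4*row + col, with its parity.
PERM_CELLS = [
    (frozenset(4 * row + col for row, col in enumerate(p)), parity(p))
    for p in permutations(range(4))
]


def max_contained(n_asteroids=6):
    """Largest number of even and of odd permutation matrices inside any single board."""
    max_even = max_odd = 0
    for board in combinations(range(16), n_asteroids):
        cells = set(board)
        counts = [0, 0]  # counts[0]: even matrices, counts[1]: odd matrices
        for perm_cells, par in PERM_CELLS:
            if perm_cells <= cells:  # the board contains this permutation matrix
                counts[par] += 1
        max_even = max(max_even, counts[0])
        max_odd = max(max_odd, counts[1])
    return max_even, max_odd


print(max_contained())  # (1, 1)
```

The result agrees with a short argument: two distinct permutation matrices inside 6 cells must share at least 2 cells, so they differ by a single transposition and therefore have opposite parity.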

Using this, one can change the phase multiplication to $-1$ for each even permutation and $e^{\frac {i\pi}2}$ for each odd permutation. This ensures that each solvable board is multiplied by only $1$, and the unsolvable board is multiplied by either $-1$, $e^{\frac{i\pi} 2}$, or $e^{-\frac{i\pi} 2}$. One gets amplitude amplification in all three cases (again, since each of those phases is closer to $-1$ than $1$ is to $-1$).
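To see how much amplification each phase gives, here is a small classical sketch of a single oracle plus diffusion step. It assumes the same simplified model as the derivation at the top: 16 addresses, exactly one unsolvable board, and the diffusion operator $1-2\ket\psi\bra\psi$. The function name `success_probability` is hypothetical and only used for this illustration.

```python
import cmath
import math


def success_probability(phase, n_states=16, marked=0):
    """Probability of measuring `marked` after one oracle + diffusion step."""
    amp = 1 / math.sqrt(n_states)
    state = [complex(amp)] * n_states  # equal superposition |psi>
    # Oracle: multiply only the marked address by e^{i*phase}.
    state[marked] *= cmath.exp(1j * phase)
    # Diffusion 1 - 2|psi><psi|: subtract twice the mean amplitude from every entry.
    mean = sum(state) / n_states
    state = [a - 2 * mean for a in state]
    return abs(state[marked]) ** 2


for label, phase in [("-1", math.pi), ("+i", math.pi / 2), ("-i", -math.pi / 2)]:
    print(label, round(success_probability(phase), 4))
```

Under these assumptions the phase $-1$ gives $121/256 \approx 0.473$ and the phases $\pm i$ give $274/1024 \approx 0.268$, compared with $1/16 = 0.0625$ without any amplification.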

To implement this, each group of cells would need 2 cu1 gates and 2 cz gates (instead of 4 cu1 gates), so we save an additional 2 cx gates and 2 u3 gates per group of cells. This leads to a total saving of 12 cx gates and 12 u3 gates, i.e. a cost reduction of 132, breaching the 5k mark.
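The arithmetic behind the figure of 132 can be spelled out as follows. The weights used here (10 per cx gate and 1 per u3 gate) are an assumption inferred from the numbers above, not something stated in this section, and the variable names are for illustration only.

```python
GROUPS = 6               # groups of cells handled by the compute circuit
CX_SAVED_PER_GROUP = 2   # from replacing 2 cu1 gates with 2 cz gates
U3_SAVED_PER_GROUP = 2
CX_WEIGHT = 10           # assumed cost weight of one cx gate
U3_WEIGHT = 1            # assumed cost weight of one u3 gate

cx_saved = GROUPS * CX_SAVED_PER_GROUP
u3_saved = GROUPS * U3_SAVED_PER_GROUP
cost_reduction = CX_WEIGHT * cx_saved + U3_WEIGHT * u3_saved

print(cx_saved, u3_saved, cost_reduction)  # 12 12 132
```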