# Grover's search algorithm

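If we can implement an oracle which verifies candidate solutions to a problem, then Grover's search algorithm lets us apply it to a superposition of all the candidates and amplify the amplitude of the solutions, so that a measurement returns one of them with high probability.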

For example, we have an unsorted list of $N = 2^n$ objects, and we want to find the one for which a test function returns $1$ while it returns $0$ for all the others. Classically we would have to check them one by one until we happened upon the "winning" solution $\omega$. Grover's algorithm works with oracles which multiply the amplitude of $|\omega\rangle$ by $-1$ while leaving all the other basis states as they are. For example, if we have three qubits and $\omega = \text{101}$, our oracle will have the matrix:

$$
U_\omega = 
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
\end{bmatrix}
\begin{aligned}
\\
\\
\\
\\
\\
\\
\leftarrow \omega = \text{101}\\
\\
\\
\\
\end{aligned}
$$

https://learning.quantum.ibm.com/tutorial/grovers-algorithm

The [3Blue1Brown youtube channel](https://www.youtube.com/@3blue1brown) has a couple of videos about Grover's algorithm:

[But what is quantum computing? (Grover's Algorithm)](https://youtu.be/RQWpF2Gb-gU)

[Where my explanation of Grover's algorithm failed](https://youtu.be/Dlsa9EBKDGI)

# Getting started

In a Jupyter Python notebook, Ctrl-Enter runs the selected code cell. Cells should generally be self-contained, although some will only make sense if they are run in order. In command mode, "a" adds a cell above, "b" adds a cell below, "m" converts a cell from Python code to Markdown (i.e. text, which can also contain $\LaTeX$), "y" converts a cell back to code, and "dd" deletes a cell.

See http://localhost:8888/view/000-installing-qiskit.html for installation instructions.

API references:

https://docs.quantum.ibm.com/api/qiskit

https://docs.quantum.ibm.com/api/qiskit-ibm-runtime

https://docs.quantum.ibm.com/api/qiskit-ibm-provider

This notebook has been updated for ```qiskit 2.2```

You should also make sure you have the file ```vector_to_latex.py``` in the same folder

In [None]:
from qiskit import QuantumCircuit, transpile
from qiskit.visualization import plot_bloch_vector, plot_bloch_multivector, plot_distribution, plot_histogram, array_to_latex
from qiskit.result import Result
from qiskit.quantum_info import Statevector, Operator
from qiskit_aer import Aer
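from math import sqrt, pi
import numpy as np  # used by the two-qubit example further down
from IPython.display import Math, display  # used to render state vectors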

### use vector_to_latex code from qiskit 0.44
from vector_to_latex import *

We can create this unitary directly, although see below for how we would generate it from a gate-based oracle.

In [None]:
Uw = Operator([[1,0,0,0,0,0,0,0],[0,1,0,0,0,0,0,0],[0,0,1,0,0,0,0,0],[0,0,0,1,0,0,0,0],[0,0,0,0,1,0,0,0],[0,0,0,0,0,-1,0,0],[0,0,0,0,0,0,1,0],[0,0,0,0,0,0,0,1]])
array_to_latex(Uw, prefix="U_\\omega = ")
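
The hand-typed matrix above can also be generated for any marked bitstring. The following is a minimal sketch using only NumPy; `phase_oracle_matrix` is a hypothetical helper name, not a Qiskit function, and it assumes the bitstring is read as a binary number to index the basis state (so `'101'` is index 5).

```python
import numpy as np

def phase_oracle_matrix(omega: str) -> np.ndarray:
    """Diagonal matrix with -1 at the basis state labelled by `omega`."""
    diag = np.ones(2 ** len(omega))
    diag[int(omega, 2)] = -1.0  # flip the sign of the marked state only
    return np.diag(diag)

U = phase_oracle_matrix("101")
print(np.diag(U))  # diagonal of the 8x8 oracle
```

The resulting array could be passed to `Operator(...)` in the same way as the explicit nested list.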

In [None]:
f_in = '101'

qc = QuantumCircuit(3)
if f_in[0]=='1': qc.x(0)
if f_in[1]=='1': qc.x(1)
if f_in[2]=='1': qc.x(2)
statein = Statevector(qc)
display(array_to_latex(statein, prefix="\\text{Initial state = }"))
#print("Initial state")
#display(statein.draw(output = 'latex'))
#display(Math(vector_to_latex(statein)))

In [None]:
qc.append(Uw,[0,1,2])
qc.draw()

In [None]:
state = Statevector(qc)
display(array_to_latex(state, prefix="\\text{Final state = }"))
#print("Final state")
#display(state.draw(output = 'latex'))
#display(Math(vector_to_latex(state)))

To implement Grover's algorithm, we run the oracle on the fully superposed state $|s\rangle = |+++\rangle$.

In [None]:
qc = QuantumCircuit(3)
qc.h(0)
qc.h(1)
qc.h(2)
statein = Statevector(qc)
display(array_to_latex(statein, prefix="\\text{Initial state = }"))
#print("Initial state")
#display(statein.draw(output = 'latex'))
#display(Math(vector_to_latex(statein)))

In [None]:
qc.append(Uw,[0,1,2])
ostate = Statevector(qc)
array_to_latex(ostate, prefix="\\text{Oracle output state = }")
#print("Oracle output state")
#display(statein.draw(output = 'latex'))
#display(Math(vector_to_latex(ostate)))

We can see that the coefficient corresponding to the state we are looking for has changed sign. Now we invert every amplitude about the mean of all the amplitudes, a step also known as "Grover diffusion". The [unitary which does this](https://docs.quantum.ibm.com/api/qiskit/qiskit.circuit.library.GroverOperator) is $U_s = 2|s\rangle\langle s| - \mathbb{1}$ (see below for how this looks).

In [None]:
qc.barrier()
# Apply transformation |s> -> |00..0> (H-gates)
for qubit in range(3):
    qc.h(qubit)
# Apply transformation |00..0> -> |11..1> (X-gates)
for qubit in range(3):
    qc.x(qubit)
# Do multi-controlled-Z gate
qc.h(2)
qc.ccx(0,1,2) # toffoli
qc.h(2)
# Apply transformation |11..1> -> |00..0>
for qubit in range(3):
    qc.x(qubit)
# Apply transformation |00..0> -> |s>
for qubit in range(3):
    qc.h(qubit)
    
qc.barrier()

dstate = Statevector(qc)
array_to_latex(dstate, prefix="\\text{Grover diffused state = }")
#print("Grover diffused state")
#display(dstate.draw(output = 'latex'))
#display(Math(vector_to_latex(dstate)))

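Now the unwanted states have amplitudes of $(-)1/\sqrt{32}$, while the $\omega$ state has grown to $(-)5/\sqrt{32}$, which corresponds to a measurement probability of $25/32 \approx 78\%$. The overall minus sign appears because this gate sequence implements $-U_s$; a global phase has no effect on the measurement.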
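
These amplitudes can be checked without a simulator by carrying out the oracle sign flip and the inversion about the mean directly on a list of eight amplitudes. This is only a classical sketch with NumPy (the variable names are illustrative), and it applies $U_s$ itself, so it does not reproduce the global minus sign of the circuit.

```python
import numpy as np

N = 8
marked = 0b101

amps = np.full(N, 1 / np.sqrt(N))   # uniform superposition |s>
amps[marked] *= -1                  # oracle: flip the sign of the marked state
amps = 2 * amps.mean() - amps       # diffuser: inversion about the mean

print(amps * np.sqrt(32))           # amplitudes in units of 1/sqrt(32)
print(amps[marked] ** 2)            # probability of measuring the marked state
```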

In [None]:
qc = QuantumCircuit(3)
qc.h(0)
qc.h(1)
qc.h(2)
qc.append(Uw,[0,1,2])
qc.barrier()
# Apply transformation |s> -> |00..0> (H-gates)
for qubit in range(3):
    qc.h(qubit)
# Apply transformation |00..0> -> |11..1> (X-gates)
for qubit in range(3):
    qc.x(qubit)
# Do multi-controlled-Z gate
qc.h(2)
qc.ccx(0,1,2) # toffoli
qc.h(2)
# Apply transformation |11..1> -> |00..0>
for qubit in range(3):
    qc.x(qubit)
# Apply transformation |00..0> -> |s>
for qubit in range(3):
    qc.h(qubit)    

qc.measure_all()
qc.draw()

In [None]:
svsim = Aer.get_backend('statevector_simulator')
state = svsim.run(qc).result().get_statevector()
results = svsim.run(qc, shots=100, memory=False).result()
counts = results.get_counts()
plot_histogram(counts)

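The algorithm has found $\omega$: it is the most frequent outcome, with a theoretical probability of $25/32 \approx 78\%$. To find $\omega$ with better certainty we could run the oracle and the diffuser twice, which raises this to $121/128 \approx 94.5\%$: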

In [None]:
qc = QuantumCircuit(3)
qc.h(0)
qc.h(1)
qc.h(2)
qc.append(Uw,[0,1,2])
qc.barrier()
# Apply transformation |s> -> |00..0> (H-gates)
for qubit in range(3):
    qc.h(qubit)
# Apply transformation |00..0> -> |11..1> (X-gates)
for qubit in range(3):
    qc.x(qubit)
# Do multi-controlled-Z gate
qc.h(2)
qc.ccx(0,1,2) 
qc.h(2)
# Apply transformation |11..1> -> |00..0>
for qubit in range(3):
    qc.x(qubit)
# Apply transformation |00..0> -> |s>
for qubit in range(3):
    qc.h(qubit)    
    
qc.barrier()
qc.append(Uw,[0,1,2])
qc.barrier()

svsim = Aer.get_backend('statevector_simulator')
dstate = svsim.run(qc).result().get_statevector()
array_to_latex(dstate, prefix="\\text{After second application of } U_\\omega = ")

In [None]:
qc = QuantumCircuit(3)
qc.h(0)
qc.h(1)
qc.h(2)
qc.append(Uw,[0,1,2])
qc.barrier()
# Apply transformation |s> -> |00..0> (H-gates)
for qubit in range(3):
    qc.h(qubit)
# Apply transformation |00..0> -> |11..1> (X-gates)
for qubit in range(3):
    qc.x(qubit)
# Do multi-controlled-Z gate
qc.h(2)
qc.ccx(0,1,2) 
qc.h(2)
# Apply transformation |11..1> -> |00..0>
for qubit in range(3):
    qc.x(qubit)
# Apply transformation |00..0> -> |s>
for qubit in range(3):
    qc.h(qubit)    

qc.barrier()
qc.append(Uw,[0,1,2])
qc.barrier()
# Apply transformation |s> -> |00..0> (H-gates)
for qubit in range(3):
    qc.h(qubit)
# Apply transformation |00..0> -> |11..1> (X-gates)
for qubit in range(3):
    qc.x(qubit)
# Do multi-controlled-Z gate
qc.h(2)
qc.ccx(0,1,2) 
qc.h(2)
# Apply transformation |11..1> -> |00..0>
for qubit in range(3):
    qc.x(qubit)
# Apply transformation |00..0> -> |s>
for qubit in range(3):
    qc.h(qubit)    

qc.measure_all()
qc.draw()

In [None]:
svsim = Aer.get_backend('statevector_simulator')
state = svsim.run(qc).result().get_statevector()
results = svsim.run(qc, shots=100, memory=False).result()
counts = results.get_counts()
plot_histogram(counts)

The classical approach would have been to look through all the $N$ possible states one by one until we found the solution, so the time taken would be of the order $N$. For a small number of qubits it seems as though the correct result is immediately apparent but as this example shows, we have a better chance of actually measuring the correct result if we run the oracle and diffuser a second time. 

In general it turns out that to have a good probability of measuring the correct result in the case of a large number of qubits, we need to run the oracle and diffuser repeatedly, about $\frac{\pi}{4}\sqrt{N}$ times, so the number of oracle calls scales as $\sqrt{N}$ rather than $N$.
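
The dependence on the number of rounds can be made concrete with the standard result that, for a single marked item, the success probability after $k$ rounds of oracle plus diffuser is $\sin^2((2k+1)\theta)$ with $\sin\theta = 1/\sqrt{N}$. The sketch below uses only the standard library; both function names are hypothetical, and the iteration count is taken as the nearest integer to $\pi/(4\theta) - 1/2$.

```python
import math

def grover_success_probability(n_qubits: int, iterations: int) -> float:
    """Probability of measuring the single marked state after `iterations` rounds."""
    theta = math.asin(1 / math.sqrt(2 ** n_qubits))
    return math.sin((2 * iterations + 1) * theta) ** 2

def suggested_iterations(n_qubits: int) -> int:
    """Number of rounds that brings (2k+1)*theta closest to pi/2."""
    theta = math.asin(1 / math.sqrt(2 ** n_qubits))
    return round(math.pi / (4 * theta) - 0.5)

for n in (2, 3, 10):
    k = suggested_iterations(n)
    print(n, k, grover_success_probability(n, k))

# Running too many rounds overshoots and lowers the probability again
print(grover_success_probability(3, 3))
```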

It is often the case that we can quickly check whether we have the correct result, so we could accept some small chance of getting the wrong answer and having to run the whole thing again.

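Problems for which a proposed answer can be checked quickly (in polynomial time) form the class NP, and the hardest of these are called "NP-complete"; no polynomial-time method of finding their solutions is known. Grover's algorithm gives a quadratic speed-up over trying all the possibilities, but $\sqrt{2^n}$ still grows exponentially with $n$, so it does not by itself provide an efficient way to solve these.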

https://cnot.io/quantum_algorithms/grover/using_grovers_algorithm.html

## Counterfactual quantum computation

The physicist blogger [Sabine Hossenfelder](http://backreaction.blogspot.com/2022/08/how-to-compute-with-computer-that.html) explains that it seems to be possible to find out what a computer will do without actually running it: [Counterfactual quantum computation through quantum interrogation](https://www.nature.com/articles/nature04523). This seems to be related to the [bomb tester](https://en.wikipedia.org/wiki/Elitzur%E2%80%93Vaidman_bomb_tester).

## Diffuser codes

The diffuser is $U_s = 2|s\rangle\langle s| - \mathbb{1}$. $|s\rangle$ is the fully superposed state, so the outer product $|s\rangle\langle s|$ is the matrix in which every entry is $1/N$. Doubling it and subtracting the identity gives a matrix with $2/N - 1$ on the diagonal and $2/N$ everywhere else. These codes are taken from [the Qiskit textbook](https://learning.quantum.ibm.com/tutorial/grovers-algorithm).
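
The structure of this matrix can be seen by building $U_s$ directly from its definition. This is a small NumPy sketch for three qubits, separate from the gate-based constructions in the cells that follow.

```python
import numpy as np

n = 3
N = 2 ** n

s = np.full((N, 1), 1 / np.sqrt(N))   # column vector for |s>
U_s = 2 * (s @ s.T) - np.eye(N)       # 2|s><s| - 1

print(U_s[0, 0], U_s[0, 1])                  # a diagonal and an off-diagonal entry
print(np.allclose(U_s @ U_s, np.eye(N)))     # a reflection is its own inverse
```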

## The 2-qubit diffuser:

In [None]:
qc = QuantumCircuit(2)
qc.h([0,1])
qc.z([0,1])
qc.cz(0,1)
qc.h([0,1])
usim = Aer.get_backend('unitary_simulator')
unitary = usim.run(qc).result().get_unitary()
array_to_latex(unitary, prefix="U_s = \n")

In [None]:
qc.draw()

## The 3-qubit diffuser:

In [None]:
qc = QuantumCircuit(3)
qc.barrier()
# Apply transformation |s> -> |00..0> (H-gates)
for qubit in range(3):
    qc.h(qubit)
# Apply transformation |00..0> -> |11..1> (X-gates)
for qubit in range(3):
    qc.x(qubit)
# Do multi-controlled-Z gate
qc.h(2)
qc.ccx(0,1,2) # toffoli
qc.h(2)
# Apply transformation |11..1> -> |00..0>
for qubit in range(3):
    qc.x(qubit)
# Apply transformation |00..0> -> |s>
for qubit in range(3):
    qc.h(qubit)
    
qc.barrier()
usim = Aer.get_backend('unitary_simulator')
unitary = usim.run(qc).result().get_unitary()
array_to_latex(unitary, prefix="U_s = \n")

In [None]:
qc.draw()

## Creating the unitary from a conventional quantum oracle

A more natural way to create this unitary might be to start with a more conventional oracle which flips the output qubit $q_3$ to $1$ for the input $101$ on $q_0$, $q_1$, $q_2$. To do this in general we can use a multi-controlled $X$ gate. Within the oracle, any qubit which needs to be $1$ is passed directly through as a control, while we apply $X$ to any qubit which needs to be $0$. The same $X$ is applied a second time after the multi-controlled gate, so that the oracle is reversible and leaves the input qubits unchanged.
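
For a general bitstring, the qubits that need the surrounding $X$ gates are simply the positions holding a `0`. A minimal sketch in plain Python; `qubits_to_flip` is a hypothetical helper, and it assumes Qiskit's labelling in which the rightmost character of the bitstring is qubit 0.

```python
def qubits_to_flip(omega: str) -> list[int]:
    """Qubit indices that need an X before and after the multi-controlled X.

    Assumes the rightmost character of `omega` labels qubit 0.
    """
    return [q for q, bit in enumerate(reversed(omega)) if bit == "0"]

print(qubits_to_flip("101"))  # [1]
print(qubits_to_flip("110"))  # [0]
```

In a circuit, each returned index `q` would get a `qc.x(q)` on both sides of the `MCXGate`.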

In [None]:
from qiskit.circuit.library import MCXGate
mcx = MCXGate(3)

In [None]:
f_in = '101'

qc = QuantumCircuit(4)
if f_in[0]=='1': qc.x(0)
if f_in[1]=='1': qc.x(1)
if f_in[2]=='1': qc.x(2)

qc.barrier()
qc.x(1)
qc.append(mcx,[0,1,2,3])
qc.x(1)
qc.barrier()

qc.draw()

In [None]:
svsim = Aer.get_backend('statevector_simulator')
state = svsim.run(qc).result().get_statevector()
result = svsim.run(qc, shots=1, memory=True).result()
print('results: {}'.format(result.get_counts()))
if len(result.get_counts()) > 1:
    raise ValueError('superposition')  # a bare string cannot be raised

measure = (list(result.get_counts().keys())[0])[::-1] # reverse to fit qiskit's qubit ordering
print('input: {}{}{}'.format(measure[0],measure[1],measure[2]))
print('output: {}'.format(measure[3]))

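If we put the output qubit into $|-\rangle$ before we run the oracle, the bit flip on the output qubit becomes a sign flip on the $|101\rangle$ input state (phase kickback). Afterwards we put the output qubit back into $|0\rangle$ to make the state vectors easier to compare: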

In [None]:
f_in = '101'

qc = QuantumCircuit(4)
if f_in[0]=='1': qc.x(0)
if f_in[1]=='1': qc.x(1)
if f_in[2]=='1': qc.x(2)
    
statein = Statevector(qc)
# array_to_latex(statein, prefix="\\text{Initial state = }")
print("Initial state")
display(statein.draw(output = 'latex'))
display(Math(vector_to_latex(statein)))

In [None]:
qc = QuantumCircuit(4)
if f_in[0]=='1': qc.x(0)
if f_in[1]=='1': qc.x(1)
if f_in[2]=='1': qc.x(2)

qc.x(3)
qc.h(3)

qc.barrier()
qc.x(1)
qc.append(mcx,[0,1,2,3])
qc.x(1)
qc.barrier()
qc.h(3)
qc.x(3)

qc.draw()

In [None]:
state = Statevector(qc)
# array_to_latex(statein, prefix="\\text{Initial state = }")
# array_to_latex(state, prefix="\\text{Final state = }")
print("Initial state")
display(statein.draw(output = 'latex'))
display(Math(vector_to_latex(statein)))
print("Final state")
display(state.draw(output = 'latex'))
display(Math(vector_to_latex(state)))
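
The sign change on the marked input is an instance of phase kickback, and it can be reproduced with plain linear algebra. The sketch below is a NumPy illustration rather than the notebook's circuit: it models the bit-flip oracle as a permutation matrix, takes the output qubit as the most significant bit (as $q_3$ is in Qiskit's ordering), and uses the hypothetical helper `kickback_sign`.

```python
import numpy as np

N = 8          # number of input basis states (3 input qubits)
omega = 0b101  # marked input

# Bit-flip oracle on 4 qubits: |y>|x> -> |y XOR f(x)>|x>.
# Basis index = 8*y + x, i.e. the output qubit q3 is the most significant bit.
U_f = np.eye(2 * N)
U_f[[omega, N + omega]] = U_f[[N + omega, omega]]  # swap |0>|omega> and |1>|omega>

minus = np.array([1.0, -1.0]) / np.sqrt(2)  # output qubit prepared in |->

def kickback_sign(x: int) -> int:
    """Sign picked up by the input state |x> when the output qubit is |->."""
    e_x = np.zeros(N)
    e_x[x] = 1.0
    state = np.kron(minus, e_x)              # |->|x>
    return int(round(state @ U_f @ state))   # overlap of the output with the input

signs = [kickback_sign(x) for x in range(N)]
print(signs)  # only the entry for x = 5 is negative
```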


So we can try Grover's algorithm with this oracle. We put the oracle's input qubits into $|s\rangle = |+++\rangle$ and the oracle's output qubit into $|-\rangle$, then apply the 3-qubit diffuser to the input qubits. Finally we measure the final state of the input qubits. For simplicity we ignore the oracle's output qubit but we could of course perform $XH$ to put this back into $|0\rangle$ as we did before.

In [None]:
qc = QuantumCircuit(4,3)
qc.x(3)
for q in range(4):
    qc.h(q)

qc.barrier()
qc.x(1)
qc.append(mcx,[0,1,2,3])
qc.x(1)
qc.barrier()

# Apply transformation |s> -> |00..0> (H-gates)
for qubit in range(3):
    qc.h(qubit)
# Apply transformation |00..0> -> |11..1> (X-gates)
for qubit in range(3):
    qc.x(qubit)
# Do multi-controlled-Z gate
qc.h(2)
qc.ccx(0,1,2) 
qc.h(2)
# Apply transformation |11..1> -> |00..0>
for qubit in range(3):
    qc.x(qubit)
# Apply transformation |00..0> -> |s>
for qubit in range(3):
    qc.h(qubit)

qc.barrier()

for q in range(3):
    qc.measure(q,q)
    
qc.draw()

In [None]:
svsim = Aer.get_backend('statevector_simulator')
state = svsim.run(qc).result().get_statevector()
results = svsim.run(qc, shots=100, memory=False).result()
counts = results.get_counts()
plot_histogram(counts)

The algorithm has again found $\omega$. We can also run this twice to improve the chances that we get the correct solution:

In [None]:
qc = QuantumCircuit(4,3)
qc.x(3)
for q in range(4):
    qc.h(q)

qc.barrier()
qc.x(1)
qc.append(mcx,[0,1,2,3])
qc.x(1)
qc.barrier()

# Apply transformation |s> -> |00..0> (H-gates)
for qubit in range(3):
    qc.h(qubit)
# Apply transformation |00..0> -> |11..1> (X-gates)
for qubit in range(3):
    qc.x(qubit)
# Do multi-controlled-Z gate
qc.h(2)
qc.ccx(0,1,2) 
qc.h(2)
# Apply transformation |11..1> -> |00..0>
for qubit in range(3):
    qc.x(qubit)
# Apply transformation |00..0> -> |s>
for qubit in range(3):
    qc.h(qubit)

qc.barrier()
qc.x(1)
qc.append(mcx,[0,1,2,3])
qc.x(1)
qc.barrier()

# Apply transformation |s> -> |00..0> (H-gates)
for qubit in range(3):
    qc.h(qubit)
# Apply transformation |00..0> -> |11..1> (X-gates)
for qubit in range(3):
    qc.x(qubit)
# Do multi-controlled-Z gate
qc.h(2)
qc.ccx(0,1,2) 
qc.h(2)
# Apply transformation |11..1> -> |00..0>
for qubit in range(3):
    qc.x(qubit)
# Apply transformation |00..0> -> |s>
for qubit in range(3):
    qc.h(qubit)

qc.barrier()

for q in range(3):
    qc.measure(q,q)
    
qc.draw()

In [None]:
svsim = Aer.get_backend('statevector_simulator')
state = svsim.run(qc).result().get_statevector()
results = svsim.run(qc, shots=100, memory=False).result()
counts = results.get_counts()
plot_histogram(counts)

In these cases we have used a fairly simple unitary or oracle in which we directly encoded the solution we wanted to find. But in general, if we have a problem which we want to solve in terms of logical relationships between the qubits, we could set up these logic tests as an oracle and Grover's algorithm would find the inputs which satisfy the test.
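
As an illustration of an oracle defined by a test rather than by a hard-coded answer, the sketch below simulates Grover's algorithm classically with NumPy, taking the oracle signs from an arbitrary Python predicate. The function name `grover_probabilities` and the example predicate are assumptions made for this illustration, and the simulation tracks amplitudes directly instead of building a circuit.

```python
import numpy as np

def grover_probabilities(predicate, n_qubits: int, iterations: int) -> np.ndarray:
    """Outcome probabilities after `iterations` rounds of oracle plus diffuser."""
    N = 2 ** n_qubits
    # Phase oracle: -1 for every input that passes the test, +1 otherwise
    signs = np.array([-1.0 if predicate(x) else 1.0 for x in range(N)])
    amps = np.full(N, 1 / np.sqrt(N))
    for _ in range(iterations):
        amps = signs * amps              # oracle
        amps = 2 * amps.mean() - amps    # diffuser (inversion about the mean)
    return amps ** 2

# Example test: which 3-bit number x satisfies 3x + 1 = 16?
probs = grover_probabilities(lambda x: 3 * x + 1 == 16, n_qubits=3, iterations=2)
best = int(np.argmax(probs))
print(best, probs[best])
```

On a quantum computer the predicate would have to be compiled into reversible gates; here it is only evaluated classically to fill in the signs.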

## Unitaries

The unitary of the three-input multi-control $X$ is

In [None]:
qc = QuantumCircuit(4)
qc.append(mcx,[0,1,2,3])
usim = Aer.get_backend('unitary_simulator')
unitary = usim.run(qc).result().get_unitary()
array_to_latex(unitary, prefix="\\text{MCX = }\n", max_size=16)

The unitary of our $101$ oracle is

In [None]:
qc = QuantumCircuit(4)
qc.x(1)
qc.append(mcx,[0,1,2,3])
qc.x(1)
usim = Aer.get_backend('unitary_simulator')
unitary = usim.run(qc).result().get_unitary()
array_to_latex(unitary, prefix="\\text{101 Oracle = }\n", max_size=16)

We can see how the $X$ gates change which components of the statevector are swapped over. 

This is how our oracle is actually built:

In [None]:
from qiskit import transpile

qc = QuantumCircuit(4)
qc.x(1)
qc.append(mcx,[0,1,2,3])
qc.x(1)
qc.draw()

In [None]:
qc.decompose().draw()

In [None]:
qc.decompose().decompose().draw()

In [None]:
qc.decompose().decompose().decompose().draw()

And here's how it would be transpiled to a basis gate set of the kind used on IBM hardware:

In [None]:
new_qc = transpile(qc, basis_gates=['rz', 'sx', 'x', 'cx'])

new_qc.draw()

Our "phase oracle" $U_\omega$ decomposes as

In [None]:
qc = QuantumCircuit(3)
qc.append(Uw,[0,1,2])
new_qc = transpile(qc, basis_gates=['rz', 'sx', 'x', 'cx'])
qc.decompose().decompose().decompose().decompose().draw()

... and transpiles as:

In [None]:
new_qc.draw()

## A simple two-qubit example:

In [None]:
win=2 # 0,1,2,3
iw=np.ones(4)
for i in range(4):
    if i==win:
        iw[i]=-1.0
        
print(iw)
print()

I2=np.eye(4)

U2matrix = I2*iw

print(U2matrix)

U2 = Operator(U2matrix)

#U2 = Operator([[1,0,0,0],[0,1,0,0],[0,0,-1,0],[0,0,0,1]]) # to manually create the unitary
array_to_latex(U2, prefix="U_2 = ")

We note that the case for 3 being the winning state is the unitary corresponding to a $CZ$ gate:

In [None]:
qc = QuantumCircuit(2)
qc.cz(0,1)
display(qc.draw())
usim = Aer.get_backend('unitary_simulator')
unitary = usim.run(qc).result().get_unitary()
array_to_latex(unitary, prefix="CZ = \n")

We can change the winning state by applying $X$ gates to one or both of the qubits:

In [None]:
qc = QuantumCircuit(2)
if win==0 or win==2:
    qc.x(0)

if win==0 or win==1:
    qc.x(1)

qc.cz(0,1)

if win==0 or win==2:
    qc.x(0)

if win==0 or win==1:
    qc.x(1)

display(qc.draw())
usim = Aer.get_backend('unitary_simulator')
unitary = usim.run(qc).result().get_unitary()
array_to_latex(unitary, prefix="U_\\omega = \n")

For example, for the case of $2 = |q_1 q_0 \rangle = |1 0\rangle$ being the winning state, we apply $I \otimes X$ both before and after the $CZ$ gate.

$$I \otimes X = 
\begin{bmatrix} X & 0 \\
               0 & X\\
\end{bmatrix} = 
\begin{bmatrix} 0 & 1 & 0 & 0 \\
               1 & 0 & 0 & 0 \\
               0 & 0 & 0 & 1 \\
               0 & 0 & 1 & 0 \\
\end{bmatrix}
$$

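Multiplying by $I \otimes X$ on the right swaps columns 3 and 4 (and columns 1 and 2) of the $CZ$ matrix, and multiplying on the left then swaps the corresponding rows. The net effect is to move the $-1$ from the fourth diagonal entry to the third: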

$$CZ (I \otimes X) = 
\begin{bmatrix} 1 & 0 & 0 & 0 \\
               0 & 1 & 0 & 0 \\
               0 & 0 & 1 & 0 \\
               0 & 0 & 0 & -1 \\
\end{bmatrix} \begin{bmatrix} 0 & 1 & 0 & 0 \\
               1 & 0 & 0 & 0 \\
               0 & 0 & 0 & 1 \\
               0 & 0 & 1 & 0 \\
\end{bmatrix}
 = \begin{bmatrix} 0 & 1 & 0 & 0 \\
               1 & 0 & 0 & 0 \\
               0 & 0 & 0 & 1 \\
               0 & 0 & -1 & 0 \\
\end{bmatrix}$$

$$(I \otimes X) CZ (I \otimes X) = \begin{bmatrix} 0 & 1 & 0 & 0 \\
               1 & 0 & 0 & 0 \\
               0 & 0 & 0 & 1 \\
               0 & 0 & 1 & 0 \\
\end{bmatrix}\begin{bmatrix} 0 & 1 & 0 & 0 \\
               1 & 0 & 0 & 0 \\
               0 & 0 & 0 & 1 \\
               0 & 0 & -1 & 0 \\
\end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\
               0 & 1 & 0 & 0 \\
               0 & 0 & -1 & 0 \\
               0 & 0 & 0 & 1 \\
\end{bmatrix}$$

In [None]:
qcix = QuantumCircuit(2)
qcix.x(0)
qcix.id(1)
usim = Aer.get_backend('unitary_simulator')
IX = usim.run(qcix).result().get_unitary()

qccz = QuantumCircuit(2)
qccz.cz(0,1)
CZ = usim.run(qccz).result().get_unitary()

print('I otimes X:')
print(np.real(IX))
print()
print('CZ(I otimes X):')
print(np.real((CZ.data)@(IX.data)))
print()
print('(I otimes X)CZ(I otimes X):')
print(np.real((IX.data)@(CZ.data)@(IX.data)))
print()

print('Our unitary:')
print(U2matrix)
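
The same check can be repeated for all four winning states without a simulator. A minimal NumPy sketch, assuming the $q_1 \otimes q_0$ ordering used in the matrices above; `oracle_for` is a hypothetical helper name.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
CZ = np.diag([1.0, 1.0, 1.0, -1.0])

def oracle_for(win: int) -> np.ndarray:
    """Wrap CZ in X gates on every qubit whose bit in `win` is 0."""
    gate_q1 = I2 if (win >> 1) & 1 else X
    gate_q0 = I2 if win & 1 else X
    wrap = np.kron(gate_q1, gate_q0)   # ordering q1 (x) q0, as in the text
    return wrap @ CZ @ wrap

for win in range(4):
    print(win, np.diag(oracle_for(win)))  # the -1 sits at position `win`
```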


In order to complete the circuit we need to implement the additional reflection $U_s = 2|s\rangle\langle s| - \mathbb{1}$. Since this is a reflection about $|s\rangle$, we want to add a negative phase to every state orthogonal to $|s\rangle$. 

One way we can do this is to use the operation that transforms the state $|s\rangle \rightarrow |0\rangle$, which we already know is the Hadamard gate applied to each qubit:

$$H^{\otimes n}|s\rangle = |0\rangle$$

Then we apply a circuit that adds a negative phase to the states orthogonal to $|0\rangle$:

$$U_0 \frac{1}{2}\left( \lvert 00 \rangle + \lvert 01 \rangle + \lvert 10 \rangle + \lvert 11 \rangle \right) = \frac{1}{2}\left( \lvert 00 \rangle - \lvert 01 \rangle - \lvert 10 \rangle - \lvert 11 \rangle \right)$$

i.e. the signs of each state are flipped except for $\lvert 00 \rangle$. As can easily be verified, one way of implementing $U_0$ is the following circuit:

In [None]:
qcru = QuantumCircuit(2)
qcru.z(0)
qcru.z(1)
qcru.cz(0,1)
qcru.draw()

Finally, we do the operation that transforms the state $|0\rangle \rightarrow |s\rangle$ (the H-gate again):

$$H^{\otimes n}U_0 H^{\otimes n} = U_s$$

The complete circuit for $U_s$ looks like this:

In [None]:
qcdiff = QuantumCircuit(2)
qcdiff.h([0,1])
qcdiff.z([0,1])
qcdiff.cz(0,1)
qcdiff.h([0,1])
qcdiff.draw()
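
Both claims, that $Z$ on each qubit followed by $CZ$ implements $U_0$ and that $H^{\otimes 2} U_0 H^{\otimes 2}$ equals $2|s\rangle\langle s| - \mathbb{1}$, can be verified with a few lines of NumPy. This is a sketch with the gate matrices written out by hand rather than taken from Qiskit.

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
Z = np.diag([1.0, -1.0])
CZ = np.diag([1.0, 1.0, 1.0, -1.0])

U_0 = CZ @ np.kron(Z, Z)          # Z on both qubits, then CZ
HH = np.kron(H, H)
U_s = HH @ U_0 @ HH               # H on both qubits before and after

s = np.full(4, 0.5)               # |s> for two qubits
expected = 2 * np.outer(s, s) - np.eye(4)

print(np.diag(U_0))               # only |00> keeps its sign
print(np.allclose(U_s, expected))
```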

In [None]:
qc = QuantumCircuit(2)
qc.h(0)
qc.h(1)
qc.append(U2,[0,1])
qc.h([0,1])
qc.z([0,1])
qc.cz(0,1)
qc.h([0,1])
display(qc.draw())
svsim = Aer.get_backend('statevector_simulator')
dstate = svsim.run(qc).result().get_statevector()
array_to_latex(dstate, prefix="\\text{output = }")

Full 2-qubit circuit:

In [None]:
win=2 # 0,1,2,3
iw=np.ones(4)
for i in range(4):
    if i==win:
        iw[i]=-1.0
        
#print(iw)
#print()

I2=np.eye(4)

U2matrix = I2*iw

U2 = Operator(U2matrix)

qc = QuantumCircuit(2)
qc.h(0)
qc.h(1)
#qc.append(U2,[0,1])

qc.barrier()
if win==0 or win==2:
    qc.x(0)

if win==0 or win==1:
    qc.x(1)

qc.cz(0,1)

if win==0 or win==2:
    qc.x(0)

if win==0 or win==1:
    qc.x(1)
    
qc.barrier()

qc.h([0,1])
qc.z([0,1])
qc.cz(0,1)
qc.h([0,1])
display(qc.draw())
svsim = Aer.get_backend('statevector_simulator')
dstate = svsim.run(qc).result().get_statevector()
array_to_latex(dstate, prefix="\\text{output = }")
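
For two qubits a single round is enough to find the winning state with certainty, since $\sin\theta = 1/\sqrt{4}$ gives $3\theta = \pi/2$. The sketch below checks this for every choice of `win` using the matrix forms of the oracle and diffuser in NumPy; it is a classical cross-check, not a replacement for the circuit above.

```python
import numpy as np

N = 4
s = np.full(N, 1 / np.sqrt(N))           # uniform superposition |s>
U_s = 2 * np.outer(s, s) - np.eye(N)     # diffuser

final_states = []
for win in range(N):
    U_w = np.eye(N)
    U_w[win, win] = -1                   # phase oracle for this winning state
    final_states.append(U_s @ U_w @ s)   # one Grover iteration

probabilities = np.abs(np.array(final_states)) ** 2
print(np.round(probabilities, 3))        # row `win` is the outcome distribution
```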

In [None]:
import time
from sys import modules
from IPython.display import HTML, display
import qiskit
html = "<h3>Version Information</h3>"
html += "<table>"
html += "<tr><th>Software</th><th>Version</th></tr>"

packages = {"qiskit": qiskit.__version__}
qiskit_modules = {module.split(".")[0] for module in modules.keys() if "qiskit" in module}

for qiskit_module in qiskit_modules:
    packages[qiskit_module] = getattr(modules[qiskit_module], "__version__", None)

for name, version in packages.items():
    if version:
        html += f"<tr><td><code>{name}</code></td><td>{version}</td></tr>"

html += "<tr><td colspan='2'>%s</td></tr>" % time.strftime("%a %b %d %H:%M:%S %Y %Z")
html += "</table>"

display(HTML(html))