Author: Edoardo Gabrielli

Contact: gabrielli.1693726@studenti.uniroma1.it

Based on the work of P.Sudheer Kumar Reddy and G. Saini for the paper "Design of Carry Select Adder with Online Testability Using Reversible Gates, 2019".

In [2]:
from qiskit import QuantumCircuit, transpile, QuantumRegister, ClassicalRegister, Aer

# Define Gates

### Peres Gate

In [8]:
"""
Input: a, b, c

Output:
P = a
Q = a XOR b
R = (a AND b) XOR c
"""
c = QuantumCircuit(3)
c.ccx(0,1,2)
c.cx(0,1)
peres = c.to_gate(label="Peres")
c.draw()

In [9]:
def generate_peres(a,b,c):
    circuit = QuantumCircuit(3, 0)
    if (a == '1'):
        circuit.x(0)
    if (b == '1'):
        circuit.x(1)
    if (c == '1'):
        circuit.x(2)
    circuit.ccx(0,1,2)
    circuit.cx(0,1)
    circuit.measure_all()
    circuit = circuit.reverse_bits()
    return circuit

inputs = ['0', '1']
for i in range(len(inputs)):
    for j in range(len(inputs)):
        for k in range(len(inputs)):
            c = generate_peres(inputs[i], inputs[j], inputs[k])
            usim = Aer.get_backend('unitary_simulator')
            transpiled = transpile(c, backend=usim)
            backend = Aer.get_backend('aer_simulator')
            job = backend.run(transpiled, shots=1, memory=True)
            output = job.result().get_memory()[0]
            print("in: "+inputs[i]+inputs[j]+inputs[k]+" r: "+output)

in: 000 r: 000
in: 001 r: 001
in: 010 r: 010
in: 011 r: 011
in: 100 r: 110
in: 101 r: 111
in: 110 r: 101
in: 111 r: 100
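
As a cross-check that does not depend on the simulator, the Peres mapping can also be evaluated classically. The sketch below is a minimal classical model, assuming the outputs are P = a, Q = a XOR b and R = (a AND b) XOR c. The helper name `peres_classical` is hypothetical and not part of the notebook.

```python
from itertools import product

def peres_classical(a, b, c):
    """Classical model of the Peres gate on bits a, b, c (hypothetical helper)."""
    p = a              # first line is passed through
    q = a ^ b          # CNOT: a XOR b
    r = (a & b) ^ c    # Toffoli: (a AND b) XOR c
    return p, q, r

# Print the full truth table in the same "in: ... r: ..." layout used above
for a, b, c in product((0, 1), repeat=3):
    p, q, r = peres_classical(a, b, c)
    print(f"in: {a}{b}{c} r: {p}{q}{r}")
```

If the model is right, the printed rows should agree with the simulator output above, and the eight outputs should all be distinct, as expected for a reversible gate.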


In [10]:
c.draw()

### Full Adder

In [302]:
"""
Input: 
a = 1st operand, 
b = 2nd operand, 
c = 0, 
d = c_in

Output: 
P = garbage, 
Q = garbage, 
R = sum, 
T = carry
"""
c = QuantumCircuit(4)
c.append(peres, [0,1,2])
c.swap(2,3)
c.append(peres, [1,2,3])
#c = c.reverse_bits()
fa = c.to_gate(label="RFA")
c.draw()

In [303]:
c = QuantumCircuit(4)
c.append(fa, [0,1,2,3])
transpiled = transpile(c, backend=usim)
transpiled.draw()

In [304]:
def generate_fa(a,b,d):
    '''
    a = first addend
    b = second addend
    d = carry input
    '''
    circuit = QuantumCircuit(4, 2)
    if (a == '1'):
        circuit.x(0)
    if (b == '1'):
        circuit.x(1)
    if (d == '1'):
        circuit.x(3)
    circuit.append(peres, [0,1,2])
    circuit.append(peres, [1,3,2])
    circuit.measure(3,0)
    circuit.measure(2,1)
    #circuit.measure_all()
    return circuit

inputs = ['0','1']
for i in range(len(inputs)):
    for j in range(len(inputs)):
        for k in range(len(inputs)):
            c = generate_fa(inputs[i], inputs[j], inputs[k])
            usim = Aer.get_backend('unitary_simulator')
            transpiled = transpile(c, backend=usim)
            backend = Aer.get_backend('aer_simulator')
            job = backend.run(transpiled, shots=1, memory=True)
            output = job.result().get_memory()[0]
            print("in: "+inputs[i]+inputs[j]+inputs[k]+" r: "+output)

in: 000 r: 00
in: 001 r: 01
in: 010 r: 01
in: 011 r: 10
in: 100 r: 01
in: 101 r: 10
in: 110 r: 10
in: 111 r: 11
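
The way two cascaded Peres gates produce a full adder can be followed bit by bit with a classical model. The sketch below assumes the wiring used in `generate_fa` (first Peres on a, b and a zero line; second Peres on the intermediate a XOR b, the carry in, and the intermediate a AND b). The names `peres_classical` and `rfa_classical` are hypothetical helpers, not part of the notebook.

```python
from itertools import product

def peres_classical(a, b, c):
    """Classical Peres mapping: (a, a XOR b, (a AND b) XOR c)."""
    return a, a ^ b, (a & b) ^ c

def rfa_classical(a, b, c_in):
    """Classical model of the reversible full adder; returns (carry, sum)."""
    # First Peres on (a, b, 0): second output is a XOR b, third is a AND b
    _, a_xor_b, a_and_b = peres_classical(a, b, 0)
    # Second Peres on (a XOR b, c_in, a AND b):
    # second output is the sum, third output is the carry
    _, s, carry = peres_classical(a_xor_b, c_in, a_and_b)
    return carry, s

for a, b, c_in in product((0, 1), repeat=3):
    carry, s = rfa_classical(a, b, c_in)
    print(f"in: {a}{b}{c_in} r: {carry}{s}")
```

The output is printed as carry followed by sum, which is the same bit order as the measured strings above.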


# CSA Design-1

Each stage of the CSA can be designed as follows:

In [305]:
"""
This circuit represents a single stage of the CSA.

Input:
1 = first operand, represents "a"
2 = second operand, represents "b"
3 = 0, ancilla, copy of "a"
4 = 0, ancilla, copy of "b"
5 = 0, represents "c=0"
6 = 1, represents "c=1"
7 = 0, ancilla
8 = 0, ancilla
9 = input carry
10 = 0, ancilla

Output:
1 = g
2 = g
3 = g
4 = g
5 = carry
6 = g
7 = sum
8 = g
9 = carry
10 = g

5 ancilla bits and 8 garbage bits if we include the copy of the carry.
"""

a_in = QuantumRegister(1, name="a")
b_in = QuantumRegister(1, name="b")
a_copies = QuantumRegister(1, name="a'")
b_copies = QuantumRegister(1, name="b'")
carries = QuantumRegister(2, name="c")
zero = QuantumRegister(2, name="zero")
c_in = QuantumRegister(2, name="c_in")

csa = QuantumCircuit(a_in, b_in, a_copies, b_copies, carries, zero, c_in)
csa.reset(a_copies[0])
csa.reset(b_copies[0])
csa.cx(a_in[0], a_copies[0])                                        # a' = a
csa.cx(b_in[0], b_copies[0])                                        # b' = b
csa.reset(carries[0])                                               # c = 0
csa.reset(carries[1])
csa.x(carries[1])                                                   # c = 1
csa.reset(zero[0])
csa.reset(zero[1])
csa.reset(c_in[1])                                                  # c_in' = c_in

csa.cx(c_in[0], c_in[1])
csa.append(fa, [a_in[0], b_in[0], zero[0], carries[0]])
csa.append(fa, [a_copies[0], b_copies[0], zero[1], carries[1]])
#csa.append(fredkin, [c_in[1], zero[0], zero[1]])                    # mux for the sum qubits
#csa.append(fredkin, [c_in[0], carries[0], carries[1]])              # mux for the carry qubits
csa.fredkin(c_in[1], zero[0], zero[1])
csa.fredkin(c_in[0], carries[0], carries[1])

csa.reset(c_in[0])                                                  # Recycle c_in for the output of buffers

csa.cx(carries[0], c_in[0])                                         # Buffer

csa.draw()

Note that although the logic gates are represented in series, the circuit will be re-arranged by the transpiler in order to execute gates in parallel where possible. For instance, if you swap the position of "b" with "a'" it is clear that the two CNOTs at the beginning of the circuit can be run simultaneously.
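
To make the remark about parallel execution concrete, here is a small, simplified model of circuit depth in plain Python. It is only a sketch under the assumption that every gate takes one time step and starts as soon as all of its qubits are free; it is not how the transpiler is implemented, and the function name `circuit_depth` is hypothetical.

```python
def circuit_depth(gates):
    """Depth of a gate list when each gate is scheduled as early as possible.

    `gates` is a list of tuples of qubit names; every gate costs one layer.
    """
    level = {}   # qubit name -> index of the last layer that used it
    depth = 0
    for qubits in gates:
        # A gate goes one layer after the latest layer touching any of its qubits
        layer = 1 + max((level.get(q, 0) for q in qubits), default=0)
        for q in qubits:
            level[q] = layer
        depth = max(depth, layer)
    return depth

# The two copy CNOTs act on disjoint qubits, so they share one layer.
print(circuit_depth([("a", "a'"), ("b", "b'")]))    # 1
# Two CNOTs that share a qubit must run one after the other.
print(circuit_depth([("b", "b'"), ("b", "b''")]))   # 2
```

The order in which gates are drawn does not matter in this model: only shared qubits force gates into different layers.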

The single-stage CSA gate has 10 inputs and 10 outputs, and there are different options for combining the stages:

A first approach is to mirror the classical CSA, exploiting the fact that all the FA operations can be done in parallel, at the cost of using many qubits. To build a two-stage CSA, the input qubits "i" and "l" of the second stage (the carries propagated by the first stage) must be connected to the outputs T and X of the first one. To force the parallel execution of the four RFAs, we have to use 8 qubits for the first stage and 8 more qubits for the second stage. The total number of qubits used to implement the circuit is therefore 8n+2.

The number of garbage bits is 8n+2-(n+1) = 7n+1, where the n+1 useful outputs are the n bits of the result plus 1 carry out. The ancilla bits are 5n: n copies of "a", n copies of "b", 2n zeros and n copies of the input carries.
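
The bookkeeping above can be tabulated with a few lines of Python. This is only a sketch that evaluates the formulas stated in the text (8n+2 qubits, n+1 useful outputs, 5n ancilla bits); the function name `design1_parallel_resources` is hypothetical.

```python
def design1_parallel_resources(n):
    """Qubit counts for the parallel n-stage Design-1 CSA, per the formulas in the text."""
    total = 8 * n + 2
    outputs = n + 1               # n sum bits plus the carry out
    garbage = total - outputs     # simplifies to 7n + 1
    ancilla = 5 * n
    return {"qubits": total, "garbage": garbage, "ancilla": ancilla}

for n in (1, 2, 4, 8):
    print(n, design1_parallel_resources(n))
```

For n = 1 this gives 10 qubits, 8 garbage bits and 5 ancilla bits, which matches the counts listed for the single stage.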

Another approach is to run the stages in series, which reduces the number of qubits to 10 (fixed, regardless of the number of stages). In this way, however, the CSA loses all its advantages, because the delay grows with the number of stages.

In [307]:
transpiled = transpile(csa, backend=usim)
transpiled.draw()

In [267]:
transpiled.depth()

11

## Iterative construction of Design-1

Based on the discussion above, here is an iterative function that builds CSAs operating on n-qubit numbers. Measurements and output registers to store classical bits are added to test the circuit.

In [309]:
def generate_CSA1_n_qubits(n, a, b, c, opt=False, measure=False):
    '''
    n = length of addends
    a = first addend
    b = second addend
    c = carry in
    opt = if true, apply the optimization to save one qubit
    measure = if true, measure results and initialize qubits
    '''
    
    if len(a) != n or len(b) != n:
        return "Length of inputs differs from length of n."
    a = a[::-1]
    b = b[::-1]

    a_in = QuantumRegister(n, name="a")
    b_in = QuantumRegister(n, name="b")
    a_copies = QuantumRegister(n, name="a'")
    b_copies = QuantumRegister(n, name="b'")
    carries = QuantumRegister(n, name="c")
    carries_copies = QuantumRegister(n, name="c'")
    zero = QuantumRegister(n, name="zero")
    zero_copies = QuantumRegister(n, name="zero'")
    if (not opt):
        zero_buffers = QuantumRegister(n-1, name="zero_buff")
    c_in = QuantumRegister(2, name="c_in")

    output = ClassicalRegister(n+1, name='output')

    if (opt):
        csa = QuantumCircuit(a_in, b_in, a_copies, b_copies, carries, carries_copies, zero, zero_copies, c_in, output)
    else:
        csa = QuantumCircuit(a_in, b_in, a_copies, b_copies, carries, carries_copies, zero, zero_copies, zero_buffers, c_in, output)

    if (measure):
        if (c == '1'):  
            csa.x(c_in[0])
            
        for i in range(n):
            if (a[i] == '1'):
                csa.x(a_in[i])
            if (b[i] == '1'):
                csa.x(b_in[i]) 
            csa.reset(a_copies[i])
            csa.reset(b_copies[i])
            csa.cx(a_in[i], a_copies[i])                                                        # a' = a
            csa.cx(b_in[i], b_copies[i])                                                        # b' = b
            csa.reset(carries[i])                                                               # c = 0
            csa.reset(carries_copies[i])
            csa.x(carries_copies[i])                                                            # c = 1
            csa.reset(zero[i])
            csa.reset(zero_copies[i])
        csa.reset(c_in[1])                                                                      # c_in' = c_in
        csa.cx(c_in[0], c_in[1])
    
    for i in range(n):
        csa.append(fa, [a_in[i], b_in[i], zero[i], carries[i]])
        csa.append(fa, [a_copies[i], b_copies[i], zero_copies[i], carries_copies[i]])
        if i == 0:
            csa.fredkin(c_in[1], zero[i], zero_copies[i])                                    # mux for the sum qubits
        else:
            csa.fredkin(carries[i-1], zero[i], zero_copies[i])
        if (opt or i == 0):
            csa.fredkin(c_in[0], carries[i], carries_copies[i])                                  # mux for the carry qubits
        else:
            csa.fredkin(zero_buffers[i-1], carries[i], carries_copies[i])

        if (measure):
            csa.measure(zero[i], output[i])

        if (i < n-1):
            if (opt):
                csa.reset(c_in[0])                                                              # Recycle c_in for the output of buffers
                csa.cx(carries[i], c_in[0])                                                     # Buffer
            else:
                csa.cx(carries[i], zero_buffers[i])
        else:
            if (measure):
                csa.measure(carries[i], output[i+1])                                            # Measure the carry out if it's the last stage
    #csa.measure_all()
    return csa

This is what a two-qubit CSA looks like:

In [265]:
csa2q = generate_CSA1_n_qubits(2, '00', '00', '0')
csa2q.draw()

## Correctness

Now I am going to test whether the CSA exactly reproduces the truth table of a classical one-bit CSA, that is:

    a   b   c   |   out
    --------------------
    0   0   0   |   00
    0   0   1   |   01
    0   1   0   |   01
    0   1   1   |   10
    1   0   0   |   01
    1   0   1   |   10
    1   1   0   |   10
    1   1   1   |   11

Where the MSB of "out" is the carry out and the LSB is the sum. As you can see from the result below, the CSA works properly:
    

In [311]:
inputs = ['0', '1']
for i in range(len(inputs)):
    for j in range(len(inputs)):
        for k in range(len(inputs)):
            csa2q = generate_CSA1_n_qubits(1, inputs[i], inputs[j], inputs[k], measure=True)
            usim = Aer.get_backend('unitary_simulator')
            transpiled = transpile(csa2q, backend=usim)
            backend = Aer.get_backend('aer_simulator')
            job = backend.run(transpiled, shots=1, memory=True)
            output = job.result().get_memory()[0]
            print("a: "+inputs[i]+" b: "+inputs[j]+" c_in: "+inputs[k]+" r: "+output)

a: 0 b: 0 c_in: 0 r: 00
a: 0 b: 0 c_in: 1 r: 01
a: 0 b: 1 c_in: 0 r: 01
a: 0 b: 1 c_in: 1 r: 10
a: 1 b: 0 c_in: 0 r: 01
a: 1 b: 0 c_in: 1 r: 10
a: 1 b: 1 c_in: 0 r: 10
a: 1 b: 1 c_in: 1 r: 11


And this is the version with the optimization:

In [313]:
inputs = ['0', '1']
for i in range(len(inputs)):
    for j in range(len(inputs)):
        for k in range(len(inputs)):
            csa2q = generate_CSA1_n_qubits(1, inputs[i], inputs[j], inputs[k], opt=True, measure=True)
            usim = Aer.get_backend('unitary_simulator')
            transpiled = transpile(csa2q, backend=usim)
            backend = Aer.get_backend('aer_simulator')
            job = backend.run(transpiled, shots=1, memory=True)
            output = job.result().get_memory()[0]
            print("a: "+inputs[i]+" b: "+inputs[j]+" c_in: "+inputs[k]+" r: "+output)

a: 0 b: 0 c_in: 0 r: 00
a: 0 b: 0 c_in: 1 r: 01
a: 0 b: 1 c_in: 0 r: 01
a: 0 b: 1 c_in: 1 r: 10
a: 1 b: 0 c_in: 0 r: 01
a: 1 b: 0 c_in: 1 r: 10
a: 1 b: 1 c_in: 0 r: 10
a: 1 b: 1 c_in: 1 r: 11


Let's now test the two-qubit CSA; this one also works as intended:

In [314]:
inputs = ['00', '01', '10', '11']
carries = ['0', '1']
for i in range(len(inputs)):
    for j in range(len(inputs)):
        csa2q = generate_CSA1_n_qubits(2, inputs[i], inputs[j], '0', opt=False, measure=True)
        usim = Aer.get_backend('unitary_simulator')
        transpiled = transpile(csa2q, backend=usim)
        backend = Aer.get_backend('aer_simulator')
        job = backend.run(transpiled, shots=1, memory=True)
        output = job.result().get_memory()[0]
        print("a: "+inputs[i]+" b: "+inputs[j]+" r: "+output)

a: 00 b: 00 r: 000
a: 00 b: 01 r: 001
a: 00 b: 10 r: 010
a: 00 b: 11 r: 011
a: 01 b: 00 r: 001
a: 01 b: 01 r: 010
a: 01 b: 10 r: 011
a: 01 b: 11 r: 100
a: 10 b: 00 r: 010
a: 10 b: 01 r: 011
a: 10 b: 10 r: 100
a: 10 b: 11 r: 101
a: 11 b: 00 r: 011
a: 11 b: 01 r: 100
a: 11 b: 10 r: 101
a: 11 b: 11 r: 110
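
The simulator results can be compared against a purely classical carry-select model. The sketch below assumes the usual carry-select idea described in this notebook: each stage prepares a result for carry in = 0 and one for carry in = 1, and the incoming carry selects between them (the role played by the Fredkin gates). The function name `carry_select_add` is hypothetical.

```python
from itertools import product

def carry_select_add(a, b, c_in="0"):
    """Classical carry-select model; a and b are MSB-first bit strings of equal length."""
    carry = int(c_in)
    sum_bits = []
    for x, y in zip(reversed(a), reversed(b)):   # process the LSB first
        x, y = int(x), int(y)
        # Candidate results for both possible incoming carries
        s0, c0 = x ^ y, x & y                    # assuming carry in = 0
        s1, c1 = x ^ y ^ 1, x | y                # assuming carry in = 1
        # Multiplexer step: the actual incoming carry picks one candidate pair
        s, carry = (s1, c1) if carry else (s0, c0)
        sum_bits.append(str(s))
    # Carry out is the MSB of the result, as in the measured strings
    return str(carry) + "".join(reversed(sum_bits))

inputs = ["00", "01", "10", "11"]
for a, b in product(inputs, repeat=2):
    print("a: " + a + " b: " + b + " r: " + carry_select_add(a, b))
```

The printed lines use the same layout as the two-qubit test above, so the two listings can be compared directly.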


## Measurements

### Depth

    Baseline depth of the Ripple Carry Adder implementation of Cuccaro et al. (2004): 2n+6 (without measurements)

    Expectations:                           Reality:
    OPT   MEASURE   |   RES                 OPT   MEASURE   |   RES
    ------------------------                ------------------------
    F       F       |   2n+3                F       F       |   2n+2
    F       T       |   2n+5                F       T       |   2n+4
    T       F       |   3n+3                T       F       |   3n+1
    T       T       |   3n+5                T       T       |   3n+3

    Authors' RCA depth (no measurements and, of course, no optimization): 2n+3

In [360]:
def evluate_circuit(n, circuitGenerator):
    '''
    Evaluate circuit which accepts addends of length n.
    '''
    input = '0' * n
    circuit = circuitGenerator(n, input, input, '0', opt=True, measure=False)
    usim = Aer.get_backend('unitary_simulator')
    transpiled = transpile(circuit, backend=usim, optimization_level=3)
    print("Depth of circuit with addends of "+ str(n) +" qubits is: "+str(transpiled.depth()))

The depth of the circuit should be given by the following contributions:
- The **RFAs** take 4 steps, regardless of the length of the addends, since they can all be run in parallel;
- The **Fredkin** gates of a stage take 1 step (they run in parallel within the same stage), and so does the **CNOT** gate used as a buffer. The stages run in series, so in total this is 2n-1 steps (the last stage has no CNOT);
- **Initialization of qubits** takes 1 step;
- **Measurements** add 1 step;
- The reset of c_in[0], used to save one wire, adds n steps.

With everything included, the total is 4 + (2n - 1) + 1 + 1 + n = 3n + 5. Without the reset of c_in[0] it is 2n + 5, and if we also remove the measurements and the initializations (which are not considered in the authors' circuit) we get **2n + 3**.

Total in authors' circuit: 2n - 1 + 4 = **2n + 3**
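
The expected depths for the four combinations of `opt` and `measure` follow directly from these contributions. The sketch below simply adds them up; it assumes the step counts given in the text (4 for the RFAs, 2n-1 for the Fredkin/CNOT chain, 1 each for initialization and measurement, n for the resets), and the function name `expected_depth` is hypothetical.

```python
def expected_depth(n, opt=False, measure=False):
    """Expected depth of the Design-1 CSA, summing the contributions listed in the text."""
    depth = 4 + (2 * n - 1)   # parallel RFAs, then the serial Fredkin/CNOT chain
    if measure:
        depth += 2            # 1 step of initialization + 1 step of measurement
    if opt:
        depth += n            # resets of c_in[0], counted as n steps in the text
    return depth

for opt in (False, True):
    for measure in (False, True):
        values = [expected_depth(n, opt, measure) for n in (2, 3, 4)]
        print("opt:", opt, "measure:", measure, "depth for n=2,3,4:", values)
```

These are the expected values only; the depths reported by the transpiler are the ones printed by the evaluation cells.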

In [362]:
for i in range(2,17):
    evluate_circuit(i, generate_CSA1_n_qubits)

Depth of circuit with addends of 2 qubits is: 7
Depth of circuit with addends of 3 qubits is: 10
Depth of circuit with addends of 4 qubits is: 13
Depth of circuit with addends of 5 qubits is: 16
Depth of circuit with addends of 6 qubits is: 19
Depth of circuit with addends of 7 qubits is: 22
Depth of circuit with addends of 8 qubits is: 25
Depth of circuit with addends of 9 qubits is: 28
Depth of circuit with addends of 10 qubits is: 31
Depth of circuit with addends of 11 qubits is: 34
Depth of circuit with addends of 12 qubits is: 37
Depth of circuit with addends of 13 qubits is: 40
Depth of circuit with addends of 14 qubits is: 43
Depth of circuit with addends of 15 qubits is: 46
Depth of circuit with addends of 16 qubits is: 49

The run below repeats the same evaluation with `opt=False` set inside `evluate_circuit` (that is, without recycling c_in[0]):


In [359]:
for i in range(2,17):
    evluate_circuit(i, generate_CSA1_n_qubits)

Depth of circuit with addends of 2 qubits is: 6
Depth of circuit with addends of 3 qubits is: 8
Depth of circuit with addends of 4 qubits is: 10
Depth of circuit with addends of 5 qubits is: 12
Depth of circuit with addends of 6 qubits is: 14
Depth of circuit with addends of 7 qubits is: 16
Depth of circuit with addends of 8 qubits is: 18
Depth of circuit with addends of 9 qubits is: 20
Depth of circuit with addends of 10 qubits is: 22
Depth of circuit with addends of 11 qubits is: 24
Depth of circuit with addends of 12 qubits is: 26
Depth of circuit with addends of 13 qubits is: 28
Depth of circuit with addends of 14 qubits is: 30
Depth of circuit with addends of 15 qubits is: 32
Depth of circuit with addends of 16 qubits is: 34


The depth is lower than expected because the transpiler realizes that, while computing the last step of the RFAs (the one with CNOTs), it can already run some of the Fredkin gates.

In [349]:
n=2
input = '0' * n
circuit = generate_CSA1_n_qubits(n, input, input, '0', measure=True)
usim = Aer.get_backend('unitary_simulator')
transpiled = transpile(circuit, backend=usim, optimization_level=3)
print("Depth of circuit with addends of "+ str(n) +" qubits is: "+str(transpiled.depth()))
transpiled.draw()

Depth of circuit with addends of 2 qubits is: 8


### Number of basis gates

# CSA Design-2

This design drops the Full Adder gates in favour of gates with a lower quantum cost. According to the authors, this design should also be cheaper in terms of ancilla bits, but it is actually more expensive, as shown below. This may be because my implementation is not optimal, or because the authors did not actually implement the circuit. According to their scheme (Fig. 8 in the paper), the ancilla bits are:

1. The zero in the first Peres gate
2. The one in the second Peres gate
3. The zero in the CNOT (buffer)

Note that the authors seem to give ancilla bits a different meaning in this design: had the meaning been the same, the number of ancilla bits would not have been different. The authors also do not take into account that, in order to run the CNOT gates and the Peres gates in parallel (as the CLA would, and as the paper suggests), the input bits "a" and "b" must be copied onto extra lines, as I did in the first design. The alternative that lowers the ancilla bits (and thus the garbage bits) is to run some of the gates in series, which increases the delay of the circuit. This is not mentioned in the paper, so I do not consider this option.

As for garbage bits, I managed to get 9 bits per stage, because I recycle one line to provide the zero input to the CNOT buffer.

In total I have 10n+2 qubits, 8n+1 ancilla bits and 9n+1 garbage bits.

Even though the results are not the same as in the paper, I agree with the authors when they say this design is better. As a matter of fact, the circuit uses fewer and simpler gates: 3n CNOTs, 2n Peres and 2n Fredkins (ignoring initialization gates, which can be omitted). In contrast, Design-1 uses 2n RFAs (4n Peres + 2n swap gates), 2n Fredkins and n-1 CNOTs. So, using more qubits, we can have a faster, simpler and cheaper (in terms of quantum cost) Carry Select Adder.
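
The gate counts quoted above can be put side by side for a given n. The sketch below only evaluates the formulas stated in the text and does not inspect the actual circuits; the function names `design1_gates` and `design2_gates` are hypothetical.

```python
def design1_gates(n):
    """Gate counts for Design-1: 2n RFAs (each 2 Peres + 1 swap), 2n Fredkins, n-1 CNOTs."""
    return {"peres": 4 * n, "swap": 2 * n, "fredkin": 2 * n, "cnot": n - 1}

def design2_gates(n):
    """Gate counts for Design-2: 3n CNOTs, 2n Peres, 2n Fredkins (initialization ignored)."""
    return {"cnot": 3 * n, "peres": 2 * n, "fredkin": 2 * n}

for n in (1, 4, 16):
    d1, d2 = design1_gates(n), design2_gates(n)
    print(f"n={n}: Design-1 {d1} (total {sum(d1.values())}), "
          f"Design-2 {d2} (total {sum(d2.values())})")
```

Under these formulas Design-2 always uses 2n fewer Peres gates and no swap gates, at the price of 2n+1 additional CNOTs.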

In [116]:
"""
This circuit represents a single stage of the CSA.

Input:
1 = first operand, represents "a"
2 = second operand, represents "b"
11 = input carry

Initialization of other states (ancilla):
3 = copy of a
4 = copy of b
5 = second copy of b
6 = not a
7 = not b
8 = copy of not a
9 = 0
10 = 1
12 = copy of input carry

Output:
1 = g
2 = sum
3 = g
4 = g
5 = g
6 = g
7 = copy of carry out
8 = g
9 = carry out
10 = g
11 = g
12 = g

9 ancilla bits and 9 garbage bits.
"""

a_in = QuantumRegister(1, name="a")
b_in = QuantumRegister(1, name="b")
a_copies = QuantumRegister(1, name="a'")
b_copies = QuantumRegister(1, name="b'")
b_copies2 = QuantumRegister(1, name="b''")
a_not = QuantumRegister(1, name="a_not")
b_not = QuantumRegister(1, name="b_not")
a_not_copies = QuantumRegister(1, name="a_not'")
zero = QuantumRegister(1, name="zero")
one = QuantumRegister(1, name="one")
c_in = QuantumRegister(2, name="c_in")

csa2 = QuantumCircuit(a_in, b_in, a_copies, b_copies, b_copies2, a_not, b_not, a_not_copies, zero, one, c_in)

# Initialization of qubit states
csa2.reset(a_copies[0])
csa2.reset(b_copies[0])
csa2.reset(b_copies2[0])
csa2.cx(a_in[0], a_copies[0])
csa2.cx(b_in[0], b_copies[0])
csa2.cx(b_in[0], b_copies2[0])
csa2.reset(a_not[0])
csa2.cx(a_in[0], a_not[0])
csa2.reset(b_not[0])
csa2.cx(b_in[0], b_not[0])
csa2.x(a_not[0])
csa2.x(b_not[0])
csa2.reset(a_not_copies[0])
csa2.cx(a_not[0], a_not_copies[0])
csa2.reset(zero[0])
csa2.reset(one[0])
csa2.x(one[0])
csa2.reset(c_in[1])                                                 # c_in[0] holds the input carry, so only its copy is reset
csa2.cx(c_in[0], c_in[1])                                           # c_in' = c_in

# Circuit
csa2.barrier()
csa2.cx(a_in[0], b_in[0])
csa2.cx(a_not[0], b_copies[0])
#csa2.append(fredkin, [c_in[0], b_in[0], b_copies[0]])               # Sum is on b_in[0]
csa2.fredkin(c_in[0], b_in[0], b_copies[0])

csa2.append(peres, [a_copies[0], b_copies2[0], zero[0]])            # Carry 1 is on zero[0]
csa2.append(peres, [a_not_copies[0], b_not[0], one[0]])             # Carry 2 is on one[0]
#csa2.append(fredkin, [c_in[1], zero[0], one[0]])                    # Carry out is on zero[0]
csa2.fredkin(c_in[1], zero[0], one[0])

csa2.reset(b_not[0])                                                # Recycle a garbage bit generated by Peres gate
csa2.cx(zero[0], b_not[0])                                          # A copy of the carry out is in b_not[0]

csa2.draw()

In [117]:
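def generate_CSA2_n_qubits(n, a, b, c, opt=False, measure=True):
    '''
    n = length of addends
    a = first addend
    b = second addend
    c = carry in
    opt, measure = ignored; accepted only so that evluate_circuit can call
                   this generator in the same way as generate_CSA1_n_qubits
    '''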
    if len(a) != n or len(b) != n:
        return "Length of inputs differs from length of n."
    a = a[::-1]
    b = b[::-1]
    
    a_in = QuantumRegister(n, name="a")
    b_in = QuantumRegister(n, name="b")
    a_copies = QuantumRegister(n, name="a'")
    b_copies = QuantumRegister(n, name="b'")
    b_copies2 = QuantumRegister(n, name="b''")
    a_not = QuantumRegister(n, name="a_not")
    b_not = QuantumRegister(n, name="b_not")
    a_not_copies = QuantumRegister(n, name="a_not'")
    zero = QuantumRegister(n, name="zero")
    one = QuantumRegister(n, name="one")
    c_in = QuantumRegister(2, name="c_in")
    output = ClassicalRegister(n+1, name="output")

    csa2 = QuantumCircuit(a_in, b_in, a_copies, b_copies, b_copies2, a_not, b_not, a_not_copies, zero, one, c_in, output)

    if (c == '1'):
        csa2.x(c_in[0])
    for i in range(n):
        # Initialization of qubit states
        if (a[i] == '1'):
            csa2.x(a_in[i])
        if (b[i] == '1'):
            csa2.x(b_in[i])
        csa2.reset(a_copies[i])
        csa2.reset(b_copies[i])
        csa2.reset(b_copies2[i])
        csa2.cx(a_in[i], a_copies[i])
        csa2.cx(b_in[i], b_copies[i])
        csa2.cx(b_in[i], b_copies2[i])
        csa2.reset(a_not[i])
        csa2.cx(a_in[i], a_not[i])
        csa2.reset(b_not[i])
        csa2.cx(b_in[i], b_not[i])
        csa2.x(a_not[i])
        csa2.x(b_not[i])
        csa2.reset(a_not_copies[i])
        csa2.cx(a_not[i], a_not_copies[i])
        csa2.reset(zero[i])
        csa2.reset(one[i])
        csa2.x(one[i])
    csa2.reset(c_in[1])
    csa2.cx(c_in[0], c_in[1])
    #csa2.barrier()
        
    for i in range(n):
        # Circuit
        csa2.cx(a_in[i], b_in[i])
        csa2.cx(a_not[i], b_copies[i])
        if i == 0:
            #csa2.append(fredkin, [c_in[0], b_in[i], b_copies[i]])               # Sum is on b_in[0]
            csa2.fredkin(c_in[0], b_in[i], b_copies[i])
        else:
            #csa2.append(fredkin, [zero[i-1], b_in[i], b_copies[i]])
            csa2.fredkin(zero[i-1], b_in[i], b_copies[i])
        
        csa2.measure(b_in[i], output[i])

        csa2.append(peres, [a_copies[i], b_copies2[i], zero[i]])                # Carry 1 is on zero[0]
        csa2.append(peres, [a_not_copies[i], b_not[i], one[i]])                 # Carry 2 is on one[0]
        if i == 0:
            #csa2.append(fredkin, [c_in[1], zero[i], one[i]])                    # Carry is on zero[0]
            csa2.fredkin(c_in[1], zero[i], one[i])
        else:
            #csa2.append(fredkin, [b_not[i-1], zero[i], one[i]])
            csa2.fredkin(b_not[i-1], zero[i], one[i])

        if (i < n-1):
            csa2.reset(b_not[i])                                                # Recycle a garbage bit generated by Peres gate
            csa2.cx(zero[i], b_not[i])                                          # A copy of the carry is in b_not[0]
    csa2.measure(zero[i], output[i+1])
    return csa2

The results of the additions are correct, as shown below:

In [118]:
inputs = ['0', '1']
for i in range(len(inputs)):
    for j in range(len(inputs)):
        for k in range(len(inputs)):
            csa2 = generate_CSA2_n_qubits(1, inputs[i], inputs[j], inputs[k])
            usim = Aer.get_backend('unitary_simulator')
            transpiled = transpile(csa2, backend=usim)
            backend = Aer.get_backend('aer_simulator')
            job = backend.run(transpiled, shots=1, memory=True)
            output = job.result().get_memory()[0]
            print("a: "+inputs[i]+" b: "+inputs[j]+" c_in: "+inputs[k]+" r: "+output)

a: 0 b: 0 c_in: 0 r: 00
a: 0 b: 0 c_in: 1 r: 01
a: 0 b: 1 c_in: 0 r: 01
a: 0 b: 1 c_in: 1 r: 10
a: 1 b: 0 c_in: 0 r: 01
a: 1 b: 0 c_in: 1 r: 10
a: 1 b: 1 c_in: 0 r: 10
a: 1 b: 1 c_in: 1 r: 11
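
The logic of a single Design-2 stage can also be checked classically. The sketch below assumes the wiring used in the circuit above: the two sum candidates come from CNOTs controlled by a and by NOT a, the two carry candidates are the third outputs of the Peres gates on (a, b, 0) and (NOT a, NOT b, 1), and the carry in selects between candidates. The helper names `peres_r` and `design2_stage` are hypothetical.

```python
from itertools import product

def peres_r(a, b, c):
    """Third output of the Peres gate: (a AND b) XOR c."""
    return (a & b) ^ c

def design2_stage(a, b, c_in):
    """Classical model of one Design-2 stage; returns (carry, sum)."""
    not_a, not_b = 1 - a, 1 - b
    sum0 = a ^ b                        # cx(a, b): sum for carry in = 0
    sum1 = not_a ^ b                    # cx(not a, b'): sum for carry in = 1
    carry0 = peres_r(a, b, 0)           # a AND b: carry for carry in = 0
    carry1 = peres_r(not_a, not_b, 1)   # equals a OR b: carry for carry in = 1
    # The Fredkin gates act as multiplexers controlled by the carry in
    s = sum1 if c_in else sum0
    carry = carry1 if c_in else carry0
    return carry, s

for a, b, c_in in product((0, 1), repeat=3):
    carry, s = design2_stage(a, b, c_in)
    print(f"a: {a} b: {b} c_in: {c_in} r: {carry}{s}")
```

The identity used for the second carry candidate is De Morgan's law: (NOT a AND NOT b) XOR 1 = a OR b.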


In [120]:
inputs = ['00', '01', '10', '11']
carries = ['0', '1']
for i in range(len(inputs)):
    for j in range(len(inputs)):
        csa2 = generate_CSA2_n_qubits(2, inputs[i], inputs[j], '0')
        usim = Aer.get_backend('unitary_simulator')
        transpiled = transpile(csa2, backend=usim)
        backend = Aer.get_backend('aer_simulator')
        job = backend.run(transpiled, shots=1, memory=True)
        output = job.result().get_memory()[0]
        print("a: "+inputs[i]+" b: "+inputs[j]+" r: "+output)

a: 00 b: 00 r: 000
a: 00 b: 01 r: 001
a: 00 b: 10 r: 010
a: 00 b: 11 r: 011
a: 01 b: 00 r: 001
a: 01 b: 01 r: 010
a: 01 b: 10 r: 011
a: 01 b: 11 r: 100
a: 10 b: 00 r: 010
a: 10 b: 01 r: 011
a: 10 b: 10 r: 100
a: 10 b: 11 r: 101
a: 11 b: 00 r: 011
a: 11 b: 01 r: 100
a: 11 b: 10 r: 101
a: 11 b: 11 r: 110


In [121]:
csa2.draw()

In [134]:
for i in range(2,17):
    evluate_circuit(i, generate_CSA2_n_qubits)

Depth of circuit with addends of 2 qubits is: 10
Depth of circuit with addends of 3 qubits is: 12
Depth of circuit with addends of 4 qubits is: 14
Depth of circuit with addends of 5 qubits is: 16
Depth of circuit with addends of 6 qubits is: 18
Depth of circuit with addends of 7 qubits is: 20
Depth of circuit with addends of 8 qubits is: 22
Depth of circuit with addends of 9 qubits is: 24
Depth of circuit with addends of 10 qubits is: 26
Depth of circuit with addends of 11 qubits is: 28
Depth of circuit with addends of 12 qubits is: 30
Depth of circuit with addends of 13 qubits is: 32
Depth of circuit with addends of 14 qubits is: 34
Depth of circuit with addends of 15 qubits is: 36
Depth of circuit with addends of 16 qubits is: 38


**Depth** of CSA Design-2 is 2*(n+3).