# Quantum-Enhanced DNA Sequencing

The second part of AlfiniQ begins with efficiently sequencing the DNA of the person whose futures we are attempting to determine. There are two primary techniques for DNA sequencing: reference-guided and de novo. We briefly describe both below.

## Reference-guided vs. De Novo DNA Sequencing

### Reference-guided DNA Sequencing
- **Explanation**: Reference-guided sequencing involves aligning sequence reads obtained from a sample against a known reference genome.
- **Process**: Sequence reads are compared to a pre-existing genome sequence serving as a reference to identify variations.
- **Advantages**:
  - **Speed**: Faster due to alignment to a known template rather than de novo assembly.
  - **Efficiency**: Requires less computational power and memory.
  - **Detection of Variations**: Efficient detection of variants by comparing to a known standard.

### De Novo DNA Sequencing
- **Explanation**: De Novo sequencing involves assembling the entire genome sequence from scratch without using a reference genome.
- **Process**: Piecing together short DNA reads into longer sequences to reconstruct the genome.
- **Advantages**:
  - **Unbiased**: Reveals novel genetic information without relying on existing references.
  - **Discovery**: Identifies novel genes, structural variations, and genetic features.

**Why Reference-guided Sequencing is Faster**:
Reference-guided sequencing is faster because it leverages existing information from a known genome sequence. Instead of reconstructing the entire genome from scratch, it aligns each read to its best-matching position on the "reference" human genome and inserts only the elements unique to the genome of interest, so that genome can be constructed quickly and accurately.
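
As a classical baseline for the alignment step described above, the sketch below slides a read along a reference and keeps the offset with the fewest mismatches. It is a toy illustration only: the function names `hamming_distance` and `best_alignment` and the example strings are assumptions made for this sketch, not part of AlfiniQ.

```python
def hamming_distance(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_alignment(reference, read):
    """Return (offset, mismatches) of the best placement of `read` on `reference`."""
    best_offset, best_dist = 0, len(read) + 1
    # Try every window of the reference that is as long as the read
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        dist = hamming_distance(window, read)
        if dist < best_dist:
            best_offset, best_dist = offset, dist
    return best_offset, best_dist


reference = "ACGTTGCAACGT"
print(best_alignment(reference, "TGCA"))  # (4, 0): exact match at offset 4
print(best_alignment(reference, "TGGA"))  # (4, 1): same place, one variant base
```

The loop visits one candidate offset at a time, which is the sequential search that the quantum approach discussed next aims to speed up.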

### How is quantum computing involved?

A researcher at the Democritus University of Thrace, working with the consulting firm Ernst and Young, developed a theoretical quantum algorithm that sequences DNA through reference guiding. Working from the theory described in that paper, we wrote our own algorithm, which constructs genomes in $O\left(\frac{n}{m\cdot k}\right)$ time, where $n$ is the number of reads (sequences unique to the person) that we are attempting to insert, $k$ is the number of sequences they are inserted into, and $m$ represents the superposition of the reads.

Specifically, the quantum algorithm described in the paper uses a quantum compare operator and a quantum shift operator to evaluate multiple potential alignments between a DNA read sequence and a reference sequence simultaneously. Classical algorithms check candidate alignments one after another; the quantum approach explores them concurrently through quantum parallelism. In theory this yields a quadratic speedup in the alignment search over traditional methods, which generally require linear or super-linear time. The advantage matters most for large genomic datasets, where the number of candidate alignments is very large.
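
To give a sense of scale for a quadratic speedup, the sketch below compares the number of candidate alignments a classical scan checks with the iteration count of a Grover-style search, which is roughly $\frac{\pi}{4}\sqrt{N}$ for one marked item among $N$. Treating the alignment search as Grover-style is an assumption made here for illustration, since the text only states that the speedup is quadratic. The function names are hypothetical.

```python
import math


def classical_alignment_checks(ref_len, read_len):
    """Number of offsets at which a read can be placed on the reference."""
    return ref_len - read_len + 1


def grover_style_iterations(num_candidates):
    """Approximate Grover iteration count for one marked item (assumed model)."""
    return max(1, math.floor((math.pi / 4) * math.sqrt(num_candidates)))


for ref_len in (1_000, 1_000_000):
    n = classical_alignment_checks(ref_len, read_len=100)
    print(f"reference={ref_len}: classical checks={n}, "
          f"Grover-style iterations={grover_style_iterations(n)}")
```

These counts ignore the cost of each oracle call and of loading the data, so they indicate the shape of the scaling rather than a runtime estimate.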




## Quantum algorithm for reference-guided DNA sequencing

We start with our relevant imports.

In [1]:
import pennylane as qml
from pennylane import numpy as np



### Encoding DNA Base to Binary

The `encode_dna_base` function encodes a DNA base into a binary representation. This encoding maps each DNA base ('A', 'C', 'G', 'T') to a corresponding 2-bit sequence.

### Function Description

The function takes a single argument `base`, representing a DNA base, and returns a binary representation encoded as a list of two integers.

- If `base` is 'A', the function returns `[0, 0]`.
- If `base` is 'C', the function returns `[0, 1]`.
- If `base` is 'G', the function returns `[1, 0]`.
- If `base` is 'T', the function returns `[1, 1]`.

This encoding scheme converts DNA bases into binary digits, facilitating further processing or analysis of DNA sequences.
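
The mapping above can also be written as a lookup table, which makes it easy to reject characters outside `A`, `C`, `G`, `T` and to decode bits back into a base. This is a minimal sketch under that assumption: `encode_base_checked` and `decode_bits` are hypothetical helpers, not functions used elsewhere in this notebook.

```python
# Same 2-bit encoding as described above, stored as a lookup table
BASE_TO_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_base_checked(base):
    """Encode one base as [bit0, bit1]; raise ValueError on anything else."""
    try:
        return list(BASE_TO_BITS[base.upper()])
    except KeyError:
        raise ValueError(f"unsupported DNA base: {base!r}") from None


def decode_bits(bits):
    """Inverse mapping: two bits back to the base letter."""
    return BITS_TO_BASE[tuple(bits)]


print(encode_base_checked("g"))  # [1, 0]
print(decode_bits([1, 1]))       # T
```

An explicit error is useful with real reads, which often contain ambiguity codes such as `N`.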

**Example Usage:**

```python
encoded = encode_dna_base('A')
print(encoded)  # Output: [0, 0]

encoded = encode_dna_base('T')
print(encoded)  # Output: [1, 1]
```

In [2]:
def encode_dna_base(base):
    if base == "A":
        return [0, 0]
    elif base == "T":
        return [1, 1]
    elif base == "C":
        return [0, 1]
    elif base == "G":
        return [1, 0]

### Defining a Quantum Device for DNA Sequencing

The `qml.device` function is used to define a quantum device (`dev`) for simulating quantum computations related to DNA sequencing.

### Function Description

The device (`dev`) is created with the `default.qubit` simulator backend. Its size is derived from a single value:
- `num_bases`: An integer representing the length of the DNA sequences (number of bases).

The `wires` argument allocates the qubits needed to encode and compare two sequences: `2 * num_bases * 2 + num_bases`

Here:
- `2 * num_bases * 2`: The qubits that encode the DNA bases. Each base requires 2 qubits, and two sequences of `num_bases` bases (the reference and the read) are held at the same time.
- `num_bases`: The ancilla qubits, with one ancilla qubit allocated per base position.

With `num_bases = 4`, this gives 16 + 4 = 20 wires.

**Example Usage:**

```python
import pennylane as qml

# Define the length of the DNA sequences
num_bases = 4

# Define a quantum device for simulating DNA sequencing
dev = qml.device('default.qubit', wires=2 * num_bases * 2 + num_bases)
```

In [3]:
# Define a device
num_bases = 4 # This is the length of the DNA sequences
dev = qml.device('default.qubit', wires=2 * num_bases * 2 + num_bases)  # Each base uses 2 qubits, plus one ancilla per base
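
The wire arithmetic above is easier to follow with the indices written out. The sketch below lists which wires hold the reference bases, the read bases, and the ancillas, using the same index expressions as the comparison circuit later in this notebook. `wire_layout` is a hypothetical helper introduced only for this illustration.

```python
def wire_layout(num_bases):
    """Return (reference wires, read wires, ancilla wires) for the comparison circuit."""
    # Reference base i occupies wires 2*i and 2*i + 1
    ref_wires = [(2 * i, 2 * i + 1) for i in range(num_bases)]
    # Read base i occupies the same offsets, shifted past the reference register
    read_wires = [(2 * num_bases + 2 * i, 2 * num_bases + 2 * i + 1)
                  for i in range(num_bases)]
    # One ancilla per base position, placed after both registers
    ancilla_wires = [2 * num_bases * 2 + i for i in range(num_bases)]
    return ref_wires, read_wires, ancilla_wires


ref_wires, read_wires, ancilla_wires = wire_layout(4)
print(ref_wires)      # [(0, 1), (2, 3), (4, 5), (6, 7)]
print(read_wires)     # [(8, 9), (10, 11), (12, 13), (14, 15)]
print(ancilla_wires)  # [16, 17, 18, 19]
```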

### Encoding a Sequence of DNA Bases to Qubit States

The `encode_dna_sequence` function converts a sequence of DNA bases into a list of 2-bit encodings using the `encode_dna_base` function. These classical bit pairs are later used to prepare the qubit states.

### Function Description

The function `encode_dna_sequence` takes `sequence`, a string or list of DNA bases, and calls `encode_dna_base(base)` on each base in order. It returns a list with one 2-bit encoding per base, for example `[[0, 0], [0, 1]]` for `"AC"`.

**Example Usage:**

```python
# Define a DNA sequence
dna_sequence = "ACGT"

# Encode the DNA sequence into a list of 2-bit encodings
qubit_states = encode_dna_sequence(dna_sequence)
print(qubit_states)  # Output: [[0, 0], [0, 1], [1, 0], [1, 1]]
```

In [4]:
# Function to encode a sequence of DNA bases to qubit states
def encode_dna_sequence(sequence):
    return [encode_dna_base(base) for base in sequence]

### Creating a Quantum Circuit to Compare DNA Sequences

The `dna_sequence_compare_circuit` function defines a quantum circuit using PennyLane (`qml.qnode`) for comparing two DNA sequences.

### Function Description

The function `dna_sequence_compare_circuit` is decorated with `@qml.qnode(dev)`, so it runs on the device defined earlier. It compares two DNA sequences (`ref_sequence` and `read_sequence`) of length `num_bases` by encoding them into qubits and comparing them position by position.

The quantum circuit performs the following operations:
1. **Encoding DNA Sequences**:
   - The input DNA sequences (`ref_sequence` and `read_sequence`) are encoded into qubit states using the `encode_dna_sequence` function.

2. **Qubit Initialization**:
   - For each base in the sequences (`num_bases` times):
     - Qubits representing the reference base and the read base are initialized based on their encoded representations.
     - PauliX gates (`qml.PauliX`) are applied to set specific qubits based on the binary encoding of DNA bases.

3. **Comparison using CNOT and Toffoli Gates**:
   - CNOT gates (`qml.CNOT`) XOR each reference qubit into the corresponding read qubit, so a read qubit ends in $|1\rangle$ exactly where that bit differs.
   - A Toffoli gate (`qml.Toffoli`) then flips the ancilla qubit for that position when both XOR results are 1, that is, when the two bases differ in both encoded bits (A vs. T, or C vs. G).

4. **Measurement**:
   - The final step returns the expectation value of `qml.PauliZ` on each ancilla qubit (wires `2*num_bases*2` to `2*num_bases*2 + num_bases - 1`) via `qml.expval`.

The function returns one expectation value per base position: `+1` when the ancilla was left in $|0\rangle$ and `-1` when it was flipped to $|1\rangle$. A value of `-1` therefore marks a mismatch in both bits. A mismatch in only one bit (for example A vs. C) leaves the ancilla at `+1`, so `+1` alone does not guarantee that the bases match.

**Example Usage:**

```python
import pennylane as qml

# Define the length of the DNA sequences
num_bases = 4

# The QNode is bound to this device, so it must exist before the circuit is defined
dev = qml.device('default.qubit', wires=2 * num_bases * 2 + num_bases)
```

In [5]:
@qml.qnode(dev)
def dna_sequence_compare_circuit(ref_sequence, read_sequence):
    ref_encoded = encode_dna_sequence(ref_sequence)
    read_encoded = encode_dna_sequence(read_sequence)

    # Initialize the qubits based on DNA base encoding
    for i in range(num_bases):
        # Set the qubits for the reference base
        if ref_encoded[i][0] == 1:
            qml.PauliX(wires=i*2)
        if ref_encoded[i][1] == 1:
            qml.PauliX(wires=i*2 + 1)
        
        # Set the qubits for the read base
        if read_encoded[i][0] == 1:
            qml.PauliX(wires=num_bases*2 + i*2)
        if read_encoded[i][1] == 1:
            qml.PauliX(wires=num_bases*2 + i*2 + 1)

        # Apply CNOT gates for comparison
        qml.CNOT(wires=[i*2, num_bases*2 + i*2])
        qml.CNOT(wires=[i*2 + 1, num_bases*2 + i*2 + 1])
        
        # Use Toffoli to set the ancilla
        qml.Toffoli(wires=[num_bases*2 + i*2, num_bases*2 + i*2 + 1, 2*num_bases*2 + i])

    # Measure the ancilla qubits
    return [qml.expval(qml.PauliZ(i)) for i in range(2*num_bases*2, 2*num_bases*2 + num_bases)]
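
Because every input here is a computational basis state, the gate pattern in the circuit can be traced with ordinary bits. The sketch below is a classical trace, not a quantum simulation: it reproduces the CNOT plus Toffoli logic for one base pair and contrasts it with a hypothetical variant that flags any mismatch by OR-ing the two XOR results. The names `circuit_flag`, `any_mismatch_flag`, and `expectation` are assumptions made for this sketch.

```python
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def circuit_flag(ref_base, read_base):
    """Ancilla bit produced by the CNOT + Toffoli pattern for one base pair."""
    d0 = BITS[ref_base][0] ^ BITS[read_base][0]  # first CNOT: XOR of the first bits
    d1 = BITS[ref_base][1] ^ BITS[read_base][1]  # second CNOT: XOR of the second bits
    return d0 & d1                               # Toffoli: set only if both bits differ


def any_mismatch_flag(ref_base, read_base):
    """Hypothetical variant: set the flag if either bit differs (OR, not AND)."""
    d0 = BITS[ref_base][0] ^ BITS[read_base][0]
    d1 = BITS[ref_base][1] ^ BITS[read_base][1]
    return d0 | d1


def expectation(flag):
    """PauliZ expectation of a basis-state ancilla: 0 -> +1.0, 1 -> -1.0."""
    return 1.0 - 2.0 * flag


ref, read = "TGCA", "ATGC"
print([expectation(circuit_flag(r, q)) for r, q in zip(ref, read)])
# [-1.0, 1.0, -1.0, 1.0]: only T/A and C/G are flagged
print([expectation(any_mismatch_flag(r, q)) for r, q in zip(ref, read)])
# [-1.0, -1.0, -1.0, -1.0]: every position differs
```

In a circuit, one way to obtain the OR variant would be to apply X gates to both XOR qubits before the Toffoli and to the ancilla after it, since OR(a, b) = NOT(AND(NOT a, NOT b)). That change is not part of the circuit above.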


### Example DNA Sequences for Quantum Comparison

The following code defines example DNA sequences (`ref_sequences` and `read_sequences`) used for demonstrating quantum comparison methods.

### Description

The `ref_sequences` list contains multiple reference DNA sequences, each represented as a list of DNA bases (`"A"`, `"T"`, `"G"`, `"C"`).

Despite its plural name, the `read_sequences` list holds a single read DNA sequence, also as a list of DNA bases.

**Example Usage:**

```python
# Define example reference DNA sequences
ref_sequences = [
    ["A", "T", "G", "C"],
    ["A", "C", "G", "T"],
    ["T", "G", "C", "A"],
    ["A", "T", "G", "C"],
    ["C", "G", "T", "A"],
    ["T", "C", "G", "A"],
    ["A", "G", "T", "C"]
]

# Define a single read DNA sequence (named `read_sequences` to match the cells below)
read_sequences = ["A", "T", "G", "C"]
```

In [6]:
# Example DNA sequences
ref_sequences = [["A", "T", "G", "C"], ["A", "C", "G", "T"], ["T", "G", "C", "A"], ["A", "T", "G", "C"], ["C", "G", "T", "A"], ["T", "C", "G", "A"], ["A", "G", "T", "C"]]

read_sequences = ["A", "T", "G", "C"]

### Comparing DNA Sequences Using Quantum Circuits

The following code snippet demonstrates how to compare a read DNA sequence against multiple reference DNA sequences using a quantum circuit.

### Code Explanation

The code iterates over each reference DNA sequence (`ref_sequence`) in the `ref_sequences` list and compares it against a single read DNA sequence (`read_sequences`) using the `dna_sequence_compare_circuit` function.

**Steps:**
1. **Initialization**:
   - `comparison_results` is initialized as an empty list to store the results of DNA sequence comparisons.

2. **Comparison Loop**:
   - For each index `i` in the range of the length of `ref_sequences`:
     - Extract the current `ref_sequence` from `ref_sequences`.
     - Set `read_sequence` to the single read DNA sequence (`read_sequences`).
     - Use the `dna_sequence_compare_circuit` function to compare `ref_sequence` against `read_sequence`.
     - Append the comparison result (list of expectation values) to `comparison_results`.

3. **Flattening Results**:
   - `final_data` is generated by flattening the nested list `comparison_results` into a single list of expectation values, ordered reference by reference with `num_bases` values per reference.

**Example Usage:**

```python
# Initialize empty list to store comparison results
comparison_results = []

# Compare each reference DNA sequence against the read DNA sequence
for i in range(len(ref_sequences)):
    ref_sequence = ref_sequences[i]
    read_sequence = read_sequences
    comparison_results.append(dna_sequence_compare_circuit(ref_sequence, read_sequence))

# Flatten the comparison results into a single list
final_data = [element for innerList in comparison_results for element in innerList]
```

In [7]:
comparison_results = []

for i in range(len(ref_sequences)):
    ref_sequence = ref_sequences[i]
    read_sequence = read_sequences
    comparison_results.append(dna_sequence_compare_circuit(ref_sequence, read_sequence))
          
total_diff = []
index = 0

final_data = [element for innerList in comparison_results for element in innerList]

### Analyzing Comparison Results for DNA Sequences

The following code snippet scans the comparison results (`final_data`) obtained from the quantum circuit simulations and records the positions that the circuit flagged as differences between the read DNA sequence and the reference sequences.

### Code Explanation

The code iterates through each element `i` in the `final_data` list, representing the expectation values from DNA sequence comparisons using quantum circuits.

**Steps:**
1. **Initialization**:
   - `index` is initialized to track the position of elements in `final_data`.
   - `total_diff` is initialized as an empty list to store indices where the comparison result is less than `0` (indicating differences).

2. **Comparison Analysis Loop**:
   - For each `i` in `final_data`:
     - Increment the `index` counter.
     - Print the current expectation value `i`.
     - If `i` is less than `0` (indicating a difference between sequences):
       - Append the current `index` to `total_diff` to record the position of the difference.

3. **Output**:
   - Print `total_diff`, the 1-based positions in the flattened `final_data` list at which the circuit flagged a difference between the read DNA sequence and a reference sequence.
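
The positions in `total_diff` index the flattened list, so they do not directly say which reference sequence or which base is involved. Since each reference contributes `num_bases` values, a flat position can be split with `divmod`. The helper `locate` below is a hypothetical addition for illustration; the example positions are the ones printed in the output of this notebook.

```python
def locate(flat_index, num_bases):
    """Map a 1-based position in final_data to (reference index, base position), both 0-based."""
    return divmod(flat_index - 1, num_bases)


for position in [9, 11, 21]:
    ref_idx, base_pos = locate(position, num_bases=4)
    print(f"position {position}: reference #{ref_idx}, base {base_pos}")
# position 9: reference #2, base 0
# position 11: reference #2, base 2
# position 21: reference #5, base 0
```

For the example data, positions 9 and 11 both fall in the third reference (`["T", "G", "C", "A"]`) and position 21 falls in the sixth (`["T", "C", "G", "A"]`).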

**Example Usage:**

```python
# Initialize variables to track differences
total_diff = []
index = 0

# Analyze comparison results and identify positions of differences
for i in final_data:
    index += 1
    print(i)
    if i < 0:
        total_diff.append(index)

# Output positions where differences are detected
print("You should insert the read sequence at", total_diff)
```

In [8]:
print("Comparison results:")
for i in final_data:
    index +=1
    print(i)
    if i < 0:
        total_diff.append(index)

print("You should insert the read sequence at", total_diff)

Comparison results:
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
-1.0
1.0
-1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
-1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
You should insert the read sequence at [9, 11, 21]
