# Question 1

***question:*** Note the main advantages and disadvantages of storing information in DNA compared to traditional digital storage.

***answer:*** 

Advantages:

1. High density: DNA can store a very large amount of information in a small volume. Example shown in class: 1 gram of DNA can hold roughly 215 petabytes of data.
2. Longevity: DNA can last for thousands of years if stored properly.
3. Low maintenance: DNA does not require power to maintain the information stored.
4. Security: DNA is an offline, physical medium, so it cannot be hacked remotely over a network the way online digital storage can; an attacker would need physical access to the sample.

Disadvantages:

1. High cost: DNA storage is currently expensive compared to traditional digital storage.
2. Slow read/write speeds: Reading and writing data in DNA is slow; the PCR process takes a lot of wall-clock time and needs many cycles.
3. Error rate: DNA storage has a higher error rate than traditional digital storage, because reading and writing rely on biological/chemical processes.
4. Complexity: DNA storage requires specialized equipment and expertise to read and write data.
5. Problems with current infrastructure: Current infrastructure is not designed to handle DNA storage, so it would require significant investment to implement.

# Question 2

#### a.

I could not find anything about Hamming codes in the lecture notes or the slides, so I used a source from the internet, specifically [this link](https://www.youtube.com/watch?v=373FUw-2U2k&ab_channel=JitheshKunissery). I was unsure about the notation (11, 15) Hamming code. I assume it means that there are 11 data bits and 4 parity bits, so the total length of the code is 15 bits (this code is more commonly written as Hamming(15, 11)). In this case, the codeword looks like this:

$p_1 p_2 d_3 p_4 d_5 d_6 d_7 p_8 d_9 d_{10} d_{11} d_{12} d_{13} d_{14} d_{15}$

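where $p_i$ is a parity bit and $d_i$ is a data bit, and the index $i$ is the bit's position in the codeword. The parity bits sit at the positions that are powers of two (1, 2, 4, 8).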
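The layout above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the standard Hamming convention (parity bits at power-of-two positions, and parity bit $p$ covering every position whose binary index has bit $p$ set); the variable names are illustrative only.

```python
# Positions are 1-based, as in the layout above.
N = 15  # assumed total codeword length (11 data bits + 4 parity bits)

# A position is a parity position when it is a power of two.
parity_positions = [p for p in range(1, N + 1) if (p & (p - 1)) == 0]
data_positions = [p for p in range(1, N + 1) if p not in parity_positions]

# Parity bit p covers every position whose binary index has bit p set.
coverage = {p: [i for i in range(1, N + 1) if i & p] for p in parity_positions}

print("parity positions:", parity_positions)
print("data positions:  ", data_positions)
for p, covered in coverage.items():
    print(f"p{p} covers positions {covered}")
```

Each parity bit covers exactly eight positions, including its own, which is why a single flipped bit fails a unique combination of checks.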

---
#### b.

Encode the data bits: (1010 1010 101).
Placing them in the data positions gives: ($p_1 p_2 1 p_4 010 p_8 1010101$)

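- $p_1$ (positions 1, 3, 5, 7, 9, 11, 13, 15) is calculated with **p1** p2 **1** p4 **0**1**0** p8 **1**0**1**0**1**0**1**; the covered data bits contain five 1s, so even parity gives $p_1 = 1$.
- $p_2$ (positions 2, 3, 6, 7, 10, 11, 14, 15) is calculated with p1 **p2 1** p4 0**10** p8 1**01**01**01**; the covered data bits contain four 1s, so $p_2 = 0$.
- $p_4$ (positions 4 to 7 and 12 to 15) is calculated with p1 p2 1 **p4 010** p8 101**0101**; the covered data bits contain three 1s, so $p_4 = 1$.
- $p_8$ (positions 8 to 15) is calculated with p1 p2 1 p4 010 **p8 1010101**; the covered data bits contain four 1s, so $p_8 = 0$.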

So the encoded codeword is: **10**1**1**010**0**1010101 (parity bits in bold).
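The hand calculation above can be cross-checked with a short script. This is a sketch under the same assumptions as the answer (even parity, parity bits at positions 1, 2, 4 and 8); `hamming_15_11_encode` is a hypothetical helper name, not something from the lecture material.

```python
def hamming_15_11_encode(data_bits: str) -> str:
    """Encode 11 data bits into a 15-bit codeword using even parity."""
    data_bits = data_bits.replace(" ", "")
    if len(data_bits) != 11 or set(data_bits) - {"0", "1"}:
        raise ValueError("expected exactly 11 bits")

    word = [0] * 16  # index 0 is unused so that indices match positions 1..15
    data = iter(int(b) for b in data_bits)
    for pos in range(1, 16):
        if pos & (pos - 1):  # not a power of two, so this is a data position
            word[pos] = next(data)

    for p in (1, 2, 4, 8):
        # Even parity over all covered positions (the parity slot is still 0 here).
        word[p] = sum(word[i] for i in range(1, 16) if i & p) % 2

    return "".join(str(b) for b in word[1:])


print(hamming_15_11_encode("1010 1010 101"))  # prints 101101001010101
```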

---
#### c.

Find the error in the following code: (11111 00000 00000)

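First, we recalculate the parity bits from the received data bits (only $d_3$ and $d_5$ are 1):

- $p_1$ = 0 (covers $d_3$ and $d_5$: two 1s)
- $p_2$ = 1 (covers $d_3$: one 1)
- $p_4$ = 1 (covers $d_5$: one 1)
- $p_8$ = 0 (covers no 1s)

The received parity bits are $p_1 p_2 p_4 p_8 = 1110$. Only $p_1$ differs from the recalculated parity bits, so there is an error in the code.
The failing checks add up to position 1, so the error is in the 1st bit (the parity bit $p_1$ itself), and the correct code is: (01111 00000 00000). If we recalculate the parity bits, we get matching parity bits, so only the 1st bit is in error.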
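The parity comparison can also be done mechanically by computing the syndrome, that is, the sum of the parity positions whose checks fail. The sketch below assumes even parity and at most one flipped bit; the function names are hypothetical. It prints the flagged position and the corrected word so the hand calculation can be cross-checked.

```python
def hamming_15_syndrome(word_bits: str) -> int:
    """Return the 1-based position flagged by the parity checks (0 = all checks pass)."""
    bits = [int(b) for b in word_bits.replace(" ", "")]
    if len(bits) != 15:
        raise ValueError("expected exactly 15 bits")
    syndrome = 0
    for p in (1, 2, 4, 8):
        # bits[i - 1] is position i; a check fails when its covered bits have odd parity.
        if sum(bits[i - 1] for i in range(1, 16) if i & p) % 2:
            syndrome += p
    return syndrome


def hamming_15_correct(word_bits: str) -> str:
    """Flip the flagged bit, assuming at most one bit is wrong."""
    bits = list(word_bits.replace(" ", ""))
    pos = hamming_15_syndrome(word_bits)
    if pos:
        bits[pos - 1] = "1" if bits[pos - 1] == "0" else "0"
    return "".join(bits)


received = "11111 00000 00000"
print("flagged position:", hamming_15_syndrome(received))
print("corrected word:  ", hamming_15_correct(received))
```

A syndrome of 0 means every check passes; with two or more flipped bits the flagged position is not reliable.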

# Question 3

In [1]:
import random
import copy
from typing import List, Tuple, Dict, Optional

In [2]:
class DNAFountainCodec:
    def __init__(self, original_bits: str):
        """
        Initialize the codec with the original 32-bit binary message.
        """
        self.original_bits = original_bits.replace(" ", "")
        if len(self.original_bits) != 32 or any(c not in '01' for c in self.original_bits):
            raise ValueError("Input must be a 32-bit binary string.")
        self.original_segments = self._split_into_segments(self.original_bits)
        self.degree_for_seed = [
            2,  # seed=0
            2,  # seed=1
            1,  # seed=2
            1,  # seed=3
            2,  # seed=4
            4,  # seed=5
            2,  # seed=6
            1,  # seed=7
            6,  # seed=8
            1,  # seed=9
            1,  # seed=10
            2,  # seed=11
            7,  # seed=12
            2,  # seed=13
            1,  # seed=14
            4   # seed=15
        ]
        self.dna_mapping = {
            '00': 'A',
            '01': 'C',
            '10': 'G',
            '11': 'T'
        }
        self.inverse_dna_mapping = {v: k for k, v in self.dna_mapping.items()}
        self.drops: List[Dict] = []
        self.dna_drops: List[Dict] = []
        self.decoded_segments: List[Optional[int]] = [None] * 8

    def _split_into_segments(self, bits: str) -> List[int]:
        """
        Split the 32-bit string into eight 4-bit segments.
        """
        return [int(bits[i:i+4], 2) for i in range(0, 32, 4)]

    def _four_bits_to_binary_str(self, x: int) -> str:
        """
        Convert a number (0-15) to a 4-bit binary string.
        """
        return f"{x:04b}"

    def _bits_to_dna(self, bits2: str) -> str:
        """
        Convert a 2-bit string to its corresponding DNA base.
        """
        return self.dna_mapping.get(bits2, 'N')  # 'N' for undefined mappings

    def encode(self):
        """
        Encode the original segments into DNA drops.
        """
        print("### Encoding Process ###\n")
        print("Original Segments:")
        print(self.original_segments)
        print("\nGenerating Drops:\n")

        for seed in range(16):
            random.seed(seed)  # Initialize random with seed
            degree = self.degree_for_seed[seed]
            if degree > len(self.original_segments):
                raise ValueError(f"Degree {degree} for seed {seed} exceeds number of segments.")
            # Select 'degree' unique segment indices
            chosen_indices = random.sample(range(8), degree)
            # Compute XOR of the selected segments
            drop_val = 0
            for idx in chosen_indices:
                drop_val ^= self.original_segments[idx]
            # Store the drop
            drop = {
                'seed': seed,
                'drop_val': drop_val,
                'indices': chosen_indices
            }
            self.drops.append(drop)
            print(f"Seed={seed:04b} (Dec={seed}), Degree={degree}, Indices={chosen_indices}, XOR_value={drop_val:04b}")

        print("\nConverting Drops to DNA Sequences:\n")
        for drop in self.drops:
            dna_seq = self._encode_drop_to_dna(drop['seed'], drop['drop_val'])
            dna_drop = {
                'dna_seq': dna_seq,
                'seed': drop['seed'],
                'drop_val': drop['drop_val'],
                'indices': drop['indices']
            }
            self.dna_drops.append(dna_drop)
            print(f"Drop Seed={drop['seed']:04b}, XOR={drop['drop_val']:04b}, DNA={dna_seq}")

    def _encode_drop_to_dna(self, seed_val: int, drop_val: int) -> str:
        """
        Encode a seed and drop value into a DNA sequence.
        """
        seed_bits = self._four_bits_to_binary_str(seed_val)
        drop_bits = self._four_bits_to_binary_str(drop_val)
        all_bits = seed_bits + drop_bits  # 8 bits
        dna = ""
        for i in range(0, 8, 2):
            bits_pair = all_bits[i:i+2]
            dna += self._bits_to_dna(bits_pair)
        return dna

    def _dna_to_bits(self, dna_seq: str) -> str:
        """
        Decode a DNA sequence back to its 8-bit binary string.
        """
        # Each base maps to exactly two bits, so the result always has even length.
        # Unknown bases fall back to '00'.
        return "".join(self.inverse_dna_mapping.get(base, '00') for base in dna_seq)

    def decode(self):
        """
        Decode the DNA drops back to the original segments.
        """
        print("\n### Decoding Process ###\n")
        print("Starting Decoding...\n")

        # Reconstruct the drops from DNA sequences
        reconstructed_drops = []
        for dna_drop in self.dna_drops:
            dna_seq = dna_drop['dna_seq']
            bits = self._dna_to_bits(dna_seq)
            if len(bits) != 8:
                raise ValueError(f"DNA sequence {dna_seq} does not correspond to 8 bits.")
            seed_bits = bits[:4]
            drop_bits = bits[4:]
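            seed = int(seed_bits, 2)
            drop_val = int(drop_bits, 2)
            # Regenerate the segment indices from the seed exactly as the encoder did,
            # instead of reading them back from the encoder's own records.
            random.seed(seed)
            indices = random.sample(range(8), self.degree_for_seed[seed])
            reconstructed_drops.append({
                'seed': seed,
                'drop_val': drop_val,
                'indices': indices
            })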

        # Initialize a list to keep track of drops
        drops_for_decoding = copy.deepcopy(reconstructed_drops)

        # Iterative decoding using peeling
        progress = True
        while progress:
            progress = False
            drops_to_remove = []
            for drop in drops_for_decoding:
                unknown_indices = [idx for idx in drop['indices'] if self.decoded_segments[idx] is None]
                if len(unknown_indices) == 1:
                    # Only one unknown segment in this drop
                    unknown_idx = unknown_indices[0]
                    # The value of this segment is drop_val XOR all known segments in the drop
                    val = drop['drop_val']
                    for idx in drop['indices']:
                        if self.decoded_segments[idx] is not None:
                            val ^= self.decoded_segments[idx]
                    self.decoded_segments[unknown_idx] = val
                    print(f"Decoded Segment {unknown_idx}: {val:04b} from Drop Seed={drop['seed']:04b}, XOR={drop['drop_val']:04b}")
                    drops_to_remove.append(drop)
                    progress = True  # Continue the loop since progress was made

            # Remove processed drops
            for drop in drops_to_remove:
                drops_for_decoding.remove(drop)

            # Update remaining drops: XOR out every decoded segment and drop its index
            for drop in drops_for_decoding:
                for idx in list(drop['indices']):
                    if self.decoded_segments[idx] is not None:
                        drop['drop_val'] ^= self.decoded_segments[idx]
                        drop['indices'].remove(idx)

        print("\nDecoded Segments:")
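        for idx, val in enumerate(self.decoded_segments):
            if val is None:
                # Peeling can stall if no drop with a single unknown segment remains
                print(f"Segment {idx}: not decoded")
            else:
                print(f"Segment {idx}: {val:04b} (Dec={val})")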

    def verify(self):
        """
        Verify that the decoded segments match the original segments.
        """
        print("\n### Verification ###\n")
        if self.decoded_segments == self.original_segments:
            print("Success! Decoded segments match the original segments.")
        else:
            print("Failure! Decoded segments do not match the original segments.")
            print("Original Segments:", self.original_segments)
            print("Decoded Segments: ", self.decoded_segments)

    def run(self):
        """
        Execute the full encoding and decoding process with verification.
        """
        self.encode()
        self.decode()
        self.verify()

In [None]:
# -------------------------- Testing the Codec --------------------------
# Given message: 0100 0001 1010 1111 0000 0101 1010 0101
original_message = "0100 0001 1010 1111 0000 0101 1010 0101"
codec = DNAFountainCodec(original_message)
codec.run()
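The peeling step used in `decode` is easier to follow on a toy case that can be checked by hand. The sketch below is independent of the class above: `peel` is a hypothetical helper, and the three segments and three droplets are made-up values for illustration, not part of the assignment data.

```python
from typing import List, Optional, Sequence, Tuple


def peel(droplets: Sequence[Tuple[Sequence[int], int]],
         n_segments: int) -> List[Optional[int]]:
    """Peeling decoder: droplets are (segment indices, XOR of those segments)."""
    segments: List[Optional[int]] = [None] * n_segments
    pending = [(set(idx), val) for idx, val in droplets]
    progress = True
    while progress:
        progress = False
        remaining = []
        for idx, val in pending:
            # XOR out every segment that is already known.
            for i in [i for i in idx if segments[i] is not None]:
                val ^= segments[i]
                idx = idx - {i}
            if len(idx) == 1:
                (i,) = idx
                segments[i] = val  # one unknown left, so it equals the residual value
                progress = True
            elif idx:
                remaining.append((idx, val))  # still ambiguous, retry next pass
        pending = remaining
    return segments


# Toy message: three 4-bit segments.
original = [0b0100, 0b0001, 0b1010]
droplets = [
    ([0], original[0]),                                    # degree 1
    ([0, 1], original[0] ^ original[1]),                   # degree 2
    ([0, 1, 2], original[0] ^ original[1] ^ original[2]),  # degree 3
]
print(peel(droplets, 3))  # prints [4, 1, 10]
```

Decoding only gets started when at least one degree-1 droplet is present; without one, every segment stays `None`.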