## This notebook was created by Prateek Paul.
* Email: prateekp@iiitd.ac.in
* LinkedIn: [linkedin.com/in/prateekpaulpro/](https://linkedin.com/in/prateekpaulpro/)

Disclaimer: 
The code and content in this notebook are compiled from various open sources, personal experience, and reference materials. It is intended solely for educational purposes. All credits for original ideas and code snippets go to their respective authors. If you find any inaccuracies or have suggestions, feel free to reach out.

# Bio Computing Course

# Instructions
### Welcome to the Python and Biological Concepts practice notebook! This set of exercises will help you apply your understanding of basic Python programming while exploring biological concepts. Please read each question carefully and use the provided code cells to write your solutions.

# General Guidelines:
### Read Each Question Carefully: Make sure you understand what is being asked before you start coding. Pay close attention to the input and output requirements.

### Write Clear and Efficient Code: Aim to write code that is both correct and efficient. Use basic Python concepts such as loops, conditionals, lists, dictionaries, and strings as needed.

Please check logically whether each answer is correct. If you find a mistake, feel free to report it to us.

In [1]:
dna_sequence = "ATGCGTACGTT"

### 1. String Manipulation - DNA Sequence
#### Question: Given a DNA sequence "ATGCGTACGTT", write a Python function to count the number of times the nucleotide 'G' appears.

In [2]:
def count_g(dna_sequence):
    return dna_sequence.count('G')

dna_sequence = "ATGCGTACGTT"
print(count_g(dna_sequence))  # Output: 3

3


### 2. List - DNA Codons
#### Question: Create a list of all codons (3-letter combinations) in a DNA sequence such as "ATGCGTACGTT". The solution cell uses the longer sequence "ATGCGTACGTTATAAATGCGTACGTTA".

In [3]:
def get_codons(dna_sequence):
    return [dna_sequence[i:i+3] for i in range(0, len(dna_sequence), 3)]

dna_sequence = "ATGCGTACGTTATAAATGCGTACGTTA"
print(get_codons(dna_sequence))

['ATG', 'CGT', 'ACG', 'TTA', 'TAA', 'ATG', 'CGT', 'ACG', 'TTA']


### 3. Dictionary - Codon to Amino Acid Mapping
#### Question: Create a dictionary that maps codons to their corresponding amino acids for the following: 'ATG': 'Methionine', 'CGT': 'Arginine', 'TAC': 'Tyrosine', 'GTT': 'Valine'.

In [4]:
codon_to_amino_acid = {
    'ATG': 'Methionine',
    'CGT': 'Arginine',
    'TAC': 'Tyrosine',
    'GTT': 'Valine'
}

print(codon_to_amino_acid)  # Output: {'ATG': 'Methionine', 'CGT': 'Arginine', 'TAC': 'Tyrosine', 'GTT': 'Valine'}


{'ATG': 'Methionine', 'CGT': 'Arginine', 'TAC': 'Tyrosine', 'GTT': 'Valine'}


### 4. Conditional Statements - DNA Base Identification
#### Question: Write a function that identifies if a base is a purine (A, G) or pyrimidine (C, T).

In [5]:
def identify_base(base):
    if base in 'AG':
        return 'Purine'
    elif base in 'CT':
        return 'Pyrimidine'
    else:
        return 'Unknown'

base = 'A'
print(identify_base(base))  # Output: Purine


Purine


### 5. Loops - Counting Bases
#### Question: Write a function to count the number of each base in the DNA sequence "ATGCGTACGTT".

In [6]:
def count_bases(dna_sequence):
    base_count = {'A': 0, 'T': 0, 'C': 0, 'G': 0}
    for base in dna_sequence:
        if base in base_count:
            base_count[base] += 1
    return base_count

dna_sequence = "ATGCGTACGTT"
print(count_bases(dna_sequence))  # Output: {'A': 2, 'T': 4, 'C': 2, 'G': 3}


{'A': 2, 'T': 4, 'C': 2, 'G': 3}


### 6. String Slicing - mRNA from DNA

#### Question: Convert the DNA sequence "ATGCGTACGTT" to its mRNA sequence.



In [7]:
def dna_to_mrna(dna_sequence):
    return dna_sequence.replace('T', 'U')

dna_sequence = "ATGCGTACGTT"
print(dna_to_mrna(dna_sequence))  # Output: AUGCGUACGUU


AUGCGUACGUU


### 7. List - Reverse Complement of DNA
#### Question: Write a function that returns the reverse complement of a DNA sequence "ATGCGTACGTT".

In [8]:
def reverse_complement(dna_sequence):
    complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    return ''.join(complement[base] for base in reversed(dna_sequence))

dna_sequence = "ATGCGTACGTT"
print(reverse_complement(dna_sequence))  # Output: AACGTACGCAT


AACGTACGCAT


### 8. Dictionary - Transcription Mapping

#### Question: Create a dictionary to map each DNA base to its RNA complement (A->U, T->A, C->G, G->C) and use it to transcribe "ATGCGTACGTT".



In [9]:
dna_to_rna = {'A': 'U', 'T': 'A', 'C': 'G', 'G': 'C'}

def transcribe(dna_sequence):
    return ''.join(dna_to_rna[base] for base in dna_sequence)

dna_sequence = "ATGCGTACGTT"
print(transcribe(dna_sequence))  # Output: UACGCAUGCAA


UACGCAUGCAA


### 9. Conditional Statements - DNA Sequence Validity

#### Question: Write a function to check if a given sequence is a valid DNA sequence (only contains A, T, C, G).



In [10]:
def is_valid_dna(dna_sequence):
    for base in dna_sequence:
        if base not in 'ATCG':
            return False
    return True

dna_sequence = "ATGCGTACGTT"
print(is_valid_dna(dna_sequence))  # Output: True


True


### 10. Loops - Translation of Codons

#### Question: Write a function to translate the DNA sequence "ATGCGTACGTT" into a protein sequence using the codon dictionary provided.

In [11]:
codon_to_amino_acid = {
    'ATG': 'Methionine',
    'CGT': 'Arginine',
    'TAC': 'Tyrosine',
    'GTT': 'Valine'
}

def translate(dna_sequence):
    protein = []
    for i in range(0, len(dna_sequence), 3):
        codon = dna_sequence[i:i+3]
        if codon in codon_to_amino_acid:
            protein.append(codon_to_amino_acid[codon])
    return protein

dna_sequence = "ATGCGTACGTT"
print(translate(dna_sequence))  # Output: ['Methionine', 'Arginine']


['Methionine', 'Arginine']
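
The result has only two amino acids because the frame-0 codons of this sequence are ATG, CGT, ACG and a leftover TT; ACG and TT are not in the four-entry dictionary, so they are silently skipped. A minimal sketch, assuming you would rather see those gaps, keeps a placeholder for unmapped codons. The name `translate_verbose` and the `'?'` placeholder are illustrative choices, not part of the original exercise.

```python
codon_to_amino_acid = {
    'ATG': 'Methionine',
    'CGT': 'Arginine',
    'TAC': 'Tyrosine',
    'GTT': 'Valine',
}

def translate_verbose(dna_sequence, unknown='?'):
    """Translate in frame 0, keeping a placeholder for unmapped codons."""
    protein = []
    # Stop at len - 2 so a trailing partial codon (here 'TT') is never read.
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i + 3]
        protein.append(codon_to_amino_acid.get(codon, unknown))
    return protein

print(translate_verbose("ATGCGTACGTT"))  # ['Methionine', 'Arginine', '?']
```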


### 11. String - Counting GC Content

#### Question: Write a function to calculate the GC content of a DNA sequence "ATGCGTACGTT".

In [12]:
def gc_content(dna_sequence):
    g_count = dna_sequence.count('G')
    c_count = dna_sequence.count('C')
    return (g_count + c_count) / len(dna_sequence) * 100

dna_sequence = "ATGCGTACGTT"
print(gc_content(dna_sequence))  # Output: 45.45


45.45454545454545


### 12. List - Extract Exons

#### Question: Given a list of exon positions [(0, 3), (4, 7)], extract the exons from the DNA sequence "ATGCGTACGTT".

In [13]:
def extract_exons(dna_sequence, exon_positions):
    exons = []
    for start, end in exon_positions:
        exons.append(dna_sequence[start:end])
    return ''.join(exons)

dna_sequence = "ATGCGTACGTT"
exon_positions = [(0, 3), (4, 7)]
print(extract_exons(dna_sequence, exon_positions))  # Output: ATGGTA


ATGGTA


### 13. Dictionary - Frequency of Codons

#### Question: Write a function to count the frequency of each codon in the DNA sequence "ATGCGTACGTT".

In [14]:
def codon_frequency(dna_sequence):
    freq = {}
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i+3]
        if codon in freq:
            freq[codon] += 1
        else:
            freq[codon] = 1
    return freq

dna_sequence = "ATGCGTACGTT"
print(codon_frequency(dna_sequence))  # Output: {'ATG': 1, 'CGT': 1, 'ACG': 1}


{'ATG': 1, 'CGT': 1, 'ACG': 1}


### 14. String Slicing - Subsequence Check
#### Question: Write a function to check if the sequence "CGTAC" occurs as a contiguous subsequence (substring) of "ATGCGTACGTT".

In [15]:
def is_subsequence(sub, main):
    return sub in main

subsequence = "CGTAC"
main_sequence = "ATGCGTACGTT"
print(is_subsequence(subsequence, main_sequence))  # Output: True


True
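
Note that `sub in main` tests for a contiguous substring. If "subsequence" is meant in the stricter sense (characters in order, gaps allowed), a sketch along these lines could be used instead. The name `is_true_subsequence` is a hypothetical helper introduced here for illustration.

```python
def is_true_subsequence(sub, main):
    """Return True if the characters of `sub` appear in `main` in order, gaps allowed."""
    remaining = iter(main)
    # `base in remaining` advances the iterator until it finds `base`,
    # so each match has to come after the previous one.
    return all(base in remaining for base in sub)

main_sequence = "ATGCGTACGTT"
print(is_true_subsequence("AGT", main_sequence))  # True: A(0), G(2), T(5)
print("AGT" in main_sequence)                     # False: not contiguous
```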


### 15. Loops - Finding Start Codon
#### Question: Write a function to find the position of the first start codon (ATG) in the DNA sequence "ATGCGTACGTT".

In [16]:
def find_start_codon(dna_sequence):
    start_codon = "ATG"
    for i in range(len(dna_sequence) - 2):
        if dna_sequence[i:i+3] == start_codon:
            return i
    return -1

dna_sequence = "ATGCGTACGTT"
print(find_start_codon(dna_sequence))  # Output: 0


0


### 16. Dictionary - Counting Nucleotides
#### Question: Write a function to count the occurrence of each nucleotide in the DNA sequence "ATGCGTACGTT" using a dictionary.

In [17]:
def nucleotide_count(dna_sequence):
    count = {}
    for base in dna_sequence:
        if base in count:
            count[base] += 1
        else:
            count[base] = 1
    return count

dna_sequence = "ATGCGTACGTT"
print(nucleotide_count(dna_sequence))  # Output: {'A': 2, 'T': 4, 'G': 3, 'C': 2}


{'A': 2, 'T': 4, 'G': 3, 'C': 2}
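
The same tally can be produced with the standard library's `collections.Counter`. The sketch below is an alternative for comparison, not a replacement for the loop the exercise asks for; the name `nucleotide_count_counter` is illustrative.

```python
from collections import Counter

def nucleotide_count_counter(dna_sequence):
    """Count each character of the sequence using collections.Counter."""
    # Counter is a dict subclass; dict() just makes the printed form plain.
    return dict(Counter(dna_sequence))

print(nucleotide_count_counter("ATGCGTACGTT"))  # {'A': 2, 'T': 4, 'G': 3, 'C': 2}
```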


### 17. List - Split DNA Sequence into Codons
#### Question: Write a function to split the DNA sequence "ATGCGTACGTT" into codons and store them in a list.

In [18]:
def split_into_codons(dna_sequence):
    codons = [dna_sequence[i:i+3] for i in range(0, len(dna_sequence), 3)]
    return codons

dna_sequence = "ATGCGTACGTT"
print(split_into_codons(dna_sequence))  # Output: ['ATG', 'CGT', 'ACG', 'TT']


['ATG', 'CGT', 'ACG', 'TT']


### 18. Conditional Statements - Check Palindrome
#### Question: Write a function to check if the DNA sequence "ATGCGCAT" is a palindrome.

In [19]:
def is_palindrome(dna_sequence):
    return dna_sequence == dna_sequence[::-1]

dna_sequence = "ATGCGCAT"
print(is_palindrome(dna_sequence))  # Output: False


False
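
The check above treats "palindrome" in the plain string sense. In molecular biology a palindromic site usually means a sequence equal to its own reverse complement, and under that definition "ATGCGCAT" does qualify. A minimal sketch of that variant follows; `is_reverse_palindrome` is a name chosen here for illustration.

```python
def is_reverse_palindrome(dna_sequence):
    """Return True if the sequence equals its own reverse complement."""
    complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    reverse_comp = ''.join(complement[base] for base in reversed(dna_sequence))
    return dna_sequence == reverse_comp

print(is_reverse_palindrome("ATGCGCAT"))  # True
print(is_reverse_palindrome("GAATTC"))    # True
```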


### 19. Loops - Complementary DNA Strand
#### Question: Write a function to generate the complementary DNA strand for "ATGCGTACGTT".

In [20]:
def complementary_dna(dna_sequence):
    complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    return ''.join(complement[base] for base in dna_sequence)

dna_sequence = "ATGCGTACGTT"
print(complementary_dna(dna_sequence))  # Output: TACGCATGCAA


TACGCATGCAA


### 20. Dictionary - Counting Codons in mRNA
#### Question: Write a function to count the frequency of each codon in the mRNA sequence "AUGCGUACGUU" using a dictionary.

In [21]:
def codon_count_mrna(mrna_sequence):
    count = {}
    for i in range(0, len(mrna_sequence) - 2, 3):
        codon = mrna_sequence[i:i+3]
        if codon in count:
            count[codon] += 1
        else:
            count[codon] = 1
    return count

mrna_sequence = "AUGCGUACGUU"
print(codon_count_mrna(mrna_sequence))  # Output: {'AUG': 1, 'CGU': 1, 'ACG': 1}


{'AUG': 1, 'CGU': 1, 'ACG': 1}


In [22]:
codon_to_amino_acid = {
    'ATG': 'Methionine', 'CGT': 'Arginine', 'TAC': 'Tyrosine', 'GTT': 'Valine',
    'TAA': 'Stop', 'TAG': 'Stop', 'TGA': 'Stop'
}

### 21. Dictionary & List - DNA to Protein Translation with a Stop Codon

#### Question: Given a dictionary mapping codons to amino acids and a DNA sequence "ATGCGTTAA", write a function to translate the DNA sequence into a protein sequence, stopping translation at the first stop codon ('TAA', 'TAG', 'TGA').

In [23]:
codon_to_amino_acid = {
    'ATG': 'Methionine', 'CGT': 'Arginine', 'TAC': 'Tyrosine', 'GTT': 'Valine',
    'TAA': 'Stop', 'TAG': 'Stop', 'TGA': 'Stop'
}

def translate_with_stop(dna_sequence):
    protein = []
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i+3]
        amino_acid = codon_to_amino_acid.get(codon, 'Unknown')
        if amino_acid == 'Stop':
            break
        protein.append(amino_acid)
    return protein

dna_sequence = "ATGCGTTAA"
print(translate_with_stop(dna_sequence))  # Output: ['Methionine', 'Arginine']


['Methionine', 'Arginine']


### 22. Set Operations - Unique Nucleotide Combinations

#### Question: Write a function that finds all unique nucleotide combinations of length 2 in a DNA sequence "ATGCGTACGTT".

In [24]:
def unique_combinations(dna_sequence, length):
    combinations = set()
    for i in range(len(dna_sequence) - length + 1):
        combinations.add(dna_sequence[i:i+length])
    return combinations

dna_sequence = "ATGCGTACGTT"
print(unique_combinations(dna_sequence, 2))  # Output (set order may vary): {'AT', 'TG', 'GC', 'CG', 'GT', 'TA', 'AC', 'TT'}


{'GC', 'TG', 'TT', 'CG', 'AC', 'AT', 'GT', 'TA'}


### 23. Nested Loops - Finding All ORFs
#### Question: Write a function to find all open reading frames (ORFs) in the DNA sequence "ATGCGTACGTTATGCGTTAA" starting with 'ATG' and ending with a stop codon.

In [25]:
def find_orfs(dna_sequence):
    stop_codons = {'TAA', 'TAG', 'TGA'}
    orfs = []
    length = len(dna_sequence)
    for i in range(length - 2):
        if dna_sequence[i:i+3] == 'ATG':
            for j in range(i, length - 2, 3):
                codon = dna_sequence[j:j+3]
                if codon in stop_codons:
                    orfs.append(dna_sequence[i:j+3])
                    break
    return orfs

dna_sequence = "ATGCGTACGTTATGCGTTAA"
print(find_orfs(dna_sequence))  # Output: ['ATGCGTTAA']


['ATGCGTTAA']
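
Only one ORF is reported because the ATG at position 0 never meets a stop codon in its own reading frame, while the ATG at position 11 does. Listing the in-frame codons from each start makes this visible. The helper `codons_from` below is a hypothetical name used only for this illustration.

```python
def codons_from(dna_sequence, start):
    """Return the complete codons read in frame from index `start`."""
    return [dna_sequence[i:i + 3] for i in range(start, len(dna_sequence) - 2, 3)]

dna_sequence = "ATGCGTACGTTATGCGTTAA"
print(codons_from(dna_sequence, 0))   # ['ATG', 'CGT', 'ACG', 'TTA', 'TGC', 'GTT']
print(codons_from(dna_sequence, 11))  # ['ATG', 'CGT', 'TAA']
```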


### 24. Regular Expressions - Validating DNA Sequence
#### Question: Use a regular expression to validate if a given DNA sequence "ATGCGTACGTT" only contains valid nucleotides (A, T, C, G).

In [26]:
import re

def validate_dna_sequence(dna_sequence):
    pattern = re.compile(r'^[ATCG]+$')
    return bool(pattern.match(dna_sequence))

dna_sequence = "ATGCGTACGTT"
print(validate_dna_sequence(dna_sequence))  # Output: True


True
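
One subtlety worth knowing: in Python's `re` module, `$` also matches just before a trailing newline, so the pattern above accepts "ATGC\n". If that matters for your input, a sketch using `re.fullmatch` avoids it. The name `validate_dna_strict` is illustrative.

```python
import re

def validate_dna_strict(dna_sequence):
    """Return True only if every character is A, T, C or G."""
    # fullmatch must consume the whole string, so no anchors are needed
    # and a trailing newline is rejected.
    return re.fullmatch(r'[ATCG]+', dna_sequence) is not None

print(validate_dna_strict("ATGCGTACGTT"))         # True
print(validate_dna_strict("ATGC\n"))              # False
print(bool(re.match(r'^[ATCG]+$', "ATGC\n")))     # True: '$' tolerates the newline
```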


### 25. Recursion - Calculate GC Content Recursively
#### Question: Write a recursive function to calculate the GC content of a DNA sequence "ATGCGTACGTT".

In [27]:
def recursive_gc_content(dna_sequence):
    if not dna_sequence:
        return 0
    first, rest = dna_sequence[0], dna_sequence[1:]
    return (1 if first in 'GC' else 0) + recursive_gc_content(rest)

def gc_content_percentage(dna_sequence):
    total_gc = recursive_gc_content(dna_sequence)
    return (total_gc / len(dna_sequence)) * 100

dna_sequence = "ATGCGTACGTT"
print(gc_content_percentage(dna_sequence))  # Output: 45.45


45.45454545454545


### 27. List & Set - Finding Unique and Common Codons
#### Question: Write a function to find unique and common codons between two DNA sequences "ATGCGTACGTT" and "ATGCGTGTGTA".

In [28]:
def unique_and_common_codons(seq1, seq2):
    codons1 = {seq1[i:i+3] for i in range(0, len(seq1) - 2, 3)}
    codons2 = {seq2[i:i+3] for i in range(0, len(seq2) - 2, 3)}
    unique_codons = codons1.symmetric_difference(codons2)
    common_codons = codons1.intersection(codons2)
    return unique_codons, common_codons

seq1 = "ATGCGTACGTT"
seq2 = "ATGCGTGTGTA"
print(unique_and_common_codons(seq1, seq2))  # Output (set order may vary): ({'GTG', 'ACG'}, {'ATG', 'CGT'})


({'GTG', 'ACG'}, {'ATG', 'CGT'})


### 28. List Comprehension - Transcribe Multiple DNA Sequences
#### Question: Write a function to transcribe a list of DNA sequences ["ATGCGT", "GATTACA", "CGTACG"] into their respective mRNA sequences using list comprehension.

In [29]:
def transcribe_dna_list(dna_list):
    return [dna.replace('T', 'U') for dna in dna_list]

dna_list = ["ATGCGT", "GATTACA", "CGTACG"]
print(transcribe_dna_list(dna_list))  # Output: ['AUGCGU', 'GAUUACA', 'CGUACG']


['AUGCGU', 'GAUUACA', 'CGUACG']


### 29. Nested Loops - Counting Overlapping Codons
#### Question: Write a function to count the number of times a specific codon "CGT" appears in a DNA sequence "ATGCGTACGTT" including overlapping occurrences.

In [30]:
def count_overlapping_codons(dna_sequence, target_codon):
    count = 0
    for i in range(len(dna_sequence) - len(target_codon) + 1):
        if dna_sequence[i:i+len(target_codon)] == target_codon:
            count += 1
    return count

dna_sequence = "ATGCGTACGTT"
target_codon = "CGT"
print(count_overlapping_codons(dna_sequence, target_codon))  # Output: 2


2


### 30. List & Dictionary - Grouping DNA Sequences by GC Content
#### Question: Write a function to group DNA sequences ["ATGCGT", "GATTACA", "CGTACG", "TATATA"] into categories based on GC content: "High GC" (>50%), "Moderate GC" (30-50%), "Low GC" (<30%).

In [31]:
def gc_content_grouping(dna_list):
    categories = {"High GC": [], "Moderate GC": [], "Low GC": []}
    for dna in dna_list:
        gc_content = (dna.count('G') + dna.count('C')) / len(dna) * 100
        if gc_content > 50:
            categories["High GC"].append(dna)
        elif gc_content >= 30:
            categories["Moderate GC"].append(dna)
        else:
            categories["Low GC"].append(dna)
    return categories

dna_list = ["ATGCGT", "GATTACA", "CGTACG", "TATATA"]
print(gc_content_grouping(dna_list))
# Output: {'High GC': ['CGTACG'], 'Moderate GC': ['ATGCGT'], 'Low GC': ['GATTACA', 'TATATA']}


{'High GC': ['CGTACG'], 'Moderate GC': ['ATGCGT'], 'Low GC': ['GATTACA', 'TATATA']}


### 31. List & Conditional Statements - Counting Transitions and Transversions
#### Question: Write a function that counts the number of transitions (purine to purine, pyrimidine to pyrimidine) and transversions (purine to pyrimidine, vice versa) between two DNA sequences "ATGCGTAC" and "ATGCGTAG".

In [32]:
def count_transitions_transversions(seq1, seq2):
    purines = {'A', 'G'}
    pyrimidines = {'C', 'T'}
    transitions = 0
    transversions = 0

    for base1, base2 in zip(seq1, seq2):
        if base1 != base2:
            if (base1 in purines and base2 in purines) or (base1 in pyrimidines and base2 in pyrimidines):
                transitions += 1
            else:
                transversions += 1

    return transitions, transversions

seq1 = "ATGCGTAC"
seq2 = "ATGCGTAG"
print(count_transitions_transversions(seq1, seq2))  # Output: (0, 1)


(0, 1)
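
The two sequences differ only at the last position (C versus G), which is a pyrimidine-to-purine change and therefore a single transversion. To see both categories at once, the sketch below uses a second, made-up pair of sequences. `classify_substitutions` and `TRANSITION_PAIRS` are illustrative names, and the function is a compact restatement of the same idea rather than the notebook's own solution.

```python
# A substitution is a transition when both bases belong to the same class.
TRANSITION_PAIRS = {frozenset('AG'), frozenset('CT')}

def classify_substitutions(seq1, seq2):
    """Return (transitions, transversions) over aligned positions."""
    transitions = transversions = 0
    for base1, base2 in zip(seq1, seq2):
        if base1 == base2:
            continue
        if frozenset((base1, base2)) in TRANSITION_PAIRS:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions

print(classify_substitutions("ATGCGTAC", "ATGCGTAG"))  # (0, 1)
print(classify_substitutions("ATGC", "GTGA"))          # (1, 1): A->G, C->A
```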


### 32. List & Dictionary - Identifying Palindromic Sequences
#### Question: Write a function to find all palindromic sequences of length 4 in a DNA sequence "ATGCGTACGTACGCGT".

In [33]:
def find_palindromes(dna_sequence, length):
    palindromes = []
    for i in range(len(dna_sequence) - length + 1):
        segment = dna_sequence[i:i+length]
        if segment == segment[::-1]:
            palindromes.append(segment)
    return palindromes

dna_sequence = "ATGCGTACGTACGCGT"
print(find_palindromes(dna_sequence, 4))  # Output: []


[]


### 33. String & List - DNA Mutation Simulation
#### Question: Write a function to simulate a point mutation in a DNA sequence "ATGCGTAC" by randomly replacing one nucleotide with another.

In [34]:
import random

def point_mutation(dna_sequence):
    nucleotides = 'ATCG'
    position = random.randint(0, len(dna_sequence) - 1)
    original_base = dna_sequence[position]
    new_base = random.choice([n for n in nucleotides if n != original_base])
    mutated_sequence = dna_sequence[:position] + new_base + dna_sequence[position+1:]
    return mutated_sequence, position, original_base, new_base

dna_sequence = "ATGCGTAC"
print(point_mutation(dna_sequence))  # Example Output: ('ATCCGTAC', 2, 'G', 'C')


('ATGCGTTC', 6, 'A', 'T')
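
Because the mutation is random, the output changes on every run. When a repeatable result is needed (for example while checking answers), one option is to pass a seed to a private `random.Random` instance. The sketch below assumes that approach; `point_mutation_seeded` and its `seed` parameter are not part of the original exercise.

```python
import random

def point_mutation_seeded(dna_sequence, seed=None):
    """Apply one random substitution; the same seed gives the same mutation."""
    rng = random.Random(seed)  # private generator, leaves global state alone
    position = rng.randrange(len(dna_sequence))
    original_base = dna_sequence[position]
    new_base = rng.choice([n for n in 'ATCG' if n != original_base])
    mutated = dna_sequence[:position] + new_base + dna_sequence[position + 1:]
    return mutated, position, original_base, new_base

first = point_mutation_seeded("ATGCGTAC", seed=42)
second = point_mutation_seeded("ATGCGTAC", seed=42)
print(first == second)  # True
```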


### 34. Loops & Dictionary - Counting Nucleotides in Multiple Sequences
#### Question: Write a function to count the occurrence of each nucleotide in a list of DNA sequences ["ATG", "CGT", "TAC", "GTT"].

In [35]:
def count_nucleotides(dna_list):
    nucleotide_count = {}
    for dna in dna_list:
        for base in dna:
            if base in nucleotide_count:
                nucleotide_count[base] += 1
            else:
                nucleotide_count[base] = 1
    return nucleotide_count

dna_list = ["ATG", "CGT", "TAC", "GTT"]
print(count_nucleotides(dna_list))  # Output: {'A': 2, 'T': 5, 'G': 3, 'C': 2}


{'A': 2, 'T': 5, 'G': 3, 'C': 2}


### 35. String Manipulation - GC Skew Calculation
#### Question: Write a function to calculate the cumulative GC skew, (G - C) / (G + C), at each position in the DNA sequence "ATGCGTACGTT".

In [36]:
def gc_skew(dna_sequence):
    g_count = 0
    c_count = 0
    skew = []
    for base in dna_sequence:
        if base == 'G':
            g_count += 1
        elif base == 'C':
            c_count += 1
        if g_count + c_count != 0:
            skew.append((g_count - c_count) / (g_count + c_count))
        else:
            skew.append(0)
    return skew

dna_sequence = "ATGCGTACGTT"
print(gc_skew(dna_sequence))
# Output: [0, 0, 1.0, 0.0, 0.3333333333333333, 0.3333333333333333, 0.3333333333333333, 0.0, 0.2, 0.2, 0.2]


[0, 0, 1.0, 0.0, 0.3333333333333333, 0.3333333333333333, 0.3333333333333333, 0.0, 0.2, 0.2, 0.2]


### 36. List & Set - Finding Common and Unique Nucleotides
#### Question: Write a function to find common and unique nucleotides between two DNA sequences "ATGCGT" and "GATTACA".

In [37]:
def common_and_unique_nucleotides(seq1, seq2):
    set1 = set(seq1)
    set2 = set(seq2)
    common = set1.intersection(set2)
    unique = set1.symmetric_difference(set2)
    return common, unique

seq1 = "ATGCGT"
seq2 = "GATTACA"
print(common_and_unique_nucleotides(seq1, seq2))  # Output (set order may vary): ({'A', 'T', 'G', 'C'}, set())


({'C', 'G', 'T', 'A'}, set())


### 37. String & List - Reversing Transcription
#### Question: Write a function to reverse transcribe an mRNA sequence "AUGCGUACGUU" back into a DNA sequence.

In [38]:
def reverse_transcribe(mrna_sequence):
    rna_to_dna = {'A': 'T', 'U': 'A', 'C': 'G', 'G': 'C'}
    return ''.join(rna_to_dna[base] for base in mrna_sequence)

mrna_sequence = "AUGCGUACGUU"
print(reverse_transcribe(mrna_sequence))  # Output: TACGCATGCAA


TACGCATGCAA


### 38. List & String - Translating Overlapping Codons
#### Question: Write a function to translate overlapping codons in a DNA sequence "ATGCGTACGTT" by one base at a time.

In [39]:
def translate_overlapping(dna_sequence):
    codon_to_amino_acid = {
        'ATG': 'Methionine', 'CGT': 'Arginine', 'TAC': 'Tyrosine', 'GTT': 'Valine',
        'TAA': 'Stop', 'TAG': 'Stop', 'TGA': 'Stop'
    }
    translations = []
    for i in range(len(dna_sequence) - 2):
        codon = dna_sequence[i:i+3]
        translations.append(codon_to_amino_acid.get(codon, 'Unknown'))
    return translations

dna_sequence = "ATGCGTACGTT"
print(translate_overlapping(dna_sequence))
# Output: ['Methionine', 'Unknown', 'Unknown', 'Arginine', 'Unknown', 'Tyrosine', 'Unknown', 'Arginine', 'Valine']


['Methionine', 'Unknown', 'Unknown', 'Arginine', 'Unknown', 'Tyrosine', 'Unknown', 'Arginine', 'Valine']


### 39. List & Conditional Statements - Detecting Frameshifts
#### Question: Write a function to detect frameshift mutations between two sequences "ATGCGTACGTT" and "ATCGTACGTT".

In [40]:
def detect_frameshift(seq1, seq2):
    if len(seq1) != len(seq2):
        return "Length mismatch, potential frameshift"

    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    if differences % 3 != 0:
        return "Frameshift detected"
    return "No frameshift detected"

seq1 = "ATGCGTACGTT"
seq2 = "ATCGTACGTT"
print(detect_frameshift(seq1, seq2))  # Output: Length mismatch, potential frameshift


Length mismatch, potential frameshift
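
A frameshift comes from an insertion or deletion whose length is not a multiple of 3; substitutions alone do not shift the reading frame. A rough sketch that reasons from the net length difference is shown below. It assumes a single indel separates the two sequences (several indels could cancel out), and `detect_frameshift_by_length` is a name introduced here for illustration.

```python
def detect_frameshift_by_length(seq1, seq2):
    """Classify the net length difference between two sequences."""
    length_difference = abs(len(seq1) - len(seq2))
    if length_difference == 0:
        return "No indel detected"
    if length_difference % 3 != 0:
        # 1, 2, 4, 5, ... extra or missing bases shift the reading frame.
        return "Frameshift likely"
    return "In-frame indel likely"

print(detect_frameshift_by_length("ATGCGTACGTT", "ATCGTACGTT"))  # Frameshift likely
print(detect_frameshift_by_length("ATGCGTACG", "ATGACG"))        # In-frame indel likely
```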


### 40. String & List - Extracting Introns and Exons
#### Question: Write a function to extract exons and introns from a DNA sequence "ATGCGTACGTT" with exons at positions [(0, 3), (4, 7)].

In [41]:
def extract_exons_introns(dna_sequence, exon_positions):
    exons = []
    last_exon_end = 0
    for start, end in exon_positions:
        exons.append(dna_sequence[start:end])
        last_exon_end = end
    introns = dna_sequence[:exon_positions[0][0]] + dna_sequence[last_exon_end:]
    return exons, introns

dna_sequence = "ATGCGTACGTT"
exon_positions = [(0, 3), (4, 7)]
print(extract_exons_introns(dna_sequence, exon_positions))
# Output: (['ATG', 'GTA'], 'CGTT')


(['ATG', 'GTA'], 'CGTT')
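
Notice that the base at index 3 ('C'), which lies between the two exons, appears in neither part of the result: the function only collects what comes before the first exon and after the last one. A sketch that also keeps the gaps between exons is given below. It assumes the exon intervals do not overlap, it calls every non-exon stretch a "non-exon segment" rather than claiming it is a true intron, and `split_exons_and_gaps` is a hypothetical name.

```python
def split_exons_and_gaps(dna_sequence, exon_positions):
    """Return (exons, non_exon_segments) for half-open (start, end) intervals."""
    exons, gaps = [], []
    previous_end = 0
    for start, end in sorted(exon_positions):
        if start > previous_end:
            # Bases between the previous exon and this one.
            gaps.append(dna_sequence[previous_end:start])
        exons.append(dna_sequence[start:end])
        previous_end = end
    if previous_end < len(dna_sequence):
        gaps.append(dna_sequence[previous_end:])  # tail after the last exon
    return exons, gaps

print(split_exons_and_gaps("ATGCGTACGTT", [(0, 3), (4, 7)]))
# (['ATG', 'GTA'], ['C', 'CGTT'])
```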


### 41. String & Dictionary - Codon Usage Frequency

#### Question: Write a function that calculates the frequency of each codon in a DNA sequence "ATGCGTACGTTATGCGT" and returns a dictionary with the counts.



In [42]:
def codon_usage(dna_sequence):
    codon_count = {}
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i+3]
        if codon in codon_count:
            codon_count[codon] += 1
        else:
            codon_count[codon] = 1
    return codon_count

dna_sequence = "ATGCGTACGTTATGCGT"
print(codon_usage(dna_sequence))
# Output: {'ATG': 1, 'CGT': 1, 'ACG': 1, 'TTA': 1, 'TGC': 1}


{'ATG': 1, 'CGT': 1, 'ACG': 1, 'TTA': 1, 'TGC': 1}


### 42. Nested Loops - Finding Longest ORF

#### Question: Write a function to find the longest open reading frame (ORF) in the DNA sequence "ATGCGTACGTTATGCGTTAA" that starts with 'ATG' and ends with a stop codon.



In [43]:
def longest_orf(dna_sequence):
    stop_codons = {'TAA', 'TAG', 'TGA'}
    longest_orf = ''
    length = len(dna_sequence)
    for i in range(length - 2):
        if dna_sequence[i:i+3] == 'ATG':
            for j in range(i, length - 2, 3):
                codon = dna_sequence[j:j+3]
                if codon in stop_codons:
                    orf = dna_sequence[i:j+3]
                    if len(orf) > len(longest_orf):
                        longest_orf = orf
                    break
    return longest_orf

dna_sequence = "ATGCGTACGTTATGCGTTAA"
print(longest_orf(dna_sequence))  # Output: ATGCGTTAA


ATGCGTTAA


### 43. List & Set - Unique Amino Acids in Protein Sequence

#### Question: Write a function to identify unique amino acids in a protein sequence "MKVLYRFY" using a set.

##### Output: {'R', 'V', 'K', 'L', 'Y', 'M', 'F'}

In [44]:
def unique_amino_acids(protein_sequence):
    return set(protein_sequence)

protein_sequence = "MKVLYRFY"
print(unique_amino_acids(protein_sequence))  # Output: {'R', 'V', 'K', 'L', 'Y', 'M', 'F'}


{'Y', 'R', 'V', 'K', 'F', 'M', 'L'}


### 44. String Manipulation - Finding Overlapping K-mers

#### Question: Write a function to find all overlapping k-mers of length 3 in the DNA sequence "ATGCGTACGTT".



In [45]:
def overlapping_kmers(dna_sequence, k):
    kmers = []
    for i in range(len(dna_sequence) - k + 1):
        kmers.append(dna_sequence[i:i+k])
    return kmers

dna_sequence = "ATGCGTACGTT"
print(overlapping_kmers(dna_sequence, 3))  # Output: ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGT', 'GTT']


['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGT', 'GTT']


### 45. List & Conditional Statements - GC Content Windows

#### Question: Write a function to calculate the GC content in non-overlapping windows of size 4 in the DNA sequence "ATGCGTACGTT".

##### Output: [50.0, 50.0, 33.33333333333333]


In [46]:
def gc_content_windows(dna_sequence, window_size):
    gc_contents = []
    for i in range(0, len(dna_sequence), window_size):
        window = dna_sequence[i:i+window_size]
        gc_count = window.count('G') + window.count('C')
        gc_content = (gc_count / len(window)) * 100
        gc_contents.append(gc_content)
    return gc_contents

dna_sequence = "ATGCGTACGTT"
print(gc_content_windows(dna_sequence, 4))  # Output: [50.0, 50.0, 33.33333333333333]


[50.0, 50.0, 33.33333333333333]
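
The last value is 33.33 rather than a quarter-step because the final window holds only three bases ("GTT"), and the function divides by the actual window length. If partial windows should be ignored instead, a sketch like the following could be used; `gc_content_full_windows` is an illustrative name.

```python
def gc_content_full_windows(dna_sequence, window_size):
    """GC percentage for each complete, non-overlapping window."""
    gc_contents = []
    # The upper bound stops before any window shorter than window_size.
    for i in range(0, len(dna_sequence) - window_size + 1, window_size):
        window = dna_sequence[i:i + window_size]
        gc_count = window.count('G') + window.count('C')
        gc_contents.append(gc_count / window_size * 100)
    return gc_contents

print(gc_content_full_windows("ATGCGTACGTT", 4))  # [50.0, 50.0]
```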


### 46. List & String - Translating Codon Frame Shifts

#### Question: Write a function to translate all possible frames (0, +1, +2) in the DNA sequence "ATGCGTACGTT".

##### Output: [['Methionine', 'Arginine', 'Unknown'], ['Unknown', 'Unknown', 'Arginine'], ['Unknown', 'Tyrosine', 'Valine']]

In [47]:
def translate_frames(dna_sequence):
    codon_to_amino_acid = {
        'ATG': 'Methionine', 'CGT': 'Arginine', 'TAC': 'Tyrosine', 'GTT': 'Valine',
        'TAA': 'Stop', 'TAG': 'Stop', 'TGA': 'Stop'
    }
    frames = []
    for frame in range(3):
        translation = []
        for i in range(frame, len(dna_sequence) - 2, 3):
            codon = dna_sequence[i:i+3]
            translation.append(codon_to_amino_acid.get(codon, 'Unknown'))
        frames.append(translation)
    return frames

dna_sequence = "ATGCGTACGTT"
print(translate_frames(dna_sequence))
# Output: [['Methionine', 'Arginine', 'Unknown'], ['Unknown', 'Unknown', 'Arginine'], ['Unknown', 'Tyrosine', 'Valine']]


[['Methionine', 'Arginine', 'Unknown'], ['Unknown', 'Unknown', 'Arginine'], ['Unknown', 'Tyrosine', 'Valine']]


### 47. String & Dictionary - Complementary RNA Strand
#### Question: Write a function to generate the complementary RNA strand for the sequence "AUGCGUACGUU".
##### Output: 'UACGCAUGCAA'

In [48]:
def complementary_rna(rna_sequence):
    complement = {'A': 'U', 'U': 'A', 'C': 'G', 'G': 'C'}
    return ''.join(complement[base] for base in rna_sequence)

rna_sequence = "AUGCGUACGUU"
print(complementary_rna(rna_sequence))  # Output: UACGCAUGCAA


UACGCAUGCAA


### 49. List & Loops - Translating Reverse Complement
#### Question: Write a function to translate the reverse complement of a DNA sequence "ATGCGTACGTT" into a protein sequence.


##### Output: ['Unknown', 'Unknown', 'Unknown']

In [49]:
def reverse_complement(dna_sequence):
    complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    return ''.join(complement[base] for base in reversed(dna_sequence))

def translate_reverse_complement(dna_sequence):
    codon_to_amino_acid = {
        'ATG': 'Methionine', 'CGT': 'Arginine', 'TAC': 'Tyrosine', 'GTT': 'Valine',
        'TAA': 'Stop', 'TAG': 'Stop', 'TGA': 'Stop'
    }
    rev_comp = reverse_complement(dna_sequence)
    protein = []
    for i in range(0, len(rev_comp) - 2, 3):
        codon = rev_comp[i:i+3]
        protein.append(codon_to_amino_acid.get(codon, 'Unknown'))
    return protein

dna_sequence = "ATGCGTACGTT"
print(translate_reverse_complement(dna_sequence))
# Output: ['Unknown', 'Unknown', 'Unknown']


['Unknown', 'Unknown', 'Unknown']


### 50. List & String - Identifying Start and Stop Codons

#### Question: Write a function to find all positions of start codons 'ATG' and stop codons ('TAA', 'TAG', 'TGA') in a DNA sequence "ATGCGTATGCGTTAA".

##### Output: ([0, 6], [12])

In [50]:
def find_start_stop_codons(dna_sequence):
    start_codons = []
    stop_codons = []
    for i in range(len(dna_sequence) - 2):
        codon = dna_sequence[i:i+3]
        if codon == 'ATG':
            start_codons.append(i)
        elif codon in ('TAA', 'TAG', 'TGA'):
            stop_codons.append(i)
    return start_codons, stop_codons

dna_sequence = "ATGCGTATGCGTTAA"
print(find_start_stop_codons(dna_sequence))  # Output: ([0, 6], [12])


([0, 6], [12])


### 51. String & List - Generating K-mer Frequencies

#### Question: Write a function to calculate the frequency of each k-mer of length 3 in the DNA sequence "ATGCGTACGTT".

##### Output: {'ATG': 1, 'TGC': 1, 'GCG': 1, 'CGT': 2, 'GTA': 1, 'TAC': 1, 'ACG': 1, 'GTT': 1}


In [51]:
def kmer_frequencies(dna_sequence, k):
    kmer_count = {}
    for i in range(len(dna_sequence) - k + 1):
        kmer = dna_sequence[i:i+k]
        if kmer in kmer_count:
            kmer_count[kmer] += 1
        else:
            kmer_count[kmer] = 1
    return kmer_count

dna_sequence = "ATGCGTACGTT"
print(kmer_frequencies(dna_sequence, 3))
# Output: {'ATG': 1, 'TGC': 1, 'GCG': 1, 'CGT': 2, 'GTA': 1, 'TAC': 1, 'ACG': 1, 'GTT': 1}


{'ATG': 1, 'TGC': 1, 'GCG': 1, 'CGT': 2, 'GTA': 1, 'TAC': 1, 'ACG': 1, 'GTT': 1}


### 52. Dictionary & List - Counting Dinucleotides

#### Question: Write a function to count the frequency of each dinucleotide pair (e.g., 'AA', 'AC', etc.) in a DNA sequence "ATGCGTACGTT".

##### Output: {'AT': 1, 'TG': 1, 'GC': 1, 'CG': 2, 'GT': 2, 'TA': 1, 'AC': 1, 'TT': 1}


In [52]:
def dinucleotide_frequencies(dna_sequence):
    dinucleotide_count = {}
    for i in range(len(dna_sequence) - 1):
        dinucleotide = dna_sequence[i:i+2]
        if dinucleotide in dinucleotide_count:
            dinucleotide_count[dinucleotide] += 1
        else:
            dinucleotide_count[dinucleotide] = 1
    return dinucleotide_count

dna_sequence = "ATGCGTACGTT"
print(dinucleotide_frequencies(dna_sequence))
# Output: {'AT': 1, 'TG': 1, 'GC': 1, 'CG': 2, 'GT': 2, 'TA': 1, 'AC': 1, 'TT': 1}


{'AT': 1, 'TG': 1, 'GC': 1, 'CG': 2, 'GT': 2, 'TA': 1, 'AC': 1, 'TT': 1}


### 53. List & String - Finding Reverse Palindromes

#### Question: Write a function to find all reverse palindromic sequences of length 6 in the DNA sequence "ATGCGTACGCGTACGT".

##### Output: ['CGTACG', 'ACGCGT', 'CGTACG']

In [53]:
def reverse_palindromes(dna_sequence, length):
    palindromes = []
    for i in range(len(dna_sequence) - length + 1):
        segment = dna_sequence[i:i+length]
        if segment == reverse_complement(segment):
            palindromes.append(segment)
    return palindromes

dna_sequence = "ATGCGTACGCGTACGT"
print(reverse_palindromes(dna_sequence, 6))  # Output: ['CGTACG', 'ACGCGT', 'CGTACG']


['CGTACG', 'ACGCGT', 'CGTACG']
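
The cell above relies on a `reverse_complement` helper defined elsewhere in the notebook. As a self-contained sketch, the version below defines a stand-in helper (`reverse_complement_sketch`, a hypothetical name) and also records where each palindrome starts, which shows why `'CGTACG'` is reported twice.

```python
# Hypothetical stand-in for the notebook's reverse_complement helper.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement_sketch(dna_sequence):
    """Complement every base, then reverse the string."""
    return dna_sequence.translate(COMPLEMENT)[::-1]

def reverse_palindromes_with_positions(dna_sequence, length):
    """Return (start_index, segment) for every window equal to its own reverse complement."""
    hits = []
    for i in range(len(dna_sequence) - length + 1):
        segment = dna_sequence[i:i + length]
        if segment == reverse_complement_sketch(segment):
            hits.append((i, segment))
    return hits

hits = reverse_palindromes_with_positions("ATGCGTACGCGTACGT", 6)
print(hits)  # [(3, 'CGTACG'), (6, 'ACGCGT'), (9, 'CGTACG')]
```

The window `'GCGTAC'` is not a reverse palindrome: its reverse complement is `'GTACGC'`.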


### 54. String & List - Protein Subsequence Search

#### Question: Write a function to find all occurrences of a protein subsequence "LYR" in a protein sequence "MKVLYRLYRFY".

##### Output: [3, 6]

In [54]:
def find_protein_subsequence(protein_sequence, subsequence):
    positions = []
    subseq_len = len(subsequence)
    for i in range(len(protein_sequence) - subseq_len + 1):
        if protein_sequence[i:i+subseq_len] == subsequence:
            positions.append(i)
    return positions

protein_sequence = "MKVLYRLYRFY"
subsequence = "LYR"
print(find_protein_subsequence(protein_sequence, subsequence))  # Output: [3, 6]


[3, 6]


### 55. List & Dictionary - Finding Codon Usage Bias

#### Question: Write a function to compare codon usage in two different DNA sequences "ATGCGTACGTT" and "ATGCGTAGCGT".

##### Output: ({'ATG': 1, 'CGT': 1, 'ACG': 1}, {'ATG': 1, 'CGT': 1, 'AGC': 1})


In [55]:
def codon_usage_bias(seq1, seq2):
    def count_codons(dna_sequence):
        codon_count = {}
        for i in range(0, len(dna_sequence) - 2, 3):
            codon = dna_sequence[i:i+3]
            if codon in codon_count:
                codon_count[codon] += 1
            else:
                codon_count[codon] = 1
        return codon_count

    usage1 = count_codons(seq1)
    usage2 = count_codons(seq2)

    return usage1, usage2

seq1 = "ATGCGTACGTT"
seq2 = "ATGCGTAGCGT"
print(codon_usage_bias(seq1, seq2))
# Output: ({'ATG': 1, 'CGT': 1, 'ACG': 1}, {'ATG': 1, 'CGT': 1, 'AGC': 1})
# The trailing two bases of each sequence ('TT' and 'GT') do not form a complete codon and are ignored.


({'ATG': 1, 'CGT': 1, 'ACG': 1}, {'ATG': 1, 'CGT': 1, 'AGC': 1})
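
Returning the two count dictionaries leaves the actual comparison to the reader. One possible way to make the comparison explicit is to report only the codons whose counts differ. This is a minimal sketch under that assumption; `count_codons_sketch` and `codon_usage_difference` are illustrative names, not part of the original solution.

```python
from collections import Counter

def count_codons_sketch(dna_sequence):
    """Count complete, non-overlapping codons in reading frame 0."""
    usable = len(dna_sequence) - len(dna_sequence) % 3  # drop trailing partial codon
    return Counter(dna_sequence[i:i + 3] for i in range(0, usable, 3))

def codon_usage_difference(seq1, seq2):
    """Map each codon with unequal usage to (count in seq1) - (count in seq2)."""
    usage1 = count_codons_sketch(seq1)
    usage2 = count_codons_sketch(seq2)
    all_codons = sorted(set(usage1) | set(usage2))
    # Counter returns 0 for codons it has not seen, so missing keys are safe here.
    return {codon: usage1[codon] - usage2[codon]
            for codon in all_codons if usage1[codon] != usage2[codon]}

diff = codon_usage_difference("ATGCGTACGTT", "ATGCGTAGCGT")
print(diff)  # {'ACG': 1, 'AGC': -1}
```

A positive value means the codon is more frequent in the first sequence, a negative value that it is more frequent in the second.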


### 56. List & String - Converting DNA to Protein Using a Custom Genetic Code

#### Question: Write a function to translate a DNA sequence "ATGCGTACGTT" into a protein using a custom genetic code mapping.

##### Output: ['Methionine', 'Arginine', 'Unknown']


In [56]:
def translate_custom(dna_sequence, genetic_code):
    protein = []
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i+3]
        protein.append(genetic_code.get(codon, 'Unknown'))
    return protein

custom_genetic_code = {
    'ATG': 'Methionine', 'CGT': 'Arginine', 'TAC': 'Tyrosine', 'GTT': 'Valine',
    'TAA': 'Stop', 'TAG': 'Stop', 'TGA': 'Stop'
}

dna_sequence = "ATGCGTACGTT"
print(translate_custom(dna_sequence, custom_genetic_code))
# Output: ['Methionine', 'Arginine', 'Unknown']
# The third codon is 'ACG', which is not in custom_genetic_code; the trailing 'TT' is an incomplete codon.


['Methionine', 'Arginine', 'Unknown']


### 57. List & Set - Finding Unique Codons

#### Question: Write a function to find all unique codons in a DNA sequence "ATGCGTACGTTATGCGT".

##### Output: {'ATG', 'CGT', 'ACG', 'TTA', 'TGC'} (set order may vary)

In [57]:
def unique_codons(dna_sequence):
    codons = set()
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i+3]
        codons.add(codon)
    return codons

dna_sequence = "ATGCGTACGTTATGCGT"
print(unique_codons(dna_sequence))  # Output (set order may vary): {'ATG', 'CGT', 'ACG', 'TTA', 'TGC'}


{'TGC', 'TTA', 'CGT', 'ACG', 'ATG'}


### 58. String & List - Translating Protein Sequences with Multiple Start Codons

#### Question: Write a function to translate all possible protein sequences starting from each occurrence of 'ATG' in a DNA sequence "ATGCGTATGCGTTAA".

##### Output: [['Methionine', 'Arginine', 'Methionine', 'Arginine'], ['Methionine', 'Arginine']]


In [58]:
def translate_from_all_starts(dna_sequence, genetic_code):
    proteins = []
    for i in range(len(dna_sequence) - 2):
        if dna_sequence[i:i+3] == 'ATG':
            protein = []
            for j in range(i, len(dna_sequence) - 2, 3):
                codon = dna_sequence[j:j+3]
                amino_acid = genetic_code.get(codon, 'Stop')
                if amino_acid == 'Stop':
                    break
                protein.append(amino_acid)
            proteins.append(protein)
    return proteins

genetic_code = {
    'ATG': 'Methionine', 'CGT': 'Arginine', 'TAC': 'Tyrosine', 'GTT': 'Valine',
    'TAA': 'Stop', 'TAG': 'Stop', 'TGA': 'Stop'
}

dna_sequence = "ATGCGTATGCGTTAA"
print(translate_from_all_starts(dna_sequence, genetic_code))
# Output: [['Methionine', 'Arginine', 'Methionine', 'Arginine'], ['Methionine', 'Arginine']]


[['Methionine', 'Arginine', 'Methionine', 'Arginine'], ['Methionine', 'Arginine']]
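
Note that `genetic_code.get(codon, 'Stop')` in the cell above treats any codon missing from the table as a stop codon. If that is not the intended behaviour, one possible variant keeps unknown codons as `'Unknown'` and stops only at a real stop codon. The function name and the small table below are assumptions made for this sketch.

```python
def translate_from_all_starts_sketch(dna_sequence, genetic_code):
    """Translate from every 'ATG'; stop only at codons mapped to 'Stop'."""
    proteins = []
    for i in range(len(dna_sequence) - 2):
        if dna_sequence[i:i + 3] != "ATG":
            continue
        protein = []
        for j in range(i, len(dna_sequence) - 2, 3):
            # Codons absent from the table are kept as 'Unknown' instead of ending translation.
            amino_acid = genetic_code.get(dna_sequence[j:j + 3], "Unknown")
            if amino_acid == "Stop":
                break
            protein.append(amino_acid)
        proteins.append(protein)
    return proteins

# Hypothetical, deliberately incomplete table for illustration.
small_code = {"ATG": "Methionine", "CGT": "Arginine", "TAA": "Stop"}

print(translate_from_all_starts_sketch("ATGACGTAA", small_code))
# [['Methionine', 'Unknown']]
```

For "ATGCGTATGCGTTAA" every codon before the stop is in the table, so this variant gives the same result as the original function.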


### 59. List & Dictionary - Calculating Codon Usage in Different Frames

#### Question: Write a function to calculate codon usage in all three reading frames of a DNA sequence "ATGCGTACGTT".

##### Output: [{'ATG': 1, 'CGT': 1, 'ACG': 1}, {'TGC': 1, 'GTA': 1, 'CGT': 1}, {'GCG': 1, 'TAC': 1, 'GTT': 1}]


In [59]:
def codon_usage_in_frames(dna_sequence):
    def count_codons(dna_sequence):
        codon_count = {}
        for i in range(0, len(dna_sequence) - 2, 3):
            codon = dna_sequence[i:i+3]
            if codon in codon_count:
                codon_count[codon] += 1
            else:
                codon_count[codon] = 1
        return codon_count

    frames = [dna_sequence, dna_sequence[1:], dna_sequence[2:]]
    codon_usage = [count_codons(frame) for frame in frames]
    return codon_usage

dna_sequence = "ATGCGTACGTT"
print(codon_usage_in_frames(dna_sequence))
# Output: [{'ATG': 1, 'CGT': 1, 'ACG': 1}, {'TGC': 1, 'GTA': 1, 'CGT': 1}, {'GCG': 1, 'TAC': 1, 'GTT': 1}]


[{'ATG': 1, 'CGT': 1, 'ACG': 1}, {'TGC': 1, 'GTA': 1, 'CGT': 1}, {'GCG': 1, 'TAC': 1, 'GTT': 1}]


### 60. List & Conditional Statements - Identifying ORFs in Reverse Complement

#### Question: Write a function to find all open reading frames (ORFs) in the reverse complement of a DNA sequence "ATGCGTACGTT" that start with 'ATG' and end with a stop codon.



In [60]:
def find_orfs_reverse_complement(dna_sequence):
    stop_codons = {'TAA', 'TAG', 'TGA'}
    reverse_dna = reverse_complement(dna_sequence)
    orfs = []
    for i in range(len(reverse_dna) - 2):
        if reverse_dna[i:i+3] == 'ATG':
            for j in range(i, len(reverse_dna) - 2, 3):
                codon = reverse_dna[j:j+3]
                if codon in stop_codons:
                    orf = reverse_dna[i:j+3]
                    orfs.append(orf)
                    break
    return orfs

dna_sequence = "ATGCGTACGTT"
print(find_orfs_reverse_complement(dna_sequence))
# Output: []
# The reverse complement is 'AACGTACGCAT', which contains no 'ATG', so no ORF is found.


[]
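
Because the reverse complement of "ATGCGTACGTT" contains no start codon, the example above never exercises the ORF-extraction branch. The sketch below uses a made-up sequence, "TTATTTCAT", chosen so that its reverse complement is "ATGAAATAA". The helper names are hypothetical and the stop-codon scan starts one codon after the start codon, but the logic otherwise follows the cell above.

```python
# Hypothetical stand-in for the notebook's reverse_complement helper.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement_sketch(dna_sequence):
    return dna_sequence.translate(COMPLEMENT)[::-1]

def find_orfs_sketch(dna_sequence):
    """Return every substring that starts at 'ATG' and ends at the first in-frame stop codon."""
    stop_codons = {"TAA", "TAG", "TGA"}
    orfs = []
    for i in range(len(dna_sequence) - 2):
        if dna_sequence[i:i + 3] == "ATG":
            # Scan codon by codon in the frame set by this start codon.
            for j in range(i + 3, len(dna_sequence) - 2, 3):
                if dna_sequence[j:j + 3] in stop_codons:
                    orfs.append(dna_sequence[i:j + 3])
                    break
    return orfs

example = "TTATTTCAT"                      # made-up sequence for illustration
rc = reverse_complement_sketch(example)    # 'ATGAAATAA'
print(find_orfs_sketch(rc))                # ['ATGAAATAA']
```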


### 61. String & Set - Finding Overlapping Motifs

#### Question: Write a function to find all occurrences of the motif "ACGT" in the DNA sequence "ATGCGTACGTTACGTACGT".



In [61]:
def find_overlapping_motifs(dna_sequence, motif):
    positions = []
    motif_len = len(motif)
    for i in range(len(dna_sequence) - motif_len + 1):
        if dna_sequence[i:i+motif_len] == motif:
            positions.append(i)
    return positions

dna_sequence = "ATGCGTACGTTACGTACGT"
motif = "ACGT"
print(find_overlapping_motifs(dna_sequence, motif))  # Output: [6, 11, 15]


[6, 11, 15]
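
An alternative way to collect overlapping matches is a zero-width lookahead with the standard `re` module: the pattern `(?=MOTIF)` matches at a position without consuming characters, so the next search can begin one base later. This is a sketch of that idea; `find_motif_lookahead` is an illustrative name.

```python
import re

def find_motif_lookahead(dna_sequence, motif):
    """Return the 0-based start index of every (possibly overlapping) motif occurrence."""
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [match.start() for match in pattern.finditer(dna_sequence)]

print(find_motif_lookahead("ATGCGTACGTTACGTACGT", "ACGT"))  # [6, 11, 15]
print(find_motif_lookahead("AAAA", "AA"))                   # [0, 1, 2]
```

The second call shows genuinely overlapping hits, which the "ACGT" example does not contain.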


### 62. List & Dictionary - Counting Triplet Nucleotide Repeats

#### Question: Write a function to count the number of times each triplet nucleotide repeat occurs in the DNA sequence "ATGCGTACGTTACG".



In [62]:
def triplet_nucleotide_repeats(dna_sequence):
    repeats_count = {}
    for i in range(len(dna_sequence) - 2):
        triplet = dna_sequence[i:i+3]
        if triplet in repeats_count:
            repeats_count[triplet] += 1
        else:
            repeats_count[triplet] = 1
    return repeats_count

dna_sequence = "ATGCGTACGTTACG"
print(triplet_nucleotide_repeats(dna_sequence))
# Output: {'ATG': 1, 'TGC': 1, 'GCG': 1, 'CGT': 2, 'GTA': 1, 'TAC': 2, 'ACG': 2, 'GTT': 1, 'TTA': 1}


{'ATG': 1, 'TGC': 1, 'GCG': 1, 'CGT': 2, 'GTA': 1, 'TAC': 2, 'ACG': 2, 'GTT': 1, 'TTA': 1}


### 63. String & Loops - Translating a Custom Genetic Code Sequence

#### Question: Write a function to translate a DNA sequence "ATGCGTACGTT" using a custom genetic code that includes ambiguous codons (e.g., 'ATN' -> 'Methionine', 'CGN' -> 'Arginine').



In [63]:
def translate_with_ambiguity(dna_sequence, genetic_code):
    protein = []
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i+3]
        amino_acid = genetic_code.get(codon, genetic_code.get(codon[:2] + 'N', 'Unknown'))
        protein.append(amino_acid)
    return protein

genetic_code = {
    'ATG': 'Methionine', 'ATN': 'Methionine', 'CGT': 'Arginine', 'CGN': 'Arginine',
    'TAC': 'Tyrosine', 'GTT': 'Valine', 'TAA': 'Stop', 'TAG': 'Stop', 'TGA': 'Stop'
}

dna_sequence = "ATGCGTACGTT"
print(translate_with_ambiguity(dna_sequence, genetic_code))
# Output: ['Methionine', 'Arginine', 'Unknown']
# 'ACG' has neither an exact entry nor an 'ACN' fallback in genetic_code, so it maps to 'Unknown'.


['Methionine', 'Arginine', 'Unknown']


### 64. String & List - Analyzing Amino Acid Composition

#### Question: Write a function to calculate the composition of each amino acid in a protein sequence "MKVLYRFY".



In [64]:
def amino_acid_composition(protein_sequence):
    composition = {}
    for aa in protein_sequence:
        if aa in composition:
            composition[aa] += 1
        else:
            composition[aa] = 1
    return composition

protein_sequence = "MKVLYRFY"
print(amino_acid_composition(protein_sequence))
# Output: {'M': 1, 'K': 1, 'V': 1, 'L': 1, 'Y': 2, 'R': 1, 'F': 1}


{'M': 1, 'K': 1, 'V': 1, 'L': 1, 'Y': 2, 'R': 1, 'F': 1}


### 65. List & Dictionary - Calculating Nucleotide Composition in Codon Positions

#### Question: Write a function to calculate the nucleotide composition at each codon position (1st, 2nd, 3rd) in the DNA sequence "ATGCGTACGTT".

##### Output: {1: {'A': 2, 'T': 0, 'C': 1, 'G': 0}, 2: {'A': 0, 'T': 1, 'C': 1, 'G': 1}, 3: {'A': 0, 'T': 1, 'C': 0, 'G': 2}}


In [65]:
def nucleotide_composition_by_codon_position(dna_sequence):
    positions = {1: {'A': 0, 'T': 0, 'C': 0, 'G': 0},
                 2: {'A': 0, 'T': 0, 'C': 0, 'G': 0},
                 3: {'A': 0, 'T': 0, 'C': 0, 'G': 0}}
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i+3]
        for j, nucleotide in enumerate(codon, 1):
            positions[j][nucleotide] += 1
    return positions

dna_sequence = "ATGCGTACGTT"
print(nucleotide_composition_by_codon_position(dna_sequence))
# Output: {1: {'A': 2, 'T': 0, 'C': 1, 'G': 0}, 2: {'A': 0, 'T': 1, 'C': 1, 'G': 1}, 3: {'A': 0, 'T': 1, 'C': 0, 'G': 2}}


{1: {'A': 2, 'T': 0, 'C': 1, 'G': 0}, 2: {'A': 0, 'T': 1, 'C': 1, 'G': 1}, 3: {'A': 0, 'T': 1, 'C': 0, 'G': 2}}
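
The per-position tallies can also be obtained with extended slicing: after trimming the sequence to whole codons, `coding[0::3]` holds every first base, `coding[1::3]` every second base, and `coding[2::3]` every third. The sketch below assumes that approach; the function name is illustrative, and unlike the cell above it omits bases with a zero count.

```python
from collections import Counter

def composition_by_position_sketch(dna_sequence):
    """Count bases at codon positions 1, 2 and 3 using stride-3 slices."""
    usable = len(dna_sequence) - len(dna_sequence) % 3  # ignore a trailing partial codon
    coding = dna_sequence[:usable]
    return {pos + 1: Counter(coding[pos::3]) for pos in range(3)}

composition = composition_by_position_sketch("ATGCGTACGTT")
for position, counts in composition.items():
    print(position, dict(counts))
```

For "ATGCGTACGTT" the complete codons are ATG, CGT and ACG, so position 1 sees A, C, A; position 2 sees T, G, C; and position 3 sees G, T, G.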


### 66. List & Set - Identifying Non-overlapping Motifs

#### Question: Write a function to find all non-overlapping occurrences of the motif "ACG" in the DNA sequence "ATGCGTACGTTACG".

##### Output: [6, 11]

In [66]:
def find_non_overlapping_motifs(dna_sequence, motif):
    positions = []
    motif_len = len(motif)
    i = 0
    while i <= len(dna_sequence) - motif_len:
        if dna_sequence[i:i+motif_len] == motif:
            positions.append(i)
            i += motif_len
        else:
            i += 1
    return positions

dna_sequence = "ATGCGTACGTTACG"
motif = "ACG"
print(find_non_overlapping_motifs(dna_sequence, motif))  # Output: [6, 11]


[6, 11]
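
The same left-to-right, non-overlapping scan can be expressed with the built-in `str.find`, resuming each search just past the previous match. This is a minimal sketch; the function name is illustrative and the empty-motif guard is an added assumption to avoid an endless loop.

```python
def find_non_overlapping_with_find(dna_sequence, motif):
    """Return start indices of non-overlapping motif matches, scanning left to right."""
    if not motif:
        return []  # an empty motif would otherwise match forever at the same index
    positions = []
    start = dna_sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Resume after the end of this match so the next hit cannot overlap it.
        start = dna_sequence.find(motif, start + len(motif))
    return positions

print(find_non_overlapping_with_find("ATGCGTACGTTACG", "ACG"))  # [6, 11]
print(find_non_overlapping_with_find("AAAA", "AA"))             # [0, 2]
```

With "AAAA" and "AA", an overlapping search would report indices 0, 1 and 2, while the non-overlapping scan reports only 0 and 2.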


### 67. List & Dictionary - Creating a Reverse Complement Dictionary

#### Question: Write a function to create a dictionary that maps each codon to its reverse complement in a DNA sequence "ATGCGTACGTT".

##### Output: {'ATG': 'CAT', 'CGT': 'ACG', 'ACG': 'CGT'}


In [67]:
def codon_reverse_complement_dict(dna_sequence):
    reverse_complement_dict = {}
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i+3]
        reverse_complement_dict[codon] = reverse_complement(codon)
    return reverse_complement_dict

dna_sequence = "ATGCGTACGTT"
print(codon_reverse_complement_dict(dna_sequence))
# Output: {'ATG': 'CAT', 'CGT': 'ACG', 'ACG': 'CGT'}


{'ATG': 'CAT', 'CGT': 'ACG', 'ACG': 'CGT'}


### 68. List & String - Transcribing DNA with Ambiguous Bases

#### Question: Write a function to transcribe a DNA sequence with ambiguous bases "ATGCGTNCGTT" into an RNA sequence, where 'N' represents any nucleotide.

##### Output: 'UACGCANGCAA'

In [68]:
def transcribe_with_ambiguity(dna_sequence):
    transcription = {'A': 'U', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N'}
    return ''.join(transcription[base] for base in dna_sequence)

dna_sequence = "ATGCGTNCGTT"
print(transcribe_with_ambiguity(dna_sequence))  # Output: 'UACGCANGCAA'


UACGCANGCAA


### 69. String & List - Translating Codons with Degenerate Bases

#### Question: Write a function to translate the consecutive (non-overlapping) codons in a DNA sequence "ATGCGTACGTNNN" using a genetic code that includes degenerate bases.

##### Output: ['Methionine', 'Arginine', 'Threonine', 'Any']


In [69]:
def translate_with_degenerate_bases(dna_sequence, genetic_code):
    protein = []
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i+3]
        amino_acid = genetic_code.get(codon, 'Unknown')
        if amino_acid != 'Unknown':
            protein.append(amino_acid)
    return protein

degenerate_genetic_code = {
    'ATG': 'Methionine', 'CGT': 'Arginine', 'ACG': 'Threonine', 'TNN': 'Any'
}

dna_sequence = "ATGCGTACGTNNN"
print(translate_with_degenerate_bases(dna_sequence, degenerate_genetic_code))
# Output: ['Methionine', 'Arginine', 'Threonine', 'Any']


['Methionine', 'Arginine', 'Threonine', 'Any']


### 70. List & String - Finding Longest Protein Coding Sequence

#### Question: Write a function to find the longest protein coding sequence in the DNA sequence "ATGCGTACGTTGAAATGCCGTTAG".

In [70]:
def longest_protein_coding_sequence(dna_sequence, genetic_code):
    stop_codons = {'TAA', 'TAG', 'TGA'}
    longest_sequence = ''
    for i in range(len(dna_sequence) - 2):
        if dna_sequence[i:i+3] == 'ATG':
            for j in range(i, len(dna_sequence) - 2, 3):
                codon = dna_sequence[j:j+3]
                if codon in stop_codons:
                    protein_sequence = dna_sequence[i:j+3]
                    if len(protein_sequence) > len(longest_sequence):
                        longest_sequence = protein_sequence
                    break
    return longest_sequence

genetic_code = {
    'ATG': 'Methionine', 'CGT': 'Arginine', 'TAC': 'Tyrosine', 'GTT': 'Valine',
    'TAA': 'Stop', 'TAG': 'Stop', 'TGA': 'Stop'
}

dna_sequence = "ATGCGTACGTTGAAATGCCGTTAG"
print(longest_protein_coding_sequence(dna_sequence, genetic_code))
# Output: 'ATGCGTACGTTGAAATGCCGTTAG'


ATGCGTACGTTGAAATGCCGTTAG


## Please use the sequences below to verify your results

In [71]:
dna_sequences = [
    "ATGCGTACGTTGACGTAGCCTAGCGTACGATTACGCGTATGGGCTACTGCGTACGTTGCGTATGCGTACGTTGAATGCGT",
    "GCTAGCGTACGTTGCGTAGCGTACGTGACGTACTGCGTAGCTAGCGTTACGTTACGCGTACGATGCGTACGTGCGTGACG",
    "ATGCGTACGTTGCGTATGCGTACGTTGACGTAGCTAGCGTACGTTACGCGTACGCTGCGTAGCGTACGCGTATGCGTACG",
    "CGTACGTTGACGTAGCGTACGTGCGTACGCGTACGCTAGCGTACGTTGCGTACGTACGCGTACGTTGCGTACGCGTACGT",
    "ACGTTGCGTACGCGTACGTTGACGTAGCGTACGTTACGCGTACGTGCGTAGCGTACGCTGCGTACGCGTACGTTGCGTAC"
]

protein_sequences = [
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY",
    "MVVVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY",
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY",
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY",
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
]

In [72]:
dna_sequences = [
    "ATGCGTACGTTGACGTAGCCTAGCGTACGATTACGCGTATGGGCTACTGCGTACGTTGCGTATGCGTACGTTGAATGCGT"
    "TACGTTGACGTAGCGTACGTGCGTAGCTAGCGTACGTTACGCGTACGATGCGTACGTGCGTGACGTACGTTACGCGTACG"
    "TAGCGTACGTTGCGTACGTTGACGTAGCGTACGCTGCGTAGCGTACGCGTATGCGTACGTTGACGTAGCGTACGTGCGTA"
    "GCTAGCGTACGTTGCGTACGTACGCGTACGTTGCGTACGCGTACGTTGACGTAGCGTACGTTACGCGTACGTGCGTAGCG",

    "GCTAGCGTACGTTGCGTAGCGTACGTGACGTACTGCGTAGCTAGCGTTACGTTACGCGTACGATGCGTACGTGCGTGACG"
    "TACGTTGACGTAGCGTACGTTACGCGTACGATGCGTACGTGCGTGACGTACGTTACGCGTACGTGCGTAGCGTACGTTAC"
    "GCGTACGTGCGTACGTTGACGTAGCGTACGTTACGCGTACGTGCGTAGCGTACGCTGCGTACGCGTACGTTGCGTACGCG"
    "TACGTGCGTACGTTGACGTAGCGTACGTTACGCGTACGTGCGTAGCGTACGTTACGCGTACGTTGACGTAGCGTACGTTG",

    "ATGCGTACGTTGCGTATGCGTACGTTGACGTAGCTAGCGTACGTTACGCGTACGCTGCGTAGCGTACGCGTATGCGTACG"
    "GCGTACGTTGCGTAGCGTACGTTGACGTAGCGTACGCTGCGTAGCGTACGCGTATGCGTACGTTGACGTAGCGTACGTGC"
    "GTTACGCGTACGTGCGTAGCGTACGCTGCGTACGCGTACGTTGCGTACGTGCGTACGTTGACGTAGCGTACGTTACGCGT"
    "ACGTGCGTAGCGTACGTTGACGTAGCGTACGTTACGCGTACGTGCGTAGCGTACGTTGCGTACGCGTACGTTGACGTAGC",

    "CGTACGTTGACGTAGCGTACGTGCGTACGCGTACGCTAGCGTACGTTGCGTACGTACGCGTACGTTGCGTACGCGTACGT"
    "GCGTACGTTGACGTAGCGTACGTGCGTACGCGTACGTTGACGTAGCGTACGTTACGCGTACGCTGCGTACGCGTACGTTG"
    "CGTACGTTGACGTAGCGTACGTGCGTACGCGTACGCTAGCGTACGTTGCGTACGTACGCGTACGTTGCGTACGCGTACGT"
    "GCGTACGTTGACGTAGCGTACGTGCGTACGCGTACGTTGACGTAGCGTACGTTACGCGTACGCTGCGTACGCGTACGTTG",

    "ACGTTGCGTACGCGTACGTTGACGTAGCGTACGTTACGCGTACGTGCGTAGCGTACGCTGCGTACGCGTACGTTGCGTAC"
    "GCGTACGTGCGTACGTTGACGTAGCGTACGTTACGCGTACGTGCGTAGCGTACGCTGCGTACGCGTACGTTGCGTACGCG"
    "TACGTGCGTAGCGTACGTTGACGTAGCGTACGTTACGCGTACGTGCGTAGCGTACGCTGCGTACGCGTACGTTGCGTACG"
    "GTTGACGTAGCGTACGTTACGCGTACGTGCGTAGCGTACGCTGCGTACGCGTACGTTGCGTACGCGTACGTGCGTAGCGT"
]


In [73]:
protein_sequences = [
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY",

    "MVVVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY",

    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY",

    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY",

    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
    "MKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFYMKVLYRLYRFY"
]
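
The lists in cells In [72] and In [73] rely on Python joining adjacent string literals: lines without a comma between them are concatenated into one string, so each list holds five long sequences rather than twenty short ones. The sketch below illustrates this with two short, made-up entries and then applies a simple function to every sequence in the list.

```python
# Hypothetical short list laid out like the verification data above.
example_sequences = [
    "ATGCGT"
    "ACGTT",   # no comma on the previous line: joined into "ATGCGTACGTT"
    "ATGCGT"
    "AGCGT",   # joined into "ATGCGTAGCGT"
]

print(len(example_sequences))   # 2
print(example_sequences[0])     # ATGCGTACGTT

# Apply any of the exercise functions to each sequence in turn, e.g. a G count.
g_counts = [sequence.count("G") for sequence in example_sequences]
print(g_counts)                 # [3, 4]
```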
