# Decoding The DNA of Strings

**Introduction**   
Strings are the backbone of many programming problems, from text processing to genetic analysis. This lesson shows how to work with DNA sequences in Python using string slicing, `for`-loop iteration, and dictionaries. These tools are essential for parsing, analyzing, and storing data.

**What You’ll Learn**   
- Slice and iterate through strings to analyze sequence data.
- Use dictionaries to count and store occurrences.
- Extend the solution to recognize special "codons" (three-nucleotide substrings) that signal specific biological processes.

**Step 1: Slicing and Iterating Strings**   
Learn how to traverse a DNA string and extract individual nucleotides (A, T, G, C).

```python
# Example DNA sequence
dna_sequence = "ATGCGTAATCG"

# Step 1: Iterate through each character in the string
for nucleotide in dna_sequence:
    print(f"Nucleotide: {nucleotide}")
```

Output:

```
Nucleotide: A
Nucleotide: T
Nucleotide: G
Nucleotide: C
Nucleotide: G
Nucleotide: T
Nucleotide: A
Nucleotide: A
Nucleotide: T
Nucleotide: C
Nucleotide: G
```


**Key Takeaways**   
- Strings are iterable in Python, so a `for` loop can process them one character at a time.
- Character-by-character iteration is the foundation for extracting and analyzing data in sequences.
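The Step 1 example only iterates. The sketch below reuses the same `dna_sequence` and also shows `enumerate()` for tracking positions, along with a few basic slices. The variable names (`first_three`, `last_three`, `every_other`, `reversed_seq`) are illustrative only.

```python
# Example DNA sequence from Step 1
dna_sequence = "ATGCGTAATCG"

# enumerate() pairs each nucleotide with its 0-based index
for index, nucleotide in enumerate(dna_sequence):
    print(f"Position {index}: {nucleotide}")

# Slicing uses [start:stop:step]; the stop index is excluded
first_three = dna_sequence[0:3]    # indices 0, 1, 2
last_three = dna_sequence[-3:]     # the final three characters
every_other = dna_sequence[::2]    # indices 0, 2, 4, ...
reversed_seq = dna_sequence[::-1]  # a step of -1 walks the string backwards

print(first_three, last_three, every_other, reversed_seq)
# → ATG TCG AGGATG GCTAATGCGTA
```

Negative indices count from the end of the string. That is why `[-3:]` returns the last three nucleotides without calling `len()`.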

**Step 2: Counting Nucleotides with a Dictionary**    
Use a dictionary to store and count the occurrences of each nucleotide in the DNA sequence.

```python
dna_sequence = "ATGCGTAATCG"

# Step 2: Initialize an empty dictionary
nucleotide_counts = {}

# Count each nucleotide
for nucleotide in dna_sequence:
    if nucleotide in nucleotide_counts:
        nucleotide_counts[nucleotide] += 1
    else:
        nucleotide_counts[nucleotide] = 1

# Print the results
print("Nucleotide Counts:", nucleotide_counts)
```

Output:

```
Nucleotide Counts: {'A': 3, 'T': 3, 'G': 3, 'C': 2}
```


**Key Takeaways**   
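- Dictionaries are well suited to key-value pairs such as nucleotide types (A, T, G, C) and their counts.
- The loop visits each character once, and dictionary lookups take constant time on average. The total work therefore grows linearly with sequence length, and the same code works unchanged for longer sequences or other symbol sets.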
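The `if`/`else` branch in Step 2 can be shortened. This sketch shows two standard-library alternatives: `dict.get()` with a default value, and `collections.Counter`. The variable names are illustrative.

```python
from collections import Counter

dna_sequence = "ATGCGTAATCG"

# Option 1: dict.get() returns 0 for unseen keys,
# which replaces the if/else branch from Step 2
counts_get = {}
for nucleotide in dna_sequence:
    counts_get[nucleotide] = counts_get.get(nucleotide, 0) + 1

# Option 2: Counter (a dict subclass) counts an iterable in one call
counts_counter = Counter(dna_sequence)

print(counts_get)            # {'A': 3, 'T': 3, 'G': 3, 'C': 2}
print(dict(counts_counter))  # same counts, same insertion order
print(counts_counter["N"])   # 0: Counter treats missing keys as zero
```

`Counter` is a `dict` subclass, so its result can be used anywhere the hand-built dictionary is. Unlike a plain `dict`, it returns 0 for a missing key instead of raising `KeyError`.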

**Step 3: Checking for Special "Codons"**   
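Identify specific codons (substrings of three nucleotides) in the DNA sequence. For example, ATG is the usual start codon, marking where a protein-coding sequence begins. TAA is one of the three standard stop codons (along with TAG and TGA), which mark where it ends.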

```python
dna_sequence = "ATGCGTAAATGCCC"
start_codon = "ATG"
stop_codon = "TAA"

# Step 3: Search for codons
codon_positions = {"start": [], "stop": []}

# Slide a three-nucleotide window one position at a time (codons are three nucleotides long)
for i in range(0, len(dna_sequence) - 2):
    codon = dna_sequence[i:i+3]
    if codon == start_codon:
        codon_positions["start"].append(i)
    elif codon == stop_codon:
        codon_positions["stop"].append(i)

print("Codon Positions:", codon_positions)
```

Output:

```
Codon Positions: {'start': [0, 8], 'stop': [5]}
```


**Key Takeaways**   
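- String slicing (`dna_sequence[i:i+3]`) extracts a three-nucleotide window from the DNA sequence.
- Searching for specific substrings, such as start and stop codons, is a simple way to locate biological signals or patterns.
- Because the loop advances one position at a time, it reports matches in all three reading frames. The ATG at position 0 and the TAA at position 5 are 5 positions apart, which is not a multiple of 3, so they do not belong to the same frame.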
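Here is a minimal sketch of scanning one reading frame at a time, assuming the same sequence and codons as Step 3. Stepping `range()` by 3 from an offset of 0, 1, or 2 yields the non-overlapping codons of that frame. The function name `find_codons_in_frame` is hypothetical.

```python
dna_sequence = "ATGCGTAAATGCCC"
start_codon = "ATG"
stop_codon = "TAA"

def find_codons_in_frame(sequence, frame):
    """Return start/stop codon positions for one reading frame (0, 1, or 2)."""
    positions = {"start": [], "stop": []}
    # Step by 3 so each slice is a non-overlapping codon in this frame
    for i in range(frame, len(sequence) - 2, 3):
        codon = sequence[i:i + 3]
        if codon == start_codon:
            positions["start"].append(i)
        elif codon == stop_codon:
            positions["stop"].append(i)
    return positions

for frame in range(3):
    print(f"Frame {frame}:", find_codons_in_frame(dna_sequence, frame))
# Frame 0: {'start': [0], 'stop': []}
# Frame 1: {'start': [], 'stop': []}
# Frame 2: {'start': [8], 'stop': [5]}
```

Frame 0 contains only the start codon at position 0. Frame 2 holds both the stop codon at 5 and the start codon at 8. Merging the three frames reproduces the sliding-window result from Step 3.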

## Conclusion   
This lesson introduced fundamental tools for working with strings in Python:

- String slicing and iteration for processing sequence data.
- Dictionaries for efficiently counting and organizing information.
- Substring matching to identify meaningful patterns, like biological codons.

**Why It’s Important**    
Whether you're analyzing DNA sequences or other text data, these techniques form the foundation of data parsing and pattern recognition. As an extension, try applying them to longer sequences or adding more codons to recognize!
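One way to make "adding more codons" a one-line change is to store codons and their meanings in a lookup table. This sketch assumes a plain dictionary lookup is enough. `CODON_SIGNALS` and `find_signals` are hypothetical names. ATG as the start codon and TAA, TAG, and TGA as stop codons follow the standard genetic code. The `.upper()` normalization is an added assumption about the input. Like the Step 3 loop, this scans every position, so matches can come from any reading frame.

```python
# Hypothetical lookup table: codon -> signal it represents.
# ATG is the standard start codon; TAA, TAG, and TGA are the standard stop codons.
CODON_SIGNALS = {
    "ATG": "start",
    "TAA": "stop",
    "TAG": "stop",
    "TGA": "stop",
}

def find_signals(sequence, codon_signals):
    """Map each signal name to the 0-based positions where its codons occur."""
    # dict.fromkeys() removes duplicate signal names while keeping their order
    positions = {signal: [] for signal in dict.fromkeys(codon_signals.values())}
    sequence = sequence.upper()  # assumption: accept lowercase input too
    for i in range(len(sequence) - 2):
        signal = codon_signals.get(sequence[i:i + 3])
        if signal is not None:
            positions[signal].append(i)
    return positions

print(find_signals("ATGCGTAAATGCCC", CODON_SIGNALS))  # {'start': [0, 8], 'stop': [5]}
print(find_signals("atgccctagtga", CODON_SIGNALS))    # {'start': [0], 'stop': [6, 9]}
```

To recognize another codon, add one entry to `CODON_SIGNALS`. The search loop itself does not change.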