#### 🤔 Why Use Functions?

#### Without Functions (Copy-Paste Hell):
```python
# Analyzing sequence 1
seq1 = "ATGC..."
gc_count = 0
for base in seq1:
    if base in "GC":
        gc_count += 1
gc1 = gc_count / len(seq1) * 100

# Analyzing sequence 2 - COPY-PASTE!
seq2 = "GCTA..."
gc_count = 0
for base in seq2:
    if base in "GC":
        gc_count += 1
gc2 = gc_count / len(seq2) * 100
```

#### With Functions (Reusable!):
```python
def gc_content(seq):
    gc_count = sum(1 for base in seq if base in "GC")
    return gc_count / len(seq) * 100

gc1 = gc_content("ATGC...")
gc2 = gc_content("GCTA...")
```

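**Benefits:** The logic lives in one place, so there is less code to write, fewer places for bugs to hide, and only one spot to update when something changes!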
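
As a rough illustration of that payoff, the sketch below runs one function over several sequences in a loop. The sample names and sequences are invented for this example.

```python
def gc_content(seq):
    """Return the GC content of seq as a percentage."""
    gc_count = sum(1 for base in seq if base in "GC")
    return gc_count / len(seq) * 100

# Hypothetical example data: names and sequences are invented
samples = {
    "sample_a": "ATGCGC",
    "sample_b": "ATATAT",
    "sample_c": "GGCCGG",
}

# One function call per sequence, however many sequences there are
gc_by_sample = {}
for name, seq in samples.items():
    gc_by_sample[name] = gc_content(seq)

for name, gc in gc_by_sample.items():
    print(f"{name}: {gc:.1f}% GC")
```

Adding a fourth sequence means adding one line to `samples`, not another copy of the counting loop.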

#### 🏗️ Function Basics

#### Function Syntax:
```python
def function_name(parameters):
    """Docstring explaining what the function does."""
    # code here
    return result
```

In [1]:
# Simple function example
def greet(name):
    """Greet a person by name."""
    return f"Hello, {name}!"

# Call the function
message = greet("Bioinformatician")
print(message)

Hello, Bioinformatician!
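
Parameters can also have default values, which a caller may override by position or by keyword. The function below is a made-up example (`describe_sequence` and `label` are not part of this lesson's toolkit), shown only to illustrate how arguments flow in and a value flows out:

```python
def describe_sequence(seq, label="Unknown"):
    """Return a one-line description of a sequence."""
    return f"{label}: {len(seq)} bp"

# No label given, so the default "Unknown" is used
print(describe_sequence("ATGCCG"))

# Override the default by position
print(describe_sequence("ATGCCG", "gene_x"))

# Override the default by keyword, which is easier to read
print(describe_sequence("ATGCCG", label="gene_x"))

# The return value can be stored and reused like any other value
description = describe_sequence("ATG", label="start codon")
print(description.upper())
```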


#### 🧬 Function 1: GC Content Calculator

In [None]:
def gc_content(seq):
    """
    Calculate GC content percentage of a DNA sequence.
    
    Args:
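        seq (str): DNA sequence string (upper or lower case)
    
    Returns:
        float: GC content as a percentage (0.0 for an empty sequence)
    """
    if not seq:
        return 0.0  # Avoid dividing by zero on an empty sequence
    seq = seq.upper()  # Count lowercase bases too, as validate_dna does
    g = seq.count("G")
    c = seq.count("C")
    return (g + c) / len(seq) * 100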

# Test it
dna = "ATGCCGATTA"
result = gc_content(dna)
print(f"Sequence: {dna}")
print(f"GC content: {result:.1f}%")

#### 🔍 Function 2: DNA Sequence Validator

In [None]:
def validate_dna(seq):
    """
    Check if a sequence contains only valid DNA bases (A, T, G, C).
    
    Args:
        seq (str): DNA sequence to validate
    
    Returns:
        bool: True if valid, False otherwise
    """
    valid_bases = "ATGC"
    for base in seq.upper():
        if base not in valid_bases:
            return False
    return True

# Test with valid and invalid sequences
test_sequences = [
    "ATGCGTA",
    "ATGCXTA",
    "AUGCGUA",
    "atgcgta"
]

for seq in test_sequences:
    is_valid = validate_dna(seq)
    status = "✓ Valid" if is_valid else "✗ Invalid"
    print(f"{seq:15} {status}")
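
When a sequence fails validation, it often helps to know which characters caused the failure. One possible variant, sketched here with a hypothetical helper named `invalid_characters`, uses a set difference to collect them:

```python
def invalid_characters(seq):
    """Return a sorted list of the characters in seq that are not A, T, G, or C."""
    # set(...) keeps each distinct character once; "-" removes the valid bases
    return sorted(set(seq.upper()) - set("ATGC"))

for seq in ["ATGCGTA", "ATGCXTA", "AUGCGUA", "atgcgta"]:
    bad = invalid_characters(seq)
    if bad:
        print(f"{seq:15} invalid characters: {bad}")
    else:
        print(f"{seq:15} all bases valid")
```

An empty list means the sequence is valid, so this helper could stand in for a True/False check while also giving a more useful error message.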

#### 📊 Function 3: Nucleotide Frequencies

In [None]:
def nucleotide_frequencies(seq):
    """
    Count the frequency of each nucleotide in a DNA sequence.
    
    Args:
        seq (str): DNA sequence
    
    Returns:
        dict: Dictionary with nucleotide counts
    """
    frequencies = {"A": 0, "T": 0, "G": 0, "C": 0}
    
    for base in seq.upper():
        if base in frequencies:
            frequencies[base] += 1
    
    return frequencies

# Test it
dna = "ATGCCGATTAatgc"
freqs = nucleotide_frequencies(dna)

print(f"Sequence: {dna}")
print(f"Frequencies: {freqs}")
print(f"\nFormatted:")
for base, count in freqs.items():
    print(f"  {base}: {count}")
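
The counting loop above is good practice, but Python's standard library can do the tallying for you. A minimal alternative sketch, using `collections.Counter` (the function name `nucleotide_frequencies_counter` is made up for this example):

```python
from collections import Counter

def nucleotide_frequencies_counter(seq):
    """Count A, T, G, and C in seq, ignoring case and any other characters."""
    counts = Counter(seq.upper())  # Tallies every character in one pass
    # A Counter returns 0 for missing keys, so absent bases still appear
    return {base: counts[base] for base in "ATGC"}

dna = "ATGCCGATTAatgc"
print(nucleotide_frequencies_counter(dna))
```

Both versions return a dictionary with the same four keys, so code that uses the result does not need to change.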

#### 🔗 Chaining Functions Together

Let's combine our functions to create a complete analysis pipeline:

In [None]:
def analyze_sequence(seq):
    """
    Perform complete analysis of a DNA sequence.
    Only analyzes if the sequence is valid.
    
    Args:
        seq (str): DNA sequence to analyze
    """
    # Step 1: Validate (an empty sequence has nothing to analyze)
    if not seq or not validate_dna(seq):
        print(f"⚠️  Invalid DNA sequence: {seq}")
        print("    Empty or contains non-DNA characters!")
        return
    
    # Step 2: Get frequencies
    freqs = nucleotide_frequencies(seq)
    
    # Step 3: Calculate GC content
    gc = gc_content(seq)
    
    # Step 4: Print report
    print(f"Sequence Analysis for: {seq}")
    print("=" * 50)
    print(f"Length: {len(seq)} bp")
    print(f"\nNucleotide Composition:")
    for base, count in freqs.items():
        percentage = count / len(seq) * 100
        print(f"  {base}: {count:3d} ({percentage:5.1f}%)")
    print(f"\nGC Content: {gc:.1f}%")
    print("=" * 50)

# Test with different sequences
analyze_sequence("ATGCCGATTA")
print()
analyze_sequence("ATGCXGATTA")  # Invalid
print()
analyze_sequence("GCGCGCGCGC")  # High GC
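
`analyze_sequence` prints its report, which works well for looking at one sequence but makes the numbers hard to reuse. One alternative design, sketched below, chains the same kind of helpers but returns the results in a dictionary. The name `summarize_sequence` is hypothetical, and the helpers are compact stand-ins for the versions defined earlier so that the block runs on its own:

```python
def validate_dna(seq):
    """Return True if seq is non-empty and contains only A, T, G, C."""
    return bool(seq) and all(base in "ATGC" for base in seq.upper())

def nucleotide_frequencies(seq):
    """Count each of the four bases, ignoring case."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "ATGC"}

def gc_content(seq):
    """Return GC content as a percentage, ignoring case."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) * 100

def summarize_sequence(seq):
    """Chain the helpers and return the results as a dict (None if invalid)."""
    if not validate_dna(seq):
        return None
    return {
        "length": len(seq),
        "frequencies": nucleotide_frequencies(seq),
        "gc_percent": gc_content(seq),
    }

# The caller decides what to do with the results
for seq in ["ATGCCGATTA", "ATGCXGATTA"]:
    summary = summarize_sequence(seq)
    if summary is None:
        print(f"{seq}: invalid")
    else:
        print(f"{seq}: {summary['length']} bp, {summary['gc_percent']:.1f}% GC")
```

Returning data instead of printing it lets the same function feed a report, a table, or a filter without being rewritten.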

#### 🔄 Function 4: Reverse Complement

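DNA is double-stranded, and the reverse complement is the sequence of the opposite strand read in the 5' to 3' direction. You need it whenever a feature might sit on either strand: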

In [None]:
def reverse_complement(seq):
    """
    Return the reverse complement of a DNA sequence.
    
    Args:
        seq (str): DNA sequence
    
    Returns:
        str: Reverse complement sequence (uppercase; unknown characters are kept unchanged)
    """
    # Complement mapping
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    
    # Get complement
    comp_seq = ""
    for base in seq.upper():
        comp_seq += complement.get(base, base)
    
    # Reverse it
    return comp_seq[::-1]

# Test it
dna = "ATGCCGATTA"
rev_comp = reverse_complement(dna)

print(f"Original:           5'-{dna}-3'")
print(f"Reverse Complement: 5'-{rev_comp}-3'")

# Verify: reverse complement of reverse complement = original
original_again = reverse_complement(rev_comp)
print(f"\nDouble reverse:     5'-{original_again}-3'")
print(f"Matches original? {original_again == dna}")
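
Building the complement one character at a time is easy to follow. For longer sequences, a common alternative is a translation table built with `str.maketrans` and applied with `str.translate`. This is a sketch of that approach, with the hypothetical name `reverse_complement_translate`:

```python
# Map each base to its partner: A<->T and G<->C
COMPLEMENT_TABLE = str.maketrans("ATGC", "TACG")

def reverse_complement_translate(seq):
    """Return the reverse complement of seq in uppercase."""
    # translate() swaps every base in one call; [::-1] reverses the result
    return seq.upper().translate(COMPLEMENT_TABLE)[::-1]

dna = "ATGCCGATTA"
rev_comp = reverse_complement_translate(dna)
print(f"5'-{dna}-3'")
print(f"5'-{rev_comp}-3'")
```

Characters that are not in the table, such as `N`, pass through unchanged, which matches the behavior of `complement.get(base, base)` above.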

#### 🧮 Function 5: AT/GC Ratio

In [None]:
def at_gc_ratio(seq):
    """
    Calculate the AT/GC ratio of a DNA sequence.
    
    Args:
        seq (str): DNA sequence
    
    Returns:
        float: AT/GC ratio (infinity if the sequence has no G or C)
    """
    seq = seq.upper()  # Count lowercase bases too
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    
    if gc_count == 0:
        return float('inf')  # Infinite ratio
    
    return at_count / gc_count

# Test with different sequences
sequences = {
    "AT-rich": "ATATATATATAT",
    "GC-rich": "GCGCGCGCGCGC",
    "Balanced": "ATGCATGCATGC"
}

for name, seq in sequences.items():
    ratio = at_gc_ratio(seq)
    gc = gc_content(seq)
    print(f"{name:12} | AT/GC ratio: {ratio:.2f} | GC%: {gc:.1f}%")
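
The two numbers printed above are linked: for a sequence made up only of A, T, G, and C with at least one G or C, the AT/GC ratio equals `(100 - GC%) / GC%`. The sketch below checks this on a few invented sequences, using compact stand-ins for the lesson's functions so that it runs on its own:

```python
def gc_content(seq):
    """Return GC content as a percentage."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) * 100

def at_gc_ratio(seq):
    """Return the AT/GC ratio, or infinity if there is no G or C."""
    seq = seq.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    if gc_count == 0:
        return float("inf")
    return at_count / gc_count

# Invented test sequences containing only valid bases
for seq in ["ATGCATGCATGC", "ATATGC", "GGGCAT"]:
    gc = gc_content(seq)
    from_counts = at_gc_ratio(seq)
    from_gc_percent = (100 - gc) / gc  # Same ratio, derived from GC%
    print(f"{seq:14} ratio: {from_counts:.2f} | from GC%: {from_gc_percent:.2f}")

# With no G or C the ratio is infinite, which formats as "inf"
print(f"{at_gc_ratio('ATAT'):.2f}")
```

The shortcut breaks down if the sequence contains other characters such as `N`, because they count toward the length but not toward either base count.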

#### 🎯 Practice Exercise 1

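Create a function that counts how many times a specific codon appears in a sequence. Decide first whether to count only in-frame codons (starting at positions 0, 3, 6, ...) or every occurrence; for the test sequence below, both approaches give the same answer: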

In [None]:
def count_codon(seq, codon):
    """
    Count how many times a specific codon appears in a sequence.
    
    Args:
        seq (str): DNA sequence
        codon (str): The codon to search for (3 bases)
    
    Returns:
        int: Number of times the codon appears
    """
    # YOUR CODE HERE
    pass

# Test it
dna = "ATGATGCGTATGAAA"
print(f"Sequence: {dna}")
print(f"ATG count: {count_codon(dna, 'ATG')}")

#### 🎯 Practice Exercise 2

Create a function that finds all start codon positions:

In [None]:
def find_start_codons(seq):
    """
    Find all positions where ATG (start codon) appears.
    
    Args:
        seq (str): DNA sequence
    
    Returns:
        list: List of positions where ATG starts
    """
    # YOUR CODE HERE
    pass

# Test it
dna = "ATGATGCGTATGAAA"
positions = find_start_codons(dna)
print(f"Sequence: {dna}")
print(f"Start codons at positions: {positions}")

#### 📦 Building Your Bioinformatics Toolkit

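Let's put our core functions together in one place. Here a class acts as a container: the `@staticmethod` decorator lets us call each function as `DNAToolkit.function_name(...)` without creating an object first: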

In [None]:
# Complete Bioinformatics Toolkit

class DNAToolkit:
    """A collection of DNA analysis functions."""
    
    @staticmethod
    def validate_dna(seq):
        """Check if sequence is valid DNA."""
        return all(base in "ATGC" for base in seq.upper())
    
    @staticmethod
    def gc_content(seq):
        """Calculate GC content percentage."""
        gc_count = sum(1 for base in seq.upper() if base in "GC")
        return (gc_count / len(seq)) * 100 if len(seq) > 0 else 0
    
    @staticmethod
    def nucleotide_frequencies(seq):
        """Count nucleotide frequencies."""
        freqs = {"A": 0, "T": 0, "G": 0, "C": 0}
        for base in seq.upper():
            if base in freqs:
                freqs[base] += 1
        return freqs
    
    @staticmethod
    def reverse_complement(seq):
        """Return reverse complement."""
        complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
        comp = "".join(complement.get(base, base) for base in seq.upper())
        return comp[::-1]

# Use the toolkit
dna = "ATGCCGATTA"
print(f"Analyzing: {dna}\n")
print(f"Valid DNA? {DNAToolkit.validate_dna(dna)}")
print(f"GC content: {DNAToolkit.gc_content(dna):.1f}%")
print(f"Frequencies: {DNAToolkit.nucleotide_frequencies(dna)}")
print(f"Rev Comp: {DNAToolkit.reverse_complement(dna)}")

#### 🧪 Comprehensive Analysis Function

In [None]:
def comprehensive_analysis(seq, seq_name="Unknown"):
    """
    Perform comprehensive DNA sequence analysis.
    
    Args:
        seq (str): DNA sequence
        seq_name (str): Name/ID for the sequence
    """
    print(f"\n{'='*60}")
    print(f"SEQUENCE ANALYSIS: {seq_name}")
    print(f"{'='*60}")
    
    # Validation (also reject empty input, which would divide by zero below)
    if not seq or not validate_dna(seq):
        print("⚠️  ERROR: Invalid DNA sequence!")
        return
    
    # Basic info
    print(f"\nSequence: {seq}")
    print(f"Length: {len(seq)} bp")
    
    # Nucleotide composition
    freqs = nucleotide_frequencies(seq)
    print(f"\nNucleotide Composition:")
    for base in ["A", "T", "G", "C"]:
        count = freqs[base]
        pct = (count / len(seq)) * 100
        bar = "█" * (count * 2)
        print(f"  {base}: {bar} {count:2d} ({pct:5.1f}%)")
    
    # GC content
    gc = gc_content(seq)
    print(f"\nGC Content: {gc:.1f}%")
    
    # Classification
    if gc > 60:
        classification = "High GC (>60%)"
    elif gc >= 40:
        classification = "Medium GC (40-60%)"
    else:
        classification = "Low GC (<40%)"
    print(f"Classification: {classification}")
    
    # Reverse complement
    rev_comp = reverse_complement(seq)
    print(f"\nReverse Complement: {rev_comp}")
    
    # AT/GC ratio
    ratio = at_gc_ratio(seq)
    print(f"AT/GC Ratio: {ratio:.2f}")
    
    print(f"\n{'='*60}\n")

# Test with multiple sequences
test_sequences = [
    ("E. coli gene", "ATGAAACGCATTAGCACCACC"),
    ("GC-rich", "GCGCGCGCGCGCGC"),
    ("AT-rich", "ATATATATATATATAT")
]

for name, seq in test_sequences:
    comprehensive_analysis(seq, name)

#### 🤔 Reflection Questions

1. Why wrap code in functions instead of copy-pasting?
2. What is a docstring and why is it important?
3. How do parameters and return values work?
4. Give one example of how you could improve the functions above.

#### 🏠 Homework Challenge

Create these three functions:

1. **`translate_codon(codon)`** - Return the amino acid for a codon
2. **`find_stop_codons(seq)`** - Find all stop codon positions
3. **`extract_gene(seq, start, stop)`** - Extract sequence between positions

Then chain them together!

In [None]:
# Homework coding space
# YOUR CODE HERE

#### 🎉 Summary

You've learned:
- ✅ How to define and call functions
- ✅ Using parameters and return values
- ✅ Writing docstrings for documentation
- ✅ Creating reusable bioinformatics utilities
- ✅ Chaining functions together
- ✅ Building a complete DNA analysis toolkit

**Next lesson:** We'll read FASTA files and apply our functions to real data! 📄

---

## 🚀 Next Lesson

Ready to continue? Open the next lesson notebook:
**[Lesson 05: FASTA](lesson05_fasta_notebook.ipynb)**