# 🧬 Molecular Biology: From Atoms to Life - Hands-On Practice

## Table of Contents
1. [DNA Sequence Analysis](#practice-1-dna-sequence-analysis)
2. [Transcription Simulation](#practice-2-transcription-simulation-dna--rna)
3. [Translation and Genetic Code](#practice-3-translation-and-genetic-code-rna--protein)
4. [pH and Buffer Systems](#practice-4-ph-and-buffer-systems)
5. [Energy in Biological Bonds](#practice-5-energy-in-biological-bonds)
6. [Exploring Protein Structures (PDB Database)](#practice-6-exploring-protein-structures-pdb-database)

**Learning Objectives:**
- Apply central dogma concepts through coding
- Analyze DNA sequences and predict protein products
- Visualize molecular structures using Python
- Calculate biological parameters (pH, energy, etc.)

## Installing and Importing Essential Libraries

In [None]:
# Import essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

# Visualization settings
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12
sns.set_style('whitegrid')

print("✅ All libraries loaded successfully!")
print("Ready to explore molecular biology!")

---
## Practice 1: DNA Sequence Analysis

### 🎯 Learning Objectives
- Understand DNA structure and base pairing rules
- Calculate GC content (important for DNA stability)
- Implement complementary strand generation

### 📖 Key Concepts
**Watson-Crick Base Pairing:**
- Adenine (A) pairs with Thymine (T) - 2 hydrogen bonds
- Guanine (G) pairs with Cytosine (C) - 3 hydrogen bonds
- **GC content** = (G + C) / Total bases × 100%
- Higher GC content → More stable DNA (3 H-bonds vs 2)
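
The pairing rules above translate directly into a hydrogen-bond count for a duplex. The following is a minimal sketch, assuming an ideal Watson-Crick duplex with no mismatches; the helper name `count_hydrogen_bonds` is illustrative and is not one of the notebook's own functions.

```python
def count_hydrogen_bonds(dna_seq):
    """Count Watson-Crick hydrogen bonds in an ideal duplex formed by dna_seq."""
    dna_seq = dna_seq.upper()
    at_pairs = dna_seq.count('A') + dna_seq.count('T')  # 2 bonds per A-T pair
    gc_pairs = dna_seq.count('G') + dna_seq.count('C')  # 3 bonds per G-C pair
    return 2 * at_pairs + 3 * gc_pairs

at_rich = "ATATATAT"
gc_rich = "GCGCGCGC"
print(f"{at_rich}: {count_hydrogen_bonds(at_rich)} hydrogen bonds")
print(f"{gc_rich}: {count_hydrogen_bonds(gc_rich)} hydrogen bonds")
```

Two sequences of equal length can therefore differ in total hydrogen bonds purely because of their GC content.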

In [None]:
# 1.1 DNA sequence analysis functions
def analyze_dna_sequence(dna_seq):
    """
    Analyze a DNA sequence and calculate its properties
    
    Parameters:
    -----------
    dna_seq : str
        DNA sequence string (A, T, G, C)
    
    Returns:
    --------
    dict : Dictionary containing sequence statistics
    """
    dna_seq = dna_seq.upper()
    
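    # Reject empty input or characters other than A, T, G, C
    if not dna_seq or set(dna_seq) - set('ATGC'):
        raise ValueError("dna_seq must be a non-empty string containing only A, T, G, C")
    
    # Count bases in a fixed A, T, G, C order so every base always has an entry
    base_counts = {base: dna_seq.count(base) for base in 'ATGC'}
    total_bases = len(dna_seq)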
    
    # Calculate GC content
    gc_count = base_counts['G'] + base_counts['C']
    gc_content = (gc_count / total_bases) * 100
    
    # Generate complementary strand
    complement_map = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    complement = ''.join([complement_map[base] for base in dna_seq])
    
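    # Estimate melting temperature with the Wallace rule: Tm = 2(A+T) + 4(G+C) in °C
    # This is a rough approximation meant for short oligos (about 14 nt or fewer);
    # for longer sequences treat the value as illustrative only
    tm = 2 * (base_counts['A'] + base_counts['T']) + 4 * (base_counts['G'] + base_counts['C'])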
    
    return {
        'sequence': dna_seq,
        'length': total_bases,
        'base_counts': dict(base_counts),
        'gc_content': gc_content,
        'complement': complement,
        'melting_temp': tm
    }

# Example DNA sequence
sample_dna = "ATGGCTAGCGATCGATCGTAGCTAGCTAGC"

# Analyze
results = analyze_dna_sequence(sample_dna)

print("DNA Sequence Analysis")
print("=" * 50)
print(f"Original sequence:  5'-{results['sequence']}-3'")
print(f"Complement strand:  3'-{results['complement']}-5'")
print(f"\nLength: {results['length']} bp")
print(f"\nBase composition:")
for base, count in results['base_counts'].items():
    percentage = (count / results['length']) * 100
    print(f"  {base}: {count} ({percentage:.1f}%)")
print(f"\n🧬 GC Content: {results['gc_content']:.1f}%")
print(f"🌡️  Estimated Tm: {results['melting_temp']}°C")
print("\n💡 Note: Higher GC content = More stable DNA (3 H-bonds vs 2)")
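
The complement above is printed 3'→5' so it lines up under the original strand. Sequence databases usually write both strands 5'→3', which means reversing the complement. A minimal sketch, assuming input restricted to A, T, G, C; `reverse_complement` is an illustrative helper, not part of the cell above.

```python
# Translation table mapping each base to its Watson-Crick partner
COMPLEMENT = str.maketrans("ATGC", "TACG")

def reverse_complement(dna_seq):
    """Return the partner strand written in the conventional 5'->3' direction."""
    return dna_seq.upper().translate(COMPLEMENT)[::-1]

seq = "ATGGCT"
print(f"Input:              5'-{seq}-3'")
print(f"Reverse complement: 5'-{reverse_complement(seq)}-3'")
```

Applying the function twice returns the original sequence, which makes a quick sanity check.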

In [None]:
# 1.2 Visualize base composition
def visualize_dna_composition(dna_seq):
    """Create visualizations of DNA sequence composition"""
    results = analyze_dna_sequence(dna_seq)
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Base composition bar chart
    bases = list(results['base_counts'].keys())
    counts = list(results['base_counts'].values())
    colors = ['#E91E63', '#2196F3', '#FF9800', '#4CAF50']  # A, T, G, C
    
    axes[0].bar(bases, counts, color=colors, alpha=0.8, edgecolor='black')
    axes[0].set_xlabel('Base', fontsize=12, fontweight='bold')
    axes[0].set_ylabel('Count', fontsize=12, fontweight='bold')
    axes[0].set_title('Base Composition', fontsize=14, fontweight='bold')
    axes[0].grid(axis='y', alpha=0.3)
    
    # GC content pie chart
    gc_count = results['base_counts']['G'] + results['base_counts']['C']
    at_count = results['base_counts']['A'] + results['base_counts']['T']
    
    axes[1].pie([gc_count, at_count], 
                labels=['GC', 'AT'], 
                colors=['#2196F3', '#E91E63'],
                autopct='%1.1f%%',
                startangle=90,
                textprops={'fontsize': 12, 'fontweight': 'bold'})
    axes[1].set_title('GC vs AT Content', fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    print(f"📊 GC Content: {results['gc_content']:.1f}%")
    if results['gc_content'] > 60:
        print("   → High GC content: Very stable DNA")
    elif results['gc_content'] > 40:
        print("   → Moderate GC content: Average stability")
    else:
        print("   → Low GC content: Less stable DNA")

visualize_dna_composition(sample_dna)

---
## Practice 2: Transcription Simulation (DNA → RNA)

### 🎯 Learning Objectives
- Implement the transcription process
- Understand the relationship between DNA and RNA
- Learn about the template strand

### 📖 Key Concepts
**Transcription:**
- DNA template strand (3' → 5') is read
- RNA synthesized in 5' → 3' direction
- Thymine (T) replaced by Uracil (U) in RNA
- RNA polymerase catalyzes the reaction

In [None]:
# 2.1 Transcription: DNA → RNA
def transcribe_dna_to_rna(dna_seq, template_strand=False):
    """
    Transcribe DNA sequence to RNA
    
    Parameters:
    -----------
    dna_seq : str
        DNA sequence
    template_strand : bool
        If True, treats input as template strand (3'→5')
        If False, treats input as coding strand (5'→3')
    
    Returns:
    --------
    str : RNA sequence
    """
    dna_seq = dna_seq.upper()
    
    if template_strand:
        # If template strand, create complement and replace T with U
        complement_map = {'A': 'U', 'T': 'A', 'G': 'C', 'C': 'G'}
        rna_seq = ''.join([complement_map[base] for base in dna_seq])
    else:
        # If coding strand, just replace T with U
        rna_seq = dna_seq.replace('T', 'U')
    
    return rna_seq

# Example
coding_strand = "ATGGCTAGCGATCGATCG"
template_strand = "TACCGATCGCTAGCTAGC"

rna_from_coding = transcribe_dna_to_rna(coding_strand, template_strand=False)
rna_from_template = transcribe_dna_to_rna(template_strand, template_strand=True)

print("Transcription Process")
print("=" * 60)
print("\nMethod 1: From Coding Strand")
print(f"  DNA (coding):  5'-{coding_strand}-3'")
print(f"  RNA (mRNA):    5'-{rna_from_coding}-3'")

print("\nMethod 2: From Template Strand")
print(f"  DNA (template):  3'-{template_strand}-5'")
print(f"  RNA (mRNA):      5'-{rna_from_template}-3'")

print("\n💡 Key Points:")
print("   • RNA polymerase reads template strand (3'→5')")
print("   • mRNA synthesized in 5'→3' direction")
print("   • Uracil (U) replaces Thymine (T) in RNA")
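
The template strand in the example is written 3'→5'. When a template is supplied in the usual 5'→3' orientation, it has to be reversed before pairing. A minimal sketch of that case; `transcribe_template_5to3` is a hypothetical helper introduced here for illustration.

```python
def transcribe_template_5to3(template_5to3):
    """Transcribe a template strand that is written 5'->3' (illustrative helper)."""
    pairing = str.maketrans("ATGC", "UACG")
    # Reverse first so the template is read 3'->5', then pair each base with its RNA partner
    return template_5to3.upper()[::-1].translate(pairing)

coding = "ATGGCTAGC"
template_5to3 = "GCTAGCCAT"  # reverse complement of the coding strand
mrna = transcribe_template_5to3(template_5to3)

print(f"Template (5'->3'): {template_5to3}")
print(f"mRNA     (5'->3'): {mrna}")
print(f"Matches coding strand with T->U: {mrna == coding.replace('T', 'U')}")
```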

---
## Practice 3: Translation and Genetic Code (RNA → Protein)

### 🎯 Learning Objectives
- Implement the genetic code translation
- Understand codons and the universal genetic code
- Find start and stop codons

### 📖 Key Concepts
**Translation:**
- 3 nucleotides = 1 codon = 1 amino acid
- Start codon: AUG (Methionine)
- Stop codons: UAA, UAG, UGA
- 64 possible codons: 61 encode the 20 amino acids (degeneracy) and 3 are stop signals
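
Degeneracy can be checked by counting how many codons map to each amino acid. The sketch below builds the standard codon table from a compact 64-letter string in UCAG order and tallies it; the variable names are illustrative, and the table is assumed to be the standard code only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Amino acids of the standard genetic code, listed in UCAG codon order (* = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair each of the 64 codons (UUU, UUC, UUA, ...) with its amino acid
codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons encode each amino acid
degeneracy = Counter(codon_table.values())
for aa, n_codons in sorted(degeneracy.items(), key=lambda item: -item[1]):
    print(f"{aa}: {n_codons} codon(s)")

sense_codons = sum(n for aa, n in degeneracy.items() if aa != "*")
print(f"Sense codons: {sense_codons}, stop codons: {degeneracy['*']}")
```

Leucine, serine and arginine each have six codons, while methionine and tryptophan have only one.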

In [None]:
# 3.1 Genetic code dictionary
GENETIC_CODE = {
    'UUU': 'F', 'UUC': 'F', 'UUA': 'L', 'UUG': 'L',
    'UCU': 'S', 'UCC': 'S', 'UCA': 'S', 'UCG': 'S',
    'UAU': 'Y', 'UAC': 'Y', 'UAA': '*', 'UAG': '*',
    'UGU': 'C', 'UGC': 'C', 'UGA': '*', 'UGG': 'W',
    'CUU': 'L', 'CUC': 'L', 'CUA': 'L', 'CUG': 'L',
    'CCU': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
    'CAU': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'CGU': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
    'AUU': 'I', 'AUC': 'I', 'AUA': 'I', 'AUG': 'M',
    'ACU': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',
    'AAU': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',
    'AGU': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
    'GUU': 'V', 'GUC': 'V', 'GUA': 'V', 'GUG': 'V',
    'GCU': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'GAU': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
    'GGU': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'
}

# Amino acid full names
AMINO_ACID_NAMES = {
    'A': 'Alanine', 'C': 'Cysteine', 'D': 'Aspartic acid', 'E': 'Glutamic acid',
    'F': 'Phenylalanine', 'G': 'Glycine', 'H': 'Histidine', 'I': 'Isoleucine',
    'K': 'Lysine', 'L': 'Leucine', 'M': 'Methionine', 'N': 'Asparagine',
    'P': 'Proline', 'Q': 'Glutamine', 'R': 'Arginine', 'S': 'Serine',
    'T': 'Threonine', 'V': 'Valine', 'W': 'Tryptophan', 'Y': 'Tyrosine',
    '*': 'STOP'
}

def translate_rna_to_protein(rna_seq, start_codon=True):
    """
    Translate RNA sequence to protein
    
    Parameters:
    -----------
    rna_seq : str
        RNA sequence
    start_codon : bool
        If True, start translation from first AUG
    
    Returns:
    --------
    str : Protein sequence (amino acids)
    """
    rna_seq = rna_seq.upper()
    
    # Find start codon
    if start_codon:
        start_pos = rna_seq.find('AUG')
        if start_pos == -1:
            # No AUG means nothing is translated; return an empty protein
            # so callers never mistake an error message for a sequence
            return ''
        rna_seq = rna_seq[start_pos:]
    
    protein = []
    
    # Translate codons
    for i in range(0, len(rna_seq) - 2, 3):
        codon = rna_seq[i:i+3]
        amino_acid = GENETIC_CODE.get(codon, '?')
        
        if amino_acid == '*':  # Stop codon
            break
        
        protein.append(amino_acid)
    
    return ''.join(protein)

# Example: Full Central Dogma
print("Central Dogma: DNA → RNA → Protein")
print("=" * 60)

dna = "ATGGCTAGCGATCGATCGTAGCTAGCTAG"
print(f"\n1️⃣  DNA:     5'-{dna}-3'")

rna = transcribe_dna_to_rna(dna)
print(f"2️⃣  RNA:     5'-{rna}-3'")

protein = translate_rna_to_protein(rna)
print(f"3️⃣  Protein:    {protein}")

# Show detailed translation
print("\n📖 Detailed Translation:")
print("   Codon → Amino Acid")
print("   " + "-" * 30)

start_pos = rna.find('AUG')
rna_coding = rna[start_pos:]

for i in range(0, min(len(rna_coding), 30), 3):
    if i + 3 <= len(rna_coding):
        codon = rna_coding[i:i+3]
        aa = GENETIC_CODE.get(codon, '?')
        aa_name = AMINO_ACID_NAMES.get(aa, 'Unknown')
        print(f"   {codon} → {aa} ({aa_name})")
        if aa == '*':
            break

---
## Practice 4: pH and Buffer Systems

### 🎯 Learning Objectives
- Calculate pH from H⁺ concentration
- Understand the Henderson-Hasselbalch equation
- Visualize pH effects on biological systems

### 📖 Key Concepts
**pH Calculations:**
- pH = -log₁₀[H⁺]
- Henderson-Hasselbalch: pH = pKa + log₁₀([A⁻]/[HA])
- Blood pH: 7.35-7.45 (tightly regulated)
- Enzyme activity is pH-dependent
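
Both relations can be rearranged to go from pH back to concentrations. A minimal sketch follows. The helper names are illustrative, and the pKa of 6.1 for the carbonic acid/bicarbonate system is an assumed textbook value that does not appear elsewhere in this notebook.

```python
def h_concentration_from_ph(ph):
    """Invert pH = -log10[H+] to recover [H+] in mol/L."""
    return 10 ** (-ph)

def base_to_acid_ratio(ph, pKa):
    """Rearranged Henderson-Hasselbalch: [A-]/[HA] = 10^(pH - pKa)."""
    return 10 ** (ph - pKa)

blood_ph = 7.4
pKa_bicarbonate = 6.1  # assumed textbook value for the bicarbonate buffer system

print(f"[H+] at pH {blood_ph}: {h_concentration_from_ph(blood_ph):.2e} M")
ratio = base_to_acid_ratio(blood_ph, pKa_bicarbonate)
print(f"[HCO3-]/[H2CO3] at pH {blood_ph}: about {ratio:.0f} to 1")
```

The ratio is far from 1:1 because blood pH sits more than one unit above the assumed pKa.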

In [None]:
# 4.1 pH calculations
def calculate_ph(h_concentration):
    """Calculate pH from H+ concentration"""
    return -np.log10(h_concentration)

def henderson_hasselbalch(pKa, acid_conc, conjugate_base_conc):
    """Calculate pH using Henderson-Hasselbalch equation"""
    return pKa + np.log10(conjugate_base_conc / acid_conc)

# Example calculations
print("pH Calculations in Biological Systems")
print("=" * 60)

# Common biological pH values
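# H+ concentrations in mol/L; fractional exponents need 10 ** x,
# because a literal such as 1e-2.5 is not valid Python
systems = {
    'Stomach acid': 1e-2,
    'Lemon juice': 10 ** -2.5,
    'Blood': 4.5e-8,
    'Pure water': 1e-7,
    'Pancreatic juice': 10 ** -8.5
}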

print("\nBiological pH Values:")
print("-" * 40)
for system, h_conc in systems.items():
    ph = calculate_ph(h_conc)
    print(f"{system:20s}: pH = {ph:.2f}")

# Buffer calculation example
print("\n\nBuffer System Example (Phosphate Buffer):")
print("-" * 40)
pKa_phosphate = 7.2
acid_conc = 0.1
base_conc = 0.1
buffer_ph = henderson_hasselbalch(pKa_phosphate, acid_conc, base_conc)
print(f"pKa = {pKa_phosphate}")
print(f"[H₂PO₄⁻] = {acid_conc} M")
print(f"[HPO₄²⁻] = {base_conc} M")
print(f"\nCalculated pH = {buffer_ph:.2f}")
print("\n💡 At equal concentrations, pH = pKa")
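
To see why a buffer matters, compare what the same amount of strong acid does to the phosphate buffer and to pure water. This is a rough sketch under stated assumptions: 1 L of solution (so moles equal molarity), no volume change, and the added acid fully converts base form to acid form. `ph_after_strong_acid` is a hypothetical helper.

```python
import math

def ph_after_strong_acid(pKa, acid_mol, base_mol, added_h_mol):
    """pH of a buffer after added strong acid converts base form into acid form."""
    if added_h_mol >= base_mol:
        raise ValueError("added acid exceeds the buffer's base form; "
                         "Henderson-Hasselbalch no longer applies")
    return pKa + math.log10((base_mol - added_h_mol) / (acid_mol + added_h_mol))

added_h = 0.01  # mol of strong acid added to 1 L
buffered = ph_after_strong_acid(pKa=7.2, acid_mol=0.1, base_mol=0.1, added_h_mol=added_h)
unbuffered = -math.log10(added_h)  # same acid in 1 L of pure water

print(f"Phosphate buffer: pH 7.20 -> {buffered:.2f}")
print(f"Pure water:       pH 7.00 -> {unbuffered:.2f}")
```

The buffered solution shifts by less than a tenth of a pH unit, while the unbuffered water drops by five units.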

In [None]:
# 4.2 Visualize pH scale
def visualize_ph_scale():
    """Create visualization of pH scale with biological examples"""
    
    fig, ax = plt.subplots(figsize=(12, 6))
    
    # pH scale
    ph_range = np.arange(0, 14, 0.1)
    # RdYlBu runs red (acidic) to blue (basic), matching the label colors below
    colors = plt.cm.RdYlBu(np.linspace(0, 1, len(ph_range)))
    
    # Draw pH scale
    for i, ph in enumerate(ph_range):
        ax.barh(0, 0.1, left=ph, height=0.5, color=colors[i], edgecolor='none')
    
    # Add biological examples
    examples = [
        (2, 'Stomach\n(pH 2)', -0.8),
        (7.4, 'Blood\n(pH 7.4)', 0.8),
        (8, 'Intestine\n(pH 8)', -0.8),
        (5.5, 'Skin\n(pH 5.5)', 0.8)
    ]
    
    for ph, label, y_offset in examples:
        ax.plot([ph, ph], [0, y_offset], 'k-', linewidth=2)
        ax.plot(ph, y_offset, 'ko', markersize=10)
        ax.text(ph, y_offset + np.sign(y_offset)*0.15, label, 
                ha='center', va='bottom' if y_offset > 0 else 'top',
                fontsize=10, fontweight='bold')
    
    # Labels
    ax.text(3, -1.5, 'ACIDIC', fontsize=14, fontweight='bold', 
            ha='center', color='#E91E63')
    ax.text(7, -1.5, 'NEUTRAL', fontsize=14, fontweight='bold', 
            ha='center', color='#4CAF50')
    ax.text(11, -1.5, 'BASIC', fontsize=14, fontweight='bold', 
            ha='center', color='#2196F3')
    
    ax.set_xlim(0, 14)
    ax.set_ylim(-2, 1.5)
    ax.set_xlabel('pH', fontsize=14, fontweight='bold')
    ax.set_title('pH Scale in Biological Systems', fontsize=16, fontweight='bold')
    ax.set_yticks([])
    ax.spines['left'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    
    plt.tight_layout()
    plt.show()

visualize_ph_scale()

---
## Practice 5: Energy in Biological Bonds

### 🎯 Learning Objectives
- Compare bond energies in biological molecules
- Understand ATP hydrolysis
- Calculate energy transfers

### 📖 Key Concepts
**Bond Energies:**
- Covalent bonds: 50-200 kcal/mol (strongest)
- Hydrogen bonds: 1-5 kcal/mol
- ATP hydrolysis: ΔG°' = -7.3 kcal/mol
- Van der Waals: <1 kcal/mol (weakest)
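
The standard value ΔG°' applies at 1 M concentrations. The actual free energy change depends on concentrations through ΔG = ΔG°' + RT ln([ADP][Pi]/[ATP]). The sketch below evaluates this with illustrative concentrations (5 mM ATP, 0.5 mM ADP, 5 mM Pi at 310 K). These numbers are assumptions chosen for demonstration, not measurements, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def atp_hydrolysis_delta_g(atp, adp, pi, temp_k=310.0, delta_g0=-7.3):
    """Actual free energy of ATP hydrolysis in kcal/mol; concentrations in mol/L."""
    q = (adp * pi) / atp  # reaction quotient for ATP -> ADP + Pi
    return delta_g0 + R_KCAL * temp_k * math.log(q)

# Illustrative (assumed) cellular concentrations
delta_g = atp_hydrolysis_delta_g(atp=5e-3, adp=0.5e-3, pi=5e-3)
print(f"Standard: -7.3 kcal/mol")
print(f"With assumed cellular concentrations: {delta_g:.1f} kcal/mol")
```

With these assumed concentrations the result lands near -12 kcal/mol, which shows how concentrations well below 1 M make hydrolysis more favorable than the standard value suggests.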

In [None]:
# 5.1 Bond energy calculations
def calculate_atp_energy(n_molecules, cellular_conditions=False):
    """
    Calculate energy from ATP hydrolysis
    
    Parameters:
    -----------
    n_molecules : int
        Number of ATP molecules
    cellular_conditions : bool
        If True, use cellular ΔG (~-12 kcal/mol)
        If False, use standard ΔG°' (-7.3 kcal/mol)
    """
    if cellular_conditions:
        delta_g = -12  # kcal/mol in cellular conditions
    else:
        delta_g = -7.3  # kcal/mol standard conditions
    
    total_energy = n_molecules * delta_g
    
    return total_energy, delta_g

print("ATP Energy Calculations")
print("=" * 60)

# Calculate for different processes
processes = {
    'Glycolysis (net)': 2,
    'Complete glucose oxidation': 32,
    'Muscle contraction (1 cycle)': 1
}

print("\nEnergy from different metabolic processes:")
print("-" * 60)

for process, atp_count in processes.items():
    energy_standard, _ = calculate_atp_energy(atp_count, cellular_conditions=False)
    energy_cellular, _ = calculate_atp_energy(atp_count, cellular_conditions=True)
    
    print(f"\n{process}:")
    print(f"  ATP produced: {atp_count} molecules")
    print(f"  Energy (standard): {abs(energy_standard):.1f} kcal/mol")
    print(f"  Energy (cellular): {abs(energy_cellular):.1f} kcal/mol")

# Compare bond energies
print("\n\nBond Energy Comparison:")
print("-" * 60)
bond_types = {
    'Covalent (C-C)': (50, 200),
    'Hydrogen bond': (1, 5),
    'Ionic interaction': (5, 10),
    'Van der Waals': (0.1, 1)
}

for bond, (min_e, max_e) in bond_types.items():
    print(f"{bond:20s}: {min_e:6.1f} - {max_e:6.1f} kcal/mol")

In [None]:
# 5.2 Visualize bond energies
def visualize_bond_energies():
    """Create bar chart of different bond energies"""
    
    bond_types = ['Covalent', 'Ionic', 'Hydrogen\nBond', 'Van der\nWaals']
    energies = [125, 7.5, 3, 0.5]  # Average values in kcal/mol
    colors = ['#1E64C8', '#FFA726', '#66BB6A', '#EF5350']
    
    fig, ax = plt.subplots(figsize=(10, 6))
    
    bars = ax.bar(bond_types, energies, color=colors, alpha=0.8, edgecolor='black', linewidth=2)
    
    # Add value labels
    for bar, energy in zip(bars, energies):
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
                f'{energy:.1f}\nkcal/mol',
                ha='center', va='bottom', fontsize=11, fontweight='bold')
    
    ax.set_ylabel('Bond Energy (kcal/mol)', fontsize=12, fontweight='bold')
    ax.set_title('⚡ Bond Energy Comparison in Biology', fontsize=14, fontweight='bold')
    ax.set_ylim(0, max(energies) * 1.2)
    ax.grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print("📊 Key Insights:")
    print("   • Covalent bonds provide structural stability")
    print("   • Hydrogen bonds enable DNA base pairing")
    print("   • Weak bonds allow dynamic biological processes")

visualize_bond_energies()

---
## Practice 6: Exploring Protein Structures (PDB Database)

### 🎯 Learning Objectives
- Learn to query the PDB database
- Understand protein structure files
- Analyze structure quality metrics

### 📖 Key Concepts
**PDB (Protein Data Bank):**
- Repository of 3D structures
- Resolution: <2 Å is high quality
- Contains experimental data (X-ray, NMR, Cryo-EM)
- AlphaFold: AI-predicted structures
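
Structure files in the legacy PDB format store each atom as a fixed-width `ATOM` line, so fields are read by column position rather than by splitting on whitespace. The sketch below builds a few such lines and parses them back. The coordinates are made up for demonstration, both helper functions are hypothetical, and real work would normally rely on an established parser such as Biopython's `Bio.PDB`.

```python
import math

def format_atom_record(serial, name, res_name, chain, res_seq, x, y, z,
                       element, occupancy=1.0, b_factor=0.0):
    """Build a fixed-width ATOM line (made-up values, for demonstration only)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
            f"          {element:>2}")

def parse_atom_record(line):
    """Read the main ATOM fields by column position (0-based slices)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "coords": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "element": line[76:78].strip(),
    }

# Hypothetical atoms: two alpha carbons placed 3.8 Å apart
lines = [
    format_atom_record(1, " N  ", "MET", "A", 1, -1.458, 0.000, 0.000, "N"),
    format_atom_record(2, " CA ", "MET", "A", 1, 0.000, 0.000, 0.000, "C"),
    format_atom_record(9, " CA ", "ALA", "A", 2, 3.800, 0.000, 0.000, "C"),
]

atoms = [parse_atom_record(line) for line in lines if line.startswith("ATOM")]
ca = [atom for atom in atoms if atom["name"] == "CA"]
distance = math.dist(ca[0]["coords"], ca[1]["coords"])

for line in lines:
    print(line)
print(f"CA-CA distance between residues {ca[0]['res_seq']} and {ca[1]['res_seq']}: {distance:.2f} Å")
```

Slicing by column keeps the parser correct even when adjacent fields run together without a space, which happens with large coordinates.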

In [None]:
# 6.1 Simulated PDB data exploration
def create_sample_protein_data():
    """
    Create sample protein structure data
    (In real applications, you would fetch from RCSB PDB API)
    """
    
    proteins = {
        '1AKE': {
            'name': 'Adenylate Kinase',
            'organism': 'E. coli',
            'method': 'X-ray diffraction',
            'resolution': 2.0,
            'length': 214,
            'function': 'Energy metabolism'
        },
        '1HHO': {
            'name': 'Hemoglobin',
            'organism': 'Human',
            'method': 'X-ray diffraction',
            'resolution': 2.1,
            'length': 574,
            'function': 'Oxygen transport'
        },
        '2LYZ': {
            'name': 'Lysozyme',
            'organism': 'Chicken',
            'method': 'X-ray diffraction',
            'resolution': 1.6,
            'length': 129,
            'function': 'Antibacterial enzyme'
        }
    }
    
    # orient='index' makes each PDB ID a row and keeps numeric columns numeric
    return pd.DataFrame.from_dict(proteins, orient='index')

# Create and display protein data
protein_df = create_sample_protein_data()

print("🧬 Sample Protein Structures from PDB")
print("=" * 80)
print(protein_df.to_string())

print("\n\n📊 Structure Quality Assessment:")
print("-" * 80)
for pdb_id, row in protein_df.iterrows():
    print(f"\n{pdb_id} - {row['name']}:")
    print(f"  Resolution: {row['resolution']} Å", end="")
    
    if row['resolution'] < 2.0:
        print(" → ⭐ Excellent quality")
    elif row['resolution'] < 2.5:
        print(" → ✅ Good quality")
    else:
        print(" → ⚠️  Moderate quality")
    
    print(f"  Length: {row['length']} amino acids")
    print(f"  Function: {row['function']}")

In [None]:
# 6.2 Visualize protein properties
def visualize_protein_properties(df):
    """Create visualizations of protein structure properties"""
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Resolution comparison
    axes[0].bar(df.index, df['resolution'], 
                color=['#4CAF50' if r < 2.0 else '#2196F3' for r in df['resolution']],
                alpha=0.8, edgecolor='black', linewidth=2)
    axes[0].axhline(y=2.0, color='red', linestyle='--', linewidth=2, label='2.0 Å threshold')
    axes[0].set_ylabel('Resolution (Å)', fontsize=12, fontweight='bold')
    axes[0].set_title('Structure Resolution', fontsize=14, fontweight='bold')
    axes[0].legend()
    axes[0].grid(axis='y', alpha=0.3)
    axes[0].invert_yaxis()  # A smaller Å value means finer detail, so better values sit nearer the top
    
    # Protein length comparison
    axes[1].barh(df.index, df['length'],
                 color=['#FF9800', '#E91E63', '#9C27B0'],
                 alpha=0.8, edgecolor='black', linewidth=2)
    axes[1].set_xlabel('Number of Amino Acids', fontsize=12, fontweight='bold')
    axes[1].set_title('Protein Length', fontsize=14, fontweight='bold')
    axes[1].grid(axis='x', alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print("\n💡 Resolution Guide:")
    print("   • <1.5 Å: Atomic detail visible")
    print("   • 1.5-2.5 Å: High quality, suitable for most studies")
    print("   • 2.5-4.0 Å: Moderate quality")
    print("   • >4.0 Å: Low resolution")

visualize_protein_properties(protein_df)

---
## 🎯 Practice Complete!

### Summary of What We Learned:

1. **DNA Sequence Analysis**: Base composition, GC content, complementary strands
2. **Central Dogma**: DNA → RNA (transcription) → Protein (translation)
3. **Genetic Code**: 64 codons, start/stop codons, amino acid encoding
4. **pH Systems**: Buffer calculations, Henderson-Hasselbalch equation
5. **Biological Energy**: Bond energies, ATP hydrolysis, energy transfers
6. **Protein Structures**: PDB database, resolution metrics, structure quality

### Key Computational Skills Developed:
- ✅ Sequence manipulation and analysis
- ✅ Implementation of biological algorithms
- ✅ Visualization of molecular data
- ✅ Quantitative analysis of biological systems

### Next Steps:
- Advanced sequence alignment (BLAST)
- Phylogenetic tree construction
- Molecular dynamics simulations
- Machine learning for protein structure prediction

---

## 🔬 Try It Yourself!

**Challenge Problems:**

1. Find the longest open reading frame (ORF) in a DNA sequence
2. Calculate isoelectric point (pI) of a protein
3. Predict protein secondary structure from sequence
4. Analyze restriction enzyme cut sites
5. Simulate PCR amplification

In [None]:
# Bonus: Complete workflow example
def complete_molecular_analysis(dna_sequence):
    """
    Perform complete analysis: DNA → RNA → Protein → Analysis
    """
    print("🧬 Complete Molecular Biology Analysis")
    print("=" * 70)
    
    # 1. Analyze DNA
    dna_results = analyze_dna_sequence(dna_sequence)
    print(f"\n1️⃣  DNA Analysis:")
    print(f"   Sequence: {dna_results['sequence']}")
    print(f"   Length: {dna_results['length']} bp")
    print(f"   GC content: {dna_results['gc_content']:.1f}%")
    
    # 2. Transcribe to RNA
    rna = transcribe_dna_to_rna(dna_sequence)
    print(f"\n2️⃣  Transcription:")
    print(f"   mRNA: {rna}")
    
    # 3. Translate to protein
    protein = translate_rna_to_protein(rna)
    print(f"\n3️⃣  Translation:")
    print(f"   Protein: {protein}")
    print(f"   Length: {len(protein)} amino acids")
    
    # 4. Summary
    print(f"\n📊 Summary:")
    print(f"   DNA → RNA → Protein")
    print(f"   {dna_results['length']} bp → {len(rna)} nt → {len(protein)} aa")
    print(f"\n✅ Analysis complete!")
    
    return dna_results, rna, protein

# Test with a sample sequence
test_dna = "ATGGCTAGCGATCGATCGTAGCTAGCTAGTAATAG"
complete_molecular_analysis(test_dna)