<a href="https://colab.research.google.com/github/nrflynn2/swe-molecular-sciences/blob/main/interactive/week10-student-notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Algorithms for Molecular Sciences
## UC Berkeley Masters in Molecular Sciences and Software Engineering
### Week 10: Graph Algorithms, String Matching, and Priority Queues

---

## Learning Objectives

By the end of this session, you will be able to:

1. **Apply the Kosaraju-Sharir algorithm** to identify protein complexes in interaction networks
2. **Implement KMP string matching** for efficient DNA sequence analysis
3. **Design priority queue solutions** for molecular simulation problems
4. **Analyze algorithmic complexity** for large-scale biological datasets
5. **Integrate multiple algorithms** to solve complex molecular problems

---

## Lecture Agenda

| Time | Topic | Duration |
|------|-------|----------|
| 0:00-0:10 | Introduction & Biological Motivation | 10 min |
| 0:10-0:40 | Strongly Connected Components in Protein Networks | 30 min |
| 0:40-1:10 | KMP Algorithm for DNA Pattern Matching | 30 min |
| 1:10-1:40 | Priority Queues for Particle Simulations | 30 min |
| 1:40-1:55 | Performance Analysis & Integration | 15 min |
| 1:55-2:00 | Summary & Next Steps | 5 min |


## Required Imports and Setup

Let's start by importing all the libraries we'll need for today's advanced algorithms:

In [3]:
# Core Python libraries
import random
import heapq
import time
from collections import defaultdict, deque
from typing import List, Tuple, Dict, Set, Optional

# Scientific computing
import numpy as np

# Visualization (optional)
try:
    import matplotlib.pyplot as plt
    PLOTTING_AVAILABLE = True
except ImportError:
    PLOTTING_AVAILABLE = False
    print("Matplotlib not available. Visualizations will be skipped.")

# Set random seed for reproducibility
random.seed(42)
np.random.seed(42)

print("All libraries imported successfully!")
print("Ready to explore advanced algorithms in molecular sciences!")
print(f"Plotting available: {PLOTTING_AVAILABLE}")

All libraries imported successfully!
Ready to explore advanced algorithms in molecular sciences!
Plotting available: True


---

# Section 1: Introduction & Biological Motivation (10 minutes)

## Today's Focus: Advanced Algorithms for Molecular Analysis

In modern computational biology, we face three major challenges:

### 1. **Network Analysis Challenge**
- Protein interaction networks contain thousands of nodes
- Need to identify functional modules and regulatory circuits
- Solution: **Kosaraju-Sharir algorithm** for strongly connected components

### 2. **Sequence Analysis Challenge**
- Genomes can contain billions of base pairs (about 3 billion in the human genome)
- Must find patterns efficiently (promoters, restriction sites, etc.)
- Solution: **KMP algorithm** for linear-time pattern matching

### 3. **Simulation Challenge**
- Molecular dynamics involve millions of time-dependent events
- Need to process events in chronological order efficiently
- Solution: **Priority queues** using binary heaps

## Why These Algorithms Matter

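| Problem | Naive Approach | Our Solution | Benefit |
|---------|---------------|--------------|---------|
| Finding functional modules (SCCs) | O(n³) via transitive closure | O(n + m) Kosaraju-Sharir | Gap grows with network size |
| DNA pattern matching | O(nm) worst case | O(n + m) KMP | Up to ~m× fewer comparisons in the worst case |
| Event scheduling | O(n²) repeated scan for the next event | O(n log n) heap | Gap grows roughly as n / log n |

Here n is the number of proteins, bases, or events; m is the number of interactions (first row) or the pattern length (second row).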

Let's dive into each algorithm with hands-on implementations!

In [2]:
# Quick demonstration of scale differences
def demonstrate_scale_impact():
    """Show why algorithm choice matters for molecular data"""
    
    print("="*60)
    print("SCALE IMPACT IN MOLECULAR SCIENCES")
    print("="*60)
    
    # Example: Human genome analysis
    genome_size = 3_000_000_000  # 3 billion base pairs
    pattern_size = 20  # typical primer length
    
    # Calculate operations needed
    naive_ops = genome_size * pattern_size
    kmp_ops = genome_size + pattern_size
    
    print(f"\nSearching for a {pattern_size}bp pattern in human genome:")
    print(f"  Genome size: {genome_size:,} base pairs")
    print(f"  Naive algorithm: {naive_ops:,} operations")
    print(f"  KMP algorithm: {kmp_ops:,} operations")
    print(f"  Speedup: {naive_ops/kmp_ops:.1f}x faster!")
    
    # Example: Protein network analysis
    n_proteins = 20_000  # typical for human proteome
    # ASSUMPTION for illustration only: about 10 interactions per protein
    n_interactions = 10 * n_proteins
    linear_ops = n_proteins + n_interactions
    
    print(f"\nAnalyzing protein interaction network:")
    print(f"  Number of proteins: {n_proteins:,}")
    print(f"  Number of interactions (assumed): {n_interactions:,}")
    print(f"  Naive complexity: O(n³) = {n_proteins**3:,} operations")
    print(f"  Kosaraju-Sharir complexity: O(n + m) = {linear_ops:,} operations")
    print(f"  Speedup: ~{n_proteins**3 // linear_ops:,}x fewer operations")

demonstrate_scale_impact()

SCALE IMPACT IN MOLECULAR SCIENCES

Searching for a 20bp pattern in human genome:
  Genome size: 3,000,000,000 base pairs
  Naive algorithm: 60,000,000,000 operations
  KMP algorithm: 3,000,000,020 operations
  Speedup: 20.0x faster!

Analyzing protein interaction network:
  Number of proteins: 20,000
  Number of interactions (assumed): 200,000
  Naive complexity: O(n³) = 8,000,000,000,000 operations
  Kosaraju-Sharir complexity: O(n + m) = 220,000 operations
  Speedup: ~36,363,636x fewer operations


---

# Section 2: Kosaraju-Sharir Algorithm for Protein Networks (30 minutes)

## Biological Context: Protein Regulatory Networks

Proteins form complex regulatory networks where:
- **Directed edges** represent regulatory relationships (A → B means A regulates B)
- **Feedback loops** create regulatory circuits essential for cellular function
- **Strongly connected components** (sets of proteins that can all reach one another along directed edges) are candidate functional modules

### Example: p53-MDM2 Feedback Loop
The tumor suppressor p53 and its regulator MDM2 form a critical feedback loop:
- p53 activates MDM2 expression
- MDM2 inhibits p53 activity
- Because each can reach the other, p53 and MDM2 form a strongly connected component, one that helps regulate the cell cycle
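
To make the two-pass idea concrete before the exercise, here is a minimal standalone sketch on a three-protein toy graph. It is independent of the exercise class; the function name `toy_sccs` and the edge list are illustrative assumptions, and the recursive DFS is only suitable for small graphs.

```python
from collections import defaultdict

def toy_sccs(edges):
    """Return the SCCs of a directed graph given as (source, target) pairs."""
    graph, reverse = defaultdict(list), defaultdict(list)
    vertices = set()
    for src, dst in edges:
        graph[src].append(dst)
        reverse[dst].append(src)
        vertices.update((src, dst))

    # Pass 1: record vertices in order of DFS completion on the original graph.
    visited, finish_order = set(), []

    def visit(v):
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                visit(w)
        finish_order.append(v)

    for v in sorted(vertices):  # sorted only to make the output deterministic
        if v not in visited:
            visit(v)

    # Pass 2: DFS on the reversed graph, latest-finished vertex first.
    visited.clear()
    components = []

    def collect(v, component):
        visited.add(v)
        component.append(v)
        for w in reverse[v]:
            if w not in visited:
                collect(w, component)

    for v in reversed(finish_order):
        if v not in visited:
            component = []
            collect(v, component)
            components.append(sorted(component))
    return components

# p53 and MDM2 regulate each other; p21 is only a downstream target.
toy_edges = [("p53", "MDM2"), ("MDM2", "p53"), ("p53", "p21")]
print(toy_sccs(toy_edges))  # [['MDM2', 'p53'], ['p21']]
```

The feedback loop comes out as one two-protein component, while p21, which has no path back to p53, forms a component of its own.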

In [None]:
class ProteinInteractionGraph:
    """
    Directed graph representing protein-protein regulatory interactions.
    Used to model gene regulatory networks and signaling pathways.
    """
    
    def __init__(self):
        self.graph = defaultdict(list)  # Adjacency list
        self.reverse_graph = defaultdict(list)  # Reversed edges
        self.vertices = set()
        self.protein_functions = {}  # Biological annotations
        
    def add_interaction(self, regulator: str, target: str, 
                       reg_function: Optional[str] = None,
                       target_function: Optional[str] = None):
        """
        Add a regulatory interaction: regulator → target
        
        Args:
            regulator: Source protein (upstream regulator)
            target: Target protein (downstream target)
            reg_function: Biological function of regulator
            target_function: Biological function of target
        """
        self.graph[regulator].append(target)
        self.reverse_graph[target].append(regulator)
        self.vertices.add(regulator)
        self.vertices.add(target)
        
        if reg_function:
            self.protein_functions[regulator] = reg_function
        if target_function:
            self.protein_functions[target] = target_function
    
    def dfs_first_pass(self, vertex: str, visited: Set[str], stack: List[str]):
        """
        First DFS pass to determine finishing times.
        
        TODO: Implement DFS traversal
        Steps:
        1. Mark current vertex as visited
        2. Recursively visit all unvisited neighbors  
        3. After exploring all neighbors, add vertex to stack
        
        This creates a topological-like ordering that we'll use
        in the second pass to identify SCCs.
        """
        # TODO: Your implementation here
        pass
    
    def dfs_second_pass(self, vertex: str, visited: Set[str], component: List[str]):
        """
        Second DFS pass on reversed graph to identify SCCs.
        
        TODO: Implement DFS on reversed graph
        Steps:
        1. Mark vertex as visited
        2. Add vertex to current component
        3. Recursively visit unvisited neighbors in REVERSED graph
        
        This collects all vertices in the same SCC.
        """
        # TODO: Your implementation here
        pass
    
    def find_strongly_connected_components(self) -> List[List[str]]:
        """
        Kosaraju-Sharir Algorithm Implementation
        
        Time Complexity: O(V + E) where V = vertices, E = edges
        Space Complexity: O(V) auxiliary space for the visited set and stack
        (the stored reversed graph adds O(V + E))
        
        Returns:
            List of SCCs, each SCC is a list of protein names
        """
        # Phase 1: First DFS to get finishing times
        visited = set()
        stack = []
        
        print("Phase 1: Computing finish times...")
        for vertex in sorted(self.vertices):  # sorted so results are reproducible across runs
            if vertex not in visited:
                self.dfs_first_pass(vertex, visited, stack)
        
        # Phase 2: Second DFS on reversed graph
        visited.clear()
        sccs = []
        
        print("Phase 2: Finding SCCs in reverse graph...")
        
        # TODO: Process vertices in reverse order of finishing times
        # while stack:
        #     vertex = stack.pop()
        #     if vertex not in visited:
        #         component = []
        #         self.dfs_second_pass(vertex, visited, component)
        #         sccs.append(component)
        
        return sccs

In [None]:
# Build a biologically realistic protein network
def build_cell_cycle_network():
    """
    Create a simplified cell cycle and DNA damage response network.
    Based on real regulatory relationships in human cells.
    """
    network = ProteinInteractionGraph()
    
    # p53-MDM2 feedback loop (tumor suppression)
    network.add_interaction("p53", "MDM2", "tumor_suppressor", "p53_inhibitor")
    network.add_interaction("MDM2", "p53", "p53_inhibitor", "tumor_suppressor")
    network.add_interaction("p53", "p21", "tumor_suppressor", "cdk_inhibitor")
    
    # Cell cycle machinery
    network.add_interaction("CDK2", "CyclinE", "kinase", "cyclin")
    network.add_interaction("CyclinE", "CDK2", "cyclin", "kinase")
    network.add_interaction("CDK2", "RB", "kinase", "cell_cycle_regulator")
    network.add_interaction("p21", "CDK2", "cdk_inhibitor", "kinase")
    
    # DNA damage response
    network.add_interaction("ATM", "p53", "damage_sensor", "tumor_suppressor")
    network.add_interaction("ATM", "CHK2", "damage_sensor", "checkpoint")
    network.add_interaction("CHK2", "p53", "checkpoint", "tumor_suppressor")
    
    # Apoptosis pathway
    network.add_interaction("p53", "BAX", "tumor_suppressor", "apoptosis")
    network.add_interaction("BAX", "CytC", "apoptosis", "apoptosis")
    
    return network

# Test your implementation
print("\n" + "="*60)
print("EXERCISE 1: Complete the Kosaraju-Sharir Implementation")
print("="*60)
print("\nRun this cell after implementing the DFS methods and Phase 2 (until then, 0 components are found):")

network = build_cell_cycle_network()
print(f"\nNetwork Statistics:")
print(f"  Proteins: {len(network.vertices)}")
print(f"  Interactions: {sum(len(neighbors) for neighbors in network.graph.values())}")
 
sccs = network.find_strongly_connected_components()
print(f"\nFound {len(sccs)} strongly connected components:")
for i, scc in enumerate(sccs, 1):
    if len(scc) > 1:
        print(f"  Component {i}: {scc} (regulatory module)")
    else:
        print(f"  Component {i}: {scc[0]} (no feedback loop)")

## Exercise 1.2: Pathway Reachability Analysis

After finding SCCs, we need to analyze which proteins can be affected by drug targets:
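
As a warm-up, the traversal idea can be sketched on a plain adjacency dictionary. This is a minimal illustration, not the exercise solution; `reachable_from` and `toy_network` are hypothetical names introduced here.

```python
from collections import deque

def reachable_from(adjacency, start):
    """Return every node reachable from `start` by following directed edges.

    `start` itself is included only if some cycle leads back to it.
    """
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

toy_network = {
    "ATM": ["p53", "CHK2"],
    "CHK2": ["p53"],
    "p53": ["MDM2", "p21"],
    "MDM2": ["p53"],
}
print(sorted(reachable_from(toy_network, "ATM")))  # ['CHK2', 'MDM2', 'p21', 'p53']
```

Breadth-first search visits each node and edge at most once, so the cost is O(V + E), the same order as a single DFS pass.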

In [None]:
def find_downstream_targets(graph: ProteinInteractionGraph, drug_target: str) -> Set[str]:
    """
    Find all proteins downstream of a potential drug target.
    This represents the cascade of effects from inhibiting/activating a protein.
    
    TODO: Implement BFS or DFS to find reachable proteins
    
    Args:
        graph: Protein interaction network
        drug_target: Protein targeted by drug
    
    Returns:
        Set of all proteins affected downstream
    """
    if drug_target not in graph.vertices:
        return set()
    
    affected = set()
    queue = deque([drug_target])
    
    # TODO: Implement graph traversal
    # while queue:
    #     current = queue.popleft()
    #     ...
    
    return affected

# Test downstream analysis
print("\n" + "="*60)
print("DRUG TARGET ANALYSIS")
print("="*60)
downstream = find_downstream_targets(network, "p53")
print(f"\nInhibiting p53 would affect: {downstream}")

---

# Section 3: KMP Algorithm for DNA Pattern Matching (30 minutes)

## Biological Context: DNA Sequence Analysis

Pattern matching is fundamental in genomics for:
- **Finding regulatory elements** (promoters, enhancers)
- **Identifying restriction sites** for cloning
- **Detecting sequence motifs** in genes
- **Designing CRISPR guides** and checking off-targets

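The KMP algorithm achieves O(n + m) time by never re-reading sequence characters: after a mismatch, a precomputed table (the LPS array) tells it how far the pattern can shift while keeping the part already matched.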
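
The following compact sketch shows one common way to write both steps, using a `for` loop with an inner fallback `while` rather than the index-juggling form in the exercise hints. The names `prefix_table` and `kmp_find_all` are illustrative, not part of the course code.

```python
def prefix_table(pattern):
    """lps[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    lps = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = lps[k - 1]  # fall back to the next shorter prefix-suffix
        if pattern[i] == pattern[k]:
            k += 1
        lps[i] = k
    return lps

def kmp_find_all(text, pattern):
    """Return all 0-based start positions of pattern in text (overlaps included)."""
    if not pattern:
        return []
    lps = prefix_table(pattern)
    hits, k = [], 0
    for i, base in enumerate(text):
        while k > 0 and base != pattern[k]:
            k = lps[k - 1]
        if base == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = lps[k - 1]  # continue so overlapping matches are found
    return hits

print(prefix_table("ACACAGT"))             # [0, 0, 1, 2, 3, 0, 0]
print(kmp_find_all("ACACACAGT", "ACAC"))   # [0, 2]
```

The text index `i` only moves forward, which is where the linear bound comes from: the fallback loop can undo at most as many matches as were previously made.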

In [None]:
class DNAPatternMatcher:
    """
    Efficient DNA sequence pattern matching using KMP algorithm.
    Exact matching only (IUPAC ambiguity codes are not handled);
    overlapping occurrences are reported.
    """
    
    @staticmethod
    def build_lps_array(pattern: str) -> List[int]:
        """
        Build the Longest Proper Prefix-Suffix (LPS) array.
        This is the key to KMP's efficiency.
        
        TODO: Complete the LPS array construction
        
        The LPS array stores the length of the longest proper prefix
        which is also a suffix for each position in the pattern.
        
        Example: pattern = "ACACAGT"
        LPS = [0, 0, 1, 2, 3, 0, 0]
        """
        m = len(pattern)
        lps = [0] * m
        length = 0
        i = 1
        
        # TODO: Build LPS array
        # while i < m:
        #     if pattern[i] == pattern[length]:
        #         length += 1
        #         lps[i] = length
        #         i += 1
        #     else:
        #         if length != 0:
        #             length = lps[length - 1]
        #         else:
        #             lps[i] = 0
        #             i += 1
        
        return lps
    
    @staticmethod
    def kmp_search(sequence: str, pattern: str) -> List[int]:
        """
        KMP pattern matching algorithm.
        
        TODO: Implement the main KMP search
        
        Time Complexity: O(n + m) where n = sequence length, m = pattern length
        Space Complexity: O(m) for the LPS array
        
        Returns:
            List of starting positions where pattern is found
        """
        if not pattern or not sequence:
            return []
        
        n, m = len(sequence), len(pattern)
        if m > n:
            return []
        
        # Build LPS array
        lps = DNAPatternMatcher.build_lps_array(pattern)
        
        positions = []
        i = 0  # Index for sequence
        j = 0  # Index for pattern
        
        # TODO: Implement KMP search
        # while i < n:
        #     if j < m and sequence[i] == pattern[j]:
        #         i += 1
        #         j += 1
        #     
        #     if j == m:
        #         positions.append(i - j)
        #         j = lps[j - 1]
        #     elif i < n and sequence[i] != pattern[j]:
        #         if j != 0:
        #             j = lps[j - 1]
        #         else:
        #             i += 1
        
        return positions

In [None]:
# Test KMP implementation with real DNA sequences
print("="*60)
print("EXERCISE 2: Complete the KMP Implementation")
print("="*60)

# Example: Finding TATA box in a promoter
promoter = "GCGCAATTATAAAACGTGACGGGAAAACCGTGTGTCAATTAACCACAAGATCGCTAGC"
tata_box = "TATAAA"

print(f"\nSearching for TATA box ({tata_box}) in promoter:")
print(f"Sequence: {promoter}")

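# Results are meaningful only after kmp_search is implemented (the stub finds nothing)
positions = DNAPatternMatcher.kmp_search(promoter, tata_box)
if positions:
    print(f"✓ Found at position(s): {positions}")
    prefix_width = len("Sequence: ")  # align the marker with the sequence printed above
    for pos in positions:
        print(f"{' ' * (prefix_width + pos)}^{'-' * (len(tata_box) - 1)}")
else:
    print("✗ Not found")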

## Exercise 2.2: Restriction Enzyme Mapping

Use KMP to create a restriction map showing where enzymes would cut DNA:
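
Note that the start of a recognition site and the position of the cut are not the same thing. The sketch below illustrates the difference for four of the enzymes, using Python's built-in `str.find` in place of KMP to keep it short. The names `CUT_OFFSETS` and `find_cut_positions` are hypothetical; the offsets are the top-strand cut positions within each site (for example, EcoRI cuts G^AATTC).

```python
# Recognition site and top-strand cut offset within the site.
CUT_OFFSETS = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "PstI": ("CTGCAG", 5),     # CTGCA^G
}

def find_cut_positions(dna):
    """Map each enzyme to the 0-based index of the base just after its cut."""
    cuts = {}
    for enzyme, (site, offset) in CUT_OFFSETS.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start + offset)
            start = dna.find(site, start + 1)  # step by 1 to allow overlapping sites
        if positions:
            cuts[enzyme] = positions
    return cuts

print(find_cut_positions("AAGAATTCTTGGATCCAA"))  # {'EcoRI': [3], 'BamHI': [11]}
```

In the exercise itself, the site start positions returned by `kmp_search` play the role of `start` here.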

In [None]:
class RestrictionMapper:
    """Map restriction enzyme sites in DNA sequences for cloning design."""
    
    # Common restriction enzymes and their recognition sequences
    ENZYMES = {
        "EcoRI": "GAATTC",
        "BamHI": "GGATCC",
        "HindIII": "AAGCTT",
        "PstI": "CTGCAG",
        "XbaI": "TCTAGA",
        "SalI": "GTCGAC"
    }
    
    @staticmethod
    def create_restriction_map(dna_sequence: str) -> Dict[str, List[int]]:
        """
        TODO: Create a restriction map using KMP for each enzyme.
        
        Args:
            dna_sequence: DNA sequence to analyze
            
        Returns:
            Dictionary mapping enzyme names to the 0-based start positions
            of their recognition sites
        """
        restriction_map = {}
        
        # TODO: For each enzyme, find all recognition sites
        # for enzyme, site in RestrictionMapper.ENZYMES.items():
        #     positions = DNAPatternMatcher.kmp_search(dna_sequence, site)
        #     if positions:
        #         restriction_map[enzyme] = positions
        
        return restriction_map

# Test restriction mapping
plasmid = promoter + "GAATTCAAAGGATCCAAGCTTCTGCAG"

print("\n" + "="*60)
print("RESTRICTION ENZYME MAPPING")
print("="*60)
print(f"\nAnalyzing plasmid ({len(plasmid)} bp)")

# Prints no sites until create_restriction_map and kmp_search are implemented
restriction_map = RestrictionMapper.create_restriction_map(plasmid)
print("\nRestriction sites found:")
for enzyme, sites in restriction_map.items():
    print(f"  {enzyme}: {sites}")

---

# Section 4: Priority Queues for Particle Simulations (30 minutes)

## Biological Context: Molecular Dynamics and Decay

Priority queues are essential for:
- **Event-driven molecular dynamics** simulations
- **Radioactive tracer decay** in medical imaging
- **Chemical reaction scheduling** in systems biology
- **Time-dependent biological processes**

We'll implement a particle decay simulator that models stochastic processes.
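
Before building the classes, here is a minimal sketch of the core pattern using plain `(time, id)` tuples on a `heapq` min-heap. The function name `simulate_decays` and the exponentially distributed lifetimes are assumptions made for illustration; the exercise below uses uniformly random decay times instead.

```python
import heapq
import random

def simulate_decays(n_particles, mean_lifetime=1.0, seed=0):
    """Return (decay_time, particle_id) events in chronological order."""
    rng = random.Random(seed)
    events = []
    for particle_id in range(n_particles):
        # ASSUMPTION: exponentially distributed lifetimes, as in first-order decay.
        decay_time = rng.expovariate(1.0 / mean_lifetime)
        heapq.heappush(events, (decay_time, particle_id))  # O(log n) per push

    timeline = []
    while events:
        timeline.append(heapq.heappop(events))  # always the earliest remaining event
    return timeline

for decay_time, particle_id in simulate_decays(5):
    print(f"t={decay_time:.3f}: particle {particle_id} decays")
```

Tuples compare element by element, so putting the time first makes the heap order events chronologically without a custom `__lt__`; a class with `__lt__`, as in the exercise, achieves the same thing for richer objects.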

In [None]:
class Particle:
    """
    Represents a particle with decay properties.
    Used in molecular dynamics and radioactive decay simulations.
    """
    
    def __init__(self, particle_id: int, decay_time: float):
        """
        TODO: Initialize particle with ID and decay time
        
        Args:
            particle_id: Unique identifier
            decay_time: Time when particle will decay
        """
        # TODO: Store id and decay_time as instance variables
        pass
    
    def __lt__(self, other):
        """
        TODO: Define comparison for heap ordering.
        Particles should be ordered by decay_time (earlier = higher priority).
        
        This method is crucial for the heap to work correctly!
        """
        # TODO: Return True if self.decay_time < other.decay_time
        pass
    
    def __repr__(self):
        """String representation for debugging."""
        # TODO: Return f"Particle(id={self.id}, decay_time={self.decay_time:.3f})"
        pass

In [None]:
class ParticleSimulator:
    """
    Event-driven particle decay simulator using a priority queue.
    Models stochastic decay processes with collision events.
    """
    
    def __init__(self):
        """Initialize the simulation with an empty heap."""
        self.heap = []  # Min-heap of particles
        self.decay_events = []  # Record of decay events
        
    def add_particle(self, particle: Particle):
        """
        TODO: Add a particle to the simulation.
        Use heapq.heappush to maintain heap property.
        """
        # TODO: Push particle onto heap
        pass
    
    def get_next_decay(self) -> Optional[Particle]:
        """
        TODO: Get the next particle to decay.
        Use heapq.heappop to remove minimum element.
        """
        # TODO: Pop and return next particle if heap not empty
        pass
    
    def simulate_collision(self, particle: Particle) -> Optional[Particle]:
        """
        Simulate a collision with 50% probability.
        Collisions create new particles (modeling decay products).
        
        TODO: Implement collision logic
        """
        if random.random() < 0.5:  # 50% collision probability
            # Create a new particle from collision
            new_id = particle.id * 10 + random.randint(1, 9)
            new_decay_time = particle.decay_time + random.random() * 5
            # TODO: Return new Particle(new_id, new_decay_time)
            pass
        return None
    
    def run_simulation(self, particles: List[Particle]) -> List[Tuple[int, float]]:
        """
        Run the complete decay simulation.
        
        TODO: Implement the main simulation loop
        """
        # Add all initial particles
        for p in particles:
            self.add_particle(p)
        
        results = []
        
        # TODO: Process particles in order of decay time
        # while self.heap:
        #     particle = self.get_next_decay()
        #     results.append((particle.id, particle.decay_time))
        #     
        #     # Check for collision
        #     new_particle = self.simulate_collision(particle)
        #     if new_particle:
        #         self.add_particle(new_particle)
        
        return results

In [None]:
def generate_particles(n: int) -> List[Particle]:
    """
    Generate n particles with random decay times.
    
    TODO: Create list of particles with random decay times
    """
    particles = []
    # TODO: Generate n particles with IDs 0 to n-1
    # for i in range(n):
    #     decay_time = random.random() * 10  # Random time between 0 and 10
    #     particles.append(Particle(i, decay_time))
    return particles

# Test particle simulation
print("="*60)
print("EXERCISE 3: Complete the Particle Simulation")
print("="*60)

# Prints no particles or events until the TODOs above are implemented
print("\nGenerating particles for simulation...")
particles = generate_particles(5)
for p in particles:
    print(f"  {p}")
 
print("\nRunning decay simulation...")
simulator = ParticleSimulator()
events = simulator.run_simulation(particles)
 
print("\nDecay events in chronological order:")
# Use decay_time (not time) as the loop variable so the time module is not shadowed
for i, (pid, decay_time) in enumerate(events, 1):
    print(f"  Event {i}: Particle {pid} decayed at t={decay_time:.3f}")

---

# Section 5: Performance Analysis & Integration (15 minutes)

## Comparing Algorithm Performance

Let's analyze the performance improvements from using advanced algorithms:
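
Single wall-clock measurements of short runs are noisy. One hedged alternative is to repeat each measurement with the standard-library `timeit` module and keep the best run, as in this sketch; the character-by-character `naive_search` here is an illustrative stand-in, not the version used in the cell below.

```python
import random
import timeit

def naive_search(text, pattern):
    """Check every alignment character by character: O(n*m) in the worst case."""
    hits = []
    for i in range(len(text) - len(pattern) + 1):
        if all(text[i + j] == pattern[j] for j in range(len(pattern))):
            hits.append(i)
    return hits

rng = random.Random(42)
dna = "".join(rng.choice("ACGT") for _ in range(10_000))

# Repeat the measurement and keep the minimum, which is least affected by noise.
calls_per_run = 3
timings = timeit.repeat(lambda: naive_search(dna, "ACGTACGT"),
                        repeat=5, number=calls_per_run)
best_per_call = min(timings) / calls_per_run
print(f"naive search: {best_per_call * 1e3:.2f} ms per call")
```

On random DNA most alignments fail at the first or second base, so the naive search rarely approaches its worst case; repetitive sequences are where the gap to KMP shows up.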

In [None]:
def performance_comparison():
    """
    Compare naive vs optimized algorithms on realistic datasets.
    """
    import time
    
    print("="*60)
    print("PERFORMANCE COMPARISON")
    print("="*60)
    
    # Test 1: String matching (KMP vs Naive)
    print("\nString Matching Performance:")
    dna_sizes = [1000, 5000, 10000]
    pattern = "ACGTACGT"
    
    for size in dna_sizes:
        # Generate random DNA
        dna = ''.join(random.choice("ACGT") for _ in range(size))
        
        # Naive search
        def naive_search(text, pattern):
            positions = []
            n, m = len(text), len(pattern)
            for i in range(n - m + 1):
                if text[i:i+m] == pattern:
                    positions.append(i)
            return positions
        
        start = time.perf_counter()  # higher resolution than time.time()
        naive_result = naive_search(dna, pattern)
        naive_time = time.perf_counter() - start
        
        # KMP search (finds nothing until Exercise 2 is implemented)
        start = time.perf_counter()
        kmp_result = DNAPatternMatcher.kmp_search(dna, pattern)
        kmp_time = time.perf_counter() - start
        
        print(f"  DNA size: {size:,} bp")
        print(f"    Naive: {naive_time:.4f} seconds")
        print(f"    KMP:   {kmp_time:.4f} seconds")
        # The naive version compares slices in C, so pure-Python KMP can be slower here
        if kmp_time > 0:
            print(f"    Speedup: {naive_time/kmp_time:.2f}x")
    
    # Test 2: Priority queue vs repeated linear scan
    print("\nEvent Scheduling Performance:")
    n_particles = [100, 500, 1000]
    
    for n in n_particles:
        particles = [(i, random.random()*100) for i in range(n)]
        
        # Naive: scan the whole list for the earliest event each time, O(n²) overall
        start = time.perf_counter()
        pending = list(particles)
        scan_result = []
        while pending:
            earliest = min(pending, key=lambda x: x[1])
            pending.remove(earliest)
            scan_result.append(earliest)
        list_time = time.perf_counter() - start
        
        # Using heap: O(n log n) overall
        start = time.perf_counter()
        heap = []
        for p in particles:
            heapq.heappush(heap, (p[1], p[0]))
        heap_result = []
        while heap:
            heap_result.append(heapq.heappop(heap))
        heap_time = time.perf_counter() - start
        
        print(f"  Particles: {n:,}")
        print(f"    List scan: {list_time:.4f} seconds")
        print(f"    Heap:      {heap_time:.4f} seconds")
        if heap_time > 0:
            print(f"    Speedup:   {list_time/heap_time:.2f}x")

# Run performance comparison
performance_comparison()

## Integrated Analysis Pipeline

Real biological problems often require combining multiple algorithms:

In [None]:
class IntegratedMolecularAnalysis:
    """
    Combines graph, string, and priority queue algorithms for 
    comprehensive molecular analysis.
    """
    
    def __init__(self):
        self.protein_network = None
        self.dna_sequences = {}
        self.simulation_queue = []
    
    def analyze_drug_target(self, target_protein: str, drug_sequence: str):
        """
        Comprehensive analysis combining all three algorithms:
        1. Find protein complex using SCC
        2. Search for drug binding sites using KMP
        3. Simulate molecular dynamics using priority queue
        
        TODO: Implement integrated analysis
        """
        results = {
            'protein_complex': [],
            'binding_sites': [],
            'simulation_events': []
        }
        
        # TODO: Integrate all three algorithms
        
        return results

print("\n" + "="*60)
print("INTEGRATED ANALYSIS CHALLENGE")
print("="*60)
print("\nCombine all three algorithms to solve a complex problem:")
print("1. Find protein regulatory modules (Kosaraju-Sharir)")
print("2. Identify DNA binding motifs (KMP)")
print("3. Simulate molecular interactions (Priority Queue)")
print("\nThis mimics real drug discovery pipelines!")

---

# Section 6: Summary & Next Steps (5 minutes)

## Key Takeaways

### Algorithms Mastered Today:

| Algorithm | Time Complexity | Space | Application |
|-----------|----------------|-------|-------------|
| **Kosaraju-Sharir** | O(V + E) | O(V) auxiliary | Functional module (SCC) detection |
| **KMP** | O(n + m) | O(m) | DNA pattern matching |
| **Binary Heap** | O(log n) per push/pop | O(n) | Event simulation |

### Biological Applications:

1. **Protein Networks**: 
   - Identify functional modules and feedback loops
   - Predict drug effects through pathway analysis
   - Find master regulators in disease

2. **DNA Analysis**:
   - Efficiently find regulatory elements
   - Design primers and CRISPR guides
   - Map restriction sites for cloning

3. **Molecular Simulation**:
   - Model time-dependent processes
   - Simulate decay and collision events
   - Schedule reactions in metabolic models

## Final Thoughts
- **Graph algorithms** reveal hidden structure in biological networks
- **String algorithms** make genome-scale analysis feasible  
- **Priority queues** enable realistic molecular simulations