**Lesson 12: PAM Sequence Identification**

Learn to identify PAM sequences and find potential CRISPR target sites in DNA.

---

## 🎯 What is a PAM Sequence?

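**PAM** = Protospacer Adjacent Motif, a short DNA sequence that sits directly next to the target site (the protospacer) and that the Cas protein must recognize before it can cut.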

### Why PAM Matters
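- Cas9 cannot cut a target unless a PAM sits directly next to it (for SpCas9, immediately 3' of the 20bp target)
- The PAM acts as a "recognition signal": Cas9 looks for it first, then checks whether the guide RNA matches the neighboring DNA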
- Different Cas proteins recognize different PAMs

### Common PAM Sequences

| Cas Protein | PAM Sequence | Example |
|-------------|--------------|----------|
| SpCas9 | NGG | AGG, TGG, CGG, GGG |
| SaCas9 | NNGRRT | AAGAAT, TTGGGT |
| Cas12a | TTTV | TTTA, TTTC, TTTG |

**N** = any nucleotide (A, T, G, or C)  
**R** = purine (A or G)  
**V** = not T (A, C, or G)
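
The degenerate codes above can be turned into a small pattern check. The following is a minimal sketch, not part of the lesson's own functions: the names `IUPAC_CODES` and `matches_pam` are hypothetical, and only the three codes listed here (N, R, V) are covered.

```python
# Map each code used in the PAM table to the bases it allows.
# Only the codes from this lesson are included (an assumption of this sketch).
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",  # any nucleotide
    "R": "AG",    # purine
    "V": "ACG",   # not T
}

def matches_pam(candidate, pattern):
    """Return True if candidate (plain A/C/G/T) fits the degenerate pattern."""
    if len(candidate) != len(pattern):
        return False
    return all(
        base in IUPAC_CODES[code]
        for base, code in zip(candidate.upper(), pattern)
    )

print(matches_pam("AGG", "NGG"))        # True
print(matches_pam("AAGAAT", "NNGRRT"))  # True
print(matches_pam("AAGCCT", "NNGRRT"))  # False: C is not a purine
print(matches_pam("TTTT", "TTTV"))      # False: V excludes T
```

Checking position by position keeps the rule for each code visible, which makes it easy to test the examples in the table by hand.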

```python
# Let's find all NGG PAM sequences in a DNA sequence

def find_pam_sites(dna_sequence, pam_pattern="NGG"):
    """
    Find all NGG PAM sequences in a DNA sequence.
    Returns a list of 0-based positions where each PAM starts.
    Only the SpCas9 pattern "NGG" is supported here.
    """
    # The search below is hard-coded for NGG, so reject anything else
    # instead of silently ignoring the argument
    if pam_pattern != "NGG":
        raise ValueError("find_pam_sites only supports the NGG pattern")

    pam_positions = []

    # Search through sequence
    for i in range(len(dna_sequence) - 2):  # -2 because PAM is 3bp
        potential_pam = dna_sequence[i:i+3]

        # Check if it matches NGG pattern
        if potential_pam[1:3] == "GG":  # Last two must be GG
            if potential_pam[0] in "ATGC":  # First can be any base
                pam_positions.append(i)

    return pam_positions

# Test sequence
test_dna = "ATGCTAGCTGATCGATCGATAGGCTAGCTGATCGATCGATAGGTACGATCGATCGA"

pam_sites = find_pam_sites(test_dna)
print(f"Found {len(pam_sites)} PAM sequences")
print(f"PAM positions: {pam_sites}")

# Show each PAM
for pos in pam_sites:
    pam = test_dna[pos:pos+3]
    print(f"Position {pos}: {pam}")
```

## 🧬 Finding Complete CRISPR Target Sites

For SpCas9, a complete CRISPR target site consists of:
1. **20bp guide sequence** (the protospacer, immediately 5' of the PAM)
2. **3bp PAM sequence** (NGG, immediately 3' of the guide sequence)

```
Position in DNA:
        |--- 20bp guide ---| PAM
5'- ... GCACTGCCTAGTACGATCGA AGG ... -3'
```

### Important: Direction Matters!
- DNA has two strands (forward and reverse complement)
- PAMs can be on either strand
- Must check both strands for CRISPR sites
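
One way to check the second strand is to reverse-complement the sequence, scan it for NGG, and map each hit back to forward-strand coordinates. The sketch below illustrates that idea under simple assumptions: the input contains only A, C, G, and T, and the helper names (`reverse_complement`, `ngg_positions`, `reverse_strand_pams`) are hypothetical, not functions defined elsewhere in this lesson.

```python
def reverse_complement(dna):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(dna))

def ngg_positions(dna):
    """0-based start index of every NGG on the strand as written (5'->3')."""
    return [i for i in range(len(dna) - 2) if dna[i + 1:i + 3] == "GG"]

def reverse_strand_pams(dna):
    """NGG PAMs on the reverse strand, reported as forward-strand start indices."""
    rc = reverse_complement(dna)
    n = len(dna)
    # A PAM at rc[i:i+3] covers forward-strand positions n-i-3 .. n-i-1
    return [n - i - 3 for i in ngg_positions(rc)]

example = "TTCCATTTT"
for pos in reverse_strand_pams(example):
    forward_view = example[pos:pos + 3]
    print(pos, forward_view, "->", reverse_complement(forward_view))
# prints: 2 CCA -> TGG
```

Note what the output shows: an NGG on the reverse strand appears as CCN when read on the forward strand, so the matching guide sequence lies to the right of the CCN in forward coordinates.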

```python
def get_reverse_complement(dna):
    """Get reverse complement of DNA sequence"""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join([complement[base] for base in dna[::-1]])

def find_crispr_targets(dna_sequence):
    """
    Find all complete CRISPR target sites (20bp guide + NGG PAM)
    on the forward strand. Searching the reverse strand is left
    for the challenge exercise at the end of this lesson.
    """
    targets = []

    # Find PAM sites
    pam_sites = find_pam_sites(dna_sequence)

    for pam_pos in pam_sites:
        # Need 20bp before PAM for guide sequence
        if pam_pos >= 20:
            guide_start = pam_pos - 20
            guide_seq = dna_sequence[guide_start:pam_pos]
            pam_seq = dna_sequence[pam_pos:pam_pos+3]
            full_target = guide_seq + pam_seq

            targets.append({
                "position": guide_start,  # 0-based start of the guide
                "strand": "forward",
                "guide": guide_seq,
                "pam": pam_seq,
                "full_site": full_target
            })

    return targets

# Find targets in our test sequence
dna = "ATGCTAGCTGATCGATCGATAGGCTAGCTGATCGATCGATAGGTACGATCGATCGA"
targets = find_crispr_targets(dna)

print(f"Found {len(targets)} CRISPR target sites:\n")
for i, target in enumerate(targets, 1):
    print(f"Target {i}:")
    print(f"  Position: {target['position']}")
    print(f"  Guide: {target['guide']}")
    print(f"  PAM: {target['pam']}")
    print(f"  Full site: {target['full_site']}")
    print()
```

## 🔍 Visualizing CRISPR Target Sites

Let's draw where each guide would bind and where its PAM sits:

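```python
def visualize_targets(dna_sequence, targets):
    """
    Print each CRISPR target site aligned under the DNA sequence
    """
    # Indent the sequence by the label width so the markers line up with it
    label_width = len(f"Target {len(targets)}: ")
    print("DNA Sequence:")
    print(" " * label_width + dna_sequence)
    print()

    for i, target in enumerate(targets, 1):
        # Marker line: 20 carets under the guide, "PAM" under the 3bp PAM
        marker = " " * target['position'] + "^" * 20 + "PAM"
        label = f"Target {i}: ".ljust(label_width)
        print(label + marker)

    print()
    print("Legend: ^ = guide RNA binding site, PAM = PAM location")

# Visualize our targets
visualize_targets(dna, targets)
```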

## 🧪 Real Gene Example

Let's find CRISPR targets in a real gene sequence (partial human BRCA1):

```python
# Partial BRCA1 gene sequence (189bp)
brca1_partial = (
    "ATGGATTTATCTGCTCTTCGCGTTGAAGAAGTACAAAATGTCATTAATGCTATGCAGAAAATC"
    "TTAGAGTGTCCCATCTGTCTGGAGTTGATCAAGGAACCTGTCTCCACAAAGTGTGACCACATA"
    "TTTTGCAAATTTTGCATGCTGAAACTTCTCAACCAGAAGAAAGGGCCTTCACAGTGTCCTTAT"
)

print("Searching BRCA1 gene for CRISPR targets...\n")
brca1_targets = find_crispr_targets(brca1_partial)

print(f"Found {len(brca1_targets)} potential CRISPR target sites in BRCA1\n")

for i, target in enumerate(brca1_targets, 1):
    print(f"BRCA1 Target {i}:")
    print(f"  Guide: {target['guide']}")
    print(f"  PAM: {target['pam']}")
    print()
```

## 💡 Challenge Exercise

Enhance the `find_crispr_targets()` function to:

1. Search the **reverse complement strand** too
2. Calculate the **GC content** of each guide sequence
3. Keep only targets whose GC content is between 40% and 60% (a commonly recommended range)

Hint: Use your GC content function from Lesson 3!
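
To make the GC arithmetic concrete before you start, here is a small worked check on three 20bp strings. It is only a sketch: `gc_percent` is a hypothetical stand-in for your Lesson 3 function, and the third string is a made-up low-GC example.

```python
def gc_percent(seq):
    """Percentage of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

guides = [
    "ATGCTAGCTGATCGATCGAT",  # 9 of 20 bases are G or C
    "AGGCTAGCTGATCGATCGAT",  # 10 of 20 bases are G or C
    "ATATATATATATATATATAT",  # no G or C at all
]

for guide in guides:
    print(guide, gc_percent(guide))

# Keep only guides inside the 40-60% window
kept = [g for g in guides if 40 <= gc_percent(g) <= 60]
print(kept)
```

The first two strings come out at 45% and 50% and pass the filter, while the third is dropped at 0%. Your enhanced function still needs to apply this check to the guides it finds on both strands.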

```python
# Your enhanced function here:
def calculate_gc_content(sequence):
    """Calculate GC content percentage"""
    # Your code from Lesson 3
    pass

def find_optimal_crispr_targets(dna_sequence, min_gc=40, max_gc=60):
    """
    Find CRISPR targets with optimal GC content
    Check both forward and reverse strands
    """
    # Your code here
    pass
```


---

## 🚀 Next Lesson

Ready to continue? Open the next lesson notebook:
**[Lesson 13: On-Target Scoring](lesson13_on_target_scoring.ipynb)**