**Lesson 13: On-Target Scoring & Guide Quality**

Learn how to score CRISPR guides for on-target efficiency, using the dystrophin (DMD) gene as a disease example.

---

## 🧬 Real Example: Duchenne Muscular Dystrophy

**Target Gene: DMD** (Dystrophin gene)

- **Disease**: Duchenne Muscular Dystrophy (DMD)
- **Cause**: Mutations in the dystrophin gene, most often deletions of one or more exons that shift the reading frame
- **CRISPR Goal**: Skip or remove exons so that the reading frame is restored

### Why This Matters:
- DMD affects 1 in 3,500 boys worldwide
- Largest human gene (2.4 million base pairs!)
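- Exon skipping is clinically validated (antisense-oligonucleotide drugs such as eteplirsen skip exon 51), and CRISPR-based exon editing has restored dystrophin in animal models, including dogs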
- Exon 51 skipping can help ~13% of DMD patients

### The Strategy:
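CRISPR can disrupt an exon's splice site (or excise the exon) so that the splicing machinery leaves that exon out of the mature mRNA. The skipped exon is usually not the damaged one: it is chosen so that the total number of missing nucleotides becomes a multiple of 3. That restores the reading frame, and cells produce a shorter but partially functional dystrophin.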
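The reading-frame rule above is simple arithmetic: removing coding exons whose combined length is divisible by 3 keeps the downstream codons in frame. A minimal sketch, assuming all removed nucleotides are coding and the exons are internal; `frame_shift` and `skips_that_restore_frame` are hypothetical helper names, and the exon names and lengths in the example are made up, not real DMD values.

```python
def frame_shift(removed_exon_lengths):
    """Net frameshift (0, 1 or 2 nt) caused by removing these coding exons."""
    return sum(removed_exon_lengths) % 3


def skips_that_restore_frame(deleted_lengths, candidate_exons):
    """Names of single candidate exons whose additional skipping restores the frame.

    deleted_lengths: lengths (nt) of exons already missing in the patient.
    candidate_exons: mapping of exon name -> length (nt) of exons that could be skipped.
    """
    return [
        name
        for name, length in candidate_exons.items()
        if frame_shift(list(deleted_lengths) + [length]) == 0
    ]


# Hypothetical example values (NOT real DMD exon lengths)
deleted = [100]  # a 100-nt deletion leaves a 1-nt frameshift
candidates = {"exon_A": 125, "exon_B": 90}
print(frame_shift(deleted))                           # 1
print(skips_that_restore_frame(deleted, candidates))  # ['exon_A']
```

Skipping `exon_A` removes 225 nt in total (a multiple of 3), while `exon_B` would leave 190 nt missing and the frame still shifted.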

```python
# Illustrative placeholder sequence for the DMD exon 51 region.
# NOTE: this is a synthetic, repetitive teaching sequence, not the genuine
# exon 51 sequence. For real guide design, download the region from NCBI
# (Gene ID 1756). Exon 51 is a common target for exon-skipping therapy.
DMD_GENE = "ATGGAGCTGCGCATCGACTTCTCCATGATCCTGGACATGATCCGCAAGCTGCGCCGCATCGAGAAGATCCTGGAGCAGCTGCAGAAGATCCTGCAGCGCCTGGAGAAGCTGCTGCGCGAGCTGGAGCGCCTGCTGGAGCGCCTGCGCCGCGAGCTGGAGAAGCTGCTGCGCGAGCTGGAGCGCCTGCTGGAGCGCCTGCGCCGCGAGCTGGAGAAGCTGCTGCGCGAGCTGGAGCGCCTGCTGGAGCGCCTGCGCCGCGAGCTGGAGAAGCTGCTGCGCGAGCTGGAGCGCCTGCTGGAGCGCCTGCGCCGCGAGCTGGAGAAGCTGCTGCGCGAGCTGGAGCGCCTGCTGGAGCGCCTGCGCCGCGAGCTGGAGAAGCTGCTGCGCGAGCTGGAGCGCCTGCTGGAGCGCCTGCGCCGCGAG"

print("DMD Gene (Dystrophin - Exon 51 region, placeholder sequence)")
print(f"Length: {len(DMD_GENE)} bp")
print(f"First 60 bp: {DMD_GENE[:60]}")
print(f"Last 60 bp: {DMD_GENE[-60:]}")

print("\n💡 Goal: Design guides to skip this exon")
print("Exon 51 skipping is applicable to ~13% of DMD patients")
print("Clinical trials: NCT03375164, NCT04240314")
```

## 🎯 Scoring Guide Quality

### Key Factors:
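1. **GC Content** (40-60% optimal)
2. **No Homopolymers** (AAAA, TTTT, GGGG, CCCC); TTTT also terminates transcription from the U6 (Pol III) promoters commonly used to express guides
3. **Position-specific preferences** (positions 1-20 are numbered 5'→3', so position 20 sits next to the PAM; e.g., a G or C at position 20)
4. **Seed region quality** (PAM-proximal positions 10-20)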
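The scoring function below does not look at the seed region on its own, so here is a minimal sketch of what "seed region quality" could mean, assuming the list's positions 10-20 of a 20-nt guide (some sources define the seed as the 10-12 PAM-proximal bases). `seed_region` and `seed_report` are hypothetical helpers, and reporting seed GC and homopolymer runs is an illustrative choice, not an established criterion.

```python
def seed_region(guide, first=10, last=20):
    """Return guide positions first..last (1-based, inclusive): the PAM-proximal seed."""
    guide = guide.upper()
    if len(guide) < last or not 1 <= first <= last:
        raise ValueError("guide too short or invalid seed bounds")
    return guide[first - 1:last]


def seed_report(guide, first=10, last=20, run_length=4):
    """Summarize seed GC content and whether the seed contains a homopolymer run."""
    seed = seed_region(guide, first, last)
    gc_percent = 100 * (seed.count("G") + seed.count("C")) / len(seed)
    has_run = any(base * run_length in seed for base in "ACGT")
    return {"seed": seed, "seed_gc": gc_percent, "seed_homopolymer": has_run}


print(seed_report("AAAAAAAAAGCGCGCGCGCG"))
```

Mismatches in the seed are tolerated least by Cas9, which is why it is worth inspecting separately from whole-guide GC content.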

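```python
def calculate_gc_content(sequence):
    """Return the GC content of a DNA sequence as a percentage (0-100)."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    gc = sequence.count('G') + sequence.count('C')
    return (gc / len(sequence)) * 100

def check_homopolymers(sequence, length=4):
    """Return True if any base repeats `length` or more times in a row (e.g. TTTT)."""
    sequence = sequence.upper()
    for base in 'ATGC':
        if base * length in sequence:
            return True
    return False

def score_guide_quality(guide):
    """Heuristic score for a 20-nt SpCas9 guide (5'->3', PAM excluded).

    Returns a tuple (score, gc_percent) with the score clamped to 0-100.
    This is a simple teaching heuristic, not a trained model such as the
    one described by Doench et al. (2016).
    """
    guide = guide.upper()
    if len(guide) != 20:
        raise ValueError(f"expected a 20-nt guide, got {len(guide)} nt")
    score = 50  # neutral starting point

    # GC content: 40-60% treated as optimal, 30-70% as acceptable
    gc = calculate_gc_content(guide)
    if 40 <= gc <= 60:
        score += 20
    elif 30 <= gc < 40 or 60 < gc <= 70:
        score += 10
    else:
        score -= 10

    # Homopolymer runs (TTTT also terminates U6/Pol III transcription)
    if check_homopolymers(guide):
        score -= 15

    # Position 20 sits immediately 5' of the PAM: reward G or C
    if guide[19] in 'GC':
        score += 10

    # Position 1 (5' end): penalize T
    if guide[0] == 'T':
        score -= 5

    return max(0, min(100, score)), gc

print("Guide scoring functions ready!")
```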

## 🔍 Find and Score Guides in DMD Gene

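A minimal sketch of the finding step, assuming SpCas9 with a 20-nt protospacer immediately followed by an NGG PAM. `reverse_complement`, `find_spcas9_guides`, and `rank_guides` are hypothetical helpers introduced here. The finder scans both strands and reports each protospacer's start on the forward strand, so hits from either strand can be compared against exon coordinates. Ranking takes any scoring function, so the cell runs on its own with a stand-in scorer.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spcas9_guides(sequence, guide_len=20):
    """List every protospacer followed by an NGG PAM, on both strands.

    Each hit records the guide (5'->3'), its PAM, the strand, and the 0-based
    start of the protospacer on the forward strand.
    """
    seq = sequence.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(n - guide_len - 2):
            pam = s[i + guide_len:i + guide_len + 3]
            if pam[1:] == "GG":  # NGG: any base followed by GG
                # Map reverse-strand coordinates back onto the forward strand
                start = i if strand == "+" else n - i - guide_len
                hits.append({"guide": s[i:i + guide_len], "pam": pam,
                             "strand": strand, "start": start})
    return hits


def rank_guides(hits, scorer):
    """Return hits sorted by scorer(guide), highest score first."""
    return sorted(hits, key=lambda hit: scorer(hit["guide"]), reverse=True)


# Toy usage with a stand-in scorer (G+C count) so this cell runs on its own
toy = "A" * 20 + "TGG"
for hit in rank_guides(find_spcas9_guides(toy), lambda g: g.count("G") + g.count("C")):
    print(hit)
```

In the notebook you would pass the scoring function defined earlier instead, for example `rank_guides(find_spcas9_guides(DMD_GENE), lambda g: score_guide_quality(g)[0])`.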

## 💡 Challenge

For patients with suitable deletions (for example, a deletion of exon 50), skipping exon 51 restores the reading frame. Which guide:
1. Best targets the exon boundaries (splice sites) for skipping?
2. Has the best quality score?
3. Would you recommend for clinical use, and why?

**Bonus**: Research real DMD clinical trials. What exons are they targeting?
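For question 1, one way to shortlist boundary-targeting guides is to compute where each guide cuts and keep those close to a splice site. A minimal sketch, relying on the fact that SpCas9 cuts 3 bp 5' of the PAM. The hit format matches a dict with `guide`, `start` (forward-strand, 0-based) and `strand`; the helper names, the boundary coordinates, and the 30-bp default window are all assumptions for illustration, since the placeholder sequence carries no real exon annotation.

```python
def cas9_cut_position(start, strand, guide_len=20):
    """Forward-strand 0-based index of the first base after the SpCas9 cut.

    A '+' guide starting at `start` is cut between its positions 17 and 18;
    for a '-' guide the PAM lies to the left on the forward strand, so the cut
    falls 3 bp to the right of `start`.
    """
    if strand == "+":
        return start + guide_len - 3
    if strand == "-":
        return start + 3
    raise ValueError("strand must be '+' or '-'")


def guides_near_boundaries(hits, boundaries, window=30):
    """Keep hits whose cut site lies within `window` bp of any boundary, closest first."""
    selected = []
    for hit in hits:
        cut = cas9_cut_position(hit["start"], hit["strand"], len(hit["guide"]))
        distance = min(abs(cut - b) for b in boundaries)
        if distance <= window:
            selected.append({**hit, "cut": cut, "distance": distance})
    return sorted(selected, key=lambda h: h["distance"])


# Hypothetical splice-site coordinates; replace with real exon annotation
example_hits = [{"guide": "A" * 20, "start": 0, "strand": "+"},
                {"guide": "C" * 20, "start": 100, "strand": "-"}]
print(guides_near_boundaries(example_hits, boundaries=[20, 250], window=10))
```

The shortlisted guides can then be compared on quality score before answering questions 2 and 3.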

```python
# Your analysis here:
# Which guide is best for skipping DMD exon 51?
```


---

## 📚 References & Data Sources

**DMD (Dystrophin):**
- Gene sequence: NCBI Gene Database - Gene ID: 1756
- Amoasii et al. (2018). "Gene editing restores dystrophin expression in a canine model of Duchenne muscular dystrophy." *Science*, 362(6410), 86-91.
- Clinical trials: ClinicalTrials.gov NCT03375164, NCT04240314

**CRISPR-Cas9 Resources:**
- Jinek et al. (2012). "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity." *Science*, 337(6096), 816-821.
- Ran et al. (2013). "Genome engineering using the CRISPR-Cas9 system." *Nature Protocols*, 8(11), 2281-2308.
- Doench et al. (2016). "Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9." *Nature Biotechnology*, 34, 184-191.

**Data Access:**
- Reference gene sequences: [NCBI Gene Database](https://www.ncbi.nlm.nih.gov/gene) (the sequence used in this lesson is an illustrative placeholder)
- Last accessed: January 2026


---

## 🚀 Next Lesson

Ready to continue? Open the next lesson notebook:
**[Lesson 14: Off-Target Prediction](lesson14_off_target_prediction.ipynb)**