**Lesson 13: On-Target Scoring & Guide Quality**

Learn how to score CRISPR guides for efficiency using real disease genes.

---

## 🧬 Real Example: Sickle Cell Disease

**Target Gene: HBB** (Beta-globin gene)

- **Disease**: Sickle cell anemia
- **Mutation**: Single base change (GAG → GTG)
- **CRISPR Goal**: Fix the mutation

### Why This Matters:
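- The first CRISPR clinical trials for sickle cell disease have reported lasting benefit in patients (Frangoul et al., 2021)
- That therapy edits the *BCL11A* enhancer to reactivate fetal hemoglobin; directly correcting *HBB*, the target explored in this lesson, is a separate strategy
- Guide quality is critical for patient safety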

In [None]:
# HBB coding sequence excerpt, starting at the ATG start codon (length is printed below)
HBB_GENE = "ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGG"

print("HBB Gene (Beta-globin)")
print(f"Length: {len(HBB_GENE)} bp")
print(f"First 60 bp: {HBB_GENE[:60]}")
print(f"Last 60 bp: {HBB_GENE[-60:]}")

# The sickle cell mutation changes codon 7 of this sequence (GAG → GTG).
# Codon 7 occupies 0-based indices 18-20; the substituted base is index 19 (c.20A>T).
mutation_region = HBB_GENE[10:30]
print(f"\nMutation region: {mutation_region}")
print("Indices 18-20: GAG (codes for glutamic acid)")
print("In sickle cell: GTG (codes for valine) - causes disease")
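
The codon position can be checked in code rather than taken on trust. The sketch below uses only the first 30 bases of the sequence above, counts the ATG start codon as codon 1, and builds the sickle allele by substituting the middle base of codon 7. The names `HBB_START`, `CODON_NUMBER`, and `sickle_start` are illustrative and not part of the lesson's code.

```python
# First 30 bases of the HBB coding sequence used in this lesson
HBB_START = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"

CODON_NUMBER = 7                      # counting the ATG start codon as codon 1
codon_start = (CODON_NUMBER - 1) * 3  # 0-based index of the codon's first base

normal_codon = HBB_START[codon_start:codon_start + 3]

# Substitute the middle base of the codon (A -> T) to model the sickle allele
mut_index = codon_start + 1
sickle_start = HBB_START[:mut_index] + "T" + HBB_START[mut_index + 1:]
sickle_codon = sickle_start[codon_start:codon_start + 3]

print(f"Codon {CODON_NUMBER} spans indices {codon_start}-{codon_start + 2}")
print(f"Normal: {normal_codon}  Sickle: {sickle_codon}")
```

Running this should report `GAG` for the normal allele and `GTG` for the sickle allele, which matches the mutation described at the top of the lesson.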

## 🎯 Scoring Guide Quality

### Key Factors:
1. **GC Content** (40-60% optimal)
2. **No Homopolymers** (AAAA, TTTT, GGGG, CCCC)
3. **Position-specific preferences**
4. **Seed region quality** (PAM-proximal positions 10-20)
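
Before combining these factors into a single score, it can help to look at the raw measurements for one guide. The sketch below is a minimal illustration: `gc_percent`, `seed_region`, and `factor_report` are hypothetical helpers, and treating the seed as 1-based positions 10-20 simply follows the list above.

```python
def gc_percent(seq):
    """Percentage of G and C bases in seq."""
    return 100 * sum(base in "GC" for base in seq) / len(seq)

def seed_region(guide, start=10, end=20):
    """PAM-proximal bases at 1-based positions start..end (inclusive)."""
    return guide[start - 1:end]

def factor_report(guide):
    """Collect the raw measurements behind the key factors for one guide."""
    seed = seed_region(guide)
    return {
        "gc_percent": gc_percent(guide),
        "has_homopolymer": any(base * 4 in guide for base in "ACGT"),
        "first_base": guide[0],   # position 1, farthest from the PAM
        "last_base": guide[-1],   # position 20, adjacent to the PAM
        "seed": seed,
        "seed_gc_percent": gc_percent(seed),
    }

# A 20-mer taken from the HBB sequence above (indices 25-44)
report = factor_report("AGTCTGCCGTTACTGCCCTG")
for name, value in report.items():
    print(f"{name}: {value}")
```

Keeping the measurements separate from the scoring makes it easier to see which factor is responsible when a guide scores poorly.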

In [None]:
def calculate_gc_content(sequence):
    """Calculate GC content percentage"""
    gc = sequence.count('G') + sequence.count('C')
    return (gc / len(sequence)) * 100

def check_homopolymers(sequence, length=4):
    """Check for problematic repeats"""
    for base in 'ATGC':
        if base * length in sequence:
            return True
    return False

def score_guide_quality(guide):
    """Score a 20-nt CRISPR guide; return (score clamped to 0-100, GC percentage)."""
    score = 50
    
    # GC content
    gc = calculate_gc_content(guide)
    if 40 <= gc <= 60:
        score += 20
    elif 30 <= gc < 40 or 60 < gc <= 70:
        score += 10
    else:
        score -= 10
    
    # Homopolymers
    if check_homopolymers(guide):
        score -= 15
    
    # Position 20 (the base adjacent to the PAM): reward G or C
    if guide[19] in 'GC':
        score += 10
    
    # Position 1 (the base farthest from the PAM): penalize T
    if guide[0] == 'T':
        score -= 5
    
    return max(0, min(100, score)), gc

print("Guide scoring functions ready!")
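
To see how the rubric adds up for a single guide, the sketch below re-implements the same rules but records each adjustment as it is applied. `explain_score` is a hypothetical helper written for this illustration; it is meant to mirror `score_guide_quality`, not replace it.

```python
def explain_score(guide):
    """Return (total, steps), where steps lists each adjustment the rubric makes."""
    gc = 100 * sum(base in "GC" for base in guide) / len(guide)
    steps = [("baseline", 50)]

    # GC content band
    if 40 <= gc <= 60:
        steps.append((f"GC {gc:.0f}% within 40-60%", 20))
    elif 30 <= gc < 40 or 60 < gc <= 70:
        steps.append((f"GC {gc:.0f}% within 30-40% or 60-70%", 10))
    else:
        steps.append((f"GC {gc:.0f}% outside 30-70%", -10))

    # Homopolymer run of four identical bases
    if any(base * 4 in guide for base in "ATGC"):
        steps.append(("homopolymer run of 4", -15))

    # Position-specific rules
    if guide[19] in "GC":
        steps.append(("G or C at position 20", 10))
    if guide[0] == "T":
        steps.append(("T at position 1", -5))

    total = max(0, min(100, sum(delta for _, delta in steps)))
    return total, steps

good_total, good_steps = explain_score("AGTCTGCCGTTACTGCCCTG")
poor_total, poor_steps = explain_score("TTTTAAAATTTTAAAATTTA")

for label, delta in good_steps:
    print(f"{delta:+4d}  {label}")
print(f"Total: {good_total}")   # 50 + 20 + 10 = 80
print(f"Poor guide total: {poor_total}")   # 50 - 10 - 15 - 5 = 20
```

The first guide is a 20-mer from the HBB sequence; the second is an artificial AT-only sequence chosen to trigger every penalty.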

## 🔍 Find and Score Guides in HBB Gene

In [None]:
def find_pam_sites(dna):
    """Return 0-based start indices of NGG PAMs on the given (forward) strand."""
    pams = []
    for i in range(len(dna) - 2):
        if dna[i:i+3].endswith('GG') and dna[i] in 'ATGC':
            pams.append(i)
    return pams

def find_and_score_guides(gene_sequence, gene_name="Gene"):
    """Find all guides and score them"""
    pam_sites = find_pam_sites(gene_sequence)
    
    guides = []
    for pam_pos in pam_sites:
        if pam_pos >= 20:  # Need 20bp before PAM
            guide = gene_sequence[pam_pos-20:pam_pos]
            pam = gene_sequence[pam_pos:pam_pos+3]
            score, gc = score_guide_quality(guide)
            
            guides.append({
                'guide': guide,
                'pam': pam,
                'position': pam_pos-20,
                'score': score,
                'gc_content': gc,
                'quality': 'High' if score >= 70 else 'Medium' if score >= 50 else 'Low'
            })
    
    # Sort by score
    guides.sort(key=lambda x: x['score'], reverse=True)
    return guides

# Find guides in HBB gene
print("Finding CRISPR guides in HBB gene...\n")
hbb_guides = find_and_score_guides(HBB_GENE, "HBB")

print(f"Found {len(hbb_guides)} potential guides\n")
print("Top 5 Guides for Sickle Cell Treatment:\n" + "="*70)

for i, guide in enumerate(hbb_guides[:5], 1):
    print(f"\nGuide #{i} - {guide['quality']} Quality")
    print(f"  Sequence: 5'-{guide['guide']}-3'")
    print(f"  PAM: {guide['pam']}")
    print(f"  Position: {guide['position']} bp")
    print(f"  Score: {guide['score']:.0f}/100")
    print(f"  GC%: {guide['gc_content']:.1f}%")
    
    # Flag guides that start within the first 40 bp, close to the codon 7 mutation
    if 0 <= guide['position'] <= 40:
        print("  ⭐ NEAR SICKLE CELL MUTATION SITE")
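
The search above reads only the strand it is given, but Cas9 can also target the opposite strand, where an NGG PAM appears as CCN on the forward sequence. The sketch below shows one way to collect candidates from both strands. `reverse_complement` and `find_guides_both_strands` are hypothetical helpers, and the toy sequence is made up for the demonstration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_guides_both_strands(dna, guide_len=20):
    """Return (strand, guide, pam, start) tuples.

    start is the 0-based forward-strand index of the leftmost protospacer base.
    Guides and PAMs are reported 5'->3' on the strand they target.
    """
    hits = []
    for i in range(len(dna) - 2):
        triplet = dna[i:i + 3]
        # Forward strand: 20-mer immediately followed by NGG
        if triplet[1:] == "GG" and i >= guide_len:
            hits.append(("+", dna[i - guide_len:i], triplet, i - guide_len))
        # Reverse strand: CCN on the forward strand, followed by the 20-mer
        if triplet[:2] == "CC" and i + 3 + guide_len <= len(dna):
            protospacer = dna[i + 3:i + 3 + guide_len]
            hits.append(("-", reverse_complement(protospacer),
                         reverse_complement(triplet), i + 3))
    return hits

# Toy sequence: CCT + 20-mer + AGG, so the same 20 bases are targetable on both strands
toy = "CCT" + "ATATATATATGAGAGAGAGA" + "AGG"
hits = find_guides_both_strands(toy)
for strand, guide, pam, start in hits:
    print(f"{strand} strand  start={start}  5'-{guide}-3'  PAM={pam}")
```

Each reverse-strand guide could then be passed to `score_guide_quality` in the same way as the forward-strand guides, since the rubric only looks at the guide read 5' to 3'.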

## 💡 Challenge

The sickle cell mutation sits in codon 7, near the start of the sequence (0-based indices 18-20). Which guide:
1. Targets the mutation site?
2. Has the best quality score?
3. Would you recommend for clinical use?
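
One way to approach the first question is to measure how far each guide's predicted cut falls from the mutated base. The sketch below assumes forward-strand guides stored as dictionaries with `position` and `score` keys, as in `find_and_score_guides`, and assumes Cas9 cuts 3 bp upstream of the PAM. The helper names, the 10 bp window, and the example records are all illustrative choices, not values from the lesson.

```python
def cut_site_distance(guide_start, mutation_index, guide_len=20):
    """Approximate distance in bp between the predicted cut and the mutated base.

    The cut is assumed to fall 3 bp upstream of the PAM, so cut_index is the
    0-based index of the first base on the PAM side of the cut.
    """
    cut_index = guide_start + guide_len - 3
    return abs(cut_index - mutation_index)

def rank_for_mutation(guides, mutation_index, max_distance=10):
    """Keep guides cutting within max_distance bp, best score first."""
    nearby = [
        g for g in guides
        if cut_site_distance(g["position"], mutation_index) <= max_distance
    ]
    return sorted(
        nearby,
        key=lambda g: (-g["score"], cut_site_distance(g["position"], mutation_index)),
    )

# Hypothetical guide records for illustration only (not output from the HBB search)
example_guides = [
    {"position": 25, "score": 80},
    {"position": 5, "score": 60},
    {"position": 8, "score": 70},
    {"position": 120, "score": 90},
]

MUTATION_INDEX = 19  # 0-based index of the substituted base (c.20A>T)
ranked = rank_for_mutation(example_guides, MUTATION_INDEX)
for g in ranked:
    distance = cut_site_distance(g["position"], MUTATION_INDEX)
    print(f"position={g['position']}  score={g['score']}  cut distance={distance} bp")
```

A high quality score alone does not make a guide useful here: in the example records, the highest-scoring guide is dropped because it cuts far from the mutation.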

In [None]:
# Your analysis here:
# Which guide is best for treating sickle cell?


---

## 📚 References & Data Sources

**HBB (Beta-globin):**
- Gene sequence: NCBI Gene Database - Gene ID: 3043
- Frangoul et al. (2021). "CRISPR-Cas9 Gene Editing for Sickle Cell Disease and β-Thalassemia." *N Engl J Med*, 384(3), 252-260.

**CRISPR-Cas9 Resources:**
- Jinek et al. (2012). "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity." *Science*, 337(6096), 816-821.
- Ran et al. (2013). "Genome engineering using the CRISPR-Cas9 system." *Nature Protocols*, 8(11), 2281-2308.
- Doench et al. (2016). "Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9." *Nature Biotechnology*, 34, 184-191.

**Data Access:**
- All gene sequences retrieved from [NCBI Gene Database](https://www.ncbi.nlm.nih.gov/gene)
- Last accessed: January 2026


---

## 🚀 Next Lesson

Ready to continue? Open the next lesson notebook:
**[Lesson 14: Off-Target Prediction](lesson14_off_target_prediction.ipynb)**