# 🧩 Notebook 06: Motif Scaffolding

**Learning Objective**: Design proteins around specific functional motifs using conditional diffusion

## 💻 GPU Requirements

**⚠️ GPU Optional but Recommended**
- Conditional generation has similar computational cost to unconditional
- GPU can provide a 10-100x speedup on larger models and batches; the small demo model in this notebook will gain less
- Recommended: T4 GPU or better (available free on Google Colab)

**Running on Google Colab**:
1. Runtime → Change runtime type → T4 GPU
2. See [colab_gpu_test.ipynb](../../colab_gpu_test.ipynb) to verify GPU

---

## 📚 What You'll Learn

1. **Conditional Diffusion** - Generate proteins with constraints
2. **Motif Preservation** - Keep functional sites intact while designing scaffold
3. **Inpainting Strategy** - Fill in missing regions around fixed motifs
4. **GPU-Accelerated Sampling** - Efficient conditional generation
5. **Quality Metrics** - Evaluate motif preservation and scaffold quality

## 🎯 Why Motif Scaffolding?

Many protein design tasks require preserving specific structural elements:
- **Binding sites**: Keep residues that interact with ligands/proteins
- **Catalytic sites**: Preserve enzyme active site geometry
- **Epitopes**: Maintain antibody recognition sites
- **Structural motifs**: Zinc fingers, helix-turn-helix, beta-hairpins

RFDiffusion enables **conditional generation** where we:
1. Fix certain residues at their target positions
2. Generate the rest of the protein around them
3. Ensure the scaffold is stable and designable

In [None]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Setup GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## 🧬 Understanding Motif Scaffolding

### The Problem

Imagine you want to design a protein that:
- Binds to a specific target (drug, protein, DNA)
- Has a known binding motif from existing structures
- Needs a stable scaffold to present the motif

**Example**: Design a protein binder using a known antibody CDR loop.

### The Solution: Conditional Diffusion

Instead of generating from pure noise, we:
1. **Start with partial structure**: Motif residues at target positions
2. **Add noise only to scaffold**: Keep motif clean
3. **Denoise conditionally**: Re-fix motif after each step
4. **Result**: Protein with preserved motif and novel scaffold
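Step 2 above ("add noise only to scaffold") can be sketched as a masked forward-noising function. This is a minimal NumPy illustration, not code from the notebook: `noise_scaffold_only`, `alpha_bar_t`, and the toy coordinates are assumed names and values, and the function simply applies the standard DDPM forward formula to scaffold rows while copying motif rows through unchanged.

```python
import numpy as np

def noise_scaffold_only(x0, motif_mask, alpha_bar_t, rng):
    """Apply x_t = sqrt(a) * x0 + sqrt(1 - a) * eps to scaffold rows only.

    x0:          (n_res, 3) clean coordinates
    motif_mask:  (n_res,) boolean, True where the residue belongs to the motif
    alpha_bar_t: cumulative noise-schedule product at the chosen timestep (assumed given)
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    # Motif rows are copied back unchanged, so they stay noise-free
    return np.where(motif_mask[:, None], x0, xt)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((12, 3))      # toy "protein" of 12 residues
motif_mask = np.zeros(12, dtype=bool)
motif_mask[4:8] = True                 # residues 4-7 form the motif

xt = noise_scaffold_only(x0, motif_mask, alpha_bar_t=0.5, rng=rng)
print("max motif change:    ", np.abs(xt[motif_mask] - x0[motif_mask]).max())
print("mean scaffold change:", np.abs(xt[~motif_mask] - x0[~motif_mask]).mean())
```

The motif change is exactly zero by construction, while the scaffold rows move by an amount set by `alpha_bar_t`.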

### Mathematical Framework

Unconditional: $x_0 \sim p(x_0)$

Conditional: $x_0 \sim p(x_0 | \text{motif})$

In practice we approximate the conditional with a **hard constraint**: re-copy the motif coordinates after each denoising step.
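Written out, the hard constraint is a masked overwrite. As a sketch in notation introduced here (not used elsewhere in the notebook): let $m \in \{0,1\}^{N}$ mark the motif residues, let $x^{\text{motif}}$ hold the target motif coordinates (zero-padded at scaffold positions), and let $\hat{x}_{t-1}$ be the output of one unconstrained denoising step. Then

$$
x_{t-1} = m \odot x^{\text{motif}} + (1 - m) \odot \hat{x}_{t-1}
$$

so the motif rows equal the target exactly at every step, while the scaffold rows follow the learned reverse process. This is a heuristic approximation to sampling from $p(x_0 | \text{motif})$, not an exact conditional sampler.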

In [None]:
def create_helix_motif(n_residues=8, radius=2.3, rise=1.5):
    """
    Create an ideal alpha-helix motif.
    
    Args:
        n_residues: number of residues in helix
        radius: helix radius in Angstroms
        rise: rise per residue along helix axis
    
    Returns:
        coords: (n_residues, 3) Cα coordinates
    """
    # Alpha helix: 3.6 residues per turn, i.e. 100° of rotation per residue
    angles = np.arange(n_residues) * (2 * np.pi / 3.6)
    
    coords = np.zeros((n_residues, 3))
    coords[:, 0] = radius * np.cos(angles)  # X
    coords[:, 1] = radius * np.sin(angles)  # Y
    coords[:, 2] = np.arange(n_residues) * rise  # Z (helix axis)
    
    return coords

def create_beta_strand_motif(n_residues=6, spacing=3.3):
    """
    Create a beta strand motif.
    
    Args:
        n_residues: number of residues
        spacing: rise per residue along the strand axis (Å)
    
    Returns:
        coords: (n_residues, 3) Cα coordinates
    """
    coords = np.zeros((n_residues, 3))
    # Beta strand is extended, with Cα atoms alternating above/below the axis
    coords[:, 0] = np.arange(n_residues) * spacing
    # A ±0.94 Å zigzag gives Cα-Cα ≈ 3.8 Å at the default 3.3 Å rise
    coords[:, 1] = 0.94 * ((-1) ** np.arange(n_residues))
    coords[:, 2] = 0.0
    
    return coords

# Create different motifs to experiment with
helix_motif = create_helix_motif(8)
strand_motif = create_beta_strand_motif(6)

print("Created motifs:")
print(f"  Alpha helix: {helix_motif.shape}")
print(f"  Beta strand: {strand_motif.shape}")

# Visualize motifs
fig = plt.figure(figsize=(14, 6))

ax1 = fig.add_subplot(121, projection='3d')
ax1.plot(helix_motif[:, 0], helix_motif[:, 1], helix_motif[:, 2],
         'o-', linewidth=3, markersize=10, color='red', alpha=0.8)
ax1.set_title('Alpha Helix Motif', fontsize=14, fontweight='bold')
ax1.set_xlabel('X (Å)')
ax1.set_ylabel('Y (Å)')
ax1.set_zlabel('Z (Å)')

ax2 = fig.add_subplot(122, projection='3d')
ax2.plot(strand_motif[:, 0], strand_motif[:, 1], strand_motif[:, 2],
         'o-', linewidth=3, markersize=10, color='blue', alpha=0.8)
ax2.set_title('Beta Strand Motif', fontsize=14, fontweight='bold')
ax2.set_xlabel('X (Å)')
ax2.set_ylabel('Y (Å)')
ax2.set_zlabel('Z (Å)')

plt.tight_layout()
plt.show()

# Check geometry
helix_distances = np.linalg.norm(np.diff(helix_motif, axis=0), axis=1)
strand_distances = np.linalg.norm(np.diff(strand_motif, axis=0), axis=1)

print(f"\nHelix Cα-Cα distances: {helix_distances.mean():.2f} ± {helix_distances.std():.2f} Å")
print(f"Strand Cα-Cα distances: {strand_distances.mean():.2f} ± {strand_distances.std():.2f} Å")

## 🔧 Build Conditional Diffusion Model

We'll extend our diffusion model from Notebook 05 to support conditional generation.

In [None]:
class ConditionalRFDiffusion(nn.Module):
    """
    Simplified RFDiffusion model with masking support for conditional generation.
    """
    
    def __init__(self, hidden_dim=128, num_layers=4):
        super().__init__()
        self.hidden_dim = hidden_dim
        
        # Embed timestep
        self.time_embed = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim)
        )
        
        # Embed coordinates + mask indicator
        self.coord_embed = nn.Linear(4, hidden_dim)  # 3 coords + 1 mask bit
        
        # Transformer layers
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(
                d_model=hidden_dim,
                nhead=4,
                dim_feedforward=hidden_dim*4,
                batch_first=True
            )
            for _ in range(num_layers)
        ])
        
        # Output head
        self.coord_out = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 3)
        )
    
    def forward(self, coords, t, mask=None):
        """
        Args:
            coords: (batch, n_res, 3)
            t: (batch,) - timestep
            mask: (batch, n_res) - 1 for motif residues, 0 for scaffold
        
        Returns:
            coord_updates: (batch, n_res, 3)
        """
        batch_size, n_res, _ = coords.shape
        
        if mask is None:
            mask = torch.zeros(batch_size, n_res, device=coords.device)
        
        # Embed timestep
        t_embed = self.time_embed(t.view(-1, 1))
        t_embed = t_embed.unsqueeze(1).expand(-1, n_res, -1)
        
        # Concatenate coordinates with mask
        mask_expanded = mask.unsqueeze(-1)  # (batch, n_res, 1)
        coords_masked = torch.cat([coords, mask_expanded], dim=-1)  # (batch, n_res, 4)
        
        # Embed
        coord_feat = self.coord_embed(coords_masked)
        
        # Combine with time
        x = coord_feat + t_embed
        
        # Apply transformer layers
        for layer in self.layers:
            x = layer(x)
        
        # Predict updates
        coord_updates = self.coord_out(x)
        
        return coord_updates

# Initialize model on GPU
model = ConditionalRFDiffusion(hidden_dim=128, num_layers=4).to(device)

print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"Model device: {next(model.parameters()).device}")

# Test forward pass
test_coords = torch.randn(2, 30, 3, device=device)
test_t = torch.rand(2, device=device)
test_mask = torch.zeros(2, 30, device=device)
test_mask[:, 10:15] = 1.0  # Residues 10-14 are motif

with torch.no_grad():
    output = model(test_coords, test_t, test_mask)
    print(f"\nTest forward pass:")
    print(f"  Input shape: {test_coords.shape}")
    print(f"  Output shape: {output.shape}")
    print(f"  Output range: [{output.min():.2f}, {output.max():.2f}]")

## 🎨 Conditional Sampling Algorithm

The key difference from unconditional generation: **re-fix motif coordinates** after each denoising step.
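To see why the re-fixing has to happen after every step rather than only at initialization, here is a toy NumPy comparison. It is a sketch under stated assumptions: `toy_reverse_step` is a hypothetical stand-in for a denoising step (it just shrinks and perturbs coordinates), not the model defined in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, motif_idx = 20, np.arange(8, 12)
motif = rng.standard_normal((len(motif_idx), 3))

def toy_reverse_step(x, rng):
    """Stand-in for one denoising step: shrinks coordinates and adds noise."""
    return 0.95 * x + 0.1 * rng.standard_normal(x.shape)

def sample(refix_every_step, steps=50):
    x = rng.standard_normal((n_res, 3))
    x[motif_idx] = motif                 # motif placed at initialization
    for _ in range(steps):
        x = toy_reverse_step(x, rng)     # this step also moves the motif rows
        if refix_every_step:
            x[motif_idx] = motif         # hard constraint: overwrite motif rows
    return x

def motif_rmsd(x):
    """Root of the mean squared per-residue distance to the target motif."""
    return float(np.sqrt(np.mean(np.sum((x[motif_idx] - motif) ** 2, axis=1))))

rmsd_refixed = motif_rmsd(sample(refix_every_step=True))
rmsd_init_only = motif_rmsd(sample(refix_every_step=False))
print(f"re-fixed every step: {rmsd_refixed:.3f}")
print(f"initialized once:    {rmsd_init_only:.3f}")
```

Because each reverse step updates all residues, a motif that is placed only once drifts away from its target, whereas the per-step overwrite keeps its RMSD at exactly zero.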

In [None]:
class ConditionalDiffusionProcess:
    """Diffusion process with hard constraints for motif preservation."""
    
    def __init__(self, num_timesteps=100, device='cpu'):
        self.num_timesteps = num_timesteps
        self.device = device
        
        # Cosine schedule
        self.betas = self._cosine_beta_schedule(num_timesteps).to(device)
        self.alphas = 1.0 - self.betas
        self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)
        
        print(f"Diffusion process on device: {self.alphas_cumprod.device}")
    
    def _cosine_beta_schedule(self, timesteps, s=0.008):
        """Cosine schedule from Improved DDPM."""
        steps = timesteps + 1
        x = torch.linspace(0, timesteps, steps)
        alphas_cumprod = torch.cos(((x / timesteps) + s) / (1 + s) * np.pi * 0.5) ** 2
        alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
        betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
        return torch.clip(betas, 0.0001, 0.9999)
    
    @torch.no_grad()
    def denoise_step(self, model, xt, t_idx, motif_coords=None, motif_indices=None, mask=None):
        """
        Single reverse diffusion step with motif constraints.
        
        Args:
            model: denoising model
            xt: (batch, n_res, 3) current noisy coordinates
            t_idx: timestep index
            motif_coords: (n_motif, 3) target motif coordinates
            motif_indices: (n_motif,) indices where motif should be fixed
            mask: (batch, n_res) mask indicating motif residues
        
        Returns:
            x_prev: coordinates at previous timestep with motif fixed
        """
        batch_size = xt.shape[0]
        
        # Timestep as continuous value
        t = torch.full((batch_size,), t_idx / self.num_timesteps, device=self.device)
        
        # Predict noise
        predicted_noise = model(xt, t, mask=mask)
        
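        # Compute x_0 prediction (alpha_t is the cumulative product, i.e. alpha-bar_t)
        alpha_t = self.alphas_cumprod[t_idx]
        x0_pred = (xt - torch.sqrt(1 - alpha_t) * predicted_noise) / torch.sqrt(alpha_t)
        
        # Compute x_{t-1}
        if t_idx > 0:
            alpha_prev = self.alphas_cumprod[t_idx - 1]
            beta_t = self.betas[t_idx]
            
            # DDPM posterior variance; it never exceeds 1 - alpha_prev, so the
            # square root below stays real (using beta_t directly can give NaN)
            sigma2 = beta_t * (1 - alpha_prev) / (1 - alpha_t)
            
            # Posterior mean
            x_prev = torch.sqrt(alpha_prev) * x0_pred + \
                     torch.sqrt(torch.clamp(1 - alpha_prev - sigma2, min=0.0)) * predicted_noise
            
            # Add noise
            noise = torch.randn_like(xt) * torch.sqrt(sigma2)
            x_prev = x_prev + noise
        else:
            x_prev = x0_pred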
        
        # CRITICAL: Re-fix motif coordinates
        if motif_coords is not None and motif_indices is not None:
            x_prev[:, motif_indices] = motif_coords.unsqueeze(0).expand(batch_size, -1, -1)
        
        return x_prev
    
    @torch.no_grad()
    def conditional_sample(self, model, n_residues, motif_coords, motif_indices):
        """
        Generate protein with fixed motif using inpainting strategy.
        
        Args:
            model: trained diffusion model
            n_residues: total number of residues
            motif_coords: (n_motif, 3) motif coordinates to preserve
            motif_indices: (n_motif,) where to place motif
        
        Returns:
            final_coords: (n_residues, 3) generated coordinates
            trajectory: list of intermediate structures
        """
        # Initialize: random noise everywhere, then place motif
        xt = torch.randn(1, n_residues, 3, device=self.device) * 10.0
        xt[0, motif_indices] = motif_coords
        
        # Create mask
        mask = torch.zeros(1, n_residues, device=self.device)
        mask[0, motif_indices] = 1.0
        
        trajectory = [xt[0].cpu().numpy().copy()]
        
        print(f"Conditional sampling: {n_residues} residues, {len(motif_indices)} fixed")
        
        # Reverse diffusion with constraint
        for t in range(self.num_timesteps - 1, -1, -1):
            if t % 20 == 0:
                print(f"  Step {self.num_timesteps - t}/{self.num_timesteps}")
            
            xt = self.denoise_step(model, xt, t, motif_coords, motif_indices, mask)
            
            # Save snapshots
            if t % 10 == 0:
                trajectory.append(xt[0].cpu().numpy().copy())
        
        print("✅ Conditional sampling complete!")
        
        return xt[0].cpu().numpy(), trajectory

# Initialize diffusion process
diffusion = ConditionalDiffusionProcess(num_timesteps=100, device=device)
print(f"Ready for conditional generation with {diffusion.num_timesteps} steps")

## 🚀 Generate Protein with Helix Motif

Now let's scaffold a protein around our alpha-helix motif!

In [None]:
# Setup parameters
total_residues = 50
motif_start_idx = 20  # Place motif in the middle

# Use helix motif
motif_coords_torch = torch.from_numpy(helix_motif).float().to(device)
motif_indices_torch = torch.arange(motif_start_idx, motif_start_idx + len(helix_motif), device=device)

print(f"Generating {total_residues}-residue protein:")
print(f"  Motif: {len(helix_motif)} residues at positions {motif_indices_torch.cpu().numpy()}")
print(f"  Scaffold: {total_residues - len(helix_motif)} residues to design")

import time
start_time = time.time()

# Generate!
coords_scaffolded, trajectory = diffusion.conditional_sample(
    model,
    n_residues=total_residues,
    motif_coords=motif_coords_torch,
    motif_indices=motif_indices_torch
)

elapsed = time.time() - start_time
print(f"\n⏱️  Generation time: {elapsed:.2f}s")
print(f"   ({elapsed/total_residues:.3f}s per residue)")
print(f"\nGenerated structure shape: {coords_scaffolded.shape}")

# Verify motif preservation (RMSD = root of the mean squared per-residue distance)
motif_final = coords_scaffolded[motif_indices_torch.cpu().numpy()]
motif_rmsd = np.sqrt(np.mean(np.sum((motif_final - helix_motif)**2, axis=1)))
print(f"Motif RMSD: {motif_rmsd:.6f} Å (should be ~0.0 for perfect preservation)")

## 📊 Visualize Scaffolded Protein

In [None]:
def visualize_scaffolded_protein(coords, motif_indices):
    """Visualize protein with motif highlighted."""
    fig = plt.figure(figsize=(16, 6))
    
    # 3D structure
    ax1 = fig.add_subplot(131, projection='3d')
    
    # Create mask for scaffold vs motif
    scaffold_mask = np.ones(len(coords), dtype=bool)
    scaffold_mask[motif_indices] = False
    
    # Plot scaffold
    scaffold_coords = coords[scaffold_mask]
    if len(scaffold_coords) > 0:
        ax1.plot(scaffold_coords[:, 0], scaffold_coords[:, 1], scaffold_coords[:, 2],
                'o-', linewidth=2, markersize=5, color='#2E86AB', alpha=0.6, label='Scaffold')
    
    # Plot motif
    motif_coords_viz = coords[motif_indices]
    ax1.plot(motif_coords_viz[:, 0], motif_coords_viz[:, 1], motif_coords_viz[:, 2],
            'o-', linewidth=4, markersize=10, color='#E63946', alpha=0.9, label='Motif')
    
    ax1.set_xlabel('X (Å)', fontsize=11)
    ax1.set_ylabel('Y (Å)', fontsize=11)
    ax1.set_zlabel('Z (Å)', fontsize=11)
    ax1.set_title('Scaffolded Protein', fontsize=13, fontweight='bold')
    ax1.legend(fontsize=10)
    ax1.grid(True, alpha=0.3)
    
    # Distance distribution
    ax2 = fig.add_subplot(132)
    distances = np.linalg.norm(np.diff(coords, axis=0), axis=1)
    
    ax2.hist(distances, bins=20, edgecolor='black', alpha=0.7, color='#457B9D')
    ax2.axvline(3.8, color='red', linestyle='--', linewidth=2, label='Ideal Cα-Cα (3.8Å)')
    ax2.set_xlabel('Cα-Cα Distance (Å)', fontsize=11)
    ax2.set_ylabel('Count', fontsize=11)
    ax2.set_title(f'Bond Lengths (μ={distances.mean():.2f}Å)', fontsize=13, fontweight='bold')
    ax2.legend(fontsize=10)
    ax2.grid(True, alpha=0.3)
    
    # Residue-wise deviation from ideal
    ax3 = fig.add_subplot(133)
    deviation = np.abs(distances - 3.8)
    x_pos = np.arange(len(deviation))
    
    # Color code: scaffold vs motif regions
    colors = ['#E63946' if (i in motif_indices or i+1 in motif_indices) else '#2E86AB' 
              for i in range(len(deviation))]
    
    ax3.bar(x_pos, deviation, color=colors, alpha=0.7, edgecolor='black', linewidth=0.5)
    ax3.axhline(0.5, color='orange', linestyle='--', linewidth=2, label='Warning (>0.5Å)')
    ax3.set_xlabel('Residue Index', fontsize=11)
    ax3.set_ylabel('|Distance - 3.8Å|', fontsize=11)
    ax3.set_title('Geometry Deviation', fontsize=13, fontweight='bold')
    ax3.legend(fontsize=10)
    ax3.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    plt.show()

# Visualize the result
visualize_scaffolded_protein(coords_scaffolded, motif_indices_torch.cpu().numpy())

# Quality metrics
distances = np.linalg.norm(np.diff(coords_scaffolded, axis=0), axis=1)
print(f"\n📊 Quality Metrics:")
print(f"   Mean Cα-Cα distance: {distances.mean():.3f} Å (ideal: 3.8 Å)")
print(f"   Std Cα-Cα distance:  {distances.std():.3f} Å")
print(f"   Min distance: {distances.min():.3f} Å")
print(f"   Max distance: {distances.max():.3f} Å")
print(f"   Motif RMSD: {motif_rmsd:.6f} Å")

## 🎬 Visualize Generation Process

Watch how the protein emerges around the fixed motif!

In [None]:
def visualize_trajectory(trajectory, motif_indices, n_frames=6):
    """Show how protein emerges from noise with fixed motif."""
    indices = np.linspace(0, len(trajectory)-1, n_frames, dtype=int)
    
    fig = plt.figure(figsize=(18, 3))
    
    for i, idx in enumerate(indices):
        coords = trajectory[idx]
        ax = fig.add_subplot(1, n_frames, i+1, projection='3d')
        
        # Create mask
        scaffold_mask = np.ones(len(coords), dtype=bool)
        scaffold_mask[motif_indices] = False
        
        # Plot scaffold
        scaffold = coords[scaffold_mask]
        if len(scaffold) > 0:
            ax.plot(scaffold[:, 0], scaffold[:, 1], scaffold[:, 2],
                   'o-', linewidth=1.5, markersize=4, color='lightblue', alpha=0.6)
        
        # Plot motif (always fixed)
        motif = coords[motif_indices]
        ax.plot(motif[:, 0], motif[:, 1], motif[:, 2],
               'o-', linewidth=3, markersize=8, color='red', alpha=0.9)
        
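        # Title shows the remaining diffusion timestep t (snapshots are saved every 10 steps)
        step = (len(trajectory) - 1 - idx) * 10
        ax.set_title(f't = {step}/{diffusion.num_timesteps}', fontweight='bold', fontsize=11)
        
        # Axis limits are fitted to each frame separately, with a small margin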
        all_coords = coords
        margin = 5
        ax.set_xlim(all_coords[:, 0].min()-margin, all_coords[:, 0].max()+margin)
        ax.set_ylim(all_coords[:, 1].min()-margin, all_coords[:, 1].max()+margin)
        ax.set_zlim(all_coords[:, 2].min()-margin, all_coords[:, 2].max()+margin)
        
        # Clean up axes
        ax.set_xticks([])
        ax.set_yticks([])
        ax.set_zticks([])
    
    plt.tight_layout()
    plt.show()

# Visualize the generation trajectory
print("Generation trajectory (motif in red stays fixed, scaffold emerges from noise):")
visualize_trajectory(trajectory, motif_indices_torch.cpu().numpy(), n_frames=6)

print("\n💡 Key Observation:")
print("   - Motif (red) remains perfectly fixed throughout")
print("   - Scaffold (blue) is updated around the motif at every step")
print("   - With a trained model, the scaffold would converge to protein-like geometry")

## 🔬 Different Motif Placement Strategies

Let's explore how motif position affects the scaffold design.

In [None]:
# Compare motif at different positions
positions = {
    'N-terminal': (2, 10),      # Near start
    'Middle': (20, 28),          # Center
    'C-terminal': (40, 48)       # Near end
}

results = {}

for name, (start, end) in positions.items():
    print(f"\nGenerating with motif at {name} (positions {start}-{end})...")
    
    motif_idx = torch.arange(start, end, device=device)
    coords, _ = diffusion.conditional_sample(
        model,
        n_residues=total_residues,
        motif_coords=motif_coords_torch,
        motif_indices=motif_idx
    )
    
    results[name] = (coords, motif_idx.cpu().numpy())

# Visualize all three
fig = plt.figure(figsize=(18, 5))

for i, (name, (coords, motif_idx)) in enumerate(results.items()):
    ax = fig.add_subplot(1, 3, i+1, projection='3d')
    
    # Scaffold
    scaffold_mask = np.ones(len(coords), dtype=bool)
    scaffold_mask[motif_idx] = False
    scaffold = coords[scaffold_mask]
    
    if len(scaffold) > 0:
        ax.plot(scaffold[:, 0], scaffold[:, 1], scaffold[:, 2],
               'o-', linewidth=2, markersize=5, color='#2E86AB', alpha=0.6, label='Scaffold')
    
    # Motif
    motif = coords[motif_idx]
    ax.plot(motif[:, 0], motif[:, 1], motif[:, 2],
           'o-', linewidth=4, markersize=10, color='#E63946', alpha=0.9, label='Motif')
    
    ax.set_title(f'Motif at {name}', fontsize=13, fontweight='bold')
    ax.set_xlabel('X (Å)')
    ax.set_ylabel('Y (Å)')
    ax.set_zlabel('Z (Å)')
    ax.legend()

plt.tight_layout()
plt.show()

print("\n💡 Observations:")
print("   - N-terminal: More scaffold C-terminal to motif")
print("   - Middle: Scaffold on both sides of motif")
print("   - C-terminal: More scaffold N-terminal to motif")
print("   - Model has context from neighboring regions")

## 🔑 Key Takeaways

### What We Learned

1. **Conditional Diffusion** 🎯
   - Generate proteins with specific constraints
   - Fix motif residues while designing scaffold
   - Hard constraints via coordinate re-assignment

2. **Inpainting Strategy** 🎨
   - Start with motif at target position
   - Add noise only to scaffold regions
   - Denoise while keeping motif fixed

3. **GPU Acceleration** ⚡
   - Same benefits as unconditional generation
   - All tensor operations on GPU
   - Up to 10-100x speedup for inference, depending on model and batch size

4. **Quality Metrics** 📊
   - Motif RMSD (should be ~0 for perfect preservation)
   - Bond length distribution
   - Geometry validation

5. **Practical Considerations** 🔧
   - Motif position matters (affects scaffold context)
   - Longer scaffolds need more denoising steps
   - Model needs to learn realistic protein geometry
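The quality metrics listed above focus on bond lengths and motif RMSD. Two further Cα-level checks that are easy to compute are compactness (radius of gyration) and steric clashes. The following is a minimal sketch: the function names and the 3.0 Å clash cutoff are illustrative assumptions, not values taken from this notebook.

```python
import numpy as np

def radius_of_gyration(coords):
    """Root-mean-square distance of residues from their centroid (Å)."""
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum(centered ** 2, axis=1))))

def count_clashes(coords, min_dist=3.0):
    """Count non-adjacent residue pairs closer than min_dist (Å).

    The 3.0 Å default is an illustrative cutoff, not a value from the notebook.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(len(coords), k=2)   # skip self and bonded neighbours
    return int(np.sum(dist[i, j] < min_dist))

# Toy check: a straight chain with 3.8 Å spacing is extended and clash-free
chain = np.zeros((10, 3))
chain[:, 0] = 3.8 * np.arange(10)
print(f"Rg = {radius_of_gyration(chain):.2f} Å, clashes = {count_clashes(chain)}")
```

Both functions accept any `(n_residues, 3)` array, so they can be applied directly to `coords_scaffolded`.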

### Real-World Applications

- **Protein Binders**: Design binders to target proteins using known CDR loops
- **Enzyme Design**: Create new enzymes with specific active sites
- **Vaccine Design**: Present epitopes on stable scaffolds
- **Therapeutic Proteins**: Engineer proteins with desired binding properties

### Limitations of This Demo

⚠️ **This model is untrained** - it demonstrates the algorithm but won't generate realistic proteins. A real implementation needs:
- Training on thousands of protein structures
- Proper SE(3) equivariant architecture (full IPA)
- Energy-based guidance for realistic geometries
- Multiple sampling runs with selection criteria
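As a pointer toward the training requirement above, one common choice for inpainting-style models is to compute the noise-prediction loss on scaffold residues only, since the motif is given rather than predicted. The sketch below only evaluates such a loss value in NumPy. `masked_noise_loss` is a hypothetical helper, and a real training loop would use an autodiff framework such as PyTorch; whether RFdiffusion trains this way is not stated in this notebook.

```python
import numpy as np

def masked_noise_loss(pred_noise, true_noise, motif_mask):
    """Mean squared error over scaffold residues only.

    pred_noise, true_noise: (n_res, 3) arrays
    motif_mask: (n_res,) boolean, True for motif residues (excluded from the loss)
    """
    scaffold = ~motif_mask
    if not scaffold.any():
        return 0.0
    sq_err = np.sum((pred_noise - true_noise) ** 2, axis=1)  # per-residue error
    return float(sq_err[scaffold].mean())

rng = np.random.default_rng(0)
true_noise = rng.standard_normal((10, 3))
pred_noise = true_noise.copy()
motif_mask = np.zeros(10, dtype=bool)
motif_mask[3:6] = True

pred_noise[3:6] += 5.0    # large error on motif rows: ignored by the loss
loss_motif_only = masked_noise_loss(pred_noise, true_noise, motif_mask)

pred_noise[0] += 1.0      # error on a scaffold row: counted
loss_with_scaffold = masked_noise_loss(pred_noise, true_noise, motif_mask)

print(loss_motif_only, loss_with_scaffold)
```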

---

## 🎯 Practice Exercises

### Exercise 1: Different Motifs (Easy)
Try scaffolding the beta-strand motif instead of the helix:
```python
# Use strand_motif defined earlier
strand_indices = torch.arange(15, 15+len(strand_motif), device=device)
strand_coords_torch = torch.from_numpy(strand_motif).float().to(device)

coords_strand, _ = diffusion.conditional_sample(
    model, 40, strand_coords_torch, strand_indices
)
```

### Exercise 2: Multiple Motifs (Medium)
Scaffold a protein with TWO fixed motifs:
```python
# Create two motifs far apart
motif1_coords = helix_motif[:5]
motif2_coords = helix_motif[:5] + np.array([20, 0, 0])

# Combine indices and coordinates
motif_indices = torch.cat([
    torch.arange(5, 10, device=device),
    torch.arange(30, 35, device=device)
])
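motif_coords = torch.cat([
    torch.from_numpy(motif1_coords).float().to(device),
    torch.from_numpy(motif2_coords).float().to(device)
])

# Scaffold around both motifs at once (40 residues is an arbitrary choice)
coords_two, _ = diffusion.conditional_sample(
    model, 40, motif_coords, motif_indices
)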
```

### Exercise 3: Motif Size Analysis (Medium)
How does motif size affect scaffold quality?
- Try motifs of different lengths (4, 8, 12, 16 residues)
- Measure geometry quality for each
- Plot motif size vs. bond length deviation

### Exercise 4: GPU Benchmarking (Medium)
Compare generation time on CPU vs GPU:
```python
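import copy

# CPU (deep-copy first: model.cpu() would move the original model in place)
model_cpu = copy.deepcopy(model).cpu()
diffusion_cpu = ConditionalDiffusionProcess(100, 'cpu')
# Time generation (motif coordinates and indices must also be on the CPU)...

# GPU
model_gpu = model.to(device)
diffusion_gpu = ConditionalDiffusionProcess(100, device)
# Call torch.cuda.synchronize() before reading the clock: CUDA kernels run asynchronously
# Time generation...

# Calculate speedup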
```

### Exercise 5: Custom Motif (Hard)
Create a custom motif from a real PDB structure:
1. Download a PDB file
2. Extract specific residues (e.g., binding site)
3. Center and align the motif
4. Scaffold around it
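Steps 2 and 3 of this exercise can be started with a small fixed-column PDB reader. This is a sketch under stated assumptions: `extract_ca_motif` and `pdb_atom_line` are hypothetical helpers written for this example, the demo records are synthetic rather than from a real structure, and only centering is shown (alignment to a reference frame is left out). For real files, a dedicated parser such as Biopython's `Bio.PDB` module is more robust.

```python
import numpy as np

def pdb_atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Format one fixed-column PDB ATOM record (used here only to build demo input)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}")

def extract_ca_motif(pdb_text, chain_id, residue_numbers):
    """Return centered Cα coordinates for the requested residues of one chain."""
    wanted = set(residue_numbers)
    coords = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Columns 13-16: atom name, column 22: chain ID, columns 23-26: residue number
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        resseq = int(line[22:26])
        if resseq in wanted and resseq not in coords:   # keep the first record only
            # Columns 31-38, 39-46, 47-54: x, y, z in Å
            coords[resseq] = [float(line[30:38]), float(line[38:46]), float(line[46:54])]
    missing = wanted - set(coords)
    if missing:
        raise ValueError(f"residues not found: {sorted(missing)}")
    motif = np.array([coords[r] for r in sorted(wanted)])
    return motif - motif.mean(axis=0)   # center the motif at the origin

# Synthetic demo input; for a downloaded file use open(path).read() instead
demo_pdb = "\n".join([
    pdb_atom_line(1, " N  ", "ALA", "A", 10, 0.0, 0.0, 0.0),
    pdb_atom_line(2, " CA ", "ALA", "A", 10, 1.0, 2.0, 3.0),
    pdb_atom_line(3, " CA ", "GLY", "A", 11, 4.0, 2.0, 3.0),
    pdb_atom_line(4, " CA ", "SER", "A", 12, 7.0, 2.0, 3.0),
    pdb_atom_line(5, " CA ", "ALA", "B", 10, 9.0, 9.0, 9.0),
])
motif = extract_ca_motif(demo_pdb, chain_id="A", residue_numbers=[10, 11, 12])
print(motif)
```

The returned array can be converted with `torch.from_numpy(motif).float().to(device)` and passed to `conditional_sample` as `motif_coords`.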

---

## 📚 Further Reading

### Papers

1. **RFdiffusion Original Paper**  
   Watson et al., 2023 - "De novo design of protein structure and function with RFdiffusion"  
   [Nature](https://www.nature.com/articles/s41586-023-06415-8)

2. **Inpainting with Diffusion Models**  
   Lugmayr et al., 2022 - "RePaint: Inpainting using Denoising Diffusion Probabilistic Models"  
   [arXiv:2201.09865](https://arxiv.org/abs/2201.09865)

3. **Motif Scaffolding Applications**  
   Tischer et al., 2020 - "Design of proteins presenting discontinuous functional sites"  
   [Science](https://www.science.org/doi/10.1126/science.aay6785)

### Related Concepts

- **Classifier-Free Guidance**: Alternative to hard constraints
- **Partial Noising**: Add noise only to scaffold regions
- **Iterative Refinement**: Multiple rounds of generation

---

## ➡️ Next Steps

**Notebook 07: Symmetric Design**  
Learn to generate symmetric protein assemblies (dimers, trimers, etc.) with GPU-accelerated sampling.

**Key Difference**: Symmetric design uses:
- Symmetry operators (rotations/translations)
- Single-chain generation + symmetry application
- Special loss functions for interface design

---

## 💭 Reflection Questions

1. Why do we re-fix the motif after each denoising step instead of just initializing it once?

2. How would you handle a motif that's flexible (multiple conformations)?

3. What happens if the motif geometry is impossible to scaffold (e.g., residues too far apart)?

4. How could you encourage specific secondary structure in the scaffold regions?

5. What quality metrics beyond bond lengths would indicate a good scaffold design?