[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/engelberger/tutorials-ai4pd-2025/blob/main/tutorial_alphafold2_i89_conformations_v2.ipynb)

# Tutorial: Prediction of Protein Structures and Multiple Conformations using AlphaFold2

## Clean Implementation Using AF2 Utils Package

**Duration:** 90 minutes  
**Instructor:** Felipe Engelberger  
**Date:** AI4PD Workshop 2025

---

## Learning Objectives

By the end of this tutorial, you will understand:

1. **MSA's role in conformation selection**: How evolutionary information biases AlphaFold2 predictions
2. **Recycling mechanics**: How iterative refinement affects structure quality and conformation
3. **Conformational sampling strategies**: Practical techniques using dropout and MSA subsampling
4. **Structure analysis tools**: RMSD calculations, visualization, and ensemble analysis
5. **Real-world applications**: When and how to apply these techniques to proteins of interest

## Tutorial Overview

We'll use the **i89 protein** as our model system. This 96-residue protein exhibits distinct conformational states that AlphaFold2 can capture through different prediction strategies:

- **State 1**: The conformation typically predicted with full MSA
- **State 2**: An alternative conformation accessible without MSA

We have experimental structures for both states (`state1.pdb` and `state2.pdb`) for validation.


## Section 1: Environment Setup

First, let's set up our environment with the AF2 Utils package, which provides a clean wrapper around ColabDesign.


In [None]:
%%time
#@title Install Dependencies and Import AF2 Utils
#@markdown This cell handles all setup automatically

import os
import sys
import warnings
warnings.filterwarnings('ignore')

# Check if running in Colab
IN_COLAB = 'google.colab' in sys.modules

print("="*60)
print("ALPHAFOLD2 TUTORIAL SETUP")
print("="*60)

# Download af2_utils.py if not present
if not os.path.exists("af2_utils.py"):
    print("\nDownloading af2_utils.py...")
    os.system("wget -q https://raw.githubusercontent.com/engelberger/tutorials-ai4pd-2025/main/af2_utils.py")
    print("  - af2_utils.py downloaded")

# Download logmd_utils.py if not present
if not os.path.exists("logmd_utils.py"):
    print("\nDownloading logmd_utils.py...")
    os.system("wget -q https://raw.githubusercontent.com/engelberger/tutorials-ai4pd-2025/main/logmd_utils.py")
    print("  - logmd_utils.py downloaded")

# Import af2_utils
print("\nImporting AF2 Utils...")
import af2_utils as af2
print(f"  - AF2 Utils v{af2.__version__} loaded")

# Check installation status
print("\nChecking dependencies...")
status = af2.check_installation(verbose=False)
for component, installed in status.items():
    symbol = "+" if installed else "-"
    print(f"  {symbol} {component}: {'ready' if installed else 'missing'}")

# Check LogMD availability
print("\nChecking LogMD availability...")
logmd_available = af2.check_logmd()
if logmd_available:
    print("  + LogMD: available for interactive 3D visualization")
else:
    print("  - LogMD: not available (optional)")
    print("    Install with: pip install logmd")
    print("    Tutorial works without LogMD, but you'll miss interactive features!")

# Install missing dependencies if needed
missing = [k for k, v in status.items() if not v and k != 'environment_setup']
if missing:
    print(f"\nInstalling missing dependencies...")
    af2.install_dependencies(
        install_colabdesign='colabdesign' in missing,
        install_hhsuite='hhsuite' in missing,
        download_params='alphafold_params' in missing,
        verbose=True
    )

# Setup environment
print("\nConfiguring environment...")
af2.setup_environment(verbose=False)
print("  - JAX memory and environment configured")

print("\n" + "="*60)
print("SETUP COMPLETE - Ready for predictions!")
print("="*60)

In [None]:
#@title Import Additional Libraries
import numpy as np
import matplotlib.pyplot as plt
from Bio import PDB
from pathlib import Path
import json

print("Libraries imported successfully")


## Section 2: The i89 Protein - Our Model System

The i89 protein is a 96-residue protein that can adopt multiple conformational states. We'll use it to demonstrate how AlphaFold2's predictions can be influenced by MSA depth, recycling, and sampling parameters.


In [None]:
#@title Define i89 Sequence and Load Reference Structures

# i89 protein sequence (96 residues)
I89_SEQUENCE = "GSHMASMEDLQAEARAFLSEEMIAEFKAAFDMFDADGGGDISYKAVGTVFRMLGINPSKEVLDYLKEKIDVDGSGTIDFEEFLVLMVYIMKQDA"

print("i89 protein statistics:")
print(f"  Length: {len(I89_SEQUENCE)} residues")
print(f"  Sequence: {I89_SEQUENCE[:30]}...{I89_SEQUENCE[-20:]}")

# Check if reference structures exist, download if needed
if not os.path.exists("state1.pdb") or not os.path.exists("state2.pdb"):
    print("\nDownloading reference structures...")
    os.system("wget -q https://raw.githubusercontent.com/engelberger/tutorials-ai4pd-2025/main/state1.pdb")
    os.system("wget -q https://raw.githubusercontent.com/engelberger/tutorials-ai4pd-2025/main/state2.pdb")
    print("  - Reference structures downloaded")
else:
    print("\nReference structures found:")
    print("  - state1.pdb: Conformation typically predicted with MSA")
    print("  - state2.pdb: Alternative conformation accessible without MSA")

# Calculate RMSD between reference states
state1_coords = af2.load_pdb_coords("state1.pdb")
state2_coords = af2.load_pdb_coords("state2.pdb")
ref_rmsd = af2.calculate_rmsd(state1_coords, state2_coords)

print(f"\nRMSD between reference states: {ref_rmsd:.2f} Angstrom")
print("The larger this value, the more distinct the two reference conformations are.")


## Section 3: Basic Prediction with Full MSA

Let's start by predicting the i89 structure with a full MSA. This typically results in a conformation closer to State 1.
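
Deciding which state a prediction is "closer to" depends on the RMSD after optimal superposition. The sketch below shows one common way to compute it, the Kabsch algorithm, using NumPy only. The function name `kabsch_rmsd` is hypothetical, and this is not necessarily how `af2.calculate_rmsd` is implemented; it assumes both inputs are `(N, 3)` arrays of matched Cα coordinates.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Hypothetical helper for illustration; assumes rows of P and Q are matched atoms.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape:
        raise ValueError("coordinate sets must have the same shape")
    # Remove translation by centering both sets on their centroids
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(P0.T @ Q0)
    # Flip one axis if needed so the result is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    diff = P0 @ R - Q0
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

# Toy check: a rotated and shifted copy of the same points
rng = np.random.default_rng(0)
points = rng.normal(size=(20, 3))
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = points @ rot_z + np.array([1.0, -2.0, 3.0])
print(f"RMSD after superposition: {kabsch_rmsd(points, moved):.6f}")
```

The reflection guard matters for proteins: an unconstrained fit can return a mirror image, which would underestimate the RMSD between chiral structures.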


In [None]:
%%time
#@title Quick Prediction with Full MSA
#@markdown Using af2_utils high-level API for simple prediction

print("="*60)
print("PREDICTION WITH FULL MSA")
print("="*60)

# Use the high-level quick_predict function
result_with_msa = af2.quick_predict(
    sequence=I89_SEQUENCE,
    msa_mode="mmseqs2",  # Full MSA from MMseqs2
    num_recycles=3,
    jobname="i89_with_msa",
    verbose=True
)

# Calculate RMSD to reference states
pred_ca = result_with_msa['structure'][:, 1, :]  # CA atoms
rmsd_state1 = af2.calculate_rmsd(pred_ca, state1_coords)
rmsd_state2 = af2.calculate_rmsd(pred_ca, state2_coords)

print("\n" + "="*60)
print("RESULTS")
print("="*60)
print(f"RMSD to State 1: {rmsd_state1:.2f} Angstrom")
print(f"RMSD to State 2: {rmsd_state2:.2f} Angstrom")
print(f"Mean pLDDT: {result_with_msa['metrics']['plddt']*100:.1f}%")

if rmsd_state1 < rmsd_state2:
    print(f"\nPrediction is closer to State 1 (as expected with MSA)")
    print(f"Delta: {rmsd_state2 - rmsd_state1:.2f} Angstrom difference")
else:
    print(f"\nPrediction is closer to State 2")
    print(f"Delta: {rmsd_state1 - rmsd_state2:.2f} Angstrom difference")


In [None]:
#@title Visualize Structure and Confidence

# Plot 3D structure with pLDDT coloring
fig = af2.plot_3d_structure(
    atom_positions=result_with_msa['structure'],
    plddt=result_with_msa['plddt'],
    save_path="i89_with_msa_structure.png",
    show=True
)

# Plot confidence metrics
fig = af2.plot_confidence(
    plddt=result_with_msa['plddt'] * 100,
    pae=result_with_msa['pae'],
    save_path="i89_with_msa_confidence.png",
    show=True
)


In [None]:
#@title Interactive 3D Visualization (Optional - Requires LogMD)

if af2.check_logmd():
    print("Creating interactive 3D visualization...")
    print("This allows you to rotate, zoom, and explore the structure!\n")
    
    # Create simple trajectory with just the final structure
    traj = af2.create_trajectory_from_ensemble(
        predictions=[result_with_msa],
        sequence=I89_SEQUENCE,
        project="i89_with_msa_3d",
        align_structures=False,
        verbose=False
    )
    
    if traj:
        import logmd_utils
        logmd_utils.display_trajectory_in_notebook(traj)
        print("\nInteractive viewer features:")
        print("  - Left click + drag: Rotate structure")
        print("  - Scroll: Zoom in/out")
        print("  - Colors show pLDDT confidence (blue=high, red=low)")
    else:
        print("Failed to create visualization")
else:
    print("LogMD not available - skipping interactive 3D visualization")
    print("The static plots above show the structure and confidence metrics")
    print("\nTo enable interactive 3D:")
    print("  1. Run: !pip install logmd")
    print("  2. Restart kernel")
    print("  3. Re-run from the beginning")


## Section 4: MSA Manipulation - Exploring Conformational Control

Now let's see how removing MSA information affects the predicted conformation. Without MSA, AlphaFold2 relies more on learned structural patterns.


In [None]:
%%time
#@title Prediction without MSA (Single Sequence)

print("="*60)
print("PREDICTION WITHOUT MSA")
print("="*60)

# Predict with single sequence only
result_no_msa = af2.quick_predict(
    sequence=I89_SEQUENCE,
    msa_mode="single_sequence",  # No evolutionary information
    num_recycles=3,
    jobname="i89_no_msa",
    verbose=True
)

# Calculate RMSD to reference states
pred_ca_no_msa = result_no_msa['structure'][:, 1, :]
rmsd_state1_no_msa = af2.calculate_rmsd(pred_ca_no_msa, state1_coords)
rmsd_state2_no_msa = af2.calculate_rmsd(pred_ca_no_msa, state2_coords)

print("\n" + "="*60)
print("RESULTS WITHOUT MSA")
print("="*60)
print(f"RMSD to State 1: {rmsd_state1_no_msa:.2f} Angstrom")
print(f"RMSD to State 2: {rmsd_state2_no_msa:.2f} Angstrom")
print(f"Mean pLDDT: {result_no_msa['metrics']['plddt']*100:.1f}%")

if rmsd_state2_no_msa < rmsd_state1_no_msa:
    print(f"\nPrediction is closer to State 2 (as expected without MSA)")
    print(f"Delta: {rmsd_state1_no_msa - rmsd_state2_no_msa:.2f} Angstrom difference")
else:
    print(f"\nPrediction is closer to State 1")
    print(f"Delta: {rmsd_state2_no_msa - rmsd_state1_no_msa:.2f} Angstrom difference")


In [None]:
#@title Compare Both Predictions

# Prepare comparison data
comparison_results = [
    {'rmsd_state1': rmsd_state1, 'rmsd_state2': rmsd_state2},
    {'rmsd_state1': rmsd_state1_no_msa, 'rmsd_state2': rmsd_state2_no_msa}
]

# Create comparison plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# RMSD to State 1
labels = ['With MSA', 'Without MSA']
rmsd1_values = [r['rmsd_state1'] for r in comparison_results]
ax1.bar(labels, rmsd1_values, color='steelblue')
ax1.set_ylabel('RMSD (Å)')
ax1.set_title('RMSD to State 1')
ax1.set_ylim(0, max(rmsd1_values + [r['rmsd_state2'] for r in comparison_results]) * 1.2)

# RMSD to State 2
rmsd2_values = [r['rmsd_state2'] for r in comparison_results]
ax2.bar(labels, rmsd2_values, color='coral')
ax2.set_ylabel('RMSD (Å)')
ax2.set_title('RMSD to State 2')
ax2.set_ylim(0, max(rmsd1_values + rmsd2_values) * 1.2)

plt.suptitle('MSA Effect on Conformational Preference', fontsize=14)
plt.tight_layout()
plt.show()

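print("\nKey Finding:")
if (rmsd_state1 < rmsd_state2) != (rmsd_state1_no_msa < rmsd_state2_no_msa):
    print("MSA presence/absence switched the closest reference state!")
else:
    print("Both predictions are closest to the same reference state in this run.")
print(f"Change in RMSD to State 1: {abs(rmsd_state1 - rmsd_state1_no_msa):.1f} Angstrom")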


In [None]:
#@title Interactive 3D Comparison: With vs Without MSA
#@markdown Compare both conformations side-by-side in 3D

if af2.check_logmd():
    print("Creating side-by-side 3D visualizations...")
    print("You'll get two URLs to open in separate browser tabs!\n")
    
    # Trajectory 1: With MSA (should be closer to State 1)
    traj_with = af2.create_trajectory_from_ensemble(
        predictions=[result_with_msa],
        sequence=I89_SEQUENCE,
        project="i89_with_msa_final",
        sort_by_rmsd=True,
        reference_coords=state1_coords,
        verbose=False
    )
    
    # Trajectory 2: Without MSA (should be closer to State 2)
    traj_without = af2.create_trajectory_from_ensemble(
        predictions=[result_no_msa],
        sequence=I89_SEQUENCE,
        project="i89_without_msa_final",
        sort_by_rmsd=True,
        reference_coords=state2_coords,
        verbose=False
    )
    
    print("="*60)
    print("SIDE-BY-SIDE COMPARISON URLS")
    print("="*60)
    if traj_with:
        print(f"\nWith MSA (State 1-like):")
        print(f"  {traj_with.url}")
    
    if traj_without:
        print(f"\nWithout MSA (State 2-like):")
        print(f"  {traj_without.url}")
    
    print("\n" + "="*60)
    print("HOW TO COMPARE")
    print("="*60)
    print("1. Open both URLs in separate browser tabs")
    print("2. Arrange windows side-by-side")
    print("3. Rotate both structures to same orientation")
    print("4. Notice the conformational differences!")
    print("\nKey differences to look for:")
    print("  - Overall fold compactness")
    print("  - Loop positions and orientations")
    print("  - Domain arrangements")
else:
    print("LogMD not available - use static plots above for comparison")
    print("\nThe bar charts show RMSD differences quantitatively")
    print("Install LogMD for interactive 3D comparison!")


## Section 5: Recycling for Conformational Refinement

Recycling is AlphaFold2's iterative refinement process. Let's explore how the number of recycles affects structure quality and conformational preference.
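
As a rough illustration of the early-stopping idea, the sketch below repeatedly applies a stand-in refinement step and stops once successive structures differ by less than a tolerance. Everything here is hypothetical: `toy_step` simply moves coordinates halfway toward a fixed target and is not an AlphaFold2 forward pass, and `recycle_with_early_stop` is not part of `af2_utils`. The record keys mirror the `recycle` and `rmsd_change` fields used later in this tutorial.

```python
import numpy as np

def coord_rmsd(a, b):
    """Plain RMSD between two (N, 3) arrays, without superposition."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def recycle_with_early_stop(step_fn, x0, max_recycles=6, tolerance=0.5):
    """Apply step_fn repeatedly; stop once successive outputs differ by < tolerance."""
    # Recycle 0 is the first pass, so it has no previous structure to compare to
    trajectory = [{"recycle": 0, "coords": step_fn(x0), "rmsd_change": None}]
    for r in range(1, max_recycles + 1):
        prev = trajectory[-1]["coords"]
        new = step_fn(prev)
        change = coord_rmsd(new, prev)
        trajectory.append({"recycle": r, "coords": new, "rmsd_change": change})
        if change < tolerance:
            break  # structure has stopped moving appreciably
    return trajectory

# Toy stand-in for one model pass: move every atom halfway toward a fixed target
rng = np.random.default_rng(0)
target = rng.normal(scale=10.0, size=(50, 3))

def toy_step(x):
    return x + 0.5 * (target - x)

traj = recycle_with_early_stop(toy_step, np.zeros_like(target))
for rec in traj:
    print(rec["recycle"], rec["rmsd_change"])
```

Because each toy step halves the remaining distance, the change between recycles shrinks geometrically, which is the qualitative pattern to look for in a real convergence plot.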


## Section 5.5: Real-time Structure Visualization with LogMD

LogMD provides interactive 3D visualization of structures as they evolve during prediction. This makes it easy to see how recycling refines the structure and how different conditions affect the final conformation.


In [None]:
#@title Check LogMD Availability

# Check if LogMD is available
if af2.check_logmd():
    print("LogMD is available!")
    print("  - Interactive 3D visualization enabled")
    print("  - Real-time trajectory creation supported")
else:
    print("LogMD not available. To install it:")
    print("  Run: !pip install logmd")
    print("\nLogMD provides:")
    print("  - Interactive 3D structure viewer")
    print("  - Real-time visualization during prediction")
    print("  - Trajectory creation from ensembles")
    print("\nAfter installation, restart the kernel to use LogMD features.")


In [None]:
%%time
#@title Visualize Recycling Evolution with LogMD
#@markdown Watch how the structure refines through iterative recycling

if af2.check_logmd():
    print("="*60)
    print("RECYCLING EVOLUTION WITH REAL-TIME VISUALIZATION")
    print("="*60)
    print("Watch the structure refine through recycling iterations...")
    print("This demonstrates AlphaFold2's iterative refinement process!\n")
    
    # Run prediction with LogMD to capture every recycle
    result_logmd = af2.predict_with_logmd(
        sequence=I89_SEQUENCE,
        msa_mode="single_sequence",  # Use single sequence for faster demo
        num_recycles=6,
        project="i89_recycling_evolution",
        show_viewer=True,
        verbose=True
    )
    
    print("\n" + "="*60)
    print("RECYCLING INSIGHTS")
    print("="*60)
    print(f"Final pLDDT: {result_logmd['metrics']['plddt']*100:.1f}%")
    print(f"Structures recorded: {len(result_logmd['all_structures'])}")
    print("\nWhat to look for (typical behavior; your run may differ):")
    print("  - Recycles 0-2: Large conformational changes")
    print("  - Recycles 3-4: Fine-tuning and refinement")
    print("  - Recycles 5-6: Convergence (minimal changes)")
    print("\nInteractive viewer controls:")
    print("  - Mouse drag: Rotate structure")
    print("  - Scroll: Zoom in/out")
    print("  - Play button: Animate through recycles")
    print("  - Slider: Jump to specific recycle")
    print("\nNotice how:")
    print("  - Early recycles show major structural rearrangements")
    print("  - Later recycles show convergence")
    print("  - pLDDT confidence improves (colors get bluer)")
else:
    print("LogMD not available - falling back to standard recycling analysis")
    print("\nFor real-time visualization:")
    print("  1. Install LogMD: !pip install logmd")
    print("  2. Restart kernel")
    print("  3. Re-run from Section 1")
    print("\nContinuing with quantitative analysis...")


In [None]:
%%time
#@title Test Recycling with Early Stopping

print("="*60)
print("TESTING RECYCLING WITH EARLY STOPPING")
print("="*60)

# Setup model
model = af2.setup_model(I89_SEQUENCE, verbose=False)

# Build a single-sequence "MSA" (query only, no homologs)
msa, deletion_matrix = af2.create_single_sequence_msa(I89_SEQUENCE)

# Run prediction with recycling and early stopping
result_recycling = af2.predict_with_recycling(
    model,
    msa=msa,
    deletion_matrix=deletion_matrix,
    max_recycles=6,
    early_stop_tolerance=0.5,  # Stop if RMSD change < 0.5 Angstrom
    seed=0,
    verbose=True
)

# Plot convergence
recycle_trajectory = result_recycling['trajectory']
recycles = [r['recycle'] for r in recycle_trajectory]
plddt_values = [r['metrics']['plddt'] * 100 for r in recycle_trajectory]
rmsd_changes = [r['rmsd_change'] if r['rmsd_change'] is not None else 0 for r in recycle_trajectory]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# pLDDT convergence
ax1.plot(recycles, plddt_values, 'o-', color='green', linewidth=2)
ax1.set_xlabel('Recycle')
ax1.set_ylabel('Mean pLDDT (%)')
ax1.set_title('pLDDT Convergence')
ax1.grid(True, alpha=0.3)

# RMSD changes
ax2.plot(recycles[1:], rmsd_changes[1:], 's-', color='purple', linewidth=2)
ax2.axhline(y=0.5, color='red', linestyle='--', label='Early stop threshold')
ax2.set_xlabel('Recycle')
ax2.set_ylabel('RMSD Change (Å)')
ax2.set_title('Structure Convergence')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nStopped at recycle {recycles[-1]} (limit: 6 recycles, tolerance: 0.5 Angstrom)")
print(f"Final pLDDT: {result_recycling['metrics']['plddt']*100:.1f}%")


## Section 6: Sampling Multiple Conformations

Now let's explore techniques for sampling multiple conformations using dropout and different random seeds.
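
One further way to diversify predictions, listed in the learning objectives as MSA subsampling, is to give the model a different random subset of the alignment for each seed. A minimal sketch, assuming the MSA is an integer array of shape `(n_sequences, length)` with the query in row 0; `subsample_msa` is a hypothetical helper, not an `af2_utils` function.

```python
import numpy as np

def subsample_msa(msa, max_seqs, seed=0):
    """Return a random subset of MSA rows, always keeping the query (row 0).

    Hypothetical helper; assumes msa has shape (n_sequences, length).
    """
    msa = np.asarray(msa)
    if max_seqs < 1:
        raise ValueError("max_seqs must be at least 1 (the query itself)")
    n_seqs = msa.shape[0]
    if max_seqs >= n_seqs:
        return msa.copy()
    rng = np.random.default_rng(seed)
    # Sample homologs without replacement from rows 1..n_seqs-1
    picked = rng.choice(np.arange(1, n_seqs), size=max_seqs - 1, replace=False)
    keep = np.concatenate([[0], np.sort(picked)])
    return msa[keep]

# Toy alignment: 100 sequences x 30 positions, integer-encoded
toy_msa = np.random.default_rng(42).integers(0, 21, size=(100, 30))
for seed in range(3):
    sub = subsample_msa(toy_msa, max_seqs=16, seed=seed)
    print(seed, sub.shape)
```

Keeping the query in row 0 preserves the target sequence, while the seed makes each subset reproducible. Shallower subsets carry less coevolution signal, which is the lever this strategy relies on.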


In [None]:
%%time
#@title Generate Conformational Ensemble

print("="*60)
print("GENERATING CONFORMATIONAL ENSEMBLE")
print("="*60)

# Use high-level API to generate ensemble with different MSA conditions
all_predictions = af2.predict_conformational_ensemble(
    sequence=I89_SEQUENCE,
    msa_modes=["mmseqs2", "single_sequence"],
    num_seeds=3,
    num_recycles=3,
    use_dropout=True,
    jobname="i89_ensemble",
    verbose=True
)

print(f"\nGenerated {len(all_predictions)} structures total")

# Calculate RMSD to references for all predictions
ensemble_rmsds = []
for pred in all_predictions:
    pred_ca = pred['structure'][:, 1, :]
    rmsd1 = af2.calculate_rmsd(pred_ca, state1_coords)
    rmsd2 = af2.calculate_rmsd(pred_ca, state2_coords)
    ensemble_rmsds.append({
        'msa_mode': pred['msa_mode'],
        'seed': pred['seed'],
        'rmsd_state1': rmsd1,
        'rmsd_state2': rmsd2,
        'plddt': pred['metrics']['plddt'] * 100
    })

# Analyze by MSA mode
with_msa = [r for r in ensemble_rmsds if r['msa_mode'] == 'mmseqs2']
without_msa = [r for r in ensemble_rmsds if r['msa_mode'] == 'single_sequence']

print("\n" + "="*60)
print("ENSEMBLE STATISTICS")
print("="*60)
print(f"\nWith MSA ({len(with_msa)} structures):")
print(f"  Mean RMSD to State 1: {np.mean([r['rmsd_state1'] for r in with_msa]):.2f} ± {np.std([r['rmsd_state1'] for r in with_msa]):.2f} Å")
print(f"  Mean RMSD to State 2: {np.mean([r['rmsd_state2'] for r in with_msa]):.2f} ± {np.std([r['rmsd_state2'] for r in with_msa]):.2f} Å")
print(f"  Mean pLDDT: {np.mean([r['plddt'] for r in with_msa]):.1f}%")

print(f"\nWithout MSA ({len(without_msa)} structures):")
print(f"  Mean RMSD to State 1: {np.mean([r['rmsd_state1'] for r in without_msa]):.2f} ± {np.std([r['rmsd_state1'] for r in without_msa]):.2f} Å")
print(f"  Mean RMSD to State 2: {np.mean([r['rmsd_state2'] for r in without_msa]):.2f} ± {np.std([r['rmsd_state2'] for r in without_msa]):.2f} Å")
print(f"  Mean pLDDT: {np.mean([r['plddt'] for r in without_msa]):.1f}%")


In [None]:
#@title Visualize Ensemble Distribution

# Create RMSD scatter plot
fig, ax = plt.subplots(figsize=(10, 8))

# Plot points colored by MSA mode
for r in ensemble_rmsds:
    color = 'steelblue' if r['msa_mode'] == 'mmseqs2' else 'coral'
    marker = 'o' if r['msa_mode'] == 'mmseqs2' else 's'
    label = 'With MSA' if r['msa_mode'] == 'mmseqs2' else 'Without MSA'
    ax.scatter(r['rmsd_state1'], r['rmsd_state2'], 
              c=color, marker=marker, s=100, alpha=0.7,
              label=label if r['seed'] == 0 else "")

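# Mark where the two reference structures themselves fall in this space
ax.scatter([0], [ref_rmsd], marker='*', s=500, c='red',
          label=f'State 1 reference (0, {ref_rmsd:.1f} Å)')
ax.scatter([ref_rmsd], [0], marker='*', s=500, c='gold', edgecolors='black',
          label=f'State 2 reference ({ref_rmsd:.1f}, 0 Å)')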

# Add diagonal line
max_rmsd = max(max([r['rmsd_state1'] for r in ensemble_rmsds]),
               max([r['rmsd_state2'] for r in ensemble_rmsds]))
ax.plot([0, max_rmsd], [0, max_rmsd], 'k--', alpha=0.3)

ax.set_xlabel('RMSD to State 1 (Å)', fontsize=12)
ax.set_ylabel('RMSD to State 2 (Å)', fontsize=12)
ax.set_title('Ensemble Distribution in RMSD Space', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Analyze ensemble diversity
structures = [pred['structure'] for pred in all_predictions]
ensemble_stats = af2.analyze_ensemble(structures, verbose=True)


In [None]:
#@title Create LogMD Trajectory from Ensemble
#@markdown Visualize the entire ensemble as an interactive 3D trajectory

if af2.check_logmd():
    print("Creating LogMD trajectory from ensemble predictions...")
    
    # Create trajectory sorted by RMSD to State 2
    trajectory = af2.create_trajectory_from_ensemble(
        predictions=all_predictions,
        sequence=I89_SEQUENCE,
        project="i89_ensemble",
        align_structures=True,
        sort_by_rmsd=True,
        reference_coords=state2_coords,
        max_structures=20,  # Limit for faster loading
        verbose=True
    )
    
    if trajectory:
        print("\n" + "="*60)
        print("ENSEMBLE TRAJECTORY CREATED")
        print("="*60)
        print("Features:")
        print("  - All structures aligned for comparison")
        print("  - Sorted by RMSD to State 2")
        print("  - Colored by pLDDT confidence")
        print("  - Animated transition between conformations")
        print("\nUse the viewer controls to:")
        print("  - Play/pause the animation")
        print("  - Step through individual frames")
        print("  - Rotate and zoom the view")
        
        # Display in notebook
        try:
            import logmd_utils
            logmd_utils.display_trajectory_in_notebook(trajectory)
        except Exception:  # fall back to the URL if inline display fails
            print(f"\nView trajectory at: {trajectory.url}")
    else:
        print("Failed to create trajectory")
else:
    print("LogMD not available - skipping ensemble trajectory")
    print("\nTo use this feature:")
    print("  1. Install LogMD: pip install logmd")
    print("  2. Restart kernel")
    print("  3. Re-run this cell")


In [None]:
#@title Compare MSA Conditions with LogMD
#@markdown Create separate trajectories for with/without MSA predictions

if af2.check_logmd():
    print("Creating separate trajectories for MSA comparison...")
    
    # Separate predictions by MSA mode
    with_msa_preds = [p for p in all_predictions if p.get('msa_mode') == 'mmseqs2']
    without_msa_preds = [p for p in all_predictions if p.get('msa_mode') == 'single_sequence']
    
    print(f"\nWith MSA: {len(with_msa_preds)} predictions")
    print(f"Without MSA: {len(without_msa_preds)} predictions")
    
    # Create trajectory for predictions with MSA
    if with_msa_preds:
        traj_with_msa = af2.create_trajectory_from_ensemble(
            predictions=with_msa_preds,
            sequence=I89_SEQUENCE,
            project="i89_with_msa",
            align_structures=True,
            sort_by_rmsd=True,
            reference_coords=state1_coords,  # Sort by state 1
            verbose=False
        )
        if traj_with_msa:
            print(f"\nWith MSA trajectory: {traj_with_msa.url}")
    
    # Create trajectory for predictions without MSA
    if without_msa_preds:
        traj_without_msa = af2.create_trajectory_from_ensemble(
            predictions=without_msa_preds,
            sequence=I89_SEQUENCE,
            project="i89_without_msa",
            align_structures=True,
            sort_by_rmsd=True,
            reference_coords=state2_coords,  # Sort by state 2
            verbose=False
        )
        if traj_without_msa:
            print(f"Without MSA trajectory: {traj_without_msa.url}")
    
    print("\n" + "="*60)
    print("MSA COMPARISON TRAJECTORIES")
    print("="*60)
    print("Expected pattern (verify against the ensemble statistics above):")
    print("  - With MSA predictions cluster near State 1")
    print("  - Without MSA predictions cluster near State 2")
    print("  - Coevolution signal in the MSA biases predictions toward State 1")
    print("\nOpen both URLs in separate tabs to compare side-by-side!")
else:
    print("LogMD not available - skipping MSA comparison")


In [None]:
#@title Create Progressive Ensemble Animation
#@markdown Watch structures being added one by one to the ensemble

if af2.check_logmd():
    print("Creating progressive ensemble animation...")
    print("This shows how the ensemble builds up structure by structure!\n")
    
    # Sort by generation order (MSA mode first, then by seed)
    sorted_ensemble = sorted(
        all_predictions, 
        key=lambda x: (x.get('msa_mode', ''), x.get('seed', 0))
    )
    
    traj = af2.create_trajectory_from_ensemble(
        predictions=sorted_ensemble,
        sequence=I89_SEQUENCE,
        project="i89_ensemble_progressive",
        align_structures=True,
        max_structures=20,  # Limit for performance
        verbose=False
    )
    
    if traj:
        print("="*60)
        print("PROGRESSIVE ENSEMBLE ANIMATION")
        print("="*60)
        print(f"View progressive build-up: {traj.url}")
        print("\nThis animation shows:")
        print("  - First 3 frames: Predictions WITH MSA (seed 0, 1, 2)")
        print("  - Next 3 frames: Predictions WITHOUT MSA (seed 0, 1, 2)")
        print("  - Notice the conformational transition!")
        print("\nTips for viewing:")
        print("  - Play the animation to see ensemble grow")
        print("  - Pause to inspect individual structures")
        print("  - Compare how different seeds explore conformational space")
    else:
        print("Failed to create progressive animation")
else:
    print("LogMD not available - skipping progressive animation")
    print("This feature shows how the ensemble builds up incrementally")


In [None]:
#@title Explore Conformational Space with LogMD
#@markdown Create a "journey" from State 1-like to State 2-like conformations

if af2.check_logmd():
    print("Creating conformational space exploration...")
    print("This creates a smooth transition between conformational states!\n")
    
    # Sort by RMSD to State 1 (creates a "journey" from State 1 to State 2)
    traj_journey = af2.create_trajectory_from_ensemble(
        predictions=all_predictions,
        sequence=I89_SEQUENCE,
        project="i89_state1_to_state2_journey",
        align_structures=True,
        sort_by_rmsd=True,
        reference_coords=state1_coords,
        max_structures=15,  # Select representative structures
        verbose=False
    )
    
    if traj_journey:
        print("="*60)
        print("CONFORMATIONAL JOURNEY")
        print("="*60)
        print(f"View the journey: {traj_journey.url}")
        print("\nWhat this shows:")
        print("  - Structures sorted by similarity to State 1")
        print("  - Early frames: Most State 1-like (with MSA)")
        print("  - Middle frames: Intermediate conformations")
        print("  - Late frames: Most State 2-like (without MSA)")
        print("\nThis visualization illustrates:")
        print("  - How far apart the sampled conformations are")
        print("  - How MSA depth affects conformation selection")
        print("\nCaveats:")
        print("  - Frames are independent predictions ordered by RMSD, not a simulated pathway")
        print("  - Real proteins sample conformational ensembles, but predicted ensembles")
        print("    are not guaranteed to reproduce their populations or energetics")
    else:
        print("Failed to create journey visualization")
else:
    print("LogMD not available - skipping conformational journey")
    print("\nThis feature would show a smooth conformational transition")
    print("from State 1-like to State 2-like structures")


## Section 7: Advanced Analysis - Coevolution and MSA Quality

Let's examine how MSA quality and coevolution patterns influence predictions.
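
To make the idea of a coevolution score concrete, the sketch below computes mutual information between alignment columns and applies the average-product correction (APC). This is a deliberately simple estimator chosen for illustration; `af2.get_coevolution` may use a different method, and the function name and the integer-encoded MSA layout are assumptions.

```python
import numpy as np

def mutual_information_apc(msa, n_states=21):
    """Pairwise mutual information between MSA columns with APC applied.

    Hypothetical helper; msa is an (n_seqs, length) integer array with
    values in [0, n_states).
    """
    msa = np.asarray(msa)
    n_seqs, length = msa.shape
    one_hot = np.eye(n_states)[msa]                  # (n_seqs, length, n_states)
    f_i = one_hot.mean(axis=0)                       # single-column frequencies
    f_ij = np.einsum("nia,njb->ijab", one_hot, one_hot) / n_seqs  # pair frequencies
    expected = f_i[:, None, :, None] * f_i[None, :, None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        # Unobserved state pairs contribute zero to the sum
        terms = np.where(f_ij > 0, f_ij * np.log(f_ij / expected), 0.0)
    mi = terms.sum(axis=(2, 3))
    np.fill_diagonal(mi, 0.0)
    # APC: subtract the background expected from each column's average score
    overall_mean = mi.sum() / (length * (length - 1))
    if overall_mean == 0:
        return mi
    col_mean = mi.sum(axis=1) / (length - 1)
    corrected = mi - np.outer(col_mean, col_mean) / overall_mean
    np.fill_diagonal(corrected, 0.0)
    return corrected

# Toy alignment in which column 5 is fully determined by column 0
rng = np.random.default_rng(0)
toy = rng.integers(0, 4, size=(500, 8))
toy[:, 5] = (toy[:, 0] + 1) % 4
scores = mutual_information_apc(toy, n_states=4)
i, j = np.unravel_index(np.argmax(scores), scores.shape)
print("strongest pair:", sorted((int(i), int(j))))
```

The correction matters on real alignments, where phylogeny and uneven conservation inflate raw mutual information for some columns across the board.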


In [None]:
#@title Analyze MSA and Coevolution

# Generate MSA for analysis
print("Generating MSA for analysis...")
msa_full, del_matrix = af2.get_msa([I89_SEQUENCE], "i89_msa_analysis", verbose=False)

# Compute coevolution
coev_matrix = af2.get_coevolution(msa_full)

# Plot MSA statistics
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Plot 1: MSA depth
ax = axes[0, 0]
ax.bar(range(len(msa_full)), [1]*len(msa_full), color='lightblue')
ax.set_xlabel('Sequence Index')
ax.set_ylabel('Presence')
ax.set_title(f'MSA Depth: {len(msa_full)} sequences')

# Plot 2: Sequence identity distribution
ax = axes[0, 1]
query_seq = msa_full[0]
identities = []
for seq in msa_full[1:]:
    identity = np.mean(seq == query_seq) * 100
    identities.append(identity)
if identities:
    ax.hist(identities, bins=20, color='green', alpha=0.7, edgecolor='black')
ax.set_xlabel('Sequence Identity (%)')
ax.set_ylabel('Count')
ax.set_title('Sequence Identity to Query')

# Plot 3: Coevolution matrix
ax = axes[1, 0]
im = ax.imshow(coev_matrix, cmap='RdBu_r', vmin=-np.max(np.abs(coev_matrix)), 
               vmax=np.max(np.abs(coev_matrix)))
ax.set_xlabel('Position')
ax.set_ylabel('Position')
ax.set_title('Coevolution Matrix')
plt.colorbar(im, ax=ax)

# Plot 4: Top coevolving pairs
ax = axes[1, 1]
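# Keep only pairs at least 6 positions apart to skip trivially coupled neighbors
upper_tri = np.triu_indices_from(coev_matrix, k=6)
coev_values = coev_matrix[upper_tri]
top_n = min(20, len(coev_values))
# argsort is ascending, so the last top_n entries are the strongest pairs
top_indices = np.argsort(coev_values)[-top_n:]
top_values = coev_values[top_indices]
ax.barh(range(top_n), top_values, color='purple', alpha=0.7)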
ax.set_xlabel('Coevolution Score')
ax.set_ylabel('Pair Rank')
ax.set_title(f'Top {top_n} Coevolving Pairs')

plt.tight_layout()
plt.show()

print(f"\nMSA Statistics:")
print(f"  Number of sequences: {len(msa_full)}")
print(f"  Mean sequence identity: {np.mean(identities):.1f}%" if identities else "  Single sequence")
print(f"  Max coevolution score: {np.max(coev_matrix):.3f}")


## Section 8: Saving Results

Let's save our predictions for further analysis.
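
For reference, AlphaFold-style PDB files store per-residue pLDDT on a 0-100 scale in the B-factor column. The sketch below formats Cα-only `ATOM` records by hand to show where each field goes. `ca_pdb_lines` is a hypothetical helper, not the `af2.save_pdb` implementation, and it assumes pLDDT values have already been scaled to 0-100.

```python
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}

def ca_pdb_lines(ca_coords, sequence, plddt, chain="A"):
    """Format CA-only ATOM records with pLDDT (0-100) in the B-factor column."""
    lines = []
    for i, ((x, y, z), aa, b) in enumerate(zip(ca_coords, sequence, plddt), start=1):
        lines.append(
            # Columns 1-30: record name, serial, atom name, residue, chain, residue number
            f"ATOM  {i:5d}  CA  {THREE_LETTER[aa]} {chain}{i:4d}    "
            # Columns 31-66: x, y, z, occupancy, B-factor (pLDDT here)
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{b:6.2f}"
            # Columns 67-78: padding, then the element symbol right-justified
            f"{'C':>12s}"
        )
    lines.append("END")
    return lines

example = ca_pdb_lines([(1.0, 2.0, 3.0), (4.8, 2.0, 3.0)], "GS", [91.5, 78.25])
print("\n".join(example))
```

Storing confidence in the B-factor column lets standard molecular viewers color a structure by pLDDT without any extra files.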


In [None]:
#@title Save Best Predictions to PDB

# Save predictions with highest confidence
best_with_msa = max(with_msa, key=lambda x: x['plddt'])
best_without_msa = max(without_msa, key=lambda x: x['plddt'])

# Find corresponding structures
for pred in all_predictions:
    if pred['msa_mode'] == 'mmseqs2' and pred['seed'] == best_with_msa['seed']:
        af2.save_pdb(
            atom_positions=pred['structure'],
            sequence=I89_SEQUENCE,
            output_path="i89_best_with_msa.pdb",
            plddt=pred['plddt']
        )
        print(f"Saved: i89_best_with_msa.pdb (pLDDT: {best_with_msa['plddt']:.1f}%)")
        break

for pred in all_predictions:
    if pred['msa_mode'] == 'single_sequence' and pred['seed'] == best_without_msa['seed']:
        af2.save_pdb(
            atom_positions=pred['structure'],
            sequence=I89_SEQUENCE,
            output_path="i89_best_without_msa.pdb",
            plddt=pred['plddt']
        )
        print(f"Saved: i89_best_without_msa.pdb (pLDDT: {best_without_msa['plddt']:.1f}%)")
        break

# Save ensemble statistics
import json
with open("i89_ensemble_stats.json", "w") as f:
    json.dump({
        'n_structures': len(all_predictions),
        'msa_modes': sorted({r['msa_mode'] for r in ensemble_rmsds}),
        'rmsd_stats': {
            'with_msa': {
                'mean_rmsd_state1': float(np.mean([r['rmsd_state1'] for r in with_msa])),
                'mean_rmsd_state2': float(np.mean([r['rmsd_state2'] for r in with_msa])),
                'mean_plddt': float(np.mean([r['plddt'] for r in with_msa]))
            },
            'without_msa': {
                'mean_rmsd_state1': float(np.mean([r['rmsd_state1'] for r in without_msa])),
                'mean_rmsd_state2': float(np.mean([r['rmsd_state2'] for r in without_msa])),
                'mean_plddt': float(np.mean([r['plddt'] for r in without_msa]))
            }
        },
        'ensemble_diversity': {
            'mean_pairwise_rmsd': float(ensemble_stats['mean_pairwise_rmsd']),
            'max_pairwise_rmsd': float(ensemble_stats['max_pairwise_rmsd'])
        }
    }, f, indent=2)

print("\nSaved ensemble statistics to i89_ensemble_stats.json")


## Summary and Key Takeaways

### What We've Learned

1. **MSA Controls Conformation**:
   - With MSA → State 1 preference
   - Without MSA → State 2 preference
   - Intermediate MSA depths may shift the balance between states (not tested in this notebook)

2. **Recycling Refines Structure**:
   - Most improvement in first 3-6 recycles
   - Early stopping saves computation
   - Convergence can be monitored via RMSD changes

3. **Sampling Strategies**:
   - Dropout introduces stochasticity
   - Multiple seeds explore conformational space
   - MSA subsampling provides control

4. **Interactive Visualization with LogMD**:
   - Real-time structure evolution during prediction
   - Trajectory creation from ensemble predictions
   - Side-by-side comparison of different conditions
   - Immediate visual feedback on conformational changes

5. **Analysis Methods**:
   - RMSD for known references
   - Coevolution highlights covarying residue pairs, which often reflect structural contacts or functional coupling
   - Ensemble statistics quantify diversity

### Practical Guidelines

- **For single structure**: Use full MSA, 3-6 recycles
- **For conformational sampling**: Vary MSA depth, use dropout
- **For efficiency**: Implement early stopping
- **For validation**: Compare to known structures when available
- **For visualization**: Use LogMD to inspect structure evolution and compare ensembles

### Next Steps

Try these techniques on your proteins of interest:
1. Proteins with known conformational changes
2. Intrinsically disordered regions
3. Domain movements
4. Oligomeric assemblies

### Resources

- **AF2 Utils Documentation**: See README_tutorial.md
- **LogMD Utils Documentation**: Interactive 3D visualization tools
- **ColabDesign**: https://github.com/sokrypton/ColabDesign
- **AlphaFold Protein Structure Database**: https://alphafold.ebi.ac.uk/
- **LogMD**: https://logmd.dev for molecular visualization

---

**Thank you for participating in this tutorial!**
