[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/engelberger/tutorials-ai4pd-2025/blob/main/tutorial_alphafold2_i89_conformations_v2.ipynb)

# Tutorial: Prediction of Protein Structures and Multiple Conformations using AlphaFold2

## Clean Implementation Using AF2 Utils Package

**Duration:** 90 minutes  
**Instructor:** Felipe Engelberger  
**Date:** AI4PD Workshop 2025

---

## Learning Objectives

By the end of this tutorial, you will understand:

1. **MSA's role in conformation selection**: How evolutionary information biases AlphaFold2 predictions
2. **Recycling mechanics**: How iterative refinement affects structure quality and conformation
3. **Conformational sampling strategies**: Practical techniques using dropout and MSA subsampling
4. **Structure analysis tools**: RMSD calculations, visualization, and ensemble analysis
5. **Real-world applications**: When and how to apply these techniques to proteins of interest

## Tutorial Overview

We'll use the **i89 protein** as our model system. This 96-residue protein exhibits distinct conformational states that AlphaFold2 can capture through different prediction strategies:

- **State 1**: The conformation typically predicted with a full MSA
- **State 2**: An alternative conformation accessible without an MSA

We have experimental structures for both states (`state1.pdb` and `state2.pdb`) for validation.


## Section 1: Environment Setup

First, let's set up the environment with the AF2 Utils package, which provides a clean wrapper around ColabDesign.


In [None]:
%%time
#@title Install Dependencies and Import AF2 Utils
#@markdown This cell handles all setup automatically

import os
import sys
import warnings
warnings.filterwarnings('ignore')

# Check if running in Colab
IN_COLAB = 'google.colab' in sys.modules

print("="*60)
print("ALPHAFOLD2 TUTORIAL SETUP")
print("="*60)

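# Download af2_utils.py if not present; stop early if the download fails
if not os.path.exists("af2_utils.py"):
    print("\nDownloading af2_utils.py...")
    exit_code = os.system("wget -q https://raw.githubusercontent.com/engelberger/tutorials-ai4pd-2025/main/af2_utils.py")
    if exit_code != 0 or not os.path.exists("af2_utils.py"):
        raise RuntimeError("Could not download af2_utils.py (wget failed)")
    print("  - af2_utils.py downloaded")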

# Import af2_utils
print("\nImporting AF2 Utils...")
import af2_utils as af2
print(f"  - AF2 Utils v{af2.__version__} loaded")

# Check installation status
print("\nChecking dependencies...")
status = af2.check_installation(verbose=False)
for component, installed in status.items():
    symbol = "+" if installed else "-"
    print(f"  {symbol} {component}: {'ready' if installed else 'missing'}")

# Install missing dependencies if needed
missing = [k for k, v in status.items() if not v and k != 'environment_setup']
if missing:
    print("\nInstalling missing dependencies...")
    af2.install_dependencies(
        install_colabdesign='colabdesign' in missing,
        install_hhsuite='hhsuite' in missing,
        download_params='alphafold_params' in missing,
        verbose=True
    )

# Setup environment
print("\nConfiguring environment...")
af2.setup_environment(verbose=False)
print("  - JAX memory and environment configured")

print("\n" + "="*60)
print("SETUP COMPLETE - Ready for predictions!")
print("="*60)
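
The setup cell fetches files with `wget` through `os.system`, which depends on `wget` being installed and does not raise when a download fails. As a minimal sketch of an alternative, the helper below uses only the Python standard library and raises on failure. The name `fetch_if_missing` is hypothetical and is not part of `af2_utils`.

```python
import urllib.request
from pathlib import Path


def fetch_if_missing(url: str, dest: str) -> Path:
    """Download `url` to `dest` unless a non-empty file is already there.

    Hypothetical helper (not part of af2_utils). Network or HTTP errors
    propagate as exceptions instead of being silently ignored.
    """
    path = Path(dest)
    if path.exists() and path.stat().st_size > 0:
        return path  # reuse the existing copy

    # Download to a temporary name first so an interrupted transfer
    # never leaves a truncated file under the final name.
    tmp = path.with_name(path.name + ".part")
    try:
        urllib.request.urlretrieve(url, str(tmp))
        if tmp.stat().st_size == 0:
            raise RuntimeError(f"Downloaded file is empty: {url}")
        tmp.replace(path)
    finally:
        if tmp.exists():
            tmp.unlink()  # clean up after a failed download
    return path
```

For example, `fetch_if_missing("https://raw.githubusercontent.com/engelberger/tutorials-ai4pd-2025/main/af2_utils.py", "af2_utils.py")` would stand in for the `wget` call above.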

In [None]:
#@title Import Additional Libraries
import numpy as np
import matplotlib.pyplot as plt
from Bio import PDB
from pathlib import Path
import json

print("Libraries imported successfully")


## Section 2: The i89 Protein - Our Model System

i89 is a 96-residue protein that can adopt multiple conformational states. We use it to show how MSA depth, recycling, and sampling parameters influence AlphaFold2's predictions.
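
The next cell loads the reference coordinates with `af2.load_pdb_coords`, whose implementation is not shown in this tutorial. As a rough sketch of what such a loader might do, assuming it returns one Cα position per residue, the hypothetical function below reads Cα atoms from the fixed-column `ATOM` records of a PDB file using only the standard library.

```python
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def load_ca_coords(pdb_path: str, chain_id: Optional[str] = None) -> List[Coord]:
    """Return (x, y, z) for every C-alpha atom in the first model of a PDB file.

    Hypothetical stand-in for illustration; not the af2_utils implementation.
    """
    coords: List[Coord] = []
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith("ENDMDL"):
                break  # keep only the first model of multi-model files
            if not line.startswith("ATOM"):
                continue  # skips HETATM records, e.g. calcium ions named CA
            if line[12:16].strip() != "CA":
                continue  # columns 13-16 hold the atom name
            if line[16] not in (" ", "A"):
                continue  # keep only the first alternate location
            if chain_id is not None and line[21] != chain_id:
                continue  # column 22 holds the chain identifier
            # Columns 31-38, 39-46, 47-54 hold x, y, z in Angstrom.
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords
```

Calling `load_ca_coords("state1.pdb")` would return a list with one coordinate triple per residue, which can be passed to `np.array` for further analysis.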


In [None]:
#@title Define i89 Sequence and Load Reference Structures

# i89 protein sequence (96 residues)
I89_SEQUENCE = "GSHMASMEDLQAEARAFLSEEMIAEFKAAFDMFDADGGGDISYKAVGTVFRMLGINPSKEVLDYLKEKIDVDGSGTIDFEEFLVLMVYIMKQDA"

print("i89 protein statistics:")
print(f"  Length: {len(I89_SEQUENCE)} residues")
print(f"  Sequence: {I89_SEQUENCE[:30]}...{I89_SEQUENCE[-20:]}")

# Check if reference structures exist, download if needed
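if not os.path.exists("state1.pdb") or not os.path.exists("state2.pdb"):
    print("\nDownloading reference structures...")
    os.system("wget -q https://raw.githubusercontent.com/engelberger/tutorials-ai4pd-2025/main/state1.pdb")
    os.system("wget -q https://raw.githubusercontent.com/engelberger/tutorials-ai4pd-2025/main/state2.pdb")
    # Fail early if either download did not produce a file
    missing_refs = [f for f in ("state1.pdb", "state2.pdb") if not os.path.exists(f)]
    if missing_refs:
        raise RuntimeError(f"Could not download: {', '.join(missing_refs)}")
    print("  - Reference structures downloaded")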
else:
    print("\nReference structures found:")
    print("  - state1.pdb: Conformation typically predicted with MSA")
    print("  - state2.pdb: Alternative conformation accessible without MSA")

# Calculate RMSD between reference states
state1_coords = af2.load_pdb_coords("state1.pdb")
state2_coords = af2.load_pdb_coords("state2.pdb")
ref_rmsd = af2.calculate_rmsd(state1_coords, state2_coords)

print(f"\nRMSD between reference states: {ref_rmsd:.2f} Angstrom")
print("This indicates a significant conformational difference between the two states.")
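
How `af2.calculate_rmsd` works internally is not shown in this tutorial. A common way to compare two conformations is the RMSD after optimal rigid-body superposition (the Kabsch algorithm). The sketch below illustrates that approach with NumPy, assuming two Cα coordinate arrays of identical shape `(N, 3)` in matching residue order. The function name `kabsch_rmsd` is hypothetical, and the library function may differ in detail.

```python
import numpy as np


def kabsch_rmsd(coords_a: np.ndarray, coords_b: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)

    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(a.T @ b)

    # Guard against an improper rotation (reflection): force det = +1.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt

    # Rotate a onto b (row-vector convention) and measure the deviation.
    diff = a @ rotation - b
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

Because both sets are centered and rotated before comparison, the value reflects internal conformational change only, not the arbitrary placement of each structure in its coordinate frame.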


## Section 3: Basic Prediction with Full MSA

Let's start by predicting the i89 structure with a full MSA. This typically results in a conformation closer to State 1.


In [None]:
%%time
#@title Quick Prediction with Full MSA
#@markdown Using af2_utils high-level API for simple prediction

print("="*60)
print("PREDICTION WITH FULL MSA")
print("="*60)

# Use the high-level quick_predict function
result_with_msa = af2.quick_predict(
    sequence=I89_SEQUENCE,
    msa_mode="mmseqs2",  # Full MSA from MMseqs2
    num_recycles=3,
    jobname="i89_with_msa",
    verbose=True
)

# Calculate RMSD to reference states
pred_ca = result_with_msa['structure'][:, 1, :]  # CA atoms (atom index 1 in AlphaFold's atom ordering)
rmsd_state1 = af2.calculate_rmsd(pred_ca, state1_coords)
rmsd_state2 = af2.calculate_rmsd(pred_ca, state2_coords)

print("\n" + "="*60)
print("RESULTS")
print("="*60)
print(f"RMSD to State 1: {rmsd_state1:.2f} Angstrom")
print(f"RMSD to State 2: {rmsd_state2:.2f} Angstrom")
# pLDDT is a confidence score on a 0-100 scale, not a percentage
print(f"Mean pLDDT: {result_with_msa['metrics']['plddt']*100:.1f} (0-100 scale)")

if rmsd_state1 < rmsd_state2:
    print("\nPrediction is closer to State 1 (as expected with MSA)")
else:
    print("\nPrediction is closer to State 2")
print(f"Delta: {abs(rmsd_state1 - rmsd_state2):.2f} Angstrom difference")
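
The comparison above can be wrapped in small helpers so the same logic is reusable for many predictions. This is only a sketch: `closest_state` and `plddt_band` are hypothetical names, not part of `af2_utils`. The pLDDT thresholds follow the confidence bands used by the AlphaFold Protein Structure Database and assume a value on the 0-100 scale.

```python
from typing import Dict, Tuple


def closest_state(rmsd_by_state: Dict[str, float]) -> Tuple[str, float]:
    """Return the reference with the lowest RMSD and its margin over the runner-up."""
    if len(rmsd_by_state) < 2:
        raise ValueError("need at least two reference states to compare")
    ranked = sorted(rmsd_by_state.items(), key=lambda item: item[1])
    (best_name, best_rmsd), (_, second_rmsd) = ranked[0], ranked[1]
    return best_name, second_rmsd - best_rmsd


def plddt_band(plddt: float) -> str:
    """Map a pLDDT value (0-100 scale) to a qualitative confidence band."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"
```

With the variables from the cell above, `closest_state({"State 1": rmsd_state1, "State 2": rmsd_state2})` returns the nearer reference and the RMSD margin, and `plddt_band(result_with_msa['metrics']['plddt'] * 100)` gives a qualitative label for the mean confidence.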
