# Antibody Validation with IgFold on Google Colab

This notebook validates your antibody generation model using IgFold on Google Colab's FREE GPU.

**Requirements**: Upload your checkpoint file when prompted

**GPU**: Make sure to enable the GPU (Runtime → Change runtime type → GPU)

---

## 1. Check GPU Availability

In [None]:
import torch

if torch.cuda.is_available():
    print(f"✅ GPU Available: {torch.cuda.get_device_name(0)}")
    print(f"   VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("❌ No GPU found!")
    print("\n⚠️ IMPORTANT: Enable GPU in Runtime → Change runtime type → GPU (T4)")

## 2. Install Dependencies

In [None]:
print("Installing IgFold and dependencies...")
print("This may take 3-4 minutes...\n")

# Fix dependency conflicts by upgrading huggingface_hub first
print("Step 1: Upgrading core dependencies...")
!pip install -q --upgrade huggingface_hub

print("Step 2: Installing IgFold...")
!pip install -q igfold

print("Step 3: Downgrading transformers for compatibility...")
!pip install -q transformers==4.35.0

print("\n✅ Installation complete!")
print("\n⚠️ IMPORTANT: You may need to restart the runtime:")
print("   Runtime → Restart runtime")
print("   Then re-run cells 1-2 before continuing")
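
After restarting the runtime, it can help to confirm which versions actually ended up installed, since the three `pip` steps above can override one another. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the only pin taken from this notebook is `transformers==4.35.0`.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string for `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages touched by the install cell; 4.35.0 is the pin that cell requests.
expected = {"igfold": None, "huggingface_hub": None, "transformers": "4.35.0"}

for name, pinned in expected.items():
    found = installed_version(name)
    if found is None:
        status = "not installed"
    elif pinned is not None and found != pinned:
        status = f"{found} (expected {pinned})"
    else:
        status = found
    print(f"{name:16s} {status}")
```

If `transformers` reports a version other than the pinned one, re-running the install cell and restarting the runtime again is the likely fix.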

## 3. Upload Your Model Files

You'll need to upload:
1. **Model checkpoint** (`improved_small_2025_10_31_best.pt`)
2. **Validation data** (`val.json`)
3. **Model code files** (all `.py` files from `generators/` folder)

**OR** you can clone your entire project if it's on GitHub.

In [None]:
from google.colab import files
import os

print("Upload your checkpoint file:")
print("File: improved_small_2025_10_31_best.pt")
print("\nClick 'Choose Files' below...\n")

uploaded = files.upload()

print("\n✅ Uploaded:", list(uploaded.keys()))

## 4. Create Model Code

We'll recreate your model architecture here.

In [None]:
# Create generators directory
!mkdir -p generators

# You'll need to paste your model code here or upload the files
# For now, let's create a simple structure

print("\n⚠️ IMPORTANT: You need to upload your model code files:")
print("   - generators/transformer_seq2seq.py")
print("   - generators/tokenizer.py")
print("   - generators/data_loader.py")
print("\nYou can either:")
print("   1. Upload them manually (use files.upload() again)")
print("   2. Clone from GitHub (if your code is there)")
print("\nFor now, let's try a simpler approach...")
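
Before moving on, a quick check of which files are already in the Colab working directory avoids import errors later. This sketch assumes the file names listed in this notebook and that everything was uploaded relative to the current directory; `missing_files` is a hypothetical helper, not part of any library.

```python
from pathlib import Path


def missing_files(required, root="."):
    """Return the entries of `required` that do not exist as files under `root`."""
    root = Path(root)
    return [name for name in required if not (root / name).is_file()]


# File names as listed in this notebook; adjust them to match your project.
required = [
    "improved_small_2025_10_31_best.pt",
    "val.json",
    "generators/transformer_seq2seq.py",
    "generators/tokenizer.py",
    "generators/data_loader.py",
]

missing = missing_files(required)
if missing:
    print("Still missing:")
    for name in missing:
        print(f"  - {name}")
else:
    print("All expected files are present.")
```

Note that `files.upload()` places files in the current directory, so the `.py` files may need to be moved into `generators/` after uploading.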

## 5. Simple IgFold Test

Let's test IgFold with a sample antibody sequence first.

In [None]:
from igfold import IgFoldRunner
import tempfile
import numpy as np

print("Initializing IgFold...")
igfold = IgFoldRunner()
print("✅ IgFold loaded!\n")

# Test with a sample antibody
heavy = "EVQLVESGGGLVQPGGSLRLSCAASGFTISDYAIHWVRQAPGKGLEWVAGITPAGGYTAYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCARFVFFLPYAMDYWGQGTLVTVSS"
light = "DIQMTQSPSSLSASVGDRVTITCRASQDVSTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSGSGTDFTLTISSSLQPEDFATYCQQSYTTPPTFGQGTKVEIKR"

print("Testing with sample antibody:")
print(f"  Heavy: {len(heavy)} aa")
print(f"  Light: {len(light)} aa")
print("\nPredicting structure... (this may take 30-60 seconds)")

# Create temp file for output
with tempfile.NamedTemporaryFile(mode='w', suffix='.pdb', delete=False) as tmp:
    pdb_file = tmp.name

# Run IgFold
sequences = {'H': heavy, 'L': light}
igfold.fold(pdb_file=pdb_file, sequences=sequences, do_refine=False, do_renum=False)

# Read PDB and extract pLDDT
with open(pdb_file, 'r') as f:
    pdb_string = f.read()

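# Extract per-atom scores from the B-factor column (PDB columns 61-66)
plddt_scores = []
for line in pdb_string.split('\n'):
    if line.startswith('ATOM'):
        try:
            bfactor = float(line[60:66].strip())
            plddt_scores.append(bfactor)
        except ValueError:
            # Skip lines whose B-factor field is blank or malformed
            pass

plddt_scores = np.array(plddt_scores)

# Convert from fraction (0-1) to percentage (0-100) and clip outliers.
# NOTE: this assumes the B-factor column holds a 0-1 confidence value. IgFold's
# README describes its per-residue output as a predicted error (RMSD), so check
# the raw values before reading these numbers as pLDDT.
plddt_scores = np.clip(plddt_scores * 100, 0, 100)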

mean_plddt = plddt_scores.mean()

print("\n✅ Structure predicted!")
print(f"   Mean pLDDT: {mean_plddt:.2f}")
print(f"   Min pLDDT:  {plddt_scores.min():.2f}")
print(f"   Max pLDDT:  {plddt_scores.max():.2f}")

if mean_plddt > 70:
    print("\n🎉 Good quality structure!")
else:
    print("\n⚠️ Lower quality structure")

# Save the PDB file
with open('test_antibody.pdb', 'w') as f:
    f.write(pdb_string)
    
print("\nPDB file saved as: test_antibody.pdb")
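
The B-factor parsing above averages over every atom in the file. To compare the heavy and light chains separately, the same fixed-width fields can be grouped by chain ID. This is a sketch under the assumption that the output file uses the `H` and `L` chain IDs passed in `sequences`; `bfactors_by_chain` and `atom_line` are hypothetical helpers, and the demo records are synthetic, not IgFold output.

```python
from collections import defaultdict
from statistics import mean


def bfactors_by_chain(pdb_text: str) -> dict:
    """Group B-factor values of ATOM records by chain ID.

    Uses the fixed-width PDB layout: chain ID in column 22 and
    temperature factor in columns 61-66 (0-based index 21 and slice 60:66).
    """
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        try:
            value = float(line[60:66])
        except ValueError:
            continue  # blank or malformed B-factor field
        chains[line[21]].append(value)
    return dict(chains)


def atom_line(serial: int, chain: str, bfactor: float) -> str:
    """Build a synthetic fixed-width ATOM record (CA of an alanine) for the demo."""
    return (
        f"ATOM  {serial:5d}  CA  ALA {chain}{serial:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}"
    )


# Synthetic records standing in for a real structure file.
demo = "\n".join([atom_line(1, "H", 0.9), atom_line(2, "H", 0.7), atom_line(3, "L", 0.5)])
for chain, values in bfactors_by_chain(demo).items():
    print(f"chain {chain}: {len(values)} atoms, mean {mean(values):.2f}")
```

To apply it to the structure predicted above, pass `pdb_string` instead of `demo`.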

## 6. Download PDB Structure

In [None]:
from google.colab import files

print("Downloading PDB file...")
files.download('test_antibody.pdb')
print("\n✅ Download complete!")
print("You can view this in PyMOL or ChimeraX")

## 7. Validate Your Generated Antibodies

**Once you upload your model files**, you can validate your generated antibodies here.

For now, let's create a simple validation function:

In [None]:
def validate_antibodies(antibody_sequences, save_pdbs=True):
    """
    Validate a list of antibody sequences using IgFold.
    
    Args:
        antibody_sequences: List of tuples (heavy_chain, light_chain)
        save_pdbs: Whether to save PDB files
    
    Returns:
        List of validation results
    """
    from igfold import IgFoldRunner
    import tempfile
    import numpy as np
    import os
    
    print(f"Validating {len(antibody_sequences)} antibodies...\n")
    
    # Initialize IgFold
    igfold = IgFoldRunner()
    
    results = []
    
    for i, (heavy, light) in enumerate(antibody_sequences):
        print(f"[{i+1}/{len(antibody_sequences)}] Processing antibody {i+1}...")
        
        try:
            # Create temp file
            with tempfile.NamedTemporaryFile(mode='w', suffix='.pdb', delete=False) as tmp:
                pdb_file = tmp.name
            
            # Run IgFold
            sequences = {'H': heavy, 'L': light}
            igfold.fold(pdb_file=pdb_file, sequences=sequences, do_refine=False, do_renum=False)
            
            # Read PDB
            with open(pdb_file, 'r') as f:
                pdb_string = f.read()
            
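            # Extract per-atom scores from the B-factor column (PDB columns 61-66)
            plddt_scores = []
            for line in pdb_string.split('\n'):
                if line.startswith('ATOM'):
                    try:
                        bfactor = float(line[60:66].strip())
                        plddt_scores.append(bfactor)
                    except ValueError:
                        # Skip lines whose B-factor field is blank or malformed
                        pass
            
            plddt_scores = np.array(plddt_scores)
            
            # Convert from fraction (0-1) to percentage (0-100) and clip outliers.
            # NOTE: this assumes the B-factor column holds a 0-1 confidence value;
            # check the raw values against IgFold's documented output first.
            plddt_scores = np.clip(plddt_scores * 100, 0, 100)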
            
            mean_plddt = float(plddt_scores.mean())
            
            # Quality grade
            if mean_plddt > 90:
                quality = "Excellent"
            elif mean_plddt > 70:
                quality = "Good"
            elif mean_plddt > 50:
                quality = "Fair"
            else:
                quality = "Poor"
            
            result = {
                'antibody_id': i,
                'mean_plddt': mean_plddt,
                'min_plddt': float(plddt_scores.min()),
                'max_plddt': float(plddt_scores.max()),
                'quality': quality
            }
            
            results.append(result)
            
            # Save PDB
            if save_pdbs:
                pdb_filename = f'antibody_{i:03d}_plddt{mean_plddt:.0f}.pdb'
                with open(pdb_filename, 'w') as f:
                    f.write(pdb_string)
            
            print(f"    pLDDT: {mean_plddt:.1f} - {quality}")
            
            # Clean up temp file
            os.unlink(pdb_file)
            
        except Exception as e:
            print(f"    ❌ Error: {e}")
            results.append({'antibody_id': i, 'error': str(e)})
    
    # Summary
    valid_results = [r for r in results if 'error' not in r]
    if valid_results:
        plddt_values = [r['mean_plddt'] for r in valid_results]
        print("\n" + "="*70)
        print("Validation Summary")
        print("="*70)
        print(f"Total antibodies:     {len(antibody_sequences)}")
        print(f"Successful:           {len(valid_results)}")
        print(f"Failed:               {len(antibody_sequences) - len(valid_results)}")
        print(f"\nMean pLDDT:           {np.mean(plddt_values):.2f} ± {np.std(plddt_values):.2f}")
        print(f"Median pLDDT:         {np.median(plddt_values):.2f}")
        print(f"Range:                {np.min(plddt_values):.2f} - {np.max(plddt_values):.2f}")
        print("="*70)
    
    return results

print("✅ Validation function ready!")
print("\nUsage:")
print('  antibodies = [(heavy1, light1), (heavy2, light2), ...]')
print('  results = validate_antibodies(antibodies)')
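
The quality grading inside `validate_antibodies` can also be pulled out into a small standalone function, which makes the thresholds easy to test and reuse. The sketch below mirrors the cut-offs used above (90, 70, 50, all strict comparisons); `grade_plddt` and `grade_counts` are hypothetical names, and the scores in the demo call are placeholders rather than real results.

```python
def grade_plddt(mean_plddt: float) -> str:
    """Map a mean score on the 0-100 scale to the labels used in validate_antibodies."""
    if mean_plddt > 90:
        return "Excellent"
    if mean_plddt > 70:
        return "Good"
    if mean_plddt > 50:
        return "Fair"
    return "Poor"


def grade_counts(mean_scores) -> dict:
    """Count how many antibodies fall into each quality grade."""
    counts = {"Excellent": 0, "Good": 0, "Fair": 0, "Poor": 0}
    for score in mean_scores:
        counts[grade_plddt(score)] += 1
    return counts


# Placeholder scores chosen to hit each grade, including the boundary at 70.
print(grade_counts([95.2, 71.0, 70.0, 42.5]))
```

Because the comparisons are strict, a score of exactly 70 is graded "Fair" rather than "Good".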

## 8. Example: Validate Multiple Antibodies

Replace these with your generated sequences:

In [None]:
# Example antibodies (replace with your generated ones)
my_antibodies = [
    (
        "EVQLVESGGGLVQPGGSLRLSCAASGFTISDYAIHWVRQAPGKGLEWVAGITPAGGYTAYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCARFVFFLPYAMDYWGQGTLVTVSS",
        "DIQMTQSPSSLSASVGDRVTITCRASQDVSTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSGSGTDFTLTISSSLQPEDFATYCQQSYTTPPTFGQGTKVEIKR"
    ),
    # Add more antibodies here...
]

# Run validation
results = validate_antibodies(my_antibodies, save_pdbs=True)

# Display results
import pandas as pd
df = pd.DataFrame([r for r in results if 'error' not in r])
print("\nDetailed Results:")
print(df)
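
Generated sequences may contain characters that are not amino acids, such as padding tokens, gaps, or `X`, and these may cause folding to fail or give misleading scores. A minimal pre-filter, assuming only the 20 standard one-letter codes are acceptable, could look like this; `invalid_residues` and `split_valid_pairs` are hypothetical helpers and the short demo sequences are toy strings, not real antibodies.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(sequence: str) -> dict:
    """Return {position: character} for every non-standard character (0-based)."""
    return {i: ch for i, ch in enumerate(sequence) if ch.upper() not in STANDARD_AA}


def split_valid_pairs(pairs):
    """Separate (heavy, light) pairs into foldable ones and rejected ones.

    Rejected entries keep their original index so they can be traced back
    to the generator output.
    """
    keep, rejected = [], []
    for index, (heavy, light) in enumerate(pairs):
        problems = {"H": invalid_residues(heavy), "L": invalid_residues(light)}
        if not heavy or not light or problems["H"] or problems["L"]:
            rejected.append((index, problems))
        else:
            keep.append((heavy, light))
    return keep, rejected


# Toy strings for illustration only.
demo_pairs = [("EVQLVESGG", "DIQMTQSPS"), ("EVQ-VESGG", "DIQMTQXPS"), ("", "DIQMTQSPS")]
keep, rejected = split_valid_pairs(demo_pairs)
print(f"{len(keep)} pair(s) kept, {len(rejected)} rejected")
for index, problems in rejected:
    print(f"  pair {index}: {problems}")
```

Passing `keep` to `validate_antibodies` means the `antibody_id` values in the results refer to positions in the filtered list, not the original one.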

## 9. Download All Results

In [None]:
import zipfile
import os
from google.colab import files  # re-imported so this cell also works after a runtime restart

# Create zip file with all PDB structures
pdb_files = [f for f in os.listdir('.') if f.endswith('.pdb')]

if pdb_files:
    with zipfile.ZipFile('antibody_structures.zip', 'w') as zipf:
        for pdb_file in pdb_files:
            zipf.write(pdb_file)
    
    print(f"Created zip file with {len(pdb_files)} PDB structures")
    
    # Download
    files.download('antibody_structures.zip')
    print("\n✅ Download complete!")
else:
    print("No PDB files found")
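
The per-antibody scores live only in the `results` list, so it may be worth saving them next to the PDB files. The sketch below writes result dictionaries shaped like those returned by `validate_antibodies` to a CSV file, including failed entries; `write_results_csv` is a hypothetical helper, and the demo rows are placeholders, not real validation output.

```python
import csv
import os
import tempfile


def write_results_csv(results, path="validation_results.csv"):
    """Write validate_antibodies-style result dicts to CSV, one row per antibody."""
    fields = ["antibody_id", "mean_plddt", "min_plddt", "max_plddt", "quality", "error"]
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        for row in results:
            # Failed entries only carry 'antibody_id' and 'error'; fill the rest with blanks.
            writer.writerow({key: row.get(key, "") for key in fields})
    return path


# Placeholder rows with the same keys as the real results.
demo_results = [
    {"antibody_id": 0, "mean_plddt": 80.0, "min_plddt": 60.0, "max_plddt": 95.0, "quality": "Good"},
    {"antibody_id": 1, "error": "example failure message"},
]
demo_path = write_results_csv(demo_results, os.path.join(tempfile.mkdtemp(), "demo.csv"))
with open(demo_path, newline="") as handle:
    print(handle.read())
```

In the notebook, calling `write_results_csv(results)` before building the zip file and adding the CSV with `zipf.write(...)` would bundle the scores with the structures.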

---

## Summary

**What this notebook does**:
1. ✅ Uses Google Colab's FREE GPU (T4 with 16GB VRAM)
2. ✅ Installs IgFold automatically
3. ✅ Validates antibody structures
4. ✅ Calculates pLDDT quality scores
5. ✅ Saves PDB structure files
6. ✅ Downloads results to your computer

**Next steps**:
- Upload your checkpoint and model code
- Generate antibodies or paste existing sequences
- Run validation
- Download PDB structures
- Visualize in PyMOL or ChimeraX

**Questions?** Check the documentation:
- IgFold: https://github.com/Graylab/IgFold
- Google Colab: https://colab.research.google.com
