# Structure Server - Comprehensive Test Suite

This notebook tests all MCP tools in `structure_server.py`:

1. **fetch_molecules** - Fetch structures from PDB/AlphaFold/PDB-REDO
2. **inspect_molecules** - Inspect structure files to analyze chains and molecules
3. **split_molecules** - Split multi-chain structures into individual chains
4. **clean_protein** - Clean protein structures for MD simulation
5. **clean_ligand** - Clean and prepare ligands using SMILES template matching
6. **run_antechamber_robust** - GAFF2 parameterization with AM1-BCC charges
7. **prepare_complex** - Complete workflow (split + clean + parameterize)

The tests cover:
- Normal operation
- Edge cases
- Error handling (LLM-friendly error responses)
- Boltz-2 predicted structures (computational models)
- Ligand preparation and force field generation
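
The error-handling tests all rely on the same result shape. Below is a minimal sketch of a shape check, assuming every tool returns a dict with a boolean `success` and list-valued `errors` and `warnings`. The helper name `check_result_shape` is hypothetical and not part of `structure_server.py`.

```python
def check_result_shape(result: dict) -> list[str]:
    """Return a list of problems with a tool result; empty means well-formed."""
    problems = []
    if not isinstance(result.get("success"), bool):
        problems.append("'success' must be a bool")
    for key in ("errors", "warnings"):
        # Missing keys are treated as empty lists in this sketch.
        if not isinstance(result.get(key, []), list):
            problems.append(f"'{key}' must be a list")
    # A failed call should explain itself so an LLM caller can react.
    if result.get("success") is False and not result.get("errors"):
        problems.append("failed result carries no error message")
    return problems


ok = {"success": True, "errors": [], "warnings": []}
bad = {"success": False, "errors": []}
print(check_result_shape(ok))
print(check_result_shape(bad))
```

A check like this could run after each tool call, ahead of the test-specific assertions.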


In [None]:
# Setup
import sys
sys.path.insert(0, '..')

from pathlib import Path
import json
import importlib
import asyncio

# For running async functions in notebook
import nest_asyncio
nest_asyncio.apply()

print("Setup complete")


In [None]:
# Check dependencies
print("Checking dependencies...\n")

deps = {
    "gemmi": "Structure parsing (mmCIF/PDB)",
    "pdbfixer": "Protein structure cleaning",
    "openmm": "Molecular simulation",
    "httpx": "Async HTTP client",
    "rdkit": "Ligand processing and charge estimation"
}

for module, desc in deps.items():
    try:
        __import__(module)
        print(f"✓ {module}: {desc}")
    except ImportError:
        print(f"✗ {module}: {desc} (NOT INSTALLED)")

# Check external tools
print("\nChecking external tools...")
from common.base import BaseToolWrapper

tools = {
    "pdb4amber": "Amber naming conventions",
    "antechamber": "GAFF2 parameterization",
    "parmchk2": "Missing parameter generation",
    "obabel": "Format conversion"
}

for tool, desc in tools.items():
    wrapper = BaseToolWrapper(tool)
    print(f"{'✓' if wrapper.is_available() else '✗'} {tool} ({desc})")


In [None]:
# Import and reload the structure server module
import servers.structure_server as structure_module
importlib.reload(structure_module)

# Import tools directly
from servers.structure_server import (
    fetch_molecules,
    inspect_molecules,
    split_molecules,
    clean_protein,
    clean_ligand,
    run_antechamber_robust,
    prepare_complex
)

print("Structure server tools imported successfully")


In [None]:
# Helper function to display results nicely
def show_result(result: dict, title: str = "Result"):
    """Display result dictionary with formatting"""
    print(f"\n{'='*60}")
    print(f" {title}")
    print(f"{'='*60}")
    
    # Check success status
    if result.get('success'):
        print("\n✓ SUCCESS")
    else:
        print("\n✗ FAILED")
    
    # Show errors if any
    if result.get('errors'):
        print("\nErrors:")
        for err in result['errors']:
            print(f"  - {err}")
    
    # Show warnings if any
    if result.get('warnings'):
        print("\nWarnings:")
        for warn in result['warnings']:
            print(f"  - {warn}")
    
    # Show key fields
    skip_keys = {'success', 'errors', 'warnings', 'operations'}
    print("\nDetails:")
    for k, v in result.items():
        if k not in skip_keys:
            if isinstance(v, (dict, list)) and len(str(v)) > 100:
                # Summarize large containers instead of dumping them
                kind = "list" if isinstance(v, list) else "dict"
                print(f"  {k}: [{kind} with {len(v)} entries]")
            else:
                print(f"  {k}: {v}")
    
    # Show operations if present
    if result.get('operations'):
        print("\nOperations:")
        for op in result['operations']:
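            ok_statuses = ('success', 'detected', 'added', 'replaced')
            status_icon = "✓" if op.get('status') in ok_statuses else "○"
            # Guard against a missing or None 'details' value before truncating
            details = str(op.get('details') or '')[:60]
            print(f"  {status_icon} {op.get('step')}: {op.get('status')} - {details}")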

print("Helper function defined")


---
## Test 1: fetch_molecules

Test fetching structures from the PDB, including the error response for an invalid ID.
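
As background for the invalid-ID test, here is a minimal sketch of validating an ID before any download. It assumes classic four-character PDB IDs (one digit followed by three alphanumeric characters) and the public RCSB file download URL pattern. `rcsb_download_url` is a hypothetical helper; `fetch_molecules` may validate differently or use another endpoint.

```python
import re

# Classic PDB IDs: one digit followed by three alphanumeric characters.
_PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def rcsb_download_url(pdb_id: str, fmt: str = "cif") -> str:
    """Build an RCSB download URL, rejecting malformed IDs up front."""
    pdb_id = pdb_id.strip().upper()
    if not _PDB_ID.match(pdb_id):
        raise ValueError(f"not a valid four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id}.{fmt}"


print(rcsb_download_url("1crn"))
```

An ID such as `XXXX` fails this format check without a network round trip, whereas a well-formed but unassigned ID can only be detected from the server's response.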


In [None]:
# Test 1.1: Fetch from PDB (small protein: 1CRN - crambin)
print("Test 1.1: Fetch 1CRN from PDB")

result = asyncio.run(fetch_molecules("1CRN", source="pdb"))
show_result(result, "Fetch 1CRN from PDB")

# Verify file exists
if result.get('success') and result.get('file_path'):
    print(f"\nFile size: {Path(result['file_path']).stat().st_size} bytes")


In [None]:
# Test 1.2: Fetch non-existent PDB ID (error handling)
print("Test 1.2: Fetch non-existent PDB ID")

result = asyncio.run(fetch_molecules("XXXX", source="pdb"))
show_result(result, "Fetch Invalid PDB ID")

# Check that error handling is LLM-friendly
assert not result['success'], "Should fail for invalid PDB ID"
assert len(result['errors']) > 0, "Should have error messages"
print("\n✓ Error handling works correctly")


---
## Test 2: inspect_molecules

Test inspecting structure files to analyze chains and molecular composition.


In [None]:
# Test 2.1: Inspect 1AKE (homodimer with ligand)
print("Test 2.1: Inspect 1AKE structure")

# First fetch 1AKE
fetch_result = asyncio.run(fetch_molecules("1AKE", source="pdb"))
if fetch_result['success']:
    result = inspect_molecules(fetch_result['file_path'])
    show_result(result, "Inspect 1AKE")
    
    # Show detailed chain information
    if result['success']:
        print("\n--- Header Information ---")
        for k, v in result.get('header', {}).items():
            print(f"  {k}: {v}")
        
        print("\n--- Entities (from header) ---")
        for entity in result.get('entities', []):
            print(f"  Entity {entity['entity_id']}: {entity.get('name') or '(no name)'}")
            print(f"    Type: {entity['entity_type']}, Polymer: {entity.get('polymer_type')}")
            print(f"    Chains: {entity['chain_ids']}")
        
        print("\n--- Chain Summary ---")
        summary = result.get('summary', {})
        print(f"  Proteins: {summary.get('num_protein_chains', 0)} chains {summary.get('protein_chain_ids', [])}")
        print(f"  Ligands: {summary.get('num_ligand_chains', 0)} chains {summary.get('ligand_chain_ids', [])}")
        print(f"  Waters: {summary.get('num_water_chains', 0)} chains {summary.get('water_chain_ids', [])}")
        print(f"  Ions: {summary.get('num_ion_chains', 0)} chains {summary.get('ion_chain_ids', [])}")
        
        print("\n--- Chains Detail ---")
        for chain in result.get('chains', []):
            print(f"  Chain {chain['chain_id']} ({chain['author_chain']}): {chain['chain_type']}")
            print(f"    Entity: {chain.get('entity_name') or chain.get('entity_id') or 'N/A'}")
            print(f"    Residues: {chain['num_residues']}, Atoms: {chain['num_atoms']}")
            if chain.get('sequence'):
                seq = chain['sequence']
                print(f"    Sequence: {seq[:50]}{'...' if len(seq) > 50 else ''}")
else:
    print("Failed to fetch 1AKE for inspect test")


In [None]:
# Test 2.2: Inspect Boltz-2 predicted structure (computational model)
print("Test 2.2: Inspect Boltz-2 predicted structure")

boltz_cif = "boltz_results_ligand/predictions/ligand/ligand_model_0.cif"
result = inspect_molecules(boltz_cif)
show_result(result, "Inspect Boltz-2 Prediction")

# Show detailed information for AI-generated structure
if result['success']:
    print("\n--- Header Information ---")
    for k, v in result.get('header', {}).items():
        print(f"  {k}: {v}")
    
    print("\n--- Entities ---")
    for entity in result.get('entities', []):
        print(f"  Entity {entity['entity_id']}: {entity.get('name') or '(no name)'}")
        print(f"    Type: {entity['entity_type']}, Polymer: {entity.get('polymer_type')}")
        print(f"    Chains: {entity['chain_ids']}")
    
    print("\n--- Chains Summary ---")
    summary = result.get('summary', {})
    print(f"  Proteins: {summary.get('num_protein_chains', 0)} chains")
    print(f"  Ligands: {summary.get('num_ligand_chains', 0)} chains")
    
    print("\n--- Chain Details ---")
    for chain in result.get('chains', []):
        print(f"  Chain {chain['chain_id']}: {chain['chain_type']}")
        print(f"    Residues: {chain['num_residues']}, Atoms: {chain['num_atoms']}")
        print(f"    Residue types: {chain['residue_names']}")


---
## Test 3: split_molecules

Test splitting multi-chain structures into individual chain files.
The `split_molecules` function uses `inspect_molecules` internally.
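
Because the split builds on the inspection result, it essentially groups chains by their `chain_type`. The sketch below shows that grouping on mock data, assuming chain records carry `chain_id` and `chain_type` keys as in the inspection output. `group_chains_by_type` is a hypothetical helper, not the server's actual implementation.

```python
from collections import defaultdict


def group_chains_by_type(chains: list[dict]) -> dict[str, list[str]]:
    """Map each chain type to the chain IDs of that type, in input order."""
    grouped = defaultdict(list)
    for chain in chains:
        grouped[chain["chain_type"]].append(chain["chain_id"])
    return dict(grouped)


# Mock records for illustration only (not real inspection output).
mock_chains = [
    {"chain_id": "A", "chain_type": "protein"},
    {"chain_id": "B", "chain_type": "protein"},
    {"chain_id": "C", "chain_type": "ligand"},
    {"chain_id": "D", "chain_type": "water"},
]
print(group_chains_by_type(mock_chains))
```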


In [None]:
# Test 3.1: Split 1AKE (homodimer with ligand)
print("Test 3.1: Split 1AKE structure")

# First fetch 1AKE (if not already available)
fetch_result = asyncio.run(fetch_molecules("1AKE", source="pdb"))
if fetch_result['success']:
    result = split_molecules(fetch_result['file_path'])
    show_result(result, "Split 1AKE")
    
    # Show chain files
    if result['success']:
        print("\nProtein files:")
        for f in result['protein_files']:
            print(f"  - {f}")
        print("\nLigand files:")
        for f in result['ligand_files']:
            print(f"  - {f}")
        if result['ion_files']:
            print("\nIon files:")
            for f in result['ion_files']:
                print(f"  - {f}")
        
        # Show chain mapping
        print("\nChain to file mapping:")
        for info in result.get('chain_file_info', []):
            print(f"  Chain {info['chain_id']} ({info['chain_type']}): {info['file']}")
else:
    print("Failed to fetch 1AKE for split test")


In [None]:
# Test 3.2: Split with chain selection
print("Test 3.2: Split 1AKE - select only chain A")

if fetch_result['success']:
    result = split_molecules(
        fetch_result['file_path'],
        select_chains=['A']
    )
    show_result(result, "Split 1AKE (Chain A only)")
    
    if result['success']:
        print(f"\nExtracted {len(result['protein_files'])} protein chain(s)")
        print(f"Output directory: {result['output_dir']}")
else:
    print("Skipped - 1AKE not available")


In [None]:
# Test 3.3: Split Boltz-2 predicted structure
print("Test 3.3: Split Boltz-2 predicted structure")

boltz_cif = "boltz_results_ligand/predictions/ligand/ligand_model_0.cif"
result = split_molecules(boltz_cif)
show_result(result, "Split Boltz-2 Prediction")

if result['success']:
    print("\nProtein files:")
    for f in result['protein_files']:
        print(f"  - {f}")
    print("\nLigand files:")
    for f in result['ligand_files']:
        print(f"  - {f}")
    if result['ion_files']:
        print("\nIon files:")
        for f in result['ion_files']:
            print(f"  - {f}")
    
    # Show chain mapping
    print("\nChain to file mapping:")
    for info in result.get('chain_file_info', []):
        print(f"  Chain {info['chain_id']} ({info['chain_type']}): {info['file']}")


---
## Test 4: clean_protein

Test protein structure cleaning with PDBFixer.


In [None]:
# Test 4.1: Clean 1CRN (crambin - has disulfide bonds)
print("Test 4.1: Clean 1CRN (crambin with disulfide bonds)")

# First fetch and split
fetch_result = asyncio.run(fetch_molecules("1CRN", source="pdb"))
if fetch_result['success']:
    split_result = split_molecules(fetch_result['file_path'])
    if split_result['success'] and split_result['protein_files']:
        protein_pdb = split_result['protein_files'][0]
        
        result = clean_protein(protein_pdb)
        show_result(result, "Clean 1CRN")
        
        # Check disulfide bonds
        if result.get('disulfide_bonds'):
            print("\nDisulfide bonds detected:")
            for bond in result['disulfide_bonds']:
                print(f"  {bond['residue1']} <-> {bond['residue2']}")
    else:
        print("Failed to split 1CRN")
else:
    print("Failed to fetch 1CRN")


In [None]:
# Test 4.2: Clean with custom options
print("Test 4.2: Clean with custom options (with termini capping)")

if fetch_result['success'] and split_result['success'] and split_result['protein_files']:
    protein_pdb = split_result['protein_files'][0]
    
    result = clean_protein(
        protein_pdb,
        cap_termini=True,
        ph=7.0
    )
    show_result(result, "Clean with Custom Options")
else:
    print("Skipped - previous test failed")


In [None]:
# Test 4.3: Clean non-existent file (error handling)
print("Test 4.3: Clean non-existent file")

result = clean_protein("/nonexistent/protein.pdb")
show_result(result, "Clean Non-existent File")

assert not result['success'], "Should fail for non-existent file"
print("\n✓ File not found error handling works")


---
## Test 5: clean_ligand

Test ligand cleaning using SMILES template matching.


In [None]:
# Test 5.1: Clean ligand from 1AKE (AP5A inhibitor)
print("Test 5.1: Clean ligand from 1AKE")

# Fetch and split 1AKE to get ligand
fetch_result = asyncio.run(fetch_molecules("1AKE", source="pdb"))
if fetch_result['success']:
    # Split to get ligand chains
    split_result = split_molecules(fetch_result['file_path'])
    
    if split_result['success'] and split_result['ligand_files']:
        # Look up the first ligand chain instead of assuming its position
        ligand_info = next(c for c in split_result['chain_file_info'] if c['chain_type'] == 'ligand')
        ligand_pdb = ligand_info['file']
        print(f"Ligand PDB: {ligand_pdb}")
        
        # Get the ligand residue name from the matching chain record
        ligand_chain = next(c for c in split_result['all_chains'] if c['chain_id'] == ligand_info['chain_id'])
        ligand_id = ligand_chain['residue_names'][0]
        print(f"Ligand ID: {ligand_id}")
        
        # Clean ligand using SMILES template matching
        result = clean_ligand(
            ligand_pdb=ligand_pdb,
            ligand_id=ligand_id,  # AP5A
            target_ph=7.4,
            optimize=True
        )
        show_result(result, "Clean 1AKE Ligand (AP5A)")
        
        if result['success']:
            print(f"\nOutput SDF: {result['sdf_file']}")
            print(f"Net charge: {result['net_charge']}")
            print(f"SMILES source: {result['smiles_source']}")
    else:
        print("No ligand files found")
else:
    print("Failed to fetch 1AKE")


In [None]:
# Test 5.2: Clean SAH ligand from Boltz-2 prediction
print("Test 5.2: Clean SAH ligand from Boltz-2 prediction")

boltz_cif = "boltz_results_ligand/predictions/ligand/ligand_model_0.cif"
split_result = split_molecules(boltz_cif)

if split_result['success'] and split_result['ligand_files']:
    # Find SAH ligand chain
    sah_file = None
    sah_chain = None
    for info in split_result['chain_file_info']:
        if info['chain_type'] == 'ligand':
            # Get residue name from all_chains
            for chain in split_result['all_chains']:
                if chain['chain_id'] == info['chain_id']:
                    if 'SAH' in chain['residue_names']:
                        sah_file = info['file']
                        sah_chain = chain
                        break
        if sah_file:
            break
    
    if sah_file:
        print(f"SAH ligand PDB: {sah_file}")
        
        result = clean_ligand(
            ligand_pdb=sah_file,
            ligand_id="SAH",
            target_ph=7.4,
            optimize=True
        )
        show_result(result, "Clean Boltz-2 SAH Ligand")
        
        if result['success']:
            print(f"\nOutput SDF: {result['sdf_file']}")
            print(f"Net charge: {result['net_charge']}")
    else:
        print("SAH ligand not found in Boltz-2 structure")
else:
    print("Failed to split Boltz-2 structure")


In [None]:
# Test 5.3: Clean ligand with user-provided SMILES
print("Test 5.3: Clean ligand with user-provided SMILES")

# Use Boltz-2 SAH ligand with explicit SMILES
boltz_cif = "boltz_results_ligand/predictions/ligand/ligand_model_0.cif"
split_result = split_molecules(boltz_cif)

if split_result['success'] and split_result['ligand_files']:
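    # Look up the first ligand chain instead of assuming its position
    ligand_info = next(c for c in split_result['chain_file_info'] if c['chain_type'] == 'ligand')
    ligand_file = ligand_info['file']
    ligand_chain = next(c for c in split_result['all_chains'] if c['chain_id'] == ligand_info['chain_id'])
    ligand_name = ligand_chain['residue_names'][0]  # Expected to be SAH for the SMILES below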
    
    print(f"Ligand: {ligand_name}")
    print(f"File: {ligand_file}")
    
    # SAH SMILES from PDB CCD
    sah_smiles = "Nc1ncnc2c1ncn2[C@@H]1O[C@H](CSCC[C@H](N)C(=O)O)[C@@H](O)[C@H]1O"
    
    result = clean_ligand(
        ligand_pdb=ligand_file,
        ligand_id=ligand_name,
        smiles=sah_smiles,  # User-provided SMILES
        target_ph=7.4,
        optimize=False  # Skip optimization for speed
    )
    show_result(result, "Clean Ligand with User SMILES")
    
    if result['success']:
        print(f"\nSMILES source: {result['smiles_source']}")  # Should be 'user'
else:
    print("Failed to get ligand from Boltz-2")


---
## Test 6: run_antechamber_robust

Test GAFF2 parameterization with AM1-BCC charges.
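
For orientation, the sketch below builds the kind of AmberTools command lines such a parameterization typically involves. It only constructs argument lists and runs nothing. The helper names are hypothetical, and `run_antechamber_robust` may assemble its commands differently.

```python
def antechamber_cmd(ligand_file: str, out_mol2: str, net_charge: int,
                    residue_name: str, in_format: str = "sdf") -> list[str]:
    """Argument list for an antechamber run with GAFF2 types and AM1-BCC charges."""
    return [
        "antechamber",
        "-i", ligand_file, "-fi", in_format,
        "-o", out_mol2, "-fo", "mol2",
        "-c", "bcc",              # AM1-BCC charge method
        "-nc", str(net_charge),   # net molecular charge
        "-at", "gaff2",           # GAFF2 atom types
        "-rn", residue_name,      # residue name written to the MOL2
    ]


def parmchk2_cmd(mol2_file: str, out_frcmod: str) -> list[str]:
    """Argument list for parmchk2, which writes missing parameters to a frcmod."""
    return ["parmchk2", "-i", mol2_file, "-f", "mol2",
            "-o", out_frcmod, "-s", "gaff2"]


print(" ".join(antechamber_cmd("SAH.sdf", "SAH.mol2", 0, "SAH")))
print(" ".join(parmchk2_cmd("SAH.mol2", "SAH.frcmod")))
```

Keeping the commands as lists makes them suitable for `subprocess.run` without shell quoting.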


In [None]:
# Test 6.1: Run antechamber on cleaned SAH ligand
print("Test 6.1: Run antechamber on SAH ligand (GAFF2 + AM1-BCC)")

# First clean the SAH ligand
boltz_cif = "boltz_results_ligand/predictions/ligand/ligand_model_0.cif"
split_result = split_molecules(boltz_cif)

if split_result['success'] and split_result['ligand_files']:
    # Find SAH ligand
    sah_file = None
    for info in split_result['chain_file_info']:
        if info['chain_type'] == 'ligand':
            for chain in split_result['all_chains']:
                if chain['chain_id'] == info['chain_id'] and 'SAH' in chain['residue_names']:
                    sah_file = info['file']
                    break
        if sah_file:
            break
    
    if sah_file:
        # Clean ligand first
        clean_result = clean_ligand(
            ligand_pdb=sah_file,
            ligand_id="SAH",
            target_ph=7.4
        )
        
        if clean_result['success']:
            print(f"Cleaned SDF: {clean_result['sdf_file']}")
            print(f"Net charge: {clean_result['net_charge']}")
            
            # Run antechamber
            result = run_antechamber_robust(
                ligand_file=clean_result['sdf_file'],
                net_charge=clean_result['net_charge'],
                residue_name="SAH"
            )
            show_result(result, "Antechamber SAH")
            
            if result['success']:
                print("\nGenerated files:")
                print(f"  MOL2: {result['mol2']}")
                print(f"  FRCMOD: {result['frcmod']}")
                print(f"  Total charge: {result['total_charge']:.4f}")
                
                # Check frcmod validation
                if result['frcmod_validation']:
                    if result['frcmod_validation']['valid']:
                        print("  frcmod: ✓ Valid")
                    else:
                        print(f"  frcmod: ✗ {result['frcmod_validation']['attn_count']} parameters need attention")
        else:
            print(f"Clean failed: {clean_result['errors']}")
    else:
        print("SAH not found")
else:
    print("Failed to split structure")


In [None]:
# Test 6.2: Run antechamber with auto charge estimation
print("Test 6.2: Run antechamber with auto charge estimation")

# Reuse the cleaned SDF from Test 6.1, but let the tool estimate the net charge
if 'clean_result' in globals() and clean_result['success']:
    result = run_antechamber_robust(
        ligand_file=clean_result['sdf_file'],
        net_charge=None,  # Auto-estimate
        residue_name="SAH",
        charge_method="bcc",
        atom_type="gaff2"
    )
    show_result(result, "Antechamber (Auto Charge)")
    
    if result['success']:
        print("\nCharge estimation:")
        if result['charge_estimation']:
            print(f"  Estimated: {result['charge_estimation'].get('estimated_charge_at_ph')}")
            print(f"  Confidence: {result['charge_estimation'].get('confidence')}")
        print(f"  Charge used: {result['charge_used']}")
else:
    print("Skipped - clean_result not available")


---
## Test 7: Integration Test

Test the complete workflow on the Boltz-2 prediction with a single `prepare_complex` call: split -> clean_protein + clean_ligand -> antechamber


In [None]:
# Test 7.1: Complete workflow using prepare_complex
print("Test 7.1: Complete Boltz-2 workflow using prepare_complex")
print("="*60)

boltz_cif = "boltz_results_ligand/predictions/ligand/ligand_model_0.cif"

# Run complete workflow with a single function call
result = prepare_complex(
    structure_file=boltz_cif,
    ph=7.4,
    cap_termini=False,
    process_proteins=True,
    process_ligands=True,
    run_parameterization=True,
    optimize_ligands=True,
    # Optional: provide SMILES for specific ligands
    # ligand_smiles={"SAH": "Nc1ncnc2c1ncn2[C@@H]1O[C@H](CSCC[C@H](N)C(=O)O)[C@@H](O)[C@H]1O"}
)

show_result(result, "prepare_complex Result")

if result['success']:
    print("\n--- Summary ---")
    print(f"Output directory: {result['output_dir']}")
    
    # Show inspection summary
    if result['inspection']:
        summary = result['inspection'].get('summary', {})
        print(f"\nStructure: {summary.get('num_protein_chains', 0)} proteins, "
              f"{summary.get('num_ligand_chains', 0)} ligands")
    
    # Show processed proteins
    print(f"\nProteins processed: {len(result['proteins'])}")
    for p in result['proteins']:
        status = "✓" if p['success'] else "✗"
        print(f"  {status} Chain {p['chain_id']}: {p.get('output_file', 'N/A')}")
        if p['success'] and p.get('statistics'):
            print(f"      Atoms: {p['statistics'].get('final_atoms', 'N/A')}")
    
    # Show processed ligands
    print(f"\nLigands processed: {len(result['ligands'])}")
    for lig in result['ligands']:
        status = "✓" if lig['success'] else "✗"
        print(f"  {status} {lig['ligand_id']} (Chain {lig['chain_id']})")
        if lig['success']:
            print(f"      SDF: {lig.get('sdf_file', 'N/A')}")
            print(f"      MOL2: {lig.get('mol2_file', 'N/A')}")
            print(f"      Charge: {lig.get('net_charge', 'N/A')}")

print("\n" + "="*60)
print("Workflow complete!")


---
## Summary

This notebook tested all tools in `structure_server.py`:

| Tool | Tests | Purpose |
|------|-------|---------|
| `fetch_molecules` | 2 | Download structures from PDB |
| `inspect_molecules` | 2 | Inspect structure files and analyze chains/molecules |
| `split_molecules` | 3 | Split multi-chain structures (including Boltz-2) |
| `clean_protein` | 3 | Clean and prepare proteins for MD |
| `clean_ligand` | 3 | Clean ligands using SMILES template matching |
| `run_antechamber_robust` | 2 | GAFF2 parameterization with AM1-BCC charges |
| `prepare_complex` | 1 | Complete workflow (split + clean + parameterize) |

### Key Features Tested:
- **LLM-friendly error handling**: All tools return structured `success`/`errors`/`warnings` fields
- **SMILES template matching**: Correct bond orders from CCD or user-provided SMILES
- **pH-dependent protonation**: Dimorphite-DL for correct protonation state
- **GAFF2 parameterization**: AM1-BCC charges with robust error handling
- **frcmod validation**: Check for missing/estimated parameters
- **Boltz-2 support**: Full workflow for AI-predicted protein-ligand complexes
- **One-step workflow**: `prepare_complex` combines all steps for convenience
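
To illustrate the frcmod validation item, here is a minimal sketch that counts the lines `parmchk2` flags with `ATTN` (parameters it could not assign reliably). It assumes that `attn_count` reflects such lines. The function name and the sample text are made up for illustration.

```python
def validate_frcmod_text(frcmod_text: str) -> dict:
    """Count lines flagged with ATTN; the file is 'valid' when none are present."""
    attn_count = sum(1 for line in frcmod_text.splitlines() if "ATTN" in line)
    return {"valid": attn_count == 0, "attn_count": attn_count}


# Made-up frcmod fragment, not output from a real run.
sample = """\
Remark line goes here
MASS

BOND

ANGLE
ca-c3-ss   0.000   0.000   ATTN, need revision

DIHE
"""
print(validate_frcmod_text(sample))
```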


In [None]:
print("All tests completed!")
