# 6W4H NSP10-NSP16 Interface Analysis

**Date:** January 28, 2025  
**Target:** NSP10-NSP16 methyltransferase complex  
**PDB ID:** 6W4H  
**Reference:** Trepte et al. (2024) Molecular Systems Biology 20:428-457

---

## Project Structure

```
PROJECT/
├── data/
│   └── structures/pdb/
│       └── 6W4H.pdb
├── notebooks/
│   └── 02_analyze_6W4H_interface.ipynb  ← YOU ARE HERE
```

---

## Objectives

1. ✅ Visualize 6W4H NSP10-NSP16 structure in 3D
2. ✅ Locate and visualize hot spot residues (NSP10 Lys93, NSP16 Asp106)
3. ✅ Identify interface residues within 10 Å of Lys93
4. ✅ Define docking grid box center for virtual screening
5. ✅ Export data for next steps (fpocket, docking)

---

## Key Findings from Trepte et al. 2024

**Validated Hot Spots:**
- **NSP10 Lys93:** Critical hot spot (lowest ΔG contribution)
- **NSP16 Asp106:** Critical hot spot (forms salt bridge with Lys93)
- **K93E mutation:** Abolishes NSP10-NSP16 binding
- **D106K mutation:** Abolishes NSP10-NSP16 binding

**Virtual Screening Details:**
- Docking box centered on NSP10 Lys93
- Original box size: 75.6 × 16.8 × 17.6 Å (asymmetric)
- Screened ~350M compounds (Enamine REAL)
- Top hit (Compound 459): Kd 12.97 µM, IC50 9.2 µM (PPI disruption)

---

## Section 1: Setup and Import Libraries

In [None]:
# Suppress nglview deprecation warning (harmless)
import warnings
warnings.filterwarnings('ignore', category=UserWarning, module='nglview')

# Import required libraries
import nglview as nv
from Bio import PDB
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import json
import os
from datetime import datetime

# Configure paths (notebook is in notebooks/ directory)
PDB_FILE = '../data/structures/pdb/6W4H.pdb'
RESULTS_DIR = '../data/analysis_results'

# Create results directory
os.makedirs(RESULTS_DIR, exist_ok=True)

# Verify PDB file exists
if not os.path.exists(PDB_FILE):
    raise FileNotFoundError(f"6W4H.pdb not found at {PDB_FILE}")

print("="*60)
print("6W4H NSP10-NSP16 INTERFACE ANALYSIS")
print("="*60)
print(f"Analysis date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Working directory: {os.getcwd()}")
print(f"PDB file: {os.path.abspath(PDB_FILE)}")
print(f"Results directory: {os.path.abspath(RESULTS_DIR)}")
print(f"nglview version: {nv.__version__}")
print("="*60)
print("✓ All packages loaded successfully!")
print("✓ Paths configured correctly!")
print()

## Section 2: Load and Visualize 6W4H Structure

**Expected:**
- 2 chains: NSP10 (shorter, ~139 residues) and NSP16 (longer, ~298 residues)
- Complex structure with interface visible

In [None]:
# Load structure with nglview
view = nv.show_file(PDB_FILE)

# Clear default representation
view.clear_representations()

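# Add cartoon representation
# Assumption: chain A = NSP10 (blue) and chain B = NSP16 (salmon).
# Section 3 derives the assignment from chain lengths; swap the selections if it differs.
# NGL expects CSS color names, so 'royalblue' stands in for PyMOL's 'marine'.
view.add_cartoon('protein and :A', color='royalblue', opacity=0.8)
view.add_cartoon('protein and :B', color='salmon', opacity=0.8)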

# Add labels
view.add_label(':A', labelType='text', labelText='NSP10', 
               color='white', fontsize=16, backgroundColor='blue', backgroundOpacity=0.7)
view.add_label(':B', labelType='text', labelText='NSP16', 
               color='white', fontsize=16, backgroundColor='red', backgroundOpacity=0.7)

# Center view
view.center()

print("✓ 6W4H structure loaded")
print("  Blue = NSP10")
print("  Salmon = NSP16")
print()

view

## Section 3: Parse Structure and Identify Chains

Determine which chain is NSP10 (shorter) and which is NSP16 (longer).

In [None]:
# Parse PDB structure
parser = PDB.PDBParser(QUIET=True)
structure = parser.get_structure('6W4H', PDB_FILE)
model = structure[0]

print("="*60)
print("STRUCTURE INFORMATION")
print("="*60)
print()

# Get chain information
chain_lengths = {}
for chain in model:
    residues = [r for r in chain.get_residues() if r.id[0] == ' ']
    chain_lengths[chain.id] = len(residues)
    print(f"Chain {chain.id}: {len(residues)} standard residues")

print()

# Identify NSP10 (shorter) and NSP16 (longer)
# NSP10 ~139 residues, NSP16 ~298 residues
nsp10_chain = min(chain_lengths, key=chain_lengths.get)
nsp16_chain = max(chain_lengths, key=chain_lengths.get)

print("Chain assignments:")
print(f"  ✓ Chain {nsp10_chain} = NSP10 ({chain_lengths[nsp10_chain]} residues)")
print(f"  ✓ Chain {nsp16_chain} = NSP16 ({chain_lengths[nsp16_chain]} residues)")
print()
print("="*60)
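
The length-based assignment above always returns an answer, even when the file holds more than two chains or chains of unexpected size. The following is a minimal sketch of a stricter check, not part of the original analysis: `assign_chains`, the expected lengths (taken from the approximate values quoted in this notebook), and the `tolerance` value are illustrative assumptions.

```python
def assign_chains(chain_lengths, expected_nsp10=139, expected_nsp16=298, tolerance=30):
    """Return (nsp10_chain, nsp16_chain) from a {chain_id: residue_count} dict.

    The expected lengths and the tolerance are assumptions for illustration;
    resolved residue counts in a crystal structure are often a little shorter
    than the full-length protein.
    """
    if len(chain_lengths) != 2:
        raise ValueError(
            f"Expected exactly 2 protein chains, found {len(chain_lengths)}: "
            f"{sorted(chain_lengths)}"
        )
    nsp10 = min(chain_lengths, key=chain_lengths.get)
    nsp16 = max(chain_lengths, key=chain_lengths.get)
    if nsp10 == nsp16:
        raise ValueError("Both chains have the same length; cannot tell them apart")
    for label, chain_id, expected in (("NSP10", nsp10, expected_nsp10),
                                      ("NSP16", nsp16, expected_nsp16)):
        if abs(chain_lengths[chain_id] - expected) > tolerance:
            raise ValueError(
                f"Chain {chain_id} has {chain_lengths[chain_id]} residues, "
                f"which is far from the ~{expected} expected for {label}"
            )
    return nsp10, nsp16


# Toy input for illustration only (not read from the real file)
print(assign_chains({"A": 298, "B": 120}))
```

In the notebook, the same function could be called with the `chain_lengths` dictionary built in this section.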

## Section 4: Locate and Visualize Hot Spot Residues

**Critical residues from Trepte et al. 2024:**
- NSP10 **Lys93** (K93) - Primary hot spot
- NSP16 **Asp106** (D106) - Primary hot spot
- Form salt bridge interaction

In [None]:
# Create new view with hot spots highlighted
view2 = nv.show_file(PDB_FILE)
view2.clear_representations()

# Proteins as semi-transparent cartoon
view2.add_cartoon(f'protein and :{nsp10_chain}', color='lightblue', opacity=0.6)
view2.add_cartoon(f'protein and :{nsp16_chain}', color='lightsalmon', opacity=0.6)

# Hot spot residues as large spheres
# NSP10 Lys93 (RED)
view2.add_ball_and_stick(f'93:{nsp10_chain}', color='red', radius=2.5)
view2.add_label(f'93:{nsp10_chain} and .CA', labelType='text', 
                labelText='NSP10 Lys93\nHOT SPOT', 
                color='red', fontsize=14, backgroundColor='white', 
                backgroundOpacity=0.8)

# NSP16 Asp106 (BLUE)
view2.add_ball_and_stick(f'106:{nsp16_chain}', color='blue', radius=2.5)
view2.add_label(f'106:{nsp16_chain} and .CA', labelType='text', 
                labelText='NSP16 Asp106\nHOT SPOT', 
                color='blue', fontsize=14, backgroundColor='white',
                backgroundOpacity=0.8)

# Draw distance line between hot spots
# nglview's add_distance takes the atom pairs through the atomPair keyword
view2.add_distance(atomPair=[[f'93:{nsp10_chain} and .CA', f'106:{nsp16_chain} and .CA']],
                   color='magenta', labelColor='magenta')

# Center on hot spots
view2.center(f'93:{nsp10_chain} or 106:{nsp16_chain}')
view2.camera = 'orthographic'

print("="*60)
print("HOT SPOT VISUALIZATION")
print("="*60)
print(f"✓ NSP10 Lys93 (Chain {nsp10_chain}) - RED spheres")
print(f"✓ NSP16 Asp106 (Chain {nsp16_chain}) - BLUE spheres")
print(f"✓ Distance shown in MAGENTA")
print("="*60)
print()

view2
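
Residue numbers in a deposited structure do not always match the numbering used in a paper (for example, viral proteins are sometimes numbered along the polyprotein). Selecting "residue 93" or "residue 106" is therefore only safe once the residue name has been confirmed. The sketch below shows one way to do that check; `find_residue` is a hypothetical helper, and it works on a plain `{(chain_id, number): residue_name}` dictionary so that it runs without the structure file. With Biopython, such a dictionary could be built from `residue.get_parent().id`, `residue.id[1]` and `residue.get_resname()`.

```python
def find_residue(residues, chain_id, number, expected_name):
    """Return (chain_id, number) if that residue exists and has the expected name.

    `residues` maps (chain_id, residue_number) -> three-letter residue name.
    On a mismatch, the error lists where residues of the expected type do occur
    in the chain, which helps to spot a numbering offset.
    """
    key = (chain_id, number)
    if residues.get(key) == expected_name:
        return key
    candidates = sorted(
        num for (ch, num), name in residues.items()
        if ch == chain_id and name == expected_name
    )
    found = residues.get(key, "nothing")
    raise LookupError(
        f"Expected {expected_name} at {chain_id}:{number}, found {found}. "
        f"{expected_name} residues in chain {chain_id}: {candidates}"
    )


# Toy data for illustration only
toy = {("A", 92): "GLY", ("A", 93): "LYS", ("B", 106): "ASP"}
print(find_residue(toy, "A", 93, "LYS"))
```

Running the check before visualization avoids highlighting the wrong residue without noticing.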

## Section 5: Extract Hot Spot Coordinates

In [None]:
print("="*60)
print("HOT SPOT RESIDUE COORDINATES")
print("="*60)
print()

# Get Lys93 coordinates
lys93 = model[nsp10_chain][93]
print(f"NSP10 Lys93 (Chain {nsp10_chain}, residue 93):")
print(f"  Residue name: {lys93.get_resname()}")

if 'CA' in lys93:
    lys93_ca = lys93['CA'].get_coord()
    print(f"  CA coordinates: X={lys93_ca[0]:7.3f}, Y={lys93_ca[1]:7.3f}, Z={lys93_ca[2]:7.3f}")
else:
    raise ValueError("CA atom not found in Lys93")

# Calculate the geometric center (unweighted mean of all atom coordinates)
atoms = list(lys93.get_atoms())
coords = np.array([atom.get_coord() for atom in atoms])
center_lys93 = coords.mean(axis=0)
print(f"  Geometric center: X={center_lys93[0]:7.3f}, Y={center_lys93[1]:7.3f}, Z={center_lys93[2]:7.3f}")
print(f"  Number of atoms: {len(atoms)}")
print()

# Get Asp106 coordinates
asp106 = model[nsp16_chain][106]
print(f"NSP16 Asp106 (Chain {nsp16_chain}, residue 106):")
print(f"  Residue name: {asp106.get_resname()}")

if 'CA' in asp106:
    asp106_ca = asp106['CA'].get_coord()
    print(f"  CA coordinates: X={asp106_ca[0]:7.3f}, Y={asp106_ca[1]:7.3f}, Z={asp106_ca[2]:7.3f}")
else:
    raise ValueError("CA atom not found in Asp106")

# Calculate the geometric center (unweighted mean of all atom coordinates)
atoms = list(asp106.get_atoms())
coords = np.array([atom.get_coord() for atom in atoms])
center_asp106 = coords.mean(axis=0)
print(f"  Geometric center: X={center_asp106[0]:7.3f}, Y={center_asp106[1]:7.3f}, Z={center_asp106[2]:7.3f}")
print(f"  Number of atoms: {len(atoms)}")
print()

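# Calculate distances between hot spots
# CA-CA distance measures backbone separation; a salt bridge is judged on the
# side chains instead (Lys NZ to Asp OD1/OD2, N-O distance typically <= 4.0 Å)
distance = np.linalg.norm(lys93_ca - asp106_ca)
no_distance = min(
    np.linalg.norm(lys93['NZ'].get_coord() - asp106[name].get_coord())
    for name in ('OD1', 'OD2')
)

print("="*60)
print("HOT SPOT INTERACTION ANALYSIS")
print("="*60)
print(f"Distance between Lys93 CA and Asp106 CA: {distance:.2f} Å")
print(f"Closest Lys93 NZ to Asp106 OD1/OD2 distance: {no_distance:.2f} Å")
print()

# The 6.0 Å upper bound for the middle tier is an illustrative choice
if no_distance <= 4.0:
    print("✓ CLOSE proximity detected")
    print("  → Salt bridge LIKELY (N-O distance within 4.0 Å)")
    print("  → Strong electrostatic interaction expected")
    interaction_type = 'Salt bridge (likely)'
elif no_distance <= 6.0:
    print("✓ MODERATE proximity detected")
    print("  → Direct salt bridge unlikely at this distance")
    print("  → Water-mediated interaction possible")
    interaction_type = 'Water-mediated (possible)'
else:
    print("⚠ DISTANT residues detected")
    print("  → Check chain assignments")
    print("  → May need to verify structure")
    interaction_type = 'Distant'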

print("="*60)
print()
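
The per-residue center computed in this section is the unweighted mean of the atom coordinates, which is the geometric center (centroid). A true center of mass weights each atom by its mass. For a single residue the two points are close, but they are not identical. The sketch below illustrates the difference on a made-up two-atom example; the function names and the toy coordinates are assumptions for illustration, and only standard atomic masses are used.

```python
# Standard atomic masses (u) for the heavy atoms found in proteins
ATOMIC_MASS = {"C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def centroid(atoms):
    """Unweighted mean position of (element, (x, y, z)) tuples."""
    n = len(atoms)
    return tuple(sum(xyz[i] for _, xyz in atoms) / n for i in range(3))


def center_of_mass(atoms):
    """Mass-weighted mean position of (element, (x, y, z)) tuples."""
    total = sum(ATOMIC_MASS[el] for el, _ in atoms)
    return tuple(
        sum(ATOMIC_MASS[el] * xyz[i] for el, xyz in atoms) / total
        for i in range(3)
    )


# Toy example: a carbon at the origin and an oxygen 2 Å away along x
toy_atoms = [("C", (0.0, 0.0, 0.0)), ("O", (2.0, 0.0, 0.0))]
print("centroid:      ", centroid(toy_atoms))
print("center of mass:", center_of_mass(toy_atoms))
```

The centroid sits midway between the two atoms, while the center of mass is pulled toward the heavier oxygen. For placing a docking box either point is usually adequate, as long as the label matches what was computed.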

## Section 6: Identify Interface Residues (within 10 Å of Lys93)

Find all residues within 10 Å of NSP10 Lys93 to define the interface region.

In [None]:
print("="*60)
print("INTERFACE RESIDUE IDENTIFICATION")
print("="*60)
print(f"Searching for residues within 10 Å of Lys93 (Chain {nsp10_chain})...")
print()

nearby_residues = []

for chain in model:
    for residue in chain:
        if residue.id[0] == ' ':  # Standard residue (not heteroatom)
            if 'CA' in residue:
                ca = residue['CA'].get_coord()
                dist = np.linalg.norm(ca - lys93_ca)
                
                if dist <= 10.0:
                    nearby_residues.append({
                        'Chain': chain.id,
                        'Residue': residue.get_resname(),
                        'Number': residue.id[1],
                        'Distance (Å)': dist
                    })

# Create DataFrame
df_interface = pd.DataFrame(nearby_residues)
df_interface = df_interface.sort_values('Distance (Å)').reset_index(drop=True)

print(f"✓ Found {len(df_interface)} residues within 10 Å of Lys93")
print()
print(df_interface.to_string(index=False))
print()

# Separate by chain
nsp10_interface = df_interface[df_interface['Chain'] == nsp10_chain]
nsp16_interface = df_interface[df_interface['Chain'] == nsp16_chain]

print("="*60)
print("INTERFACE SUMMARY")
print("="*60)
print(f"NSP10 (Chain {nsp10_chain}): {len(nsp10_interface)} residues")
print(f"NSP16 (Chain {nsp16_chain}): {len(nsp16_interface)} residues")
print(f"Total interface residues: {len(df_interface)}")
print("="*60)
print()
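
The search above compares Cα atoms only, so a residue whose backbone lies just outside the cutoff is skipped even when its side chain reaches toward Lys93. An alternative criterion is the closest distance from any atom of the residue. The sketch below shows that variant on toy data; `residues_within` and the input layout (a list of dictionaries with an `atoms` mapping) are assumptions for illustration, not the notebook's data structures. For real structures, Biopython's `Bio.PDB.NeighborSearch` is a faster option.

```python
import math


def residues_within(center, residues, cutoff=10.0):
    """Return residues with at least one atom within `cutoff` Å of `center`.

    Each residue is a dict with 'chain', 'name', 'number' and
    'atoms' ({atom_name: (x, y, z)}). Results are sorted by distance.
    """
    hits = []
    for res in residues:
        closest = min(math.dist(center, xyz) for xyz in res["atoms"].values())
        if closest <= cutoff:
            hits.append((res["chain"], res["name"], res["number"], round(closest, 2)))
    return sorted(hits, key=lambda hit: hit[3])


# Toy residues for illustration only
toy_residues = [
    {"chain": "B", "name": "ASP", "number": 106,
     "atoms": {"CA": (12.0, 0.0, 0.0), "OD1": (8.0, 0.0, 0.0)}},
    {"chain": "B", "name": "GLY", "number": 200,
     "atoms": {"CA": (20.0, 0.0, 0.0)}},
]
print(residues_within((0.0, 0.0, 0.0), toy_residues))
```

In this toy case the aspartate is reported even though its Cα is 12 Å away, because its side-chain oxygen lies within the 10 Å cutoff.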

## Section 7: Visualize Complete Interface

Show all interface residues with hot spots highlighted.

In [None]:
# Create comprehensive interface view
view3 = nv.show_file(PDB_FILE)
view3.clear_representations()

# Proteins as very transparent cartoon (background)
view3.add_cartoon('protein', color='gray', opacity=0.2)

# Hot spots as large spheres (PRIMARY FOCUS)
view3.add_spacefill(f'93:{nsp10_chain}', color='red', radius=3.0)
view3.add_spacefill(f'106:{nsp16_chain}', color='blue', radius=3.0)

# Interface residues as sticks
nsp10_nums = nsp10_interface['Number'].tolist()
nsp16_nums = nsp16_interface['Number'].tolist()

if nsp10_nums:
    nsp10_selection = ' or '.join([f'{num}:{nsp10_chain}' for num in nsp10_nums])
    view3.add_licorice(nsp10_selection, color='yellow')

if nsp16_nums:
    nsp16_selection = ' or '.join([f'{num}:{nsp16_chain}' for num in nsp16_nums])
    view3.add_licorice(nsp16_selection, color='cyan')

# Distance line between hot spots
# nglview's add_distance takes the atom pairs through the atomPair keyword
view3.add_distance(atomPair=[[f'93:{nsp10_chain} and .CA', f'106:{nsp16_chain} and .CA']],
                   color='white', labelColor='white', labelSize=2.0)

# Labels
view3.add_label(f'93:{nsp10_chain} and .CA', labelType='text', 
                labelText=f'Lys93\n{distance:.1f}Å', 
                color='red', fontsize=16, backgroundColor='black', backgroundOpacity=0.7)
view3.add_label(f'106:{nsp16_chain} and .CA', labelType='text', 
                labelText='Asp106', 
                color='blue', fontsize=16, backgroundColor='black', backgroundOpacity=0.7)

# Center on interface
view3.center(f'93:{nsp10_chain}')
view3.camera = 'orthographic'

print("="*60)
print("COMPLETE INTERFACE VISUALIZATION")
print("="*60)
print("Color scheme:")
print(f"  🔴 RED sphere   = NSP10 Lys93 (hot spot)")
print(f"  🔵 BLUE sphere  = NSP16 Asp106 (hot spot)")
print(f"  🟡 YELLOW sticks = NSP10 interface residues ({len(nsp10_interface)})")
print(f"  🔷 CYAN sticks   = NSP16 interface residues ({len(nsp16_interface)})")
print(f"  ⚪ WHITE line    = Distance ({distance:.2f} Å)")
print("="*60)
print()

view3

## Section 8: Define Docking Grid Box

Calculate grid box parameters for AutoDock Vina docking (Week 5+).

In [None]:
print("="*60)
print("DOCKING GRID BOX PARAMETERS")
print("="*60)
print()

print("Grid box center (Lys93 geometric center):")
print(f"  center_x = {center_lys93[0]:8.3f}")
print(f"  center_y = {center_lys93[1]:8.3f}")
print(f"  center_z = {center_lys93[2]:8.3f}")
print()

print("Suggested grid box size (cubic):")
print(f"  size_x = 25.0")
print(f"  size_y = 25.0")
print(f"  size_z = 25.0")
print()

print("Reference from Trepte et al. 2024:")
print("  Original box: 75.6 × 16.8 × 17.6 Å (asymmetric)")
print("  Screened: ~350M compounds (Enamine REAL)")
print("  Top hits: ~-8.5 kcal/mol")
print()
print("Recommendation:")
print("  Start with 25 Å cubic box")
print("  Expand to 30-35 Å if needed")
print("  Consider asymmetric if initial results poor")
print()

print("-"*60)
print("COPY THIS FOR AUTODOCK VINA CONFIG FILE:")
print("-"*60)
print(f"center_x = {center_lys93[0]:.3f}")
print(f"center_y = {center_lys93[1]:.3f}")
print(f"center_z = {center_lys93[2]:.3f}")
print("size_x = 25.0")
print("size_y = 25.0")
print("size_z = 25.0")
print("="*60)
print()

# Save to dictionary for export
grid_params = {
    'pdb_id': '6W4H',
    'target': 'NSP10-NSP16',
    'hot_spot': f'NSP10 Lys93 (Chain {nsp10_chain})',
    'center_x': float(center_lys93[0]),
    'center_y': float(center_lys93[1]),
    'center_z': float(center_lys93[2]),
    'size_x': 25.0,
    'size_y': 25.0,
    'size_z': 25.0,
    'reference': 'Trepte et al. 2024',
    'notes': 'Centered on validated hot spot NSP10 Lys93'
}

print("✓ Grid parameters saved to 'grid_params' dictionary")
print()
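
In a Vina-style box, `size_x`, `size_y` and `size_z` are full edge lengths centered on `center_x`, `center_y` and `center_z`, so the box extends half the size in each direction. The sketch below compares the volume of the 25 Å cubic box with the asymmetric box reported by Trepte et al. and shows a simple point-in-box test. The helper names and the test points are assumptions for illustration.

```python
def box_volume(size):
    """Volume in Å^3 of a box with edge lengths (size_x, size_y, size_z)."""
    return size[0] * size[1] * size[2]


def inside_box(point, center, size):
    """True if `point` lies within the box centered at `center`."""
    return all(abs(p - c) <= s / 2 for p, c, s in zip(point, center, size))


cubic = (25.0, 25.0, 25.0)
asymmetric = (75.6, 16.8, 17.6)  # box dimensions quoted from Trepte et al. 2024

print(f"Cubic box volume:      {box_volume(cubic):.1f} Å^3")
print(f"Asymmetric box volume: {box_volume(asymmetric):.1f} Å^3")

# Toy points relative to a box centered at the origin
print(inside_box((12.0, 0.0, 0.0), (0.0, 0.0, 0.0), cubic))
print(inside_box((30.0, 0.0, 0.0), (0.0, 0.0, 0.0), cubic))
print(inside_box((30.0, 0.0, 0.0), (0.0, 0.0, 0.0), asymmetric))
```

A point 30 Å from the center along x falls outside the cubic box but inside the elongated one. The same test can be used to check whether the interface residues identified earlier fit in the chosen box before expanding it.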

## Section 9: Analysis Summary Table

In [None]:
summary_data = {
    'Property': [
        'PDB ID',
        'Complex',
        'Analysis Date',
        'NSP10 Chain',
        'NSP16 Chain',
        'NSP10 Length',
        'NSP16 Length',
        'NSP10 Hot Spot',
        'NSP16 Hot Spot',
        'Hot Spot Distance',
        'Interaction Type',
        'Total Interface Residues (10 Å)',
        'NSP10 Interface Residues',
        'NSP16 Interface Residues',
        'Grid Box Center X',
        'Grid Box Center Y',
        'Grid Box Center Z',
        'Grid Box Size',
        'Validation Status',
        'Reference'
    ],
    'Value': [
        '6W4H',
        'NSP10-NSP16 (2\'-O-methyltransferase)',
        datetime.now().strftime('%Y-%m-%d'),
        nsp10_chain,
        nsp16_chain,
        f'{chain_lengths[nsp10_chain]} residues',
        f'{chain_lengths[nsp16_chain]} residues',
        f'Lys93 (K93) - Chain {nsp10_chain}',
        f'Asp106 (D106) - Chain {nsp16_chain}',
        f'{distance:.2f} Å',
        interaction_type,
        f'{len(df_interface)} residues',
        f'{len(nsp10_interface)} residues',
        f'{len(nsp16_interface)} residues',
        f'{center_lys93[0]:.3f} Å',
        f'{center_lys93[1]:.3f} Å',
        f'{center_lys93[2]:.3f} Å',
        '25 × 25 × 25 Å³',
        '✓ Validated (Trepte et al. 2024)',
        'Trepte et al. (2024) Mol Syst Biol'
    ]
}

df_summary = pd.DataFrame(summary_data)

print("="*60)
print("6W4H NSP10-NSP16 ANALYSIS SUMMARY")
print("="*60)
print()
print(df_summary.to_string(index=False))
print()
print("="*60)

## Section 10: Export Results

Save analysis results for use in subsequent steps.

In [None]:
print("="*60)
print("EXPORTING RESULTS")
print("="*60)
print()

# 1. Save interface residues to CSV
csv_path = f'{RESULTS_DIR}/6W4H_interface_residues.csv'
df_interface.to_csv(csv_path, index=False)
print(f"✓ Interface residues saved to:")
print(f"  {os.path.abspath(csv_path)}")
print()

# 2. Save grid parameters to JSON
json_path = f'{RESULTS_DIR}/6W4H_grid_params.json'
with open(json_path, 'w') as f:
    json.dump(grid_params, f, indent=2)
print(f"✓ Grid parameters saved to:")
print(f"  {os.path.abspath(json_path)}")
print()

# 3. Save summary table to CSV
summary_path = f'{RESULTS_DIR}/6W4H_analysis_summary.csv'
df_summary.to_csv(summary_path, index=False)
print(f"✓ Analysis summary saved to:")
print(f"  {os.path.abspath(summary_path)}")
print()

# 4. Create AutoDock Vina config file
vina_config_path = f'{RESULTS_DIR}/6W4H_vina_config.txt'
with open(vina_config_path, 'w') as f:
    f.write("# AutoDock Vina Configuration for 6W4H NSP10-NSP16\n")
    f.write("# Target: NSP10 Lys93 interface\n")
    f.write("# Reference: Trepte et al. 2024\n")
    f.write("# Generated: " + datetime.now().strftime('%Y-%m-%d %H:%M:%S') + "\n\n")
    f.write("receptor = 6W4H_prepared.pdbqt\n")
    f.write("ligand = ligand.pdbqt\n\n")
    f.write("# Grid box center (Å)\n")
    f.write(f"center_x = {center_lys93[0]:.3f}\n")
    f.write(f"center_y = {center_lys93[1]:.3f}\n")
    f.write(f"center_z = {center_lys93[2]:.3f}\n\n")
    f.write("# Grid box size (Å)\n")
    f.write("size_x = 25.0\n")
    f.write("size_y = 25.0\n")
    f.write("size_z = 25.0\n\n")
    f.write("# Docking parameters\n")
    f.write("exhaustiveness = 8\n")
    f.write("num_modes = 9\n")
    f.write("energy_range = 3\n\n")
    f.write("# Output\n")
    f.write("out = docking_output.pdbqt\n")
    # NOTE: AutoDock Vina 1.2+ no longer accepts `log`; drop this line if your version rejects it
    f.write("log = docking_log.txt\n")
print(f"✓ Vina config template saved to:")
print(f"  {os.path.abspath(vina_config_path)}")
print()

print("="*60)
print("ALL RESULTS EXPORTED SUCCESSFULLY")
print("="*60)
print()

print("Files created:")
print("  1. Interface residues (CSV)")
print("  2. Grid parameters (JSON)")
print("  3. Analysis summary (CSV)")
print("  4. Vina config template (TXT)")
print()
print("Next: Week 3-4 - fpocket pocket identification")
print()
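
Before the exported config is handed to a docking run, it can be read back and checked for the keys the box definition depends on. The sketch below parses the `key = value` layout written above; `parse_vina_config` and the list of required keys are assumptions for illustration, and the sample text uses made-up coordinates.

```python
REQUIRED_KEYS = ("receptor", "center_x", "center_y", "center_z",
                 "size_x", "size_y", "size_z")
NUMERIC_KEYS = REQUIRED_KEYS[1:]


def parse_vina_config(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Not a 'key = value' line: {raw_line!r}")
        config[key.strip()] = value.strip()
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Missing required keys: {missing}")
    for key in NUMERIC_KEYS:
        config[key] = float(config[key])
    return config


# Sample text with made-up coordinates, for illustration only
sample = """# AutoDock Vina Configuration
receptor = 6W4H_prepared.pdbqt
center_x = 1.500
center_y = -2.250
center_z = 10.000
size_x = 25.0
size_y = 25.0
size_z = 25.0
"""
parsed = parse_vina_config(sample)
print(parsed["center_x"], parsed["size_x"])
```

In the notebook, the file written in step 4 could be checked with `parse_vina_config(open(vina_config_path).read())`.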

## Conclusions and Next Steps

### ✅ Completed Tasks:
- Structure downloaded and analyzed
- Hot spots identified and validated (Lys93, Asp106)
- Interface residues mapped (10 Å cutoff)
- Grid box parameters defined
- Results exported for downstream analyses

### 📋 Next Steps:

**Week 3-4: Pocket Identification**
- [ ] Install fpocket
- [ ] Run fpocket on 6W4H structure
- [ ] Compare fpocket pockets with manual interface analysis
- [ ] Identify best pocket for docking
- [ ] Validate pocket includes Lys93 hot spot

**Week 5-7: Docking Setup**
- [ ] Prepare receptor (remove waters, add hydrogens)
- [ ] Convert to PDBQT format
- [ ] Test docking with known inhibitors (e.g., compound 459)
- [ ] Optimize docking parameters
- [ ] Benchmark against literature results

**Month 2+: Virtual Screening**
- [ ] Select compound library
- [ ] Run large-scale screening on HPC (JURECA)
- [ ] Analyze top hits
- [ ] Select compounds for validation

---

## References:

1. **Trepte et al. (2024)** AI-guided pipeline for protein–protein interaction drug discovery identifies a SARS-CoV-2 inhibitor. *Molecular Systems Biology* 20:428-457.

2. **Rosas-Lemus et al. (2020)** High-resolution structures of the SARS-CoV-2 2'-O-methyltransferase reveal strategies for structure-based inhibitor design. *Sci Signal* 13:eabe1202. [PDB: 6W4H]

---

**Analysis Complete!** ✅