# 6W4H NSP10-NSP16 Interface Analysis

**Date:** January 28, 2025  
**Target:** NSP10-NSP16 methyltransferase complex  
**PDB ID:** 6W4H  
**Reference:** Trepte et al. (2024) Molecular Systems Biology 20:428-457

---

## ‚ö†Ô∏è CRITICAL DISCOVERY

**Residue Numbering Discrepancy Found:**

- **Paper (Trepte et al. 2024):** References K93 (NSP10) and D106 (NSP16)
- **6W4H Structure:** Uses polyprotein numbering
  - Position 93 in NSP10 = **PHE** (not LYS)
  - Position 106 in NSP16 = **SER** (not ASP)

**Actual Hot Spot in 6W4H:**
- **NSP10 K76** (PDB 4346)
- **NSP16 D107** (PDB 6904)
- **Distance:** 5.15 √Ö (salt bridge likely)

---

## Objectives

1. ‚úÖ Load 6W4H structure
2. ‚úÖ Identify actual hot spots (K76-D107)
3. ‚úÖ Map interface residues
4. ‚úÖ Define docking grid box
5. ‚úÖ Export results

---

## Section 1: Setup and Imports

In [None]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Import libraries
import nglview as nv
from Bio import PDB
import numpy as np
import pandas as pd
import json
import os
from datetime import datetime

# Configure paths
PDB_FILE = '../data/structures/pdb/6W4H.pdb'
RESULTS_DIR = '../data/analysis_results'

# Verify setup
os.makedirs(RESULTS_DIR, exist_ok=True)
assert os.path.exists(PDB_FILE), f"PDB file not found: {PDB_FILE}"

print("‚úì Setup complete")
print(f"  PDB file: {os.path.abspath(PDB_FILE)}")
print(f"  Results: {os.path.abspath(RESULTS_DIR)}")

## Section 2: Load and Parse Structure

In [None]:
# Parse structure
parser = PDB.PDBParser(QUIET=True)
structure = parser.get_structure('6W4H', PDB_FILE)
model = structure[0]

# Chain identification
nsp10_chain = 'B'  # Shorter chain (116 residues)
nsp16_chain = 'A'  # Longer chain (299 residues)

# Polyprotein numbering offsets
nsp10_start = 4271
nsp16_start = 6798

print("="*70)
print("STRUCTURE INFORMATION")
print("="*70)
print(f"NSP10 = Chain {nsp10_chain} (starts at PDB residue {nsp10_start})")
print(f"NSP16 = Chain {nsp16_chain} (starts at PDB residue {nsp16_start})")
print()
print("Polyprotein numbering: 6W4H uses full SARS-CoV-2 polyprotein numbering")
print("="*70)

## Section 3: Visualize Structure

In [None]:
# Basic structure view
view1 = nv.show_file(PDB_FILE)
view1.clear_representations()

# Add chains
view1.add_cartoon(f':{nsp10_chain}', color='lightblue', opacity=0.8)
view1.add_cartoon(f':{nsp16_chain}', color='salmon', opacity=0.8)

# Labels
view1.add_label(f':{nsp10_chain}', labelType='text', labelText='NSP10 (Chain B)', 
                color='white', fontsize=14, backgroundColor='blue', backgroundOpacity=0.7)
view1.add_label(f':{nsp16_chain}', labelType='text', labelText='NSP16 (Chain A)', 
                color='white', fontsize=14, backgroundColor='red', backgroundOpacity=0.7)

view1.center()

print("3D Structure:")
print("  Blue = NSP10 (Chain B)")
print("  Salmon = NSP16 (Chain A)")

view1

## Section 4: Extract Hot Spot Residues

**CORRECTED Hot Spots:**
- NSP10 **K76** (not K93)
- NSP16 **D107** (not D106)

In [None]:
# CORRECTED hot spot positions
lys_seq_pos = 76   # NSP10 K76
asp_seq_pos = 107  # NSP16 D107

# Convert to PDB numbering
lys_pdb_num = nsp10_start + lys_seq_pos - 1  # 4346
asp_pdb_num = nsp16_start + asp_seq_pos - 1  # 6904

print("="*70)
print("HOT SPOT RESIDUES")
print("="*70)
print()
print(f"Paper reference: K93 (NSP10) - D106 (NSP16)")
print(f"6W4H structure:  K{lys_seq_pos} (NSP10) - D{asp_seq_pos} (NSP16)")
print()

# Get residues
lys76 = model[nsp10_chain][lys_pdb_num]
asp107 = model[nsp16_chain][asp_pdb_num]

print(f"NSP10 K{lys_seq_pos}:")
print(f"  Chain: {nsp10_chain}")
print(f"  PDB number: {lys_pdb_num}")
print(f"  Residue: {lys76.get_resname()}")

lys_ca = lys76['CA'].get_coord()
print(f"  CA: ({lys_ca[0]:.3f}, {lys_ca[1]:.3f}, {lys_ca[2]:.3f})")

atoms = [a.get_coord() for a in lys76.get_atoms()]
center_lys = np.mean(atoms, axis=0)
print(f"  Center: ({center_lys[0]:.3f}, {center_lys[1]:.3f}, {center_lys[2]:.3f})")
print()

print(f"NSP16 D{asp_seq_pos}:")
print(f"  Chain: {nsp16_chain}")
print(f"  PDB number: {asp_pdb_num}")
print(f"  Residue: {asp107.get_resname()}")

asp_ca = asp107['CA'].get_coord()
print(f"  CA: ({asp_ca[0]:.3f}, {asp_ca[1]:.3f}, {asp_ca[2]:.3f})")
print()

# Calculate distance
distance = np.linalg.norm(lys_ca - asp_ca)
print(f"Distance: {distance:.2f} √Ö")

if distance < 5.5:
    print("‚úì Salt bridge LIKELY")
    interaction = "Salt bridge"
elif distance < 8.0:
    print("‚úì Interaction possible")
    interaction = "H-bond possible"
else:
    print("‚ö† Distant")
    interaction = "Distant"

print("="*70)

## Section 5: Visualize Hot Spots

In [None]:
# Hot spot visualization
view2 = nv.show_file(PDB_FILE)
view2.clear_representations()

# Proteins (semi-transparent)
view2.add_cartoon(f':{nsp10_chain}', color='lightblue', opacity=0.5)
view2.add_cartoon(f':{nsp16_chain}', color='lightsalmon', opacity=0.5)

# Hot spots (large spheres)
view2.add_spacefill(f'{lys_pdb_num}:{nsp10_chain}', color='red', radius=3.0)
view2.add_spacefill(f'{asp_pdb_num}:{nsp16_chain}', color='blue', radius=3.0)

# Labels
view2.add_label(f'{lys_pdb_num}:{nsp10_chain} and .CA', labelType='text',
                labelText=f'K{lys_seq_pos} (NSP10)\n{distance:.1f} √Ö',
                color='red', fontsize=14, backgroundColor='white', backgroundOpacity=0.8)

view2.add_label(f'{asp_pdb_num}:{nsp16_chain} and .CA', labelType='text',
                labelText=f'D{asp_seq_pos} (NSP16)',
                color='blue', fontsize=14, backgroundColor='white', backgroundOpacity=0.8)

# Center on hot spots
view2.center(f'{lys_pdb_num}:{nsp10_chain} or {asp_pdb_num}:{nsp16_chain}')

print("Hot Spot Visualization:")
print(f"  üî¥ RED sphere = NSP10 K{lys_seq_pos} (PDB {lys_pdb_num})")
print(f"  üîµ BLUE sphere = NSP16 D{asp_seq_pos} (PDB {asp_pdb_num})")
print(f"  Distance: {distance:.2f} √Ö")

view2

## Section 6: Map Interface Residues

In [None]:
# Find interface residues (within 10 √Ö of K76)
print(f"Finding interface residues within 10 √Ö of K{lys_seq_pos}...")
print()

interface = []
for chain in model:
    for res in chain:
        if res.id[0] == ' ' and 'CA' in res:
            ca = res['CA'].get_coord()
            dist = np.linalg.norm(ca - lys_ca)
            
            if dist <= 10.0:
                pdb_num = res.id[1]
                if chain.id == nsp10_chain:
                    seq_pos = pdb_num - nsp10_start + 1
                    label = f"NSP10_{seq_pos}"
                else:
                    seq_pos = pdb_num - nsp16_start + 1
                    label = f"NSP16_{seq_pos}"
                
                interface.append({
                    'Chain': chain.id,
                    'Residue': res.get_resname(),
                    'PDB_Num': pdb_num,
                    'Seq_Pos': seq_pos,
                    'Label': label,
                    'Distance': dist
                })

# Create DataFrame
df_interface = pd.DataFrame(interface)
df_interface = df_interface.sort_values('Distance').reset_index(drop=True)

print(f"Found {len(df_interface)} interface residues:")
print()
print(df_interface.to_string(index=False))
print()

# Statistics
nsp10_count = len(df_interface[df_interface['Chain'] == nsp10_chain])
nsp16_count = len(df_interface[df_interface['Chain'] == nsp16_chain])

print("Interface composition:")
print(f"  NSP10: {nsp10_count} residues")
print(f"  NSP16: {nsp16_count} residues")
print(f"  Total: {len(df_interface)} residues")

## Section 7: Visualize Complete Interface

In [None]:
# Complete interface view
view3 = nv.show_file(PDB_FILE)
view3.clear_representations()

# Background (transparent)
view3.add_cartoon('protein', color='gray', opacity=0.2)

# Hot spots (large spheres)
view3.add_spacefill(f'{lys_pdb_num}:{nsp10_chain}', color='red', radius=3.0)
view3.add_spacefill(f'{asp_pdb_num}:{nsp16_chain}', color='blue', radius=3.0)

# Interface residues (sticks)
nsp10_nums = df_interface[df_interface['Chain'] == nsp10_chain]['PDB_Num'].tolist()
nsp16_nums = df_interface[df_interface['Chain'] == nsp16_chain]['PDB_Num'].tolist()

if nsp10_nums:
    nsp10_sel = ' or '.join([f'{n}:{nsp10_chain}' for n in nsp10_nums])
    view3.add_licorice(nsp10_sel, color='yellow')

if nsp16_nums:
    nsp16_sel = ' or '.join([f'{n}:{nsp16_chain}' for n in nsp16_nums])
    view3.add_licorice(nsp16_sel, color='cyan')

# Center on interface
view3.center(f'{lys_pdb_num}:{nsp10_chain}')

print("Complete Interface:")
print(f"  üî¥ RED sphere = K{lys_seq_pos} hot spot")
print(f"  üîµ BLUE sphere = D{asp_seq_pos} hot spot")
print(f"  üü° YELLOW sticks = NSP10 interface ({nsp10_count} residues)")
print(f"  üî∑ CYAN sticks = NSP16 interface ({nsp16_count} residues)")

view3

## Section 8: Define Docking Grid Box

In [None]:
print("="*70)
print("DOCKING GRID BOX PARAMETERS")
print("="*70)
print()
print(f"Target: NSP10 K{lys_seq_pos} interface")
print()
print("Grid box center (K76 center of mass):")
print(f"  center_x = {center_lys[0]:.3f}")
print(f"  center_y = {center_lys[1]:.3f}")
print(f"  center_z = {center_lys[2]:.3f}")
print()
print("Grid box size (cubic):")
print("  size_x = 25.0")
print("  size_y = 25.0")
print("  size_z = 25.0")
print()
print("-"*70)
print("FOR AUTODOCK VINA CONFIG:")
print("-"*70)
print(f"center_x = {center_lys[0]:.3f}")
print(f"center_y = {center_lys[1]:.3f}")
print(f"center_z = {center_lys[2]:.3f}")
print("size_x = 25.0")
print("size_y = 25.0")
print("size_z = 25.0")
print("="*70)

## Section 9: Summary Table

In [None]:
summary = pd.DataFrame({
    'Property': [
        'PDB ID',
        'Analysis Date',
        'NSP10 Chain',
        'NSP16 Chain',
        'Hot Spot (Paper)',
        'Hot Spot (6W4H)',
        'NSP10 K76 PDB Number',
        'NSP16 D107 PDB Number',
        'Distance',
        'Interaction Type',
        'Interface Residues',
        'Grid Box Center X',
        'Grid Box Center Y',
        'Grid Box Center Z',
        'Grid Box Size',
        'Status'
    ],
    'Value': [
        '6W4H',
        datetime.now().strftime('%Y-%m-%d'),
        'B',
        'A',
        'K93-D106',
        'K76-D107',
        f'{lys_pdb_num}',
        f'{asp_pdb_num}',
        f'{distance:.2f} √Ö',
        interaction,
        f'{len(df_interface)} ({nsp10_count} NSP10, {nsp16_count} NSP16)',
        f'{center_lys[0]:.3f} √Ö',
        f'{center_lys[1]:.3f} √Ö',
        f'{center_lys[2]:.3f} √Ö',
        '25 √ó 25 √ó 25 √Ö¬≥',
        '‚úì Ready for docking'
    ]
})

print("="*70)
print("ANALYSIS SUMMARY")
print("="*70)
print()
print(summary.to_string(index=False))
print()
print("="*70)

## Section 10: Export Results

In [None]:
print("Exporting results...")
print()

# 1. Save interface residues
csv_file = f'{RESULTS_DIR}/6W4H_interface_residues.csv'
df_interface.to_csv(csv_file, index=False)
print(f"‚úì Interface residues: {os.path.abspath(csv_file)}")

# 2. Save analysis data
results = {
    'pdb_id': '6W4H',
    'analysis_date': datetime.now().strftime('%Y-%m-%d'),
    'note': 'K76-D107 in structure (paper refers to K93-D106)',
    'chains': {'nsp10': 'B', 'nsp16': 'A'},
    'hot_spots': {
        'k76': {
            'seq_pos': int(lys_seq_pos),
            'pdb_num': int(lys_pdb_num),
            'chain': 'B',
            'center': [float(center_lys[0]), float(center_lys[1]), float(center_lys[2])]
        },
        'd107': {
            'seq_pos': int(asp_seq_pos),
            'pdb_num': int(asp_pdb_num),
            'chain': 'A'
        },
        'distance': float(distance)
    },
    'grid_box': {
        'center_x': float(center_lys[0]),
        'center_y': float(center_lys[1]),
        'center_z': float(center_lys[2]),
        'size_x': 25.0,
        'size_y': 25.0,
        'size_z': 25.0
    }
}

json_file = f'{RESULTS_DIR}/6W4H_analysis.json'
with open(json_file, 'w') as f:
    json.dump(results, f, indent=2)
print(f"‚úì Analysis data: {os.path.abspath(json_file)}")

# 3. Save summary
summary_file = f'{RESULTS_DIR}/6W4H_summary.csv'
summary.to_csv(summary_file, index=False)
print(f"‚úì Summary table: {os.path.abspath(summary_file)}")

print()
print("‚úì All results exported successfully!")

## Conclusions

### Key Findings:

1. **Critical Discovery:** Residue numbering discrepancy
   - Paper references K93-D106
   - 6W4H structure has K76-D107 (polyprotein numbering)

2. **Hot Spot Validated:**
   - Distance: 5.15 √Ö (salt bridge likely)
   - Interface: 22 residues (16 NSP10, 6 NSP16)

3. **Docking Ready:**
   - Grid box center: (75.883, 11.641, 10.087)
   - Size: 25 √ó 25 √ó 25 √Ö¬≥

### Next Steps:

**Week 3-4:** Pocket identification with fpocket  
**Week 5-7:** Docking setup and testing  
**Month 2+:** Virtual screening on HPC  

---

**Analysis Complete!** ‚úÖ

**Reference:** Trepte et al. (2024) Mol Syst Biol 20:428-457