# 🧬 MD Simulation - Mahkota Dewa Study

**Complex:** 2,6,4'-trihydroxy-4-methoxybenzophenone + PPARG

**Platform:** Kaggle (GPU)

**Duration:** 50 ns with a checkpoint every 10 ns

---

## Features:
- ✅ Config-based (easy to switch complex)
- ✅ Auto-checkpoint every 10 ns (5 saves total)
- ✅ Resume capability
- ✅ Auto-save to Kaggle output

---

## 🔧 Configuration

**CHANGE THIS SECTION TO SWITCH COMPLEX**

In [None]:
# ============================================
# CONFIGURATION - EDIT THIS SECTION ONLY
# ============================================

CONFIG = {
    # Complex identification
    "complex_name": "264THM_PPARG",
    "compound_name": "2,6,4'-trihydroxy-4-methoxybenzophenone",
    "target_name": "PPARG",
    "pdb_id": "6MS7",
    
    # Simulation parameters
    "total_time_ns": 50,           # Total simulation time
    "checkpoint_interval_ns": 10,  # Save checkpoint every X ns
    "temperature_k": 310,          # 37°C
    "timestep_fs": 2,              # femtoseconds
    
    # Equilibration
    "nvt_time_ps": 100,
    "npt_time_ps": 100,
    
    # Force field
    "forcefield": "amber99sb-ildn",
    "water_model": "tip3p",
    
    # Resume from checkpoint? (set to True if resuming)
    "resume": False,
    "resume_from_ns": 0,  # Which checkpoint to resume from
}

# ============================================
# ALTERNATIVE CONFIG FOR LUTEOLIN + PDE5A
# ============================================
# Uncomment below and comment above to switch
"""
CONFIG = {
    "complex_name": "Luteolin_PDE5A",
    "compound_name": "Luteolin",
    "target_name": "PDE5A",
    "pdb_id": "1TBF",
    
    "total_time_ns": 50,
    "checkpoint_interval_ns": 10,
    "temperature_k": 310,
    "timestep_fs": 2,
    
    "nvt_time_ps": 100,
    "npt_time_ps": 100,
    
    "forcefield": "amber99sb-ildn",
    "water_model": "tip3p",
    
    "resume": False,
    "resume_from_ns": 0,
}
"""

# Calculate derived parameters
NSTEPS_TOTAL = int(CONFIG['total_time_ns'] * 1e6 / CONFIG['timestep_fs'])
NSTEPS_PER_SEGMENT = int(CONFIG['checkpoint_interval_ns'] * 1e6 / CONFIG['timestep_fs'])
NUM_SEGMENTS = CONFIG['total_time_ns'] // CONFIG['checkpoint_interval_ns']

print(f"üìã Complex: {CONFIG['complex_name']}")
print(f"‚è±Ô∏è  Total time: {CONFIG['total_time_ns']} ns")
print(f"üíæ Checkpoints: every {CONFIG['checkpoint_interval_ns']} ns ({NUM_SEGMENTS} segments)")
print(f"üî¢ Total steps: {NSTEPS_TOTAL:,}")
print(f"üî¢ Steps per segment: {NSTEPS_PER_SEGMENT:,}")

## 1️⃣ Install Dependencies

In [None]:
%%bash
# Install GROMACS
apt-get update -qq
apt-get install -qq gromacs

# Verify
gmx --version | head -3

In [None]:
# Install Python dependencies
!pip install -q acpype MDAnalysis matplotlib numpy pandas
print("✅ All dependencies installed!")

In [None]:
import os
import shutil
from pathlib import Path

# Create directory structure
WORK_DIR = Path(f"/kaggle/working/{CONFIG['complex_name']}")
OUTPUT_DIR = Path("/kaggle/working/output")

for d in ["input", "topol", "em", "nvt", "npt", "md", "analysis", "checkpoints"]:
    (WORK_DIR / d).mkdir(parents=True, exist_ok=True)

OUTPUT_DIR.mkdir(exist_ok=True)

os.chdir(WORK_DIR)
print(f"üìÅ Working directory: {WORK_DIR}")

## 2️⃣ Upload Input Files

Upload your files from docking results:
- `receptor.pdb` - Protein structure
- `ligand.mol2` / `ligand.pdb` - Ligand structure (from docked pose)

In [None]:
# Option 1: Upload from local (for Kaggle)
# Put your files in /kaggle/input/your-dataset/

# Option 2: Use pre-uploaded dataset
# Adjust these paths based on your dataset

# Example paths - ADJUST THESE!
PROTEIN_PDB = "/kaggle/input/md-simulation-files/PPARG_6MS7.pdb"
LIGAND_FILE = "/kaggle/input/md-simulation-files/264THM_docked.mol2"

# Check if files exist
if os.path.exists(PROTEIN_PDB):
    print(f"✅ Protein: {PROTEIN_PDB}")
    shutil.copy(PROTEIN_PDB, WORK_DIR / "input" / "protein.pdb")
else:
    print(f"❌ Protein not found: {PROTEIN_PDB}")
    print("Please upload protein PDB file")

if os.path.exists(LIGAND_FILE):
    print(f"✅ Ligand: {LIGAND_FILE}")
    shutil.copy(LIGAND_FILE, WORK_DIR / "input" / "ligand.mol2")
else:
    print(f"❌ Ligand not found: {LIGAND_FILE}")
    print("Please upload ligand file")

## 3️⃣ Prepare Protein Topology

In [None]:
%%bash
cd topol

# Generate protein topology
echo "1" | gmx pdb2gmx -f ../input/protein.pdb \
    -o protein.gro \
    -p topol.top \
    -i posre.itp \
    -ff amber99sb-ildn \
    -water tip3p \
    -ignh

echo "✅ Protein topology generated!"

## 4️⃣ Prepare Ligand Topology (ACPYPE)

In [None]:
os.chdir(WORK_DIR / "topol")

# Generate ligand topology with GAFF2
!acpype -i ../input/ligand.mol2 -b LIG -c bcc -a gaff2

# Move files
!mv LIG.acpype/LIG_GMX.gro ligand.gro
!mv LIG.acpype/LIG_GMX.itp ligand.itp

os.chdir(WORK_DIR)
print("✅ Ligand topology generated!")

## 5️⃣ Combine Protein and Ligand

In [None]:
import re

os.chdir(WORK_DIR / "topol")

# Read protein coordinates
with open("protein.gro", "r") as f:
    protein_lines = f.readlines()

# Read ligand coordinates
with open("ligand.gro", "r") as f:
    ligand_lines = f.readlines()

# Combine
title = protein_lines[0]
protein_atoms = protein_lines[2:-1]
ligand_atoms = ligand_lines[2:-1]
box = protein_lines[-1]

total_atoms = len(protein_atoms) + len(ligand_atoms)

with open("complex.gro", "w") as f:
    f.write(f"{CONFIG['complex_name']} complex\n")
    f.write(f" {total_atoms}\n")
    f.writelines(protein_atoms)
    f.writelines(ligand_atoms)
    f.write(box)

print(f"✅ Complex created: {total_atoms} atoms")

# Update topology to include ligand
with open("topol.top", "r") as f:
    topol = f.read()

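# Add the ligand include right after the force field include.
# ACPYPE's ligand.itp starts with an [ atomtypes ] section, which grompp only
# accepts before the first [ moleculetype ]. Placing it just before [ system ]
# comes too late, because the protein moleculetype is already defined by then.
ff_include = re.search(r'^#include\s+".*forcefield\.itp"[^\n]*\n', topol, flags=re.MULTILINE)
if ff_include is None:
    raise RuntimeError("Force field #include not found in topol.top")
if '#include "ligand.itp"' not in topol:
    topol = (topol[:ff_include.end()]
             + '\n; Ligand topology (GAFF2, from ACPYPE)\n#include "ligand.itp"\n'
             + topol[ff_include.end():])

# Add ligand to [ molecules ] (the last section of topol.top)
topol += "\nLIG     1\n"

with open("topol.top", "w") as f:
    f.write(topol)

print("✅ Topology updated!")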
os.chdir(WORK_DIR)
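
Hand-merging `.gro` files is easy to get subtly wrong: the count on line 2 has to equal the number of atom lines, or later GROMACS tools reject the file. A minimal sketch of a check, assuming the plain title / count / atoms / box layout used above; `check_gro` is a hypothetical helper and the example coordinates are made up.

```python
def check_gro(text):
    """Compare the atom count on line 2 of a .gro file with the atom lines present.

    Returns (declared, found). Assumed layout: title, atom count, atoms, box.
    """
    lines = text.rstrip("\n").splitlines()
    if len(lines) < 3:
        raise ValueError("too short to be a .gro file")
    declared = int(lines[1].strip())
    found = len(lines) - 3  # everything except title, count and box lines
    return declared, found


# Tiny made-up example with two atoms (not coordinates from this study)
example = """demo
    2
    1ALA      N    1   0.000   0.000   0.000
    1LIG     C1    2   0.100   0.100   0.100
   1.00000   1.00000   1.00000
"""
declared, found = check_gro(example)
print(declared, found, "OK" if declared == found else "MISMATCH")
```

In the notebook it could be called as `check_gro(Path("topol/complex.gro").read_text())` before solvation.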

## 6️⃣ Solvate and Add Ions

In [None]:
%%bash
cd topol

# Create simulation box
gmx editconf -f complex.gro -o box.gro -c -d 1.2 -bt dodecahedron

# Solvate
gmx solvate -cp box.gro -cs spc216.gro -o solvated.gro -p topol.top

echo "✅ System solvated!"

In [None]:
# Create ions MDP
ions_mdp = """
; ions.mdp
integrator  = steep
emtol       = 1000.0
emstep      = 0.01
nsteps      = 50000
nstlist     = 1
cutoff-scheme = Verlet
ns_type     = grid
coulombtype = cutoff
rcoulomb    = 1.0
rvdw        = 1.0
pbc         = xyz
"""

with open(WORK_DIR / "topol" / "ions.mdp", "w") as f:
    f.write(ions_mdp)

In [None]:
%%bash
cd topol

# Add ions (neutralize + 0.15M NaCl)
gmx grompp -f ions.mdp -c solvated.gro -p topol.top -o ions.tpr -maxwarn 5
echo "SOL" | gmx genion -s ions.tpr -o system.gro -p topol.top -pname NA -nname CL -neutral -conc 0.15

echo "✅ System neutralized!"
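
To sanity-check what `genion` reports, the expected ion counts can be estimated from the box volume: the number of salt pairs is concentration × Avogadro's number × volume, plus counter-ions that cancel the solute's net charge. The sketch below is a rough estimate only; `estimate_ions` is a hypothetical helper, the box volume and charge are illustrative numbers, and the counts `genion` actually prints may differ slightly.

```python
AVOGADRO = 6.02214076e23   # particles per mole
LITRES_PER_NM3 = 1e-24     # 1 nm^3 = 1e-24 L


def estimate_ions(box_volume_nm3, net_charge, conc_mol_per_l=0.15):
    """Rough (n_na, n_cl) for a neutral system at the given salt concentration."""
    n_pairs = round(conc_mol_per_l * AVOGADRO * box_volume_nm3 * LITRES_PER_NM3)
    n_na = n_pairs + max(-net_charge, 0)  # extra Na+ when the solute is negative
    n_cl = n_pairs + max(net_charge, 0)   # extra Cl- when the solute is positive
    return n_na, n_cl


# Hypothetical numbers for illustration: a 500 nm^3 box, solute charge of -3
print(estimate_ions(500.0, -3))
```

At 0.15 M this works out to roughly 0.09 ion pairs per nm³, so a result far from that ratio suggests the wrong group was replaced or the box is not the size intended.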

## 7️⃣ Energy Minimization

In [None]:
# Create EM MDP
em_mdp = """
; em.mdp - Energy Minimization
integrator  = steep
emtol       = 1000.0
emstep      = 0.01
nsteps      = 50000

nstlist     = 1
cutoff-scheme = Verlet
ns_type     = grid
coulombtype = PME
rcoulomb    = 1.0
rvdw        = 1.0
pbc         = xyz
"""

with open(WORK_DIR / "em" / "em.mdp", "w") as f:
    f.write(em_mdp)

In [None]:
%%bash
# Run energy minimization
gmx grompp -f em/em.mdp -c topol/system.gro -p topol/topol.top -o em/em.tpr -maxwarn 5
gmx mdrun -v -deffnm em/em

echo "✅ Energy minimization complete!"

## 8️⃣ NVT Equilibration

In [None]:
# Create NVT MDP
nvt_steps = int(CONFIG['nvt_time_ps'] * 1000 / CONFIG['timestep_fs'])
dt = CONFIG['timestep_fs'] / 1000

nvt_mdp = f"""
; nvt.mdp - NVT Equilibration
define      = -DPOSRES
integrator  = md
nsteps      = {nvt_steps}
dt          = {dt}

nstxout     = 5000
nstvout     = 5000
nstenergy   = 5000
nstlog      = 5000

continuation = no
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter  = 1
lincs_order = 4

cutoff-scheme = Verlet
ns_type     = grid
nstlist     = 10
rcoulomb    = 1.0
rvdw        = 1.0

coulombtype = PME
pme_order   = 4
fourierspacing = 0.16

tcoupl      = V-rescale
tc-grps     = Protein Non-Protein
tau_t       = 0.1 0.1
ref_t       = {CONFIG['temperature_k']} {CONFIG['temperature_k']}

pcoupl      = no
pbc         = xyz
DispCorr    = EnerPres

gen_vel     = yes
gen_temp    = {CONFIG['temperature_k']}
gen_seed    = -1
"""

with open(WORK_DIR / "nvt" / "nvt.mdp", "w") as f:
    f.write(nvt_mdp)

In [None]:
%%bash
# Run NVT equilibration
gmx grompp -f nvt/nvt.mdp -c em/em.gro -r em/em.gro -p topol/topol.top -o nvt/nvt.tpr -maxwarn 5
gmx mdrun -v -deffnm nvt/nvt

echo "✅ NVT equilibration complete!"

## 9️⃣ NPT Equilibration

In [None]:
# Create NPT MDP
npt_steps = int(CONFIG['npt_time_ps'] * 1000 / CONFIG['timestep_fs'])

npt_mdp = f"""
; npt.mdp - NPT Equilibration
define      = -DPOSRES
integrator  = md
nsteps      = {npt_steps}
dt          = {dt}

nstxout     = 5000
nstvout     = 5000
nstenergy   = 5000
nstlog      = 5000

continuation = yes
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter  = 1
lincs_order = 4

cutoff-scheme = Verlet
ns_type     = grid
nstlist     = 10
rcoulomb    = 1.0
rvdw        = 1.0

coulombtype = PME
pme_order   = 4
fourierspacing = 0.16

tcoupl      = V-rescale
tc-grps     = Protein Non-Protein
tau_t       = 0.1 0.1
ref_t       = {CONFIG['temperature_k']} {CONFIG['temperature_k']}

pcoupl      = Parrinello-Rahman
pcoupltype  = isotropic
tau_p       = 2.0
ref_p       = 1.0
compressibility = 4.5e-5
refcoord_scaling = com

pbc         = xyz
DispCorr    = EnerPres

gen_vel     = no
"""

with open(WORK_DIR / "npt" / "npt.mdp", "w") as f:
    f.write(npt_mdp)

In [None]:
%%bash
# Run NPT equilibration
gmx grompp -f npt/npt.mdp -c nvt/nvt.gro -r nvt/nvt.gro -t nvt/nvt.cpt -p topol/topol.top -o npt/npt.tpr -maxwarn 5
gmx mdrun -v -deffnm npt/npt

echo "✅ NPT equilibration complete!"

## 🚀 Production MD (50 ns with Checkpoints)

**Strategy:** Run in 10 ns segments and save a checkpoint after each one

In [None]:
# Create production MD MDP
md_mdp = f"""
; md.mdp - Production MD ({CONFIG['total_time_ns']} ns)
integrator  = md
nsteps      = {NSTEPS_PER_SEGMENT}  ; Per segment ({CONFIG['checkpoint_interval_ns']} ns)
dt          = {dt}

nstxout     = 0
nstvout     = 0
nstxout-compressed = 5000   ; Save every 10 ps
nstenergy   = 5000
nstlog      = 5000

continuation = yes
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter  = 1
lincs_order = 4

cutoff-scheme = Verlet
ns_type     = grid
nstlist     = 10
rcoulomb    = 1.0
rvdw        = 1.0

coulombtype = PME
pme_order   = 4
fourierspacing = 0.16

tcoupl      = V-rescale
tc-grps     = Protein Non-Protein
tau_t       = 0.1 0.1
ref_t       = {CONFIG['temperature_k']} {CONFIG['temperature_k']}

pcoupl      = Parrinello-Rahman
pcoupltype  = isotropic
tau_p       = 2.0
ref_p       = 1.0
compressibility = 4.5e-5

pbc         = xyz
DispCorr    = EnerPres

gen_vel     = no
"""

with open(WORK_DIR / "md" / "md.mdp", "w") as f:
    f.write(md_mdp)

print(f"üìã Production MDP created for {CONFIG['checkpoint_interval_ns']} ns segments")

In [None]:
import subprocess
import time
from datetime import datetime

def run_md_segment(segment_num, resume=False):
    """Run one segment of MD simulation. Returns True on success."""
    start_ns = segment_num * CONFIG['checkpoint_interval_ns']
    end_ns = (segment_num + 1) * CONFIG['checkpoint_interval_ns']
    
    print(f"\n{'='*60}")
    print(f"🚀 Segment {segment_num + 1}/{NUM_SEGMENTS}: {start_ns}-{end_ns} ns")
    print(f"⏰ Started: {datetime.now().strftime('%H:%M:%S')}")
    print(f"{'='*60}")
    
    os.chdir(WORK_DIR)
    
    if segment_num == 0 and not resume:
        # First segment: start from NPT
        result = subprocess.run([
            "gmx", "grompp",
            "-f", "md/md.mdp",
            "-c", "npt/npt.gro",
            "-t", "npt/npt.cpt",
            "-p", "topol/topol.top",
            "-o", "md/md.tpr",
            "-maxwarn", "5"
        ], capture_output=True, text=True)
        
        if result.returncode != 0:
            print(f"❌ grompp failed: {result.stderr}")
            return False
            
        # Run MD
        result = subprocess.run([
            "gmx", "mdrun",
            "-deffnm", "md/md",
            "-v"
        ], capture_output=False)
        
    else:
        # md.tpr only covers the run length it was built for, so a plain
        # restart from md.cpt would stop at once. Extend it to this segment's end.
        result = subprocess.run([
            "gmx", "convert-tpr",
            "-s", "md/md.tpr",
            "-until", str(end_ns * 1000),  # absolute end time in ps
            "-o", "md/md_extended.tpr"
        ], capture_output=True, text=True)
        
        if result.returncode != 0:
            print(f"❌ convert-tpr failed: {result.stderr}")
            return False
        os.replace("md/md_extended.tpr", "md/md.tpr")
        
        # Continue from checkpoint
        result = subprocess.run([
            "gmx", "mdrun",
            "-deffnm", "md/md",
            "-cpi", "md/md.cpt",
            "-append",
            "-v"
        ], capture_output=False)
    
    # Do not report success (or save a checkpoint) if mdrun itself failed
    if result.returncode != 0:
        print(f"❌ mdrun failed (exit code {result.returncode})")
        return False
    
    # Save checkpoint (md.tpr is included because a resume needs it)
    checkpoint_name = f"checkpoint_{end_ns}ns"
    checkpoint_dir = WORK_DIR / "checkpoints" / checkpoint_name
    checkpoint_dir.mkdir(exist_ok=True)
    
    for ext in [".tpr", ".cpt", ".gro", ".edr", ".log", ".xtc"]:
        src = WORK_DIR / "md" / f"md{ext}"
        if src.exists():
            shutil.copy(src, checkpoint_dir / f"md{ext}")
    
    # Also copy to Kaggle output for persistence
    output_checkpoint = OUTPUT_DIR / f"{CONFIG['complex_name']}_{checkpoint_name}"
    shutil.copytree(checkpoint_dir, output_checkpoint, dirs_exist_ok=True)
    
    print(f"\n💾 Checkpoint saved: {checkpoint_name}")
    print(f"📁 Output: {output_checkpoint}")
    
    return True

def save_final_results():
    """Save final results to output."""
    final_dir = OUTPUT_DIR / f"{CONFIG['complex_name']}_final"
    final_dir.mkdir(exist_ok=True)
    
    # Copy all important files
    for src_dir in ["md", "analysis", "topol"]:
        src = WORK_DIR / src_dir
        if src.exists():
            shutil.copytree(src, final_dir / src_dir, dirs_exist_ok=True)
    
    print(f"\n✅ Final results saved to: {final_dir}")
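
A continuation with `-cpi` only advances when the run input file covers more simulated time than the checkpoint already holds, so it helps to see the absolute end time each segment has to reach. The sketch below lists those times in ps, the unit that `gmx convert-tpr -until` expects; `segment_schedule` is a hypothetical helper and the numbers follow CONFIG.

```python
def segment_schedule(total_time_ns, checkpoint_interval_ns, resume_from_ns=0):
    """List (segment_number, start_ps, end_ps) for the segments still to run."""
    num_segments = total_time_ns // checkpoint_interval_ns
    first = resume_from_ns // checkpoint_interval_ns
    schedule = []
    for seg in range(first, num_segments):
        start_ps = seg * checkpoint_interval_ns * 1000
        end_ps = (seg + 1) * checkpoint_interval_ns * 1000
        schedule.append((seg + 1, start_ps, end_ps))
    return schedule


# Example: a 50 ns run in 10 ns segments, resumed after the 20 ns checkpoint
for seg, start_ps, end_ps in segment_schedule(50, 10, resume_from_ns=20):
    print(f"segment {seg}: {start_ps} -> {end_ps} ps")
```

Working with absolute end times rather than increments keeps a re-run of the same segment from extending the simulation twice.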

In [None]:
# ============================================
# RUN PRODUCTION MD (50 ns in 10 ns segments)
# ============================================

start_segment = 0
if CONFIG['resume']:
    start_segment = CONFIG['resume_from_ns'] // CONFIG['checkpoint_interval_ns']
    print(f"📥 Resuming from segment {start_segment + 1}")

total_start = time.time()
all_ok = True

for segment in range(start_segment, NUM_SEGMENTS):
    segment_start = time.time()
    
    success = run_md_segment(segment, resume=(segment > start_segment or CONFIG['resume']))
    
    segment_time = time.time() - segment_start
    print(f"⏱️  Segment time: {segment_time/60:.1f} minutes")
    
    if not success:
        print(f"❌ Segment {segment + 1} failed!")
        all_ok = False
        break

total_time = time.time() - total_start
print(f"\n{'='*60}")
# Only announce completion when every segment finished
if all_ok:
    print("✅ PRODUCTION MD COMPLETE!")
else:
    print("❌ PRODUCTION MD STOPPED EARLY")
print(f"⏱️  Total time: {total_time/3600:.2f} hours")
print(f"{'='*60}")

save_final_results()

## 📊 Analysis

In [None]:
%%bash
cd md

# RMSD - Protein backbone
echo "4 4" | gmx rms -s md.tpr -f md.xtc -o ../analysis/rmsd_backbone.xvg -tu ns

# RMSD - Ligand
echo "13 13" | gmx rms -s md.tpr -f md.xtc -o ../analysis/rmsd_ligand.xvg -tu ns 2>/dev/null || echo "Ligand RMSD skipped"

# RMSF
echo "4" | gmx rmsf -s md.tpr -f md.xtc -o ../analysis/rmsf.xvg -res

# Radius of gyration
echo "1" | gmx gyrate -s md.tpr -f md.xtc -o ../analysis/gyrate.xvg

# H-bonds (if ligand group exists)
echo "1 13" | gmx hbond -s md.tpr -f md.xtc -num ../analysis/hbond.xvg 2>/dev/null || echo "H-bond analysis skipped"

echo "✅ Analysis complete!"

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def parse_xvg(filename):
    """Parse GROMACS XVG file."""
    data = []
    with open(filename, 'r') as f:
        for line in f:
            if not line.startswith(('#', '@')):
                values = [float(x) for x in line.split()]
                if values:
                    data.append(values)
    return np.array(data)

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle(f"{CONFIG['complex_name']} - MD Analysis ({CONFIG['total_time_ns']} ns)", fontsize=14)

# RMSD
ax = axes[0, 0]
rmsd = parse_xvg('analysis/rmsd_backbone.xvg')
ax.plot(rmsd[:, 0], rmsd[:, 1], color='#2E86AB', label='Backbone')
if os.path.exists('analysis/rmsd_ligand.xvg'):
    rmsd_lig = parse_xvg('analysis/rmsd_ligand.xvg')
    ax.plot(rmsd_lig[:, 0], rmsd_lig[:, 1], color='#A23B72', label='Ligand')
ax.set_xlabel('Time (ns)')
ax.set_ylabel('RMSD (nm)')
ax.set_title('RMSD')
ax.legend()
ax.grid(True, alpha=0.3)

# RMSF
ax = axes[0, 1]
rmsf = parse_xvg('analysis/rmsf.xvg')
ax.plot(rmsf[:, 0], rmsf[:, 1], color='#2E86AB')
ax.fill_between(rmsf[:, 0], 0, rmsf[:, 1], alpha=0.3)
ax.set_xlabel('Residue')
ax.set_ylabel('RMSF (nm)')
ax.set_title('RMSF per Residue')
ax.grid(True, alpha=0.3)

# Radius of Gyration
ax = axes[1, 0]
gyrate = parse_xvg('analysis/gyrate.xvg')
ax.plot(gyrate[:, 0]/1000, gyrate[:, 1], color='#F18F01')
ax.set_xlabel('Time (ns)')
ax.set_ylabel('Rg (nm)')
ax.set_title('Radius of Gyration')
ax.grid(True, alpha=0.3)

# H-bonds
ax = axes[1, 1]
if os.path.exists('analysis/hbond.xvg'):
    hbond = parse_xvg('analysis/hbond.xvg')
    ax.plot(hbond[:, 0]/1000, hbond[:, 1], color='#48A9A6', alpha=0.7)
    ax.set_xlabel('Time (ns)')
    ax.set_ylabel('Number of H-bonds')
    ax.set_title('Protein-Ligand H-bonds')
else:
    ax.text(0.5, 0.5, 'H-bond data not available', ha='center', va='center')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(f'analysis/{CONFIG["complex_name"]}_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

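# Print summary
# Frames are written every 5000 steps (nstxout-compressed), i.e. every 10 ps
# at a 2 fs timestep, so 10 ns is 1000 frames rather than 5000.
frames_per_ns = int(1e6 / (CONFIG['timestep_fs'] * 5000))
last_10ns = 10 * frames_per_ns
print(f"\n📊 Analysis Summary for {CONFIG['complex_name']}:")
print(f"   Avg Backbone RMSD: {rmsd[-last_10ns:, 1].mean():.3f} ± {rmsd[-last_10ns:, 1].std():.3f} nm (last 10 ns)")
print(f"   Avg Rg: {gyrate[-last_10ns:, 1].mean():.3f} nm (last 10 ns)")
if os.path.exists('analysis/hbond.xvg'):
    print(f"   Avg H-bonds: {hbond[:, 1].mean():.1f}")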
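
Slicing by frame count ties the summary to the output frequency. An alternative is to select rows by the time column itself, which keeps working if `nstxout-compressed` changes. A minimal sketch using only the standard library; `parse_xvg_text` and `tail_stats` are hypothetical helpers and the demo data is made up.

```python
import statistics


def parse_xvg_text(text):
    """Parse XVG content into rows of floats, skipping '#' and '@' lines."""
    rows = []
    for line in text.splitlines():
        if line.strip() and not line.startswith(("#", "@")):
            rows.append([float(x) for x in line.split()])
    return rows


def tail_stats(rows, window, column=1):
    """Mean and population std of `column` over the last `window` time units.

    `window` uses the same unit as the time column (ns with `-tu ns`, else ps).
    The row exactly at the window boundary is included.
    """
    t_end = rows[-1][0]
    values = [r[column] for r in rows if r[0] >= t_end - window]
    return statistics.fmean(values), statistics.pstdev(values)


# Made-up data: time in ns, RMSD in nm
demo = "# comment\n@ title\n0 0.10\n10 0.20\n20 0.30\n30 0.40\n"
mean, std = tail_stats(parse_xvg_text(demo), window=10)
print(f"{mean:.3f} ± {std:.3f}")
```

The population standard deviation is used so the result matches NumPy's default `std()`.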

In [None]:
# Save all analysis to output
shutil.copytree(WORK_DIR / "analysis", OUTPUT_DIR / f"{CONFIG['complex_name']}_analysis", dirs_exist_ok=True)
print(f"✅ Analysis saved to {OUTPUT_DIR}")

## 📥 Download Results

Results are automatically saved to `/kaggle/working/output/`

Check the **Output** tab in Kaggle to download.

In [None]:
# List output files
print("üìÅ Output files:")
for item in OUTPUT_DIR.iterdir():
    if item.is_dir():
        size = sum(f.stat().st_size for f in item.rglob('*') if f.is_file())
        print(f"  üìÇ {item.name} ({size/1024/1024:.1f} MB)")
    else:
        print(f"  üìÑ {item.name} ({item.stat().st_size/1024:.1f} KB)")

---

## ⚠️ Troubleshooting

### If Kaggle Times Out:
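1. Check which checkpoint was saved last in Output
2. Copy that checkpoint's `md.*` files back into the `md/` folder of the working directory (a continuation needs `md.tpr` and `md.cpt` there)
3. Set `CONFIG['resume'] = True`
4. Set `CONFIG['resume_from_ns'] = <last_checkpoint>` (the time in ns, e.g. `20` for `checkpoint_20ns`)
5. Re-run the notebook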
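
A new Kaggle session starts with an empty working directory, so a resume only works once the saved run files are back under `md/`. A minimal sketch of that restore step; `restore_checkpoint` is a hypothetical helper, and both paths are assumptions that depend on how the previous run's output is attached to the new session.

```python
import shutil
from pathlib import Path


def restore_checkpoint(saved_dir, md_dir):
    """Copy md.* files from a saved checkpoint folder into the md/ directory.

    Returns the sorted names of the files copied. Both paths are assumptions:
    point `saved_dir` at wherever the previous run's output is mounted.
    """
    saved_dir, md_dir = Path(saved_dir), Path(md_dir)
    if not saved_dir.is_dir():
        raise FileNotFoundError(f"checkpoint folder not found: {saved_dir}")
    md_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(saved_dir.glob("md.*")):
        shutil.copy(src, md_dir / src.name)
        copied.append(src.name)
    if "md.cpt" not in copied:
        raise FileNotFoundError("md.cpt missing, cannot continue from this folder")
    return copied


# Example call (placeholder path, adjust to your own dataset name):
# restore_checkpoint("/kaggle/input/<previous-output>/264THM_PPARG_checkpoint_20ns",
#                    WORK_DIR / "md")
```

Failing loudly when `md.cpt` is absent avoids starting a "resume" that would quietly restart from time zero.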

### To Switch Complex:
1. Comment out the first CONFIG and remove the triple quotes around the alternative CONFIG at the top
2. Upload the matching input files and update `PROTEIN_PDB` and `LIGAND_FILE`
3. Run all cells

---

*Notebook for Mahkota Dewa DN Study - MD Simulation*