<a href="https://colab.research.google.com/github/kimjc95/computational-chemistry/blob/main/Protein_MD_in_Colab_(ENG).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Protein MD simulation in Google Colab
2024-07-28 by Joo-Chan Kim at MSBL, KAIST

This is a Google Colaboratory notebook for all-atom molecular dynamics simulation of proteins and nucleic acids (BSD-3 license).

Please cite DOI:[10.5281/zenodo.13133762](https://doi.org/10.5281/zenodo.13133762) if you have used my code in your research.

If you have any problems, please raise an issue on [GitHub](https://github.com/kimjc95/computational-chemistry/issues) or email me (kimjoochan@kaist.ac.kr).

-------------------------------------

**Things to Prepare:**

1. Protein/DNA/RNA PDB file or PDB ID

2. Google account (for saving jobs in Google Drive), Internet connection (or additional Google Colab computing resources)

In [1]:
#@markdown Run this cell to check the GPU availability.
import torch
if not torch.cuda.is_available():
    print("Please change to the GPU runtime. MD simulations are very slow without GPUs!")
else:
    print("Good to go!")

Good to go!


In [2]:
#@markdown Run this cell to install the conda environment. The runtime will restart afterwards, so wait for it to reconnect.
!pip install -q condacolab
import condacolab
condacolab.install()

⏬ Downloading https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:10
🔁 Restarting kernel...


In [1]:
#@title #0. Install dependencies & Connect to the Google Drive
#@markdown (Takes ~ 2 min) Please authorize the connection to your Google Drive. MD Trajectory files are huge, and the Google Colab sessions are infamous for their instability.

import subprocess

print("Wait for the dependencies installation to complete...", end='')
subprocess.run("mamba install -c conda-forge numpy scikit-learn openmm pdbfixer mdtraj nglview plotly ipywidgets=7", shell=True)
subprocess.run("pip install pdb2pqr", shell=True)
print("done.")

from time import sleep
from pdbfixer import PDBFixer
import locale
import warnings
import threading
import numpy as np
from openmm import *
from openmm.app import *
from openmm.unit import *
from tqdm.notebook import tqdm as tqdm_notebook  # tqdm.tqdm_notebook is deprecated
import os
import mdtraj as md
import nglview as nv
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from collections import Counter
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots
pio.renderers.default = "colab"
from google.colab import output, files, drive
output.enable_custom_widget_manager()
warnings.filterwarnings("ignore")
drive.flush_and_unmount()
drive.mount('/content/drive', force_remount=True)

Wait for the dependencies installation to complete...done.




Drive not mounted, so nothing to flush and unmount.
Mounted at /content/drive


In [2]:
#@title #1. Check working directory
#@markdown Type the name of the working directory inside your Google Drive.

#@markdown If there is no such directory, a new one will be created.

save_directory = "protein" #@param {type:"string"}

save_at = "/content/drive/MyDrive/"+save_directory

if os.path.isdir(save_at):
    print("There already is a directory named "+save_directory+" within the Google Drive.")
    if os.path.exists(save_at+"/MD_processed.xtc"):
        print("There is a finished MD simulation in the directory.")
        print("Change the directory name in order to preserve your work.")

    elif os.path.exists(save_at+"/settings.txt"):
        print("Current MD parameter settings")
        with open(save_at+"/settings.txt", 'r') as f:
            for line in f.readlines():
                print(line, end='')
        print("\nIf you want to change the MD settings, rerun the cell 3. Set the MD parameters")
        print("If you want to continue your simulation from where it has stopped, run the cell 4. Prepare Simulation.")

    elif os.path.exists(save_at+"/protein_with_H.pdb"):
        print("Protonated protein input file exists.")
        print("Proceed to 3. Set MD parameters.")

    else:
        print("The directory "+save_directory+" exists, but it is empty.")
        print("Proceed to the next cell to prepare your PDB input files.")

else:
    # os.makedirs also handles directory names that contain spaces.
    os.makedirs(save_at, exist_ok=True)
    print("A new directory named "+save_directory+" is created within the Google Drive.")

save_at += '/'

A new directory named protein is created within the Google Drive.
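
The branching in cell 1 amounts to checking for marker files in order of progress. A minimal sketch of that logic as a standalone function is shown below. The function name `detect_stage` and the returned labels are hypothetical and used only for illustration; the file names are the ones the notebook checks.

```python
import os
import tempfile

def detect_stage(workdir):
    """Return a label describing how far a working directory has progressed.

    The marker files mirror the ones checked in cell 1; the labels are
    illustrative only.
    """
    if not os.path.isdir(workdir):
        return "missing"
    # Check the most advanced marker first, as cell 1 does.
    markers = [
        ("MD_processed.xtc", "finished"),
        ("settings.txt", "parameters set"),
        ("protein_with_H.pdb", "protonated"),
    ]
    for filename, label in markers:
        if os.path.exists(os.path.join(workdir, filename)):
            return label
    return "empty"

# Demonstration in a throwaway directory rather than Google Drive.
with tempfile.TemporaryDirectory() as tmp:
    stage_before = detect_stage(tmp)
    open(os.path.join(tmp, "settings.txt"), "w").close()
    stage_after = detect_stage(tmp)

print(stage_before, "->", stage_after)
```

Checking the most advanced marker first means a directory that holds both `settings.txt` and `protein_with_H.pdb` is reported by its latest stage.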


In [4]:
#@title #2. Prepare the protein.pdb input file
#@markdown Type the 4-letter PDB ID of your protein. To upload a custom PDB file instead, leave it blank.

PDB_ID = "" # @param {type:"string"}
##@markdown For non-standard residues that are defined in [RCSB Chemical Component Dictionary](https://www.wwpdb.org/data/ccd),
##@markdown type the 3-letter residue name below. For more than one non-standard residues, separate them using commas.
#non_standard_residues = "SEC" # @param {type:"string"}
#@markdown To remove any heterogens, select the checkbox below.
remove_Heterogens = False # @param {type:"boolean"}

if PDB_ID == "":
    pdbfile = files.upload()
    fixer = PDBFixer(filename=next(iter(pdbfile)))
else:
    fixer = PDBFixer(pdbid=PDB_ID)

"""
if non_standard_residues != "":
    ncAAs = non_standard_residues.split(',')
    for ncaa in ncAAs:
        fixer.downloadTemplate(ncaa)
"""
print("Searching & adding missing heavy atoms...", end='')
fixer.findMissingResidues()
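# removeHeterogens() always strips heterogens; its argument only controls whether water is kept.
if remove_Heterogens:
    fixer.removeHeterogens(keepWater=True)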
fixer.findMissingAtoms()
fixer.addMissingAtoms()
print("done.")
#fixer.addMissingHydrogens(pH)

with open(save_at+"protein.pdb", 'w') as f:
    PDBFile.writeFile(fixer.topology, fixer.positions, f)

output.enable_custom_widget_manager()
view1 = nv.NGLWidget()
view1._set_size('750px','500px')
view1.add_structure(nv.FileStructure(save_at+"protein.pdb"))
view1.add_representation("cartoon")
view1.add_licorice("protein or nucleic")
view1

Saving protein.pdb to protein.pdb
Searching & adding missing heavy atoms...done.


NGLWidget()

In [None]:
#@title #3. Set MD parameters
#@markdown Set the parameters for your simulation and run this cell. Check the results below before proceeding to subsequent cells.
forcefield = "Amber14" #@param ["Amber14", "CHARMM36", "AMOEBA 2018"]
waterModel = "(implicit) OBC2 (igb=5)" #@param ["(explicit) TIP3P", "(explicit) SPC/E", "(explicit) TIP4P-Ew", "(explicit) TIP5P", "(explicit) OPC", "(explicit) OPC3", "(implicit) HCT (igb=1)", "(implicit) OBC1 (igb=2)", "(implicit) OBC2 (igb=5)", "(implicit) GBn (igb=7)", "(implicit) GBn2 (igb=8)"]
#@markdown Select the reference pH for the protonation of your residues.
pH = 7.4 # @param {type:"slider", min:0.0, max:14.0, step:0.1}
#@markdown (The ionic strength of the solution in mol/L.)
ion_conc = 0.15 #@param {type:"slider", min:0.0, max:2.0, step:0.05}
#@markdown For explicit water models, set the additional parameters below.

#@markdown (The thickness of the solvent pad around the protein in nanometers.)
padding = 1.5 #@param {type:"slider", min:1.0, max:5.0, step:0.1}
cation = "Na+" #@param ["Li+", "Na+", "K+", "Rb+", "Cs+"]
anion = "Cl-" #@param ["F-", "Cl-", "Br-", "I-"]
#@markdown Set the constraints below and/or increase the hydrogen mass to allow larger integration step sizes.
rigidWater = True #@param {type:"boolean"}
constraints = "HBonds" #@param ["None", "HBonds", "AllBonds", "HAngles"]
hydrogenMass = 2 #@param {type:"slider", min:1.0, max:4.0, step:0.1}
#@markdown Set the precision of the calculation: (cost-efficient) single -> mixed -> double (high precision)
precision = "single" #@param ["single", "mixed", "double"]
#@markdown (Step size in femtoseconds. Number of steps will be determined from this).
step_size = 2 #@param {type:"slider", min:0.1, max:4, step:0.1}
#@markdown (Temperature of the system in **degrees Celsius**)
temperature = 25 #@param {type:"slider", min:0.0, max:100.0, step:1.0}
#@markdown (Pressure on the system in bar.)
pressure = 1.0 #@param {type:"slider", min:0.0, max:100.0, step:1.0}
#@markdown (Equilibrate the system prior to the production run for n picoseconds.)
eq_time = 100 #@param {type:"slider", min:0.0, max:1000.0, step:10.0}
#@markdown (Run the production MD simulation for n nanoseconds.)
run_time = 1 #@param {type:"slider", min:1, max:500.0, step:1}
#@markdown (Save a trajectory frame every n picoseconds.)
save_interval = 10 #@param {type:"slider", min:1, max:50, step:1}
#@markdown (Back up the production run's progress to Google Drive every n minutes of real time.
#@markdown Set it to smaller values if you are using free Colab sessions.)
backup_interval = 10 #@param {type:"slider", min:5, max:60, step:5}


if forcefield == "Amber14" and waterModel == "(explicit) TIP5P":
    print("ERROR : AMBER forcefield does not support TIP5P water model!")
    print("Please choose other options.")

elif forcefield == "CHARMM36" and waterModel.startswith("(explicit) OPC"):
    print("ERROR : CHARMM forcefield does not support OPC water models!")
    print("Please choose other options.")

elif forcefield == "CHARMM36" and padding < 1.2:
    print("ERROR : In CHARMM forcefield, the minimum padding distance should be larger than the cutoff distance: 1.2 nanometers!")
    print("Please choose other options.")

else:
    if forcefield == "AMOEBA 2018":
        print("WARNING : No constraints should be applied in AMOEBA, so your choices in constraints and rigidWater do not count.")
        if waterModel.startswith("(explicit)"):
            print("WARNING : There is only one explicit water model available in AMOEBA, so your choice in waterModel does not count.")
        if hydrogenMass == 1.0 and step_size > 0.5:
            print(f"WARNING : Since there are no applied constraints, your simulation with step size of {step_size} fs may blow up!")
        elif hydrogenMass < 2.0 and step_size > 2.0:
            print(f"WARNING : Since there are no applied constraints, your simulation with step size of {step_size} fs may blow up.")

    if waterModel.startswith("(implicit)"):
        print("WARNING : In an implicit solvent condition, your choices in padding, ion_conc, cation, anion, rigidWater, and pressure do not count.")

    if hydrogenMass == 1.0 and not rigidWater and constraints == 'None' and step_size > 0.5:
        print(f'WARNING : You did not set any constraints. With no constraints and the step size of {step_size} fs, your simulation may blow up!')

    elif hydrogenMass == 1.0 and constraints == 'HBonds' and step_size > 2.0:
        print(f'WARNING : With step size of {step_size} fs, your simulation may blow up. Increase hydrogenMass or change constraints into HAngles.')

    # Setting the input parameters for PDB2PQR
    if waterModel.startswith("(explicit)"):
        if forcefield == 'Amber14':
            ffin = "AMBER"
        elif forcefield == 'CHARMM36':
            ffin = "CHARMM"
        else:
            ffin = "PARSE"
    else:
        ffin = "PARSE"

    # For implicit solvent systems, PARSE forcefield is recommended
    # For the compatibility with Modeller, set --ffout as AMBER
    params = f' --ff {ffin} --ffout AMBER --pdb-output {save_at}protein_with_H.pdb'
    params += f' --titration-state-method propka --with-ph {pH} --pH {pH} --drop-water'
    params += f' {save_at}protein.pdb protein_with_H.pqr'

    print("Adding hydrogens to the protein.pdb...", end='')
    subprocess.run("pdb2pqr"+params, shell=True)

    if not os.path.exists(save_at+"protein_with_H.pdb"):
        print("PDB2PQR job failed.")
        print("Using PDBfixer instead...", end='')
        fixer = PDBFixer(filename=save_at+"protein.pdb")
        fixer.addMissingHydrogens(pH)
        with open(save_at+"protein_with_H.pdb", 'w') as f:
            PDBFile.writeFile(fixer.topology, fixer.positions, f)

    print("done.")

    view2 = nv.NGLWidget()
    view2._set_size('750px','500px')
    view2.add_structure(nv.FileStructure(save_at+"protein_with_H.pdb"))
    view2.add_representation("cartoon")
    view2.add_licorice("protein or nucleic")
    display(view2)

    with open(save_at+"settings.txt", 'w') as f:
        f.write("forcefield : "+forcefield)
        f.write("\nconstraints : "+constraints)
        f.write("\npadding : "+str(padding))
        f.write("\nion_conc : "+str(ion_conc))
        f.write("\ncation : "+cation)
        f.write("\nanion : "+anion)
        f.write("\nhydrogenMass : "+str(hydrogenMass))
        f.write("\nwaterModel : "+waterModel)
        f.write("\nrigidWater : "+str(rigidWater))
        f.write("\nstep_size : "+str(step_size))
        f.write("\ntemperature : "+str(temperature))
        f.write("\npressure : "+str(pressure))
        f.write("\neq_time : "+str(eq_time))
        f.write("\nrun_time : "+str(run_time))
        f.write("\nsave_interval : "+str(save_interval))
        f.write("\nbackup_interval : "+str(backup_interval))
        f.write("\nprecision : "+precision)
    print("Settings saved.")
    print("Now run the below cell 4. Prepare Simulation.")

Adding hydrogens to the protein.pdb...PDB2PQR job failed.
Using PDBfixer instead...done.


NGLWidget()

Settings saved.
Now run the below cell 4. Prepare Simulation.
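
The `settings.txt` file written above is a plain list of `key : value` lines, and the step counts used later follow from the step size by unit conversion. The sketch below illustrates both ideas without OpenMM. The helpers `parse_bool` and `parse_settings` are hypothetical names introduced here for illustration, not part of the notebook. Note that Python's `bool()` treats any non-empty string as true, so boolean settings need an explicit string comparison.

```python
def parse_bool(text):
    """Interpret the strings produced by str(True) / str(False)."""
    return text.strip() == "True"

def parse_settings(lines):
    """Split each 'key : value' line once, keeping the value as a string."""
    settings = {}
    for line in lines:
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings

# A few lines in the same format as settings.txt (values chosen for illustration).
example = [
    "rigidWater : False",
    "step_size : 2",
    "run_time : 1",
    "save_interval : 10",
]
raw = parse_settings(example)

rigid_water = parse_bool(raw["rigidWater"])
step_size_fs = float(raw["step_size"])
run_time_ns = float(raw["run_time"])
save_interval_ps = float(raw["save_interval"])

# 1 ns = 1e6 fs and 1 ps = 1e3 fs
total_steps = round(run_time_ns * 1e6 / step_size_fs)
steps_per_frame = round(save_interval_ps * 1e3 / step_size_fs)
frames = total_steps // steps_per_frame

# bool() of a non-empty string is always True, even for "False".
print(rigid_water, bool(raw["rigidWater"]))
print(total_steps, steps_per_frame, frames)
```

With a 2 fs step, a 1 ns run therefore takes 500,000 integration steps and, at one frame per 10 ps, stores 100 frames.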


In [None]:
#@title #4. Prepare Simulation

def read_settings(save_at):
    """
    Reads the parameter values from the settings.txt file.
    Returns the settings dictionary.
    """
    if not os.path.exists(save_at+"settings.txt"):
        return None

    with open(save_at+"settings.txt", 'r') as f:
        for line in f.readlines():
            if line.startswith("forcefield"):
                forcefield = line.split(":")[1].strip()
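            elif line.startswith("constraints"):
                # createSystem() expects the OpenMM constants (or None), not their names as strings.
                constraints = {"None": None, "HBonds": HBonds, "AllBonds": AllBonds,
                               "HAngles": HAngles}[line.split(":")[1].strip()]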
            elif line.startswith("padding"):
                padding = float(line.split(":")[1].strip())*nanometers
            elif line.startswith("ion_conc"):
                ion_conc = float(line.split(":")[1].strip())*molar
            elif line.startswith("cation"):
                cation = line.split(":")[1].strip()
            elif line.startswith("anion"):
                anion = line.split(":")[1].strip()
            elif line.startswith("hydrogenMass"):
                hydrogenMass = float(line.split(":")[1].strip())*amu
            elif line.startswith("waterModel"):
                waterModel = line.split(":")[1].strip()
            elif line.startswith("rigidWater"):
                # bool("False") is True, so compare against the saved string instead.
                rigidWater = line.split(":")[1].strip() == "True"
            elif line.startswith("step_size"):
                step_size = float(line.split(":")[1].strip())*femtoseconds
            elif line.startswith("temperature"):
                temperature = (float(line.split(":")[1].strip())+273.15)*kelvin
            elif line.startswith("pressure"):
                pressure = (float(line.split(":")[1].strip()))*bar
            elif line.startswith("eq_time"):
                eq_time = float(line.split(":")[1].strip())*picoseconds
            elif line.startswith("run_time"):
                run_time = float(line.split(":")[1].strip())*nanoseconds
            elif line.startswith("save_interval"):
                save_interval = float(line.split(":")[1].strip())*picoseconds
            elif line.startswith("backup_interval"):
                backup_interval = float(line.split(":")[1].strip())*60
            elif line.startswith("precision"):
                precision = line.split(":")[1].strip()

    settings = {'forcefield':forcefield, 'constraints':constraints, 'padding':padding,
                'ion_conc':ion_conc, 'cation':cation, 'anion':anion, 'hydrogenMass':hydrogenMass,
                'waterModel':waterModel, 'rigidWater':rigidWater, 'step_size':step_size,
                'temperature':temperature,'pressure':pressure, 'eq_time':eq_time,
                'run_time':run_time, 'save_interval':save_interval, 'backup_interval':backup_interval,
                'precision':precision}

    return settings


def prepare_simulation(save_at, status):

    """
    Prepares the system object. Adds a solvent box if necessary.
    Creates the simulation object and minimizes it.
    Returns the NGLViewer widget.
    """

    s = read_settings(save_at)

    if s['forcefield'] == "Amber14":
        if s['waterModel'] == "(explicit) TIP3P":
            ff = ForceField('amber14-all.xml', 'amber14/tip3p.xml')
            water = 'tip3p'
        elif s['waterModel'] == "(explicit) SPC/E":
            ff = ForceField('amber14-all.xml', 'amber14/spce.xml')
            water = 'spce'
        elif s['waterModel'] == "(explicit) TIP4P-Ew":
            ff = ForceField('amber14-all.xml', 'amber14/tip4pew.xml')
            water = 'tip4pew'
        elif s['waterModel'] == "(explicit) TIP5P":
            ff = ForceField('amber14-all.xml', 'tip5p.xml')
            water = 'tip5p'
        elif s['waterModel'] == "(explicit) OPC":
            ff = ForceField('amber14-all.xml', 'amber14/opc.xml')
            water = 'tip4pew'
        elif s['waterModel'] == "(explicit) OPC3":
            ff = ForceField('amber14-all.xml', 'amber14/opc3.xml')
            water = 'tip3p'
        elif s['waterModel'] == "(implicit) HCT (igb=1)":
            ff = ForceField('amber14-all.xml', 'implicit/hct.xml')
        elif s['waterModel'] == "(implicit) OBC1 (igb=2)":
            ff = ForceField('amber14-all.xml', 'implicit/obc1.xml')
        elif s['waterModel'] == "(implicit) OBC2 (igb=5)":
            ff = ForceField('amber14-all.xml', 'implicit/obc2.xml')
        elif s['waterModel'] == "(implicit) GBn (igb=7)":
            ff = ForceField('amber14-all.xml', 'implicit/gbn.xml')
        elif s['waterModel'] == "(implicit) GBn2 (igb=8)":
            ff = ForceField('amber14-all.xml', 'implicit/gbn2.xml')
        else:
            print("Error in reading the Amber14 forcefield parameters!")
            return None

    elif s['forcefield'] == "CHARMM36":
        if s['waterModel'] == "(explicit) TIP3P":
            ff = ForceField('charmm36.xml', 'charmm36/water.xml')
            water = 'tip3p'
        elif s['waterModel'] == "(explicit) SPC/E":
            ff = ForceField('charmm36.xml', 'charmm36/spce.xml')
            water = 'spce'
        elif s['waterModel'] == "(explicit) TIP4P-Ew":
            ff = ForceField('charmm36.xml', 'charmm36/tip4pew.xml')
            water = 'tip4pew'
        elif s['waterModel'] == "(explicit) TIP5P":
            ff = ForceField('charmm36.xml', 'charmm36/tip5p.xml')
            water = 'tip5p'
        elif s['waterModel'] == "(explicit) OPC":
            ff = ForceField('charmm36.xml', 'opc.xml')
            water = 'tip4pew'
        elif s['waterModel'] == "(explicit) OPC3":
            ff = ForceField('charmm36.xml', 'opc3.xml')
            water = 'tip3p'
        elif s['waterModel'] == "(implicit) HCT (igb=1)":
            ff = ForceField('charmm36.xml', 'implicit/hct.xml')
        elif s['waterModel'] == "(implicit) OBC1 (igb=2)":
            ff = ForceField('charmm36.xml', 'implicit/obc1.xml')
        elif s['waterModel'] == "(implicit) OBC2 (igb=5)":
            ff = ForceField('charmm36.xml', 'implicit/obc2.xml')
        elif s['waterModel'] == "(implicit) GBn (igb=7)":
            ff = ForceField('charmm36.xml', 'implicit/gbn.xml')
        elif s['waterModel'] == "(implicit) GBn2 (igb=8)":
            ff = ForceField('charmm36.xml', 'implicit/gbn2.xml')
        else:
            print("Error in reading the CHARMM36 forcefield parameters!")
            return None

    elif s['forcefield'] == "AMOEBA 2018":
        if s['waterModel'].startswith("(explicit)"):
            ff = ForceField('amoeba2018.xml')
            water = 'tip3p'
        elif s['waterModel'].startswith("(implicit)"):
            ff = ForceField('amoeba2018.xml', 'amoeba2018_gk.xml')
        else:
            print("Error in reading the AMOEBA forcefield parameters!")
            return None

    else:
        print("Error in reading the forcefield parameters!")
        return None

    pdb = PDBFile(save_at+"protein_with_H.pdb")
    model = Modeller(pdb.topology, pdb.positions)

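    # Use the saved settings rather than the notebook-level variable, which is lost after a reconnect.
    if s['waterModel'].startswith("(explicit)"):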
        print("Adding solvent...", end='')

        if s['forcefield'] == "AMOEBA 2018":
            # The AMOEBA forcefield does not provide the NonbondedForce that addSolvent needs to set the box dimensions from the padding distance.
            # As a workaround, build the solvated system with the Amber forcefield and then reuse that topology.
            ff2 = ForceField('amber14-all.xml', 'amber14/tip3p.xml')
            model2 = Modeller(pdb.topology, pdb.positions)
            model2.addExtraParticles(ff)
            model2.addSolvent(ff2, model='tip3p', padding=s['padding'], positiveIon=s['cation'],
                              negativeIon=s['anion'], ionicStrength=s['ion_conc'], neutralize=True)
            model = Modeller(model2.topology, model2.positions)
            model.addExtraParticles(ff)

        else:
            model.addExtraParticles(ff)
            model.addSolvent(ff, model=water, padding=s['padding'], positiveIon=s['cation'],
                             negativeIon=s['anion'], ionicStrength=s['ion_conc'], neutralize=True)
        print("done.")

        print("Creating system...", end='')

        if s['forcefield'] == "Amber14":
            system = ff.createSystem(model.topology, nonbondedMethod=PME,
                                     nonbondedCutoff=1.0*nanometer, constraints=s['constraints'],
                                     rigidWater=s['rigidWater'], removeCMMotion=True,
                                     hydrogenMass=s['hydrogenMass'])
        elif s['forcefield'] == "CHARMM36":
            # CHARMM forcefield recommends the use of switch distance at 1.0 nm and cutoff distance at 1.2 nm.
            system = ff.createSystem(model.topology, nonbondedMethod=PME,
                                     nonbondedCutoff=1.2*nanometer, constraints=s['constraints'],
                                     rigidWater=s['rigidWater'], removeCMMotion=True,
                                     switchDistance=1.0*nanometer,
                                     hydrogenMass=s['hydrogenMass'])
        else:
            # AMOEBA forcefields do not support constraints
            system = ff.createSystem(model.topology, nonbondedMethod=PME,
                                     nonbondedCutoff=1.0*nanometer, vdwCutoff=1.2*nanometer,
                                     constraints=None,
                                     rigidWater=False, removeCMMotion=True,
                                     polarization='extrapolated',
                                     hydrogenMass=s['hydrogenMass'])

    else:
        print("Creating system...", end='')

        # Set solvent dielectric constant according to the empirical values
        """ Source : Hasted, J.B., Ritson, D.M., Collie, C.H.,
                    "Dielectric Properties of Aqueous Ionic Solutions. Parts I and II."
                    Journal of Chemical Physics 16, 1 (1948).
                    http://dx.doi.org/10.1063/1.1746645
        """

        if s['forcefield'] == "AMOEBA 2018":
            if s['cation'] == 'Li+':
                dcation = -11
            elif s['cation'] == 'Na+' or s['cation'] == 'K+':
                dcation = -8
            elif s['cation'] == 'Rb+' or s['cation'] == 'Cs+':
                dcation = -7

            if s['anion'] == 'F-':
                danion = -5
            elif s['anion'] == 'Cl-' or s['anion'] == 'Br-':
                danion = -3
            elif s['anion'] == 'I-':
                danion = -7

            dielectric = 80 + (dcation+danion)*s['ion_conc']/molar
            model.addExtraParticles(ff)
            system = ff.createSystem(model.topology, nonbondedMethod=NoCutoff,
                                     soluteDielectric=1.0, solventDielectric=dielectric,
                                     constraints=None, polarization="extrapolated",
                                     hydrogenMass=s['hydrogenMass'])
        else:
            # for other forcefields, setting the kappa value is possible
            # kappa is the inverse of Debye length.
            kappa = 367.434915*sqrt(s['ion_conc']/molar/78.5/(s['temperature']/kelvin))/nanometer
            model.addExtraParticles(ff)
            system = ff.createSystem(model.topology, nonbondedMethod=NoCutoff,implicitSolventKappa=kappa,
                                     constraints=s['constraints'], hydrogenMass=s['hydrogenMass'])

    # save the system as a serialized XML file.
    with open(save_at+'system.xml', 'w') as f:
        f.write(XmlSerializer.serialize(system))

    print("done.")
    print("Creating simulation...", end='')
    integrator = LangevinMiddleIntegrator(s['temperature'],1/picosecond, s['step_size'])

    platform = Platform.getPlatformByName('CUDA')
    properties = {'Precision': s['precision']}

    simulation = Simulation(model.topology, system, integrator, platform, properties)
    print("done.")

    print("Minimizing...", end='')
    simulation.context.setPositions(model.positions)
    simulation.minimizeEnergy(tolerance=0.001*kilojoules/(nanometer*mole))
    positions = simulation.context.getState(getPositions=True).getPositions()
    with open(save_at+"minimized.pdb", 'w') as f:
        PDBFile.writeFile(simulation.topology, positions, f)
    print("done.")

    w = nv.NGLWidget()
    w._set_size('750px','500px')
    w.add_component(nv.FileStructure(save_at+"minimized.pdb"))
    w.add_cartoon("protein or nucleic")
    if s['waterModel'].startswith("(explicit)"):
        w.add_point("water")
        w.add_spacefill("ion")
    w.center()

    # Initialize the simulation context
    simulation.context.setVelocitiesToTemperature(s['temperature'])
    simulation.context.setStepCount(0)
    simulation.context.setTime(0)
    simulation.saveCheckpoint(save_at+"checkpoint.chk")

    # Save the simulation context as a callable object in order to modify it on-the-fly
    status[4] = simulation

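    # Read eq_time from the saved settings; the notebook-level variable may not exist after a reconnect.
    if s['eq_time']/picoseconds == 0: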
        status[2] = "production"
        status[1] = "prepared"
        print("Go to the MD production.")
    else:
        status[2] = "NVT"
        status[1] = "prepared"
        print("Proceed to the equilibration step.")

    return w


def update_status(save_at, status):
    """
    Updates the status variable, which contains the current status info of the simulation.
    Returns True for the started job
    """
    filelist = os.listdir(save_at[:-1])

    job_flag = [False, False, False]

    # Search the working directory for the trajectory files.
    for f in filelist:
        name = f.split('.')[0]
        ext = f.split('.')[-1]

        if ext == "xtc":
            if name.startswith("MD"):
                job_flag[2] = True
            elif name.startswith("NPT"):
                job_flag[1] = True
            elif name.startswith("NVT"):
                job_flag[0] = True

    if job_flag[2]:
        job = "production"
    elif job_flag[1]:
        job = "NPT"
    elif job_flag[0]:
        job = "NVT"
    else:
        job = None

    stride_flag = 0

    if job == "production":
        for f in filelist:
            name = f.split('.')[0]
            ext = f.split('.')[-1]
            if ext == "xtc" and name.startswith("MD"):
                suffix = name.split("_")[-1]
                # Skip files without a numeric stride suffix (e.g. MD_processed.xtc).
                if suffix.isdigit() and int(suffix) > stride_flag:
                    stride_flag = int(suffix)

    status[2] = job
    status[3] = stride_flag

    if job is None:
        return False
    else:
        return True



def resume_simulation(save_at, status):
    """
    Resumes the simulation from the checkpoint.chk file.
    Returns True for the resumed job.
    """
    s = read_settings(save_at)

    status[0] = False

    if os.path.exists(save_at+"checkpoint.chk"):
        with open(save_at+'system.xml', 'r') as f:
            system = XmlSerializer.deserialize(f.read())

        # Somehow resuming from the state.xml does not work with MonteCarloBarostat.
        # Resumes from the checkpoint.chk and it is best to hope the device settings have not changed. :(
        integrator = LangevinMiddleIntegrator(s['temperature'],1/picosecond, s['step_size'])
        platform = Platform.getPlatformByName('CUDA')
        properties = {'Precision': s['precision']}
        pdb = PDBFile(save_at+"minimized.pdb")
        status[4] = Simulation(pdb.topology, system, integrator, platform, properties)

        print("Resuming from the checkpoint...", end='')
        status[4].loadCheckpoint(save_at+"checkpoint.chk")
        print("done.")

        if status[2] == "NVT" or status[2] == "NPT":
            total_steps = int(s['eq_time']/s['step_size'])
        else:
            total_steps = int(s['run_time']/s['step_size'])

        if status[4].context.getStepCount() < total_steps:
            status[1] = "running"
            return True

        else:
            status[1] = "finished"
            return False

    else:
        print("Checkpoint file not found!")
        print("Unfortunately, you should erase the .xtc trajectory files and start over.")
        return False


# When returning from the connection losses, check whether variables are alive.
if 'save_at' not in vars():
    print("Save location not set.")
    print("Run the cell 1. Check the working directory and follow its instructions.")

else:
    s =  read_settings(save_at)

    if s is None:
        print("Settings not set!")
        print("Run the cell 3. Set MD parameters above and rerun this cell.")

    else:
        # status variable
        # status[0] : checks for the halt
        # status[1] : shows the current state of the simulation (prepared, started, running, or finished)
        # status[2] : shows the current stage of the simulation (NVT, NPT, or production)
        # status[3] : shows the current stride of the production run (MD production trajectory files are split into strides in case of disconnections)
        # status[4] : simulation object

        status = [False, "prepared", None, 0, None]

        if not update_status(save_at, status):
            print("No preexisting trajectory files detected. Starting from scratch...")
            view3 = prepare_simulation(save_at, status)
            if view3 is None:
                print("Simulation settings not set!")
                print("Change the settings above and rerun this cell.")
            else:
                display(view3)

        else:
            if resume_simulation(save_at, status):
                if status[2] == "NVT":
                    print("Proceed to the cell 5. NVT equilibrate to continue NVT equilibration step.")
                elif status[2] == "NPT":
                    print("Proceed to the cell 6. NPT equilibrate to continue NPT equilibration step.")
                else:
                    print("Proceed to the cell 7. Production Run to continue the MD production run.")
            else:
                if status[2] == "NVT" and s['waterModel'].startswith("(explicit)"):
                    print("Proceed to the cell 6. NPT equilibrate to start the NPT equilibration.")
                elif status[2] == "NPT" or (status[2] == "NVT" and s['waterModel'].startswith("(implicit)")):
                    print("Proceed to the cell 7. Production Run to start the MD production run.")
                else:
                    print("All MD simulation steps have finished.")
                    print("Proceed to the cell 8. Analysis.")

No preexisting trajectory files detected. Starting from scratch...
Creating system...done.
Creating simulation...done.
Minimizing...done.
Proceed to the equilibration step.


NGLWidget()
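
For the implicit-solvent branches, cell 4 derives two scalar inputs from the salt concentration: a screening parameter `kappa` for Amber14 and CHARMM36, and a reduced solvent dielectric constant for AMOEBA. The sketch below reproduces that arithmetic with plain floats so it can be checked without OpenMM. The function names `screening_kappa` and `salt_dielectric` are hypothetical; the prefactor 367.434915, the reference dielectric values (78.5 and 80), and the per-ion decrements are copied from the cell above.

```python
import math

def screening_kappa(ion_conc_molar, temperature_kelvin, solvent_dielectric=78.5):
    """Screening parameter in 1/nm, using the same expression as cell 4."""
    return 367.434915 * math.sqrt(
        ion_conc_molar / (solvent_dielectric * temperature_kelvin)
    )

def salt_dielectric(ion_conc_molar, dcation, danion, water_dielectric=80.0):
    """Linear dielectric decrement, as used for the AMOEBA implicit solvent in cell 4."""
    return water_dielectric + (dcation + danion) * ion_conc_molar

# Default notebook settings: 0.15 mol/L NaCl at 25 degrees Celsius.
kappa = screening_kappa(0.15, 25 + 273.15)
dielectric = salt_dielectric(0.15, dcation=-8, danion=-3)

print(f"kappa = {kappa:.3f} 1/nm")
print(f"solvent dielectric = {dielectric:.2f}")
```

Both quantities depend only on the ionic strength and temperature, which is why the ion type matters for the AMOEBA dielectric but not for `kappa`.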

In [None]:
#@title #5. NVT equilibrate

def NVT_equilibrate(save_at, status):
    """
    Runs NVT equilibration.
    Returns True if the equilibration has been performed.
    """

    if status[2] == "NVT" and status[1] == "finished":
        print("NVT equilibration already finished.")
        return False

    elif status[2] == "NPT" or status[2] == "production":
        print("Steps subsequent to the NVT equilibration already running.")
        return False

    s = read_settings(save_at)

    total_steps = int(s['eq_time']/s['step_size'])

    if total_steps == 0:
        print("eq_time = 0, so no equilibration is performed.")
        return False

    step = status[4].context.getStepCount()

    if step >= total_steps:
        print("NVT equilibration has already been run through eq_time.")
        print("Increase the eq_time or proceed to the next step.")
        return False

    status[4].reporters.clear()

    logger = StateDataReporter(status[2]+"_log.txt", 10, step=True,
                               potentialEnergy=True, kineticEnergy=True,
                               temperature=True, separator=',')

    save = int(s['save_interval']/s['step_size'])

    # Periodic-box wrapping only applies to explicit solvent.
    box = s['waterModel'].startswith("(explicit)")

    # Append to the trajectory file if an earlier run left one behind.
    append = os.path.exists(save_at+status[2]+".xtc")

    xtc = XTCReporter(save_at+status[2]+".xtc", save, append=append,
                      enforcePeriodicBox=box)

    status[4].reporters.append(logger)
    status[4].reporters.append(xtc)

    if step == 0:
        print("Starting the NVT equilibration...")
    else:
        print("Continuing the NVT equilibration...")

    status[1] = "running"
    for i in tqdm_notebook(range(total_steps-step)):
        status[4].step(1)

    status[4].saveCheckpoint(save_at+"checkpoint.chk")

    print("NVT equilibration finished.")
    status[1] = "finished"

    return True

if NVT_equilibrate(save_at, status):
    time = []
    temp = []
    s = read_settings(save_at)

    with open("NVT_log.txt", 'r') as f:
        for line in f.readlines():
            if line.startswith("#"):
                continue
            line = line.strip().split(',')
            step = float(line[0])
            time.append(step*s['step_size']/picoseconds)
            temp.append(round(float(line[3]),2))

    target_temp = round(s['temperature']/kelvin,2)

    fig1 = go.Figure()
    fig1.add_trace(go.Scatter(x=time, y=temp, mode='lines', name='Temperature'))
    fig1.add_hline(y=target_temp, annotation_text="target T : "+str(target_temp)+" K")
    fig1.update_layout(title='NVT equilibration', xaxis_title='Time (ps)', yaxis_title='Temperature (K)', width=750, height=500)
    fig1.show()

    print("Check whether the system's apparent temperature stabilizes (flattens).")
    print("If not, go back to the cell 3. Set MD parameters and increase the eq_time.")
    print("If yes, proceed to the next cell.")
    print("You may zoom, pan, or download the above interactive widget.")

else:
    print("Proceed to the next cell.")

Starting the NVT equilibration...


  0%|          | 0/50000 [00:00<?, ?it/s]

NVT equilibration finished.


Check whether the system's apparent temperature stabilizes (flattens).
If not, go back to the cell 3. Set MD parameters and increase the eq_time.
If yes, proceed to the next cell.
You may zoom, pan, or download the above interactive widget.
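
Besides inspecting the plot by eye, the plateau can be checked numerically. The sketch below is a dependency-free illustration, not part of the notebook: `temperature_drift` is a hypothetical helper, and the log text is synthetic. It compares the mean temperature of the second half of a comma-separated log with that of the first half.

```python
import csv
import io
from statistics import mean


def temperature_drift(log_text, temperature_column=3):
    """Mean temperature of the second half of a log minus that of the first half.

    log_text is the content of a comma-separated log whose header line
    starts with '#'; temperature_column=3 matches the column order used
    in the NVT cell above (step, potential energy, kinetic energy, temperature).
    """
    rows = [row for row in csv.reader(io.StringIO(log_text))
            if row and not row[0].startswith("#")]
    temps = [float(row[temperature_column]) for row in rows]
    half = len(temps) // 2
    return mean(temps[half:]) - mean(temps[:half])


# Synthetic log for illustration only, not real simulation output.
example_log = """#"Step","Potential Energy (kJ/mole)","Kinetic Energy (kJ/mole)","Temperature (K)"
10,-100.0,50.0,280.0
20,-101.0,51.0,290.0
30,-102.0,52.0,300.0
40,-103.0,53.0,300.0
"""
print(temperature_drift(example_log))
```

A drift close to zero suggests that the temperature has flattened; what counts as "close" depends on the system size and is left to the user.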


In [None]:
#@title #6. NPT equilibrate

def NPT_equilibrate(save_at, status):
    """
    Runs NPT equilibration.
    Returns True if the equilibration has been performed.
    """

    if status[2] == "NVT" and status[1] != "finished":
        print("Finish the NVT equilibration first!")
        return False

    elif status[2] == "NPT" and status[1] == "finished":
        print("NPT equilibration already finished.")
        print("Proceed to the next cell.")
        return False

    elif status[2] == "production":
        print("Production run is already running.")
        print("Proceed to the next cell.")
        return False

    s = read_settings(save_at)

    if s['waterModel'].startswith("(implicit)"):
        print("No need to NPT equilibrate in implicit solvent model!")
        print("Proceed to the next cell.")
        return False

    total_steps = int(s['eq_time']/s['step_size'])

    if total_steps == 0:
        print("eq_time = 0, so no equilibration is performed.")
        print("Proceed to the next cell.")
        return False

    if status[2] == "NVT" and status[1] == "finished":
        print("Starting the NPT equilibration...")
        status[2] = "NPT"
        status[4].context.setStepCount(0)
        status[4].context.setTime(0)
        force = MonteCarloBarostat(s['pressure'], s['temperature'])
        status[4].system.addForce(force)
        # The context should be reinitialized after adding a force.
        status[4].context.reinitialize(preserveState=True)
        # Update the system.xml file.
        with open(save_at+'system.xml', 'w') as f:
            f.write(XmlSerializer.serialize(status[4].system))
        append = False
    else:
        print("Continuing the NPT equilibration...")
        append = True

    status[4].reporters.clear()
    step = status[4].context.getStepCount()

    if step >= total_steps:
        print("NPT equilibration has already been run through eq_time.")
        print("Increase the eq_time or proceed to the next cell.")
        return False

    save = int(s['save_interval']/s['step_size'])

    logger = StateDataReporter(status[2]+"_log.txt", 10, potentialEnergy=True,
                               kineticEnergy=True, temperature=True,
                               volume=True, density=True, step=True)

    xtc = XTCReporter(save_at+status[2]+".xtc", save, append=append,
                      enforcePeriodicBox=True)

    status[4].reporters.append(logger)
    status[4].reporters.append(xtc)

    status[1] = "started"
    for i in tqdm_notebook(range(total_steps-step)):
        status[4].step(1)

    status[4].saveCheckpoint(save_at+"checkpoint.chk")
    print("NPT equilibration finished.")
    status[1] = "finished"

    return True

if NPT_equilibrate(save_at, status):
    time = []
    temp = []
    dens = []
    s = read_settings(save_at)

    with open("NPT_log.txt", 'r') as f:
        for line in f.readlines():
            if line.startswith("#"):
                continue
            line = line.strip().split(',')
            step = float(line[0])
            time.append(step*s['step_size']/picoseconds)
            temp.append(round(float(line[3]),2))
            dens.append(round(float(line[5]),5))

    target_temp = round(s['temperature']/kelvin,2)

    fig2 = make_subplots(specs=[[{"secondary_y":True}]])
    fig2.add_trace(go.Scatter(x=time, y=temp, mode='lines', name='Temperature'), secondary_y=False)
    fig2.add_hline(y=target_temp, annotation_text="target T : "+str(target_temp)+" K")
    fig2.add_trace(go.Scatter(x=time, y=dens, mode='lines', name='Density'), secondary_y=True)
    fig2.update_layout(title='NPT equilibration', width=750, height=500)
    fig2.update_xaxes(title_text="Time (ps)")
    fig2.update_yaxes(title_text="Temperature (K)", secondary_y=False)
    fig2.update_yaxes(title_text="Density (g/mL)", secondary_y=True)
    fig2.show()

    print("Check whether the system's apparent temperature stabilizes to the desired value.")
    print("Also check whether the system's density stabilizes near 1.04 g/mL under physiological conditions.")
    print("If not, go back to the cell 3. Set MD parameters and increase the eq_time.")
    print("If yes, proceed to the next cell.")

No need to NPT equilibrate in implicit solvent model!
Proceed to the next cell.
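
Both equilibration cells obtain the number of steps as `int(eq_time/step_size)`. Floating-point division can land just below a whole number, in which case `int()` truncates one step away. The sketch below shows the effect with plain floats; `n_steps` is a hypothetical helper, and the 100 ps / 0.002 ps values are assumptions chosen only because they reproduce the 50000 steps shown in the NVT progress bar.

```python
def n_steps(duration_ps, step_size_ps):
    """Number of integration steps covering duration_ps, rounded to the nearest integer."""
    return round(duration_ps / step_size_ps)


# Truncation versus rounding for a quotient that is not exactly representable.
print(int(0.3 / 0.1), n_steps(0.3, 0.1))

# Assumed example values: 100 ps of equilibration at 0.002 ps per step.
print(n_steps(100, 0.002))
```

Whether the notebook's own settings ever hit such a case depends on the chosen values, so this is a precaution rather than a reported bug.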


In [None]:
#@title #7. Production Run
#@markdown Run this cell, then the one below, to visualize the progress of the MD production run.

#@markdown It's best to keep one cell running to maintain your Colab runtime!

#@markdown **Warning: visualizing MD snapshots takes up a lot of memory for large systems!**

#@markdown Start with a large update_period and decrease it gradually.

#@markdown Set the update_period to 0 seconds to turn off the visualization.

#@markdown Snapshots may glitch from time to time; the saved trajectory is not affected.

#@markdown Regardless of the update_period, the info text showing the simulation progress is refreshed every second.

def check_thread():
    """
    Checks whether the simulation is running in the background.
    Returns True if the simulation is running.
    """
    warnings.filterwarnings("ignore")
    simulationIsRunning = False
    for t in threading.enumerate():
        if t.name == "MD":
            simulationIsRunning = True
            break
    return simulationIsRunning


def run_simulation(save_at, total_steps, box, save_period, backup_interval, status):
    """
    Runs the MD simulation.
    Starts a new trajectory file (MD_<stride>.xtc) after each backup interval.
    """
    while not status[0]:
        current_step = status[4].context.getStepCount()

        if current_step == 0 or not os.path.exists("MD_log.txt"):
            if current_step == 0:
                status[1] = "started"
            elif current_step < total_steps:
                status[1] = "running"
            else:
                status[1] = "finished"
                print("Simulation length reached the given value!")
                break

            logger = StateDataReporter("MD_log.txt", 50, append=False, step=True,
                                       potentialEnergy=True, temperature=True,
                                       progress=True, remainingTime=True,
                                       speed=True, elapsedTime=True,
                                       totalSteps=total_steps)

        elif current_step < total_steps:
            status[1] = "running"
            logger = StateDataReporter("MD_log.txt", 50, append=True, step=True,
                                       potentialEnergy=True, temperature=True,
                                       progress=True, remainingTime=True,
                                       speed=True, elapsedTime=True,
                                       totalSteps=total_steps)

        else:
            status[1] = "finished"
            print("Simulation finished.")
            break


        xtc = XTCReporter(save_at+"MD_"+str(status[3])+".xtc", save_period, append=False,
                          enforcePeriodicBox=box)

        status[4].reporters.clear()
        status[4].reporters.append(logger)
        status[4].reporters.append(xtc)

        # Run in 1-second wall-clock slices, int(backup_interval) of them, so that
        # a halt request or the end of the run is noticed promptly; then save the progress
        for i in range(int(backup_interval)):
            status[4].runForClockTime(1*second)
            if status[0]:
                break
            elif status[4].context.getStepCount() >= total_steps:
                break

        status[4].saveCheckpoint(save_at+"checkpoint.chk")
        status[3] += 1

    return


def production_run(save_at, status):
    """
    Performs the production run.
    Returns True if the run has been performed.
    """
    if status[1] != "finished" and status[2] != "production":
        print("Finish the equilibration steps first!")
        return False

    elif status[1] == "finished" and status[2] == "production":
        print("Production run already finished.")
        print("Proceed to 8. Analysis.")
        return False

    elif check_thread():
        print("Simulation is already running in the background!")
        return True

    elif status[0]:
        print("Restarting the halted simulation.")
        resume_simulation(save_at, status)
        s = read_settings(save_at)

        total_steps = int(s['run_time']/s['step_size'])
        save_period = int(s['save_interval']/s['step_size'])

        if s['waterModel'].startswith("(explicit)"):
            box = True
        else:
            box = False

        status[2] = "production"
        status[1] = "running"
        # Clear the halt flag; otherwise run_simulation() would return immediately.
        status[0] = False

        # Colab cannot run two cells at the same time, so the simulation
        # runs in a background thread.
        t = threading.Thread(name="MD", target=run_simulation,
                             args=(save_at, total_steps, box,
                                   save_period, s['backup_interval'], status,))
        t.daemon = False
        t.start()
        return True

    else:
        if status[1] == "finished" and status[2] != "production":
            status[4].context.setStepCount(0)
            status[4].context.setTime(0)

        s = read_settings(save_at)

        total_steps = int(s['run_time']/s['step_size'])
        save_period = int(s['save_interval']/s['step_size'])

        if s['waterModel'].startswith("(explicit)"):
            box = True
        else:
            box = False

        status[2] = "production"
        status[1] = "running"

        t = threading.Thread(name="MD", target=run_simulation,
                             args=(save_at, total_steps, box,
                                   save_period, s['backup_interval'], status,))
        t.daemon = False
        t.start()
        print("Simulation is now running in the background.")
        return True

if production_run(save_at, status):
    view4 = nv.NGLWidget()
    view4._set_size('750px','500px')
    view4.add_structure(nv.FileStructure(save_at+"minimized.pdb"), defaultRepresentation=False)
    display(view4)

Simulation is now running in the background.


NGLWidget()
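
The production run relies on a worker thread that polls a shared halt flag between short slices of work. The sketch below shows the same pattern with the standard library's `threading.Event` instead of a list slot. It is a stand-alone illustration under stated assumptions: `worker`, `progress` and the sleep-based "work" are hypothetical stand-ins, not the notebook's functions.

```python
import threading
import time


def worker(halt, progress, total_chunks):
    """Run short chunks of work until done or until halt is set."""
    while not halt.is_set() and progress["chunks"] < total_chunks:
        time.sleep(0.01)          # stand-in for one short slice of MD
        progress["chunks"] += 1   # stand-in for saving a checkpoint


halt = threading.Event()
progress = {"chunks": 0}
t = threading.Thread(name="MD", target=worker, args=(halt, progress, 10_000))
t.start()

time.sleep(0.05)   # let the worker make some progress
halt.set()         # request a stop, like ticking halt_simulation
t.join()           # the worker leaves its loop after the current chunk
print(progress["chunks"], t.is_alive())
```

Because the flag is only checked between chunks, the length of one chunk sets the longest delay between a halt request and the worker actually stopping.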

In [None]:
update_period = 60 #@param {type:"slider", min:0, max:60, step:5}
#@markdown Check the box below and run this cell to halt the simulation.
halt_simulation = False #@param {type:"boolean"}

def display_simulation(save_at, status, w, t, halt):
    """
    Displays the current state/snapshot of the simulation.
    """
    status[0] = halt

    if halt:
        print("Simulation halted by the User.")
        print("To resume the simulation, run the above cell again.")
        return

    if not check_thread():
        print("Simulation not running!")
        return

    s = read_settings(save_at)

    if s['waterModel'].startswith("(explicit)"):
        box = True
    else:
        box = False

    # Reads the MD_log.txt file and reports the tail
    log = open("MD_log.txt", 'r')
    log.seek(0,2)
    wait = 0
    counter = 0

    w.clear_representations()
    w.add_cartoon("protein or nucleic", lowResolution=True, antialias=False)
    if box:
        w.add_spacefill("ion", lowResolution=True, antialias=False)
        w.add_point("water", lowResolution=True, antialias=False)

    try:
        while True:
            if status[1] == "finished":
                print("\tSimulation finished.")
                print("Proceed to the 8. Analysis.")
                break

            where = log.tell()
            line = log.readline()

            if not line:
                sleep(0.1)
                wait += 1
                log.seek(where)
                if wait > 50:
                    print("Simulation not responding.")
                    break

            elif line.startswith("#"):
                continue

            elif not line.endswith("\n"):
                continue

            else:
                data = line.strip().split(',')
                if len(data) < 7:
                    continue
                print(f'{data[0]:>6s} done, {data[6]} left. Step : {data[1]}, Potential Energy : {float(data[2]):.2f} kJ/mol, Temperature : {(float(data[3])-273.15):.2f} C, Speed : {data[4]} ns/day', end='')
                wait = 0
                counter += 1

                if t != 0:
                    # Update the snapshot every t seconds.
                    if counter % t == 0:
                        state = status[4].context.getState(getPositions=True, enforcePeriodicBox=box)
                        position = state.getPositions(asNumpy=True)
                        w.set_coordinates({0:position*10})
                        w.update_representation(component=0)
                        w.center()

                log.seek(0,2)
                sleep(1)
                # Renew the output
                print(" ", end='\r')

    except KeyboardInterrupt:
        print("\tStopped viewing by User.")
        log.close()
        return

    log.close()
    return


if 'save_at' in vars():
    display_simulation(save_at, status, view4, update_period, halt_simulation)
else:
    print("Rerun from the cell 1. Check the working directory.")

 98.2% done, 0:00 left. Step : 491100, Potential Energy : -2671.56 kJ/mol, Temperature : 43.22 C, Speed : 1.68e+03 ns/daySimulation finished.
	Simulation finished.
Proceed to the 8. Analysis.
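
The progress display has to cope with a log whose last record may still be half written. The sketch below is a simplified variant of that idea, not the tell/seek loop used above: it reads the whole file and returns only the last newline-terminated data line. `last_complete_line` is a hypothetical helper and the log content is synthetic.

```python
import os
import tempfile


def last_complete_line(path):
    """Return the last newline-terminated data line of a text file, or None."""
    with open(path, "r") as f:
        lines = f.read().split("\n")
    # The final element is whatever follows the last newline:
    # empty for a complete file, a partial record otherwise. Drop it.
    complete = [line for line in lines[:-1] if line and not line.startswith("#")]
    return complete[-1] if complete else None


# Synthetic log whose last record is still being written.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "MD_log.txt")
    with open(path, "w") as f:
        f.write('#"Progress (%)","Step"\n50.0%,100\n75.0%,150\n98.')
    print(last_complete_line(path))
```

Reading the full file is fine for a sketch; for long runs, seeking to the end as the cell above does avoids re-reading old records.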


In [None]:
#@title #8. Analysis
#@markdown Run this cell before performing any analyses below.

#@markdown Click the play button in the widget to view the trajectory.

def wrap_trajectory(save_at, status):
    """
    Wraps the trajectory and concatenates the trajectory files.
    """
    print("Reading the trajectory files...", end='')
    trajs = []
    for i in range(status[3]):
        trajs.append(save_at+'MD_'+str(i)+'.xtc')

    traj = md.load(trajs, top=save_at+'minimized.pdb')
    print("done.")

    s = read_settings(save_at)
    prot_indices = traj.topology.select("protein")

    if s['waterModel'].startswith("(explicit)"):
        print("Wrapping the trajectory...", end='')
        if len(prot_indices) > 0:
            prot = traj.topology.subset(prot_indices)
            traj.image_molecules(inplace=True, anchor_molecules=prot.find_molecules())
        else:
            traj.image_molecules(inplace=True, anchor_molecules=traj.topology.guess_anchor_molecules())
        print("done.")

    print("Aligning the trajectory...", end='')
    if len(prot_indices) > 0:
        traj2 = traj.superpose(traj, 0, atom_indices=traj.topology.select_atom_indices('alpha'))
    else:
        traj2 = traj.superpose(traj, 0, atom_indices=traj.topology.select_atom_indices('heavy'))

    print("done.")
    print("Writing the trajectory to the file 'MD_processed.xtc' ...", end='')
    traj2.save_xtc(save_at+'MD_processed.xtc')
    traj2[0].save_pdb(save_at+'MD_processed_0.pdb')
    print("done.")

    return

if 'save_at' not in vars():
    print("Rerun from the cell 1. Check the working directory.")

elif os.path.exists(save_at+'MD_processed.xtc'):
    traj = md.load(save_at+'MD_processed.xtc', top=save_at+'MD_processed_0.pdb')
    prot_traj = traj.atom_slice(traj.topology.select("not water"))
    view5 = nv.show_mdtraj(prot_traj, defaultRepresentation=False)
    view5._set_size('750px','500px')
    view5.add_cartoon("protein or nucleic", lowResolution=True)
    view5.add_simplified_base("nucleic", lowResolution=True, color_scheme='chainindex')
    view5.center()
    display(view5)

else:
    wrap_trajectory(save_at, status)
    traj = md.load(save_at+'MD_processed.xtc', top=save_at+'MD_processed_0.pdb')
    prot_traj = traj.atom_slice(traj.topology.select("not water"))
    view5 = nv.show_mdtraj(prot_traj, defaultRepresentation=False)
    view5._set_size('750px','500px')
    view5.add_cartoon("protein or nucleic", lowResolution=True)
    view5.add_simplified_base("nucleic", lowResolution=True, color_scheme='chainindex')
    view5.center()
    display(view5)

Reading the trajectory files...done.
Aligning the trajectory...done.
Writing the trajectory to the file 'MD_processed.xtc' ...done.


NGLWidget(max_frame=100)
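
Wrapping a trajectory means replacing each coordinate by the periodic image that lies closest to an anchor. The one-dimensional sketch below shows only that arithmetic for an orthorhombic box; `wrap_near` is a hypothetical helper, and MDTraj's `image_molecules` does considerably more (whole molecules, three dimensions, general unit cells).

```python
import math


def wrap_near(x, anchor, box_length):
    """Shift x by whole box lengths so it lies within half a box of anchor."""
    shift = math.floor((x - anchor) / box_length + 0.5)
    return x - shift * box_length


# A 10 nm box: an atom at 9.5 nm is the periodic image of one at -0.5 nm,
# which is the copy closest to an anchor sitting at 0.5 nm.
print(wrap_near(9.5, 0.5, 10.0))
```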

In [None]:
#@title Clustering
#@markdown Set the RMSD threshold value (in nanometers) for the trajectory frame clustering. The structures within this threshold will be grouped into the same cluster.
rmsd_threshold = 0.15 #@param {type:"slider", min:0.1, max:1.0, step:0.05}

#@markdown The centroid (representative) structure of each cluster is saved as a separate frame of the "clustered.pdb" file.

rmsd_matrix = np.zeros((traj.n_frames, traj.n_frames))

for i in tqdm_notebook(range(traj.n_frames)):
    rmsd_matrix[i] = md.rmsd(traj, traj, frame=i, atom_indices=traj.topology.select_atom_indices('heavy'))

clustering = AgglomerativeClustering(n_clusters=None, distance_threshold=rmsd_threshold, metric='precomputed', linkage='average')
cluster_labels = clustering.fit_predict(rmsd_matrix)

n_clusters = len(set(cluster_labels))
cluster_sizes = Counter(cluster_labels)
sorted_clusters = sorted(cluster_sizes.items(), key=lambda x:x[1], reverse=True)

print(f"Total {n_clusters} clusters found!")

centroids = []
for cluster, size in sorted_clusters:
    cluster_frames = np.where(cluster_labels == cluster)[0]
    centroid_idx = cluster_frames[np.argmin([np.sum(rmsd_matrix[frame, cluster_frames]) for frame in cluster_frames])]
    print(f"Cluster {cluster} (size = {size})'s centroid structure : frame {centroid_idx}")
    centroids.append((cluster, size, centroid_idx))

cent_traj = md.join([traj[idx] for _, _, idx in centroids])
cent_traj.save_pdb(save_at+"clustered.pdb")

view6 = nv.show_mdtraj(cent_traj, defaultRepresentation=False)
view6._set_size('750px','500px')
view6.add_cartoon("protein or nucleic")
view6.add_licorice("protein or nucleic")
view6.center()
view6.player.delay = 3000
display(view6)

  0%|          | 0/101 [00:00<?, ?it/s]

Total 2 clusters found!
Cluster 0 (size = 100)'s centroid structure : frame 99
Cluster 1 (size = 1)'s centroid structure : frame 88


NGLWidget(max_frame=1)
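
The centroid of a cluster is chosen above as the frame with the smallest summed RMSD to the other frames of the same cluster (a medoid). The dependency-free sketch below repeats that selection on a made-up 4x4 matrix; `medoid` is a hypothetical helper and the numbers are not simulation results.

```python
def medoid(distance, members):
    """Index in members with the smallest summed distance to the other members."""
    return min(members, key=lambda i: sum(distance[i][j] for j in members))


# Toy symmetric RMSD matrix (nm) for four frames; values are made up.
rmsd = [
    [0.00, 0.10, 0.12, 0.90],
    [0.10, 0.00, 0.05, 0.95],
    [0.12, 0.05, 0.00, 0.92],
    [0.90, 0.95, 0.92, 0.00],
]
labels = [0, 0, 0, 1]   # frames 0-2 form one cluster, frame 3 another

for cluster in sorted(set(labels)):
    members = [i for i, label in enumerate(labels) if label == cluster]
    print(cluster, medoid(rmsd, members))
```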

In [None]:
#@title Backbone Root-Mean-Square Deviation
#@markdown This cell shows the backbone atoms' RMSD change along the trajectory.

#@markdown You may zoom, pan, or download the screenshot from the interactive chart below.

#@markdown You can also hover the cursor over the graph to show the details.

# 'backbone' selects the protein backbone atoms (N, CA, C, O).
rmsd = md.rmsd(traj, traj, 0, atom_indices=traj.topology.select('backbone'))

fig3 = go.Figure()
fig3.add_trace(go.Scatter(x=traj.time/1000, y=rmsd, mode='lines', name='RMSD'))
fig3.update_layout(title='Backbone RMSD', xaxis_title='Time (ns)', yaxis_title='RMSD (nm)', width=750, height=500)
fig3.show()
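
For reference, the RMSD between a frame and a reference is the square root of the mean squared atomic displacement. The sketch below computes it without any alignment, whereas `md.rmsd` first superposes each frame on the reference; the function and the coordinates are illustrative assumptions.

```python
import math


def rmsd(frame, reference):
    """RMSD between two equally long lists of (x, y, z) positions, without alignment."""
    squared = sum((a - b) ** 2
                  for p, q in zip(frame, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(reference))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]   # made-up coordinates in nm
print(rmsd(frame, reference))
```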

In [None]:
#@title Radius of Gyration
#@markdown This cell shows the radius of gyration of the given protein (approximate size of the globular protein) along the trajectory.

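# Compute Rg for the solute heavy atoms only, so that solvent and counter-ions
# do not contribute (the same selection as in the RMSF and PCA cells).
solute = traj.atom_slice(traj.topology.select("not water and (mass 11 to 33) and (symbol != 'Na')"))
rg = md.compute_rg(solute)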

fig4 = go.Figure()
fig4.add_trace(go.Scatter(x=traj.time/1000, y=rg, mode='lines', name='RoG'))
fig4.update_layout(title='Radius of Gyration', xaxis_title='Time (ns)', yaxis_title='Radius of Gyration (nm)', width=750, height=500)
fig4.show()
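
The radius of gyration is the root-mean-square distance of the atoms from their common centre. The sketch below uses equal weights for all atoms, which is also what `md.compute_rg` does when no masses are passed; the helper name and the two-atom example are assumptions for illustration.

```python
import math


def radius_of_gyration(positions):
    """Equal-weight radius of gyration of a list of (x, y, z) positions."""
    n = len(positions)
    center = [sum(p[k] for p in positions) / n for k in range(3)]
    mean_sq = sum((p[k] - center[k]) ** 2 for p in positions for k in range(3)) / n
    return math.sqrt(mean_sq)


# Two atoms 2 nm apart: each sits 1 nm from the centre.
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```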

In [None]:
#@title Residue-wise Root-Mean-Square Fluctuation
#@markdown This cell shows the heavy-atom RMSF of each residue over the course of the trajectory.

#@markdown Residues with higher RMSFs are considered more flexible.

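prot_indices = traj.topology.select("not water and (mass 11 to 33) and (symbol != 'Na')")
# NOTE: superpose() aligns in place and returns the same object, so traj2 is traj.
# The in-place atom_slice() below therefore also reduces the notebook-wide `traj`
# to the selected heavy atoms for every later cell.
traj2 = traj.superpose(traj, 0, atom_indices=prot_indices)
traj2.atom_slice(atom_indices=prot_indices, inplace=True)

# RMSF_i = sqrt(<|r_i - <r_i>|^2>): sum the squared deviations over x, y, z
# for each atom, then average over the frames.
atom_rmsf = np.sqrt(np.mean(np.sum(np.square(traj2.xyz - np.mean(traj2.xyz, axis=0)), axis=2), axis=0))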

res_rmsf = []
for residue in traj2.topology.residues:
    atom_idx = [atom.index for atom in residue.atoms]
    res_rmsf.append(np.mean(atom_rmsf[atom_idx]))

fig5 = go.Figure()
fig5.add_trace(go.Scatter(x=np.arange(traj2.topology.n_residues), y=res_rmsf, mode='lines', name='RMSF',
                          hovertemplate='Residue: %{x}<br>RMSF: %{y:.3f} nm<extra></extra>'))
fig5.update_layout(title='Residue-wise Root-Mean-Square Fluctuation', xaxis_title='Residue index', yaxis_title='RMSF (nm)',
                   hovermode='closest', width=750, height=500)
fig5.show()
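
The RMSF of an atom is the root-mean-square distance from its own mean position over the aligned trajectory. The dependency-free sketch below spells this out for a made-up two-frame, two-atom trajectory; `atom_rmsf` here is a hypothetical function, unrelated to the array of the same name in the cell above.

```python
import math


def atom_rmsf(frames, atom):
    """RMSF of one atom over frames, each frame a list of (x, y, z) positions."""
    n = len(frames)
    mean = [sum(f[atom][k] for f in frames) / n for k in range(3)]
    mean_sq = sum((f[atom][k] - mean[k]) ** 2 for f in frames for k in range(3)) / n
    return math.sqrt(mean_sq)


# Atom 0 never moves; atom 1 alternates between x = 0 and x = 2 nm.
frames = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 1.0, 0.0)],
]
print(atom_rmsf(frames, 0), atom_rmsf(frames, 1))
```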

In [None]:
#@title 2D Principal Component Analysis
#@markdown This cell extracts the collective motions of the atomic coordinates and projects the trajectory onto the first two principal components, a pair of orthogonal eigenvectors of the coordinate covariance matrix. The percentage in parentheses is the share of the total positional variance explained by each component.

prot_indices = traj.topology.select("not water and (mass 11 to 33) and (symbol != 'Na')")
# Work on an aligned copy that holds only the selected atoms, so that the
# flattened coordinate array has exactly n_atoms*3 columns.
traj2 = traj.atom_slice(prot_indices)
traj2.superpose(traj2, 0)
pca = PCA(n_components=2)
reduced_cart = pca.fit_transform(traj2.xyz.reshape(traj2.n_frames, traj2.n_atoms*3))
var_explained = pca.explained_variance_ratio_*100

s = read_settings(save_at)

fig6 = go.Figure()
fig6.add_trace(go.Scatter(x=reduced_cart[:,0], y=reduced_cart[:,1], mode='markers',
                          marker=dict(size=5, color=np.arange(len(reduced_cart))*s["save_interval"]/1000, colorscale='Viridis', colorbar=dict(title='Time (ns)'), showscale=True),
                          text=[f'Time: {i*s["save_interval"]}' for i in range(len(reduced_cart))], hoverinfo='text', name='PCA'))
fig6.update_layout(title='Two Dimensional Principal Component Analysis', xaxis_title=f'PC1 ({var_explained[0]:.2f}%)', yaxis_title=f'PC2 ({var_explained[1]:.2f}%)', width=750, height=500)
fig6.show()

In [None]:
#@title 3D Principal Component Analysis
#@markdown Similar to the 2D PCA cell above, this cell performs a 3D PCA.
prot_indices = traj.topology.select("not water and (mass 11 to 33) and (symbol != 'Na')")
# Work on an aligned copy that holds only the selected atoms, so that the
# flattened coordinate array has exactly n_atoms*3 columns.
traj2 = traj.atom_slice(prot_indices)
traj2.superpose(traj2, 0)
pca = PCA(n_components=3)
reduced_cart = pca.fit_transform(traj2.xyz.reshape(traj2.n_frames, traj2.n_atoms*3))
var_explained = pca.explained_variance_ratio_*100

s = read_settings(save_at)

fig7 = go.Figure()
fig7.add_trace(go.Scatter3d(x=reduced_cart[:,0], y=reduced_cart[:,1], z=reduced_cart[:,2], mode='markers',
                          marker=dict(size=5, color=s["save_interval"]/1000*np.arange(len(reduced_cart)), colorscale='Viridis', opacity=0.8, colorbar=dict(title='Time (ns)'), showscale=True),
                          text=[f'Time: {i*s["save_interval"]}' for i in range(len(reduced_cart))], hoverinfo='text', name='PCA'))
fig7.update_layout(title='Three Dimensional Principal Component Analysis', scene=dict(xaxis_title=f'PC1 ({var_explained[0]:.2f}%)', yaxis_title=f'PC2 ({var_explained[1]:.2f}%)', zaxis_title=f'PC3 ({var_explained[2]:.2f}%)'), width=750, height=750)
fig7.show()

In [None]:
#@title Visualization of Principal Components
#@markdown This cell animates the motion along a single principal component as an MDTraj trajectory.

#@markdown Set index of the principal component to visualize.
component_to_view = "1" #@param [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
#@markdown Set the scale of the motion. Set larger values for more exaggerated motions.
scale = 3 #@param {type:"slider", min:1, max:10, step:1}

prot_indices = traj.topology.select("not water and (mass 11 to 33) and (symbol != 'Na')")
traj2 = traj.superpose(traj, 0, atom_indices=prot_indices)
traj2.atom_slice(atom_indices=prot_indices, inplace=True)
coords = traj2.xyz.reshape(traj2.n_frames, -1)

pca = PCA()
pca_results = pca.fit_transform(coords)
top_pcs = pca.components_[:10]

pc_trajs = top_pcs.reshape(10, traj2.n_atoms, 3)
sine_wave = np.sin(np.linspace(0, 2*np.pi, num=50))

pc_traj = md.Trajectory(np.zeros((50, traj2.n_atoms, 3)), traj2.topology)

for i in range(50):
    pc_traj.xyz[i] = traj2.xyz[0] + scale*sine_wave[i]*pc_trajs[int(component_to_view)-1]

view7 = nv.show_mdtraj(pc_traj, defaultRepresentation=False)
view7._set_size('750px','500px')
view7.add_cartoon("protein or nucleic", lowResolution=True, color_scheme='residueindex')
view7.add_simplified_base("nucleic", lowResolution=True, color_scheme='residueindex')
view7.center()
view7.player.delay = 500
display(view7)

NGLWidget(max_frame=49)
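
The animation above displaces a reference structure along one mode with an amplitude that follows one full sine period. The sketch below reproduces that construction in plain Python; `animate_mode`, the two-atom structure and the displacement pattern are made-up illustrations rather than output of the PCA.

```python
import math


def animate_mode(reference, mode, scale, n_frames=50):
    """Frames displaced along one mode by scale*sin(phase), phase from 0 to 2*pi."""
    frames = []
    for k in range(n_frames):
        amplitude = scale * math.sin(2 * math.pi * k / (n_frames - 1))
        frames.append([tuple(r + amplitude * m for r, m in zip(atom, vec))
                       for atom, vec in zip(reference, mode)])
    return frames


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # made-up two-atom structure
mode = [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]       # made-up displacement pattern
frames = animate_mode(reference, mode, scale=3)
print(len(frames), frames[0])
```

The first frame coincides with the reference, and the displacement never exceeds `scale` times the mode vector, which is why larger `scale` values give more exaggerated motions.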

In [None]:
#@title Ramachandran Map
#@markdown This cell shows the Ramachandran map of the given protein over the course of the trajectory.

#@markdown For large proteins, this may take some time.

stride = int(traj.n_frames/50)

if stride < 2:
    traj2 = traj
    stride = 1
else:
    traj2 = traj[::stride]

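phi_idx, phi = md.compute_phi(traj2, periodic=False)
psi_idx, psi = md.compute_psi(traj2, periodic=False)

# phi is undefined for the first residue of a chain and psi for the last one.
# Pair the two angles of the same residue through its alpha carbon, which is
# the third atom of each phi quadruplet and the second atom of each psi quadruplet.
_, phi_cols, psi_cols = np.intersect1d(phi_idx[:,2], psi_idx[:,1], return_indices=True)
phi = phi[:, phi_cols]
psi = psi[:, psi_cols]
res_names = [str(traj2.topology.atom(a).residue) for a in phi_idx[phi_cols,2]]

s = read_settings(save_at)

fig8 = go.Figure()
colors = np.linspace(0, traj.n_frames*s['save_interval'].value_in_unit(nanosecond), traj2.n_frames)

fig8.add_trace(go.Scatter(x=phi.flatten(), y=psi.flatten(), mode='markers', marker=dict(color=np.repeat(colors, phi.shape[1]), colorscale='Viridis', colorbar=dict(title='Time (ns)'), showscale=True),
                          text=[f'Time: {(i//phi.shape[1])*stride*s["save_interval"]}<br>Residue: {res_names[i%phi.shape[1]]}<br>Phi: {phi[i//phi.shape[1], i%phi.shape[1]]:.2f}<br>Psi: {psi[i//phi.shape[1], i%phi.shape[1]]:.2f}'
                                for i in range(phi.size)], hoverinfo='text'))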

fig8.update_layout(title='Time-dependent Ramachandran Map', xaxis_title='Phi (radians)', yaxis_title='Psi (radians)', xaxis_range=[-np.pi, np.pi], yaxis_range=[-np.pi, np.pi], width=750, height=500)
fig8.show()
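
Each point of a Ramachandran map is a pair of backbone dihedral angles of one residue: phi is defined by the atoms C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The sketch below computes a signed dihedral from four points with the usual atan2 formula; the helper names and the test geometry are illustrative assumptions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


# Made-up geometry: the last point is rotated a quarter turn about the central bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
```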

In [None]:
#@title Solvent Accessible Surface Area
#@markdown This cell shows how the solvent accessible surface area of a given residue changes along the trajectory.

#@markdown Enter the index (starting from 1) of the residue whose SASA you want to analyze.

residue_index = "6" #@param {type:"string"}

res_idx = int(residue_index)-1

# Remove the water before computing the SASA: with explicit solvent the water
# molecules would otherwise cover the surface and slow the calculation down.
solute = traj.atom_slice(traj.topology.select("not water"))

if res_idx < 0 or res_idx >= solute.topology.n_residues:
    print("Invalid residue index!")

else:
    sasa = md.shrake_rupley(solute, mode='residue')

    fig9 = go.Figure()
    fig9.add_trace(go.Scatter(x=solute.time/1000, y=sasa[:,res_idx], mode='lines', name='SASA'))
    fig9.update_layout(title=f'Solvent Accessible Surface Area of the residue {solute.topology.residue(res_idx)}', xaxis_title='Time (ns)', yaxis_title='SASA (nm^2)', width=750, height=500)
    fig9.show()

In [None]:
#@title Secondary Structure Analysis using DSSP
#@markdown This cell shows the secondary structure of each residue over the course of the trajectory.

dssp = md.compute_dssp(traj)
dssp_map = {'H':0, 'E':1, 'C':2}
dssp_num = np.array([[dssp_map.get(ss,2) for ss in frame] for frame in dssp])

s = read_settings(save_at)

fig10 = go.Figure(data=go.Heatmap(z=dssp_num.T, x=traj.time/1000, y=np.arange(traj.n_residues),
                                 colorscale=[[0,'red'],[0.33, 'red'], [0.33, 'yellow'], [0.66, 'yellow'], [0.66, 'blue'], [1, 'blue']],
                                 colorbar=dict(tickvals=[0,1,2],ticktext=['Helix', 'Sheet', 'Coil'])))
fig10.update_layout(title='Time-dependent Secondary Structure (DSSP)', xaxis_title='Time (ns)', yaxis_title='Residue index', width=750, height=1000)
fig10.data[0].hovertext = [[f'Time : {i*s["save_interval"]}<br>Residue: {j}<br>SS: {dssp[i][j]}' for i in range(traj.n_frames)] for j in range(traj.n_residues)]
fig10.data[0].hoverinfo = 'text'
fig10.show()
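
A per-frame summary can complement the heatmap. The sketch below counts the simplified DSSP codes used above (H, E, C) in a single frame and converts them to fractions; `secondary_structure_fractions` is a hypothetical helper and the ten-residue frame is made up.

```python
from collections import Counter


def secondary_structure_fractions(codes):
    """Fraction of residues assigned to helix (H), strand (E) and coil (C) in one frame."""
    counts = Counter(codes)
    total = len(codes)
    return {ss: counts.get(ss, 0) / total for ss in ("H", "E", "C")}


frame = ["C", "H", "H", "H", "H", "C", "E", "E", "C", "C"]   # made-up assignment
print(secondary_structure_fractions(frame))
```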

In [None]:
#@title Dynamic Cross-Correlation Matrix
#@markdown This cell shows the correlation between the movements of alpha carbons of each residue.
#@markdown When the correlation value approaches +1 (blue), it means the pair of residues tend to move in the same direction.
#@markdown When the value approaches -1 (red), it means the pair of residues tend to move in opposite directions.
#@markdown When the value approaches 0 (white), it means the pair of residues move independently from each other.

ca_idx = traj.topology.select_atom_indices('alpha')
n_ca = len(ca_idx)
mean_pos = np.mean(traj.xyz[:, ca_idx, :], axis=0)
fluct = traj.xyz[:, ca_idx, :] - mean_pos

# One row and column per alpha carbon. The system may contain residues without
# one (water, ions, nucleic acids), so traj.n_residues cannot be used here.
dccm = np.zeros((n_ca, n_ca))

for i in range(n_ca):
    for j in range(n_ca):
        numerator = np.mean(np.sum(fluct[:,i,:]*fluct[:,j,:], axis=1))
        denominator = np.sqrt(np.mean(np.sum(fluct[:,i,:]**2, axis=1))*np.mean(np.sum(fluct[:,j,:]**2, axis=1)))
        dccm[i,j] = numerator/denominator

fig11 = go.Figure(data=go.Heatmap(z=dccm, colorscale='RdBu', zmid=0, colorbar=dict(title='Correlation')))
fig11.update_layout(title='Dynamic Cross-Correlation Matrix', xaxis_title='Residue index', yaxis_title='Residue index', width=750, height=750)
fig11.data[0].hovertext = [[f'Residue i: {i+1}<br>Residue j: {j+1}<br>Correlation: {dccm[i,j]:.2f}' for j in range(dccm.shape[1])] for i in range(dccm.shape[0])]
fig11.data[0].hoverinfo = 'text'
fig11.show()
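
Each matrix element is the normalised covariance of two atoms' displacements from their mean positions. The dependency-free sketch below evaluates it for single pairs of made-up two-frame tracks; `cross_correlation` is a hypothetical helper that mirrors the numerator and denominator of the loop above.

```python
import math


def cross_correlation(track_i, track_j):
    """Normalised correlation of two atoms' displacements from their mean positions."""
    def fluctuations(track):
        n = len(track)
        mean = [sum(p[k] for p in track) / n for k in range(3)]
        return [tuple(p[k] - mean[k] for k in range(3)) for p in track]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    di, dj = fluctuations(track_i), fluctuations(track_j)
    # The 1/n_frames factors of the averages cancel between numerator and denominator.
    cov = sum(dot(a, b) for a, b in zip(di, dj))
    norm = math.sqrt(sum(dot(a, a) for a in di) * sum(dot(b, b) for b in dj))
    return cov / norm


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # moves along +x
b = [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]   # moves with a
c = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]   # moves against a
d = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # moves perpendicular to a
print(cross_correlation(a, b), cross_correlation(a, c), cross_correlation(a, d))
```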

**Tools used**

* OpenMM
> P. Eastman, J. Swails, J. D. Chodera, R. T. McGibbon, Y. Zhao, K. A. Beauchamp, L.-P. Wang, A. C. Simmonett, M. P. Harrigan, C. D. Stern, R. P. Wiewiora, B. R. Brooks, and V. S. Pande. (2017) "OpenMM 7: Rapid development of high performance algorithms for molecular dynamics." PLOS Comp. Biol. 13(7): e1005659. DOI:[10.1371/journal.pcbi.1005659](https://doi.org/10.1371/journal.pcbi.1005659)

* PDB2PQR
> T. J. Dolinsky, J. E. Nielsen, J. A. McCammon, and N. A. Baker. (2004) "PDB2PQR: an automated pipeline for the setup, execution, and analysis of Poisson-Boltzmann electrostatics calculations." Nucleic Acids Res. 32: W665-667. DOI:[10.1093/nar/gkh381](https://doi.org/10.1093/nar/gkh381)

* PropKa
> H. Li, A. D. Robertson, and J. H. Jensen. (2005) "Very Fast Empirical Prediction and Rationalization of Protein pKa Values." Proteins, 61: 704-721. DOI:[10.1002/prot.20660](https://doi.org/10.1002/prot.20660)

* NGLview
> H. Nguyen, D. A. Case, and A. S. Rose. (2018) "NGLview - Interactive molecular graphics for Jupyter notebooks" Bioinformatics 34(7): 1241-1242. DOI:[10.1093/bioinformatics/btx789](https://doi.org/10.1093/bioinformatics/btx789)

* MDTraj
> R. T. McGibbon, K. A. Beauchamp, M. P. Harrigan, C. Klein, J. M. Swails, C. X. Hernández, C. R. Schwantes, L.-P. Wang, T. J. Lane, and V. S. Pande. (2015) "MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories" Biophys. J. 109(8): 1528-1532. DOI:[10.1016/j.bpj.2015.08.015](https://doi.org/10.1016/j.bpj.2015.08.015)

* NumPy
> C. R. Harris, K. J. Millman, S. J. van der Walt et al. (2020) "Array programming with NumPy" Nature 585, 357-362. DOI:[10.1038/s41586-020-2649-2](https://doi.org/10.1038/s41586-020-2649-2)

* Scikit-learn
> F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. (2011) "Scikit-learn: Machine Learning in Python" JMLR 12(85): 2825-2830. (https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html)

* Plotly
> Plotly Technologies Inc. (https://plot.ly)

* ipywidgets
> Jupyter widgets community (https://github.com/jupyter-widgets/ipywidgets)

