# Parameterisation
To parameterise a small molecule for simulation, import the right `Parameteriser`. Here we showcase the `SolutionParameteriser`: given the SMILES string of the molecule to be simulated, it returns a parameterised system containing one copy of that molecule solvated in water. We use benzene as the example.


In [1]:
benzene_smiles = "c1ccccc1"

Depending on the toolkit available, parameterisation can be done either with `via_rdkit()`, which uses the open-source RDKit, or with `via_openeye()`, which uses the commercial OpenEye toolkit.

The parameterised system is stored as a ParmEd `Structure` object.


In [2]:
from mdfptools import Parameteriser
#RDKit parameterisation
rdk_pmd = Parameteriser.SolutionParameteriser.via_rdkit(benzene_smiles)
rdk_pmd

<Structure 2622 atoms; 871 residues; 1752 bonds; PBC (orthogonal); parametrized>

In [3]:
#OpenEye alternative
oe_pmd = Parameteriser.SolutionParameteriser.via_openeye(benzene_smiles)
oe_pmd

<Structure 2652 atoms; 881 residues; 1772 bonds; PBC (orthogonal); parametrized>
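
Which of the two class methods to call depends on what is installed locally. The sketch below is one possible way to decide at run time, using only the standard library. The helper name `pick_route` and the default preference for RDKit are illustrative assumptions, not part of mdfptools, and the check only tests whether a package can be found; it does not verify an OpenEye licence.

```python
import importlib.util


def _is_installed(package):
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(package) is not None


def pick_route(is_installed=_is_installed, preference=("rdkit", "openeye")):
    """Return the name of the class method to call (hypothetical helper).

    `preference` lists toolkit package names in the order they are tried.
    """
    for toolkit in preference:
        if is_installed(toolkit):
            return "via_" + toolkit
    raise ImportError("Neither RDKit nor OpenEye could be found")
```

With mdfptools available, the result could then be used as `getattr(Parameteriser.SolutionParameteriser, pick_route())(benzene_smiles)`.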

When using RDKit, partial charges are by default assigned to the small molecule via antechamber, yielding AM1-BCC charges.

We also developed a machine-learned alternative for partial charge assignment called [mlddec](github.com/rinikierlab/mlddec). Once this package is installed, load its models and then parameterise as before:

In [4]:
Parameteriser.SolutionParameteriser.load_ddec_models()
Parameteriser.SolutionParameteriser.via_rdkit(benzene_smiles)

  0%|          | 0/10 [00:00<?, ?it/s]

Loading models...


100%|██████████| 10/10 [02:53<00:00, 17.30s/it]


<Structure 2631 atoms; 874 residues; 1758 bonds; PBC (orthogonal); parametrized>

Once all the systems to be simulated have been prepared with `Parameteriser`, unload the DDEC models, as they occupy a considerable amount of memory.

In [5]:
Parameteriser.SolutionParameteriser.unload_ddec_models()
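
Because the models should always be unloaded again, the load/unload pair can be wrapped in a context manager so that unloading happens even if parameterisation fails midway. This is a minimal sketch of that pattern, not something mdfptools provides: `loaded_models` is a hypothetical helper, and the lambdas are stand-ins so the example runs without mdfptools installed.

```python
from contextlib import contextmanager


@contextmanager
def loaded_models(load, unload):
    """Call load() on entry and unload() on exit, even if the body raises."""
    load()
    try:
        yield
    finally:
        unload()


# Stand-in callables that only record the order of events
events = []
with loaded_models(lambda: events.append("load"),
                   lambda: events.append("unload")):
    events.append("parameterise")

print(events)  # ['load', 'parameterise', 'unload']
```

In real use, `load` and `unload` would be `Parameteriser.SolutionParameteriser.load_ddec_models` and `Parameteriser.SolutionParameteriser.unload_ddec_models`, with the `via_rdkit()` calls placed inside the `with` block.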

The parameterised systems, being ParmEd objects, can be stored to disk and reloaded into memory using pickle:

In [6]:
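import pickle
# Store to disk; the context manager closes the file handle afterwards
with open("./benzene.pickle", "wb") as handle:
    pickle.dump(rdk_pmd, handle)

# Load the pickled object back into memory
with open("./benzene.pickle", "rb") as handle:
    rdk_pmd_reloaded = pickle.load(handle)
rdk_pmd_reloaded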

<Structure 2622 atoms; 871 residues; 1752 bonds; PBC (orthogonal); parametrized>
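
Since parameterisation can be slow, the pickle round trip above lends itself to a simple on-disk cache: build the object only when no pickle exists yet. The following is a sketch under that assumption; `load_or_build` is a hypothetical helper, and a plain dictionary stands in for the ParmEd object so the example runs on its own. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile


def load_or_build(path, build):
    """Return the object pickled at `path`, creating it with build() if absent."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    obj = build()
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return obj


# Demonstration with a stand-in object instead of a parameterised system
build_calls = []

def fake_parameterise():
    build_calls.append("built")
    return {"smiles": "c1ccccc1"}

with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "benzene.pickle")
    first = load_or_build(cache_path, fake_parameterise)   # builds and stores
    second = load_or_build(cache_path, fake_parameterise)  # read from disk

print(len(build_calls))  # 1
```

With mdfptools, `build` could be `lambda: Parameteriser.SolutionParameteriser.via_rdkit(benzene_smiles)`.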

# Visualisation (Optional)
You can inspect the parameterised ParmEd system inside a Jupyter notebook (JupyterLab does not seem to work) using [nglview](https://github.com/arose/nglview).

In [7]:
import nglview as nv
view = nv.show_parmed(rdk_pmd)
view.add_licorice()
view

_ColormakerRegistry()

NGLWidget()

# Simulation
To simulate, import the right simulator and call its `via_openmm()` class method, which, as the name implies, runs MD using OpenMM under the hood.

We plan to add a Python interface to GROMACS, which will enable `via_gromacs()` in the future.

The default simulation length is 5 ns, with a trajectory frame stored every 10 ps (that is, one frame every 5000 steps, 500 frames in total). The simulation will take some time to run.

In [8]:
from mdfptools.Simulator import SolutionSimulator
SolutionSimulator.via_openmm(rdk_pmd, file_name = "benzene", file_path = "./", 
                             platform = "CUDA", num_steps = 5000 * 500)

'/home/shuwang/Documents/Modelling/MDFP/Codes/mdfptools/examples/benzene.h5'
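
The value `num_steps = 5000 * 500` follows from the numbers quoted above: one frame every 10 ps, 500 frames, 5 ns in total. The arithmetic can be sketched as below to derive `num_steps` for other simulation lengths. The 2 fs timestep is an assumption inferred from "5000 steps per 10 ps", and `steps_for` is a hypothetical helper, not an mdfptools function.

```python
def steps_for(length_ns, frame_interval_ps=10.0, timestep_fs=2.0):
    """Return (steps per frame, number of frames, total steps).

    timestep_fs=2.0 is an assumption inferred from 5000 steps per 10 ps.
    """
    steps_per_frame = round(frame_interval_ps * 1000 / timestep_fs)
    num_frames = round(length_ns * 1000 / frame_interval_ps)
    return steps_per_frame, num_frames, steps_per_frame * num_frames


print(steps_for(5))  # (5000, 500, 2500000)
```

The third value is what would be passed as `num_steps`; for example, a 1 ns run under the same assumptions needs `steps_for(1)[2]` steps.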

# Obtain Molecular Dynamics Fingerprint (MDFP)
Once the simulation has finished, one can extract the relevant properties using the right `Composer` (`SolutionComposer` here). 

In [9]:
#First load in the simulated trajectory
import mdtraj as md
traj = md.load("./benzene.h5")

In [13]:
from mdfptools.Composer import SolutionComposer
mdfp = SolutionComposer.run(traj, rdk_pmd)
print(mdfp)

{'2d_counts': [6, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'water_intra_crf': [8.447469879840682, 0.07754269775241597, 8.448101852311458], 'water_intra_lj': [14.698494638650427, 0.35388130931658185, 14.62080608686442], 'water_total_crf': [-20.59946506906263, 8.513642293598433, -18.724968586184893], 'water_total_lj': [-20.978276995376827, 4.06594223290863, -22.954899317973993], 'water_intra_ene': [23.14596451849111, 0.3528157869181612, 23.10431903412804], 'water_total_ene': [-41.57774206443945, 6.698043493291234, -39.76306852550783], 'water_rgyr': [1.4897907974950872, 0.0031876909007386768, 1.4891081418027634], 'water_sasa': [2.433884, 0.007920546, 2.4333215]}


The object returned by a Composer is an `MDFP` object. As the printout above shows, it stores the features keyed by name rather than as a bare vector.
For subsequent machine learning tasks, call the `get_mdfp()` method to obtain the feature vector (i.e. just the values, not the keys).

In [11]:
mdfp.get_mdfp()

[6,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 8.447469879840682,
 0.07754269775241597,
 8.448101852311458,
 14.698494638650427,
 0.35388130931658185,
 14.62080608686442,
 -20.59946506906263,
 8.513642293598433,
 -18.724968586184893,
 -20.978276995376827,
 4.06594223290863,
 -22.954899317973993,
 23.14596451849111,
 0.3528157869181612,
 23.10431903412804,
 -41.57774206443945,
 6.698043493291234,
 -39.76306852550783,
 1.4897907974950872,
 0.0031876909007386768,
 1.4891081418027634,
 2.433884,
 0.007920546,
 2.4333215]
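
Comparing the two outputs above, the feature vector looks like the dictionary values concatenated in key order. The sketch below reproduces that flattening on a plain dictionary (values rounded for brevity) and also builds a matching list of column labels, which can be handy when assembling a training table. It illustrates the relationship visible in the printouts and is not the library's implementation; `flatten` and `feature_names` are hypothetical helpers.

```python
# Values copied from the printout above, rounded for brevity
features = {
    "2d_counts": [6, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "water_intra_crf": [8.447, 0.078, 8.448],
    "water_intra_lj": [14.698, 0.354, 14.621],
    "water_total_crf": [-20.599, 8.514, -18.725],
    "water_total_lj": [-20.978, 4.066, -22.955],
    "water_intra_ene": [23.146, 0.353, 23.104],
    "water_total_ene": [-41.578, 6.698, -39.763],
    "water_rgyr": [1.490, 0.003, 1.489],
    "water_sasa": [2.434, 0.008, 2.433],
}


def flatten(feature_dict):
    """Concatenate all value lists in key order (dicts keep insertion order)."""
    return [value for values in feature_dict.values() for value in values]


def feature_names(feature_dict):
    """Build one label per vector entry, e.g. 'water_rgyr_0'."""
    return [f"{key}_{index}"
            for key, values in feature_dict.items()
            for index in range(len(values))]


vector = flatten(features)
names = feature_names(features)
print(len(vector))  # 34
```

Each label lines up with the entry at the same position in the vector, so `dict(zip(names, vector))` gives a flat, named view of one molecule's fingerprint.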