# Gromos Trajectory evaluation with Pygromos and Pandas

## Example file for the evaluation of GROMOS trajectory files in pygromos

1. Analysis of a GROMOS trc file (position trajectory)
    1. Import
    2. Common Functions
2. Analysis of a GROMOS tre file (energy trajectory)
    1. Import
    2. Common Functions

In [1]:
# general imports for manual data manipulations. Not needed if only provided functions are used
import numpy as np
import pandas as pd

In [2]:

# specific imports from pygromos for trc and tre file support
import pygromos.files.trajectory.trc as traj_trc
import pygromos.files.trajectory.tre as traj_tre





## 1) TRC

### 1.1) TRC import

In [3]:
# import the trajectory file into a Trc class
trc = traj_trc.Trc(traj_path='../example_files/Traj_files/b_emin_vacuum_1.trc',
          in_cnf='../example_files/Traj_files/b_emin_vacuum.cnf')

FileNotFoundError: [Errno 2] No such file or directory: '../example_files/Traj_files/b_emin_vacuum_1.trc'

The Trc class bridges GROMOS TRC file-based structures and mdtraj.Trajectory objects.

One can read in TRCs and use them just like mdtraj.Trajectory objects. One can then write them out as .h5 files as well as GROMOS .trc/.trc.gz files.

If you have a function that's generally useful, please contact the developers to possibly add it to the pygromos code to help other people :)

In [4]:
[x for x in dir(trc) if not x.startswith("_")]

NameError: name 'trc' is not defined

### 1.2) TRC file handling

#### Save as .h5

In [5]:
trc.save('./traj.h5') 

NameError: name 'trc' is not defined

#### Save as trc

In [6]:
trc.save('./traj.trc.gz')

NameError: name 'trc' is not defined

#### Export last frame as cnf

In [7]:
trc[-1].to_cnf(base_cnf='../example_files/Traj_files/b_emin_vacuum.cnf')

NameError: name 'trc' is not defined

#### Get selection of trajectories

In [8]:
trc[::10].n_frames

NameError: name 'trc' is not defined

#### Save out selection

In [9]:
trc[::10].save('9_frames.trc')

NameError: name 'trc' is not defined
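The frame selection above uses ordinary Python slice notation. As a minimal sketch of which frames a stride of 10 keeps, the following uses a plain NumPy array as a stand-in for a trajectory. The frame count of 85 and the array shape are assumptions chosen for illustration, not values taken from the example file.

```python
import numpy as np

# Hypothetical stand-in for a trajectory: 85 frames, 5 atoms, xyz coordinates.
n_frames, n_atoms = 85, 5
frames = np.zeros((n_frames, n_atoms, 3))

# Keep every 10th frame, starting with frame 0.
subset = frames[::10]

# Indices of the frames that survive the selection.
kept = np.arange(n_frames)[::10]

print(subset.shape[0], kept.tolist())
```

The selection always includes frame 0, so a stride of 10 over 85 frames keeps frames 0, 10, ..., 80.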

#### RMSD

In [10]:
# Calculate the RMSD relative to the initial frame (0th frame).
# Alternatively, a different trajectory can be provided as the argument to the rmsd function.
# The accepted arguments are an integer or a single trajectory frame.
rmsd = trc.rmsd(0)

NameError: name 'trc' is not defined

In [11]:
# This returns the RMSD of every time frame relative to the initial frame.
# It can be seen how the RMSD slowly gets larger as the simulation moves farther away from the initial setup.
rmsd

NameError: name 'rmsd' is not defined

In [12]:
# The mean over all frames can be easily taken with the pandas function mean()
rmsd.mean()

NameError: name 'rmsd' is not defined
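To make the quantity behind the RMSD cells concrete, here is a minimal NumPy sketch of a per-frame RMSD relative to a reference frame. It is not the pygromos or mdtraj implementation: the helper name `rmsd_to_reference` and the synthetic random-walk coordinates are assumptions, and the sketch does not superimpose the frames before measuring, which a library routine may do.

```python
import numpy as np


def rmsd_to_reference(coords, ref_index=0):
    """RMSD of every frame to one reference frame, without superposition.

    coords has shape (n_frames, n_atoms, 3).
    """
    ref = coords[ref_index]
    diff = coords - ref                     # displacement of every atom in every frame
    sq_dist = (diff ** 2).sum(axis=2)       # squared displacement per atom
    return np.sqrt(sq_dist.mean(axis=1))    # average over atoms, then take the root


# Synthetic data (assumption): atoms performing a small random walk.
rng = np.random.default_rng(0)
n_frames, n_atoms = 20, 8
start = rng.normal(size=(n_atoms, 3))
steps = rng.normal(scale=0.01, size=(n_frames, n_atoms, 3))
steps[0] = 0.0                              # frame 0 is the starting structure itself
coords = start + np.cumsum(steps, axis=0)

rmsd_values = rmsd_to_reference(coords)
print(rmsd_values.mean())
```

The first entry is zero by construction, because frame 0 is compared with itself.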

## 2) TRE

### 2.1) Tre import and structure

In [13]:
# import the trajectory file into a Tre class
from pygromos.files.trajectory.tre_field_libs.ene_fields import gromos_2015_tre_block_names_table

tre = traj_tre.Tre(input_value="../example_files/Traj_files/test_CHE_H2O_bilayer.tre", _ene_ana_names=gromos_2015_tre_block_names_table)

OSError: [Errno Could not find File: ] ../example_files/Traj_files/test_CHE_H2O_bilayer.tre

In [14]:
tre.database

NameError: name 'tre' is not defined

In [15]:
[x for x in dir(tre) if not x.startswith("_")]

NameError: name 'tre' is not defined

Tre files contain all energy-related data (e.g. the individual energy terms, temperature, and pressure). In PyGromos they generally share the same block structure as other files, but the data for the individual timesteps is stored efficiently in a pandas DataFrame, here called tre.database. This database can be manipulated with all pandas functions. Alternatively, many common functions are provided by the Tre class.

This class should in principle replace further usage of the gromos++ ene_ana program, since all of these operations can be done efficiently on the pandas DataFrame.

We are currently working on adding more common functions to the Tre class. If you find a useful function please contact the developers so the function can be added for general usage :)
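As an illustration of the kind of analysis that can be done directly on a DataFrame, the sketch below computes an average, a fluctuation, and a crude block-average error estimate for one energy column. The table is synthetic, and the column names `time` and `total_energy` are hypothetical rather than the actual layout of tre.database. It is also not a reimplementation of ene_ana, only an example of the pandas workflow.

```python
import numpy as np
import pandas as pd

# Synthetic energy table (assumption): 1000 timesteps of a noisy total energy.
rng = np.random.default_rng(1)
n_steps = 1000
energies = pd.DataFrame({
    "time": np.arange(n_steps) * 0.002,
    "total_energy": -500.0 + rng.normal(scale=2.0, size=n_steps),
})

series = energies["total_energy"]
mean = series.mean()
fluct = series.std(ddof=0)      # root-mean-square fluctuation around the mean

# Crude error estimate: standard error of the means of 10 consecutive blocks.
block_means = series.to_numpy().reshape(10, -1).mean(axis=1)
err = block_means.std(ddof=1) / np.sqrt(len(block_means))

print(f"mean={mean:.2f} fluctuation={fluct:.2f} error={err:.3f}")
```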

### 2.2) Common Tre functions

In [16]:
# calculate the average density over all timesteps
tre.get_density().mean()

NameError: name 'tre' is not defined

In [17]:
# calculate the mean temperature over all frames for all baths in the system. In this example two baths with slightly different temperatures.
tre.get_temperature().mean()

NameError: name 'tre' is not defined

Tables and lists inside the database are stored as numpy arrays. For example, the two temperatures from the previous example are stored in a numpy array of size 2, since the system has two temperature baths.

Specific values inside a tre file can also be accessed directly with numpy and pandas syntax.

In [18]:
tre.database.iloc[2]

NameError: name 'tre' is not defined

In [19]:
# select the first nonbonded energy value for the first force group over all time frames
tre.database["nonbonded"].apply(lambda x: x[0][0])

NameError: name 'tre' is not defined
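The access pattern above relies on DataFrame cells that hold whole numpy arrays. The following sketch rebuilds that situation with a small hand-made table so the indexing can be tried without a tre file. The column contents and values are invented for illustration and only mimic the structure described in this section.

```python
import numpy as np
import pandas as pd

# Hypothetical table in which every cell of a column holds a numpy array.
db = pd.DataFrame({
    "time": [0.0, 0.5, 1.0],
    # one entry per temperature bath (two baths)
    "temperature": [
        np.array([298.0, 300.0]),
        np.array([299.0, 301.0]),
        np.array([300.0, 302.0]),
    ],
    # a small 2x2 table per timestep
    "nonbonded": [
        np.array([[-10.0, -1.0], [-20.0, -2.0]]) * (i + 1) for i in range(3)
    ],
})

# Pick element [0][0] of the per-timestep table for every row.
first_term = db["nonbonded"].apply(lambda x: x[0][0])

# Stack the per-row arrays into one 2D array to average each bath over time.
temps = np.stack(db["temperature"].tolist())    # shape (n_timesteps, n_baths)
bath_means = temps.mean(axis=0)

print(first_term.tolist(), bath_means.tolist())
```

Stacking the cells into a regular 2D array first makes column-wise statistics straightforward, whereas `apply` is convenient for picking a single element per row.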

In [20]:
tre.get_totals()

NameError: name 'tre' is not defined

### $\lambda$-Sampling & TREs

In [21]:
# import the trajectory file into a Tre class
tre = traj_tre.Tre(input_value="../example_files/Traj_files/RAFE_TI_l0_5.tre")
tre.get_precalclam()

OSError: [Errno Could not find File: ] ../example_files/Traj_files/RAFE_TI_l0_5.tre

### EDS in TREs

In [22]:
# import the trajectory file into a Tre class
tre = traj_tre.Tre(input_value="../example_files/Traj_files/RAFE_eds.tre")
tre.get_eds()

OSError: [Errno Could not find File: ] ../example_files/Traj_files/RAFE_eds.tre

## Concatenate and Copy Multiple Trajectories

Trajectories offer a wide range of additional file manipulations. Trajectory classes can be copied (deep) and added to each other to concatenate multiple small simulation pieces into one large trajectory. 

In [23]:
tre_copy = traj_tre.Tre(input_value=tre)

NameError: name 'tre' is not defined

In [24]:
tre_copy.database.shape

NameError: name 'tre_copy' is not defined

In [25]:
tre_combined = tre + tre_copy

NameError: name 'tre' is not defined

In [26]:
tre_combined.database.shape

NameError: name 'tre_combined' is not defined

The combined trajectory is one long trajectory made from the two smaller ones. Its length is one element shorter than the sum of the two, since normally the last element of the first trajectory and the first element of the second trajectory are the same frame. This can be controlled via the option "skip_new_0=True" in the add_traj() function, which is the core of the "+" operator for trajectories. In the following line, the default behavior can be seen as a continuous numbering of the TIMESTEPs.

In [27]:
tre_combined.database.time

NameError: name 'tre_combined' is not defined

In [28]:
print(len(tre_combined.database), len(tre.database))

NameError: name 'tre_combined' is not defined
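The effect of skipping the duplicated first frame can be reproduced with plain pandas. The sketch below is not the pygromos add_traj() implementation. The helper `concat_energy_tables`, the column names, and the assumption that the second piece starts again at time 0 are all hypothetical and serve only to illustrate the idea.

```python
import pandas as pd


def concat_energy_tables(first, second, skip_new_0=True):
    """Append `second` to `first` and continue the time axis of `first`.

    Assumes both tables have a "time" column and that `second` starts at time 0.
    """
    if skip_new_0:
        # The first row of the second piece repeats the last row of the first piece.
        second = second.iloc[1:]
    shifted = second.copy()
    shifted["time"] = shifted["time"] + first["time"].iloc[-1]
    return pd.concat([first, shifted], ignore_index=True)


# Two identical hypothetical pieces with four timesteps each.
piece = pd.DataFrame({
    "time": [0.0, 1.0, 2.0, 3.0],
    "energy": [-5.0, -5.5, -6.0, -6.2],
})

combined = concat_energy_tables(piece, piece.copy())
print(len(combined), combined["time"].tolist())
```

With the duplicate removed, two pieces of four rows give seven rows and a continuous time axis. Passing `skip_new_0=False` to this sketch keeps all eight rows, including the repeated time point.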