# GROMOS Trajectory Evaluation with PyGromos and Pandas

## Example file for the evaluation of GROMOS trajectory files in pygromos

1. Analysis of a GROMOS trc file (position trajectory)
    1. Import
    2. Common Functions
2. Analysis of a GROMOS tre file (energy trajectory)
    1. Import
    2. Common Functions

In [1]:
# general imports for manual data manipulations. Not needed if only provided functions are used
import numpy as np
import pandas as pd

In [2]:
# specific imports from pygromos for trc and tre file support
import pygromos.files.trajectory.trc as traj_trc
import pygromos.files.trajectory.tre as traj_tre

## 1) TRC

### 1.1) TRC import

In [3]:
# import the trajectory file into a Trc class
trc = traj_trc.Trc(input_value="example_files/test_CHE_vacuum_sd.trc")

example_files/test_CHE_vacuum_sd.trc


The Trc class offers the normal GROMOS block structure and additionally a pandas DataFrame called database, in which all timesteps are stored.
For typical trc files the only classic block is the TITLE block; all other blocks are stored inside the database.

Additionally, many common functions are offered to evaluate the given data. If a needed function is not provided, the normal pandas syntax can be used to create custom functions.
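As a minimal sketch of such a custom function, the example below computes a per-frame distance between two atoms with plain pandas and numpy. It does not use pygromos itself: the DataFrame `frames` is a hand-made stand-in for the database, and the column names `POS_1` and `POS_2` as well as the helper `pair_distance_series` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Stand-in for a trajectory database: one row per frame, one column per atom,
# each cell holding an xyz position as a numpy array (assumed layout).
frames = pd.DataFrame({
    "TIMESTEP_time": [0.0, 2.0, 4.0],
    "POS_1": [np.array([0.0, 0.0, 0.0]),
              np.array([0.1, 0.0, 0.0]),
              np.array([0.1, 0.1, 0.0])],
    "POS_2": [np.array([0.3, 0.4, 0.0]),
              np.array([0.4, 0.4, 0.0]),
              np.array([0.1, 0.1, 0.5])],
})


def pair_distance_series(db, col_i, col_j):
    """Return the Euclidean distance between two position columns for every frame."""
    return db.apply(
        lambda row: float(np.linalg.norm(row[col_i] - row[col_j])), axis=1
    )


# Hypothetical helper applied to the stand-in data
distances = pair_distance_series(frames, "POS_1", "POS_2")
print(distances.mean())
```

The same pattern (a row-wise `apply` over array-valued columns) should carry over to other per-frame quantities, provided the real column layout is checked first.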

If you have a generally useful function, please contact the developers so that it can possibly be added to the pygromos code to help other people :)

In [4]:
[x for x in dir(trc) if not x.startswith("_")]

['TITLE',
 'add_traj',
 'database',
 'get_atom_movement_length_mean',
 'get_atom_movement_length_series',
 'get_atom_movement_length_total',
 'get_atom_movement_series',
 'get_atom_pair_distance_mean',
 'get_atom_pair_distance_series',
 'get_cog_movement_total_series_for_atom_group',
 'get_cog_movement_vector_series_for_atom_group',
 'path',
 'radial_distribution',
 'write',
 'write_pdb']

### 1.2) Common trc functions

In [5]:
# Get the average movement length between two frames
trc.get_atom_movement_length_mean(atomI=1)

0.03818326045999198
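To illustrate what a mean movement length between frames means, here is a small numpy sketch on made-up positions of a single atom. It shows one plausible reading of the quantity (the mean length of the displacement vectors between consecutive frames) and is not taken from the pygromos implementation; all names and values are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up positions of one atom in three consecutive frames
positions = pd.Series([
    np.array([0.00, 0.00, 0.00]),
    np.array([0.03, 0.04, 0.00]),
    np.array([0.03, 0.04, 0.12]),
])

coords = np.stack(positions.tolist())          # shape (n_frames, 3)
steps = np.diff(coords, axis=0)                # displacement vectors between frames
step_lengths = np.linalg.norm(steps, axis=1)   # length of each displacement

mean_step = step_lengths.mean()    # average movement per frame transition
total_path = step_lengths.sum()    # total path length over the trajectory
print(mean_step, total_path)
```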

In [6]:
# Or get the movement of the center of geometry (cog) for a whole group of atoms. The atoms are provided as numbers in a list.
trc.get_cog_movement_total_series_for_atom_group(atoms=[1,2,5]).mean()

0.029582014393078063

In [7]:
# Get the average distance between two atoms over all time frames
trc.get_atom_pair_distance_mean(atomI=1, atomJ=2)

0.15260032608007468

## 2) TRE

### 2.1) Tre import and structure

In [8]:
# import the trajectory file into a Tre class
tre = traj_tre.Tre(input_value="example_files/test_CHE_H2O_bilayer.tre")

example_files/test_CHE_H2O_bilayer.tre


In [9]:
[x for x in dir(tre) if not x.startswith("_")]

['ENEVERSION',
 'TITLE',
 'add_traj',
 'database',
 'get_density',
 'get_temperature',
 'get_totals',
 'path',
 'totals_subblock_names',
 'write']

Tre files contain all energy-related data (such as the individual energy terms, temperature, and pressure). In PyGromos they generally share the same block structure as other files, but the data of the individual timesteps is stored efficiently inside a pandas DataFrame, here called tre.database. This database can be manipulated with all pandas functions. Alternatively, many common functions are provided by the Tre class.

This class should in principle replace further usage of the gromos++ ene_ana program, since all of these operations can be done efficiently on the pandas DataFrame.

We are currently working on adding more common functions to the Tre class. If you find a useful function please contact the developers so the function can be added for general usage :)

### 2.2) Common Tre functions

In [10]:
# calculate the average density over all timesteps
tre.get_density().mean()

874.6182260652407

In [11]:
# calculate the mean temperature over all frames for all baths in the system. In this example two baths with slightly different temperatures.
tre.get_temperature().mean()

array([297.7702854 , 297.71343093])

Tables and lists inside the database are stored in numpy arrays. For example, the two temperatures from the previous example are stored in a numpy array of size 2 because the system has two temperature baths.
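Because each cell holds a numpy array, it can be convenient to stack a whole column into one 2D array before computing statistics. The sketch below does this for a made-up Series of per-frame bath temperatures. The values and the column names `bath_1` and `bath_2` are invented for illustration and do not come from the example file.

```python
import numpy as np
import pandas as pd

# Made-up per-frame temperatures for two baths, one numpy array per frame
temps = pd.Series([
    np.array([297.0, 298.0]),
    np.array([299.0, 296.0]),
    np.array([298.0, 297.0]),
])

# Stack the per-frame arrays into one array of shape (n_frames, n_baths)
stacked = np.stack(temps.tolist())

# A regular DataFrame with one column per bath gives access to all pandas statistics
per_bath = pd.DataFrame(stacked, columns=["bath_1", "bath_2"])
summary = per_bath.agg(["mean", "std"])
print(summary)
```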

Specific values inside a tre file can also be accessed directly with numpy and pandas syntax.

In [12]:
tre.database.iloc[2]

TIMESTEP_step                                                 2000
TIMESTEP_time                                                    4
totals           [-409167.9172, 392370.1191, -801538.0363, 1389...
baths            [[100096.9564, 29163.29693, 70933.65948], [292...
bonded                 [[0.0, 50155.81989, 0.0, 88764.59053, 0.0]]
nonbonded                  [[983688.8106, -1924147.257, 0.0, 0.0]]
special          [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
eds                                                             []
numstates                                                      WIP
mass                                                 [1277287.736]
temperature      [[297.2614008, 346.4257383, 280.8695397, 0.999...
volume           [2425.07286, 8.463596169, 0.0, 0.0, 0.0, 8.463...
pressure         [28.23515616, 96520.43393, 130756.5894, 30.172...
Name: 2, dtype: object

In [13]:
# select the first nonbonded energy value for the first force group over all time frames
tre.database["nonbonded"].apply(lambda x: x[0][0])

0    984602.6158
1    988489.6284
2    983688.8106
3    984891.3076
4    980597.3587
5    980518.6316
6    980975.7104
7    979207.7071
8    977305.7837
9    976722.0947
Name: nonbonded, dtype: float64
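Once a quantity has been extracted as a plain Series like the one above, the usual summary statistics are one-liners. The following sketch re-enters the ten values printed above by hand (so it runs without pygromos) and computes the mean, the standard deviation, and a simple linear drift. Pairing the values with times 0, 2, ..., 18 is an assumption about the spacing of TIMESTEP_time in this file, and the drift fit is an illustrative choice, not a description of what ene_ana does.

```python
import numpy as np
import pandas as pd

# Assumed time axis (TIMESTEP_time spacing of 2) for the ten frames
time = np.arange(0.0, 20.0, 2.0)

# The ten nonbonded values shown above, indexed by time
nonbonded = pd.Series(
    [984602.6158, 988489.6284, 983688.8106, 984891.3076, 980597.3587,
     980518.6316, 980975.7104, 979207.7071, 977305.7837, 976722.0947],
    index=time,
    name="nonbonded",
)

mean_energy = nonbonded.mean()   # average over all frames
fluctuation = nonbonded.std()    # sample standard deviation

# Linear fit of value against time as a rough drift estimate
slope, intercept = np.polyfit(nonbonded.index.to_numpy(), nonbonded.to_numpy(), deg=1)
print(mean_energy, fluctuation, slope)
```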

## Concatenate and Copy Multiple Trajectories

Trajectories offer a wide range of additional file manipulations. Trajectory objects can be deep-copied and added to each other to concatenate multiple small simulation pieces into one large trajectory.

In [14]:
tre_copy = traj_tre.Tre(input_value=tre)

In [15]:
tre_copy.database.shape

(10, 13)

In [16]:
tre_combined = tre + tre_copy

In [17]:
tre_combined.database.shape

(19, 13)

The combined trajectory is one long trajectory built from the two smaller ones. It is one element shorter than the sum of both lengths, because the last frame of the first trajectory and the first frame of the second trajectory are normally the same frame. This behavior can be controlled via the option "skip_new_0=True" in the add_traj() function, which is the core of the "+" operator for trajectories. The following cell shows the default behavior as continuous numbering of the TIMESTEPs.

In [18]:
tre_combined.database.TIMESTEP_time

0      0.0
1      2.0
2      4.0
3      6.0
4      8.0
5     10.0
6     12.0
7     14.0
8     16.0
9     18.0
10    20.0
11    22.0
12    24.0
13    26.0
14    28.0
15    30.0
16    32.0
17    34.0
18    36.0
Name: TIMESTEP_time, dtype: float64
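The row count of 19 and the continuous time axis can be reproduced with a small pandas-only sketch. The helper `concat_frames` below is hypothetical and only mimics the behavior described above (drop the duplicated first frame of the second piece, then continue the time axis). It is not the add_traj() implementation; only the flag name `skip_new_0` is borrowed from the text.

```python
import pandas as pd


def concat_frames(first, second, skip_new_0=True):
    """Append `second` to `first`, shifting its times to continue those of `first`."""
    # Optionally drop the first frame of the second piece (duplicate of the last frame)
    new = (second.iloc[1:] if skip_new_0 else second).copy()
    # Shift times so that the first frame of `second` coincides with the last of `first`
    offset = first["TIMESTEP_time"].iloc[-1] - second["TIMESTEP_time"].iloc[0]
    new["TIMESTEP_time"] = new["TIMESTEP_time"] + offset
    return pd.concat([first, new], ignore_index=True)


# Stand-in for a 10-frame trajectory with a time spacing of 2
piece = pd.DataFrame({"TIMESTEP_time": [2.0 * i for i in range(10)]})

combined = concat_frames(piece, piece.copy())
print(combined.shape)
print(combined["TIMESTEP_time"].iloc[-1])
```

In this sketch, passing `skip_new_0=False` keeps all 20 rows, so the shared frame appears twice with the same time value.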