Expand supported file formats #109

phydev · 2023-03-13T13:40:58Z

Parsers for additional file formats are needed, especially for CSV files that contain several trajectories.

file format	description	priority
.xyz	file with several atoms and time points	1
.csv	csv file with several trajectories	2
LAMMPS	Molecular dynamics	2
.pdb	protein data bank	3

.xyz file

Nparticles [integer]
comment [character]
X Y Z [repeat Nparticles]
[repeat Nframes]
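
A minimal sketch of a reader for the layout above, assuming whitespace-separated coordinates and one comment line per frame. The function name `read_xyz` is hypothetical and not part of trajpy. Because many .xyz files put an element symbol before the coordinates, the sketch keeps only the last three fields of each particle line.

```python
def read_xyz(lines):
    """Return a list of frames; each frame is a list of (x, y, z) tuples."""
    rows = [ln.rstrip("\n") for ln in lines]
    frames, i = [], 0
    while i < len(rows):
        if not rows[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        n = int(rows[i])  # Nparticles
        # rows[i + 1] is the free-form comment line and is skipped
        block = rows[i + 2 : i + 2 + n]
        if len(block) < n:
            raise ValueError("truncated frame in .xyz input")
        # keep the last three fields so "C 1.0 0.5 0.0" also parses
        frames.append([tuple(float(v) for v in ln.split()[-3:]) for ln in block])
        i += 2 + n
    return frames
```

The trajectory of particle `k` is then `[frame[k] for frame in frames]`, assuming the particle order is the same in every frame.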

CSV with several trajectories - format definition

The csv should contain 5 columns: time t, 3 spatial (x, y, z) components and the trajectory identifier id.
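
As an illustration, such a file could be split into per-trajectory point lists with the standard `csv` module. This sketch assumes a header row naming the columns `t`, `x`, `y`, `z`, `id`. The issue does not fix the header names, and `read_trajectories` is a hypothetical helper.

```python
import csv
from collections import defaultdict


def read_trajectories(fh):
    """Group rows of a (t, x, y, z, id) CSV by trajectory identifier."""
    trajectories = defaultdict(list)
    for row in csv.DictReader(fh):
        trajectories[row["id"]].append(
            (float(row["t"]), float(row["x"]), float(row["y"]), float(row["z"]))
        )
    # sort each trajectory by time in case rows are interleaved
    for points in trajectories.values():
        points.sort(key=lambda p: p[0])
    return dict(trajectories)
```

Identifiers are kept as strings, so both numeric and named trajectory ids work.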

LAMMPS data file format

Large-scale Atomic/Molecular Massively Parallel Simulator is a molecular dynamics program from Sandia National Laboratories.

More details about the file format: https://docs.lammps.org/read_data.html

A LAMMPS dump file written in YAML has the following structure:

---
creator: LAMMPS
timestep: 0
units: lj
time: 0
natoms: 3
boundary: [ p, p, p, p, p, p, ]
thermo:
  - keywords: [ Step, Temp, E_pair, E_mol, TotEng, Press, ]
  - data: [ 0, 0, -27093.472213010766, 0, 0, 0, ]
box:
  - [ 0, 16.795961913825074 ]
  - [ 0, 16.795961913825074 ]
  - [ 0, 16.795961913825074 ]
  - [ 0, 0, 0 ]
keywords: [ id, type, x, y, z, vx, vy, vz, ix, iy, iz,  ]
data:
  - [     1 , 1 ,  0.000000e+00 ,  0.000000e+00 ,  0.000000e+00 ,  -1.841579e-01 , -9.710036e-01 , -2.934617e+00 , 0 , 0 , 0, ]
  - [     2 , 1 ,  8.397981e-01 ,  8.397981e-01 ,  0.000000e+00 ,  -1.799591e+00 ,  2.127197e+00 ,  2.298572e+00 , 0 , 0 , 0, ]
  - [     3 , 1 ,  8.397981e-01 ,  0.000000e+00 ,  8.397981e-01 ,  -1.807682e+00 , -9.585130e-01 ,  1.605884e+00 , 0 , 0 , 0, ]
---
timestep: 100
...
---

Writing a parser for this file format is straightforward with the yaml.load_all() function, which iterates over the YAML documents (one per timestep) in the file.

Protein Data Bank (PDB) format

Standard file format for protein structures. Each file lists several atoms, possibly at different time steps. A PDB file can contain a single snapshot of the system or several trajectories, so we need to process several PDB files at once to extract trajectories.

A possible workflow would be:

Read each pdb file and extract the trajectories per atom
Write a CSV file using the format (t, x, y, z, id), where id is the atom identifier.
Use the CSV file to compute the features using trajpy
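
The first two steps of this workflow could be sketched as follows. The sketch assumes one snapshot per PDB file and uses the file's position in the sequence as the time value, since a PDB snapshot carries no timestamp. Both function names are hypothetical. Coordinates are read from the fixed columns of ATOM/HETATM records (serial in columns 7-11, x, y, z in columns 31-54).

```python
import csv


def pdb_frames_to_rows(frames):
    """Yield (t, x, y, z, id) rows; `frames` is an iterable of PDB line iterables."""
    for t, lines in enumerate(frames):  # assumption: frame index stands in for time
        for line in lines:
            if line.startswith(("ATOM", "HETATM")):
                atom_id = int(line[6:11])  # atom serial number
                x, y, z = (
                    float(line[a:b]) for a, b in ((30, 38), (38, 46), (46, 54))
                )
                yield (t, x, y, z, atom_id)


def write_trajectory_csv(rows, fh):
    """Write rows in the five-column (t, x, y, z, id) layout with a header."""
    writer = csv.writer(fh)
    writer.writerow(["t", "x", "y", "z", "id"])
    writer.writerows(rows)
```

With files on disk, `frames` could be a generator of open file objects in time order; the resulting CSV is then the input for the feature computation in the last step.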

More information about pdb file format: https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format)


phydev added the enhancement, help wanted, good first issue, and future-work labels Mar 13, 2023

phydev pinned this issue Mar 13, 2023

phydev self-assigned this Jul 19, 2023

phydev added a commit that referenced this issue Jul 21, 2023

Lammps YAML parser added #109

fd1db54
