# Preprocessing of the Molecular Dynamics Trajectories

#### Imports
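* [numpy](https://numpy.org/) for manipulating and saving arrays.
* `sys` for adding the repository root (`../../`) to the module search path, so that `modules` can be imported.
* `gen_traj_numpy` for reading the trajectories with the [MDAnalysis](https://www.mdanalysis.org/) library and converting them to numpy arrays.
* `normalize_files` for normalizing the trajectories (not imported in this notebook).
* `align_traj` for aligning the trajectories (not imported in this notebook).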

```python
import sys

import numpy as np

# Make the repository root importable so that the `modules` package can be found
sys.path.insert(0, '../../')
from modules.inputs.preprocess import gen_traj_numpy
```

#### Inputs
* `input_top` is the path to the topology file. See the [MDAnalysis list of supported formats](https://userguide.mdanalysis.org/1.0.0/formats/index.html) for all accepted formats.
* `input_traj` is the path to the trajectory file, in any of the [formats supported by MDAnalysis](https://userguide.mdanalysis.org/1.0.0/formats/index.html).
    * **Note**: If needed, the trajectory should already be aligned and centered before this step!
* `output_base_name` is the path of the output file without an extension. The array is saved as `{output_base_name}.npy` for faster loading in the future.
* `atomSelection` is the atom selection used for clustering; it must be written in the [MDAnalysis atom selection language](https://userguide.mdanalysis.org/stable/selections.html).
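
The note above asks for a trajectory that is already aligned and centered. To illustrate what that means, here is a minimal numpy sketch of centering each frame and superimposing it onto a reference frame by least-squares (Kabsch) rotation. This is one common technique, not necessarily what `align_traj` or your MD package does; the names `center`, `kabsch_align` and `align_frames` are hypothetical.

```python
import numpy as np


def center(coords):
    """Translate an (n_atoms, 3) frame so that its centroid sits at the origin."""
    coords = np.asarray(coords, dtype=float)
    return coords - coords.mean(axis=0)


def kabsch_align(mobile, reference):
    """Hypothetical helper: superimpose `mobile` onto `reference`, both (n_atoms, 3),
    by the rotation that minimizes the RMSD (Kabsch algorithm)."""
    mob = center(mobile)
    ref_centroid = np.asarray(reference, dtype=float).mean(axis=0)
    ref = center(reference)
    # Cross-covariance matrix between the centered point sets
    h = mob.T @ ref
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the row vectors, then move them onto the reference centroid
    return mob @ rot.T + ref_centroid


def align_frames(frames, reference):
    """Align every frame of an (n_frames, n_atoms, 3) array onto one reference frame."""
    return np.stack([kabsch_align(frame, reference) for frame in frames])
```

Aligning all frames onto a single reference removes overall translation and rotation, so the remaining coordinate differences reflect conformational change.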

`gen_traj_numpy` converts the selected atoms of the trajectory to a numpy array of shape (n_frames, n_atoms $\times$ 3), with one row of flattened $x, y, z$ coordinates per frame, so that frames can be compared with each other.
`normalize_file` normalizes the trajectory to the range $[0, 1]$ so that it is compatible with the extended similarity indices.
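
The two conversions described above can be sketched as follows. `positions_to_rows` shows the (n_frames, n_atoms $\times$ 3) layout; `traj_to_rows` is a hypothetical stand-in for `gen_traj_numpy` that uses only standard MDAnalysis calls (`Universe`, `select_atoms`, iterating over `trajectory`); and `minmax_normalize` assumes a global min-max scaling, which may differ from what `normalize_file` actually does.

```python
import numpy as np


def positions_to_rows(positions):
    """Flatten (n_frames, n_atoms, 3) coordinates into (n_frames, n_atoms * 3).

    Each row is one frame laid out as x1, y1, z1, x2, y2, z2, ...
    """
    positions = np.asarray(positions, dtype=float)
    n_frames, n_atoms, _ = positions.shape
    return positions.reshape(n_frames, n_atoms * 3)


def traj_to_rows(top, traj, selection):
    """Hypothetical stand-in for gen_traj_numpy, built directly on MDAnalysis."""
    import MDAnalysis as mda  # imported lazily so the other helpers work without it

    u = mda.Universe(top, traj)
    atoms = u.select_atoms(selection)
    # atoms.positions is updated at every step of u.trajectory, so copy each frame
    frames = [atoms.positions.copy() for _ in u.trajectory]
    return positions_to_rows(np.stack(frames))


def minmax_normalize(rows):
    """Assumed scheme: scale all coordinates into [0, 1] using the global min and max."""
    rows = np.asarray(rows, dtype=float)
    lo, hi = rows.min(), rows.max()
    if hi == lo:
        # Constant input: avoid dividing by zero
        return np.zeros_like(rows)
    return (rows - lo) / (hi - lo)


# Usage on a toy array: 2 frames of 3 atoms
demo = np.arange(18).reshape(2, 3, 3)
rows = positions_to_rows(demo)   # shape (2, 9)
norm = minmax_normalize(rows)    # values now span [0, 1]
```

Only `traj_to_rows` needs MDAnalysis installed; with the inputs of this notebook it would be called as `traj_to_rows(input_top, input_traj, atomSelection)`.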

```python
input_top = '../../example/aligned_tau.pdb'
input_traj = '../../example/aligned_1000_tau.dcd'
output_base_name = '../../example/aligned_tau'
atomSelection = 'resid 3 to 12 and name N CA C O H'

traj_numpy = gen_traj_numpy(input_top, input_traj, atomSelection)
```

```text
Number of atoms in trajectory: 217
Number of frames in trajectory: 6001
Number of atoms in selection: 50
```


#### Outputs
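The output is a numpy array of shape (n_frames, n_atoms $\times$ 3); with the 6001 frames and 50 selected atoms above, this is (6001, 150). It is saved to `{output_base_name}.npy`.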

```python
output_name = output_base_name + '.npy'
np.save(output_name, traj_numpy)
```
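
Because the array is stored as a plain `.npy` file, later steps can reload it with `np.load` instead of re-reading the trajectory. A minimal sketch, using a small synthetic array in a temporary directory in place of `aligned_tau.npy`; `load_traj_rows` is a hypothetical helper that can also restore the per-atom (n_frames, n_atoms, 3) layout.

```python
import os
import tempfile

import numpy as np


def load_traj_rows(path, n_atoms=None):
    """Hypothetical helper: load a saved (n_frames, n_atoms * 3) array and,
    if n_atoms is given, reshape it back to (n_frames, n_atoms, 3)."""
    rows = np.load(path)
    if n_atoms is None:
        return rows
    return rows.reshape(rows.shape[0], n_atoms, 3)


# Synthetic stand-in with the same layout as traj_numpy: 4 frames, 50 atoms
n_frames, n_atoms = 4, 50
traj_rows = np.random.default_rng(1).random((n_frames, n_atoms * 3))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "aligned_tau.npy")
    np.save(path, traj_rows)
    loaded = load_traj_rows(path)
    per_atom = load_traj_rows(path, n_atoms=n_atoms)
```

For large trajectories, `np.load(path, mmap_mode='r')` memory-maps the file instead of reading it all into memory at once.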