Skip to content
This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

coordinates.load loading wrong shape from GROMACS .xtc #1513

Closed
germanbarcenas opened this issue Oct 1, 2021 · 6 comments
Closed

coordinates.load loading wrong shape from GROMACS .xtc #1513

germanbarcenas opened this issue Oct 1, 2021 · 6 comments

Comments

@germanbarcenas
Copy link

germanbarcenas commented Oct 1, 2021

Hello,

I am trying to load in a .xtc file generated from GROMACS. Right now I'm mostly following the workflow from the posted tutorials, except I want to use my own data. I just have 1 .xtc file to load and analyze. I'm trying to cluster and look for features like in notebook 2:
http://emma-project.org/latest/tutorials/notebooks/02-dimension-reduction-and-discretization.html

When I try actually process the data, I see some errors associated with the shape of my trajectory array, that should be T,d, where T is timesteps, and d is the number of features, according to the coordinates.load documentation. I can confirm I have 2 features, but something is off on my trajectory shape. The .xtc is very large, so I don't think I can share it directly, However, this is how the code generally flows:

PyEMMA v2.5.9

XTC = 'md_nojump.xtc'
PDB= 'md_hj.pdb'
feat=pyemma.coordinates.featurizer(PDB)
#some custom feature stuff here
print(feat.describe())
print(feat.dimension())
traj=pyemma.coordinates.load(XTC,features=feat,stride=100)
print('type of data:', type(traj))
print('lengths:', len(traj))
print('shape of elements:', traj[0].shape)
print('n_atoms:', feat.topology.n_atoms)

Here is the error

Traceback (most recent call last):

  File "R:\GermanBarcenas\dna\gromacs\codes\markovAnalysis.py", line 65, in <module>
    traj=pyemma.coordinates.load(XTC,features=feat,stride=100)

  File "C:\Users\GermanBarcenas\Anaconda3\envs\moldym\lib\site-packages\pyemma\coordinates\api.py", line 244, in load
    trajs = reader.get_output(stride=stride)

  File "C:\Users\GermanBarcenas\Anaconda3\envs\moldym\lib\site-packages\pyemma\coordinates\data\_base\datasource.py", line 406, in get_output
    for itraj, chunk in it:

  File "C:\Users\GermanBarcenas\Anaconda3\envs\moldym\lib\site-packages\pyemma\coordinates\data\_base\datasource.py", line 1049, in __next__
    X = self._use_cols(self._next_chunk())

  File "C:\Users\GermanBarcenas\Anaconda3\envs\moldym\lib\site-packages\pyemma\coordinates\data\_base\datasource.py", line 1171, in _next_chunk
    x = self.transform_function(x)

  File "C:\Users\GermanBarcenas\Anaconda3\envs\moldym\lib\site-packages\pyemma\coordinates\data\feature_reader.py", line 155, in transform
    return self.featurizer.transform(data)

  File "C:\Users\GermanBarcenas\Anaconda3\envs\moldym\lib\site-packages\pyemma\coordinates\data\featurization\featurizer.py", line 906, in transform
    vec = f.transform(traj).astype(np.float32, casting='safe')

  File "C:\Users\GermanBarcenas\Anaconda3\envs\moldym\lib\site-packages\pyemma\coordinates\data\featurization\misc.py", line 115, in transform
    feature = self._fun(traj, *self._args, **self._kwargs)

  File "R:\GermanBarcenas\dna\gromacs\codes\markovAnalysis.py", line 33, in COMDistance_function
    traj = mdtraj.Trajectory(xyz.reshape(-1, 3, 3), topology=None)

ValueError: cannot reshape array of size 5010 into shape (3,3)

@thempel
Copy link
Member

thempel commented Oct 4, 2021

There is something about your traceback that I don't understand. According to your code, you are not adding any features - that should default to xyz coordinates, if I'm not mistaken. However, the error occurs in a function called COMDistance_function, which is not from PyEMMA but comes from markovAnalysis.py . How does this function enter your code example?

@germanbarcenas
Copy link
Author

germanbarcenas commented Oct 6, 2021

Here is that function I wrote:

c4 = GroupCOMFeature(feat.topology, [[15]])
c5 = GroupCOMFeature(feat.topology, [[70]])

def COMDistance_function(traj: mdtraj.Trajectory):
    centers4 = c4.transform(traj)  # yields ndarray
    centers5 = c5.transform(traj)  # yields ndarray
    xyz = np.hstack((centers4, centers5))
    traj = mdtraj.Trajectory(xyz.reshape(-1, 3, 3), topology=None)
    # this has shape (n_frames, 1)
    distances = mdtraj.compute_distances(traj, distance_indices=[[0, 1, 2]], periodic=True)
    return distances  

I am trying to return the distance from residue center of mass of index 15 and 70. I saw a similar custom thread here and I'm trying to adopt it to just return the scalar distance between the center of mass of two residues:
#1501

@clonker
Copy link
Member

clonker commented Oct 7, 2021

Hi @germanbarcenas, this is because XYZ does not have the proper shape! If you want to compute the distance between two (trajectories of) centers...

centers4 = c4.transform(traj)  # yields (n_frames, 1, 3) ndarray
centers5 = c5.transform(traj)  # yields (n_frames, 1, 3) ndarray
xyz = np.hstack((centers4, centers5))
traj = mdtraj.Trajectory(xyz.reshape(-1, 2, 3), topology=pdb)
distances = mdtraj.compute_distances(traj, distance_indices=[[0, 1]], periodic=True)
return distances

Note that this is untested code but in general it should help to print the intermediate shapes or step through the code with an actual debugger.

@germanbarcenas
Copy link
Author

germanbarcenas commented Oct 11, 2021

It works now! So I see I was not understanding the shape criteria. Here is my final code:

`
def COMAngle_function(traj: mdtraj.Trajectory):
    centers1 = c1.transform(traj)  # yields ndarray
    centers2 = c2.transform(traj)  # yields ndarray
    centers3 = c3.transform(traj)  # yields ndarray
    xyz = np.hstack((centers1, centers2, centers3))
    traj = mdtraj.Trajectory(xyz.reshape(-1, 3, 3), topology=None)
    # this has shape (n_frames, 1)
    angles = mdtraj.compute_angles(traj, angle_indices=[[0, 1, 2]], periodic=True)
    return angles    
def COMDistance_function(traj: mdtraj.Trajectory):
    centers4 = c4.transform(traj)  # yields ndarray
    centers5 = c5.transform(traj)  # yields ndarray
    xyz = np.hstack((centers4, centers5))
    traj = mdtraj.Trajectory(xyz.reshape(-1, 2, 3), topology=None)
    # this has shape (n_frames, 1)
    distances = mdtraj.compute_distances(traj, [[0, 1]], periodic=True)
    return distances   


XTC = 'md_nojump.xtc'
PDB= 'md_hj.pdb'

feat=pyemma.coordinates.featurizer(PDB)
c1 = GroupCOMFeature(feat.topology, [[0]])
c2 = GroupCOMFeature(feat.topology, [[13]])
c3 = GroupCOMFeature(feat.topology, [[28]]) 
c4 = GroupCOMFeature(feat.topology, [[15]])
c5 = GroupCOMFeature(feat.topology, [[70]])
feat.add_custom_func(COMAngle_function, dim=1, description='IDA_A')
feat.add_custom_func(COMDistance_function, dim=1, description='DyeDistance')
#feat.add_residue_COM([15,70],image_molecule=True)
print(feat.describe())
print(feat.dimension())
traj=pyemma.coordinates.load(XTC,features=feat)
print('type of data:', type(traj))
print('lengths:', len(traj))
print('shape of elements:', traj[0].shape)
print('n_atoms:', feat.topology.n_atoms)`

@germanbarcenas
Copy link
Author

Hi @germanbarcenas, this is because XYZ does not have the proper shape! If you want to compute the distance between two (trajectories of) centers...

centers4 = c4.transform(traj)  # yields (n_frames, 1, 3) ndarray
centers5 = c5.transform(traj)  # yields (n_frames, 1, 3) ndarray
xyz = np.hstack((centers4, centers5))
traj = mdtraj.Trajectory(xyz.reshape(-1, 2, 3), topology=pdb)
distances = mdtraj.compute_distances(traj, distance_indices=[[0, 1]], periodic=True)
return distances

Note that this is untested code but in general it should help to print the intermediate shapes or step through the code with an actual debugger.

I'd like to point out some errors I encounted as I was debugging:

> distances = mdtraj.compute_distances(traj, distance_indices=[[0, 1]], periodic=True)

I had to take out distance_indices=[[0, 1]], and just use [[0, 1]], becuase I would get this error:

compute_distances() got an unexpected keyword argument 'distance_indices'

and then I changed traj = mdtraj.Trajectory(xyz.reshape(-1, 2, 3), topology=PDB) to traj = mdtraj.Trajectory(xyz.reshape(-1, 2, 3), topology=None) because I was getting this error:

AttributeError: 'str' object has no attribute '_numAtoms'

@clonker
Copy link
Member

clonker commented Oct 12, 2021

Great that this works for you! Beware of periodic boundary conditions when evaluating distances though. You will need the topology (pdb) for this.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants