Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compute_neighbors compute_neighborlist and compute_contacts are giving me a strange result #1658

Closed
MauriceKarrenbrock opened this issue Jul 2, 2021 · 4 comments
Labels

Comments

@MauriceKarrenbrock
Copy link
Contributor

Hi, I am trying to get all the residues that have at leas a atom at a distance of max 4.5 Angstrom from at least one atom of the ligand in the attached protein-ligand system (the ligand resname is "UNK") 4lr6_solvbox.gro.txt
And I have tried with multiple functions of yours and always get the same strange results, and I can't understand if I am doing some very dumb error or if there is a bug.

The result I get is residue 4096 4746 3857 4625 3862 4632 3744 4771 4645 40 41 42 44 46 4015 51 4660 3781 4819 3932 95 4575 99 4579 103 104 105 3822 3827 4725 632 126 3967 where the numbers are the resSeq. If you put it in VMD you will see that on one side I don't get all the neighboring residues (an obvious example is LEU51 that has one hydrogen atom at 3.52 Angstrom from one of the atoms of the ligand); and on the other side I get some random waters that are obviously more far away than 4.5 Angstrom.

The ligand resname is UNK and this what I tried:

import mdtraj

tt = mdtraj.load('4lr6_solvbox.gro')

atom_index = mdtraj.geometry.compute_neighbors(traj=tt,
cutoff=0.45,
query_indices=tt.top.select('resname UNK'))

atom_index = atom_index[0]

resSeq = list(set(tt.top.atom(i).residue.resSeq for i in atom_index))

resSeq = [str(i) for i in resSeq]

resSeq = 'residue  ' + '  '.join(resSeq)

print(resSeq)
import mdtraj

tt = mdtraj.load('4lr6_solvbox.gro')

atom_index = mdtraj.geometry.compute_neighborlist(tt, 0.45, periodic=False)
tmp_list = []
for n in tt.top.select('resname UNK'):
        tmp_list += list(atom_index[n])

tmp_list = list(set(tmp_list))

resSeq = [str(tt.top.atom(i).residue.resSeq) for i in tmp_list]
resSeq = list(set(resSeq))

del atom_index

resSeq = 'residue  ' + '  '.join(resSeq)

print(resSeq)

And in the 3rd case I used a function that I wrote that uses mdtraj.compute_contacts, and can be found here https://github.com/MauriceKarrenbrock/PythonPDBStructures/blob/master/PythonPDBStructures/geometry.py at line 120 and is called get_nearest_neighbors_residues_with_mdtraj

Is someone able to reproduce this error and does someone have an idea of what is going on?

@sroet
Copy link
Contributor

sroet commented Jul 5, 2021

So, something goes wrong loading that file (in version 1.9.6).

If I just run:

import mdtraj as md
frame = md.load("Downloads/4lr6_solvbox.gro")
frame

I get:
<mdtraj.Trajectory with 1 frames, 40779 atoms, 38810 residues, and unitcells at 0x7fe85eb725e0>
Which is way more residues than the 13016 stated in the gro file.

together with a bunch of warnings like:
/lib/python3.9/site-packages/mdtraj/formats/gro.py:307: UserWarning: WARNING: two consecutive residues with same number (THR, SER)

This seems to be a bug introduced between 1.9.5 and 1.9.6 (probably a side effect of #1638), as running the same code in mdtraj 1.9.5 gives:
<mdtraj.Trajectory with 1 frames, 40779 atoms, 13016 residues, and unitcells at 0x7fc5096bff40>

I unfortunately don't have time to investigate the loading issue further, but as you are the original author of the PR, you might have some insights on what goes wrong. If you rather have a solution for this, then you can probably downgrade mdtraj

PS: There might be a code that does what you are looking for (assuming that you can correct the loading, as it is also based on mdtraj under the hood): contact_map

@MauriceKarrenbrock
Copy link
Contributor Author

Hi, thank you very much!

Yes you were right, I introduced a small bug during the mentioned bugfix, I resolved it in this pull request #1659

But the problem persists also with the bugfix: some random waters get selected and some good residues don't

I checked contact_map up, but what I need is so simple that my first example is all what I need. And in any case if it uses mdtraj under the hood the problem should be there too

@MauriceKarrenbrock
Copy link
Contributor Author

In any case, in order to see if it is only a mdtraj thing, I tried using a function that does the same thing with biopython that I wrote some time ago and I get the exact same results.

I really don't know how to interpret this, the fact that some near residues don't get counted may be because of some floating point rounding error, but the fact that some random waters very far away get counted makes no sense to me...

The function can be found here https://github.com/MauriceKarrenbrock/PythonPDBStructures/blob/master/PythonPDBStructures/geometry.py line 244 and is called get_nearest_neighbors_residues_with_biopython

To reproduce

pip install git+https://github.com/MauriceKarrenbrock/PythonPDBStructures
pip install biopython
import mdtraj
from PythonPDBStructures.geometry import get_nearest_neighbors_residues_with_biopython
import PythonPDBStructures.pdb.biopython_utils as ut

mdtraj.load('4lr6_solvbox.gro').save('4lr6_solvbox.pdb')

struct = ut.parse_pdb('4lr6', '4lr6_solvbox.pdb')

nn = get_nearest_neighbors_residues_with_biopython(struct, target_resname='UNK', ignore_resnames=[])

residues = [str(i) for  i in nn.resnumbers]

residues = 'residue ' + ' '.join(residues)

print(residues)

and you should obtain residue 40 41 42 44 46 51 95 99 103 104 105 632 3744 3781 3822 3827 3857 3862 3932 3967 4015 4096 4575 4579 4625 4632 4645 4660 4725 4746 4771 4819

@stale
Copy link

stale bot commented Jan 8, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 8, 2022
@stale stale bot closed this as completed Apr 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants