Fixing bug leading to atom graph containing a lot of incorrect atoms #241
Hey!
First of all thanks for a great repo! It's a very cool piece of research.
I have been playing around with this repo a bit lately and found this bug a few days ago. The consequence of assigning these to zero instead of NaN is that the filtering on this line https://github.com/gcorso/DiffDock/blob/main/datasets/process_mols.py#L205 does not work as intended. This leads to loads of atoms (thousands) at the location `[0, 0, 0]` in the graph, and means that `complex_graph['atom']`, `complex_graph['atom', 'atom_contact', 'atom']` and `complex_graph['atom', 'atom_rec_contact', 'receptor']` all get corrupted.

You can see this for yourself by running the following script. You'll see that for any protein you have loads of atoms at `[0, 0, 0]`.

Full disclaimer: I'm only using a subset of your repo, so I'd advise testing it on your full repo before merging, in case there's some place that expects atoms at `[0, 0, 0]` (from a quick inspection it seems there isn't).

The main difference from fixing this was that your atom model trains with half the GPU memory requirement. To be honest, I have not seen a massive performance difference (but maybe the atom model needs tweaking to leverage the corrected atom graph).
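
For anyone who wants to see the failure mode in isolation, here is a minimal, self-contained sketch. It is not the DiffDock code: `pad_atom_coords`, `drop_padding` and `count_at_origin` are hypothetical helpers, and it assumes the filter on the linked line drops padded atom slots by checking for NaN coordinates. Under that assumption, zero-filled padding passes the filter and ends up as atoms at the origin, while NaN-filled padding is removed.

```python
import math

def pad_atom_coords(coords, max_atoms, fill_value):
    """Pad a list of [x, y, z] rows up to max_atoms rows (hypothetical helper)."""
    padding = [[fill_value] * 3 for _ in range(max_atoms - len(coords))]
    return [list(c) for c in coords] + padding

def drop_padding(padded):
    """Keep rows without NaN coordinates (assumed shape of the real filter)."""
    return [c for c in padded if not any(math.isnan(x) for x in c)]

def count_at_origin(coords):
    """Count atoms sitting exactly at [0, 0, 0]."""
    return sum(1 for c in coords if all(x == 0.0 for x in c))

# Two real atoms, padded to five slots.
real_atoms = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

zero_filled = drop_padding(pad_atom_coords(real_atoms, 5, 0.0))
nan_filled = drop_padding(pad_atom_coords(real_atoms, 5, float("nan")))

print(len(zero_filled), count_at_origin(zero_filled))  # 5 3: padding survives
print(len(nan_filled), count_at_origin(nan_filled))    # 2 0: padding removed
```

With zero filling, the three padded slots are kept and every edge built from the atom coordinates would then include them, which matches the corrupted graphs described above.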