New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conformor prevent clash #23
Conformor prevent clash #23
Conversation
Wrote two functions:
Next steps: will continue to write tests for these functions |
Thanks @AlaaShamandy for your implementations. I will review them carefully. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very neat implementation, congratulations and thanks @AlaaShamandy !
I made some comments, let's discuss a bit further before merging so that we define a strategy to then pass to a library arch.
@AlaaShamandy I wait for you to tell me when the PR is completed before I review it for merging. Added the label not to merge. |
alphas/clash_validator.py
Outdated
""" | ||
|
||
# concatenate every residue containing four atoms into one numpy array | ||
last_chain_vector = np.concatenate([np.array(list(atom.values())) for atom in last_chain[:-2]]) #ORIGINAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlaaShamandy We should not slice the conformer.coords items inside the validators. That should be made before sending the chains to the function/class call. I will do the same in the Numba implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
Normally the function should be handling any slicing etc + for someone who doesn't know the program very well, it will reiterate their understanding when they see the slicing inside the function. The stateholder keeps track of conformers built on each other. So current_conformer = last_conformer + new loop. This is the logic outside the validators.
We should just stay consistent. There is no advantage to either method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
normally we say we'll make standalone classes to work on their own but these validators are very specific to our program. It depends on the format of the conformers we have specified, it depends on only finding the atoms N, O, CA and C for now and it depends on the order we have specified. It's ok to deviate from making it standalone since it doesn't make much sense on its own anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because in this PR you are working on alphas
it is fine to have the slicing there, but I don't agree with that. Let's see:
- you have two logics in the same place: 1) deciding what to compare 2) how to compare
- if we have different strategies for comparison we need to repeat the logic in 1) in the different strategies
- if there are edge cases in the future that need different method we will need to develop logic that goes around the logic 1) implemented in the core of the VDW validators.
So for this reasons, it should be defined somewhere else how the slicing of the fragments is done before sending two fragments to the VDW validator. The logic of the VDW validator should be "validate A against B according to this criteria", and not "validate A against B but to A lets remove one index or two depending..." that part of the logic should be somewhere else.
Yes, though these validators are only for our program lets keep logic and abstractions as separated as possible, it will also facilitate unit testing.
@joaomcteixeira I tried to think of the most efficient way to utilize kdtrees, please review. This pull request is ready to be merged. Next steps:
|
new_chain_kdtree = KDTree(new_chain_vector) | ||
|
||
clashes = last_chain_kdtree.sparse_distance_matrix(new_chain_kdtree, 3.4) #maximum value to consider | ||
for indices, distance in zip(clashes.keys(),clashes.values()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick note: this can be written as clashes.items()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very clean and neat implementation of the vectorized and KDTree implementation. The architecture looks nice at this stage, though I am not evaluating it extensively for now as I am sure it will change or re-adapt when this code is implemented in the actual program libraries.
The the conformer_builder.py
you have corrected some of the bugs we have identified in meetings and others you have found. We will have those updates on top of your desk when implementing for #27.
Thanks @AlaaShamandy , I will merge now.
Removing clashes implementation is done.
Reduced the four for loops and 2 if statements to just 2 for loops by vectorizing the calculations.
These could be reduced to 0 for loops eventually if we vectorize the whole program.
Reasonable assumptions made while developing:
EDITED (by @joaomcteixeira):
This PR addresses #26.