Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ilrmsdmatrix module #685

Merged
merged 25 commits into from
Jan 9, 2024
Merged

ilrmsdmatrix module #685

merged 25 commits into from
Jan 9, 2024

Conversation

mgiulini
Copy link
Contributor

You are about to submit a new Pull Request. Before continuing make sure you read the contributing guidelines and that you comply with the following criteria:

  • You have sticked to Python. Please talk to us before adding other programming languages to HADDOCK3
  • Your PR is about CNS
  • Your code is well documented: proper docstrings and explanatory comments for those tricky parts
  • You structured the code into small functions as much as possible. You can use classes if there is a (state) purpose
  • Your code follows our coding style
  • You wrote tests for the new code
  • tox tests pass. Run tox command inside the repository folder
  • -test.cfg examples execute without errors. Inside examples/ run python run_tests.py -b
  • PR does not add any dependencies, unless permission granted by the HADDOCK team
  • PR does not break licensing
  • Your PR is about writing documentation for already existing code 🔥
  • Your PR is about writing tests for already existing code :godmode:

Closes #684

@mgiulini mgiulini self-assigned this Aug 28, 2023
@mgiulini mgiulini changed the title added first draft of ilrmsdmatrix module ilrmsdmatrix module Aug 28, 2023
@mgiulini mgiulini mentioned this pull request Oct 10, 2023
12 tasks
@codecov
Copy link

codecov bot commented Oct 27, 2023

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (31155b3) 70.25% compared to head (a96027f) 71.22%.
Report is 63 commits behind head on main.

Files Patch % Lines
.../haddock/modules/analysis/ilrmsdmatrix/__init__.py 89.51% 13 Missing ⚠️
src/haddock/libs/libparallel.py 50.00% 2 Missing ⚠️
...rc/haddock/modules/analysis/ilrmsdmatrix/ilrmsd.py 98.79% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #685      +/-   ##
==========================================
+ Coverage   70.25%   71.22%   +0.97%     
==========================================
  Files          78       80       +2     
  Lines        6967     7261     +294     
==========================================
+ Hits         4895     5172     +277     
- Misses       2072     2089      +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mgiulini mgiulini marked this pull request as ready for review October 27, 2023 18:49
Copy link
Member

@amjjbonvin amjjbonvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment - further I leave it to the experts


from haddock.modules.analysis.ilrmsdmatrix import DEFAULT_CONFIG as DEFAULT_ILRMSD_CONFIG
from haddock.modules.analysis.ilrmsdmatrix import HaddockModule as IlrmsdmatrixModule
DATA_DIR = Path(Path(__file__).parent.parent / "tests" / "golden_data")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo it's best if you don't share input between the unit and the integration since this adds a cross-test dependency and can cause an indirect side-effect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this only imports a text file that is not used by the ilrmsdmatrix unit test..I can add a small ensemble of conformations, but that means adding more data to the repository. And more and more data will have to be added for the next integrations tests..
If you think that having complete independency between the two folders is crucial, I'll do it, it does not sound super necessary to me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important, yes

integration_tests/test_ilrmsdmatrix.py Outdated Show resolved Hide resolved
src/haddock/modules/analysis/ilrmsdmatrix/defaults.yaml Outdated Show resolved Hide resolved
src/haddock/modules/analysis/ilrmsdmatrix/__init__.py Outdated Show resolved Hide resolved
src/haddock/modules/analysis/ilrmsdmatrix/__init__.py Outdated Show resolved Hide resolved
@mgiulini mgiulini mentioned this pull request Oct 30, 2023
12 tasks
VGPReys
VGPReys previously approved these changes Nov 6, 2023
src/haddock/modules/analysis/ilrmsdmatrix/__init__.py Outdated Show resolved Hide resolved
@mgiulini
Copy link
Contributor Author

mgiulini commented Nov 6, 2023

@VGPReys @rvhonorato I should have implemented your suggestions in the code, thanks for the review!

@amjjbonvin
Copy link
Member

amjjbonvin commented Nov 7, 2023 via email

@mgiulini
Copy link
Contributor Author

mgiulini commented Nov 7, 2023

it doesn't make much sense to let the users modify this parameter..10k is already a lot, if there're more input models you should never use RMSD-based clustering (at least in docking related contexts)
Any idea of the timing for clustering 10K models with RMSD?

clustering is never a problem, the matrix calculation is..with the native python implementation in HADDOCK3 the ilrmsdmatrix in the glycan example took ~10 minutes for 1k models on 10 CPU cores..since it scales quadratically probably we should limit the number of models to 4-5k instead of 10k

@amjjbonvin
Copy link
Member

amjjbonvin commented Nov 7, 2023 via email

@mgiulini
Copy link
Contributor Author

mgiulini commented Nov 7, 2023

I would make that a parameter - up to the user to decide how much time they want to spend on it. If you select less than the number of available models, will that be done automatically on the top ranked ones? Or would you need a seletop step before that, e.g. to reduce from 20k to 5k before clustering

OK about making it a parameter, but the max value should not exceed 20k imo, so as to avoid memory problems and super long executions
currently if MAX_MODELS is less than the number of models the code raises an error

@amjjbonvin
Copy link
Member

amjjbonvin commented Nov 7, 2023 via email

@rvhonorato
Copy link
Member

rvhonorato commented Nov 7, 2023

I would make that a parameter - up to the user to decide how much time they want to spend on it.

Shouldn't we simply improve the code to make this faster? Adding this as a parameter and all this contour conditions just to avoid programimg?

@mgiulini
Copy link
Contributor Author

I would make that a parameter - up to the user to decide how much time they want to spend on it.

Shouldn't we simply improve the code to make this faster? Adding this as a parameter and all this contour conditions just to avoid programimg?

not sure code improvements can make much of a difference here: for sure the computation takes longer than it should, but the matrix calculation scales quadratically with the number of models..even if we made the code faster by a factor of 10, the execution will take a lot of time for 10k input models, because 50 million distances have to be computed.

as for the code improvements, it's a big choice, as it means adding extra dependencies and spending a lot of time on it. And there aren't so many cases in which we really need that efficiency. I am not saying I am against that, just that it must be discussed with the other developers

@VGPReys VGPReys added feature New feature request m|ilrmsdmatrix interface-ligand RMSD matrix calculation module labels Dec 13, 2023
@mgiulini mgiulini merged commit 6389a20 into main Jan 9, 2024
4 checks passed
@mgiulini mgiulini deleted the ilrmsd_clustering branch January 9, 2024 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature request m|ilrmsdmatrix interface-ligand RMSD matrix calculation module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

develop ilRMSD-based clustering
4 participants