ilrmsdmatrix module #685

mgiulini · 2023-08-28T09:33:17Z

Closes #684

codecov · 2023-10-27T18:46:20Z

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (31155b3) 70.25% compared to head (a96027f) 71.22%.
Report is 63 commits behind head on main.

Files	Patch %	Lines
.../haddock/modules/analysis/ilrmsdmatrix/__init__.py	89.51%	13 Missing ⚠️
src/haddock/libs/libparallel.py	50.00%	2 Missing ⚠️
...rc/haddock/modules/analysis/ilrmsdmatrix/ilrmsd.py	98.79%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #685      +/-   ##
==========================================
+ Coverage   70.25%   71.22%   +0.97%     
==========================================
  Files          78       80       +2     
  Lines        6967     7261     +294     
==========================================
+ Hits         4895     5172     +277     
- Misses       2072     2089      +17

amjjbonvin

One comment; beyond that, I leave it to the experts.

examples/docking-protein-glycan/docking-protein-glycan-ilrmsd-test.cfg

rvhonorato · 2023-10-30T10:54:32Z

integration_tests/test_ilrmsdmatrix.py

+
+from haddock.modules.analysis.ilrmsdmatrix import DEFAULT_CONFIG as DEFAULT_ILRMSD_CONFIG
+from haddock.modules.analysis.ilrmsdmatrix import HaddockModule as IlrmsdmatrixModule
+DATA_DIR = Path(Path(__file__).parent.parent / "tests" / "golden_data")


IMO it's best not to share input between the unit and the integration tests, since this adds a cross-test dependency and can cause indirect side effects.

This only imports a text file that is not used by the ilrmsdmatrix unit test. I can add a small ensemble of conformations, but that means adding more data to the repository, and more and more data will have to be added for the next integration tests.
If you think that complete independence between the two folders is crucial, I'll do it, but it does not seem strictly necessary to me.

It is important, yes

integration_tests/test_ilrmsdmatrix.py

src/haddock/modules/analysis/ilrmsdmatrix/defaults.yaml

src/haddock/modules/analysis/ilrmsdmatrix/__init__.py

src/haddock/modules/analysis/ilrmsdmatrix/ilrmsd.py

src/haddock/libs/libparallel.py

src/haddock/modules/analysis/ilrmsdmatrix/__init__.py

mgiulini · 2023-11-06T17:34:24Z

@VGPReys @rvhonorato I believe I have implemented your suggestions in the code; thanks for the review!

amjjbonvin · 2023-11-07T08:33:34Z

It doesn't make much sense to let users modify this parameter. 10k is already a lot; if there are more input models, you should never use RMSD-based clustering (at least in docking-related contexts).

Any idea of the timing for clustering 10K models with RMSD?

mgiulini · 2023-11-07T09:54:44Z

It doesn't make much sense to let users modify this parameter. 10k is already a lot; if there are more input models, you should never use RMSD-based clustering (at least in docking-related contexts).
Any idea of the timing for clustering 10K models with RMSD?

Clustering is never the problem; the matrix calculation is. With the native Python implementation in HADDOCK3, the ilrmsdmatrix step in the glycan example took ~10 minutes for 1k models on 10 CPU cores. Since it scales quadratically, we should probably limit the number of models to 4-5k instead of 10k.
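A back-of-the-envelope sketch of the quadratic scaling described above. It assumes one ilRMSD evaluation per unordered model pair, n(n-1)/2, a fixed cost per pair, and naively extrapolates the single data point quoted in this comment (~10 minutes for 1k models on 10 cores); real timings will depend on system size and hardware. The helper names are illustrative, not part of haddock3.

```python
def n_pairs(n_models: int) -> int:
    """Number of unordered model pairs, i.e. matrix entries above the diagonal."""
    return n_models * (n_models - 1) // 2


def extrapolate_minutes(n_models: int, ref_models: int = 1000,
                        ref_minutes: float = 10.0) -> float:
    """Scale a measured runtime by the ratio of pair counts.

    Assumes every pair costs the same and the core count stays fixed.
    """
    return ref_minutes * n_pairs(n_models) / n_pairs(ref_models)


for n in (1000, 4000, 5000, 10000):
    print(f"{n:>6} models: {n_pairs(n):>11,} pairs, "
          f"~{extrapolate_minutes(n):.0f} min")
```

Under these assumptions, 10k models take roughly 100 times the 1k runtime (about 1,000 minutes, i.e. ~17 hours on the same 10 cores), while 4-5k models land at roughly 160-250 minutes.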

amjjbonvin · 2023-11-07T11:38:42Z

I would make that a parameter; it's up to the user to decide how much time they want to spend on it. If you select fewer models than are available, will that be done automatically on the top-ranked ones? Or would you need a seletop step before that, e.g. to reduce from 20k to 5k before clustering?

mgiulini · 2023-11-07T11:54:01Z

I would make that a parameter; it's up to the user to decide how much time they want to spend on it. If you select fewer models than are available, will that be done automatically on the top-ranked ones? Or would you need a seletop step before that, e.g. to reduce from 20k to 5k before clustering?

OK about making it a parameter, but IMO the max value should not exceed 20k, to avoid memory problems and very long executions.
Currently, if MAX_MODELS is less than the number of models, the code raises an error.
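To put the memory concern in numbers, here is a rough sketch of how much RAM a pairwise matrix needs if every value is held in memory as a 64-bit float. This is an assumption: the module may store values differently (on disk, as float32, in chunks), and the estimate ignores coordinates and Python object overhead. `matrix_gib` is a hypothetical helper.

```python
BYTES_PER_FLOAT64 = 8


def matrix_gib(n_models: int, condensed: bool = True) -> float:
    """Approximate memory (GiB) of a float64 pairwise matrix over n_models models."""
    if condensed:
        entries = n_models * (n_models - 1) // 2  # upper triangle, no diagonal
    else:
        entries = n_models * n_models  # full square matrix
    return entries * BYTES_PER_FLOAT64 / 1024**3


for n in (10_000, 20_000):
    print(f"{n} models: condensed {matrix_gib(n):.2f} GiB, "
          f"square {matrix_gib(n, condensed=False):.2f} GiB")
```

This gives about 0.37 GiB (condensed) or 0.75 GiB (square) at 10k models, and about 1.49 GiB or 2.98 GiB at 20k: doubling the cap roughly quadruples the footprint.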
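A minimal sketch of the guard described here, assuming the cap is compared against the number of input models before any pair is computed. The exception class, function name and the 10k default (taken from the value discussed in this thread) are hypothetical, not the module's actual code.

```python
class TooManyModelsError(ValueError):
    """Hypothetical exception raised when the input ensemble exceeds the cap."""


def check_model_count(n_models: int, max_models: int = 10_000) -> None:
    """Refuse to build the matrix when there are more models than max_models."""
    if n_models > max_models:
        raise TooManyModelsError(
            f"{n_models} input models exceed max_models={max_models}; "
            "select the top-ranked models first (e.g. with a seletop step)."
        )


check_model_count(5_000)  # within the cap: no error
```

Failing early, before the quadratic loop starts, means a misconfigured run stops immediately instead of after hours of computation.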

amjjbonvin · 2023-11-07T11:57:09Z

OK, the max param can be 10k (expert users can modify this). This means a seletop step would be needed first in order to cluster fewer than the total number of models.
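The seletop step mentioned here keeps only the top-ranked models, so that a 20k ensemble can be trimmed before it reaches the ilRMSD matrix. A minimal sketch of that filtering, assuming models are ranked by a score where lower is better (as with the HADDOCK score); `select_top` and the toy scores below are hypothetical and do not reproduce the seletop module.

```python
from typing import List, Sequence


def select_top(models: Sequence[str], scores: Sequence[float], n: int) -> List[str]:
    """Return the n best-ranked models, assuming lower score = better."""
    if len(models) != len(scores):
        raise ValueError("models and scores must have the same length")
    ranked = sorted(zip(models, scores), key=lambda item: item[1])
    return [model for model, _ in ranked[:n]]


# Hypothetical usage: trim a (toy) ensemble before building the matrix.
models = [f"model_{i}.pdb" for i in range(6)]
scores = [-80.1, -120.4, -95.0, -60.3, -130.2, -101.7]
print(select_top(models, scores, 3))
```

Applied with `n` equal to the matrix cap, the downstream module never sees more models than it accepts.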

rvhonorato · 2023-11-07T12:09:22Z

I would make that a parameter; it's up to the user to decide how much time they want to spend on it.

Shouldn't we simply improve the code to make it faster, rather than adding this as a parameter and all these boundary conditions just to avoid programming?

mgiulini · 2023-11-15T11:34:17Z

I would make that a parameter; it's up to the user to decide how much time they want to spend on it.

Shouldn't we simply improve the code to make it faster, rather than adding this as a parameter and all these boundary conditions just to avoid programming?

Not sure code improvements can make much of a difference here. The computation certainly takes longer than it should, but the matrix calculation scales quadratically with the number of models: even if we made the code 10 times faster, execution would still take a long time for 10k input models, because about 50 million distances have to be computed.

As for code improvements, it's a big decision, as it means adding extra dependencies and spending a lot of time on it, and there aren't many cases in which we really need that efficiency. I am not against it, but it must be discussed with the other developers.
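Since the pair count fixes the total amount of work, the remaining lever besides per-pair speed is spreading the pairs evenly over the available cores (the PR later adds a more explicit check on `nmodels` and `ncores`). A hypothetical sketch of such a split: `pair_ranges` and `iter_pairs` are illustrative names, not the haddock3 `libparallel` API, and pairs are produced lazily so that ~50 million tuples are never held in memory at once.

```python
from itertools import combinations, islice
from typing import Iterator, List, Tuple


def pair_ranges(n_models: int, ncores: int) -> List[Tuple[int, int]]:
    """Split the n(n-1)/2 pair indices into contiguous, near-equal [start, stop) ranges."""
    total = n_models * (n_models - 1) // 2
    if total == 0:
        raise ValueError("at least two models are needed for a pairwise matrix")
    if ncores < 1:
        raise ValueError("ncores must be at least 1")
    ncores = min(ncores, total)  # more cores than pairs would leave idle workers
    base, extra = divmod(total, ncores)
    ranges, start = [], 0
    for core in range(ncores):
        stop = start + base + (1 if core < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def iter_pairs(n_models: int, start: int, stop: int) -> Iterator[Tuple[int, int]]:
    """Lazily yield the (i, j) pairs, i < j, whose enumeration index is in [start, stop)."""
    return islice(combinations(range(n_models), 2), start, stop)


for start, stop in pair_ranges(5, 3):
    print((start, stop), list(iter_pairs(5, start, stop)))
```

`islice` still walks past the skipped pairs (cheaply, without computing any RMSD); a closed-form mapping from a linear index to `(i, j)` would avoid that, at the cost of a less obvious implementation.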

added first draft of ilrmsdmatrix module

0493df4

mgiulini self-assigned this Aug 28, 2023

mgiulini changed the title from "added first draft of ilrmsdmatrix module" to "ilrmsdmatrix module" Aug 28, 2023

mgiulini mentioned this pull request Oct 10, 2023

Glycan example #711 (merged, 12 tasks)

mgiulini and others added 13 commits October 10, 2023 17:22

Merge branch 'main' into ilrmsd_clustering

60698c1

improved module

89e0366

Merge branch 'main' into ilrmsd_clustering

f918a3e

added tests for ilrmsdmatrix module

d7977fd

improved code

98a903e

improved ilrmsdmatrix tests

71865aa

fixed ilrmsd files

5561312

added ilrmsdmatrix integration test

f28c060

added ilrmsdmatrix example

cceda6c

Merge branch 'main' into ilrmsd_clustering

a6df25e

Merge branch 'ilrmsd_clustering' of https://github.com/haddocking/had… …dock3 into ilrmsd_clustering

96bc411

added pytest_mock

0fdefbd

fixed run test

d421c12

mgiulini marked this pull request as ready for review October 27, 2023 18:49

mgiulini requested review from rvhonorato, amjjbonvin and VGPReys October 27, 2023 18:49

amjjbonvin reviewed Oct 29, 2023

examples/docking-protein-glycan/docking-protein-glycan-ilrmsd-test.cfg (resolved)

mgiulini mentioned this pull request Oct 30, 2023

clarify naming in clustrmsd and clustfcc #739 (closed)

rvhonorato requested changes Oct 30, 2023

mgiulini mentioned this pull request Oct 30, 2023

alascan module #690 (merged, 12 tasks)

mgiulini added 4 commits November 6, 2023 13:52

Merge branch 'main' into ilrmsd_clustering

dd51339

added check to get_index_list

bd683c7

added data dir to integration tests

b2326b4

adjusted integration test

8edac03

mgiulini added 4 commits November 6, 2023 13:58

fixed ilrmsd module

57a0321

updated tests

c77381c

removed max_models

4aa377b

added protein glycan complexes

559410d

VGPReys reviewed Nov 6, 2023

View reviewed changes

src/haddock/libs/libparallel.py (outdated, resolved)

VGPReys previously approved these changes Nov 6, 2023

src/haddock/modules/analysis/ilrmsdmatrix/__init__.py (outdated, resolved)

more explicit check on nmodels and ncores

f7eeda1

mgiulini dismissed VGPReys’s stale review via f7eeda1 November 6, 2023 16:37

Merge branch 'main' into ilrmsd_clustering

a96027f

rvhonorato approved these changes Nov 7, 2023

VGPReys approved these changes Nov 7, 2023

VGPReys added the labels feature (New feature request) and m|ilrmsdmatrix (interface-ligand RMSD matrix calculation module) Dec 13, 2023

added max_models as a parameter

f609b64

mgiulini merged commit 6389a20 into main Jan 9, 2024
4 checks passed

mgiulini deleted the ilrmsd_clustering branch January 9, 2024 13:34

mgiulini commented Aug 28, 2023

codecov bot commented Oct 27, 2023 (edited)

amjjbonvin left a comment

rvhonorato Oct 30, 2023

mgiulini Nov 1, 2023

rvhonorato Nov 1, 2023

mgiulini commented Nov 6, 2023

amjjbonvin commented Nov 7, 2023 via email

mgiulini commented Nov 7, 2023

amjjbonvin commented Nov 7, 2023 via email

mgiulini commented Nov 7, 2023

amjjbonvin commented Nov 7, 2023 via email

rvhonorato commented Nov 7, 2023 (edited)

mgiulini commented Nov 15, 2023
