Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to select all clusters in seletopclusts #286

Merged
merged 4 commits into from
Jan 25, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -36,12 +36,17 @@ sampling = 100
noecv = true

[clustfcc]
threshold=1

[seletopclusts]
# select the best 4 clusters
top_cluster = [1, 2, 3, 4]
# select the best 4 models of each cluster
top_models = 4
## select the best 4 clusters
# top_cluster = [1, 2, 3, 4]
## select all the clusters
top_cluster = []
## select the best 4 models of each cluster
# top_models = 4
##select all the models
top_models = nan

[caprieval]
reference_fname = "data/e2a-hpr_1GGR.pdb"
Expand Down
42 changes: 29 additions & 13 deletions src/haddock/modules/analysis/seletopclusts/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
"""HADDOCK3 module to select a top cluster/model."""
import math
from pathlib import Path

from haddock.libs.libontology import ModuleIO
Expand All @@ -14,13 +15,8 @@ class HaddockModule(BaseHaddockModule):

name = RECIPE_PATH.name

def __init__(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Black lint is killing us. Much better before, also contributions signatures.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*killing you, the linting is passing 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, because is not a matter of lint, is a matter of style :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mine or the lack of yours? 🔥

self,
order,
path,
*ignore,
init_params=DEFAULT_CONFIG,
**everything):
def __init__(self, order, path, *ignore, init_params=DEFAULT_CONFIG,
**everything):
super().__init__(order, path, init_params)

@classmethod
Expand All @@ -30,24 +26,44 @@ def confirm_installation(cls):

def _run(self):
"""Execute the module's protocol."""
if self.params["top_models"] <= 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, fine for now but I need to find a better way around this. Also to make sure the right types are passed in the user config files.

_msg = "top_models must be either > 0 or nan."
self.finish_with_error(_msg)

if not isinstance(self.params["top_cluster"], list):
_msg = "top_cluster must be a list, it can be an empty one."
self.finish_with_error(_msg)

models_to_select = self.previous_io.retrieve_models()

# how many models should we output?
models = []
for target_ranking in self.params['top_cluster']:
if self.params['top_models'] == 'all':
if not self.params["top_cluster"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happen if the user inputs top_cluster = 0 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question! let me address that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an extra check, thanks!

target_rankings = list(set([p.clt_rank for p in models_to_select]))
target_rankins_str = ",".join(map(str, target_rankings))
self.log(f"Selecting all clusters: {target_rankins_str}")
else:
target_rankings = list(set(self.params["top_cluster"]))
target_rankins_str = ",".join(map(str, target_rankings))
self.log(f"Selecting clusters: {target_rankins_str}")

for target_ranking in target_rankings:
if math.isnan(self.params["top_models"]):
for pdb in models_to_select:
if pdb.clt_rank == target_ranking:
models.append(pdb)
else:
for model_ranking in range(1, self.params['top_models'] + 1):
for model_ranking in range(1, self.params["top_models"] + 1):
for pdb in models_to_select:
if (pdb.clt_rank == target_ranking
and pdb.clt_model_rank == model_ranking):
if (
pdb.clt_rank == target_ranking
and pdb.clt_model_rank == model_ranking
):
self.log(
f"Selecting {pdb.file_name} "
f"Cluster {target_ranking} "
f"Model {model_ranking}")
f"Model {model_ranking}"
)
models.append(pdb)

# Save module information
Expand Down
4 changes: 2 additions & 2 deletions src/haddock/modules/analysis/seletopclusts/defaults.cfg
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
top_cluster = [1, 2, 3, 4]
top_models = 4
top_cluster = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want the defaults to provide all models? Or should the defaults have a more natural value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all models, but models for all clusters - this is a different thing. There is no warranty that the correct solutions are in the top4... and further refinement could well change the ranking at the end.

top_models = nan