Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create haddock3-score CLI #510

Merged
merged 25 commits into from
Aug 31, 2022
Merged

Create haddock3-score CLI #510

merged 25 commits into from
Aug 31, 2022

Conversation

rvhonorato
Copy link
Member

@rvhonorato rvhonorato commented Jul 25, 2022

I deleted the PR template since this is an extra tool and not a module/gear/libs/core/clis/cns

Adds a tool to calculate the haddock-score of a complex.

I see quite some more things that can be implemented such as support for pdb.list, custom-weights, use restraints, etc - we can open additional issues for that.

(venv) repos/haddock3 » pwd
/Users/rodrigo/repos/haddock3
(venv) repos/haddock3 » export PYTHONPATH=pwd/src
(venv) repos/haddock3 » time python tools/haddock-score.py examples/data/2oob.pdb
HADDOCK-score (emscoring): -53.8110

python tools/haddock-score.py examples/data/2oob.pdb 2.32s user 0.08s system 97% cpu 2.477 total

@joaomcteixeira:

  • Add a new command-line using the libworkflow machinery to run emscoring directly on a complex (PDB file) using the command line.
  • Idea was develop by @rvhonorato and @amjjbonvin
  • @joaomcteixeira worked out the integration in haddock. See past commits for details.~

Cheers,

@rvhonorato rvhonorato self-assigned this Jul 25, 2022
@rvhonorato rvhonorato linked an issue Jul 25, 2022 that may be closed by this pull request
Copy link
Member

@amjjbonvin amjjbonvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be an idea to add a few options to:

  1. Possibly printout also the components of the score

  2. Save the generated PDB file (now only kept into memory

This command should in principle return a score rather similar to the EM scoring example, right?

Copy link
Contributor

@mgiulini mgiulini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User experience is great! but I agree that having the different components would be useful.

I calculated the haddock-score for the first 20 pdbs of 1_emscoring for capri Target218 (check out on tintin if you like). It is highly correlated to the true haddock score, but the pearson coeff. is not super (r~0.88). Should we add more steps to this? Or is it ok as it is?

@amjjbonvin
Copy link
Member

amjjbonvin commented Jul 25, 2022 via email

@rvhonorato
Copy link
Member Author

The number of EM steps in emscoring in only 50 while it is 200 in emref And I assume autohis is true for both?

Yep, this tool runs emscoring under the hood

And I assume autohis is true for both?

This tool uses the default parameters

@rvhonorato
Copy link
Member Author

  • Possibly printout also the components of the score

Done

  • Save the generated PDB file (now only kept into memory

It is saved as calc-hs_1.pdb in the directory that you ran the command

@amjjbonvin
Copy link
Member

amjjbonvin commented Jul 25, 2022 via email

Comment on lines 186 to 193
print("-----")
print(f"vdw\t{vdw:.4f}")
print(f"elec't{elec:.4f}")
print(f"desolv\t{desolv:.4f}")
print(f"air\t{air:.4f}")
print("HADDOCK-score = (1.0 * vdw) + (0.2 * elec) + (1.0 * desolv) + (0.1 * air)")
print("-----")
print(f"HADDOCK-score (emscoring) = {haddock_score_itw:.4f}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

air will be in principle 0
may-be more interesting to output the BSA.
And I would not output everything by default, but only as an option (e.g. if the -full argument is given).
Simpler for incorporating the score in other scripts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And same thing for outputting the pdb and psf files, only if an option is given, e.q. -outpdb -outpsf

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

air will be in principle 0
may-be more interesting to output the BSA.
And I would not output everything by default, but only as an option (e.g. if the -full argument is given).
Simpler for incorporating the score in other scripts.

Added --full option

And same thing for outputting the pdb and psf files, only if an option is given, e.q. -outpdb -outpsf

Added --outputpdb and --outputpsf options

@joaomcteixeira
Copy link
Member

Hi,
Quick question before a more in-depth review. Why can't this be a CLI?
cheers

@@ -48,5 +48,5 @@ jobs:
uses: codecov/codecov-action@v2
with:
files: ./coverage.xml
fail_ci_if_error: true
fail_ci_if_error: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this to false?

@codecov-commenter
Copy link

codecov-commenter commented Jul 26, 2022

Codecov Report

Merging #510 (5c15203) into main (de997e4) will decrease coverage by 1.44%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main     #510      +/-   ##
==========================================
- Coverage   74.84%   73.40%   -1.45%     
==========================================
  Files         105      103       -2     
  Lines        6953     6621     -332     
==========================================
- Hits         5204     4860     -344     
- Misses       1749     1761      +12     
Impacted Files Coverage Δ
src/haddock/modules/topology/topoaa/__init__.py 44.44% <0.00%> (+0.37%) ⬆️
src/haddock/libs/libfunc.py 80.95% <0.00%> (-9.53%) ⬇️
src/haddock/modules/__init__.py 73.37% <0.00%> (-3.36%) ⬇️
tests/test_modules_general.py 97.08% <0.00%> (-0.56%) ⬇️
examples/compare_runs.py 23.77% <0.00%> (-0.51%) ⬇️
src/haddock/libs/libworkflow.py 30.13% <0.00%> (-0.42%) ⬇️
src/haddock/gear/prepare_run.py 50.77% <0.00%> (-0.01%) ⬇️
tests/test_libutil.py 100.00% <0.00%> (ø)
tests/test_gear_config_writer.py
src/haddock/gear/config_reader.py
... and 7 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@rvhonorato
Copy link
Member Author

I would rather save it as <original-pdb-filename)_hs.pdb to avoid overwriting it if running on multiple PDBs in the same dir.

Done

@rvhonorato
Copy link
Member Author

I calculated the haddock-score for the first 20 pdbs of 1_emscoring for capri Target218 (check out on tintin if you like). It is highly correlated to the true haddock score, but the pearson coeff. is not super (r~0.88). Should we add more steps to this? Or is it ok as it is?

We figured out that the source of the difference is that @mgiulini was calculating the haddock-scores in the already-minimized models instead of the input ones.

MANIFEST.in Outdated
@@ -7,6 +7,8 @@ include LICENSE
include requirements.txt
include requirements.yml

recursive-include tools *.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree here. I strongly suggest making haddock-score a CLI.

Comment on lines 46 to 47
for line in tidy_pdbfile(inp_fh):
out_fh.write(line)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single call to save to disk

Suggested change
for line in tidy_pdbfile(inp_fh):
out_fh.write(line)
lines = list(tidy_pdbfile(inp_fh))
out_fh.write(os.linesep.join(lines))

"""

import argparse
import subprocess
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import subprocess
import os
import subprocess

Comment on lines 49 to 51
main_topoaa_cns_script_as_string = Path(
topoaa_module_folder, "cns/generate-topology.cns"
).read_text()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
main_topoaa_cns_script_as_string = Path(
topoaa_module_folder, "cns/generate-topology.cns"
).read_text()
main_topoaa_cns_script_as_string = Path(
topoaa_module_folder,
"cns",
"generate-topology.cns"
).read_text()

Comment on lines 119 to 121
main_emscoring_cns_script_as_string = Path(
emscoring_module_folder, "cns/emscoring.cns"
).read_text()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
main_emscoring_cns_script_as_string = Path(
emscoring_module_folder, "cns/emscoring.cns"
).read_text()
main_emscoring_cns_script_as_string = Path(
emscoring_module_folder,
"cns",
"emscoring.cns",
).read_text()

Copy link
Member

@joaomcteixeira joaomcteixeira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just made small code suggestions. I think the topo and emscoring wrappers are a great opportunity to try to use the module objects directly, or at least to study and learn why we can't yet (if that's the case) use them here already. Anyway, that can be for another PR if there's rush merging this one.

Finally, I really think we should have haddock-score as a client as all that is necessary is already placed.

Cheers,

@joaomcteixeira
Copy link
Member

Notethis is not ready to merge. I made an edit in the topoaa I need to confirm it does not affect the workflows. Maybe it still needs some additional work internally.

@mgiulini
Copy link
Contributor

Notethis is not ready to merge. I made an edit in the topoaa I need to confirm it does not affect the workflows. Maybe it still needs some additional work internally.

sure! I was testing it and noticed a couple of improvements :)

@joaomcteixeira
Copy link
Member

@mgiulini can you give a look now, also to the new parameter -p? Let me know if the program outputs the values you are expecting. Cheers

@joaomcteixeira
Copy link
Member

All workflows perform correctly despite the change in topoaa. Everything looks good. 👍

@rvhonorato
Copy link
Member Author

I was testing it and found a mini-bug. I wonder if it makes sense to give the user the possibility to choose the number of energy minimization steps (nemsteps) directly from the cli..In my case study it would be useful

Don't think this makes sense, the idea is to calculate the haddock score according to the defined weights, not use the calculation routine to do minimisation steps.

I know it's doable but in my opinion it falls out of scope, sounds like this is the use case for the default workflow pipeline.

@amjjbonvin
Copy link
Member

amjjbonvin commented Aug 21, 2022 via email

@rvhonorato
Copy link
Member Author

rvhonorato commented Aug 21, 2022

But then we need to make this very clear, imagine someone is trying to recalculate the haddock score of a given complex and the numbers are different because the minimisation steps were not the default.

Being able to customize the steps is nice indeed, I'm just thinking about reproducibility.

Maybe print it out in big letter that's the steps were altered (and also the weights if that's the case) and that this information needs to accompany the publication.

(Or leave this option out and let the user use the default workflow running scheme instead)

@amjjbonvin
Copy link
Member

amjjbonvin commented Aug 21, 2022 via email

@mgiulini
Copy link
Contributor

Hi there! I agree with @rvhonorato that the use of non-default parameters is dangerous and that this should be made very clear during the execution. I created a couple of warnings for that. If you agree l will give the possibility to modify the weights.

@joaomcteixeira
Copy link
Member

To help solving the discussion, the haddock3-score CLI will use the default values unless otherwise specified. Currently, users can modify any parameter of the emscore using the -p option, for example:

haddock3-score complex.pdb -p nemsteps 50 w_air 1 electflag True

See more with haddock3-score -h.

For reporting purposes, users can always save the command line use.

I am okay with the warning messages proposed by @mgiulini .

For me, it is okay to be merged. 👍 Good work everyone 🚀

@mgiulini
Copy link
Contributor

hello, I changed a bit the code to consider (possibly) new weights in the calculation of the score.

@joaomcteixeira
Copy link
Member

joaomcteixeira commented Aug 30, 2022 via email

Copy link
Member

@joaomcteixeira joaomcteixeira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good @mgiulini 👍

src/haddock/clis/cli_score.py Outdated Show resolved Hide resolved
@mgiulini mgiulini merged commit d19bbf6 into main Aug 31, 2022
@mgiulini mgiulini deleted the 365-implement-haddock3-score branch August 31, 2022 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement haddock3-score
5 participants