Interactive rescoring and reclustering #769

Merged: 94 commits, merged on Feb 1, 2024


@mgiulini (Contributor) commented Jan 3, 2024

You are about to submit a new Pull Request. Before continuing, make sure you have read the contributing guidelines and that you comply with the following criteria:

  • You have stuck to Python. Please talk to us before adding other programming languages to HADDOCK3
  • Your PR is about CNS
  • Your code is well documented: proper docstrings and explanatory comments for those tricky parts
  • You structured the code into small functions as much as possible. You can use classes if there is a (state) purpose
  • Your code follows our coding style
  • You wrote tests for the new code
  • tox tests pass. Run the tox command inside the repository folder
  • -test.cfg examples execute without errors. Inside examples/ run python run_tests.py -b
  • PR does not add any dependencies, unless permission granted by the HADDOCK team
  • PR does not break licensing
  • Your PR is about writing documentation for already existing code 🔥
  • Your PR is about writing tests for already existing code :godmode:

Closes #739
Closes #686
Closes #786
Closes #631

Briefly, this PR does the following:

  • clarifies the naming in the clustering modules (and related examples)
  • breaks down the clustering modules into small functions
  • creates a new CLI (haddock3-re) dedicated to interactive tasks. This command can be launched with three subcommands (so far): score, clustfcc, and clustrmsd
  • embeds JSON schemas into the output HTML files

As an example, the rescoring CLI can be applied to a caprieval step as follows:
haddock3-re score run1-ranairCDR-test/02_caprieval/ -e 0.3
This will change the weight of the electrostatic component to 0.3, keeping all other weights at the values originally used by haddock3. A new directory will be created in the workflow (02_caprieval_interactive in this case), which will be used by the haddock3 webapp to allow interactive rescoring in the browser.
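To make the rescoring idea concrete, here is a minimal Python sketch that treats the score as a weighted sum of energy terms and overrides only the electrostatic weight, leaving the others as they were. The term names, the weight and energy values, and the helpers (`ModelTerms`, `weighted_score`, `rescore_all`) are hypothetical placeholders, not haddock3's actual defaults or API.

```python
from dataclasses import dataclass


@dataclass
class ModelTerms:
    """Energy terms of one model (hypothetical field names)."""

    name: str
    vdw: float
    elec: float
    desolv: float
    air: float


def weighted_score(model, weights):
    """Weighted sum of the model's energy terms; `weights` maps term -> weight."""
    return sum(weight * getattr(model, term) for term, weight in weights.items())


def rescore_all(models, original_weights, overrides):
    """Rescore and rank models, changing only the weights in `overrides`.

    Every other weight keeps its original value, in the same spirit as
    `-e 0.3` changing only the electrostatic weight.
    """
    weights = {**original_weights, **overrides}  # the input dict is not modified
    scored = [(m.name, weighted_score(m, weights)) for m in models]
    return sorted(scored, key=lambda item: item[1])  # lowest score ranks first


if __name__ == "__main__":
    # Placeholder weights and energies, for illustration only.
    original = {"vdw": 1.0, "elec": 0.2, "desolv": 1.0, "air": 0.1}
    models = [
        ModelTerms("model_1", vdw=-40.0, elec=-250.0, desolv=5.0, air=50.0),
        ModelTerms("model_2", vdw=-55.0, elec=-100.0, desolv=-10.0, air=20.0),
    ]
    print(rescore_all(models, original, {}))
    print(rescore_all(models, original, {"elec": 0.3}))
```

With these placeholder numbers, raising the electrostatic weight from 0.2 to 0.3 swaps the ranking: model_1 goes from -80.0 to -105.0 and model_2 from -83.0 to -93.0.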

The same logic applies to the two reclustering subcommands. In addition, they trace back the closest preceding caprieval folder and update it, which makes it possible to rerun the analysis with the modified clusters. If a CNS module is found in between, the search is interrupted.

As an example, take a workflow with the following steps:
0_topoaa
1_rigidbody
2_caprieval
3_clustfcc

Running haddock3-re clustfcc workflow/3_clustfcc will create a folder named 3_clustfcc_interactive, containing the data from 2_caprieval updated with the new clustering information. If there were a flexref folder between 2_caprieval and 3_clustfcc, the caprieval data would not be touched.
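The trace-back described above can be pictured as a backwards walk over the step folders. This is a minimal sketch, not the haddock3 implementation: it assumes step folders are named `<index>_<module>` as in the example, and both the set of CNS module names and the helper names are illustrative assumptions.

```python
# Illustrative (not exhaustive) set of CNS-based module names; an assumption.
CNS_MODULES = {"topoaa", "rigidbody", "flexref", "emref", "mdref"}


def module_name(step_dir):
    """Strip the numeric prefix, e.g. '2_caprieval' -> 'caprieval'."""
    return step_dir.partition("_")[2]


def find_caprieval_step(step_dirs, clust_step):
    """Return the closest caprieval folder before `clust_step`, or None.

    The search walks backwards and stops at the first CNS module, so
    caprieval data computed before that module is left untouched.
    """
    idx = step_dirs.index(clust_step)
    for step in reversed(step_dirs[:idx]):
        name = module_name(step)
        if name == "caprieval":
            return step
        if name in CNS_MODULES:
            return None  # a CNS module lies in between: interrupt the search
    return None


if __name__ == "__main__":
    workflow = ["0_topoaa", "1_rigidbody", "2_caprieval", "3_clustfcc"]
    print(find_caprieval_step(workflow, "3_clustfcc"))  # 2_caprieval
    with_flexref = ["0_topoaa", "1_rigidbody", "2_caprieval", "3_flexref", "4_clustfcc"]
    print(find_caprieval_step(with_flexref, "4_clustfcc"))  # None
```

Non-CNS steps (for example another clustering step) are simply skipped by the walk, which matches the description that only a CNS module stops the search.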

@sverhoeven I cannot add you as a reviewer, but please have a look at this PR.

@sverhoeven (Contributor)

Not all examples have their clustering parameters fully renamed. For example:

tolerance = 50 # the number of clusters to be formed

Is there a test or command to check that the recipes in examples directory are valid?

@mgiulini (Contributor Author)

Good catch, thanks! They should be OK now; they mainly came from merging another branch.

We don't have that validation yet, but it would be a very nice integration test. Let me think about it!

@VGPReys (Contributor) commented Jan 16, 2024

Great idea!
The functions validating the config files should do the trick.
It should look like the function setup_run(cfg_filepath), present in gear/prepare_run.py.
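A validation test along these lines could loop over the example recipes and collect every failure instead of stopping at the first one. The sketch below is hypothetical: `validate_examples` is an illustrative name, and the validator is passed in as a parameter so the sketch runs without haddock3. In the real test, that callable could wrap setup_run(cfg_filepath), assuming the function raises an exception on an invalid file.

```python
from pathlib import Path


def validate_examples(examples_dir, validator, pattern="*.cfg"):
    """Run `validator` on every config file under `examples_dir`.

    `validator` is any callable that takes a Path and raises on invalid
    input. Returns a dict mapping each failing file to its error message.
    """
    failures = {}
    for cfg in sorted(Path(examples_dir).rglob(pattern)):
        try:
            validator(cfg)
        except Exception as err:  # keep going to report every broken recipe
            failures[cfg] = f"{type(err).__name__}: {err}"
    return failures
```

An empty dict means every recipe passed; a test could then simply assert that the result is empty and print the failures otherwise.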

@mgiulini mgiulini requested a review from VGPReys January 26, 2024 09:39
@sverhoeven (Contributor)

Re-rerunning does not seem to work correctly.

When I run docking-antibody-antigen-ranairCDR-full.cfg and then

haddock3-re clustfcc --clust_cutoff 0.6 --strictness 0.25 --min_population 4 output/09_clustfcc/ && \
haddock3-analyse --is_cleaned True --inter True -m 9 -r output/
wc -l  output/09_clustfcc_interactive/capri_clt.tsv output/analysis/09_clustfcc_interactive_analysis/capri_clt.tsv
   9 output/09_clustfcc_interactive/capri_clt.tsv
   9 output/analysis/09_clustfcc_interactive_analysis/capri_clt.tsv

Then I run haddock3-re again with different arguments:

haddock3-re clustfcc --clust_cutoff 0.6 --strictness 0.5 --min_population 4 output/09_clustfcc/ && \
haddock3-analyse --is_cleaned True --inter True -m 9 -r output/
wc -l  output/09_clustfcc_interactive/capri_clt.tsv output/analysis/09_clustfcc_interactive_analysis/capri_clt.tsv
   4 output/09_clustfcc_interactive/capri_clt.tsv
   9 output/analysis/09_clustfcc_interactive_analysis/capri_clt.tsv

I expected output/analysis/09_clustfcc_interactive_analysis/capri_clt.tsv to have 4 lines not 9.
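The expectation above could be turned into a small regression check run after each re-run. This is a minimal sketch with hypothetical helper names; it reuses the folder layout from the commands above and, like `wc -l`, only compares line counts, not the file contents.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines of a text file (similar to `wc -l`)."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def capri_clt_in_sync(run_dir, step):
    """Compare capri_clt.tsv of `<step>_interactive` with its analysis copy.

    Paths follow the layout shown above, e.g.
    output/09_clustfcc_interactive/capri_clt.tsv and
    output/analysis/09_clustfcc_interactive_analysis/capri_clt.tsv.
    Returns (in_sync, n_step_lines, n_analysis_lines).
    """
    run = Path(run_dir)
    step_tsv = run / f"{step}_interactive" / "capri_clt.tsv"
    analysis_tsv = run / "analysis" / f"{step}_interactive_analysis" / "capri_clt.tsv"
    n_step = count_lines(step_tsv)
    n_analysis = count_lines(analysis_tsv)
    return n_step == n_analysis, n_step, n_analysis
```

For the second run reported above, this check would return (False, 4, 9).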

@sverhoeven (Contributor)

Re-rerunning clustrmsd with the docking-protein-glycan-test.cfg example also goes out of sync:

wc -l 10_clustrmsd_interactive/capri_clt.tsv analysis/10_clustrmsd_interactive_analysis/capri_clt.tsv
   4 10_clustrmsd_interactive/capri_clt.tsv
  10 analysis/10_clustrmsd_interactive_analysis/capri_clt.tsv

@mgiulini (Contributor Author) commented Jan 29, 2024

@sverhoeven I think I fixed your problem, can you check if your examples work now?

@VGPReys (Contributor) left a comment

In my opinion, good to go 🚀

@sverhoeven (Contributor)

Yep, it works: running clustfcc a second time updated the tsv file.

@mgiulini mgiulini merged commit 30685df into main Feb 1, 2024
4 checks passed
@VGPReys VGPReys deleted the interactive_rescoring branch April 18, 2024 07:09
Labels
analysis/postprocessing: related to the analysis/postprocessing of a haddock3 run
enhancement: enhancing an existing feature or adding a new one
m|clustfcc: clustfcc module
m|clustrmsd
4 participants