Parallelize wfmash #72

Merged: 2 commits into dev from parallelize_wfmash, Nov 30, 2021
subwaystation
Collaborator

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool - add to the software_versions process and a regex to scrape_software_versions.py
    • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
    • If necessary, also make a PR on the nf-core/pangenome branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint .).
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

This PR parallelizes the wfmash process so that it can scale across a whole cluster. This is made possible by the https://github.com/ekg/wfmash/blob/master/scripts/split_approx_mappings_in_chunks.py script, thanks to @AndreaGuarracino!

  • Added process wfmashMap, which runs only the approximate mapping step of wfmash.
  • Added process splitApproxMappingsInChunks, which divides the approximate mappings (.paf) into chunks of equal problem size.
  • Added process wfmashAlign, which performs the base-level alignment step of wfmash for each chunk, with the chunks submitted as parallel processes.
  • The --wfmash_only parameter stops the pipeline after the wfmash step. This lets users scale only wfmash across a whole cluster.
  • Updated tests, JSON schemas, etc.
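
For readers unfamiliar with the chunking idea, here is a minimal Python sketch of how approximate mappings could be divided into chunks of similar problem size. It is not the code of split_approx_mappings_in_chunks.py. The function names `paf_weight` and `split_into_chunks` are hypothetical, and the cost estimate (length of the mapped query span) is an assumption made only for illustration; the real script may weigh mappings differently.

```python
import heapq


def paf_weight(line):
    """Hypothetical cost estimate: length of the mapped query span."""
    fields = line.rstrip("\n").split("\t")
    # In PAF, columns 3 and 4 (1-based) are query start and query end.
    return int(fields[3]) - int(fields[2])


def split_into_chunks(paf_lines, num_chunks):
    """Greedily assign mappings to chunks so that total weights stay balanced."""
    chunks = [[] for _ in range(num_chunks)]
    # Min-heap of (total weight assigned so far, chunk index).
    heap = [(0, i) for i in range(num_chunks)]
    # Place the heaviest mappings first; each goes to the currently lightest chunk.
    for line in sorted(paf_lines, key=paf_weight, reverse=True):
        total, idx = heapq.heappop(heap)
        chunks[idx].append(line)
        heapq.heappush(heap, (total + paf_weight(line), idx))
    return chunks


if __name__ == "__main__":
    # Made-up PAF records, for demonstration only.
    demo = [
        "q1\t1000\t0\t900\t+\tt1\t2000\t0\t900\t850\t900\t60",
        "q2\t1000\t0\t500\t+\tt1\t2000\t900\t1400\t480\t500\t60",
        "q3\t1000\t100\t500\t-\tt2\t2000\t0\t400\t390\t400\t60",
        "q4\t1000\t0\t100\t+\tt2\t2000\t400\t500\t95\t100\t60",
    ]
    for i, chunk in enumerate(split_into_chunks(demo, 2)):
        print(i, sum(paf_weight(line) for line in chunk), len(chunk))
```

Each resulting chunk could then be handed to its own alignment job, which is the role wfmashAlign plays in this PR.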

@subwaystation
Collaborator Author

As expected, only the nf-core linting is failing. Ready to merge :)

@subwaystation subwaystation merged commit 0d49ec1 into dev Nov 30, 2021
@heuermh heuermh deleted the parallelize_wfmash branch February 22, 2024 16:12