Better parallelization for medaka #131

Closed
jmtsuji opened this issue Feb 15, 2024 · 4 comments
Labels
enhancement New feature or request long_term

Comments

@jmtsuji
Collaborator

jmtsuji commented Feb 15, 2024

A possible way to speed up rotary: according to the medaka GitHub repo, the consensus step of medaka is effectively capped at 2 threads. It might make sense for us to implement the more advanced usage of medaka described there to get better parallelization. In short, we can use the --regions flag of medaka consensus to run different regions of the input BAM file (e.g., different contigs, or 1 Mb chunks of contigs) through the consensus step in parallel.
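To make the chunking idea concrete, here is a rough sketch of how 1 Mb regions could be generated. It assumes contig lengths are already known (for example, from a FASTA index) and that regions are written as `contig:start-end`. The function name `chunk_regions`, the default chunk size, and the 0-based, end-exclusive coordinates are illustrative assumptions, not something taken from rotary or confirmed against medaka.

```python
def chunk_regions(contig_lengths, chunk_size=1_000_000):
    """Split each contig into consecutive windows of at most chunk_size bases.

    contig_lengths maps contig name -> length in bases.
    Returns strings of the form 'contig:start-end'. The 0-based,
    end-exclusive coordinates are an assumption; check the convention
    medaka expects before using these strings with --regions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    regions = []
    for name, length in contig_lengths.items():
        for start in range(0, length, chunk_size):
            end = min(start + chunk_size, length)  # last window may be shorter
            regions.append(f"{name}:{start}-{end}")
    return regions


if __name__ == "__main__":
    # Hypothetical contig names and lengths, for illustration only.
    lengths = {"contig_1": 2_500_000, "contig_2": 800_000}
    for region in chunk_regions(lengths):
        print(region)
```

The windows here do not overlap. Whether chunk boundaries need some overlap to avoid edge effects in the consensus is an open question this sketch does not address.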

If we wanted to add this speed-up to rotary, we could implement it either directly in snakemake or in a Python script that runs medaka (including the consensus step) in parallel.
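For the Python script option, a minimal sketch could look like the following. It builds one `medaka consensus` command per group of regions and runs several at once. The helper names, the `max_jobs` default, and the injectable `runner` argument are hypothetical. The `--model`, `--threads`, and `--regions` flags are used as I understand them from the medaka README, so they should be checked against the installed medaka version.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_consensus_command(bam, out_hdf, model, regions, threads=2):
    """Build one `medaka consensus` command restricted to the given regions."""
    return [
        "medaka", "consensus", str(bam), str(out_hdf),
        "--model", model,
        "--threads", str(threads),
        "--regions", *regions,
    ]


def run_in_parallel(commands, max_jobs=4, runner=subprocess.run):
    """Run commands concurrently, at most max_jobs at a time.

    Threads are sufficient because each worker only waits on a child
    process. With the default runner, check=True makes a failing
    medaka call raise CalledProcessError.
    """
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = [pool.submit(runner, cmd, check=True) for cmd in commands]
        # result() re-raises any exception from the corresponding job.
        return [future.result() for future in futures]
```

Each job writes its own HDF file, so the per-region outputs would still need to be merged into one polished assembly afterwards (the medaka README uses `medaka stitch` for that step, as far as I know). In snakemake, the same split would more naturally be one rule instance per region, with the scheduler handling concurrency.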

I don't think this is a high priority, but I wanted to post it here while it is on my mind.

@jmtsuji added the enhancement label on Feb 15, 2024
@LeeBergstrand
Collaborator

@jmtsuji We should profile the pipeline first to figure out the slowest steps and target those first. Medaka is one of the slower steps, but I don't think it's the slowest.

@jmtsuji
Collaborator Author

jmtsuji commented Feb 16, 2024

Yes, sounds good to profile the pipeline and figure out what steps are the slowest. Let's table this for now.

@LeeBergstrand
Collaborator

@jmtsuji We now have polishing by contig in #147

@LeeBergstrand
Collaborator

Addressed in #147

If we run into speed issues, we can look into sub-contig polishing, but that might be over-optimization.
