Efficiency improvements #1

Open
ktmeaton opened this issue Apr 24, 2023 · 4 comments

Comments

@ktmeaton
Collaborator

Currently, recombination detection is slow, at about 5 seconds per sequence. Multiprocessing helps (--threads), but the code itself certainly needs efficiency improvements.

@ktmeaton
Collaborator Author

I think most of the slowdown comes purely from passing large objects as parameters to genome_mp.

  • Passing simple parameters (boolean, logger, int) runs at 200-350 sequences/second.
  • Passing complex parameters (tree, dataframe) runs at 1-3 sequences/second.
  • Passing and analyzing complex parameters runs at 0.2 sequences/second (~5 seconds/sequence).

Because of this, I don't think my implementation of the analysis is the biggest problem.
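
To illustrate the point about parameter passing, here is a minimal sketch, not the project's actual code. Python's multiprocessing pickles task arguments before sending them to a worker, so a large object passed with every task is serialized again each time. The names `pickled_size`, `time_round_trip`, `init_worker`, `analyze_sequence`, and `run_pool` are hypothetical, and the dictionary is only a stand-in for a real tree or dataframe. One common workaround, assumed here rather than taken from this code base, is to hand the large object to each worker once through the pool initializer.

```python
import multiprocessing
import pickle
import time

# Hypothetical stand-ins for the two kinds of parameters discussed above.
simple_params = (True, 4, "sample_1")
complex_params = {f"seq_{i}": list(range(200)) for i in range(5_000)}


def pickled_size(obj) -> int:
    """Number of bytes produced when obj is pickled once."""
    return len(pickle.dumps(obj))


def time_round_trip(obj, repeats: int = 5) -> float:
    """Average seconds needed to pickle and unpickle obj once."""
    start = time.perf_counter()
    for _ in range(repeats):
        pickle.loads(pickle.dumps(obj))
    return (time.perf_counter() - start) / repeats


# Worker-side global, filled in once per worker process by init_worker.
_SHARED = None


def init_worker(shared) -> None:
    """Store the large object in the worker so tasks do not need to carry it."""
    global _SHARED
    _SHARED = shared


def analyze_sequence(seq_id: str):
    """Toy task: only the small seq_id is sent per task, the lookup is local."""
    return seq_id, len(_SHARED[seq_id])


def run_pool(seq_ids, shared, threads: int = 2):
    """Send `shared` to each worker once (via initargs) instead of once per task."""
    with multiprocessing.Pool(
        processes=threads, initializer=init_worker, initargs=(shared,)
    ) as pool:
        return pool.map(analyze_sequence, seq_ids)
```

Comparing `pickled_size(simple_params)` with `pickled_size(complex_params)`, or timing both with `time_round_trip`, gives a rough idea of how much per-task overhead comes from serialization alone. `run_pool` should be called from under an `if __name__ == "__main__":` guard so it also works with the spawn start method.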

@ktmeaton
Collaborator Author

ktmeaton commented Apr 24, 2023

This might be a problem that Rust could solve.

@wtchoga

wtchoga commented Jun 11, 2023

image

This was referenced Nov 22, 2023
@ktmeaton
Collaborator Author

The codebase rewrite in Rust (PR #5) was extremely helpful for this issue. On a single core, I've seen processing speeds ranging from 1-100 sequences/second. And that's not even taking --threads into account 😀

But I am leaving this unresolved until I benchmark with a larger dataset (e.g. VirusSeq).

2 participants