Efficiency improvements #1

ktmeaton · 2023-04-24T18:18:48Z

Currently, recombination detection is slow at 5 seconds / sequence. Multiprocessing helps (--threads), but the code itself certainly needs efficiency improvements.

ktmeaton · 2023-04-24T18:48:50Z

I think most of the slowdown is purely in passing large objects as parameters to genome_mp.

Passing simple parameters (boolean, logger, int) runs at 200-350 sequences/second.
Passing complex parameters (tree, dataframe) runs at 1-3 sequences/second.
Passing and analyzing complex parameters runs at 0.2 sequences/second (~5 seconds/sequence).

Because of this, I don't think my implementation of the analysis is the biggest problem.
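
A minimal sketch of one way to avoid that cost, assuming the slowdown comes from pickling the large objects for every task. The names below (make_large_object, init_worker, analyze, run) are hypothetical and not taken from the code base. The dictionary stands in for the tree/dataframe. The large object is handed to each worker once through the Pool initializer, so only the small per-sequence argument crosses the process boundary for each task.

```python
import pickle
from multiprocessing import Pool


def make_large_object(n=200_000):
    """Hypothetical stand-in for a large tree or dataframe."""
    return {i: str(i) for i in range(n)}


def pickled_size(obj):
    """Number of bytes needed to serialize obj with pickle."""
    return len(pickle.dumps(obj))


# Set once per worker process by init_worker.
_SHARED = None


def init_worker(large):
    """Store the large object in a module-level global inside the worker."""
    global _SHARED
    _SHARED = large


def analyze(seq_id):
    """Placeholder analysis: only the small seq_id is sent per task."""
    return seq_id, len(_SHARED)


def run(seq_ids, large, threads=2):
    """Map analyze over seq_ids, giving each worker the large object once."""
    with Pool(threads, initializer=init_worker, initargs=(large,)) as pool:
        return pool.map(analyze, seq_ids)


if __name__ == "__main__":
    large = make_large_object()
    # Compare the serialized size of the large object with simple parameters.
    print(pickled_size(large), pickled_size((True, 42)))
    print(run(range(4), large))
```

Whether this helps in practice depends on how genome_mp is called and on the multiprocessing start method, so it would need to be timed against the numbers above.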

ktmeaton · 2023-04-24T21:18:06Z

This might be a problem that Rust could solve.

wtchoga · 2023-06-11T00:18:22Z

ktmeaton · 2023-11-22T15:53:51Z

The code base rewrite in Rust (PR #5) was extremely helpful for this issue. On a single core, I've seen processing speeds ranging from 1 to 100 sequences/second. And that's not even taking --threads into account 😀

But I am leaving this unresolved until I benchmark with a larger dataset (e.g. VirusSeq).

This was referenced Nov 22, 2023

Rustify #4

Closed

Rustify #5

Merged

ktmeaton mentioned this issue Dec 5, 2023

explore: Find unknown recombinants in dataset #12

Open
