
Logging performance of de novo variant discovery with racon #303

Open
leoisl opened this issue Oct 31, 2022 · 5 comments

leoisl (Collaborator) commented Oct 31, 2022

To be done by @mbhall88. Could you please use this commit?

leoisl (Collaborator, Author) commented Oct 31, 2022

@mbhall88 could you also please report the max RAM usage? I am quite concerned about this part of the racon-based de novo discovery: https://github.com/rmcolq/pandora/blob/12a08c5483c19fc12411e174970d31c86e842a2d/src/denovo_discovery/discover_main.cpp#L205-L206

This is a dictionary from locus names to the subreads that map to each locus, as inferred by pandora map. This structure could potentially get very large, as we basically store a substring of every read that maps to each locus (it is just the region of the read that maps to that specific locus, but still...), roughly the shape sketched below. There are potentially many better ways to store this info, but I also want to avoid premature optimisation, and will only work on this if RAM is indeed an issue.
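For illustration only (the type names below are hypothetical and not taken from the pandora source), the structure being described is roughly a map from locus name to copies of the mapped read regions; a leaner, interval-based layout is sketched alongside it:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch -- these are NOT the actual pandora types.
// One entry per locus; each entry holds a copy of the region of every
// read that pandora map assigned to that locus (the "subread").
using LociToSubReads = std::map<std::string, std::vector<std::string>>;

// Memory grows roughly with (number of loci) x (reads per locus) x
// (mean subread length), because every mapped read region is copied.
// One possible leaner layout (again, purely illustrative) would store
// references into the original reads instead of copies:
struct SubReadInterval {
    std::size_t read_index;  // which read in the input
    std::size_t start;       // start of the mapped region on that read
    std::size_t length;      // length of the mapped region
};
using LociToIntervals = std::map<std::string, std::vector<SubReadInterval>>;
```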

mbhall88 (Member) commented Nov 1, 2022

The nanopore runtime and memory usage of drprg with the previous version of de novo discovery are in mbhall88/drprg#15 (comment), and the Illumina figures are in mbhall88/drprg#15 (comment).

Pandora accounts for the vast majority of drprg's runtime and memory, so these serve as good benchmark baselines, especially given that the only thing changing between runs is the pandora version.

leoisl (Collaborator, Author) commented Nov 1, 2022

I just finished running pandora discover at this commit on the paper ONT data. Comparison with the paper run:

Paper run

  • Threads: 16
  • Runtime: 7.59h
  • RAM: 10.2 GB

commit 02f9ec

  • Threads: 16
  • Runtime: 7.58h
  • RAM: 16.7 GB

I think the runtime is totally fine; we got almost exactly the same runtime.

RAM usage is much higher though, probably related to this: #303 (comment). It will be the RAM bottleneck of the pipeline (pandora compare on ONT data takes 15.7 GB; this new denovo implementation takes 1 GB more).

Of course this is data dependent, so let's see how @mbhall88's benchmarks compare.

iqbal-lab (Collaborator) commented

This RAM use is, IMO, acceptable for the moment; I wouldn't postpone the merge to reduce RAM.

mbhall88 (Member) commented Nov 7, 2022

Okay, so here are the updated benchmark figures.

Runtime

Illumina


The median is 91 s, down from 98 s, so it is faster now on Illumina!

Nanopore


The median is now 171 s, up from 163 s.

Memory

Illumina


The median is 46 MB, down from 54 MB, so memory is marginally down too.

Nanopore


The median is now 270 MB, up from 240 MB.


All in all, I think this is acceptable.
