
Stereo Duplex sloooooow #68

Closed
adbeggs opened this issue Dec 28, 2022 · 33 comments
Labels
enhancement New feature or request

Comments

@adbeggs

adbeggs commented Dec 28, 2022

Hi all

I am running the Stereo pipeline on both a V100 (our P24, fully updated) and an HPC A30 node; both are considerably slower than the Guppy Duplex pipeline. Any suggestions? Ironically, the A30 at full tilt seems slower than the V100.

From the P24:

/data/software/dorado/bin/dorado duplex "/data/software/dorado/models/dna_r10.4.1_e8.2_400bps_sup@v4.0.0" pod5/ --pairs pairs_from_bam/pair_ids_filtered.txt | samtools view -b > duplex_dorado.bam

From our HPC:

#SBATCH --gres gpu:a30:1
#SBATCH --time 7-0:0:0
#SBATCH --tasks 20
module purge
module load bluebear
module load bear-apps/2021b
module load CUDA/11.4.1
module load SAMtools/1.15.1-GCC-11.2.0
export LD_LIBRARY_PATH=/rds/projects/b/beggsa-clinicalnanopore/software/dorado/lib:$LD_LIBRARY_PATH
/rds/projects/b/beggsa-clinicalnanopore/software/dorado/bin/dorado duplex /rds/projects/b/beggsa-clinicalnanopore/software/dorado/models/dna_r10.4.1_e8.2_400bps_sup@v4.0.0 pod5/ --pairs pairs_from_bam/pair_ids_filtered.txt | samtools view -h > duplexcalls.bam

Many thanks in advance!

Andrew

@adbeggs
Author

adbeggs commented Dec 28, 2022

PS: At the rate it is currently going on the V100, it won't finish for 90 days! Guppy would usually take 4-5 days depending on the volume of data.

@vellamike
Collaborator

Hi Andrew, that seems odd. A few questions:

  1. How much available RAM is there on the system?
  2. What duplex pairing rates are you observing?

There is an edge case where Stereo will run slowly if follow-on rates are low, especially if you run out of RAM. I suspect this is what you are encountering. It's something we will fix early in the new year.

@adbeggs
Author

adbeggs commented Dec 28, 2022

Hi Mike

The nodes have 500GB of system RAM, but the job wasn't being given the entire node. I have now set it to use the entire node, but it is still very, very slow; in fact, on our HPC dorado initiates but doesn't run. I might recompile from source to see if that makes any difference. Output is here:

CUDA/11.4.1
GCCcore/11.2.0
zlib/1.2.11-GCCcore-11.2.0
binutils/2.37-GCCcore-11.2.0
GCC/11.2.0
ncurses/6.2-GCCcore-11.2.0
zlib/1.2.11-GCCcore-11.2.0
bzip2/1.0.8-GCCcore-11.2.0
XZ/5.2.5-GCCcore-11.2.0
OpenSSL/1.1
cURL/7.78.0-GCCcore-11.2.0
SAMtools/1.15.1-GCC-11.2.0
[2022-12-28 14:14:22.917] [info] > Loading pairs file
[2022-12-28 14:14:22.939] [info] > Pairs file loaded
[2022-12-28 14:14:25.542] [warning] > warning: auto batchsize detection failed
[2022-12-28 14:14:27.389] [info] > Starting Stereo Duplex pipeline

It just sits there for hours and hours not doing anything. Duplex pairing rates on this library are 60%.

BW

Andrew

@vellamike
Collaborator

vellamike commented Dec 28, 2022 via email

@vellamike
Collaborator

vellamike commented Dec 28, 2022 via email

@adbeggs
Author

adbeggs commented Dec 28, 2022

Hi Mike

Yes, simplex calling is working fine, calling very quickly as expected. There are 20 cores available on this node (it's an Ice Lake one). When I run it, memory usage peaks at only 5 GB:

| Requested cpu=20,mem=400G,node=1,billing=20,gres/gpu=1 - 7-00:00:00 walltime
| Assigned to nodes bear-pg0103u14a
| Command /rds/projects/b/beggsa-clinicalnanopore/adb/NA12878/20221212_1633_3E_PAM86221_1ab2d60f/rundorado.slurm
| WorkDir /rds/projects/b/beggsa-clinicalnanopore/adb/NA12878/20221212_1633_3E_PAM86221_1ab2d60f
+--------------------------------------------------------------------------+
+--------------------------------------------------------------------------+
| Finished at Wed Dec 28 14:35:15 2022 for beggsa(8152) on the BlueBEAR Cluster
| Required (00:13.314 cputime, 5017850K memory used) - 00:01:29 walltime
| JobState COMPLETING - Reason None
| Exitcode 0:15
+--------------------------------------------------------------------------+

I terminated the job as it wasn't doing anything...

@adbeggs
Author

adbeggs commented Dec 28, 2022

Even on the P24 it is painfully slow; it has been running for 2 hours and has only managed to process 7,200 reads!

@adbeggs
Author

adbeggs commented Dec 28, 2022

The only thing I can think of is that I am running it on a single, very large pod5 file (1100 GB). Would that make a difference? It doesn't seem to for simplex.

@vellamike
Collaborator

vellamike commented Dec 28, 2022 via email

@incoherentian

#SBATCH --tasks 20

I can't explain the V100, but I think this SBATCH parameter will try to launch 20 instances of Dorado, all of them trying to access the entire A30. What happens when you change it to the following?

#SBATCH --tasks 1
#SBATCH --cpus-per-task=20

@incoherentian

What I was actually thinking was

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20

@adbeggs
Author

adbeggs commented Dec 30, 2022

Update - it is a lot quicker with single POD5 files... interesting!

@dithiii

dithiii commented Dec 31, 2022

Same issue here with slow stereo calling, but it persists even when using multiple small pod5s. I'm using dorado 0.1.1 on Ubuntu 22.04. Simplex calling with the dna_r10.4.1_e8.2_400bps_fast@v4.0.0 model runs at 40,000 reads/s, but when I try stereo duplex, even with the "fast" model, it calls at 300 reads per minute.

Duplex tools claimed I had an 18% duplex rate. Any fix?

@Kirk3gaard

Hi

Would there be a speed benefit from using the SAM file from simplex super accuracy basecalling as input for dorado duplex calling (I think that was mentioned at NCM)? And if so, how is it supplied?

I tried running
dorado duplex dna_r10.4.1_e8.2_400bps_sup@v4.0.0 --pairs pairs_from_sam/pair_ids_filtered.txt sam_dir/ > duplex_orig.sam

However, it did not find any reads and just completed with 0 reads basecalled.

dorado duplex -h
Usage: dorado [-h] [--pairs VAR] [--emit-fastq] [--threads VAR] [--device VAR] [--batchsize VAR] [--chunksize VAR] [--overlap VAR] [--num_runners VAR] model reads

Positional arguments:
  model    Model
  reads    Reads in Pod5 format or BAM/SAM format for basespace.

@vellamike
Collaborator

vellamike commented Jan 19, 2023

Hi @Kirk3gaard, in sam_dir do you have pod5 files or a SAM file? Dorado Duplex calling requires the raw data in POD5 format; this is what "reads" in the help refers to.

@Kirk3gaard

Kirk3gaard commented Jan 20, 2023

Hi @vellamike, so the help text suggesting "BAM/SAM format for basespace" is not an option for speeding things up? Or even a real option anymore?
I was just wondering how I get to the "duplex for free" scenario mentioned in the NCM presentation (see below) when I have already done simplex calling in super accuracy mode.
(The RTX 4090 card basecalled our best PromethION run, ~200 Gbp, in 3 days with sup for simplex reads.)
Reference: https://youtu.be/8DVMG7FEBys
[screenshot of the NCM presentation slide]

@vellamike
Collaborator

Ah, that is a hidden method for the eagle-eyed :)

This method is very fast, but it works in sequence space only, so it is less accurate. Please run it like so:

dorado duplex basespace /path/to/bam.bam --pairs /path/to/pairs.txt

This method is experimental - feedback welcome!
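
For concreteness, a full invocation in the context of this thread might look something like the following (the file names are just examples; as with the other dorado subcommands shown in this thread, output is written to stdout):

dorado duplex basespace simplex_calls.bam --pairs pairs_from_bam/pair_ids_filtered.txt > duplex_basespace.sam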

@Kirk3gaard

Sneaky. Thanks a lot!
Okay, so the recommended way of getting the most out of a sequencing run (and the GPUs) at the moment is to (sketched as commands below):

  1. basecall all the pod5s with the fast model to get the pairs
  2. sort the pod5s by channel ID (someone wrote a script for that?)
  3. run duplex calling with the sup model on the pairs, using the sorted pod5s
  4. run simplex calling on the remaining reads with sup

Looking forward to seeing a simplification of this process so that one command outputs both simplex and duplex.
I will give the basespace and pod5-based versions a try and see how long they take.
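
A rough command-level sketch of those four steps, using example model and file names (the pairing step relies on ONT's duplex_tools package, whose exact subcommand depends on the installed version, so treat this as an outline rather than a recipe):

# 1a) fast simplex basecall, used only to identify candidate pairs
dorado basecaller models/dna_r10.4.1_e8.2_400bps_fast@v4.0.0 pod5/ | samtools view -b > simplex_fast.bam

# 1b) build pairs_from_bam/pair_ids_filtered.txt from those calls with duplex_tools
#     (see the duplex_tools documentation for the pairing subcommand in your version)

# 2) optionally split the pod5s by channel (see the pod5 subset sketch further down this thread)

# 3) stereo duplex call with the sup model, restricted to the identified pairs
dorado duplex models/dna_r10.4.1_e8.2_400bps_sup@v4.0.0 pod5/ \
  --pairs pairs_from_bam/pair_ids_filtered.txt | samtools view -b > duplex_sup.bam

# 4) sup simplex call for the remaining reads (filtering out the paired reads is left to your own tooling)
dorado basecaller models/dna_r10.4.1_e8.2_400bps_sup@v4.0.0 pod5/ | samtools view -b > simplex_sup.bam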

@vellamike
Collaborator

Hi @Kirk3gaard - yes, that is currently the best method. We are working on usability and performance improvements all the time and any feedback is very welcome.

@vellamike
Collaborator

P.S. Sorting the pod5s by channel ID is a "nice to have" but not crucial.

@adbeggs
Author

adbeggs commented Jan 23, 2023

Hi @vellamike, still seeing this issue. I have single POD5 files; fast calling with dorado on our A30 completes at 3e07 samples/s, but when I call duplex it just sits there saying "Starting Stereo Duplex pipeline".

I've checked and it has the whole A30 node available to it, so it shouldn't be running slowly. I am running it on Red Hat but can't see anything specific that might be causing the issue.

@adbeggs
Author

adbeggs commented Jan 23, 2023

The whole run is teeny tiny - only 200k reads - but it is meant to be 40% duplex.

@vellamike
Collaborator

vellamike commented Jan 23, 2023

Can you show me the Duplex command you are running?

Also, is your pairs file tab- or space-delimited? It needs to be space-delimited - could you check this?
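
If the file does turn out to be tab-delimited, converting a copy is a one-liner (file names here are just examples):

tr '\t' ' ' < pairs_from_bam/pair_ids_filtered.txt > pairs_from_bam/pair_ids_space_delimited.txt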

@Kirk3gaard

"basespace" mode tried to load the entire BAM file into RAM before starting and died when it ran out of RAM. Maybe worth enabling a smarter way to avoid the need for massive memory.

I assume that only the two reads in a pair are needed to perform duplex calling, so it should be possible to load subsets of pairs without crashing. Enabling the use of fastq files as input might also make it more flexible for people to prepare subsets using existing tools, in combination with the pair id file.

@vellamike
Collaborator

Hi @Kirk3gaard - that is indeed a problem with the current implementation of the Basespace method, especially for very large BAMs. Could you split your BAM by channel ID into multiple BAMs and run duplex on each?
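
One possible way to do that split, assuming you have the run's sequencing_summary.txt (with read_id and channel columns) and a samtools recent enough to support view -N, would be along these lines:

# write one read-id list per channel, finding the relevant columns by name in the header
awk -F'\t' 'NR==1 { for (i=1; i<=NF; i++) { if ($i=="read_id") r=i; if ($i=="channel") c=i }; next }
            { print $r > ("channel_" $c ".ids") }' sequencing_summary.txt

# extract each channel's reads into its own BAM
for ids in channel_*.ids; do
  samtools view -b -N "$ids" simplex_calls.bam > "${ids%.ids}.bam"
done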

@Kirk3gaard

Tried running duplex with the pod5 files rather than basespace, and it crashed after generating a SAM file of the same size every time I tried. I looked through the syslog; it apparently runs well for some time and then suddenly runs out of memory:

"Out of memory: killed process 50831 (dorado)"
"oom_reaper: reaped process 50831 (dorado)"

I would assume that it should be possible to run stereo duplex calling on a machine with 96 GB RAM and 24 GB GPU RAM, since the software should not need to load all of the pod5 data into memory at once (or whatever is causing this).
Any hint as to what could be the cause?

@vellamike
Collaborator

Hi Rasmus, right now the host memory consumption is governed in a complicated way by a few parameters:

  1. Number of reads
  2. Read length
  3. Pairing rate
  4. POD5 ordering

We have a release coming soon which significantly reduces the host-side memory requirement for duplex. In the meantime, one thing you could do is demultiplex your pod5s by channel into multiple pod5s and run Stereo on each independently.
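
For reference, the pod5 package documents a per-channel split along these lines (the exact pod5 view/subset options may differ between versions, so treat this as a sketch rather than a guaranteed recipe):

# build a read_id -> channel table from the pod5s
pod5 view pod5/ --include "read_id, channel" --output read_channels.tsv

# write the reads out again, grouped into one pod5 per channel
pod5 subset pod5/ --summary read_channels.tsv --columns channel --output pod5_by_channel/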

@iiSeymour added the enhancement (New feature or request) label on Feb 22, 2023
@vellamike
Collaborator

Hi @Kirk3gaard @adbeggs @incoherentian @dithiii,

Version 0.2.1 of Dorado introduces big speed and RAM utilisation improvements to Duplex calling - could you try this?

@Kirk3gaard

Should we test whether it runs without splitting reads by channel?

@vellamike
Collaborator

vellamike commented Feb 22, 2023 via email

@adbeggs
Author

adbeggs commented Feb 25, 2023 via email

@Kirk3gaard

Kirk3gaard commented Feb 27, 2023

It started nicely, then processed 310,600 reads before it got "Killed".

Commands used to run dorado, and the resulting output:

MODELPATH="/home/ubuntu/Desktop/software/dorado-0.2.1-linux-x64/models"
MODEL="dna_r10.4.1_e8.2_400bps_sup@v4.1.0"
POD5DIR=pod5/

dorado duplex $MODELPATH/$MODEL --device "cuda:all" --min-qscore 25 --pairs pairs_from_sam/pair_ids_filtered.txt $POD5DIR/ > duplex_$MODEL.sam
[2023-02-23 15:52:55.097] [info] > Loading pairs file
[2023-02-23 15:52:55.400] [info] > Pairs file loaded
[2023-02-23 15:52:59.938] [info] > Starting Stereo Duplex pipeline
> Reads processed: 310600Killed

@iiSeymour
Member

Stereo performance improvements in https://github.com/nanoporetech/dorado/releases/tag/v0.2.2
