
Stereo Duplex sloooooow #68

Closed
adbeggs opened this issue Dec 28, 2022 · 33 comments
Labels
enhancement New feature or request

Comments

@adbeggs

adbeggs commented Dec 28, 2022

Hi all

I am running the Stereo pipeline on both a V100 (our P24, fully updated) and an HPC A30 node; both are considerably slower than the Guppy Duplex pipeline. Any suggestions? Ironically, the A30 at full tilt seems slower than the V100.

From the P24:

/data/software/dorado/bin/dorado duplex "/data/software/dorado/models/dna_r10.4.1_e8.2_400bps_sup@v4.0.0" pod5/ --pairs pairs_from_bam/pair_ids_filtered.txt | samtools view -b > duplex_dorado.bam

From our HPC:

#SBATCH --gres gpu:a30:1
#SBATCH --time 7-0:0:0
#SBATCH --tasks 20
module purge
module load bluebear
module load bear-apps/2021b
module load CUDA/11.4.1
module load SAMtools/1.15.1-GCC-11.2.0
export LD_LIBRARY_PATH=/rds/projects/b/beggsa-clinicalnanopore/software/dorado/lib:$LD_LIBRARY_PATH
/rds/projects/b/beggsa-clinicalnanopore/software/dorado/bin/dorado duplex /rds/projects/b/beggsa-clinicalnanopore/software/dorado/models/dna_r10.4.1_e8.2_400bps_sup@v4.0.0 pod5/ --pairs pairs_from_bam/pair_ids_filtered.txt | samtools view -h > duplexcalls.bam

Many thanks in advance!

Andrew

@adbeggs
Author

adbeggs commented Dec 28, 2022

PS: At the rate it is currently going on the V100, it won't finish for 90 days! Guppy would usually take 4-5 days depending on the volume of data.

@vellamike
Collaborator

Hi Andrew, that seems odd. A few questions:

  1. How much available RAM is there on the system?
  2. What duplex pairing rates are you observing?

There is an edge case where Stereo will run slowly if follow-on rates are low, especially if you run out of RAM. I suspect this is what you are encountering. It's something we will fix early in the new year.

@adbeggs
Author

adbeggs commented Dec 28, 2022

Hi Mike

The nodes have 500GB of system RAM, but the job wasn't being given the entire node. I have now set it to use the entire node, but it is still very, very slow; in fact, on our HPC dorado initiates but doesn't run. I might recompile from source to see if that makes any difference. Output is here:

CUDA/11.4.1
GCCcore/11.2.0
zlib/1.2.11-GCCcore-11.2.0
binutils/2.37-GCCcore-11.2.0
GCC/11.2.0
ncurses/6.2-GCCcore-11.2.0
zlib/1.2.11-GCCcore-11.2.0
bzip2/1.0.8-GCCcore-11.2.0
XZ/5.2.5-GCCcore-11.2.0
OpenSSL/1.1
cURL/7.78.0-GCCcore-11.2.0
SAMtools/1.15.1-GCC-11.2.0
[2022-12-28 14:14:22.917] [info] > Loading pairs file
[2022-12-28 14:14:22.939] [info] > Pairs file loaded
[2022-12-28 14:14:25.542] [warning] > warning: auto batchsize detection failed
[2022-12-28 14:14:27.389] [info] > Starting Stereo Duplex pipeline

It just sits there for hours and hours not doing anything. Duplex pairing rates on this library are 60%.

BW

Andrew

@vellamike
Collaborator

vellamike commented Dec 28, 2022 via email

@vellamike
Collaborator

vellamike commented Dec 28, 2022 via email

@adbeggs
Author

adbeggs commented Dec 28, 2022

Hi Mike

Yes, simplex calling is working fine, calling very quickly as expected. There are 20 cores available on this node (it's an Ice Lake one). When I run it, memory usage peaks at only 5 GB:

| Requested cpu=20,mem=400G,node=1,billing=20,gres/gpu=1 - 7-00:00:00 walltime
| Assigned to nodes bear-pg0103u14a
| Command /rds/projects/b/beggsa-clinicalnanopore/adb/NA12878/20221212_1633_3E_PAM86221_1ab2d60f/rundorado.slurm
| WorkDir /rds/projects/b/beggsa-clinicalnanopore/adb/NA12878/20221212_1633_3E_PAM86221_1ab2d60f
+--------------------------------------------------------------------------+
+--------------------------------------------------------------------------+
| Finished at Wed Dec 28 14:35:15 2022 for beggsa(8152) on the BlueBEAR Cluster
| Required (00:13.314 cputime, 5017850K memory used) - 00:01:29 walltime
| JobState COMPLETING - Reason None
| Exitcode 0:15
+--------------------------------------------------------------------------+

I terminated the job as it wasn't doing anything...

@adbeggs
Author

adbeggs commented Dec 28, 2022

Even on the P24 it is painfully slow; it has been running for 2 hours and has only managed to process 7,200 reads!

@adbeggs
Author

adbeggs commented Dec 28, 2022

The only thing I can think of is that I am running it on a single, very large pod5 file (1100 GB). Would that make a difference? It doesn't seem to for simplex.

@vellamike
Collaborator

vellamike commented Dec 28, 2022 via email

@incoherentian

#SBATCH --tasks 20

I can't explain the V100, but I think this SBATCH parameter will try to launch 20 instances of Dorado, all of them trying to access the entire A30. What happens when you change it to the following?

#SBATCH --tasks 1
#SBATCH --cpus-per-task=20

@incoherentian

What I was actually thinking was

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20

@adbeggs
Author

adbeggs commented Dec 30, 2022

Update - it is a lot quicker with single POD5 files... interesting!

@dithiii

dithiii commented Dec 31, 2022

Same issue here with slow stereo calling, but it persists even when using multiple small pod5s. I'm using dorado 0.1.1 on Ubuntu 22.04. Simplex calling with the dna_r10.4.1_e8.2_400bps_fast@v4.0.0 model runs at 40,000 reads/s, but when I try stereo duplex, even with the "fast" model, it calls at 300 reads per minute.

Duplex tools claimed I had an 18% duplex rate. Any fix?

@Kirk3gaard

Hi

Would there be a speed benefit from using the SAM file from simplex super accuracy basecalling as input for dorado duplex calling (I think that was mentioned at NCM)? And if so, how is it supplied?

I tried running
dorado duplex dna_r10.4.1_e8.2_400bps_sup@v4.0.0 --pairs pairs_from_sam/pair_ids_filtered.txt sam_dir/ > duplex_orig.sam

However, it did not find any reads and just completed with 0 reads basecalled.

dorado duplex -h
Usage: dorado [-h] [--pairs VAR] [--emit-fastq] [--threads VAR] [--device VAR] [--batchsize VAR] [--chunksize VAR] [--overlap VAR] [--num_runners VAR] model reads

Positional arguments:
  model    Model
  reads    Reads in Pod5 format or BAM/SAM format for basespace.

@vellamike
Collaborator

vellamike commented Jan 19, 2023

Hi @Kirk3gaard, in sam_dir do you have pod5 files or a SAM file? Dorado Duplex calling requires the raw data in POD5 format; this is what "reads" in the help refers to.

@Kirk3gaard

Kirk3gaard commented Jan 20, 2023

Hi @vellamike, so the help text suggesting "BAM/SAM format for basespace" is not an option for speeding things up? Or even a real option anymore?
I was just wondering how I get to the "duplex for free" scenario mentioned in the NCM presentation (see below) when I have already done simplex calling in super accuracy mode.
(The RTX 4090 card basecalled our best PromethION run, ~200 Gbp, in 3 days with sup for simplex reads.)
Reference: https://youtu.be/8DVMG7FEBys
[screenshot of the NCM presentation slide]

@vellamike
Collaborator

Ah, that is a hidden method for the eagle-eyed :)

This method is very fast, but it works in sequence space only, so it is less accurate. Please run it like so:

dorado duplex basespace /path/to/bam.bam --pairs /path/to/pairs.txt

This method is experimental - feedback welcome!
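
For concreteness, a full invocation in the context of this thread might look something like the following (the file names are just examples; as with the other dorado subcommands shown in this thread, output is written to stdout):

dorado duplex basespace simplex_calls.bam --pairs pairs_from_bam/pair_ids_filtered.txt > duplex_basespace.sam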

@Kirk3gaard

Sneaky. Thanks a lot!
Okay, so the recommended way of getting the most out of a sequencing run (and the GPUs) at the moment is to (sketched as commands below):

  1. basecall all the pod5s with the fast model to get the pairs
  2. sort the pod5s by channel ID (someone wrote a script for that?)
  3. run duplex calling with the sup model on the pairs, using the sorted pod5s
  4. run simplex calling on the remaining reads with sup

Looking forward to seeing a simplification of this process so that one command outputs both simplex and duplex.
I will give the basespace and pod5-based versions a try and see how long they take.
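
A rough command-level sketch of those four steps, using example model and file names (the pairing step relies on ONT's duplex_tools package, whose exact subcommand depends on the installed version, so treat this as an outline rather than a recipe):

# 1a) fast simplex basecall, used only to identify candidate pairs
dorado basecaller models/dna_r10.4.1_e8.2_400bps_fast@v4.0.0 pod5/ | samtools view -b > simplex_fast.bam

# 1b) build pairs_from_bam/pair_ids_filtered.txt from those calls with duplex_tools
#     (see the duplex_tools documentation for the pairing subcommand in your version)

# 2) optionally split the pod5s by channel (see the pod5 subset sketch further down this thread)

# 3) stereo duplex call with the sup model, restricted to the identified pairs
dorado duplex models/dna_r10.4.1_e8.2_400bps_sup@v4.0.0 pod5/ \
  --pairs pairs_from_bam/pair_ids_filtered.txt | samtools view -b > duplex_sup.bam

# 4) sup simplex call for the remaining reads (filtering out the paired reads is left to your own tooling)
dorado basecaller models/dna_r10.4.1_e8.2_400bps_sup@v4.0.0 pod5/ | samtools view -b > simplex_sup.bam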

@vellamike
Collaborator

Hi @Kirk3gaard - yes, that is currently the best method. We are working on usability and performance improvements all the time and any feedback is very welcome.

@vellamike
Collaborator

P.S. Sorting the pod5s by channel ID is a "nice to have" but not crucial.

@adbeggs
Author

adbeggs commented Jan 23, 2023

Hi @vellamike, still seeing this issue. I have single POD5 files; fast calling with dorado on our A30 completes at 3e07 samples/s, but when I call duplex it just sits there saying "Starting Stereo Duplex pipeline".

I've checked and it has the whole A30 node available to it, so it shouldn't be running slowly. I am running it on Red Hat but can't see anything specific that might be causing the issue.

@adbeggs
Author

adbeggs commented Jan 23, 2023

The whole run is teeny tiny - only 200k reads - but it is meant to be 40% duplex.

@vellamike
Collaborator

vellamike commented Jan 23, 2023

Can you show me the Duplex command you are running?

Also, is your pairs file tab- or space-delimited? It needs to be space-delimited - could you check this?
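
If the file does turn out to be tab-delimited, converting a copy is a one-liner (file names here are just examples):

tr '\t' ' ' < pairs_from_bam/pair_ids_filtered.txt > pairs_from_bam/pair_ids_space_delimited.txt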

@Kirk3gaard

"basespace" mode tried to load the entire BAM file into RAM before starting and died when it ran out of RAM. Maybe worth enabling a smarter way to avoid the need for massive memory.

I assume that only the two reads in a pair are needed to perform duplex calling, so it should be possible to load subsets of pairs without crashing. Enabling the use of fastq files as input might also make it more flexible for people to prepare subsets using existing tools, in combination with the pair id file.

@vellamike
Collaborator

Hi @Kirk3gaard - that is indeed a problem with the current implementation of the Basespace method, especially for very large BAMs. Could you split your BAM by channel ID into multiple BAMs and run duplex on each?
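
One possible way to do that split, assuming you have the run's sequencing_summary.txt (with read_id and channel columns) and a samtools recent enough to support view -N, would be along these lines:

# write one read-id list per channel, finding the relevant columns by name in the header
awk -F'\t' 'NR==1 { for (i=1; i<=NF; i++) { if ($i=="read_id") r=i; if ($i=="channel") c=i }; next }
            { print $r > ("channel_" $c ".ids") }' sequencing_summary.txt

# extract each channel's reads into its own BAM
for ids in channel_*.ids; do
  samtools view -b -N "$ids" simplex_calls.bam > "${ids%.ids}.bam"
done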

@Kirk3gaard

Tried running duplex with the pod5 files rather than basespace, and it crashed after generating a SAM file of the same size every time I tried. I looked through the syslog; it apparently runs well for some time and then suddenly runs out of memory:

"Out of memory: killed process 50831 (dorado)"
"oom_reaper: reaped process 50831 (dorado)"

I would assume that it should be possible to run stereo duplex calling on a machine with 96 GB RAM and 24 GB GPU RAM, since the software should not need to load all of the pod5 data into memory at once (or whatever is causing this).
Any hint as to what could be the cause?

@vellamike
Collaborator

Hi Rasmus, right now the host memory consumption is governed in a complicated way by a few parameters:

  1. Number of reads
  2. Read length
  3. Pairing rate
  4. POD5 ordering

We have a release coming soon which significantly reduces the host-side memory requirement for duplex. In the meantime, one thing you could do is demultiplex your pod5s by channel into multiple pod5s and run Stereo on each independently.
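
For reference, the pod5 package documents a per-channel split along these lines (the exact pod5 view/subset options may differ between versions, so treat this as a sketch rather than a guaranteed recipe):

# build a read_id -> channel table from the pod5s
pod5 view pod5/ --include "read_id, channel" --output read_channels.tsv

# write the reads out again, grouped into one pod5 per channel
pod5 subset pod5/ --summary read_channels.tsv --columns channel --output pod5_by_channel/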

@iiSeymour added the enhancement (New feature or request) label on Feb 22, 2023
@vellamike
Collaborator

Hi @Kirk3gaard @adbeggs @incoherentian @dithiii,

Version 0.2.1 of Dorado introduces big speed and RAM utilisation improvements to Duplex calling - could you try this?

@Kirk3gaard

Should we test whether it runs without splitting reads by channel?

@vellamike
Collaborator

vellamike commented Feb 22, 2023 via email

@adbeggs
Author

adbeggs commented Feb 25, 2023 via email

@Kirk3gaard

Kirk3gaard commented Feb 27, 2023

It started nicely, then processed 310,600 reads before it got "Killed".

Commands used to run dorado, and the resulting output:

MODELPATH="/home/ubuntu/Desktop/software/dorado-0.2.1-linux-x64/models"
MODEL="dna_r10.4.1_e8.2_400bps_sup@v4.1.0"
POD5DIR=pod5/

dorado duplex $MODELPATH/$MODEL --device "cuda:all" --min-qscore 25 --pairs pairs_from_sam/pair_ids_filtered.txt $POD5DIR/ > duplex_$MODEL.sam
[2023-02-23 15:52:55.097] [info] > Loading pairs file
[2023-02-23 15:52:55.400] [info] > Pairs file loaded
[2023-02-23 15:52:59.938] [info] > Starting Stereo Duplex pipeline
> Reads processed: 310600Killed

@iiSeymour
Member

Stereo performance improvements in https://github.com/nanoporetech/dorado/releases/tag/v0.2.2
