How fast should dorado simplex (modified bases) basecalling be on a PromethION data tower with 4 A100s? #252
Comments
Hey @rainwala, the driver version on the PromethION is recent and the CUDA version doesn't need changing to run dorado. Two things I would check if you are seeing much lower performance than expected:
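Judging from the replies that follow, one of those checks was whether another process is also holding the GPUs. A minimal way to list GPU compute processes with nvidia-smi:

# List every process currently holding GPU memory, with its PID and usage
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv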
Hi @iiSeymour, nvidia-smi confirms that dorado is running on all 4 A100s and not the T1000, but so is guppy_basecaller_server. The former is taking ~40GB of GPU memory, and the latter around 2GB. I have stopped guppy_basecaller_server using the systemctl command you specified (how do I restart it for all 4 GPUs afterwards?).
Yes.
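The exact command in this reply is truncated above. As a rough sketch, the Guppy basecall server on a PromethION typically runs as a systemd service; the service name below is an assumption and may differ on a given install:

# Stop the resident Guppy basecall server so dorado has the GPUs to itself
# (service name is an assumption; check with `systemctl list-units | grep -i guppy`)
sudo systemctl stop guppyd.service

# Restart it after the dorado run finishes; the service brings up all GPUs it is
# configured for, so no per-GPU command should be needed (assumption)
sudo systemctl start guppyd.service
systemctl status guppyd.service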
I did restart dorado, and there was no improvement in the predicted time (sorry that wasn't clear).
Also, how does one see the speed for dorado? Is there an option to display that?
Dorado will report the speed at the end of calling; during the run you only have the ETA from the progress bar. Is pod5_all/ on the /data volume? Can you copy and paste the output of nvidia-smi while dorado is running?
Yes, pod5_all/ is on the /data volume. Here is the output of nvidia-smi with dorado running:
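The pasted nvidia-smi output is not reproduced here. For anyone repeating the check, a sketch of logging utilisation, power draw and memory use while dorado runs (all standard nvidia-smi query fields):

# Log utilisation, power draw and memory use of every GPU every 5 seconds
nvidia-smi \
  --query-gpu=timestamp,index,name,utilization.gpu,power.draw,memory.used \
  --format=csv -l 5 | tee gpu_usage.csv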
It would actually be pretty cool to get the current speed reported somewhat live, in Gbp/hour, to estimate whether the GPU resource used is suitable for the amount of data and the patience that the user has.
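Until something like that is built in, a rough figure can be computed after the fact from the finished BAM. A sketch, assuming samtools is installed and the wall-clock runtime is recorded separately (the BAM name matches the command in this issue; the 12 hours is a placeholder):

# Total called gigabases divided by observed wall-clock hours (12 is a placeholder)
HOURS=12
samtools stats mod_bases.bam \
  | awk -F'\t' -v h="$HOURS" '$1=="SN" && $2=="total length:" {printf "%.2f Gbp/hour\n", $3/1e9/h}'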
@rainwala the GPU power draw and utilisation are really bad 🤔 What's the CPU load and available system memory like? Can you test the performance on this set without calling mods?
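For the no-mods comparison, a sketch of the corresponding command, reusing the model path, input directory and device string from the command in this issue (the output file name is a placeholder):

# Same data and devices as the original run, but without modified-base calling
dorado basecaller \
  /home/prom/dorado-0.3.0-linux-x64/models/dna_r10.4.1_e8.2_400bps_hac@v4.1.0 \
  pod5_all/ -x "cuda:1,2,3,4" > simplex.bam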
@iiSeymour this is what it looks like with this command:
Would the nvidia-smi output for an analogous call to guppy be instructive? Also, what is the best command to check CPU load? I'm not currently running anything else on that machine.
You can use
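The specific tool suggested above is not preserved in this copy. For illustration only, some standard Ubuntu utilities for checking CPU load and memory (not necessarily what was recommended here):

# Load average and per-core CPU usage
uptime
htop          # interactive; `top` works too

# Available system memory
free -h

# CPU / memory / IO snapshots, once a second for five seconds
vmstat 1 5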
In case it helps (while I work out the best analogous guppy command), here is the usage and GPU power draw with nothing running:
@iiSeymour OK, here is the analogous guppy command and nvidia-smi output:
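The guppy command itself is not reproduced above. Purely as an illustration of the shape of such a call, a hypothetical guppy_basecaller invocation is sketched below; the paths, config name and device string are placeholders, not the command actually used:

# Hypothetical analogous guppy run; every value here is a placeholder
MODBASE_HAC_CONFIG=dna_r10.4.1_e8.2_400bps_modbases_hac.cfg   # placeholder name
guppy_basecaller \
  -i /data/reads/ \
  -s /data/guppy_out/ \
  -c "$MODBASE_HAC_CONFIG" \
  -x "cuda:1,2,3,4"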
@rainwala thanks, can you also check the ETA with dorado for a subset of the GPUs?
@iiSeymour the guppy run took 12 hours, so just over half the time of the analogous dorado run. -x cuda:1 = 5h:20min. I'm not sure how to interpret this, except that it's good I can get 5 hours again, but only with a subset of GPUs; I don't seem to get any improvement going from 1 GPU to 2 GPUs, and then performance drops off a cliff with 3 and 4 GPUs. I get similar results with modified_bases (except that 2 GPUs does seem to be better than 1, with a 4h:30min vs 6h ETA), so it's not about modified_bases, but rather it seems to be about dorado's performance with multiple A100s on a PromethION data tower. With modified_bases and -x cuda:1,2, I got this speed:
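For anyone repeating this kind of comparison, a sketch of sweeping over GPU subsets with the command from this issue (output names are placeholders; each run only needs to go long enough to read a stable ETA off the progress bar):

# Rough sweep over GPU subsets selected via -x
for devices in "cuda:1" "cuda:1,2" "cuda:1,2,3" "cuda:1,2,3,4"; do
  echo "=== ${devices} ==="
  dorado basecaller \
    /home/prom/dorado-0.3.0-linux-x64/models/dna_r10.4.1_e8.2_400bps_hac@v4.1.0 \
    pod5_all/ --modified-bases 5mCG_5hmCG -x "${devices}" \
    > "mods_${devices//[:,]/_}.bam"
done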
Thanks @rainwala, I have a theory. Can you try the above test again but with
Hi @iiSeymour, I set that and then the ETAs were as follows:
@rainwala thanks for providing all the information - we are putting together a fix now.
Thanks @iiSeymour!
Thanks! Will keep a look out for that release then give it a go.
…On Tue, 27 Jun 2023, 13:36 Chris Seymour wrote:
@rainwala this should be resolved in v0.3.1 (https://github.com/nanoporetech/dorado/releases/tag/v0.3.1).
@rainwala it's available now: https://github.com/nanoporetech/dorado#installation
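A minimal sketch of picking up the new build, assuming the release tarball follows the dorado-<version>-linux-x64 naming seen earlier in this issue (take the actual download location from the linked installation instructions):

# Unpack the new build and confirm the version (asset name assumed from the
# dorado-0.3.0-linux-x64 directory naming used in this issue)
tar -xzf dorado-0.3.1-linux-x64.tar.gz
./dorado-0.3.1-linux-x64/bin/dorado --version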
Thank you! I'm on holiday now but I'll try it as soon as I'm back and post the results here.
I have a similar problem with slightly different initial system settings. v0.3.1 finished 9% within 13 minutes (currently running) and showed an estimated run time of 02h:08m, but then the ETA started to grow rapidly: after 25 min of run time it still showed 9% and about 3h:35m, and after 30 min, 10% progress and a 4h:35m ETA. I tried running it with fully automated settings and with slightly customized values for "batchsize", "chunksize" and "overlap", which gives a ~5-7% lower ETA at the beginning compared to auto mode. Here is my command:
Below is the nvidia-smi output:
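The command and nvidia-smi output referred to above are not shown here. For context, a sketch of a run with those three options overridden; the numeric values, model and paths are placeholders rather than the ones used in that run:

# Overriding the auto-selected parameters; all values and paths are placeholders
dorado basecaller \
  dna_r10.4.1_e8.2_400bps_hac@v4.1.0 \
  pod5_dir/ \
  --batchsize 768 --chunksize 10000 --overlap 500 \
  -x "cuda:all" > calls.bam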
@iiSeymour, wow, the ETA for basecalling (no modified bases) with 4 A100s is now ~1 hour 15 minutes! It's around 3 hours with modified bases. What did you fix in version 0.3.1 to make this possible?
I consider this issue closed now. @homeveg maybe you would like to start a new issue with the problem you are facing. |
I am basecalling pod5 directories on a PromethION data tower (with 4 A100s), using dorado version 0.3.0.
On a pod5 directory with 760GB of pod5s, I had changed the CUDA version to either 11.4 or 12 (I can't remember now), which made one of the 4 A100s not be recognised, but I basecalled anyway, and this took 5 hours! I did a factory reset on the PromethION data tower after that to make sure all 4 A100s got recognised again.
After the factory reset, this machine has the following software and driver versions: Ubuntu 20.04, NVIDIA driver 515.65.01, and CUDA 11.7. On this setup, with 4 A100s, dorado basecalling is taking 19 hours on a 780GB directory.
The specific command in both cases was:
dorado basecaller /home/prom/dorado-0.3.0-linux-x64/models/dna_r10.4.1_e8.2_400bps_hac@v4.1.0 pod5_all/ --modified-bases 5mCG_5hmCG -x "cuda:1,2,3,4" > mod_bases.bam
I had made sure to make the CUDA device numbering the same as the system device IDs, using this:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
Could you please tell me what might explain the discrepancy, and whether there are any benchmarks for what speed to expect with 4 A100s?
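One way to double-check that the indices passed to -x line up with the physical GPUs after setting CUDA_DEVICE_ORDER, sketched here as a quick sanity check rather than something taken from this issue:

# With PCI bus ordering, CUDA device indices match nvidia-smi's listing,
# so the indices below should correspond to those passed via -x "cuda:1,2,3,4"
export CUDA_DEVICE_ORDER=PCI_BUS_ID
nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv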