How to acquire Remora models in the toml format that Dorado expects as input? #38

oneillkza · 2022-10-20T19:34:37Z

What's the timeline for getting support for modified basecalling models in Dorado?

(Or is this possible already?)

sklages · 2022-10-27T06:57:59Z

Well, listed under "Features":

Modified basecalling (Remora models).

oneillkza · 2022-10-27T19:11:47Z

Hmm, yep, it does look like the -h output suggests this might be possible, although it isn't very informative about what format it wants the Remora models in...

dorado basecaller -h
Usage: dorado [options] model data 

Positional arguments:
model              	the basecaller model to run.
data               	the data directory.


Optional arguments:
-h --help          	shows help message and exits
-v --version       	prints version information and exits
-x --device        	device string in format "cuda:0,...,N", "cuda:all", "metal" etc.. [default: "cuda:all"]
-b --batchsize     	if 0 an optimal batchsize will be selected [default: 0]
-c --chunksize     	[default: 10000]
-o --overlap       	[default: 500]
-r --num_runners   	[default: 2]
--emit-fastq       	[default: false]
--remora-batchsize 	[default: 1000]
--remora-threads   	[default: 1]
--remora_models    	a comma separated list of remora models [default: ""]

When I try to pass it an .onnx file from the Remora repository, it treats the path as a directory that should contain a config.toml file:

> Creating basecall pipeline
toml::parse: file open error -> ../remora/r9_4_1_sup_5mc_5hmc.onnx/config.toml

But there are no toml files in the Remora repository, or in rerio, and none for the basecall models distributed with Guppy. There's also no clear documentation on this format, although it seems to be alluded to in nanoporetech/bonito#278 which talks about "a bonito basecalling model [tar+toml]".

So the question is, how does one acquire (or create?) Remora models in the tar+toml format that Dorado accepts?
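Going only by the error above, dorado appears to treat every model argument as a directory and to read config.toml from inside it, which is why a bare .onnx file fails. The helper below is a hypothetical pre-flight check (check_dorado_model_dir is not part of dorado) that tests a path for that layout before it is passed to the basecaller. It is a sketch under that assumption, not a description of dorado's actual loader.

```sh
# check_dorado_model_dir PATH  (hypothetical helper, not part of dorado)
# Assumption, inferred only from the "file open error -> .../config.toml"
# message: dorado treats each model argument as a directory and reads
# PATH/config.toml from it.
check_dorado_model_dir() {
    if [ ! -d "$1" ]; then
        echo "not a model directory: $1" >&2
        return 1
    fi
    if [ ! -f "$1/config.toml" ]; then
        echo "no config.toml in: $1" >&2
        return 1
    fi
    return 0
}

# Example: the bare .onnx file from the Remora repository fails this check.
check_dorado_model_dir ../remora/r9_4_1_sup_5mc_5hmc.onnx || echo "not usable as a dorado model"
```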

iiSeymour · 2022-10-29T20:21:04Z

I will get dorado download supporting mods models over the next few days.

oneillkza · 2022-11-01T06:19:39Z

Thanks! Looking forward to it!

iiSeymour · 2022-11-10T16:32:56Z

@oneillkza v0.0.2 has a matching 5mC model for each simplex model.

$ dorado download --list 
[2022-11-10 16:25:06.843] [info] > simplex models
[2022-11-10 16:25:06.846] [info]  - dna_r10.4.1_e8.2_260bps_fast@v3.5.2
[2022-11-10 16:25:06.846] [info]  - dna_r10.4.1_e8.2_260bps_hac@v3.5.2
[2022-11-10 16:25:06.846] [info]  - dna_r10.4.1_e8.2_260bps_sup@v3.5.2
[2022-11-10 16:25:06.846] [info]  - dna_r10.4.1_e8.2_400bps_fast@v3.5.2
[2022-11-10 16:25:06.846] [info]  - dna_r10.4.1_e8.2_400bps_hac@v3.5.2
[2022-11-10 16:25:06.846] [info]  - dna_r10.4.1_e8.2_400bps_sup@v3.5.2
[2022-11-10 16:25:06.846] [info]  - dna_r9.4.1_e8_fast@v3.4
[2022-11-10 16:25:06.846] [info]  - dna_r9.4.1_e8_hac@v3.3
[2022-11-10 16:25:06.846] [info]  - dna_r9.4.1_e8_sup@v3.3
[2022-11-10 16:25:06.846] [info] > modification models
[2022-11-10 16:25:06.846] [info]  - dna_r10.4.1_e8.2_260bps_fast@v3.5.2_5mCG@v2
[2022-11-10 16:25:06.846] [info]  - dna_r10.4.1_e8.2_260bps_hac@v3.5.2_5mCG@v2
[2022-11-10 16:25:06.846] [info]  - dna_r10.4.1_e8.2_260bps_sup@v3.5.2_5mCG@v2
[2022-11-10 16:25:06.846] [info]  - dna_r10.4.1_e8.2_400bps_fast@v3.5.2_5mCG@v2
[2022-11-10 16:25:06.846] [info]  - dna_r10.4.1_e8.2_400bps_hac@v3.5.2_5mCG@v2
[2022-11-10 16:25:06.846] [info]  - dna_r10.4.1_e8.2_400bps_sup@v3.5.2_5mCG@v2
[2022-11-10 16:25:06.846] [info]  - dna_r9.4.1_e8_fast@v3.4_5mCG@v0
[2022-11-10 16:25:06.846] [info]  - dna_r9.4.1_e8_hac@v3.4_5mCG@v0
[2022-11-10 16:25:06.846] [info]  - dna_r9.4.1_e8_sup@v3.4_5mCG@v0
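To pick model names out of that listing in a script, the log lines can be parsed by section. The sketch below assumes only the "[timestamp] [level] message" layout shown above, with "> ..." section headers and "- name" entries; list_dorado_models is a hypothetical helper, and the layout may change between dorado releases.

```sh
# list_dorado_models SECTION  (hypothetical helper, not part of dorado)
# Reads `dorado download --list` log lines on stdin and prints the model names
# listed under SECTION, e.g. "simplex models" or "modification models".
# Assumes the "[timestamp] [level] message" layout shown above.
list_dorado_models() {
    awk -v want="> $1" '
        {
            msg = $0
            # Drop the "[timestamp] [level]" prefix: everything up to the
            # last "]" that is followed by spaces.
            sub(/^.*\] +/, "", msg)
        }
        # A "> ..." line starts a new section; remember whether it is ours.
        msg ~ /^> / { in_section = (msg == want); next }
        # Inside the wanted section, "- name" lines carry the model names.
        in_section && msg ~ /^- / { sub(/^- /, "", msg); print msg }
    '
}
```

For example, dorado download --list 2>&1 | list_dorado_models "modification models" should print the nine 5mCG model names; the 2>&1 is there only because it is not clear from the output above whether the log goes to stdout or stderr.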

In this release you have to specify the model manually like so:

$ dorado basecaller ${models}/dna_r10.4.1_e8.2_400bps_hac@v3.5.2 ${data} \
    --remora-models ${models}/dna_r10.4.1_e8.2_400bps_hac@v3.5.2_5mCG@v2 > mods.sam

But I intend to simplify this with automatic model matching and a simpler CLI, i.e.:

$ dorado basecaller ${models}/dna_r10.4.1_e8.2_400bps_hac@v3.5.2 ${data} --mods 5mCG > mods.sam
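Until automatic matching is available, the manual pairing can be scripted. The sketch below assumes a mod model belongs to the simplex model whose full name it begins with (e.g. dna_r10.4.1_e8.2_400bps_hac@v3.5.2_5mCG@v2 for dna_r10.4.1_e8.2_400bps_hac@v3.5.2); find_mod_models is a hypothetical helper, and the final dorado line simply reuses the --remora-models invocation shown above.

```sh
# find_mod_models SIMPLEX_MODEL  (hypothetical helper, not part of dorado)
# Reads modification-model names on stdin, one per line, and prints those
# whose name starts with "SIMPLEX_MODEL_". Assumption: this prefix rule is
# how mod models pair with simplex models in the listing above.
find_mod_models() {
    while IFS= read -r name; do
        case "$name" in
            "$1"_*) printf '%s\n' "$name" ;;
        esac
    done
}

# Example: pair the 400bps HAC simplex model with its 5mCG model.
simplex=dna_r10.4.1_e8.2_400bps_hac@v3.5.2
mod=$(printf '%s\n' \
    dna_r10.4.1_e8.2_400bps_hac@v3.5.2_5mCG@v2 \
    dna_r10.4.1_e8.2_400bps_sup@v3.5.2_5mCG@v2 \
    dna_r9.4.1_e8_hac@v3.4_5mCG@v0 | find_mod_models "$simplex")
echo "$mod"
# With ${models} and ${data} set as in the manual invocation above:
# dorado basecaller "${models}/${simplex}" "${data}" \
#     --remora-models "${models}/${mod}" > mods.sam
```

Note that this naive prefix rule finds nothing for dna_r9.4.1_e8_hac@v3.3 or dna_r9.4.1_e8_sup@v3.3, because the listing above pairs the R9.4.1 HAC and SUP 5mCG models with v3.4 rather than v3.3, so it is worth checking the list rather than relying on the name alone.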

oneillkza · 2022-11-10T17:31:21Z

Thanks @iiSeymour !

jcolicchio-soundag · 2022-11-10T22:00:20Z

Does this also work using the rerio remora all cytosine context model?

iiSeymour · 2022-11-17T17:37:59Z

@jcolicchio-soundag not yet.

oneillkza changed the title from "Remora support?" to "How to acquire Remora models in the toml format that Dorado expects as input?" Oct 27, 2022

iiSeymour self-assigned this Oct 29, 2022

iiSeymour added the enhancement New feature or request label Oct 29, 2022

iiSeymour closed this as completed Nov 10, 2022

iiSeymour mentioned this issue Nov 29, 2022

[error] [error] key "qscore" not found in the top-level table while trying to perform basecalling with 5mCG models #51

Closed

olawa mentioned this issue Dec 9, 2022

CUDAOutOfMemoryError for duplex with 3080ti (12Gb) #57

Closed

Kirk3gaard mentioned this issue Jan 9, 2023

out of memory core dump with dna_r10.4.1_e8.2_400bps_sup@v4.0.0 #64

Closed

ritma001 mentioned this issue Dec 4, 2023

dorado basecaller on linux with Tesla K20Xm #498

Closed

wangguiqian mentioned this issue May 20, 2024

CUDA device requested but no devices found #251

Closed

fehofman mentioned this issue Jun 4, 2024

CUDA illegal memory access was encountered with Dorado v0.7.1 #866

Open
