slow5-dorado-v0.0.1

@hasindu2008 hasindu2008 released this 30 Jun 05:19
· 2856 commits to master since this release

This is a minimal build of Dorado for Linux x86_64 systems with NVIDIA GPUs, made from our fork that supports SLOW5. It was built for our in-house systems, with features such as automatic model download turned off. It is likely to work on your system if it is similar to ours.

Setting up binaries

Download and extract the relevant tarball from the links below and run the binary at bin/slow5-dorado. slow5-dorado-vx.y.z-x86_64-linux-cu102.tar.gz is likely to work on GPUs up to the Volta generation (tested on Tesla V100). slow5-dorado-vx.y.z-x86_64-linux-cu111.tar.gz is likely to work on the latest Ampere GPUs (tested on GeForce RTX 3090).

wget <link> -O slow5-dorado.tar.gz
tar xf slow5-dorado.tar.gz
cd slow5-dorado/bin
./slow5-dorado --version

Setting up Models

A Dorado model must be downloaded manually from ONT, using the URL listed in dorado/model.h, and then extracted. For example, to download and extract the dna_r9.4.1_e8.1_hac@v3.3 model:

mkdir models && cd models/
# based on {"dna_r9.4.1_e8.1_hac@v3.3", "/shared/static/i2rjjq0t3tlaktkipjjl8ef14p29eiss.zip"} under dorado/model.h, download
wget "https://nanoporetech.box.com/shared/static/i2rjjq0t3tlaktkipjjl8ef14p29eiss.zip" -O model.zip
# on Ubuntu, run "apt-get install unzip" if the unzip command is missing
unzip model.zip
# check that the model files have been extracted
ls dna_r9.4.1_e8.1_hac@v3.3/
cd ..

The links to some of the models from ONT are below for your convenience:

model link
dna_r9.4.1_e8.1_fast@v3.4 https://nanoporetech.box.com/shared/static/buvtwoh7wg73yext2wphq5mkqkqltgzz.zip
dna_r9.4.1_e8.1_hac@v3.3 https://nanoporetech.box.com/shared/static/i2rjjq0t3tlaktkipjjl8ef14p29eiss.zip
dna_r9.4.1_e8.1_sup@v3.3 https://nanoporetech.box.com/shared/static/xmpfpcq9eplsr2yoxha9pzmpurcerfey.zip
dna_r10.4.1_e8.2_fast@v3.5.1 https://nanoporetech.box.com/shared/static/d4wnbro47x1kbyhunhqu5x1lguq6yczu.zip
dna_r10.4.1_e8.2_hac@v3.5.1 https://nanoporetech.box.com/shared/static/9wo87gztgmz38mmeikwyfy05yfax4axr.zip
dna_r10.4.1_e8.2_sup@v3.5.1 https://nanoporetech.box.com/shared/static/ny4684yq0194t2mrda21x0v26ywkiog1.zip
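
The table above can be wrapped in a small lookup so that a model is fetched by name. This is only a sketch: `model_url` and `fetch_model` are hypothetical helper names that are not part of slow5-dorado, the URLs are copied from the table, and `fetch_model` assumes wget and unzip are installed.

```shell
#!/bin/sh
# Hypothetical helpers (not part of slow5-dorado).
# Map a model name to its download URL, using only the links in the table above.
BOX_BASE="https://nanoporetech.box.com/shared/static"

model_url() {
    case "$1" in
        dna_r9.4.1_e8.1_fast@v3.4)    echo "$BOX_BASE/buvtwoh7wg73yext2wphq5mkqkqltgzz.zip" ;;
        dna_r9.4.1_e8.1_hac@v3.3)     echo "$BOX_BASE/i2rjjq0t3tlaktkipjjl8ef14p29eiss.zip" ;;
        dna_r9.4.1_e8.1_sup@v3.3)     echo "$BOX_BASE/xmpfpcq9eplsr2yoxha9pzmpurcerfey.zip" ;;
        dna_r10.4.1_e8.2_fast@v3.5.1) echo "$BOX_BASE/d4wnbro47x1kbyhunhqu5x1lguq6yczu.zip" ;;
        dna_r10.4.1_e8.2_hac@v3.5.1)  echo "$BOX_BASE/9wo87gztgmz38mmeikwyfy05yfax4axr.zip" ;;
        dna_r10.4.1_e8.2_sup@v3.5.1)  echo "$BOX_BASE/ny4684yq0194t2mrda21x0v26ywkiog1.zip" ;;
        *) echo "unknown model: $1" >&2; return 1 ;;
    esac
}

# Download one model and extract it into ./models (needs wget and unzip).
fetch_model() {
    url=$(model_url "$1") || return 1
    mkdir -p models &&
    wget "$url" -O "models/$1.zip" &&
    unzip -o "models/$1.zip" -d models
}
```

With these functions loaded in your shell, `fetch_model dna_r9.4.1_e8.1_hac@v3.3` should leave the extracted model under models/, in the same way as the manual steps above.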

Execution

Launch basecalling by providing the extracted model directory and either a directory containing BLOW5 files or the path to a single BLOW5 file as arguments:

./slow5-dorado basecaller --emit-fastq /path/to/extracted/model_dir/ /path/to/blow5_dir/or/file > calls.fastq

Examples:

# a single BLOW5 file
./slow5-dorado basecaller --emit-fastq ./dna_r9.4.1_e8.1_hac@v3.3 merged.blow5 > calls.fastq
# a BLOW5 dir
./slow5-dorado basecaller --emit-fastq ./dna_r9.4.1_e8.1_hac@v3.3 ./blow5_dir/ > calls.fastq
# on a particular GPU (GPU 3 in this example)
./slow5-dorado basecaller --emit-fastq ./dna_r9.4.1_e8.1_hac@v3.3 merged.blow5 -x cuda:3 > calls.fastq
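
If a machine has several GPUs, the `-x cuda:N` option shown above could be used to spread the BLOW5 files of a directory across them. The sketch below is one possible way to do that and has not been tested against this build: `device_for` and `basecall_round_robin` are hypothetical helper names, and it assumes that running one slow5-dorado process per GPU at the same time is acceptable on your system.

```shell
#!/bin/sh
# Hypothetical helpers (not part of slow5-dorado).
# Device string for the i-th file (0-based) when files are assigned
# round-robin to n GPUs.
device_for() {
    echo "cuda:$(( $1 % $2 ))"
}

# Basecall every *.blow5 file in a directory using one background worker per GPU.
# Each worker processes only the files assigned to its GPU, one after another.
#   $1 = model directory, $2 = BLOW5 directory, $3 = number of GPUs, $4 = output directory
basecall_round_robin() {
    model=$1; dir=$2; ngpus=$3; out=$4
    mkdir -p "$out"
    g=0
    while [ "$g" -lt "$ngpus" ]; do
        (
            i=0
            for f in "$dir"/*.blow5; do
                [ -e "$f" ] || continue
                if [ "$(device_for "$i" "$ngpus")" = "cuda:$g" ]; then
                    ./slow5-dorado basecaller --emit-fastq "$model" "$f" -x "cuda:$g" \
                        > "$out/$(basename "$f" .blow5).fastq"
                fi
                i=$((i + 1))
            done
        ) &
        g=$((g + 1))
    done
    wait   # block until all per-GPU workers have finished
}
```

For example, `basecall_round_robin ./dna_r9.4.1_e8.1_hac@v3.3 ./blow5_dir 2 ./fastq_out` would write one FASTQ file per BLOW5 file into ./fastq_out, alternating between cuda:0 and cuda:1.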

If you run out of GPU memory, provide a smaller batch size using the -b option, for instance:

./slow5-dorado basecaller -b 100 --emit-fastq ./dna_r9.4.1_e8.1_hac@v3.3 merged.blow5 > calls.fastq

If your GPU has as little as 4 GB of memory, start with a small batch size such as 10 and increase it step by step to find the best batch size for the particular model.
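
The search for a workable batch size can be scripted. The following is a minimal sketch, not a tested recipe: `find_batch_size` and `try_basecall` are hypothetical helper names, the doubling step and the upper limit of 2000 are arbitrary choices, and subset.blow5 stands for a small test file of your own. It reports the largest batch size that ran successfully, which is a starting point rather than a guarantee of the fastest setting, and it stops on any failure, not only on running out of GPU memory.

```shell
#!/bin/sh
# Hypothetical helper (not part of slow5-dorado): double the batch size until
# the trial command fails, then print the largest size that succeeded.
#   $1 = name of a command or function that takes a batch size and returns 0 on success
#   $2 = starting batch size (default 10)
#   $3 = upper limit (default 2000, an arbitrary cap chosen for this sketch)
find_batch_size() {
    trial=$1
    b=${2:-10}
    limit=${3:-2000}
    best=0
    while [ "$b" -le "$limit" ]; do
        if "$trial" "$b" >/dev/null 2>&1; then
            best=$b
            b=$((b * 2))
        else
            break
        fi
    done
    echo "$best"
}

# Example trial: basecall a small test file with the given batch size.
# The model directory and subset.blow5 are placeholders for your own paths.
try_basecall() {
    ./slow5-dorado basecaller -b "$1" --emit-fastq ./dna_r9.4.1_e8.1_hac@v3.3 subset.blow5 > /dev/null
}
```

Running `find_batch_size try_basecall 10` prints the last batch size that completed, or 0 if even the starting size failed.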