Merge branch 'jdaw/add-poly-option' into 'master'
add poly(a) documentation

See merge request machine-learning/dorado!702
tijyojwad committed Nov 13, 2023
2 parents 1c2a4a3 + 1388ca3 commit 0f282cd
Showing 3 changed files with 11 additions and 8 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -10,6 +10,7 @@ Dorado is a high-performance, easy-to-use, open source basecaller for Oxford Nan
* [Duplex basecalling](#duplex) (watch the following video for an introduction to [Duplex](https://youtu.be/8DVMG7FEBys)).
* Simplex [barcode classification](#barcode-classification).
* Support for aligned read output in SAM/BAM.
+* Initial support for [poly(A) tail estimation](#polya-tail-estimation).
* [POD5](https://github.com/nanoporetech/pod5-file-format) support for highest basecalling performance.
* Based on libtorch, the C++ API for pytorch.
* Multiple custom optimisations in CUDA and Metal for maximising inference performance.
@@ -203,6 +204,10 @@ unclassified.bam
#### Using a Sample Sheet
`dorado` is able to use a sample sheet to restrict the barcode classifications to only those present, and to apply aliases to the detected classifications. This is enabled by passing the path to a sample sheet to the `--sample-sheet` argument when using the `basecaller` or `demux` commands. See [here](documentation/SampleSheets.md) for more information.

+### Poly(A) tail estimation
+
+Dorado has initial support for estimating poly(A) tail lengths for DNA and RNA. Note that Oxford Nanopore cDNA reads sequence in two different orientations and transcript poly(A) length estimation handles both (A and T homopolymers). This feature can be enabled by passing `--estimate-poly-a` to the `basecaller` command. It is disabled by default. The estimated tail length is stored in the `pt:i` tag of the output record. Reads for which the tail length could not be estimated will not have the `pt:i` tag.
+
## Available basecalling models

To download all available Dorado models, run:
2 changes: 1 addition & 1 deletion documentation/SAM.md
@@ -41,7 +41,7 @@
| dx:i: | bool to signify duplex read _(only in duplex mode)_ |
| pi:Z: | parent read id for a split read |
| sp:i: | start coordinate of split read in parent read signal |
-| pt:i: | estimated poly(A) tail length in cDNA and dRNA reads |
+| pt:i: | estimated poly(A/T) tail length in cDNA and dRNA reads |

#### Modified Base Tags

12 changes: 5 additions & 7 deletions dorado/cli/basecaller.cpp
@@ -368,17 +368,15 @@ int basecaller(int argc, char* argv[]) {
parser.visible.add_argument("--sample-sheet")
.help("Path to the sample sheet to use.")
.default_value(std::string(""));
-
-cli::add_minimap2_arguments(parser, AlignerNode::dflt_options);
-cli::add_internal_arguments(parser);
-
-// Add hidden arguments that only apply to simplex calling.
-parser.hidden.add_argument("--estimate-poly-a")
+parser.visible.add_argument("--estimate-poly-a")
.help("Estimate poly-A/T tail lengths (beta feature). Primarily meant for cDNA and "
"dRNA use cases.")
.default_value(false)
.implicit_value(true);
+
+cli::add_minimap2_arguments(parser, AlignerNode::dflt_options);
+cli::add_internal_arguments(parser);

// Create a copy of the parser to use if the resume feature is enabled. Needed
// to parse the model used for the file being resumed from. Note that this copy
// needs to be made __before__ the parser is used.
@@ -462,7 +460,7 @@ int basecaller(int argc, char* argv[]) {
parser.visible.get<bool>("--barcode-both-ends"),
parser.visible.get<bool>("--no-trim"),
parser.visible.get<std::string>("--sample-sheet"), resume_parser,
-parser.hidden.get<bool>("--estimate-poly-a"));
+parser.visible.get<bool>("--estimate-poly-a"));
} catch (const std::exception& e) {
spdlog::error("{}", e.what());
return 1;
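
Reviewer note: the README text in this commit says the estimate is written to the `pt:i` tag and that reads without an estimate carry no such tag. A downstream consumer therefore has to treat the tag as optional. Below is a minimal sketch of that handling. It assumes the record is available as one tab-separated SAM text line (for example from `samtools view`), and the function name `poly_a_tail_length` is hypothetical and not part of Dorado.

```cpp
#include <exception>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Dorado): returns the value of the pt:i tag
// in a tab-separated SAM record, or std::nullopt when the tag is absent,
// which the README describes as "tail length could not be estimated".
std::optional<int> poly_a_tail_length(const std::string& sam_line) {
    std::istringstream fields(sam_line);
    std::string field;
    int column = 0;
    while (std::getline(fields, field, '\t')) {
        // The first 11 columns are the mandatory SAM fields; optional tags follow.
        if (++column <= 11) {
            continue;
        }
        // Match the tag name and type prefix exactly at the start of the field.
        if (field.rfind("pt:i:", 0) == 0) {
            try {
                return std::stoi(field.substr(5));
            } catch (const std::exception&) {
                return std::nullopt;  // malformed value: treat as missing
            }
        }
    }
    return std::nullopt;
}
```

Returning `std::optional<int>` rather than a sentinel such as `-1` keeps "no estimate" distinct from any numeric value the tag might hold.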
