Merge branch 'jdaw/remove-plasmid-polya-option' into 'release-v0.6.0'
Remove plasmid polyA configurability

See merge request machine-learning/dorado!903
tijyojwad committed Mar 22, 2024
2 parents ddac0ef + 91b6c4e commit ce957ae
Showing 3 changed files with 4 additions and 50 deletions.
15 changes: 0 additions & 15 deletions documentation/PolyTailConfig.md
@@ -5,7 +5,6 @@ Dorado supports estimation of Poly(A/T) tails for DNA (PCS AND PCB) and RNA samp
Dorado also supports additional features that can be customized through a configuration file (described below):
* Custom primer sequence for cDNA tail estimation
* Clustering of interrupted Poly(A/T) tails
- * Estimation of Poly(A/T) length in plasmids

## Poly(A/T) Reference Diagram

@@ -25,16 +24,6 @@ dRNA
3' ---- ADAPTER ---- poly(A) ---- RNA ---- 5'
```

- ```
- Plasmid
- 5' ---- ADAPTER ---- DNA ---- FRONT_FLANK ---- poly(A) ---- REAR_FLANK --- DNA ---- 3'
- OR
- 5' ---- ADAPTER ---- RC(DNA) ---- RC(REAR_FLANK) ---- poly(T) ---- RC(FRONT_FLANK) ---- RC(DNA) ---- 3'
- ```
-
## Configuration Format

The configuration file needs to be in the `toml` format.
@@ -43,8 +32,6 @@ The configuration file needs to be in the `toml` format.
[anchors]
front_primer = "ATCG"
rear_primer = "CGTA"
- plasmid_front_flank = "CGATCG"
- plasmid_rear_flank = "TGACTGC"
[threshold]
flank_threshold = 10
@@ -59,7 +46,5 @@ tail_interrupt_length = 10
| -- | -- |
| front_primer | Front primer sequence for cDNA |
| rear_primer | Rear primer sequence for cDNA |
- | plasmid_front_flank | Front flanking sequence of poly(A) in plasmid |
- | plasmid_rear_flank | Rear flanking sequence of poly(A) in plasmid |
| flank_threshold | The edit distance threshold to use for detection of the flank/primer sequences |
| tail_interrupt_length | Combine tails that are within this distance of each other (default is 0, i.e. don't combine any) |
13 changes: 0 additions & 13 deletions dorado/poly_tail/poly_tail_config.cpp
@@ -29,19 +29,6 @@ PolyTailConfig prepare_config(std::istream& is) {
config.front_primer = toml::find<std::string>(anchors, "front_primer");
config.rear_primer = toml::find<std::string>(anchors, "rear_primer");
}
-
- if (anchors.contains("plasmid_front_flank") || anchors.contains("plasmid_rear_flank")) {
- if (!(anchors.contains("plasmid_front_flank") &&
- anchors.contains("plasmid_rear_flank"))) {
- throw std::runtime_error(
- "Both plasmid_front_flank and plasmid_rear_flank must be provided in "
- "the PolyA configuration file.");
- }
- config.plasmid_front_flank = toml::find<std::string>(anchors, "plasmid_front_flank");
- config.plasmid_rear_flank = toml::find<std::string>(anchors, "plasmid_rear_flank");
- config.is_plasmid = true;
- config.flank_threshold = 10; // reduced default for plasmids
- }
}

if (config_toml.contains("threshold")) {
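After this change, the only paired keys left in `[anchors]` are `front_primer` and `rear_primer`, and the test below still expects an error when only one of them is given. A minimal sketch of that both-or-neither check follows. It assumes a plain `std::map` in place of the toml11 table, and `Anchors`, `PrimerConfig`, `parse_primers` and the error text are hypothetical names for illustration, not dorado's actual code.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the parsed [anchors] table (the real code uses toml11).
using Anchors = std::map<std::string, std::string>;

// Hypothetical subset of the config holding only the primer fields.
struct PrimerConfig {
    std::string front_primer;
    std::string rear_primer;
};

// Accept both primers or neither; reject a lone primer.
PrimerConfig parse_primers(const Anchors& anchors) {
    PrimerConfig config;
    const bool has_front = anchors.count("front_primer") > 0;
    const bool has_rear = anchors.count("rear_primer") > 0;
    if (has_front != has_rear) {
        // Illustrative message only; the real wording is not fully shown in this diff.
        throw std::runtime_error("Both front_primer and rear_primer must be provided.");
    }
    if (has_front) {
        config.front_primer = anchors.at("front_primer");
        config.rear_primer = anchors.at("rear_primer");
    }
    return config;
}
```

Comparing the two presence flags with `!=` covers both lone-key cases in one condition, which is equivalent to the nested `||` / `&&` test used by the removed plasmid block.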
26 changes: 4 additions & 22 deletions tests/PolyACalculatorTest.cpp
@@ -106,7 +106,6 @@ TEST_CASE("PolyTailConfig: Test parsing file", TEST_GROUP) {
}

SECTION("Only one primer is provided") {
- auto path = (tmp_dir.m_path / "only_one_primer.toml").string();
const toml::value data{{"anchors", toml::table{{"front_primer", "ACTG"}}}};
const std::string fmt = toml::format(data);
std::stringstream buffer(fmt);
@@ -116,23 +115,11 @@ TEST_CASE("PolyTailConfig: Test parsing file", TEST_GROUP) {
"configuration file.");
}

- SECTION("Only one plasmid flank is provided") {
- auto path = (tmp_dir.m_path / "only_one_flank.toml").string();
- const toml::value data{{"anchors", toml::table{{"plasmid_rear_flank", "ACTG"}}}};
- const std::string fmt = toml::format(data);
- std::stringstream buffer(fmt);
-
- CHECK_THROWS_WITH(dorado::poly_tail::prepare_config(buffer),
- "Both plasmid_front_flank and plasmid_rear_flank must be provided in the "
- "PolyA configuration file.");
- }
-
SECTION("Parse all supported configs") {
- auto path = (tmp_dir.m_path / "only_one_flank.toml").string();
- const toml::value data{{"anchors", toml::table{{"plasmid_front_flank", "CGTA"},
- {"plasmid_rear_flank", "ACTG"},
- {"front_primer", "AAAAAA"},
- {"rear_primer", "GGGGGG"}}},
+ const toml::value data{{"anchors",
+ toml::table{
+
+ {"front_primer", "AAAAAA"}, {"rear_primer", "GGGGGG"}}},
{"tail", toml::table{{"tail_interrupt_length", 10}}}};
const std::string fmt = toml::format(data);
std::stringstream buffer(fmt);
@@ -142,11 +129,6 @@ TEST_CASE("PolyTailConfig: Test parsing file", TEST_GROUP) {
CHECK(config.rc_front_primer == "TTTTTT");
CHECK(config.rear_primer == "GGGGGG");
CHECK(config.rc_rear_primer == "CCCCCC");
- CHECK(config.plasmid_front_flank == "CGTA");
- CHECK(config.rc_plasmid_front_flank == "TACG");
- CHECK(config.plasmid_rear_flank == "ACTG");
- CHECK(config.rc_plasmid_rear_flank == "CAGT");
- CHECK(config.is_plasmid); // Since the plasmid flanks were specified
CHECK(config.tail_interrupt_length == 10);
}
}
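
The `rc_*` checks in this test compare each configured sequence against its reverse complement (`"AAAAAA"` against `"TTTTTT"`, `"GGGGGG"` against `"CCCCCC"`). As a rough illustration of what those expected values encode, here is a minimal sketch of a reverse-complement helper. `reverse_complement` is a hypothetical name used for illustration and is not necessarily how dorado computes these fields.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>

// Hypothetical helper: reverse the sequence, then swap each base for its complement.
std::string reverse_complement(const std::string& seq) {
    std::string rc(seq.rbegin(), seq.rend());  // reversed copy
    std::transform(rc.begin(), rc.end(), rc.begin(), [](char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: throw std::invalid_argument("unexpected base");
        }
    });
    return rc;
}
```

The same rule accounts for the values in the removed plasmid checks, where `"CGTA"` maps to `"TACG"` and `"ACTG"` maps to `"CAGT"`.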
