[FEATURE] Modularization of the methods for detecting junctions. #45

Irallia · 2021-01-20T13:28:03Z

Resolves #40
Also adds some new tests for different method combinations.

[TEST] Add new tests for different method combinations. Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>

Irallia · 2021-01-20T13:29:22Z

src/detect_breakends.cpp

+    // seqan3::value_list_validator method_validator{"1", "cigar_string",
+    //                                               "2", "split_read",
+    //                                               "3", "read_pairs",
+    //                                               "4", "read_depth"};


I was thinking that the user might want to enter names instead of numbers, so we could offer both.
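
A minimal sketch of how both spellings could be accepted, written in plain standard C++ so that it does not depend on any particular parser API. The enum `detection_method` and the function `parse_method` are hypothetical names introduced for illustration; only the number/name pairs come from the commented-out validator above.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical enum; the numbering mirrors the commented-out validator above.
enum class detection_method : std::uint8_t
{
    cigar_string = 1,
    split_read   = 2,
    read_pairs   = 3,
    read_depth   = 4
};

// Accepts either the number ("1") or the name ("cigar_string").
// Returns std::nullopt for any other input.
inline std::optional<detection_method> parse_method(std::string_view const text)
{
    if (text == "1" || text == "cigar_string") return detection_method::cigar_string;
    if (text == "2" || text == "split_read")   return detection_method::split_read;
    if (text == "3" || text == "read_pairs")   return detection_method::read_pairs;
    if (text == "4" || text == "read_depth")   return detection_method::read_depth;
    return std::nullopt;
}
```

With a helper like this, the validator could list all eight strings while the rest of the code only ever sees the enum value.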

test/api/junction_detection_test.cpp

joshuak94

Looks good! We should figure out whether it really makes sense to have all four of the options. Also, the duplicate entries in the test cases definitely need to be resolved...

joshuak94 · 2021-01-20T15:06:39Z

src/detect_breakends.cpp

+
+    // Options - Methods:
+    parser.add_option(args.methods, 'm', "method", "Choose the method to be used.",
+                      seqan3::option_spec::ADVANCED, method_validator);


As written, a user could only choose one of the four methods, no? E.g. you can't use -m 1, 2, 3, 4 or something. Maybe it's better to have four flags instead of options? Then the user could do something like -c -s -d for cigar, split reads, and read depth.

Although to be completely honest, I can't really think of a reason why a user would not want to use the CIGAR string. For split reads, if the user has very short reads I guess they might think split reads aren't useful. Same for read depth if the sequencing was very shallow.

With regard to read pairs, I think this is something we should detect automatically (whether the sequencing was done with single-end or paired-end sequencing), so that a user doesn't accidentally turn on the read pairs option with single-end sequencing data.
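
One possible way to detect paired-end data automatically, sketched under the assumption that the FLAG values of the first few records are already available: in the SAM format, FLAG bit 0x1 marks a template with multiple segments in sequencing. The function `looks_paired_end` and its input vector are hypothetical and stand in for whatever the alignment file reader provides.

```cpp
#include <cstdint>
#include <vector>

// SAM FLAG bit 0x1: "template having multiple segments in sequencing".
constexpr std::uint16_t sam_flag_paired = 0x1;

// Returns true if any of the inspected FLAG values has the paired bit set.
// `flags` is a hypothetical stand-in for the FLAG fields of the first few records.
inline bool looks_paired_end(std::vector<std::uint16_t> const & flags)
{
    for (std::uint16_t const flag : flags)
        if (flag & sam_flag_paired)
            return true;
    return false;
}
```

The read pairs method could then be skipped, or a warning issued, whenever this check fails on the sampled records.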

eldariont · 2021-01-22

I think with the current implementation, a user can choose all four methods using something like -m 1 -m 2 -m 3 -m 4 because args.methods is a std::vector.

You have a good point, though, with the CIGAR strings and the read pairs. It might be good to spend some time thinking about efficient processing of the SAM/BAM files:

cigar and split reads collect evidence from each individual read.

read depth, in contrast, is an analysis performed on the depth profile along the genomes, not individual reads.

read pair should (like you say) only be used for paired-end data. But for this type of data it works on a per-read basis like cigar and split reads.

So we should maybe have a separate detection function for read depth because it's completely separate from the other methods.
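
To illustrate why read depth differs from the per-read methods, here is a small sketch that builds a depth profile along one reference from alignment intervals, using a difference array and a prefix sum. This is one common technique, not necessarily what the project will use; `aligned_interval` and `depth_profile` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified alignment: half-open interval [begin, end) on one reference.
struct aligned_interval
{
    std::size_t begin;
    std::size_t end;
};

// Builds the per-position depth profile using a difference array and a prefix sum.
inline std::vector<std::uint32_t> depth_profile(std::vector<aligned_interval> const & alignments,
                                                std::size_t const reference_length)
{
    std::vector<std::int64_t> diff(reference_length + 1, 0);
    for (aligned_interval const & a : alignments)
    {
        if (a.begin >= a.end || a.begin >= reference_length)
            continue; // skip empty or out-of-range intervals
        std::size_t const end = a.end < reference_length ? a.end : reference_length;
        ++diff[a.begin]; // coverage starts here
        --diff[end];     // coverage stops here
    }

    std::vector<std::uint32_t> depth(reference_length, 0);
    std::int64_t running = 0;
    for (std::size_t i = 0; i < reference_length; ++i)
    {
        running += diff[i];
        depth[i] = static_cast<std::uint32_t>(running);
    }
    return depth;
}
```

The profile only becomes meaningful after all reads have been seen, which is why this analysis fits better in its own function than inside the per-read loop.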

Irallia · 2021-01-25

The purpose of my change (the modularisation) is to let us compare all methods with each other.
Later on, detecting this from the SAM/BAM files would certainly be useful; I would create a new issue for it.
Then some methods could be made dependent on the input files.

eldariont · 2021-01-22T08:54:44Z

include/detect_breakends/junction_detection.hpp

 *
 * \details Detects junctions from the CIGAR strings and supplementary alignment tags of read alignment records.
 *          We sort out unmapped alignments, secondary alignments, duplicates and alignments with low mapping quality.
 *          Then, the CIGAR string of all remaining alignments is analyzed.
 *          For primary alignments, also the split read information is analyzed.
 */
 void detect_junctions_in_alignment_file(const std::filesystem::path & alignment_file_path,
-                                        const std::filesystem::path & insertion_file_path);
+                                        const std::filesystem::path & insertion_file_path,
+                                        const std::vector<uint8_t> methods);


If I understand correctly, this function implements all four detection methods but can be parametrized by the methods parameter to only use a subset of methods. This makes a lot of sense because the alignment file has to be processed only once (in the best case) and different methods can be applied on the same read in one go.

However, this architecture is not really modularized, right? To be really modularized we would have four detect_junctions functions that each implement one method. But then the alignment file would be processed four times in the worst case. So I guess that's why you decided to choose the other option?
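
The single-pass design described above can be sketched as follows. This is an illustration only, not the code from the pull request: `record`, `evidence_counts`, and `detect_in_one_pass` are hypothetical, and the checks inside the loop are placeholders for the real analyses. The method numbers follow the thread (1 = cigar_string, 2 = split_read).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, stripped-down record; the real code reads these from a SAM/BAM file.
struct record
{
    std::string cigar;
    bool has_supplementary_tag;
};

struct evidence_counts
{
    std::size_t from_cigar{0};
    std::size_t from_split_reads{0};
};

// Single pass over the records; each record is inspected once by every selected method.
inline evidence_counts detect_in_one_pass(std::vector<record> const & records,
                                          std::vector<std::uint8_t> const & methods)
{
    auto const selected = [&methods](std::uint8_t const m)
    {
        return std::find(methods.begin(), methods.end(), m) != methods.end();
    };
    bool const use_cigar = selected(1);
    bool const use_split = selected(2);

    evidence_counts counts{};
    for (record const & rec : records)
    {
        // Placeholder checks standing in for the real analyses.
        if (use_cigar && rec.cigar.find_first_of("ID") != std::string::npos)
            ++counts.from_cigar;
        if (use_split && rec.has_supplementary_tag)
            ++counts.from_split_reads;
    }
    return counts;
}
```

Four separate functions would each need their own loop over the file, which is the run time cost the comment refers to.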

Irallia · 2021-01-25

Yes, that is correct. Maybe "modularize" is not really the right term.
The idea is that we can combine different methods and compare them. Since we also want to compare run time, a complete decoupling would lead to a higher run time whenever we combine two methods.
So I would leave it as it is and replace the term "modularisation" with "decoupling"?

eldariont · 2021-01-27

I see. But I think it's not really a decoupling either. How about "Enable selection of methods for detecting junctions" as the new title for this pull request?

src/detect_breakends/junction_detection.cpp

test/api/junction_detection_test.cpp

Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>

joshuak94

Looks good! I'm still more of a fan of using flags instead of numbered methods (I think -c -s -d looks more descriptive/intuitive than -m 1 -m 2 -m 4), but it's largely a stylistic choice so both are fine!

eldariont

Looks all good now :)

[FEATURE] Modularization of the methods for detecting junctions.

f85c677

[TEST] Add new tests for different method combinations. Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>

Irallia self-assigned this Jan 20, 2021

Irallia requested a review from joshuak94 January 20, 2021 13:28

joshuak94 requested changes Jan 20, 2021

eldariont reviewed Jan 22, 2021

[FIX] Add missing break statement, which removes the duplicated results.

d93a8a1

Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>
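
The fixed code is not shown in this conversation, so the following is only a guess at the kind of bug the commit message describes: in a C++ switch, a case without a break falls through into the next case, so one method number can trigger two analyses and produce duplicated results. `run_methods` is a hypothetical function written for this illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical dispatch over method numbers; each case records which analysis ran.
inline std::vector<std::string> run_methods(std::vector<std::uint8_t> const & methods)
{
    std::vector<std::string> ran;
    for (std::uint8_t const method : methods)
    {
        switch (method)
        {
            case 1:
                ran.push_back("cigar_string");
                break; // without this break, case 2 would also run for method 1
            case 2:
                ran.push_back("split_read");
                break;
            default:
                break; // unknown numbers are ignored in this sketch
        }
    }
    return ran;
}
```

If the first break were missing, selecting methods 1 and 2 together would record the split read analysis twice.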

Irallia requested review from joshuak94 and eldariont January 25, 2021 12:56

joshuak94 approved these changes Jan 25, 2021

eldariont approved these changes Jan 27, 2021

Irallia merged commit f825e23 into seqan:master Jan 27, 2021

eldariont mentioned this pull request Feb 11, 2021

Method selection over CLI does not work properly #59

Closed
