[FEATURE] Modularisation clustering #49
Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>
Some small stuff.
Plus, you could use an enum for the clustering methods, if you want. :)
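A minimal sketch of what such an enum could look like, assuming the parser hands over the numeric value of the `-c` option. Only `simple_clustering` (0) is named in this PR; the other three enumerator names and the helper `to_clustering_method` are hypothetical placeholders, not identifiers from the code base.

```cpp
#include <cstdint>
#include <optional>

// Scoped enum for the clustering methods selectable on the command line.
// Only value 0 (simple_clustering) is taken from the PR; the remaining
// names are placeholders for the methods that are not implemented yet.
enum class clustering_method : std::uint8_t
{
    simple_clustering = 0,
    placeholder_1 = 1, // hypothetical
    placeholder_2 = 2, // hypothetical
    placeholder_3 = 3  // hypothetical
};

// Converts the raw option value into the enum and rejects values
// outside the known range instead of casting blindly.
inline std::optional<clustering_method> to_clustering_method(int value)
{
    if (value < 0 || value > 3)
        return std::nullopt;
    return static_cast<clustering_method>(value);
}
```

With a scoped enum, a `switch` over the method gets compiler warnings for unhandled cases, which a plain integer does not give you.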
// Reference\tm2257/8161/CCS\t41972616\tForward\tRead \t0\t2294\tForward\tchr21
// INS from Primary Read - Sequence Type: Reference; Sequence Name: m2257/8161/CCS; Position: 41972616; Orientation: Reverse
// Sequence Type: Read; Sequence Name: 0; Position: 3975; Orientation: Reverse
// Chromosome: chr21
// Reference\tchr22\t17458417\tForward\tReference\tchr21\t41972615\tForward\tm41327/11677/CCS
// BND from SA Tag - Sequence Type: Reference; Chromosome: chr22; Position: 17458417; Orientation: Forward
// Sequence Type: Reference; Chromosome: chr21; Position: 41972615; Orientation: Forward
// Sequence Name: m41327/11677/CCS
Why is this here?
I thought I'd add an explanation of the string as I stumbled over it and was confused as to what the parts of the string were.
Ok, makes sense :)
I'm confused by this string:
// Reference\tm2257/8161/CCS\t41972616\tForward\tRead \t0\t2294\tForward\tchr21
It should consist of breakend1, breakend2 and the read name it was detected from. However, the read name here is chr21 and the reference chromosome of breakend1 is m2257/8161/CCS\t41972616. Those are swapped but I don't immediately see why 😕
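To make the field positions easier to inspect, here is a small sketch that splits such a line on tab characters. The helper name `split_tabs` is hypothetical, and the expected layout (four fields per breakend followed by the read name) is taken from the comment above, not verified against the code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a junction line into its tab-separated fields.
// Expected layout according to the review comment:
//   [0..3] breakend1: sequence type, sequence name, position, orientation
//   [4..7] breakend2: sequence type, sequence name, position, orientation
//   [8]    name of the read the junction was detected from
inline std::vector<std::string> split_tabs(std::string const & line)
{
    std::vector<std::string> fields;
    std::istringstream stream{line};
    std::string field;
    while (std::getline(stream, field, '\t'))
        fields.push_back(field);
    return fields;
}
```

For the string in question, index 1 holds m2257/8161/CCS and index 8 holds chr21, which is the swap described above.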
LGTM :)
The addition of cluster method to the parser looks good.
However, I think that the implementation of the simple clustering method (including the corresponding changes to the junction class) could be improved. Previously, the junction class represented a single junction detected from a single read. Now, it is also used to represent clusters of junctions. IMO, we should represent clusters of junctions with a separate class or a std::vector. Then, we could keep all the cluster-related information, such as the number of supporting reads, the read names, etc., separate from the information on a single junction.
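A rough sketch of that separation, assuming simplified stand-ins for the real classes. The types `Breakend`, `Junction` and `Cluster` and all of their members are hypothetical here and only illustrate the idea of wrapping a std::vector of junctions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the real types (hypothetical members).
struct Breakend
{
    std::string seq_name;
    std::int32_t position;
    bool forward;
};

// One junction detected from one read; carries no cluster information.
struct Junction
{
    Breakend mate1;
    Breakend mate2;
    std::string read_name;
};

// A cluster owns its member junctions; cluster-level facts are derived
// from the members instead of being stored inside Junction.
class Cluster
{
public:
    explicit Cluster(std::vector<Junction> members) : members_{std::move(members)}
    {
        if (members_.empty())
            throw std::invalid_argument{"a cluster needs at least one junction"};
    }

    // Number of supporting reads equals the number of member junctions.
    std::size_t support() const { return members_.size(); }

    // Names of all reads supporting this cluster, in member order.
    std::vector<std::string> read_names() const
    {
        std::vector<std::string> names;
        for (Junction const & junction : members_)
            names.push_back(junction.read_name);
        return names;
    }

    // The first member serves as the representative junction in this sketch.
    Junction const & representative() const { return members_.front(); }

private:
    std::vector<Junction> members_;
};
```

Deriving the support count from the member list means it cannot drift out of sync with the read names, at the cost of keeping every member junction in memory.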
@@ -16,6 +17,10 @@
 * 2: split_read,
 * 3: read_pairs,
 * 4: read_depth)
 * \param clustering_method list of Methods for clustering junctions (0: simple_clustering
- * \param clustering_method list of Methods for clustering junctions (0: simple_clustering
+ * \param clustering_method method for clustering junctions (0: simple_clustering
Yes, that's definitely a good idea. You said that you worked on a cluster method. My idea with this simple method was to create a placeholder for a real one. I will open a new issue for decoupling junctions and clusters of junctions. Otherwise it would blow up this PR, I think.
Cool, thanks for creating the two new issues. I will have a look. As far as I'm concerned, this PR can be merged now :)
0328007 to 318628b
[DOC] Correct path to binary
Resolves #41.
I recommend reviewing commit-wise.
What have I done:
First I added the new parser option -c with 4 different clustering methods (none of them implemented at that point).
Then I implemented a simple clustering method as an example module. This method compares junctions, and if two are equal, one of them is removed from the junction list.
This commit just added some more debug messages.
As some junctions are removed, I have included a counter for how many reads support a junction.
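The simple clustering described above could be sketched roughly as follows. This is not the code from this PR: the `Junction` fields, the `key` helper and the choice to sort before merging are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Simplified junction (hypothetical fields) with a support counter.
struct Junction
{
    std::string chrom1;
    std::int32_t pos1;
    std::string chrom2;
    std::int32_t pos2;
    std::size_t supporting_reads{1}; // each detected junction starts with one read
};

// Two junctions count as equal when both breakends match exactly.
inline auto key(Junction const & j)
{
    return std::tie(j.chrom1, j.pos1, j.chrom2, j.pos2);
}

// Sorts the junctions, then collapses runs of equal junctions into one
// entry whose counter is the sum of the merged entries' counters.
inline void simple_clustering(std::vector<Junction> & junctions)
{
    std::sort(junctions.begin(), junctions.end(),
              [](Junction const & a, Junction const & b) { return key(a) < key(b); });

    std::vector<Junction> merged;
    for (Junction const & j : junctions)
    {
        if (!merged.empty() && key(merged.back()) == key(j))
            merged.back().supporting_reads += j.supporting_reads;
        else
            merged.push_back(j);
    }
    junctions = std::move(merged);
}
```

Sorting first puts equal junctions next to each other, so a single pass is enough to remove duplicates and accumulate the counters.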