Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Modularisation clustering #49

Merged
merged 5 commits into from
Feb 8, 2021

Conversation

Irallia
Copy link
Collaborator

@Irallia Irallia commented Feb 4, 2021

Resolves #41.
I recommend reviewing commit wise.

What have I done:

  • [FEATURE] Add parser option for clustering
    First I added the new parser option -c with 4 different clustering methods. (none is implemented)
  • [FEATURE] Implement simple clustering method.
    Then I implemented a simple clustering method as an example module. This method compares junctions, and if they are equal, one will be removed from the junction list.
  • [MISC] Add 'not implemented' messages.
    This commit just added some more debug messages.
  • [FEATURE] Add a 'supporting reads' count.
    As some junctions are removed, I have included a counter for how many reads support a junction.

@Irallia Irallia requested a review from a team February 4, 2021 13:47
@Irallia Irallia self-assigned this Feb 4, 2021
@Irallia Irallia requested review from MitraDarja and removed request for a team February 4, 2021 13:47
Copy link
Contributor

@MitraDarja MitraDarja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small stuff.
Plus, you could use an enum for the clustering methods, if you want. :)

src/detect_breakends.cpp Outdated Show resolved Hide resolved
include/junction.hpp Outdated Show resolved Hide resolved
src/detect_breakends/junction_detection.cpp Outdated Show resolved Hide resolved
src/detect_breakends/junction_detection.cpp Outdated Show resolved Hide resolved
Comment on lines +17 to +25
// Reference\tm2257/8161/CCS\t41972616\tForward\tRead \t0\t2294\tForward\tchr21
// INS from Primary Read - Sequence Type: Reference; Sequence Name: m2257/8161/CCS; Position: 41972616; Orientation: Reverse
// Sequence Type: Read; Sequence Name: 0; Position: 3975; Orientation: Reverse
// Chromosome: chr21
// Reference\tchr22\t17458417\tForward\tReference\tchr21\t41972615\tForward\tm41327/11677/CCS
// BND from SA Tag - Sequence Type: Reference; Chromosome: chr22; Position: 17458417; Orientation: Forward
// Sequence Type: Reference; Chromosome: chr21; Position: 41972615; Orientation: Forward
// Sequence Name: m41327/11677/CCS

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I'd add an explanation of the string as I stumbled over it and was confused as to what the parts of the string were.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this string:

// Reference\tm2257/8161/CCS\t41972616\tForward\tRead \t0\t2294\tForward\tchr21

It should consist of breakend1, breakend2 and the read name it was detected from. However, the read name here is chr21 and the reference chromosome of breakend1 is m2257/8161/CCS\t41972616. Those are swapped but I don't immediately see why 😕

test/cli/detect_breakends_cli_test.cpp Show resolved Hide resolved
src/detect_breakends/junction_detection.cpp Outdated Show resolved Hide resolved
test/cli/detect_breakends_cli_test.cpp Outdated Show resolved Hide resolved
include/junction.hpp Outdated Show resolved Hide resolved
Copy link
Contributor

@MitraDarja MitraDarja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

Comment on lines +17 to +25
// Reference\tm2257/8161/CCS\t41972616\tForward\tRead \t0\t2294\tForward\tchr21
// INS from Primary Read - Sequence Type: Reference; Sequence Name: m2257/8161/CCS; Position: 41972616; Orientation: Reverse
// Sequence Type: Read; Sequence Name: 0; Position: 3975; Orientation: Reverse
// Chromosome: chr21
// Reference\tchr22\t17458417\tForward\tReference\tchr21\t41972615\tForward\tm41327/11677/CCS
// BND from SA Tag - Sequence Type: Reference; Chromosome: chr22; Position: 17458417; Orientation: Forward
// Sequence Type: Reference; Chromosome: chr21; Position: 41972615; Orientation: Forward
// Sequence Name: m41327/11677/CCS

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense :)

Copy link
Collaborator

@eldariont eldariont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of cluster method to the parser looks good.

However, I think that the implementation of the simple clustering method (including the corresponding changes to the junction class) could be improved. Proviously, the junction class represented a single junction detected from a single read. Now, it is also used to represent clusters of junctions. IMO, we should represent clusters of junctions with a separate class or a std::vector. Then, we could keep all the cluster-related information, such as the number of supporting reads, the read names, etc. separated from the information on a single junction.

@@ -16,6 +17,10 @@
* 2: split_read,
* 3: read_pairs,
* 4: read_depth)
* \param clustering_method list of Methods for clustering junctions (0: simple_clustering
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* \param clustering_method list of Methods for clustering junctions (0: simple_clustering
* \param clustering_method method for clustering junctions (0: simple_clustering

include/junction.hpp Outdated Show resolved Hide resolved
Comment on lines +17 to +25
// Reference\tm2257/8161/CCS\t41972616\tForward\tRead \t0\t2294\tForward\tchr21
// INS from Primary Read - Sequence Type: Reference; Sequence Name: m2257/8161/CCS; Position: 41972616; Orientation: Reverse
// Sequence Type: Read; Sequence Name: 0; Position: 3975; Orientation: Reverse
// Chromosome: chr21
// Reference\tchr22\t17458417\tForward\tReference\tchr21\t41972615\tForward\tm41327/11677/CCS
// BND from SA Tag - Sequence Type: Reference; Chromosome: chr22; Position: 17458417; Orientation: Forward
// Sequence Type: Reference; Chromosome: chr21; Position: 41972615; Orientation: Forward
// Sequence Name: m41327/11677/CCS

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this string:

// Reference\tm2257/8161/CCS\t41972616\tForward\tRead \t0\t2294\tForward\tchr21

It should consist of breakend1, breakend2 and the read name it was detected from. However, the read name here is chr21 and the reference chromosome of breakend1 is m2257/8161/CCS\t41972616. Those are swapped but I don't immediately see why 😕

test/api/junction_detection_test.cpp Show resolved Hide resolved
@Irallia Irallia changed the title Feature/modularisation clustering [FEATUrE] Modularisation clustering Feb 8, 2021
@Irallia Irallia changed the title [FEATUrE] Modularisation clustering [FEATURE] Modularisation clustering Feb 8, 2021
@Irallia
Copy link
Collaborator Author

Irallia commented Feb 8, 2021

The addition of cluster method to the parser looks good.

However, I think that the implementation of the simple clustering method (including the corresponding changes to the junction class) could be improved. Proviously, the junction class represented a single junction detected from a single read. Now, it is also used to represent clusters of junctions. IMO, we should represent clusters of junctions with a separate class or a std::vector. Then, we could keep all the cluster-related information, such as the number of supporting reads, the read names, etc. separated from the information on a single junction.

Yes, thats definetly a good idea. You said, that you worked on a cluster method. My idea with this simple method was to create a placeholder for a real one. I will write a new issue for decouple junctions and cluster of junctions. Otherwise it would blow up this PR I think.
-> #51

Copy link
Collaborator

@eldariont eldariont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks for creating the two new issues. I will have a look. As far as I'm concerned, this PR can be merged now :)

Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>
Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>
Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>
@Irallia Irallia force-pushed the FEATURE/modularisation_clustering branch from 0328007 to 318628b Compare February 8, 2021 17:23
@Irallia Irallia merged commit 9dbeb6a into seqan:master Feb 8, 2021
Irallia pushed a commit to Irallia/iGenVar that referenced this pull request Sep 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Modularize methods for clustering junctions
3 participants