Add a description of the toolkit to the README #34

clintval · 2024-03-13T17:29:02Z

Enhancing the README to better communicate how this toolkit works and should be used.

README.md

clintval · 2024-05-10T19:21:06Z

Make it very clear that coordinates in the output files are always 1-based inclusive, unless the output file is a BEDPE file (see Fix coordinate conversion for BEDPE files #40 (comment)).
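To make the distinction concrete, here is a minimal sketch of the conversion. It assumes the BEDPE convention documented by bedtools (0-based start, exclusive end, as in BED). The function names are hypothetical helpers, not part of fgsv.

```python
def one_based_closed_to_bedpe(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based, fully closed interval [start, end] to the
    0-based, half-open [start, end) form used by BED/BEDPE (assumed)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based inclusive interval: {start}-{end}")
    # Only the start shifts; the end coordinate is numerically the same.
    return start - 1, end


def bedpe_to_one_based_closed(start: int, end: int) -> tuple[int, int]:
    """Inverse conversion: non-empty 0-based half-open [start, end) to 1-based closed."""
    if start < 0 or end <= start:
        raise ValueError(f"invalid 0-based half-open interval: {start}-{end}")
    return start + 1, end
```

For example, a single base reported at position 100 in a 1-based inclusive file would be written as start 99, end 100 in a BEDPE file.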

clintval · 2024-05-10T20:12:43Z

@nh13 ready for a re-review when you have time! Thanks!


msto · 2024-05-10T20:39:28Z

README.md

+The tool [`fgsv SvPileup`](https://github.com/fulcrumgenomics/fgsv/blob/main/docs/tools/SvPileup.md) takes a query-grouped BAM file as input and scans through each template one at a time, where a template is the full collection of reads and alignments from a single source molecule.
+For example, a paired-end read may have an alignment per read: one alignment for read 1 and another alignment for read 2.
+
+Primary and supplementary alignments for a template (see the [SAM Format Specification v1](https://samtools.github.io/hts-specs/SAMv1.pdf) for more information) are used to construct a “chain” of aligned sub-segments in a way that honors the logical ordering of sub-segments and their strandedness in relation to the reference sequence.


introduce template here, or swap to queryname/group?
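As an illustration of what "template" means for a query-grouped BAM, the sketch below walks a stream of records one template at a time and orders each template's sub-segments. It is a simplified sketch, not fgsv's implementation: the `Alignment` record and its fields (notably `read_offset`, the offset of the aligned sub-segment within the read as sequenced) are hypothetical, and a real chain would also need to account for strand and mate orientation, which this ignores.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Alignment:
    """Hypothetical, simplified stand-in for one BAM record."""
    name: str            # query name shared by every record of a template
    read_number: int     # 1 for read 1, 2 for read 2
    read_offset: int     # offset of the aligned sub-segment within the read as sequenced
    contig: str
    position: int        # 1-based leftmost reference position
    strand: str          # "+" or "-"
    supplementary: bool = False


def templates(records: Iterable[Alignment]) -> Iterator[List[Alignment]]:
    """Yield one template at a time from a query-grouped stream.

    Query grouping only guarantees that records sharing a name are adjacent,
    so a single groupby pass is enough; the input is never re-sorted.
    """
    for _, group in groupby(records, key=lambda r: r.name):
        yield list(group)


def chain(template: List[Alignment]) -> List[Alignment]:
    """Order primary and supplementary sub-segments by read, then by offset in the read."""
    return sorted(template, key=lambda r: (r.read_number, r.read_offset))
```

Because grouping relies only on adjacency, memory use is bounded by the size of the largest template rather than the whole file.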

msto · 2024-05-10T20:48:55Z

README.md

+    --output sample.svpileup.aggregate.txt
+```
+
+Because of variability in typical short-read alignments, evidence for a single breakpoint may span a few loci near the true breakend loci. For example, if the breakpoint only has intra-read evidence, then the breakpoint could coincidentally occur within the unobserved bases between read 1 and read 2 in a pair. In other cases and due to sequence similarity or homology between each breakend locus, it is not always possible to locate the exact nucleotide point where the breakends occur, and instead a plausible region may exist that supports either breakend loci.


Suggested change

Because of variability in typical short-read alignments, evidence for a single breakpoint may span a few loci near the true breakend loci. For example, if the breakpoint only has intra-read evidence, then the breakpoint could coincidentally occur within the unobserved bases between read 1 and read 2 in a pair. In other cases and due to sequence similarity or homology between each breakend locus, it is not always possible to locate the exact nucleotide point where the breakends occur, and instead a plausible region may exist that supports either breakend loci.

Because of variability in typical short-read alignments, evidence for a single breakpoint may span a few loci near the true breakend loci. For example, if the breakpoint only has paired-end evidence, then the breakpoint could coincidentally occur within the unobserved bases between read 1 and read 2 in a pair. In other cases, sequence similarity or homology between the two breakend loci makes it impossible to pinpoint the exact nucleotide at which the breakends occur; instead, there is a region of equally plausible breakend positions.

or inter-read, but I find split-read and paired-end to be more familiar terms

I've read this a few times and don't quite understand what it's saying
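A small worked example of the "unobserved bases" point: for a read pair, the stretch of the source molecule that neither read sequenced has length insert size minus both read lengths. If a breakpoint falls there, the pair supports it but no read spans it, so its position is only known to within that window. The function is a hypothetical helper; in practice the per-template insert size is uncertain when the pair is split by a rearrangement, so this is only an illustration.

```python
def unsequenced_bases(insert_size: int, read1_length: int, read2_length: int) -> int:
    """Number of template bases between the two reads that no read observed.

    Returns 0 when the reads abut or overlap (insert_size <= combined read length).
    """
    if min(insert_size, read1_length, read2_length) <= 0:
        raise ValueError("lengths must be positive")
    return max(0, insert_size - read1_length - read2_length)
```

With a 400 bp insert and 2 x 150 bp reads, 100 bases are unobserved, whereas a split read localizes its breakpoint to a specific base (up to homology ambiguity).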

msto · 2024-05-10T20:50:31Z

README.md

+
+Because of variability in typical short-read alignments, evidence for a single breakpoint may span a few loci near the true breakend loci. For example, if the breakpoint only has intra-read evidence, then the breakpoint could coincidentally occur within the unobserved bases between read 1 and read 2 in a pair. In other cases and due to sequence similarity or homology between each breakend locus, it is not always possible to locate the exact nucleotide point where the breakends occur, and instead a plausible region may exist that supports either breakend loci.
+
+The tool [`fgsv AggregateSvPileup`](https://github.com/fulcrumgenomics/fgsv/blob/main/docs/tools/AggregateSvPileup.md) is used to coalesce nearby breakpoints into one event if they appear to belong to one true breakpoint.


Suggested change

The tool [`fgsv AggregateSvPileup`](https://github.com/fulcrumgenomics/fgsv/blob/main/docs/tools/AggregateSvPileup.md) is used to coalesce nearby breakpoints into one event if they appear to belong to one true breakpoint.

The tool [`fgsv AggregateSvPileup`](https://github.com/fulcrumgenomics/fgsv/blob/main/docs/tools/AggregateSvPileup.md) is used to coalesce pileups in proximity to each other into one event if they appear to support the same breakpoint.
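One way to coalesce pileups into events is single-linkage clustering with a union-find structure, sketched below. This is an assumed technique for illustration only; whether `fgsv AggregateSvPileup` merges transitively like this is not stated here. The `coalesce` function and its `same_event` predicate are hypothetical; each returned group corresponds to the constituent pileup IDs of one aggregate event.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def coalesce(pileups: Sequence[T], same_event: Callable[[T, T], bool]) -> List[List[int]]:
    """Group pileups by single linkage: any pair accepted by `same_event`
    lands in the same group, transitively. Returns lists of indices into
    `pileups`, ordered by each group's smallest index."""
    parent = list(range(len(pileups)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # O(n^2) pairwise comparisons; sorting by position first would avoid this.
    for i in range(len(pileups)):
        for j in range(i + 1, len(pileups)):
            if same_event(pileups[i], pileups[j]):
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(pileups)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Note the design consequence of single linkage: pileups at left positions 100, 103, and 106 form one event under a 5 bp threshold even though 100 and 106 are 6 bp apart.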


msto · 2024-05-10T20:52:19Z

README.md

+The tool [`fgsv AggregateSvPileup`](https://github.com/fulcrumgenomics/fgsv/blob/main/docs/tools/AggregateSvPileup.md) is used to coalesce nearby breakpoints into one event if they appear to belong to one true breakpoint.
+This polishing step preserves true positive breakpoint events and is intended to reduce the number of false positive breakpoint events.
+
+Adjacent breakpoints are only merged if their left breakends map to the same reference sequence, their right breakends map to the same reference sequence, the strandedness of the left and right aligned sub-segments is the same, and their left and right genomic breakend positions are both within a given length threshold.


Suggested change

Adjacent breakpoints are only merged if their left breakends map to the same reference sequence, their right breakends map to the same reference sequence, the strandedness of the left and right aligned sub-segments is the same, and their left and right genomic breakend positions are both within a given length threshold.

Adjacent breakpoints are only merged if their left breakends map to the same strand of the same reference chromosome (or contig), their right breakends map to the same strand of the same reference chromosome (or contig), and their left and right genomic breakend positions are both within a given length threshold.

Also, what is the threshold?
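The merge criteria in the suggested wording can be written as a single predicate. This is a sketch only: the `Breakpoint` fields are illustrative rather than fgsv's actual output columns, `max_dist` is a hypothetical name for the length threshold (its real name and default would come from the AggregateSvPileup docs), and treating the threshold as inclusive (`<=`) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical pileup record; field names are illustrative only."""
    left_contig: str
    left_pos: int
    left_strand: str
    right_contig: str
    right_pos: int
    right_strand: str


def can_merge(a: Breakpoint, b: Breakpoint, max_dist: int) -> bool:
    """True if both breakends agree on contig and strand and are within max_dist."""
    return (
        a.left_contig == b.left_contig
        and a.left_strand == b.left_strand
        and a.right_contig == b.right_contig
        and a.right_strand == b.right_strand
        and abs(a.left_pos - b.left_pos) <= max_dist   # inclusive threshold (assumed)
        and abs(a.right_pos - b.right_pos) <= max_dist
    )
```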

msto · 2024-05-10T20:54:17Z

README.md

+
+One shortcoming of the existing behavior, which should be corrected at some point, is that intra-read breakpoint evidence is considered similarly to inter-pair breakpoint evidence even though intra-read breakpoint evidence often has nucleotide-level alignment resolution and inter-pair breakpoint evidence does not.
+
+The output of this tool is a metrics file tabulating the coalesced breakpoints with all previous breakpoint IDs listed for the new breakpoint event and an estimation of the allele frequency of the event based on the alignments that support the breakpoint.


Suggested change

The output of this tool is a metrics file tabulating the coalesced breakpoints with all previous breakpoint IDs listed for the new breakpoint event and an estimation of the allele frequency of the event based on the alignments that support the breakpoint.

The output of this tool is a table of coalesced breakpoints. Each aggregate breakpoint is annotated with the constituent pileup IDs and an estimation of the allele frequency of the breakpoint based on the alignments that support the breakpoint.

How is allele frequency calculated? Especially without the original bam?
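For context on the question, a common generic estimator is the fraction of informative templates at the breakpoint that support the alternate junction. This is shown only as an illustration; it is not necessarily the formula `fgsv AggregateSvPileup` uses. Either way, the non-supporting count must come from somewhere (the original BAM, or per-breakend counts recorded in the pileup output), which is the crux of the question.

```python
def allele_frequency(supporting: int, non_supporting: int) -> float:
    """Fraction of informative templates that support the breakpoint."""
    if supporting < 0 or non_supporting < 0:
        raise ValueError("counts must be non-negative")
    total = supporting + non_supporting
    if total == 0:
        return float("nan")  # no informative templates, so AF is undefined
    return supporting / total
```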

Add a description of the toolkit to the README

798a5ac

clintval requested a review from nh13 March 13, 2024 17:29

clintval assigned nh13 Mar 13, 2024

Generate docs files

e3e3dc1

nh13 requested changes Mar 14, 2024


clintval mentioned this pull request May 10, 2024

Fix coordinate conversion for BEDPE files #40

Merged

clintval and others added 3 commits May 10, 2024 12:28

Merge remote-tracking branch 'origin/main' into cv_README

0ade11f

Generate docs files

531a122

Fix up README after a review

17e55e4

clintval force-pushed the cv_README branch from d36e72e to 17e55e4 May 10, 2024 20:09

clintval added 4 commits May 10, 2024 13:09

Fix up README after a review

983dbb7

Remove outdated intro in Overview

5a903a0

Fixup a sentence

a62b275

Whitespace

a8cc755

clintval requested a review from nh13 May 10, 2024 20:12

Nobody and others added 4 commits May 10, 2024 20:13

Generate docs files

9304a02

Generate docs files

475c3e9

Remove duplicate .gitignore line

c2ca29a

Generate docs files

0090746

msto requested changes May 10, 2024


clintval and others added 2 commits May 10, 2024 14:03

Small review fixups

7067311

Generate docs files

9a45075
