
Paired read tracks #1235

Merged: cmdcolin merged 168 commits into dev from paired_read_tracks on Dec 7, 2018

7 participants

cmdcolin (Contributor) commented Oct 16, 2018

I wanted to open a pull request for the paired read viewing functionality because it's kinda exciting :)

This tries to address the long-standing need to view paired-end reads as connected entities.

See #521 for the basic idea.

The code in @gmod/bam features a "read dispatching method" to grab connected pairs. Its output has been compared against the output of "bazam" (https://github.com/ssadedin/bazam/).
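For anyone new to the idea, the heart of pairing is grouping alignment records that share a read name (QNAME). Here is a minimal sketch of that grouping; the `AlignmentRecord` shape and the `pairByName` helper are hypothetical illustrations, not the @gmod/bam API or its read dispatching method. The FLAG bit values are the standard ones from the SAM spec.

```typescript
// Hypothetical minimal record shape; real BAM records carry many more fields.
interface AlignmentRecord {
  name: string;  // QNAME, shared by both mates of a pair
  start: number; // 0-based leftmost aligned position
  end: number;   // exclusive end of the alignment
  flags: number; // SAM FLAG bitfield
}

// SAM FLAG bits used below (values from the SAM specification).
const FLAG_PAIRED = 0x1;
const FLAG_SECONDARY = 0x100;
const FLAG_SUPPLEMENTARY = 0x800;

interface ReadPair {
  name: string;
  reads: AlignmentRecord[]; // a single record if the mate was not in the input
  start: number;            // leftmost start across the records
  end: number;              // rightmost end across the records
}

// Group primary alignments of paired reads by name into pair features.
function pairByName(records: AlignmentRecord[]): ReadPair[] {
  const byName = new Map<string, AlignmentRecord[]>();
  for (const r of records) {
    if (!(r.flags & FLAG_PAIRED)) continue; // not a paired-end read
    // Skip secondary/supplementary alignments so a name maps to at most two primaries.
    if (r.flags & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY)) continue;
    const group = byName.get(r.name);
    if (group) group.push(r);
    else byName.set(r.name, [r]);
  }
  const pairs: ReadPair[] = [];
  byName.forEach((reads, name) => {
    pairs.push({
      name,
      reads,
      start: Math.min(...reads.map(r => r.start)),
      end: Math.max(...reads.map(r => r.end)),
    });
  });
  return pairs;
}
```

Grouping by name only works once both mates are in hand; a mate that lies far outside the requested region still has to be fetched separately.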

Updated:

Possible TODO items

  • If the two reads of a pair overlap (e.g. with a short insert size), the display is a little confusing
  • Reads in the "view as read cloud" mode are not clickable. That would not be too hard to add, but the reads all overlap each other anyway

DONE

  • Use RadioMenuItems in track menu
  • Auto infer view as pairs from glyph type
  • Make it work for CRAM
  • If a read is split across two sides of a block in the block-based track, the block in the middle gets no information about that read, so the feature does not display properly between them

wafflebot (bot) added the in progress label Oct 16, 2018

cmdcolin force-pushed the paired_read_tracks branch 4 times, most recently from 8c5e865 to 033832a Oct 16, 2018

cmdcolin (Contributor) commented Oct 19, 2018

Here is a data directory that can be used to test large insert sizes:

large_inserts.tar.gz

The requests for paired-end reads depend heavily on what is in the "current view". Read pairs appear to be resolved well, but performance is possibly a little slower than I want, especially on a full-sized dataset; for some reason this smaller sample dataset performs better. Functionally, though, it is working pretty well.

[Screenshot, 2018-10-19]

cmdcolin (Contributor) commented Oct 22, 2018

I haven't evaluated this possibility, but a "push it to the limit" extension of this PR would be to render "linked reads", e.g. 10x Genomics data. This has not just "paired reads" but many reads with the same molecular identifier.

Example screenshot from https://www.biocompare.com/Bench-Tips/341771-Optimizing-your-Linked-Read-Genome-and-Exome-Analyses-4-Practical-Considerations/

Similar things can also happen with long reads that have linked supplementary alignments: https://www.pacb.com/blog/igv-3-improves-support-pacbio-long-reads/
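A rough sketch of the linked-read grouping idea, assuming each read exposes its optional SAM tags as a plain dictionary and that the molecular identifier lives in the `MI` tag (the SAM tag for a molecular identifier; whether a particular 10x BAM populates it this way is an assumption). `TaggedRead`, `Molecule`, and `groupByMolecule` are hypothetical names, not code from this PR.

```typescript
// Hypothetical read shape with optional SAM tags as a dictionary.
interface TaggedRead {
  name: string;
  start: number;
  end: number;
  tags: Record<string, string | number | undefined>;
}

interface Molecule {
  id: string;
  reads: TaggedRead[];
  start: number; // leftmost start of any read in the molecule
  end: number;   // rightmost end of any read in the molecule
}

// Group reads sharing a tag value (by default the SAM "MI" molecular identifier)
// into one "molecule" spanning all of its reads. Reads lacking the tag are skipped.
function groupByMolecule(reads: TaggedRead[], tag = "MI"): Molecule[] {
  const molecules = new Map<string, Molecule>();
  for (const read of reads) {
    const value = read.tags[tag];
    if (value === undefined) continue;
    const id = String(value);
    const m = molecules.get(id);
    if (m) {
      m.reads.push(read);
      m.start = Math.min(m.start, read.start);
      m.end = Math.max(m.end, read.end);
    } else {
      molecules.set(id, { id, reads: [read], start: read.start, end: read.end });
    }
  }
  // Sort by start so molecules can be laid out left to right.
  return Array.from(molecules.values()).sort((a, b) => a.start - b.start);
}
```

The same grouping would apply to long reads if the key were the read name shared by a primary alignment and its supplementary alignments.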

cmdcolin (Contributor) commented Oct 22, 2018

Random other thing: coloring alignments by tag can reveal some interesting information about haplotypes, if the tags exist: https://whatshap.readthedocs.io/en/latest/guide.html
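A possible shape for a color-by-tag option, as a minimal sketch: `makeTagColorer`, `DEFAULT_COLOR`, and the palette are hypothetical, and the idea assumes WhatsHap-style haplotagged data where the `HP` tag holds the haplotype number. Each distinct tag value gets a stable color in first-seen order, and reads without the tag fall back to gray.

```typescript
// Hypothetical tag-value type and colors; not JBrowse configuration.
type TagValue = string | number | undefined;

const DEFAULT_COLOR = "#bbbbbb";
const PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"];

// Returns a function mapping a tag value (e.g. read.tags["HP"]) to a color.
// Colors are assigned in first-seen order so each value keeps the same color.
function makeTagColorer(palette: string[] = PALETTE) {
  const assigned = new Map<string, string>();
  return (value: TagValue): string => {
    if (value === undefined) return DEFAULT_COLOR; // read lacks the tag
    const key = String(value);
    let color = assigned.get(key);
    if (color === undefined) {
      color = palette[assigned.size % palette.length];
      assigned.set(key, color);
    }
    return color;
  };
}
```

Keying on the string form means `HP:i:1` and a string-typed `"1"` share a color, which seems like the friendlier behavior for display.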

cmdcolin (Contributor) commented Oct 24, 2018

I did a couple of other things for this PR, including:

  • Added triangular ends to the alignments to indicate strand orientation...no more trying to remember whether red or blue means positive or negative
  • Added a menu item to toggle SNPCoverage, which is related to #1154
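The pointed ends can be computed as a five-point polygon per alignment. This is a hypothetical helper (`readOutline`), not the glyph code in this PR; it assumes pixel coordinates, a pointed end on the right for forward-strand reads and on the left for reverse-strand reads, and caps the tip length so very short reads don't invert.

```typescript
// Hypothetical: [x, y] pixel coordinates.
type Point = [number, number];

// Outline of an alignment box with a pointed end showing strand direction.
function readOutline(
  left: number, right: number, top: number, height: number,
  strand: 1 | -1, tip = 5,
): Point[] {
  const mid = top + height / 2;
  const bottom = top + height;
  // Never let the tip be longer than the box itself (zoomed-out or tiny reads).
  const t = Math.min(tip, right - left);
  if (strand === 1) {
    // Forward strand: point on the right.
    return [[left, top], [right - t, top], [right, mid], [right - t, bottom], [left, bottom]];
  }
  // Reverse strand: point on the left.
  return [[left + t, top], [right, top], [right, bottom], [left + t, bottom], [left, mid]];
}
```

The points can then be traced with the Canvas 2D `moveTo`/`lineTo` calls and filled.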

[Screenshot, 2018-10-24]

cmdcolin (Contributor) commented Oct 24, 2018

The aforementioned issue "If a read is split on two sides of a block" now has a solution: the store keeps track of features in a featureCache and fills in cached features that overlap each block request. It also clears features out of the featureCache once they stop overlapping the blocks. Testing with the large_inserts.tar.gz data is the best way to see this aspect of the code.
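A minimal sketch of the featureCache idea, assuming features have a unique `id` and half-open `[start, end)` coordinates. The `FeatureCache` class and its method names are hypothetical, not the store code in this PR. Each block request adds what it fetched, then asks the cache for everything overlapping the block (which catches pairs whose mates were fetched for other blocks), and `evict` drops features that no longer overlap any displayed block.

```typescript
// Hypothetical feature shape: a pair feature spans both mates.
interface Feature { id: string; start: number; end: number; }

class FeatureCache {
  private features = new Map<string, Feature>();

  // Remember features returned by any block request.
  add(features: Feature[]): void {
    for (const f of features) this.features.set(f.id, f);
  }

  // Features overlapping the half-open block [start, end), including ones
  // fetched for other blocks (e.g. a pair whose mates sit in two other blocks).
  overlapping(start: number, end: number): Feature[] {
    const out: Feature[] = [];
    this.features.forEach(f => {
      if (f.start < end && f.end > start) out.push(f);
    });
    return out;
  }

  // Drop features that no longer overlap any currently displayed block.
  // Deleting the current entry during Map.forEach is safe.
  evict(blocks: Array<[number, number]>): void {
    this.features.forEach((f, id) => {
      const stillVisible = blocks.some(([s, e]) => f.start < e && f.end > s);
      if (!stillVisible) this.features.delete(id);
    });
  }

  get size(): number {
    return this.features.size;
  }
}
```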

cmdcolin (Contributor) commented Oct 24, 2018

@rbuels, do you have any input on CRAM support? The basic idea of the retrofit added to the BAM adapters is to fetch the reads in a region and then fetch all of their mates when getRecordsForRange is called with the viewAsPairs flag enabled. I know a CRAM implementation of this might be two-faceted, because reads within segments can be resolved automatically.
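The BAM-side flow described here can be sketched roughly as follows. This is an illustration under assumptions: `Rec`, `FetchFn`, and the synchronous fetch callback are hypothetical (real adapters are asynchronous), coordinates are 0-based with `mateStart` holding PNEXT converted to 0-based, and each missing mate is looked up with a 1 bp query at its PNEXT.

```typescript
// Hypothetical record with mate coordinates (SAM RNEXT/PNEXT).
interface Rec {
  name: string;
  ref: string;
  start: number;
  end: number;
  mateRef: string;   // RNEXT
  mateStart: number; // PNEXT, assumed 0-based
}

// Hypothetical fetcher: all records overlapping [start, end) on ref.
type FetchFn = (ref: string, start: number, end: number) => Rec[];

function getRecordsForRange(
  fetch: FetchFn, ref: string, start: number, end: number, viewAsPairs: boolean,
): Rec[] {
  const records = fetch(ref, start, end);
  if (!viewAsPairs) return records;

  // Count how many records of each read name are already in hand.
  const counts = new Map<string, number>();
  for (const r of records) counts.set(r.name, (counts.get(r.name) ?? 0) + 1);

  const mates: Rec[] = [];
  for (const r of records) {
    if (counts.get(r.name) !== 1) continue; // mate already fetched
    // Query a 1 bp window at the mate's start and keep the record with the same name.
    const hits = fetch(r.mateRef, r.mateStart, r.mateStart + 1);
    const mate = hits.find(h => h.name === r.name && h.start === r.mateStart);
    if (mate) mates.push(mate);
  }
  return records.concat(mates);
}
```

One request per missing mate is the simplest version; batching nearby mate positions into fewer range requests would likely matter on large datasets.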

rbuels (Collaborator) commented Oct 26, 2018

These menu items need to be disabled for all the stores that don't support viewAsPairs.

We should make an attribute on all stores that indicates whether view as pairs is supported, and have the track check that.
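One way the suggested attribute could look, as a minimal sketch: `supportsViewAsPairs`, `FeatureStore`, and `pairedMenuItems` are hypothetical names, and the two store classes stand in for stores that can and cannot resolve mates.

```typescript
// Hypothetical capability flag declared by every store.
interface FeatureStore {
  readonly supportsViewAsPairs: boolean;
}

interface MenuItem {
  label: string;
  disabled: boolean;
}

// Stand-ins for a store that can resolve mates and one that cannot.
class BamLikeStore implements FeatureStore { readonly supportsViewAsPairs = true; }
class GffLikeStore implements FeatureStore { readonly supportsViewAsPairs = false; }

// The track builds its menu from the store's declared capability
// instead of assuming every store can resolve mates.
function pairedMenuItems(store: FeatureStore): MenuItem[] {
  const disabled = !store.supportsViewAsPairs;
  return [
    { label: "View as pairs", disabled },
    { label: "View as read cloud", disabled },
  ];
}
```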

(Resolved review thread on src/JBrowse/Store/SeqFeature/BAM.js, outdated)
(Resolved review thread on src/JBrowse/Store/SeqFeature/BAM.js)
rbuels (Collaborator) commented Oct 26, 2018

Should probably also rename "scaleFactor" to something more descriptive.

rbuels (Collaborator) commented Oct 26, 2018

Probably worth implementing a dialog box and an auto-estimation for scaleFactor as well.
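A hedged sketch of what an auto-estimate could look like. It assumes (the thread doesn't say) that scaleFactor converts an absolute insert size in bp into a vertical pixel offset in the read cloud view; under that assumption, dividing the track height by a high percentile of observed insert sizes keeps most pairs on screen while outliers clip. `estimateScaleFactor` and the 0.95 default are hypothetical choices.

```typescript
// Hypothetical: assumes y offset (px) = |insert size| * scaleFactor.
function estimateScaleFactor(
  insertSizes: number[], trackHeight: number, percentile = 0.95,
): number {
  // Use absolute sizes (TLEN is signed) and ignore zeros (size unknown).
  const sizes = insertSizes.map(Math.abs).filter(s => s > 0).sort((a, b) => a - b);
  if (sizes.length === 0) return 1; // nothing to base an estimate on
  const idx = Math.min(sizes.length - 1, Math.floor(percentile * sizes.length));
  // Insert size at the chosen percentile maps to the full track height.
  return trackHeight / sizes[idx];
}
```

Using a percentile rather than the maximum keeps one chimeric pair from flattening the whole view.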

rbuels added this to the 1.15.5 milestone Oct 26, 2018

cmdcolin force-pushed the paired_read_tracks branch 3 times, most recently from 6b0ec6a to 81b62c3 Oct 27, 2018

cmdcolin (Contributor) commented Nov 1, 2018

During the dev meeting I mentioned that very large data requests (e.g. 100MB) were being launched at startup, and this was due to the funkiness related to #1187. It may be worth considering merging this PR to mitigate the issue, because without #1187 a bunch of requests are made to regions of data that end up unused. This actually happens for all tracks, but it hits the paired read resolvers unduly hard. It also shows that large region requests can sometimes incur unreasonably large downloads for paired reads, which might need mitigation in other ways; it is just especially noticeable because of #1187.

cmdcolin force-pushed the paired_read_tracks branch from 73c6753 to 27f8c25 Nov 2, 2018

cmdcolin (Contributor) commented Nov 2, 2018

Added the essentials for CRAM support on this branch!

cmdcolin force-pushed the paired_read_tracks branch from 92cb81f to 7b8cb40 Nov 6, 2018

cmdcolin (Contributor) commented Nov 8, 2018

TODO-type things:

Large insert sizes can cause visual glitches, especially when you are scrolling and encounter a feature whose mate is suddenly on the other side of where you were: the old blocks that rendered nothing don't get updated to fill in.

Some approaches:

  • Filter out larger insert sizes, and then this basically will not happen (even at high zoom levels it shouldn't happen, because large enough regions are requested)
  • Make some mechanism to re-render all blocks if this situation occurs
  • Make some other glyph, combined with the normal glyphs, that indicates the crazy pairing, e.g. a vertical line or arc
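The first approach (filtering large inserts) is simple to express with the SAM TLEN field. A minimal sketch with a hypothetical `filterLargeInserts` helper and an arbitrary threshold; note that TLEN is 0 when mates map to different references, so such reads pass this particular filter.

```typescript
// Hypothetical shape: tlen is the signed SAM TLEN (template length) field.
interface PairedRead { name: string; tlen: number; }

// Keep only reads whose |TLEN| is within maxInsert, so no displayed pair
// spans farther than a normal region request would cover.
// TLEN = 0 (unknown, e.g. mates on different references) is kept as-is.
function filterLargeInserts<T extends PairedRead>(reads: T[], maxInsert: number): T[] {
  return reads.filter(r => Math.abs(r.tlen) <= maxInsert);
}
```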

Another possible TODO is an optimization to skip resolving read pairs in getFeatures for ultra-large inserts, since this generally just requests a lot of data and doesn't actually seem necessary. Instead, just create a feature span that uses PNEXT (the position of the mate) rather than fetching the actual mate read at PNEXT.
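A sketch of the PNEXT shortcut, assuming 0-based coordinates with `mateStart` holding PNEXT and `mateRef` holding RNEXT. The mate's end isn't known without fetching it, so this hypothetical `approximatePairSpan` assumes the mate has the same aligned length as the read; pairs on different references get no span.

```typescript
// Hypothetical read shape carrying its mate's coordinates.
interface ReadWithMate {
  ref: string;
  start: number;     // 0-based start of this read
  end: number;       // exclusive end of this read
  mateRef: string;   // RNEXT
  mateStart: number; // PNEXT, assumed converted to 0-based
}

// Approximate the pair's extent without fetching the mate.
function approximatePairSpan(r: ReadWithMate): { start: number; end: number } | null {
  if (r.mateRef !== r.ref) return null; // a single span can't cross references
  // Assumption: the mate's aligned length equals this read's aligned length.
  const mateEnd = r.mateStart + (r.end - r.start);
  return { start: Math.min(r.start, r.mateStart), end: Math.max(r.end, mateEnd) };
}
```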

cmdcolin force-pushed the paired_read_tracks branch 2 times, most recently from 7163a00 to 10f14bd Nov 8, 2018

cmdcolin force-pushed the paired_read_tracks branch from d51193f to 66e7d36 Nov 15, 2018

rbuels (Collaborator) commented Nov 16, 2018

We need checkboxes or radio buttons in the "Track visualization types" menu to see what visualization you are currently using:
[Screenshot]
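A minimal model of the requested radio behavior, assuming three visualization modes; the `VisualizationType` values, the labels, and the `visualizationMenu` helper are hypothetical and not the menu code in this PR. Exactly one item is checked, and it always reflects the track's current mode.

```typescript
// Hypothetical visualization modes for an alignments track.
type VisualizationType = "normal" | "pairs" | "readCloud";

interface RadioItem {
  label: string;
  value: VisualizationType;
  checked: boolean;
}

// Hypothetical labels; insertion order defines menu order.
const LABELS: Record<VisualizationType, string> = {
  normal: "Normal",
  pairs: "View as pairs",
  readCloud: "View as read cloud",
};

// Build the radio items so only the current mode is checked.
function visualizationMenu(current: VisualizationType): RadioItem[] {
  return (Object.keys(LABELS) as VisualizationType[]).map(value => ({
    label: LABELS[value],
    value,
    checked: value === current,
  }));
}
```

Deriving `checked` from the track's state each time the menu is built, rather than storing it on the items, keeps the menu from drifting out of sync with the actual mode.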

AndyMenzies commented Dec 6, 2018

I agree with @keiranmraine: changing the colours to be different from the standard palette really highlights the overlaps and read discrepancies. This will be really useful for variant chasing.

cmdcolin (Contributor) commented Dec 6, 2018

I bumped the @gmod/cram version on this branch; it gives us the ability to use lossy read names and fixes #1271.

cmdcolin force-pushed the paired_read_tracks branch 2 times, most recently from 1f2c0ea to 04cc891 Dec 7, 2018


rbuels approved these changes Dec 7, 2018

(Resolved review thread on src/JBrowse/Store/SeqFeature/CRAM.js)

cmdcolin force-pushed the paired_read_tracks branch from cc2b8d1 to f5eea30 Dec 7, 2018

rbuels (Collaborator) commented Dec 7, 2018

wooo you may merge when ready ;-)

cmdcolin force-pushed the paired_read_tracks branch 6 times, most recently from 6583328 to dda2cb3 Dec 7, 2018

cmdcolin force-pushed the paired_read_tracks branch from dda2cb3 to 5fc4420 Dec 7, 2018

cmdcolin (Contributor) commented Dec 7, 2018

Alrighty, I think this is ready to merge after today's discussion! I am going to hit the merge button, and then we can move towards the 1.16.0 release after some other milestone issues are closed!

cmdcolin merged commit af3cf8d into dev Dec 7, 2018

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
continuous-integration/travis-ci/push: The Travis CI build passed

wafflebot (bot) removed the in progress label Dec 7, 2018

cmdcolin deleted the paired_read_tracks branch Dec 12, 2018
