Improved support for PacBio reads #277
I don't understand (8). What do you mean by "operations are performed per block"? Could you elaborate with CPU profiling data? The solution we applied earlier to this problem was to set a filter that ignores indels smaller than some size, combining the adjacent blocks. I think the CPU cost is in the drawing operations: it doesn't matter whether you loop through 40,000 blocks or 400 blocks; if you draw the same elements, you will incur the same cost. At least that is my recollection of previous profiling. Some hard data is needed here, and anywhere performance is being discussed.
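The filter described above (treat indels shorter than a threshold as absent and merge the flanking blocks) can be sketched as follows. The `Block` class and `mergeSmallIndels` method are hypothetical and not IGV's alignment model; the sketch only shows the merge logic. Whether it saves time depends on how many draw calls each block costs, which is what profiling would have to show.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockMerger {

    /** Hypothetical aligned block: reference interval [start, end) plus read bases inserted just before it. */
    static final class Block {
        final int start;
        final int end;
        final int insertedBefore;

        Block(int start, int end, int insertedBefore) {
            this.start = start;
            this.end = end;
            this.insertedBefore = insertedBefore;
        }

        @Override
        public String toString() {
            return start + "-" + end;
        }
    }

    /** Merges consecutive blocks whose separating deletion and insertion are both shorter than minIndelSize. */
    static List<Block> mergeSmallIndels(List<Block> blocks, int minIndelSize) {
        List<Block> merged = new ArrayList<>();
        for (Block b : blocks) {
            if (!merged.isEmpty()) {
                Block prev = merged.get(merged.size() - 1);
                int deletion = b.start - prev.end;   // reference bases skipped between the blocks
                int insertion = b.insertedBefore;    // read bases with no reference position
                if (deletion < minIndelSize && insertion < minIndelSize) {
                    // Absorb the small indel: extend the previous block over this one.
                    merged.set(merged.size() - 1, new Block(prev.start, b.end, prev.insertedBefore));
                    continue;
                }
            }
            merged.add(b);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block(100, 150, 0));
        blocks.add(new Block(152, 200, 0)); // 2 bp deletion: hidden
        blocks.add(new Block(200, 260, 1)); // 1 bp insertion: hidden
        blocks.add(new Block(300, 350, 0)); // 40 bp deletion: kept
        System.out.println(mergeSmallIndels(blocks, 5)); // [100-260, 300-350]
    }
}
```

A threshold of 1 disables merging entirely, so the same code path can serve both the "show all indels" and "hide small indels" settings.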
Which of these do you have solutions for in your personal fork?
Hi @jrobinso, for (9), you can see some code I modified from the BLAT code here: https://github.com/pb-jchin/igv/blob/ExtView/src/org/broad/igv/util/extview/ExtendViewClient.java. Ideally, if there were a generic mini-language to send the metadata/data of the current view, the selected read, and the selected features, we could pass that data to an external viewer that fetches extra information (possibly from some other database backend) for visualization, without modifying the source code. If possible, IGV could take the HTTP response and display it (SVG or PNG data, etc.). Alternatively, the server could return a URL and IGV could open a web browser pointing to it; that would be great too. I have some examples. If it is useful, I can make a screencast to show one.
@jrobinso I also use IGV regularly to analyze PacBio raw reads, and would greatly appreciate the suggested changes, particularly (1) and (2).
@pb-jchin @amwenger Do you guys have coded solutions for (1) and (2)? If not, I'm going to proceed with my own. In general I'd like to get your contributions merged within the next few weeks; I'm planning to do some restructuring and simplification of the Alignment model, and merging later might be difficult.
I do have solutions to 1 and 2 in
@pb-jchin With respect to (9), I can work with the code you have above to add a "Send read to URL" function; however, I think we should pin down what can be returned from the POST a little more tightly. I suggest we defer this one a bit and concentrate on some of the others, as it's easy to add this at any time. My time is really limited, and with 9 items we need to prioritize.
@cwhelan if you guys have any input on the PacBio improvements (see items 1-9 above), chime in.
@jrobinso Yes, for (9), if one wants to be more general about the communication between IGV and external toolsets, the design does need some thought. Here is what I think:
For 1. and 2.i or 2.ii, this should be easy. It is the same as the BLAT request, and we don't even need to do more complicated parsing of the returned information. For 2.iii and 3., yes, because of their complexity, they should have lower priority compared to the others.
Actually there is a lot of complex parsing of the server response from a BLAT request. The response to the POST has to be handled, otherwise nothing will happen. Do you have a working example in one of your branches?
@jrobinso, yes. I mean you do need to parse the BLAT output. The easier thing to do is not to display the information inside IGV, so IGV does not need to parse the returned info. The attached screenshots show how I use it now. This will enable many related applications that need more extensive backend database support, with IGV as the front end for navigation.
So in this example the server is using some push technology, and IGV does not need to look at the response? How do you let IGV know the URL and the syntax of the POST body? The URL would probably be recorded in the prefs.properties file. Do you want to prepare a pull request for this one? We can pull it in and continue to refine it.
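One possible shape for the "Send read to URL" request, assuming a plain form-encoded POST. The endpoint, the field names (`locus`, `readName`) and the example read name are hypothetical placeholders, not an agreed protocol. Only the status code is read back, which matches the "push" example where the external server updates its own page and IGV ignores the response body.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SendReadToUrl {

    /** Encodes fields as application/x-www-form-urlencoded, preserving insertion order. */
    static String formEncode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
        return sb.toString();
    }

    /** POSTs the body to a (hypothetical) external viewer endpoint and returns the HTTP status code. */
    static int post(String endpoint, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int code = conn.getResponseCode();
        conn.disconnect();
        return code;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("locus", "chr7:114319594-114323597");
        fields.put("readName", "m123/456/0_9000"); // hypothetical read name
        System.out.println(formEncode(fields));
        // locus=chr7%3A114319594-114323597&readName=m123%2F456%2F0_9000
    }
}
```

A form body keeps the server side trivial; if the request later carries nested view state, a JSON body would be the natural counterpart to the structured JSON response discussed in this thread.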
BTW, ultimately I think we should support a structured response, probably JSON, that can encode either 2i, 2ii, or 2iii. The JSON could also encode any error or user messages the server wants to send. If the response is empty or not recognized, IGV would do nothing, as in your "push" example.
The URL of the HTTP server is hard-coded in the example for now. Let me spend some time understanding how to pull information from prefs.properties, and I will submit a cleaner PR after that. It will probably take about a week. Yes, I use a WebSocket to push updates from the local server to the web page. I think JSON is great for the response. Maybe something like this minimal return
You can leave it hard-coded for the PR; I will clean that up. If you (1) add a key constant in PreferenceManager, (2) add your property to prefs.properties in the form key=value, and (3) access the property with
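The key-constant plus properties-file steps can be imitated with plain `java.util.Properties`. This is a sketch only: the key `EXT_VIEW_URL` and the default URL are hypothetical, and IGV's own `PreferenceManager` accessors are not reproduced here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ExtViewPrefs {

    // Hypothetical key name and default; not an existing IGV preference.
    static final String EXT_VIEW_URL_KEY = "EXT_VIEW_URL";
    static final String DEFAULT_EXT_VIEW_URL = "http://localhost:8000/extview";

    /** Returns the configured external-viewer URL, falling back to the hard-coded default. */
    static String extViewUrl(Properties prefs) {
        return prefs.getProperty(EXT_VIEW_URL_KEY, DEFAULT_EXT_VIEW_URL);
    }

    public static void main(String[] args) throws IOException {
        Properties prefs = new Properties();
        // Same key=value line format as a prefs.properties file; colons in the value are kept.
        prefs.load(new StringReader("EXT_VIEW_URL=http://myserver:9000/view\n"));
        System.out.println(extViewUrl(prefs)); // http://myserver:9000/view
    }
}
```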
@jrobinso I have a couple of other possible improvements for dealing with long-read alignments (whether they be PacBio reads, contigs assembled from short reads, or something else). Specifically, I'm interested in piecing together the primary and supplementary alignments of a long sequence. Some of the improvements to the tooltips in #272 will be very helpful for this, i.e. the display of left and right clipping. What would also be useful might be the ability to:
Let me know if those aren't clear. @SHuang-Broad do you have any more suggestions for this type of thing?
@cwhelan that is clear. I also intended to extend the "linked read" view from the 10x prototype feature to chimeric ("supplementary") alignments. Would that be useful? It would be helpful to have some test data with chimeric reads (i.e. supplementary reads with SA tags). Could you point me to some? You can email me directly about this.
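Linking chimeric pieces starts from the SA tag, whose value the SAM specification defines as one or more semicolon-terminated `rname,pos,strand,CIGAR,mapQ,NM` entries. A minimal parser sketch; the class names and example values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SaTagParser {

    /** One SA entry: rname,pos,strand,CIGAR,mapQ,NM. */
    static final class SupplementaryAlignment {
        final String rname;
        final int pos;          // 1-based leftmost reference position, like the SAM POS field
        final boolean reverse;  // strand '-' means reverse
        final String cigar;
        final int mapq;
        final int nm;

        SupplementaryAlignment(String rname, int pos, boolean reverse, String cigar, int mapq, int nm) {
            this.rname = rname;
            this.pos = pos;
            this.reverse = reverse;
            this.cigar = cigar;
            this.mapq = mapq;
            this.nm = nm;
        }

        @Override
        public String toString() {
            return rname + ":" + pos + (reverse ? " (-) " : " (+) ") + cigar + " MAPQ=" + mapq;
        }
    }

    /** Parses the value of an SA:Z tag (without the "SA:Z:" prefix). */
    static List<SupplementaryAlignment> parse(String saValue) {
        List<SupplementaryAlignment> result = new ArrayList<>();
        for (String entry : saValue.split(";")) {
            if (entry.isEmpty()) {
                continue; // tolerate empty pieces between separators
            }
            String[] f = entry.split(",");
            if (f.length != 6) {
                throw new IllegalArgumentException("Malformed SA entry: " + entry);
            }
            result.add(new SupplementaryAlignment(f[0], Integer.parseInt(f[1]), "-".equals(f[2]),
                    f[3], Integer.parseInt(f[4]), Integer.parseInt(f[5])));
        }
        return result;
    }

    public static void main(String[] args) {
        String sa = "chr10,133667016,+,2900S5100M,60,41;chr7,114319594,-,6000S2000M,20,15;";
        for (SupplementaryAlignment a : parse(sa)) {
            System.out.println(a);
        }
    }
}
```

Each parsed entry gives enough (contig, position, strand) to jump to or display the other pieces, similar to "view mate" for paired reads.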
My $0.02:
@SHuang-Broad good suggestions. Re (2): when playing with the 10x view I found myself wanting exactly what you suggest, having all linked reads light up when one is moused over. The colors there emulate the equivalent "Loupe" view, but I'm not sure it's really useful; more than 5 or 6 colors are impossible to distinguish clearly. I think an explicit "SV" mode might make sense. In this mode we might drop certain details (perhaps even the read sequence and SNPs) and just emphasize SV information, at much wider genomic ranges than we typically do.
This helps to export useful data to an external server for an extended view on a PacBio read or a feature.
I have implementations for 1 (quick consensus), 2 (hide small indels), 3 (label large indels), 5 (variation at low zoom), 6 (group by SNV), and some ideas for 8 (performance) in my fork.
Sorry for the opaque comment. To elaborate: The
How would you recommend running CPU profiling? I can do it if you point me to the right tool. In this case, the difference is stark enough that you can see it (and hear it from the CPU fan). I posted a sample BAM and a video of scrolling through hg38 chr7:114,319,594-114,323,597 with that BAM using two versions of IGV. Left (the one that lags) is the current IGV with
In general, I think performance could be improved by creating fewer drawing contexts; perhaps they could be created and organized in a global singleton object.
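A sketch of the "create fewer drawing contexts" idea: a hypothetical `GraphicsCache` hands out one derived `Graphics2D` per color and reuses it, so drawing thousands of blocks creates only as many contexts as there are colors. It is scoped to one paint pass rather than a global singleton, since a `Graphics2D` is bound to the surface it was created from.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.HashMap;
import java.util.Map;

public class GraphicsCache {

    private final Graphics2D base;
    private final Map<Color, Graphics2D> byColor = new HashMap<>();

    GraphicsCache(Graphics2D base) {
        this.base = base;
    }

    /** Returns a cached context with the given color set, creating it on first use. */
    Graphics2D forColor(Color color) {
        return byColor.computeIfAbsent(color, c -> {
            Graphics2D g = (Graphics2D) base.create();
            g.setColor(c);
            return g;
        });
    }

    /** Releases all derived contexts; call at the end of the paint pass. */
    void dispose() {
        for (Graphics2D g : byColor.values()) {
            g.dispose();
        }
        byColor.clear();
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(200, 50, BufferedImage.TYPE_INT_ARGB);
        Graphics2D base = img.createGraphics();
        GraphicsCache cache = new GraphicsCache(base);
        for (int i = 0; i < 1000; i++) {
            Color c = (i % 2 == 0) ? Color.RED : Color.BLUE;
            cache.forColor(c).fillRect(i % 200, 10, 1, 10); // 1,000 draws, 2 contexts
        }
        System.out.println(cache.byColor.size()); // 2
        cache.dispose();
        base.dispose();
    }
}
```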
I think this is a great idea. Connecting the primary and supplementary alignments of a read does make it dramatically easier to see structural variants. One caveat to be aware of when connecting primary and supplementary alignments is that both alignments are extended locally to improve the alignment score. It is possible (and in fact common) for the primary and supplementary alignments to reuse some of the same bases from the original read:
Interesting idea, if we can define it properly. It is not too hard with Illumina reads, which should have sharp clipping boundaries. PacBio reads will require that the definition of the "same clipping" location be somewhat relaxed (e.g. +/- a few bp). I think simply having the gold tips (idea 4 in the first post of the issue) on individual reads will help a lot. That will make it possible to see at a glance whether there are many clipped alignments in a window and whether the clipping is from one direction or both.
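A relaxed "same clipping location" could be implemented by sorting clip positions and chaining neighbours that lie within a tolerance. This is a sketch with hypothetical names and example positions, not an IGV feature. Because it chains consecutive positions, a cluster can span more than the tolerance overall. Left and right clips would be clustered separately to show the direction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClipClusters {

    /**
     * Groups clip positions lying within `tolerance` bp of their sorted neighbour and
     * returns clusters with at least `minSupport` reads as {first, last, count}.
     */
    static List<int[]> cluster(int[] clipPositions, int tolerance, int minSupport) {
        int[] sorted = clipPositions.clone();
        Arrays.sort(sorted);
        List<int[]> clusters = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            // Extend the cluster while the next position is close to the current one.
            while (j + 1 < sorted.length && sorted[j + 1] - sorted[j] <= tolerance) {
                j++;
            }
            int count = j - i + 1;
            if (count >= minSupport) {
                clusters.add(new int[] {sorted[i], sorted[j], count});
            }
            i = j + 1;
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Hypothetical reference positions where reads become soft-clipped.
        int[] clips = {1005, 1000, 1003, 2500, 1002, 5000, 5001};
        for (int[] c : cluster(clips, 5, 2)) {
            System.out.println(c[0] + "-" + c[1] + ": " + c[2] + " reads");
        }
        // 1000-1005: 4 reads
        // 5000-5001: 2 reads
    }
}
```

A tolerance of 0 gives the strict definition that suits Illumina reads; a few bp suits PacBio.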
If it does not overwhelm, I think it is nice to maintain some of the small-scale information even at low zooms. In particular, it is nice to see haplotype structure and identify single nucleotide variants in/out of phase with structural variants. It would be hard to see that if structural variants were only visible at low zoom and single nucleotide variants were visible only at high zoom.
@amwenger Thanks for elaborating. In general, graphics contexts are cached and reused, but not consistently. I agree this is an area that can be improved. For profiling I use JProfiler; there are other tools, including some built into the JDK, but I don't know much about them.
@amwenger I absolutely agree on the complexity of overlapping supplementary alignments. There are also plenty of other weird cases; for example, two different aligned chunks of the long read can end up overlapping on the reference, indicating a duplication or tandem repeat expansion. @jrobinso Regarding this, I've also been thinking about some sort of popup for each long read that would display all the supplementary alignments together in context. Ideally this would be a visual representation like in @amwenger's picture above, but even a list similar to what's currently in the "BLAST read sequence" results popup would be helpful.
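The base reuse between primary and supplementary alignments can be measured by mapping each alignment back onto the original read. A minimal sketch, assuming only a CIGAR string and strand per alignment; the helper names are hypothetical. Both S and H clips count toward the offset, since hard-clipped bases are still part of the original read. For reverse-strand alignments the CIGAR runs along the reverse complement, so the trailing clip becomes the offset in original read orientation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryOverlap {

    private static final Pattern CIGAR_OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Returns {start, end} of the aligned part in original read coordinates (0-based, half-open). */
    static int[] queryInterval(String cigar, boolean reverseStrand) {
        int leadingClip = 0;
        int trailingClip = 0;
        int aligned = 0;
        boolean seenAligned = false;
        Matcher m = CIGAR_OP.matcher(cigar);
        while (m.find()) {
            int len = Integer.parseInt(m.group(1));
            switch (m.group(2).charAt(0)) {
                case 'S':
                case 'H':
                    if (seenAligned) {
                        trailingClip += len;
                    } else {
                        leadingClip += len;
                    }
                    break;
                case 'M':
                case 'I':
                case '=':
                case 'X':
                    aligned += len; // operations that consume read bases
                    seenAligned = true;
                    break;
                default:
                    break; // D, N, P consume no read bases
            }
        }
        int start = reverseStrand ? trailingClip : leadingClip;
        return new int[] {start, start + aligned};
    }

    /** Number of read bases used by both alignments. */
    static int overlap(int[] a, int[] b) {
        return Math.max(0, Math.min(a[1], b[1]) - Math.max(a[0], b[0]));
    }

    public static void main(String[] args) {
        // Hypothetical 8,000 bp read split into a primary and a supplementary alignment.
        int[] primary = queryInterval("3000M5000S", false);
        int[] supplementary = queryInterval("2900S5100M", false);
        System.out.println("primary [" + primary[0] + ", " + primary[1] + "), supplementary ["
                + supplementary[0] + ", " + supplementary[1] + "), shared read bases: "
                + overlap(primary, supplementary));
        // primary [0, 3000), supplementary [2900, 8000), shared read bases: 100
    }
}
```

Sorting the pieces of a read by their start on the original read would also give the order for a popup listing all supplementary alignments together.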
Simple example implementing issue #277 Item 9
Re point 8 (performance): with longer CIGAR strings (currently using MinION data with 1.5-3 kb reads), performance is really poor. IGV just hangs at 100% CPU load for minutes on end before rendering anything. I've only got a BAM with 5k reads, though at high depth in a few select areas. The hanging appears to be random: sometimes IGV works fine, other times it fails (about 50% of the time). I'm using version 2.4.5.
Also, because the reads are not paired, it's difficult to track down the secondary or split reads to investigate translocations, etc. So some of the utility you had with paired-end reads is now missing.
@MattBashton Could you possibly supply a test BAM file to reproduce this problem? I'm not experiencing that with the PacBio test data I have, but then I don't have anything with deep coverage. Just a small slice around some deep coverage would probably suffice. Also, maybe open a new issue for the second issue raised. I think there is a tag we could use to restore some or all of the paired-end functionality (jump to mate / view mate in split screen). I need to investigate, but open a new ticket and we'll continue from there.
A quick samtools view should reveal that there are about 4 main locations where most of the reads fall. Swapping between those locations by pasting the coordinates into the search bar should trigger the issue, as should panning around. The hang appears to be a bit random: sometimes IGV is fine, other times it gets stuck, but it mostly occurs after viewing only a handful of locations. I produced these files with minimap2 and then samtools 1.6.
Can you reproduce the issue with this example BAM file? I can't so far. If you can reproduce it, give me the genomic location or any other information that might be relevant. Also, look at igv.log in the igv folder (under your home directory) for stack traces, or just attach it here.
I'll try to pin down a set of coordinates and operations, and will also check the logs for stack traces.
OK, I've now replicated this three times over. I have set alignment downsampling off; this might be relevant! I'm using hg38 from IGV's own list, assuming the built-in aliases handle my use of GRCh38 from Ensembl as the reference here. Jump to 10:86078632 and zoom out twice; sometimes the issue will trigger here, sometimes it won't. I think the issue might be with parsing the BAM. Then jump to 10:133667016 and zoom out again; you should now get the spinning blue ball freeze if you didn't get it after the first jump. This is what I get in the log; all the freezes are caused by the same exception:
Just to add: I've now upgraded to minimap2.6, which appears to have slightly different SAM output (the header is now present and correct); however, the same issue occurs, with IGV freezing at 100% CPU usage after jumping to the second region. Eventually, after spamming the zoom-out button, I finally got IGV to render the region, so it looks like the unresponsiveness can possibly be rescued. These files can be found here: https://www.dropbox.com/s/41ea1x4rpexsc5d/mm2.6_test_L.bam?dl=0 The error in the log is as before:
OK, thanks for the investigative work. I will try again. Sorry for the delay, many things happening in parallel right now.
On Mon, Dec 18, 2017 at 2:30 AM, Matthew Bashton wrote:
These files can be found here:
https://www.dropbox.com/s/41ea1x4rpexsc5d/mm2.6_test_L.bam?dl=0
https://www.dropbox.com/s/maty2ntpr1hrvsj/mm2.6_test_L.bam.bai?dl=0
The error in the log is as before:
INFO [2017-12-18 10:22:36,852] [Main.java:155] Java 1.8.0_151
INFO [2017-12-18 10:22:36,853] [DirectoryManager.java:76] Fetching user directory...
INFO [2017-12-18 10:22:36,951] [Main.java:156] Default User Directory: /Users/bashton
INFO [2017-12-18 10:22:36,951] [Main.java:157] OS: Mac OS X
INFO [2017-12-18 10:22:45,780] [GenomeManager.java:182] Loading genome: /Users/bashton/igv/genomes/hg38.genome
INFO [2017-12-18 10:22:50,016] [GenomeComboBox.java:79] Enter genome combo box
INFO [2017-12-18 10:22:50,035] [GenomeManager.java:271] Genome loaded. id= hg38
INFO [2017-12-18 10:22:50,162] [CommandListener.java:120] Listening on port 60151
INFO [2017-12-18 10:23:01,457] [IGV.java:1383] Loading 1 resources.
INFO [2017-12-18 10:23:01,458] [TrackLoader.java:126] Loading resource, path /Users/bashton/Desktop/mm2.6_test_L.bam
INFO [2017-12-18 10:23:56,245] [HttpUtils.java:873] Range-byte request succeeded
ERROR [2017-12-18 10:24:18,699] [DataPanel.java:252] Error:
java.util.concurrent.CompletionException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1629)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.remove(ArrayList.java:496)
at java.util.Collections$SynchronizedList.remove(Collections.java:2426)
at org.broad.igv.sam.AlignmentDataManager.trimCache(AlignmentDataManager.java:333)
at org.broad.igv.sam.AlignmentDataManager.load(AlignmentDataManager.java:297)
at org.broad.igv.sam.CoverageTrack.load(CoverageTrack.java:184)
at org.broad.igv.ui.panel.DataPanel.lambda$load$3(DataPanel.java:225)
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
... 3 more
The input files are small and I'm reading them from SSD.
BTW, setting downsampling off will absolutely freeze IGV on very deep coverage; that's why it is implemented. But PacBio reads are not especially deep, so it should have no effect. To clarify, is downsampling off for all of these test cases?
On Sat, Dec 16, 2017 at 5:29 AM, Matthew Bashton wrote:
This is what I get in the log; all the freezes are caused by the same exception:
INFO [2017-12-16 13:18:38,111] [Main.java:154] Startup IGV Version 2.4.5 12/14/2017 01:18 AM
INFO [2017-12-16 13:18:38,112] [Main.java:155] Java 1.8.0_152
INFO [2017-12-16 13:18:38,112] [DirectoryManager.java:76] Fetching user directory...
INFO [2017-12-16 13:18:38,200] [Main.java:156] Default User Directory: /Users/bashton
INFO [2017-12-16 13:18:38,201] [Main.java:157] OS: Mac OS X
INFO [2017-12-16 13:18:49,444] [GenomeManager.java:182] Loading genome: /Users/bashton/igv/genomes/hg38.genome
INFO [2017-12-16 13:18:52,987] [GenomeComboBox.java:79] Enter genome combo box
INFO [2017-12-16 13:18:53,006] [GenomeManager.java:271] Genome loaded. id= hg38
INFO [2017-12-16 13:18:53,164] [CommandListener.java:120] Listening on port 60151
INFO [2017-12-16 13:19:00,609] [IGV.java:1383] Loading 1 resources.
INFO [2017-12-16 13:19:00,610] [TrackLoader.java:126] Loading resource, path /Users/bashton/Dropbox/LRCG/Test_IGV_BAM/barcode01.bam
INFO [2017-12-16 13:19:05,265] [HttpUtils.java:873] Range-byte request succeeded
ERROR [2017-12-16 13:19:43,830] [DataPanel.java:252] Error:
java.util.concurrent.CompletionException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1629)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.remove(ArrayList.java:496)
at java.util.Collections$SynchronizedList.remove(Collections.java:2426)
at org.broad.igv.sam.AlignmentDataManager.trimCache(AlignmentDataManager.java:333)
at org.broad.igv.sam.AlignmentDataManager.load(AlignmentDataManager.java:297)
at org.broad.igv.sam.CoverageTrack.load(CoverageTrack.java:184)
at org.broad.igv.ui.panel.DataPanel.lambda$load$3(DataPanel.java:225)
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
... 3 more
INFO [2017-12-16 13:21:19,583] [ShutdownThread.java:47] Shutting down
INFO [2017-12-16 13:21:19,608] [ShutdownThread.java:47] Shutting down
Hey, thanks for getting back to me. Yes, downsampling is off for these test cases. Some regions are indeed deep owing to the targeted nature of the experiment, but no deeper than what I normally use with Illumina short reads, where I have no issues with IGV. My JVM is set to 8 GB and I'm not anywhere near that limit either, if that helps.
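For reference, per-window downsampling of the kind discussed above can be sketched with reservoir sampling. The window size and per-window cap are hypothetical parameters and this is not IGV's implementation; it only illustrates that with downsampling on, the number of reads kept per window is bounded regardless of depth, while turning it off removes that bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class WindowDownsampler {

    /** Reservoir sampling: keeps a uniform random subset of at most k items. */
    static <T> List<T> reservoir(List<T> items, int k, Random rng) {
        List<T> kept = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if (i < k) {
                kept.add(items.get(i));
            } else {
                int j = rng.nextInt(i + 1); // uniform in [0, i]
                if (j < k) {
                    kept.set(j, items.get(i));
                }
            }
        }
        return kept;
    }

    /** Buckets reads by start position into fixed windows and caps each window at maxPerWindow. */
    static List<Integer> downsample(int[] readStarts, int windowSize, int maxPerWindow, long seed) {
        Random rng = new Random(seed);
        Map<Integer, List<Integer>> byWindow = new TreeMap<>();
        for (int start : readStarts) {
            byWindow.computeIfAbsent(start / windowSize, w -> new ArrayList<>()).add(start);
        }
        List<Integer> kept = new ArrayList<>();
        for (List<Integer> window : byWindow.values()) {
            kept.addAll(reservoir(window, maxPerWindow, rng));
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] starts = new int[1000];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = i % 100; // 1,000 reads piled into the first 100 bp
        }
        System.out.println(downsample(starts, 50, 100, 42L).size()); // 200
    }
}
```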
Hi all, sorry to post my bug report into this thread. A Google search for I am also observing a
It looks like my error is similar to #499 |
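The traces above only show `Collections$SynchronizedList.remove(int)` failing inside `AlignmentDataManager.trimCache` with `Index: 1, Size: 1`; the IGV source is not quoted in this thread, so the cause suggested here is a guess. That message is typical of an index computed from an earlier `size()` call being used after the list has shrunk, for example when two loads trim the same cache concurrently. `Collections.synchronizedList` makes each individual call atomic, but not a check-then-remove sequence; its documentation says to synchronize on the returned wrapper for compound actions. A minimal sketch with a hypothetical `trimOldest` helper:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CacheTrim {

    /**
     * Drops the oldest entries until at most maxSize remain.
     * The size check and the removals form one compound action, so the whole loop holds
     * the wrapper's lock. Without it, another thread could shrink the list between
     * size() and remove(i), and remove(i) would throw IndexOutOfBoundsException.
     */
    static <T> void trimOldest(List<T> syncList, int maxSize) {
        synchronized (syncList) {
            while (syncList.size() > maxSize) {
                syncList.remove(0);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> cache = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 1000; i++) {
            cache.add(i);
        }
        // Two concurrent trims of the same cache, as might happen with two async loads.
        Thread t1 = new Thread(() -> trimOldest(cache, 3));
        Thread t2 = new Thread(() -> trimOldest(cache, 3));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(cache.size()); // 3
    }
}
```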
Hey all, this was opened as a discussion thread, for which it was really useful, but there are many disparate issues here and so it remains perpetually open. I am going to close it. If there is a specific issue not addressed that you think should be, please open an issue focused on that, along with steps to reproduce, including test data if applicable.
IGV is a very useful viewer for PacBio long read data, but it could be even better with a few modifications. Ideas to improve support for PacBio data are:
I have a version of many of these changes in a personal fork. I am happy to clean them up and contribute them to the main project.