Improved support for PacBio reads #277

Closed
amwenger opened this issue Jul 12, 2016 · 39 comments


@amwenger
Contributor

IGV is a very useful viewer for PacBio long read data, but it could be even better with a few modifications. Ideas to improve support for PacBio data are:

  1. Add a "quick consensus" mode that only shows mismatches at positions with a consistent alternative variant. The coverage track has logic that uses an allele threshold to determine whether to show variation at a position. Add a "quick consensus" option and apply the same logic to mismatches within a read. This would hide most of the "random errors" in long reads and make it possible to see haplotype structure by eye. It would look something like this:
    [Image: quick-consensus]
  2. Hide small indels at low zoom. Individual PacBio reads have random small deletion errors that show as a series of black dots at low zoom (see picture below). Hide deletions that occupy <3 px at the current zoom.
    [Image: many-deletions]
  3. Label the size of large insertions and deletions. Modify the "Flag insertions larger than N bases" option to be "Label insertions and deletions larger than N bases". "Large" insertions and deletions would show at all zoom levels. One idea for how to show the variant size is to use a filled upward-pointing trapezoid for insertions and a hollow downward-pointing trapezoid for deletions:
    [Image: insertion-size]
    [Image: deletion-size]
  4. Show clipping information at the end of reads. Read clipping can indicate the presence of a structural variant. Show the number of clipped bases in a "cap" at the end of reads:
    [Image: clipping-cap]
  5. Show variation at low zoom to enable viewing haplotype structure. I would prefer to see SNVs and large indels even when zoomed out to 100kb+.
  6. Add a "group by SNV" feature to provide a "quick phasing" of reads. Right-clicking on a position would provide the option to group reads based on the basepair at that position. By selecting a position with a heterozygous SNV, the reads could be "phased" into haplotypes:
    [Image: group-by-snv]
  7. Color basepairs based on interpulse durations (IPDs), which indicate methylation status. The interpulse duration is provided in the "ip" SAM tag. Instead of the standard gray background color, this would show a different shade at each basepair as a function of IPD.
  8. Improve performance. Rendering PacBio alignments in IGV is often slow, and it is not practical currently to render read information at low zoom. One cause is that PacBio reads have frequent indel errors that break the alignment into many, many CIGAR blocks. Some of the rendering logic is performed per CIGAR block, not per alignment. Thus, rendering PacBio reads is much more expensive than the equivalent coverage in Illumina reads, which are often 1-2 blocks per alignment. For example, 40-fold coverage over 1kb would be something like 40 PacBio reads, each broken into ~100 CIGAR blocks for a total of 4,000 CIGAR blocks; it would require 400 Illumina reads, each broken into ~1 CIGAR block for a total of 400 blocks. So, the estimated cost is 10x higher for PacBio reads when operations are performed per block and not per alignment.
  9. Add a generic "Send read to URL" feature that is like the "Blat read sequence" option but supports user-defined URLs. Some data representations (e.g., a read-vs-reference dot plot) are difficult to show within IGV but could be built as a separate web application. The user should be able to add new URLs and define which information is sent with a request: read name, read sequence, reference span, reference sequence, CIGAR string, and others.
  10. Color / shade basepairs based on percent identity with the reference sequence in a sliding window (say +/-10bp). This would serve as a simple empirical base QV score and would identify low-quality regions of a read.
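Item 1 could be prototyped by applying a per-position allele-fraction test, the same kind of threshold the coverage track uses, to each mismatch inside a read. The Java sketch below is only an illustration under our own assumptions: the class name QuickConsensus, the in-memory tally of A/C/G/T counts, and the threshold value are hypothetical, not IGV code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical "quick consensus" helper: a read's mismatch is drawn only when at least
 * alleleThreshold of the bases tallied at that position agree with the read's base.
 */
final class QuickConsensus {
    // reference position -> counts of A, C, G, T observed across reads at that position
    private final Map<Integer, int[]> countsByPos = new HashMap<>();
    private final double alleleThreshold;

    QuickConsensus(double alleleThreshold) {
        this.alleleThreshold = alleleThreshold;
    }

    private static int index(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1; // N and other symbols are not tallied
        }
    }

    /** Tally one read base observed at a reference position. */
    void addBase(int refPos, char base) {
        int i = index(base);
        if (i >= 0) {
            countsByPos.computeIfAbsent(refPos, p -> new int[4])[i]++;
        }
    }

    /** True if the read disagrees with the reference AND its base reaches the allele threshold. */
    boolean showMismatch(int refPos, char refBase, char readBase) {
        if (Character.toUpperCase(refBase) == Character.toUpperCase(readBase)) {
            return false; // not a mismatch at all
        }
        int i = index(readBase);
        int[] counts = countsByPos.get(refPos);
        if (i < 0 || counts == null) {
            return false;
        }
        int total = counts[0] + counts[1] + counts[2] + counts[3];
        return total > 0 && (double) counts[i] / total >= alleleThreshold;
    }
}
```

With a threshold of 0.2, a position where half of the reads carry G instead of the reference A is drawn, while a lone T among ten reads is hidden as a likely random error.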

I have a version of many of these changes in a personal fork. I am happy to clean them up and contribute them to the main project.

@jrobinso
Contributor

I don't understand (8), what do you mean by "operations are performed per block". Could you elaborate with cpu profiling data? The solution we applied earlier to this problem was to set a filter and ignore indels < some size, combining the adjacent blocks. I think the cpu cost is in the drawing operations, it doesn't matter if you loop through 40,000 blocks or 400 blocks if you draw the same elements you will incur the same cost. At least that is my recollection of previous profiling, some hard data is needed here and anywhere where performance is being discussed.
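A minimal sketch of the filter described here, combined with the "<3 px" rule from item 2: walk the CIGAR string and let insertions and deletions that would be narrower than a pixel threshold stay inside a block instead of breaking it. Per the SAM specification, M, = and X consume both read and reference, D and N consume reference only, and I, S, H and P consume no reference. The class name, treating an insertion as if its length were an on-screen width, and always splitting on N are assumptions of this sketch, not IGV's behavior.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: split an alignment into reference blocks, but let indels that would be
 * narrower than minPx on screen stay inside a block instead of breaking it.
 */
final class CigarBlocks {
    /** Returns half-open [start, end) reference intervals of the blocks to draw. */
    static List<int[]> blocks(String cigar, int alignmentStart, double basesPerPixel, double minPx) {
        List<int[]> out = new ArrayList<>();
        int refPos = alignmentStart;
        int blockStart = -1; // -1 means no block is open
        int len = 0;
        for (int k = 0; k < cigar.length(); k++) {
            char c = cigar.charAt(k);
            if (Character.isDigit(c)) {
                len = len * 10 + (c - '0');
                continue;
            }
            switch (c) {
                case 'M': case '=': case 'X': // consume reference: extend (or open) the block
                    if (blockStart < 0) blockStart = refPos;
                    refPos += len;
                    break;
                case 'I': case 'D': {
                    // assumed on-screen width in pixels (insertions have no reference width)
                    boolean visible = len / basesPerPixel >= minPx;
                    if (visible && blockStart >= 0) {
                        out.add(new int[]{blockStart, refPos});
                        blockStart = -1;
                    }
                    if (c == 'D') refPos += len; // deletions consume reference, insertions do not
                    break;
                }
                case 'N': // skipped region: always breaks the block
                    if (blockStart >= 0) {
                        out.add(new int[]{blockStart, refPos});
                        blockStart = -1;
                    }
                    refPos += len;
                    break;
                default: // S, H, P consume no reference
                    break;
            }
            len = 0;
        }
        if (blockStart >= 0) out.add(new int[]{blockStart, refPos});
        return out;
    }
}
```

Zoomed out at one base per pixel, 10M2D10M collapses to a single block; at ten pixels per base, the 2 bp deletion is 20 px wide and splits it into two. For a PacBio read with ~100 blocks, merging at low zoom is one way to reduce the per-block work discussed in item 8, though whether it helps depends on where the time is actually spent, which is what profiling would show.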

@jrobinso
Contributor

Which of these do you have solutions for in your personal fork?

@pb-jchin

Hi @jrobinso, for (9), you can see some code I modified from the BLAT code here: https://github.com/pb-jchin/igv/blob/ExtView/src/org/broad/igv/util/extview/ExtendViewClient.java
(We also have to add extra corresponding menu items in some other files.)

Ideally, if there were a generic mini-language for sending the metadata/data of the current view, the selected read, and the selected features, we could pass that data to an external viewer that fetches extra information (possibly from some other database backend) for visualization, without modifying the source code. If possible, IGV could take the HTTP response and display it (SVG or PNG data, etc.). Alternatively, the server could return a URL and IGV could open a web browser pointing to it; that would be great too. I have some examples. If it is useful, I can make a screencast to show one.
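As an illustration of the sending side, here is a sketch of encoding user-chosen read fields as a form POST with the JDK's URLEncoder and HttpURLConnection. It is not the ExtendViewClient code linked above; the class name, the field names, and the endpoint are hypothetical, and the response body is deliberately not read, so IGV would not need to interpret what the server sends back.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical "send read to URL" helper; field names and the endpoint are user-defined. */
final class SendReadToUrl {
    /** Encodes fields as application/x-www-form-urlencoded, keeping insertion order. */
    static String formBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                body.add(URLEncoder.encode(e.getKey(), "UTF-8") + "="
                        + URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always available", ex);
        }
        return body.toString();
    }

    /** POSTs the body and returns the HTTP status code; the response body is not read. */
    static int post(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

A LinkedHashMap keeps the fields in the order the user configured them, which makes the request body predictable for the server side.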

@bnbowman

@jrobinso I also use IGV regularly to analyze PacBio raw reads, and would greatly appreciate the suggested changes, particularly (1) and (2).

@jrobinso
Contributor

@pb-jchin @amwenger Do you guys have coded solutions for (1) and (2)? If not, I'm going to proceed with my own. In general I'd like to get your contributions merged within the next few weeks; I'm planning to do some restructuring and simplification of the Alignment model, and merging later might be difficult.

@amwenger
Contributor Author

amwenger commented Jul 14, 2016

I do have solutions to 1 and 2 in
https://github.com/amwenger/igv/tree/amw-pb-consensus-mode. I will prepare
PRs soon.

@jrobinso
Contributor

@pb-jchin wrt (9), I can work with the code you have above to add a "Send read to URL" function; however, I think we should nail down what can be returned from the post a little more tightly. I suggest we defer this one a bit and concentrate on some of the others, as it's easy to add this at any time. My time is really limited, and with 9 items we need to prioritize.

@jrobinso
Contributor

@cwhelan if you guys have any input on the PacBio improvements (see items 1-9 above) chime in.

@pb-jchin

@jrobinso Yes, for (9), if one wants to be more general about the communication between IGV and external toolsets, it does need some thinking about the design.

Here is what I think:

  1. for SAM/BAM reads, the metadata/data set is well defined, so it is easier.
  2. for HTTP request return processing, we can consider multiple levels of support
    1. IGV doesn't need to catch the return; it is up to the server to ensure that queries are received correctly
    2. IGV catches a simple return to give the user feedback that the HTTP request was sent, and displays the server's return message
    3. IGV catches an information-rich return (URL, images, etc.) and processes it accordingly
  3. for features, this is more complicated as BED / GFF, etc., can contain many fields that are not strictly defined. This indeed needs careful thinking.

For 1 and 2.i or 2.ii, this should be easy. It is the same as the BLAT request, and we don't even need more complicated parsing of the returned information. For 2.iii and 3, because of their complexity, they should have lower priority compared to the others.

@jrobinso
Contributor

Actually there is a lot of complex parsing of the server response from a blat request. The response to the post has to be handled, otherwise nothing will happen.

Do you have a working example in one of your branches?

@pb-jchin

pb-jchin commented Jul 14, 2016

@jrobinso, yes. I mean you do need to parse the BLAT output. The easier thing to do is to not display the information inside IGV, so IGV does not need to parse the returned info.

The attached screenshots show how I use it now. This will enable many related applications that need more extensive backend database support, with IGV as the front end for navigation.

On the IGV side:
[Screenshot: scr2016-07-14_11-03-09_am]

In a web browser:
[Screenshot: scr2016-07-14_11-05-25_am]

@jrobinso
Contributor

So in this example the server is using some push technology, and IGV does not need to look at the response? How do you let IGV know the url, and syntax of the post body? The URL would probably be recorded in the prefs.properties file. Do you want to prepare a pull request for this one? We can pull it in and continue to refine it.

@jrobinso
Contributor

BTW ultimately I think we should support a structured response, probably json, that can encode either 2i, 2ii, or 2iii. The json could also encode any error or user messages the server wants to send. If the response is empty or not recognized IGV would do nothing, as in your "push" example.

@pb-jchin

pb-jchin commented Jul 14, 2016

@jrobinso

The URL to the HTTP server is hard-coded in the example now. Let me spend some time understanding how to pull information from prefs.properties, and I will submit a cleaner PR after that. It will probably take about 1 week.

Yes, I use a websocket to push updates from the local server to the web page.

I think JSON is great for the response. Maybe something like this as a minimum return:

{"status": "OK|ERROR|others",  
 "msg":"some text message for IGV to show",  
 "payload": (other JSON objects for SVG/PNG/URL or instruction for IGV to display the objects etc.)}
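One way IGV could act on a reply in this shape, following the earlier suggestion that an empty or unrecognized response should make IGV do nothing, is sketched below. It assumes the JSON has already been parsed by some library (the JDK has no built-in JSON parser); the class ExtViewResponse and the Action names are hypothetical. Roughly, NOTHING corresponds to level 2.i, SHOW_MESSAGE and SHOW_ERROR to 2.ii, and HANDLE_PAYLOAD to 2.iii.

```java
import java.util.Locale;

/**
 * Hypothetical model of the proposed reply {"status", "msg", "payload"}, assumed to have
 * been parsed already by a JSON library of your choice.
 */
final class ExtViewResponse {
    enum Action { NOTHING, SHOW_MESSAGE, SHOW_ERROR, HANDLE_PAYLOAD }

    final String status;  // "OK", "ERROR", or anything else (treated as unrecognized)
    final String msg;     // optional text for IGV to show
    final String payload; // optional raw JSON for SVG/PNG/URL content, or null

    ExtViewResponse(String status, String msg, String payload) {
        this.status = status;
        this.msg = msg;
        this.payload = payload;
    }

    private static boolean hasText(String s) {
        return s != null && !s.trim().isEmpty();
    }

    /** Decides what IGV should do; empty or unrecognized replies are ignored. */
    Action decide() {
        if (!hasText(status)) return Action.NOTHING;
        switch (status.trim().toUpperCase(Locale.ROOT)) {
            case "ERROR":
                return Action.SHOW_ERROR;
            case "OK":
                if (hasText(payload)) return Action.HANDLE_PAYLOAD;
                return hasText(msg) ? Action.SHOW_MESSAGE : Action.NOTHING;
            default:
                return Action.NOTHING;
        }
    }
}
```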

@jrobinso
Contributor

You can leave it hardcoded for the PR; I will clean that up. If you
want to use preferences, the steps are:

(1) add a key constant in PreferenceManager

(2) add your property to prefs.properties in the form key=value.
See your existing prefs.properties for examples.

(3) access the property with
PreferenceManager.getInstance().get(PreferenceManager.YOUR_KEY_CONSTANT)
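The three steps follow a standard key/value preferences pattern. As a self-contained illustration using only java.util.Properties (this is not IGV's PreferenceManager, and the key name EXTVIEW_URL is hypothetical):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

/**
 * Hypothetical stand-in for the three steps using java.util.Properties;
 * IGV's PreferenceManager has its own implementation, which this does not reproduce.
 */
final class ExtViewPrefs {
    // Step (1): a key constant (hypothetical name)
    static final String EXTVIEW_URL = "EXTVIEW_URL";

    private final Properties props = new Properties();

    // Step (2): prefs.properties holds one key=value pair per line
    void load(Reader reader) throws IOException {
        props.load(reader);
    }

    // Step (3): look the value up by key, with a fallback when the key is absent
    String get(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }
}
```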


@cwhelan

cwhelan commented Jul 15, 2016

@jrobinso I have a couple of other possible improvements for dealing with long-read alignments (whether they are PacBio reads, contigs assembled from short reads, or something else). Specifically, I'm interested in piecing together the primary and supplementary alignments of a long sequence. Some of the improvements to the tooltips in #272 will be very helpful for this, i.e., the display of left and right clipping. What might also be useful is the ability to:

  1. Color all the primary and supplementary alignments of the same read the same color. Essentially this would be similar to color by -> read name, but we'd want the ability to only select a single read at a time, of course.

  2. Some sort of functionality like: select a read -> right click -> "go to next/previous alignment in read". This would read the alternate mapping locations in the SA tag of the read, figure out which one represents the next aligned chunk of the read (sorted by coordinates on the read, not on the reference), and allow you to jump to it (or back to the previous).
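The "go to next/previous alignment in read" idea mainly needs each alignment's position on the original read. A minimal sketch, assuming the SA tag layout from the SAM specification (rname,pos,strand,CIGAR,mapQ,NM; repeated) and that clipped bases appear as S or H operations. Because SAM CIGARs are written in reference orientation, a reverse-strand alignment's trailing clip is the number of read bases that precede it. The class and method names are hypothetical, and mapQ/NM are ignored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: order a read's primary and SA-tag alignments by position on the read. */
final class SupplementaryNav {
    static final class Segment {
        final String chr;
        final int pos;         // 1-based leftmost reference position, as in SAM
        final boolean reverse;
        final int readStart;   // 0-based, half-open interval on the original (unreversed) read
        final int readEnd;

        Segment(String chr, int pos, boolean reverse, int readStart, int readEnd) {
            this.chr = chr;
            this.pos = pos;
            this.reverse = reverse;
            this.readStart = readStart;
            this.readEnd = readEnd;
        }
    }

    /** Builds a segment; the CIGAR is in reference orientation, so reverse-strand clips swap ends. */
    static Segment segment(String chr, int pos, boolean reverse, String cigar) {
        int leadClip = 0, trailClip = 0, aligned = 0, len = 0;
        boolean seenAligned = false;
        for (char c : cigar.toCharArray()) {
            if (Character.isDigit(c)) { len = len * 10 + (c - '0'); continue; }
            if (c == 'S' || c == 'H') {
                if (seenAligned) trailClip += len; else leadClip += len;
            } else {
                if (c == 'M' || c == 'I' || c == '=' || c == 'X') aligned += len; // query-consuming
                seenAligned = true;
            }
            len = 0;
        }
        int before = reverse ? trailClip : leadClip;
        return new Segment(chr, pos, reverse, before, before + aligned);
    }

    /** Parses the value of an SA:Z tag into segments. */
    static List<Segment> parseSA(String sa) {
        List<Segment> out = new ArrayList<>();
        for (String entry : sa.split(";")) {
            if (entry.isEmpty()) continue;
            String[] f = entry.split(",");
            out.add(segment(f[0], Integer.parseInt(f[1]), f[2].equals("-"), f[3]));
        }
        return out;
    }

    /** All segments of the read, sorted by where they start on the original read. */
    static List<Segment> orderOnRead(Segment current, List<Segment> saSegments) {
        List<Segment> all = new ArrayList<>(saSegments);
        all.add(current);
        all.sort(Comparator.comparingInt((Segment s) -> s.readStart));
        return all;
    }

    /** Next (step = +1) or previous (step = -1) segment, or null at either end of the read. */
    static Segment neighbor(List<Segment> ordered, Segment current, int step) {
        int idx = ordered.indexOf(current);
        if (idx < 0) return null;
        int j = idx + step;
        return (j >= 0 && j < ordered.size()) ? ordered.get(j) : null;
    }
}
```

In the test data below, the primary covers read bases [0, 100), the chr5 supplementary [100, 400) and the reverse-strand chr1 supplementary [400, 500), so "next" from the primary would jump to chr5:2000.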

Let me know if those aren't clear. @SHuang-Broad do you have any more suggestions for this type of thing?

@jrobinso
Contributor

@cwhelan that is clear. I also intended to extend the "linked read" view, from the 10x prototype feature, for chimeric ("supplementary") alignments. Would that be useful? It would be helpful to have some test data with chimeric reads (i.e., supplementary reads with SA tags). Could you point me to some? You can email me directly about this.

@SHuang-Broad

My $0.02:

  1. Echoing @cwhelan's coloring suggestion, I think the soft-clipped bases could be displayed in a single color that does NOT mix well with the highlight color of reads/contigs.
  2. Linking multiple alignment records for the same read/contig by color is good, but when there are many colors, the human eye is probably not going to be able to distinguish them. One way to solve this would be to hover over one alignment record and have its peers (the other alignment records for the same read/contig) blink or be boxed (with a pop-up if they are not displayable in the current view).
  3. It might be useful to have a feature similar to the "Allele Fraction" information on the coverage track for reads. Here, we are not looking for SNP allele fractions, but rather "the fraction of reads that are clipped at the same (or almost the same) position on the ref". Or the fraction of reads that have a strange insert size or pair orientation.
  4. I do feel that we are probably asking for an "SV mode" in IGV, because many features IGV offers are quite valuable when viewing short variants but give less signal for SVs. When taking the broader view, details can sometimes be forgiven.

@jrobinso
Contributor

@SHuang-Broad good suggestions. RE (2), when playing with the 10x view I found myself wanting exactly what you suggest, having all linked reads light up when one is moused over. The colors there are emulating the equivalent "Loupe" view, but I'm not sure it's really useful. More than 5 or 6 colors are impossible to distinguish clearly.

I think an explicit "SV" mode might make sense; in this mode we might drop certain details (perhaps even the read sequence and SNPs) and just emphasize SV information, at much wider genomic ranges than we typically do.

pb-jchin pushed a commit to pb-jchin/igv that referenced this issue Jul 17, 2016
This helps to export useful data to an external server for an extended view on a
PacBio read or a feature.
@amwenger
Contributor Author

amwenger commented Jul 18, 2016

> Which of these do you have solutions for in your personal fork?

I have implementations for 1 (quick consensus), 2 (hide small indels), 3 (label large indels), 5 (variation at low zoom), 6 (group by SNV), and some ideas for 8 (performance) in my fork.

> I don't understand (8), what do you mean by "operations are performed per block". Could you elaborate with cpu profiling data? The solution we applied earlier to this problem was to set a filter and ignore indels < some size, combining the adjacent blocks. I think the cpu cost is in the drawing operations, it doesn't matter if you loop through 40,000 blocks or 400 blocks if you draw the same elements you will incur the same cost. At least that is my recollection of previous profiling, some hard data is needed here and anywhere where performance is being discussed.

Sorry for the opaque comment. To elaborate: The drawBases() method in sam/AlignmentRenderer.java is called once for each alignment block. It does two expensive operations that could be moved higher up the call stack and performed once per alignment (or even better once per render):

  1. Create a graphics context on which to draw the bases: Graphics2D g = (Graphics2D) context.getGraphics().create();. The context could be created at a higher level and passed to drawBases(). This seems to be a very expensive operation.
  2. Obtain the reference genome sequence against which to compare the read: genome.getSequence(chr, start, end). This currently allocates a new byte array, which while less expensive than drawing, is still a costly operation.
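Points 1 and 2 amount to hoisting per-block work up to the alignment. A minimal sketch of that restructuring, not IGV's drawBases(): the Reference interface, the Block class, the method signature, and 0-based half-open coordinates are all hypothetical. The Graphics2D copy is created and disposed once, and a single reference fetch covers the whole alignment span, so the per-block loop only compares bytes and fills rectangles.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: per-alignment (not per-block) setup for drawing mismatched bases. */
final class PerAlignmentDrawing {
    /** Hypothetical reference source with the same (chr, start, end) shape as genome.getSequence. */
    interface Reference {
        byte[] getSequence(String chr, int start, int end);
    }

    /** Hypothetical aligned block: 0-based reference start and the read bases aligned there. */
    static final class Block {
        final int refStart;
        final byte[] bases;

        Block(int refStart, byte[] bases) {
            this.refStart = refStart;
            this.bases = bases;
        }
    }

    /**
     * Draws mismatching bases of one alignment (blocks sorted by refStart) and returns how many
     * were drawn. The Graphics2D copy and the reference bytes are obtained once per alignment.
     */
    static int drawMismatches(Graphics2D parent, Reference ref, String chr, List<Block> blocks,
                              double originBp, double pxPerBp, int y, int height) {
        if (blocks.isEmpty()) return 0;
        int alnStart = blocks.get(0).refStart;
        Block last = blocks.get(blocks.size() - 1);
        int alnEnd = last.refStart + last.bases.length;

        byte[] refBases = ref.getSequence(chr, alnStart, alnEnd); // one fetch per alignment
        Graphics2D g = (Graphics2D) parent.create();                // one copy per alignment
        int drawn = 0;
        try {
            g.setColor(Color.RED);
            int w = Math.max(1, (int) Math.round(pxPerBp));
            for (Block b : blocks) {
                for (int i = 0; i < b.bases.length; i++) {
                    int refIdx = b.refStart + i - alnStart;
                    if (refIdx < 0 || refIdx >= refBases.length) continue;
                    if (Character.toUpperCase((char) b.bases[i]) != Character.toUpperCase((char) refBases[refIdx])) {
                        int x = (int) ((b.refStart + i - originBp) * pxPerBp);
                        g.fillRect(x, y, w, height);
                        drawn++;
                    }
                }
            }
        } finally {
            g.dispose();
        }
        return drawn;
    }
}
```

Going one step further, as suggested below, the copy could be made once per render and passed in; whether that pays off is something a profiler run would need to confirm.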

How do you recommend running CPU profiling? I can do it if you point me to the right tool. In this case, the difference is stark enough that you can see it (and hear it from the CPU fan). I posted a sample BAM and a video of scrolling through hg38 chr7:114,319,594-114,323,597 with that BAM using two versions of IGV. Left (the one that lags) is the current IGV with if (2 > 1) { return; } added immediately after Graphics2D g = (Graphics2D) context.getGraphics().create(); in drawBases(). Right (the smooth one) has that if statement as the first line in drawBases().

In general, I think performance could be improved by creating fewer drawing contexts; perhaps they could be created and organized in a global singleton object.

@amwenger
Contributor Author

amwenger commented Jul 18, 2016

@cwhelan @SHuang-Broad

> Specifically, I'm interested in piecing together the primary and supplementary alignments of a long sequence. Some of the improvements to the tool tips in #272 will be very helpful for this, ie the display of left and right clipping.

I think this is a great idea. Connecting primary and supplementary alignments of a read does make it dramatically easier to see structural variants. One caveat to be aware of when connecting primary and supplementary alignments is that both alignments are extended locally to improve the alignment score. It is possible (and in fact common) that the primary and supplementary alignments reuse some of the same bases from the original read:
[Image: primary suppl]
In this example, a simple visualization that connects the alignments would imply that the read supports a deletion of block C. In fact, it supports a deletion of blocks B and C. One way to handle that for cases of only two alignment blocks (one primary and one supplementary) is to highlight the bases that are reused.
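Highlighting reused bases comes down to intersecting the two alignments' intervals on the original read. A minimal sketch, assuming clipped bases appear as S or H operations and that reverse-strand CIGARs (written in reference orientation, per the SAM spec) must be flipped; the class name ReusedBases is hypothetical.

```java
/**
 * Hypothetical sketch: find read bases used by both a primary and a supplementary alignment.
 * Intervals are 0-based, half-open, on the original read.
 */
final class ReusedBases {
    /** Interval of the original read covered by one alignment; CIGARs are in reference orientation. */
    static int[] readInterval(String cigar, boolean reverse) {
        int lead = 0, trail = 0, aligned = 0, len = 0;
        boolean seenAligned = false;
        for (char c : cigar.toCharArray()) {
            if (Character.isDigit(c)) { len = len * 10 + (c - '0'); continue; }
            if (c == 'S' || c == 'H') {
                if (seenAligned) trail += len; else lead += len;
            } else {
                if (c == 'M' || c == 'I' || c == '=' || c == 'X') aligned += len; // query-consuming ops
                seenAligned = true;
            }
            len = 0;
        }
        int start = reverse ? trail : lead; // reverse strand: the trailing clip comes first on the read
        return new int[]{start, start + aligned};
    }

    /** Overlap of two read intervals, or null if the alignments share no read bases. */
    static int[] overlap(int[] a, int[] b) {
        int start = Math.max(a[0], b[0]);
        int end = Math.min(a[1], b[1]);
        return start < end ? new int[]{start, end} : null;
    }
}
```

For a 1,000 bp read, a primary 600M400S and a supplementary 550S450M both use read bases 550 to 600; those 50 bases are the ones a viewer would highlight.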

> It might be useful to have a feature similar to the "Allele Fraction" information on the coverage track for reads. Here, we are not looking for SNP allele fractions, but rather "the fraction of reads that are clipped at the same (or almost the same) position on the ref". Or fraction of reads that have strange insert size, pair orientation.

Interesting idea if we could define it properly. It is not too hard with Illumina reads, which should have sharp clipping boundaries. PacBio reads will require that the definition of the "same clipping" location be somewhat relaxed (e.g. +/- a few bp).
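One possible definition, sketched under our own assumptions: among alignments whose span comes within the tolerance of a position, count those clipped at an end lying within +/- tolerance bp of it. A tolerance of 0 matches sharp Illumina boundaries; a few bp fits PacBio. The class, the field names, and the choice of denominator are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a "clipped fraction" at a reference position, with a bp tolerance. */
final class ClipFraction {
    static final class Aln {
        final int start, end;          // reference span of the aligned part
        final int leftClip, rightClip; // clipped bases at each end (soft or hard)

        Aln(int start, int end, int leftClip, int rightClip) {
            this.start = start;
            this.end = end;
            this.leftClip = leftClip;
            this.rightClip = rightClip;
        }
    }

    /** Fraction of alignments near pos that are clipped within +/- tolerance bp of pos. */
    static double fractionClippedNear(List<Aln> alns, int pos, int tolerance) {
        int near = 0, clipped = 0;
        for (Aln a : alns) {
            if (pos < a.start - tolerance || pos > a.end + tolerance) continue; // not near pos
            near++;
            boolean left = a.leftClip > 0 && Math.abs(a.start - pos) <= tolerance;
            boolean right = a.rightClip > 0 && Math.abs(a.end - pos) <= tolerance;
            if (left || right) clipped++; // each alignment counts once
        }
        return near == 0 ? 0.0 : (double) clipped / near;
    }
}
```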

I think simply having the gold tips (idea 4 in the first post in the issue) on individual reads will help a lot. That will make it possible to see at a glance whether there are many clipped alignments in a window and whether the clipping is from one direction or both.

> I do feel that we are probably asking for a "SV mode" in IGV, because there are many features that are quite valuable offered by IGV when viewing short variants but gives less signal in SV mode. When taking the broader view, details can be forgiven sometimes.

If it does not overwhelm, I think it is nice to maintain some of the small-scale information even at low zooms. In particular, it is nice to see haplotype structure and identify single nucleotide variants in/out of phase with structural variants. It would be hard to see that if structural variants were only visible at low zoom and single nucleotide variants were visible only at high zoom.

@jrobinso
Contributor

@amwenger Thanks for elaborating. In general, graphics contexts are cached and reused, though not consistently. I agree this is an area that can be improved. For profiling I use JProfiler; there are other tools, including some built into the JDK, but I don't know much about them.

@cwhelan

cwhelan commented Jul 18, 2016

@amwenger I absolutely agree on the complexity of overlapping supplementary alignments. There are also plenty of other weird cases; for example, two different aligned chunks of the long read can end up overlapping on the reference, indicating a duplication or tandem repeat expansion. @jrobinso In regard to this, I've also been thinking about some sort of popup for each long read that would display all the supplementary alignments together in context. Ideally this would be a visual representation like in @amwenger's picture above, but even a list similar to what's currently in the "Blat read sequence" results popup would be helpful.

jrobinso added a commit that referenced this issue Jul 18, 2016
Simple example implementing issue #277 Item 9
@jrobinso
Contributor

@amwenger @cwhelan A test BAM with supplementary alignments, and a list of region(s) containing some of the interesting cases (e.g., overlapping supplementary alignments, alignments sharing bases, etc.), would be helpful. Actually, essential to make progress. Thanks.

@jrobinso
Contributor

@amwenger I created a separate ticket for the performance issue #284

@MattBashton

Re point 8 (performance): with longer CIGAR strings (I'm currently using MinION data with 1.5-3 kb reads), performance is really poor. IGV just hangs at 100% CPU load for minutes on end before rendering anything. I've only got a BAM with 5k reads, too, just at high depth in a few select areas. The hanging appears to be random: sometimes IGV works fine, other times it fails, about 50% of the time. I'm using version 2.4.5.

@MattBashton

Also, because the reads are not paired, it's difficult to track down the secondary or split reads to investigate translocations, etc. So some of the utility you had with paired-end reads is now missing.

@jrobinso
Contributor

@MattBashton Could you possibly supply a test bam file to reproduce this problem? I'm not experiencing that with the PacBio test data I have, but then I don't have anything with deep coverage. Just a small slice around some deep coverage would probably suffice. Also, maybe open a new issue for the second issue raised. I think there is a tag we could use to restore some or all of the paired-end functionality (jump to mate / view mate in split screen). I need to investigate but open a new ticket and we'll continue from there.

@MattBashton

MattBashton commented Dec 15, 2017

A quick samtools view should reveal that there are about 4 main locations where most of the reads fall. Swapping between those locations by pasting the coordinates into the search bar should trigger the issue, as should panning around. The hang appears to be a bit random: sometimes IGV is fine, other times it gets stuck, but it mostly occurs after viewing only a handful of locations. I produced these files with minimap2 and then samtools 1.6.
https://www.dropbox.com/s/jlowijdwjvt28x1/barcode01.bam?dl=0
https://www.dropbox.com/s/5i24fgt5kjyopze/barcode01.bam.bai?dl=0

@jrobinso
Contributor

Can you reproduce the issue with this example BAM file? I can't so far. If you can reproduce it, give me the genomic location or any other information that might be relevant. Also, look at igv.log in the igv folder (under your home directory) for stack traces, or just attach it here.

@MattBashton

I'll try to pin down a set of coordinates and operations, and will also check the logs for a stack trace.

@MattBashton

Ok I've now replicated this three times over.

I have alignment downsampling turned off; this might be relevant!

I'm using hg38 from IGV's own list, assuming the built-in aliases handle my use of Ensembl GRCh38 as the reference here.

Jump to:

10:86078632

Zoom out twice. Sometimes the issue will trigger here, sometimes it won't. I think the issue might be with parsing the BAM.

Then jump to:

10:133667016

And again zoom out; you should now get the spinning blue ball freeze if you didn't get it after the first jump.

This is what I get in the log; all the freezes are caused by the same exception:

INFO [2017-12-16 13:18:38,111] [Main.java:154]  Startup  IGV Version 2.4.5 12/14/2017 01:18 AM
INFO [2017-12-16 13:18:38,112] [Main.java:155]  Java 1.8.0_152
INFO [2017-12-16 13:18:38,112] [DirectoryManager.java:76]  Fetching user directory... 
INFO [2017-12-16 13:18:38,200] [Main.java:156]  Default User Directory: /Users/bashton
INFO [2017-12-16 13:18:38,201] [Main.java:157]  OS: Mac OS X
INFO [2017-12-16 13:18:49,444] [GenomeManager.java:182]  Loading genome: /Users/bashton/igv/genomes/hg38.genome
INFO [2017-12-16 13:18:52,987] [GenomeComboBox.java:79]  Enter genome combo box
INFO [2017-12-16 13:18:53,006] [GenomeManager.java:271]  Genome loaded.  id= hg38
INFO [2017-12-16 13:18:53,164] [CommandListener.java:120]  Listening on port 60151
INFO [2017-12-16 13:19:00,609] [IGV.java:1383]  Loading 1 resources.
INFO [2017-12-16 13:19:00,610] [TrackLoader.java:126]  Loading resource, path /Users/bashton/Dropbox/LRCG/Test_IGV_BAM/barcode01.bam
INFO [2017-12-16 13:19:05,265] [HttpUtils.java:873]  Range-byte request succeeded
ERROR [2017-12-16 13:19:43,830] [DataPanel.java:252]  Error: 
java.util.concurrent.CompletionException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1629)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.remove(ArrayList.java:496)
        at java.util.Collections$SynchronizedList.remove(Collections.java:2426)
        at org.broad.igv.sam.AlignmentDataManager.trimCache(AlignmentDataManager.java:333)
        at org.broad.igv.sam.AlignmentDataManager.load(AlignmentDataManager.java:297)
        at org.broad.igv.sam.CoverageTrack.load(CoverageTrack.java:184)
        at org.broad.igv.ui.panel.DataPanel.lambda$load$3(DataPanel.java:225)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
        ... 3 more
INFO [2017-12-16 13:21:19,583] [ShutdownThread.java:47]  Shutting down
INFO [2017-12-16 13:21:19,608] [ShutdownThread.java:47]  Shutting down
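The trace shows SynchronizedList.remove(int) being asked for index 1 of a one-element list inside AlignmentDataManager.trimCache. That method isn't shown in this thread, so the following is only a general Java illustration, not a diagnosis of IGV. Collections.synchronizedList makes each individual call atomic, but not a size check followed by removals, and a loop whose index goes stale as elements shift left can overrun the list even on a single thread; either pattern can produce exactly this exception. The hypothetical trimToSize below holds the list's own monitor (the same lock the Javadoc asks callers to hold when iterating) for the whole compound operation and always removes from the front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical cache-trimming sketch contrasting a safe and an unsafe removal pattern. */
final class CacheTrim {
    /**
     * Removes the oldest entries (front of the list) until at most maxSize remain.
     * Synchronizing on the list makes the size check and removals one step with respect
     * to other code that also synchronizes on the list. Returns the number removed.
     */
    static <T> int trimToSize(List<T> cache, int maxSize) {
        synchronized (cache) {
            int removed = 0;
            while (cache.size() > maxSize) {
                cache.remove(0);
                removed++;
            }
            return removed;
        }
    }

    /** Buggy pattern for comparison: indices shift after each remove, so i runs past the end. */
    static <T> void brokenTrim(List<T> cache, int count) {
        for (int i = 0; i < count; i++) {
            cache.remove(i);
        }
    }
}
```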

@MattBashton

Just to add: I've now upgraded to minimap2 2.6, which appears to produce slightly different SAM output (the header is now present and correct). However, the same issue occurs, with IGV freezing at 100% CPU usage after jumping to the second region. Eventually, after repeatedly clicking the zoom-out button, I got IGV to render the region, so it looks like the unresponsiveness can possibly be recovered from. These files can be found here:

https://www.dropbox.com/s/41ea1x4rpexsc5d/mm2.6_test_L.bam?dl=0
https://www.dropbox.com/s/maty2ntpr1hrvsj/mm2.6_test_L.bam.bai?dl=0

The error in the log is as before:

INFO [2017-12-18 10:22:36,852] [Main.java:155]  Java 1.8.0_151
INFO [2017-12-18 10:22:36,853] [DirectoryManager.java:76]  Fetching user directory... 
INFO [2017-12-18 10:22:36,951] [Main.java:156]  Default User Directory: /Users/bashton
INFO [2017-12-18 10:22:36,951] [Main.java:157]  OS: Mac OS X
INFO [2017-12-18 10:22:45,780] [GenomeManager.java:182]  Loading genome: /Users/bashton/igv/genomes/hg38.genome
INFO [2017-12-18 10:22:50,016] [GenomeComboBox.java:79]  Enter genome combo box
INFO [2017-12-18 10:22:50,035] [GenomeManager.java:271]  Genome loaded.  id= hg38
INFO [2017-12-18 10:22:50,162] [CommandListener.java:120]  Listening on port 60151
INFO [2017-12-18 10:23:01,457] [IGV.java:1383]  Loading 1 resources.
INFO [2017-12-18 10:23:01,458] [TrackLoader.java:126]  Loading resource, path /Users/bashton/Desktop/mm2.6_test_L.bam
INFO [2017-12-18 10:23:56,245] [HttpUtils.java:873]  Range-byte request succeeded
ERROR [2017-12-18 10:24:18,699] [DataPanel.java:252]  Error: 
java.util.concurrent.CompletionException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1629)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.remove(ArrayList.java:496)
        at java.util.Collections$SynchronizedList.remove(Collections.java:2426)
        at org.broad.igv.sam.AlignmentDataManager.trimCache(AlignmentDataManager.java:333)
        at org.broad.igv.sam.AlignmentDataManager.load(AlignmentDataManager.java:297)
        at org.broad.igv.sam.CoverageTrack.load(CoverageTrack.java:184)
        at org.broad.igv.ui.panel.DataPanel.lambda$load$3(DataPanel.java:225)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
        ... 3 more

The input files are small and I'm reading them from SSD.

@jrobinso
Contributor

jrobinso commented Dec 20, 2017 via email

@jrobinso
Contributor

jrobinso commented Dec 20, 2017 via email

@MattBashton

Hey, thanks for getting back to me. Yes, downsampling is off for these test cases. Some regions are indeed deep owing to the targeted nature of the experiment, but no deeper than I normally use with Illumina short reads, where I have no issues with IGV. My JVM has 8 GB, and I'm not anywhere near that limit, if that helps.

@winni2k

winni2k commented Jan 14, 2018

Hi all, sorry to post my bug report in this thread. A Google search for igv java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 brought me here...

I am also observing a java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 error when I switch between reference contigs on some long read data (technically assembly contigs). I have tried igv 2.4.1 and 2.4.5 and the error seems to reproduce with different bam files. Happy to post a minimal input example if this is an unknown class of errors.

@winni2k

winni2k commented Jan 14, 2018

It looks like my error is similar to #499

@jrobinso
Contributor

Hey all, this was opened as a discussion thread, for which it was really useful, but there are many disparate issues here, and so it remains perpetually open. I am going to close it. If there is a specific issue that has not been addressed and you think it should be, please open an issue focused on it, along with steps to reproduce, including test data if applicable.
