Error running on customs bed file #15

Closed
rhshah opened this issue Mar 5, 2015 · 10 comments

Comments

@rhshah

rhshah commented Mar 5, 2015

Hi, I am trying to run ABRA on tumor/normal pairs using a custom BED file that I created by running GATK's FindCoveredIntervals tool. We want to run ABRA on off-target regions as well, for better off-target variant calling. While doing this I got the following error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4143
at abra.CompareToReference2.getBaseAsChar(CompareToReference2.java:368)
at abra.CompareToReference2.getSequence(CompareToReference2.java:401)
at abra.ReAligner.cleanAndOutputContigs(ReAligner.java:952)
at abra.ReAligner.alignAndCleanContigs(ReAligner.java:523)
at abra.ReAligner.reAlign(ReAligner.java:188)
at abra.ReAligner.run(ReAligner.java:1306)
at abra.Abra.main(Abra.java:12)

I would appreciate your insights on this error and how to avoid it.

@mozack
Owner

mozack commented Mar 5, 2015

Could you please send me your bed file for this sample along with a BAM header (or something that lists the reference sequences and lengths)?
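For anyone triaging a similar report, a minimal sketch of the kind of check that a BED file plus a list of reference lengths makes possible is shown below. It is not part of ABRA: the class name `BedBoundsCheck`, the method `flagNearEnds`, and the `margin` parameter are hypothetical, and the contig lengths in the demo are illustrative values that should be taken from your own BAM header (`@SQ` lines) or sequence dictionary. BED coordinates are assumed to be 0-based, half-open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BedBoundsCheck {

    /**
     * Returns the BED lines that reference an unknown contig, or whose interval
     * starts or ends within `margin` bases of either end of the contig
     * (including intervals that run past the contig end).
     */
    static List<String> flagNearEnds(List<String> bedLines,
                                     Map<String, Integer> contigLengths,
                                     int margin) {
        List<String> flagged = new ArrayList<>();
        for (String line : bedLines) {
            // Skip blank lines and BED header lines.
            if (line.isEmpty() || line.startsWith("#")
                    || line.startsWith("track") || line.startsWith("browser")) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields.length < 3) {
                continue; // not a usable BED record
            }
            Integer length = contigLengths.get(fields[0]);
            int start = Integer.parseInt(fields[1]);
            int end = Integer.parseInt(fields[2]);
            if (length == null || start < margin || end > length - margin) {
                flagged.add(line);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Illustrative lengths only; read the real ones from the BAM header.
        Map<String, Integer> lengths = new LinkedHashMap<>();
        lengths.put("1", 249250621);
        lengths.put("MT", 16569);

        List<String> bed = Arrays.asList(
                "1\t1000000\t1000500",
                "MT\t16400\t16569");

        for (String line : flagNearEnds(bed, lengths, 100)) {
            System.out.println("near contig end: " + line);
        }
    }
}
```

Flagged intervals are not necessarily invalid; they are simply the ones worth inspecting first when an out-of-bounds error appears.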

@rhshah
Author

rhshah commented Mar 6, 2015

I am not sure how to attach a text file here. I tried replying to your email, but it failed. Let me know where I should send those files.

@mozack
Owner

mozack commented Mar 6, 2015

Please send to: lmose at unc dot edu

@mozack
Owner

mozack commented Mar 6, 2015

Thanks. You've uncovered a bug in our handling of contigs mapping near the ends of chromosomes. I've committed a fix (lightly tested so far) to the head. Unfortunately, the head has some other recent commits that should undergo additional testing. I expect to have a release including this change available sometime next week.

If you need this urgently, applying this same commit to a previous release should work fine. I can help with that if needed.

Here's the change:
b0a2e67

For your specific test case, the problematic contig appears to be mapping near the end of chromosome MT.
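To illustrate the general class of bug (reading a reference slice that runs past the end of a chromosome), here is a minimal sketch of a bounds-clamped lookup. It is not the code from the commit above and not ABRA's actual `CompareToReference2` implementation; the class `SafeReferenceSlice`, its method signature, and the 1-based `position` convention are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

final class SafeReferenceSlice {

    /**
     * Returns up to `length` bases starting at 1-based `position`.
     * The result is truncated at the end of the sequence instead of
     * indexing past the array, and is empty if the start is out of range.
     */
    static String getSequence(byte[] bases, int position, int length) {
        int start = Math.max(0, position - 1);
        // Use long arithmetic so start + length cannot overflow.
        int end = (int) Math.min((long) bases.length,
                                 (long) start + Math.max(0, length));
        if (start >= end) {
            return "";
        }
        return new String(bases, start, end - start, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] chrom = "ACGTACGT".getBytes(StandardCharsets.US_ASCII);
        // A request that would run 3 bases past the end of the sequence.
        System.out.println(getSequence(chrom, 7, 5));
    }
}
```

Whether truncating, padding, or rejecting the request is the right behavior depends on how the caller uses the slice; truncation is shown here only because it is the simplest option.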

Lastly, I noticed from your logs that you are passing in kmer values on the command line. Abra now can automatically calculate appropriate kmer sizes on a per region basis. We see improved results using this approach. Just omit the kmer param if you'd like to give it a try.

@rhshah
Author

rhshah commented Mar 6, 2015

Cool, thanks for your quick reply; I appreciate the quick fix. I will wait for you to release the new code. Do you have a summary of improvements for your next release? Also, do you know if we can make this code work for amplicon-based datasets, where reads have fixed start and stop sites?

I know about the automatic k-mer size selection. I was testing it for our next pipeline release, and I will make sure we test without the k-mer values once you upload the new code.

Just FYI: we would also like to thank you for this amazing tool. One of my summer high school students evaluated it last year, and here is his poster:
http://www.slideshare.net/rshah7/final-posterhopp

@mozack
Owner

mozack commented Mar 6, 2015

Wow, thanks for the feedback!

We don't typically use the fixed start/stop amplicon datasets you've described. If you have a test set that you are able to share I'd be happy to take a look. I am a bit skeptical though as the assembly generally works better with some read complexity across the variant. Are you using this amplicon method for discovery or for validation?

Will put together notes describing the changes in next week's release.

@rhshah
Author

rhshah commented Mar 9, 2015

Thanks for working on this. We are using the amplicon data for discovery of known variation, and we are missing some variants. I agree that the lack of read complexity will make this hard. I will mail you the scrubbed data and we can go from there.

Thanks,
Ronak

@mozack
Owner

mozack commented Mar 13, 2015

Sorry, but the forthcoming release is going to have to slide to next week.

@rhshah
Author

rhshah commented Mar 13, 2015

OK, thanks for keeping me in the loop. I am currently gathering scrubbed amplicon-based data for testing and will update you once I have it.

@mozack
Owner

mozack commented Mar 24, 2015

The original bug reported should be resolved in v0.91. Please let me know if you run into any more problems.

@mozack mozack closed this as completed Mar 24, 2015