Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

createNormalDatabase returns error if bed has overlapping intervals #28

Closed
andrewrech opened this issue May 7, 2018 · 9 comments
Closed
Assignees
Labels

Comments

@andrewrech
Copy link
Contributor

andrewrech commented May 7, 2018

Hello Marcus,

I note that if and only if an input .bed file has overlapping intervals:

WARN [2018-05-06 21:03:02] Coverage data contains single nucleotide intervals.
WARN [2018-05-06 21:03:02] Found 2 overlapping intervals, starting at line 5352.
WARN [2018-05-06 21:03:03] Coverage data contains single nucleotide intervals.
WARN [2018-05-06 21:03:03] Found 2 overlapping intervals, starting at line 5352.

createNormalDatabase returns an error due to

split(x$average.coverage, as.character(seqnames(x)))

in PureCN:::getSexFromCoverage:

Error in split.default(x$average.coverage, as.character(seqnames(x))) :
  first argument must be a vector
Calls: <Anonymous> -> sapply -> lapply -> FUN -> split -> split.default

I am using the latest dev version of the package on the latest R. I have created about 40 normal databases without error if the overlapping intervals message does not appear.

Minimun reproducible example:

PureCN::createNormalDatabase(c("file1", "file2"))

Should I pre-filter these, or perhaps add a tryCatch here?

Thank you very much,

Andrew

@lima1 lima1 self-assigned this May 7, 2018
@lima1 lima1 added the bug label May 7, 2018
@lima1
Copy link
Owner

lima1 commented May 7, 2018

Hi Andrew,

I thought I have fixed this, but somehow you have single base intervals in your interval file:

Target total_coverage counts on_target duplication_rate
27949 chr6:47020497 NA NA TRUE 0
30730 chr7:143720806 NA NA TRUE 0
31123 chr7:13910515 NA NA TRUE 0
64593 chr17:75149091 NA NA TRUE 0
67551 chr19:38394361 NA NA TRUE 0
74053 chr22:40860678 NA NA TRUE 0

It's usually much better to use the baits, not the targets file. In the baits file, you should never have such small intervals for which the GC normalization won't work.

Also, I would only add baits from standard chromosomes (chr1-chrY).

@andrewrech
Copy link
Contributor Author

Thank you so much for your fast reply and advice.

I am confused about the difference between bait and target intervals. Bait intervals in Picard nomenclature ~= what Illumnia calls "probe" intervals?

Thank you

@lima1
Copy link
Owner

lima1 commented May 7, 2018

Yep, there are usually two BED files for every capture kit. The first is the location of the actual baits or probes, the second the targets (the exact locations of targeted exons, promotors etc.). Baits are better, because they should give a more even coverage distrbution and a better GC-normalization for short targets.

The documentation is confusing here, I'll clean this up as soon as possible.

It's still a bug, it shouldn't crash with single nucleotide intervals.

@andrewrech
Copy link
Contributor Author

Thank you for the super clear explanation!

@lima1
Copy link
Owner

lima1 commented May 7, 2018

Was this interval file generated by preprocessIntervals? I'm trying to reproduce, but I get something like this:

Target gc_bias mappability reptiming Gene on_target
seq1:101-101 0 NA NA . TRUE

not the crashing

Target gc_bias mappability reptiming Gene on_target
seq1:101 0 NA NA . TRUE

@andrewrech
Copy link
Contributor Author

Yes, it was, however possibly with an older version of the package. I also cannot reproduce this now.

@andrewrech
Copy link
Contributor Author

I just grepped these lines out of my intervals, so don't fix on my behalf at this point.

@lima1
Copy link
Owner

lima1 commented May 7, 2018

Okay, yes, I think fixed this in 1.7 something. 1.9. should also remove the KIR alt contigs (I use the hg38 version without ALTs).

@andrewrech
Copy link
Contributor Author

OK thanks. I somehow regressed to an older package version with the move to R 3.5, but I am on dev now.

Sorry for the trouble, thanks again, please feel free to close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants