left joins fail on granges if right has no mcols and some left ranges do not overlap #70

Closed
ripkrizbi opened this issue Oct 29, 2019 · 2 comments

@ripkrizbi

Hi, here is a quick example to demonstrate:

# gr0 has ranges on chr5 while gr1 doesn't
gr0 <- GRanges(Rle(c("chr2", "chr5", "chr1", "chr3"), c(1, 3, 2, 4)),
                IRanges(1:10, width=10:1))

gr1 <- GRanges(Rle(c("chr2", "chr3", "chr1", "chr3"), c(1, 3, 2, 4)),
                IRanges(1:10, width=10:1))

# left join fails
gr0 %>% join_overlap_left(gr1)

## Error in DataFrame(..., check.names = FALSE) : 
##   different row counts implied by arguments

# add some mcols to rhs
gr1$AAA <- 1

# and it starts to work :)
gr0 %>% join_overlap_left(gr1)

## GRanges object with 36 ranges and 1 metadata column:
##        seqnames    ranges strand |       AAA
##           <Rle> <IRanges>  <Rle> | <numeric>
##    [1]     chr2      1-10      * |         1
##    [2]     chr5      2-10      * |      <NA>
##    [3]     chr5      3-10      * |      <NA>
##    [4]     chr5      4-10      * |      <NA>
##    [5]     chr1      5-10      * |         1
##    ...      ...       ...    ... .       ...
##   [32]     chr3        10      * |         1
##   [33]     chr3        10      * |         1
##   [34]     chr3        10      * |         1
##   [35]     chr3        10      * |         1
##   [36]     chr3        10      * |         1
##   -------
##   seqinfo: 5 sequences from an unspecified genome; no seqlengths
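
The workaround above can be made tidier by dropping the dummy column again after the join. This is only a sketch of a user-side workaround, not a fix in plyranges itself; the column name `placeholder` is made up for illustration, and I am assuming the join behaves the same way it does with `AAA` above.

```r
library(GenomicRanges)
library(plyranges)

gr0 <- GRanges(Rle(c("chr2", "chr5", "chr1", "chr3"), c(1, 3, 2, 4)),
               IRanges(1:10, width = 10:1))
gr1 <- GRanges(Rle(c("chr2", "chr3", "chr1", "chr3"), c(1, 3, 2, 4)),
               IRanges(1:10, width = 10:1))

# `placeholder` is a hypothetical column name; any single mcol on the
# right-hand side appears to be enough to avoid the error
gr1$placeholder <- 1

res <- gr0 %>% join_overlap_left(gr1)

# remove the dummy column so the result carries no metadata,
# which is what the join on the original gr1 should have returned
res$placeholder <- NULL
res
```

The result should have the same 36 ranges as the output above, just without the extra metadata column.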

I have traced the issue to the piece of code that constructs the mcols for the outer (non-overlapping) matches. When the right-hand ranges have no mcols to start with, it should generate a DataFrame with the appropriate number of rows and zero columns. Instead it always seems to generate a 0 x 0 DataFrame, which then fails to merge with the (also empty) mcols frame on the right. If the right ranges carry some metadata, everything works without a glitch.

https://github.com/sa-lee/plyranges/blob/da3c4f40292ac3a4cd0ad9c4b383ca9ab0a3db91/R/ranges-overlap-joins-outer.R#L44
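
To illustrate the difference between the two kinds of "empty" DataFrame, here is a minimal sketch using plain GenomicRanges/S4Vectors. It is not the plyranges code at the line linked above. `make_zero_col_DFrame()` is an S4Vectors helper that I am assuming is available in the installed version, and the object names are made up for the example.

```r
library(GenomicRanges)

gr <- GRanges("chr1", IRanges(1:3, width = 5))

# mcols() of a GRanges without metadata still has one row per range
dim(mcols(gr))
## [1] 3 0

# a bare DataFrame() has neither rows nor columns
dim(DataFrame())
## [1] 0 0

# a zero-column placeholder with one row per unmatched range
# (assumes S4Vectors exports make_zero_col_DFrame())
n_unmatched <- 4L
placeholder <- S4Vectors::make_zero_col_DFrame(n_unmatched)
dim(placeholder)
## [1] 4 0
```

The second form is what seems to be produced for the unmatched ranges, whereas something like the third form, with one row per unmatched range, is what would line up with the rest of the join.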

@sa-lee
Collaborator

sa-lee commented Oct 29, 2019

Thanks for the report. I will get around to this in the next week or so. In the meantime you are more than welcome to contribute a PR that fixes this issue :)

@sa-lee
Collaborator

sa-lee commented Nov 6, 2019

This should be fixed now. I've pushed the changes to both release and devel branches on Bioconductor.

@sa-lee sa-lee closed this as completed Nov 6, 2019