Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vec_slice() cannot slice 0-column data.frames properly #179

Closed
yutannihilation opened this issue Jan 31, 2019 · 13 comments
Closed

vec_slice() cannot slice 0-column data.frames properly #179

yutannihilation opened this issue Jan 31, 2019 · 13 comments

Comments

@yutannihilation
Copy link
Contributor

yutannihilation commented Jan 31, 2019

library(vctrs)
library(tibble)

vec_slice(tibble(.rows = 3), 1:3)
#> # A tibble: 0 x 0

vec_slice(tibble(x = tibble(.rows = 3)), 1:3)
#> Error: Positive column indexes in `[` must match number of columns:
#> * `.data` has 0 columns
#> * Position 1 equals 1
#> * Position 2 equals 2
#> * Position 3 equals 3
#> Backtrace:
#>     █
#>  1. └─vctrs::vec_slice(tibble(x = tibble(.rows = 3)), 1:3)
#>  2.   └─base::lapply(x, `[`, i)
#>  3.     ├─base:::FUN(X[[i]], ...)
#>  4.     └─tibble:::`[.tbl_df`(X[[i]], ...)
#>  5.       └─tibble:::check_names_df(i, x)
#>  6.         └─tibble:::check_names_df_numeric(j, x)

Created on 2019-01-31 by the reprex package (v0.2.1.9000)

@yutannihilation
Copy link
Contributor Author

It seems rownames are lost here:

vctrs/R/size.R

Lines 108 to 109 in ed62f1e

out <- lapply(x, vec_slice, i)
vec_restore(out, x)

@yutannihilation
Copy link
Contributor Author

yutannihilation commented Feb 7, 2019

To fix this, I feel we have to provide the expected length to vec_restore(). Otherwise it cannot know what length the result is supposed to be.

library(vctrs)

x <- tibble::tibble(.rows = 3)
out <- lapply(x, vec_slice, 1)

# the expected length is 1, but neither of out nor x know it here
vec_restore(out, x)
#> # A tibble: 0 x 0

Created on 2019-02-07 by the reprex package (v0.2.1)

Does this look OK...? if so, I'll make a PR.

vec_restore <- function (x, to, ...) {
    UseMethod("vec_restore", to)
}

vec_restore.data.frame <- function (x, to, n = NULL, ...) {
  ...
  attr_to[["row.names"]] <- .set_row_names(n %||% df_length(x))
  ...
}

@hadley
Copy link
Member

hadley commented Feb 7, 2019

I think if you have to supply arguments, it's a sign that you shouldn't be using vec_restore() — I think I may have used vec_restore() in places (like here) where it's not actually suitable.

@yutannihilation
Copy link
Contributor Author

Hmm..., thanks, I agree it's a bad sign. But, I don't think I "shouldn't be using vec_restore()," because vec_restore() is still useful (as long as it doesn't raise errors) for restoring classes and attributes except for rownames.

Probably, we can add rownames again after vec_restore(). Or, do you think it's better to remove vec_restore() here?

@hadley
Copy link
Member

hadley commented Feb 8, 2019

The more I think about it, the more I think this use of vec_restore() is inappropriate — this clause is just a hack around the slowness of [.data.frame and there's no guarantee that it will work with other data frame subclasses.

I'm not sure what we should do here given that [.tibble is much more efficient, so there's no need for the hack. I suspect we should be testing identical(class(x), "data.frame") and then calling new_data_frame() on the output of the lapply().

Alternatively, this might be resolved by #124, where we'll make vec_slice generic, so it would make more sense to provide our own specialised implementation of row-subsetting a data frame.

@yutannihilation
Copy link
Contributor Author

Thanks, given the current vec_slice() is just a hack for speed, that's convincing. Then I'll just wait for other discussions.

@hadley

This comment has been minimized.

@hadley hadley reopened this Feb 21, 2019
@yutannihilation

This comment has been minimized.

@hadley hadley closed this as completed in 3f8ded6 Feb 26, 2019
@hadley
Copy link
Member

hadley commented Feb 26, 2019

It turns out that the fixes needed for the newly focussed #199, also ended up fixing this issue. I think this is correct because I was just too aggressively throwing away the row names attribute (throwing the baby out with the bath water), but I'd appreciate it if you'd take a look at the patch and see if it makes sense to you too.

@yutannihilation
Copy link
Contributor Author

@hadley Sorry, I don't think this is "fixed." The second one below no more raises an error, but the result seems still incorrect; the dimensions should be 3 x 0, and 3 x 1.

library(vctrs)
library(tibble)

packageDescription("vctrs")["GithubSHA1"]
#> $GithubSHA1
#> [1] "3b80a81418d0f8037f86e875e80e5a73b2744485"

vec_slice(tibble(.rows = 3), 1:3)
#> # A tibble: 0 x 0

vec_slice(tibble(x = tibble(.rows = 3)), 1:3)
#> # A tibble: 0 x 1
#> # ... with 1 variable: x <lgl>

Created on 2019-02-27 by the reprex package (v0.2.1)

@hadley
Copy link
Member

hadley commented Feb 27, 2019

Ok, that's actually reassuring to me 😄

@hadley hadley reopened this Feb 27, 2019
@hadley hadley closed this as completed in 80644d1 Feb 27, 2019
@hadley
Copy link
Member

hadley commented Feb 27, 2019

Ok, I am now pretty sure I have fixed it properly — it's sort of a hack on top of a hack, but if we're going to coerce into a list to avoid expensive [.data.frame calls, then we also need to create the correct row.names attribute. In the future, we'll resolve this more rigorously by implementing vec_slice() in C, and more carefully figuring out what we need to do when x is a data.frame vs. a data frame subclass.

@yutannihilation
Copy link
Contributor Author

Confirmed the fix, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants