Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vec_rbind() enforces row names when input is a named list #966

Closed
thomasp85 opened this issue Mar 27, 2020 · 11 comments · Fixed by #1094
Closed

vec_rbind() enforces row names when input is a named list #966

thomasp85 opened this issue Mar 27, 2020 · 11 comments · Fixed by #1094
Assignees
Labels
Milestone

Comments

@thomasp85
Copy link
Member

When passing in a named list of data frames without row names to vec_rbind() the return value is a data frame with row names. This change can potentially trigger a hard-to-debug renaming message, e.g.

dfs <- list(a = data.frame(x = 1), a = data.frame(x = 1))
vctrs::vec_rbind(!!!dfs)
#> New names:
#> * a -> a...1
#> * a -> a...2
#>       x
#> a...1 1
#> a...2 1
@lionel-
Copy link
Member

lionel- commented Mar 28, 2020

Both vec_cbind() and vec_rbind() are sensitive to input names, so you need to unname the list.

Would with_name_repair_errors() help finding the source of the repair messages?

@thomasp85
Copy link
Member Author

I understand the reason, but this spills over into dplyr::bind_rows() where you suddenly get overwhelmed with name repair messages even though you pass in tibbles (that doesn’t have row names). I think this will end up confusing a huge amount of users (it confused both me and @hadley for a reasonable amount of time).

I think a lot of people creates tibbles in an lapply call and binds them together. lapply might decide to name the elements and this will sometimes trigger a name repair

@DavisVaughan
Copy link
Member

CRAN dplyr didn't set row names using the input names, and only used them when .id was set.

I imagine we just need to tweak bind_rows() to something like

if (is_null(.id)) {
  names(dots) <- NULL
} else {
  if (!is_string(.id)) {
    bad_args(".id", "must be a scalar string, ", "not {friendly_type_of(.id)} of length {length(.id)}")
  }
  if (!is_named(dots)) {
    names(dots) <- seq_along(dots)
  }
}

@thomasp85
Copy link
Member Author

If this is fixed in dplyr instead I’m good with it. I opened here at the request of Hadley

@lionel-
Copy link
Member

lionel- commented Mar 28, 2020

Maybe we could support a sentinel for easily zapping the names, that bind_rows() could use.

vec_rbind(.names_to = "") or vec_rbind(.names_to = zap())

@hadley
Copy link
Member

hadley commented Mar 29, 2020

Alternatively, maybe this behaviour could occur only if the data frame already had row names?

This is a case where I worry that applying the vector naming rules to row names might lead us down a painful path.

@lionel-
Copy link
Member

lionel- commented Mar 30, 2020

I think this would be confusing, I'd prefer disabling the rows-naming altogether.

Applying this behaviour in an existing function like bind_cols() can definitely cause problems. In a new function this is less clear. It might actually be positive. Since vec_cbind() is also sensitive to input names, it is good practice to think about names when splicing a list in one of the bind functions.

@hadley
Copy link
Member

hadley commented Mar 30, 2020

Why aren't these the same case?

vctrs::vec_c(a = c(1, 2, 3))
#> Error: Can't merge the outer name `a` with a vector of length > 1.
#> Please supply a `.name_spec` specification.

vctrs::vec_rbind(a = data.frame(1:3))
#>    X1.3
#> a1    1
#> a2    2
#> a3    3

Created on 2020-03-30 by the reprex package (v0.3.0)

@lionel-
Copy link
Member

lionel- commented Mar 30, 2020

It's a good idea to make these consistent. This wouldn't help with Thomas' example with 1-row data frames though.

@hadley
Copy link
Member

hadley commented Mar 30, 2020

For @thomasp85's original problem, just improving the name repair would help:

library(vctrs)

vec_rbind(
  `1` = data.frame(x = 1:11),
  `11` = data.frame(x = 1)
)
#> New names:
#> * `11` -> `11...1`
#> * `11` -> `11...12`
#>          x
#> 11...1   1
#> 12       2
#> 13       3
#> 14       4
#> 15       5
#> 16       6
#> 17       7
#> 18       8
#> 19       9
#> 110     10
#> 111     11
#> 11...12  1

Created on 2020-03-30 by the reprex package (v0.3.0)

But I really do think it should be harder to opt into row names

@lionel-
Copy link
Member

lionel- commented Mar 30, 2020

Making the sentinel zap() the default of .names_to and adding a .name_spec parameter would solve the issue.

lionel- added a commit to lionel-/vctrs that referenced this issue Apr 16, 2020
lionel- added a commit to lionel-/vctrs that referenced this issue Apr 16, 2020
@lionel- lionel- self-assigned this Apr 16, 2020
lionel- added a commit that referenced this issue Apr 16, 2020
@lionel- lionel- added this to the 0.3.0 milestone Apr 20, 2020
@lionel- lionel- added the ui label Apr 20, 2020
lionel- added a commit to lionel-/vctrs that referenced this issue May 6, 2020
For consistency with `vec_c()`.
Closes r-lib#966.
lionel- added a commit to lionel-/vctrs that referenced this issue May 7, 2020
For consistency with `vec_c()`.
Closes r-lib#966.
lionel- added a commit that referenced this issue May 7, 2020
For consistency with `vec_c()`.
Closes #966.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants