pivot_wider() with id_cols of length 0 and unused_fn #698

Closed
Darxor opened this issue Nov 29, 2022 · 3 comments · Fixed by #789
Labels
feature

Comments

@Darxor
Contributor

Darxor commented Nov 29, 2022

Currently specifying character(0) / numeric(0) in id_cols leads to behavior similar to NULL (the default), but also produces a warning:

df <- data.frame(
  a   = LETTERS[1:2],
  b   = LETTERS[3:4],
  val = 1:2
)

df |>
  tidytable::pivot_wider(
    id_cols = character(0),
    names_from = a,
    values_from = val
  )
#> Warning in `[.data.table`(~.df, , `:=`(., NULL)): Column '.' does not exist to
#> remove
#> # A tidytable: 2 × 3
#>   b         A     B
#>   <chr> <int> <int>
#> 1 C         1    NA
#> 2 D        NA     2

Created on 2022-11-29 with reprex v2.0.2

{tidyr} handles it differently. It applies the function passed via the unused_fn argument (by default, unused columns are dropped) to all columns not mentioned in id_cols, names_from, or values_from, which leads to this:

df <- data.frame(
  a   = LETTERS[1:2],
  b   = LETTERS[3:4],
  val = 1:2
)

df |>
  tidyr::pivot_wider(
    id_cols = character(0),
    names_from = a,
    values_from = val
  )
#> # A tibble: 1 × 2
#>       A     B
#>   <int> <int>
#> 1     1     2

Created on 2022-11-29 with reprex v2.0.2
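
For comparison, here is a minimal sketch of what a non-NULL unused_fn does in {tidyr}, assuming a {tidyr} version that provides the unused_fn argument (believed to be 1.2.0 or later). The choice of max is only for illustration; any function that returns a single value per group should behave similarly.

```r
library(tidyr)

df <- data.frame(
  a   = LETTERS[1:2],
  b   = LETTERS[3:4],
  val = 1:2
)

# With no id columns the data collapses to a single row, and the unused
# column `b` is summarised by unused_fn instead of being dropped.
res <- df |>
  pivot_wider(
    id_cols = character(0),
    names_from = a,
    values_from = val,
    unused_fn = max
  )

res
```

The result should keep the pivoted A and B columns and add a summarised b column alongside them.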

Another note: with id_cols = NULL, {tidyr} treats all columns not mentioned in names_from or values_from as id_cols, which affects id_expand (also not implemented yet, but something to keep in mind for the future).
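
A small sketch of that default, again assuming a {tidyr} version that has id_expand. The extra factor level "E" is made up for the example, so that id_expand has something to expand.

```r
library(tidyr)

df <- data.frame(
  a   = LETTERS[1:2],
  b   = LETTERS[3:4],
  val = 1:2
)

# id_cols = NULL (the default): `b` is not named in names_from or
# values_from, so it is used as an id column and each row keeps its own `b`.
wide_default <- pivot_wider(df, names_from = a, values_from = val)
names(wide_default)

# id_expand acts on those implied id columns. Here `b` becomes a factor
# with an unobserved level "E" (hypothetical, for illustration only).
df_fct <- df
df_fct$b <- factor(df_fct$b, levels = c("C", "D", "E"))

wide_expanded <- pivot_wider(
  df_fct,
  names_from = a,
  values_from = val,
  id_expand = TRUE
)
wide_expanded
```

The second call should contain a row for the unobserved level as well, which is why the way id_cols is resolved matters for id_expand.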

I can tackle this if you want, though this may lead to a rather big change in how pivot_wider() is coded 😅

@markfairbanks markfairbanks added the feature feature label Nov 29, 2022
@markfairbanks
Owner

markfairbanks commented Nov 29, 2022

Interesting - I wasn't aware of this functionality.

> I can tackle this if you want, though this may lead to a rather big change in how pivot_wider() is coded

Yeah if you want to take a shot at this feel free to.

I think this is the general approach we need. But let me know if I'm missing anything.

Also - this is sort of pseudo-code below, I don't know if this is exactly how it will work.

Once unused_cols are identified there are two parts:

  1. If is.null(unused_fn) the columns are dropped from the data frame pre-pivoting (relatively straightforward)
  2. If !is.null(unused_fn) we need to aggregate the unused_cols with the unused_fn

For part 2 - wouldn't we basically just need something like this?

(Probably with a conditional if statement)

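# Keep only the id columns and the unused columns
unused_df <- select(.df, all_of(id_cols), all_of(unused_cols))
# Aggregate each unused column with unused_fn, once per id group
unused_df <- summarize(unused_df, across(all_of(unused_cols), unused_fn), .by = all_of(id_cols))

# Attach the aggregated columns to the pivoted result
# (as written this would duplicate id_cols and assumes matching row order)
out <- bind_cols(out, unused_df)

# And maybe a relocate step to have the correct column order? Or maybe select?
# out <- select(out, all_of(id_cols), all_of(unused_cols), everything())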
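
The two parts described above can be sketched in a runnable form. This is only an illustration written with {dplyr}, not the code that ended up in tidytable; the helper name summarise_unused() and its arguments are hypothetical.

```r
library(dplyr)

df <- data.frame(
  a   = LETTERS[1:2],
  b   = LETTERS[3:4],
  val = 1:2
)

id_cols     <- character(0)
names_from  <- "a"
values_from <- "val"

# Unused columns: everything not used as an id, a name, or a value
unused_cols <- setdiff(names(df), c(id_cols, names_from, values_from))

# Hypothetical helper covering both parts
summarise_unused <- function(.df, id_cols, unused_cols, unused_fn = NULL) {
  if (is.null(unused_fn)) {
    # Part 1: drop the unused columns before pivoting
    return(.df[setdiff(names(.df), unused_cols)])
  }
  # Part 2: aggregate the unused columns, per id group if there are id columns
  grouped <- if (length(id_cols) > 0) {
    group_by(.df, across(all_of(id_cols)))
  } else {
    .df
  }
  summarise(grouped, across(all_of(unused_cols), unused_fn), .groups = "drop")
}

dropped    <- summarise_unused(df, id_cols, unused_cols)
aggregated <- summarise_unused(df, id_cols, unused_cols, unused_fn = max)
by_group   <- summarise_unused(df, "a", unused_cols, unused_fn = max)
```

With no id columns the aggregated result is a single row, which lines up with the single-row pivot that {tidyr} returns for id_cols = character(0). With id columns present, a join on id_cols is probably a safer way to attach the result than binding columns by position.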

@Darxor
Contributor Author

Darxor commented Nov 30, 2022

> Interesting - I wasn't aware of this functionality.

I think my approach to the tidyverse sometimes relies on lesser-known functionality, as you can see from my issues, haha.

I will look into this, yeah! That approach sounds about right. I will have to think about how to identify id_cols and unused_cols correctly (the order of operations concerns me a bit). I've run some of the {tidyr} tests that seem related to this, and they currently fail.

BTW, I think it's also a good idea to port some tests over from tidyverse and look for missing / non-matching / broken functionality between packages.

@markfairbanks
Owner

All set - sorry this one took so long to get to!

library(tidytable)

df <- data.frame(
  a   = LETTERS[1:2],
  b   = LETTERS[3:4],
  val = 1:2
)

df %>%
  pivot_wider(
    id_cols = character(0),
    names_from = a,
    values_from = val
  )
#> # A tidytable: 1 × 2
#>       A     B
#>   <int> <int>
#> 1     1     2
