factor values in pivot_longer.() are converted to character #202

Closed
marianschmidt opened this issue Mar 2, 2021 · 9 comments
Labels
feature

Comments

@marianschmidt

First, great work on the implementation of .value in pivot_longer.(). Even with fast_pivot=FALSE, performance is great.

One last small improvement I would suggest is that the function should preserve the original column types. In the example below, the df$color column should be a factor. This is maintained by pivot_wider.() [so wide_df$color.1, wide_df$color.2 etc. are factors using the default setting]. But with pivot_longer.(), the df$color column is converted to character.

Also, I am not sure whether the fast_pivot option is working as intended, or maybe just the description is not correct. In the case below where names_to = c(".value", "nr_of_bike"), fast_pivot=TRUE only makes "nr_of_bike" a factor, but not the ".value" columns.

For reference, I have also added the tidyr::pivot_longer() code to show that tidyverse preserves column types with default settings.

#reprex tidytable pivot_longer.()

library(tidytable, warn.conflicts = FALSE)
library(tidyverse, warn.conflicts = FALSE)
set.seed(2048)

rows <- 100
ids <- 50
#simple data set with many different IDs (100 rows here, 4 cols)
df <- tibble(id = as.character(sample(1:ids, size = rows, replace = TRUE)), #using character variable as ID
                bike = sample(c("mountain", "allround", "road", "bmx", NA_character_), size = rows, replace = TRUE),
                year = sample(1980:2020, size = rows, replace = TRUE),
             color = factor(sample(c("silver", "green", "blue", NA_character_), size = rows, replace = TRUE))) %>%
  #calculate a chronological counter for bike per id
  tidytable::arrange.(id, year) %>%
  #calculate new renumbered variable group by case_id_var
  tidytable::mutate.(nr_of_bike = as.integer(tidytable::row_number.()), .by = id)
#by creating one line per id and repeat all vars nr_of_bike times. New vars have .nr as suffix

### get names from df to provide variables that need to be transposed to pivot function
trans_vars <- names(df)[!names(df) %in% c("id", "nr_of_bike")]


### perform pivot_wider
wide_df <- df %>% 
  tidytable::pivot_wider.(
  names_from = "nr_of_bike", 
  values_from = tidyselect::all_of(trans_vars),
  names_sep = "."
)

### show wide_df as result
wide_df
#> # tidytable [45 x 16]
#>    id    bike.1 bike.2 bike.3 bike.4 bike.5 year.1 year.2 year.3 year.4 year.5
#>    <chr> <chr>  <chr>  <chr>  <chr>  <chr>   <int>  <int>  <int>  <int>  <int>
#>  1 1     road   allro~ mount~ <NA>   <NA>     1984   1990   1993     NA     NA
#>  2 10    road   allro~ <NA>   <NA>   <NA>     1982   1987   2009     NA     NA
#>  3 11    road   road   bmx    mount~ <NA>     1983   2000   2003   2006     NA
#>  4 12    allro~ mount~ bmx    <NA>   <NA>     1984   1988   2018     NA     NA
#>  5 13    allro~ <NA>   <NA>   <NA>   <NA>     1990     NA     NA     NA     NA
#>  6 14    allro~ <NA>   <NA>   <NA>   <NA>     1983     NA     NA     NA     NA
#>  7 15    mount~ <NA>   <NA>   <NA>   <NA>     2004     NA     NA     NA     NA
#>  8 16    bmx    <NA>   <NA>   <NA>   <NA>     1999     NA     NA     NA     NA
#>  9 17    mount~ road   allro~ <NA>   <NA>     2004   2006   2012     NA     NA
#> 10 18    bmx    <NA>   <NA>   <NA>   <NA>     1983     NA     NA     NA     NA
#> # ... with 35 more rows, and 5 more variables: color.1 <fct>, color.2 <fct>,
#> #   color.3 <fct>, color.4 <fct>, color.5 <fct>

#pivot_wider.() preserves the original column type of color as a factor
### get variable names for pivot_longer (all variables that have a number suffix after the dot)
varying_vars <- colnames(wide_df) %>% stringr::str_subset(.,
                                                          paste0("\\.", "(?=[:digit:]$|(?=[:digit:](?=[:digit:]$))|(?=N(?=A$)))"))
constant_vars <- colnames(wide_df)[!colnames(wide_df) %in% c(varying_vars)]


### perform tidyr::pivot_longer()
wide_df %>% tidyr::pivot_longer(
  -c(tidyselect::all_of(constant_vars)),
  names_to = c(".value", "nr_of_bike"),
  names_pattern = "(.*)\\.(.*)",
  values_drop_na = TRUE
)
#> # A tibble: 100 x 5
#>    id    nr_of_bike bike      year color 
#>    <chr> <chr>      <chr>    <int> <fct> 
#>  1 1     1          road      1984 green 
#>  2 1     2          allround  1990 blue  
#>  3 1     3          mountain  1993 silver
#>  4 10    1          road      1982 green 
#>  5 10    2          allround  1987 silver
#>  6 10    3          <NA>      2009 blue  
#>  7 11    1          road      1983 <NA>  
#>  8 11    2          road      2000 green 
#>  9 11    3          bmx       2003 silver
#> 10 11    4          mountain  2006 blue  
#> # ... with 90 more rows
#tidyr preserves column type of "color" when pivoting longer


### perform tidytable::pivot_longer.()
wide_df %>% tidytable::pivot_longer.(
  -c(tidyselect::all_of(constant_vars)),
  names_to = c(".value", "nr_of_bike"),
  names_pattern = "(.*)\\.(.*)",
  values_drop_na = TRUE,
  fast_pivot = FALSE
) %>%
  #sort by id and nr_of_bike
  arrange.(id, nr_of_bike)
#> # tidytable [100 x 5]
#>    id    nr_of_bike bike      year color 
#>    <chr> <chr>      <chr>    <int> <chr> 
#>  1 1     1          road      1984 green 
#>  2 1     2          allround  1990 blue  
#>  3 1     3          mountain  1993 silver
#>  4 10    1          road      1982 green 
#>  5 10    2          allround  1987 silver
#>  6 10    3          <NA>      2009 blue  
#>  7 11    1          road      1983 <NA>  
#>  8 11    2          road      2000 green 
#>  9 11    3          bmx       2003 silver
#> 10 11    4          mountain  2006 blue  
#> # ... with 90 more rows

#tidytable changes column type of "color" when pivoting longer

Created on 2021-03-02 by the reprex package (v1.0.0)

Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.3 (2020-10-10)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2021-03-02                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source                                  
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)                          
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.0.3)                          
#>  broom         0.7.5   2021-02-19 [1] CRAN (R 4.0.3)                          
#>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.0.0)                          
#>  cli           2.3.1   2021-02-23 [1] CRAN (R 4.0.3)                          
#>  colorspace    2.0-0   2020-11-11 [1] CRAN (R 4.0.3)                          
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.3)                          
#>  data.table    1.14.0  2021-02-21 [1] CRAN (R 4.0.3)                          
#>  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.0.3)                          
#>  dbplyr        2.1.0   2021-02-03 [1] CRAN (R 4.0.3)                          
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.3)                          
#>  dplyr       * 1.0.4   2021-02-02 [1] CRAN (R 4.0.3)                          
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.0)                          
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)                          
#>  fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.3)                          
#>  forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.0.3)                          
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)                          
#>  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.3)                          
#>  ggplot2     * 3.3.3   2020-12-30 [1] CRAN (R 4.0.3)                          
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)                          
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.0.0)                          
#>  haven         2.3.1   2020-06-01 [1] CRAN (R 4.0.0)                          
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.0)                          
#>  hms           1.0.0   2021-01-13 [1] CRAN (R 4.0.3)                          
#>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)                          
#>  httr          1.4.2   2020-07-20 [1] CRAN (R 4.0.2)                          
#>  jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.0.3)                          
#>  knitr         1.31    2021-01-27 [1] CRAN (R 4.0.3)                          
#>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.3)                          
#>  lubridate     1.7.9.2 2020-11-13 [1] CRAN (R 4.0.3)                          
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.3)                          
#>  modelr        0.1.8   2020-05-19 [1] CRAN (R 4.0.0)                          
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.0.0)                          
#>  pillar        1.5.0   2021-02-22 [1] CRAN (R 4.0.3)                          
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)                          
#>  ps            1.5.0   2020-12-05 [1] CRAN (R 4.0.3)                          
#>  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.0.0)                          
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.3)                          
#>  Rcpp          1.0.6   2021-01-15 [1] CRAN (R 4.0.3)                          
#>  readr       * 1.4.0   2020-10-05 [1] CRAN (R 4.0.2)                          
#>  readxl        1.3.1   2019-03-13 [1] CRAN (R 4.0.0)                          
#>  reprex        1.0.0   2021-01-27 [1] CRAN (R 4.0.3)                          
#>  rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.3)                          
#>  rmarkdown     2.7     2021-02-19 [1] CRAN (R 4.0.3)                          
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.3)                          
#>  rvest         0.3.6   2020-07-25 [1] CRAN (R 4.0.2)                          
#>  scales        1.1.1   2020-05-11 [1] CRAN (R 4.0.0)                          
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)                          
#>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)                          
#>  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.0.0)                          
#>  tibble      * 3.1.0   2021-02-25 [1] CRAN (R 4.0.3)                          
#>  tidyr       * 1.1.2   2020-08-27 [1] CRAN (R 4.0.2)                          
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.0)                          
#>  tidytable   * 0.5.8.9 2021-03-02 [1] Github (markfairbanks/tidytable@7ab2e20)
#>  tidyverse   * 1.3.0   2019-11-21 [1] CRAN (R 4.0.0)                          
#>  utf8          1.1.4   2018-05-24 [1] CRAN (R 4.0.0)                          
#>  vctrs         0.3.6   2020-12-17 [1] CRAN (R 4.0.3)                          
#>  withr         2.4.1   2021-01-26 [1] CRAN (R 4.0.3)                          
#>  xfun          0.21    2021-02-10 [1] CRAN (R 4.0.3)                          
#>  xml2          1.3.2   2020-04-23 [1] CRAN (R 4.0.0)                          
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)                          
#> 
#> [1] C:/Users/ga27jar/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.3/library
@markfairbanks
Owner

markfairbanks commented Mar 2, 2021

Good catch on this one - honestly it's odd that data.table::melt() doesn't preserve types. I'll have to come up with a workaround on this one.

data.table::melt() has two options:

  • variable.factor (defaults to TRUE)
    • This is what fast_pivot sets to TRUE. So the description is correct.
  • value.factor (defaults to FALSE)

You would think that data.table would preserve types, but the value.factor option seems to override the default types of the dataset.
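To make the two options concrete, here is a minimal sketch that calls data.table::melt() directly on a toy table (the names dt, long_chr and long_fct are made up for illustration). It assumes that value.factor is the only thing deciding whether factor values come back as character or as factor:

```r
library(data.table)

# Two factor columns to be melted into a single value column
dt <- data.table(a = factor("a"), b = factor("b"))

# Default value.factor = FALSE: factor values come back as character
long_chr <- melt(dt, measure.vars = c("a", "b"), variable.factor = FALSE)
class(long_chr$value)

# value.factor = TRUE: the value column is returned as a factor
long_fct <- melt(dt, measure.vars = c("a", "b"),
                 variable.factor = FALSE, value.factor = TRUE)
class(long_fct$value)
```

variable.factor is set to FALSE in both calls so that only the value column differs between the two results.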

@markfairbanks
Owner

markfairbanks commented Mar 2, 2021

Smaller reprex:

pacman::p_load(tidytable, tidyverse, data.table)

test_df <- tidytable(a = factor("a"), b = factor("b"))

# tidyr
test_df %>%
  pivot_longer(everything())
#> # A tibble: 2 x 2
#>   name  value
#>   <chr> <fct>
#> 1 a     a    
#> 2 b     b

# tidytable
test_df %>%
  pivot_longer.()
#> # tidytable [2 × 2]
#>   name  value
#>   <chr> <chr>
#> 1 a     a    
#> 2 b     b

For comparison:

pacman::p_load(tidytable, tidyverse, data.table)

test_df <- tidytable(a = factor("a"), b = factor("b"))

# data.table
test_df %>%
  melt(measure.vars = names(test_df), variable.factor = FALSE) %>%
  as_tidytable()
#> # tidytable [2 × 2]
#>   variable value
#>   <chr>    <chr>
#> 1 a        a    
#> 2 b        b

@marianschmidt
Author

marianschmidt commented Mar 2, 2021

I think this issue exists independently of the fast_pivot option, doesn't it?

BTW, in this complex setting of reshaping from many columns to many columns (550 wide_df columns reshaped to about 140 long_df columns, with the number of rows varying between 10,000 and 500,000), I observe that fast_pivot=TRUE is not faster than fast_pivot=FALSE. Still, everything is much faster and much less memory-consuming than the good old reshape2 package (I don't show reshape2 below, to make the comparison between tidyr, tidytable, and tidytable_fast clearer).

[Image: benchmark comparison of tidyr, tidytable, and tidytable_fast]

@markfairbanks
Owner

markfairbanks commented Mar 2, 2021

I think this issue exists independently of the fast_pivot option, doesn't it?

Yep, this issue only has to do with the value.factor setting. fast_pivot adjusts variable.factor.

If the values are factors you want value.factor = TRUE; if the values are characters you want value.factor = FALSE. I'm guessing there's a case where different values can be different types (I might be wrong on this one...).

BTW, in this complex setting of reshaping from many columns to many columns (550 wide_df columns to reshape to about 140 long_df columns with varying number of rows 10,000-500,000), I observe that fast_pivot=TRUE is not faster than fast_pivot=FALSE

I think this option only seems to help with simpler pivoting where there is only one "value" column:

pacman::p_load(tidytable)

test_df <- map_dfc.(letters, ~ tidytable(!!.x := 1:50000))

bench::mark(
  normal = pivot_longer.(test_df),
  fast_pivot = pivot_longer.(test_df, fast_pivot = TRUE),
  check = FALSE
)
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 normal      10.43ms  12.13ms      78.5   15.55MB     89.7
#> 2 fast_pivot   2.53ms   3.28ms     246.     9.95MB    121.

@markfairbanks markfairbanks added the feature label Mar 5, 2021
@markfairbanks markfairbanks changed the title Preserve column types for .value in pivot_longer.() Preserve column types for .value in pivot_longer.() Mar 19, 2021
@markfairbanks markfairbanks changed the title from "Preserve column types for .value in pivot_longer.()" to "factor values in pivot_longer.() are converted to character" Apr 5, 2021
@markfairbanks
Owner

markfairbanks commented Apr 5, 2021

New reprex that pinpoints the issue:

pacman::p_load(tidytable)

fct_df <- tidytable(x = factor("a"), y = factor("b"))
chr_df <- tidytable(x = "a", y = "b")
dbl_df <- tidytable(x = 1, y = 2)

df_list <- list(fct_df, chr_df, dbl_df)

df_list %>%
  map.(pivot_longer.)
#> [[1]]
#> # A tidytable: 2 × 2
#>   name  value
#>   <chr> <chr>
#> 1 x     a    
#> 2 y     b    
#> 
#> [[2]]
#> # A tidytable: 2 × 2
#>   name  value
#>   <chr> <chr>
#> 1 x     a    
#> 2 y     b    
#> 
#> [[3]]
#> # A tidytable: 2 × 2
#>   name  value
#>   <chr> <dbl>
#> 1 x         1
#> 2 y         2

Looks like this only affects the cases where all of the cols selected are factors.

@markfairbanks
Owner

Solution - check if the columns are factor and adjust value.factor in data.table::melt().

Spot check of the time cost:

pacman::p_load(tidytable)

df_names <- c(letters, LETTERS, paste0(letters, LETTERS))

test_df <- df_names %>%
  map_dfc.(~ tidytable(!!.x := sample(as.factor(letters), 1000000, TRUE)))

factor_check <- function(data, cols) {
  all_names <- names(data)
  
  # use the `data` argument rather than the global test_df
  fct_flag <- map_lgl.(data, is.factor)
  names(fct_flag) <- all_names
  
  values_factor <- all(fct_flag[cols])
  
  values_factor
}

bench::mark(
  type_check = factor_check(test_df, df_names[1:50]),
  iterations = 50
)
#> # A tibble: 1 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 type_check    107µs    116µs     4273.    12.6KB        0
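As a rough illustration of the proposed solution, the sketch below wires a factor check into a direct data.table::melt() call. The helper name melt_keep_factor is hypothetical and this is not tidytable's actual implementation; it only assumes that value.factor should be TRUE exactly when every selected column is a factor:

```r
library(data.table)

# Hypothetical helper: melt `cols`, keeping factor values only when
# every selected column is a factor
melt_keep_factor <- function(data, cols) {
  all_factor <- all(vapply(data[, cols, with = FALSE], is.factor, logical(1)))
  melt(data, measure.vars = cols,
       variable.factor = FALSE, value.factor = all_factor)
}

fct_dt <- data.table(x = factor("a"), y = factor("b"))
chr_dt <- data.table(x = "a", y = "b")

# All-factor input keeps a factor value column
class(melt_keep_factor(fct_dt, c("x", "y"))$value)

# Character input stays character
class(melt_keep_factor(chr_dt, c("x", "y"))$value)
```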

@markfairbanks
Owner

All set:

pacman::p_load(tidytable)

test_df <- tidytable(a = factor("a"), b = factor("b"))

test_df %>%
  pivot_longer.()
#> # A tidytable: 2 × 2
#>   name  value
#>   <chr> <fct>
#> 1 a     a    
#> 2 b     b

@marianschmidt
Author

Hi @markfairbanks ,

I have just installed your latest dev version from GitHub, and the reprex at the top still results in tidytable::pivot_longer.() converting the factor color column to a character column.
Additionally, the new dev version rearranges the columns in the long tidytable, with year now appearing after color.

#reprex tidytable pivot_longer.()

library(tidytable, warn.conflicts = FALSE)
library(tidyverse, warn.conflicts = FALSE)
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'dplyr' was built under R version 4.0.4
set.seed(2048)

rows <- 100
ids <- 50
#simple data set with many different IDs (100 rows here, 4 cols)
df <- tibble(id = as.character(sample(1:ids, size = rows, replace = TRUE)), #using character variable as ID
                bike = sample(c("mountain", "allround", "road", "bmx", NA_character_), size = rows, replace = TRUE),
                year = sample(1980:2020, size = rows, replace = TRUE),
             color = factor(sample(c("silver", "green", "blue", NA_character_), size = rows, replace = TRUE))) %>%
  #calculate a chronological counter for bike per id
  tidytable::arrange.(id, year) %>%
  #calculate new renumbered variable group by case_id_var
  tidytable::mutate.(nr_of_bike = as.integer(tidytable::row_number.()), .by = id)
#by creating one line per id and repeat all vars nr_of_bike times. New vars have .nr as suffix

### get names from df to provide variables that need to be transposed to pivot function
trans_vars <- names(df)[!names(df) %in% c("id", "nr_of_bike")]


### perform pivot_wider
wide_df <- df %>% 
  tidytable::pivot_wider.(
  names_from = "nr_of_bike", 
  values_from = tidyselect::all_of(trans_vars),
  names_sep = "."
)

### show wide_df as result
wide_df
#> # A tidytable: 45 x 16
#>    id    bike.1  bike.2  bike.3 bike.4 bike.5 year.1 year.2 year.3 year.4 year.5
#>    <chr> <chr>   <chr>   <chr>  <chr>  <chr>   <int>  <int>  <int>  <int>  <int>
#>  1 1     road    allrou~ mount~ <NA>   <NA>     1984   1990   1993     NA     NA
#>  2 10    road    allrou~ <NA>   <NA>   <NA>     1982   1987   2009     NA     NA
#>  3 11    road    road    bmx    mount~ <NA>     1983   2000   2003   2006     NA
#>  4 12    allrou~ mounta~ bmx    <NA>   <NA>     1984   1988   2018     NA     NA
#>  5 13    allrou~ <NA>    <NA>   <NA>   <NA>     1990     NA     NA     NA     NA
#>  6 14    allrou~ <NA>    <NA>   <NA>   <NA>     1983     NA     NA     NA     NA
#>  7 15    mounta~ <NA>    <NA>   <NA>   <NA>     2004     NA     NA     NA     NA
#>  8 16    bmx     <NA>    <NA>   <NA>   <NA>     1999     NA     NA     NA     NA
#>  9 17    mounta~ road    allro~ <NA>   <NA>     2004   2006   2012     NA     NA
#> 10 18    bmx     <NA>    <NA>   <NA>   <NA>     1983     NA     NA     NA     NA
#> # ... with 35 more rows, and 5 more variables: color.1 <fct>, color.2 <fct>,
#> #   color.3 <fct>, color.4 <fct>, color.5 <fct>

#pivot_wider.() preserves the original column type of color as a factor
### get variable names for pivot_longer (all variables that have a number suffix after the dot)
varying_vars <- colnames(wide_df) %>% stringr::str_subset(.,
                                                          paste0("\\.", "(?=[:digit:]$|(?=[:digit:](?=[:digit:]$))|(?=N(?=A$)))"))
constant_vars <- colnames(wide_df)[!colnames(wide_df) %in% c(varying_vars)]


### perform tidyr::pivot_longer()
wide_df %>% tidyr::pivot_longer(
  -c(tidyselect::all_of(constant_vars)),
  names_to = c(".value", "nr_of_bike"),
  names_pattern = "(.*)\\.(.*)",
  values_drop_na = TRUE
)
#> # A tibble: 100 x 5
#>    id    nr_of_bike bike      year color 
#>    <chr> <chr>      <chr>    <int> <fct> 
#>  1 1     1          road      1984 green 
#>  2 1     2          allround  1990 blue  
#>  3 1     3          mountain  1993 silver
#>  4 10    1          road      1982 green 
#>  5 10    2          allround  1987 silver
#>  6 10    3          <NA>      2009 blue  
#>  7 11    1          road      1983 <NA>  
#>  8 11    2          road      2000 green 
#>  9 11    3          bmx       2003 silver
#> 10 11    4          mountain  2006 blue  
#> # ... with 90 more rows
#tidyr preserves column type of "color" when pivoting longer


### perform tidytable::pivot_longer.()
wide_df %>% tidytable::pivot_longer.(
  -c(tidyselect::all_of(constant_vars)),
  names_to = c(".value", "nr_of_bike"),
  names_pattern = "(.*)\\.(.*)",
  values_drop_na = TRUE,
  fast_pivot = FALSE
) %>%
  #sort by id and nr_of_bike
  arrange.(id, nr_of_bike)
#> # A tidytable: 100 x 5
#>    id    nr_of_bike bike     color   year
#>    <chr> <chr>      <chr>    <chr>  <int>
#>  1 1     1          road     green   1984
#>  2 1     2          allround blue    1990
#>  3 1     3          mountain silver  1993
#>  4 10    1          road     green   1982
#>  5 10    2          allround silver  1987
#>  6 10    3          <NA>     blue    2009
#>  7 11    1          road     <NA>    1983
#>  8 11    2          road     green   2000
#>  9 11    3          bmx      silver  2003
#> 10 11    4          mountain blue    2006
#> # ... with 90 more rows

#tidytable changes column type of "color" when pivoting longer

Created on 2021-04-07 by the reprex package (v2.0.0)

@markfairbanks
Owner

I have just installed your latest dev version from GitHub, and the reprex at the top still results in tidytable::pivot_longer.() converting the factor color column to a character column.

Sorry about this, I should have tested back on the original dataset you sent. I see now what I missed.

My initial thought is that this more complex case isn't solvable with data.table. data.table has a pretty simple option setting of value.factor = TRUE or value.factor = FALSE.

So in the case above either all of "bike", "color", and "year" will return as factors, or it will return like it did above (where factor columns are converted to character).
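One possible way around the single value.factor switch, sketched here purely as an assumption and not as something tidytable does, is to record the factor levels before melting and convert the affected column back afterwards. The toy table wide and the names color_cols and color_levels are invented for this example:

```r
library(data.table)

# Small wide table with character (bike) and factor (color) columns
wide <- data.table(
  id      = c("1", "2"),
  bike.1  = c("road", "bmx"),
  bike.2  = c("allround", NA),
  color.1 = factor(c("green", "blue")),
  color.2 = factor(c("silver", NA))
)

# Remember the factor levels before melting (union over the color columns)
color_cols <- grep("^color\\.", names(wide), value = TRUE)
color_levels <- unique(unlist(lapply(wide[, color_cols, with = FALSE], levels)))

# Melt both groups of columns at once, one value column per pattern
long <- melt(
  wide,
  id.vars = "id",
  measure.vars = patterns("^bike\\.", "^color\\."),
  value.name = c("bike", "color"),
  variable.name = "nr_of_bike",
  variable.factor = FALSE
)

# Restore the factor type after the melt
long[, color := factor(color, levels = color_levels)]
sapply(long, class)
```

The conversion costs an extra pass over each restored column, so whether this is acceptable would depend on the size of the data.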

Additionally, the new dev version rearranges the columns in the long tidytable, with year now appearing after color.

Thanks for catching this, I'll take a look.
