Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lbl_df() incompatible with dev version of tibble #3

Closed
krlmlr opened this issue Nov 2, 2017 · 19 comments
Closed

lbl_df() incompatible with dev version of tibble #3

krlmlr opened this issue Nov 2, 2017 · 19 comments

Comments

@krlmlr
Copy link

krlmlr commented Nov 2, 2017

I'm seeing check failures with sjlabelled: https://github.com/tidyverse/tibble/blob/e17970597fe63132c42229c22a6c06d653637351/revdep/problems.md#sjlabelled.

tibble is now using pillar to format the columns, this changes the internals of the return value for trunc_mat(). It seems that print.lbl_df() is relying on these internals.

I'd like to suggest that we work on a better interface in pillar to integrate additional header information in r-lib/pillar#49. So far I was lacking a use case, but labeled data frames seem like a good fit.

@strengejacke
Copy link
Owner

strengejacke commented Nov 2, 2017

Ok, so I would remove the lbl_df() functions (and print.lbl_df() from sjlabelled.

Some users suggested this "extension" to the print-method, because they were interested in seeing variable labels when printing (labelled) data frames... But I'm happy with handling over this print-feature to your package. :-)

@strengejacke
Copy link
Owner

Ok, closed by 8f04720

@krlmlr
Copy link
Author

krlmlr commented Nov 3, 2017

Thanks for the quick response. This should do for the CRAN release. However, I'd prefer not to integrate labeled columns in tibbles, but keep them available via sjlabelled.

So, one quick way to handle this is to override type_sum() before handing over to print.tbl_df(). Something like this:

format.lbl_df <- function(.....) {
  x[] <- map(x, add_type_sum_override_class)
  NextMethod()
}

add_override_type_sum_class <- function(x) {
  class(x) <- c("override_type_sum", class(x))
  x
}

type_sum.override_type_sum <- function(x) {
  attr(x, "label")
}

Note that this will keep working with the current tibble implementation, and the width computation is still correct (not sure about the current implementation). Caveat: you get to keep the <> around the label.

The next level would be a custom column header. We don't have callbacks for this in pillar yet, but I'm happy to provide them. Same idea, but you're overriding a different method. This gives support for multiline headers and even more flexibility, e.g., we could support indicating a minimum width for the data, or word wrapping.

@strengejacke
Copy link
Owner

Ok, I implemented your suggestion, but I get an error when converting the numeric vectors to factors (using the current GitHub versions of tibble and pillar):

18. set_width(ret, width = nchar(type, type = "width") + 2L) 
17. pillar_type(x, ...) 
16. (function (x, title = NULL, width = NULL, ...) 
{
    title <- pillar_title(title, ...)
    type <- pillar_type(x, ...) ... 
15. mapply(FUN = f, ..., SIMPLIFY = FALSE) 
14. Map(.f, .x, .y, ...) 
13. map2(x, names(x), pillar) 
12. pillar::colonnade(df, has_row_id = if (star) "*" else TRUE, needs_dots = needs_dots) 
11. shrink_mat(df, width, rows, n, star = has_rownames(x)) 
10. trunc_mat(x, n = n, width = width, n_extra = n_extra) 
9. format.tbl_df(x, ..., n = n, width = width, n_extra = n_extra) 
8. NextMethod() at lbl_df.R#40
7. format.lbl_df(x, ..., n = n, width = width, n_extra = n_extra) 
6. format(x, ..., n = n, width = width, n_extra = n_extra) 
5. paste0(..., "\n") 
4. cat(paste0(..., "\n"), sep = "") 
3. cat_line(format(x, ..., n = n, width = width, n_extra = n_extra)) 
2. print.tbl_df(x) 
1. function (x, ...) 
UseMethod("print")(x) 

I'll commit my changes tonight and provide some more information, including reprex.

@strengejacke strengejacke reopened this Nov 3, 2017
@strengejacke
Copy link
Owner

strengejacke commented Nov 3, 2017

My current code:

#' @export
lbl_df <- function(x) {
  # add class attribute, if necessary
  if (!"lbl_df" %in% class(x)) class(x) <- c("lbl_df", class(x))

  x
}

#' @importFrom purrr map
#' @export
format.lbl_df <- function(x, ..., n = NULL, width = NULL, n_extra = NULL) {
  x[] <- purrr::map(x, label_type_sum)
  NextMethod()
}

label_type_sum <- function(x) {
  class(x) <- c("label_type_sum", class(x))
  x
}

#' @export
type_sum.label_type_sum <- function(x) {
  attr(x, "label")
}

The first example works with lbl_df() but does not print labels in the header. The last example, when applying lbl_df() on factors, gives an error.

library(tidyverse)
library(strengejacke)
data(efc)

efc %>%
  select(e15relat, e16sex, e17age) %>%
  as_tibble() %>%
  set_label(c("Relationship", "Elder's gender", "Elder's age")) %>%
  lbl_df()
#> # A tibble: 908 x 3
#>    e15relat e16sex e17age
#>       <dbl>  <dbl>  <dbl>
#>  1     2.00   2.00   83.0
#>  2     2.00   2.00   88.0
#>  3     1.00   2.00   82.0
#>  4     1.00   2.00   67.0
#>  5     2.00   2.00   84.0
#>  6     2.00   2.00   85.0
#>  7     1.00   1.00   74.0
#>  8     4.00   2.00   87.0
#>  9     2.00   2.00   79.0
#> 10     2.00   2.00   83.0
#> # ... with 898 more rows

efc %>%
  select(e15relat, e16sex, e17age) %>%
  set_label(c("Relationship", "Elder's gender", "Elder's age")) %>%
  as_label()
#> # A tibble: 908 x 3
#>    e15relat                e16sex e17age
#>    <fctr>                  <fctr>  <dbl>
#>  1 child                   female   83.0
#>  2 child                   female   88.0
#>  3 spouse/partner          female   82.0
#>  4 spouse/partner          female   67.0
#>  5 child                   female   84.0
#>  6 child                   female   85.0
#>  7 spouse/partner          male     74.0
#>  8 daughter or son -in-law female   87.0
#>  9 child                   female   79.0
#> 10 child                   female   83.0
#> # ... with 898 more rows

efc %>%
  select(e15relat, e16sex, e17age) %>%
  set_label(c("Relationship", "Elder's gender", "Elder's age")) %>%
  as_label() %>% 
  lbl_df()
#> Error in if (is.infinite(width)) {: argument is of length zero

efc %>%
  select(e15relat, e16sex, e17age) %>%
  set_label(c("Relationship", "Elder's gender", "Elder's age")) %>%
  as_label() %>% 
  str()
#> Classes 'tbl_df', 'tbl' and 'data.frame':    908 obs. of  3 variables:
#>  $ e15relat: Factor w/ 8 levels "spouse/partner",..: 2 2 1 1 2 2 1 4 2 2 ...
#>   ..- attr(*, "label")= Named chr "Relationship"
#>   .. ..- attr(*, "names")= chr "e15relat"
#>  $ e16sex  : Factor w/ 2 levels "male","female": 2 2 2 2 2 2 1 2 2 2 ...
#>   ..- attr(*, "label")= Named chr "Elder's gender"
#>   .. ..- attr(*, "names")= chr "e16sex"
#>  $ e17age  : atomic  83 88 82 67 84 85 74 87 79 83 ...
#>   ..- attr(*, "label")= Named chr "Elder's age"
#>   .. ..- attr(*, "names")= chr "e17age"

@krlmlr
Copy link
Author

krlmlr commented Nov 3, 2017

The code looks correct, the error is most likely from pillar which supports width = Inf. Can you please show a traceback?

@strengejacke
Copy link
Owner

The traceback is here: #3 (comment)
Or did you mean something else?

@krlmlr
Copy link
Author

krlmlr commented Nov 3, 2017

I'd like to see a traceback for Error in if (is.infinite(width)) .

@strengejacke
Copy link
Owner

Yes, that is the related traceback for this error.

@krlmlr
Copy link
Author

krlmlr commented Nov 3, 2017

I got confused, can you please retry the most recent version of pillar?

strengejacke added a commit that referenced this issue Nov 3, 2017
@strengejacke
Copy link
Owner

Doesn't seem to solve the issue.

My current pkg:
tibble * 1.3.4.9002 2017-11-03 Github (tidyverse/tibble@15f9fed)
pillar 0.0.0.9000 2017-11-03 Github (r-lib/pillar@8418d3c)

I have commited the latest changes in sjlabelled via browser, so you might install via GitHub.

@strengejacke
Copy link
Owner

strengejacke commented Nov 3, 2017

I restarted RStudio / R and now get following error:

Error: Can't convert NULL to a character vector

> traceback()
22: stop(cnd)
21: abort(paste0("Can't convert ", x_type, " to ", to_type))
20: abort_coercion(.x, .to)
19: coerce_type_vec(x, friendly_type("character"), string = , character = set_chr_encoding(set_attrs(x, 
        NULL), encoding))
18: as_character(type_sum(x))
17: pillar_type(x, ...)
16: (function (x, title = NULL, width = NULL, ...) 
    {
        title <- pillar_title(title, ...)
        type <- pillar_type(x, ...)
        data <- pillar_shaft(x, ...)
        ret <- structure(list(title = title, type = type, data = data), 
            class = "pillar")
        ret <- set_width(ret, width)
        ret
    })(dots[[1L]][[1L]], dots[[2L]][[1L]])
15: mapply(FUN = f, ..., SIMPLIFY = FALSE)
14: Map(.f, .x, .y, ...)
13: map2(x, names(x), pillar)
12: pillar::colonnade(df, has_row_id = if (star) "*" else TRUE, needs_dots = needs_dots)
11: shrink_mat(df, width, rows, n, star = has_rownames(x))
10: trunc_mat(x, n = n, width = width, n_extra = n_extra)
9: format.tbl_df(x, ..., n = n, width = width, n_extra = n_extra)
8: NextMethod() at lbl_df.R#40
7: format.lbl_df(x, ..., n = n, width = width, n_extra = n_extra)
6: format(x, ..., n = n, width = width, n_extra = n_extra)
5: paste0(..., "\n")
4: cat(paste0(..., "\n"), sep = "")
3: cat_line(format(x, ..., n = n, width = width, n_extra = n_extra))
2: print.tbl_df(x)
1: function (x, ...) 
   UseMethod("print")(x)

@krlmlr
Copy link
Author

krlmlr commented Nov 3, 2017

Thanks. Can you please reinstall pillar, restart RStudio and retry?

@krlmlr
Copy link
Author

krlmlr commented Nov 3, 2017

Note that the error is most likely caused by type_sum() returning NULL. My fixes in pillar make sure it doesn't crash, but you may need to adapt your implementation.

@strengejacke
Copy link
Owner

Works fine now! I haven't adapted anything yet. I just wonder, why are there no values inside <> for the first two variables, although these are factors?

library(tidyverse)
library(strengejacke)

data(efc)
efc %>%
  select(e15relat, e16sex, e17age) %>%
  as_label() %>%
  set_label(c("Relationship", "Elder's gender", "Elder's age")) %>%
  lbl_df()
#> # A tibble: 908 x 3
#>    e15relat                e16sex e17age
#>    <>                      <>      <dbl>
#>  1 child                   female   83.0
#>  2 child                   female   88.0
#>  3 spouse/partner          female   82.0
#>  4 spouse/partner          female   67.0
#>  5 child                   female   84.0
#>  6 child                   female   85.0
#>  7 spouse/partner          male     74.0
#>  8 daughter or son -in-law female   87.0
#>  9 child                   female   79.0
#> 10 child                   female   83.0
#> # ... with 898 more rows

The attributes are there, so my class-type_sum does not return NULL:

efc %>%
  select(e15relat, e16sex, e17age) %>%
  as_label() %>%
  set_label(c("Relationship", "Elder's gender", "Elder's age")) %>%
  map(~ attr(.x, "label"))
#> $e15relat
#>       e15relat 
#> "Relationship" 
#> 
#> $e16sex
#>           e16sex 
#> "Elder's gender" 
#> 
#> $e17age
#>        e17age 
#> "Elder's age"

So, will it be possible to print labels when using the lbl_df-class? And why is it currently that there are empty <> for factors?

@strengejacke
Copy link
Owner

strengejacke commented Nov 3, 2017

Get attributes lost somewhere before printing?

@strengejacke
Copy link
Owner

I set a breakpoint into my type_sum.label_type_sum(), and just saw that vectors are w/o attributes:

type_sum.label_type_sum <- function(x) {
  lab <- attr(x, "label")

  if (is.null(lab))
    "no label"
  else
    lab
}
Browse[2]> str(x)
 Factor w/ 8 levels "spouse/partner",..: 2 2 1 1 2 2 1 4 2 2

@krlmlr
Copy link
Author

krlmlr commented Nov 3, 2017

They may be lost during the call to head(). Can you try overriding head.lbl_df() to make sure that the attributes stay put?

@strengejacke
Copy link
Owner

Thanks, works now!

library(tidyverse)
library(strengejacke)

data(efc)
efc %>%
  select(e15relat, e16sex, e17age) %>%
  as_label() %>%
  set_label(c("Relationship", "Elder's gender", "Elder's age")) %>%
  lbl_df()
#> # A tibble: 908 x 3
#>    e15relat                e16sex                  e17age
#>    <Relationship>          <Elder's gender> <Elder's age>
#>  1 child                   female                    83.0
#>  2 child                   female                    88.0
#>  3 spouse/partner          female                    82.0
#>  4 spouse/partner          female                    67.0
#>  5 child                   female                    84.0
#>  6 child                   female                    85.0
#>  7 spouse/partner          male                      74.0
#>  8 daughter or son -in-law female                    87.0
#>  9 child                   female                    79.0
#> 10 child                   female                    83.0
#> # ... with 898 more rows

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants