Extract the "output" as a new column from a collateral class column [question] #20

Closed
JauntyJJS opened this issue Mar 13, 2023 · 4 comments

Comments

@JauntyJJS

Hello,

I have created a function that tries to convert text in dd/mm/yyyy format to dates. I have also included a test tibble, test_data, with some bad inputs.

library("dplyr")
#> Warning: package 'dplyr' was built under R version 4.2.2
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library("magrittr")
#> Warning: package 'magrittr' was built under R version 4.2.2
library("purrr")
#> Warning: package 'purrr' was built under R version 4.2.2
#> 
#> Attaching package: 'purrr'
#> The following object is masked from 'package:magrittr':
#> 
#>     set_names
library("collateral")
#> Warning: package 'collateral' was built under R version 4.2.2
library("lubridate")
#> Warning: package 'lubridate' was built under R version 4.2.2
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

convert_dmy_text_to_date <- function(input) {
  if(length(class(input)) == 1) {
    if(class(input) == "character") {
      return(as.Date.character(lubridate::dmy(input)))
    } else if(class(input) == "logical") {
      return(NA)
    }
  }
  return(as.Date.character(lubridate::ymd(input)))
}

test_data <- tibble::tibble(
  test_date = list(
    "25/10/2022",
    "620/12/2022",
    as.POSIXct(x= "2022-10-07", tz = "UTC"),
    NA,
    0,
    "28/22/2022"
  )
)

#test <- hk_data$`Date of scan`
test <- test_data %>%
  dplyr::mutate(
    converted_date_log = collateral::map_peacefully(
      .x = .data[["test_date"]],
      .f = convert_dmy_text_to_date
    )
  )

test
#> # A tibble: 6 × 2
#>   test_date  converted_date_log
#>   <list>     <collat>          
#> 1 <chr [1]>  R _ _ _ _         
#> 2 <chr [1]>  R _ _ W _         
#> 3 <dttm [1]> R _ _ _ _         
#> 4 <lgl [1]>  R _ _ _ _         
#> 5 <dbl [1]>  R _ _ W _         
#> 6 <chr [1]>  R _ _ W _

Created on 2023-03-13 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22 ucrt)
#>  os       Windows 10 x64 (build 22621)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Singapore.utf8
#>  ctype    English_Singapore.utf8
#>  tz       Asia/Kuala_Lumpur
#>  date     2023-03-13
#>  pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
#>  collateral  * 0.5.2   2021-10-25 [1] CRAN (R 4.2.2)
#>  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.2.2)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
#>  dplyr       * 1.1.0   2023-01-29 [1] CRAN (R 4.2.2)
#>  evaluate      0.20    2023-01-17 [1] CRAN (R 4.2.2)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.2.2)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.2)
#>  fs            1.6.1   2023-02-06 [1] CRAN (R 4.2.2)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.2)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.2)
#>  htmltools     0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
#>  knitr         1.42    2023-01-25 [1] CRAN (R 4.2.2)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
#>  lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.2.2)
#>  magrittr    * 2.0.3   2022-03-30 [1] CRAN (R 4.2.2)
#>  pillar        1.8.1   2022-08-19 [1] CRAN (R 4.2.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.2)
#>  purrr       * 1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.2)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.2)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.2)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.2.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.2)
#>  rmarkdown     2.20    2023-01-19 [1] CRAN (R 4.2.2)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.2)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.2)
#>  styler        1.9.1   2023-03-04 [1] CRAN (R 4.2.2)
#>  tibble        3.1.8   2022-07-22 [1] CRAN (R 4.2.2)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.2)
#>  timechange    0.2.0   2023-01-11 [1] CRAN (R 4.2.2)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.2.2)
#>  vctrs         0.5.2   2023-01-23 [1] CRAN (R 4.2.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.2)
#>  xfun          0.37    2023-01-31 [1] CRAN (R 4.2.2)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.2)
#> 
#>  [1] D:/Jeremy/PortableR/RPortableLibraries/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.0/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

I have managed to create a column called converted_date_log to see the collateral output.
May I ask if there is a way to "unnest" the output component of converted_date_log?

For example

test <- test_data %>%
  dplyr::mutate(
    converted_date = collateral::extract_output(.data[["converted_date_log"]])
  )

to give

[image: screenshot of the desired output, not preserved]

I understand that it is possible to do it this way

test <- test_data %>%
  dplyr::mutate(
    converted_date = purrr::map_vec(
      .x = .data[["test_date"]],
      .f = convert_dmy_text_to_date
    ),
    converted_date_log = collateral::map_peacefully(
      .x = .data[["test_date"]],
      .f = convert_dmy_text_to_date
    )
  )

but it feels like I am using purrr twice.

@jimjam-slam
Owner

Hi @JauntyJJS! We don't have a specifically named extract_output helper, although I've considered adding one. The problem is mostly that some people will want list output, while others will want typed vectors. purrr::map and the purrr::map_* variants already have this functionality, so I tend to recommend that people use those:

test %>% mutate(res = map(converted_date_log, "result"))
# # A tibble: 6 × 3
#   test_date  converted_date_log res       
#   <list>     <collat>           <list>    
# 1 <chr [1]>  R _ _ _ _          <date [1]>
# 2 <chr [1]>  R _ _ W _          <date [1]>
# 3 <dttm [1]> R _ _ _ _          <date [1]>
# 4 <lgl [1]>  R _ _ _ _          <lgl [1]> 
# 5 <dbl [1]>  R _ _ W _          <date [1]>
# 6 <chr [1]>  R _ _ W _          <date [1]>

test %>% mutate(res = map_vec(converted_date_log, "result"))
# # A tibble: 6 × 3
#   test_date  converted_date_log res       
#   <list>     <collat>           <date>    
# 1 <chr [1]>  R _ _ _ _          2022-10-25
# 2 <chr [1]>  R _ _ W _          NA        
# 3 <dttm [1]> R _ _ _ _          2022-10-07
# 4 <lgl [1]>  R _ _ _ _          NA        
# 5 <dbl [1]>  R _ _ W _          NA        
# 6 <chr [1]>  R _ _ W _          NA 

(Note that although you're still using purrr twice here, you're at least not running your own function twice: the second call just extracts the result.)

You should, in theory, be able to extract other components the same way:

test %>% mutate(
  err = map_chr(converted_date_log, c("error", "message"), .default = NA_character_))
# # A tibble: 6 × 3
#   test_date  converted_date_log err  
#   <list>     <collat>           <chr>
# 1 <chr [1]>  R _ _ _ _          NA   
# 2 <chr [1]>  R _ _ W _          NA   
# 3 <dttm [1]> R _ _ _ _          NA   
# 4 <lgl [1]>  R _ _ _ _          NA   
# 5 <dbl [1]>  R _ _ W _          NA   
# 6 <chr [1]>  R _ _ W _          NA

But warnings, messages and output can be a pain, because they return character(0) instead of NULL when there's no such component present:

test %>% mutate(
  warn = map_chr(converted_date_log, "warnings", .default = NA_character_))
# Error in `mutate()`:
# ! Problem while computing `warn =
#   map_chr(converted_date_log, "warnings",
#   .default = NA_character_)`.
# Caused by error in `map_chr()`:
# ℹ In index: 1.
# Caused by error:
# ! Result must be length 1, not 0.
# Run `rlang::last_error()` to see where the error occurred.

That's a bit of a pain, to be frank. I might file an issue with purrr on this, because it's not clear to me why there is a difference between purrr::safely's output and purrr::quietly's output (or why the .default argument of map_chr shouldn't handle character(0) as well). But if they don't want to change it, I might add some helpers for some of these cases!
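In the meantime, one possible workaround is to extract with a small function rather than the "warnings" name shortcut, so the empty case is handled before map_chr() checks the length. This is only a sketch: collapse_or_na is a hypothetical helper (it is not part of purrr or collateral), and the log here is built with purrr::quietly as a stand-in for a collateral column.

```r
library(purrr)

# Hypothetical helper (not part of purrr or collateral): reduce a character
# vector of any length to exactly one string, using NA when it is empty.
collapse_or_na <- function(x, sep = "; ") {
  if (length(x) == 0) NA_character_ else paste(x, collapse = sep)
}

# Stand-in for a captured log: quietly() stores warnings as a character
# vector, which is empty when no warning was raised.
quiet_as_integer <- quietly(as.integer)
captured <- map(list("1", "a"), quiet_as_integer)

# The first entry has no warnings (becomes NA); the second has one.
warn <- map_chr(captured, function(entry) collapse_or_na(entry[["warnings"]]))
warn
```

Assuming a collateral column can be mapped over with a function in the same way as with a name, the call inside mutate() would be along the lines of warn = map_chr(converted_date_log, function(entry) collapse_or_na(entry[["warnings"]])).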

@jimjam-slam
Owner

I'm also going to at @Maximilian-Stefan-Ernst, because I just spied tidyverse/purrr#843 while checking things for the issue and thought they might like collateral 😛

@jimjam-slam
Owner

Looks like the very issue of dealing with character(0) was addressed in tidyverse/purrr#254, but none of the varieties of argument discussed (.null, .empty, .missing, missing, ...) seem to work.

@JauntyJJS
Author

Hi @jimjam-slam,

Thank you for your recommendations. The suggestion is good enough for me to proceed.
As for the issue with character(0), we shall see how it goes with the purrr developers.

For now I can just manually check if there are any warnings, errors or messages.
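A rough sketch of what that manual check could look like, using length and NULL tests so that character(0) is never an issue. The log_entries list is a hand-built stand-in for the log column; its component names (result, warnings, error) follow the ones used earlier in this thread, and the values are made up for illustration.

```r
library(purrr)

# Hand-built stand-in for three log entries (illustrative values only).
log_entries <- list(
  list(result = as.Date("2022-10-25"), warnings = character(0), error = NULL),
  list(result = as.Date(NA), warnings = "example warning", error = NULL),
  list(result = NULL, warnings = character(0), error = simpleError("example error"))
)

# An entry has a warning if its warnings vector is non-empty.
has_warning <- map_lgl(log_entries, function(entry) length(entry[["warnings"]]) > 0)

# An entry has an error if its error component is not NULL.
has_error <- map_lgl(log_entries, function(entry) !is.null(entry[["error"]]))

has_warning
#> [1] FALSE  TRUE FALSE
has_error
#> [1] FALSE FALSE  TRUE
```

The same two map_lgl() calls should work inside mutate() on the converted_date_log column, which gives logical columns that can be passed to filter().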
