diffs produces a data.frame with list columns #367

Open
jonocarroll opened this issue Feb 19, 2024 · 0 comments

I am using comparedf to compare two versions of a data.frame containing dates, and want to summarise each change ("value was x, now is y"). diffs.comparedf creates list-columns for values.x and values.y, and these lose their date formatting when passed to paste().

As a worked example, consider a data.frame with vector columns

dates <- as.Date(c("2024-01-01", "2024-01-02"))

df_veccols <- data.frame(num = 1:2, date = I(dates))
df_veccols
#>   num       date
#> 1   1 2024-01-01
#> 2   2 2024-01-02
str(df_veccols)
#> 'data.frame':    2 obs. of  2 variables:
#>  $ num : int  1 2
#>  $ date: AsIs, format: "2024-01-01" "2024-01-02"

paste(df_veccols$num, df_veccols$date, sep = ": ")
#> [1] "1: 2024-01-01" "2: 2024-01-02"

However, this internal function

tolist <- function(df)
{
  df$values.x <- I(as.list(df$values.x)) # need the I() for factors and dates to show up right
  df$values.y <- I(as.list(df$values.y))
  df
}

creates a version with list columns, like this

df_listcols <- data.frame(num = 1:2, date = I(as.list(dates)))
df_listcols
#>   num       date
#> 1   1 2024-01-01
#> 2   2 2024-01-02
str(df_listcols)
#> 'data.frame':    2 obs. of  2 variables:
#>  $ num : int  1 2
#>  $ date:List of 2
#>   ..$ : Date, format: "2024-01-01"
#>   ..$ : Date, format: "2024-01-02"
#>   ..- attr(*, "class")= chr "AsIs"

paste(df_listcols$num, df_listcols$date, sep = ": ")
#> [1] "1: 19723" "2: 19724"

I believe the core incompatibility is with

as.character(I(as.list(dates[1])))
#> [1] "19723"

# vs

as.character(I(dates[1]))
#> [1] "2024-01-01"

df_listcols$date
#> [[1]]
#> [1] "2024-01-01"
#> 
#> [[2]]
#> [1] "2024-01-02"

unlist(df_listcols$date)
#> [1] 19723 19724
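
To illustrate where the formatting goes missing: unlist() returns only the underlying day counts, whereas combining the elements with c() dispatches on the Date class and keeps it. This is a minimal base R sketch, independent of arsenal; `date_list`, `flat` and `combined` are names introduced here for illustration only.

```r
dates <- as.Date(c("2024-01-01", "2024-01-02"))
date_list <- as.list(dates)

# unlist() flattens to the underlying numbers and drops the Date class
flat <- unlist(date_list)
class(flat)

# do.call(c, ...) calls c() on the Date elements, so the Date method
# is used and the class survives
combined <- do.call(c, date_list)
class(combined)
as.character(combined)
```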

In arsenal, this is

df1 <- data.frame(num = 1:2, dates = dates)
df2 <- data.frame(num = 1:2, dates = dates + c(0, 1))

diffs <- arsenal::diffs(arsenal::comparedf(df1, df2))
diffs
#>   var.x var.y ..row.names..   values.x   values.y row.x row.y
#> 1 dates dates             2 2024-01-02 2024-01-03     2     2
str(diffs)
#> 'data.frame':    1 obs. of  7 variables:
#>  $ var.x        : chr "dates"
#>  $ var.y        : chr "dates"
#>  $ ..row.names..: int 2
#>  $ values.x     :List of 1
#>   ..$ : AsIs, format: "2024-01-02"
#>   ..- attr(*, "class")= chr "AsIs"
#>  $ values.y     :List of 1
#>   ..$ : AsIs, format: "2024-01-03"
#>   ..- attr(*, "class")= chr "AsIs"
#>  $ row.x        : int 2
#>  $ row.y        : int 2

Should I expect this to work?

paste(diffs$var.x, "was", diffs$values.x, "now is", diffs$values.y)
#> [1] "dates was 19724 now is 19725"

As a workaround, I currently need to do this (unless there is a better way)

paste(diffs$var.x, "was", diffs$values.x[[1]], "now is", diffs$values.y[[1]])
#> [1] "dates was 2024-01-02 now is 2024-01-03"
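
The `[[1]]` workaround only covers a single difference. A vectorised variant might look like the sketch below, which converts each list element to character separately so that its own as.character() method is used. `flatten_values` is a hypothetical helper name, not part of arsenal, and it assumes every list element has length one. It is shown on the `df_listcols` example from above rather than on real comparedf output.

```r
# Hypothetical helper (not part of arsenal): turn a list column into a
# character vector one element at a time, so Dates (and factors) are
# converted by their own as.character() methods.
# Assumes each list element has length one.
flatten_values <- function(x) {
  vapply(x, as.character, character(1))
}

dates <- as.Date(c("2024-01-01", "2024-01-02"))
df_listcols <- data.frame(num = 1:2, date = I(as.list(dates)))

res <- paste(df_listcols$num, flatten_values(df_listcols$date), sep = ": ")
res
```

If the same approach holds for the diffs output, `paste(diffs$var.x, "was", flatten_values(diffs$values.x), "now is", flatten_values(diffs$values.y))` would cover every row, though I have not verified that against all column types arsenal can return.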