vec_rbind with packed and nested data frames #220

Closed
jeroen opened this issue Mar 5, 2019 · 6 comments

jeroen commented Mar 5, 2019

Real-world examples of packed and nested data

No packed / nested data frames

# store all pages in a list first
baseurl <- "https://projects.propublica.org/nonprofits/api/v2/search.json?order=revenue&sort_order=desc"
pages <- list()
for (i in 0:10) {
  # announce the page before fetching it, so the log matches the request in progress
  message("Retrieving page ", i)
  mydata <- jsonlite::fromJSON(paste0(baseurl, "&page=", i))
  pages[[i + 1]] <- mydata$organizations
}

# both OK
alldata1 <- jsonlite::rbind_pages(pages)
alldata2 <- dplyr::bind_rows(pages)

# not OK
alldata3 <- do.call(vctrs::vec_rbind, pages)

Packed data frames

# with packed data frames
urls <- sprintf('https://api.github.com/repos/tidyverse/ggplot2/commits?page=%d', 1:5)
pages <- lapply(urls, jsonlite::fromJSON)

# ok
alldata1 <- jsonlite::rbind_pages(pages)

# unsupported
alldata2 <- dplyr::bind_rows(pages)

# error
alldata3 <- do.call(vctrs::vec_rbind, pages)

Nested data frames

The flatten argument in fromJSON spreads packed data frames into regular top-level columns, so only nested data frames (list columns) remain:

# with nested data frames
urls <- sprintf('https://api.github.com/repos/tidyverse/ggplot2/commits?page=%d', 1:5)
pages <- lapply(urls, jsonlite::fromJSON, flatten = TRUE)

# ok
alldata1 <- jsonlite::rbind_pages(pages)
alldata2 <- dplyr::bind_rows(pages)

# error
alldata3 <- do.call(vctrs::vec_rbind, pages)
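
For readers without network access, here is a minimal base R sketch of the two shapes discussed above. The data are made up, the column names only loosely mirror the GitHub commits payload, and the `is_packed` / `is_nested` names are hypothetical; none of this is taken from jsonlite or vctrs.

```r
# Toy stand-in for one page of API results (all values are invented).
page <- data.frame(sha = c("a1", "b2"), stringsAsFactors = FALSE)

# Packed: a data frame stored as a single column, one row per outer row.
page$commit <- data.frame(
  message = c("first", "second"),
  comment_count = c(0L, 3L),
  stringsAsFactors = FALSE
)

# Nested: a list column whose elements are data frames of varying size.
page$parents <- list(
  data.frame(sha = character(0), stringsAsFactors = FALSE),
  data.frame(sha = "a1", stringsAsFactors = FALSE)
)

# Classify each column: packed columns are data frames themselves,
# nested columns are plain lists that contain only data frames.
is_packed <- vapply(page, is.data.frame, logical(1))
is_nested <- vapply(
  page,
  function(col) {
    is.list(col) && !is.data.frame(col) &&
      all(vapply(col, is.data.frame, logical(1)))
  },
  logical(1)
)

names(page)[is_packed]  # "commit"
names(page)[is_nested]  # "parents"
dim(page)               # 2 rows, 3 top-level columns
```

The packed column counts as one top-level column even though it holds two inner columns, which is why a row-binding routine has to recurse into it.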

lionel- commented Mar 8, 2019

@jeroen It seems jsonlite creates data frames whose compact row.names attribute stores a positive row count instead of a negative one:

rlang:::sexp_attrib(pages[[1]])
#> $names
#> [1] "sha"          "node_id"      "commit"       "url"
#> [5] "html_url"     "comments_url" "author"       "committer"
#> [9] "parents"
#>
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> [1] NA 30

I'm not sure this is standard, but base R seems to be OK with it. Here is what it normally looks like:

rlang:::sexp_attrib(data.frame(1:10))
#> $names
#> [1] "X1.10"
#>
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> [1]  NA -10

@hadley Would it make sense to use abs() instead of - here?

return -INTEGER(rn)[1];
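
To illustrate the difference in plain R, here is a sketch under stated assumptions: `size_by_negation()` and `size_by_abs()` are hypothetical R translations of the two C expressions, not vctrs code, and the data frames are built by hand rather than by jsonlite. The `rep_len()` call only shows that a negative size is rejected; whether vctrs fails through exactly that path is not confirmed here.

```r
# Two data frames that differ only in the sign of the compact row-names entry.
auto_rows <- structure(list(x = 1:30), class = "data.frame",
                       row.names = c(NA_integer_, -30L))
positive_rows <- structure(list(x = 1:30), class = "data.frame",
                           row.names = c(NA_integer_, 30L))

# type = 0L returns the internal, unexpanded row.names as stored on the object.
.row_names_info(auto_rows, 0L)
.row_names_info(positive_rows, 0L)

# Hypothetical R translations of the two ways to recover the row count.
size_by_negation <- function(df) -.row_names_info(df, 0L)[2]
size_by_abs <- function(df) abs(.row_names_info(df, 0L)[2])

size_by_negation(auto_rows)      # 30
size_by_negation(positive_rows)  # -30, which is not a valid length
size_by_abs(auto_rows)           # 30
size_by_abs(positive_rows)       # 30

# A negative length is rejected when it is used to allocate a vector.
result <- tryCatch(
  rep_len(NA_integer_, size_by_negation(positive_rows)),
  error = function(e) "error"
)
result  # "error"
```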

jeroen commented Mar 11, 2019

Hmm, is this part of the new ALTREP shorthand for a series of integers? Indeed, I see:

rlang:::sexp_attrib(pages[[1]])
#> $names
#> [1] "sha"          "node_id"      "commit"       "url"
#> [5] "html_url"     "comments_url" "author"       "committer"
#> [9] "parents"
#>
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> [1] NA 30

But row.names() and attributes() both give the expected result:

attributes(pages[[1]])
#> $names
#> [1] "sha"          "node_id"      "commit"       "url"          "html_url"     "comments_url" "author"      
#> [8] "committer"    "parents"     
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
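
The same contrast can be reproduced with base R alone. This is a sketch on a hand-built data frame, not on the jsonlite output: `.row_names_info(x, 0L)` exposes the stored two-element form, while `attr()` and `row.names()` expand it on access. The two-element form is the compact row-names encoding described in ?.row_names_info; as far as I know it is a separate, older mechanism than ALTREP.

```r
# A data frame whose row names are stored compactly with a positive count.
df <- structure(list(x = 1:30), class = "data.frame",
                row.names = c(NA_integer_, 30L))

stored <- .row_names_info(df, 0L)   # internal form, two elements
expanded <- attr(df, "row.names")   # expanded on access, 30 elements

length(stored)                                 # 2
length(expanded)                               # 30
identical(expanded, 1:30)                      # TRUE
identical(row.names(df), as.character(1:30))   # TRUE
```

Code that only goes through `attr()`, `attributes()`, or `row.names()` therefore never sees the sign of the stored count; only code that reads the raw attribute does.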

lionel- commented Mar 11, 2019

It is supposed to be negative (see also ?.row_names_info), but R uses abs() presumably as a defensive measure: https://github.com/wch/r-source/blob/6ba047b7f335c0bf8be3e78300c2b5c1330d366a/src/main/attrib.c#L207

I'll change it to abs() as well.
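
Until a fixed build is installed, one possible user-side workaround is to reset the row names to the automatic compact form before binding. This is only a sketch: `reset_row_names()` is a hypothetical helper, it has not been tested against the API data above or against vctrs, it discards any meaningful row names, and it touches only the top-level data frame, not data frames packed inside it.

```r
# Hypothetical helper: rewrite row names as automatic compact row names.
# Note: this drops any existing non-automatic row names.
reset_row_names <- function(df) {
  n <- .row_names_info(df, 2L)                # type 2L is abs(n), so either sign works
  attr(df, "row.names") <- .set_row_names(n)  # c(NA, -n) for n > 0
  df
}

# Hand-built example with a positive compact count.
positive_rows <- structure(list(x = 1:30), class = "data.frame",
                           row.names = c(NA_integer_, 30L))

fixed <- reset_row_names(positive_rows)
.row_names_info(fixed, 0L)  # stored form is now c(NA, -30)
nrow(fixed)                 # 30
```

Applied to a list of pages, this would be `pages <- lapply(pages, reset_row_names)`.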

jeroen commented Mar 11, 2019

I'm still seeing the error "Error in rep_len(NA_integer_, n) : invalid 'length.out' value" when running the examples above.

lionel- commented Mar 11, 2019

With the current master?

jeroen commented Mar 11, 2019

I updated all of my dev packages and it works now. It was probably an outdated dev dependency.
