
list_datasets only grabs first 50 results #141

Closed
combinatorist opened this issue Feb 3, 2017 · 8 comments


@combinatorist commented Feb 3, 2017

This is because the Google BigQuery API only lists 50 datasets by default. You can override this by passing a maxResults argument or setting all to true.

https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list#parameters

Please expose one or both of these arguments via bigrquery::list_datasets.
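Since the API caps each response (at 50 by default) and returns a nextPageToken when more results exist, a robust fix would page through results rather than rely on a huge maxResults. Below is a minimal sketch of the pagination loop only, with a stubbed-out fetcher standing in for the real API call; fetch_page, its return shape, and the fake data are assumptions for illustration, not bigrquery's actual internals:

```r
# Sketch: accumulate ids across pages until the API stops returning
# a nextPageToken. `fetch_page` is a hypothetical stand-in for the
# real HTTP call (e.g. to the datasets.list endpoint).
list_all_ids <- function(fetch_page) {
  ids <- character()
  token <- NULL
  repeat {
    page <- fetch_page(token)          # would pass token as pageToken
    ids <- c(ids, page$ids)
    token <- page$nextPageToken
    if (is.null(token)) break          # last page: no token returned
  }
  ids
}

# Fake three-page response, to show the loop terminates:
fake_pages <- list(
  list(ids = c("a", "b"), nextPageToken = "p2"),
  list(ids = c("c"),      nextPageToken = "p3"),
  list(ids = c("d", "e"), nextPageToken = NULL)
)
fetch_fake <- local({
  i <- 0
  function(token) { i <<- i + 1; fake_pages[[i]] }
})

list_all_ids(fetch_fake)
#> [1] "a" "b" "c" "d" "e"
```

In the real fix, fetch_page would call the datasets.list endpoint and pass the token back as the pageToken query parameter.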

@combinatorist commented Feb 3, 2017

I'm happy to provide more context to this issue, but I couldn't find a default issue template or anything in your README.md.

Adding @meridithperatikos to help watch this issue.

@meridithperatikos commented Feb 3, 2017

Relates to #108.

@combinatorist commented Feb 3, 2017

Just to be explicit: #108 refers to list_tables (within a given dataset), but it has an analogous problem, so they could easily be fixed together.

@Alexander-McLean commented Mar 13, 2017

Here's a quick fix, in case anyone else has the same issue:

list_datasets <- function(project) { assert_that(is.string(project)) url <- sprintf("projects/%s/datasets", Project) query <- list() query$maxResults <- 999999 data <- bigrquery:::bq_get(url, query = query)$datasets unlist(lapply(data, function(x) x$datasetReference$datasetId)) }

@combinatorist commented Mar 16, 2017

@Alexander-McLean, I'm assuming you meant this? (use triple backticks "```" for multi-line code)

list_datasets <- function(project) { 
  assert_that(is.string(project)) 
  url <- sprintf("projects/%s/datasets", Project) 
  query <- list() 
  query$maxResults <- 999999 
  data <- bigrquery:::bq_get(url, query = query)$datasets 
  unlist(lapply(data, function(x) x$datasetReference$datasetId)) 
}
@combinatorist commented Mar 16, 2017

Thanks, @Alexander-McLean!

I got it to run with the following tweaks:

  • commented out assert_that (the assertthat package isn't installed for me)
  • lower-cased the "Project" variable, since the capitalized name wasn't defined (at least not within the function's scope)
> list_datasets <- function(project) { 
+ #    assert_that(is.string(project)) 
+     url <- sprintf("projects/%s/datasets", project) 
+     query <- list() 
+     query$maxResults <- 999999 
+     data <- bigrquery:::bq_get(url, query = query)$datasets 
+     unlist(lapply(data, function(x) x$datasetReference$datasetId)) 
+ }
> 
> 
> list_datasets('bigquery-public-data')
 [1] "baseball"                "bls"                     "cloud_storage_geo_index"
 [4] "common_eu"               "common_us"               "fec"                    
 [7] "ghcn_d"                  "ghcn_m"                  "github_repos"           
[10] "hacker_news"             "irs_990"                 "medicare"               
[13] "new_york"                "noaa_gsod"               "open_images"            
[16] "samples"                 "san_francisco"           "stackoverflow"          
[19] "usa_names"              
> 

Note: this public project doesn't have enough datasets to prove we're getting past the 50-result limit, but I ran it on the private project where we first noticed the problem and it successfully retrieved 178 datasets.

@combinatorist commented Mar 16, 2017

@Alexander-McLean, I'm not really familiar with R. I just tried adapting the same approach to list_tables (#108), but it didn't work (for me). Any ideas?

> list_tables <- function(dataset) { 
+     # assert_that(is.string(dataset)) 
+     url <- sprintf("dataset/%s/tables", dataset) 
+     query <- list() 
+     query$maxResults <- 999999 
+     data <- bigrquery:::bq_get(url, query = query)$tables 
+     unlist(lapply(data, function(x) x$tableReference$tableId)) 
+ }
> 
> 
> list_tables('bigquery-public-data:baseball')
 Error: HTTP error [404] Not Found 
4.
stop("HTTP error [", req$status, "] ", out, call. = FALSE) 
3.
process_request(req) 
2.
bigrquery:::bq_get(url, query = query) 
1.
list_tables("bigquery-public-data:baseball") 
> 
> 
> list_tables('baseball')
 Error: HTTP error [404] Not Found 
4.
stop("HTTP error [", req$status, "] ", out, call. = FALSE) 
3.
process_request(req) 
2.
bigrquery:::bq_get(url, query = query) 
1.
list_tables("baseball") 
> 
> 

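The 404s in the transcript above are most likely caused by the URL path: the BigQuery REST API lists tables under projects/{projectId}/datasets/{datasetId}/tables, so the path needs the project as well as the dataset, while the snippet used "dataset/%s/tables". A sketch of a corrected version follows (untested; bigrquery:::bq_get is an internal function, so assuming it accepts a path relative to the API root, as in the list_datasets workaround above):

```r
# Sketch only: builds the tables.list path with both project and dataset.
# Relies on bigrquery's internal bq_get(), which could change at any time.
list_tables <- function(project, dataset) {
  # tables.list endpoint: projects/{projectId}/datasets/{datasetId}/tables
  url <- sprintf("projects/%s/datasets/%s/tables", project, dataset)
  query <- list(maxResults = 999999)
  data <- bigrquery:::bq_get(url, query = query)$tables
  unlist(lapply(data, function(x) x$tableReference$tableId))
}

# e.g. list_tables("bigquery-public-data", "baseball")
# rather than list_tables("bigquery-public-data:baseball")
```

This would also explain why both 'bigquery-public-data:baseball' and 'baseball' failed: neither produces a path the API recognizes.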
@hadley added the bug label Apr 18, 2017

@hadley closed this in 4dbc2ee Apr 19, 2017

@combinatorist commented Apr 20, 2017

Thanks, @hadley!

Zsedo pushed a commit to Zsedo/bigrquery that referenced this issue Jun 26, 2017

Paginate list_tables
Fixes r-dbi#141

And fix a bunch of R CMD check problems :(