Added the extract job: BigQuery -> Cloud Storage #119
Conversation
#' either as a string in the format used by BigQuery, or as a list with
#' \code{project_id}, \code{dataset_id}, and \code{table_id} entries
#' @param project project name
#' @param destinationUris Specify the extract destination URI. Note: for large files, you may need to specify a wild-card since
I think you lost the end of the sentence?
Also needs to be wrapped.
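For reference, the wild-card form this note refers to is a single destination URI containing a *, which lets BigQuery split a large extract across several files; a minimal sketch, with a hypothetical bucket name:

# Sketch only: the bucket name is a placeholder. A single wild-card URI
# lets BigQuery shard a large extract into multiple output files.
destinationUris <- "gs://my-bucket/shakespeare-*.csv"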
@realAkhmed are you interested in finishing off this PR?
@hadley Absolutely! Just wasn't sure if the package is still actively developed. This particular PR was very useful for me internally since it opens the way for what I called
@realAkhmed agreed that this is indeed a faster route in general -- but CSV as a transport format isn't great for the case that your table includes nested or repeated fields. I'd be curious what @hadley knows about potentially converting avro to a dataframe, since that would give us full fidelity exports.
#' @export
insert_extract_job <- function(source_table, project, destinationUris,
                               compression = "NONE", destinationFormat = "CSV",
                               fieldDelimiter = ",", printHeader = TRUE) {
These should use snake case to be consistent with the rest of bigrquery
I'd also recommend putting one parameter on each line.
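Taken together, the two suggestions point toward a signature roughly like this (a sketch of the requested style, not the merged code):

# Sketch: snake_case argument names with one parameter per line, as
# suggested in the review. The renamed arguments are illustrative only.
insert_extract_job <- function(source_table,
                               project,
                               destination_uris,
                               compression = "NONE",
                               destination_format = "CSV",
                               field_delimiter = ",",
                               print_header = TRUE) {
  # ...
}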
job <- wait_for(job)

if (job$status$state == "DONE") {
  (job$statistics$extract$destinationUriFileCounts)
Remove extra parens?
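That is, the value could be returned without the wrapping parentheses, something like:

if (job$status$state == "DONE") {
  # Return the per-URI file counts directly; the surrounding parentheses
  # add nothing here.
  job$statistics$extract$destinationUriFileCounts
}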
💯 claps for this!
I'm just going to merge this so I can work on it locally.
Thanks @hadley! Somewhat slow to respond at the moment (the teaching season) -- will come back to address the issues raised by you and @craigcitro once done with it!
Hello. I have been using this function for a while in my personal fork and always wanted to share it with the community.
This is the code for doing a BigQuery extract job: basically, taking a BigQuery table and extracting it into one or more CSV files inside a Cloud Storage bucket.
The following code contains an example of how to use it. Shakespeare is a small dataset, so you won't get charged a lot for that example.
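A minimal sketch of such a call, assuming the public shakespeare sample table and placeholder project and bucket names (the argument names follow the signature in this PR):

# Sketch only: "my-project-id" and the gs:// bucket are placeholders;
# publicdata:samples.shakespeare is one of BigQuery's public sample tables.
insert_extract_job(
  source_table = "publicdata:samples.shakespeare",
  project = "my-project-id",
  destinationUris = "gs://my-bucket/shakespeare-*.csv"
)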
The structure of this file is modeled after insert_upload_job and insert_query_job.