Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bigquery): Add support for ML model export #7451

Merged
merged 8 commits into from Sep 1, 2020

Conversation

quartzmo
Copy link
Member

@quartzmo quartzmo commented Aug 12, 2020

closes: #7061

  • Add model support to Project#extract and Project#extract_job
  • Add ExtractJob#model?
  • Add ExtractJob#table?
  • Add ExtractJob#ml_tf_saved_model?
  • Add ExtractJob#ml_xgboost_booster?
  • Add Model#extract
  • Add Model#extract_job

@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Aug 12, 2020
* Add model support to Project#extract and #extract_job
* Add ExtractJob#model?
* Add ExtractJob#ml_tf_saved_model?
* Add ExtractJob#ml_xgboost_booster?
* Add Model#extract and #extract_job

closes: googleapis#7061
@quartzmo quartzmo self-assigned this Aug 12, 2020
@quartzmo quartzmo added the api: bigquery Issues related to the BigQuery API. label Aug 12, 2020
@quartzmo quartzmo marked this pull request as ready for review August 13, 2020 18:13
@quartzmo quartzmo requested a review from a team as a code owner August 13, 2020 18:13
Tempfile.open "temp_extract_model" do |tmp|
extract_url = "gs://#{bucket.name}/#{model_id}"

# sut
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just shorthand for the method being tested

http://xunitpatterns.com/SUT.html

#
# @return [Boolean] `true` when `CSV`, `false` otherwise.
#
def csv?
val = @gapi.configuration.extract.destination_format
return true if val.nil?
return true if table? && val.nil?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make explicit that false is returned when extracting models? (i.e. documenting it, and maybe the code could say return false unless table? instead which would be more clear than this which assumes that val is nil when extracting models).

Similar comment for the similar methods surrounding this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -253,7 +319,7 @@ def location= value
end

##
# Sets the compression type.
# Sets the compression type. Not applicable when extracting models.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it cause any problem if set when extracting models? (Should we just leave it be, or throw an exception?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our usual policy is to pass it through to the backend service.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a good example of letting the backend sort it out, as they may add support for this down the line.

##
# Exports the model to a Google Cloud Storage file using
# an asynchronous method. In this method, an {ExtractJob} is immediately
# returned. The caller may poll the service by repeatedly calling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe: Exports the model to a Google CloudStorage file asynchronously, immediately returning an {ExtractJob} that can be used to track the progress of the export job. The caller may poll the service...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much improved, thank you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

# The geographic location for the job ("US", "EU", etc.) can be set via
# {ExtractJob::Updater#location=} in a block passed to this method. If
# the model is a full resource representation (see {#resource_full?}),
# the location of the job will be automatically set to the location of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ...will automatically be set...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

# be used.
# @param [Hash] labels A hash of user-provided labels associated with
# the job. You can use these to organize and group your jobs. Label
# keys and values can be no longer than 63 characters, can only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ...63 characters, and can only...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

# @param [Hash] labels A hash of user-provided labels associated with
# the job. You can use these to organize and group your jobs. Label
# keys and values can be no longer than 63 characters, can only
# contain lowercase letters, numeric characters, underscores and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other param descriptions you include letters (a-z), etc. examples. Include them here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to copy in the entirety of the latest version of the label requirements from https://cloud.google.com/bigquery/docs/labels-intro#requirements

# the job. You can use these to organize and group your jobs. Label
# keys and values can be no longer than 63 characters, can only
# contain lowercase letters, numeric characters, underscores and
# dashes. International characters are allowed. Label values are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does international characters mean? (e.g. does this mean only "lower case" international characters? Does that exclude alphabets that do not have the concept of upper and lower case?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, I'll see if the official documentation offers anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out the link below is outdated and now goes to the wrong page. Will fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to copy in the entirety of the latest version of the label requirements from https://cloud.google.com/bigquery/docs/labels-intro#requirements, and omit the link.

# labels](https://cloud.google.com/bigquery/docs/creating-managing-labels#requirements).
# @param [Boolean] dryrun If set, don't actually run this job. Behavior
# is undefined however for non-query jobs and may result in an error.
# Deprecated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is a new one being added. Given that, can we just omit this argument rather than creating it for the first time as already deprecated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Copy link
Member

@dazuma dazuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@quartzmo
Copy link
Member Author

@shollyman PTAL, I believe this PR is ready to merge.

Copy link

@shollyman shollyman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this all together. The testing scenarios have great coverage, checking for the the TF saved model etc.

I feel somehow guilty for how the level of repetition this request induced. So many rewrites of the label validation rule comments. :)

@@ -253,7 +319,7 @@ def location= value
end

##
# Sets the compression type.
# Sets the compression type. Not applicable when extracting models.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a good example of letting the backend sort it out, as they may add support for this down the line.

@quartzmo quartzmo merged commit 7626b17 into googleapis:master Sep 1, 2020
@quartzmo quartzmo deleted the bigquery-model-extract branch September 1, 2020 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BigQuery: Implement BQML Model Export
3 participants