Save a h2o.ai model to S3 bucket in python #9260

exalate-issue-sync · 2023-05-12T11:50:31Z

I have been using the command below to save my H2O model to an S3 bucket from Python 3 (I am using Amazon EMR):

h2o.save_model(model=best_gbm1,path='s3://bucketname/folder1/folder2', force=False)
but I get the following error:

H2OServerError: HTTP 500 Server Error: Server error java.lang.RuntimeException: Error: Not implemented Request: None

Is it possible to save an H2O model directly to an S3 bucket?

exalate-issue-sync · 2023-05-12T11:50:33Z

Lauren DiPerna commented: This issue is also posted on Stack Overflow [here|https://stackoverflow.com/questions/55182284/save-a-h2o-ai-model-to-s3-bucket-in-python]

exalate-issue-sync · 2023-05-12T11:50:35Z

Pavel Pscheidl commented: Currently, this is not supported.

See the PersistS3 class, line 263:

{code:java}
// Store Value v to disk.
@Override public void store(Value v) {
  if( !v._key.home() ) return;
  throw H2O.unimpl(); // VA only
}
{code}

exalate-issue-sync · 2023-05-12T11:50:36Z

Pavel Pscheidl commented: S3A supports it.

PersistHDFS class:
{code:java}
@Override public void store(Value v) {
  // Should be used only if ice goes to HDFS
  assert this == H2O.getPM().getIce();
  assert !v.isPersisted();

  byte[] m = v.memOrLoad();
  assert (m == null || m.length == v._max); // Assert not saving partial files
  store(new Path(_iceRoot, getIceName(v)), m);
}
{code}

exalate-issue-sync · 2023-05-12T11:50:38Z

Michal Kurka commented: Reclassified as an improvement with minor priority. The preferred way is to use S3A/S3N (on EMR).

exalate-issue-sync · 2023-05-12T11:50:40Z

Prabhu Subramanian commented: Hi All,

Does this also apply to the export below?

{code:python}h2o.export_file(data_frame ,path='s3a://…..'){code}

exalate-issue-sync · 2023-05-12T11:50:42Z

Michal Kurka commented: [~accountid:5b9be0a796cb052b5f65d3a5] yes, the same applies to all export functions - you need to use “s3a” for your exports
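As a minimal sketch of the advice above (not an official H2O utility), the scheme of an existing `s3://` or `s3n://` path can be rewritten to `s3a://` before it is handed to an export function. The helper names `to_s3a` and `save_model_s3a` are hypothetical and introduced only for illustration; `save_model_s3a` assumes a running H2O cluster with the Hadoop S3A connector and credentials already configured.

```python
def to_s3a(uri):
    """Return `uri` with an s3:// or s3n:// scheme rewritten to s3a://.

    Hypothetical helper: it only rewrites the string and does not
    contact S3 or validate that the bucket exists.
    """
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in ("s3", "s3n", "s3a"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    return "s3a://" + rest


def save_model_s3a(model, uri, force=False):
    """Save an H2O model through the S3A connector (sketch, needs a live cluster)."""
    import h2o  # imported lazily so to_s3a() can be used without h2o installed

    return h2o.save_model(model=model, path=to_s3a(uri), force=force)


print(to_s3a("s3://bucketname/folder1/folder2"))
```

With the model from the original question, the call would then read `save_model_s3a(best_gbm1, 's3://bucketname/folder1/folder2')`.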

exalate-issue-sync · 2023-05-12T11:50:43Z

Prabhu Subramanian commented: Hi Michal,

This may be slightly off topic for this ticket, but I am investigating a related error and would really appreciate your help understanding it. The call below fails:

{code:python}h2o.export_file(data_frame ,path='s3a://bucket_name/path/dataset.csv'){code}

Error below:

{code:python}H2OServerError: HTTP 500 Server Error:
Server error water.api.HDFSIOException:
Error: HDFS IO Failure:
accessed URI : s3://com.squarkai.seer.develop.project-8/test/Churn_Train.csv
configuration: Configuration: core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml, /Users/prabhusubramanian/Desktop/F Folder/RA Squark/h2o-3.32.0.2/core-site.xml
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: InvalidAccessKeyIdThe AWS Access Key Id you provided does not exist in our records.{code}

exalate-issue-sync · 2023-05-12T11:50:45Z

Michal Kurka commented: It looks like you provided an invalid AWS access key ID. Can you make sure it is correct?

exalate-issue-sync · 2023-05-12T11:50:47Z

Prabhu Subramanian commented: Hi Michal,

The credentials provided through the XML file do work for {{h2o.import_file('s3://…')}}

They do not work for the export statements, though, even with {{s3a}} or {{s3n}}. I tried every combination with no success. I am sure the credentials are right, because the import statements work well while the export statements fail.
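One thing that may be worth checking here, offered as a possibility rather than a diagnosis: Hadoop's S3 connectors read credentials from scheme-specific properties, and the S3A connector uses `fs.s3a.access.key` and `fs.s3a.secret.key`. Credentials that are picked up for `s3://` imports are therefore not necessarily visible to `s3a://` exports. The sketch below renders a Hadoop-style configuration fragment with those two properties; the helper name `build_core_site` and the placeholder values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_core_site(properties):
    """Render a Hadoop-style <configuration> document from a dict of properties."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values only; substitute real credentials and never commit them.
s3a_properties = {
    "fs.s3a.access.key": "YOUR_ACCESS_KEY_ID",
    "fs.s3a.secret.key": "YOUR_SECRET_ACCESS_KEY",
}

core_site_xml = build_core_site(s3a_properties)
print(core_site_xml)
```

The resulting properties would sit alongside any existing entries in the `core-site.xml` that H2O is started with.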

exalate-issue-sync · 2023-05-12T11:50:49Z

Kunal Mishra commented: I’ll throw a +1 in for implementing saving to S3 natively! As it is, I’ll probably save locally and use the R package {{aws.s3}} to work around the limitation, for anyone else looking for alternative solutions.

exalate-issue-sync · 2023-05-12T11:50:50Z

Michal Kurka commented: [~accountid:5cc0b0886fbf5a10040d2945] thanks for the input, I think it would be a great change to add

exalate-issue-sync · 2023-05-12T11:50:52Z

Kunal Mishra commented: Yup. Leaving an implementation here for anybody who comes through looking for the same thing!

{code:r}save_h2o_model_to_s3 <- function(h2o_model, s3_path, save_type = 'model', local_save_dir = tempdir(), keep_local = FALSE, show_progress = TRUE, force = TRUE) {
#' @description: Saves an H2O model to S3
#' @param h2o_model: a reference to the H2O model that needs to be saved
#' @param s3_path: a string containing the name the object should have in S3 (i.e., its "object key" or its intended S3 URI), as supplied to aws.s3::put_object()
#' @param save_type: a string, indicating which h2o.save function to use, between 'model', 'mojo', and 'model_details'
#' @param local_save_dir: An absolute path to the directory in which h2o_model will be saved
#' @param keep_local: Whether or not the local version of the saved h2o_model should be deleted after being pushed to S3
#' @param show_progress: A logical indicating whether to show a progress bar for uploads. Defaults to TRUE.
#' @param force: A logical, indicating whether to overwrite files that already exist.
#' @returns: The h2o_model, invisibly

if (save_type == 'model') {
    local_save_path <- h2o::h2o.saveModel(object = h2o_model, path = local_save_dir, force = force)
} else if (save_type == 'mojo') {
    local_save_path <- h2o::h2o.save_mojo(object = h2o_model, path = local_save_dir, force = force)
} else if (save_type == 'model_details') {
    local_save_path <- h2o::h2o.saveModelDetails(object = h2o_model, path = local_save_dir, force = force)
} else {
    assertthat::assert_that(FALSE, msg = 'Unsupported save_type passed to save_h2o_model_to_s3(). Supported types are limited to "model", "model_details", and "mojo"')
}

aws.s3::put_object(
    file = local_save_path,
    object = s3_path,
    multipart = TRUE,
    show_progress = show_progress
)

if (!keep_local) {
    suppressWarnings(file.remove(local_save_path))
}

return(invisible(h2o_model))

}{code}
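For Python users, the same save-locally-then-upload workaround might look roughly like the sketch below. It is an assumption-laden illustration, not part of H2O: the function name `save_model_to_s3` and its `save_fn`/`upload_fn` hooks are hypothetical, the upload relies on `boto3` (which the thread does not mention), and because `h2o.save_model` writes on the H2O server's filesystem, the sketch assumes the H2O node runs on the same machine as the Python client.

```python
import os
import tempfile


def save_model_to_s3(model, bucket, key, save_fn=None, upload_fn=None):
    """Save a model to a temporary local directory, then upload it to S3.

    save_fn(model, directory) must return the path of the saved file.
    upload_fn(local_path, bucket, key) must upload that file.
    Both default to h2o / boto3 implementations and can be replaced for testing.
    """
    if save_fn is None:
        import h2o  # assumption: local H2O node, so the saved file is reachable

        def save_fn(m, directory):
            return h2o.save_model(model=m, path=directory, force=True)

    if upload_fn is None:
        import boto3  # assumption: AWS credentials are available to boto3

        upload_fn = boto3.client("s3").upload_file

    # The temporary directory is removed once the upload has finished.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = save_fn(model, tmp_dir)
        upload_fn(local_path, bucket, key)

    return f"s3://{bucket}/{key}"
```

A call such as `save_model_to_s3(best_gbm1, 'bucketname', 'folder1/folder2/best_gbm1')` would then return the S3 URI of the uploaded object.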

exalate-issue-sync · 2023-05-12T11:50:54Z

Prabhu Subramanian commented: Should we expect this fix in the upcoming version?
Has this been fixed or ignored?

exalate-issue-sync · 2023-05-12T11:50:55Z

Michal Kurka commented: [~accountid:5b9be0a796cb052b5f65d3a5] This is resolved as “fixed”, meaning the code change was implemented and the target release will have this feature working.

The fix version was set to 3.34.0.1, which is H2O’s next major release; you can expect it in 1-2 months.

exalate-issue-sync · 2023-05-12T11:50:57Z

Michal Kurka commented: [~accountid:5b9be0a796cb052b5f65d3a5] you are welcome to try this feature in our nightly builds

[http://h2o-release.s3.amazonaws.com/h2o/master/latest.html|http://h2o-release.s3.amazonaws.com/h2o/master/latest.html]

Please keep in mind that I resolved the ticket only today, so the current nightly will not have it yet. It should appear there after a day or two.

exalate-issue-sync · 2023-05-12T11:50:59Z

Prabhu Subramanian commented: Thank you very much, Michal! Looking forward to it. Appreciate your updates.

h2o-ops · 2023-05-15T00:47:24Z

JIRA Issue Migration Info

Jira Issue: PUBDEV-6364
Assignee: Michal Kurka
Reporter: Reyhaneh Esmaielbeiki
State: Resolved
Fix Version: 3.34.0.1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#5423

exalate-issue-sync bot added S3 save_model labels May 12, 2023

h2o-ops closed this as completed May 15, 2023

h2o-ops added the fixVersion/3.34.0.1 label May 15, 2023
