Save a h2o.ai model to S3 bucket in python #9260

Closed
exalate-issue-sync bot opened this issue May 12, 2023 · 17 comments

Comments

@exalate-issue-sync

I have been using the command below to save my H2O model to an S3 bucket in Python 3 (I am using Amazon EMR):

h2o.save_model(model=best_gbm1,path='s3://bucketname/folder1/folder2', force=False)
but I get the following error:

H2OServerError: HTTP 500 Server Error: Server error java.lang.RuntimeException: Error: Not implemented Request: None

Is it possible to save an H2O model directly to an S3 bucket?

@exalate-issue-sync

Lauren DiPerna commented: This issue is also posted on Stack Overflow [here|https://stackoverflow.com/questions/55182284/save-a-h2o-ai-model-to-s3-bucket-in-python]

@exalate-issue-sync

Pavel Pscheidl commented: Currently, this is not supported. The store method in the PersistS3 class (line 263) is not implemented:

{code:java}
// Store Value v to disk.
@Override public void store(Value v) {
  if( !v._key.home() ) return;
  throw H2O.unimpl(); // VA only
}
{code}

@exalate-issue-sync

Pavel Pscheidl commented: S3A supports it.

PersistHDFS class:
{code:java}
@Override public void store(Value v) {
  // Should be used only if ice goes to HDFS
  assert this == H2O.getPM().getIce();
  assert !v.isPersisted();

  byte[] m = v.memOrLoad();
  assert (m == null || m.length == v._max); // Assert not saving partial files
  store(new Path(_iceRoot, getIceName(v)), m);
}
{code}

@exalate-issue-sync

Michal Kurka commented: Reclassified as an improvement with minor priority. The preferred way is to use S3A/S3N (on EMR).

@exalate-issue-sync

Prabhu Subramanian commented: Hi All,

Does this also apply to the export below?

{code:python}h2o.export_file(data_frame, path='s3a://…..'){code}

@exalate-issue-sync

Michal Kurka commented: [~accountid:5b9be0a796cb052b5f65d3a5] Yes, the same applies to all export functions: you need to use “s3a” for your exports.
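
As a minimal sketch of the advice above, the helper below rewrites an `s3://` or `s3n://` URI to the `s3a://` scheme before passing it to an H2O export call. The names `to_s3a` and `save_model_s3a` are hypothetical and not part of the h2o package; `h2o.save_model` is the call from the original report. The sketch assumes a running H2O cluster with the Hadoop S3A connector and credentials already configured (for example on EMR).

```python
S3_SCHEMES = ("s3", "s3n", "s3a")


def to_s3a(uri):
    """Return the same bucket/key location with the s3a:// scheme (hypothetical helper)."""
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in S3_SCHEMES or not rest:
        raise ValueError("not an S3 URI: %r" % uri)
    return "s3a://" + rest


def save_model_s3a(model, uri, force=False):
    """Save an H2O model through the S3A path; needs a live H2O cluster."""
    # Imported lazily so to_s3a() can be used without h2o installed.
    import h2o

    return h2o.save_model(model=model, path=to_s3a(uri), force=force)
```

With the path from the original report, `to_s3a('s3://bucketname/folder1/folder2')` yields `'s3a://bucketname/folder1/folder2'`. Whether the write then succeeds still depends on the cluster's Hadoop configuration.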

@exalate-issue-sync

Prabhu Subramanian commented: Hi Michal,

I know this might not be strictly in scope for this ticket, but I would really appreciate your help in understanding the related error below.

{code:python}h2o.export_file(data_frame, path='s3a://bucket_name/path/dataset.csv'){code}

Error below:

{code:python}H2OServerError: HTTP 500 Server Error:
Server error water.api.HDFSIOException:
Error: HDFS IO Failure:
accessed URI : s3://com.squarkai.seer.develop.project-8/test/Churn_Train.csv
configuration: Configuration: core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml, /Users/prabhusubramanian/Desktop/F Folder/RA Squark/h2o-3.32.0.2/core-site.xml
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: InvalidAccessKeyIdThe AWS Access Key Id you provided does not exist in our records.{code}

@exalate-issue-sync

Michal Kurka commented: This looks like you provided an invalid AWS access key ID. Can you make sure it is correct?

@exalate-issue-sync

Prabhu Subramanian commented: Hi Michal,

Credentials provided through the XML file actually work for {{h2o.import_file('s3://…')}}, but not for the export statements, even with {{s3a}} or {{s3n}}. I tried every combination without success. I am sure the credentials are right, because the import statements work well while the export statements do not.
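
The thread leaves this discrepancy unresolved. One thing that may be worth checking, purely as an assumption and not a diagnosis confirmed by the maintainers, is whether the configuration file defines the credentials under the Hadoop S3A property names (`fs.s3a.access.key` and `fs.s3a.secret.key`), since the error above is raised from the Hadoop file system layer. The sketch below builds such a fragment; the function name `build_core_site` and the placeholder values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_core_site(access_key, secret_key):
    """Return core-site.xml text that sets the Hadoop S3A credential properties."""
    root = ET.Element("configuration")
    properties = (
        ("fs.s3a.access.key", access_key),
        ("fs.s3a.secret.key", secret_key),
    )
    for name, value in properties:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values only; never commit real keys to a file under version control.
xml_text = build_core_site("YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY")
```

The resulting text would be merged into the `core-site.xml` that H2O is started with. This only covers the S3A scheme; the property names used by other schemes differ.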

@exalate-issue-sync

Kunal Mishra commented: I’ll throw a +1 in for implementing saving to S3 natively! As it is, I’ll probably save locally and use the R package {{aws.s3}} to work around the limitation, for anyone else looking for alternative solutions.

@exalate-issue-sync

Michal Kurka commented: [~accountid:5cc0b0886fbf5a10040d2945] Thanks for the input. I think it would be a great change to add.

@exalate-issue-sync

Kunal Mishra commented: Yup. Leaving an implementation here for anybody who comes through looking for the same thing!

{code:r}save_h2o_model_to_s3 <- function(h2o_model, s3_path, save_type = 'model', local_save_dir = tempdir(), keep_local = FALSE, show_progress = TRUE, force = TRUE) {
#' @description: Saves an H2O model to S3
#' @param h2o_model: a reference to the H2O model that needs to be saved
#' @param s3_path: a string containing the name the object should have in S3 (i.e., its "object key" or its intended S3 URI), as supplied to aws.s3::put_object()
#' @param save_type: a string, indicating which h2o.save function to use, between 'model', 'mojo', and 'model_details'
#' @param local_save_dir: An absolute path to the directory in which h2o_model will be saved
#' @param keep_local: Whether or not the local version of the saved h2o_model should be deleted after being pushed to S3
#' @param show_progress: A logical indicating whether to show a progress bar for uploads. Default is given by options("verbose").
#' @param force: A logical, indicating whether to overwrite files that already exist.
#' @returns: The h2o_model, invisibly

if (save_type == 'model') {
    local_save_path <- h2o::h2o.saveModel(object = h2o_model, path = local_save_dir, force = force)
} else if (save_type == 'mojo') {
    local_save_path <- h2o::h2o.save_mojo(object = h2o_model, path = local_save_dir, force = force)
} else if (save_type == 'model_details') {
    local_save_path <- h2o::h2o.saveModelDetails(object = h2o_model, path = local_save_dir, force = force)
} else {
    stop('Unsupported save_type passed to save_h2o_model_to_s3(). Supported types are limited to "model", "model_details", and "mojo"')
}

aws.s3::put_object(
    file = local_save_path,
    object = s3_path,
    multipart = TRUE,
    show_progress = show_progress
)

if (!keep_local) {
    suppressWarnings(file.remove(local_save_path))
}

return(invisible(h2o_model))

}{code}
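
Since the original question was about Python, here is a rough Python counterpart of the same save-locally-then-upload workaround. It is a sketch under stated assumptions, not an official H2O feature: the function names `split_s3_uri` and `save_h2o_model_to_s3` are hypothetical, and it assumes the `boto3` package is installed with AWS credentials available through its usual lookup chain. Only `h2o.save_model` and the boto3 client method `upload_file` are existing library calls.

```python
import os
import tempfile


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    scheme, sep, rest = uri.partition("://")
    bucket, _, key = rest.partition("/")
    if not sep or scheme not in ("s3", "s3a", "s3n") or not bucket or not key:
        raise ValueError("expected s3://bucket/key, got %r" % uri)
    return bucket, key


def save_h2o_model_to_s3(model, s3_uri, local_dir=None, keep_local=False, force=True):
    """Save the model to local disk, upload the file to S3, then optionally clean up."""
    # Imported lazily: both need to be installed, and h2o needs a running cluster.
    import boto3
    import h2o

    bucket, key = split_s3_uri(s3_uri)
    local_dir = local_dir or tempfile.mkdtemp()
    # h2o.save_model returns the path of the file it wrote.
    local_path = h2o.save_model(model=model, path=local_dir, force=force)
    boto3.client("s3").upload_file(local_path, bucket, key)
    if not keep_local:
        os.remove(local_path)
    return "s3://%s/%s" % (bucket, key)
```

Note that the local save happens on the machine running the H2O node, so this sketch assumes the Python client and the H2O node share a file system, as they do on a single machine.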

@exalate-issue-sync

Prabhu Subramanian commented: Should we expect this fix in the upcoming version?
Has this been fixed, or ignored?

@exalate-issue-sync

Michal Kurka commented: [~accountid:5b9be0a796cb052b5f65d3a5] Resolved as “fixed”, meaning the code change was implemented and the target release will have this feature working.

The fix version was set to 3.34.0.1, which is H2O’s next major release. You can expect it in 1-2 months.

@exalate-issue-sync

Michal Kurka commented: [~accountid:5b9be0a796cb052b5f65d3a5] you are welcome to try this feature in our nightly builds

http://h2o-release.s3.amazonaws.com/h2o/master/latest.html

Please keep in mind that I resolved the ticket only today, so the current nightly will not have it yet. It should appear there in a day or two.

@exalate-issue-sync

Prabhu Subramanian commented: Thank you very much, Michal! Looking forward to it. Appreciate your updates.

@h2o-ops

h2o-ops commented May 15, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-6364
Assignee: Michal Kurka
Reporter: Reyhaneh Esmaielbeiki
State: Resolved
Fix Version: 3.34.0.1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#5423
