Add H2O import folder support to GCS #8442

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 2 comments

Comments

@exalate-issue-sync

  • The GCS backend in H2O-3 does not support folder imports. Support for this should be added.
  • The pattern parameter is also ignored. Support for this parameter should be added.

Note that single-file imports work.

```python
import h2o

h2o.init()

path = 'gs://h2o-gcs-public-demo-data/folder_ingest/'
h2o.import_file(path=path)

h2o.shutdown()
```

This fails with the following error:

```
---------------------------------------------------------------------------
H2OServerError                            Traceback (most recent call last)
in
      5
      6 path = 'gs://h2o-gcs-public-demo-data/folder_ingest/'
----> 7 h2o.import_file(path=path)
      8
      9 h2o.shutdown()

~/miniconda3/envs/h2o3/lib/python3.6/site-packages/h2o/h2o.py in import_file(path, destination_frame, parse, header, sep, col_names, col_types, na_strings, pattern, skipped_columns, custom_non_data_line_markers)
    474 else:
    475 return H2OFrame()._import_parse(path, pattern, destination_frame, header, sep, col_names, col_types, na_strings,
--> 476 skipped_columns, custom_non_data_line_markers)
    477
    478

~/miniconda3/envs/h2o3/lib/python3.6/site-packages/h2o/frame.py in _import_parse(self, path, pattern, destination_frame, header, separator, column_names, column_types, na_strings, skipped_columns, custom_non_data_line_markers)
    428 rawkey = h2o.lazy_import(path, pattern)
    429 self._parse(rawkey, destination_frame, header, separator, column_names, column_types, na_strings,
--> 430 skipped_columns, custom_non_data_line_markers)
    431 return self
    432

~/miniconda3/envs/h2o3/lib/python3.6/site-packages/h2o/frame.py in _parse(self, rawkey, destination_frame, header, separator, column_names, column_types, na_strings, skipped_columns, custom_non_data_line_markers)
    442 na_strings=None, skipped_columns=None, custom_non_data_line_markers = None):
    443 setup = h2o.parse_setup(rawkey, destination_frame, header, separator, column_names, column_types, na_strings,
--> 444 skipped_columns, custom_non_data_line_markers)
    445 return self._parse_raw(setup)
    446

~/miniconda3/envs/h2o3/lib/python3.6/site-packages/h2o/h2o.py in parse_setup(raw_frames, destination_frame, header, separator, column_names, column_types, na_strings, skipped_columns, custom_non_data_line_markers)
    767 kwargs["custom_non_data_line_markers"] = custom_non_data_line_markers;
    768
--> 769 j = api("POST /3/ParseSetup", data=kwargs)
    770 if "warnings" in j and j["warnings"]:
    771 for w in j["warnings"]:

~/miniconda3/envs/h2o3/lib/python3.6/site-packages/h2o/h2o.py in api(endpoint, data, json, filename, save_to)
    121 # type checks are performed in H2OConnection class
    122 _check_connection()
--> 123 return h2oconn.request(endpoint, data=data, json=json, filename=filename, save_to=save_to)
    124
    125

~/miniconda3/envs/h2o3/lib/python3.6/site-packages/h2o/backend/connection.py in request(self, endpoint, data, json, filename, save_to)
    476 save_to = save_to(resp)
    477 self._log_end_transaction(start_time, resp)
--> 478 return self._process_response(resp, save_to)
    479
    480 except (requests.exceptions.ConnectionError, requests.exceptions.HTTPError) as e:

~/miniconda3/envs/h2o3/lib/python3.6/site-packages/h2o/backend/connection.py in _process_response(response, save_to)
    827 # Note that it is possible to receive valid H2OErrorV3 object in this case, however it merely means the server
    828 # did not provide the correct status code.
--> 829 raise H2OServerError("HTTP %d %s:\n%r" % (status_code, response.reason, data))
    830
    831

H2OServerError: HTTP 500 Server Error:
Server error water.util.DistributedException:
Error: DistributedException from /127.0.0.1:54321: 'This H2O node couldn't find the file(s) to parse. Please check files and/or working directories.'
Request: None
}
```
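
For reference, the two requested behaviours (folder import and the pattern filter) can be sketched in plain Python. This is only an illustration of the expected semantics, not the H2O-3 implementation: the function name `expand_gcs_folder`, the bucket and object names in the usage note, and the choice to match the pattern with `re.search` against the name relative to the folder are all assumptions. The sketch treats a folder as a name prefix, since GCS object names form a flat namespace, and takes the bucket listing as an input rather than calling any GCS client.

```python
import re
from typing import Iterable, List, Optional


def expand_gcs_folder(folder_uri: str,
                      object_names: Iterable[str],
                      pattern: Optional[str] = None) -> List[str]:
    """Return gs:// URIs of the objects under folder_uri that match pattern.

    object_names is a listing of object names in the bucket, supplied by the
    caller (hypothetical input; obtaining the listing is out of scope here).
    """
    if not folder_uri.startswith("gs://"):
        raise ValueError("expected a gs:// URI")
    # Split "gs://bucket/some/prefix" into the bucket and the object prefix.
    bucket, _, prefix = folder_uri[len("gs://"):].partition("/")
    # Treat the path as a folder: a non-empty prefix must end with a slash.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    regex = re.compile(pattern) if pattern else None

    uris = []
    for name in sorted(object_names):
        if not name.startswith(prefix):
            continue  # object lives outside the requested folder
        if name.endswith("/"):
            continue  # zero-byte "folder" placeholder, not a data file
        relative = name[len(prefix):]
        # Assumption: the pattern is searched in the folder-relative name.
        if regex is not None and regex.search(relative) is None:
            continue
        uris.append("gs://%s/%s" % (bucket, name))
    return uris
```

With a hypothetical listing `["folder_ingest/", "folder_ingest/a.csv", "folder_ingest/b.txt", "other/c.csv"]` in a bucket named `example-bucket`, calling `expand_gcs_folder("gs://example-bucket/folder_ingest/", names, r"\.csv$")` keeps only `gs://example-bucket/folder_ingest/a.csv`. Because single-file imports already work, each returned URI could in principle be imported on its own as a stopgap.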

@exalate-issue-sync
Author

Pavel Pscheidl commented: https://github.com/googleapis/google-cloud-python/issues/1216

https://github.com/googleapis/google-cloud-python/issues/920

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7190
Assignee: Pavel Pscheidl
Reporter: Joseph Granados
State: Closed
Fix Version: 3.28.0.3
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#4243
