Delta lake fails to import (in Python) #7695

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 3 comments

Comments

@exalate-issue-sync

Delta Lake file import was added to H2O (https://h2oai.atlassian.net/browse/PUBDEV-7923), but it fails when used through the Python API.

The key is to disable the workaround in the Python client; the import then starts working:
H2OFrame.__LOCAL_EXPANSION_ON_SINGLE_IMPORT__ = False

@exalate-issue-sync
Author

Michal Kurka commented: It doesn’t work when a slash is appended to the directory name (e.g. by the user or by the Python client, hence the workaround). The regular expression that filters out the log files needs to be revised so that it works both with and without a trailing slash.
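
A minimal sketch of a slash-tolerant log filter, written in Python for illustration only. It is not the regular expression H2O uses, and the function names `delta_log_pattern` and `data_files` are hypothetical. The idea is to strip any trailing slash from the directory name before building the pattern, and to allow one or more slashes before `_delta_log`:

```python
import re


def delta_log_pattern(base_dir: str) -> "re.Pattern[str]":
    # Normalize the directory name so "dir" and "dir/" build the same pattern.
    base = re.escape(base_dir.rstrip("/"))
    # Allow one or more slashes before "_delta_log" to tolerate doubled separators.
    return re.compile(rf"^{base}/+_delta_log/")


def data_files(base_dir: str, paths: list) -> list:
    # Keep only paths that are NOT inside the _delta_log directory.
    pattern = delta_log_pattern(base_dir)
    return [p for p in paths if not pattern.match(p)]


paths = [
    "dbfs:/mnt/delta/events3/_delta_log/00000000000000000000.crc",
    "dbfs:/mnt/delta/events3/_delta_log/00000000000000000000.json",
    "dbfs:/mnt/delta/events3/part-00000-cb615987-a915-4367-b18b-d505dbbf958d-c000.snappy.parquet",
]

# Both spellings of the directory name give the same result.
print(data_files("dbfs:/mnt/delta/events3", paths))
print(data_files("dbfs:/mnt/delta/events3/", paths))
```

With either spelling of the directory, only the parquet file should remain in the list.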

@exalate-issue-sync
Author

Neema Mashayekhi commented: Good point.

The error shows the crc and json paths, but with all forward slashes and colons converted to underscores:

"dbfs:/mnt/delta/events3/_delta_log/00000000000000000000.crc" -> dbfs__mnt_delta_events3__delta_log_00000000000000000000.crc

H2OResponseError: Server error water.exceptions.H2OIllegalArgumentException:
Error: File type mismatch. Cannot parse files [dbfs__mnt_delta_events3__delta_log_00000000000000000000.crc] and [dbfs__mnt_delta_events3__delta_log_00000000000000000000.json] of type CSV and CSV as one dataset.
Request: POST /3/ParseSetup
data: {'check_header': '0', 'source_frames': '["dbfs:/mnt/delta/events3/_delta_log/00000000000000000000.crc","dbfs:/mnt/delta/events3/_delta_log/00000000000000000000.json","dbfs:/mnt/delta/events3/part-00000-cb615987-a915-4367-b18b-d505dbbf958d-c000.snappy.parquet","dbfs:/mnt/delta/events3/part-00001-7a577fa6-773f-46b7-a018-bd26de54c4fa-c000.snappy.parquet","dbfs:/mnt/delta/events3/part-00002-3e42bfeb-3bf8-463f-8942-fe18946e6928-c000.snappy.parquet","dbfs:/mnt/delta/events3/part-00003-7f670e25-13f1-4021-9d32-50c8298addcf-c000.snappy.parquet","dbfs:/mnt/delta/events3/part-00004-c8085fc2-e9bd-4c3e-92ee-3913ab133d33-c000.snappy.parquet","dbfs:/mnt/delta/events3/part-00005-e6c20306-d446-47c9-935c-68a52e66a6c1-c000.snappy.parquet","dbfs:/mnt/delta/events3/part-00006-54d95d00-6c03-4544-83d9-c5c0dca02df2-c000.snappy.parquet","dbfs:/mnt/delta/events3/part-00007-1c79a572-c3d7-47bc-b66b-7edfa16a285d-c000.snappy.parquet"]', 'single_quotes': 'False'}
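
The name mangling seen in the error message can be reproduced with a small sketch. This only mirrors the mapping observed in the error text (every `:` and `/` becomes `_`); it is not taken from H2O's source, and the function name `mangle_key` is hypothetical:

```python
import re


def mangle_key(path: str) -> str:
    # Replace every ':' and '/' with '_'; mapping inferred from the error message only.
    return re.sub(r"[:/]", "_", path)


print(mangle_key("dbfs:/mnt/delta/events3/_delta_log/00000000000000000000.crc"))
# prints dbfs__mnt_delta_events3__delta_log_00000000000000000000.crc
```

Reading the mangled names back this way makes it easier to see that the two files named in the error are the `_delta_log` entries, not the parquet data files.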

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7951
Assignee: Michal Kurka
Reporter: Neema Mashayekhi
State: Resolved
Fix Version: 3.32.0.4
Attachments: Available (Count: 1)
Development PRs: Available

Linked PRs from JIRA

#5255

Attachments From Jira

Attachment Name: Screen Shot 2021-01-27 at 1.31.28 PM.png
Attached By: Michal Kurka
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-7951/Screen Shot 2021-01-27 at 1.31.28 PM.png
