Clean-up and clarify Hive import documentation #8294

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 5 comments

Comments

@exalate-issue-sync

No description provided.

@exalate-issue-sync
Author

Michal Kurka commented: Users are confused about the right way to import data into H2O from Hive: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/getting-data-into-h2o.html#direct-hive-import

Some people end up using the JDBC interface, which is known to be sub-optimal.

We need to make sure there is a clearly preferred method, with a clear description. We also need to make sure the documentation doesn’t list Hive support as experimental.

Feedback from [~accountid:5d1185d20d64020c403ab5de]

> Our documentation says "This feature is still experimental. In addition, Hive2 support in H2O is not yet suitable for large datasets."

@exalate-issue-sync
Author

Michal Kurka commented: Also, this applies not just to Hive but to JDBC in general:

The handling of categorical values differs between file ingest and JDBC ingest: JDBC treats categorical values as strings. Strings are not compressed in any way in H2O memory, so using the JDBC interface might need more memory and additional post-processing of the data (explicitly converting the columns to categoricals).
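For illustration, the explicit post-processing step mentioned above might look roughly like the following sketch in the H2O Python client. The helper names (`string_columns`, `strings_to_categoricals`, `load_table_via_jdbc`) are hypothetical and not part of H2O. The sketch assumes that every string column should become a categorical, which will not hold for free-text or ID columns.

```python
def string_columns(col_types):
    """Return the names of columns whose H2O type is "string"."""
    return [name for name, col_type in col_types.items() if col_type == "string"]


def strings_to_categoricals(frame):
    """Convert every string column of an H2OFrame-like object to a categorical.

    Relies only on `frame.types` (column name -> type name) and
    `frame[name].asfactor()`, both available on H2OFrame.
    Assumption: all string columns are meant to be categoricals.
    """
    for name in string_columns(frame.types):
        frame[name] = frame[name].asfactor()
    return frame


def load_table_via_jdbc(connection_url, table, username, password):
    """Import a table over JDBC, then convert its string columns (hypothetical helper)."""
    import h2o  # imported lazily so the helpers above work without a running cluster

    h2o.init()
    frame = h2o.import_sql_table(connection_url, table, username, password)
    return strings_to_categoricals(frame)
```

In practice you would likely pass an explicit list of column names rather than converting every string column, since high-cardinality strings make poor categoricals.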

@exalate-issue-sync
Author

Angela Bartz commented: Pull request #4528 submitted to rel-zahradnik.

@exalate-issue-sync
Author

Angela Bartz commented: Pull request #4528 merged into rel-zahradnik.

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7343
Assignee: Angela Bartz
Reporter: Michal Kurka
State: Resolved
Fix Version: 3.30.0.3
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#4528
