Fails to Convert Categorical Columns on Big Dataset and Identity Column #3132

Closed
exalate-issue-sync bot opened this issue May 22, 2023 · 5 comments

@exalate-issue-sync

When we try to upload a big table (18M rows) using
train_age <- hc$asH2OFrame(wide, "wide")
the call fails and throws the following exception:
Error: ai.h2o.sparkling.backend.exceptions.RestApiCommunicationException: H2O node 10.100.90.29:54321 responded with
Status code: 500 : Server Error

It is probably caused by a NullPointerException in ai.h2o.sparkling.extensions.internals.UpdateCategoricalIndicesTask.map(UpdateCategoricalIndicesTask.java:67).

I have attached a file with more information (exception.txt).

@exalate-issue-sync
Author

Marek Novotny commented: [~accountid:5f915eee9c31840076f0f317] Thanks for reporting the problem. Can you also share the schema of your dataset? You can anonymize the column names if needed.

@exalate-issue-sync
Author

Juan Campos commented: I’ve attached the schema as a CSV file (variables(1).csv). It’s big, with 2995 columns.

[^variables(1).csv]

The dataset has 18M rows.

I’ve found that the problem is in “obfs_bkuuid”. This column is an ID (each row has a different value). If I remove this column in SQL before calling asH2OFrame, it works.
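
The workaround above (dropping the column whose values are all distinct before conversion) can be illustrated with a minimal sketch. It uses plain Python on a toy in-memory table rather than the Sparkling Water API. The helper names `find_identity_columns` and `drop_columns`, the `segment` and `age` columns, and the sample values are hypothetical and exist only for illustration. Only string columns are checked, on the assumption that those are the ones converted to categoricals.

```python
from typing import Dict, List, Sequence


def find_identity_columns(columns: Dict[str, Sequence[object]]) -> List[str]:
    """Return names of string columns whose non-null values are all distinct."""
    identity = []
    for name, values in columns.items():
        non_null = [v for v in values if v is not None]
        # Only string columns are candidates (assumption: these become categoricals).
        if len(non_null) < 2 or not all(isinstance(v, str) for v in non_null):
            continue
        # Identity-like: every non-null value occurs exactly once.
        if len(set(non_null)) == len(non_null):
            identity.append(name)
    return identity


def drop_columns(columns: Dict[str, Sequence[object]],
                 to_drop: Sequence[str]) -> Dict[str, Sequence[object]]:
    """Return a copy of the table without the listed columns."""
    excluded = set(to_drop)
    return {name: vals for name, vals in columns.items() if name not in excluded}


# Toy stand-in for the real 18M-row table (hypothetical values).
table = {
    "obfs_bkuuid": ["a1", "b2", "c3", "d4"],  # one distinct value per row
    "segment": ["x", "y", "x", "y"],          # ordinary categorical column
    "age": [31, 45, 27, 52],                  # numeric, ignored by the check
}

ids = find_identity_columns(table)
cleaned = drop_columns(table, ids)
```

On a real dataset the same check would be done in Spark or SQL before the conversion call, and an exact distinct count over 18M rows may be expensive, so an approximate count could be a reasonable substitute.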

@exalate-issue-sync
Author

Marek Novotny commented: I think this is related to the changes I made for [https://h2oai.atlassian.net/jira/software/c/projects/SW/issues/SW-2449|https://h2oai.atlassian.net/jira/software/c/projects/SW/issues/SW-2449?filter=reportedbyme]. This functionality is covered by tests, but not all internal edge cases are covered.

@DinukaH2O

JIRA Issue Migration Info

Jira Issue: SW-2470
Assignee: Marek Novotny
Reporter: Juan Campos
State: Resolved
Fix Version: 3.32.0.2-1
Attachments: Available (Count: 2)
Development PRs: Available

Linked PRs from JIRA

#2375

Attachments From Jira

Attachment Name: exception.txt
Attached By: Juan Campos
File Link: https://h2o-jira-github-migration.s3.amazonaws.com/Sparkling-Water/SW-2470/exception.txt

Attachment Name: variables(1).csv
Attached By: Juan Campos
File Link: https://h2o-jira-github-migration.s3.amazonaws.com/Sparkling-Water/SW-2470/variables(1).csv

@hasithjp
Member

JIRA Issue Migration Info Cont'd

Jira Issue Created Date: 2020-10-22T05:42:21.243-0700
