When writing tests for [PUBDEV-8876|https://h2oai.atlassian.net/browse/PUBDEV-8876], I encountered this issue:
{noformat}
~/repos/h2o/h2o-3/h2o-py/h2o/h2o.py in parse_setup(raw_frames, destination_frame, header, separator, column_names, column_types, na_strings, skipped_columns, custom_non_data_line_markers, partition_by, quotechar, escapechar)
    874         if len(column_names) != len(j["column_types"]): raise ValueError(
    875             "length of col_names should be equal to the number of columns: %d vs %d"
--> 876             % (len(column_names), len(j["column_types"])))
    877         j["column_names"] = column_names
    878         counter = 0

ValueError: length of col_names should be equal to the number of columns: 1000000 vs 331186
{noformat}
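For anyone trying to reproduce this, a minimal sketch of how a super-short, ultra-wide CSV could be generated is shown here. The helper names (`make_wide_csv`, `header_bytes`) and the `col_%d` naming scheme are hypothetical and not taken from the failing test, and the 4MB constant assumes the default upload chunk size is 4 MiB. The step of handing the file to H2O with explicit column names is left out so the sketch runs without a cluster.

```python
import csv
import io

# Assumption: the 4MB default upload chunk size is 4 MiB.
DEFAULT_UPLOAD_CHUNK_BYTES = 4 * 1024 * 1024


def make_wide_csv(n_cols, n_rows=2):
    """Return CSV text with one header row and n_rows data rows of n_cols columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Hypothetical column names: col_0, col_1, ...
    writer.writerow(["col_%d" % i for i in range(n_cols)])
    for r in range(n_rows):
        writer.writerow([r] * n_cols)
    return buf.getvalue()


def header_bytes(csv_text):
    """Size in bytes of the first line, including its trailing newline."""
    first_line = csv_text.split("\n", 1)[0]
    return len(first_line.encode("utf-8")) + 1


if __name__ == "__main__":
    text = make_wide_csv(1000000)
    size = header_bytes(text)
    print("header bytes:", size)
    print("header exceeds one default chunk:", size > DEFAULT_UPLOAD_CHUNK_BYTES)
```

With one million generated names the header line alone is larger than one default chunk, which matches the shape of file that triggers the error above.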
Sebastien Poirier commented: comments from internal discussion:
{noformat}michalkurka 4 days ago
There is a magic number: the default chunk size for uploaded data is 4MB. Your file is about 22MB, so it is split into 6 chunks. It looks like the header is longer than one chunk, and the data is parsed incorrectly as a result.
Something like this should only happen for datasets like yours: super-short and ultra-wide.
{noformat}
The error occurs because the header takes more than one chunk of memory to store. If we can auto-detect this condition and reassign the chunk size, that should avoid the problem.
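The chunk arithmetic and the proposed auto-detection can be sketched roughly as follows. This is only an illustration of the idea, not H2O's parser code: the function names are hypothetical, the 4MB default is assumed to mean 4 MiB, and doubling the chunk size until the header fits is one possible policy among several.

```python
import math

# Assumption: the 4MB default chunk size for uploaded data is 4 MiB.
DEFAULT_CHUNK_BYTES = 4 * 1024 * 1024


def chunk_count(file_bytes, chunk_bytes=DEFAULT_CHUNK_BYTES):
    """Number of chunks a file of file_bytes is split into."""
    return math.ceil(file_bytes / chunk_bytes)


def header_overflows(header_len, chunk_bytes=DEFAULT_CHUNK_BYTES):
    """True when the header line does not fit in a single chunk."""
    return header_len > chunk_bytes


def chunk_size_for_header(header_len, chunk_bytes=DEFAULT_CHUNK_BYTES):
    """Hypothetical policy: double the chunk size until the header fits."""
    size = chunk_bytes
    while header_len > size:
        size *= 2
    return size


if __name__ == "__main__":
    file_bytes = 22 * 1024 * 1024  # "about 22MB" from the discussion
    print("chunks:", chunk_count(file_bytes))
    header_len = 10 * 1024 * 1024  # illustrative header size, not measured
    if header_overflows(header_len):
        print("reassigned chunk size:", chunk_size_for_header(header_len))
```

A 22MB file at the default size gives the 6 chunks mentioned in the discussion. Growing the chunk size only when the header overflows leaves ordinary datasets on the default.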