An example of the error that occurs on Windows when uploading a pandas DataFrame to H2O:
{noformat}
  File ".\src\anomaly_detection_95.py", line 36, in get_aggregated_rows
    h2o_df = h2o.H2OFrame(all_rows)
  File "c:\users\mllaugel\desktop\humano_fraud_isof\venv\lib\site-packages\h2o\frame.py", line 109, in __init__
    self._upload_python_object(python_obj, destination_frame, header, separator,
  File "c:\users\mllaugel\desktop\humano_fraud_isof\venv\lib\site-packages\h2o\frame.py", line 149, in _upload_python_object
    csv_writer.writerows(data_to_write)
  File "C:\Users\mllaugel\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 205: character maps to <undefined>
{noformat}
Calling code:
{code:python}import os
import h2o
import pandas as pd

h2o.init()
all_rows = pd.read_csv(os.path.join(self.location, "training_data.csv"))
h2o_df = h2o.H2OFrame(all_rows){code}
Although the snippet above is not best practice, it should not throw an error.
On Windows, pandas loads the {{.csv}} file using {{utf-8}} (see below). Before upload, the H2O Py client writes the frame to a tmp {{.csv}} file (see [https://github.com/h2oai/h2o-3/blob/03cc6c86a179021418ae0f21df372ab7df0fdd86/h2o-py/h2o/frame.py#L142]) using a different encoding.
The error occurs because pandas apparently loads {{.csv}} files using {{utf-8}} encoding by default. I could not find this stated in the documentation, but it is visible in the pandas code:
{code:python}# Windows does not default to utf-8. Set to utf-8 for a consistent behavior
encoding_passed, encoding = encoding, encoding or "utf-8" {code}
Then, when writing, we don’t pass the {{encoding='utf-8'}} param when opening the tmp file, so Python falls back to the platform default, {{cp1252}} here, and raises the error on characters that {{cp1252}} cannot represent.
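The failure and the fix described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the helper {{write_rows_csv}} and the sample rows are hypothetical and are not the H2O client's actual code. {{cp1252}} is passed explicitly to simulate the Windows default on any platform.

```python
import csv
import os
import tempfile


def write_rows_csv(rows, encoding):
    """Write rows to a temporary .csv file using an explicit encoding.

    Hypothetical helper for illustration only; returns the file path.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        # newline="" is what the csv module docs recommend for csv.writer files.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as tmp_file:
            csv.writer(tmp_file).writerows(rows)
    except Exception:
        # Do not leave a half-written temp file behind.
        os.remove(path)
        raise
    return path


rows = [["id", "comment"], [1, "bad byte: \ufffd"]]

# Simulate the Windows default: cp1252 has no mapping for U+FFFD.
try:
    os.remove(write_rows_csv(rows, encoding="cp1252"))
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True

# With an explicit utf-8 encoding the same rows are written and read back intact.
utf8_path = write_rows_csv(rows, encoding="utf-8")
with open(utf8_path, encoding="utf-8", newline="") as f:
    round_trip = list(csv.reader(f))
os.remove(utf8_path)
```

Passing the encoding explicitly makes the write independent of the locale, which is the behavior the report asks for in {{_upload_python_object}}.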
On top of fixing our {{_upload_python_object}} function, I’d recommend reviewing all usages of {{open(...)}} in our Py code base, for both read and write, and ensuring that we always enforce {{utf-8}} encoding for consistent behavior.
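One possible way to start such a review is a small static check. The sketch below is a heuristic, not an existing H2O tool: the function name {{find_open_calls_without_encoding}} is hypothetical, and it only looks at direct calls to the builtin name {{open}}, so aliases and wrappers such as {{io.open}} would need separate handling.

```python
import ast


def find_open_calls_without_encoding(source, filename="<string>"):
    """Return sorted (line, column) pairs of text-mode open(...) calls
    that do not pass an encoding. Heuristic sketch, builtin `open` only.
    """
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        is_open_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
        )
        if not is_open_call:
            continue
        # encoding is either a keyword or the 4th positional argument.
        if any(kw.arg == "encoding" for kw in node.keywords) or len(node.args) >= 4:
            continue
        # mode is either the 2nd positional argument or the mode= keyword.
        mode = node.args[1] if len(node.args) >= 2 else None
        for kw in node.keywords:
            if kw.arg == "mode":
                mode = kw.value
        if (
            isinstance(mode, ast.Constant)
            and isinstance(mode.value, str)
            and "b" in mode.value
        ):
            continue  # binary mode takes no encoding
        findings.append((node.lineno, node.col_offset))
    return sorted(findings)


sample = '''\
with open("a.csv", "w") as f:
    pass
with open("b.csv", "w", encoding="utf-8") as f:
    pass
with open("c.bin", "rb") as f:
    pass
data = open("d.txt").read()
'''
findings = find_open_calls_without_encoding(sample)
flagged_lines = [line for line, _ in findings]
```

In the sample, only the calls on lines 1 and 7 are flagged; the {{utf-8}} call and the binary-mode call are skipped. On Python 3.10 and later, running the test suite with {{-X warn_default_encoding}} is a complementary runtime check, since it emits an {{EncodingWarning}} whenever the locale default encoding is used implicitly.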