Error when reading file with non-utf8 bytes in verbose mode #628

st-pasha · 2017-12-04T10:35:01Z

>>> import datatable as dt
>>> src = b"A,\x80\n2,3\n"
>>> dt.fread(src, verbose=True)
  Character 3 in the input is '\n', treating input as raw text
[1] Check arguments
  Using 8 threads (omp_get_max_threads()=8, nth=0)
  NAstrings = ["NA"]
  None of the NAstrings look like numbers.
  showProgress = 1
[3] Detect and skip BOM
[4] Detect end-of-line character(s)
  Detected eol as \n only.
[6] Skipping initial rows if needed
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 36: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/pasha/github/datatable2/datatable/fread.py", line 850, in debug
    print(_log_color(message), flush=True)
  File "/Users/pasha/py36/lib/python3.6/site-packages/blessed/formatters.py", line 239, in __call__
    for idx, ucs_part in enumerate(args):
SystemError: <class 'enumerate'> returned a result with an error set
...

Note: the same error does not occur when verbose=False.
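
A minimal Python sketch of the failure mode and one possible mitigation, under stated assumptions. The traceback suggests that the verbose logger receives a message containing the raw byte 0x80 and decodes it strictly as UTF-8. Position 36 lies beyond the 8-byte input, so the byte appears to be echoed inside a longer log line. `safe_log_message` is a hypothetical helper and is not part of datatable. Decoding with `errors="backslashreplace"` is one way to keep verbose mode from crashing on such input.

```python
# Hypothetical helper, not part of datatable: turn a raw log message from the
# parser into a printable str without raising on non-UTF-8 input bytes.
def safe_log_message(raw: bytes) -> str:
    # "backslashreplace" renders undecodable bytes such as 0x80 as "\x80"
    # instead of raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="backslashreplace")


src = b"A,\x80\n2,3\n"

# Strict decoding raises the same error class and reason as in the report.
try:
    src.decode("utf-8")
except UnicodeDecodeError as e:
    print(e.reason)  # invalid start byte

# Lenient decoding keeps the offending byte visible in the log output.
print(safe_log_message(b"Line 1: 'A,\x80'"))  # Line 1: 'A,\x80'
```

Escaping the bytes (rather than using `errors="replace"`) is a design choice. The log still shows exactly which byte was present, which makes it easier to debug the input.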
