How to fix: “UnicodeDecodeError: ‘ascii’ codec can’t decode byte” #9267

leticiabragas2 · 2021-04-08T16:35:15Z

How can I solve this problem?
Operating system: Linux 18.01

import os
import mne

sample_data_folder = mne.datasets.testing.data_path()
sample_data_raw_file = os.path.join(sample_data_folder, 'NihonKohden', 'DA0935F5.EEG')

raw = mne.io.read_raw_nihon(sample_data_raw_file)

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-36-62ba074d16ef> in <module>
      2 sample_data_raw_file = os.path.join(sample_data_folder, 'NihonKohden', 'DA0935F5.EEG')
      3 
----> 4 raw = mne.io.read_raw_nihon(sample_data_raw_file)

~/anacondalele/envs/mne/lib/python3.8/site-packages/mne/io/nihon/nihon.py in read_raw_nihon(fname, preload, verbose)
     44     mne.io.Raw : Documentation of attribute and methods.
     45     """
---> 46     return RawNihon(fname, preload, verbose)
     47 
     48 

<decorator-gen-224> in __init__(self, fname, preload, verbose)

~/anacondalele/envs/mne/lib/python3.8/site-packages/mne/io/nihon/nihon.py in __init__(self, fname, preload, verbose)
    352 
    353         # Get annotations from LOG file
--> 354         annots = _read_nihon_annotations(fname, orig_time=info['meas_date'])
    355         self.set_annotations(annots)
    356 

~/anacondalele/envs/mne/lib/python3.8/site-packages/mne/io/nihon/nihon.py in _read_nihon_annotations(fname, orig_time)
    247             n_logs = np.fromfile(fid, np.uint8, 1)[0]
    248             fid.seek(t_blk_address + 0x14)
--> 249             t_logs = np.fromfile(fid, '|S45', n_logs).astype('U45')
    250 
    251             for t_log in t_logs:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 7: ordinal not in range(128)

welcome · 2021-04-08T16:35:16Z

Hello! 👋 Thanks for opening your first issue here! ❤️ We will try to get back to you soon. 🚴🏽‍♂️

cbrnr · 2021-04-08T19:01:23Z

I'm not really familiar with the Nihon Kohden format, but the error occurs because the reader tries to decode a non-ASCII character:

np.fromfile(fid, '|S45', n_logs)

I'm not sure why it is first decoded as ASCII and then converted to Unicode, because changing the whole line 249 to

t_logs = np.fromfile(fid, '|U45', n_logs)

would probably solve the problem. But again, I don't know if non-ASCII characters are allowed by the format.

@fraimondo @larsoner any idea?
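
For readers who want to reproduce the failure without an NK file, here is a minimal sketch. It assumes the failing step is the `.astype('U45')` cast on line 249, which NumPy performs through the ASCII codec. The byte string is made-up sample data, not content from the reporter's file, and `np.char.decode` with `'latin1'` is only one possible way to decode it.

```python
import numpy as np

# Made-up annotation text: "Posição" encoded as Latin-1 (0xE7 is "ç").
raw_logs = np.array([b"Posi\xe7\xe3o"], dtype="S45")

# Casting a bytes array to a unicode array decodes as ASCII,
# so any byte >= 0x80 raises UnicodeDecodeError.
try:
    raw_logs.astype("U45")
    ascii_cast_failed = False
except UnicodeDecodeError:
    ascii_cast_failed = True

# Decoding with an explicit single-byte codec succeeds for the same data.
decoded = np.char.decode(raw_logs, "latin1")
print(ascii_cast_failed, decoded[0])
```

Whether Latin-1 is the right codec for a given recording is an assumption; the format specification is not available, so the encoding has to be inferred from example files.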

cbrnr · 2021-04-09T07:38:20Z

Also, @leticiabragas2 the file DA0935F5.EEG is not part of the MNE testing data sets - is this a file you recorded yourself?

cbrnr · 2021-04-09T07:44:47Z

Can you share all files that are part of the DA0935F5 data set? Specifically, I need the EEG data file (extension .EEG).

MatthiasEb · 2021-05-07T14:42:51Z

Hi, I'm having the same issue. Indeed, it is related to non-ASCII characters in the annotations. I attached Examplefiles.zip. EDFBrowser converts without issues, so it doesn't seem to be an issue of the .EEG format itself.

The suggestion
t_logs = np.fromfile(fid, '|U45', n_logs)
proposed by @cbrnr does not work: it raises another ValueError related to encoding. Reading the entries as bytes first and then casting them to Unicode seems to be necessary.
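
A likely reason the `'|U45'` variant cannot work, sketched under the assumption that each log entry occupies 45 bytes on disk (which is what the existing `'|S45'` read implies): NumPy's `U` dtype stores four bytes per character, so the two dtypes do not describe the same record layout.

```python
import numpy as np

bytes_dtype = np.dtype("S45")    # 45 raw bytes per record
unicode_dtype = np.dtype("U45")  # 45 UCS-4 code points per record

# Size in bytes of one record for each dtype.
print(bytes_dtype.itemsize)
print(unicode_dtype.itemsize)
```

Reading with `'|U45'` would therefore consume 180 bytes per entry and treat every group of four bytes as one code point, rather than decoding 45 single-byte characters.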

As there is no way to suppress loading the annotations, the only way for me to load these files for now is to remove the annotations from the folder.

agramfort · 2021-05-09T15:53:15Z

see #9384

fraimondo · 2021-05-10T08:44:23Z

> I don't know if non-ASCII characters are allowed by the format.

For EDF+, this is the text from the EDF+ specification (https://www.edfplus.info/specs/edfplus.html#header):
"These annotations may only contain UCS characters (ISO 10646, the 'Universal Character Set', which is identical to the Unicode version 3+ character set) encoded by UTF-8."

The problem with NK files is that we do not have the specification. I've seen this problem with many formats too. #9384 might fix it, until a new encoding comes.
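
To illustrate the trial-and-error approach when the encoding is not documented, here is a minimal sketch of a decoder that tries several candidate encodings in order. `decode_annotation` and its default encoding list are hypothetical and are not taken from MNE or from #9384.

```python
def decode_annotation(raw, encodings=("utf-8", "latin1")):
    """Hypothetical helper: decode bytes with the first codec that works."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep the ASCII part and mark undecodable bytes with U+FFFD.
    # Note that latin1 maps every byte to a character, so this branch is
    # only reached when latin1 is not among the candidates.
    return raw.decode("ascii", errors="replace")


print(decode_annotation(b"Posi\xe7\xe3o"))             # Latin-1 bytes
print(decode_annotation("Posição".encode("utf-8")))    # UTF-8 bytes
```

Because Latin-1 never fails, it only makes sense as the last candidate; a file in a different single-byte encoding would still decode, but to the wrong characters.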

MatthiasEb · 2021-05-10T08:55:08Z

Thanks for the prompt responses.

> #9384 might fix it, until a new encoding comes.

I have absolutely no idea how regularly the NK files might change. If you think that this might be practically hard to maintain, at least a flag to suppress the automatic reading of the annotations would be reasonable, I guess.

agramfort · 2021-05-10T09:03:18Z

feel free to take over. I tried to provide a way out. It's unlikely that I'll have time to finish this myself this week.

MatthiasEb · 2021-05-10T09:13:59Z

> feel free to take over. I tried to provide a way out.

...and I'm really grateful for that. I've never contributed to anything and have limited time myself at the moment. I can try, but this might take a while and I might need some guidance...

fraimondo · 2021-05-10T10:41:11Z

It’s not about how they change, but what we find. We are working on a trial/error basis here. Having example files will help us determine which encodings are supported by the NK format.

MatthiasEb · 2021-05-10T11:27:55Z

Ah. Got it. Thanks.

leticiabragas2 added the BUG label Apr 8, 2021

cbrnr mentioned this issue Apr 9, 2021

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6437: character maps to <undefined> #9272

Closed

leticiabragas2 closed this as completed Apr 11, 2021

leticiabragas2 reopened this Apr 11, 2021

agramfort mentioned this issue May 9, 2021

support latin1 in annotations in nihon files #9384

Merged

larsoner closed this as completed in #9384 May 11, 2021

MatthiasEb mentioned this issue Mar 10, 2022

support latin1 encoding for channels in read_raw_nihon #10428

Closed
