How to fix: “UnicodeDecodeError: ‘ascii’ codec can’t decode byte” #9267

Closed
leticiabragas2 opened this issue Apr 8, 2021 · 12 comments · Fixed by #9384

leticiabragas2 commented Apr 8, 2021

How can I solve this problem?
Operating system: Linux 18.01

import os

import mne

sample_data_folder = mne.datasets.testing.data_path()
sample_data_raw_file = os.path.join(sample_data_folder, 'NihonKohden', 'DA0935F5.EEG')

raw = mne.io.read_raw_nihon(sample_data_raw_file)

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-36-62ba074d16ef> in <module>
      2 sample_data_raw_file = os.path.join(sample_data_folder, 'NihonKohden', 'DA0935F5.EEG')
      3 
----> 4 raw = mne.io.read_raw_nihon(sample_data_raw_file)

~/anacondalele/envs/mne/lib/python3.8/site-packages/mne/io/nihon/nihon.py in read_raw_nihon(fname, preload, verbose)
     44     mne.io.Raw : Documentation of attribute and methods.
     45     """
---> 46     return RawNihon(fname, preload, verbose)
     47 
     48 

<decorator-gen-224> in __init__(self, fname, preload, verbose)

~/anacondalele/envs/mne/lib/python3.8/site-packages/mne/io/nihon/nihon.py in __init__(self, fname, preload, verbose)
    352 
    353         # Get annotations from LOG file
--> 354         annots = _read_nihon_annotations(fname, orig_time=info['meas_date'])
    355         self.set_annotations(annots)
    356 

~/anacondalele/envs/mne/lib/python3.8/site-packages/mne/io/nihon/nihon.py in _read_nihon_annotations(fname, orig_time)
    247             n_logs = np.fromfile(fid, np.uint8, 1)[0]
    248             fid.seek(t_blk_address + 0x14)
--> 249             t_logs = np.fromfile(fid, '|S45', n_logs).astype('U45')
    250 
    251             for t_log in t_logs:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 7: ordinal not in range(128)
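The failing step can be reproduced without any recording. The sketch below assumes, as the `'|S45'` dtype in the traceback suggests, that each log entry is a fixed-width 45-byte record. The entries themselves are made up (the second is the Portuguese word "Anotação" encoded as Latin-1), and `np.frombuffer` stands in for `np.fromfile` so that no file is needed:

```python
import numpy as np

# Hypothetical log entries, NUL-padded to 45 bytes like the '|S45' records
# that nihon.py reads with np.fromfile. The second entry is "Anotação"
# encoded as Latin-1 (0xe7 = "ç", 0xe3 = "ã").
entries = [b"Start", b"Anota\xe7\xe3o"]
buf = b"".join(e.ljust(45, b"\x00") for e in entries)
t_logs = np.frombuffer(buf, dtype="|S45", count=len(entries))

decode_failed = False
try:
    # Same cast as line 249: converting bytes to str decodes as strict ASCII.
    t_logs.astype("U45")
except UnicodeDecodeError as err:
    decode_failed = True
    print("astype('U45') failed:", err)
```

Casting a bytes (`S`) array to a str (`U`) array makes NumPy decode every element with the ASCII codec, so any byte above 0x7F, such as the 0xe7 in the traceback, raises this error.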

welcome bot commented Apr 8, 2021

Hello! 👋 Thanks for opening your first issue here! ❤️ We will try to get back to you soon. 🚴🏽‍♂️

cbrnr (Contributor) commented Apr 8, 2021

I'm not really familiar with the Nihon Kohden format, but the error occurs because the reader tries to decode a non-ASCII character:

np.fromfile(fid, '|S45', n_logs)

I'm not sure why the log text is first read as bytes ('|S45') and then converted to Unicode with .astype('U45'), which decodes it as ASCII, because changing line 249 to

t_logs = np.fromfile(fid, '|U45', n_logs)

would probably solve the problem. But again, I don't know if non-ASCII characters are allowed by the format.

@fraimondo @larsoner, any ideas?
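A possible alternative to changing the dtype, sketched here under the assumption that the records are still read as `'|S45'` bytes, is to decode each record with an explicitly chosen codec. The helper `decode_logs` and its Latin-1 default are hypothetical; the encoding Nihon Kohden actually uses is not documented in this thread. Latin-1 maps every byte to a character, so it never raises, but it yields the wrong characters if the file uses a different encoding:

```python
import numpy as np


def decode_logs(t_logs, encoding="latin-1"):
    """Decode fixed-width byte records to str using an explicit codec.

    Hypothetical helper; the default encoding is an assumption, not a
    documented property of the Nihon Kohden format.
    """
    # Elements of an '|S' array come back as bytes with the trailing NUL
    # padding stripped, so each one can be decoded directly.
    return [rec.decode(encoding) for rec in t_logs]


entries = [b"Start", b"Anota\xe7\xe3o"]  # made-up log entries
buf = b"".join(e.ljust(45, b"\x00") for e in entries)
t_logs = np.frombuffer(buf, dtype="|S45", count=len(entries))
print(decode_logs(t_logs))  # ['Start', 'Anotação']
```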

cbrnr (Contributor) commented Apr 9, 2021

Also, @leticiabragas2, the file DA0935F5.EEG is not part of the MNE testing datasets. Is this a file you recorded yourself?

cbrnr (Contributor) commented Apr 9, 2021

Can you share all files that are part of the DA0935F5 data set? Specifically, I need the EEG data file (extension .EEG).

MatthiasEb (Contributor) commented

Hi, I'm having the same issue. Indeed, it is caused by non-ASCII characters in the annotations. I attached Examplefiles.zip. EDFBrowser converts these files without issues, so the problem does not seem to lie in the .EEG format itself.

The suggestion proposed by @cbrnr,

t_logs = np.fromfile(fid, '|U45', n_logs)

does not work: it raises a different ValueError related to encoding. Reading the raw bytes first and then decoding them to str, as the current code does via ASCII, seems to be necessary.
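One plausible reason the `'|U45'` variant fails, sketched below: NumPy stores `U` strings as UCS-4, four bytes per character, so a `'|U45'` record is 180 bytes wide instead of 45. Reading the log block with that dtype would consume four times as many bytes per record and treat every group of four file bytes as a single code point, which typically lies outside the valid Unicode range (up to U+10FFFF). The byte string `b"Star"` is only an illustration:

```python
import numpy as np

# Width of one record in bytes for each dtype.
print(np.dtype("S45").itemsize)  # 45  (one byte per character)
print(np.dtype("U45").itemsize)  # 180 (UCS-4: four bytes per character)

# Four ASCII bytes read as one UCS-4 code unit, the way NumPy would
# interpret them on a little-endian machine.
code_unit = int.from_bytes(b"Star", "little")
print(hex(code_unit), code_unit > 0x10FFFF)  # 0x72617453 True
```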

Since there is no way to skip loading the annotations, the only way I can load these files for now is to remove the annotation (.LOG) file from the folder.
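The workaround above can be scripted without modifying the original folder. A minimal sketch, assuming (as the "Get annotations from LOG file" comment in the traceback suggests) that the annotations live in a sibling file with the same stem and a `.LOG` extension; `copy_without_log` is a hypothetical helper, not part of MNE:

```python
import shutil
from pathlib import Path


def copy_without_log(eeg_file, dest_dir):
    """Copy all files sharing the recording's stem, except the .LOG file.

    Hypothetical helper; assumes the annotations are stored in
    <stem>.LOG next to the .EEG file.
    """
    eeg_file = Path(eeg_file)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    for sibling in eeg_file.parent.glob(eeg_file.stem + ".*"):
        if sibling.suffix.upper() == ".LOG":
            continue  # leave the annotation file behind
        shutil.copy2(sibling, dest_dir / sibling.name)
    # Path of the copied .EEG file, to pass to the reader instead.
    return dest_dir / eeg_file.name
```

The returned path can then be passed to `mne.io.read_raw_nihon` in place of the original .EEG path, which, per the report above, loads the recording without its annotations.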

agramfort (Member) commented

See #9384.

fraimondo (Contributor) commented

Quoting @cbrnr: "I don't know if non-ASCII characters are allowed by the format."

For EDF+, this is the text from the EDF+ specification (https://www.edfplus.info/specs/edfplus.html#header):
"These annotations may only contain UCS characters (ISO 10646, the 'Universal Character Set', which is identical to the Unicode version 3+ character set) encoded by UTF-8."

The problem with NK files is that we do not have the specification. I've seen this problem with many formats too. #9384 might fix it, until a new encoding comes.
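Since the encoding of NK annotations is unknown, one defensive strategy (not necessarily what #9384 does) is to try UTF-8 first, as EDF+ requires for its annotations, and fall back to other codecs. The helper name and the `("utf-8", "latin-1")` order are assumptions; further encodings could be added to the tuple if files using them turn up:

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Decode bytes with the first codec in `encodings` that succeeds.

    Hypothetical helper; the codec order is an assumption. Latin-1 accepts
    any byte sequence, so with the default tuple this never raises.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate codec
    raise ValueError(f"could not decode {raw!r} with any of {encodings}")


print(decode_with_fallback("Anotação".encode("utf-8")))    # Anotação
print(decode_with_fallback("Anotação".encode("latin-1")))  # Anotação
```

Because Latin-1 never fails, it works as a last resort, but text in an encoding not listed would come out garbled rather than raising an error.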

MatthiasEb (Contributor) commented

Thanks for the prompt responses.

Quoting @fraimondo: "#9384 might fix it, until a new encoding comes."

I have no idea how often the NK format might change. If you think this would be hard to maintain in practice, a flag to skip the automatic reading of the annotations would at least be a reasonable fallback.

agramfort (Member) commented May 10, 2021 via email

MatthiasEb (Contributor) commented

Quoting @agramfort: "Feel free to take over. I tried to provide a way out."

...and I'm really grateful for that. I've never contributed to anything and have limited time myself at the moment. I can try, but this might take a while and I might need some guidance...

fraimondo (Contributor) commented May 10, 2021 via email

MatthiasEb (Contributor) commented

Ah. Got it. Thanks.

5 participants