Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EDF / EDF+ header I/O #11602

Open
DimitriPapadopoulos opened this issue Mar 27, 2023 · 9 comments
Open

EDF / EDF+ header I/O #11602

DimitriPapadopoulos opened this issue Mar 27, 2023 · 9 comments
Labels

Comments

@DimitriPapadopoulos
Copy link
Contributor

DimitriPapadopoulos commented Mar 27, 2023

Description of the problem

The header I/O code in mne.io.edf._read_edf_header() does not look entirely correct to me.

Patient birthdate in 'local patient identification' field

According to 2.1.3. Additional specifications in EDF+:

birthdate in dd-MMM-yyyy format using the English 3-character abbreviations of the month in capitals.

Therefore, this code is incorrect:

birthdate = datetime.strptime(id_info[2], "%d-%b-%Y")

Indeed, %b stands for Month as locale’s abbreviated name, so it doesn't fit the English 3-character abbreviations of the month. It is perfectly OK to support locale month names for leniency and backwards compatibility, but the code should mainly support JAN, FEB, MAR, etc.

Check for 'Startdate' in 'local recording identification'

According to 2.1.3. Additional specifications in EDF+:

4. The 'local recording identification' field must start with the subfields (subfields do not contain, but are separated by, spaces):
- The text 'Startdate'.

The following code should check for Startdate:

if len(rec_info) == 5:

Typically:

        if len(rec_info) == 5 and rec_info[0] == 'Startdate':

The code could only emit a warning for leniency and backwards compatibility.

Startdate in 'local recording identification' field

Same as the birthdate above. According to 2.1.3. Additional specifications in EDF+:

The startdate itself in dd-MMM-yyyy format using the English 3-character abbreviations of the month in capitals.

Therefore, this code is incorrect:

startdate = datetime.strptime(rec_info[1], "%d-%b-%Y")

Indeed, %b stands for Month as locale’s abbreviated name, so it doesn't fit the English 3-character abbreviations of the month. It is perfectly OK to support locale month names for leniency and backwards compatibility, but the code should mainly support JAN, FEB, MAR and so on.


Additional information

EDF and EDF+ specifications

@DimitriPapadopoulos
Copy link
Contributor Author

The date stuff can be fixed by using Babel, or perhaps temporarily switching the locale to avoid an additional dependency.

@cbrnr
Copy link
Contributor

cbrnr commented Mar 27, 2023

I'm fine with the proposed changes, but please only raise a warning to avoid being stricter than before. I would check if any month abbreviations are English and warn if they are in some other language. If possible don't use any dependency.

@DimitriPapadopoulos
Copy link
Contributor Author

DimitriPapadopoulos commented Mar 27, 2023

It appears Python itself is buggy. From the documentation:

Directive Meaning Example Notes
%b Month as locale’s abbreviated name. Jan, Feb, …, Dec (en_US);
Jan, Feb, …, Dez (de_DE)
(1)

However, under a French system locale:

>>> print(date(2023, 2, 1).strftime("%d-%b-%Y"))
01-Feb-2023
>>> 

instead of something like this:

>>> print(date(2023, 2, 1).strftime("%d-%b-%Y"))
01-Févr.-2023
>>> 

I need to create a bug against Python. Then I'll double-check how Python behaves in practice to address this – or not.

@DimitriPapadopoulos
Copy link
Contributor Author

DimitriPapadopoulos commented Mar 27, 2023

Not a Python bug after all. The locale documentation states:

The locale module is implemented on top of the _locale module, which in turn uses an ANSI C locale implementation if available.

So, the default locale in Python is ANSI C, not the system locale. This, of course, is a good thing for reproducibility, but slightly disturbing. This is especially true when you're attempting to debug I/O on EDF files that are not compliant – I am attempting to read EDF files that contain Dutch month abbreviations, and the date is not read. What about adding a warning in that case?

@DimitriPapadopoulos
Copy link
Contributor Author

And what about (temporarily) forcing the locale in case MNE is used as a library from 3rd-party code that may have set the locale?

@cbrnr
Copy link
Contributor

cbrnr commented Mar 27, 2023

It isn't really important for this issue, because the date is supposed to use digits anyway.

@cbrnr
Copy link
Contributor

cbrnr commented Mar 27, 2023

Ah sorry, it is supposed to use the English three-letter abbreviations. What about checking against this set of 12 names manually?

@DimitriPapadopoulos
Copy link
Contributor Author

DimitriPapadopoulos commented Mar 27, 2023

It works as it is, unless someone changes the Python locale. In the short therm, I feel it is more important to log warnings when reading non-compliant EDF+ files.

@cbrnr
Copy link
Contributor

cbrnr commented Mar 27, 2023

Yes, we should definitely raise warnings when encountering non-standard EDF files, at least for situations that currently work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants