Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch edflib.c on Windows to enable opening UTF-8 encoded paths #237

Merged
merged 2 commits into from
Nov 9, 2023

Conversation

myd7349
Copy link
Contributor

@myd7349 myd7349 commented Sep 4, 2023

This should fix file opening issue with non-ASCII characters in paths when local encoding is not UTF-8.

This should fix file opening issue with non-ASCII characters in paths
when local encoding is not UTF-8.
@skjerns
Copy link
Collaborator

skjerns commented Sep 5, 2023

that looks good :) could you also write a test case that tests for the new behaviour?

@myd7349
Copy link
Contributor Author

myd7349 commented Sep 5, 2023

Hi! @skjerns I have tested this PR on this branch: https://github.com/myd7349/pyedflib/commits/ci

And it seems that the current test cases are able to cover the changes made in this PR.

Without changes made by this PR:

https://github.com/myd7349/pyedflib/actions/runs/6086619850/job/16513382367

We can see that there are a lot of UserWarnings caused by Unicode path (windows-2019 Python 3.9):

pyedflib\tests\test_edfreader.py ..............                          [ 25%]
pyedflib\tests\test_edfwriter.py ...........................             [ 73%]
pyedflib\tests\test_highlevel.py ...............                         [100%]

============================== warnings summary ===============================
pyedflib/tests/test_edfreader.py::TestEdfReader::test_BdfReader_Read_accented_file
pyedflib/tests/test_edfreader.py::TestEdfReader::test_read_incorrect_file
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:224: UserWarning: Attempting to write Unicode file D:\a\pyedflib\pyedflib\pyedflib\tests\data\tmp_file_��'���.bdf on Windows. Consider changing your locale to UTF8.
    self.handle = open_file_writeonly(self.path, self.file_type, self.n_channels)

pyedflib/tests/test_edfreader.py: 12 warnings
pyedflib/tests/test_edfwriter.py: 626 warnings
pyedflib/tests/test_highlevel.py: 590 warnings
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:953: DeprecationWarning: `sample_rate` is deprecated and will be removed in a future release.                           Please use `sample_frequency` instead
    warnings.warn("`sample_rate` is deprecated and will be removed in a future release. \

pyedflib/tests/test_edfreader.py::TestEdfReader::test_BdfReader_Read_accented_file
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfreader.py:320: UserWarning: the filename D:\a\pyedflib\pyedflib\pyedflib\tests\data\tmp_file_��'���.bdf contains Unicode, but Windows does not fully support this. Please consider changing your locale to support UTF8. Attempting to load file via workaround (https://github.com/holgern/pyedflib/pull/100) 
    f = pyedflib.EdfReader(self.bdf_accented_file, pyedflib.READ_ALL_ANNOTATIONS, pyedflib.DO_NOT_CHECK_FILE_SIZE)

pyedflib/tests/test_edfreader.py::TestEdfReader::test_EdfReader_Legacy_Header_Info
  C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\case.py:550: DeprecationWarning: Variable 'gender' is deprecated, use 'sex' instead.
    method()

pyedflib/tests/test_edfreader.py::TestEdfReader::test_EdfReader_headerInfos
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfreader.py:182: DeprecationWarning: Method 'getGender' is deprecated, use 'getSex' instead.
    np.testing.assert_equal(f.getGender(), 'Male')  # deprecated

pyedflib/tests/test_edfreader.py::TestEdfReader::test_read_incorrect_file
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfreader.py:368: UserWarning: the filename D:\a\pyedflib\pyedflib\pyedflib\tests\data\tmp_file_��'���.bdf contains Unicode, but Windows does not fully support this. Please consider changing your locale to support UTF8. Attempting to load file via workaround (https://github.com/holgern/pyedflib/pull/100) 
    f = pyedflib.EdfReader(self.bdf_accented_file)

pyedflib/tests/test_edfwriter.py: 12 warnings
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfwriter.py:801: DeprecationWarning: Method 'getGender' is deprecated, use 'getSex' instead.
    self.assertEqual(f.getGender(), expected,

pyedflib/tests/test_edfwriter.py: 12 warnings
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfwriter.py:808: DeprecationWarning: Function 'setGender' is deprecated, use 'setSex' instead.
    f.setGender(sex)  # deprecated

pyedflib/tests/test_edfwriter.py: 12 warnings
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfwriter.py:822: DeprecationWarning: Method 'getGender' is deprecated, use 'getSex' instead.
    self.assertEqual(f.getGender(), expected,

pyedflib/tests/test_edfwriter.py: 1 warning
pyedflib/tests/test_highlevel.py: 14 warnings
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:800: DeprecationWarning: The 'sample_rate' parameter is deprecated. Please use 'sample_frequency' instead.
    warnings.warn("The 'sample_rate' parameter is deprecated. Please use "

pyedflib/tests/test_highlevel.py::TestHighLevel::test_fortran_write
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:811: UserWarning: signals are in Fortran order. Will automatically transfer to C order for compatibility with edflib.
    warnings.warn('signals are in Fortran order. Will automatically '

pyedflib/tests/test_highlevel.py::TestHighLevel::test_read_unicode
  D:\\a\\pyedflib\\pyedflib\\pyedflib\\highlevel.py:354: UserWarning: the filename D:\\a\\pyedflib\\pyedflib\\pyedflib\\tests\\data\\tmp_utf8-\u4e2d\u6587\u017a\u0105\u015f\u3186\uc6b4\u02b7\u1a04\u2161\u0259\u041f\u0440\U0001f916.edf contains Unicode, but Windows does not fully support this. Please consider changing your locale to support UTF8. Attempting to load file via workaround (https://github.com/holgern/pyedflib/pull/100) \n    with pyedflib.EdfReader(edf_file) as f:

pyedflib/tests/test_highlevel.py::TestHighLevel::test_read_unicode_at_start
  D:\\a\\pyedflib\\pyedflib\\pyedflib\\edfwriter.py:224: UserWarning: Attempting to write Unicode file D:\\a\\pyedflib\\pyedflib\\pyedflib\\tests\\data\\\u4e2d\u6587\u017a\u0105\u015f\u3186\uc6b4\u02b7\u1a04\u2161\u0259\u041f\u0440\U0001f916.edf on Windows. Consider changing your locale to UTF8.\n    self.handle = open_file_writeonly(self.path, self.file_type, self.n_channels)

pyedflib/tests/test_highlevel.py::TestHighLevel::test_read_unicode_at_start
  D:\\a\\pyedflib\\pyedflib\\pyedflib\\highlevel.py:354: UserWarning: the filename D:\\a\\pyedflib\\pyedflib\\pyedflib\\tests\\data\\\u4e2d\u6587\u017a\u0105\u015f\u3186\uc6b4\u02b7\u1a04\u2161\u0259\u041f\u0440\U0001f916.edf contains Unicode, but Windows does not fully support this. Please consider changing your locale to support UTF8. Attempting to load file via workaround (https://github.com/holgern/pyedflib/pull/100) \n    with pyedflib.EdfReader(edf_file) as f:

pyedflib/tests/test_highlevel.py::TestHighLevel::test_read_write_accented
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:224: UserWarning: Attempting to write Unicode file D:\a\pyedflib\pyedflib\pyedflib\tests\data\tmp_��'���.edf on Windows. Consider changing your locale to UTF8.
    self.handle = open_file_writeonly(self.path, self.file_type, self.n_channels)

pyedflib/tests/test_highlevel.py::TestHighLevel::test_read_write_accented
  D:\a\pyedflib\pyedflib\pyedflib\highlevel.py:354: UserWarning: the filename D:\a\pyedflib\pyedflib\pyedflib\tests\data\tmp_��'���.edf contains Unicode, but Windows does not fully support this. Please consider changing your locale to support UTF8. Attempting to load file via workaround (https://github.com/holgern/pyedflib/pull/100) 
    with pyedflib.EdfReader(edf_file) as f:

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

With this PR, https://github.com/myd7349/pyedflib/actions/runs/6086686496/job/16513596128, we can see that those UserWarning caused by Unicode path on Windows are gone (windows-2019 Python3.9):

pyedflib\tests\test_edfreader.py ..............                          [ 25%]
pyedflib\tests\test_edfwriter.py ...........................             [ 73%]
pyedflib\tests\test_highlevel.py ...............                         [100%]

============================== warnings summary ===============================
pyedflib/tests/test_edfreader.py: 12 warnings
pyedflib/tests/test_edfwriter.py: 626 warnings
pyedflib/tests/test_highlevel.py: 590 warnings
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:953: DeprecationWarning: `sample_rate` is deprecated and will be removed in a future release.                           Please use `sample_frequency` instead
    warnings.warn("`sample_rate` is deprecated and will be removed in a future release. \

pyedflib/tests/test_edfreader.py::TestEdfReader::test_EdfReader_Legacy_Header_Info
  C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\case.py:550: DeprecationWarning: Variable 'gender' is deprecated, use 'sex' instead.
    method()

pyedflib/tests/test_edfreader.py::TestEdfReader::test_EdfReader_headerInfos
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfreader.py:182: DeprecationWarning: Method 'getGender' is deprecated, use 'getSex' instead.
    np.testing.assert_equal(f.getGender(), 'Male')  # deprecated

pyedflib/tests/test_edfwriter.py: 12 warnings
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfwriter.py:801: DeprecationWarning: Method 'getGender' is deprecated, use 'getSex' instead.
    self.assertEqual(f.getGender(), expected,

pyedflib/tests/test_edfwriter.py: 12 warnings
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfwriter.py:808: DeprecationWarning: Function 'setGender' is deprecated, use 'setSex' instead.
    f.setGender(sex)  # deprecated

pyedflib/tests/test_edfwriter.py: 12 warnings
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfwriter.py:822: DeprecationWarning: Method 'getGender' is deprecated, use 'getSex' instead.
    self.assertEqual(f.getGender(), expected,

pyedflib/tests/test_edfwriter.py: 1 warning
pyedflib/tests/test_highlevel.py: 14 warnings
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:800: DeprecationWarning: The 'sample_rate' parameter is deprecated. Please use 'sample_frequency' instead.
    warnings.warn("The 'sample_rate' parameter is deprecated. Please use "

pyedflib/tests/test_highlevel.py::TestHighLevel::test_fortran_write
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:811: UserWarning: signals are in Fortran order. Will automatically transfer to C order for compatibility with edflib.
    warnings.warn('signals are in Fortran order. Will automatically '

@myd7349
Copy link
Contributor Author

myd7349 commented Sep 5, 2023

In the CyEdfReader.open function, it first attempts to encode the file name as utf-8 and then passes it to the EDF file reading interface provided by edflib. If opening the file fails, it will then attempt to use get_short_file_name. However, in c_edf.edfopen_file_writeonly, it first tries get_short_file_name and then attempts utf-8 encoding. To make use of the function interfaces that support utf-8 paths, which we have patched, I have removed the get_short_file_name from the latter, while the corresponding logic in the former has been retained.

@skjerns skjerns merged commit 51f4db8 into holgern:master Nov 9, 2023
20 checks passed
@skjerns
Copy link
Collaborator

skjerns commented Nov 9, 2023

thanks for your addition :)

@myd7349 myd7349 deleted the fopen-utf8 branch December 3, 2023 09:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants