Doesn't recognize international file names #1016

Closed
aparamon opened this issue Mar 19, 2018 · 6 comments

Comments

@aparamon
Member

To reproduce:

>>> import h5py
>>> print(h5py.version.info)
Summary of the h5py configuration
---------------------------------

h5py    2.8.0rc1
HDF5    1.10.1
Python  3.6.4 |Anaconda custom (64-bit)| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
sys.platform    win32
sys.maxsize     9223372036854775807
numpy   1.14.2

>>> print(h5py._hl.compat.WINDOWS_ENCODING)
mbcs

Pass a non-ASCII file name:

>>> f = h5py.File('тест.hdf5', 'a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "C:\ProgramData\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 154, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx", line 78, in h5py.h5f.open
  File "h5py\defs.pyx", line 621, in h5py.defs.H5Fopen
  File "h5py\_errors.pyx", line 123, in h5py._errors.set_exception
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf2 in position 29: invalid continuation byte

Passing the same name encoded as UTF-8 bytes works:

>>> f = h5py.File('тест.hdf5'.encode('utf-8'), 'a')
>>> print(f.filename)
тест.hdf5

It appears that h5py doesn't handle non-ASCII file names passed as str, although the same name is accepted as UTF-8 bytes.

@aragilar
Member

Yeah, this is a limitation of HDF5, see #839.

@aparamon
Member Author

aparamon commented Apr 18, 2018

Hmm, but in this case the file name 'тест.hdf5' can definitely be encoded with h5py._hl.compat.WINDOWS_ENCODING and passed to hdf5.dll as an ANSI string.
While the general problem does lie in the limitations of the underlying HDF5 library interface, this particular issue is on the h5py side.
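
(Editor's sketch, not part of the original comment.) The byte 0xf2 in the traceback is consistent with this reading. The snippet below assumes the active ANSI code page is cp1251 (Cyrillic); the issue only shows 'mbcs', which resolves to whatever ANSI code page Windows is using, so cp1251 is an assumption. It shows that the name encodes cleanly in that code page and that decoding those bytes as UTF-8 fails in the way the traceback reports.

```python
# Assumption: the Windows ANSI code page is cp1251; the issue itself only
# reports 'mbcs', so this code page is a guess made for illustration.
name = 'тест.hdf5'

# The name is representable in the ANSI code page.
ansi_bytes = name.encode('cp1251')
print(ansi_bytes)  # b'\xf2\xe5\xf1\xf2.hdf5'

# Decoding the same bytes as UTF-8 fails on the first byte, 0xf2.
error = None
try:
    ansi_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

In the traceback the failing byte sits at position 29 rather than 0, presumably because the file name is embedded in a longer message.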

@aragilar
Member

Ah, sorry, I didn't look at the error message properly. The problem is in the error handling code, which assumes UTF-8 output. Assuming Python has correctly detected your locale (and the locale is set such that the file name can be encoded), this should be fine once we fix the error handling (though I'm unsure why an error is being thrown in the first place).
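
(Editor's sketch, not part of the original comment.) One way error handling could avoid assuming UTF-8 is to try a list of encodings and fall back to replacement characters. The helper name `decode_error_message`, the sample message text, and the choice of cp1251 are all hypothetical; this is not h5py's code and makes no claim about what #1025 actually changes.

```python
import locale


def decode_error_message(raw, encodings=None):
    """Hypothetical helper: decode a bytes message without raising."""
    if encodings is None:
        # Try UTF-8 first, then whatever encoding the locale reports.
        encodings = ('utf-8', locale.getpreferredencoding(False))
    for enc in encodings:
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: never raise, mark undecodable bytes instead.
    return raw.decode('utf-8', errors='replace')


# Made-up message containing the file name in an assumed cp1251 code page.
msg = "unable to open file: name = 'тест.hdf5'".encode('cp1251')
print(decode_error_message(msg, encodings=('utf-8', 'cp1251')))
```

The point of the final fallback is that a failure while formatting an error never hides the underlying error.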

@aragilar
Member

@aparamon #1025 should fix this. Could you test it? (I don't have easy access to Windows.)

@aparamon
Member Author

It seems I need to build from source, which on Windows is a major pain...
Is there a way I can pre-test with the existing binaries?

@aragilar
Member

Sorry, no. (When I set up the AppVeyor CI, I did try to create wheels, but I ran into problems; I don't remember what they were, though.)
