Support the entire set of filenames on Windows #839
This is related to #818, but is for tracking a complete solution to how filenames are dealt with on Windows, which we currently don't have. Note that this is going to be a breaking change, but a fairly limited one (it affects users of external links and VDS on Windows). A summary of the issues can be found in https://www.python.org/dev/peps/pep-0528/ and https://www.python.org/dev/peps/pep-0529/, which caused this issue to come up again. A discussion of Unicode and its encodings, especially on Windows, can be found at http://utf8everywhere.org/.
The ANSI interface, similar to locales on Unix-like systems, uses the concept of a current encoding (which in Python is represented by the mbcs encoding). This encoding varies from system to system, but is almost never one of the utf-* encodings (instead being latin-1, or an encoding native to the user's language). This means that on any single system most possible file paths cannot be represented through this interface, which creates interoperability problems that are sometimes hard to diagnose.
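To make the representability problem concrete, here is a minimal sketch in plain Python. The helper `representable` and the sample filenames are hypothetical, and explicit code pages (`cp1252`, `cp932`) stand in for `mbcs`, which only exists on Windows and resolves to whatever the current ANSI code page happens to be.

```python
def representable(name: str, codepage: str) -> bool:
    """Return True if `name` can be encoded without loss in `codepage`."""
    try:
        name.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical filenames: ASCII, Western European, Japanese.
names = ["data.h5", "donn\u00e9es.h5", "\u30c7\u30fc\u30bf.h5"]

# For each assumed ANSI code page, record which names survive encoding.
results = {
    cp: {name: representable(name, cp) for name in names}
    for cp in ("cp1252", "cp932")
}
```

Under these assumptions the Japanese name cannot be encoded in `cp1252`, while it can in `cp932`. Only the pure-ASCII name is safe under every code page, which is why a path that works on one machine may fail on another.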
HDF5, most likely due to its Unix heritage, has only supported the ANSI interfaces (or rather, has not explicitly added support for anything else). HDF5 has also previously deemed Unicode filesystem support out of scope (https://support.hdfgroup.org/HDF5/doc1.8/Advanced/UsingUnicode/index.html, see Filenames under Caveats, Pitfalls, and Things to Watch For). This already creates problems for ordinary files, see e.g. https://tschoonj.github.io/blog/2014/11/06/hdf5-on-windows-utf-8-filenames-support/ and the other links in #818 (comment). The major problem, however, comes with external links (and VDS, but that is newer and probably less widely used at the moment), which do not specify what encoding the stored filename is in.
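The ambiguity of a stored filename with no declared encoding can be illustrated without HDF5 at all. The sketch below assumes a filename was written as raw bytes on a machine whose ANSI code page was `cp1252`. The filename, the helper `decode_as`, and the choice of code pages are illustrative and not taken from an actual file.

```python
# Bytes as a writer with a cp1252 ANSI code page would store them.
stored = "donn\u00e9es.h5".encode("cp1252")  # b'donn\xe9es.h5'


def decode_as(raw: bytes, codepage: str) -> str:
    """Interpret stored filename bytes under an assumed code page."""
    return raw.decode(codepage)


# A reader with the same code page recovers the original name.
western = decode_as(stored, "cp1252")

# A reader with a Cyrillic code page silently gets a different name.
cyrillic = decode_as(stored, "cp1251")

# A reader that assumes UTF-8 cannot decode the bytes at all.
try:
    decode_as(stored, "utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

The same byte `0xE9` is `é` under `cp1252` and `й` under `cp1251`, so the reader either looks for the wrong file or fails outright, depending on its own current encoding.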