Files with non ascii filenames
Note: This reflects my current understanding of this issue and may be incorrect or incomplete – feel free to improve!
TL;DR: Avoid non-ascii filenames inside zip archives if possible, especially if working cross-platform.
The zip format stores filenames as a sequence of bytes; it’s up to the zip handling tool how to interpret them. Most modern operating systems use UTF-8 for filename encoding. However, Windows traditionally doesn’t (at least for the contents of zip archives).
So, if you create a zip archive containing a file named `ä.txt` on Mac OS or Linux and then extract this archive on Windows (using Windows’ native support for “compressed folders”, not an external tool like WinZip), it will produce a file named something like `├ñ.txt` (the exact name depends on the current codepage), because the UTF-8 bytes of the name are decoded with that legacy codepage. Microsoft has released a hotfix for Windows 7 that is intended to fix this bug, but few computers with Windows 7 will actually have that hotfix installed.
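The garbling can be reproduced with Ruby's standard library alone. The sketch below takes the UTF-8 bytes of `ä.txt` and decodes them as if they were in a legacy codepage. The helper name `misread_as` is made up for this illustration, and the two codepages (IBM437 and Windows-1252) are only examples of what a Windows system might be using.

```ruby
# Hypothetical helper: decode the UTF-8 bytes of +name+ as if they were
# written in +codepage+, then convert the result back to UTF-8 for display.
def misread_as(name, codepage)
  name.b.force_encoding(codepage).encode("UTF-8")
end

utf8_name = "\u00E4.txt" # "ä.txt"; stored in the archive as bytes C3 A4 2E 74 78 74

puts misread_as(utf8_name, "IBM437")       # ├ñ.txt
puts misread_as(utf8_name, "Windows-1252") # Ã¤.txt
```

The two bytes of `ä` become two unrelated characters, and which ones depends entirely on the codepage used for decoding.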
Recent versions of WinZip (I tested with v17.5) and 7-Zip (tested with 9.20) seem to handle those filenames correctly, though. (This is probably also true for other tools, like WinRAR etc., but I did not test any of those.)
Unicode filenames and the general purpose flags field
Recent versions of the zip format specification support Unicode filenames explicitly:
Names must be encoded in UTF-8, and the 11th bit in the general purpose flags field (2 bytes at offset 6) must be set. –Source
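As a rough illustration of where that flag lives, the sketch below reads the first 8 bytes of a local file header and tests the bit. It reads "the 11th bit" as bit 11 counting from zero (mask `0x0800`). The method name `utf8_name_flag_set?` is hypothetical, and the header bytes are built by hand for the example rather than read from a real archive.

```ruby
LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50 # "PK\x03\x04", stored little-endian
UTF8_NAME_FLAG = 1 << 11                 # 0x0800

# Hypothetical helper: true if the general purpose flags of a local file
# header (2 bytes at offset 6) mark the filename as UTF-8.
def utf8_name_flag_set?(header_bytes)
  raise ArgumentError, "header too short" if header_bytes.bytesize < 8

  # 4-byte signature, 2-byte "version needed", 2-byte flags, all little-endian
  signature, _version_needed, flags = header_bytes.unpack("Vvv")
  raise ArgumentError, "not a local file header" unless signature == LOCAL_FILE_HEADER_SIGNATURE

  (flags & UTF8_NAME_FLAG) != 0
end

# Hand-built header prefixes, for illustration only
flagged   = [LOCAL_FILE_HEADER_SIGNATURE, 20, UTF8_NAME_FLAG].pack("Vvv")
unflagged = [LOCAL_FILE_HEADER_SIGNATURE, 20, 0].pack("Vvv")

puts utf8_name_flag_set?(flagged)   # true
puts utf8_name_flag_set?(unflagged) # false
```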
Since 317fdd0, which was released in v1.0.0.beta1, you can tell RubyZip to set this flag for Unicode filenames with `Zip.unicode_names = true`. This makes it possible to extract files with non-ascii filenames on Windows 8 without any external tools. Unfortunately, this does not work for older versions (Windows 7, Windows Vista, Windows XP); they just seem to ignore this flag.
Avoid non-ascii filenames in zip archives, if possible. If you really must use them, and you’re creating the archives on Mac OS X or Linux with a recent RubyZip version, do set `Zip.unicode_names = true`. Extracting those archives will work correctly on Mac OS X, Linux and Windows 8. It will not work on older versions of Windows, unless an external tool like WinZip is used.
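To follow the first recommendation, one option is to check entry names before building an archive. The sketch below uses only Ruby's built-in `String#ascii_only?`. The helper name `non_ascii_names` and the sample names are hypothetical.

```ruby
# Hypothetical helper: return the names that contain non-ascii characters.
def non_ascii_names(names)
  names.reject(&:ascii_only?)
end

names = ["README.txt", "\u00E4.txt", "docs/caf\u00E9.md"]
problematic = non_ascii_names(names)

warn "Non-ascii filenames: #{problematic.join(', ')}" unless problematic.empty?
```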