zip: use "utf-8 filename" extended header field for file names if present #122

Closed
unxed opened this issue Sep 29, 2016 · 0 comments
unxed commented Sep 29, 2016

Sample archive whose contents are listed incorrectly:
23-10-2012-b-fasi-eaep.zip

far shows its contents as
Б' ФАСЖ ПД06 СХОКДИА ДАДП (ИМТ).xls
instead of
Β' ΦΑΣΗ ΠΕ06 ΣΧΟΛΕΙΑ ΕΑΕΠ (ΙΝΤ).xls
as it should.
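
The garbled listing looks like Greek OEM bytes being read with a Russian OEM code page. Below is a minimal sketch that reproduces the effect, assuming the archive was written with code page 737 and read back with code page 866. Both code pages are my guess from the output above, and `misread` is a hypothetical helper, not anything in far.

```python
def misread(name: str, written_with: str, read_with: str) -> str:
    """Encode a name with one OEM code page and decode the bytes with another."""
    raw = name.encode(written_with)   # bytes as stored in the zip header
    return raw.decode(read_with)      # what a viewer using the wrong code page shows


if __name__ == "__main__":
    intended = "Β' ΦΑΣΗ ΠΕ06 ΣΧΟΛΕΙΑ ΕΑΕΠ (ΙΝΤ).xls"
    # Assumption: Greek OEM (cp737) on the creating machine, Russian OEM (cp866) on ours.
    print(misread(intended, "cp737", "cp866"))
```

Both code pages place their capital letters in the same byte range starting at 0x80, which is why every Greek capital turns into a Cyrillic capital while ASCII characters survive.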

This could be fixed by parsing the extra field (extended header) of each zip entry and using the "UTF-8 filename" field when it is present, falling back to our current logic when it is not.

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
See 4.6.9 - Info-ZIP Unicode Path Extra Field (0x7075)
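
A rough sketch of what parsing that field could look like, written in Python only for illustration. The layout follows my reading of section 4.6.9: a 1-byte version, a 4-byte CRC-32 of the name stored in the regular header, then the UTF-8 name. The function name `unicode_path_from_extra` is hypothetical.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path Extra Field


def unicode_path_from_extra(extra: bytes, raw_name: bytes):
    """Return the UTF-8 name from a 0x7075 block, or None so the caller can fall back."""
    pos = 0
    while pos + 4 <= len(extra):
        # Every extra-field block starts with a little-endian (id, size) pair.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_ID or len(data) != size or size < 5:
            continue  # some other block, or a truncated one
        version, name_crc = struct.unpack_from("<BI", data, 0)
        if version != 1:
            continue
        if name_crc != zlib.crc32(raw_name) & 0xFFFFFFFF:
            continue  # header name changed after the field was written, so ignore it
        try:
            return data[5:].decode("utf-8")
        except UnicodeDecodeError:
            return None
    return None
```

Returning `None` instead of raising keeps the "fall back to our current logic" path simple: the caller only replaces the name when a valid field was found.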

See also APPENDIX D - Language Encoding (EFS)
When reading this, keep in mind that "IBM Code Page 437" in practice means "whatever OEM code page was selected". So, to decode legacy file names in an archive correctly, we should read them using the OEM code page that corresponds to the currently selected locale (assuming the archive was created on a PC with the same locale selected).
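
The fallback described above could be sketched like this. The general purpose bit 11 check comes from Appendix D. The locale table is illustrative and deliberately tiny, and both function names are hypothetical. A real implementation would ask the system for its OEM code page instead of hard-coding a map.

```python
UTF8_FLAG = 0x0800  # general purpose bit 11: name and comment are UTF-8 (EFS)

# Illustrative mapping only: common default OEM code pages for a few languages.
OEM_CODEC_BY_LANGUAGE = {
    "en": "cp437",
    "ru": "cp866",
    "el": "cp737",
}


def oem_codec_for_locale(locale_name: str) -> str:
    """Pick an OEM codec from a locale name such as 'ru_RU'; default to cp437."""
    language = locale_name.split("_")[0].lower()
    return OEM_CODEC_BY_LANGUAGE.get(language, "cp437")


def decode_header_name(raw_name: bytes, flags: int, locale_name: str) -> str:
    """Decode the name from the regular zip header when no Unicode Path field applies."""
    if flags & UTF8_FLAG:
        return raw_name.decode("utf-8", errors="replace")
    # Legacy name: assume it was written with the OEM code page of our own locale.
    return raw_name.decode(oem_codec_for_locale(locale_name), errors="replace")
```

This only gives the right answer when the archive was created under the same locale as the reader, which is the assumption stated above.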
