zip: use "utf-8 filename" extended header field for file names if present #122

unxed · 2016-09-29T21:58:45Z

Sample archive that is listed incorrectly:
23-10-2012-b-fasi-eaep.zip

far shows its contents as
Б' ФАСЖ ПД06 СХОКДИА ДАДП (ИМТ).xls
instead of
Β' ΦΑΣΗ ΠΕ06 ΣΧΟΛΕΙΑ ΕΑΕΠ (ΙΝΤ).xls
as it should.

This could be fixed by parsing the extra field of each zip entry and using the "UTF-8 filename" field if present (falling back to our current logic if not).

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
See 4.6.9 - Info-ZIP Unicode Path Extra Field (0x7075)
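
A minimal sketch of that lookup, in Python for brevity (this is not far's actual code). It assumes the layout given in APPNOTE 4.6.9: a 1-byte version, a 4-byte CRC-32 of the name stored in the standard header, then the UTF-8 name. The function name `unicode_path_from_extra` and its parameters are hypothetical.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path Extra Field


def unicode_path_from_extra(extra: bytes, raw_name: bytes):
    """Return the UTF-8 name from the extra field, or None to fall back."""
    pos = 0
    # The extra field is a sequence of (id, size, data) records.
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_ID:
            continue
        if len(data) != size or size < 5:
            continue  # truncated or too short to hold version + CRC
        version, name_crc = struct.unpack_from("<BI", data, 0)
        # The CRC guards against a stale field left behind after the
        # entry was renamed by a tool that ignores this extra field.
        if version != 1 or name_crc != (zlib.crc32(raw_name) & 0xFFFFFFFF):
            continue
        try:
            return data[5:].decode("utf-8")
        except UnicodeDecodeError:
            return None
    return None
```

Returning `None` leaves the caller free to keep the current decoding logic whenever the field is missing, stale, or malformed.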

See also APPENDIX D - Language Encoding (EFS)
When reading this, remember that "IBM Code Page 437" in practice means "the selected OEM code page". So to decode legacy file names in an archive correctly, we should read them using the OEM code page corresponding to the currently selected locale (assuming the archive was created on a PC with the same locale selected).
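
A rough sketch of that fallback, again in Python and not far's actual code. The table `OEM_CODEPAGE_BY_LANG` and the function `decode_legacy_name` are hypothetical; a real implementation would derive the code page from the system locale. Per Appendix D, general purpose bit 11 marks the name as UTF-8. The usage lines show that the sample name above is consistent with cp737 (Greek OEM) bytes being read as cp866 (Russian OEM).

```python
# Hypothetical locale -> OEM code page table, for illustration only.
OEM_CODEPAGE_BY_LANG = {"en": "cp437", "ru": "cp866", "el": "cp737"}

EFS_FLAG = 0x0800  # general purpose bit 11: name and comment are UTF-8


def decode_legacy_name(raw: bytes, flags: int, lang: str) -> str:
    """Decode a name from the standard zip header (no Unicode extra field)."""
    if flags & EFS_FLAG:
        return raw.decode("utf-8")
    # "CP437" in the spec is treated as "the OEM code page of the locale".
    codec = OEM_CODEPAGE_BY_LANG.get(lang, "cp437")
    return raw.decode(codec, errors="replace")


raw = "Β' ΦΑΣΗ".encode("cp737")          # bytes as a Greek-locale PC stores them
print(decode_legacy_name(raw, 0, "ru"))  # Б' ФАСЖ
print(decode_legacy_name(raw, 0, "el"))  # Β' ΦΑΣΗ
```

The locale-based guess only helps when the archive was created under the same locale, which is why the UTF-8 extra field should take priority when it is present.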

unxed mentioned this issue Sep 29, 2016

error processing archives with non-english characters in the names of archived files/folders #114

Closed

elfmz closed this as completed in bb6663d Oct 1, 2016
