error processing archives with non-english characters in the names of archived files/folders #114

Closed
unxed opened this issue Sep 28, 2016 · 18 comments


unxed (Contributor) commented Sep 28, 2016

Sample archive (created on Linux) attached: it lists OK, but unpacking fails.

The other sample archive (created on Windows) lists with garbage in the file names and also fails to unpack.

unxed (Contributor, Author) commented Sep 28, 2016

list_ok_unpack_fails.zip

unxed (Contributor, Author) commented Sep 28, 2016

Desktop.zip

unxed (Contributor, Author) commented Sep 28, 2016

And this sample (created by far2l) simply crashes far2l:

2g0.ru/test.7z

unxed (Contributor, Author) commented Sep 28, 2016

Same problem when trying to extract a folder with an English name from a zip archive, so I guess there are two independent problems.
test.zip

(Browse inside the archive and copy the folder "test" anywhere.)

unxed (Contributor, Author) commented Sep 28, 2016

Found the source of the last problem. See:

/home/unxed$ unzip -o  /home/unxed/Downloads/list_ok_unpack_fails.zip "проверка/*.*" -d .
TIP: If you feel stuck - use Ctrl+Alt+C to terminate everything in this shell.
Archive:  /home/unxed/Downloads/list_ok_unpack_fails.zip
caution: filename not matched:  проверка/*.*
/home/unxed$ unzip -o  /home/unxed/Downloads/list_ok_unpack_fails.zip "проверка/*" -d .
Archive:  /home/unxed/Downloads/list_ok_unpack_fails.zip

On Linux, *.* does not mean "any file" (it only matches names that contain a dot), so unzip cannot find anything matching *.* in an empty folder and skips extracting it.

I guess *.* should be replaced by * on Linux.

See #121.

unxed (Contributor, Author) commented Sep 28, 2016

About Desktop.zip: the file names in this archive, which was created on Windows, are stored in code page 866. Is there any way to parse such archives correctly? unzip itself unpacks them correctly in a UTF-8 environment.

unxed (Contributor, Author) commented Sep 28, 2016

I guess the code page of the file list can be detected from the ZipOS variable; see multiarc/formats/zip/zip.cpp, line 289.

unxed (Contributor, Author) commented Sep 28, 2016

zip.cpp.zip

I cobbled together a fix using code from Wine. Unfortunately, pushing returns 403, so here is the modified zip.cpp.

At least it works OK with my test cases.

It uses some of Wine's code from here: https://github.com/wine-mirror/wine/blob/master/dlls/user32/lstr.c

unxed (Contributor, Author) commented Sep 29, 2016

By the way, ANSI->OEM conversion may still be required when ZipHeader.PackVer > 20 && ZipHeader.PackVer < 25.

unxed (Contributor, Author) commented Sep 29, 2016

Fixed; see the attached version.
zip.cpp.zip

unxed (Contributor, Author) commented Sep 29, 2016

> and this sample (created by far2l) just crashes far2l
> 2g0.ru/test.7z

Further testing shows far2l crashing on 7z archives that contain no non-empty files.

Commenting out
Item->PackSizeHigh = packed_size & 0xffffffff;
Item->PackSize = (packed_size >> 32) & 0xffffffff;
in 7z.cpp fixes this behavior.

See #120.

unxed changed the title from "error porcessing archives with non-english characters in the names of archived files/folders" to "error processing archives with non-english characters in the names of archived files/folders" Sep 29, 2016
unxed (Contributor, Author) commented Sep 29, 2016

Moved some issues from here to separate tickets; they seem to be unrelated to character sets.

unxed (Contributor, Author) commented Sep 29, 2016

Update: the encoding-detection method I used fails on some UTF-8 archives created on Windows. Example:
23-10-2012-b-fasi-eaep.zip

Perhaps the UTF-8 file name extra field in the file header could be used to detect such cases, but multiarc does not currently support it.
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
4.6.9 -Info-ZIP Unicode Path Extra Field (0x7075)

unxed (Contributor, Author) commented Sep 29, 2016

Retested with the master build. far2l still shows
Б' ФАСЖ ПД06 СХОКДИА ДАДП (ИМТ).xls
inside the archive instead of the correct
Β' ΦΑΣΗ ΠΕ06 ΣΧΟΛΕΙΑ ΕΑΕΠ (ΙΝΤ).xls

This archive stores each file name twice: once in the native zip file name field (in UTF-8, which as far as I know is uncommon for Windows archivers; since the archive was created on Windows, my code assumes an OEM charset there), and once more, also in UTF-8, in the extra field that newer versions of the format define for Unicode file names (which our current code ignores).

Because far2l lists the archive contents differently, the resulting unzip command is wrong and the files cannot be extracted.

elfmz (Owner) commented Sep 29, 2016

Yep, I just didn't notice the difference at first glance.

unxed (Contributor, Author) commented Sep 29, 2016

The expected behavior for multiarc is to look in the UTF-8 extra field of the header and, if it is absent or empty, fall back to the logic we currently have.

unxed (Contributor, Author) commented Sep 29, 2016

By the way, see https://github.com/elfmz/far2l/files/499990/zip.cpp.zip
It is updated against master but adds back some logic from the original code (to work around some older Windows zip implementations that, as far as I can guess, wrote file names in the ANSI charset).

unxed (Contributor, Author) commented Sep 29, 2016

I see the ANSI code has been merged, thanks. By the way, this ticket has become too complicated and hard to follow; see #122 for the remaining issue. Closing.
