Use unicode by default #515

Open
6temes opened this issue Jan 26, 2022 · 6 comments

6temes commented Jan 26, 2022

In 2022, all mainstream OSes support Unicode in file names.

For users working in languages other than English, the current default configuration produces broken filenames when saving files whose names contain non-ASCII characters.

This is easily fixed with this configuration:

Zip.unicode_names = true

Zip.force_entry_names_encoding = 'UTF-8'

But it would probably be a better experience if the library were configured to work with Unicode by default.

Member

hainesr commented Jan 26, 2022

Hi, thanks for raising this.

I think rubyzip doesn't actually support encodings properly at all when you compare it to the Zip specification 😟. What should really happen is that filenames and comments are stored in IBM Code Page 437 by default, unless the EFS bit (the language encoding flag) is set, in which case they should be UTF-8. Those are the only two options the Zip spec allows.
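To make the rule above concrete, here is a minimal sketch in plain Ruby of how a reader could pick the encoding from the general purpose bit flag. The helper `decode_entry_name` is hypothetical and is not part of rubyzip; it only illustrates the spec's two cases (bit 11 clear means Code Page 437, bit 11 set means UTF-8).

```ruby
# Bit 11 of the general purpose bit flag is the language encoding flag (EFS).
EFS_FLAG = 0x0800

# Hypothetical helper, not part of rubyzip: interpret the raw name bytes
# according to the general purpose bit flag taken from the entry header.
def decode_entry_name(raw_bytes, gp_flags)
  source = (gp_flags & EFS_FLAG).zero? ? Encoding::IBM437 : Encoding::UTF_8
  raw_bytes.dup.force_encoding(source).encode(Encoding::UTF_8)
end

cp437_name = "caf\x82.txt".b   # 0x82 is "é" in Code Page 437
utf8_name  = "café.txt".b      # the same name as raw UTF-8 bytes

puts decode_entry_name(cp437_name, 0x0000)   # EFS clear: bytes read as CP437
puts decode_entry_name(utf8_name, EFS_FLAG)  # EFS set: bytes read as UTF-8
```

Both calls should yield the same readable name, which is the point: the flag, not the bytes, tells the reader how to interpret the name.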

However, I think rubyzip just uses and assumes UTF-8 but then doesn't actually store the correct bytesize in the headers. So if you were to use non-ASCII characters I think it would completely mess it all up.

(As an aside, I think Zip.force_entry_names_encoding was added to help fix this behaviour, but it's only ever used when reading a zip file, not writing one.)
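As a rough illustration of what forcing an encoding on read means in general, the sketch below uses plain Ruby only. How rubyzip applies the setting internally is an assumption here, and the Shift_JIS scenario is invented for the example.

```ruby
# Simulate the raw name bytes a tool might write in Shift_JIS without
# setting the EFS bit (an assumed scenario, for illustration only).
raw_name = "日本語.txt".encode(Encoding::Shift_JIS).b

# force_encoding relabels the bytes without changing them ...
labelled = raw_name.dup.force_encoding(Encoding::Shift_JIS)

# ... and encode then transcodes them to UTF-8 for use in Ruby.
readable = labelled.encode(Encoding::UTF_8)

puts readable
puts raw_name.bytes == labelled.bytes
```

Relabelling only helps when reading, because the bytes already exist in the archive; when writing, the name would have to be transcoded and the header flags set to match.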

So this is on my list of things to properly fix. I think I need to properly implement Zip.unicode_names so it does the right thing, and then look at whether having it on by default is a good idea or not - as you suggest, I think it probably does make sense.

hainesr added the bug label Jan 26, 2022
hainesr self-assigned this Jan 26, 2022
Author

6temes commented Jan 26, 2022

Thank you @hainesr !

I didn't realize that Zip.force_entry_names_encoding was not doing anything when saving files. I guess that I was just "changing things until it worked". :)

As for Zip.unicode_names, we are using it when compressing files with Japanese names and, so far, it seems to be working ok. I don't know if any edge cases need to be addressed.

Member

hainesr commented Jan 27, 2022

> I think rubyzip just uses and assumes UTF-8 but then doesn't actually store the correct bytesize in the headers.

I was a little unfair on rubyzip here. It does do the right thing, re bytesize, so I think that's why using Zip.unicode_names is working for you.
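For anyone wondering why bytesize matters here, a small sketch in plain Ruby (no rubyzip involved): the file name length field in a Zip header is a 2-byte little-endian integer that counts bytes, so for non-ASCII names it differs from the character count. The example name is made up.

```ruby
name = "日本語.txt"

puts name.length    # 7 characters
puts name.bytesize  # 13 bytes in UTF-8 (3 bytes per kanji + 4 ASCII bytes)

# The Zip header stores the file name length as a 2-byte little-endian
# integer, and it has to count bytes, not characters.
length_field = [name.bytesize].pack('v')
puts length_field.unpack1('v')
```

Writing `name.length` into that field instead would make a reader stop partway through the name and misparse everything after it.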

I'm trying to work out why Zip.force_entry_names_encoding is there at all - it's before my time as maintainer I think - and I suspect that it's to help when other zip programs are being a bit naughty about filename encodings. Equally it could be because rubyzip never dealt with Unicode quite correctly...

I'll leave this open as a bug to remind me to look at it properly, and I think the documentation certainly needs to be improved too.

hainesr added this to the 3.0 milestone Jan 27, 2022
Author

6temes commented Jan 28, 2022

Just as a reminder, this wiki page should be updated or deleted after v3.0 is released:

https://github.com/rubyzip/rubyzip/wiki/Files-with-non-ascii-filenames

Member

hainesr commented Jan 28, 2022

Thanks for that @6temes.

I have to say I don't understand the advice in the final paragraph which says to not use Zip.unicode_names...

Author

6temes commented Jan 31, 2022

Hi @hainesr !

I was hesitant to delete the old text, so I left it under my update, but I agree with you that I just made everything confusing. :)

I just fixed it.

hainesr modified the milestones: 3.0, Future Mar 16, 2024