
Unicode filenames do not work in Windows #84

Closed
noniq opened this issue Jul 24, 2013 · 11 comments

Comments

@noniq

noniq commented Jul 24, 2013

I’m using Rubyzip 0.9.9 to create zip archives containing filenames with accented characters on Mac OS and Linux computers. When I then extract those archives on Windows computers (using WinRAR, 7-Zip or Windows’ built-in support for “compressed folders”), the filenames get mangled: most accented characters are replaced by 2 other characters, so it looks like the filenames are encoded as UTF-8 but interpreted by Windows as a single-byte encoding.
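The symptom described above can be reproduced without Rubyzip. This is a minimal sketch, assuming the extracting tool falls back to IBM Code Page 437 (the legacy encoding the ZIP specification prescribes for names when the EFS flag is unset; a particular tool might use a different single-byte code page). The helper `mangle_as_cp437` is a hypothetical name used only for illustration:

```ruby
# Reproduce the mangling: take the UTF-8 bytes of a filename and decode them
# as IBM Code Page 437, as a ZIP reader may do when the EFS flag (bit 11) is unset.
def mangle_as_cp437(name)
  raw = name.encode(Encoding::UTF_8).b   # copy of the raw UTF-8 bytes (binary)
  raw.force_encoding(Encoding::IBM437)   # relabel each byte as one CP437 character
  raw.encode(Encoding::UTF_8)            # transcode the CP437 text for display
end

original = "\u00e4\u00f6\u00fc.txt"  # the filename from the test script, as \u escapes
mangled  = mangle_as_cp437(original)

puts original.length  # => 7
puts mangled.length   # => 10 (each accented letter became two characters)
puts mangled
```

Each two-byte UTF-8 sequence turns into two CP437 characters, which matches the “2 other characters” per accented letter reported here.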

I found #3, but as far as I can see, this pull request has never been merged.

Is this a known issue? Are there any workarounds?

If it’s possible to fix this with a sane amount of work I’d be glad to help – any pointers appreciated! 😄

@noniq
Author

noniq commented Jul 25, 2013

For the record: I’m now using the following workaround (as Rails initializer) and so far everything works well:

module FixRubyzipUnicodeFilenames
  # Language encoding flag (EFS) bit
  EFS = 0b100000000000

  def initialize(*)
    super
    # Make sure the EFS flag is set, so that unicode filenames are decoded correctly in Windows.
    # “Compressed folders” in Windows 7 and newer support unicode filenames natively; for older
    # versions of Windows (Vista and below) a recent version of WinZip, 7-Zip, … is needed.
    @gp_flags |= EFS
  end
end

module Zip
  class ZipEntry
    prepend ::FixRubyzipUnicodeFilenames
  end
end

So in fact I’m just always setting the EFS flag when creating an archive. Are there any downsides to this approach? (Note that at least Mac OS’ native zip support seems to ignore this flag entirely and always interprets filenames as UTF-8.)

UPDATE: This seems to work correctly only on Windows 8; Windows XP and Windows 7 still mangle the filenames.
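The workaround hinges on a single bit of the 16-bit general purpose flag field. A small sketch of the bit arithmetic in plain Ruby; `with_efs` and `efs_set?` are hypothetical helpers, not part of Rubyzip:

```ruby
# The language encoding flag (EFS) is bit 11 of the general purpose flags,
# so 0b100000000000 == 1 << 11 == 2048.
EFS = 1 << 11

def with_efs(gp_flags)
  gp_flags | EFS          # set bit 11, leave all other bits untouched
end

def efs_set?(gp_flags)
  (gp_flags & EFS) != 0   # true if bit 11 is set
end

flags = 0b0000_0000_0000_1000   # e.g. bit 3 (data descriptor) already set
flags = with_efs(flags)

puts EFS == 0b100000000000      # => true
puts flags                      # => 2056
puts efs_set?(flags)            # => true
puts efs_set?(0)                # => false
```

Using `|=` as in the workaround means any flags Rubyzip has already set for the entry are preserved.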

@simonoff
Member

Hi! #3 was merged, so Unicode filenames are supported. But I haven’t tested it on all versions of Windows. Windows 7 works fine for me.
Windows XP is deprecated, and I suppose Windows Vista is too.
Can you make a patch with a setting that controls whether this EFS bit is enabled?

@noniq
Copy link
Author

noniq commented Aug 14, 2013

Strange that it seems to work for you … I’d like to understand the problem in full before preparing a patch: Could you please try the following script and check if extracting the generated archive on Windows really gives you a file named äöü.txt?

# encoding: utf-8
require "rubygems"
gem "rubyzip"
require "zip/zip"
require "tempfile"
require "fileutils"

ARCHIVE = "accented-characters-test.zip"
FileUtils.rm_f(ARCHIVE)
file = Tempfile.new("test")
Zip::ZipFile.open(ARCHIVE, Zip::ZipFile::CREATE) do |zip|
  zip.add("äöü.txt", file.path)
end
puts "Successfully created #{ARCHIVE} (Rubyzip #{Zip::VERSION}, Ruby #{RUBY_VERSION})"

For me it results in:
[Screenshot (2013-08-14): the extracted archive on Windows]
Screenshot taken on Windows Vista with Windows’ native “compressed folders” app, but it looks the same on Windows 7 (both with “compressed folders” and WinRAR).

I ran the script twice: On Mac OS X using Rubyzip 0.9.7 and Ruby 2.0.0, and on Ubuntu using Rubyzip 0.9.5 and Ruby 1.8.7. The results were identical, so the problem does not seem to depend on the Ruby version or the OS the archive is created with.
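To tell whether a generated archive actually carries the EFS flag, independent of what any extractor displays, the first local file header can be inspected directly. A minimal sketch in plain Ruby, assuming the archive begins with a local file header (no self-extractor stub or other prepended data). It ignores the central directory, which holds its own copy of the flags and which some extractors may rely on instead. `first_entry_info` is a hypothetical helper name:

```ruby
# Report whether the first entry of a zip archive has the EFS flag set,
# by reading its local file header directly (no Rubyzip needed).
LOCAL_HEADER_SIGNATURE = 0x04034b50  # "PK\x03\x04" read as little-endian
LOCAL_HEADER_SIZE      = 30          # fixed-size part of a local file header
EFS_FLAG               = 1 << 11     # language encoding flag, bit 11

def first_entry_info(path)
  File.open(path, "rb") do |f|
    header = f.read(LOCAL_HEADER_SIZE)
    if header.nil? || header.bytesize < LOCAL_HEADER_SIZE
      raise ArgumentError, "#{path} is too short to be a zip archive"
    end
    # Fields: signature, version needed, general purpose flags, method,
    # mod time, mod date, CRC-32, compressed size, uncompressed size,
    # filename length, extra field length (all little-endian).
    fields = header.unpack("VvvvvvVVVvv")
    signature, flags, name_length = fields[0], fields[2], fields[9]
    unless signature == LOCAL_HEADER_SIGNATURE
      raise ArgumentError, "no local file header at the start of #{path}"
    end
    name = f.read(name_length) || ""
    { efs: (flags & EFS_FLAG) != 0, name_bytes: name.bytes }
  end
end

if __FILE__ == $PROGRAM_NAME && ARGV[0]
  info = first_entry_info(ARGV[0])
  puts "EFS flag set: #{info[:efs]}"
  puts "Name bytes:   #{info[:name_bytes].map { |b| format('%02X', b) }.join(' ')}"
end
```

Run against accented-characters-test.zip, a name starting with the bytes `C3 A4` would confirm the name itself is stored as UTF-8 (“ä”), so the only open question is whether bit 11 is set.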

@simonoff
Member

Have you tried the latest version of rubyzip?

As far as I can see, you are using old versions.

WBR, Alexander Simonov
Web developer & High-load application deployer
Web Site: http://simonov.me
E-Mail: alex@simonov.me


@noniq
Author

noniq commented Aug 14, 2013

You’re right, I should have checked this in advance … However, I have now tested again with RubyZip 0.9.9 and also with current master (which identifies itself as version 0.9.10), and the problem remains.

@simonoff
Member

I made a few changes on master. Can you try it again?
To enable support for old clients and set the Unicode flag, use the following code:

Zip.unicode_names = true

@noniq
Author

noniq commented Aug 19, 2013

I tried again with current master and Zip.unicode_names = true. Results are as follows:

  • Using a recent version of WinZip (17.5): Filenames are correct on all Windows versions (8, 7, XP).
  • Using Windows’ native support for “compressed folders”: Filenames are correct on Windows 8, but still incorrect on Windows 7 and Windows XP.

Interestingly enough, WinZip seems to get the filenames right even if the Unicode flag is not set. However, the tooltip shown when hovering over the icon displays the correct filename only if the flag is set:
[Screenshot (2013-08-19): WinZip filename tooltip]

I’m not really sure what the conclusion of all this is, other than to warn RubyZip users that using filenames with accented characters might not be a good idea at all … sigh

Would you consider enabling the flag by default in future releases of RubyZip? Could there be any negative effects?

@simonoff
Member

Using non-ASCII locales is a bad idea in any case.
These localized names break extraction mostly on Windows versions older than 8.
And I don’t think Windows is a reason to change the default setting, because if you enable Unicode names, you need to be sure that the name string is ONLY a Unicode string, not Latin-1 plus umlauts, for example.
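The requirement above, that a name must really be Unicode before the flag is set, can be enforced before entries are added. A minimal sketch in plain Ruby; the helper `utf8_entry_name` and its ISO-8859-1 fallback for byte strings that are not valid UTF-8 are assumptions for illustration, not Rubyzip behavior:

```ruby
# Normalize an entry name to valid UTF-8 before enabling Unicode names.
# Hypothetical helper; the Latin-1 fallback is an assumption.
def utf8_entry_name(name, fallback = Encoding::ISO_8859_1)
  # Correctly tagged non-UTF-8 strings: transcode from their own encoding.
  if name.encoding != Encoding::UTF_8 &&
     name.encoding != Encoding::ASCII_8BIT &&
     name.valid_encoding?
    return name.encode(Encoding::UTF_8)
  end
  # UTF-8 or raw bytes: keep them if they already form valid UTF-8 ...
  candidate = name.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?
  # ... otherwise assume the bytes are in the fallback encoding.
  name.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

utf8   = "\u00e4\u00f6\u00fc.txt"
latin1 = utf8.encode(Encoding::ISO_8859_1)

puts utf8_entry_name(utf8) == utf8       # => true
puts utf8_entry_name(latin1) == utf8     # => true
puts utf8_entry_name(latin1.b) == utf8   # => true (raw Latin-1 bytes)
```

The fallback is a guess: bytes from some other legacy encoding that happen to form valid UTF-8 would pass through unchanged, so when the source encoding is known, tagging the string correctly first is the safer route.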

@noniq
Copy link
Author

noniq commented Aug 19, 2013

You’re right. What do you think about creating a page in the RubyZip wiki here on GitHub, containing an overview and summary of the problems with non-ASCII filenames? I’d be happy to do a first draft, if you agree.

@simonoff
Member

Yes, it would be great if you did it.
Thank you!

@noniq
Author

noniq commented Aug 19, 2013

See https://github.com/rubyzip/rubyzip/wiki/Files-with-non-ascii-filenames – feel free to improve :-)

I’m closing this issue, as there is no real fix. Thanks for adding Zip.unicode_names = true!
