Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Unbreak gem specification loading if Encoding.default_internal = 'ISO-8859-1' #146

Closed
wants to merge 1 commit into from

5 participants

@jeremyevans

Unbreak gem specification loading if Encoding.default_internal = 'ISO-8859-1' (or other encoding) and there are characters in the gem specification that aren't supported by that encoding. If you just use the :encoding option, that sets the external encoding, and ruby will automatically attempt to convert it to the default internal encoding if it is set and different from the external encoding. Using the :mode option with both encodings set means that no conversion is attempted, allowing you to load gems that contain characters that aren't supported by the default internal encoding.

@jeremyevans

Here's a simple example of the problem. Note how this is not an obscure gem. It would be nice if this could get fixed before ruby 1.9.3 is released.

$ ruby19 -E 'ISO-8859-1:ISO-8859-1' -ve "require 'nokogiri'"                          
ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-openbsd]
/usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:89:in `read': U+92F8 from UTF-8 to ISO-8859-1 (Encoding::UndefinedConversionError)
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:89:in `load_specification'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:153:in `block (2 levels) in load_gems_in'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:152:in `each'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:152:in `block in load_gems_in'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:149:in `reverse_each'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:149:in `load_gems_in'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:345:in `refresh!'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:78:in `from_gems_in'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:58:in `from_installed_gems'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:866:in `source_index'
    from /usr/local/lib/ruby/1.9.1/rubygems/gem_path_searcher.rb:81:in `init_gemspecs'
    from /usr/local/lib/ruby/1.9.1/rubygems/gem_path_searcher.rb:13:in `initialize'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:824:in `new'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:824:in `block in searcher'
    from <internal:prelude>:10:in `synchronize'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:823:in `searcher'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:484:in `find_files'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:966:in `load_plugins'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:1136:in `<top (required)>'
    from <internal:lib/rubygems/custom_require>:29:in `require'
    from <internal:lib/rubygems/custom_require>:29:in `require'
    from <internal:gem_prelude>:167:in `load_full_rubygems_library'
    from <internal:gem_prelude>:217:in `try_activate'
    from <internal:lib/rubygems/custom_require>:32:in `rescue in require'
    from <internal:lib/rubygems/custom_require>:29:in `require'
    from -e:1:in `<main>
@luislavena
Collaborator

Hello,

Have you tested against latest ruby_1_9_3 codebase? Asking this because several encoding fixes has been made.

@jeremyevans

It doesn't error out with ISO-8859-1 when requiring anymore with 1.9.3, but that appears to be due to a change in File.read, not in rubygems. Arguably it may even be a bug in File.read in 1.9.3:

$ ruby19 -E 'ISO-8859-1:ISO-8859-1' -ve "p File.read('/usr/local/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0.gemspec', :encoding=>'UTF-8').encoding" 
ruby 1.9.3dev (2011-07-31 revision 32789) [x86_64-openbsd5.0]
#<Encoding:ISO-8859-1>
$ ruby19 -E 'ISO-8859-1:ISO-8859-1' -ve "p File.read('/usr/local/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0.gemspec', :mode=>'r:UTF-8:-').encoding"  
ruby 1.9.3dev (2011-07-31 revision 32789) [x86_64-openbsd5.0]
#<Encoding:UTF-8>

So rubygems doesn't break only because File.read no longer raises an error. Arguably, File.read should raise an error because the gemspec file is in UTF-8 format and contains UTF-8 characters that are not representable in ISO-8859-1. Changing the :encoding option to :mode keeps the string in UTF-8 format, which it should as the data is UTF-8.

I'd say this is still a bug in rubygems, as it only happens to work because the gemspec is valid ISO-8859-1. Here's an example with US-ASCII that shows rubygems is still broken in this regard:

$ ruby19 -E 'US-ASCII:US-ASCII' -ve "p File.read('/usr/local/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0.gemspec', :encoding=>'UTF-8').encoding" 
ruby 1.9.3dev (2011-07-31 revision 32789) [x86_64-openbsd5.0]
-e:1:in `read': U+00E9 from UTF-8 to US-ASCII (Encoding::UndefinedConversionError)
    from -e:1:in `<main>'
$ ruby19 -E 'US-ASCII:US-ASCII' -ve "p File.read('/usr/local/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0.gemspec', :mode=>'r:UTF-8:-').encoding"  
ruby 1.9.3dev (2011-07-31 revision 32789) [x86_64-openbsd5.0]
#<Encoding:UTF-8>
$ ruby19 -E 'US-ASCII:US-ASCII' -ve "require 'nokogiri'"
ruby 1.9.3dev (2011-07-31 revision 32789) [x86_64-openbsd5.0]
/usr/local/lib/ruby/1.9.1/rubygems/specification.rb:538:in `read': U+00E9 from UTF-8 to US-ASCII (Encoding::UndefinedConversionError)
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:538:in `load'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:269:in `block (2 levels) in _all'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:268:in `each'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:268:in `block in _all'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:267:in `reverse_each'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:267:in `_all'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:409:in `each'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:441:in `find'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:441:in `find_by_path'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:203:in `try_activate'
    from /usr/local/lib/ruby/1.9.1/rubygems/custom_require.rb:58:in `rescue in require'
    from /usr/local/lib/ruby/1.9.1/rubygems/custom_require.rb:35:in `require'
    from -e:1:in `<main>'
@drbrain drbrain was assigned
@drbrain
Owner

I don't see a test for this ☹

@jeremyevans

It obviously fixes an issue, so do you need one if it doesn't break any existing tests?

I can probably write one, but it'll be a few days before I can get to it.

@drbrain
Owner

It fixes the issue today but without a test there's no guarantee that it will stay fixed tomorrow.

@jeremyevans

OK, I added a test that breaks with the current code and is fixed by the one line patch.

@jeremyevans

It's been over two months since this was originally reported, and it's a one-line change that fixes an obvious bug. The test case was added almost a month ago, and there have been no responses since. Is there a reason this hasn't been merged yet?

@jeremyevans

It's been another month, so bumping this again so it isn't forgotten about.

@jeremyevans

Another month, another bump.

@practicingruby
Collaborator

@drbrain: Any reason why this isn't being pulled in? Looks like @jeremyevans added a test as requested.

@jeremyevans

Another month, another bump.

@practicingruby
Collaborator

@jeremyevans: Since it's been a while, this patch might not apply cleanly. Can you try it against the latest code in master and then update the pull request as necessary? If you do that, I'll merge it as long as the tests are still passing. This way either it'll get fixed or someone will revert it and give you a reason why.

Sorry that it has come to that, there's no reason why you should have waited this long for a response.

@luislavena
Collaborator

@sandal I'm taking care of this right now, will see merge and test and some testing before pushing.

@practicingruby
Collaborator

@luislavena Even better, thanks!

@lsegal

@jeremyevans, your test doesn't actually cause a failure for me without the extra patch using rubygems-master and ruby1.9.2p290. Also, ruby -E 'ISO-8859-1:ISO-8859-1' -ve "require 'nokogiri'" doesn't fail either (in the same Ruby). Am I missing something about the bug?

@luislavena
Collaborator

@jeremyevans after solving the conflicts I get no failure from the test, however, using this test case from your comment

C:\Users\Luis>ruby -E 'US-ASCII:US-ASCII' -ve "p File.read('C:/Users/Luis/Tools/Ruby/ruby-1.9.3-p0-i386-mingw32/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0-x86-mingw32.gemspec', :encoding=>'UTF-8').encoding"
ruby 1.9.3p0 (2011-10-30) [i386-mingw32]
#<Encoding:US-ASCII>

That was supposed to trigger Encoding::UndefinedConversionError, but didn't

Neither does:

C:\Users\Luis>ruby -E 'US-ASCII:US-ASCII' -ve "require 'nokogiri'"
ruby 1.9.3p0 (2011-10-30) [i386-mingw32]

Using RubyGems 1.8.15.

Can you provide more information to recreate this issue with RubyGems 1.8.15?

I'll be around for a few more hours, so please let me know.

Thank you.

@jeremyevans

It appears as though recent changes have hidden this issue. The reason it's no longer easily reproducible is that the gem specification files are now persisted differently than before (maybe in to_ruby_for_cache or something that calls). It used to be that gem specification files were stored with 8-bit characters, now they appear to be stored without 8-bit characters:

# Before
$ hexdump -C /usr/local/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0.gemspec | head -n 25 | tail -n 3
00000160  69 6f 6e 20 3d 20 25 71  7b 4e 6f 6b 6f 67 69 72  |ion = %q{Nokogir|
00000170  69 20 28 e9 8b b8 29 20  69 73 20 61 6e 20 48 54  |i (é.¸) is an HT|
00000180  4d 4c 2c 20 58 4d 4c 2c  20 53 41 58 2c 20 61 6e  |ML, XML, SAX, an|
# Now
$ hexdump -C /usr/local/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0.gemspec | head -n 22 | tail -n 3
00000130  73 63 72 69 70 74 69 6f  6e 20 3d 20 22 4e 6f 6b  |scription = "Nok|
00000140  6f 67 69 72 69 20 28 5c  75 7b 65 39 7d 5c 75 7b  |ogiri (\u{e9}\u{|
00000150  38 62 7d 5c 75 7b 62 38  7d 29 20 69 73 20 61 6e  |8b}\u{b8}) is an|

Whether this is still a problem depends on whether you consider 8-bit characters as valid in a gem specification file. If 8-bit characters are now considered invalid, then I suppose the current behavior is acceptable, as a user should just be able to run gem pristine to reinstall the specification without 8-bit characters.

However, if gem specification files with 8-bit characters are still considered valid (to support people upgrading rubygems without reinstalling gems), I think the patch should still be committed.

Edit: The test would still need to be changed before it is committed.

@luislavena
Collaborator

It appears as though recent changes have hidden this issue. The reason it's no longer easily reproducible is that the gem specification files are now persisted differently than before (maybe in to_ruby_for_cache or something that calls).

@jeremyevans yes, the change was Gem::Specification#ruby_code for Issue #165

Would you mind rebasing the branch against latest master? This is the diff I made after solving the conflicts:

https://gist.github.com/19f244f71b3255776c7f

If you rebase and update the test, I will apply.

Thank you.

@jeremyevans

OK. I'll rebase and try to fix the test (hopefully sometime tomorrow). Thanks for your help.

@drbrain
Owner

This may have been partially fixed by #149 which is why it can't be reproduced as easily anymore.

Gem specifications are UTF-8. YAML is UTF-8 and the ruby source files have the UTF-8 encoding magic comment.

@jeremyevans jeremyevans Unbreak gem specification loading if specification file encoding is n…
…ot compatible with Encoding.default_internal

The :encoding option only specifies the external encoding.  If
Encoding.default_internal is set, it will automatically convert
it to the internal encoding.  If it cannot be converted (e.g.
internal encoding is US-ASCII and specification file contains
8-bit characters), an error is raised.

Instead, the :mode option should be given specifying that the
file should be left in its external encoding without converting
it to the default internal encoding.  This allows you to load
gem specification files with 8-bit characters and a default
internal encoding of US-ASCII.
930aaf4
@jeremyevans

OK, I rebased against master and used a single commit for both the test and the fix, with just the spec addition, the spec fails, and with the lib fix, the spec passes.

@luislavena
Collaborator

@jeremyevans awesome!, will check this out once get back home. Thank you for your time.

@luislavena
Collaborator

Landed in master at cedac81

Will update History and backport to 1.8 in a bit.

Thank you!

@luislavena
Collaborator

Backported in d449af3

Closing this now.

@luislavena luislavena closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jan 11, 2012
  1. @jeremyevans

    Unbreak gem specification loading if specification file encoding is n…

    jeremyevans authored
    …ot compatible with Encoding.default_internal
    
    The :encoding option only specifies the external encoding.  If
    Encoding.default_internal is set, it will automatically convert
    it to the internal encoding.  If it cannot be converted (e.g.
    internal encoding is US-ASCII and specification file contains
    8-bit characters), an error is raised.
    
    Instead, the :mode option should be given specifying that the
    file should be left in its external encoding without converting
    it to the default internal encoding.  This allows you to load
    gem specification files with 8-bit characters and a default
    internal encoding of US-ASCII.
This page is out of date. Refresh to see the latest.
View
2  lib/rubygems/specification.rb
@@ -900,7 +900,7 @@ def self.load file
return unless File.file?(file)
code = if defined? Encoding
- File.read file, :encoding => "UTF-8"
+ File.read file, :mode => 'r:UTF-8:-'
else
File.read file
end
View
23 test/rubygems/test_gem_specification.rb
@@ -308,6 +308,29 @@ def test_self_load_escape_quote
assert_equal @a2, spec
end
+ if defined?(Encoding)
+ def test_self_load_utf8_with_ascii_encoding
+ int_enc = Encoding.default_internal
+ Encoding.default_internal = 'US-ASCII'
+
+ spec2 = @a2.dup
+ bin = "\u5678"
+ spec2.authors = [bin]
+ full_path = spec2.spec_file
+ write_file full_path do |io|
+ io.write spec2.to_ruby_for_cache.force_encoding('BINARY').sub("\\u{5678}", bin.force_encoding('BINARY'))
+ end
+
+ spec = Gem::Specification.load full_path
+
+ spec2.files.clear
+
+ assert_equal spec2, spec
+ ensure
+ Encoding.default_internal = int_enc
+ end
+ end
+
def test_self_load_legacy_ruby
spec = Gem::Deprecate.skip_during do
eval LEGACY_RUBY_SPEC
Something went wrong with that request. Please try again.