Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Unbreak gem specification loading if Encoding.default_internal = 'ISO-8859-1' #146

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
5 participants
Contributor

jeremyevans commented Aug 2, 2011

Unbreak gem specification loading if Encoding.default_internal = 'ISO-8859-1' (or other encoding) and there are characters in the gem specification that aren't supported by that encoding. If you just use the :encoding option, that sets the external encoding, and ruby will automatically attempt to convert it to the default internal encoding if it is set and different from the external encoding. Using the :mode option with both encodings set means that no conversion is attempted, allowing you to load gems that contain characters that aren't supported by the default internal encoding.

Contributor

jeremyevans commented Aug 27, 2011

Here's a simple example of the problem. Note how this is not an obscure gem. It would be nice if this could get fixed before ruby 1.9.3 is released.

$ ruby19 -E 'ISO-8859-1:ISO-8859-1' -ve "require 'nokogiri'"                          
ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-openbsd]
/usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:89:in `read': U+92F8 from UTF-8 to ISO-8859-1 (Encoding::UndefinedConversionError)
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:89:in `load_specification'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:153:in `block (2 levels) in load_gems_in'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:152:in `each'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:152:in `block in load_gems_in'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:149:in `reverse_each'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:149:in `load_gems_in'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:345:in `refresh!'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:78:in `from_gems_in'
    from /usr/local/lib/ruby/1.9.1/rubygems/source_index.rb:58:in `from_installed_gems'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:866:in `source_index'
    from /usr/local/lib/ruby/1.9.1/rubygems/gem_path_searcher.rb:81:in `init_gemspecs'
    from /usr/local/lib/ruby/1.9.1/rubygems/gem_path_searcher.rb:13:in `initialize'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:824:in `new'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:824:in `block in searcher'
    from <internal:prelude>:10:in `synchronize'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:823:in `searcher'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:484:in `find_files'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:966:in `load_plugins'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:1136:in `<top (required)>'
    from <internal:lib/rubygems/custom_require>:29:in `require'
    from <internal:lib/rubygems/custom_require>:29:in `require'
    from <internal:gem_prelude>:167:in `load_full_rubygems_library'
    from <internal:gem_prelude>:217:in `try_activate'
    from <internal:lib/rubygems/custom_require>:32:in `rescue in require'
    from <internal:lib/rubygems/custom_require>:29:in `require'
    from -e:1:in `<main>
Member

luislavena commented Aug 27, 2011

Hello,

Have you tested against latest ruby_1_9_3 codebase? Asking this because several encoding fixes has been made.

Contributor

jeremyevans commented Aug 27, 2011

It doesn't error out with ISO-8859-1 when requiring anymore with 1.9.3, but that appears to be due to a change in File.read, not in rubygems. Arguably it may even be a bug in File.read in 1.9.3:

$ ruby19 -E 'ISO-8859-1:ISO-8859-1' -ve "p File.read('/usr/local/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0.gemspec', :encoding=>'UTF-8').encoding" 
ruby 1.9.3dev (2011-07-31 revision 32789) [x86_64-openbsd5.0]
#<Encoding:ISO-8859-1>
$ ruby19 -E 'ISO-8859-1:ISO-8859-1' -ve "p File.read('/usr/local/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0.gemspec', :mode=>'r:UTF-8:-').encoding"  
ruby 1.9.3dev (2011-07-31 revision 32789) [x86_64-openbsd5.0]
#<Encoding:UTF-8>

So rubygems doesn't break only because File.read no longer raises an error. Arguably, File.read should raise an error because the gemspec file is in UTF-8 format and contains UTF-8 characters that are not representable in ISO-8859-1. Changing the :encoding option to :mode keeps the string in UTF-8 format, which it should as the data is UTF-8.

I'd say this is still a bug in rubygems, as it only happens to work because the gemspec is valid ISO-8859-1. Here's an example with US-ASCII that shows rubygems is still broken in this regard:

$ ruby19 -E 'US-ASCII:US-ASCII' -ve "p File.read('/usr/local/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0.gemspec', :encoding=>'UTF-8').encoding" 
ruby 1.9.3dev (2011-07-31 revision 32789) [x86_64-openbsd5.0]
-e:1:in `read': U+00E9 from UTF-8 to US-ASCII (Encoding::UndefinedConversionError)
    from -e:1:in `<main>'
$ ruby19 -E 'US-ASCII:US-ASCII' -ve "p File.read('/usr/local/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0.gemspec', :mode=>'r:UTF-8:-').encoding"  
ruby 1.9.3dev (2011-07-31 revision 32789) [x86_64-openbsd5.0]
#<Encoding:UTF-8>
$ ruby19 -E 'US-ASCII:US-ASCII' -ve "require 'nokogiri'"
ruby 1.9.3dev (2011-07-31 revision 32789) [x86_64-openbsd5.0]
/usr/local/lib/ruby/1.9.1/rubygems/specification.rb:538:in `read': U+00E9 from UTF-8 to US-ASCII (Encoding::UndefinedConversionError)
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:538:in `load'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:269:in `block (2 levels) in _all'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:268:in `each'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:268:in `block in _all'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:267:in `reverse_each'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:267:in `_all'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:409:in `each'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:441:in `find'
    from /usr/local/lib/ruby/1.9.1/rubygems/specification.rb:441:in `find_by_path'
    from /usr/local/lib/ruby/1.9.1/rubygems.rb:203:in `try_activate'
    from /usr/local/lib/ruby/1.9.1/rubygems/custom_require.rb:58:in `rescue in require'
    from /usr/local/lib/ruby/1.9.1/rubygems/custom_require.rb:35:in `require'
    from -e:1:in `<main>'

@ghost ghost assigned drbrain Sep 1, 2011

Owner

drbrain commented Sep 2, 2011

I don't see a test for this ☹

Contributor

jeremyevans commented Sep 2, 2011

It obviously fixes an issue, so do you need one if it doesn't break any existing tests?

I can probably write one, but it'll be a few days before I can get to it.

Owner

drbrain commented Sep 3, 2011

It fixes the issue today but without a test there's no guarantee that it will stay fixed tomorrow.

Contributor

jeremyevans commented Sep 6, 2011

OK, I added a test that breaks with the current code and is fixed by the one line patch.

Contributor

jeremyevans commented Oct 4, 2011

It's been over two months since this was originally reported, and it's a one-line change that fixes an obvious bug. The test case was added almost a month ago, and there have been no responses since. Is there a reason this hasn't been merged yet?

Contributor

jeremyevans commented Nov 8, 2011

It's been another month, so bumping this again so it isn't forgotten about.

Contributor

jeremyevans commented Dec 5, 2011

Another month, another bump.

Contributor

practicingruby commented Dec 5, 2011

@drbrain: Any reason why this isn't being pulled in? Looks like @jeremyevans added a test as requested.

Contributor

jeremyevans commented Jan 10, 2012

Another month, another bump.

Contributor

practicingruby commented Jan 10, 2012

@jeremyevans: Since it's been a while, this patch might not apply cleanly. Can you try it against the latest code in master and then update the pull request as necessary? If you do that, I'll merge it as long as the tests are still passing. This way either it'll get fixed or someone will revert it and give you a reason why.

Sorry that it has come to that, there's no reason why you should have waited this long for a response.

Member

luislavena commented Jan 10, 2012

@sandal I'm taking care of this right now, will see merge and test and some testing before pushing.

Contributor

practicingruby commented Jan 10, 2012

@luislavena Even better, thanks!

lsegal commented Jan 10, 2012

@jeremyevans, your test doesn't actually cause a failure for me without the extra patch using rubygems-master and ruby1.9.2p290. Also, ruby -E 'ISO-8859-1:ISO-8859-1' -ve "require 'nokogiri'" doesn't fail either (in the same Ruby). Am I missing something about the bug?

Member

luislavena commented Jan 10, 2012

@jeremyevans after solving the conflicts I get no failure from the test, however, using this test case from your comment

C:\Users\Luis>ruby -E 'US-ASCII:US-ASCII' -ve "p File.read('C:/Users/Luis/Tools/Ruby/ruby-1.9.3-p0-i386-mingw32/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0-x86-mingw32.gemspec', :encoding=>'UTF-8').encoding"
ruby 1.9.3p0 (2011-10-30) [i386-mingw32]
#<Encoding:US-ASCII>

That was supposed to trigger Encoding::UndefinedConversionError, but didn't

Neither does:

C:\Users\Luis>ruby -E 'US-ASCII:US-ASCII' -ve "require 'nokogiri'"
ruby 1.9.3p0 (2011-10-30) [i386-mingw32]

Using RubyGems 1.8.15.

Can you provide more information to recreate this issue with RubyGems 1.8.15?

I'll be around for a few more hours, so please let me know.

Thank you.

Contributor

jeremyevans commented Jan 11, 2012

It appears as though recent changes have hidden this issue. The reason it's no longer easily reproducible is that the gem specification files are now persisted differently than before (maybe in to_ruby_for_cache or something that calls). It used to be that gem specification files were stored with 8-bit characters, now they appear to be stored without 8-bit characters:

# Before
$ hexdump -C /usr/local/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0.gemspec | head -n 25 | tail -n 3
00000160  69 6f 6e 20 3d 20 25 71  7b 4e 6f 6b 6f 67 69 72  |ion = %q{Nokogir|
00000170  69 20 28 e9 8b b8 29 20  69 73 20 61 6e 20 48 54  |i (é.¸) is an HT|
00000180  4d 4c 2c 20 58 4d 4c 2c  20 53 41 58 2c 20 61 6e  |ML, XML, SAX, an|
# Now
$ hexdump -C /usr/local/lib/ruby/gems/1.9.1/specifications/nokogiri-1.5.0.gemspec | head -n 22 | tail -n 3
00000130  73 63 72 69 70 74 69 6f  6e 20 3d 20 22 4e 6f 6b  |scription = "Nok|
00000140  6f 67 69 72 69 20 28 5c  75 7b 65 39 7d 5c 75 7b  |ogiri (\u{e9}\u{|
00000150  38 62 7d 5c 75 7b 62 38  7d 29 20 69 73 20 61 6e  |8b}\u{b8}) is an|

Whether this is still a problem depends on whether you consider 8-bit characters as valid in a gem specification file. If 8-bit characters are now considered invalid, then I suppose the current behavior is acceptable, as a user should just be able to run gem pristine to reinstall the specification without 8-bit characters.

However, if gem specification files with 8-bit characters are still considered valid (to support people upgrading rubygems without reinstalling gems), I think the patch should still be committed.

Edit: The test would still need to be changed before it is committed.

Member

luislavena commented Jan 11, 2012

It appears as though recent changes have hidden this issue. The reason it's no longer easily reproducible is that the gem specification files are now persisted differently than before (maybe in to_ruby_for_cache or something that calls).

@jeremyevans yes, the change was Gem::Specification#ruby_code for Issue #165

Would you mind rebasing the branch against latest master? This is the diff I made after solving the conflicts:

https://gist.github.com/19f244f71b3255776c7f

If you rebase and update the test, I will apply.

Thank you.

Contributor

jeremyevans commented Jan 11, 2012

OK. I'll rebase and try to fix the test (hopefully sometime tomorrow). Thanks for your help.

Owner

drbrain commented Jan 11, 2012

This may have been partially fixed by #149 which is why it can't be reproduced as easily anymore.

Gem specifications are UTF-8. YAML is UTF-8 and the ruby source files have the UTF-8 encoding magic comment.

Unbreak gem specification loading if specification file encoding is n…
…ot compatible with Encoding.default_internal

The :encoding option only specifies the external encoding.  If
Encoding.default_internal is set, it will automatically convert
it to the internal encoding.  If it cannot be converted (e.g.
internal encoding is US-ASCII and specification file contains
8-bit characters), an error is raised.

Instead, the :mode option should be given specifying that the
file should be left in its external encoding without converting
it to the default internal encoding.  This allows you to load
gem specification files with 8-bit characters and a default
internal encoding of US-ASCII.
Contributor

jeremyevans commented Jan 11, 2012

OK, I rebased against master and used a single commit for both the test and the fix, with just the spec addition, the spec fails, and with the lib fix, the spec passes.

Member

luislavena commented Jan 11, 2012

@jeremyevans awesome!, will check this out once get back home. Thank you for your time.

Member

luislavena commented Jan 12, 2012

Landed in master at cedac81

Will update History and backport to 1.8 in a bit.

Thank you!

luislavena added a commit that referenced this pull request Jan 12, 2012

Member

luislavena commented Jan 12, 2012

Backported in d449af3

Closing this now.

@luislavena luislavena closed this Jan 12, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment