Invalid byte sequence UTF-8 error in ruby 1.9.3 rails 4.0 #13816

Closed
giladbu opened this Issue Jan 23, 2014 · 5 comments

giladbu commented Jan 23, 2014

I'm experiencing a problem when trying to force Unicode on a string.

Loading development environment (Rails 4.0.2)
1.9.3p484 :001 > name = "Foo Bar Baz Boing"
 => "Foo Bar Baz Boing"
1.9.3p484 :002 > name.mb_chars.downcase.to_s
 => "f\u0005\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"
1.9.3p484 :003 > name.mb_chars.downcase.to_s
 => "foo bar baz boing"

The first attempt to downcase the string returns garbage, and sometimes raises ArgumentError: "invalid byte sequence in UTF-8".

The second attempt returns the correct result.

This came up in relation to issue: mbleigh/acts-as-taggable-on#385
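
One way to rule out the input itself as the source of the error is to inspect the string before it reaches mb_chars. The sketch below uses only core Ruby. encoding_report is a hypothetical helper name, not part of Rails or ActiveSupport, and the literal is forced to UTF-8 on the assumption that this matches what the console uses.

```ruby
# Hypothetical diagnostic helper (not part of Rails): summarizes how Ruby
# sees a string, so bad input can be told apart from a library problem.
def encoding_report(str)
  {
    encoding: str.encoding.name,  # encoding Ruby has tagged the string with
    valid: str.valid_encoding?,   # false if the bytes do not fit that encoding
    ascii_only: str.ascii_only?   # true when every byte is 7-bit ASCII
  }
end

# Assumption: the console treats string literals as UTF-8.
name = "Foo Bar Baz Boing".dup.force_encoding("UTF-8")
p encoding_report(name)
```

If valid is true for the input, as expected for a plain ASCII literal like this one, the invalid bytes are being produced after the string is handed over, which would fit the observation that a second call succeeds.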

yagooar commented Jan 28, 2014

I can confirm this bug under Ruby 1.9.3-p448. It looks like an autoload issue to me, but I'm not sure how to dig deeper to debug it.
This also applies to the methods #upcase and #swapcase (as they internally call Unicode in the same way).

New shell:

irb(main):001:0> require 'active_support/multibyte/unicode'
=> true
irb(main):002:0> ActiveSupport::Multibyte::Unicode.downcase('Foo Bar Baz Boing')
=> "foo bar baz boing"

Exit, then open new shell:

irb(main):001:0> require 'active_support/multibyte'
=> true
irb(main):002:0> ActiveSupport::Multibyte::Chars.new('Foo Bar Baz Boing').downcase
ArgumentError: invalid byte sequence in UTF-8
    from /Users/mateusz/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/activesupport-4.0.2/lib/active_support/multibyte/unicode.rb:377:in `each_codepoint'
    from /Users/mateusz/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/activesupport-4.0.2/lib/active_support/multibyte/unicode.rb:377:in `each'
    from /Users/mateusz/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/activesupport-4.0.2/lib/active_support/multibyte/unicode.rb:377:in `map'
    from /Users/mateusz/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/activesupport-4.0.2/lib/active_support/multibyte/unicode.rb:377:in `apply_mapping'
    from /Users/mateusz/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/activesupport-4.0.2/lib/active_support/multibyte/unicode.rb:295:in `downcase'
    from /Users/mateusz/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/activesupport-4.0.2/lib/active_support/multibyte/chars.rb:123:in `downcase'
    from (irb):2
    from /Users/mateusz/.rbenv/versions/1.9.3-p448/bin/irb:12:in `<main>'
irb(main):003:0> ActiveSupport::Multibyte::Chars.new('Foo Bar Baz Boing').downcase
=> foo bar baz boing

So what I think is happening is that once 'active_support/multibyte/unicode' is loaded, it just works.

@leo-souza leo-souza referenced this issue in mbleigh/acts-as-taggable-on Mar 20, 2014

Merged

Use database's lower function for case-insensitive match #498

omegahm commented Jul 11, 2014

I have a similar problem. A completely new Rails project (Ruby 1.9.3-p545 and Rails 4.0.8):

rails new utf_hell

Then open a Rails console and read a file that contains the following text:

Bemærk: extra lead

and try to upcase it:

File.open('../utf_file.txt').read.mb_chars.upcase

This sometimes results in "invalid byte sequence in UTF-8", but what's even stranger is that it sometimes gives me random "words" starting with "B", e.g. "BSOLATE", "BARROW", "BOMPAT" and so on. This second problem isn't fixed by requiring active_support/multibyte/unicode; however, this hack does circumvent the problem (DON'T USE IN PRODUCTION!):

class ActiveSupport::Multibyte::Chars
  alias :old_upcase :upcase

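  def upcase
    # The first call may raise or return garbage, so its result is discarded.
    old_upcase rescue nil # There be dragons!
    # The second call is the one whose result is returned.
    old_upcase
  end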
end
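
Because the text comes from a file here, it may help to check separately that the file really is valid UTF-8 before calling mb_chars.upcase. A minimal sketch in plain Ruby, assuming the file is meant to be UTF-8. read_utf8 is a hypothetical helper, and a Tempfile stands in for utf_file.txt:

```ruby
require 'tempfile'

# Hypothetical helper (not part of Rails): reads a file as UTF-8 and fails
# early if its bytes are not valid UTF-8.
def read_utf8(path)
  # 'r:UTF-8' tags the data as UTF-8 regardless of the locale default.
  text = File.open(path, 'r:UTF-8') { |f| f.read }
  raise ArgumentError, "#{path} is not valid UTF-8" unless text.valid_encoding?
  text
end

# Stand-in for utf_file.txt, containing the sample text from this comment.
Tempfile.open('utf_file') do |f|
  f.binmode
  f.write("Bem\u00E6rk: extra lead")
  f.flush

  text = read_utf8(f.path)
  puts text.encoding         # encoding tag of the string that was read
  puts text.valid_encoding?  # whether its bytes are well-formed
end
```

If read_utf8 returns without raising, the bytes on disk are fine, and an "invalid byte sequence" error raised afterwards points at the case-mapping step rather than at the file.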

@rails-bot rails-bot added the stale label Nov 19, 2014

This issue has been automatically marked as stale because it has not been commented on for at least
three months.

The resources of the Rails team are limited, and so we are asking for your help.

If you can still reproduce this error on the 4-1-stable, 4-0-stable branches or on master,
please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions.

omegahm commented Nov 20, 2014

It probably has to do with the Ruby version. Everything works fine with Ruby 2.1.3 and Rails 4.1.5.

Owner

rafaelfranca commented Nov 20, 2014

Thank you for confirming.
